10:45am Lecture hall 1, level B

**Richard Rivera**, *PhD Student, IMDEA Software Institute*

Labeling a malicious executable as a variant of a known family is important for security applications such as triage, lineage, and for building reference datasets in turn used for evaluating malware clustering and training malware classication approaches. Oftentimes, such labeling is based on labels output by antivirus engines. While AV labels are well-known to be inconsistent, there is often no other information available for labeling, thus security analysts keep relying on them. However, current approaches for extracting family information from AV labels are manual and inaccurate. In this work, we describe AVCLASS, an automatic labeling tool that given the AV labels for a, potentially massive, number of samples outputs the most likely family names for each sample. AVCLASS implements novel automatic techniques to address 3 key challenges: normalization, removal of generic tokens, and alias detection. We have evaluated AVCLASS on 10 datasets comprising 8.9 M samples, larger than any dataset used by malware clustering and classication works. AVCLASS leverages labels from any AV engine, e.g., all 99 AV engines seen in VirusTotal, the largest engine set in the literature. AVCLASS's clustering achieves F1 measures up to 93.9 on labeled datasets and clusters are labeled with ne-grained family names commonly used by the AVvendors. We release AVCLASS to the community.

**Time and place:**

10:45am Lecture hall 1, level B

IMDEA Software Institute, Campus de Montegancedo

28223-Pozuelo de Alarcón, Madrid, Spain

10:45am Lecture hall 1, level B

**Platon Kotzias**, *PhD Student, IMDEA Software Institute*

Potentially unwanted programs (PUP) such as adware and rogueware, while not outright malicious, exhibit intrusive behavior that generates user complaints and makes security vendors flag them as undesirable. PUP has been little studied in the research literature despite recent indications that its prevalence may have surpassed that of malware.

In this work we perform the first systematic study of PUP prevalence and its distribution through pay-per-install (PPI) services, which link advertisers that want to promote their programs with affiliate publishers willing to bundle their programs with offers for other software. Using AV telemetry information comprising of 8 billion events on 3.9 million real hosts during a 19 month period, we discover that over half (54%) of the examined hosts have PUP installed. PUP publishers are highly popular, e.g., the top two PUP publishers rank 15 and 24 amongst all software publishers (benign and PUP). Furthermore, we analyze the who-installs-who relationships, finding that 65% of PUP downloads are performed by other PUP and that 24 PPI services distribute over a quarter of all PUP. We also examine the top advertiser programs distributed by the PPI services, observing that they are dominated by adware running in the browser (e.g., toolbars, extensions) and rogueware. Finally, we investigate the PUP-malware relationships in the form of malware installations by PUP and PUP installations by malware. We conclude that while such events exist, PUP distribution is largely disjoint from malware distribution.

**Time and place:**

10:45am Lecture hall 1, level B

IMDEA Software Institute, Campus de Montegancedo

28223-Pozuelo de Alarcón, Madrid, Spain

10:45am Lecture hall 1, level B

**Nataliia Stulova**, *PhD Student, IMDEA Software Institute*

We explore the effectiveness of abstract interpretation in detecting parts of program specifications that can be simplified statically to true or false, as well as the impact of such analysis in reducing the cost of the run-time checks required for the remaining parts of these specifications.

Based on an operational semantics for programs with assertion checking, and the results of abstract interpretation, we recall the corresponding analysis-based assertion simplification problem, and propose and study a number of practical assertion checking modes, each of which represents a trade-off between code annotation depth, execution time slowdown, and program behavior safety.

We propose techniques for taking advantage of the run-time checking semantics to improve the precision of the analysis. We also report on some experimental results on the effectiveness of the techniques proposed.

**Time and place:**

10:45am Lecture hall 1, level B

IMDEA Software Institute, Campus de Montegancedo

28223-Pozuelo de Alarcón, Madrid, Spain

10:45am Lecture hall 1, level B

**Isabel Garcia**, *Research Intern, IMDEA Software Institute*

Programmers currently enjoy access to a very high number of code repositories and libraries of ever increasing size. The ensuing potential for reuse is however hampered by the fact that searching within all this code becomes an immensely difficult task.

Most code search engines are based on syntactic techniques such as signature matching or keyword extraction. However, these techniques are inaccurate (because they basically rely on documentation) and at the same time do not offer very expressive code query languages.

We propose an approach focused on querying for semantic characteristics of code obtained automatically from the code itself. Program units are pre-processed using static analysis techniques, based on abstract interpretation, obtaining safe semantic approximations. An assertion-based code query language is used to express desired semantic characteristics of the code as partial specifications. Relevant code is found by comparing such partial specifications with the inferred semantics for each program element.

**Time and place:**

10:45am Lecture hall 1, level B

IMDEA Software Institute, Campus de Montegancedo

28223-Pozuelo de Alarcón, Madrid, Spain

10:45am Lecture hall 1, level B

**Miguel Ambrona**, *PhD Student, IMDEA Software Institute*

The talk will be an overview of the state of the art in Attribute-Based Encryption.

Attribute-based encryption (ABE) is a new paradigm for public-key encryption that enables fine-grained access control for encrypted data.

In ABE, ciphertexts are associated with descriptive values x in addition to a plaintext, secret keys are associated with values y, and a secret key decrypts the ciphertext if and only if P(x,y) = 1 for some boolean predicate P. Here, y together with P may express an arbitrarily complex access policy, which is in stark contrast to traditional public-key encryption, where access is all or nothing. The simplest example of ABE is that of identity-based encryption (IBE) where P corresponds to equality. The security requirement for ABE enforces resilience to collusion attacks, namely any group of users holding secret keys for different values learns nothing about the plaintext if none of them is individually authorized to decrypt the ciphertext. This should hold even if the adversary adaptively decides which secret keys to ask for.

**Time and place:**

10:45am Lecture hall 1, level B

IMDEA Software Institute, Campus de Montegancedo

28223-Pozuelo de Alarcón, Madrid, Spain

10:45am Lecture hall 1, level B

**Pablo Nogueira**, *Post-doctoral Researcher, IMDEA Software Institute*

What's solvability? For some terms of the pure lambda calculus reduction does not terminate. But these non-terminating terms are not the same garbage. If we equate them we obtain an inconsistent proof-theory. Indeed, some non-terminating terms are nevertheless useful if used as functions. More precisely, when applying these non-terminating terms to suitable operands we get back terminating results. (You've got to find this at least somewhat interesting.)

The set of solvables is made of terminating terms (which include normal forms) and those useful non-terminating terms. The rest are unsolvables which are the same garbage: equating them gives rise to a consistent proof theory.

Solvability has been studied in the pure "lambda-value" calculus that (broadly speaking) tries to model call-by-value evaluation. The problem is that the resulting proof-theory is not 'quite' consistent and some normal forms are unsolvable.

We did some exegetical work and found out the problem was with the use of a definition unsuited for pure lambda-value. We fixed this and got not only a consistent proof-theory but some extra results about lambda-value and full-reduction with open terms. In the process we learned a few sad lessons about social aspects of research.

Content of the Talk:

- Slide 1: Title
- Slides 2-8: Introduction to pure lambda calculus: syntax, terms, self-application (oops), reduction & conversion, normal forms, consistency.
- Slides 9-11: Solvability in the pure lambda calculus.
- Slides 12-13: The pure lambda-value calculus and what's meant for.
- Slides 14-15: Problems with v-solvability in pure lambda-value.
- Slides 16-18: Alternative definition (back to basics), contribs, and intuition to take home.
- Slide 19: (Non)Conclusions.

Complemented with hopefully explanatory verbosity.

**Time and place:**

10:45am Lecture hall 1, level B

IMDEA Software Institute, Campus de Montegancedo

28223-Pozuelo de Alarcón, Madrid, Spain

10:45am Lecture hall 1, level B

**German Delbianco**, *PhD Student, IMDEA Software Institute*

Arguments about linearizability of a concurrent data structure are typically carried out by specifying the linearization points of the data structure's procedures. Proofs that use such specifications are often cumbersome as the linearization points' position in time can be dynamic, non-local and non-regional: it can depend on the interference, run-time values and events from the past, or even future, appear in procedures other than the one considered, and might be only determined after the considered procedure has terminated. In this paper we propose a new method, based on a Hoare-style logic, for reasoning about concurrent objects with such linearization points.

We embrace the dynamic nature of linearization points, and encode it as part of the data structure's auxiliary state, so that it can be dynamically modified in place by auxiliary code, as needed when some appropriate run-time event occurs. We name the idea linking-in-time, because it reduces temporal reasoning to spatial reasoning. For example, modifying a temporal position of a linearization point can be modeled similarly to a pointer update in a heap. We illustrate the method by verifying an intricate optimal snapshot algorithm due to Jayanti.

**Time and place:**

10:45am Lecture hall 1, level B

IMDEA Software Institute, Campus de Montegancedo

28223-Pozuelo de Alarcón, Madrid, Spain