10:45am Lecture hall 1, level B

**Richard Rivera**, *PhD Student, IMDEA Software Institute*

Labeling a malicious executable as a variant of a known family is important for security applications such as triage, lineage, and for building reference datasets in turn used for evaluating malware clustering and training malware classication approaches. Oftentimes, such labeling is based on labels output by antivirus engines. While AV labels are well-known to be inconsistent, there is often no other information available for labeling, thus security analysts keep relying on them. However, current approaches for extracting family information from AV labels are manual and inaccurate. In this work, we describe AVCLASS, an automatic labeling tool that given the AV labels for a, potentially massive, number of samples outputs the most likely family names for each sample. AVCLASS implements novel automatic techniques to address 3 key challenges: normalization, removal of generic tokens, and alias detection. We have evaluated AVCLASS on 10 datasets comprising 8.9 M samples, larger than any dataset used by malware clustering and classication works. AVCLASS leverages labels from any AV engine, e.g., all 99 AV engines seen in VirusTotal, the largest engine set in the literature. AVCLASS's clustering achieves F1 measures up to 93.9 on labeled datasets and clusters are labeled with ne-grained family names commonly used by the AVvendors. We release AVCLASS to the community.

**Time and place:**

10:45am Lecture hall 1, level B

IMDEA Software Institute, Campus de Montegancedo

28223-Pozuelo de Alarcón, Madrid, Spain

10:45am Lecture hall 1, level B

**Platon Kotzias**, *PhD Student, IMDEA Software Institute*

Potentially unwanted programs (PUP) such as adware and rogueware,
while not outright malicious, exhibit intrusive behavior that
generates user complaints and makes security vendors flag them as
undesirable. PUP has been little studied in the research literature
despite recent indications that its prevalence may have surpassed that
of malware.

In this work we perform the first systematic study of PUP prevalence
and its distribution through pay-per-install (PPI) services, which
link advertisers that want to promote their programs with affiliate
publishers willing to bundle their programs with offers for other
software. Using AV telemetry information comprising of 8 billion
events on 3.9 million real hosts during a 19 month period, we discover
that over half (54%) of the examined hosts have PUP installed. PUP
publishers are highly popular, e.g., the top two PUP publishers rank
15 and 24 amongst all software publishers (benign and PUP).
Furthermore, we analyze the who-installs-who relationships, finding
that 65% of PUP downloads are performed by other PUP and that 24 PPI
services distribute over a quarter of all PUP. We also examine the
top advertiser programs distributed by the PPI services, observing
that they are dominated by adware running in the browser (e.g.,
toolbars, extensions) and rogueware. Finally, we investigate the
PUP-malware relationships in the form of malware installations by PUP
and PUP installations by malware. We conclude that while such events
exist, PUP distribution is largely disjoint from malware distribution.

**Time and place:**

10:45am Lecture hall 1, level B

IMDEA Software Institute, Campus de Montegancedo

28223-Pozuelo de Alarcón, Madrid, Spain

10:45am Lecture hall 1, level B

**Nataliia Stulova**, *PhD Student, IMDEA Software Institute*

We explore the effectiveness of abstract interpretation in detecting
parts of program specifications that can be simplified statically to
true or false, as well as the impact of such analysis in reducing the
cost of the run-time checks required for the remaining parts of these
specifications.

Based on an operational semantics for programs with assertion
checking, and the results of abstract interpretation, we recall the
corresponding analysis-based assertion simplification problem, and
propose and study a number of practical assertion checking modes, each
of which represents a trade-off between code annotation depth,
execution time slowdown, and program behavior safety.

We propose techniques for taking advantage of the run-time checking
semantics to improve the precision of the analysis. We also report on
some experimental results on the effectiveness of the techniques
proposed.

**Time and place:**

10:45am Lecture hall 1, level B

IMDEA Software Institute, Campus de Montegancedo

28223-Pozuelo de Alarcón, Madrid, Spain

10:45am Lecture hall 1, level B

**Isabel Garcia**, *Research Intern, IMDEA Software Institute*

Programmers currently enjoy access to a very high number of code
repositories and libraries of ever increasing size. The ensuing
potential for reuse is however hampered by the fact that searching
within all this code becomes an immensely difficult task.

Most code search engines are based on syntactic techniques such as
signature matching or keyword extraction. However, these techniques
are inaccurate (because they basically rely on documentation) and at
the same time do not offer very expressive code query languages.

We propose an approach focused on querying for semantic characteristics of
code obtained automatically from the code itself. Program units are
pre-processed using static analysis techniques, based on abstract
interpretation, obtaining safe semantic approximations. An
assertion-based code query language is used to express desired
semantic characteristics of the code as partial specifications.
Relevant code is found by comparing such partial specifications with
the inferred semantics for each program element.

**Time and place:**

10:45am Lecture hall 1, level B

IMDEA Software Institute, Campus de Montegancedo

28223-Pozuelo de Alarcón, Madrid, Spain

10:45am Lecture hall 1, level B

**Miguel Ambrona**, *PhD Student, IMDEA Software Institute*

The talk will be an overview of the state of the art in
Attribute-Based Encryption.

Attribute-based encryption (ABE) is a new paradigm for public-key
encryption that enables fine-grained access control for encrypted
data.

In ABE, ciphertexts are associated with descriptive values x in
addition to a plaintext, secret keys are associated with values y, and
a secret key decrypts the ciphertext if and only if P(x,y) = 1 for
some boolean predicate P. Here, y together with P may express an
arbitrarily complex access policy, which is in stark contrast to
traditional public-key encryption, where access is all or nothing.
The simplest example of ABE is that of identity-based encryption (IBE)
where P corresponds to equality. The security requirement for ABE
enforces resilience to collusion attacks, namely any group of users
holding secret keys for different values learns nothing about the
plaintext if none of them is individually authorized to decrypt the
ciphertext. This should hold even if the adversary adaptively decides
which secret keys to ask for.

**Time and place:**

10:45am Lecture hall 1, level B

IMDEA Software Institute, Campus de Montegancedo

28223-Pozuelo de Alarcón, Madrid, Spain

10:45am Lecture hall 1, level B

**Pablo Nogueira**, *Post-doctoral Researcher, IMDEA Software Institute*

What's solvability? For some terms of the pure lambda calculus
reduction does not terminate. But these non-terminating terms are not
the same garbage. If we equate them we obtain an inconsistent
proof-theory. Indeed, some non-terminating terms are nevertheless
useful if used as functions. More precisely, when applying these
non-terminating terms to suitable operands we get back terminating
results. (You've got to find this at least somewhat interesting.)

The set of solvables is made of terminating terms (which include
normal forms) and those useful non-terminating terms. The rest are
unsolvables which are the same garbage: equating them gives rise to
a consistent proof theory.

Solvability has been studied in the pure "lambda-value" calculus that
(broadly speaking) tries to model call-by-value evaluation. The problem is
that the resulting proof-theory is not 'quite' consistent and some normal
forms are unsolvable.

We did some exegetical work and found out the problem was with the use of a
definition unsuited for pure lambda-value. We fixed this and got not only a
consistent proof-theory but some extra results about lambda-value and
full-reduction with open terms. In the process we learned a few sad lessons
about social aspects of research.

=Content of the Talk=

- Slide 1 : Title
- Slides 2-8: Introduction to pure lambda calculus: syntax, terms, self-application (oops), reduction & conversion, normal forms,
consistency.
- Slides 9-11: Solvability in the pure lambda calculus.
- Slides 12-13: The pure lambda-value calculus and what's meant for.
- Slides 14-15: Problems with v-solvability in pure lambda-value.
- Slides 16-18: Alternative definition (back to basics), contribs, and
intuition to take home.
- Slide 19: (Non)Conclusions.

Complemented with hopefully explanatory verbosity.

**Time and place:**

10:45am Lecture hall 1, level B

IMDEA Software Institute, Campus de Montegancedo

28223-Pozuelo de Alarcón, Madrid, Spain

10:45am Lecture hall 1, level B

**German Delbianco**, *PhD Student, IMDEA Software Institute*

Arguments about linearizability of a concurrent data structure are
typically carried out by specifying the linearization points of the
data structure's procedures. Proofs that use such specifications are
often cumbersome as the linearization points' position in time can be
dynamic, non-local and non-regional: it can depend on the
interference, run-time values and events from the past, or even
future, appear in procedures other than the one considered, and might
be only determined after the considered procedure has terminated. In
this paper we propose a new method, based on a Hoare-style logic, for
reasoning about concurrent objects with such linearization points.

We embrace the dynamic nature of linearization points, and encode it
as part of the data structure's auxiliary state, so that it can be
dynamically modified in place by auxiliary code, as needed when some
appropriate run-time event occurs. We name the idea linking-in-time,
because it reduces temporal reasoning to spatial reasoning. For
example, modifying a temporal position of a linearization point can be
modeled similarly to a pointer update in a heap. We illustrate the
method by verifying an intricate optimal snapshot algorithm due to
Jayanti.

**Time and place:**

10:45am Lecture hall 1, level B

IMDEA Software Institute, Campus de Montegancedo

28223-Pozuelo de Alarcón, Madrid, Spain