## Software Seminar Series - Winter 2017

### Tuesday, March 28, 2017

10:45am Lecture hall 1, level B

Irfan Ul Haq, PhD Student, IMDEA Software Institute

### Malware Lineage in the Wild

#### Abstract:

Malware lineage studies the evolutionary relationships among malware and has important applications for malware analysis. A persistent limitation of existing approaches is that they only work on synthetic malware, malware that are not packed, or packed malware for which unpackers are available. This is problematic since to evade detection, the majority of malware are packed. In this work, we propose a novel malware lineage approach that works on malware samples collected in the wild. Given a set of malware executables from the same family, for which no source code is available and which may be packed, our approach produces a lineage graph where nodes are versions of the family and edges describe the relationships between versions. Our approach handles the challenges introduced by operating on real malware such as unpacking, disassembly, and limitations in the malware collection. To enable our malware lineage approach, we propose the first technique to identify the versions of a malware family and a scalable code indexing technique for determining what functions are shared between any pair of samples. We have evaluated the accuracy of our approach on 626 versions of 15 benign programs and have applied it to produce lineage graphs for 7 malware families.

Time and place:
10:45am Lecture hall 1, level B
IMDEA Software Institute, Campus de Montegancedo

### Tuesday, March 21, 2017

10:45am Meeting room 302 (Mountain View), level 3

Alejandro Aguirre, PhD Student, IMDEA Software Institute

### A Relational Logic for Higher-Order Programs

#### Abstract:

Relational program verification is a variant of program verification where one can reason about two programs and as a special case about two executions of a single program on different inputs. Relational program verification can be used for reasoning about a broad range of properties, including equivalence and refinement, and specialized notions such as continuity, information flow security or relative cost. In a higher-order setting, relational program verification can be achieved using relational refinement type systems which excel at relating structurally equivalent terms but provide limited support for relating terms with very different structures. We present a logic, called Relational Higher Order Logic (RHOL), for proving relational properties of a simply typed λ-calculus with inductive types and recursive definitions. RHOL retains the type-directed flavour of relational refinement type systems but achieves greater expressivity through rules which simultaneously reason about the two terms as well as rules which only contemplate one of the two terms.

Time and place:
10:45am Meeting room 302 (Mountain View), level 3
IMDEA Software Institute, Campus de Montegancedo

### Tuesday, March 14, 2017

10:45am Meeting room 302 (Mountain View), level 3

Itsaka Rakotonirina, Research Intern, IMDEA Software Institute

### Reasoning about aggregation of information

#### Abstract:

Along with the conventional mathematic-driven approach of software security, 20 years of attacks harnessing the timing behaviour of programs to obtain secret data (timing attacks) pose a concerning threat to software systems. The complexity and the need for efficiency of softwares makes it difficult to expect constant-time implementations in general, requiring us to accept such leaks to some extent. The border between unimpactful and critical leaks does not lie in the amount of information leaked by isolated runs of the software. Rather, the key criterion is the ability to *aggregate* different secret bits over and over through several executions. As the question of aggregation has not been tackled much in the literature, there is a need for techniques allowing to distinguish between critical and non-critical leaks: in this work we present a novel approach to help with this distinction.

Time and place:
10:45am Meeting room 302 (Mountain View), level 3
IMDEA Software Institute, Campus de Montegancedo

### Tuesday, March 07, 2017

10:45am Lecture hall 1, level B

Elena Gutierrez, PhD Student, IMDEA Software Institute

### Parikh Image of Pushdown Automata

#### Abstract:

There exist two main formalisms to describe context-free languages: context-free grammars and pushdown automata. In fact, there exists a standard algorithm for converting a pushdown automaton into a context-free grammar that preserves exactly the language. One may wonder which is more economical in terms of size. It has been shown before that, in general, pushdown automata are more concise to represent a particular language than context-free grammars. Now, we consider the open question: If we are interested in preserving, not each word exactly, but just the number of occurrences of each symbol, regardless of the order: are pushdown automata more concise? In this talk, I show that the answer is still positive. The number of occurrences of each symbol in a word is called its Parikh image. I present an infinite family of pushdown automata with n states, p stack symbols and size proportional to np, for which every context-free grammar that preserves the Parikh image must have size proportional to n²p. To present this result, I introduce a new tree-based semantics to describe runs of a pushdown automaton.

Time and place:
10:45am Lecture hall 1, level B
IMDEA Software Institute, Campus de Montegancedo

### Tuesday, February 28, 2017

10:45am Meeting room 302 (Mountain View), level 3

German Delbianco, PhD Student, IMDEA Software Institute

### Concurrent Data Structures Linked in Time

#### Abstract:

Arguments about correctness of a concurrent data structure are typically carried out by using the notion of linearizability and specifying the linearization points of the data structure's procedures. Such arguments are often cumbersome as the linearization points' position in time can be dynamic (depend on the interference, run-time values and events from the past, or even future), non-local (appear in procedures other than the one considered), and whose position in the execution trace may only be determined after the considered procedure has already terminated. In this talk I will present a new method, based on a separation-style logic, for reasoning about concurrent objects with such linearization points. We embrace the dynamic nature of linearization points, and encode it as part of the data structure's auxiliary state, so that it can be dynamically modified in place by auxiliary code, as needed when some appropriate run-time event occurs. We have named the idea linking-in-time, because it reduces temporal reasoning to spatial reasoning. For example, modifying a temporal position of a linearization point can be modelled similarly to a pointer update in separation logic. Furthermore, the auxiliary state provides a convenient way to concisely express the properties essential for reasoning about clients of such concurrent objects. In order to illustrate our approach, I will illustrate its application to verify (mechanically in Coq) an intricate optimal snapshot algorithm, due to Jayanti.

Time and place:
10:45am Meeting room 302 (Mountain View), level 3
IMDEA Software Institute, Campus de Montegancedo

### Tuesday, February 21, 2017

10:45am Meeting room 302 (Mountain View), level 3

Arthur Blot, Research Intern, IMDEA Software Institute

### Compositional Synthesis of Leakage Resilient Programs

#### Abstract:

A promising approach to defend against side channel attacks is to build programs that are leakage resilient, in a formal sense. One such formal notion of leakage resilience is the n-threshold-probing model proposed in the seminal work by Ishai et al. In a recent work, Eldib and Wang have proposed a method for automatically synthesizing programs that are leakage resilient according to this model, for the case n = 1. In this paper, we show that the n-threshold-probing model of leakage resilience enjoys a certain compositionality property that can be exploited for synthesis. We use the property to design a synthesis method that efficiently synthesizes leakage-resilient programs in a compositional manner, for the general case of n > 1. We have implemented a pro- totype of the synthesis algorithm, and we demonstrate its effectiveness by synthesizing leakage-resilient versions of benchmarks taken from the literature.

Time and place:
10:45am Meeting room 302 (Mountain View), level 3
IMDEA Software Institute, Campus de Montegancedo

### Tuesday, February 14, 2017

10:45am Lecture hall 1, level B

Srdjan Matic, PhD Student, IMDEA Software Institute

### Active Techniques for Revealing and Analyzing the Security of Hidden Servers

#### Abstract:

In the last years we have witnessed a boom in the use of techniques and tools that provide anonymity. Such techniques and tools are used by clients that want their communication to stay anonymous or to access censored content, as well as by administrators to hide the location of their servers. All those activities can be easily performed with the support of an anonymity network. An important component of an anonymity network is the hidden server, a machine whose IP address is kept secret. Such hidden servers are the target of research of this dissertation; more specifically, we focus on different types of hidden servers used in the Tor anonymity network. Our work comprises two parts, one dealing with Tor hidden services and the other one about bridges. In the first part we illustrated novel approaches that we developed for analyzing the security and revealing the location of hidden servers. We demonstrate our technique by implementing it in a tool, that later we used for deanonymizing over 100 real hidden services. In the second part, we perform the first systematic study of the Tor bridge infrastructure. Our study covers both the public bridge infrastructure available to all Tor users, and the previously unreported private bridge infrastructure, comprising private nodes for the exclusive use of those who know about their existence. Our results show how the public bridge ecosystem with clients is stable and those bridges rarely change their IP address. Furthermore we discuss the security implication of public datasets that can be leverage for recovering addresses of bridges, and how track a bridge across IP changes.

Time and place:
10:45am Lecture hall 1, level B
IMDEA Software Institute, Campus de Montegancedo

### Tuesday, January 31, 2017

10:45am Lecture hall 1, level B

Pedro Valero, PhD Student, IMDEA Software Institute

### A Language-theoretic View on Network Protocols

#### Abstract:

Input validation is the first line of defense against malformed or malicious inputs. It is therefore critical that the validator (which is often part of the parser) is free of bugs. To build dependable input validators, we propose using parser generators for context-free languages. In the context of network protocols, various works have pointed at context-free languages as falling short to specify precisely or concisely common idioms found in protocols. We review those assessments and perform a rigorous, language-theoretic analysis of several common protocol idioms. We then demonstrate the practical value of our findings by developing a modular, robust, and efficient input validator for HTTP relying on context-free grammars and regular expressions.

Time and place:
10:45am Lecture hall 1, level B
IMDEA Software Institute, Campus de Montegancedo

### Tuesday, January 24, 2017

10:45am Lecture hall 1, level B

Joaquín Arias Herrero, PhD Student, IMDEA Software Institute

### Tabled CLP for Reasoning over Stream Data

#### Abstract:

The interest in reasoning over stream data is growing as quickly as the amount of data generated. Our intention is to change the way stream data is analyzed. This is an important problem because we constantly have new sensors collecting information, new events from electronic devices and/or from customers and we want to reason about this information. For example, information about traffic jams and customer orders could be used to define a deliverer route. When there is a new order or a new traffic jam, we usually restart from scratch in order to recompute the route. However, if we have several deliveries and we analyze the information from thousands of sensors, we would like to reduce the computation requirements, e.g. reusing results from the previous computation. Nowadays, most of the applications that analyze stream data are specialized for specific problems (using complex algorithms and heuristics) and combine a computation language with a query language. As a result, when the problems become more complex (in e.g. reasoning requirements), in order to modify the application complex and error prone coding is required. We propose a framework based on a high-level language rooted in logic and constraints that will be able to provide customized services to different problems. The framework will discard wrong solutions in early stages and will reuse previous results that are still consistent with the current data set. The use of a constraint logic programming language will make it easier to translate the problem requirements into the code and will minimize the amount of re-engineering needed to comply with the requirements when they change.

Time and place:
10:45am Lecture hall 1, level B
IMDEA Software Institute, Campus de Montegancedo

### Tuesday, January 17, 2017

10:45am Lecture hall 1, level B

Luca Nizzardo, PhD Student, IMDEA Software Institute

### Multi-Key Homomorphic Authenticators

#### Abstract:

Homomorphic authenticators (HAs) enable a client to authenticate a large collection of data elements $m_1, ... , m_t$ and outsource them, along with the corresponding authenticators, to an untrusted server. At any later point, the server can generate a short authenticator $\sigma_{f,y}$ vouching for the correctness of the output $y$ of a function $f$ computed on the outsourced data, i.e., $y = f(m_1, ... , m_t)$. Recently researchers have focused on HAs as a solution, with minimal communication and interaction, to the problem of delegating computation on outsourced data. The notion of HAs studied so far, however, only supports executions (and proofs of correctness) of computations over data authenticated by a single user. Motivated by realistic scenarios (ubiquitous computing, sensor networks, etc.) in which large datasets include data provided by multiple users, we study the concept of multi-key homomorphic authenticators. In a nutshell, multi-key HAs are like HAs with the extra feature of allowing the holder of public evaluation keys to compute on data authenticated under different secret keys. In this paper, we introduce and formally define multi-key HAs. Secondly, we propose a construction of a multi-key homomorphic signature based on standard lattices and supporting the evaluation of circuits of bounded polynomial depth. Thirdly, we provide a construction of multi-key homomorphic MACs based only on pseudorandom functions and supporting the evaluation of low-degree arithmetic circuits. Albeit being less expressive and only secretly verifiable, the latter construction presents interesting efficiency properties.

Time and place:
10:45am Lecture hall 1, level B
IMDEA Software Institute, Campus de Montegancedo