IMDEA Software

IMDEA initiative

Home > Events > Invited Talks

Invited Talks

Louis-Marie Dando

Tuesday, July 27, 2021

11:00am Online via Zoom: Zoom3 - https://zoom.us/j/3911012202

Louis-Marie Dando, Post-doctoral Researcher, ANR Delta, LIS, Marseille, France

Weighted Automata and Expressions over Pre-Rational Monoids

Abstract:

The Kleene theorem establishes a fundamental link between automata and expressions over the free monoid. Numerous generalisations of this result exist in the literature. Lifting this result to a weighted setting has been widely studied. Moreover, different monoids can be considered: for instance, two-way automata, and even tree-walking automata, can be described by expressions using the free inverse monoid. In the present work, we aim at combining both research directions and consider weighted extensions of automata and expressions over a class of monoids that we call pre-rational, generalising both the free inverse monoid and graded monoids. The presence of idempotent elements in these pre-rational monoids leads in the weighted setting to consider infinite sums. To handle such sums, we will have to restrict ourselves to rationally additive semirings. Our main result is thus a generalisation of the Kleene theorem for pre-rational monoids and rationally additive semirings. As a corollary, we obtain a class of expressions equivalent to weighted two-way automata, as well as one for tree-walking automata.


Time and place:
11:00am Online via Zoom: Zoom3 - https://zoom.us/j/3911012202
IMDEA Software Institute, Campus de Montegancedo
28223-Pozuelo de Alarcón, Madrid, Spain


Chana Weil-Kennedy

Tuesday, July 6, 2021

11:00am Online via Zoom: Zoom3 - https://zoom.us/j/3911012202

Chana Weil-Kennedy, PhD candidate, TU Munich

Verification of Immediate Observation Petri Nets

Abstract:

Petri nets are a popular and well-studied formal model for the representation and verification of parallel processes. The central problem for Petri nets is reachability: given two configurations of a Petri net, does there exist a run from one to the other. This problem is very difficult: it has recently been proven to be Tower-hard (non-elementary). This motivates the study of subclasses in the hopes of having a more reasonable complexity. Some classic syntactic restrictions of Petri nets are BPP nets (Branching Parallel Process), or marked graphs (sometimes called T-systems). We introduce and study a new syntactic restriction: immediate observation Petri nets (or IO nets), motivated by applications to population protocols — a model of distributed computation — and enzymatic chemical networks. In these areas, relevant analysis questions translate into parameterized problems: whether an infinite set of Petri nets with the same underlying net, but different initial markings, satisfy a given property. We study the parameterized reachability, coverability, and liveness problems for IO nets. We show that all three problems are in PSPACE for infinite sets of initial markings defined by counting constraints, a class sufficiently rich for the intended application. We also show that IO nets are globally flat, and so their safety properties can be checked by efficient symbolic model checking tools using acceleration techniques (like FAST). Finally, we introduce and study a powerful generalization of IO nets: Branching IO nets (BIO nets), in which transitions can create tokens. BIO nets extend both IO nets and BPP nets. We show that, while BIO nets are no longer globally flat, and their sets of reachable markings may be non-semilinear, they are still locally flat. As a consequence, the parameterized coverability and reachability problems remain in PSPACE. This makes BIO nets the first natural net class with non-semilinear reachability relation for which the reachability problem is provably simpler than for general Petri nets.


Time and place:
11:00am Online via Zoom: Zoom3 - https://zoom.us/j/3911012202
IMDEA Software Institute, Campus de Montegancedo
28223-Pozuelo de Alarcón, Madrid, Spain


Damien Robissout

Friday, May 21, 2021

11:00am Online via Zoom: Zoom3 - https://zoom.us/j/3911012202

Damien Robissout, PhD Student, Laboratoire Hubert Curien, University of Lyon, France

Online Performance Evaluation and Improvement of Deep Learning Networks for Side-Channel Analysis

Abstract:

The past ten years have seen an increasing presence of deep learning algorithms to perform a diverse set of tasks thanks to advances in technology. A few years ago, those algorithms started to be used to help perform profiled side-channel analysis. Side-Channel Analysis (SCA) is a type of attack against secure algorithms which uses leakages of information contained in physical values, such as the power consumption, in order to retrieve the secret or a part of the secret used during the encryption. Profiled SCA corresponds to attacks where the attacker has access to a device similar to the one under attack and this allows him to build a profile, or template, of the physical quantity he is monitoring. Attacks using deep learning algorithms to build the profiles, commonly refer to as DLSCA, have been shown to perform as well, or even better, as the most powerful profiled attack, templates attacks. They also benefit from having higher resilience against classical countermeasures such as desynchrinization and masking. On the other hand, this new method brings other challenges as the training of the networks determines their efficiency and thus is of high importance. In this presentation, after an introduction to DLSCA, we will explore how to evaluate the networks and their training in a side-channel context and how to improve the performances of basic networks.


Time and place:
11:00am Online via Zoom: Zoom3 - https://zoom.us/j/3911012202
IMDEA Software Institute, Campus de Montegancedo
28223-Pozuelo de Alarcón, Madrid, Spain


Nikos Vasilakis

Tuesday, May 18, 2021

11:00am Online via Zoom: Zoom3 - https://zoom.us/j/3911012202

Nikos Vasilakis, Researcher, MIT CSAIL, USA

Retrofitting Security, Module by Module

Abstract:

Software developers make pervasive use of third-party software supply chains to reduce costs and accelerate release cycles, at a risk to safety and security. I will introduce a series of techniques that exploit module boundaries to automate software compartmentalization and enforce security policies, enhancing safety and security. BreakApp isolates select modules using powerful system-level containment mechanisms. Iris leverages language-based protection to offer finer-grained control and lower performance overheads. Finally, Mir uses a constrained read-write-execute protection model and a hybrid analysis to fully automate compartmentalization.


Time and place:
11:00am Online via Zoom: Zoom3 - https://zoom.us/j/3911012202
IMDEA Software Institute, Campus de Montegancedo
28223-Pozuelo de Alarcón, Madrid, Spain


Andrés Sánchez

Tuesday, May 11, 2021

11:00am Online via Zoom: Zoom5 - https://zoom.us/j/5911012202

Andrés Sánchez, Master Intern, EPFL, Switzerland

Leaking Secrets Through Compressed Caches

Abstract:

The hardware security crisis brought on by recent speculative execution attacks has shown that it is crucial to adopt a security-conscious approach to architecture research, analyzing the security of promising architectural techniques before they are deployed in hardware. We offer the first security analysis of cache compression, one such promising technique that is likely to appear in future processors. We find that cache compression is insecure because the compressibility of a cache line reveals information about its contents. Compressed caches introduce a new side channel that is especially insidious, as simply storing data transmits information about it. We present two techniques that make attacks on compressed caches practical. Pack+Probe allows an attacker to learn the compressibility of victim cache lines, and Safecracker leaks secret data efficiently by strategically changing the values of nearby data. Our evaluation on a proof-of-concept application shows that, on a common compressed cache architecture, Safecracker lets an attacker compromise a secret key in under 10 ms, and worse, leak large fractions of program memory when used in conjunction with latent memory safety vulnerabilities.


Time and place:
11:00am Online via Zoom: Zoom5 - https://zoom.us/j/5911012202
IMDEA Software Institute, Campus de Montegancedo
28223-Pozuelo de Alarcón, Madrid, Spain


Miguel Ambrona

Tuesday, April 20, 2021

11:00am Online via Zoom: Zoom3 - https://zoom.us/j/3911012202

Miguel Ambrona, Post-doctoral Researcher, NTT Secure Platform Laboratories, Japan

Acyclicity Programming for Sigma-Protocols

Abstract:

Cramer, Damgård, and Schoenmakers (CDS) built a proof system to demonstrate the possession of subsets of witnesses for a given collection of statements that belong to a prescribed access structure P by composing so-called sigma-protocols for each atomic statement. Their verifier complexity is linear in the size of the monotone span program representation of P. We propose an alternative method for combining sigma-protocols into a single non-interactive system for a compound statement in the random oracle model. In contrast to CDS, our verifier complexity is linear in the size of the acyclicity program representation of P, a complete model of monotone computation introduced in this work. We show that the acyclicity program size of a predicate is never larger than its de Morgan formula size and it is polynomially incomparable to its monotone span program size. We additionally present an extension of our proof system, with verifier complexity linear in the monotone circuit size of P, in the common reference string model.


Time and place:
11:00am Online via Zoom: Zoom3 - https://zoom.us/j/3911012202
IMDEA Software Institute, Campus de Montegancedo
28223-Pozuelo de Alarcón, Madrid, Spain


Thaleia-Dimitra Doudali

Thursday, March 25, 2021

3:30pm Online via Zoom: Zoom3 - https://zoom.us/j/3911012202

Thaleia-Dimitra Doudali, PhD candidate, Georgia Institute of Technology, USA

Adding Machine Learning to the Management of Heterogeneous Resources

Abstract:

Computing platforms increasingly incorporate heterogeneous hardware technologies, as a way to scale application performance, resource capacities and achieve cost effectiveness. However, this heterogeneity, along with the greater irregularity in the behavior of emerging workloads, render existing resource management approaches ineffective. This results in a significant gap between the realized vs. achievable performance and efficiency. My research develops a practical approach for using machine learning to bridge this gap, specifically targeting systems with heterogeneous memory technologies. In this talk, I will answer the key challenges into realizing this approach. These include which machine learning (ML) method to use, which part of the memory management stack to target, and how to configure its deployment. I will present new techniques for integrating machine learning (ML) in existing system-level management of hybrid memory hardware, which, on average, bridge 80% of the performance gap. I will also describe system-level configuration management methods that lead to additional 3x improvements. Finally, I will present cross-stack synergies that further facilitate the practical integration of machine learning in systems.


Time and place:
3:30pm Online via Zoom: Zoom3 - https://zoom.us/j/3911012202
IMDEA Software Institute, Campus de Montegancedo
28223-Pozuelo de Alarcón, Madrid, Spain


Matteo Dell'Amico

Wednesday, March 24, 2021

10:30am Online via Zoom: Zoom3 - https://zoom.us/j/3911012202

Matteo Dell'Amico, Researcher, EURECOM, France

Approaches to Explore Complex Data

Abstract:

Analyzing large and heterogeneous data is a recurrent problem for analysts and researchers. For example, a dataset of network events may contain, among others, information as diverse as text, IP addresses, numeric and categorical fields, DNS names, log lines from firewalls, and inspection tools. All these data can be correlated in a non-trivial way; additionally, some information may be missing for some events, and the reliability of what is collected may vary. The typical workflow often seen for machine learning approaches–in which a feature extraction step converts the original data to numeric vectors, before being further processed–may fall short in these cases, because the conversion to numeric form may lose important information. Moreover, we might not have a reliable "ground truth" to feed a classifier. We will discuss an alternative approach that does not use vectorial data representation: relationships between data items are represented as a (dis)similarity function applied on the original data–an arbitrary piece of code written by experts, which gives them complete freedom to encode their domain knowledge. We will introduce clustering algorithms based on this approach which, among other desirable properties, avoid the computational bottleneck of comparing everything against everything. We will also see how this proved useful in practice, with a diverse set of use cases in domains such as text analysis, demographic estimation, and computer security. Finally, we'll consider opportunities for future research in this area.


Time and place:
10:30am Online via Zoom: Zoom3 - https://zoom.us/j/3911012202
IMDEA Software Institute, Campus de Montegancedo
28223-Pozuelo de Alarcón, Madrid, Spain


Danilo Francati

Thursday, March 18, 2021

11:00am Online via Zoom: Zoom3 - https://zoom.us/j/3911012202

Danilo Francati, PhD candidate, Stevens Institute of Technology, Hoboken, USA

Kolmogorov complexity and cryptography: New connections and applications to space-demanding functions

Abstract:

Nowadays, attackers have large amount of resources (e.g., computational and space capabilities) that are used for malicious purposes, e.g., stealing sensitive information (such as passwords). To face these capacious adversaries, “resource-demanding” functions have been developed: Functions that deliberately consume (on evaluation) large quantities of resources (e.g., time, space, energy). In this talk, I will introduce a new notion of resource-demanding function named verifiable capacity-bound function (VCBF). The main VCBF property is that it imposes a lower bound on the number of bits read from memory during evaluation (referred to as minimum capacity). No adversary, even with unbounded resources, should produce an output without spending this minimum memory capacity. Moreover, a VCBF allows for an efficient public verification process: Given a proof-of-correctness, checking the validity of the output takes significantly fewer memory resources, sublinear in the target minimum capacity. Finally, it achieves soundness, i.e., no computationally bounded adversary can produce a proof that passes verification for a false output. With these properties, a VCBF can be viewed as a “space” analog of a verifiable delay function, introduced by Boneh et al. (CRYPTO18). I will then present the first VCBF construction relying on evaluating a degree-d polynomial f from $\mathbb{F}_p[x]$ at a random point. To prove the space complexity of the construction I will describe an analysis at the intersection of Kolmogorov complexity and cryptography.


Time and place:
11:00am Online via Zoom: Zoom3 - https://zoom.us/j/3911012202
IMDEA Software Institute, Campus de Montegancedo
28223-Pozuelo de Alarcón, Madrid, Spain


Duc Le

Thursday, March 11, 2021

4:00pm Online via Zoom: Zoom3 - https://zoom.us/j/3911012202

Duc Le, PhD candidate, Purdue University, USA

Cryptographic Constructions for Resource-Constrained Devices with Applications to Blockchains

Abstract:

The blockchain offered a distributed way to provide security guarantees for financial transactions that avoid the single-point failure drawback of centralized approaches. However, this ability comes with the cost of storing a large (distributed) blockchain state and introducing additional computation and communication overheads to all participants. All these drawbacks raise a challenging scalability problem for resource-constrained devices in the blockchain network. Most scaling solutions typically require resource-constrained devices to rely on peers with a higher computational and storage capability. Such scaling solutions, however, expose the data of the resource-constrained devices to risks of compromise of the more powerful peers they rely on (e.g., accidental, or malicious through a break-in, insider misbehavior, malware infestation). This potential for leakage raises a privacy concern for these constrained clients, in addition to other scaling-related concerns. In this talk, we propose two cryptographic constructions that enable resource-constrained devices to securely and efficiently participate in the blockchain network. First, we investigate the Bitcoin Simplified Payment Verification (SPV) client, a widely adopted solution to resolve the storage problem for constrained devices. However, the current SPV solutions raise privacy concerns for the SPV clients when they rely on potentially malicious nodes. To address those concerns, we present T^3, a trusted hardware-secured Bitcoin full client that supports efficient oblivious search/update for Bitcoin SPV clients without endangering the clients' privacy. Second, to address the computational overhead of the gossip protocol used in all popular blockchain protocols, we propose a cryptographic primitive called the Flexible Signature. In a flexible signature scheme, the verification algorithm quantifies the validity of a signature based on the computational effort performed by the verifier. This primitive allows the resource-constrained devices to prevent the adversary from flooding spam transactions to the blockchain network with minimal computational overheads.


Time and place:
4:00pm Online via Zoom: Zoom3 - https://zoom.us/j/3911012202
IMDEA Software Institute, Campus de Montegancedo
28223-Pozuelo de Alarcón, Madrid, Spain


Vanessa Frías-Martínez

Wednesday, March 3, 2021

4:00pm Online via Zoom: Zoom3 - https://zoom.us/j/3911012202

Vanessa Frías-Martínez, Associate Professor, University of Maryland, USA

Data-driven decision making for cities and communities

Abstract:

The pervasiveness of cell phones, mobile applications and social media generates vast amounts of digital traces that can reveal a wide range of human behavior. From mobility patterns to social networks, these signals expose insights about human behaviors and social interactions. In this talk, I will discuss approaches that can help local governments and non-profit organizations understand better the spatial dynamics of cities and communities, offering additional insights beyond more traditional sources of information. I will first give a high-level overview of the research that I lead in the Urban Computing Lab; followed by an in-depth description of two projects. The first project will present a novel approach to create cycling safety maps at large scale. These maps are used by departments of transportation to understand barriers to cycling adoption. In the second project I will describe a new method to enhance the fairness of mobility-based crime prediction models by addressing data bias. This work highlights that controlling for under-reporting can improve both the fairness and the accuracy of mobility-based crime prediction models. My ultimate goal is to illuminate the determinants of human dynamics and to understand the role that the context – represented as physical infrastructure and social fabric – plays in people’s mobility experiences, which can in turn assist in the design of more efficient and inclusive cities.


Time and place:
4:00pm Online via Zoom: Zoom3 - https://zoom.us/j/3911012202
IMDEA Software Institute, Campus de Montegancedo
28223-Pozuelo de Alarcón, Madrid, Spain


Tianhao Wang

Monday, March 1, 2021

4:00pm Online via Zoom: Zoom5 - https://zoom.us/j/5911012202

Tianhao Wang, PhD candidate, Purdue University, USA

Collecting Sensitive Data with Local Differential Privacy

Abstract:

When collecting information, local differential privacy (LDP) relieves users' privacy concerns, as it adds noise to users' private information. The LDP technique has been deployed by Google, Apple, Microsoft, and Alibaba for data collection. In this talk, I will share our research on the basic primitives for LDP and a system that can handle analytical queries under LDP.


Time and place:
4:00pm Online via Zoom: Zoom5 - https://zoom.us/j/5911012202
IMDEA Software Institute, Campus de Montegancedo
28223-Pozuelo de Alarcón, Madrid, Spain


Lana Josipović

Monday, February 22, 2021

11:00am Online via Zoom: Zoom3 - https://zoom.us/j/3911012202

Lana Josipović, Doctoral Assistant, EPFL, Switzerland

High-Level Synthesis of Dynamically Scheduled Circuits

Abstract:

The slowdown in transistor scaling and the end of Moore's law indicate a need to invest in new computing paradigms; specialized hardware devices, such as FPGAs and ASICs, are a promising solution as they can achieve high processing capabilities and energy efficiency. However, a major barrier to the global success of specialized computing is the difficulty of hardware design. High-level synthesis (HLS) tools generate digital hardware designs from high-level programming languages (e.g., C/C++) and promise to liberate designers from low-level hardware description details. Yet, HLS tools are still acceptable only for certain classes of applications and criticized for the difficulty of extracting the desired level of performance: generating good circuits still requires tedious code restructuring and hardware design expertise. In this talk, I will present a new HLS methodology that produces dynamically scheduled, dataflow circuits out of C/C++ code; the resulting circuits achieve good performance out-of-the-box and realize behaviors that are beyond the capabilities of standard HLS tools. I will describe mathematical models to optimize the performance and area of the resulting circuits, as well as techniques to achieve characteristics that standard HLS cannot support, such as out-of-order memory accesses and speculative execution. These contributions redefine the HLS paradigm by introducing characteristics of modern superscalar processors to hardware designs; such behaviors are key for specialized computing to be successful in new contexts and broader application domains.


Time and place:
11:00am Online via Zoom: Zoom3 - https://zoom.us/j/3911012202
IMDEA Software Institute, Campus de Montegancedo
28223-Pozuelo de Alarcón, Madrid, Spain


Nik Sultana

Tuesday, February 9, 2021

4:00pm Online via Zoom: Zoom5 - https://zoom.us/j/5911012202

Nik Sultana, Post-doctoral Researcher, University of Pennsylvania, USA

Programming for Distributed and Heterogeneous Resources

Abstract:

The multi-scale programmability of modern computer systems, including network fabrics and systems-on-chip, presents unprecedented opportunities to build richly-featured, efficient and reliable services. But programming such systems is challenging because of their distributed and heterogeneous nature, which necessitates reasoning about different execution domains and hardware targets, resource scheduling, security, and resilience to partial failure. In this talk I'll describe Flightplan, a solution to this problem for P4, a domain-specific language for programmable networking. Flightplan is a target-agnostic, programming toolchain that helps with splitting a program into a set of cooperating programs and maps them to run as a distributed system. I'll describe how Flightplan exploits features offered by different hardware targets and assists with configuring, testing, and handing-over between programs. I'll also touch upon how Flightplan's ideas can be generalized and adapted to other settings, outside of programmable networking, while retaining the semi-automated decomposition of monolithic systems into distributed systems to improve performance, utilization and security.


Time and place:
4:00pm Online via Zoom: Zoom5 - https://zoom.us/j/5911012202
IMDEA Software Institute, Campus de Montegancedo
28223-Pozuelo de Alarcón, Madrid, Spain


Manuel Rigger

Monday, February 8, 2021

11:00am Online via Zoom: Zoom3 - https://zoom.us/j/3911012202

Manuel Rigger, Post-doctoral Researcher, ETH Zurich, Switzerland

Robustifying Data-Centric Systems

Abstract:

Data is eating the world. Systems for storing and processing data, such as Database Management Systems (DBMSs), are thus pivotal for our computing infrastructure. It is critical that they function correctly --- incorrectly computed results (e.g., by omitting a row) can cause serious loss or damage. Despite their importance, finding such logic bugs in production DBMSs is a longstanding challenge. This talk presents novel, general approaches to effectively detecting logic bugs in DBMSs by tackling the test oracle problem, i.e., deciding whether the returned result for a query is correct. These approaches were realized as the SQLancer open-source tool, which was evaluated on widely-used, production-quality DBMSs, such as SQLite, MySQL, and PostgreSQL. To date, SQLancer has found over 450 unique previously unknown bugs in these systems, most of which have been fixed by the developers. This body of work has provided solid methodological and technical bases for effectively testing DBMSs and has already been widely adopted by industry. Many reliability challenges remain, not only for DBMSs, but also for other data-centric systems. The heterogeneous landscape in this space provides an exciting, fertile ground for new, practically relevant research for improving the reliability and performance of our society's data processing infrastructure.


Time and place:
11:00am Online via Zoom: Zoom3 - https://zoom.us/j/3911012202
IMDEA Software Institute, Campus de Montegancedo
28223-Pozuelo de Alarcón, Madrid, Spain


Marios Kogias

Wednesday, February 3, 2021

11:00am Online via Zoom: Zoom3 - https://zoom.us/j/3911012202

Marios Kogias, Researcher, Microsoft Research, Cambridge, United Kingdom

Building Latency-Critical Datacenter Systems

Abstract:

Online services play a major role in our everyday life for communication, entertainment, socializing, e-commerce, etc. These services run inside the datacenter under strict tail-latency service level objectives in order to remain interactive. The emergence of new hardware for IO has enabled microsecond-scale datacenter communications that challenge the efficiency of existing operating system and network mechanisms. Also, new in-network programmable devices start being deployed in datacenters and introduce a new computing paradigm that shifts functionality traditionally performed at the end-points to the network. In this talk I will revisit the operating systems, networking, and distributed systems infrastructure specifically targeting latency-critical datacenter systems, while drawing intuition from basic queueing theory results. In the first part of the talk, I will focus on ZygOS[SOSP 2017], a system optimized for μs-scale, in-memory computing on multicore servers. ZygOS implements a work-conserving scheduler within a specialized operating system designed for high request rates and a large number of network connections. ZygOS revealed the challenges associated with serving remote procedure calls (RPCs) on top of a byte-stream oriented protocol, such as TCP. In the second part of the talk, I will present R2P2[ATC 2019]. R2P2 is a transport protocol specifically designed for datacenter RPCs, that exposes the RPC abstraction to the endpoints and the network, making RPCs first-class datacenter citizens. R2P2 enables pushing functionality, such as scheduling, fault-tolerance, and tail-tolerance, inside the transport protocol. I will show how using R2P2 allowed us to offload RPC scheduling to programmable switches that can schedule requests directly on individual CPU cores.


Time and place:
11:00am Online via Zoom: Zoom3 - https://zoom.us/j/3911012202
IMDEA Software Institute, Campus de Montegancedo
28223-Pozuelo de Alarcón, Madrid, Spain


Christian Schilling

Monday, February 1, 2021

11:00am Online via Zoom: Zoom3 - https://zoom.us/j/3911012202

Christian Schilling, Interim Professor, University of Konstanz, Germany

Outside the Box: Scalable Formal Methods

Abstract:

I will talk about two fairly different but representative lines of my work. The first line of work is about reachability analysis of linear dynamical systems. That problem is considered solved from a theoretical point of view because arbitrary-precision algorithms using efficient set representations that scale to hundreds of dimensions are available. However, in many cases we are not interested in high precision but instead want a fast analysis result or need to deal with systems that have thousands of dimensions, to which even the best set representations do not scale. Fortunately, linear-algebra packages can easily deal with matrices of that order. We present a reachability approach that partially decomposes the system into low-dimensional subsystems to perform the set computations in low dimensions, which yields a dramatic increase in scalability. At the same time, our approach still performs the matrix computations in high dimensions, which preserves precision in practice. The second line of work is about runtime monitoring of neural networks. Since it is often impossible to specify the correct behavior of neural networks, they cannot be statically verified. Furthermore, due to their black-box nature, it is hard to understand when neural networks "go wrong." We focus on the general problem of detecting when a neural network operates outside its comfort range that it was trained and validated for. To address this "anomaly detection" problem, we equip a neural network with a monitor that observes the behavior at runtime. This behavior is then compared to an internal model of "normal" behavior that was obtained during training. As a choice for this internal model we propose an abstraction of the hidden feature layers. Abstractions provide us with both a fast answer at runtime and a one-sided guarantee: if the behavior is flagged as novel, then it was indeed not observed during training.


Time and place:
11:00am Online via Zoom: Zoom3 - https://zoom.us/j/3911012202
IMDEA Software Institute, Campus de Montegancedo
28223-Pozuelo de Alarcón, Madrid, Spain


Rahul Gopinath

Friday, January 29, 2021

11:00am Online via Zoom: Zoom3 - https://zoom.us/j/3911012202

Rahul Gopinath, Post-doctoral Researcher, CISPA Helmholtz Center for Information Security, Germany

The Science of Fuzzing

Abstract:

Efficient and effective fuzzing requires the availability of the input specification for the program under test. However, such specifications are typically unavailable, obsolete, incomplete, or inaccurate, limiting the reach of fuzzers. This has led to a proliferation of hacky recipes by different fuzzers to get past the input parsing stage, with each recipe working on some but not all programs. That is, fuzzing most resembles alchemy than science at this point. In this talk, I show how to transform fuzzing to a science. I present an end-to-end framework for recovering precise input specifications of programs. Such mined specifications can immediately be used for effective exploration of the program input space as well as the space of the program behavior, leading to the identification of failure-inducing inputs. Next, given any failure-inducing input, I show how to generalize such inputs into abstract patterns, precisely identifying the failure causing parts of the input. Any number of such abstract patterns can then be combined using the full set of logical connectives --- to produce specialized grammars that can be used by any grammar fuzzer for precise control of produced inputs.


Time and place:
11:00am Online via Zoom: Zoom3 - https://zoom.us/j/3911012202
IMDEA Software Institute, Campus de Montegancedo
28223-Pozuelo de Alarcón, Madrid, Spain


Invited Talks - 2020