Irfan Ul Haq, PhD Student, IMDEA Software Institute
Malware lineage studies the evolutionary relationships among malware and has important applications for malware analysis. A persistent limitation of existing approaches is that they only work on synthetic malware, on malware that is not packed, or on packed malware for which unpackers are available. This is problematic because, to evade detection, the majority of malware is packed. In this work, we propose a novel malware lineage approach that works on malware samples collected in the wild.
Given a set of malware executables from the same family, for which no source code is available and which may be packed, our approach produces a lineage graph where nodes are versions of the family and edges describe the relationships between versions. To enable our malware lineage approach, we propose the first technique to identify the versions of a malware family, and a scalable code indexing technique for determining shared functions between any pair of input samples. Our approach addresses the challenges introduced by operating on real malware, such as unpacking, disassembly, and limitations in the malware collection. We have evaluated the accuracy of our approach on 13 open-source programs and have applied it to produce lineage graphs for 10 malware families. Our malware lineage graphs achieve, on average, a 26-fold reduction from the number of input samples to the number of versions.
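The version-linking step can be illustrated with a toy sketch. Here each version is represented as a set of function hashes, and each version is attached to the most similar earlier version; the Jaccard similarity heuristic and all names below are illustrative, not the paper's actual algorithm:

```python
# Hypothetical sketch of lineage-graph construction: versions are sets of
# function hashes; each version gets an edge from its most similar
# predecessor. Illustrative only, not the approach evaluated in the talk.

def jaccard(a, b):
    """Fraction of shared functions between two versions."""
    return len(a & b) / len(a | b)

def lineage_graph(versions):
    """versions: list of (version_id, set_of_function_hashes),
    ordered by estimated release time. Returns parent -> child edges."""
    edges = []
    for i, (vid, funcs) in enumerate(versions):
        if i == 0:
            continue  # root version has no parent
        # link each version to the most similar earlier version
        parent = max(versions[:i], key=lambda p: jaccard(p[1], funcs))
        edges.append((parent[0], vid))
    return edges

versions = [
    ("v1", {"f1", "f2", "f3"}),
    ("v2", {"f1", "f2", "f3", "f4"}),  # evolved from v1
    ("v3", {"f1", "f2", "f5", "f6"}),  # a fork of v1
]
print(lineage_graph(versions))  # [('v1', 'v2'), ('v1', 'v3')]
```

Note that v3 attaches to v1 rather than v2 because it shares a larger fraction of its functions with v1, yielding a branching (fork) in the lineage graph.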
Time and place:
10:45am Lecture hall 1, level B
IMDEA Software Institute, Campus Montegancedo
28223-Pozuelo de Alarcón, Madrid, Spain
Vincent Laporte, Post-doctoral Researcher, IMDEA Software Institute
Motivated by applications to security and high efficiency, we propose an automated methodology for validating, on low-level intermediate representations, the results of a source-level static analysis. Our methodology relies on two main ingredients: a relative-safety checker, an instance of a relational verifier which proves that one program is "safer" than another, and a transformation of programs into defensive form, which verifies the analysis results at run time. We prove the soundness of the methodology and provide a formally verified instantiation based on the Verasco verified C static analyzer and the CompCert verified C compiler. We evaluate the effectiveness of our approach with client optimizations at the RTL level, and with static analyses for cache-based timing side channels and memory usage at pre-assembly levels.
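The defensive-form idea can be illustrated with a toy sketch: the program re-checks, at run time, the facts claimed by the source-level analysis, so a violated analysis result is caught rather than silently trusted. The interval annotation and decorator below are hypothetical illustrations, not Verasco's or CompCert's actual mechanism:

```python
# Illustrative sketch of "defensive form": run-time checks guard the
# facts a static analysis claimed about the inputs. Hypothetical API.

def defensive(analysis_facts):
    """Decorator inserting run-time checks for claimed argument ranges."""
    def wrap(fn):
        def checked(*args):
            for arg, (lo, hi) in zip(args, analysis_facts):
                assert lo <= arg <= hi, "analysis result violated at run time"
            return fn(*args)
        return checked
    return wrap

@defensive([(0, 10)])          # analysis claims: 0 <= n <= 10
def table_lookup(n):
    return [i * i for i in range(11)][n]

print(table_lookup(3))  # 9
```

If the analysis result is wrong for some input (say, `table_lookup(42)`), the inserted check fails instead of the lookup silently reading out of the intended range; this is the run-time safety net the defensive transformation provides.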
Time and place:
10:45am Lecture hall 1, level B
IMDEA Software Institute, Campus Montegancedo
28223-Pozuelo de Alarcón, Madrid, Spain
Isabel Garcia, PhD Student, IMDEA Software Institute
Context-sensitive global analysis of large code bases can be expensive, which is especially problematic in interactive uses of analyzers. However, in practice each development iteration implies small modifications, often isolated within a few modules, and analysis cost can be reduced by reusing the results of previous analyses.
To date, this has been achieved on one hand through modular analysis, which can reduce memory consumption and often localizes recomputation during reanalysis to the modules affected by changes. In parallel, context-sensitive incremental fixpoint algorithms have been proposed that achieve cost reductions at finer levels of granularity, such as changes in individual program lines. However, these fine-grained techniques are not directly applicable to modular programs.
This talk will describe a context-sensitive fixpoint analysis algorithm aimed at achieving both inter-modular (coarse-grained) and intra-modular (fine-grained) incrementality, solving the problems of propagating fine-grained change information and its effects across module boundaries, for additions and deletions in multiple modules.
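A minimal sketch of the coarse-grained part of this idea, under simplifying assumptions (an acyclic module graph, and a toy "analysis" computing reachable modules as a stand-in for a real abstract domain): when a module changes, only it and its transitive callers are invalidated, and cached results are reused for everything else.

```python
# Toy incremental reanalysis over modules. The "analysis" (reachable
# modules) stands in for a real abstract domain; deps is assumed acyclic.

def analyze(deps, cache, changed):
    """deps: module -> set of imported modules.
    cache: results from the previous run; changed: set of edited modules."""
    callers = {m: {c for c in deps if m in deps[c]} for m in deps}
    # invalidate the changed modules and their transitive callers
    work, dirty = list(changed), set()
    while work:
        m = work.pop()
        if m not in dirty:
            dirty.add(m)
            work.extend(callers[m])
    for m in dirty:
        cache.pop(m, None)
    # recompute only the missing entries, reusing cached ones
    def result(m):
        if m not in cache:
            cache[m] = {m}.union(*(result(d) for d in deps[m]))
        return cache[m]
    for m in deps:
        result(m)
    return cache, dirty

deps = {"main": {"util"}, "util": set(), "other": set()}
cache, _ = analyze(deps, {}, set())            # initial full analysis
cache, dirty = analyze(deps, cache, {"util"})  # edit 'util'
print(sorted(dirty))  # ['main', 'util']: 'other' was reused from cache
```

The fine-grained part discussed in the talk refines this picture further, tracking changes below module granularity; the sketch only shows how change information crosses module boundaries through the caller relation.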
Time and place:
10:45am Lecture hall 1, level B
IMDEA Software Institute, Campus Montegancedo
28223-Pozuelo de Alarcón, Madrid, Spain
Platon Kotzias, PhD Student, IMDEA Software Institute
Potentially unwanted programs (PUP) are a category of undesirable software which includes adware and rogueware. PUP is often distributed through commercial pay-per-install (PPI) services. In this work we perform what we believe is the first analysis of the economics of commercial PPI services. To enable the economic analysis, we propose a novel attribution approach using entity graphs that capture the network of companies and persons behind a PUP operation, e.g., a commercial PPI service or a set of PUP. We analyze 3 Spain-based operations. Each operation runs a commercial PPI service, develops PUP, and manages download portals. For each operation, we collect financial statements submitted by the companies and audit reports when available. This data allows us to analyze not only the operation revenues, but also their profits (and losses), which can differ widely from revenues depending on operational costs. Our analysis answers 6 main questions. (1) How profitable are the commercial PPI services and the operations running them? We measure that the three operations have a total revenue of 202.5M €, net income (i.e., profits) of 23M €, and EBITDA of 24.7M €. Overall, expenses are high and margins low. (2) What are the revenue sources of the operations? The largest source of revenue is the PPI service, which provides up to 90% of an operation's revenue. But we also observe the operations drawing revenue from advertising, download portals, PUP, and streaming services. (3) How has the PPI business evolved over time? Peak revenue and net income happened in 2013, and there was a sharp decrease starting mid-2014 when different vendors deployed new defenses that significantly impacted the PPI market, which did not recover afterwards. (4) How many companies are involved in an operation? We find that each operation runs from 15 up to 32 companies, but most of them are shell companies. (5) How many persons are involved in an operation? We find that a small number of 1–6 persons manage each operation. (6) How long have the operations been active? Operations start as early as 2003, but the PPI services do not operate until 2010–2011.
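As a quick sanity check on the figures quoted above, the margins follow directly from the reported totals:

```python
# Back-of-the-envelope margins from the figures quoted in the abstract.
revenue = 202.5    # total revenue, millions of EUR
net_income = 23.0  # profits, millions of EUR
ebitda = 24.7      # millions of EUR

print(f"net margin:    {net_income / revenue:.1%}")  # ~11.4%
print(f"EBITDA margin: {ebitda / revenue:.1%}")      # ~12.2%
```

An aggregate net margin around 11% across the three operations is what underlies the abstract's observation that expenses are high and margins low.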
Time and place:
10:45am Lecture hall 1, level B
IMDEA Software Institute, Campus Montegancedo
28223-Pozuelo de Alarcón, Madrid, Spain
Antonio Faonio, Post-doctoral Researcher, IMDEA Software Institute
Proofs of computational effort were devised to control denial of service attacks. Dwork and Naor (CRYPTO '92), for example, proposed to use such proofs to discourage spam. The idea is to couple each email message with a proof of work that demonstrates the sender performed some computational task. A proof of work can be either CPU-bound or memory-bound. In a CPU-bound proof, the prover must compute a CPU-intensive function that is easy to check by the verifier. A memory-bound proof, instead, forces the prover to access the main memory several times, effectively replacing CPU cycles with memory accesses.
I am going to talk about a new concept dubbed proof of space. To compute such a proof, the prover must use a specified amount of space, i.e., we are not interested in the number of accesses to main memory (as in memory-bound proofs of work) but rather in the amount of actual memory the prover must employ to compute the proof.
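For contrast with proofs of space, a CPU-bound proof of work can be sketched in a few lines in the hashcash style suggested by Dwork and Naor's proposal: finding the proof costs many hash evaluations, while checking it costs one.

```python
# CPU-bound proof of work (hashcash style): find a nonce such that
# SHA-256(message || nonce) starts with `difficulty` zero hex digits.
import hashlib

def prove(message: bytes, difficulty: int) -> int:
    nonce = 0
    while True:
        h = hashlib.sha256(message + str(nonce).encode()).hexdigest()
        if h.startswith("0" * difficulty):
            return nonce       # expensive to find (~16**difficulty tries)...
        nonce += 1

def verify(message: bytes, nonce: int, difficulty: int) -> bool:
    h = hashlib.sha256(message + str(nonce).encode()).hexdigest()
    return h.startswith("0" * difficulty)   # ...but one hash to check

nonce = prove(b"hello", 4)
assert verify(b"hello", nonce, 4)
```

A proof of space replaces the loop above, whose cost is CPU cycles, with a protocol whose cost is the amount of memory the prover must actually dedicate, which is the distinction the talk develops.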
Time and place:
10:45am Lecture hall 1, level B
IMDEA Software Institute, Campus Montegancedo
28223-Pozuelo de Alarcón, Madrid, Spain
Alvaro Garcia Perez, Post-doctoral Researcher, IMDEA Software Institute
The Paxos algorithm of Lamport is a classic consensus protocol for state machine replication in environments that admit crash failures. New versions of this protocol constantly spring up, competing with each other for better performance and widening the gap between the description of the Paxos algorithm and real-world systems. We tackle the challenge of verifying these increasingly complex protocols by applying modular reasoning, that is, by verifying parts of the protocol separately instead of verifying the whole protocol in one go. To this end, we consider the abstractions from an existing decomposition of Paxos by Boichat et al., and we provide highly non-deterministic, atomic specifications of these abstractions that are strong enough to prove the algorithm correct.
In this talk I will present our advances in proving that the implementations of Boichat et al. refine (i.e., are linearizable with respect to) our non-deterministic specifications. We conjecture that variants of Paxos and other consensus algorithms, like Multi-Paxos, ZAB and Viewstamped Replication, could be verified in a similar fashion.
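To convey the flavor of the protocol being verified, here is a toy single-decree Paxos acceptor. This is a deliberate simplification with no persistence, networking, or proposer logic, and it is not the Boichat et al. decomposition itself; it only shows the two checks that make the acceptor safe.

```python
# Toy single-decree Paxos acceptor (simplified sketch, not the verified
# artifact from the talk). Ballots are integers; higher wins.

class Acceptor:
    def __init__(self):
        self.promised = -1    # highest ballot promised so far
        self.accepted = None  # (ballot, value) last accepted, or None

    def prepare(self, ballot):
        """Phase 1: promise to ignore lower ballots; report past accept."""
        if ballot > self.promised:
            self.promised = ballot
            return ("promise", self.accepted)
        return ("nack", None)

    def accept(self, ballot, value):
        """Phase 2: accept unless a higher ballot was promised meanwhile."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return "accepted"
        return "nack"

a = Acceptor()
assert a.prepare(1) == ("promise", None)
assert a.accept(1, "x") == "accepted"
assert a.prepare(0) == ("nack", None)         # stale proposer rejected
assert a.prepare(2) == ("promise", (1, "x"))  # new proposer learns "x"
```

The last line is the safety-critical behavior: a later proposer is told about the previously accepted value, so it cannot propose a conflicting one. The verification effort described in the talk addresses the full protocol, with quorums and concurrency, where such invariants become much harder to establish.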
Time and place:
10:45am Lecture hall 1, level B
IMDEA Software Institute, Campus Montegancedo
28223-Pozuelo de Alarcón, Madrid, Spain
Nataliia Stulova, PhD Student, IMDEA Software Institute
Untyped languages offer great flexibility in term creation and manipulation. However, ensuring the correctness of data operations often requires expensive run-time checks. We propose a simple combination of a strict module system and a term-hiding mechanism. This combination makes it possible to assume strong properties about the shape of terms in calls across module boundaries, which in turn leads to significant reductions in run-time checking overhead, as we demonstrate on a set of benchmarks.
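A toy sketch of the boundary-checking idea, with a list-of-integers check standing in for a general term-shape assertion (the setting in the talk is an untyped logic-programming module system, not Python): the shape check runs once at the module boundary, and internal calls on already-validated terms skip it.

```python
# Sketch: shape checks at the module boundary only. Internal code
# trusts terms that already passed the boundary check, avoiding a
# check on every recursive call. Illustrative, not the talk's system.

def _sum(xs):                      # module-internal: no checks
    return xs[0] + _sum(xs[1:]) if xs else 0

def sum_exported(xs):              # module boundary: check once
    assert isinstance(xs, list) and all(isinstance(x, int) for x in xs), \
        "ill-shaped term crossing module boundary"
    return _sum(xs)

print(sum_exported([1, 2, 3]))  # 6
```

Without the module boundary, a defensive implementation would have to re-validate `xs` on every recursive call of `_sum`; hiding the internal function makes the single entry check sufficient, which is the source of the overhead reductions the talk reports.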
Time and place:
11:45am Meeting room 202 (Hill View), level 2
IMDEA Software Institute, Campus Montegancedo
28223-Pozuelo de Alarcón, Madrid, Spain