The following are some of the lines of current or planned activity within the IMDEA Software Institute. The list is not intended to be an exhaustive enumeration of vision and agenda, but rather a presentation of some current efforts. The order of the lines and topics is not significant.
See also the list of topics and researchers.
Modeling has traditionally been a synonym for producing diagrams. Most models consist of a number of ``bubbles and arrows'' pictures and some accompanying text. The information conveyed by such models has a tendency to be incomplete, informal, imprecise, and sometimes even inconsistent. Many of the flaws in modeling are caused by the limitations of the diagrams being used. A diagram simply cannot express some of the essential information of a thorough specification. To specify software systems, formal languages offer some clear benefits over the use of diagrams. Formal languages are unambiguous, and cannot be interpreted differently by different people, for example, an analyst and a programmer. Formal languages make a model more precise and detailed, and are amenable to manipulation by automated tools to ensure correctness and consistency with other elements of the model. On the other hand, a model completely written in a formal language is often not easily understood.
The Unified Modeling Language (UML) uses notation that is mostly based on diagrams. However, to provide the level of conciseness and expressiveness that is required for certain aspects of a design, the UML standard defines the Object Constraint Language (OCL). OCL is a textual language with a notational style similar to common object-oriented programming languages. In UML 1.1, OCL appeared as the standard for specifying invariants and pre/post conditions. In UML 2.0, the rationale is that richer annotations than just constraint information should be included in a model. OCL is now also used for defining queries, referencing values, or stating conditions and business rules in a model. Although OCL was initially designed to be a formal language, its definition is not precise enough: it contains ambiguities, inconsistencies, and expressions with open or circular interpretations.
Over recent years, modeling has evolved to address issues like the definition of domain-specific metamodels, model transformation, design model testing, and model validation and simulation. Most of these new applications make extensive use of OCL and often require non-trivial extensions of this language. In this situation, a complete and unambiguous specification of OCL would be greatly beneficial, as a basis for automated tool support for its many different uses.
We have proposed a semantics for a significant subset of OCL, based on a novel mapping from UML models with OCL expressions to equational theories. This mapping is described as a (meta-)function that, given an OCL expression and a UML diagram, generates an equational interpreter for the expression.
Thus, the interpreting function defines a semantics for OCL which is both formal and executable. In this topic we plan to extend this formal semantics of OCL from UML class and object diagrams to sequence and state diagrams. This extension enables reasoning about the realizability of UML models. Models typically consist of various diagrams, each with its own set of OCL constraints, modeling different views of a system. In this context, realizability means that there exists at least one valid instance of the model, implying that the model is, in principle, implementable.
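The flavor of this mapping can be sketched informally. The following toy example (in Python rather than the equational formalism of the actual work, with an invented Account class and balance attribute) shows how an OCL-style invariant over an object diagram becomes an executable check:

```python
# Toy "object diagram": instances of a hypothetical class Account.
accounts = [{"id": 1, "balance": 100}, {"id": 2, "balance": 250}]

def forall(collection, predicate):
    """OCL's Account.allInstances()->forAll(a | ...) rendered as a function."""
    return all(predicate(x) for x in collection)

# The invariant "context Account inv: self.balance >= 0", compiled into an
# executable interpreter for diagrams of this shape.
def check_invariant(diagram):
    return forall(diagram, lambda a: a["balance"] >= 0)

print(check_invariant(accounts))                     # True
print(check_invariant([{"id": 3, "balance": -5}]))   # False
```

In the actual approach the generated interpreter is an equational theory rather than a host-language function, which makes the semantics amenable to formal reasoning as well as execution.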
We plan to apply existing tools to prove the termination of OCL user-defined operations based on the termination of the generated equational interpreters. Finally, we will implement an OCL interpreter based on our formal semantics. Our tool will provide a rigorous solution to the current lack of tool support for OCL evaluation on large scenarios with possibly millions of elements. We will develop this tool in collaboration with our industrial partners. In this regard, SAP AG has shown initial interest in this research in the context of its MOINE Modeling Infrastructure, an SAP technology which provides a MOF-based meta-model repository and OCL integration.
Model-driven development holds the promise of reducing system development time and improving the quality of the resulting products. Recent investigations illustrate that security policies can be integrated into system design models, and that the resulting security-design models can be used as a basis for generating systems along with their security infrastructures. Moreover, when models have formal semantics, they can be reasoned about: one can query properties and understand potentially subtle consequences of the policies they define. SecureUML is a security modeling language closely related to Role-Based Access Control. SecureUML can be combined with different design languages, thereby creating languages for formalizing security-design models that specify system designs together with their security requirements, allowing the generation of access-control infrastructures from these models.
Security-design models are formal objects with both a concrete syntax and an abstract representation. The modeling language for security-design models can be described by a metamodel that formalizes the structure of well-formed models and scenarios, representing possible run-time instances as concrete system states. In this setting, security properties can be expressed as formulas in OCL.
This use of OCL results in an expressive language for formalizing queries, which utilizes the entire vocabulary present in this metamodel. Queries include relations between users, roles, permissions, actions, and even system states. For example, one can ask ``are there two roles such that one includes the set of actions of the other, but the roles are not related in the role hierarchy?'' Another example, involving system state, is ``which roles can be assigned to a user so as to allow her to perform an action on a concrete resource in the context of a given scenario?'' These queries can be answered by evaluating the corresponding OCL expression in the instances of the metamodel that represent the security-design models (or scenarios) under consideration.
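As a rough illustration, the first query above can be evaluated over a toy model. The following Python sketch uses an invented set of roles, actions, and a role hierarchy; in the actual setting the query would be an OCL expression evaluated on instances of the metamodel:

```python
# Hypothetical toy security-design model: roles mapped to their permitted
# actions, plus an explicit role hierarchy (senior -> set of juniors).
roles = {
    "Clerk":   {"read"},
    "Auditor": {"read", "audit"},
    "Manager": {"read", "write"},
}
hierarchy = {"Auditor": {"Clerk"}}

def unrelated_but_included():
    """Pairs (r1, r2) where r1's actions include r2's, yet r1 does not
    inherit from r2 in the role hierarchy -- the query from the text."""
    pairs = []
    for r1 in roles:
        for r2 in roles:
            if (r1 != r2 and roles[r2] <= roles[r1]
                    and r2 not in hierarchy.get(r1, set())):
                pairs.append((r1, r2))
    return pairs

# Manager subsumes Clerk's actions but is unrelated to it in the hierarchy.
print(unrelated_but_included())
```

Such a finding typically indicates a missing inheritance edge or a redundant role in the policy under design.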
The use of scenarios provides support for handling queries involving system state. We will extend this work to support queries about the existence of states satisfying constraints, or queries where the states themselves are existentially quantified. An example of the latter, for a design metamodel that includes access to the system date, is ``which operations are possible on week days but are impossible on weekends?'' Alternatively, in a banking model, one might ask ``which actions are possible on overdrawn bank accounts?'' Such queries cannot currently be evaluated as they require reasoning about the consequences of OCL formulas. This involves theorem proving, as opposed to determining satisfiability of formulas in a concrete model.
In this topic we will also support the analysis of consistency of different system views. SecureUML can be combined with modeling languages like ComponentUML and ControllerUML to deal with different views of multi-tier architectures. In this setting, access control might be implemented at both the middle-tier -- for example, to implement a controller for a web-based application -- and a back-end persistence tier. If the policies for both of these tiers are formally modeled, we can potentially answer questions like ``will the controller ever enter a state in which the persistence tier throws a security exception?'' This query requires dealing with reachability of states, which again requires support from theorem proving or other forms of deduction such as constraint solving or state search.
This research will be done in collaboration with David Basin and his group at ETH Zürich.
While software security traditionally focuses on low-level protection mechanisms such as access control, the popularization of massively distributed systems dramatically increases the number and severity of vulnerabilities at application level. These vulnerabilities may be exploited by malicious software such as viruses, Trojan horses, etc., but also (unintentionally) by buggy software, with disastrous effects.
Language-based security aims to achieve security at the level of the programming language, with the immediate benefit of countering application-level attacks at the same level at which such attacks arise. Language-based security is attractive to programmers because it allows them to express security policies and enforcement mechanisms within the programming language itself, using well-developed techniques that facilitate a rigorous specification and verification of security policies.
Language-based techniques can guarantee a wide range of policies including confidentiality, integrity, and availability, as well as their combination. However, their adoption in practice is still modest, partly because known enforcement methods are confined to simple policies, such as non-interference for confidentiality. Two pressing challenges are enhancing enforcement mechanisms to support flexible and customizable policies, and developing methods for providing a quantitative assessment of security. There are many facets to the quantitative evaluation of program security, but provable security is clearly a valuable source of inspiration. Furthermore, language-based security should account for lower-level aspects of the system; for instance, information flow analyses should consider cache behavior that can lead to timing leaks.
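The idea behind non-interference enforcement can be sketched with a minimal example. The following Python fragment (with hypothetical variable names and a deliberately simplified check, covering only explicit flows) rejects assignments that would let secret data reach public variables:

```python
# Each variable carries a security level; an explicit flow into a "low"
# (public) variable is rejected if any source is "high" (secret).
HIGH, LOW = "high", "low"

levels = {"secret_key": HIGH, "user_input": LOW, "output": LOW}

def check_assignment(target, sources):
    """Allow 'target := f(sources)' only if no high data reaches a low sink."""
    if levels[target] == LOW and any(levels[s] == HIGH for s in sources):
        return False  # insecure explicit flow
    return True

print(check_assignment("output", ["user_input"]))   # True: low := low
print(check_assignment("output", ["secret_key"]))   # False: low := high
```

Real enforcement mechanisms must also track implicit flows through control dependencies, which is precisely where flexible and customizable policies become challenging.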
Language-based techniques often favor effectiveness to the detriment of generality and precision. There are complementary techniques that rely on logic and inherit its generality and accuracy. Proof Carrying Code is a general framework for mobile code security that uses program logics to guarantee the adherence of programs to a security policy. Initial experiments with Proof Carrying Code were conducted for basic safety properties, but the scope of the method has so far been exploited only moderately. There is an enormous potential to adapt the ideas of Proof Carrying Code to other properties of interest, for example, resource and information flow policies.
One central issue in language-based techniques is whether the verification methods used by the code producer to verify the correctness of their code can be matched by code consumers. Consumers do not trust the code producer and should only execute incoming code if they obtain appropriate guarantees that the code is innocuous. We focus on the development of static verification methods to be used at the consumer level, and study how to use certificates for transferring evidence from the producer to the consumer.
Language-based methods have been studied primarily for mobile code, and very few methods are known to scale to distributed systems. In fact, there are remarkably few mechanisms to ensure security of distributed applications using a combination of cryptographic and language-based methods. We will study means to ensure secure and distributed execution of computation-intensive algorithms. These algorithms perform computations that could not realistically be executed on a single node, at the price of providing results that must go through independent validation to be trusted. To ensure security, we rely on result certification. This requires the result to come equipped with a certificate, which provides sufficient evidence that the computation is correct. Certificates typically consist of a combination of data --additional witnesses that are produced during the computation and used to build the result-- and proofs --that establish some expected property of the witnesses, or of the final result. Taken to its extreme, the certificate is a trace of the computation and a proof that each computation step is correct, but many algorithms should allow for more succinct forms of certificates. For example, succinct certificates have been developed in cryptography to assert the primality of large numbers.
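A minimal sketch of result certification, using sorting as a stand-in for a computation-intensive algorithm: the untrusted node returns the result together with a certificate (here, the permutation relating input to output), and the consumer validates it independently without re-running or trusting the computation.

```python
def untrusted_sort(xs):
    """Runs on an untrusted node; returns result plus certificate."""
    perm = sorted(range(len(xs)), key=lambda i: xs[i])
    return [xs[i] for i in perm], perm

def check(xs, result, perm):
    """Independent validation: the certificate must be a permutation that
    maps the input onto the result, and the result must be ordered."""
    return (sorted(perm) == list(range(len(xs)))
            and result == [xs[i] for i in perm]
            and all(a <= b for a, b in zip(result, result[1:])))

data = [3, 1, 2]
result, cert = untrusted_sort(data)
print(check(data, result, cert))      # True: a correct result is accepted
print(check(data, [1, 3, 2], cert))   # False: a tampered result is rejected
```

Note that checking the certificate is asymptotically cheaper than redoing the sort, which is the property that makes the scheme worthwhile for genuinely expensive computations.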
We plan to use conventional programming language techniques to analyze widely-used cryptographic systems and to provide very strong guarantees of their correctness (cryptographic strength). Our goal is to demonstrate beyond reasonable doubt that standard cryptographic systems, some of which have a long history of flawed security proofs and hidden but effective attacks, are secure. We follow the methodology of provable security, that advocates a rigorous approach based on complexity theory. Here, the goals are specified precisely, and the security proof is carried rigorously, making all underlying assumptions explicit.
A convenient method to prove the correctness of cryptographic schemes is the game-playing technique, which organizes the construction of cryptographic proofs as sequences of probabilistic games, a natural solution for taming the complexity of such proofs. The game-playing technique is widely applicable: it supports reasoning in both the standard and the random oracle model of cryptography, and has been extensively used for proving security properties of a variety of schemes and protocols. The code-based technique is an instance of the game-playing technique that has been used successfully to verify state-of-the-art cryptographic schemes. Its distinguishing feature is to take a code-centric view of games, security hypotheses, and computational assumptions, which are expressed using probabilistic, imperative, polynomial-time programs. Under this view, game transformations become program transformations, and can be justified rigorously by semantic means. For instance, many transformations can be viewed as common program optimizations (e.g. constant propagation, common subexpression elimination), and are justified by proving that the original and transformed programs are equivalent with respect to indistinguishability. Although code-based proofs are easier to verify and more easily amenable to machine-checking, they go far beyond established theories of program equivalence and exhibit a surprisingly rich and broad set of reasoning principles that draws from program verification, algebraic reasoning, and probability and complexity theory. Thus, despite the benefits of their underlying framework, code-based proofs remain inherently difficult to verify.
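The code-centric view can be illustrated with a toy example (not a real cryptographic scheme): two probabilistic games, the second obtained from the first by constant propagation, whose equivalence is justified semantically by comparing their exact output distributions.

```python
import itertools
import collections

def game0(coins):
    k = 1               # a "key" fixed in this game
    b = next(coins)     # sample a random bit from the coin stream
    c = b ^ k           # use the key
    return c

def game1(coins):
    b = next(coins)
    return b ^ 1        # constant propagation: k replaced by its value

def distribution(game, n_bits=1):
    """Exhaust all coin tosses to compute the exact output distribution."""
    outcomes = collections.Counter()
    for toss in itertools.product([0, 1], repeat=n_bits):
        outcomes[game(iter(toss))] += 1
    return outcomes

# The transformation is sound: both games induce the same distribution.
print(distribution(game0) == distribution(game1))   # True
```

In actual code-based proofs, of course, equivalence is established by deductive reasoning over program semantics rather than by enumeration, since the probability spaces involved are far too large to exhaust.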
In this topic we will develop a framework to construct machine-checked code-based proofs in the Coq proof assistant. Our objective is to apply the framework to prove the correctness of standard cryptographic constructions such as OAEP padding or triple DES. This work is conducted in collaboration with Benjamin Grégoire at INRIA Sophia-Antipolis.
The conventional understanding of software correctness is absence of errors or bugs, expressed in terms of conformance of all the executions of the program to a functional specification (like type correctness) or behavioral specification (like termination or possible sequences of actions). However, in an increasing number of computer applications the environment plays an essential role. For example, embedded systems must control and react to the environment, which also imposes constraints on the system's behavior, such as resource usage and reaction times. Therefore, the criteria of correctness for these systems must be extended with new aspects, including non-functional global properties like maximum execution time and usage of memory, power, or other types of resources.
One of the challenges of this topic is the extension of abstraction-based debugging and verification techniques to general resource usage properties, beyond what can be handled with current technology. For example, in order to infer the resource usage of a computation one may need to correctly determine other more basic information (such as data sizes, suitable metrics for these, calling patterns, etc.). Consequently, new analysis frameworks are needed that can properly combine different abstract domains and perform several analyses in the same framework. Moreover, this challenge includes the study of new abstract domains for certain non-functional global properties, or the improvement of existing ones in order to obtain more accurate semantic approximations.
Another novel aspect of resource verification is that static checking must generate answers that go beyond the classical outcomes (true/false/unknown). To be useful, these answers must include conditions under which these classical outcomes are obtained. Such conditions can be parametrized by attributes of inputs, like input data size or value ranges. For example, it may be possible to say that the outcome is true if the input data size is in a given range.
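The following sketch illustrates such conditional answers, assuming a hypothetical memory bound of 4n + 16 bytes inferred by the analysis: instead of a flat true/false/unknown, the checker reports the range of input sizes for which the specification holds.

```python
def inferred_memory_bound(n):
    """Hypothetical upper bound inferred by analysis: 4*n + 16 bytes."""
    return 4 * n + 16

def check_spec(limit):
    """Return the condition on input size n under which the inferred
    usage stays within 'limit' bytes."""
    n = (limit - 16) // 4          # largest n with 4*n + 16 <= limit
    if n < 0:
        return "false for all input sizes"
    return f"true if input size n <= {n}"

print(check_spec(1024))   # true if input size n <= 252
print(check_spec(8))      # false for all input sizes
```

In general the conditions involve several input attributes (sizes, value ranges, metrics), not just a single scalar, but the principle of a parametrized verdict is the same.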
Finally, another challenge is the development of a debugging/verification framework which permits the selection of the appropriate analysis information necessary to check a given resource usage specification, depending on the kind of information expressed in such specification. This information could be for example lower and upper bounds, and even asymptotic values of the resource usage of the computation. The research in this topic will also explore the necessarily iterative nature of resource debugging, and its possibilities of automation (paying special attention to diagnosis). This includes the study of extensions of existing visualization techniques beyond error detection.
Run-time verification is an emerging area of applicable formal methods that aims at providing a more rigorous basis for testing and monitoring. As in static formal verification, a specification of the possible behaviors of the system is given, for example in some temporal logic. Whereas in classical static formal verification one intends to prove that a property holds for all possible executions of a system, in run-time verification only a single trace is explored. From the point of view of classical formal methods, run-time verification sacrifices completeness for efficiency. Run-time verification contributes to testing by providing rigorous techniques for the specification and evaluation of richer properties, for example expressing temporal behaviors. Since testing is, by far, the dominant activity for validation in current software engineering practice, run-time verification can improve quality assurance in terms of coverage, duration, and cost. Run-time verification is a very active field of formal methods, and a fast transfer of results to mainstream practice can be expected.
One facet to study is specification formalisms. The initial approaches to run-time verification were based on variations of well-known logics for the specification of temporal properties over the linear time frame. These languages include linear temporal logic and the linear-time µ-calculus, with some variations. Later, new formalisms emerged ---like rule-based specifications, rewriting, or calculi of synchronous streams--- but there is still room for improvement. One of the uses of run-time verification is to collect statistics about execution traces, but traditional temporal languages were not designed for quantitative data evaluation. Other properties to study about a specification language include its usability by engineers, its expressive power and conciseness, its evaluation efficiency (space used by a monitor, execution time per event processed, etc.), and the generation of optimal schedules.
Another aspect to investigate is evaluation strategies. There are two main strategies: online and offline monitoring. In online monitoring a monitor is generated from the specification to run in tandem with the system. In offline monitoring, a trace is dumped during the execution of the system and analyzed a-posteriori. Online monitoring has given rise to the paradigm of monitor oriented programming (reflection of properties at run-time into the programming language), where the detection of a run-time violation of a specification triggers the execution of code intended to handle the case and steer the system into a safe state. Offline monitoring can be used for a more sophisticated class of properties that involve several passes, and for post-mortem analysis.
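As a small illustration of online monitoring, the following Python sketch monitors the temporal property ``every request is eventually followed by a grant'' over a single trace, consuming one event at a time; at the end of a finite trace, pending obligations signal a violation. The event names are hypothetical.

```python
class ResponseMonitor:
    """Online monitor for: every 'request' is eventually followed by a 'grant'."""

    def __init__(self):
        self.pending = 0           # requests not yet granted

    def step(self, event):
        """Consume one event emitted by the instrumented system."""
        if event == "request":
            self.pending += 1
        elif event == "grant" and self.pending > 0:
            self.pending -= 1

    def verdict(self):
        """At the end of the trace: True iff no obligation is outstanding."""
        return self.pending == 0

m = ResponseMonitor()
for e in ["request", "work", "grant", "request"]:
    m.step(e)
print(m.verdict())   # False: the last request was never granted
```

In a monitor-oriented-programming setting, the False verdict (or the moment an obligation becomes unsatisfiable) would trigger handler code to steer the system into a safe state.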
Finally, instrumentation is another important problem to address. The first approaches to run-time verification abstracted away the given program as a black box that emits signals to be analyzed by the monitor. One of the most important problems in run-time verification is how to instrument the original code by transforming it into a new program that emits the necessary signals efficiently, or that already includes the monitor embedded. Moreover, there is a promising interaction between static and run-time verification. On one hand, when a property cannot be proved completely at static time, run-time verification provides the means to transform the program into a safe version that checks at run-time only those cases where the verification failed. On the other hand, static analysis can drastically improve the quality of the instrumentation by taking into account those cases that cannot happen or that must happen. This facet of run-time verification is directly related to the topic of Abstraction-Based Programming, and fruitful interactions between the two are likely to be exploited.
This topic will include a phase of tool prototyping for practical industrial programming languages and systems, and integration within existing frameworks. It is especially useful and challenging to perform run-time verification for embedded systems, for which we expect to leverage results from our work in resource analysis.
Synthesis is the activity of generating a computational system automatically from a formal specification. The theoretical concept of synthesis is old and has been thoroughly studied for the case of generating computable functions. In the last decade researchers have devoted a lot of attention to synthesis of reactive systems (for example, controllers). The main results and breakthroughs are based on classical languages and logics for formal specification, using automata theory and game theory for the algorithmic solutions. There is nowadays a clear understanding of the applications and limitations of automatic synthesis from specifications in different formalisms.
An emerging area of application is partial synthesis, where the generation stage tailors specific components that are then composed with other subsystems, possibly developed using different techniques. In some cases, these subsystems can even be developed with conventional --informal-- techniques common in software engineering practice. The correctness of the composition is guaranteed as long as these components conform to a specification that is generated automatically as a byproduct of the synthesis process. This conformance test --a proof obligation for the component-- can then be checked by classical verification techniques for each subsystem separately. Alternatively, the subsystems can be analyzed upfront --using specific static analyses-- to obtain enough information to then perform the synthesis that guarantees the desired global property.
This partial synthesis approach is particularly useful for generating aspects of the behavior of the system for which solutions are very difficult to obtain ``manually,'' and general algorithmic solutions either do not exist or are very inefficient. Examples include the synthesis of application-specific resource managers and schedulers for parallel and distributed systems. These approaches are based on a stage of static analysis of the system followed by a synthesis or program transformation stage. The use of static analysis is not intended to prove or disprove correctness but to obtain information about the particular case at hand. Even though some instances of this technique exist, there is not yet a general theory and methodology with wide applicability, which would enable an understanding of the usability and limitations of this approach.
Successful instances of this approach to synthesis can be understood as a practical use of abstraction-based programming, as presented above, and serve as illustrative idioms of this programming style.
The technique of Abstract Interpretation has allowed the development of sophisticated program analyses which are at the same time provably correct and practical. Apart from formal verification, the semantic approaches enabled by abstract interpretation analysis have also been traditionally applied to program optimization and transformation at compile time. More recently, novel and promising applications of abstract interpretation are emerging in a more general context of program development.
A significant new class of debugging, verification, and optimization tasks is made possible in a sound way through recent advances in abstract interpretation. This is starting to influence the design of program development environments, which are beginning to expose to the programmer the capabilities of the analyzer. The objective of such abstraction-based programming environments is to improve the software development process by incorporating static certification, semantic debugging, and optimization as an integral part of the compiler. Being able to infer non-trivial properties about the program allows the compiler to check whether a program conforms to a specification. However, most of the properties that are interesting about programs in the software-intensive or embedded contexts (such as time and memory consumption, dynamic data sizes, termination, absence of errors or exceptions) are undecidable. Consequently, abstraction-based techniques that compute safe approximations are particularly useful in this context.
As a result of performing safe approximations and using them for checking program specifications, three possible cases arise in practice during program development. First, if the information inferred by the analyzer suffices to prove that the program adheres to the specification, then the program has been effectively certified and is provably correct. The fixpoint computed by the abstract interpreter (or a subset thereof, if fixpoint compression or certificate reduction are applied) is an automatically generated abstract certificate which can be used in proof-carrying code scenarios (an approach known as Abstraction Carrying Code). If the inferred information contradicts part of the specification, then a static semantic error has been found. These semantic errors are similar to typing errors, but refer not just to types but also to more complex properties such as those mentioned before (time and memory consumption, dynamic data sizes, termination, absence of errors or exceptions, etc.). The ability to detect real errors at compile time is referred to as abstract debugging. A compiler with built-in abstract interpretation has the potential to help in identifying the nature, location, and origin of the error (a process known as abstract diagnosis). Finally, if the compiler fails to prove or disprove a given part of the specification, it can automatically derive run-time checks from that part of the specification, and even generate test cases for empirical evaluation.
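The three outcomes can be illustrated with a minimal interval-domain sketch (the intervals and specifications are invented): the analyzer propagates safe over-approximations of value ranges, and comparing the inferred interval against the specified one yields verified, error, or unknown.

```python
def interval_add(a, b):
    """Abstract addition over intervals: a safe over-approximation."""
    return (a[0] + b[0], a[1] + b[1])

def check(inferred, spec):
    """Compare an inferred interval against the specified safe interval."""
    lo, hi = inferred
    slo, shi = spec
    if slo <= lo and hi <= shi:
        return "verified"   # inferred range contained: abstract certificate
    if hi < slo or lo > shi:
        return "error"      # ranges disjoint: definite static semantic error
    return "unknown"        # overlap: derive run-time checks / test cases

x = (0, 10)                 # analysis result for variable x
y = (5, 5)                  # y is the constant 5
z = interval_add(x, y)      # z is approximated by [5, 15]
print(check(z, (0, 100)))   # verified
print(check(z, (20, 30)))   # error
print(check(z, (0, 12)))    # unknown
```

The ``unknown'' verdict is exactly where the run-time check derivation of the last sentence above comes into play.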
The overall objective of this topic is to turn abstract interpretation into a basic and fundamental building block of program development tools and environments. This includes making abstraction-based certification a routine activity of application development, as well as semantic test generation, and debugging and error diagnostic based on abstract interpretation. Tools generated as a result of this research effort will integrate other (internal and external) tools and libraries, and in turn, be available for use in future programming tool frameworks.
Rigorous program optimization aims at transforming programs into software that runs in the fastest and most resource-efficient way on the platforms, and under the environmental conditions, in which it is deployed, while maintaining its observable behavior. This requires the development of well-founded program transformation techniques, together with models of execution platforms and environments.
Optimized programs are expected to preserve observable results, while performing better with respect to certain metrics, like time or resource consumption. Techniques range from methods for saving on memory and processing time on sequential processors, to automatic parallelization and adaptive task scheduling in parallel and distributed computers. Advanced goals include distributed scheduling, self-reconfiguration, and automatic adaptation to environmental conditions.
Reasoning a-priori about the benefits of a program transformation is challenging but possible to a certain extent, thanks to techniques such as abstract interpretation, partial evaluation, and combinations of them. At the same time, static techniques are specially effective when combined with run-time support to achieve objectives such as self-adaptation, self-configuration, resiliency, etc.
Automatic parallelization is of particular interest since highly parallel processors, which were previously only considered in high performance computing, have steadily made their way into mainstream computing. Nowadays, even standard desktop and laptop machines include multicore chips with up to four cores, and the tendency is that these figures will consistently grow in the foreseeable future. This enormous performance potential will go untapped by software developed with current programming practices which typically keep one thread busy. The alternative is to produce parallel programs or parallelize existing ones. Parallel programming is a tedious and highly error-prone task, worsening the inherent complexity of developing software. With current techniques, requiring programmers to parallelize programs by hand in order to exploit these new multicore architectures will inevitably lead to a decrease in productivity. An ideal alternative, at least in principle, is automatic parallelization, whereby the compiler is able to identify parts of the application that are independent and can thus be run in parallel.
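The independence criterion at the heart of this approach can be sketched as follows, using a crude over-approximation based on declared read/write sets (the tasks and variable names are hypothetical):

```python
def independent(task_a, task_b):
    """Two tasks may run in parallel, with the same observable result as
    the sequential program, if neither writes data the other touches."""
    reads_a, writes_a = task_a
    reads_b, writes_b = task_b
    return not (writes_a & (reads_b | writes_b) or writes_b & reads_a)

# (reads, writes) sets for three hypothetical loop bodies
t1 = ({"a"}, {"x"})
t2 = ({"b"}, {"y"})
t3 = ({"x"}, {"z"})   # t3 reads what t1 writes: a dependence

print(independent(t1, t2))   # True: may be scheduled in parallel
print(independent(t1, t3))   # False: must stay sequential
```

In practice the hard part is computing these read/write sets statically in the presence of pointers, dynamically allocated data, and irregular access patterns, which is where the more powerful analyses mentioned below are needed.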
The use of declarative languages is instrumental in this task since these languages preserve more of the parallelism that is intrinsic in applications. It is also desirable to develop parallelization technology for programming languages that are currently in widespread use. This goal requires new notions of independence and more powerful analysis techniques that can detect parallelism in the presence of complex processes with irregular access patterns to dynamically allocated data, as well as novel notions of memory coherence or synchronization that new multicore architectures bring about.
At the same time, some degree of manual parallelization will be needed to obtain the best performance. Current compilers are not able to automatically exploit parallelism with conventional algorithmic descriptions. Parallel languages and language extensions, including parallel design patterns, will be essential. A promising approach will include merging automatic and manual parallelization. Programmers can use annotations to instruct the compiler about opportunities for parallelization. Conversely, verifying parallel compilers can correct erroneous programmer parallelizations.
In addition, new research is needed on run-time systems, including parallel abstract machines. New features include monitoring and general support for scheduling or adaptation to different load conditions and resource availability. Resource analysis is likely to provide useful results with potential applicability to program optimization and, in particular, automatic parallelization.
Finally, a simple approach to parallelization based only on detecting independence does not take into account a number of important overheads which occur in practice, such as process creation and scheduling. These overheads can produce a penalty in performance, which motivates methodologies to estimate the granularity of parallel tasks. In particular, it is important to anticipate as much work as possible at compile time in order to reduce the run-time overhead associated with performing granularity control. Much work remains to be done in this area, however, in order to develop truly powerful, practical, and scalable methodologies and tools for performing automatic adaptive resource usage control.
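A minimal sketch of granularity control, with an invented threshold standing in for a compile-time cost estimate: a divide-and-conquer sum spawns parallel tasks only while they remain coarse-grained enough to amortize scheduling overhead.

```python
from concurrent.futures import ThreadPoolExecutor

THRESHOLD = 8   # tasks smaller than this are not worth spawning

def psum(xs):
    """Parallel sum with granularity control."""
    if len(xs) <= THRESHOLD:
        return sum(xs)               # too fine-grained: stay sequential
    mid = len(xs) // 2
    with ThreadPoolExecutor(max_workers=2) as pool:
        left = pool.submit(psum, xs[:mid])    # spawn coarse-grained halves
        right = pool.submit(psum, xs[mid:])
        return left.result() + right.result()

print(psum(list(range(100))))   # 4950
```

In the envisioned tools the threshold is not a fixed constant but is derived from cost functions inferred by resource analysis, and may be adjusted at run time against observed load.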
We envision parallelizing compilers evolving much as traditional compilation technology did. While in the infancy of compilation and optimization ``real programmers'' often coded large, crucial parts of systems in assembler for efficiency reasons, nowadays even critical systems are routinely coded in higher-level languages. Similarly, advances in automatic parallelization should gradually reduce the need for manual parallelization.
This is an area where there is potential for collaboration with IMDEA Networks on concurrency and distributed execution, as well as with the parallelism-related activities at IMDEA Mathematics.
Programming languages are the main instrument for software development. Programming paradigms at a higher level of abstraction free programmers from being burdened with low-level details, which eases the development of software with more complex functionality.
An interesting new trend is the influence of abstract interpretation and other sound static analysis techniques on programming language design, beyond the standard applications to debugging, verification, and optimization. For example, inferring type definitions (shapes) and many other data- and control-related properties allows untyped languages to inherit --and in some cases even surpass-- some characteristics of strongly typed languages. Language designers can then shift attention from type systems and type checking to the more general arena of assertion languages and assertion inference.
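The flavor of property inference discussed above can be illustrated with a textbook sign analysis: a tiny abstract interpreter that infers the sign of an expression without evaluating it. The domain and rules below are standard material, not tied to any particular system, and the encoding is invented for this sketch.

```python
# Abstract domain: the sign of a value, with TOP meaning "unknown".
NEG, ZERO, POS, TOP = "neg", "zero", "pos", "top"

def sign(n):
    return ZERO if n == 0 else (POS if n > 0 else NEG)

def abs_mul(a, b):
    # zero absorbs; equal non-zero signs give pos, opposite give neg
    if ZERO in (a, b):
        return ZERO
    if TOP in (a, b):
        return TOP
    return POS if a == b else NEG

def abs_add(a, b):
    if a == ZERO:
        return b
    if b == ZERO:
        return a
    return a if a == b else TOP  # pos + neg: sign unknown

def analyse(e):
    """Infer the sign of a small expression AST: ("lit", n) | ("mul"|"add", l, r)."""
    tag = e[0]
    if tag == "lit":
        return sign(e[1])
    l, r = analyse(e[1]), analyse(e[2])
    return abs_mul(l, r) if tag == "mul" else abs_add(l, r)

# Infer that (-3) * (-5) + 7 is positive without computing it.
e = ("add", ("mul", ("lit", -3), ("lit", -5)), ("lit", 7))
print(analyse(e))  # "pos": neg*neg is pos, and pos+pos is pos
```

Shape and type inference for untyped languages follow the same scheme over far richer abstract domains.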
This approach involves a tight integration between programming and specification languages, in the spirit of design by contract. We propose to investigate language designs where programming and specification languages are blended together and many properties can be defined within the source language. In this setting, some uses of run-time verification emerge naturally as the checking of assertions that remain unresolved at compile time.
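The compile-time/run-time split can be sketched with an ordinary contract decorator: an assertion the compiler could not discharge statically is kept in the program and checked dynamically. The `pre` decorator and the `mean` example are invented for illustration.

```python
import functools

def pre(check, message):
    """Attach a precondition to a function and verify it at call time.

    Stands in for a residual, unresolved assertion: a verifying
    compiler would remove this check whenever it can prove the
    precondition always holds at every call site."""
    def deco(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            assert check(*args, **kwargs), message
            return f(*args, **kwargs)
        return wrapper
    return deco

@pre(lambda xs: len(xs) > 0, "mean of an empty list")
def mean(xs):
    return sum(xs) / len(xs)

print(mean([1, 2, 3]))  # 2.0; mean([]) would fail the contract, not divide by zero
```

The benefit of blending the two languages is visible even in this toy: the property lives next to the code it governs, and the same text serves as documentation, test oracle, and run-time check.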
Assertion languages should allow specifications to be expressed precisely, support semantics-based interoperability, and allow properties to be described at different levels of abstraction. The assertion language also serves as the communication vehicle between the programmer and an interactive compiler. Additionally, a single language can potentially allow the programmer to document, test, modify, maintain, and compose components, thus being useful throughout the whole software development process.
Another effect of abstract interpretation-based programming language design is that extensive sound optimizations become available to the developer. For example, generic code does not necessarily incur any performance penalty: the information obtained from global analysis can enable a specialization and subsequent optimization that trims the overhead due to generic instantiation.
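A toy analogue of analysis-driven specialization, under the assumption that the analysis has discovered a fixed value for one argument: once the exponent of a generic `power` is known, its loop can be compiled away. The helper names are invented, and code generation via `eval` merely stands in for what a compiler back end would do.

```python
def power(x, n):
    """Generic version: the exponent is inspected in a loop at run time."""
    r = 1
    for _ in range(n):
        r *= x
    return r

def specialise_power(n):
    """Specialise power for a known exponent by unrolling the loop.

    Mimics the effect described above: global analysis fixes n, so the
    generic control structure disappears from the residual code."""
    body = " * ".join(["x"] * n) if n > 0 else "1"
    return eval(f"lambda x: {body}")

cube = specialise_power(3)  # residual code: lambda x: x * x * x
print(cube(4))              # 64, same as power(4, 3), with no run-time loop
```

The residual function is straight-line code, which is exactly what makes subsequent optimizations (constant folding, inlining) applicable where the generic version blocked them.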
Another application of this principle is the development of languages that combine the power of several paradigms, which allows studying how the characteristics of these paradigms interact. Examples include improved support for applications requiring concurrency or complex problem solving, needs that different paradigms have addressed in different ways. As a more concrete example, constraint solving has proved useful in applications including optimization, program analysis, and coordination of services, and a tight connection to constraint solving within a programming language has been shown to yield very compact, powerful, and efficient encodings of complex problems.
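The compactness of in-language constraint encodings can be hinted at with a deliberately naive generate-and-test solver; real embeddings use constraint propagation and search rather than enumeration, and the `solve` interface here is invented for the sketch.

```python
from itertools import product

def solve(domains, constraints):
    """Tiny generate-and-test constraint solver.

    domains: variable name -> iterable of candidate values.
    constraints: predicates over a {name: value} environment.
    Yields every assignment satisfying all constraints."""
    names = list(domains)
    for values in product(*domains.values()):
        env = dict(zip(names, values))
        if all(c(env) for c in constraints):
            yield env

# Toy problem stated declaratively: two digits summing to 10,
# with the first strictly smaller than the second.
solutions = list(solve(
    {"x": range(10), "y": range(10)},
    [lambda e: e["x"] + e["y"] == 10,
     lambda e: e["x"] < e["y"]],
))
print(solutions[0])  # {'x': 1, 'y': 9}
```

Even in this toy, the problem statement is the program: constraints are ordinary host-language expressions, which is the "tight connection" the text refers to.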
Many modern programming languages are not restricted to a fixed syntax and semantics, but rather consist of highly extensible kernels upon which both the language designer and the programmer can build new abstractions. This approach greatly simplifies the development of application-specific languages tailored to a particular domain.
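The extensible-kernel idea can be sketched as a minimal embedded expression language: a small core of composable values, on top of which new abstractions are defined without touching the kernel itself. All names here are invented for illustration.

```python
class Expr:
    """Toy language kernel: an expression is just something that
    evaluates in an environment; composition builds bigger expressions."""
    def __init__(self, eval_fn):
        self.eval = eval_fn

    def __add__(self, other):
        return Expr(lambda env: self.eval(env) + other.eval(env))

    def __mul__(self, other):
        return Expr(lambda env: self.eval(env) * other.eval(env))

def var(name):
    return Expr(lambda env: env[name])

def lit(n):
    return Expr(lambda env: n)

# A new abstraction built on top of the kernel, not inside it:
def double(e):
    return e + e

formula = double(var("x")) * lit(3)
print(formula.eval({"x": 5}))  # 30
```

A domain-specific language grows by adding such derived forms; the kernel's syntax and semantics never change, which is what keeps the approach tractable.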
In this topic we will focus on developing a framework of tools for experimenting with new features and combinations of paradigms, as well as on developing implementation technology. Naturally, this framework --being flexible and extensible-- will benefit from being open source in order to build a community of contributing programmers and users.