Decentralized and Distributed Stream Runtime Verification: Enhancing Monitoring in Complex Systems

April 24, 2024

Decentralized and Distributed Stream Runtime Verification: Enhancing Monitoring in Complex Systems

Researcher Luis Miguel Danielsson, advised by Professor César Sánchez, has defended his thesis, titled: “Decentralized and Distributed Stream Runtime Verification”, the 16th of April, at the Computer Science School of the Polytechnic University of Madrid, surrounded by colleagues and family members.

Context

As software has become an important tool for solving a wide range of human problems, it requires managing all the details of how to handle data and computation in a certain algorithm. But building reliable software is a notoriously difficult task. Even more, in concurrent or distributed software, the intrinsic nature of the system makes programming even more difficult, having to manage two or more computations at the same time and understanding and managing their interconnections, such as communication or synchronization.

In this scenario it is even more important to have tools to mechanically determine whether a program is correct or faulty. This is the area of Software Verification, that is concerned with ensuring that a program behaves as expected, that is what is the desired outcome and how we want the program to compute it.

Luis Miguel Danielsson states that: “Verifying reactive systems is even harder when the environment is not of a computational nature, as human-in-the-loop systems, systems that depend on the physical environment (like drones), and distributed systems (like smart buildings). In these systems it is very difficult to model precisely the environment. For this reason, static verification techniques are often limited or even not practically viable at all for these kinds of systems in practice.”

The thesis

The purpose of this thesis is to bring Runtime Verification to Decentralized and Distributed Systems which are, by their concurrent nature, intrinsically complex, in order to improve privacy, efficiency or add fault tolerance to distributed monitoring that presents clear advantages over centralized solutions.

There are three general objectives of this thesis: define the problem of monitoring decentralized and distributed systems with rich verdicts; propose algorithmic solutions to decentralized monitoring and prove formally their correctness; and implement those solutions and conduct empirical evaluations.

Luis Miguel’s hypothesis is that they can provide correct decentralized and distributed algorithms based on Stream Runtime Verification that can effectively and efficiently monitor decentralized and distributed systems aiding in the task of improving their quality.

The solution proposed by Dr. Danielsson is based on Runtime Verification which attempts to alleviate the issues present in Static Verification by avoiding modeling the environment and its scalability problems by restricting the analysis to only one trace and not all traces of the system.

The first contribution of the Thesis is a decentralized monitoring algorithm based on streams for synchronous networks. Then, Luis Miguel presents extensions for more realistic network models, concretely timed asynchronous networks. Finally, he introduces a fault-tolerant enhancement that allows the monitors to be resilient against message duplication and message loss.

The second contribution of the Thesis is a comparison and translation between synchronous and asynchronous languages in the decentralized setting. To this end, Dr. Danielsson proposes a decentralized asynchronous algorithm and proceeds to make the same comparison and translations available in a previous work, just in a decentralized setting, finding that they are feasible but involve an additional cost.

“Runtime Verification is a middle-ground between static formal verification and testing: having a formal specification with the precision and clarity that it delivers in combination with the lightweight of having to analyze just one trace, as in testing” determines Luis Miguel. This makes RV suitable to analyze large complex programs which are not usually well handled neither in static verification nor in Testing.

In summary, when choosing an algorithm for monitoring a decentralized system, one should consider not only the specification formalism but how the data is collected and processed. When data processing is performed periodically the synchronous language and monitoring algorithm will perform better. In contrast, when data is obtained sporadically or in bursts the asynchronous language and algorithm will avoid unnecessary computation, and outperform a synchronous solution.