Back to homepage

Vanocka

Overview

Vanocka is an academic research prototype for underapproximated reachability analysis of multithreaded programs. See [POPL11] for more details about the underapproximation technique.

Reachability of a line of code in the multithreaded program is an undecidable problem. Pattern-based reachability analysis solves the following related and decidable problem: given multithreaded program, a line of code loc and a scheduler S. Is loc reachable following an execution scheduled by S? The scheduler S is specified using a restricted regular expressions called pattern.

Intuitively, the pattern specifies S in terms of the next thread to run and the content of shared memory at the context switch point. Vanocka allows for arbitrary recursion within each of the threads, however the domain of both local and shared variables is bounded.

Decidability of the pattern-based reachability problem holds because the coverage is restricted compared the general reachability problem. Therefore, in the case where Vanocka returns a negative answer, we can only conclude that loc is not reachable following the schedules given by the pattern even though there may exists an interleaving (not captured by the pattern) which leads the program to reach loc.

Formally, the pattern is given by a regular expression of the form w_1* ... w_n* - a sequence of finite words, each of them being able to repeat arbitrarily many times. For this reason, patterns allow to capture unboundedly many context switches between threads, which makes the pattern-based approach more general than bounded context switches one.

The pattern-based reachability algorithm approaches the problem from the perspective of formal languages. In particular, the reachability problem is reduced to non-disjointness of context-free languages. The theory behind the technique as well as decidability and complexity results are further discussed in [POPL11,CAV10]. Some implementation details of Vanocka can be found in the following talk given at IMDEA Software on Jan 20, 2011.

People involved in the development of Vanocka:

Running Vanocka

Download

You can download Vanocak-0.1.5 here [May 2012]. Users should discard any prior version.

Dependencies

To run the tool, you need to have following software installed.

Prolog SWIProlog tested with version 5.11.32
Solver Yices tested with version 1.0.31

Vanocka also supports Z3. However it has not been thoroughly tested.

Input

The user must provide, as input, the threads and the pattern. Each of them is given by a set of Prolog terms and rules. A thread is specified as a labelled control-flow graph whose labels are operations on data. Alternatively, one can provide each thread as a boolean program.

Program definition as Prolog rules

Data

The following table summarizes the supported data types and operations

Types Values Operations
bool true,false a=C, a==C
mutex locked,unlocked lock(a), unlock(a)
int(MIN,MAX) intergers from [MIN..MAX] a=C, a~C, not(a~C)

where ~ is either ==, <, >, >=, =<, inc(a), dec(a) (a is a variable of the given type, C is a constant) uint(MAX) stands for int(0,MAX), int(MAX) stands for in(-MAX,MAX) and int stands for int(-1,5). Moreover, we support assignment among variables. Such assignment must be, however explicitly listed in the memory definition as explained below.

Each thread contains a data definition of the form memDef(G,L,[A]) where both, G and L, are lists of variable definition. Each variable definition takes form type-name=initValue. The last, optional argument A is the list of assignments among variables used by the thread. The items are of the form a=b.

Threads

The control-flow graph captures the control flow of each thread in terms of procedure calls, program locations and operations over data. The data definition is the list of local and shared variables, each having a type and an initial value. All variables are global to all procedures. There is no support for procedure-local variables as well as arguments.

Thread definition is a Prolog term taking the form thread(TID,CFGRM,MEMDEF), where

  • TID is a unique thread identifier, used later in the pattern to identify threads.
  • CFGRM is a context-free grammar capturing the control flow. It is specified using the cfGrm term described later.
  • MEMDEF is a definition of the data (i.e. variables + operations) used by the thread. It is the memDef term described above.

Context-free grammar takes the form cfGrm(INIT,RULES),where INIT is an initial non-terminal (entry point of the thread) and RULES is a list of rules of the grammar. Non-terminals of the grammar stands for program locations while the terminals capture data operations. In the rules given below, P1,P2 and P3 stand for the non-terminals (i.e. program locations) while term(T) is a terminal (data operation). In rule, the first argument corresponds to the left hand side of the context-free grammar production while the remaining ones correspond to the right hand side.

  • rule(P1,P2): Step in the control flow not with no data operation.
  • rule(P1,P2,P3): Procedure call. P2 is a the entry location of a procedure. Upon completion of the procedure, resumes the execution at location P3.
  • rule(P1,eps): Return from a procedure call.
  • rule(P1,term(T),P2): Step in the control flow operating on some data. Depending on T, it is either a test (e.g. c<4) or an update (e.g. c=4).
  • rule(P1,term(T)): Last step in a procedure. Once the operation is performed, return from the function.

The set of threads is defined by the Prolog term threads(-THREADS) where the variable THREADS is a list of thread definitions (i.e. thread terms).

The statement assert(false) is used to define control points against which reachability is checked.

Program definition as Boolean Program

Alternatively, program can be specified as a boolean program. In such case, the threads(-THREADS) predicate is expected to invoke the provided parser of boolean programs in the following way threads(T):-createTaskDefinitionRulesFromBP('file.bp',T).. There, the file.bp is the name of the file to be parsed. The name is relative to the location of the prolog file where the threads predicate is defined. The boolean program is expected to define global variables and one function for each thread. The functions are, by convention, named as thread0, thread1, ..., threadN. Moreover, there can be a init function which is executed prior to the parallel execution of the threads. Neither function parameters nor return values are supported. Moreover, in case of recursion, the local variables of a function are common to all instances of the function, which changes the semantics of parallel execution or recursion. These limitations are given by the native way of program representation in vanocka (CFG).

Pattern definition

The pattern is a given by a list of words each of them made of the following symbols

  • go(TID) - the running thread is TID
  • yield(g(VAR1,VAR2)) - the current thread is getting inactive and yielding the other threads. The current value of shared variables is VAR1 and VAR2. The number of arguments of the g term depends on the number of variables shared by the threads.

The pattern is given by the term pattern(-PTRN) where the variable PTRN is a list of lists of terms yield/go.

From the language perspective, each context switch consists of a consecution of yield and go symbols. See an example of pattern in the tutorial.

Patterns for the boolean programs are specified in the same way, however, the pattern has to consider the implicit variables generated by the boolean program parser. In particular, the list of global variables starts with two implicit variables - mutex system and bool initDone. The former is used to implement critical sections (atomic_begin/end statements). Its value as it occurs in the pattern should be always unlocked. The latter is set to true when the thread init has completed. It is up to the user to ensure using the pattern that the init thread is scheduled first and that it is not interrupted as long as the value of initDone is false.

Command parameters

Vanocka is executed by the following command:

vanocka [options] taskDefinitionFile

taskDefinitionFile is a path leading to the task definition file. e.g. ./examples/bt1 Supported options:

-h Show this message
-c K Context switching mode. Ignore the given pattern and solve the reachability in K context switches
-m METHOD Choose the solver method. Currently supported clpfd and yices(default).
If clpfd is chosen, the internal prolog solver for finite domains is used. It is limited to look for solutions where all variables are smaller than 50.
-a AMSTMODE Abstraction mode - all/local/none.
-s IMPL Select set implementation. One of assoc,assertz,nb_set,ordsets. Default is assoc.
-t Print the time statistics
-q Print queue sizes during instantiation phase (coreachability)
-v Verbose output
-e Simulation of a thread with context switches following the pattern. Starting from the entry node, the user is shown the list of the options in terms of program locations and variable values and asked to choose one (entering the number followed by .). This way the user can simulate each thread having exactly the options given by the pattern.

If the formula is satisfiable, then there is a execution of the multithreaded programs which follows the patter.

Tutorial

The following Java example is taken from [SPIN08]. It is a Java version of Windows NT Bluetooth driver. In this section we show the step by step transformation of the example into the Vanocka input. The example consist of two classes. The Device class keeps the state of a device managed by the driver. The Bluetooth on the other hand provides the static methods add and stop provided to the user threads to acces the device. There is no main method in the example. In our setting, we analyze the situation when there are several threads sharing the same device and executing the add and stop methods in parallel (add and stop are the main procedures of our threads).

Source code

class Device { 
  int pendingIo; 
  boolean stoppingFlag; 
  boolean stoppingEvent; 
  boolean stopped; 
  
  Device() { 
    pendingIo = 1; 
    stoppingFlag = false; 
    stoppingEvent = false; 
    stopped = false; 
  } 
}

class Bluetooth { 
  static void add(Device d) { 
    int status = inc(d); 
    if (status > 0) { 
      assert(!d.stopped); 
      // Performs I/O 
    } 
    dec(d); 
  } 

  static void stop(Device d) { 
    d.stoppingFlag = true; 
    dec(d); 
    while (!d.stoppingEvent) {} 
    d.stopped = true; 
  }

  static int inc(Device d) { 
    int status; 
    synchronized(d) { 
      d.pendingIo++; 
    } 
    if (d.stoppingFlag) { 
      dec(d); 
      status = -1; 
    } else status = 1; 
      return status; 
    } 

  static void dec(Device d) { 
    int pio; 
    synchronized (d) { 
      d.pendingIo--; 
      pio = d.pendingIo; 
    } 
    if (pio == 0) 
      d.stoppingEvent = true; 
  } 
}

Data definition

In our setting, we assume a single instance of the Device class stored in the d variable which is shared by all threads. Apart from that, there are local integer variables status and pio. Thus, the data definition for each thread is the following term.

memDefinition(MEMDEF):-
  MEMDEF=memDef(
  [ mutex-d,                           %mutex of the object d
    int(-1,2)-item(d,pendingIo)=1,     %integer field pendingIo of the object d. The values can range from -1 to 2
    bool-item(d,stopFlag),             
    bool-item(d,stopEvent),
    bool-item(d,stopped)
  ],
  [ int(-1,2)-status,                  %local integer variable status ranging from -1 to 2
    int(-1,2)-pio
  ],
  [ pio=item(d,pendingIo)]             %list of assignments among variables used in the example.
  ).

Since we allow grounded prolog terms to identify variables, we can use term item(objName,fieldName) to represent the field of an object (objName.fieldName in Java). The definition consist of all the data of the shared object d ( including the mutex) and definition of local data. The last item of definition is the list of assigmnents used by the program. In this case, we allow the program to copy the content of shared variable d.pendingIo to the local variable pio.
Recall that all local variables are visible by all functions and neither function arguments nor return values are supported. Although this looks like a severe limitation, in this example, we can still implement those features. In particular, we do not have to pass arguments identifying the Driver instance since there is just one instance. Also, the return value of the inc method is stored into the status variable.

Thread definition

We are going to run two threads, one executes the method add while the other executes the methods stop. All of them start in an inactive state (each non-terminal nt has implicitely defined counterpart z(nt) denoting the situation when the thread is inactive and once it gets active it resumes at nt. We are going to investigate whether the assertion in the add function can be violated. Thus, for the one thread which executes the add function we added an assert(false) statement in it. The following prolog code fragment defines the threads.

cfGrmRules(R):-
  R = [

    %inc
    rule(inc0,term(item(d,stopFlag)==true),inc1),
      rule(inc1,term(status=(-1)),inc2),
      rule(inc2,eps),
    rule(inc0,term(item(d,stopFlag)==false),inc3),
    rule(inc3,term(lock(d)),inc4),
      rule(inc4,term(inc(item(d,pendingIo))),inc5),
    rule(inc5,term(unlock(d)),inc6),
    rule(inc6,term(status=0),inc7),
    rule(inc7,eps),

    %dec
    rule(dec0,term(lock(d)),dec1),
      rule(dec1,term(dec(item(d,pendingIo))),dec2),
      rule(dec2,term(pio=item(d,pendingIo)),dec3),
    rule(dec3,term(unlock(d)),dec4),
    rule(dec4,term(pio==0),dec5),rule(dec4,term(not(pio==0)),dec6),
      rule(dec5,term(item(d,stopEvent)=true),dec6),
    rule(dec6,eps)
  ]. 


threads(THREADS):-
  cfGrmRules(INCDECRULES),
  memDefinition(MEMDEF),
  THREADS= [
    thread(0,cfGrm(z(add0),[
      %add
      rule(add0,inc0,add1), %status is not a stack variable. Instead it is a local one. No returns and assignments.
      rule(add1,term(status==0),add2),rule(add1,term(not(status==0)),add3),
        rule(add2,term(item(d,stopped)==false),add3),
        rule(add2,term(item(d,stopped)==true),assertFailed),
        rule(assertFailed,term(assert(false))),
      rule(add3,dec0,add4),
      rule(add4,eps)
    |INCDECRULES]),MEMDEF ),
    thread(1,cfGrm(z(stop0),[
      %stop
      rule(stop0,term(item(d,stopFlag)=true),stop1),
      rule(stop1,dec0,stop2),
      rule(stop2,term(item(d,stopEvent)==false),stop2),
      rule(stop2,term(item(d,stopEvent)==true),stop3),
      rule(stop3,term(item(d,stopped)=true),stop4),
      rule(stop4,eps)
    |INCDECRULES]),MEMDEF)
  ].

Pattern definition

An thread interleaving leading to the assertion violation is described in [SPIN08]. The pattern below captures that interleaving:

pattern(P):-
P=[[go(0),yield(g(unlocked,1,false,false,false)),
    go(1),yield(g(unlocked,0,true,true,true)),go(0)]].

It requires that first, the thread 0 (executing the add function) runs. When it is preempted by the thread 1, the mutex is unlocked, variable item(d,pendingIo) has value 1, variable item(d,stopEvent) has value false and so on. The ordering of variable values follows the ordering of shared variable definitions.

When crafting pattern P we used the interleaving violating the assertion given in [SPIN08]. Let us now define pattern P1 that captures the interleaving of [SPIN08] and more.

pattern(P):-
P=[[go(0)],[go(1)],[yield(g(unlocked,1,false,false,false))],
   [go(0)],[go(1)],[yield(g(unlocked,0,true,true,true))],
   [go(0)],[go(1)]].

In this case, the pattern does not force which thread is activated next. For the shared memory, it is enforcing some particular value for the shared variables at the context switches points. Recall that every trace is a consecution of go and yield symbols. Thus, all traces generated by the pattern which do not follow this condition are ignored. In particular, at the beginning, since all threads are in the inactive state one of them has to produce a go symbol. Then, it must produce yield. In the meantime none of other threads is allowed to produce go.

The pattern can be modified so as not to enforce a particular value for the shared variables at the context switches point. Applying this modification to P1 would captures every possible interleaving made of 5 context switches. The tool has an option (-c) that generates such pattern automatically.

Running the tool

Having the task defined in the file bt3.pl, vanocka can be executed in the following way

  vanocka -a all bt3

In the above command, vanocka runs using the abstraction phase (useful for such a large example). The following result is interpreted that the assertion is violated (sat) and the error trace uses both words of the pattern once.

sat
(= x_0 1)
(= x_1 1)

If the output reads only unsat the assertion is not violated.

Other examples

Apart from the Bluetooth example presented in the previous section, Vanocka comes with other examples. We invite the potential user to carefully look at those in order to understand how to use Vanocka. Incidentally in the bp subdirectory, there several examples of boolean programs (either contrived or coming from real code).

Implementation

The overall structure of the tool is described in this presentation. However it has evolved significantly from what is described therein. To run the documentation server run the rundoc.sh script. (see PLDoc for more information)

References

  • [POPL11]: Javier Esparza and Pierre Ganty. Complexity of pattern-based verification for multithreaded programs. In POPL ’11: Proc. 38th ACM SIGACT-SIGPLAN Symp. on Principles of Programming Languages, pages 499-510 ACM Press, 2011.
  • [CAV10]: Pierre Ganty, Benjamin Monmege, and Rupak Majumdar. Bounded underapproximations. In CAV ’10: Proc. 20th Int. Conf. on Computer Aided Verification, volume 6174 of LNCS, pages 600–614. Springer, 2010.
  • [SPIN08]: Dejvuth Suwimonteerabuth, Javier Esparza and Stefan Schwoon. Symbolic Context-Bounded Analysis of Multithreaded Java Programs. In SPIN '08 Proceedings of the 15th international workshop on Model Checking Software.

About the name: Vanocka is a sweet bread baked in Czech Republic during Christmas [wiki]. Do you see threads and the pattern?

Back to Homepage

Updated: Wed May 30 16:47:20 CEST 2012

Copyright © 2011, 2012, 2013, 2014, Tomáš Poch, Pierre Ganty