Research Results
The researcher from the IMDEA Software Institute Pedro Valero, supervised by Pierre Ganty, defended his PhD thesis with the title “On the use of Quasiorders in Formal Language Theory”, on the 20th of July. He submitted his thesis as partial fulfillment of the requirements for the degree of Doctor of Philosophy in Software, Systems and Computing.
Quasiorders on words, i.e. binary relations between words, have proven useful for reasoning about formal languages from a theoretical perspective. For instance they have been used to characterize the class of regular languages and for showing that all maximal solutions of certain systems of inequalities on languages are regular.
In his work, Pedro shows that quasiorders also have practical applications by placing them at the core of several efficient algorithms.
More precisely, in his thesis, Pedro uses quasiorders on words to offer a new perspective on two well-studied problems from Formal Language Theory: deciding language inclusion, with applications to searching on compressed text, and manipulating the finite automata representations of regular languages.
The Language Inclusion Problem
The thesis considers the language inclusion problem L1 ⊆ L2, where L1 is regular or context-free and L2 is regular.This problem is solved by checking whether an over-approximation of L1 is included in L2, showing that the language inclusion problem is decidable whenever the over-approximating function satisfies a completeness condition (i.e. its loss of precision causes no false alarm) and prevents infinite ascending chains (i.e. it guarantees termination of least fixpoint computations).
Such over-approximation of L1 is defined using quasiorder relations on words where the over-approximation gives the language of all words “greater than or equal to” a given input word for that quasiorder. In his thesis, Pedro presents a range of quasiorders in order to systematically design decision procedures for different language inclusion problems such as regular languages into regular languages or into trace sets of one-counter nets and context-free languages into regular languages.
Some of the obtained inclusion checking procedures correspond to well-known algorithms, like the so-called antichains algorithms, while others correspond to novel algorithms such as the presented greatest fixpoint language inclusion check, which relies on quotients of languages.
Searching on Compressed Text
Secondly, the thesis instantiates the quasiorder-based framework for the scenario in which L1 consists on a single word generated by a context-free grammar and L2 is the regular language generated by an automaton. The resulting algorithm can be used for deciding whether a grammar-compressed text contains a match for a regular expression.
This algorithm is then extended in order to count the number of lines in the uncompressed text that contain a match for the regular expression. Remarkably, this extension runs in time linear in the size of the compressed data, which might be exponentially smaller than the uncompressed text.
Furthermore, Pedro’s thesis proposes efficient data structures that yield optimal complexity bounds and an implementation –zearch– that outperforms the state of the art, offering up to 40% speedup with respect to highly optimized implementations of the standard decompress and search approach.
Residual Finite-State Automata
The last technical contribution of this thesis is a framework of finite-state automata constructions based on quasiorders over words that provides new insights on residual finite-state automata (RFA for short).
This framework is used to present a generalization of the double-reversal method for RFAs along the lines of the one for deterministic automata and to offer a new perspective on NL\∗, an on-line learning algorithm for RFAs. These results evidence that quasiorders are fundamental to residual automata in the same way congruences are fundamental for deterministic automata.