Superlinear speedup (Work in Progress)

Wednesday, March 14, 2012

11:00am Amphitheatre H-1002

Neil Jones, Professor, DIKU University of Copenhagen, Denmark

Abstract:

Many interesting program transformations (by Burstall-Darlington, Bird, Pettorossi, and many others) have been published that give superlinear program speedups on some program examples. However, these techniques seem all to require a “Eureka step” where the transformer understands some essential property relevant to the problem being solved (e.g., associativity, commutativity, occurrence of repeated subproblems, etc.). Such transformations have proven to be very difficult to automate.

On the other hand, fully automatic transformers exist, including: classical compiler optimisations, deforestation, partial evaluation and Turchin’s supercompilation. However these can be seen only to yield linear time improvements. For example, a limit in string pattern matching was until recently to achieve the speedup of the Knuth-Morris-Pratt algorithm. (The KMP speedup is still linear, although its constant coefficient can be proportional to the length of the subject pattern.)

In 2007 Geoff Hamilton showed that his “distillation” transformation (a further development of supercompilation) can sometimes yield superlinear speedups. It has automatically transformed the quadratic-time “naive reverse” program, and the exponential-time “Fibonacci” program, each into a linear-time equivalent program that uses accumulating parameters. On the other hand, distillation works with a higher-order source language; and is a very complex algorithm, involving positive information propagation, homeomorphic embedding, generalisation by tree matching, and folding). It is not yet clear which programs can be sped up so dramatically, and when and why this speedup occurs.

My current work (joint with Hamilton) is to discover an essential “inner core” of distillation. One approach is to study simpler program classes that allow superlinear speedup. Surprisingly, asymptotic speedups can sometimes be obtained even for first-order tail recursive programs (in other words, imperative flowcharts). The most natural example (discovered just recently) transforms the natural factorial sum program for f(n) = 1! + 2! +…+ n! from quadratic time to linear time.

A disclaimer: we work in a simple imperative program context: no caches, parallelism, etc.

Some examples that suggest principles to be discovered and automated:

In functional programs:

finding shared subcomputations (e.g., the Fibonacci example)
finding unneeded computations (e.g., most of the computation done by “naive reverse”)

In imperative programs:

finding unneeded computations (e.g., generalising the usual compiler “dead code” analysis can give quadratic speedups)
finding shared subcomputations (e.g., the factorial sum example)
code motion to move an entire nested loop outside an enclosing loop
strength reduction
common subexression elimination across loop boundaries, eg extending “value numbering”

Alas, these principles seem to be buried in the complexities of the distillation algorithm and the subtleties of its input language. One goal of our current work is to extract the essential transformations involved, ideally to be able to extend classical compiler optimisations (currently able only to yield small linear speedups) to a well-understood and automated “turbo” version that achieves substantially greater speedups.