IMDEA Software

Ingo Mueller

Friday, April 5, 2019

10:45am, Meeting room 302 (Mountain View), level 3

Ingo Mueller, Post-doctoral Researcher, ETH Zurich, Switzerland

The State of the Art of Data Analytics Systems and What is Wrong about it

Abstract:

Few technological advances have affected as many aspects of science, the economy, and society in general as the ability to collect, analyze, and understand large amounts of data. Data analytics systems play an important role in this development, as they translate the exponential performance improvements in hardware into similar improvements at higher levels of abstraction. As one example, I will present a thorough study of a core database primitive, grouping with aggregation, carried out in the context of a commercial system for relational in-memory processing. For this primitive alone, we had to address a number of challenges: (provable) cache-efficiency, CPU-friendliness, parallelism within and across processors, robust handling of skewed data, adaptive processing, processing with constrained memory, and integration with modern database architectures.

I argue that this approach corresponds to the state of the art of system building: Today’s systems typically implement one analysis/platform combination, which forces data scientists to constantly switch tools and duplicates implementation effort across systems and their applications. Still, these systems are all very similar on a conceptual level, suggesting that we have not fundamentally understood what makes up the essence of our systems. I will thus sketch my vision and research theme for the foreseeable future: a common abstraction for a wide range of data analytics types that can run efficiently on a variety of hardware platforms.