Patrick Damme

Balanced Processing of Analytical Queries Based on Lightweight Compression of Intermediate Results

Modern in-memory column-stores are widely accepted as the adequate database architecture for the efficient processing of complex analytical queries over large relational data volumes. These systems keep the entire data in main memory and typically employ lightweight data compression to address the bottleneck between main memory and the CPU. Many different lightweight compression algorithms have been proposed in recent years, but none of them is suitable in all cases, and employing an inappropriate algorithm incurs a high overhead. While lightweight compression is already well established for the base data, the efficient representation of intermediate results generated during query processing has received insufficient attention so far. This is a significant shortcoming, since in in-memory systems, accessing intermediates is as expensive as accessing the base data. Thus, our vision is a balanced processing of analytical queries based on lightweight compression of intermediate results. That is, we address the challenges of compressing all intermediate results in a query execution plan in a lightweight way and of processing them without full decompression using compression-enabled physical operators. Furthermore, the compression algorithm to use for each intermediate is chosen by compression-aware strategies for the query optimizer in order to balance the benefits and overheads of compression.
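
To make the idea of compression-enabled operators concrete, the following minimal sketch uses run-length encoding (RLE) as one example of a lightweight scheme. It is an illustration under stated assumptions, not the implementation envisioned in this work: the function names (rle_compress, rle_sum, rle_select_eq, choose_format) are hypothetical, the selection returns values rather than positions for simplicity, and the size comparison is only a toy stand-in for a compression-aware optimizer strategy.

```python
from itertools import groupby


def rle_compress(values):
    """Run-length encode a sequence as (value, run_length) pairs."""
    return [(v, sum(1 for _ in run)) for v, run in groupby(values)]


def rle_sum(runs):
    """Aggregate directly on the runs, without materializing the column."""
    return sum(v * n for v, n in runs)


def rle_select_eq(runs, constant):
    """Selection (value == constant) whose output intermediate stays in RLE form."""
    return [(v, n) for v, n in runs if v == constant]


def choose_format(values):
    """Toy compression-aware decision (assumption): keep RLE only if it is smaller.

    Each run stores two integers, each uncompressed value stores one.
    """
    runs = rle_compress(values)
    if 2 * len(runs) < len(values):
        return "rle", runs
    return "plain", list(values)


column = [5, 5, 5, 5, 7, 7, 5, 5, 5]
runs = rle_compress(column)        # three runs instead of nine values
total = rle_sum(runs)              # same result as sum(column)
selected = rle_select_eq(runs, 5)  # compressed intermediate result
```

Here the aggregation touches three runs rather than nine values, and the selection output remains compressed for the next operator. For a column without repeated neighbors, such as [1, 2, 3], choose_format falls back to the uncompressed representation, which hints at why no single algorithm is suitable in all cases.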