Why Sort is row-based in Velox — A Quantitative Assessment
TL;DR
Velox is a fully vectorized execution engine[1]. Its internal columnar memory layout enhances cache locality, exposes more inter-instruction parallelism to CPUs, and enables the use of SIMD instructions, significantly accelerating large-scale query processing.
However, some operators in Velox use a hybrid layout, in which data is temporarily converted to a row-oriented format. The OrderBy operator is one example: our implementation first materializes the input vectors into rows that contain both the sort keys and the payload columns, sorts those rows, and then converts them back to vectors.
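The three steps described above (materialize, sort, convert back) can be sketched in a few lines of C++. This is a minimal illustration and not Velox code: `Batch`, `Row`, and `sortRowBased` are hypothetical names, and the batch is assumed to have a single integer sort key and a single string payload column.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical columnar batch: one sort-key column and one payload column.
struct Batch {
  std::vector<int64_t> key;
  std::vector<std::string> payload;
};

// Hypothetical row that stores the sort key next to its payload.
struct Row {
  int64_t key;
  std::string payload;
};

Batch sortRowBased(const Batch& input) {
  // 1. Materialize the columns into rows (key and payload side by side).
  std::vector<Row> rows;
  rows.reserve(input.key.size());
  for (size_t i = 0; i < input.key.size(); ++i) {
    rows.push_back({input.key[i], input.payload[i]});
  }

  // 2. Sort the rows by key.
  std::stable_sort(rows.begin(), rows.end(),
                   [](const Row& a, const Row& b) { return a.key < b.key; });

  // 3. Convert the sorted rows back into columns.
  Batch output;
  output.key.reserve(rows.size());
  output.payload.reserve(rows.size());
  for (const Row& row : rows) {
    output.key.push_back(row.key);
    output.payload.push_back(row.payload);
  }
  return output;
}
```

In this sketch, every payload value is copied twice: once into the row format and once back into the output columns.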
In this article, we explain the rationale behind this design decision and provide experimental evidence supporting it. We present a prototype of a hybrid sorting strategy that materializes only the sort-key columns, avoiding the overhead of materializing the payload columns. Contrary to expectations, end-to-end performance did not improve; the prototype was up to 3× slower. We describe the two variants and discuss why one is counter-intuitively faster than the other.
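To make the hybrid variant concrete, here is a minimal sketch of a sort that materializes only the sort keys. It is an illustration under stated assumptions, not the prototype itself: `Batch`, `KeyEntry`, and `sortKeysOnly` are hypothetical names, and the batch is assumed to have one integer key column and one string payload column.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical columnar batch: one sort-key column and one payload column.
struct Batch {
  std::vector<int64_t> key;
  std::vector<std::string> payload;
};

// Hypothetical entry: a sort key paired with its original row index.
using KeyEntry = std::pair<int64_t, size_t>;

Batch sortKeysOnly(const Batch& input) {
  // 1. Materialize only the sort keys, remembering where each row came from.
  std::vector<KeyEntry> keys;
  keys.reserve(input.key.size());
  for (size_t i = 0; i < input.key.size(); ++i) {
    keys.emplace_back(input.key[i], i);
  }

  // 2. Sort the (key, index) entries by key; payload data is not touched.
  std::stable_sort(
      keys.begin(), keys.end(),
      [](const KeyEntry& a, const KeyEntry& b) { return a.first < b.first; });

  // 3. Gather the payload from the original columns using the sorted indices.
  Batch output;
  output.key.reserve(keys.size());
  output.payload.reserve(keys.size());
  for (const KeyEntry& entry : keys) {
    output.key.push_back(entry.first);
    output.payload.push_back(input.payload[entry.second]);
  }
  return output;
}
```

Note that step 3 reads payload values in sorted-key order rather than input order, so the payload columns are accessed through an index instead of being scanned sequentially. The sketch only shows the structure of the approach and makes no claim about its relative performance.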

