24 posts tagged with "tech-blog"

SEGFAULT due to Dependency Update

· 4 min read
Deepak Majeti
Software Engineer @ IBM
Christian Zentgraf
Software Engineer @ IBM

Background

Velox depends on several libraries, including open-source libraries from Meta such as Folly and Facebook Thrift. These libraries are in active development and also depend on each other, so they all have to be updated to the same version at the same time.

Updating these dependencies typically involves modifying the Velox code to align with any public API or semantic changes in them. However, a recent upgrade of Folly and Facebook Thrift to version v2025.04.28.00 caused a SEGFAULT in just one Velox unit test, velox_functions_remote_client_test.

A Velox Primer, Part 3

· 10 min read
Orri Erling
Software Engineer @ Meta
Pedro Pedreira
Software Engineer @ Meta

At the end of the previous article, we were halfway through running our first distributed query:

SELECT l_partkey, count(*) FROM lineitem GROUP BY l_partkey;

We discussed how a query starts, how tasks are set up, and the interactions between plans, operators, and drivers. We also covered how the first stage of the query is executed, from table scan to partitioned output, which is the producer side of the shuffle.

In this article, we will discuss the second query stage, or the consumer side of the shuffle.
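To make the producer and consumer sides of the shuffle concrete, here is a minimal Python sketch of the query above. It is not Velox code: the function names (`producer_stage`, `consumer_stage`, `run_query`) are hypothetical, and it assumes the simplest plan shape, in which producers only route rows by a hash of `l_partkey` and all counting happens on the consumer side.

```python
from collections import Counter

def producer_stage(split_rows, num_consumers):
    """Stage 1 (scan -> partitioned output): route each l_partkey
    to a consumer chosen by hashing the grouping key."""
    partitions = [[] for _ in range(num_consumers)]
    for l_partkey in split_rows:
        partitions[hash(l_partkey) % num_consumers].append(l_partkey)
    return partitions

def consumer_stage(incoming_rows):
    """Stage 2 (exchange -> aggregation): count rows per key."""
    return Counter(incoming_rows)

def run_query(splits, num_consumers):
    # Shuffle: every producer sends partition i to consumer i.
    consumer_inputs = [[] for _ in range(num_consumers)]
    for split_rows in splits:
        for i, part in enumerate(producer_stage(split_rows, num_consumers)):
            consumer_inputs[i].extend(part)
    # Each key lands on exactly one consumer, so results never overlap.
    result = {}
    for rows in consumer_inputs:
        result.update(consumer_stage(rows))
    return result

# Two splits of the l_partkey column, two consumers.
counts = run_query([[1, 2, 2], [3, 1, 2]], num_consumers=2)
print(sorted(counts.items()))
```

Because all rows with the same key are routed to the same consumer, each consumer can finalize its groups independently and the per-consumer results can simply be concatenated.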

A Velox Primer, Part 2

· 9 min read
Orri Erling
Software Engineer @ Meta
Pedro Pedreira
Software Engineer @ Meta

In this article, we will discuss how a distributed compute engine executes a query similar to the one presented in our first article:

SELECT l_partkey, count(*) FROM lineitem GROUP BY l_partkey;

We use the TPC-H schema to illustrate the example, and Prestissimo as the compute engine orchestrating distributed query execution. Prestissimo is responsible for the query engine frontend (parsing, resolving metadata, planning, optimizing) and distributed execution (allocating resources and shipping query fragments), and Velox is responsible for the execution of plan fragments within a single worker node. Throughout this article, we will present which functions are performed by Velox and which by the distributed engine - Prestissimo, in this example.

A Velox Primer, Part 1

· 6 min read
Orri Erling
Software Engineer @ Meta
Pedro Pedreira
Software Engineer @ Meta

This is the first part of a series of short articles that will take you through Velox’s internal structures and concepts. In this first part, we will discuss how distributed queries are executed, how data is shuffled among different stages, and present Velox concepts such as Tasks, Splits, Pipelines, Drivers, and Operators that enable such functionality.

Velox Query Tracing

· 6 min read
Meng Duan (macduan)
Software Engineer @ ByteDance
Xiaoxuan Meng
Software Engineer @ Meta
Jialiang Tan
Software Engineer @ Meta

TL;DR

The query trace tool helps analyze and debug query performance and correctness issues. It helps prevent interference from external noise in a production environment (such as storage, network, etc.) by allowing replay of a part of the query plan and dataset in an isolated environment, such as a local machine. This is much more efficient for query performance analysis and issue debugging, as it eliminates the need to replay the whole query in a production environment.

Optimizing and Migrating Velox CI Workloads to Github Actions

· 4 min read
Jacob Wujciak-Jens
Software Engineer @ Voltron Data
Krishna Pai
Software Engineer @ Meta

TL;DR

In late 2023, the Meta OSS (Open Source Software) Team asked all Meta teams to move their CI deployments from CircleCI to GitHub Actions. Voltron Data and Meta collaborated to migrate all the deployed Velox CI jobs. For 2024, Velox CI spend was on track to overshoot the allocated budget by a considerable amount. As part of this migration effort, the CI workloads were consolidated and optimized by Q2 2024, bringing down the projected 2024 CI spend by 51%.

Further Optimizing TRY_CAST and TRY

· 9 min read
Masha Basmanova
Software Engineer @ Meta

TL;DR

Queries that use TRY or TRY_CAST may experience poor performance and high CPU usage due to excessive exception throwing. We optimized CAST to indicate failure without throwing and introduced a mechanism for scalar functions to do the same. A microbenchmark measuring the worst-case performance of CAST improved 100x, and samples of production queries show a 30x improvement in CPU time.
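The idea of "indicating failure without throwing" can be sketched in a few lines of Python. This only illustrates the pattern and is not the Velox implementation: the function names are hypothetical, and the sketch makes no performance claim, since exception costs in Python differ from those in C++.

```python
def try_cast_via_exceptions(values):
    """TRY_CAST built on a throwing cast: every bad input raises
    and is then caught, which is the costly path."""
    out = []
    for text in values:
        try:
            out.append(int(text))
        except ValueError:
            out.append(None)
    return out

def cast_to_int_status(text):
    """Cast that reports failure through its return value.
    Returns (True, value) on success and (False, None) on failure."""
    stripped = text.strip()
    digits = stripped[1:] if stripped and stripped[0] in "+-" else stripped
    if not digits or not (digits.isascii() and digits.isdigit()):
        return False, None
    return True, int(stripped)

def try_cast_via_status(values):
    """TRY_CAST built on the status-returning cast: bad inputs
    become NULL (None) without any exception being raised."""
    out = []
    for text in values:
        ok, value = cast_to_int_status(text)
        out.append(value if ok else None)
    return out

inputs = ["1", "", "abc", "-42", " 7 "]
print(try_cast_via_exceptions(inputs))
print(try_cast_via_status(inputs))
```

Both versions produce the same result for these inputs. The difference is that the second never enters exception handling, which matters most in the worst case where nearly every input fails to cast.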

Improve LIKE's performance

· 5 min read
James Xu
Software Engineer @ Alibaba

What is LIKE?

LIKE is a SQL operator used for string pattern matching. The following usage examples are from the Presto documentation:

SELECT * FROM (VALUES ('abc'), ('bcd'), ('cde')) AS t (name)
WHERE name LIKE '%b%'
--returns 'abc' and 'bcd'

SELECT * FROM (VALUES ('abc'), ('bcd'), ('cde')) AS t (name)
WHERE name LIKE '_b%'
--returns 'abc'

SELECT * FROM (VALUES ('a_c'), ('_cd'), ('cde')) AS t (name)
WHERE name LIKE '%#_%' ESCAPE '#'
--returns 'a_c' and '_cd'
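
The matching rules behind these examples (`%` matches any sequence of characters, `_` matches exactly one character, and the ESCAPE character makes the next character literal) can be sketched by translating a LIKE pattern into a regular expression. This is a minimal illustration with hypothetical helper names, not how Velox or Presto implement LIKE, and real engines may be stricter about what is allowed to follow the escape character.

```python
import re

def like_to_regex(pattern, escape=None):
    """Translate a SQL LIKE pattern into a Python regex string."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if escape is not None and ch == escape:
            # The character after the escape is matched literally.
            i += 1
            if i >= len(pattern):
                raise ValueError("escape character at end of pattern")
            parts.append(re.escape(pattern[i]))
        elif ch == "%":
            parts.append(".*")  # any sequence of characters, possibly empty
        elif ch == "_":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(ch))
        i += 1
    return "".join(parts)

def like(value, pattern, escape=None):
    """Return True if the whole value matches the LIKE pattern."""
    regex = like_to_regex(pattern, escape)
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None

names = ["abc", "bcd", "cde"]
print([n for n in names if like(n, "%b%")])
print([n for n in names if like(n, "_b%")])
print([n for n in ["a_c", "_cd", "cde"] if like(n, "%#_%", escape="#")])
```

The three print statements mirror the three queries above.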

reduce_agg lambda aggregate function

· 5 min read
Masha Basmanova
Software Engineer @ Meta

Definition

reduce_agg is the only lambda aggregate function in Presto. It allows users to define arbitrary aggregation logic using two lambda functions.

reduce_agg(inputValue T, initialState S, inputFunction(S, T, S), combineFunction(S, S, S)) → S

Reduces all non-NULL input values into a single value. inputFunction will be invoked for
each non-NULL input value. If all inputs are NULL, the result is NULL. In addition to taking
the input value, inputFunction takes the current state, initially initialState, and returns the
new state. combineFunction will be invoked to combine two states into a new state. The final
state is returned. Throws an error if initialState is NULL or inputFunction or combineFunction
returns a NULL.
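
The semantics described above can be modeled with a short Python sketch. It is an illustration only, not the Presto or Velox implementation: the `num_partitions` parameter is a hypothetical addition that simulates partial aggregation, so that `combineFunction` is actually exercised. Note that every partial state starts from `initialState`, so the sketch behaves sensibly only when `initialState` is an identity value for the aggregation (0 for sums, 1 for products).

```python
def reduce_agg(values, initial_state, input_function, combine_function,
               num_partitions=2):
    """Model of reduce_agg: fold non-NULL inputs into partial states,
    then merge the partial states with combine_function."""
    if initial_state is None:
        raise ValueError("initialState must not be NULL")

    non_null = [v for v in values if v is not None]
    if not non_null:
        return None  # all inputs are NULL -> result is NULL

    partial_states = []
    for p in range(num_partitions):
        chunk = non_null[p::num_partitions]  # simulated partition of the input
        if not chunk:
            continue
        state = initial_state
        for value in chunk:
            state = input_function(state, value)
            if state is None:
                raise ValueError("inputFunction returned NULL")
        partial_states.append(state)

    result = partial_states[0]
    for state in partial_states[1:]:
        result = combine_function(result, state)
        if result is None:
            raise ValueError("combineFunction returned NULL")
    return result

# Sum and product expressed through reduce_agg; NULL inputs are skipped.
print(reduce_agg([1, 2, None, 3], 0, lambda s, x: s + x, lambda a, b: a + b))
print(reduce_agg([1, 2, 3, 4], 1, lambda s, x: s * x, lambda a, b: a * b))
```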

Learnings from optimizing try_cast

· 4 min read
Laith Sakka
Software Engineer @ Meta

One of the queries shadowed internally at Meta was much slower in Velox than in Presto (2 CPU days vs. 4.5 CPU hours). Initial investigation traced the overhead to casting empty strings inside a try_cast.

In this blog post, I summarize my learnings from investigating and optimizing try_cast.