Skip to main content

3 posts tagged with "functions"

View All Tags

The hidden traps of regex in LIKE and split

· 6 min read
Masha Basmanova
Software Engineer @ Meta

SQL functions sometimes use regular expressions under the hood in ways that surprise users. Two common examples are the LIKE operator and Spark's split function.

In Presto, split takes a literal string delimiter and regexp_split is a separate function for regex-based splitting. Spark's split, however, always treats the delimiter as a regular expression.

Both LIKE and Spark's split can silently produce wrong results and waste CPU when used with column values instead of constants. Understanding why this happens helps write faster, more correct queries — and helps engine developers make better design choices.

reduce_agg lambda aggregate function

· 5 min read
Masha Basmanova
Software Engineer @ Meta

Definition

Reduce_agg is the only lambda aggregate Presto function. It allows users to define arbitrary aggregation logic using 2 lambda functions.

reduce_agg(inputValue T, initialState S, inputFunction(S, T, S), combineFunction(S, S, S)) → S

Reduces all non-NULL input values into a single value. inputFunction will be invoked for
each non-NULL input value. If all inputs are NULL, the result is NULL. In addition to taking
the input value, inputFunction takes the current state, initially initialState, and returns the
new state. combineFunction will be invoked to combine two states into a new state. The final
state is returned. Throws an error if initialState is NULL or inputFunction or combineFunction
returns a NULL.

array_sort lambda function

· 6 min read
Masha Basmanova
Software Engineer @ Meta

Presto provides an array_sort function to sort arrays in ascending order with nulls placed at the end.

presto> select array_sort(array[2, 5, null, 1, -1]);
_col0
---------------------
[-1, 1, 2, 5, null]

There is also an array_sort_desc function that sorts arrays in descending order with nulls placed at the end.

presto> select array_sort_desc(array[2, 5, null, 1, -1]);
_col0
---------------------
[5, 2, 1, -1, null]

Both array_sort and array_sort_desc place nulls at the end of the array.