The hidden traps of regex in LIKE and split
SQL functions sometimes use regular expressions under the hood in ways that surprise users. Two common examples are the LIKE operator and Spark's split function.
In Presto, split takes a literal string delimiter and regexp_split is a separate function for regex-based splitting. Spark's split, however, always treats the delimiter as a regular expression.
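The difference between the two semantics can be sketched outside of any SQL engine. The Python snippet below is only an illustration, not Presto or Spark code: split_literal and split_regex are hypothetical helper names, and Python's re module stands in for the engine's regex library, so details may differ from what either engine does.

```python
from __future__ import annotations

import re


def split_literal(value: str, delimiter: str) -> list[str]:
    """Split on the delimiter taken character for character."""
    return value.split(delimiter)


def split_regex(value: str, pattern: str) -> list[str]:
    """Split on the delimiter interpreted as a regular expression."""
    return re.split(pattern, value)


# A literal "." separates the three fields as most users would expect.
literal_parts = split_literal("a.b.c", ".")

# As a regex, "." matches every character, so every character is
# consumed as a delimiter and only empty strings remain.
regex_parts = split_regex("a.b.c", ".")

# Escaping the delimiter restores the literal behaviour.
escaped_parts = split_regex("a.b.c", re.escape("."))

print(literal_parts)
print(regex_parts)
print(escaped_parts)
```

In this sketch the unescaped regex split raises no error. It simply returns a list of empty strings, which is the kind of silent surprise a regex-only split invites when the delimiter happens to contain a metacharacter.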
Both LIKE and Spark's split can silently produce wrong results and waste CPU when used with column values instead of constants. Understanding why this happens helps you write faster, more correct queries, and helps engine developers make better design choices.
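To make the LIKE half of this concrete, here is a minimal Python sketch of what happens when the pattern comes from data rather than from a constant. It assumes a simplified LIKE in which % matches any run of characters and _ matches exactly one, with no ESCAPE clause; like_to_regex and like are hypothetical helpers written for illustration, not any engine's implementation.

```python
import re


def like_to_regex(pattern: str) -> str:
    """Translate a simplified LIKE pattern into a regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # % matches any run of characters
        elif ch == "_":
            parts.append(".")   # _ matches exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return "".join(parts)


def like(value: str, pattern: str) -> bool:
    """Return True if the whole value matches the LIKE pattern."""
    regex = like_to_regex(pattern)
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


# The pattern here plays the role of a column value that the query
# author intended as plain text.
stored_pattern = "50%_off"

for candidate in ["50%_off", "500 dollars off", "50off"]:
    print(candidate, like(candidate, stored_pattern))
```

In this sketch the value "500 dollars off" matches the stored text "50%_off", because the % and _ inside the data are read as wildcards. A pattern that changes from row to row also has to be translated again for each row, whereas a constant pattern could in principle be prepared once.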
