List Intersections in DuckDB: Finding Common Elements
DuckDB, a powerful and versatile analytical database, comes equipped with various functions to manipulate array data efficiently. One such function is list_intersect()
, designed to identify common elements between two lists.
For instance, SELECT list_intersect([1, 2, 3], [2, 3, 4])
returns [2, 3]
, revealing the elements shared by both arrays while eliminating duplicates.
However, there’s a limitation—this function exclusively compares two lists. But fear not! In scenarios demanding the intersection of more than two lists, DuckDB’s flexibility allows us to craft innovative solutions.
To address the limitation of comparing only two lists, let’s explore a custom method to find intersections among multiple lists inside a larger list.
DuckDB Query to find common elements within list of lists:
Consider the following query to find common elements within list of lists:
Loading editor...
WITH lol(l) AS ( SELECT [[1,2], [2,3], [2,4]] UNION ALL SELECT [[3,4,5], [4,5,5,6], [3,4,4,5,6]] ), unnested_lol AS ( SELECT l, unnest(l, recursive := true) unnested_l FROM lol ), common_elements AS ( SELECT l, unnested_l, len(l) AS len_l, count(l) AS count_l FROM unnested_lol GROUP BY 1,2 ) SELECT l, list(unnested_l) AS common_element FROM common_elements WHERE count_l >= len_l GROUP BY 1
The query starts by defining a Common Table Expression (CTE) lol
, which contains set of list of lists to analyze for intersections.
The subsequent CTE, unnested_lol
, unnests the lists, transforming them into individual elements.
common_elements
counts the occurrences of each element within the lists.
The final query filters out and retrieves the lists l
where the count of common elements count_l
equal or larger than the length of the original list len_l
.
Stay tuned for more informative recipes in the Ducklicious DuckDB Cookbook.