List Intersections in DuckDB: Finding Common Elements

Updated: 2024-01-04

DuckDB, a powerful and versatile analytical database, comes equipped with various functions to manipulate array data efficiently. One such function is list_intersect(), designed to identify common elements between two lists.

For instance, SELECT list_intersect([1, 2, 3], [2, 3, 4]) returns [2, 3], revealing the elements shared by both arrays while eliminating duplicates.

However, there’s a limitation—this function exclusively compares two lists. But fear not! In scenarios demanding the intersection of more than two lists, DuckDB’s flexibility allows us to craft innovative solutions.

To address the limitation of comparing only two lists, let’s explore a custom method to find intersections among multiple lists inside a larger list.

DuckDB Query to find common elements within list of lists:

Consider the following query to find common elements within list of lists:

Loading editor...

WITH lol(l) AS (
    SELECT [[1,2], [2,3], [2,4]]
    UNION ALL
    SELECT [[3,4,5], [4,5,5,6], [3,4,4,5,6]]
 ), unnested_lol AS (
    SELECT l, unnest(l, recursive := true) unnested_l
    FROM lol  
 ), common_elements AS (
    SELECT l, unnested_l, len(l) AS len_l, count(l) AS count_l
    FROM unnested_lol
    GROUP BY 1,2
 )
SELECT l, list(unnested_l) AS common_element
FROM common_elements
WHERE count_l >= len_l
GROUP BY 1

The query starts by defining a Common Table Expression (CTE) lol, which contains set of list of lists to analyze for intersections. The subsequent CTE, unnested_lol, unnests the lists, transforming them into individual elements. common_elements counts the occurrences of each element within the lists. The final query filters out and retrieves the lists l where the count of common elements count_l equal or larger than the length of the original list len_l.

Stay tuned for more informative recipes in the Ducklicious DuckDB Cookbook.