Creating Indexes on ARRAY Columns in YSQL

Secondary indexes boost database performance by enabling faster row retrieval.

In YSQL you can create Unique, Partial, Covering, and Secondary Indexes with JSONB.

A question that comes up a lot is “Can I create a secondary index on a table column that is an ARRAY?”. Let’s see…

				
					yugabyte=# CREATE TABLE test (id INT, a int[]);
CREATE TABLE

yugabyte=# INSERT INTO test VALUES (1, '{1,1,6}'), (2, '{1,6,1}'), (3, '{2,3,6}'), (4, '{2,5,8}');
INSERT 0 4

yugabyte=# CREATE INDEX test_a_idx ON test(a);
ERROR:  INDEX on column of type 'INT4ARRAY' not yet supported

Uh oh. Looks like we can’t create an index directly on an array column.

This is being tracked in the YugabyteDB Github issue #6606.

As a work around, what if we convert the array to TEXT and create an index on that.

				
					yugabyte=# CREATE INDEX test_a_idx ON test(array_to_string(a, ','));
ERROR:  functions in index expression must be marked IMMUTABLE

UGH. We need to create our own IMMUTABLE function…

				
					CREATE OR REPLACE FUNCTION arr_as_text(arr IN INT[])
  RETURNS TEXT IMMUTABLE LANGUAGE PLPGSQL
AS $$
BEGIN
 RETURN arr::text;
END;
$$;

Example:

				
					yugabyte=# CREATE OR REPLACE FUNCTION arr_as_text(arr IN INT[])
yugabyte-#   RETURNS TEXT IMMUTABLE LANGUAGE PLPGSQL
yugabyte-# AS $$
yugabyte$# BEGIN
yugabyte$#   RETURN arr::text;
yugabyte$# END;
yugabyte$# $$;
CREATE FUNCTION

yugabyte=# CREATE INDEX test_a_idx ON test(arr_as_text(a));
CREATE INDEX

Great, we have the index. But is it useful?

For the index to work, we have to include a call to our function, plus we can only query on the entire array, with the index values in the exact order.

				
					yugabyte=# EXPLAIN SELECT * FROM test WHERE a = '{2,3,6}';
                        QUERY PLAN
----------------------------------------------------------
 Seq Scan on test  (cost=0.00..102.50 rows=1000 width=36)
   Filter: (a = '{2,3,6}'::integer[])
(2 rows)

yugabyte=# EXPLAIN SELECT * FROM test WHERE arr_as_text(a) = '{2,3,6}';
                               QUERY PLAN
-------------------------------------------------------------------------
 Index Scan using test_a_idx on test  (cost=0.00..7.72 rows=10 width=36)
   Index Cond: (arr_as_text(a) = '{2,3,6}'::text)
(2 rows)

yugabyte=# SELECT * FROM test WHERE arr_as_text(a) = '{2,3,6}';
 id |    a
----+---------
  3 | {2,3,6}
(1 row)

The index does not support the contains functionality either.

				
					yugabyte=# EXPLAIN SELECT * FROM test WHERE a @> '{2}';
                        QUERY PLAN
----------------------------------------------------------
 Seq Scan on test  (cost=0.00..102.50 rows=1000 width=36)
   Filter: (a @> '{2}'::integer[])
(2 rows)

yugabyte=# SELECT * FROM test WHERE a @> '{2}';
 id |    a
----+---------
  4 | {2,5,8}
  3 | {2,3,6}
(2 rows)

If we want an index that allows us to use the contains operator, we can create a gin index!

Generalized inverted indexes (GIN indexes) are a type of index in YugabyteDB that index elements inside container columns. This makes queries with conditions on elements inside the columns more efficient. For example, if you had a query like WHERE myintarray @> '{3}' (meaning “is 3 an element of myintarray?”), it would benefit from a GIN index because you can look up the key 3 in the gin index.

				
					yugabyte=# CREATE INDEX NONCONCURRENTLY test_a_gin_idx ON test USING ybgin (a);
CREATE INDEX

yugabyte=# EXPLAIN SELECT * FROM test WHERE a @> '{2}';
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Index Scan using test_a_gin_idx on test  (cost=4.00..12.05 rows=1000 width=36)
   Index Cond: (a @> '{2}'::integer[])
(2 rows)

yugabyte=# SELECT * FROM test WHERE a @> '{2}';
 id |    a
----+---------
  4 | {2,5,8}
  3 | {2,3,6}
(2 rows)

Have Fun!