Secondary indexes in YugabyteDB are critical for improving database query performance by allowing faster access to rows. However, like any tool, they must be used wisely. When too many indexes exist, or when indexes are redundant, they can hurt database performance, increase storage costs, and complicate maintenance tasks.
In today’s YugabyteDB Tip, we’ll explore how to identify redundant indexes, why removing them is important, and how to decide if a secondary index could make your primary key redundant.
A redundant index is an index that does not provide any new or unique benefits over an existing index. This can occur for various reasons:
- Overlapping columns: If two indexes cover the same columns or similar queries, one may be unnecessary.
- Duplicate functionality: An index that serves the same purpose as another can be considered redundant.
For example, imagine two indexes that both cover the same columns in a table but with different definitions or even covering columns. If the functionality of one index is entirely covered by another, the first index becomes redundant.
Identifying redundant indexes typically requires comparing the column sets and usage patterns of all indexes on a table. Here are some approaches:
Column Overlap: Look for indexes that cover the same columns in the same order. If one index includes all the columns of another (plus perhaps some additional ones), the second index may be redundant.
Index Usage: Examine whether an index is being used frequently. You can track this using database statistics from views like
pg_stat_user_indexes
. If an index has not been used in a long time (idx_scan = 0
), it’s a strong candidate for removal.Query Patterns: Indexes designed to optimize query patterns that are no longer relevant (or never were) can also become redundant. Reviewing query logs or running EXPLAIN plans can help identify which indexes are being used by your most common queries.
We can use an automated SQL script to help identify such indexes in your YugabyteDB environment. The SQL looks for indexes whose columns overlap with others and checks for usage statistics to highlight indexes that are not being used in queries.
** Note **
The provided script help identify potentially redundant indexes. Before dropping any of them, be sure to review and validate with EXPLAIN
to ensure they’re truly unnecessary!
For ease of use and reusability, we can encapsulate the logic of the SQL script into a database view named redundant_index_vw
:
CREATE OR REPLACE VIEW redundant_index_vw AS
WITH index_info AS (
SELECT
idx.indexrelid AS index_oid,
idx.indrelid AS table_oid,
idx.indisunique,
pg_get_expr(idx.indpred, idx.indrelid) IS NOT DISTINCT FROM NULL AS is_partial,
pg_get_expr(idx.indpred, idx.indrelid) AS where_clause,
i.relname AS index_name,
ns.nspname AS schema_name,
tbl.relname AS table_name,
pg_get_indexdef(idx.indexrelid) AS indexdef,
CASE
WHEN con.contype = 'p' THEN 'primary'
WHEN idx.indisunique THEN 'unique'
ELSE 'regular'
END AS index_type,
ARRAY(
SELECT a.attname
FROM generate_subscripts(idx.indkey, 1) AS s(i)
JOIN pg_attribute a ON a.attrelid = idx.indrelid AND a.attnum = idx.indkey[s.i]
) AS index_columns
FROM pg_index idx
JOIN pg_class i ON i.oid = idx.indexrelid
JOIN pg_class tbl ON tbl.oid = idx.indrelid
JOIN pg_namespace ns ON ns.oid = i.relnamespace
LEFT JOIN pg_constraint con ON con.conindid = idx.indexrelid
WHERE ns.nspname NOT IN ('pg_catalog', 'information_schema', 'pg_toast')
),
redundant_indexes AS (
SELECT
a.schema_name,
a.index_name AS redundant_index,
a.table_name,
a.index_type,
COALESCE(psui.idx_scan, 0) AS idx_scan,
b.index_name AS covering_index,
a.index_oid,
pg_get_indexdef(a.index_oid) AS indexdef,
format('DROP INDEX IF EXISTS %I.%I;', a.schema_name, a.index_name) AS base_drop_stmt
FROM index_info a
JOIN index_info b
ON a.table_oid = b.table_oid
AND a.index_oid <> b.index_oid
AND a.is_partial = b.is_partial
AND a.where_clause IS NOT DISTINCT FROM b.where_clause
AND (
cardinality(a.index_columns) <= cardinality(b.index_columns)
AND a.index_columns = b.index_columns[1:cardinality(a.index_columns)]
)
AND (
NOT a.indisunique OR b.indisunique
)
LEFT JOIN pg_stat_user_indexes psui ON psui.indexrelid = a.index_oid
)
SELECT
schema_name,
redundant_index,
table_name,
index_type,
idx_scan,
covering_index,
CASE
WHEN indexdef ILIKE '%include (%' THEN base_drop_stmt || ' -- WARNING: This index has INCLUDE columns'
ELSE base_drop_stmt
END AS drop_statement
FROM redundant_indexes
ORDER BY idx_scan, schema_name, table_name, redundant_index;
Let’s look at a few examples using YugabyteDB version 2024.2.2.1:
yugabyte=# SELECT split_part(version(), '-', 3) "YB Version";
YB Version
------------
2024.2.2.1
(1 row)
First, let’s examine a table named users
, with several indexes defined:
CREATE TABLE users (id int PRIMARY KEY, name TEXT, normalized_name TEXT, email TEXT);
CREATE UNIQUE INDEX users_name_email_uk ON users(name, email);
CREATE INDEX users_name_idx ON users(name);
CREATE UNIQUE INDEX users_name_uk ON users (name);
CREATE UNIQUE INDEX users_normalized_name_uk ON users(normalized_name);
yugabyte=# \d users;
Table "public.users"
Column | Type | Collation | Nullable | Default
-----------------+---------+-----------+----------+---------
id | integer | | not null |
name | text | | |
normalized_name | text | | |
email | text | | |
Indexes:
"users_pkey" PRIMARY KEY, lsm (id HASH)
"users_name_email_uk" UNIQUE, lsm (name HASH, email ASC)
"users_name_idx" lsm (name HASH)
"users_name_uk" UNIQUE, lsm (name HASH)
"users_normalized_name_uk" UNIQUE, lsm (normalized_name HASH)
yugabyte=# EXPLAIN (ANALYZE) SELECT * FROM users WHERE name = 'lucy';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Index Scan using users_name_uk on users (cost=0.00..4.12 rows=1 width=100) (actual time=0.885..0.885 rows=0 loops=1)
Index Cond: (name = 'lucy'::text)
Planning Time: 0.149 ms
Execution Time: 0.951 ms
Peak Memory Usage: 24 kB
(5 rows)
yugabyte=# SELECT * FROM redundant_index_vw WHERE table_name = 'users';
schema_name | redundant_index | table_name | index_type | idx_scan | covering_index | drop_statement
-------------+-----------------+------------+------------+----------+---------------------+---------------------------------------------
public | users_name_idx | users | regular | 0 | users_name_uk | DROP INDEX IF EXISTS public.users_name_idx;
public | users_name_idx | users | regular | 0 | users_name_email_uk | DROP INDEX IF EXISTS public.users_name_idx;
public | users_name_uk | users | unique | 1 | users_name_email_uk | DROP INDEX IF EXISTS public.users_name_uk;
(3 rows)
The output of the redundant_index_vw
view shows that the indexes users_name_idx
and users_name_uk
are redundant, as queries can be served by the users_name_email_uk
index.
Note that in the EXPLAIN (ANALYZE)
output above, the optimizer selected users_name_uk
over the other indexes because it was the most recently created.
We can remove the redundant indexes and verify that users_name_email_uk
does, in fact, support the query.
yugabyte=# SELECT DISTINCT drop_statement FROM redundant_index_vw WHERE table_name = 'users';
drop_statement
---------------------------------------------
DROP INDEX IF EXISTS public.users_name_idx;
DROP INDEX IF EXISTS public.users_name_uk;
(2 rows)
yugabyte=# \gexec
DROP INDEX
DROP INDEX
yugabyte=# EXPLAIN (ANALYZE) SELECT * FROM users WHERE name = 'lucy';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Index Scan using users_name_email_uk on users (cost=0.00..16.25 rows=100 width=100) (actual time=1.022..1.022 rows=0 loops=1)
Index Cond: (name = 'lucy'::text)
Planning Time: 2.227 ms
Execution Time: 1.199 ms
Peak Memory Usage: 24 kB
(5 rows)
As a second example, let’s look at a table named test
and its indexes:
CREATE TABLE test (c1 INT PRIMARY KEY, c2 INT);
CREATE INDEX test_c1 ON test(c1);
CREATE UNIQUE INDEX test_c1_c2_uk ON test(c1, c2);
CREATE UNIQUE INDEX test_c2_uk ON test(c2) INCLUDE (c1);
yugabyte=# \d test
Table "public.test"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
c1 | integer | | not null |
c2 | integer | | |
Indexes:
"test_pkey" PRIMARY KEY, lsm (c1 HASH)
"test_c1" lsm (c1 HASH)
"test_c1_c2_uk" UNIQUE, lsm (c1 HASH, c2 ASC)
"test_c2_uk" UNIQUE, lsm (c2 HASH) INCLUDE (c1)
yugabyte=# SELECT * FROM redundant_index_vw WHERE table_name = 'test';
schema_name | redundant_index | table_name | index_type | idx_scan | covering_index | drop_statement
-------------+-----------------+------------+------------+----------+----------------+----------------------------------------
public | test_c1 | test | regular | 0 | test_c1_c2_uk | DROP INDEX IF EXISTS public.test_c1;
public | test_c1 | test | regular | 0 | test_pkey | DROP INDEX IF EXISTS public.test_c1;
public | test_pkey | test | primary | 0 | test_c1_c2_uk | DROP INDEX IF EXISTS public.test_pkey;
(3 rows)
This is a peculiar case, as the redundant_index_vw
view shows that the primary key is redundant! This suggests that perhaps a defined unique key could be a better candidate to serve as the table’s primary key instead.
In our example, we can confidently drop the test_c1
index and modify the primary key, replacing it with the unique key:
yugabyte=# EXPLAIN SELECT * FROM test WHERE c1 = 1;
QUERY PLAN
----------------------------------------------------------------------
Index Scan using test_pkey on test (cost=0.00..4.11 rows=1 width=8)
Index Cond: (c1 = 1)
(2 rows)
yugabyte=# SELECT DISTINCT drop_statement FROM redundant_index_vw WHERE table_name = 'test' AND index_type <> 'primary';
drop_statement
--------------------------------------
DROP INDEX IF EXISTS public.test_c1;
(1 row)
yugabyte=# \gexec
DROP INDEX
yugabyte=# DROP INDEX test_c1_c2_uk;
DROP INDEX
yugabyte=# ALTER TABLE test DROP CONSTRAINT test_pkey;
NOTICE: table rewrite may lead to inconsistencies
DETAIL: Concurrent DMLs may not be reflected in the new table.
HINT: See https://github.com/yugabyte/yugabyte-db/issues/19860. Set 'ysql_suppress_unsafe_alter_notice' yb-tserver gflag to true to suppress this notice.
ALTER TABLE
yugabyte=# ALTER TABLE test ADD CONSTRAINT test_pkey PRIMARY KEY(c1, c2);
NOTICE: table rewrite may lead to inconsistencies
DETAIL: Concurrent DMLs may not be reflected in the new table.
HINT: See https://github.com/yugabyte/yugabyte-db/issues/19860. Set 'ysql_suppress_unsafe_alter_notice' yb-tserver gflag to true to suppress this notice.
ALTER TABLE
yugabyte=# \d test
Table "public.test"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
c1 | integer | | not null |
c2 | integer | | not null |
Indexes:
"test_pkey" PRIMARY KEY, lsm (c1 HASH, c2 ASC)
"test_c2_uk" UNIQUE, lsm (c2 HASH) INCLUDE (c1)
yugabyte=# EXPLAIN SELECT * FROM test WHERE c1 = 1;
QUERY PLAN
-------------------------------------------------------------------------
Index Scan using test_pkey on test (cost=0.00..15.25 rows=100 width=8)
Index Cond: (c1 = 1)
(2 rows)
Note: Be cautious when modifying a table’s primary key if the table already contains data, as discussed in the YugabyteDB tip Pitfalls of Adding a PK to an Existing Table in YSQL.
Have Fun!
