Identifying and Removing Redundant Indexes in Your Database

Secondary indexes in YugabyteDB are critical for improving database query performance by allowing faster access to rows. However, like any tool, they must be used wisely. When too many indexes exist, or when indexes are redundant, they can hurt database performance, increase storage costs, and complicate maintenance tasks.

In today’s YugabyteDB Tip, we’ll explore how to identify redundant indexes, why removing them is important, and how to decide if a secondary index could make your primary key redundant.

A redundant index is an index that does not provide any new or unique benefits over an existing index. This can occur for various reasons:

  1. Overlapping columns: If two indexes cover the same columns or similar queries, one may be unnecessary.
  2. Duplicate functionality: An index that serves the same purpose as another can be considered redundant.

For example, imagine two indexes that both cover the same columns in a table but with different definitions or even covering columns. If the functionality of one index is entirely covered by another, the first index becomes redundant.

Identifying redundant indexes typically requires comparing the column sets and usage patterns of all indexes on a table. Here are some approaches:

  1. Column Overlap: Look for indexes that cover the same columns in the same order. If one index includes all the columns of another (plus perhaps some additional ones), the second index may be redundant.

  2. Index Usage: Examine whether an index is being used frequently. You can track this using database statistics from views like pg_stat_user_indexes. If an index has not been used in a long time (idx_scan = 0), it’s a strong candidate for removal.

  3. Query Patterns: Indexes designed to optimize query patterns that are no longer relevant (or never were) can also become redundant. Reviewing query logs or running EXPLAIN plans can help identify which indexes are being used by your most common queries.

We can use an automated SQL script to help identify such indexes in your YugabyteDB environment. The SQL looks for indexes whose columns overlap with others and checks for usage statistics to highlight indexes that are not being used in queries.

** Note **

The provided script help identify potentially redundant indexes. Before dropping any of them, be sure to review and validate with EXPLAIN to ensure they’re truly unnecessary!

For ease of use and reusability, we can encapsulate the logic of the SQL script into a database view named redundant_index_vw:

				
					CREATE OR REPLACE VIEW redundant_index_vw AS
WITH index_info AS (
  SELECT
    idx.indexrelid AS index_oid,
    idx.indrelid AS table_oid,
    idx.indisunique,
    pg_get_expr(idx.indpred, idx.indrelid) IS NOT DISTINCT FROM NULL AS is_partial,
    pg_get_expr(idx.indpred, idx.indrelid) AS where_clause,
    i.relname AS index_name,
    ns.nspname AS schema_name,
    tbl.relname AS table_name,
    pg_get_indexdef(idx.indexrelid) AS indexdef,
    CASE
      WHEN con.contype = 'p' THEN 'primary'
      WHEN idx.indisunique THEN 'unique'
      ELSE 'regular'
    END AS index_type,
    ARRAY(
      SELECT a.attname
      FROM generate_subscripts(idx.indkey, 1) AS s(i)
      JOIN pg_attribute a ON a.attrelid = idx.indrelid AND a.attnum = idx.indkey[s.i]
    ) AS index_columns
  FROM pg_index idx
  JOIN pg_class i ON i.oid = idx.indexrelid
  JOIN pg_class tbl ON tbl.oid = idx.indrelid
  JOIN pg_namespace ns ON ns.oid = i.relnamespace
  LEFT JOIN pg_constraint con ON con.conindid = idx.indexrelid
  WHERE ns.nspname NOT IN ('pg_catalog', 'information_schema', 'pg_toast')
),
redundant_indexes AS (
  SELECT
    a.schema_name,
    a.index_name AS redundant_index,
    a.table_name,
    a.index_type,
    COALESCE(psui.idx_scan, 0) AS idx_scan,
    b.index_name AS covering_index,
    a.index_oid,
    pg_get_indexdef(a.index_oid) AS indexdef,
    format('DROP INDEX IF EXISTS %I.%I;', a.schema_name, a.index_name) AS base_drop_stmt
  FROM index_info a
  JOIN index_info b
    ON a.table_oid = b.table_oid
   AND a.index_oid <> b.index_oid
   AND a.is_partial = b.is_partial
   AND a.where_clause IS NOT DISTINCT FROM b.where_clause
   AND (
     cardinality(a.index_columns) <= cardinality(b.index_columns)
     AND a.index_columns = b.index_columns[1:cardinality(a.index_columns)]
   )
   AND (
     NOT a.indisunique OR b.indisunique
   )
  LEFT JOIN pg_stat_user_indexes psui ON psui.indexrelid = a.index_oid
)
SELECT
  schema_name,
  redundant_index,
  table_name,
  index_type,
  idx_scan,
  covering_index,
  CASE
    WHEN indexdef ILIKE '%include (%' THEN base_drop_stmt || ' -- WARNING: This index has INCLUDE columns'
    ELSE base_drop_stmt
  END AS drop_statement
FROM redundant_indexes
ORDER BY idx_scan, schema_name, table_name, redundant_index;

				
			

Let’s look at a few examples using YugabyteDB version 2024.2.2.1:

				
					yugabyte=# SELECT split_part(version(), '-', 3) "YB Version";
 YB Version
------------
 2024.2.2.1
(1 row)
				
			

First, let’s examine a table named users, with several indexes defined:

				
					CREATE TABLE users (id int PRIMARY KEY, name TEXT, normalized_name TEXT, email TEXT);
CREATE UNIQUE INDEX users_name_email_uk ON users(name, email);
CREATE INDEX users_name_idx ON users(name);
CREATE UNIQUE INDEX users_name_uk ON users (name);
CREATE UNIQUE INDEX users_normalized_name_uk ON users(normalized_name);
				
			
				
					yugabyte=# \d users;
                    Table "public.users"
     Column      |  Type   | Collation | Nullable | Default
-----------------+---------+-----------+----------+---------
 id              | integer |           | not null |
 name            | text    |           |          |
 normalized_name | text    |           |          |
 email           | text    |           |          |
Indexes:
    "users_pkey" PRIMARY KEY, lsm (id HASH)
    "users_name_email_uk" UNIQUE, lsm (name HASH, email ASC)
    "users_name_idx" lsm (name HASH)
    "users_name_uk" UNIQUE, lsm (name HASH)
    "users_normalized_name_uk" UNIQUE, lsm (normalized_name HASH)
				
			
				
					yugabyte=# EXPLAIN (ANALYZE) SELECT * FROM users WHERE name = 'lucy';
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Index Scan using users_name_uk on users  (cost=0.00..4.12 rows=1 width=100) (actual time=0.885..0.885 rows=0 loops=1)
   Index Cond: (name = 'lucy'::text)
 Planning Time: 0.149 ms
 Execution Time: 0.951 ms
 Peak Memory Usage: 24 kB
(5 rows)
				
			
				
					yugabyte=# SELECT * FROM redundant_index_vw WHERE table_name = 'users';
 schema_name | redundant_index | table_name | index_type | idx_scan |   covering_index    |               drop_statement

-------------+-----------------+------------+------------+----------+---------------------+---------------------------------------------
 public      | users_name_idx  | users      | regular    |        0 | users_name_uk       | DROP INDEX IF EXISTS public.users_name_idx;
 public      | users_name_idx  | users      | regular    |        0 | users_name_email_uk | DROP INDEX IF EXISTS public.users_name_idx;
 public      | users_name_uk   | users      | unique     |        1 | users_name_email_uk | DROP INDEX IF EXISTS public.users_name_uk;
(3 rows)
				
			

The output of the redundant_index_vw view shows that the indexes users_name_idx and users_name_uk are redundant, as queries can be served by the users_name_email_uk index.

Note that in the EXPLAIN (ANALYZE) output above, the optimizer selected users_name_uk over the other indexes because it was the most recently created.

We can remove the redundant indexes and verify that users_name_email_uk does, in fact, support the query.

				
					yugabyte=# SELECT DISTINCT drop_statement FROM redundant_index_vw WHERE table_name = 'users';
               drop_statement
---------------------------------------------
 DROP INDEX IF EXISTS public.users_name_idx;
 DROP INDEX IF EXISTS public.users_name_uk;
(2 rows)

yugabyte=# \gexec
DROP INDEX
DROP INDEX

yugabyte=# EXPLAIN (ANALYZE) SELECT * FROM users WHERE name = 'lucy';
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Index Scan using users_name_email_uk on users  (cost=0.00..16.25 rows=100 width=100) (actual time=1.022..1.022 rows=0 loops=1)
   Index Cond: (name = 'lucy'::text)
 Planning Time: 2.227 ms
 Execution Time: 1.199 ms
 Peak Memory Usage: 24 kB
(5 rows)
				
			

As a second example, let’s look at a table named test and its indexes:

				
					CREATE TABLE test (c1 INT PRIMARY KEY, c2 INT);
CREATE INDEX test_c1 ON test(c1);
CREATE UNIQUE INDEX test_c1_c2_uk ON test(c1, c2);
CREATE UNIQUE INDEX test_c2_uk ON test(c2) INCLUDE (c1);
				
			
				
					yugabyte=# \d test
                Table "public.test"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 c1     | integer |           | not null |
 c2     | integer |           |          |
Indexes:
    "test_pkey" PRIMARY KEY, lsm (c1 HASH)
    "test_c1" lsm (c1 HASH)
    "test_c1_c2_uk" UNIQUE, lsm (c1 HASH, c2 ASC)
    "test_c2_uk" UNIQUE, lsm (c2 HASH) INCLUDE (c1)
				
			
				
					yugabyte=# SELECT * FROM redundant_index_vw WHERE table_name = 'test';
 schema_name | redundant_index | table_name | index_type | idx_scan | covering_index |             drop_statement
-------------+-----------------+------------+------------+----------+----------------+----------------------------------------
 public      | test_c1         | test       | regular    |        0 | test_c1_c2_uk  | DROP INDEX IF EXISTS public.test_c1;
 public      | test_c1         | test       | regular    |        0 | test_pkey      | DROP INDEX IF EXISTS public.test_c1;
 public      | test_pkey       | test       | primary    |        0 | test_c1_c2_uk  | DROP INDEX IF EXISTS public.test_pkey;
(3 rows)
				
			

This is a peculiar case, as the redundant_index_vw view shows that the primary key is redundant! This suggests that perhaps a defined unique key could be a better candidate to serve as the table’s primary key instead.

In our example, we can confidently drop the test_c1 index and modify the primary key, replacing it with the unique key:

				
					yugabyte=# EXPLAIN SELECT * FROM test WHERE c1 = 1;
                              QUERY PLAN
----------------------------------------------------------------------
 Index Scan using test_pkey on test  (cost=0.00..4.11 rows=1 width=8)
   Index Cond: (c1 = 1)
(2 rows)

yugabyte=# SELECT DISTINCT drop_statement FROM redundant_index_vw WHERE table_name = 'test' AND index_type <> 'primary';
            drop_statement
--------------------------------------
 DROP INDEX IF EXISTS public.test_c1;
(1 row)

yugabyte=# \gexec
DROP INDEX

yugabyte=# DROP INDEX test_c1_c2_uk;
DROP INDEX

yugabyte=# ALTER TABLE test DROP CONSTRAINT test_pkey;
NOTICE:  table rewrite may lead to inconsistencies
DETAIL:  Concurrent DMLs may not be reflected in the new table.
HINT:  See https://github.com/yugabyte/yugabyte-db/issues/19860. Set 'ysql_suppress_unsafe_alter_notice' yb-tserver gflag to true to suppress this notice.
ALTER TABLE

yugabyte=# ALTER TABLE test ADD CONSTRAINT test_pkey PRIMARY KEY(c1, c2);
NOTICE:  table rewrite may lead to inconsistencies
DETAIL:  Concurrent DMLs may not be reflected in the new table.
HINT:  See https://github.com/yugabyte/yugabyte-db/issues/19860. Set 'ysql_suppress_unsafe_alter_notice' yb-tserver gflag to true to suppress this notice.
ALTER TABLE

yugabyte=# \d test
                Table "public.test"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 c1     | integer |           | not null |
 c2     | integer |           | not null |
Indexes:
    "test_pkey" PRIMARY KEY, lsm (c1 HASH, c2 ASC)
    "test_c2_uk" UNIQUE, lsm (c2 HASH) INCLUDE (c1)
    
yugabyte=# EXPLAIN SELECT * FROM test WHERE c1 = 1;
                               QUERY PLAN
-------------------------------------------------------------------------
 Index Scan using test_pkey on test  (cost=0.00..15.25 rows=100 width=8)
   Index Cond: (c1 = 1)
(2 rows)
				
			

Note: Be cautious when modifying a table’s primary key if the table already contains data, as discussed in the YugabyteDB tip Pitfalls of Adding a PK to an Existing Table in YSQL.

Have Fun!

Our daughter’s dog Maple relaxing under a cherry tree—shouldn’t she be partial to maple trees instead?