Identify Hash Partitioned Indexes on TIMESTAMP Columns

In a distributed SQL database like YugabyteDB, the way data and indexes are partitioned across nodes has a major impact on performance. One subtle but critical pitfall is creating indexes on timestamp columns, which end up being hashed across nodes.

While this may seem harmless—especially to users familiar with PostgreSQL—it can be a serious performance anti-pattern in YugabyteDB.

What’s Going On?

In YugabyteDB, indexes are automatically hash-sharded across the cluster unless explicitly defined otherwise. This means that when you create an index on a column like created_at, YugabyteDB evenly distributes index entries across all nodes based on a hash of the timestamp value.

Now imagine running a query like:

				
					SELECT *
  FROM events
 WHERE created_at BETWEEN '2024-01-01' AND '2024-01-31'
 ORDER BY created_at;
				
			

Although the filter is range-based, the hash-sharded index doesn’t help much. The query has to scan index entries across all nodes, gather the results, and perform a global sort, which leads to:

  1. High latency

  2. Inefficient use of indexes

  3. Poor ordering performance

Why PostgreSQL Users Get Caught Off Guard

PostgreSQL users may assume that indexing a timestamp column is always beneficial. But PostgreSQL does not shard index data across multiple nodes — it’s a single-node system. In contrast, YugabyteDB’s distributed architecture changes the game.

Even though the index is built the same way syntactically, YugabyteDB’s default hash-based data and index sharding means that indexing timestamp columns without considering access patterns can cripple performance.

Best Practice: Use Range-Sharded Primary Keys or Indexes for Timestamps

f your access patterns rely on time-based filtering or ordering, consider:

  1. Designing your table so that timestamp is part of the primary key in a way that preserves ordering

  2. Using co-location to keep time-related data together

  3. Avoiding secondary indexes on timestamps unless you’re confident they’ll be efficient for your access patterns

Identify and Fix Hash-Indexed Timestamps

It’s important to audit your indexes to find those built on timestamp or timestamptz columns. These can usually be detected by inspecting index definitions and verifying how they’re distributed.

By identifying and redesigning these indexes, you can dramatically improve performance for range queries and reduce unnecessary overhead in your cluster.

The following query helps identify indexes on timestamp or timestamptz columns that are hash-sharded across nodes. Based on your workload, consider replacing them with range-partitioned alternatives to improve performance. I’ve encapsulated the SQL in a view for easier reuse.

				
					CREATE OR REPLACE VIEW hash_based_timestamp_index_vw AS
SELECT n.nspname AS schema_name,
       t.relname AS table_name,
       i.relname AS index_name,
       a.attname AS column_name,
       pos.n AS index_ordinal_position,
       typ.typname AS column_type,
       pg_get_indexdef((n.nspname || '.' || i.relname)::regclass) AS index_def
  FROM pg_class t
  JOIN pg_namespace n ON t.relnamespace = n.oid
  JOIN pg_index ix ON t.oid = ix.indrelid
  JOIN pg_class i ON i.oid = ix.indexrelid
  JOIN generate_subscripts(ix.indkey, 1) AS pos(n)
    ON true
  JOIN pg_attribute a ON a.attrelid = t.oid
   AND a.attnum = ix.indkey[pos.n]
  JOIN pg_type typ ON a.atttypid = typ.oid
 WHERE t.relkind = 'r'  -- only ordinary tables
   AND a.attnum > 0
   AND typ.typname IN ('timestamp', 'timestamptz')
   AND pg_get_indexdef(i.oid) ILIKE '%' || a.attname || ' HASH%'
 ORDER BY schema_name, table_name, index_name, index_ordinal_position;
				
			

Example:

				
					yugabyte=# CREATE TABLE test(id INT PRIMARY KEY, ts TIMESTAMP);
CREATE TABLE

yugabyte=# CREATE INDEX test_idx1 ON test(ts);
CREATE INDEX

yugabyte=# CREATE INDEX test_idx2 ON test(ts ASC);
CREATE INDEX

yugabyte=# CREATE INDEX test_idx3 ON test(ts, id);
CREATE INDEX

yugabyte=# CREATE INDEX test_idx4 ON test((ts, id));
CREATE INDEX

yugabyte=# SELECT * FROM hash_based_timestamp_index_vw;
 schema_name | table_name | index_name | column_name | index_ordinal_position | column_type |                             index_def
-------------+------------+------------+-------------+------------------------+-------------+-------------------------------------------------------------------
 public      | test       | test_idx1  | ts          |                      0 | timestamp   | CREATE INDEX test_idx1 ON public.test USING lsm (ts HASH)
 public      | test       | test_idx3  | ts          |                      0 | timestamp   | CREATE INDEX test_idx3 ON public.test USING lsm (ts HASH, id ASC)
(2 rows)
				
			

Have Fun!

While hiking in Shenandoah National Park, we saw a sign that read: “Warning: Climbing on, around, or over waterfalls is dangerous.” Apparently, at some point, someone needed that clarified. I can only assume it went poorly.