In a distributed SQL database like YugabyteDB, the way data and indexes are partitioned across nodes has a major impact on performance. One subtle but critical pitfall is creating indexes on timestamp columns, which end up being hashed across nodes.
While this may seem harmless—especially to users familiar with PostgreSQL—it can be a serious performance anti-pattern in YugabyteDB.
What’s Going On?
In YugabyteDB, indexes are automatically hash-sharded across the cluster unless explicitly defined otherwise. This means that when you create an index on a column like created_at
, YugabyteDB evenly distributes index entries across all nodes based on a hash of the timestamp value.
Now imagine running a query like:
SELECT *
FROM events
WHERE created_at BETWEEN '2024-01-01' AND '2024-01-31'
ORDER BY created_at;
Although the filter is range-based, the hash-sharded index doesn’t help much. The query has to scan index entries across all nodes, gather the results, and perform a global sort, which leads to:
High latency
Inefficient use of indexes
Poor ordering performance
Why PostgreSQL Users Get Caught Off Guard
PostgreSQL users may assume that indexing a timestamp
column is always beneficial. But PostgreSQL does not shard index data across multiple nodes — it’s a single-node system. In contrast, YugabyteDB’s distributed architecture changes the game.
Even though the index is built the same way syntactically, YugabyteDB’s default hash-based data and index sharding means that indexing timestamp columns without considering access patterns can cripple performance.
Best Practice: Use Range-Sharded Primary Keys or Indexes for Timestamps
f your access patterns rely on time-based filtering or ordering, consider:
Designing your table so that timestamp is part of the primary key in a way that preserves ordering
Using co-location to keep time-related data together
Avoiding secondary indexes on timestamps unless you’re confident they’ll be efficient for your access patterns
Identify and Fix Hash-Indexed Timestamps
It’s important to audit your indexes to find those built on timestamp
or timestamptz
columns. These can usually be detected by inspecting index definitions and verifying how they’re distributed.
By identifying and redesigning these indexes, you can dramatically improve performance for range queries and reduce unnecessary overhead in your cluster.
The following query helps identify indexes on timestamp
or timestamptz
columns that are hash-sharded across nodes. Based on your workload, consider replacing them with range-partitioned alternatives to improve performance. I’ve encapsulated the SQL in a view for easier reuse.
CREATE OR REPLACE VIEW hash_based_timestamp_index_vw AS
SELECT n.nspname AS schema_name,
t.relname AS table_name,
i.relname AS index_name,
a.attname AS column_name,
pos.n AS index_ordinal_position,
typ.typname AS column_type,
pg_get_indexdef((n.nspname || '.' || i.relname)::regclass) AS index_def
FROM pg_class t
JOIN pg_namespace n ON t.relnamespace = n.oid
JOIN pg_index ix ON t.oid = ix.indrelid
JOIN pg_class i ON i.oid = ix.indexrelid
JOIN generate_subscripts(ix.indkey, 1) AS pos(n)
ON true
JOIN pg_attribute a ON a.attrelid = t.oid
AND a.attnum = ix.indkey[pos.n]
JOIN pg_type typ ON a.atttypid = typ.oid
WHERE t.relkind = 'r' -- only ordinary tables
AND a.attnum > 0
AND typ.typname IN ('timestamp', 'timestamptz')
AND pg_get_indexdef(i.oid) ILIKE '%' || a.attname || ' HASH%'
ORDER BY schema_name, table_name, index_name, index_ordinal_position;
Example:
yugabyte=# CREATE TABLE test(id INT PRIMARY KEY, ts TIMESTAMP);
CREATE TABLE
yugabyte=# CREATE INDEX test_idx1 ON test(ts);
CREATE INDEX
yugabyte=# CREATE INDEX test_idx2 ON test(ts ASC);
CREATE INDEX
yugabyte=# CREATE INDEX test_idx3 ON test(ts, id);
CREATE INDEX
yugabyte=# CREATE INDEX test_idx4 ON test((ts, id));
CREATE INDEX
yugabyte=# SELECT * FROM hash_based_timestamp_index_vw;
schema_name | table_name | index_name | column_name | index_ordinal_position | column_type | index_def
-------------+------------+------------+-------------+------------------------+-------------+-------------------------------------------------------------------
public | test | test_idx1 | ts | 0 | timestamp | CREATE INDEX test_idx1 ON public.test USING lsm (ts HASH)
public | test | test_idx3 | ts | 0 | timestamp | CREATE INDEX test_idx3 ON public.test USING lsm (ts HASH, id ASC)
(2 rows)
Have Fun!