Prevent Hot Shards with Bucket-Based Indexes

πŸš€ Introduction

In distributed databases, not all indexes are created equal.

Some indexes look perfectly fine… until your system scales.

A classic example is a timestamp-based index:

				
					CREATE INDEX idx_rides_time_desc
ON ride_events(event_time DESC);
				
			

This works great for queries like:

				
					SELECT *
FROM ride_events
WHERE event_time >= TIMESTAMP '2026-02-01 00:00:00'
  AND event_time <  TIMESTAMP '2026-02-08 00:00:00'
ORDER BY event_time DESC;
				
			

Fast reads. Clean plans. No obvious issues.

πŸ‘‰ But under heavy write load, this index can quietly become your biggest bottleneck.

πŸ”₯ The Real Problem: Hot Shards

Imagine a ride-sharing system inserting 10,000 rides per second.

Every new row has a recent event_time.

With this index:

				
					(event_time DESC)
				
			

all those new inserts target the same logical key range…Β the newest timestamps.

In YugabyteDB, that means:

  • ● One tablet becomes the active write target
  • ● One node absorbs most of the write traffic
  • ● CPU, disk, and compaction pressure pile up
  • ● Other nodes sit mostly idle

πŸ‘‰ This is the classic hot shard problem.

πŸ”₯ Why This Happens

πŸ”₯ Problem
  • ● event_time is monotonic (always increasing)
  • ● New inserts always target the latest index range
  • ● That range lives on one tablet at a time
πŸ‘‰ The bottleneck is not the query… it’s write concentration.

🎯 The Solution: Bucket-Based Indexes

Bucket-based indexes fix this by distributing writes across multiple logical streams.

Instead of indexing directly on event_time, we prepend a bucket:

				
					CREATE INDEX idx_rides_bucketed
ON ride_events (
    (yb_hash_code(event_time) % 3) ASC,
    event_time DESC
)
INCLUDE (ride_id, city, driver_id, surge_multiplier)
SPLIT AT VALUES ((1), (2));
				
			
βš™οΈ How It Works
  • ● (yb_hash_code(event_time) % 3) creates 3 buckets
  • ● Each bucket becomes its own ordered stream
  • ● Writes are spread across multiple tablets
πŸ‘‰ No single node becomes a write hotspot

πŸ§ͺ Demo Setup

				
					CREATE TABLE ride_events (
    ride_id BIGSERIAL PRIMARY KEY,
    city TEXT,
    driver_id INT,
    surge_multiplier NUMERIC,
    event_time TIMESTAMP NOT NULL
);

CREATE INDEX idx_rides_bucketed
ON ride_events (
    (yb_hash_code(event_time) % 3) ASC,
    event_time DESC
)
INCLUDE (ride_id, city, driver_id, surge_multiplier)
SPLIT AT VALUES ((1), (2));
				
			

Load a large dataset across a wide time range:

				
					INSERT INTO ride_events (city, driver_id, surge_multiplier, event_time)
SELECT
    (ARRAY['NYC','SF','CHI','MIA'])[floor(random()*4)+1],
    (random()*10000)::int,
    round((random()*3 + 1)::numeric, 2),
    TIMESTAMP '2026-01-01 00:00:00' + (random() * interval '90 days')
FROM generate_series(1, 500000);
				
			

🧩 Enable Range Scan Optimization

				
					SET yb_max_saop_merge_streams = 64;
SET yb_enable_derived_saops = true;
SET yb_enable_derived_equalities = true;
				
			
πŸ’‘ Important
These settings allow YugabyteDB to efficiently merge results across buckets for queries like:
ORDER BY event_time DESC LIMIT N
πŸ‘‰ This is what makes bucket-based indexes work seamlessly for range scans.

πŸ” Query Behavior

				
					EXPLAIN (ANALYZE)
SELECT ride_id, city, driver_id, surge_multiplier, event_time
FROM ride_events
WHERE event_time >= TIMESTAMP '2026-02-01 00:00:00'
  AND event_time <  TIMESTAMP '2026-02-08 00:00:00'
ORDER BY event_time DESC
LIMIT 10;
				
			

What you’ll see:

				
					 Limit  (cost=60.24..61.04 rows=10 width=29) (actual time=0.604..0.617 rows=10 loops=1)
   ->  Index Only Scan using idx_rides_bucketed_range on ride_events  (cost=60.24..310613.68 rows=3879373 width=29) (actual time=0.602..0.613 rows=10 loops=1)
         Index Cond: ((event_time >= '2026-02-01 00:00:00'::timestamp without time zone) AND (event_time < '2026-02-08 00:00:00'::timestamp without time zone) AND (((yb_hash_code(event_time) % 3)) = ANY ('{0,1,2}'::integer[])))
         Merge Sort Key: event_time DESC
         Merge Stream Key: (yb_hash_code(event_time) % 3)
         Merge Streams: 3
         Heap Fetches: 0
 Planning Time: 0.111 ms
 Execution Time: 0.677 ms
 Peak Memory Usage: 64 kB
				
			
πŸš€ Why This Is So Good
  • ● YugabyteDB scans 3 bucket streams
  • ● Merges them in event_time DESC order
  • ● Returns only the first 10 rows
  • ● Needs no heap fetches
πŸ‘‰ This preserves fast ordered reads while avoiding the write hotspot of a plain monotonic index.

🎯 Final Takeaway

🧠 The Key Insight
Bucket-based indexes are designed to fix a write scalability problem, not just improve query speed.
  • ● Plain timestamp indexes concentrate writes into a single hot shard
  • ● Bucket-based indexes distribute those writes across multiple tablets
  • ● YugabyteDB then merges ordered bucket streams to preserve query behavior
πŸ‘‰ The goal is to scale writes without breaking ordered reads.

⚑ Distribute the writes. Preserve the order. Scale the system. ⚑

Have Fun!