Prevent Hot Shards with Bucket-Based Indexes

🚀 Introduction

In distributed databases, not all indexes are created equal.

Some indexes look perfectly fine… until your system scales.

A classic example is a timestamp-based index:

				
					CREATE INDEX idx_rides_time_desc
ON ride_events(event_time DESC);

This works great for queries like:

				
					SELECT *
FROM ride_events
WHERE event_time >= TIMESTAMP '2026-02-01 00:00:00'
  AND event_time <  TIMESTAMP '2026-02-08 00:00:00'
ORDER BY event_time DESC;

Fast reads. Clean plans. No obvious issues.

👉 But under heavy write load, this index can quietly become your biggest bottleneck.

🔥 The Real Problem: Hot Shards

Imagine a ride-sharing system inserting 10,000 rides per second.

Every new row has a recent event_time.

With this index:

				
					(event_time DESC)

all those new inserts target the same logical key range… the newest timestamps.

In YugabyteDB, that means:

● One tablet becomes the active write target
● One node absorbs most of the write traffic
● CPU, disk, and compaction pressure pile up
● Other nodes sit mostly idle

👉 This is the classic hot shard problem.

🔥 Why This Happens

🔥 Problem

● event_time is monotonic (always increasing)
● New inserts always target the latest index range
● That range lives on one tablet at a time

👉 The bottleneck is not the query… it’s write concentration.

🎯 The Solution: Bucket-Based Indexes

Bucket-based indexes fix this by distributing writes across multiple logical streams.

Instead of indexing directly on event_time, we prepend a bucket:

				
					CREATE INDEX idx_rides_bucketed
ON ride_events (
    (yb_hash_code(event_time) % 3) ASC,
    event_time DESC
)
INCLUDE (ride_id, city, driver_id, surge_multiplier)
SPLIT AT VALUES ((1), (2));

⚙️ How It Works

● (yb_hash_code(event_time) % 3) creates 3 buckets
● Each bucket becomes its own ordered stream
● Writes are spread across multiple tablets

👉 No single node becomes a write hotspot

🧪 Demo Setup

				
					CREATE TABLE ride_events (
    ride_id BIGSERIAL PRIMARY KEY,
    city TEXT,
    driver_id INT,
    surge_multiplier NUMERIC,
    event_time TIMESTAMP NOT NULL
);

CREATE INDEX idx_rides_bucketed
ON ride_events (
    (yb_hash_code(event_time) % 3) ASC,
    event_time DESC
)
INCLUDE (ride_id, city, driver_id, surge_multiplier)
SPLIT AT VALUES ((1), (2));

Load a large dataset across a wide time range:

				
					INSERT INTO ride_events (city, driver_id, surge_multiplier, event_time)
SELECT
    (ARRAY['NYC','SF','CHI','MIA'])[floor(random()*4)+1],
    (random()*10000)::int,
    round((random()*3 + 1)::numeric, 2),
    TIMESTAMP '2026-01-01 00:00:00' + (random() * interval '90 days')
FROM generate_series(1, 500000);

🧩 Enable Range Scan Optimization

				
					SET yb_max_saop_merge_streams = 64;
SET yb_enable_derived_saops = true;
SET yb_enable_derived_equalities = true;

💡 Important

These settings allow YugabyteDB to efficiently merge results across buckets for queries like:

    ORDER BY event_time DESC LIMIT N
  

👉 This is what makes bucket-based indexes work seamlessly for range scans.

🔍 Query Behavior

				
					EXPLAIN (ANALYZE)
SELECT ride_id, city, driver_id, surge_multiplier, event_time
FROM ride_events
WHERE event_time >= TIMESTAMP '2026-02-01 00:00:00'
  AND event_time <  TIMESTAMP '2026-02-08 00:00:00'
ORDER BY event_time DESC
LIMIT 10;

What you’ll see:

				
					 Limit  (cost=60.24..61.04 rows=10 width=29) (actual time=0.604..0.617 rows=10 loops=1)
   ->  Index Only Scan using idx_rides_bucketed_range on ride_events  (cost=60.24..310613.68 rows=3879373 width=29) (actual time=0.602..0.613 rows=10 loops=1)
         Index Cond: ((event_time >= '2026-02-01 00:00:00'::timestamp without time zone) AND (event_time < '2026-02-08 00:00:00'::timestamp without time zone) AND (((yb_hash_code(event_time) % 3)) = ANY ('{0,1,2}'::integer[])))
         Merge Sort Key: event_time DESC
         Merge Stream Key: (yb_hash_code(event_time) % 3)
         Merge Streams: 3
         Heap Fetches: 0
 Planning Time: 0.111 ms
 Execution Time: 0.677 ms
 Peak Memory Usage: 64 kB

🚀 Why This Is So Good

● YugabyteDB scans 3 bucket streams
● Merges them in event_time DESC order
● Returns only the first 10 rows
● Needs no heap fetches

👉 This preserves fast ordered reads while avoiding the write hotspot of a plain monotonic index.

🎯 Final Takeaway

🧠 The Key Insight

Bucket-based indexes are designed to fix a write scalability problem, not just improve query speed.

● Plain timestamp indexes concentrate writes into a single hot shard
● Bucket-based indexes distribute those writes across multiple tablets
● YugabyteDB then merges ordered bucket streams to preserve query behavior

👉 The goal is to scale writes without breaking ordered reads.

⚡ Distribute the writes. Preserve the order. Scale the system. ⚡

Have Fun!