Bucket-Based Indexes Still Support Fast Point Lookups in YugabyteDB

Intro

In a previous tip, Prevent Hot Shards with Bucket-Based Indexes, we saw how bucket-based indexes can prevent hot shards by spreading writes across multiple tablets… especially for monotonically increasing values like timestamps.

But once people see a bucketed index definition, the next question is almost always:

👉 What happens to point lookups?

If the index key starts with a computed bucket, do we now need to know that bucket ahead of time?

Thankfully, no.

For exact-key lookups, YugabyteDB automatically computes the bucket from the search values and injects it into the index condition. That means queries still go directly to the correct tablet instead of scanning every bucket.

⚡ You get write scalability and efficient point lookups. ⚡

Demo Setup

Before we start, we’ll enable one session setting that allows YugabyteDB to derive additional equality predicates for computed columns (like our bucket column):

				
					SET yb_enable_derived_equalities = true;

👉 This is what allows the planner to automatically compute the bucket value and inject it into the index condition for point lookups.

If you extend this pattern to IN (...) queries or bucket-based merge across many buckets, you may also want:

				
					SET yb_enable_derived_saops = true;
SET yb_max_saop_merge_streams = 64;

Now let’s build a table with a bucket-based primary key and load a larger dataset so the optimizer has real choices.

				
					CREATE TABLE orders (
  customer_id int,
  order_id int,
  status_code int,
  warehouse_id int,
  bucket_id int GENERATED ALWAYS AS (yb_hash_code(customer_id, order_id) % 8) STORED,
  PRIMARY KEY (bucket_id ASC, customer_id, order_id)
)
SPLIT AT VALUES ((1), (2), (3), (4), (5), (6), (7));

Load ~100K rows:

				
					INSERT INTO orders (customer_id, order_id, status_code, warehouse_id)
SELECT
  (g % 1000) + 1,          -- customer_id
   g,                      -- order_id
  (g % 20) + 1,            -- status_code
  (g % 200) + 1            -- warehouse_id
FROM generate_series(1, 100000) g;

Now we have:

● evenly distributed data across buckets
● realistic cardinality
● meaningful query planning decisions

Now let’s test a point lookup.

EXPLAIN Output (Point Lookup on Primary Key)

Here’s a query that looks up a single row:

				
					EXPLAIN (ANALYZE, DIST)
SELECT *
FROM orders
WHERE customer_id = 4
  AND order_id = 22003;

Example plan:

				
					 Index Scan using orders_pkey on orders  (cost=20.02..21.30 rows=1 width=20) (actual time=0.375..0.379 rows=1 loops=1)
   Index Cond: ((bucket_id = (yb_hash_code(4, 22003) % 8)) AND (customer_id = 4) AND (order_id = 22003))
   Storage Table Read Requests: 1
   Storage Table Read Execution Time: 0.247 ms
   Storage Table Read Ops: 1
   Storage Table Rows Scanned: 1
 Planning Time: 0.117 ms
 Execution Time: 0.418 ms
 Storage Read Requests: 1
 Storage Read Execution Time: 0.247 ms
 Storage Read Ops: 1
 Storage Rows Scanned: 1
 Catalog Read Requests: 0
 Catalog Read Ops: 0
 Catalog Write Requests: 0
 Storage Write Requests: 0
 Storage Flush Requests: 0
 Storage Execution Time: 0.247 ms
 Peak Memory Usage: 64 kB

✅ Key Observation:

YugabyteDB automatically computes the correct bucket using yb_hash_code(...) and injects it into the index condition.

This means the query is routed directly to the correct tablet… no need to scan all buckets.

Secondary Index on Real Query Columns

This behavior is not limited to the primary key.

It also works for bucket-based secondary indexes when your query provides equality predicates on the hashed columns.

Let’s build an index for a realistic access pattern:

				
					CREATE INDEX orders_wh_status_bucket_idx
ON orders (
  (yb_hash_code(warehouse_id, status_code) % 8) ASC,
  warehouse_id ASC,
  status_code ASC
)
SPLIT AT VALUES ((1), (2), (3), (4), (5), (6), (7));

Now run a lookup:

				
					EXPLAIN (ANALYZE, DIST)
SELECT *
FROM orders
WHERE warehouse_id = 1
  AND status_code = 1;

Plan:

				
					 Index Scan using orders_wh_status_bucket_idx on orders  (cost=40.03..68.67 rows=25 width=20) (actual time=2.091..2.246 rows=500 loops=1)
   Index Cond: (((yb_hash_code(warehouse_id, status_code) % 8) = (yb_hash_code(1, 1) % 8)) AND (warehouse_id = 1) AND (status_code = 1))
   Storage Table Read Requests: 1
   Storage Table Read Execution Time: 1.236 ms
   Storage Table Read Ops: 8
   Storage Table Rows Scanned: 500
   Storage Index Read Requests: 1
   Storage Index Read Execution Time: 0.516 ms
   Storage Index Read Ops: 1
   Storage Index Rows Scanned: 500
 Planning Time: 0.124 ms
 Execution Time: 2.342 ms
 Storage Read Requests: 2
 Storage Read Execution Time: 1.752 ms
 Storage Read Ops: 9
 Storage Rows Scanned: 1000
 Catalog Read Requests: 0
 Catalog Read Ops: 0
 Catalog Write Requests: 0
 Storage Write Requests: 0
 Storage Flush Requests: 0
 Storage Execution Time: 1.752 ms
 Peak Memory Usage: 64 kB

What to notice

● The query does not include the bucket
● YugabyteDB computes it automatically
● The lookup is still targeted to a single bucket/tablet

👉 This makes bucket-based indexing practical for real-world query patterns.

✅ Key Observation:

YugabyteDB automatically computes the correct bucket using yb_hash_code(...) and injects it into the index condition.

This means the query is routed directly to the correct tablet… no need to scan all buckets.

🎯 Final Takeaway

Bucket-based indexes are often introduced to solve a write-scaling problem, but they do not compromise read efficiency for point lookups.

- ✅ Writes are evenly distributed (no hot shards)
- ✅ Range scans still work across buckets
- ✅ Point lookups remain fast and targeted
- ✅ No application changes required

⚡ You don’t need to know the bucket… YugabyteDB figures it out for you. ⚡

Have Fun!