Designing the Best Index for a Query in YugabyteDB

Across this indexing series, we improved the same query step by step.

We started with a simple index on tenant_id.

Then we:

● reduced wasted filtering
● explained how IN() predicates create multiple index ranges
● aligned the index with ORDER BY created_at
● and finally built a fully covering index that produced an Index Only Scan

By the end of Tip #6, the query had improved dramatically.

What began as a plan that scanned nearly 60,000 rows and took about 524 ms had become a plan that scanned only 306 index rows and ran in about 1.4 ms.

So what is the best overall index for this query?

⚡ Final Goal

The best index is not the one that optimizes just one part of the query. It is the one that balances filtering, ordering, early stop, and covering behavior for the actual workload.

🧭 The Index Tuning Journey

This series shows how index tuning usually happens in stages. Each tip removes one layer of wasted work so YugabyteDB can answer the query more efficiently.

Tip	Focus	What Improves
1	Choose the right column order in an index	The index matches the query shape better
2	Why your index scan still reads thousands of rows	Shows why index usage alone is not enough
3	How to eliminate “Rows Removed by Filter”	More filtering happens in the index before table reads
4	How IN() predicates create multiple index ranges	Multiple predicate combinations create distinct index ranges
5	How ORDER BY LIMIT Can Change the Best Index in YugabyteDB	Sort work can be reduced or eliminated
6	How covering indexes eliminate table reads	The query may avoid table access entirely
7	Designing the best index for a query	All tuning ideas come together into one practical design

The Query We Tuned Throughout the Series

This is the same query we have used in every tip:

				
					SELECT *
FROM app.activity_log
WHERE tenant_id =
'11111111-1111-1111-1111-111111111111'
AND category IN ('LOGIN','EXPORT','SYNC')
AND status IN ('PENDING','RUNNING')
ORDER BY created_at
LIMIT 100;

The Final Index

				
					CREATE INDEX activity_log_full_cover_idx
ON app.activity_log (tenant_id, created_at)
INCLUDE (status, category, id, updated_at, user_id, source_id);

This index gave us:

● the correct leading column for tenant access
● the correct ordering for ORDER BY created_at
● early stop with LIMIT 100
● a fully covering index
● an Index Only Scan

Why This Index Won

At first glance, it may seem surprising that the best index was not:

				
					(tenant_id, status, category)

or even:

				
					(tenant_id, status, category, created_at)

But those indexes optimize a different part of the problem.

Those indexes are good at filtering

They push status and category into the index condition.

That reduces filter waste.

But for this query, they still force YugabyteDB to:

● scan many qualifying rows
● sort them by created_at
● and only then return the first 100

That is why they were still much slower.

The winning index is good at ordering

The final index is ordered by:

				
					tenant_id, created_at

That means rows for one tenant are already in the exact order the query wants.

So YugabyteDB can:

1. seek to the tenant
2. read rows in created_at order
3. apply the status and category filter from the index entry
4. stop once 100 matches are found

That is the key advantage.

📌 Final Takeaway

For this query, the best index was the one that preserved the required row order so YugabyteDB could stop early, while also covering all selected columns.

Performance Across the Series

Here is the full progression.

Step	Index Design	Main Benefit	Result
Tip 2	`(tenant_id)`	Basic tenant lookup	Still scanned ~59,689 rows
Tip 3	`(tenant_id, status, category)`	Moves filtering into index scan	Removed filter waste, but still sorted
Tip 5 attempt	`(tenant_id, status, category, created_at)`	Looks like it should help ORDER BY	Did not eliminate the sort
Tip 5 winner	`(tenant_id, created_at) INCLUDE (status, category)`	Preserves order and stops early	Dropped runtime to ~3.2 ms
Tip 6 final	`(tenant_id, created_at) INCLUDE (status, category, id, updated_at, user_id, source_id)`	Index Only Scan, no table reads	Dropped runtime to ~1.4 ms

The Most Important Design Lessons

This series highlights several important truths about index design.

1. The best filtering index is not always the best overall index

An index like:

				
					(tenant_id, status, category)

looks attractive because it matches the predicates closely.

But if the query also needs:

				
					ORDER BY created_at LIMIT 100

then filtering is only part of the problem.

2. For ORDER BY LIMIT queries, preserving order can matter more than pushing every filter into the key

The winning index let YugabyteDB read rows in the required order and stop early.

That mattered more than making every predicate part of the index key.

3. INCLUDE columns are powerful

INCLUDE lets you keep the index ordered for access and sorting, while still storing extra columns needed for filtering or output.

4. Covering indexes can remove table reads completely

Once every required column was stored in the index, the query became an Index Only Scan with:

				
					Heap Fetches: 0

That was the final optimization.

⚠️ Best Indexes Are Workload-Specific

The best index for one query is not always the best index for the entire application. Always balance read performance against write cost, storage overhead, and how many important queries the index can support.

So What Is the Best Index for This Query?

For this specific query pattern, the best overall index is:

				
					CREATE INDEX activity_log_full_cover_idx
ON app.activity_log (tenant_id, created_at)
INCLUDE (status, category, id, updated_at, user_id, source_id);

Why?

Because it balances all four goals:

● filtering via included columns
● ordering via created_at
● early stop via LIMIT 100
● covering via included output columns

That makes it the best overall design we found in this series.

Conclusion

The main lesson of this series is simple:

● A good index does not just help YugabyteDB find rows.
● A great index helps YugabyteDB do less total work.

For this query, that meant:

● starting with the right leading column
● reducing wasted filtering
● understanding multi-value predicates
● aligning the index with ORDER BY
● allowing early stop with LIMIT
● and finally eliminating table reads entirely

That is how we moved from ~524 ms to ~1.4 ms.

🚀 Performance Progress Across the Series

Initial Plan

~524 ms

After Filtering Improvements

~394–478 ms

After ORDER BY Optimization

~3.2 ms

After Full Covering Index

~1.4 ms

Each tip removed one layer of wasted work so YugabyteDB could answer the query more efficiently.

Have Fun!