How IN() predicates create multiple index ranges

In the YugabyteDB Tip How to Eliminate “Rows Removed by Filter” in YugabyteDB, we improved our index so that filtering happens directly inside the index scan. That eliminated wasted filtering work and reduced unnecessary table reads.

However, the query plan still showed something interesting:

				
					Index Scan using activity_log_tenant_status_category_idx
  Storage Table Rows Scanned: 59689

Even though the index now applies all filtering predicates, YugabyteDB still reads 59,689 rows before returning the first 100 rows.

Why?

The answer lies in how the database must evaluate multiple predicate combinations created by the query.

⚡ Key Idea

When a query uses IN() predicates on indexed columns, YugabyteDB internally evaluates multiple combinations of values. Each combination corresponds to a distinct range in the index.

🧭 The Index Tuning Journey

This series shows how index tuning usually happens in stages. Each tip removes one layer of wasted work so YugabyteDB can answer the query more efficiently.

Tip	Focus	What Improves
1	Choose the right column order in an index	The index matches the query shape better
2	Why your index scan still reads thousands of rows	Shows why index usage alone is not enough
3	How to eliminate “Rows Removed by Filter”	More filtering happens in the index before table reads
4	How IN() predicates create multiple index ranges	Multiple predicate combinations create distinct index ranges
5	How ORDER BY LIMIT Can Change the Best Index in YugabyteDB	Sort work can be reduced or eliminated
6	How covering indexes eliminate table reads	The query may avoid table access entirely
7	Designing the best index for a query	All tuning ideas come together into one practical design

Recap of Our Current Index

From Tip #3, we created the index:

				
					CREATE INDEX activity_log_tenant_status_category_idx
ON app.activity_log (tenant_id, status, category);

And our query still looks like this:

				
					EXPLAIN (ANALYZE, DIST)
SELECT *
FROM app.activity_log
WHERE tenant_id =
'11111111-1111-1111-1111-111111111111'
AND category IN ('LOGIN','EXPORT','SYNC')
AND status IN ('PENDING','RUNNING')
ORDER BY created_at
LIMIT 100;

What the Plan Shows

The plan now uses the index effectively:

				
					Index Scan using activity_log_tenant_status_category_idx
Index Cond:
  (tenant_id = '11111111-1111-1111-1111-111111111111')
  AND (status = ANY ('{PENDING,RUNNING}'))
  AND (category = ANY ('{LOGIN,EXPORT,SYNC}'))

Storage Table Rows Scanned: 59689
Storage Index Rows Scanned: 59689

Notice that all predicates now appear in the Index Cond.

This means filtering happens during the index scan, not after reading rows from the table.

However, the planner must still evaluate multiple combinations of values.

Why Multiple Index Ranges Exist

Because the query includes two IN() predicates:

				
					status IN ('PENDING','RUNNING')
category IN ('LOGIN','EXPORT','SYNC')

the planner must consider every combination of those values.

For this query, there are six matching combinations:

				
					status = PENDING + category = LOGIN
status = PENDING + category = EXPORT
status = PENDING + category = SYNC
status = RUNNING + category = LOGIN
status = RUNNING + category = EXPORT
status = RUNNING + category = SYNC

Each of these combinations corresponds to a distinct range inside the index.

Visualization of the Index Ranges

Here is the idea visually.

How Multiple Predicate Values Create Multiple Index Ranges

Index order: (tenant_id, status, category)

Matching ranges evaluated by the planner

X, PENDING, LOGIN

X, PENDING, EXPORT

X, PENDING, SYNC

X, RUNNING, LOGIN

X, RUNNING, EXPORT

X, RUNNING, SYNC

Each of these ranges must be scanned independently.

🧠 Why This Matters

Even though filtering now happens inside the index scan, YugabyteDB still needs to read all rows that match these six combinations. That is why the query still scans 59,689 rows.

When Does a Skip Scan Happen?

The behavior we just described, evaluating multiple combinations created by IN() predicates, can sometimes resemble something called a skip scan.

A skip scan occurs when the leading column of an index is not constrained by the query.

For example, imagine an index like this:

				
					(status, category, created_at)

And a query like:

				
					SELECT *
FROM app.activity_log
WHERE category = 'LOGIN';

Because the query does not specify status, YugabyteDB cannot jump directly to a single range of the index.

Instead, it may internally evaluate multiple ranges like this:

				
					status = PENDING   + category = LOGIN
status = RUNNING   + category = LOGIN
status = FAILED    + category = LOGIN
status = COMPLETE  + category = LOGIN

Each of these becomes a separate range in the index.

This technique, jumping between ranges of the leading column, is known as a skip scan.

🧠 Skip Scan vs Multiple Predicate Ranges

Both behaviors involve scanning multiple ranges of an index. The difference is:

● Skip scan: occurs when the leading index column is not constrained
● Multiple predicate ranges: occur when queries contain IN() predicates on indexed columns

In our example, the leading column tenant_id is constrained, so YugabyteDB can directly seek to the correct tenant range without performing a skip scan.

Why the Query Still Reads Thousands of Rows

Even though skip scans can improve how the index is traversed, the query still shows:

				
					Sort Key: created_at
Sort Method: top-N heapsort

Because created_at is not part of the index ordering, YugabyteDB must:

1. read all qualifying rows
2. sort them
3. return the first 100

So although filtering improved, the query still reads 59,689 rows before sorting.

📌 Takeaway

IN() predicates create multiple value combinations that correspond to separate index ranges. Even when filtering happens inside the index, YugabyteDB still needs to scan every row that matches those combinations.

Conclusion

In Tip #3, we eliminated wasted filtering by moving more predicates into the index.

In Tip #4, we saw how YugabyteDB can scan multiple targeted index ranges using skip scans instead of reading one large range.

However, even with skip scans, the query still reads tens of thousands of rows before sorting.

The next step is to fix the remaining bottleneck.

Next Tip

In the next tip, we’ll see how the right index ordering can eliminate the expensive sort step and allow YugabyteDB to stop scanning much earlier for queries like:

				
					ORDER BY created_at
LIMIT 100

That’s where the largest performance improvement usually happens.

Have Fun!