In the previous tip Why Your Index Scan Still Reads Thousands of Rows, the second in our index tuning series, we saw an important lesson about index scans:
- An index being used does not mean the scan is efficient.
tenant_id, but YugabyteDB still had to read a large number of rows and discard many of them afterwards. The index helped locate the correct tenant, but additional filtering still happened later in the execution plan. Now we fix the next bottleneck.
Instead of reading rows and throwing many away afterward, we want YugabyteDB to reject non-matching rows earlier, directly from the index.
This series shows how index tuning usually happens in stages. Each tip removes one layer of wasted work so YugabyteDB can answer the query more efficiently.
| Tip | Focus | What Improves |
|---|---|---|
| 1 | Choose the right column order in an index | The index matches the query shape better |
| 2 | Why your index scan still reads thousands of rows | Shows why index usage alone is not enough |
| 3 | How to eliminate “Rows Removed by Filter” | More filtering happens in the index before table reads |
| 4 | How IN() predicates create multiple index ranges | Multiple predicate combinations create distinct index ranges |
| 5 | How ORDER BY LIMIT Can Change the Best Index in YugabyteDB | Sort work can be reduced or eliminated |
| 6 | How covering indexes eliminate table reads | The query may avoid table access entirely |
| 7 | Designing the best index for a query | All tuning ideas come together into one practical design |
In Tip #2, we saw that an index scan may still read thousands of rows. The index helped locate the correct tenant, but additional filtering still happened after the table rows were fetched.
The Query from Tip #2
We continue with the same query used previously:
EXPLAIN (ANALYZE, DIST)
SELECT *
FROM app.activity_log
WHERE tenant_id =
'11111111-1111-1111-1111-111111111111'
AND category IN (
'LOGIN',
'EXPORT',
'SYNC'
)
AND status IN (
'PENDING',
'RUNNING'
)
ORDER BY created_at
LIMIT 100;
Our table looks like this:
yugabyte=# \d app.activity_log;
Table "app.activity_log"
Column | Type | Collation | Nullable | Default
------------+--------------------------+-----------+----------+---------
id | uuid | | not null |
tenant_id | uuid | | not null |
category | text | | not null |
status | text | | not null |
created_at | timestamp with time zone | | not null |
updated_at | timestamp with time zone | | |
user_id | uuid | | |
source_id | uuid | | |
Indexes:
"activity_log_pkey" PRIMARY KEY, lsm (id HASH)
"activity_tenant_idx" lsm (tenant_id HASH)
The only useful index for this query is:
CREATE INDEX activity_tenant_idx
ON app.activity_log (tenant_id);
Why the Current Index Still Wastes Work
The index allows YugabyteDB to quickly locate rows belonging to the tenant.
But the query still has two additional filters:
category IN (...)
status IN (...)
Because category and status are not present in the index, YugabyteDB must:
-
1. Read rows for the tenant from the index
-
2. Fetch those rows from the table
-
3. Apply the category/status filter afterward
-
4. Discard many rows
This behavior appears in execution plans as:
Rows Removed by Filter
This metric indicates that YugabyteDB read rows from storage but discarded them later because they did not satisfy all predicates.
Large values here usually mean:
- ● your index is missing important filter columns
- ● the database is doing extra table reads
- ● performance can often be improved with a better index design
The Fix: Add Filter Columns to the Index
To allow YugabyteDB to filter earlier, we need the index to include the additional predicates.
Create a new index:
CREATE INDEX activity_log_tenant_status_category_idx
ON app.activity_log (tenant_id, status, category);
Now the index contains all three filtered columns:
tenant_id
status
category
This means YugabyteDB can evaluate these predicates during the index scan, instead of after reading the table row.
Running the Query Again
Now rerun the query.
EXPLAIN (ANALYZE, DIST)
SELECT *
FROM app.activity_log
WHERE tenant_id =
'11111111-1111-1111-1111-111111111111'
AND category IN (
'LOGIN',
'EXPORT',
'SYNC'
)
AND status IN (
'PENDING',
'RUNNING'
)
ORDER BY created_at
LIMIT 100;
The plan now uses the new index.
Limit (cost=29773.22..29773.47 rows=100 width=94) (actual time=478.559..478.588 rows=100 loops=1)
-> Sort (cost=29773.22..29922.88 rows=59863 width=94) (actual time=478.557..478.571 rows=100 loops=1)
Sort Key: created_at
Sort Method: top-N heapsort Memory: 48kB
-> Index Scan using activity_log_tenant_status_category_idx on activity_log (cost=100.94..27485.30 rows=59863 width=94) (actual time=15.494..461.795 rows=59689 loops=1)
Index Cond: ((tenant_id = '11111111-1111-1111-1111-111111111111'::uuid) AND (status = ANY ('{PENDING,RUNNING}'::text[])) AND (category = ANY ('{LOGIN,EXPORT,SYNC}'::text[])))
Storage Table Read Requests: 59
Storage Table Read Execution Time: 378.104 ms
Storage Table Read Ops: 59
Storage Table Rows Scanned: 59689
Storage Index Read Requests: 59
Storage Index Read Execution Time: 7.812 ms
Storage Index Read Ops: 59
Storage Index Rows Scanned: 59689
Planning Time: 16.651 ms
Execution Time: 478.675 ms
Storage Read Requests: 118
Storage Read Execution Time: 385.916 ms
Storage Read Ops: 118
Storage Rows Scanned: 119378
Catalog Read Requests: 35
Catalog Read Execution Time: 22.111 ms
Catalog Read Ops: 35
Catalog Write Requests: 0
Storage Write Requests: 0
Storage Flush Requests: 0
Storage Execution Time: 408.027 ms
Peak Memory Usage: 821 kB
The new index is now doing exactly what we wanted for this stage of the tuning process.
All three filtered columns… tenant_id, status, and category … now appear in the Index Cond. That means YugabyteDB can apply those predicates during the index scan itself, instead of reading rows first and filtering them afterward.
status and category into the access path itself.
- ● No more filter waste:
Rows Removed by Filterno longer appears - ● All predicates are now in the Index Cond
- ● Only qualifying rows are fetched from the table: 59,689 rows
- ● The remaining bottleneck is now sorting by
created_at
This is the main goal of this tip: make the index reject non-matching rows before table reads happen.
Before vs After
Below is the conceptual difference between the two index designs.
| Index | Filtering Behavior | Effect |
|---|---|---|
(tenant_id)
|
category and status filtered after table read | Many rows scanned and discarded |
(tenant_id, status, category)
|
status and category applied during index scan | Only qualifying rows fetched from table |
Important Design Reminder
You generally do not want one index per query.
Instead, design indexes that:
- ● support multiple important queries
- ● minimize extra writes
- ● balance read performance with storage overhead
In this example, the new index:
(tenant_id, status, category)
starts with the same leading column as the original index:
(tenant_id)
That means it can often serve the same access pattern while also supporting the additional filtering predicates.
If no important queries still depend on the single-column index, the old index may now be redundant and can often be dropped:
DROP INDEX activity_tenant_idx;
Always confirm index usage against your real workload before removing an existing index.
Conclusion
In Tip #2, we saw that an index scan may still read thousands of rows.
In Tip #3, we improved the index so YugabyteDB can apply more filtering directly within the index scan itself. The new plan shows that all three predicates are now applied in the Index Cond, eliminating wasted filtering work.
However, the query still reads 59,689 qualifying rows and performs a top-N heapsort on created_at before returning the first 100 rows.
So while this tip eliminates wasted filtering, it does not yet eliminate wasted ordering work.
In the next tip, we’ll explore how YugabyteDB can sometimes scan multiple smaller index ranges instead of one large range, a technique known as skip scanning.
Have Fun!
