How to Eliminate “Rows Removed by Filter” in YugabyteDB

In the previous tip Why Your Index Scan Still Reads Thousands of Rows, the second in our index tuning series, we saw an important lesson about index scans:

  • An index being used does not mean the scan is efficient.
Our query used an index on tenant_id, but YugabyteDB still had to read a large number of rows and discard many of them afterwards. The index helped locate the correct tenant, but additional filtering still happened later in the execution plan.

Now we fix the next bottleneck.

Instead of reading rows and throwing many away afterward, we want YugabyteDB to reject non-matching rows earlier, directly from the index.

⚡ Key Insight
A well-designed index doesn’t just find rows faster. It also allows YugabyteDB to discard non-matching rows before reading the table, dramatically reducing unnecessary I/O.
🧭 The Index Tuning Journey

This series shows how index tuning usually happens in stages. Each tip removes one layer of wasted work so YugabyteDB can answer the query more efficiently.

Tip Focus What Improves
1 Choose the right column order in an index The index matches the query shape better
2 Why your index scan still reads thousands of rows Shows why index usage alone is not enough
3 How to eliminate “Rows Removed by Filter” More filtering happens in the index before table reads
4 How IN() predicates create multiple index ranges Multiple predicate combinations create distinct index ranges
5 How ORDER BY LIMIT Can Change the Best Index in YugabyteDB Sort work can be reduced or eliminated
6 How covering indexes eliminate table reads The query may avoid table access entirely
7 Designing the best index for a query All tuning ideas come together into one practical design

In Tip #2, we saw that an index scan may still read thousands of rows. The index helped locate the correct tenant, but additional filtering still happened after the table rows were fetched.

The Query from Tip #2

We continue with the same query used previously:

				
					EXPLAIN (ANALYZE, DIST)
SELECT *
FROM app.activity_log
WHERE tenant_id =
'11111111-1111-1111-1111-111111111111'
AND category IN (
'LOGIN',
'EXPORT',
'SYNC'
)
AND status IN (
'PENDING',
'RUNNING'
)
ORDER BY created_at
LIMIT 100;
				
			

Our table looks like this:

				
					yugabyte=# \d app.activity_log;
                        Table "app.activity_log"
   Column   |           Type           | Collation | Nullable | Default
------------+--------------------------+-----------+----------+---------
 id         | uuid                     |           | not null |
 tenant_id  | uuid                     |           | not null |
 category   | text                     |           | not null |
 status     | text                     |           | not null |
 created_at | timestamp with time zone |           | not null |
 updated_at | timestamp with time zone |           |          |
 user_id    | uuid                     |           |          |
 source_id  | uuid                     |           |          |
Indexes:
    "activity_log_pkey" PRIMARY KEY, lsm (id HASH)
    "activity_tenant_idx" lsm (tenant_id HASH)
				
			

The only useful index for this query is:

				
					CREATE INDEX activity_tenant_idx
ON app.activity_log (tenant_id);
				
			

Why the Current Index Still Wastes Work

The index allows YugabyteDB to quickly locate rows belonging to the tenant.

But the query still has two additional filters:

				
					category IN (...)
status IN (...)
				
			

Because category and status are not present in the index, YugabyteDB must:

  • 1. Read rows for the tenant from the index

  • 2. Fetch those rows from the table

  • 3. Apply the category/status filter afterward

  • 4. Discard many rows

This behavior appears in execution plans as:

				
					Rows Removed by Filter
				
			
🔎 What “Rows Removed by Filter” Means

This metric indicates that YugabyteDB read rows from storage but discarded them later because they did not satisfy all predicates.

Large values here usually mean:

  • ● your index is missing important filter columns
  • ● the database is doing extra table reads
  • ● performance can often be improved with a better index design

The Fix: Add Filter Columns to the Index

To allow YugabyteDB to filter earlier, we need the index to include the additional predicates.

Create a new index:

				
					CREATE INDEX activity_log_tenant_status_category_idx
ON app.activity_log (tenant_id, status, category);
				
			

Now the index contains all three filtered columns:

				
					tenant_id
status
category
				
			

This means YugabyteDB can evaluate these predicates during the index scan, instead of after reading the table row.

Running the Query Again

Now rerun the query.

				
					EXPLAIN (ANALYZE, DIST)
SELECT *
FROM app.activity_log
WHERE tenant_id =
'11111111-1111-1111-1111-111111111111'
AND category IN (
'LOGIN',
'EXPORT',
'SYNC'
)
AND status IN (
'PENDING',
'RUNNING'
)
ORDER BY created_at
LIMIT 100;
				
			

The plan now uses the new index.

				
					 Limit  (cost=29773.22..29773.47 rows=100 width=94) (actual time=478.559..478.588 rows=100 loops=1)
   ->  Sort  (cost=29773.22..29922.88 rows=59863 width=94) (actual time=478.557..478.571 rows=100 loops=1)
         Sort Key: created_at
         Sort Method: top-N heapsort  Memory: 48kB
         ->  Index Scan using activity_log_tenant_status_category_idx on activity_log  (cost=100.94..27485.30 rows=59863 width=94) (actual time=15.494..461.795 rows=59689 loops=1)
               Index Cond: ((tenant_id = '11111111-1111-1111-1111-111111111111'::uuid) AND (status = ANY ('{PENDING,RUNNING}'::text[])) AND (category = ANY ('{LOGIN,EXPORT,SYNC}'::text[])))
               Storage Table Read Requests: 59
               Storage Table Read Execution Time: 378.104 ms
               Storage Table Read Ops: 59
               Storage Table Rows Scanned: 59689
               Storage Index Read Requests: 59
               Storage Index Read Execution Time: 7.812 ms
               Storage Index Read Ops: 59
               Storage Index Rows Scanned: 59689
 Planning Time: 16.651 ms
 Execution Time: 478.675 ms
 Storage Read Requests: 118
 Storage Read Execution Time: 385.916 ms
 Storage Read Ops: 118
 Storage Rows Scanned: 119378
 Catalog Read Requests: 35
 Catalog Read Execution Time: 22.111 ms
 Catalog Read Ops: 35
 Catalog Write Requests: 0
 Storage Write Requests: 0
 Storage Flush Requests: 0
 Storage Execution Time: 408.027 ms
 Peak Memory Usage: 821 kB
				
			

The new index is now doing exactly what we wanted for this stage of the tuning process.

All three filtered columns… tenant_id, status, and category … now appear in the Index Cond. That means YugabyteDB can apply those predicates during the index scan itself, instead of reading rows first and filtering them afterward.

📊 What Changed
The new index moved status and category into the access path itself.
  • No more filter waste: Rows Removed by Filter no longer appears
  • All predicates are now in the Index Cond
  • Only qualifying rows are fetched from the table: 59,689 rows
  • The remaining bottleneck is now sorting by created_at

This is the main goal of this tip: make the index reject non-matching rows before table reads happen.

Before vs After

Below is the conceptual difference between the two index designs.

Index Filtering Behavior Effect
(tenant_id) category and status filtered after table read Many rows scanned and discarded
(tenant_id, status, category) status and category applied during index scan Only qualifying rows fetched from table

Important Design Reminder

You generally do not want one index per query.

Instead, design indexes that:

  • ● support multiple important queries
  • ● minimize extra writes
  • ● balance read performance with storage overhead

In this example, the new index:

				
					(tenant_id, status, category)
				
			

starts with the same leading column as the original index:

				
					(tenant_id)
				
			

That means it can often serve the same access pattern while also supporting the additional filtering predicates.

If no important queries still depend on the single-column index, the old index may now be redundant and can often be dropped:

				
					DROP INDEX activity_tenant_idx;
				
			

Always confirm index usage against your real workload before removing an existing index.

Conclusion

In Tip #2, we saw that an index scan may still read thousands of rows.

In Tip #3, we improved the index so YugabyteDB can apply more filtering directly within the index scan itself. The new plan shows that all three predicates are now applied in the Index Cond, eliminating wasted filtering work.

However, the query still reads 59,689 qualifying rows and performs a top-N heapsort on created_at before returning the first 100 rows.

So while this tip eliminates wasted filtering, it does not yet eliminate wasted ordering work.

In the next tip, we’ll explore how YugabyteDB can sometimes scan multiple smaller index ranges instead of one large range, a technique known as skip scanning.

Have Fun!

🏆 Honored to receive SE of the Year 2026 at SKO. Grateful to work with such an incredible team and amazing customers every day. Big thanks to everyone I collaborate with across sales, engineering, and support... this is truly a team sport.