Use IN, Not OR, for Multi-Value Lookups

When querying multiple exact values on the same column in YugabyteDB, the difference between IN and OR isn’t just stylistic…

👉 It can completely change your execution plan… and your performance.

💡 Version Note
The EXPLAIN plans and cost estimates shown in this tip were generated on YugabyteDB 2025.2.2.0.
Actual plans, costs, and optimizer behavior may vary slightly across versions and cluster configurations.

🔍 The Setup

These two queries look equivalent:

				
					-- OR version
SELECT * FROM use_in WHERE id = 1 OR id = 5 OR id = 1000;

-- IN version
SELECT * FROM use_in WHERE id IN (1, 5, 1000);
				
			

But the optimizer treats them very differently.

📊 What the Planner Does

❌ OR Query Plan (More Work)
  • ● Multiple Bitmap Index Scans
  • BitmapOr merge step
  • YB Bitmap Table Scan (extra fetch)
✅ IN Query Plan (Optimized)
  • ● Single Index Scan
  • ● Uses:
				
					id = ANY ('{1,5,1000}'::integer[])
				
			
  • ○ No bitmap merging
  • ○ Cleaner execution path
💡 Key Insight
IN is internally rewritten as = ANY(...), allowing YugabyteDB to treat the lookup as a single batched index operation instead of multiple independent branches.

🔥 The Real Gotcha: Covering Indexes

Now it gets even more interesting.

Let’s say you create a covering index:

				
					CREATE INDEX use_in_c3_idx ON use_in(c3) INCLUDE (c2, id);
				
			

This index contains everything needed to satisfy the query… no table lookup required.

✅ IN Query (Best Case)
				
					SELECT * FROM use_in WHERE c3 IN (1, 5, 1000);
				
			

➡️ Uses:

				
					Index Only Scan
				
			

✔️ Single, efficient index scan

✔️ Index Only Scan (no table access!)

✔️ Fully leverages the covering index

✔️ Minimal work, minimal I/O

Example:

				
					yugabyte=# EXPLAIN SELECT * FROM use_in WHERE c3 IN (1, 5, 1000);
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Index Only Scan using use_in_c3_idx on use_in  (cost=20.03..24.70 rows=3 width=12)
   Index Cond: (c3 = ANY ('{1,5,1000}'::integer[]))
(2 rows)
				
			
❌ OR Query (The Trap)
				
					SELECT * FROM use_in WHERE c3 = 1 OR c3 = 5 OR c3 = 1000;
				
			
🚨 That means:
  • ✖️ Still uses the index… but 3 separate times
  • ✖️ Builds and merges bitmaps (BitmapOr)
  • ✖️ Forces a table fetch (YB Bitmap Table Scan)
  • ✖️ Cannot take advantage of the covering index
🎯 Why This Matters (Especially in YugabyteDB)
  • ● In a single-node database, this might just mean a bit more CPU.
  • ● In YugabyteDB, it can mean:
  • ● More RPCs to storage (DocDB)
  • ● More data movement
  • ● Higher latency
  • ● Reduced scalability under load

Example:

				
					yugabyte=# EXPLAIN SELECT * FROM use_in WHERE c3 = 1 OR c3 = 5 OR c3 = 1000;
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 YB Bitmap Table Scan on use_in  (cost=164.66..189.29 rows=3 width=12)
   ->  BitmapOr  (cost=144.66..144.66 rows=1 width=0)
         ->  Bitmap Index Scan on use_in_c3_idx  (cost=0.00..48.22 rows=1 width=0)
               Index Cond: (c3 = 1)
         ->  Bitmap Index Scan on use_in_c3_idx  (cost=0.00..48.22 rows=1 width=0)
               Index Cond: (c3 = 5)
         ->  Bitmap Index Scan on use_in_c3_idx  (cost=0.00..48.22 rows=1 width=0)
               Index Cond: (c3 = 1000)
(8 rows)
				
			
🚨 Performance Trap
OR still uses the index… just not efficiently. It triggers multiple index scans and bitmap merging, which forces a table lookup and prevents more efficient paths like Index Only Scan.. especially painful in a distributed system.

✅ Best Practice

If you’re matching multiple values on the same column, use IN:
				
					WHERE col IN (...)
				
			

✔ Cleaner
✔ Faster
✔ Better plans
✔ More index-friendly
✔ Better distributed execution

❗ When OR Still Makes Sense

Use OR when:

  • ● Conditions span different columns
  • ● Logic is not reducible to a single column
  • ● You’re combining fundamentally different predicates

🏁 Final Takeaway

  • IN is not just syntactic sugar, it’s a planner hint
  • ● It enables simpler, faster, and more scalable execution
  • ● It can be the difference between:
    • 🔥 Index Only Scan
    • 💥 Bitmap + Table Scan
⚡ TL;DR
For multiple exact matches on the same column, use IN instead of OR — it produces simpler plans, avoids bitmap merging, and enables more efficient index usage.

Have Fun!

Aisle 17 at a grocery store in Dallas while visiting my daughter: where your life choices are either probiotic… or problematic 🍺