Why Compaction Never Settles After Dropping a Table (The “Ghost Table” Problem)

🚨 The Symptom

You dropped a large table… days (or weeks) ago.

But something feels off:

  • 🔥 Compaction is still elevated on a subset of nodes
  • 🐢 Query latency is higher on those same nodes
  • 🤔 Nothing in your workload explains it

So naturally, you start tuning queries… but nothing improves.

⚠️ Important
If only some nodes are slow, and it’s always the same ones, you’re likely not dealing with a query problem.

😵 Why This Fools Engineers

This issue is incredibly deceptive because it looks like a workload problem, but it’s not.

You’ll often see:

  • Only some nodes affected
  • No obvious spike in writes
  • No slow-query smoking gun

So teams:

  • Add indexes
  • Rewrite queries
  • Scale nodes

Meanwhile, the real issue is sitting quietly in the storage layer.

🧩 What’s Actually Happening

When you drop a table in YugabyteDB:

  • 👉 Its tablets are marked for deletion (tombstoned)
  • 👉 Background compaction is responsible for cleaning them up

But…

🧠 Key Insight
If a stale Raft leader still exists for those tablets, cleanup can stall, leaving behind SST files that continue to impact performance.

That leaves behind:

  • Orphaned SST files
  • Blocked tombstone GC
  • Persistent compaction pressure

🔬 Real-World Scenario

📦 Scenario

  • 6-node cluster (RF=3)
  • Dropped a ~200GB table 2 weeks ago
  • 3 nodes show constant compaction + higher latency
  • Workload unchanged

This is the classic “ghost table” signature.

🔍 Step-by-Step Diagnosis in YBA

1️⃣ Compare Compaction Across Nodes

YBA Path:

  • Universe → Metrics → DocDB → Compaction

🔎 What to Look For

  • Same 2–3 nodes consistently higher
  • Pattern is steady, not bursty

👉 This rules out a write spike immediately; a quick cross-node check is sketched below
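
Prefer raw numbers? Each TServer’s web UI also serves a Prometheus endpoint you can scrape directly. A minimal sketch, assuming the default :9000 web port, placeholder node IPs, and that your build exposes a rocksdb_compact_read_bytes counter (metric names vary across YugabyteDB versions, so verify against your own /prometheus-metrics output):

```python
import re
import urllib.request

NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]   # hypothetical node IPs
METRIC = "rocksdb_compact_read_bytes"          # assumed name; verify on your build

def compaction_bytes(node: str) -> float:
    """Sum the per-tablet compaction counter exposed by one tserver."""
    url = f"http://{node}:9000/prometheus-metrics"
    body = urllib.request.urlopen(url, timeout=10).read().decode()
    # Prometheus lines look like: metric{labels} <value> [<timestamp>]
    pat = re.compile(rf"^{METRIC}\{{[^}}]*\}}\s+(\S+)", re.MULTILINE)
    return sum(float(v) for v in pat.findall(body))

# Sample this twice to get a rate; a node compacting 3-5x more than its
# peers with no matching write traffic is the ghost-table tell.
for node in NODES:
    print(f"{node}: {compaction_bytes(node):,.0f} bytes read by compaction")
```
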
2️⃣ Check SST File Counts

YBA Path:

  • Metrics → RocksDB → SST Files per Level

🔎 What to Look For

  • L0/L1 files accumulating on affected nodes
  • Files not compacting down

👉 GC is blocked; a per-table breakdown is sketched below
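
The same endpoint can tell you which table the stuck files belong to. Another sketch under the same assumptions (the rocksdb_current_version_num_sst_files metric and its table_id label are things to verify on your version):

```python
import re
import urllib.request
from collections import Counter

NODE = "10.0.0.1"  # hypothetical affected node
# Assumed metric name; confirm against your /prometheus-metrics output.
METRIC = "rocksdb_current_version_num_sst_files"

url = f"http://{NODE}:9000/prometheus-metrics"
body = urllib.request.urlopen(url, timeout=10).read().decode()

# Per-tablet samples carry a table_id label; aggregate SST counts by table.
pat = re.compile(
    rf'^{METRIC}\{{[^}}]*table_id="([^"]+)"[^}}]*\}}\s+(\S+)', re.MULTILINE
)
sst_by_table = Counter()
for table_id, value in pat.findall(body):
    sst_by_table[table_id] += int(float(value))

# A dropped table that still tops this list is your ghost.
for table_id, files in sst_by_table.most_common(5):
    print(f"{table_id}: {files} SST files")
```
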
3️⃣ Inspect Tablet State on Affected Nodes

TServer UI:

  • http://<node-ip>:9000/tablets

🔎 What to Look For

  • Tablets in TOMBSTONED or SHUTDOWN state
  • Still present long after the drop

👉 These are remnants of the dropped table; the script below counts them per node
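
Clicking through every node’s /tablets page gets old fast. A throwaway sketch that simply counts the state strings in the rendered HTML (assuming the page prints them verbatim):

```python
import re
import urllib.request

NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical node IPs

for node in NODES:
    html = urllib.request.urlopen(
        f"http://{node}:9000/tablets", timeout=10).read().decode()
    # Crude but effective: count each state string in the rendered page.
    states = {s: len(re.findall(s, html)) for s in ("TOMBSTONED", "SHUTDOWN")}
    print(f"{node}: {states}")
```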

📊 Signal Summary

| Metric / Location       | Healthy             | Ghost Table Signature              |
|-------------------------|---------------------|------------------------------------|
| Compaction (DocDB)      | Even across nodes   | 3–5× higher on same nodes for days |
| SST File Count          | L0 compacts quickly | L0/L1 growing, not compacting      |
| TServer Tablets (:9000) | Only active tablets | TOMBSTONED tablets still present   |
| Read Latency            | Uniform             | Higher on affected nodes           |

❌ What Teams Usually Think

🚫 Common Misdiagnosis
  • “We have a slow query”
  • “We need better indexes”
  • “This node is underpowered”

✅ What’s Actually Happening

✅ Reality
  • Tombstoned tablets are not fully cleaned up
  • Stale Raft state is blocking GC
  • Compaction is stuck doing unnecessary work

🛠 How to Fix It

🛠️ Recommended Fix
  1. Trigger a manual compaction on the affected TServers (see the sketch below)
  2. If cleanup is still stuck, perform a rolling restart of the affected TServers
👉 The restart clears stale Raft state and allows cleanup to complete
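
For step 1, yb-ts-cli can ask a TServer to compact everything it hosts. A minimal sketch, assuming yb-ts-cli is on your PATH, the default TServer RPC port of 9100, and that your version ships the compact_all_tablets subcommand (check yb-ts-cli help first):

```python
import subprocess

AFFECTED = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical affected nodes

for node in AFFECTED:
    # compact_all_tablets asks the tserver to compact every tablet it hosts;
    # verify the subcommand exists on your version before relying on it.
    cmd = ["yb-ts-cli", f"--server_address={node}:9100", "compact_all_tablets"]
    print("running:", " ".join(cmd))
    subprocess.run(cmd, check=True)
```

For step 2, restart the affected TServers one at a time so the cluster never loses quorum.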

📈 Monitor Recovery

  • Compaction flattens
  • SST files decrease
  • Latency normalizes
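
For a terminal-friendly view of the recovery, a tiny watch loop over the same (assumed) SST-file metric from step 2 does the job:

```python
import re
import time
import urllib.request

NODE = "10.0.0.1"  # hypothetical affected node
# Same assumed metric name as in step 2 of the diagnosis.
PAT = re.compile(
    r'^rocksdb_current_version_num_sst_files\{[^}]*\}\s+(\S+)', re.MULTILINE
)

while True:  # Ctrl-C to stop
    body = urllib.request.urlopen(
        f"http://{NODE}:9000/prometheus-metrics", timeout=10).read().decode()
    total = int(sum(float(v) for v in PAT.findall(body)))
    print(time.strftime("%H:%M:%S"), "SST files:", total)
    time.sleep(60)  # a steadily falling count means cleanup is finally running
```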

🎯 Final Takeaway

If compaction stays elevated long after a table drop, and it’s isolated to specific nodes, stop tuning queries!

You’re not dealing with a workload issue.

You’re dealing with a ghost of a table that hasn’t fully died yet.

Have Fun!

🙌 Acknowledgment
Special thanks to Dan Farrell, Senior Pre-Sales Engineer at YugabyteDB, for providing the detailed insights that helped shape this tip.
This is Maple, our daughter’s Golden Retriever, in full play mode.