Why Compaction Never Settles After Dropping a Table (The “Ghost Table” Problem)

🚨 The Symptom

You dropped a large table… days (or weeks) ago.

But something feels off:

  • 🔥 Compaction is still elevated on a subset of nodes
  • 🐢 Query latency is higher on those same nodes
  • 🤔 Nothing in your workload explains it

So naturally, you start tuning queries… but nothing improves.

⚠️ Important
If only some nodes are slow, and it’s always the same ones, you’re likely not dealing with a query problem.

😵 Why This Fools Engineers

This issue is incredibly deceptive because it looks like a workload problem, but it’s not.

You’ll often see:

  • Only some nodes affected
  • No obvious spike in writes
  • No slow-query smoking gun

So teams:

  • Add indexes
  • Rewrite queries
  • Scale nodes

Meanwhile, the real issue is sitting quietly in the storage layer.

🧩 What’s Actually Happening

When you drop a table in YugabyteDB:

  • 👉 Its tablets are marked for deletion (tombstoned)
  • 👉 Background compaction is responsible for cleaning them up

But…

🧠 Key Insight
If a stale Raft leader still exists for those tablets, cleanup can stall, leaving behind SST files that continue to impact performance.

That leaves behind:

  • Orphaned SST files
  • Blocked tombstone GC
  • Persistent compaction pressure

🔬 Real-World Scenario

📦 Scenario

  • 6-node cluster (RF=3)
  • Dropped a ~200GB table 2 weeks ago
  • 3 nodes show constant compaction + higher latency
  • Workload unchanged

This is the classic “ghost table” signature.

🔍 Step-by-Step Diagnosis in YBA

1️⃣ Compare Compaction Across Nodes

YBA Path:

  • Universe → Metrics → DocDB → Compaction

🔎 What to Look For

  • Same 2–3 nodes consistently higher
  • Pattern is steady, not bursty

👉 This rules out a write spike immediately; a quick cross-node check is sketched below
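
Prefer raw numbers? Each TServer’s web UI also serves a Prometheus endpoint you can scrape directly. A minimal sketch, assuming the default :9000 web port, placeholder node IPs, and that your build exposes a rocksdb_compact_read_bytes counter (metric names vary across YugabyteDB versions, so verify against your own /prometheus-metrics output):

```python
import re
import urllib.request

NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]   # hypothetical node IPs
METRIC = "rocksdb_compact_read_bytes"          # assumed name; verify on your build

def compaction_bytes(node: str) -> float:
    """Sum the per-tablet compaction counter exposed by one tserver."""
    url = f"http://{node}:9000/prometheus-metrics"
    body = urllib.request.urlopen(url, timeout=10).read().decode()
    # Prometheus lines look like: metric{labels} <value> [<timestamp>]
    pat = re.compile(rf"^{METRIC}\{{[^}}]*\}}\s+(\S+)", re.MULTILINE)
    return sum(float(v) for v in pat.findall(body))

# Sample this twice to get a rate; a node compacting 3-5x more than its
# peers with no matching write traffic is the ghost-table tell.
for node in NODES:
    print(f"{node}: {compaction_bytes(node):,.0f} bytes read by compaction")
```
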
2️⃣ Check SST File Counts

YBA Path:

  • Metrics → RocksDB → SST Files per Level

🔎 What to Look For

  • L0/L1 files accumulating on affected nodes
  • Files not compacting down

👉 GC is blocked; a per-table breakdown is sketched below
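
The same endpoint can tell you which table the stuck files belong to. Another sketch under the same assumptions (the rocksdb_current_version_num_sst_files metric and its table_id label are things to verify on your version):

```python
import re
import urllib.request
from collections import Counter

NODE = "10.0.0.1"  # hypothetical affected node
# Assumed metric name; confirm against your /prometheus-metrics output.
METRIC = "rocksdb_current_version_num_sst_files"

url = f"http://{NODE}:9000/prometheus-metrics"
body = urllib.request.urlopen(url, timeout=10).read().decode()

# Per-tablet samples carry a table_id label; aggregate SST counts by table.
pat = re.compile(
    rf'^{METRIC}\{{[^}}]*table_id="([^"]+)"[^}}]*\}}\s+(\S+)', re.MULTILINE
)
sst_by_table = Counter()
for table_id, value in pat.findall(body):
    sst_by_table[table_id] += int(float(value))

# A dropped table that still tops this list is your ghost.
for table_id, files in sst_by_table.most_common(5):
    print(f"{table_id}: {files} SST files")
```
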
3️⃣ Inspect Tablet State on Affected Nodes

TServer UI:

  • http://<node-ip>:9000/tablets

🔎 What to Look For

  • Tablets in TOMBSTONED or SHUTDOWN state
  • Still present long after the drop

👉 These are remnants of the dropped table; the script below counts them per node
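
Clicking through every node’s /tablets page gets old fast. A throwaway sketch that simply counts the state strings in the rendered HTML (assuming the page prints them verbatim):

```python
import re
import urllib.request

NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical node IPs

for node in NODES:
    html = urllib.request.urlopen(
        f"http://{node}:9000/tablets", timeout=10).read().decode()
    # Crude but effective: count each state string in the rendered page.
    states = {s: len(re.findall(s, html)) for s in ("TOMBSTONED", "SHUTDOWN")}
    print(f"{node}: {states}")
```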

📊 Signal Summary

| Metric / Location       | Healthy             | Ghost Table Signature              |
|-------------------------|---------------------|------------------------------------|
| Compaction (DocDB)      | Even across nodes   | 3–5× higher on same nodes for days |
| SST File Count          | L0 compacts quickly | L0/L1 growing, not compacting      |
| TServer Tablets (:9000) | Only active tablets | TOMBSTONED tablets still present   |
| Read Latency            | Uniform             | Higher on affected nodes           |

❌ What Teams Usually Think

🚫 Common Misdiagnosis
  • “We have a slow query”
  • “We need better indexes”
  • “This node is underpowered”

✅ What’s Actually Happening

✅ Reality
  • Tombstoned tablets are not fully cleaned up
  • Stale Raft state is blocking GC
  • Compaction is stuck doing unnecessary work

🛠 How to Fix It

🛠️ Recommended Fix
  1. Trigger a manual compaction on the affected TServers (see the sketch below)
  2. If cleanup is still stuck, perform a rolling restart of the affected TServers
👉 The restart clears stale Raft state and allows cleanup to complete
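
For step 1, yb-ts-cli can ask a TServer to compact everything it hosts. A minimal sketch, assuming yb-ts-cli is on your PATH, the default TServer RPC port of 9100, and that your version ships the compact_all_tablets subcommand (check yb-ts-cli help first):

```python
import subprocess

AFFECTED = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical affected nodes

for node in AFFECTED:
    # compact_all_tablets asks the tserver to compact every tablet it hosts;
    # verify the subcommand exists on your version before relying on it.
    cmd = ["yb-ts-cli", f"--server_address={node}:9100", "compact_all_tablets"]
    print("running:", " ".join(cmd))
    subprocess.run(cmd, check=True)
```

For step 2, restart the affected TServers one at a time so the cluster never loses quorum.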

📈 Monitor Recovery

  • Compaction flattens
  • SST files decrease
  • Latency normalizes
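
For a terminal-friendly view of the recovery, a tiny watch loop over the same (assumed) SST-file metric from step 2 does the job:

```python
import re
import time
import urllib.request

NODE = "10.0.0.1"  # hypothetical affected node
# Same assumed metric name as in step 2 of the diagnosis.
PAT = re.compile(
    r'^rocksdb_current_version_num_sst_files\{[^}]*\}\s+(\S+)', re.MULTILINE
)

while True:  # Ctrl-C to stop
    body = urllib.request.urlopen(
        f"http://{NODE}:9000/prometheus-metrics", timeout=10).read().decode()
    total = int(sum(float(v) for v in PAT.findall(body)))
    print(time.strftime("%H:%M:%S"), "SST files:", total)
    time.sleep(60)  # a steadily falling count means cleanup is finally running
```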

🎯 Final Takeaway

If compaction stays elevated long after a table drop, and it’s isolated to specific nodes, stop tuning queries!

You’re not dealing with a workload issue.

You’re dealing with a ghost of a table that hasn’t fully died yet.

Have Fun!

🙌 Acknowledgment
Special thanks to Dan Farrell, Senior Pre-Sales Engineer at YugabyteDB, for providing the detailed insights that helped shape this tip.
This is Maple, our daughter’s Golden Retriever, in full play mode.