Why Your Database Latency Spikes at Exactly 9:15am (It’s Probably Not the Query)

👻 “Nothing Changed… So Why Is It Slow?”

Every day at exactly 9:15am, your application slows down.

● Latency jumps from 2ms → 400ms
● It lasts ~90 seconds
● Then everything goes back to normal

You check:

● CPU? ✅ Normal
● Disk? ✅ Normal
● Memory? ✅ Normal
● Query plans? ✅ No change

So… what changed?

👉 Nothing inside the database.

That’s what makes this problem so frustrating… and so easy to misdiagnose.

🎯 Key Insight
If latency spikes but CPU, disk, and ops/sec stay flat, your queries are probably not slow. They are waiting for a connection. This may be the Connection Pool Cliff: a burst of requests overwhelms available connections, and requests queue before execution even begins.

At this point, the key question becomes:

If the database isn’t busy… where is the latency coming from?

🧠 The Real Problem (That Doesn’t Look Like One)

This pattern comes straight out of a real YugabyteDB diagnostic scenario:

● Latency spikes to ~400ms
● Happens at the same time every day
● Lasts a short, predictable window
● All database metrics remain flat

From the YBA diagnostic breakdown:

● YSQL ops/sec drops or flattens
● Total connections hit the ceiling
● Latency rises while the database is mostly idle

👉 That combination is the tell:

Requests are waiting to run, not running slowly.

⚠️ Why This Gets Misdiagnosed

⚠️ Common Trap
Most teams assume:
“Latency went up → queries got slower”
But in this case, the queries haven’t even started yet.

The time you’re seeing includes:

1. Waiting for a connection
2. Waiting in a pool or queue
3. Only then executing SQL

Monitoring tools often blur these together.
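One way to un-blur them is to measure wall-clock time around the entire client call and compare it with the server-reported execution time (`\timing`). A minimal sketch; `sleep 0.2` stands in for the `ysqlsh` invocation so the snippet runs anywhere:

```shell
# Wall-clock timing captures everything the client experiences:
# connection setup + pool/queue wait + SQL execution.
# 'sleep 0.2' is a stand-in here for: ysqlsh -c "SELECT now();"
start_ns=$(date +%s%N)
sleep 0.2
end_ns=$(date +%s%N)
echo "client-observed latency: $(( (end_ns - start_ns) / 1000000 )) ms"
```

If this client-side number is far above what `\timing` reports, the gap is time spent waiting, not working.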

📊 The Signature Pattern in YBA

1️⃣ Ops/sec vs Connections
● Ops/sec → flat or drops
● Connections → spike to max

👉 If the DB were the bottleneck, ops/sec would rise… not fall.

2️⃣ Latency vs Node Resources
● Latency → spikes sharply
● CPU / Disk / Memory → flat

👉 This is the smoking gun:

If resources don’t move, the database isn’t struggling.

3️⃣ (Optional) YCM Wait Signals

If using YSQL Connection Manager:

● Logical connections queue up
● Queue drains in ~90 seconds
● Matches the latency spike exactly

👉 That’s direct proof:

You’re looking at connection throttling, not slow queries.

🧪 Demo: Reproducing the Connection Pool Cliff

For this demo, we’ll do two things:

1. Create connection pressure
2. Show that even a trivial query appears slower

The goal is not to run a heavy query.

The goal is to prove:

Latency can increase even when the database is doing almost no work.

🔧 Step 1: Establish a Baseline

First, let’s measure a trivial query with no load:

    ysqlsh -c "\timing on" -c "SELECT now();"

Example result:

    Time: 5.687 ms

👉 This is our baseline.

🔍 Step 2: Create Connection Pressure

Now we simulate a burst of concurrent sessions.

⚠️ Note: This is a lab demo. Do not run this in production.

    # Open enough sessions to consume ~80% of max_connections.
    MAX_CONN=$(ysqlsh -At -c "SHOW max_connections;")
    TARGET=$((MAX_CONN * 80 / 100))

    echo "Opening $TARGET sessions..."

    # Each session opens a transaction and sleeps, pinning a connection for 60s.
    for i in $(seq 1 $TARGET); do
      ysqlsh -c "BEGIN; SELECT pg_sleep(60);" >/dev/null 2>&1 &
    done

This creates a large number of idle sessions holding connections open.

🔧 Step 3: Run the Same Trivial Query Again

Now rerun the exact same query:

    ysqlsh -c "\timing on" -c "SELECT now();"

Example result under connection pressure:

    Time: 94.843 ms

🔍 The Smoking Gun
SELECT now() did not become expensive.

The added latency comes from connection pressure, not query execution.

🔍 Step 4: Validate What’s Actually Happening

Let’s confirm that the database is not actually busy.

Check session states:

    SELECT state, count(*)
    FROM pg_stat_activity
    GROUP BY state
    ORDER BY state;

You’ll typically see:

● Many sessions in idle in transaction
● Very few actively executing queries

⚠️ Important Clarification
⚠️ This Is a Simplified Reproduction
In real systems, this problem usually appears before hitting max_connections.

The actual bottleneck is often the application connection pool, not the database limit itself.

🧠 Mapping This to the Real World

In production, the same pattern looks like this:

● Application pool size = 50
● Incoming requests = 500
● 450 requests wait

👉 The database may still have capacity
👉 But requests are queued before execution
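The arithmetic behind that queue is easy to sketch. Using the pool and burst sizes above, plus an assumed 20 ms service time per request (an illustrative number, not from the source), requests are served in waves of pool-size, and the last wave pays for every wave before it:

```shell
# Back-of-the-envelope queue wait for a burst against a fixed pool.
# POOL and BURST come from the example above; EXEC_MS is an assumed service time.
POOL=50; BURST=500; EXEC_MS=20
WAVES=$(( (BURST + POOL - 1) / POOL ))        # the burst is served in waves of POOL
WORST_WAIT_MS=$(( (WAVES - 1) * EXEC_MS ))    # the last wave waits for all prior waves
echo "waves=$WAVES worst-case wait=${WORST_WAIT_MS} ms on a ${EXEC_MS} ms query"
```

A 20 ms query shows up as a ~200 ms request at the tail of the burst, with the database barely working.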

🧠 The Mental Model

Think of this like a restaurant:

● Tables = connections
● Customers = requests

At 9:15am:

● All tables are full
● Customers wait

👉 The kitchen (database) is fine
👉 The bottleneck is getting seated
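The restaurant can be simulated in a few lines of bash: each "meal" takes one second, the "kitchen" never slows down, yet with 9 customers and 3 tables the last customers wait through two full seatings. Plain `sleep` stands in for the query (requires bash 4.3+ for `wait -n`):

```shell
#!/usr/bin/env bash
# 3 "tables" (connections), 9 "customers" (requests), each "meal" (query) = 1s.
TABLES=3; CUSTOMERS=9; MEAL_S=1
running=0
start=$(date +%s)
for i in $(seq 1 "$CUSTOMERS"); do
  if [ "$running" -ge "$TABLES" ]; then
    wait -n                      # block until a table frees up (bash 4.3+)
    running=$((running - 1))
  fi
  sleep "$MEAL_S" &              # the "meal": trivial work that holds a slot
  running=$((running + 1))
done
wait
elapsed=$(( $(date +%s) - start ))
echo "served $CUSTOMERS customers in ${elapsed}s (each meal only took ${MEAL_S}s)"
```

Every individual "query" still takes exactly one second; only the wait to be seated grows.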

📈 YBA Signal Summary

Metric        | Healthy             | Connection Pool Cliff
YSQL Ops/sec  | Tracks workload     | Flat or drops
Connections   | Stable              | Spike to max
CPU / Disk    | Increase with load  | Flat
Latency       | Low                 | Sharp spike

🛠️ Fixing the Problem

✅ Pre-Warm the Pool
● Open connections before peak traffic
● Avoid burst creation

✅ Use YSQL Connection Manager (YCM)
● Multiplex connections
● Smooth spikes
● Prevent exhaustion

✅ Tune Application Behavior
● Avoid “connect on demand” at peak
● Right-size pools
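"Right-size" has a concrete starting point: Little's law, which says steady-state concurrency ≈ arrival rate × mean service time. A hedged sketch with assumed illustrative numbers (2000 req/s at 5 ms each; substitute your own measurements):

```shell
# Little's law: concurrent connections ~= request rate (req/s) x service time (s).
# RATE and SVC_MS are assumed illustrative numbers, not recommendations.
RATE=2000; SVC_MS=5
NEED=$(( RATE * SVC_MS / 1000 ))
echo "steady-state concurrency ~= ${NEED} connections (plus headroom for bursts)"
```

A pool sized this way, plus burst headroom, keeps connections warm without creeping toward max_connections.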

👉 Next Step

This tip shows how to diagnose connection bottlenecks.

The next question is: How many connections should you actually have?

 

👉 Check out my related tip on Right-Sizing Connections in YugabyteDB with YSQL Connection Manager (YCM) to avoid creating this problem in the first place.

🎯 Final Takeaway

When latency spikes, don’t assume queries are slow.

Ask:

● Are connections maxed out?
● Are ops/sec dropping?
● Are resources flat?

If so…

👉 Your database isn’t slow.
👉 Your requests are just waiting their turn.

Have Fun!

MAR 29: First documented use of the word “database” in a professional setting… and coincidentally, the birthday of one of my favorite AEs at YugabyteDB. Not saying there’s a connection... but both have been driving transactions ever since.