Why Your Database Latency Spikes at Exactly 9:15am (It’s Probably Not the Query)

👻 “Nothing Changed… So Why Is It Slow?”

Every day at exactly 9:15am, your application slows down.

● Latency jumps from 2ms → 400ms
● It lasts ~90 seconds
● Then everything goes back to normal

You check:

● CPU? ✅ Normal
● Disk? ✅ Normal
● Memory? ✅ Normal
● Query plans? ✅ No change

So… what changed?

👉 Nothing inside the database.

That’s what makes this problem so frustrating… and so easy to misdiagnose.

🎯 Key Insight
If latency spikes but CPU, disk, and ops/sec stay flat, your queries are probably not slow. They are waiting for a connection. This may be the Connection Pool Cliff: a burst of requests overwhelms available connections, and requests queue before execution even begins.

At this point, the key question becomes:

If the database isn’t busy… where is the latency coming from?

🧠 The Real Problem (That Doesn’t Look Like One)

This pattern comes straight out of a real YugabyteDB diagnostic scenario:

● Latency spikes to ~400ms
● Happens at the same time every day
● Lasts a short, predictable window
● All database metrics remain flat

From the YBA diagnostic breakdown:

● YSQL ops/sec drops or flattens
● Total connections hit the ceiling
● Latency rises while the database is mostly idle

👉 That combination is the tell:

Requests are waiting to run, not running slowly.

⚠️ Why This Gets Misdiagnosed

⚠️ Common Trap
Most teams assume:
“Latency went up → queries got slower”
But in this case, the queries haven’t even started yet.

The time you’re seeing includes:

1. Waiting for a connection
2. Waiting in a pool or queue
3. Only then executing SQL

Monitoring tools often blur these together.
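One way to un-blur them is to measure wall-clock time around the entire client call and compare it with the server-reported execution time (`\timing`). A minimal sketch; `sleep 0.2` stands in for the `ysqlsh` invocation so the snippet runs anywhere:

```shell
# Wall-clock timing captures everything the client experiences:
# connection setup + pool/queue wait + SQL execution.
# 'sleep 0.2' is a stand-in here for: ysqlsh -c "SELECT now();"
start_ns=$(date +%s%N)
sleep 0.2
end_ns=$(date +%s%N)
echo "client-observed latency: $(( (end_ns - start_ns) / 1000000 )) ms"
```

If this client-side number is far above what `\timing` reports, the gap is time spent waiting, not working.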

📊 The Signature Pattern in YBA

1️⃣ Ops/sec vs Connections
● Ops/sec → flat or drops
● Connections → spike to max

👉 If the DB were the bottleneck, ops/sec would rise… not fall.

2️⃣ Latency vs Node Resources
● Latency → spikes sharply
● CPU / Disk / Memory → flat

👉 This is the smoking gun:

If resources don’t move, the database isn’t struggling.

3️⃣ (Optional) YCM Wait Signals

If using YSQL Connection Manager:

● Logical connections queue up
● Queue drains in ~90 seconds
● Matches the latency spike exactly

👉 That’s direct proof:

You’re looking at connection throttling, not slow queries.

🧪 Demo: Reproducing the Connection Pool Cliff

For this demo, we’ll do two things:

1. Create connection pressure
2. Show that even a trivial query appears slower

The goal is not to run a heavy query.

The goal is to prove:

Latency can increase even when the database is doing almost no work.

🔧 Step 1: Establish a Baseline

First, let’s measure a trivial query with no load:

    ysqlsh -c "\timing on" -c "SELECT now();"

Example result:

    Time: 5.687 ms

👉 This is our baseline.

🔍 Step 2: Create Connection Pressure

Now we simulate a burst of concurrent sessions.

⚠️ Note: This is a lab demo. Do not run this in production.

    # Open enough sessions to consume ~80% of max_connections.
    MAX_CONN=$(ysqlsh -At -c "SHOW max_connections;")
    TARGET=$((MAX_CONN * 80 / 100))

    echo "Opening $TARGET sessions..."

    # Each session opens a transaction and sleeps, pinning a connection for 60s.
    for i in $(seq 1 $TARGET); do
      ysqlsh -c "BEGIN; SELECT pg_sleep(60);" >/dev/null 2>&1 &
    done

This creates a large number of idle sessions holding connections open.

🔧 Step 3: Run the Same Trivial Query Again

Now rerun the exact same query:

    ysqlsh -c "\timing on" -c "SELECT now();"

Example result under connection pressure:

    Time: 94.843 ms

🔍 The Smoking Gun
SELECT now() did not become expensive.

The added latency comes from connection pressure, not query execution.

🔍 Step 4: Validate What’s Actually Happening

Let’s confirm that the database is not actually busy.

Check session states:

    SELECT state, count(*)
    FROM pg_stat_activity
    GROUP BY state
    ORDER BY state;

You’ll typically see:

● Many sessions in idle in transaction
● Very few actively executing queries

⚠️ Important Clarification
⚠️ This Is a Simplified Reproduction
In real systems, this problem usually appears before hitting max_connections.

The actual bottleneck is often the application connection pool, not the database limit itself.

🧠 Mapping This to the Real World

In production, the same pattern looks like this:

● Application pool size = 50
● Incoming requests = 500
● 450 requests wait

👉 The database may still have capacity
👉 But requests are queued before execution
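The arithmetic behind that queue is easy to sketch. Using the pool and burst sizes above, plus an assumed 20 ms service time per request (an illustrative number, not from the source), requests are served in waves of pool-size, and the last wave pays for every wave before it:

```shell
# Back-of-the-envelope queue wait for a burst against a fixed pool.
# POOL and BURST come from the example above; EXEC_MS is an assumed service time.
POOL=50; BURST=500; EXEC_MS=20
WAVES=$(( (BURST + POOL - 1) / POOL ))        # the burst is served in waves of POOL
WORST_WAIT_MS=$(( (WAVES - 1) * EXEC_MS ))    # the last wave waits for all prior waves
echo "waves=$WAVES worst-case wait=${WORST_WAIT_MS} ms on a ${EXEC_MS} ms query"
```

A 20 ms query shows up as a ~200 ms request at the tail of the burst, with the database barely working.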

🧠 The Mental Model

Think of this like a restaurant:

● Tables = connections
● Customers = requests

At 9:15am:

● All tables are full
● Customers wait

👉 The kitchen (database) is fine
👉 The bottleneck is getting seated
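The restaurant can be simulated in a few lines of bash: each "meal" takes one second, the "kitchen" never slows down, yet with 9 customers and 3 tables the last customers wait through two full seatings. Plain `sleep` stands in for the query (requires bash 4.3+ for `wait -n`):

```shell
#!/usr/bin/env bash
# 3 "tables" (connections), 9 "customers" (requests), each "meal" (query) = 1s.
TABLES=3; CUSTOMERS=9; MEAL_S=1
running=0
start=$(date +%s)
for i in $(seq 1 "$CUSTOMERS"); do
  if [ "$running" -ge "$TABLES" ]; then
    wait -n                      # block until a table frees up (bash 4.3+)
    running=$((running - 1))
  fi
  sleep "$MEAL_S" &              # the "meal": trivial work that holds a slot
  running=$((running + 1))
done
wait
elapsed=$(( $(date +%s) - start ))
echo "served $CUSTOMERS customers in ${elapsed}s (each meal only took ${MEAL_S}s)"
```

Every individual "query" still takes exactly one second; only the wait to be seated grows.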

📈 YBA Signal Summary

Metric        | Healthy             | Connection Pool Cliff
YSQL Ops/sec  | Tracks workload     | Flat or drops
Connections   | Stable              | Spike to max
CPU / Disk    | Increase with load  | Flat
Latency       | Low                 | Sharp spike

🛠️ Fixing the Problem

✅ Pre-Warm the Pool
● Open connections before peak traffic
● Avoid burst creation

✅ Use YSQL Connection Manager (YCM)
● Multiplex connections
● Smooth spikes
● Prevent exhaustion

✅ Tune Application Behavior
● Avoid “connect on demand” at peak
● Right-size pools
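"Right-size" has a concrete starting point: Little's law, which says steady-state concurrency ≈ arrival rate × mean service time. A hedged sketch with assumed illustrative numbers (2000 req/s at 5 ms each; substitute your own measurements):

```shell
# Little's law: concurrent connections ~= request rate (req/s) x service time (s).
# RATE and SVC_MS are assumed illustrative numbers, not recommendations.
RATE=2000; SVC_MS=5
NEED=$(( RATE * SVC_MS / 1000 ))
echo "steady-state concurrency ~= ${NEED} connections (plus headroom for bursts)"
```

A pool sized this way, plus burst headroom, keeps connections warm without creeping toward max_connections.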

👉 Next Step

This tip shows how to diagnose connection bottlenecks.

The next question is: How many connections should you actually have?

 

👉 Check out my related tip on Right-Sizing Connections in YugabyteDB with YSQL Connection Manager (YCM) to avoid creating this problem in the first place.

🎯 Final Takeaway

When latency spikes, don’t assume queries are slow.

Ask:

● Are connections maxed out?
● Are ops/sec dropping?
● Are resources flat?

If so…

👉 Your database isn’t slow.
👉 Your requests are just waiting their turn.

Have Fun!

MAR 29: First documented use of the word “database” in a professional setting… and coincidentally, the birthday of one of my favorite AEs at YugabyteDB. Not saying there’s a connection... but both have been driving transactions ever since.