Using new DocDB metrics in pg_stat_statements to tune YSQL workloads

YugabyteDB 2025.2 introduces a powerful set of DocDB-aware columns in pg_stat_statements. These metrics finally let you answer a long-standing question:

  • β€œWhat is this SQL query actually doing to DocDB?”

Instead of inferring storage behavior from latency alone, you can now see read/write RPCs, operations, rows scanned, and wait time per SQL statement, all in one place.

This tip focuses on how these new columns help with tuning, capacity planning, and performance troubleshooting, using a small INSERT/SELECT demo only as an illustration.

🧠 Why this matters (the real point of the feature)

Historically, tuning YugabyteDB required correlating:

  • SQL-level views (pg_stat_statements)

  • Node metrics (Prometheus)

  • Logs

  • Mental models of DocDB behavior

With the new columns, a single query now tells a cross-layer story:

  SQL → YSQL execution → DocDB RPCs → DocDB operations → rows scanned/returned

That unlocks entirely new tuning workflows.

🧰 New DocDB columns (quick refresher)

The new fields added to pg_stat_statements include:

  • docdb_read_rpcs

  • docdb_write_rpcs

  • docdb_read_operations

  • docdb_write_operations

  • docdb_rows_scanned

  • docdb_rows_returned

  • docdb_wait_time

  • catalog_wait_time

Think of them as storage-level counters attributed to a SQL fingerprint.
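
If you want to confirm which of these columns your build exposes before relying on them, you can inspect the view's attributes directly. The column names below are the ones listed above; adjust the filter if your version differs:

```sql
-- List the DocDB-aware columns exposed by this build's pg_stat_statements view
SELECT attname
FROM pg_attribute
WHERE attrelid = 'pg_stat_statements'::regclass
  AND (attname LIKE 'docdb%' OR attname = 'catalog_wait_time');
```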

🎯 Practical tuning scenarios this enables
1️⃣ Identifying RPC-heavy queries (chatty workloads)

Two queries may have similar latency but very different storage footprints.

SELECT
  query,
  docdb_read_rpcs,
  docdb_read_operations,
  total_exec_time
FROM pg_stat_statements
ORDER BY docdb_read_rpcs DESC
LIMIT 10;

What this tells you

  • Queries with high docdb_*_rpcs but low rows returned are often:

    ○ Poorly batched

    ○ Executed in tight loops

    ○ Sensitive to latency amplification

Tuning levers

  • Client-side batching

  • Server-side batching (YSQL session settings)

  • Query refactoring to reduce round trips
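
One way to make chattiness explicit is to derive an operations-per-RPC ratio from the columns above. A ratio near 1 on a query that touches many rows suggests row-at-a-time access; the exact threshold is workload-dependent, not a hard rule:

```sql
-- Read operations per read RPC: low ratios hint at chatty access patterns
SELECT
  query,
  docdb_read_rpcs,
  docdb_read_operations,
  round(docdb_read_operations::numeric / NULLIF(docdb_read_rpcs, 0), 1) AS ops_per_rpc
FROM pg_stat_statements
WHERE docdb_read_rpcs > 0
ORDER BY ops_per_rpc ASC
LIMIT 10;
```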

2️⃣ Distinguishing β€œCPU slow” vs β€œstorage slow”

Latency alone doesn’t tell you where time is spent.

SELECT
  query,
  total_exec_time,
  docdb_wait_time
FROM pg_stat_statements
ORDER BY docdb_wait_time DESC;

Interpretation

  • High total_exec_time, low docdb_wait_time → CPU / executor / planning cost

  • High docdb_wait_time → storage-bound (network, Raft, disk, compaction pressure)

This helps avoid blind tuning:

  • Don’t add indexes if you’re network-bound

  • Don’t tune batching if the executor dominates
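
A variation that makes the split explicit is computing the storage fraction of total time. This sketch assumes total_exec_time and docdb_wait_time are reported in the same unit (milliseconds); verify that against your version's documentation before comparing them directly:

```sql
-- Fraction of statement time spent waiting on DocDB (assumes matching units)
SELECT
  query,
  total_exec_time,
  docdb_wait_time,
  round((docdb_wait_time / NULLIF(total_exec_time, 0))::numeric, 2) AS storage_fraction
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```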

3️⃣ Catching accidental full or wide scans

This one is huge.

SELECT
  query,
  docdb_rows_scanned,
  docdb_rows_returned
FROM pg_stat_statements
ORDER BY docdb_rows_scanned DESC;

Red flags

  • rows_scanned ≫ rows_returned

  • Especially painful on:

    ○ Large partitioned tables

    ○ Geo / time-series / multi-tenant schemas

What to do

  • Add or fix indexes

  • Revisit predicates

  • Check hash vs range key design

  • Validate partition pruning
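
To rank offenders, a selectivity ratio works well: rows returned divided by rows scanned, where values far below 1 mark the scans worth investigating first. A sketch:

```sql
-- Scan selectivity per statement: the closer to 0, the more wasted scanning
SELECT
  query,
  docdb_rows_scanned,
  docdb_rows_returned,
  round(docdb_rows_returned::numeric / NULLIF(docdb_rows_scanned, 0), 4) AS selectivity
FROM pg_stat_statements
WHERE docdb_rows_scanned > 0
ORDER BY selectivity ASC
LIMIT 10;
```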

4️⃣ Understanding write amplification at the SQL layer

One of the most valuable uses of the new DocDB columns is understanding write amplification caused by SQL execution behavior, not just data volume.

A simple experiment makes this visible.

CREATE TABLE test(c1 INT PRIMARY KEY);

INSERT INTO test SELECT generate_series(1, 3072);

SELECT
  docdb_read_rpcs,
  docdb_write_rpcs,
  docdb_read_operations,
  docdb_write_operations,
  docdb_rows_scanned,
  docdb_rows_returned,
  docdb_wait_time
FROM pg_stat_statements
WHERE query = 'INSERT INTO test SELECT generate_series($1, $2)';
yugabyte=# SELECT
yugabyte-#   docdb_read_rpcs,
yugabyte-#   docdb_write_rpcs,
yugabyte-#   docdb_read_operations,
yugabyte-#   docdb_write_operations,
yugabyte-#   docdb_rows_scanned,
yugabyte-#   docdb_rows_returned,
yugabyte-#   docdb_wait_time
yugabyte-# FROM pg_stat_statements
yugabyte-# WHERE query = 'INSERT INTO test SELECT generate_series($1, $2)';
 docdb_read_rpcs | docdb_write_rpcs | docdb_read_operations | docdb_write_operations | docdb_rows_scanned | docdb_rows_returned | docdb_wait_time
-----------------+------------------+-----------------------+------------------------+--------------------+---------------------+-----------------
               0 |                1 |                     0 |                   3072 |                  0 |                   0 |               0
(1 row)

At this point, everything looks straightforward:

  • One SQL statement

  • One DocDB write RPC

  • 3072 write operations
Reset and try again:

TRUNCATE TABLE test;
SELECT pg_stat_statements_reset();

INSERT INTO test SELECT generate_series(1, 3073);

SELECT
  docdb_read_rpcs,
  docdb_write_rpcs,
  docdb_read_operations,
  docdb_write_operations,
  docdb_rows_scanned,
  docdb_rows_returned,
  docdb_wait_time
FROM pg_stat_statements
WHERE query = 'INSERT INTO test SELECT generate_series($1, $2)';
yugabyte=# SELECT
yugabyte-#   docdb_read_rpcs,
yugabyte-#   docdb_write_rpcs,
yugabyte-#   docdb_read_operations,
yugabyte-#   docdb_write_operations,
yugabyte-#   docdb_rows_scanned,
yugabyte-#   docdb_rows_returned,
yugabyte-#   docdb_wait_time
yugabyte-# FROM pg_stat_statements
yugabyte-# WHERE query = 'INSERT INTO test SELECT generate_series($1, $2)';
 docdb_read_rpcs | docdb_write_rpcs | docdb_read_operations | docdb_write_operations | docdb_rows_scanned | docdb_rows_returned | docdb_wait_time
-----------------+------------------+-----------------------+------------------------+--------------------+---------------------+-----------------
               0 |                2 |                     0 |                   3073 |                  0 |                   0 |               0
(1 row)

With a single extra row, docdb_write_rpcs jumps from 1 β†’ 2.

What actually happened?

This behavior is expected once you know that YSQL buffers writes up to ysql_session_max_batch_size (default 3072) before flushing them to DocDB.

  • 3072 rows → fit in one batch → one write RPC

  • 3073 rows → spill into a second batch → two write RPCs

The important lesson here is not the magic number 3072.

It’s this:

  • A single SQL statement can generate multiple DocDB write RPCs, and pg_stat_statements now shows you exactly when that happens.

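
You can check the batching explanation yourself by lowering the buffer size for the session (ysql_session_max_batch_size is settable per session in recent versions; verify against your build) and repeating the experiment:

```sql
-- Shrink the write buffer so the same INSERT needs more flushes
SET ysql_session_max_batch_size = 1024;
-- Re-running the 3072-row INSERT should now produce three write RPCs
-- (three batches of 1024) instead of one.
```
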
5️⃣ Distinguishing catalog pressure from data pressure

The new catalog_wait_time column adds another important dimension.

				
SELECT
  query,
  catalog_wait_time,
  docdb_wait_time
FROM pg_stat_statements
ORDER BY catalog_wait_time DESC;
			

High catalog wait often points to:

  • Heavy DDL churn

  • Frequent plan invalidations

  • Metadata contention in highly concurrent schemas

This explains performance dips that aren’t tied to data volume.

🧩 A simple mental model

When reading these metrics, think in layers:

SQL statement
  → DocDB RPCs
    → DocDB operations
      → Rows scanned / returned
        → Wait time

Any surprise at one layer usually becomes obvious at the next.
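
A single query can walk all of these layers at once; this is just a convenience view over the columns already discussed:

```sql
-- One row per statement, ordered by total time, covering every layer
SELECT
  query,
  calls,
  docdb_read_rpcs + docdb_write_rpcs             AS total_rpcs,
  docdb_read_operations + docdb_write_operations AS total_ops,
  docdb_rows_scanned,
  docdb_rows_returned,
  docdb_wait_time,
  total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```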

🏁 Summary

The new DocDB columns in pg_stat_statements are not about edge cases or trivia.

They fundamentally change how you tune YugabyteDB by:

  • Bridging SQL and storage visibility

  • Making RPC and operation cost explicit

  • Exposing scan inefficiencies

  • Clarifying where time is really spent

Once you start tuning with these metrics, it’s hard to imagine going back.

Have Fun!

Two bald eagles, up close and unforgettable πŸ¦…πŸ¦…. I’ve lived in Pittsburgh my whole life, and somehow this was my first visit to the incredible National Aviary... can’t believe I waited this long.