Automatically keeping table statistics up to date for better query plans

🌟 Introduction

In a distributed SQL engine such as YugabyteDB, query performance hinges on the optimizer’s ability to choose the right execution plan. To do that, the cost-based optimizer (CBO) needs accurate, up-to-date table statistics (row counts, value distributions, etc.).

Traditionally, you’d run ANALYZE manually after major loads or schema changes. But starting in v2025.1 (STS) YugabyteDB introduces the Auto Analyze service to automate that housekeeping step, freeing you from needing a cron job or manual schedule, and helping ensure the CBO always has current statistics.

In this post we’ll walk through:

● What Auto Analyze is and why it matters
● How to enable and configure it
● A quick demo showing it in action
● Some caveats and best practices
● Summary and key takeaways

🔍 What is Auto Analyze?

Auto Analyze is a background service built into YugabyteDB that continuously monitors table changes (INSERT, UPDATE, DELETE) and automatically runs ANALYZE when enough mutations occur.

This ensures:

● Statistics stay up-to-date
● The CBO has accurate cardinality estimates
● Query plans remain efficient
● You reduce operational overhead (no more manual scheduling or cron jobs)

How It Works

Auto Analyze maintains a mutation counter for each table. When the number of mutations exceeds:

				
					threshold + (scale_factor * table_size)

… then YugabyteDB automatically performs an ANALYZE on the table.

Where:

● threshold -> minimum mutation count to trigger analysis
● scale_factor -> scales with table size
● table_size -> based on existing statistics (pg_class.reltuples)

This approach makes Auto Analyze both adaptive and workload-aware.

⚙️ Enabling & Configuring Auto Analyze

To enable Auto Analyze, you need two things:

Enable the Auto Analyze service
Enable mutation counters (required for tracking updates)

Enable on Masters

				
					--master_flags "ysql_enable_auto_analyze_service=true"

Update

● Starting with YugabyteDB 2025.2, the gFlag used to enable or disable Auto Analyze has been renamed to ysql_enable_auto_analyze.
● When upgrading to v2025.2, Auto Analyze is enabled by default if the universe already has the cost-based optimizer enabled.

Enable on TServers

				
					--tserver_flags "ysql_enable_auto_analyze_service=true,ysql_enable_table_mutation_counter=true"

Update

● Starting with YugabyteDB 2025.2, the gFlag used to enable or disable Auto Analyze has been renamed to ysql_enable_auto_analyze.
● When upgrading to v2025.2, Auto Analyze is enabled by default if the universe already has the cost-based optimizer enabled.

For example (single-node yugabyted) using YugabyteDB 2025.2.0:

				
					./bin/yugabyted start \
  --master_flags "ysql_enable_auto_analyze=true" \
  --tserver_flags "ysql_enable_auto_analyze=true,ysql_enable_table_mutation_counter=true"

If you’re enabling this on an existing cluster, you’ll need a rolling restart after applying these flags across the cluster.

Configuration flags (for YugabyteDB 2025.2 and above)

Flag	Default	Description / Notes
`ysql_enable_auto_analyze`	`false`	Enables the Auto Analyze service, which automatically runs `ANALYZE` to refresh table statistics when mutation thresholds are exceeded.
`ysql_auto_analyze_threshold`	`50`	Minimum number of table mutations required before Auto Analyze can trigger.
`ysql_auto_analyze_scale_factor`	`0.1`	Fraction of total table rows that must be mutated before triggering `ANALYZE`.
`ysql_auto_analyze_min_cooldown_per_table`	`10000` (10s)	Minimum cooldown period between Auto Analyze runs for the same table.
`ysql_auto_analyze_max_cooldown_per_table`	`86400000` (24h)	Maximum cooldown period between Auto Analyze runs for the same table.
`ysql_auto_analyze_cooldown_per_table_scale_factor`	`2`	Exponential factor used to increase cooldown time between successive Auto Analyze runs on a table.
`ysql_auto_analyze_batch_size`	`10`	Maximum number of tables Auto Analyze attempts to process in a single batch.
`ysql_cluster_level_mutation_persist_interval_ms`	`10000`	Interval at which node-level mutation counts are persisted at the cluster level.
`ysql_cluster_level_mutation_persist_rpc_timeout_ms`	`10000`	RPC timeout for persisting cluster-level mutation counts.
`ysql_node_level_mutation_reporting_interval_ms`	`5000`	Interval at which node-level mutation counts are reported to the Auto Analyze service.
`ysql_node_level_mutation_reporting_timeout_ms`	`5000`	RPC timeout for node-level mutation reporting.
`ysql_enable_auto_analyze_service` Deprecated	`false`	Deprecated flag for enabling Auto Analyze. Replaced by `ysql_enable_auto_analyze`. Requires consistent values across all nodes and a restart to change.
`ysql_enable_table_mutation_counter` Deprecated	`false`	Deprecated mutation counter flag. Auto Analyze now manages mutation tracking internally.

Tuning Tips:

● Adjust threshold and scale factor for workload patterns
- ○ For large tables with steady mutation rates, lowering the ysql_auto_analyze_scale_factor makes Auto Analyze trigger more frequently, keeping statistics up to date for the cost-based optimizer.
- ○ For high-churn OLTP workloads, reducing the ysql_auto_analyze_threshold helps Auto Analyze react quickly to frequent changes.
- ○ Increasing either threshold or scale_factor can reduce the frequency of ANALYZE runs when workloads cause too many redundant statistical refreshes.
● Balance cooldown settings to avoid over-analysis
- ○ The ysql_auto_analyze_min_cooldown_per_table and ysql_auto_analyze_max_cooldown_per_table settings control how soon Auto Analyze will consider a table again after a recent run. For very active tables, shortening the minimum cooldown can help keep stats fresh without waiting the full default cooldown period.
- ○ Conversely, increasing the cooldowns for less active tables can reduce unnecessary ANALYZE invocations.
● Monitor and tune batch sizes
- ○ The ysql_auto_analyze_batch_size controls how many tables Auto Analyze tries to process in a single batch. If you have a large number of tables changing frequently, adjusting this value can help balance overhead vs. timeliness.
● Keep flags consistent across nodes
- ○ Ensure that all TServers have the same Auto Analyze gFlag settings to avoid inconsistent behavior in statistics collection and analysis.

🧪 Demo: Auto Analyze in Action

Below is a hands-on demo showing how Auto Analyze updates statistics as data changes.

🟦 STEP 1: Start the cluster with Auto Analyze enabled

				
					# Using YugabyteDB 2025.2.0
./bin/yugabyted start \
  --master_flags "ysql_enable_auto_analyze=true" \
  --tserver_flags "ysql_enable_auto_analyze=true,ysql_enable_table_mutation_counter=true"

🟩 STEP 2: Create a test table

				
					CREATE TABLE demo_auto (
  id  SERIAL PRIMARY KEY,
  val INT
);

🟨 STEP 3: Check initial statistics

				
					SELECT relname, reltuples
FROM pg_class
WHERE relname = 'demo_auto';

You may see reltuples = -1 initially (unknown statistics).

🟧 STEP 4: Insert some rows

				
					INSERT INTO demo_auto (val)
SELECT generate_series(1,100);

After a short delay:

				
					SELECT relname, reltuples
FROM pg_class
WHERE relname = 'demo_auto';

🟥 STEP 5: Insert more rows to exceed mutation thresholds

				
					INSERT INTO demo_auto (val)
SELECT generate_series(101,5000);

Wait again and re-check reltuples… Auto Analyze should automatically refresh statistics.

🔵 STEP 6: (Optional) Check query plan behavior

				
					EXPLAIN SELECT * FROM demo_auto WHERE val = 500;

You should now see improved row estimate accuracy.

🖥️ Example Demo output:

				
					yugabyte=# CREATE TABLE demo_auto (
yugabyte(#   id  SERIAL PRIMARY KEY,
yugabyte(#   val INT
yugabyte(# );
CREATE TABLE

yugabyte=# SELECT relname, reltuples
yugabyte-# FROM pg_class
yugabyte-# WHERE relname = 'demo_auto';
  relname  | reltuples
-----------+-----------
 demo_auto |        -1
(1 row)

yugabyte=# INSERT INTO demo_auto (val)
yugabyte-# SELECT generate_series(1,100);
INSERT 0 100

yugabyte=# SELECT pg_sleep(10);
 pg_sleep
----------

(1 row)

yugabyte=# SELECT relname, reltuples
yugabyte-# FROM pg_class
yugabyte-# WHERE relname = 'demo_auto';
  relname  | reltuples
-----------+-----------
 demo_auto |       100
(1 row)

yugabyte=# INSERT INTO demo_auto (val)
yugabyte-# SELECT generate_series(101,5000);
INSERT 0 4900

yugabyte=# SELECT pg_sleep(10);
 pg_sleep
----------

(1 row)

yugabyte=# SELECT relname, reltuples
yugabyte-# FROM pg_class
yugabyte-# WHERE relname = 'demo_auto';
  relname  | reltuples
-----------+-----------
 demo_auto |      5000
(1 row)

yugabyte=# EXPLAIN SELECT * FROM demo_auto WHERE val = 500;
                          QUERY PLAN
--------------------------------------------------------------
 Seq Scan on demo_auto  (cost=0.00..512.50 rows=5000 width=8)
   Storage Filter: (val = 500)
(2 rows)

⚠️ Caveats & Best Practices

Before running Auto Analyze in production, keep these considerations in mind:

● Initial and bulk loads still benefit from manual ANALYZE
- ○ ○ Auto Analyze triggers based on mutation counts and scale factors, so after large initial data loads or major data migrations, you may still want to run a manual ANALYZE to ensure immediate accurate stats.
● Auto Analyze interacts with schema changes
- ○ ANALYZE is treated like a DDL operation, which can conflict with heavy schema migrations. Consider disabling Auto Analyze temporarily during large structural changes to avoid unexpected contention.
● Mutation counting must be enabled
- ○ For Auto Analyze to function correctly, make sure that mutation counters (ysql_node_level_mutation_reporting_interval_ms/timeouts, etc.) are configured and enabled. Without accurate mutation metrics, thresholds won’t be evaluated properly.
● Caveats with very small or very large tables
- ○ Very small tables may trigger frequent, unnecessary ANALYZE operations unless thresholds and scale factors are tuned appropriately.
- ○ Very large tables may accumulate enough mutations before triggering an ANALYZE that the statistics become stale unless scaled appropriately. Adjust scale_factor and cooldown settings for your workload patterns.
● Consistent versioning and feature support
- ○ Be aware of feature maturity and version differences. For example, some earlier flags like ysql_enable_auto_analyze_service and ysql_enable_table_mutation_counter are deprecated in favor of ysql_enable_auto_analyze starting in 2025.2, so review your configs when upgrading.

Auto Analyze is one more way YugabyteDB continues to reduce manual work while improving performance across distributed SQL deployments. 🚀

Have Fun!