In YugabyteDB, collecting up-to-date table statistics is critical when the cost-based optimizer (CBO) is enabled. These statistics guide the optimizer in choosing efficient execution plans. YugabyteDB supports the standard PostgreSQL ANALYZE command — currently as a beta feature — and it plays an essential role in query performance tuning.
Note: The ANALYZE command is expected to reach general availability (GA) soon in YugabyteDB. I’ll update this tip once it does!
But how can you tell if a table has been explicitly analyzed?
This tip walks through how to verify whether a manual ANALYZE was run on a table in YugabyteDB.
YugabyteDB‘s CBO relies on table statistics like row counts and value distributions to make smart decisions. If these stats are missing or stale, your queries may end up using suboptimal plans — especially for complex joins or filters.
Since ANALYZE is still in beta, it’s not always run automatically. When the CBO is enabled (yb_enable_optimizer_stats = true), it’s best practice to manually run ANALYZE after major data changes.
Like in PostgreSQL, YugabyteDB tracks the last time ANALYZE or auto-analyze was run using pg_stat_all_tables. The key columns are:
last_analyze: ManualANALYZEtimestamplast_autoanalyze: Auto-analyze timestamp
Example:
yugabyte=# CREATE TABLE my_table(c1 INT PRIMARY KEY, c2 TEXT);
CREATE TABLE
yugabyte=# SELECT schemaname,
yugabyte-# relname,
yugabyte-# last_analyze
yugabyte-# FROM pg_stat_all_tables
yugabyte-# WHERE relname = 'my_table';
schemaname | relname | last_analyze
------------+----------+--------------
public | my_table |
(1 row)
yugabyte=# ANALYZE my_table;
ANALYZE
yugabyte=# SELECT schemaname,
yugabyte-# relname,
yugabyte-# last_analyze
yugabyte-# FROM pg_stat_all_tables
yugabyte-# WHERE relname = 'my_table';
schemaname | relname | last_analyze
------------+----------+-------------------------------
public | my_table | 2025-05-01 15:17:35.262759+00
(1 row)
This approach works well if you’re connected to the same YugabyteDB node where ANALYZE was run. However, in most real-world setups, your YugabyteDB cluster will have multiple nodes. Since pg_stat_all_tables reflects statistics local to the node you’re connected to, you might not see the last_analyze timestamp if the ANALYZE command was executed on a different node.
yugabyte=# SELECT host,
yugabyte-# CASE WHEN host = split_part(inet_server_addr()::TEXT, '/', 1) THEN 'X' ELSE NULL END connected_host
yugabyte-# FROM yb_servers()
yugabyte-# ORDER BY host;
host | connected_host
-----------+----------------
127.0.0.1 | X
127.0.0.2 |
127.0.0.3 |
127.0.0.4 |
127.0.0.5 |
(5 rows)
yugabyte=# \! ysqlsh -h 127.0.0.2 -c "SELECT schemaname, relname, last_analyze FROM pg_stat_all_tables WHERE relname = 'my_table';"
schemaname | relname | last_analyze
------------+----------+--------------
public | my_table |
(1 row)
A better alternative is to check the pg_class table, whose data is visible across all nodes in the cluster. If the reltuples column contains a value greater than zero, it indicates that ANALYZE has been executed on the table.
yugabyte=# SELECT nspname, relname, reltuples
yugabyte-# FROM pg_class
yugabyte-# JOIN pg_namespace
yugabyte-# ON relnamespace = pg_namespace.oid
yugabyte-# WHERE reltuples > 0
yugabyte-# AND relname = 'my_table'
yugabyte-# ORDER
yugabyte-# BY nspname, relname;
nspname | relname | reltuples
---------+---------+-----------
(0 rows)
Wait! How come there were no records returned? It’s because the table has no rows as we didn’t insert any.
Let’s insert some new data and run ANALYZE again to refresh the table statistics.
yugabyte=# INSERT INTO my_table SELECT generate_series(1, 100000);
INSERT 0 100000
yugabyte=# SELECT nspname, relname, reltuples
yugabyte-# FROM pg_class
yugabyte-# JOIN pg_namespace
yugabyte-# ON relnamespace = pg_namespace.oid
yugabyte-# WHERE relname = 'my_table'
yugabyte-# ORDER
yugabyte-# BY nspname, relname;
nspname | relname | reltuples
---------+----------+-----------
public | my_table | 0
(1 row)
yugabyte=# ANALYZE my_table;
ANALYZE
yugabyte=# SELECT nspname, relname, reltuples
yugabyte-# FROM pg_class
yugabyte-# JOIN pg_namespace
yugabyte-# ON relnamespace = pg_namespace.oid
yugabyte-# WHERE reltuples > 0
yugabyte-# AND relname = 'my_table'
yugabyte-# ORDER
yugabyte-# BY nspname, relname;
nspname | relname | reltuples
---------+----------+-----------
public | my_table | 100000
(1 row)
Notice that the pg_stat_all_tables system view reflects an updated last_analyze timestamp after we ran ANALYZE a second time. I’ve also included the analyze_count column below to show how many times the table has been analyzed.
yugabyte=# SELECT schemaname,
yugabyte-# relname,
yugabyte-# last_analyze,
yugabyte-# analyze_count
yugabyte-# FROM pg_stat_all_tables
yugabyte-# WHERE relname = 'my_table';
schemaname | relname | last_analyze | analyze_count
------------+----------+-------------------------------+---------------
public | my_table | 2025-05-01 15:38:08.920405+00 | 2
(1 row)
Quick Summary for YugabyteDB
Use
pg_stat_all_tablesto check ifANALYZEwas manually run — but keep in mind, this view is node-local, so you’ll need to query it on each node individually.Focus on the
last_analyzetimestamp to determine when the last manualANALYZEoccurred.Alternatively, query
pg_class, which is cluster-wide. If the table has data andreltuplesis greater than zero, that’s a strong indicator thatANALYZEhas been executed.Always run
ANALYZEmanually after major data modifications, especially when using the cost-based optimizer.Keeping statistics up to date ensures the optimizer makes smarter decisions — resulting in faster queries and better overall performance.
Helpful SQL used in today’s tip:
-- Tables Analyzed on this server
SELECT schemaname,
relname,
last_analyze,
analyze_count
FROM pg_stat_all_tables
WHERE analyze_count > 0
ORDER
BY schemaname,
relname;
-- Tables analyzed previously on any node
SELECT nspname, relname, reltuples
FROM pg_class
JOIN pg_namespace
ON relnamespace = pg_namespace.oid
WHERE nspname NOT IN ('information_schema', 'pg_catalog', 'pg_toast')
AND reltuples > 0
ORDER
BY nspname, relname;
Have Fun!
