In a previous YugabyteDB Tip, we explored how to use cbo_stat_dump to recreate production optimizer behavior in a clean environment.
That utility is incredibly useful because it:
- Extracts table DDL
- Exports pg_class.reltuples (row count estimates)
- Dumps pg_statistic (column statistics)
- Generates an import_statistics.sql script
But sometimes customers:
- Cannot download from GitHub (security restrictions)
- Cannot install additional Linux packages
- Prefer a pure SQL solution
Good news: we can generate a portable import_statistics.sql using SQL only.
What You'll Learn
- How to export optimizer statistics using only SQL
- How to target specific tables (not the whole database)
- How to generate a portable import_statistics.sql file
- How to replay stats in a clean environment
Important
- Reading pg_statistic typically requires elevated privileges.
- Statistics may include sampled column values, so treat exports as potentially sensitive.
- This is a workaround for when cbo_stat_dump cannot be used.
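Before exporting, it is worth confirming the session has the privileges these catalog reads require. A minimal check (superuser is the typical requirement for reading pg_statistic):

```sql
-- Confirm the current role can read pg_statistic (superuser is typically required)
SELECT rolname, rolsuper
FROM pg_roles
WHERE rolname = current_user;
```

If rolsuper is false, the pg_statistic query in Step 3 will likely fail with a permission error.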
Step 1: Define the Tables You Want
Unlike cbo_stat_dump, which extracts tables from a query automatically, this SQL-only method assumes:
- You already know which tables you need.
We explicitly define them using regclass so everything stays portable across clusters.
Example:
WITH target_tables AS (
SELECT unnest(ARRAY[
'demo.customers'::regclass,
'demo.orders'::regclass
]) AS rel
)
SELECT rel FROM target_tables;
You can modify that array to include exactly the tables involved in your problem query.
Step 2: Generate pg_class.reltuples Updates
The planner uses pg_class.reltuples for table cardinality estimates.
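Before generating the updates, you can inspect the current estimates directly (table names match the Step 1 example):

```sql
-- View the planner's current row estimates for the target tables
SELECT oid::regclass AS rel, reltuples
FROM pg_class
WHERE oid IN ('demo.customers'::regclass, 'demo.orders'::regclass);
```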
This generates portable update statements:
-- Step 2: Generate pg_class UPDATE statements (targeted tables + indexes)
WITH target_tables AS (
SELECT unnest(ARRAY[
'demo.customers'::regclass,
'demo.orders'::regclass
]) AS rel
),
target_rels AS (
-- 1) the tables
SELECT rel AS oid
FROM target_tables
UNION ALL
-- 2) all indexes on those tables
SELECT i.indexrelid AS oid
FROM pg_index i
JOIN target_tables tt ON tt.rel = i.indrelid
)
SELECT
format(
'UPDATE pg_class SET reltuples = %s WHERE oid = %L::regclass;',
c.reltuples,
n.nspname || '.' || c.relname
) AS sql
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
JOIN target_rels tr ON tr.oid = c.oid
WHERE n.nspname NOT IN ('pg_catalog','pg_toast','information_schema')
ORDER BY 1;
This ensures:
- No OIDs are hard-coded
- Statements remain portable
- Indexes are included
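The output is a series of statements of this shape (the row counts and index name shown here are illustrative, not real values):

```sql
-- Illustrative generator output; your reltuples values and index names will differ
UPDATE pg_class SET reltuples = 100000 WHERE oid = 'demo.customers'::regclass;
UPDATE pg_class SET reltuples = 100000 WHERE oid = 'demo.customers_pkey'::regclass;
UPDATE pg_class SET reltuples = 500000 WHERE oid = 'demo.orders'::regclass;
```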
Step 3: Generate pg_statistic INSERT Statements
This is the heart of the export.
We extract only statistics for the tables defined in target_tables.
-- Step 3: Generate pg_statistic INSERT statements (targeted tables + indexes)
WITH target_tables AS (
SELECT unnest(ARRAY[
'demo.customers'::regclass,
'demo.orders'::regclass
]) AS rel
),
target_rels AS (
-- tables
SELECT rel AS oid
FROM target_tables
UNION ALL
-- indexes
SELECT i.indexrelid AS oid
FROM pg_index i
JOIN target_tables tt ON tt.rel = i.indrelid
),
stats AS (
SELECT
n.nspname,
c.relname,
a.attnum,
tn.nspname AS typnspname,
t.typname,
s.*
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
JOIN target_rels tr ON tr.oid = c.oid
JOIN pg_attribute a ON a.attrelid = c.oid
AND a.attnum > 0
AND NOT a.attisdropped
JOIN pg_statistic s ON s.starelid = c.oid
AND s.staattnum = a.attnum
JOIN pg_type t ON t.oid = a.atttypid
JOIN pg_namespace tn ON tn.oid = t.typnamespace
WHERE n.nspname NOT IN ('pg_catalog','pg_toast','information_schema')
)
SELECT format($fmt$
INSERT INTO pg_statistic (
starelid, staattnum, stainherit, stanullfrac, stawidth, stadistinct,
stakind1, stakind2, stakind3, stakind4, stakind5,
staop1, staop2, staop3, staop4, staop5,
stanumbers1, stanumbers2, stanumbers3, stanumbers4, stanumbers5,
stavalues1, stavalues2, stavalues3, stavalues4, stavalues5
) VALUES (
    %L::regclass, %s, %L, %s, %s, %s,
%s, %s, %s, %s, %s,
%s, %s, %s, %s, %s,
%s, %s, %s, %s, %s,
%s, %s, %s, %s, %s
);
$fmt$,
(nspname || '.' || relname),
staattnum,
stainherit,
stanullfrac,
stawidth,
stadistinct,
stakind1, stakind2, stakind3, stakind4, stakind5,
staop1, staop2, staop3, staop4, staop5,
CASE WHEN stanumbers1 IS NULL THEN 'NULL' ELSE quote_literal(stanumbers1) || '::real[]' END,
CASE WHEN stanumbers2 IS NULL THEN 'NULL' ELSE quote_literal(stanumbers2) || '::real[]' END,
CASE WHEN stanumbers3 IS NULL THEN 'NULL' ELSE quote_literal(stanumbers3) || '::real[]' END,
CASE WHEN stanumbers4 IS NULL THEN 'NULL' ELSE quote_literal(stanumbers4) || '::real[]' END,
CASE WHEN stanumbers5 IS NULL THEN 'NULL' ELSE quote_literal(stanumbers5) || '::real[]' END,
CASE WHEN stavalues1 IS NULL THEN 'NULL'
ELSE quote_literal(stavalues1) || '::' ||
quote_ident(typnspname) || '.' || quote_ident(typname) || '[]'
END,
CASE WHEN stavalues2 IS NULL THEN 'NULL'
ELSE quote_literal(stavalues2) || '::' ||
quote_ident(typnspname) || '.' || quote_ident(typname) || '[]'
END,
CASE WHEN stavalues3 IS NULL THEN 'NULL'
ELSE quote_literal(stavalues3) || '::' ||
quote_ident(typnspname) || '.' || quote_ident(typname) || '[]'
END,
CASE WHEN stavalues4 IS NULL THEN 'NULL'
ELSE quote_literal(stavalues4) || '::' ||
quote_ident(typnspname) || '.' || quote_ident(typname) || '[]'
END,
CASE WHEN stavalues5 IS NULL THEN 'NULL'
ELSE quote_literal(stavalues5) || '::' ||
quote_ident(typnspname) || '.' || quote_ident(typname) || '[]'
END
) AS sql
FROM stats
ORDER BY nspname, relname, attnum;
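One caveat: pg_statistic has a unique index on (starelid, staattnum, stainherit), so replaying the INSERTs fails if the target environment already has statistics rows for those columns (for example, after running ANALYZE there). A hedged addition, not part of the original script: you can emit matching DELETE statements first, reusing the same target_tables CTE:

```sql
-- Generate DELETE statements to clear any pre-existing stats for the target tables
WITH target_tables AS (
  SELECT unnest(ARRAY[
    'demo.customers'::regclass,
    'demo.orders'::regclass
  ]) AS rel
)
SELECT format(
  'DELETE FROM pg_statistic WHERE starelid = %L::regclass;',
  rel::text
) AS sql
FROM target_tables;
```

Run the generated DELETEs before the INSERTs in the clean environment.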
Step 4: Save the Output
Create a file named generate_stats.sql and paste both generator sections into it:
- the pg_class.reltuples UPDATE generator (Step 2), and
- the pg_statistic INSERT generator (Step 3).
Together, these two sections output the complete import_statistics.sql needed to recreate the optimizer's full statistical model.
Run in ysqlsh and redirect:
ysqlsh -h <host> -p 5433 -U <user> -d <database> -At -f generate_stats.sql > import_statistics.sql
Now you have:
- UPDATE pg_class statements
- INSERT INTO pg_statistic statements
Just like cbo_stat_dump generates.
Step 5: Replay in a Clean Environment
1. Load the table DDL
2. Run import_statistics.sql
3. Run EXPLAIN on your problem query
4. Compare the plan shape
This follows the exact workflow from the previous tip.
The optimizer doesn't care about your data.
It cares about statistics.
If you can reproduce pg_class.reltuples and pg_statistic, you can usually reproduce the plan without copying production data.
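For example, after loading the DDL and replaying the statistics, compare the plan of the problem query in both environments (the join columns below are illustrative):

```sql
-- Run in both production and the clean environment; compare plan shape and row estimates
EXPLAIN
SELECT c.customer_id, count(*)
FROM demo.customers c
JOIN demo.orders o ON o.customer_id = c.customer_id
GROUP BY c.customer_id;
```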
When Should You Still Use cbo_stat_dump?
If possible, the utility is still preferred because:
- It automatically extracts tables from a query
- It handles edge cases more cleanly
- It supports extended statistics
But when tooling is restricted, SQL alone is enough.
Conclusion
This approach gives you a pure-SQL fallback for reproducing optimizer behavior.
No GitHub downloads.
No additional Linux packages.
No production data movement.
Just the planner's mental model… exported and replayed.
And thatβs usually all you need.
Have Fun!
