Export Optimizer Statistics with SQL Only (No cbo_stat_dump Required)

In a previous YugabyteDB Tip, we explored how to use cbo_stat_dump to recreate production optimizer behavior in a clean environment.

That utility is incredibly useful because it:

  • Extracts table DDL

  • Exports pg_class.reltuples (row count estimates)

  • Dumps pg_statistic (column statistics)

  • Generates an import_statistics.sql script

But sometimes customers:

  • Cannot download from GitHub (security restrictions)

  • Cannot install additional Linux packages

  • Prefer a pure SQL solution

Good news: we can generate a portable import_statistics.sql using SQL only.

What You’ll Learn

  • How to export optimizer statistics using only SQL

  • How to target specific tables (not the whole database)

  • How to generate a portable import_statistics.sql file

  • How to replay stats in a clean environment

Important

  • Reading pg_statistic typically requires elevated privileges.
  • Statistics may include sampled column values; treat exports as potentially sensitive.
  • This is a workaround when cbo_stat_dump cannot be used.

Step 1: Define the Tables You Want

Unlike cbo_stat_dump, which extracts tables from a query automatically, this SQL-only method assumes you already know which tables you need.

We explicitly define them using regclass so everything stays portable across clusters.

Example:

WITH target_tables AS (
  SELECT unnest(ARRAY[
    'demo.customers'::regclass,
    'demo.orders'::regclass
  ]) AS rel
)
SELECT rel FROM target_tables;

You can modify that array to include exactly the tables involved in your problem query.
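
Before exporting, it is worth confirming that statistics actually exist for those tables. A quick sanity check, using the demo.customers and demo.orders tables from the example above:

```sql
-- Confirm each target table has a reltuples estimate and pg_statistic rows
SELECT c.oid::regclass AS table_name,
       c.reltuples,
       count(s.staattnum) AS stats_rows
FROM pg_class c
LEFT JOIN pg_statistic s ON s.starelid = c.oid
WHERE c.oid IN ('demo.customers'::regclass, 'demo.orders'::regclass)
GROUP BY c.oid, c.reltuples;
```

If stats_rows is 0, run ANALYZE on the table first; otherwise there is nothing to export.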

Step 2: Generate pg_class.reltuples Updates

The planner uses pg_class.reltuples for table cardinality estimates.

This generates portable update statements:

-- Step 2: Generate pg_class UPDATE statements (targeted tables + indexes)
WITH target_tables AS (
  SELECT unnest(ARRAY[
    'demo.customers'::regclass,
    'demo.orders'::regclass
  ]) AS rel
),
target_rels AS (
  -- 1) the tables
  SELECT rel AS oid
  FROM target_tables

  UNION ALL

  -- 2) all indexes on those tables
  SELECT i.indexrelid AS oid
  FROM pg_index i
  JOIN target_tables tt ON tt.rel = i.indrelid
)
SELECT
  format(
    'UPDATE pg_class SET reltuples = %s WHERE oid = %L::regclass;',
    c.reltuples,
    n.nspname || '.' || c.relname
  ) AS sql
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
JOIN target_rels tr ON tr.oid = c.oid
WHERE n.nspname NOT IN ('pg_catalog','pg_toast','information_schema')
ORDER BY 1;

This ensures:

  • No OIDs are hard-coded

  • Statements remain portable

  • Indexes are included
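
The output is plain SQL. With the demo tables above, the generated statements look roughly like this (the row counts here are illustrative, not real):

```sql
-- Illustrative output only; your reltuples values will differ
UPDATE pg_class SET reltuples = 250000 WHERE oid = 'demo.customers'::regclass;
UPDATE pg_class SET reltuples = 1000000 WHERE oid = 'demo.orders'::regclass;
UPDATE pg_class SET reltuples = 1000000 WHERE oid = 'demo.orders_pkey'::regclass;
```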

Step 3: Generate pg_statistic INSERT Statements

This is the heart of the export.

We extract only statistics for the tables defined in target_tables.

-- Step 3: Generate pg_statistic INSERT statements (targeted tables + indexes)
WITH target_tables AS (
  SELECT unnest(ARRAY[
    'demo.customers'::regclass,
    'demo.orders'::regclass
  ]) AS rel
),
target_rels AS (
  -- tables
  SELECT rel AS oid
  FROM target_tables

  UNION ALL

  -- indexes
  SELECT i.indexrelid AS oid
  FROM pg_index i
  JOIN target_tables tt ON tt.rel = i.indrelid
),
stats AS (
  SELECT
    n.nspname,
    c.relname,
    a.attnum,
    tn.nspname AS typnspname,
    t.typname,
    s.*
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
  JOIN target_rels tr ON tr.oid = c.oid
  JOIN pg_attribute a ON a.attrelid = c.oid
                    AND a.attnum > 0
                    AND NOT a.attisdropped
  JOIN pg_statistic s ON s.starelid = c.oid
                     AND s.staattnum = a.attnum
  JOIN pg_type t ON t.oid = a.atttypid
  JOIN pg_namespace tn ON tn.oid = t.typnamespace
  WHERE n.nspname NOT IN ('pg_catalog','pg_toast','information_schema')
)
SELECT format($fmt$
INSERT INTO pg_statistic (
  starelid, staattnum, stainherit, stanullfrac, stawidth, stadistinct,
  stakind1, stakind2, stakind3, stakind4, stakind5,
  staop1, staop2, staop3, staop4, staop5,
  stanumbers1, stanumbers2, stanumbers3, stanumbers4, stanumbers5,
  stavalues1, stavalues2, stavalues3, stavalues4, stavalues5
) VALUES (
  %L::regclass, %s, %L, %s, %s, %s,
  %s, %s, %s, %s, %s,
  %s, %s, %s, %s, %s,
  %s, %s, %s, %s, %s,
  %s, %s, %s, %s, %s
);
$fmt$,
  (nspname || '.' || relname),
  staattnum,
  stainherit,
  stanullfrac,
  stawidth,
  stadistinct,

  stakind1, stakind2, stakind3, stakind4, stakind5,
  staop1, staop2, staop3, staop4, staop5,

  CASE WHEN stanumbers1 IS NULL THEN 'NULL' ELSE quote_literal(stanumbers1) || '::real[]' END,
  CASE WHEN stanumbers2 IS NULL THEN 'NULL' ELSE quote_literal(stanumbers2) || '::real[]' END,
  CASE WHEN stanumbers3 IS NULL THEN 'NULL' ELSE quote_literal(stanumbers3) || '::real[]' END,
  CASE WHEN stanumbers4 IS NULL THEN 'NULL' ELSE quote_literal(stanumbers4) || '::real[]' END,
  CASE WHEN stanumbers5 IS NULL THEN 'NULL' ELSE quote_literal(stanumbers5) || '::real[]' END,

  CASE WHEN stavalues1 IS NULL THEN 'NULL'
       ELSE quote_literal(stavalues1) || '::' ||
            quote_ident(typnspname) || '.' || quote_ident(typname) || '[]'
  END,
  CASE WHEN stavalues2 IS NULL THEN 'NULL'
       ELSE quote_literal(stavalues2) || '::' ||
            quote_ident(typnspname) || '.' || quote_ident(typname) || '[]'
  END,
  CASE WHEN stavalues3 IS NULL THEN 'NULL'
       ELSE quote_literal(stavalues3) || '::' ||
            quote_ident(typnspname) || '.' || quote_ident(typname) || '[]'
  END,
  CASE WHEN stavalues4 IS NULL THEN 'NULL'
       ELSE quote_literal(stavalues4) || '::' ||
            quote_ident(typnspname) || '.' || quote_ident(typname) || '[]'
  END,
  CASE WHEN stavalues5 IS NULL THEN 'NULL'
       ELSE quote_literal(stavalues5) || '::' ||
            quote_ident(typnspname) || '.' || quote_ident(typname) || '[]'
  END
) AS sql
FROM stats
ORDER BY nspname, relname, attnum;

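One replay caveat: pg_statistic has a unique index on (starelid, staattnum, stainherit), so if the target cluster has already analyzed these tables, the generated INSERTs can fail with duplicate-key errors. A sketch of a companion DELETE generator, following the same target_tables pattern as above:

```sql
-- Generate DELETE statements to clear existing stats before replaying
WITH target_tables AS (
  SELECT unnest(ARRAY[
    'demo.customers'::regclass,
    'demo.orders'::regclass
  ]) AS rel
),
target_rels AS (
  -- the tables themselves
  SELECT rel::oid AS oid FROM target_tables
  UNION ALL
  -- all indexes on those tables
  SELECT i.indexrelid FROM pg_index i
  JOIN target_tables tt ON tt.rel = i.indrelid
)
SELECT format(
  'DELETE FROM pg_statistic WHERE starelid = %L::regclass;',
  tr.oid::regclass::text
) AS sql
FROM target_rels tr;
```

Run the resulting DELETE statements in the target environment before the INSERTs.
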
Step 4: Save the Output

Create a file named generate_stats.sql and paste both generator sections into it:

  • the pg_class.reltuples UPDATE generator (Step 2), and

  • the pg_statistic INSERT generator (Step 3).

Together, these two sections will output the complete import_statistics.sql needed to recreate the optimizer’s full statistical model.

Run in ysqlsh and redirect:

ysqlsh -h <host> -p 5433 -U <user> -d <db> -At -f generate_stats.sql > import_statistics.sql

Now you have:

  • UPDATE pg_class statements

  • INSERT INTO pg_statistic statements

Just like cbo_stat_dump generates.

Step 5: Replay in a Clean Environment

  1. Load table DDL

  2. Run import_statistics.sql

  3. Run EXPLAIN

  4. Compare plan shape
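
For example, with the demo tables (the join and filter columns below are hypothetical; substitute your actual problem query):

```sql
-- Run the problem query's plan in the clean environment and compare
-- join order, join methods, and row estimates against production.
EXPLAIN
SELECT c.*, o.*
FROM demo.customers c
JOIN demo.orders o ON o.customer_id = c.customer_id
WHERE o.order_date > '2024-01-01';
```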

Key Insight

The optimizer doesn’t care about your data.

It cares about statistics.

If you can reproduce pg_class.reltuples and pg_statistic, you can usually reproduce the plan without copying production data.

When Should You Still Use cbo_stat_dump?

If possible, the utility is still preferred because:

  • It automatically extracts tables from a query

  • It handles edge cases more cleanly

  • It supports extended statistics

But when tooling is restricted, SQL alone is enough.

Conclusion

This approach gives you a pure-SQL fallback for reproducing optimizer behavior.

No GitHub downloads.
No additional Linux packages.
No production data movement.

Just the planner’s mental model… exported and replayed.

And that’s usually all you need.

Have Fun!

A couple of new friends stopped by for breakfast on the back deck this morning!