When a new YSQL backend session executes a query for the first time, PostgreSQL must read metadata from system catalog tables.
In distributed systems like YugabyteDB, those catalog lookups may involve remote RPCs before the catalog cache is populated. For workloads that open many short-lived connections (or during the first query on new connections), these catalog reads can add measurable latency.
YugabyteDB provides catalog preloading gFlags that allow commonly accessed catalog tables to be cached locally inside each backend process.
The challenge, however, is figuring out which catalog tables you should preload.
The official documentation suggests enabling logging and examining the PostgreSQL logs to identify catalog cache misses.
But in a multi-node cluster, that can mean collecting logs from many nodes.
Fortunately, there is a much easier workflow.
💡 Key Insight
You do not need to search PostgreSQL logs across a distributed cluster to determine which catalog tables your queries access.
Instead, you can:
1. Launch a single-node YugabyteDB instance using yugabyted
2. Recreate the schema and indexes
3. Enable catalog cache debug logging
4. Pipe the session output directly into a shell command
This allows you to automatically generate the exact list of catalog tables to preload.
Why Catalog Cache Preloading Helps
Each new backend session must populate its local catalog cache. If an entry is not cached yet, PostgreSQL performs a lookup against the system catalogs.
In YugabyteDB that lookup can involve:
• RPC calls
• tablet reads
• additional metadata resolution
Preloading certain catalogs can remove this overhead.
Common candidates include:
Catalog table   Why it appears
pg_cast         Type casting between datatypes
pg_operator     Operator resolution used in expressions and predicates
pg_opclass      Index operator classes used by B-tree and other index types
pg_amop         Mapping between operators and index access methods
pg_proc         Function metadata for built-in and user-defined functions
Step 1 – Launch a Single-Node YugabyteDB Instance
This node does not need to match the production topology.
The only important things are:
• Same YugabyteDB version
• Same schema
• Same queries
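A minimal local setup might look like this (the base directory is illustrative; check yugabyted --help for your version's options):

```sh
# Start a throwaway single-node cluster (assumes yugabyted is on your PATH).
yugabyted start --base_dir=/tmp/yb-catcache

# Connect with ysqlsh, the YSQL shell shipped with YugabyteDB.
ysqlsh -h 127.0.0.1
```

When you are done, the instance and its data can be discarded entirely; nothing here touches the production cluster.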
Step 2 – Create a Schema That Exercises Catalog Tables
The example below intentionally includes:
• partitioning
• triggers
• functions
• indexes
• operator usage
All of these cause catalog lookups.
CREATE DATABASE catcache_demo;
\c catcache_demo
CREATE SCHEMA app;
CREATE TABLE app.events (
tenant_id int NOT NULL,
event_id bigserial NOT NULL,
created_at timestamptz NOT NULL DEFAULT now(),
payload text,
amount_text text,
PRIMARY KEY (tenant_id, created_at, event_id)
) PARTITION BY RANGE (created_at);
CREATE TABLE app.events_2026_01
PARTITION OF app.events
FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
CREATE TABLE app.events_2026_02
PARTITION OF app.events
FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');
Create some functions and triggers.
CREATE OR REPLACE FUNCTION app.amount_as_numeric(t text)
RETURNS numeric
LANGUAGE sql
IMMUTABLE
AS $$
SELECT NULLIF(t,'')::numeric;
$$;
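Since the goal is to exercise pg_trigger and pg_proc as well, a minimal trigger can be added. The function and trigger below are illustrative additions (the names are invented for this example); an AFTER ROW trigger is used because it is allowed directly on a partitioned table:

```sql
CREATE OR REPLACE FUNCTION app.note_event_insert()
RETURNS trigger
LANGUAGE plpgsql
AS $$
BEGIN
  -- The return value of an AFTER ROW trigger is ignored; this trigger
  -- exists only so the workload performs pg_trigger/pg_proc lookups.
  RETURN NEW;
END;
$$;

CREATE TRIGGER events_note_insert
  AFTER INSERT ON app.events
  FOR EACH ROW
  EXECUTE FUNCTION app.note_event_insert();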
Add indexes.
CREATE INDEX events_created_at_idx
ON app.events (created_at DESC);
CREATE INDEX events_payload_idx
ON app.events (payload);
CREATE INDEX events_amount_expr_idx
ON app.events (app.amount_as_numeric(amount_text));
Insert sample data.
INSERT INTO app.events (tenant_id, created_at, payload, amount_text)
SELECT
1,
timestamp '2026-02-10' + (g * interval '1 minute'),
'payload',
CASE WHEN g % 5 = 0 THEN '12.34' ELSE '' END
FROM generate_series(1,5000) g;
ANALYZE app.events;
Step 3 – Enable Catalog Cache Debug Logging
Now enable debugging for catalog cache lookups.
SET yb_debug_log_catcache_events = 1;
SET backtrace_functions = 'SearchCatCacheMiss';
SET client_min_messages = LOG;
These settings cause PostgreSQL to emit log lines when catalog entries are read.
We want to list the catalog tables that are needed for the following query:
SELECT tenant_id, event_id, created_at
FROM app.events
WHERE tenant_id = 1
AND created_at >= '2026-02-10'
AND created_at < '2026-02-11'
AND app.amount_as_numeric(amount_text) > 10
ORDER BY created_at DESC
LIMIT 50;
Step 4 – Extract the Catalog Tables From the Session Output
Here is the trick: rather than manually inspecting PostgreSQL logs across nodes, we can reproduce the workload locally and extract the catalog tables directly from the session output.
The following shell command runs the query and pipes both stdout and stderr through a small shell pipeline that extracts the catalog tables referenced during planning.
DEFAULT="pg_am|pg_amproc|pg_cast|pg_inherits|pg_policy|pg_proc|pg_tablespace|pg_trigger"
ysqlsh -d catcache_demo 2>&1 <<'SQL' \
| grep "Target rel:" \
| awk '{print $3}' \
| sort -u \
| awk -v re="$DEFAULT" 're=="" || $0 !~ ("^(" re ")$")' \
| paste -sd, -
SET yb_debug_log_catcache_events = 1;
SET backtrace_functions = 'SearchCatCacheMiss';
SET client_min_messages = LOG;
EXPLAIN (ANALYZE, DIST)
SELECT tenant_id, event_id, created_at
FROM app.events
WHERE tenant_id = 1
AND created_at >= '2026-02-10'
AND created_at < '2026-02-11'
AND app.amount_as_numeric(amount_text) > 10
ORDER BY created_at DESC
LIMIT 50;
SQL
The DEFAULT variable defines a list of catalog tables that YugabyteDB already preloads automatically when the flag:
ysql_catalog_preload_additional_tables=true
is enabled. Those default preloaded catalogs are:
• pg_am
• pg_amproc
• pg_cast
• pg_inherits
• pg_policy
• pg_proc
• pg_tablespace
• pg_trigger
This ensures the output only contains additional catalog tables referenced by the workload that may benefit from explicit preloading.
If you did not enable that flag, set the DEFAULT variable to an empty string ("").
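Because the extraction is plain text processing, the filtering logic can be sanity-checked without a running cluster. The sketch below feeds the pipeline a few synthetic "Target rel:" lines (invented here purely for illustration; real log lines carry more detail) and anchors the exclusion regex so that the pg_am alternative excludes only pg_am itself, not pg_amop:

```sh
# Catalogs already preloaded when ysql_catalog_preload_additional_tables=true:
DEFAULT="pg_am|pg_amproc|pg_cast|pg_inherits|pg_policy|pg_proc|pg_tablespace|pg_trigger"

# Synthetic stand-ins for the lines a debug-logging session emits.
result=$(printf '%s\n' \
    'Target rel: pg_operator (catcache miss)' \
    'Target rel: pg_cast (catcache miss)' \
    'Target rel: pg_amop (catcache miss)' \
    'Target rel: pg_opclass (catcache miss)' \
    'Target rel: pg_operator (catcache miss)' \
  | grep "Target rel:" \
  | awk '{print $3}' \
  | sort -u \
  | awk -v re="$DEFAULT" 're=="" || $0 !~ ("^(" re ")$")' \
  | paste -sd, -)

# pg_cast is filtered out (already preloaded by default), the duplicate
# pg_operator collapses, and pg_amop survives because the match is anchored.
echo "$result"
```

Running this prints pg_amop,pg_opclass,pg_operator, which is exactly the kind of comma-separated list the preload configuration expects.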
Step 5 – Configure Additional Catalog Tables to Preload
The list produced in the previous step represents catalog tables accessed by the query planner that are not already covered by the default preload list.
If desired, these tables can be supplied in the YugabyteDB TServer configuration via its catalog preload gFlag for additional tables.
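As a sketch, assuming the ysql_catalog_preload_additional_table_list gFlag (verify the exact flag name against your YugabyteDB version's reference documentation), the list produced above can be passed at startup:

```sh
# Hypothetical list taken from the Step 4 pipeline output.
yugabyted start \
  --tserver_flags="ysql_catalog_preload_additional_table_list=pg_amop,pg_opclass,pg_operator"
```

On a multi-node deployment the same flag would go into each TServer's configuration rather than a yugabyted invocation.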
Preloading additional catalog tables can improve query planning latency, but adding too many tables may have unintended side effects. Keep the following considerations in mind:
• Memory usage per backend increases. Every backend session maintains its own catalog cache, so preloading many catalog tables increases memory consumption per session.
• Backend startup time can increase. Preloading requires catalog entries to be loaded when a backend starts. Large preload lists may slightly increase connection startup time.
• Schema changes may invalidate cached entries. DDL operations can invalidate catalog cache entries, which means the cache must be repopulated.
• Only preload catalogs that appear frequently in your workload. The goal is to eliminate repeated catalog lookups for commonly used metadata, not to preload every system catalog.
Conclusion
Catalog cache tuning is a simple but powerful technique for reducing first-query planning latency in YugabyteDB.
In traditional PostgreSQL environments, catalog reads occur locally and are usually inexpensive. In a distributed system like YugabyteDB, however, catalog lookups may require RPC calls to system catalog tablets, which can introduce additional latency during query planning.
By recreating your schema on a lightweight single-node yugabyted instance and enabling catalog cache debug logging, you can easily identify which catalog tables are accessed during query planning.
Using a small shell pipeline (grep, awk, sort, and paste), the catalog tables referenced by your query workload can be extracted automatically, without searching logs across multiple nodes.
This approach turns catalog cache tuning into a repeatable, automated workflow that helps ensure YugabyteDB backends start with the metadata they need already cached, reducing planning overhead and improving query responsiveness for real-world workloads.