In a previous YugabyteDB Tip, Replicating a Small Dimension Table for Local Reads in Every Region, we looked at how to use duplicate covering indexes to make a small dimension table read locally from every region.
That pattern works great when the cluster topology is known ahead of time.
But what happens when your cluster changes?
For example, maybe you start with three regions:
[root@localhost yb]# ysqlsh -h 127.0.0.1 -c "SELECT host, cloud, region, zone FROM yb_servers();"
host | cloud | region | zone
-----------+-------+-----------+------------
127.0.0.3 | aws | us-west-2 | us-west-2a
127.0.0.2 | aws | us-east-2 | us-east-2a
127.0.0.1 | aws | us-east-1 | us-east-1a
(3 rows)
Then later, you add a node in a brand-new region:
[root@localhost yb]# yugabyted start \
--advertise_address=127.0.0.4 \
--cloud_location=aws.us-west-1.us-west-1a \
--join=127.0.0.1 \
--base_dir=~/yb04 > start4.log
Now the cluster has four regions:
[root@localhost yb]# ysqlsh -h 127.0.0.1 -c "SELECT host, cloud, region, zone FROM yb_servers();"
host | cloud | region | zone
-----------+-------+-----------+------------
127.0.0.3 | aws | us-west-2 | us-west-2a
127.0.0.2 | aws | us-east-2 | us-east-2a
127.0.0.1 | aws | us-east-1 | us-east-1a
127.0.0.4 | aws | us-west-1 | us-west-1a
(4 rows)
If your application in us-west-1 needs the same low-latency local reads, you need to create a new tablespace and duplicate covering index for that region.
But you probably do not want to write a custom stored procedure for every dimension table.
A better pattern is to make the automation configuration-driven.
You define which dimension tables should get regional duplicate covering indexes in a metadata table. Then one generic procedure can discover regions, loop through the configured dimension tables, and queue the required DDL.
CREATE TABLESPACE cannot run inside a stored procedure because it cannot run inside a transaction block. For this reason, the stored procedure should detect missing regions and queue the required DDL, while a separate top-level process executes the generated CREATE TABLESPACE and CREATE INDEX statements. The Automation Pattern
The automation pattern looks like this:
| Step | What Happens | Where It Runs |
|---|---|---|
| Configure tables | Store the list of dimension tables, key columns, INCLUDE columns, index name pattern, and replication factor. | Configuration table |
| Detect regions | Query yb_servers() for distinct cloud and region values. | PL/pgSQL procedure |
| Compare state | Compare configured tables and discovered regions against catalog objects and the registry table. | PL/pgSQL procedure |
| Queue DDL | Insert required CREATE TABLESPACE and CREATE INDEX commands into a queue table. | PL/pgSQL procedure |
| Execute DDL | Run queued DDL as top-level SQL statements. | Manual execution, shell cron, CI/CD, or operations workflow |
| Record result | Mark each queued command as done or error. | External execution process |
CREATE TABLESPACE and CREATE INDEX commands. Track Regions, Not Nodes
This automation should track regions, not individual nodes.
If you add another node to us-east-1, you do not need a new duplicate covering index.
But if you add a new region, such as us-west-1, then you probably do want a new regional index so applications in that region can read locally.
Existing Dimension Table
For this demo, the dimension table is public.store_dimension.
\d public.store_dimension
Example output:
. Table "public.store_dimension"
Column | Type | Collation | Nullable | Default
--------------+---------+-----------+----------+---------
store_id | integer | | not null |
store_name | text | | not null |
region_code | text | | not null |
manager_name | text | | not null |
store_type | text | | not null |
Indexes:
"store_dimension_pkey" PRIMARY KEY, lsm (store_id HASH), tablespace "ts_dim_global"
"store_dimension_east_1_idx" UNIQUE, lsm (store_id HASH) INCLUDE (store_name, region_code, manager_name, store_type), tablespace "ts_dim_east_1"
"store_dimension_east_2_idx" UNIQUE, lsm (store_id HASH) INCLUDE (store_name, region_code, manager_name, store_type), tablespace "ts_dim_east_2"
"store_dimension_west_2_idx" UNIQUE, lsm (store_id HASH) INCLUDE (store_name, region_code, manager_name, store_type), tablespace "ts_dim_west_2"
Tablespace: "ts_dim_global"
The automation should preserve this naming convention.
For example:
us-east-1 -> east_1 -> ts_dim_east_1 -> store_dimension_east_1_idx
us-east-2 -> east_2 -> ts_dim_east_2 -> store_dimension_east_2_idx
us-west-2 -> west_2 -> ts_dim_west_2 -> store_dimension_west_2_idx
us-west-1 -> west_1 -> ts_dim_west_1 -> store_dimension_west_1_idx
Use a Configuration Table
Instead of hardcoding one specific table into the procedure, create a configuration table.
Each row represents one dimension or reference table that should receive regional duplicate covering indexes.
CREATE SCHEMA IF NOT EXISTS yb_tip;
CREATE TABLE IF NOT EXISTS yb_tip.dimension_index_config (
config_id bigserial PRIMARY KEY,
table_schema text NOT NULL DEFAULT 'public',
table_name text NOT NULL,
index_name_prefix text NOT NULL,
index_name_suffix text NOT NULL DEFAULT '_idx',
hash_key_columns text[] NOT NULL,
include_columns text[] NOT NULL DEFAULT ARRAY[]::text[],
unique_index boolean NOT NULL DEFAULT true,
replication_factor int NOT NULL DEFAULT 3,
enabled boolean NOT NULL DEFAULT true,
created_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz NOT NULL DEFAULT now(),
CONSTRAINT dimension_index_config_rf_chk
CHECK (replication_factor > 0),
CONSTRAINT dimension_index_config_hash_key_columns_chk
CHECK (array_length(hash_key_columns, 1) > 0)
);
This table stores structured metadata, not full DDL.
That makes the procedure easier to validate and safer to use.
Add a Dimension Table to the Configuration
For this demo, assume the dimension table is named public.store_dimension.
The regional covering index should be created on store_id HASH and include several descriptive columns.
INSERT INTO yb_tip.dimension_index_config (
table_schema,
table_name,
index_name_prefix,
index_name_suffix,
hash_key_columns,
include_columns,
unique_index,
replication_factor,
enabled
)
VALUES (
'public',
'store_dimension',
'store_dimension',
'_idx',
ARRAY['store_id'],
ARRAY['store_name', 'region_code', 'manager_name', 'store_type'],
true,
3,
true
);
For us-west-1, this configuration would generate an index name like this:
store_dimension_west_1_idx
And the generated index DDL would look like this:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ut elit tellus, luctus nec ullamcorper mattis, pulvinar dapibus leo.
CREATE UNIQUE INDEX store_dimension_west_1_idx
ON public.store_dimension (store_id HASH)
INCLUDE (store_name, region_code, manager_name, store_type)
TABLESPACE ts_dim_west_1;
store_dimension_east_1_idx, so the automation normalizes us-east-1 to east_1. If the config references columns that do not exist, the generated CREATE INDEX statement will fail. Validate the Configuration
Because the procedure generates index DDL from the configuration table, validate the configured columns before queueing or executing DDL.
WITH configured_columns AS (
SELECT
c.config_id,
c.table_schema,
c.table_name,
unnest(c.hash_key_columns || c.include_columns) AS column_name
FROM yb_tip.dimension_index_config c
WHERE c.enabled = true
)
SELECT
cc.config_id,
cc.table_schema,
cc.table_name,
cc.column_name AS missing_column
FROM configured_columns cc
WHERE NOT EXISTS (
SELECT 1
FROM information_schema.columns ic
WHERE ic.table_schema = cc.table_schema
AND ic.table_name = cc.table_name
AND ic.column_name = cc.column_name
)
ORDER BY cc.config_id, cc.column_name;
Expected results:
.config_id | table_schema | table_name | missing_column
-----------+--------------+------------+----------------
(0 rows)
Create the Registry Table
Next, create a registry table.
This table tracks which configured table and region combinations have been discovered.
CREATE TABLE IF NOT EXISTS yb_tip.dimension_index_ddl_queue (
queue_id bigserial PRIMARY KEY,
config_id bigint REFERENCES yb_tip.dimension_index_config(config_id),
cloud text NOT NULL,
region text NOT NULL,
table_schema text,
table_name text,
tablespace_name text NOT NULL,
index_name text,
ddl_type text NOT NULL,
ddl_command text NOT NULL,
status text NOT NULL DEFAULT 'pending',
created_at timestamptz NOT NULL DEFAULT now(),
executed_at timestamptz,
error_message text,
CONSTRAINT dimension_index_ddl_queue_status_chk
CHECK (status IN ('pending', 'done', 'error')),
CONSTRAINT dimension_index_ddl_queue_type_chk
CHECK (ddl_type IN ('create_tablespace', 'create_index'))
);
The registry gives the automation memory and makes the generated object names easy to audit.
Create the DDL Queue Table
Next, create a queue table.
The procedure will insert generated DDL into this table instead of running the DDL directly.
CREATE TABLE IF NOT EXISTS yb_tip.dimension_index_region_registry (
config_id bigint NOT NULL REFERENCES yb_tip.dimension_index_config(config_id),
cloud text NOT NULL,
region text NOT NULL,
table_schema text NOT NULL,
table_name text NOT NULL,
tablespace_name text NOT NULL,
index_name text NOT NULL,
replica_regions jsonb NOT NULL,
created_at timestamptz NOT NULL DEFAULT now(),
last_seen_at timestamptz NOT NULL DEFAULT now(),
PRIMARY KEY (config_id, cloud, region)
);
Notice that config_id, table_schema, table_name, and index_name can be NULL for create_tablespace commands.
That is because a regional tablespace can be shared by multiple configured dimension tables.
Create the Queueing Procedure
Now create the procedure that discovers regions and queues any missing DDL.
This procedure loops through:
- ● each enabled row in
yb_tip.dimension_index_config - ● each distinct region from
yb_servers()
For each configured table and region, it queues missing tablespace and index DDL.
CREATE OR REPLACE PROCEDURE yb_tip.queue_missing_dimension_local_indexes()
LANGUAGE plpgsql
AS $$
DECLARE
v_cfg record;
v_target record;
v_region_token text;
v_tablespace_name text;
v_index_name text;
v_placement_blocks jsonb;
v_replica_regions jsonb;
v_replica_placement text;
v_tablespace_exists boolean;
v_index_exists boolean;
v_table_exists regclass;
v_key_columns_sql text;
v_include_columns_sql text;
v_include_clause text;
v_index_verb text;
v_create_tablespace_ddl text;
v_create_index_ddl text;
BEGIN
/*
* Prevent overlapping queue-generation runs.
*/
IF NOT pg_try_advisory_lock(hashtext('yb_tip.queue_missing_dimension_local_indexes')) THEN
RAISE NOTICE 'Another queue-generation run is already active. Skipping.';
RETURN;
END IF;
/*
* Loop through each enabled dimension table configuration.
*/
FOR v_cfg IN
SELECT *
FROM yb_tip.dimension_index_config
WHERE enabled = true
ORDER BY config_id
LOOP
/*
* Make sure the configured table exists.
*/
SELECT to_regclass(format('%I.%I', v_cfg.table_schema, v_cfg.table_name))
INTO v_table_exists;
IF v_table_exists IS NULL THEN
RAISE NOTICE 'Configured table %.% does not exist. Skipping config_id %.',
v_cfg.table_schema,
v_cfg.table_name,
v_cfg.config_id;
CONTINUE;
END IF;
/*
* Validate that all configured key and INCLUDE columns exist.
*/
IF EXISTS (
WITH configured_columns AS (
SELECT unnest(v_cfg.hash_key_columns || v_cfg.include_columns) AS column_name
)
SELECT 1
FROM configured_columns cc
WHERE NOT EXISTS (
SELECT 1
FROM information_schema.columns ic
WHERE ic.table_schema = v_cfg.table_schema
AND ic.table_name = v_cfg.table_name
AND ic.column_name = cc.column_name
)
) THEN
RAISE NOTICE 'Configured columns for %.% are invalid. Skipping config_id %.',
v_cfg.table_schema,
v_cfg.table_name,
v_cfg.config_id;
CONTINUE;
END IF;
/*
* Build the index key column list.
* Example: ARRAY['store_id'] -> store_id HASH
*/
SELECT string_agg(format('%I HASH', col), ', ')
INTO v_key_columns_sql
FROM unnest(v_cfg.hash_key_columns) AS col;
/*
* Build the INCLUDE column list.
*/
SELECT string_agg(format('%I', col), ', ')
INTO v_include_columns_sql
FROM unnest(v_cfg.include_columns) AS col;
v_include_clause :=
CASE
WHEN v_include_columns_sql IS NULL OR length(v_include_columns_sql) = 0 THEN ''
ELSE format(' INCLUDE (%s)', v_include_columns_sql)
END;
v_index_verb :=
CASE
WHEN v_cfg.unique_index THEN 'CREATE UNIQUE INDEX'
ELSE 'CREATE INDEX'
END;
/*
* Loop through each distinct cloud + region currently visible
* in the cluster topology.
*/
FOR v_target IN
SELECT DISTINCT cloud, region
FROM yb_servers()
WHERE cloud IS NOT NULL
AND region IS NOT NULL
ORDER BY cloud, region
LOOP
/*
* Normalize region names to match the naming convention from the previous tip.
*
* Examples:
* us-east-1 -> east_1
* us-east-2 -> east_2
* us-west-2 -> west_2
* us-west-1 -> west_1
*/
v_region_token :=
regexp_replace(lower(v_target.region), '^us-', '');
v_region_token :=
regexp_replace(v_region_token, '[^a-z0-9]+', '_', 'g');
/*
* Use one regional tablespace per region. Multiple configured
* dimension tables can share the same regional tablespace.
*/
v_tablespace_name := 'ts_dim_' || v_region_token;
/*
* Use the configured prefix and suffix to build the index name.
* Example: store_dimension + west_1 + _idx
*/
v_index_name :=
v_cfg.index_name_prefix || '_' || v_region_token || v_cfg.index_name_suffix;
/*
* PostgreSQL-compatible identifiers are limited to 63 bytes.
* If the generated index name is too long, shorten it and add a hash.
*/
IF length(v_index_name) > 63 THEN
v_index_name :=
left(v_cfg.index_name_prefix || '_' || v_region_token, 50)
|| '_'
|| substr(md5(v_cfg.config_id::text || '.' || v_target.cloud || '.' || v_target.region), 1, 8);
END IF;
/*
* Pick the target region first, then enough peer regions to satisfy
* the configured replication factor.
*
* The target region receives leader_preference = 1 so reads from that
* region can prefer the local index leader.
*/
WITH discovered_regions AS (
SELECT DISTINCT cloud, region
FROM yb_servers()
WHERE cloud IS NOT NULL
AND region IS NOT NULL
),
selected_regions AS (
SELECT cloud, region
FROM discovered_regions
ORDER BY
CASE
WHEN cloud = v_target.cloud AND region = v_target.region THEN 0
ELSE 1
END,
cloud,
region
LIMIT v_cfg.replication_factor
)
SELECT
jsonb_agg(
CASE
WHEN cloud = v_target.cloud AND region = v_target.region THEN
jsonb_build_object(
'cloud', cloud,
'region', region,
'zone', '*',
'min_num_replicas', 1,
'leader_preference', 1
)
ELSE
jsonb_build_object(
'cloud', cloud,
'region', region,
'zone', '*',
'min_num_replicas', 1
)
END
ORDER BY
CASE
WHEN cloud = v_target.cloud AND region = v_target.region THEN 0
ELSE 1
END,
cloud,
region
),
jsonb_agg(
jsonb_build_object('cloud', cloud, 'region', region)
ORDER BY
CASE
WHEN cloud = v_target.cloud AND region = v_target.region THEN 0
ELSE 1
END,
cloud,
region
)
INTO v_placement_blocks, v_replica_regions
FROM selected_regions;
IF jsonb_array_length(v_placement_blocks) < v_cfg.replication_factor THEN
RAISE NOTICE 'Not enough regions to satisfy RF %. Skipping config_id %, region %.',
v_cfg.replication_factor,
v_cfg.config_id,
v_target.region;
CONTINUE;
END IF;
v_replica_placement :=
jsonb_build_object(
'num_replicas', v_cfg.replication_factor,
'placement_blocks', v_placement_blocks
)::text;
/*
* Check whether the regional tablespace already exists.
*/
SELECT EXISTS (
SELECT 1
FROM pg_tablespace
WHERE spcname = v_tablespace_name
)
INTO v_tablespace_exists;
/*
* Check whether the regional duplicate covering index already exists.
*/
SELECT EXISTS (
SELECT 1
FROM pg_class c
JOIN pg_namespace n
ON n.oid = c.relnamespace
WHERE n.nspname = v_cfg.table_schema
AND c.relname = v_index_name
AND c.relkind = 'i'
)
INTO v_index_exists;
/*
* Generate one-line DDL commands.
* Keeping them one line makes them easier for external automation to consume.
*/
v_create_tablespace_ddl := format(
'CREATE TABLESPACE %I WITH (replica_placement = %L);',
v_tablespace_name,
v_replica_placement
);
v_create_index_ddl := format(
'%s %I ON %I.%I (%s)%s TABLESPACE %I;',
v_index_verb,
v_index_name,
v_cfg.table_schema,
v_cfg.table_name,
v_key_columns_sql,
v_include_clause,
v_tablespace_name
);
/*
* Queue CREATE TABLESPACE if needed.
*
* The tablespace is per region, not per table. Therefore, only one
* pending or errored CREATE TABLESPACE command should exist for a region.
*/
IF NOT v_tablespace_exists THEN
INSERT INTO yb_tip.dimension_index_ddl_queue (
config_id,
cloud,
region,
table_schema,
table_name,
tablespace_name,
index_name,
ddl_type,
ddl_command
)
SELECT
NULL,
v_target.cloud,
v_target.region,
NULL,
NULL,
v_tablespace_name,
NULL,
'create_tablespace',
v_create_tablespace_ddl
WHERE NOT EXISTS (
SELECT 1
FROM yb_tip.dimension_index_ddl_queue
WHERE cloud = v_target.cloud
AND region = v_target.region
AND ddl_type = 'create_tablespace'
AND status IN ('pending', 'error')
);
END IF;
/*
* Queue CREATE INDEX if needed.
*/
IF NOT v_index_exists THEN
INSERT INTO yb_tip.dimension_index_ddl_queue (
config_id,
cloud,
region,
table_schema,
table_name,
tablespace_name,
index_name,
ddl_type,
ddl_command
)
SELECT
v_cfg.config_id,
v_target.cloud,
v_target.region,
v_cfg.table_schema,
v_cfg.table_name,
v_tablespace_name,
v_index_name,
'create_index',
v_create_index_ddl
WHERE NOT EXISTS (
SELECT 1
FROM yb_tip.dimension_index_ddl_queue
WHERE config_id = v_cfg.config_id
AND cloud = v_target.cloud
AND region = v_target.region
AND ddl_type = 'create_index'
AND status IN ('pending', 'error')
);
END IF;
/*
* Record the discovered table + region combination and expected object names.
*/
INSERT INTO yb_tip.dimension_index_region_registry (
config_id,
cloud,
region,
table_schema,
table_name,
tablespace_name,
index_name,
replica_regions,
created_at,
last_seen_at
)
VALUES (
v_cfg.config_id,
v_target.cloud,
v_target.region,
v_cfg.table_schema,
v_cfg.table_name,
v_tablespace_name,
v_index_name,
v_replica_regions,
now(),
now()
)
ON CONFLICT (config_id, cloud, region)
DO UPDATE SET
table_schema = EXCLUDED.table_schema,
table_name = EXCLUDED.table_name,
tablespace_name = EXCLUDED.tablespace_name,
index_name = EXCLUDED.index_name,
replica_regions = EXCLUDED.replica_regions,
last_seen_at = now();
END LOOP;
END LOOP;
PERFORM pg_advisory_unlock(hashtext('yb_tip.queue_missing_dimension_local_indexes'));
EXCEPTION
WHEN OTHERS THEN
PERFORM pg_advisory_unlock(hashtext('yb_tip.queue_missing_dimension_local_indexes'));
RAISE;
END;
$$;
Run the Queueing Procedure Manually
Before scheduling anything, run the procedure manually:
CALL yb_tip.ensure_store_dimension_local_indexes();
Then check the registry:
SELECT
r.config_id,
r.cloud,
r.region,
r.table_name,
r.tablespace_name,
r.index_name
FROM yb_tip.dimension_index_region_registry r
ORDER BY r.config_id, r.cloud, r.region;
Example output:
| config_id | cloud | region | table_name | tablespace_name | index_name |
|---|---|---|---|---|---|
| 1 | aws | us-east-1 | store_dimension | ts_dim_east_1 | store_dimension_east_1_idx |
| 1 | aws | us-east-2 | store_dimension | ts_dim_east_2 | store_dimension_east_2_idx |
| 1 | aws | us-west-2 | store_dimension | ts_dim_west_2 | store_dimension_west_2_idx |
Then check the DDL queue:
SELECT
queue_id,
config_id,
cloud,
region,
table_name,
ddl_type,
status,
ddl_command
FROM yb_tip.dimension_index_ddl_queue
ORDER BY
CASE ddl_type
WHEN 'create_tablespace' THEN 1
WHEN 'create_index' THEN 2
END,
queue_id;
If the regional tablespaces and indexes already exist, the queue may be empty. If objects are missing, the procedure queues the required DDL.
Enable pg_cron for Queue Generation
You can use pg_cron to run the detection and queueing procedure.
pg_cron must be enabled at the YugabyteDB process level. pg_cron enabled:
yugabyted start \
--advertise_address=127.0.0.1 \
--cloud_location=aws.us-east-1.us-east-1a \
--base_dir=~/yb01 \
--master_flags "enable_pg_cron=true" \
--tserver_flags "enable_pg_cron=true,ysql_cron_database_name=yugabyte" > start1.log
sleep 4
yugabyted start \
--advertise_address=127.0.0.2 \
--cloud_location=aws.us-east-2.us-east-2a \
--join=127.0.0.1 \
--base_dir=~/yb02 \
--master_flags "enable_pg_cron=true" \
--tserver_flags "enable_pg_cron=true,ysql_cron_database_name=yugabyte" > start2.log
yugabyted start \
--advertise_address=127.0.0.3 \
--cloud_location=aws.us-west-2.us-west-2a \
--join=127.0.0.1 \
--base_dir=~/yb03 \
--master_flags "enable_pg_cron=true" \
--tserver_flags "enable_pg_cron=true,ysql_cron_database_name=yugabyte" > start3.log
Then connect to the yugabyte database and create the extension:
CREATE EXTENSION IF NOT EXISTS pg_cron;
pg_cron must be enabled before creating the extension. Set enable_pg_cron=true and set ysql_cron_database_name if you want to control which database owns the pg_cron metadata. In this demo, ysql_cron_database_name=yugabyte, so the extension is created in the yugabyte database.
Schedule the queueing procedure to run once per hour:
SELECT cron.schedule(
'queue-dimension-local-index-ddl',
'0 * * * *',
$$CALL yb_tip.queue_missing_dimension_local_indexes();$$
);
Then verify:
SELECT jobid, schedule, command, active
FROM cron.job
ORDER BY jobid;
Check recent run history:
SELECT
jobid,
status,
start_time,
end_time,
return_message
FROM cron.job_run_details
ORDER BY start_time DESC
LIMIT 10;
pg_cron is responsible for detecting topology changes and queueing DDL. The actual DDL execution still happens as top-level SQL through a separate worker or controlled operations process. Execute the Queued DDL Manually
For a demo, you can pull the pending DDL and execute it manually.
SELECT ddl_command
FROM yb_tip.dimension_index_ddl_queue
WHERE status = 'pending'
ORDER BY
CASE ddl_type
WHEN 'create_tablespace' THEN 1
WHEN 'create_index' THEN 2
END,
queue_id;
Run the CREATE TABLESPACE statements first.
Then run the CREATE INDEX statements.
After each command succeeds, mark it as done:
UPDATE yb_tip.dimension_index_ddl_queue
SET status = 'done',
executed_at = now(),
error_message = NULL
WHERE queue_id = ;
Automate DDL Execution with an External Worker
For a local demo, it is fine to run the DDL worker from the same machine where the demo cluster is running.
For production, though, avoid tying the worker to a single YugabyteDB node.
In a distributed system, the DDL worker should be treated as part of your operations layer. It should connect to YugabyteDB through a stable YSQL endpoint, such as:
- ● a load balancer
- ● a Kubernetes Service
- ● a DNS name
- ● a Smart Driver endpoint
- ● a CI/CD or automation runner with access to the cluster
For example, in a demo you might use:
YSQLSH="ysqlsh -h 127.0.0.1 -U yugabyte -d yugabyte"
But in production, use something more like:
YSQLSH="ysqlsh -h ysql-lb.example.com -U yugabyte -d yugabyte"
Where Should the Worker Run?
The worker can run from several places:
| Option | How It Works | Failure Behavior |
|---|---|---|
| Kubernetes CronJob | Runs the worker as a short-lived pod that connects through a YSQL Service. | If a Kubernetes worker node fails, the next run can be scheduled elsewhere. |
| CI/CD or automation runner | Runs from Jenkins, GitHub Actions, GitLab, Argo, Rundeck, or another automation platform. | Availability depends on the automation platform, not on a single database node. |
| Multiple active workers | More than one worker can run the same logic, as long as the queue uses locking or safe status transitions. | If one worker fails, another worker can continue processing pending DDL. |
| Database-only pg_cron dispatcher | Uses pg_cron to schedule queued DDL from inside the database. | Keeps more of the workflow inside YugabyteDB, but adds more complexity. |
Make the Queue Idempotent
The most important part of the design is that the DDL queue must be safe to retry.
- 1. Worker picks up a pending
CREATE TABLESPACEcommand. - 2. Worker executes the DDL successfully.
- 3. Worker fails before marking the queue row as done.
When the next worker starts, it may see the same queue row still marked as pending.
That is why the worker should check the catalog before executing DDL.
The logic should be:
For each pending queue row:
. Check whether the target object already exists.
. If it exists, mark the queue row as done.
. If it does not exist, execute the DDL.
. If the DDL succeeds, mark the queue row as done.
. If the DDL fails, mark the queue row as error.
This makes the process resumable.
A failed worker does not corrupt the queue. Another worker can safely pick up where it left off.
Example External Worker
Here is a simple worker pattern.
For a demo, 127.0.0.1 is fine. For production, replace it with a stable YSQL endpoint.
#!/usr/bin/env bash
set -uo pipefail
YSQLSH="ysqlsh -h 127.0.0.1 -U yugabyte -d yugabyte"
LOCK_FILE="/tmp/yb_tip_dimension_index_ddl.lock"
exec 200>"$LOCK_FILE"
flock -n 200 || {
echo "Another local DDL worker is already running. Exiting."
exit 0
}
# Ask the database to discover regions and queue any missing DDL.
$YSQLSH -v ON_ERROR_STOP=1 -c "CALL yb_tip.queue_missing_dimension_local_indexes();" || exit 1
# Fetch only the queue ids first.
# This avoids shell parsing problems with nullable columns or long DDL text.
mapfile -t queue_ids < <($YSQLSH -At -c "
SELECT queue_id
FROM yb_tip.dimension_index_ddl_queue
WHERE status = 'pending'
ORDER BY
CASE ddl_type
WHEN 'create_tablespace' THEN 1
WHEN 'create_index' THEN 2
END,
queue_id;
")
for queue_id in "${queue_ids[@]}"; do
ddl_type=$($YSQLSH -At -c "
SELECT ddl_type
FROM yb_tip.dimension_index_ddl_queue
WHERE queue_id = ${queue_id};
")
tablespace_name=$($YSQLSH -At -c "
SELECT COALESCE(tablespace_name, '')
FROM yb_tip.dimension_index_ddl_queue
WHERE queue_id = ${queue_id};
")
table_schema=$($YSQLSH -At -c "
SELECT COALESCE(table_schema, '')
FROM yb_tip.dimension_index_ddl_queue
WHERE queue_id = ${queue_id};
")
index_name=$($YSQLSH -At -c "
SELECT COALESCE(index_name, '')
FROM yb_tip.dimension_index_ddl_queue
WHERE queue_id = ${queue_id};
")
ddl_command=$($YSQLSH -At -c "
SELECT ddl_command
FROM yb_tip.dimension_index_ddl_queue
WHERE queue_id = ${queue_id};
")
echo "Processing queue_id=${queue_id}, ddl_type=${ddl_type}"
if [[ -z "$ddl_command" ]]; then
echo "No DDL command found for queue_id=${queue_id}. Marking error."
$YSQLSH -v ON_ERROR_STOP=1 -c "
UPDATE yb_tip.dimension_index_ddl_queue
SET status = 'error',
executed_at = now(),
error_message = 'DDL command was empty.'
WHERE queue_id = ${queue_id};
"
continue
fi
object_exists="f"
if [[ "$ddl_type" == "create_tablespace" ]]; then
object_exists=$($YSQLSH -At -c "
SELECT EXISTS (
SELECT 1
FROM pg_tablespace
WHERE spcname = '${tablespace_name}'
);
")
fi
if [[ "$ddl_type" == "create_index" ]]; then
object_exists=$($YSQLSH -At -c "
SELECT EXISTS (
SELECT 1
FROM pg_class c
JOIN pg_namespace n
ON n.oid = c.relnamespace
WHERE c.relname = '${index_name}'
AND n.nspname = '${table_schema}'
AND c.relkind = 'i'
);
")
fi
if [[ "$object_exists" == "t" ]]; then
echo "Target object already exists. Marking queue_id=${queue_id} as done."
$YSQLSH -v ON_ERROR_STOP=1 -c "
UPDATE yb_tip.dimension_index_ddl_queue
SET status = 'done',
executed_at = now(),
error_message = NULL
WHERE queue_id = ${queue_id};
"
continue
fi
echo "Executing: ${ddl_command}"
if $YSQLSH -v ON_ERROR_STOP=1 -c "$ddl_command"; then
$YSQLSH -v ON_ERROR_STOP=1 -c "
UPDATE yb_tip.dimension_index_ddl_queue
SET status = 'done',
executed_at = now(),
error_message = NULL
WHERE queue_id = ${queue_id};
"
else
$YSQLSH -v ON_ERROR_STOP=1 -c "
UPDATE yb_tip.dimension_index_ddl_queue
SET status = 'error',
executed_at = now(),
error_message = 'DDL execution failed. Check external worker logs.'
WHERE queue_id = ${queue_id};
"
fi
done
Schedule it with operating system cron:
0 * * * * /usr/local/bin/yb_tip_dimension_index_ddl_worker.sh >> /var/log/yb_tip_dimension_index_ddl_worker.log 2>&1
Add a New Region
Now add a new node in a new region:
yugabyted start \
--advertise_address=127.0.0.4 \
--cloud_location=aws.us-west-1.us-west-1a \
--join=127.0.0.1 \
--master_flags "enable_pg_cron=true" \
--tserver_flags "enable_pg_cron=true,ysql_cron_database_name=yugabyte" \
--base_dir=~/yb04 > start4.log
Confirm the new region appears:
SELECT host, cloud, region, zone
FROM yb_servers()
ORDER BY cloud, region, zone;
Expected output:
. host | cloud | region | zone
-----------+-------+-----------+------------
127.0.0.1 | aws | us-east-1 | us-east-1a
127.0.0.2 | aws | us-east-2 | us-east-2a
127.0.0.4 | aws | us-west-1 | us-west-1a
127.0.0.3 | aws | us-west-2 | us-west-2a
(4 rows)
Run the queueing procedure again:
CALL yb_tip.queue_missing_dimension_local_indexes();
Now check the registry:
SELECT
r.config_id,
r.cloud,
r.region,
r.table_schema,
r.table_name,
r.tablespace_name,
r.index_name,
r.last_seen_at
FROM yb_tip.dimension_index_region_registry r
ORDER BY r.config_id, r.cloud, r.region;
You should see the new us-west-1 region for each configured dimension table:
| config_id | cloud | region | table_name | tablespace_name | index_name |
|---|---|---|---|---|---|
| 1 | aws | us-east-1 | store_dimension | ts_dim_east_1 | store_dimension_east_1_idx |
| 1 | aws | us-east-2 | store_dimension | ts_dim_east_2 | store_dimension_east_2_idx |
| 1 | aws | us-west-1 | store_dimension | ts_dim_west_1 | store_dimension_west_1_idx |
| 1 | aws | us-west-2 | store_dimension | ts_dim_west_2 | store_dimension_west_2_idx |
Then check the DDL queue for the new region:
SELECT
queue_id,
config_id,
cloud,
region,
table_name,
ddl_type,
status,
ddl_command
FROM yb_tip.dimension_index_ddl_queue
WHERE region = 'us-west-1'
ORDER BY
CASE ddl_type
WHEN 'create_tablespace' THEN 1
WHEN 'create_index' THEN 2
END,
queue_id;
You should see new pending DDL for the new region if the objects do not already exist.
Example:
| queue_id | config_id | region | table_name | ddl_type | status | ddl_command |
|---|---|---|---|---|---|---|
| 1 | NULL | us-west-1 | NULL | create_tablespace | pending | CREATE TABLESPACE ts_dim_west_1 WITH (…); |
| 2 | 1 | us-west-1 | store_dimension | create_index | pending | CREATE UNIQUE INDEX store_dimension_west_1_idx …; |
We can wait for the shell script to be exected via the cron job, or we can simply execute it manually.
[root@localhost ~]# ./yb_tip_dimension_index_ddl_worker.sh
CALL
Processing queue_id=1, ddl_type=create_tablespace
Executing: CREATE TABLESPACE ts_dim_west_1 WITH (...);
CREATE TABLESPACE
UPDATE 1
Processing queue_id=2, ddl_type=create_index
Executing: CREATE UNIQUE INDEX store_dimension_west_1_idx ON public.store_dimension (store_id HASH) INCLUDE (store_name, region_code, manager_name, store_type) TABLESPACE ts_dim_west_1;
CREATE INDEX
UPDATE 1
Now validate the final objects:
SELECT spcname
FROM pg_tablespace
WHERE spcname LIKE 'ts_dim_%'
ORDER BY spcname;
Expected:
. spcname
---------------
ts_dim_east_1
ts_dim_east_2
ts_dim_global
ts_dim_west_1
ts_dim_west_2
(5 rows)
And validate the indexes:
SELECT
schemaname,
tablename,
indexname,
tablespace
FROM pg_indexes
WHERE schemaname = 'public'
AND tablename = 'store_dimension'
AND indexname LIKE 'store_dimension_%_idx'
ORDER BY indexname;
Expected:
.schemaname | tablename | indexname | tablespace
------------+-----------------+-----------------------------+---------------
public | store_dimension | store_dimension_east_1_idx | ts_dim_east_1
public | store_dimension | store_dimension_east_2_idx | ts_dim_east_2
public | store_dimension | store_dimension_west_1_idx | ts_dim_west_1
public | store_dimension | store_dimension_west_2_idx | ts_dim_west_2
We can also verify the objects by describing the table:
yugabyte=# \d public.store_dimension;
Table "public.store_dimension"
Column | Type | Collation | Nullable | Default
--------------+---------+-----------+----------+---------
store_id | integer | | not null |
store_name | text | | not null |
region_code | text | | not null |
manager_name | text | | not null |
store_type | text | | not null |
Indexes:
"store_dimension_pkey" PRIMARY KEY, lsm (store_id HASH), tablespace "ts_dim_global"
"store_dimension_east_1_idx" UNIQUE, lsm (store_id HASH) INCLUDE (store_name, region_code, manager_name, store_type), tablespace "ts_dim_east_1"
"store_dimension_east_2_idx" UNIQUE, lsm (store_id HASH) INCLUDE (store_name, region_code, manager_name, store_type), tablespace "ts_dim_east_2"
"store_dimension_west_1_idx" UNIQUE, lsm (store_id HASH) INCLUDE (store_name, region_code, manager_name, store_type), tablespace "ts_dim_west_1"
"store_dimension_west_2_idx" UNIQUE, lsm (store_id HASH) INCLUDE (store_name, region_code, manager_name, store_type), tablespace "ts_dim_west_2"
Tablespace: "ts_dim_global"
Why Keep RF at 3?
In this example, each duplicate covering index uses replication factor 3.
When a fourth region is added, the new region-specific index is placed in the target region plus two peer regions.
It does not automatically become RF=4.
That is intentional.
Important Considerations
| Consideration | Why It Matters |
|---|---|
| CREATE TABLESPACE must be top-level | Do not try to run CREATE TABLESPACE inside a PL/pgSQL procedure. Queue the DDL and execute it separately. |
| Track regions, not nodes | Adding another node to an existing region should not create another duplicate covering index. |
| Use a small config list | Only configure tables that truly benefit from regional local reads. This should be intentional, not automatic for every table. |
| Validate the config | The configured key and INCLUDE columns must exist on the target table before index DDL is generated. |
| DDL is not free | Creating a new index on an existing table can consume CPU, IO, and network resources. Schedule DDL execution carefully. |
| Small tables only | This pattern is best for small, frequently read dimension tables. Do not blindly duplicate large fact tables. |
| Index write amplification | Every duplicate covering index adds write overhead. That is usually acceptable for small dimension tables, but it should still be intentional. |
| Config table permissions | Restrict who can insert or update configuration rows. The procedure generates executable DDL from this metadata. |
| Worker placement | Do not run the DDL worker as a single point of failure on one database node. Use an external automation layer, connect through a stable YSQL endpoint, and make the queue safe to retry. |
| Failure handling | Keep failed commands in the queue with an error status so they can be reviewed, corrected, and retried safely. |
Final Takeaway
Duplicate covering indexes are a powerful way to keep small dimension-table reads local in a multi-region YugabyteDB cluster.
But when the cluster topology changes, those local access paths need to change too.
A practical automation pattern is to make the process configuration-driven:
- ● store the list of eligible dimension tables in a config table
- ● validate that the configured columns exist
- ● detect new regions from
yb_servers() - ● generate the required regional tablespace and index DDL
- ● queue the DDL for review or execution
- ● execute the DDL as top-level SQL through a reliable operations worker
For demos, that worker can be a simple script.
For production, it should not depend on a single YugabyteDB node. Run it from an external automation layer, connect through a stable YSQL endpoint, and make the queue idempotent so another worker can safely continue if one fails.
For small dimension tables that need fast local reads in every region, this pattern can make the design much easier to maintain as the cluster grows.
Have Fun!
