Automating Local-Read Indexes When a New Region Is Added

In a previous YugabyteDB Tip, Replicating a Small Dimension Table for Local Reads in Every Region, we looked at how to use duplicate covering indexes to make a small dimension table read locally from every region.

That pattern works great when the cluster topology is known ahead of time.

But what happens when your cluster changes?

For example, maybe you start with three regions:

				
					[root@localhost yb]# ysqlsh -h 127.0.0.1 -c "SELECT host, cloud, region, zone FROM yb_servers();"

   host    | cloud |  region   |    zone
-----------+-------+-----------+------------
 127.0.0.3 | aws   | us-west-2 | us-west-2a
 127.0.0.2 | aws   | us-east-2 | us-east-2a
 127.0.0.1 | aws   | us-east-1 | us-east-1a
(3 rows)
				
			

Then later, you add a node in a brand-new region:

				
					[root@localhost yb]# yugabyted start \
  --advertise_address=127.0.0.4 \
  --cloud_location=aws.us-west-1.us-west-1a \
  --join=127.0.0.1 \
  --base_dir=~/yb04 > start4.log
				
			

Now the cluster has four regions:

				
					[root@localhost yb]# ysqlsh -h 127.0.0.1 -c "SELECT host, cloud, region, zone FROM yb_servers();"

   host    | cloud |  region   |    zone
-----------+-------+-----------+------------
 127.0.0.3 | aws   | us-west-2 | us-west-2a
 127.0.0.2 | aws   | us-east-2 | us-east-2a
 127.0.0.1 | aws   | us-east-1 | us-east-1a
 127.0.0.4 | aws   | us-west-1 | us-west-1a
(4 rows)
				
			

If your application in us-west-1 needs the same low-latency local reads, you need to create a new tablespace and duplicate covering index for that region.

But you probably do not want to write a custom stored procedure for every dimension table.

A better pattern is to make the automation configuration-driven.

You define which dimension tables should get regional duplicate covering indexes in a metadata table. Then one generic procedure can discover regions, loop through the configured dimension tables, and queue the required DDL.

Important: CREATE TABLESPACE cannot run inside a stored procedure because it cannot run inside a transaction block. For this reason, the stored procedure should detect missing regions and queue the required DDL, while a separate top-level process executes the generated CREATE TABLESPACE and CREATE INDEX statements.

The Automation Pattern

The automation pattern looks like this:

Step What Happens Where It Runs
Configure tables Store the list of dimension tables, key columns, INCLUDE columns, index name pattern, and replication factor. Configuration table
Detect regions Query yb_servers() for distinct cloud and region values. PL/pgSQL procedure
Compare state Compare configured tables and discovered regions against catalog objects and the registry table. PL/pgSQL procedure
Queue DDL Insert required CREATE TABLESPACE and CREATE INDEX commands into a queue table. PL/pgSQL procedure
Execute DDL Run queued DDL as top-level SQL statements. Manual execution, shell cron, CI/CD, or operations workflow
Record result Mark each queued command as done or error. External execution process
Key idea: Use a configuration table to define which dimension tables need regional duplicate covering indexes. Use the database procedure for discovery and DDL generation. Use a top-level SQL execution process for the actual CREATE TABLESPACE and CREATE INDEX commands.

Track Regions, Not Nodes

This automation should track regions, not individual nodes.

If you add another node to us-east-1, you do not need a new duplicate covering index.

But if you add a new region, such as us-west-1, then you probably do want a new regional index so applications in that region can read locally.

Important: New node does not mean new index. New region means new index. The goal is to provide one local-read access path per configured dimension table per region, not one index per server.

Existing Dimension Table

For this demo, the dimension table is public.store_dimension.

				
					\d public.store_dimension
				
			

Example output:

				
					.            Table "public.store_dimension"
    Column    |  Type   | Collation | Nullable | Default
--------------+---------+-----------+----------+---------
 store_id     | integer |           | not null |
 store_name   | text    |           | not null |
 region_code  | text    |           | not null |
 manager_name | text    |           | not null |
 store_type   | text    |           | not null |
Indexes:
    "store_dimension_pkey" PRIMARY KEY, lsm (store_id HASH), tablespace "ts_dim_global"
    "store_dimension_east_1_idx" UNIQUE, lsm (store_id HASH) INCLUDE (store_name, region_code, manager_name, store_type), tablespace "ts_dim_east_1"
    "store_dimension_east_2_idx" UNIQUE, lsm (store_id HASH) INCLUDE (store_name, region_code, manager_name, store_type), tablespace "ts_dim_east_2"
    "store_dimension_west_2_idx" UNIQUE, lsm (store_id HASH) INCLUDE (store_name, region_code, manager_name, store_type), tablespace "ts_dim_west_2"
Tablespace: "ts_dim_global"
				
			

The automation should preserve this naming convention.

For example:

				
					us-east-1 -> east_1 -> ts_dim_east_1 -> store_dimension_east_1_idx
us-east-2 -> east_2 -> ts_dim_east_2 -> store_dimension_east_2_idx
us-west-2 -> west_2 -> ts_dim_west_2 -> store_dimension_west_2_idx
us-west-1 -> west_1 -> ts_dim_west_1 -> store_dimension_west_1_idx
				
			

Use a Configuration Table

Instead of hardcoding one specific table into the procedure, create a configuration table.

Each row represents one dimension or reference table that should receive regional duplicate covering indexes.

				
					CREATE SCHEMA IF NOT EXISTS yb_tip;

CREATE TABLE IF NOT EXISTS yb_tip.dimension_index_config (
  config_id           bigserial PRIMARY KEY,

  table_schema        text        NOT NULL DEFAULT 'public',
  table_name          text        NOT NULL,

  index_name_prefix   text        NOT NULL,
  index_name_suffix   text        NOT NULL DEFAULT '_idx',

  hash_key_columns    text[]      NOT NULL,
  include_columns     text[]      NOT NULL DEFAULT ARRAY[]::text[],

  unique_index        boolean     NOT NULL DEFAULT true,
  replication_factor  int         NOT NULL DEFAULT 3,

  enabled             boolean     NOT NULL DEFAULT true,

  created_at          timestamptz NOT NULL DEFAULT now(),
  updated_at          timestamptz NOT NULL DEFAULT now(),

  CONSTRAINT dimension_index_config_rf_chk
    CHECK (replication_factor > 0),

  CONSTRAINT dimension_index_config_hash_key_columns_chk
    CHECK (array_length(hash_key_columns, 1) > 0)
);
				
			

This table stores structured metadata, not full DDL.

That makes the procedure easier to validate and safer to use.

Use guardrails: A configuration-driven approach is powerful, but it should only be used for a small, intentional list of dimension or reference tables. Every duplicate covering index adds write overhead, so avoid automatically applying this pattern to large or frequently updated tables.

Add a Dimension Table to the Configuration

For this demo, assume the dimension table is named public.store_dimension.

The regional covering index should be created on store_id HASH and include several descriptive columns.

				
					INSERT INTO yb_tip.dimension_index_config (
  table_schema,
  table_name,
  index_name_prefix,
  index_name_suffix,
  hash_key_columns,
  include_columns,
  unique_index,
  replication_factor,
  enabled
)
VALUES (
  'public',
  'store_dimension',
  'store_dimension',
  '_idx',
  ARRAY['store_id'],
  ARRAY['store_name', 'region_code', 'manager_name', 'store_type'],
  true,
  3,
  true
);
				
			

For us-west-1, this configuration would generate an index name like this:

				
					store_dimension_west_1_idx
				
			

And the generated index DDL would look like this:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ut elit tellus, luctus nec ullamcorper mattis, pulvinar dapibus leo.

				
					CREATE UNIQUE INDEX store_dimension_west_1_idx
ON public.store_dimension (store_id HASH)
INCLUDE (store_name, region_code, manager_name, store_type)
TABLESPACE ts_dim_west_1;
				
			
Configuration matters: The configuration table must match the real table definition and the naming convention used by your existing objects. In this demo, the existing indexes use names like store_dimension_east_1_idx, so the automation normalizes us-east-1 to east_1. If the config references columns that do not exist, the generated CREATE INDEX statement will fail.

Validate the Configuration

Because the procedure generates index DDL from the configuration table, validate the configured columns before queueing or executing DDL.

				
					WITH configured_columns AS (
  SELECT
    c.config_id,
    c.table_schema,
    c.table_name,
    unnest(c.hash_key_columns || c.include_columns) AS column_name
  FROM yb_tip.dimension_index_config c
  WHERE c.enabled = true
)
SELECT
  cc.config_id,
  cc.table_schema,
  cc.table_name,
  cc.column_name AS missing_column
FROM configured_columns cc
WHERE NOT EXISTS (
  SELECT 1
  FROM information_schema.columns ic
  WHERE ic.table_schema = cc.table_schema
    AND ic.table_name = cc.table_name
    AND ic.column_name = cc.column_name
)
ORDER BY cc.config_id, cc.column_name;
				
			

Expected results:

				
					.config_id | table_schema | table_name | missing_column
-----------+--------------+------------+----------------
(0 rows)
				
			

Create the Registry Table

Next, create a registry table.

This table tracks which configured table and region combinations have been discovered.

				
					CREATE TABLE IF NOT EXISTS yb_tip.dimension_index_ddl_queue (
  queue_id          bigserial PRIMARY KEY,

  config_id         bigint REFERENCES yb_tip.dimension_index_config(config_id),

  cloud             text        NOT NULL,
  region            text        NOT NULL,

  table_schema      text,
  table_name        text,

  tablespace_name   text        NOT NULL,
  index_name        text,

  ddl_type          text        NOT NULL,
  ddl_command       text        NOT NULL,

  status            text        NOT NULL DEFAULT 'pending',

  created_at        timestamptz NOT NULL DEFAULT now(),
  executed_at       timestamptz,
  error_message     text,

  CONSTRAINT dimension_index_ddl_queue_status_chk
    CHECK (status IN ('pending', 'done', 'error')),

  CONSTRAINT dimension_index_ddl_queue_type_chk
    CHECK (ddl_type IN ('create_tablespace', 'create_index'))
);
				
			

The registry gives the automation memory and makes the generated object names easy to audit.

Create the DDL Queue Table

Next, create a queue table.

The procedure will insert generated DDL into this table instead of running the DDL directly.

				
					CREATE TABLE IF NOT EXISTS yb_tip.dimension_index_region_registry (
  config_id        bigint      NOT NULL REFERENCES yb_tip.dimension_index_config(config_id),

  cloud            text        NOT NULL,
  region           text        NOT NULL,

  table_schema     text        NOT NULL,
  table_name       text        NOT NULL,

  tablespace_name  text        NOT NULL,
  index_name       text        NOT NULL,
  replica_regions  jsonb       NOT NULL,

  created_at       timestamptz NOT NULL DEFAULT now(),
  last_seen_at     timestamptz NOT NULL DEFAULT now(),

  PRIMARY KEY (config_id, cloud, region)
);
				
			

Notice that config_id, table_schema, table_name, and index_name can be NULL for create_tablespace commands.

That is because a regional tablespace can be shared by multiple configured dimension tables.

Create the Queueing Procedure

Now create the procedure that discovers regions and queues any missing DDL.

This procedure loops through:

  • ● each enabled row in yb_tip.dimension_index_config
  • ● each distinct region from yb_servers()

For each configured table and region, it queues missing tablespace and index DDL.

				
					CREATE OR REPLACE PROCEDURE yb_tip.queue_missing_dimension_local_indexes()
LANGUAGE plpgsql
AS $$
DECLARE
  v_cfg                   record;
  v_target                record;

  v_region_token          text;
  v_tablespace_name       text;
  v_index_name            text;

  v_placement_blocks      jsonb;
  v_replica_regions       jsonb;
  v_replica_placement     text;

  v_tablespace_exists     boolean;
  v_index_exists          boolean;
  v_table_exists          regclass;

  v_key_columns_sql       text;
  v_include_columns_sql   text;
  v_include_clause        text;

  v_index_verb            text;

  v_create_tablespace_ddl text;
  v_create_index_ddl      text;
BEGIN
  /*
   * Prevent overlapping queue-generation runs.
   */
  IF NOT pg_try_advisory_lock(hashtext('yb_tip.queue_missing_dimension_local_indexes')) THEN
    RAISE NOTICE 'Another queue-generation run is already active. Skipping.';
    RETURN;
  END IF;

  /*
   * Loop through each enabled dimension table configuration.
   */
  FOR v_cfg IN
    SELECT *
    FROM yb_tip.dimension_index_config
    WHERE enabled = true
    ORDER BY config_id
  LOOP
    /*
     * Make sure the configured table exists.
     */
    SELECT to_regclass(format('%I.%I', v_cfg.table_schema, v_cfg.table_name))
    INTO v_table_exists;

    IF v_table_exists IS NULL THEN
      RAISE NOTICE 'Configured table %.% does not exist. Skipping config_id %.',
        v_cfg.table_schema,
        v_cfg.table_name,
        v_cfg.config_id;

      CONTINUE;
    END IF;

    /*
     * Validate that all configured key and INCLUDE columns exist.
     */
    IF EXISTS (
      WITH configured_columns AS (
        SELECT unnest(v_cfg.hash_key_columns || v_cfg.include_columns) AS column_name
      )
      SELECT 1
      FROM configured_columns cc
      WHERE NOT EXISTS (
        SELECT 1
        FROM information_schema.columns ic
        WHERE ic.table_schema = v_cfg.table_schema
          AND ic.table_name = v_cfg.table_name
          AND ic.column_name = cc.column_name
      )
    ) THEN
      RAISE NOTICE 'Configured columns for %.% are invalid. Skipping config_id %.',
        v_cfg.table_schema,
        v_cfg.table_name,
        v_cfg.config_id;

      CONTINUE;
    END IF;

    /*
     * Build the index key column list.
     * Example: ARRAY['store_id'] -> store_id HASH
     */
    SELECT string_agg(format('%I HASH', col), ', ')
    INTO v_key_columns_sql
    FROM unnest(v_cfg.hash_key_columns) AS col;

    /*
     * Build the INCLUDE column list.
     */
    SELECT string_agg(format('%I', col), ', ')
    INTO v_include_columns_sql
    FROM unnest(v_cfg.include_columns) AS col;

    v_include_clause :=
      CASE
        WHEN v_include_columns_sql IS NULL OR length(v_include_columns_sql) = 0 THEN ''
        ELSE format(' INCLUDE (%s)', v_include_columns_sql)
      END;

    v_index_verb :=
      CASE
        WHEN v_cfg.unique_index THEN 'CREATE UNIQUE INDEX'
        ELSE 'CREATE INDEX'
      END;

    /*
     * Loop through each distinct cloud + region currently visible
     * in the cluster topology.
     */
    FOR v_target IN
      SELECT DISTINCT cloud, region
      FROM yb_servers()
      WHERE cloud IS NOT NULL
        AND region IS NOT NULL
      ORDER BY cloud, region
    LOOP
      /*
       * Normalize region names to match the naming convention from the previous tip.
       *
       * Examples:
       *   us-east-1 -> east_1
       *   us-east-2 -> east_2
       *   us-west-2 -> west_2
       *   us-west-1 -> west_1
       */
      v_region_token :=
        regexp_replace(lower(v_target.region), '^us-', '');

      v_region_token :=
        regexp_replace(v_region_token, '[^a-z0-9]+', '_', 'g');

      /*
       * Use one regional tablespace per region. Multiple configured
       * dimension tables can share the same regional tablespace.
       */
      v_tablespace_name := 'ts_dim_' || v_region_token;

      /*
       * Use the configured prefix and suffix to build the index name.
       * Example: store_dimension + west_1 + _idx
       */
      v_index_name :=
        v_cfg.index_name_prefix || '_' || v_region_token || v_cfg.index_name_suffix;

      /*
       * PostgreSQL-compatible identifiers are limited to 63 bytes.
       * If the generated index name is too long, shorten it and add a hash.
       */
      IF length(v_index_name) > 63 THEN
        v_index_name :=
          left(v_cfg.index_name_prefix || '_' || v_region_token, 50)
          || '_'
          || substr(md5(v_cfg.config_id::text || '.' || v_target.cloud || '.' || v_target.region), 1, 8);
      END IF;

      /*
       * Pick the target region first, then enough peer regions to satisfy
       * the configured replication factor.
       *
       * The target region receives leader_preference = 1 so reads from that
       * region can prefer the local index leader.
       */
      WITH discovered_regions AS (
        SELECT DISTINCT cloud, region
        FROM yb_servers()
        WHERE cloud IS NOT NULL
          AND region IS NOT NULL
      ),
      selected_regions AS (
        SELECT cloud, region
        FROM discovered_regions
        ORDER BY
          CASE
            WHEN cloud = v_target.cloud AND region = v_target.region THEN 0
            ELSE 1
          END,
          cloud,
          region
        LIMIT v_cfg.replication_factor
      )
      SELECT
        jsonb_agg(
          CASE
            WHEN cloud = v_target.cloud AND region = v_target.region THEN
              jsonb_build_object(
                'cloud', cloud,
                'region', region,
                'zone', '*',
                'min_num_replicas', 1,
                'leader_preference', 1
              )
            ELSE
              jsonb_build_object(
                'cloud', cloud,
                'region', region,
                'zone', '*',
                'min_num_replicas', 1
              )
          END
          ORDER BY
            CASE
              WHEN cloud = v_target.cloud AND region = v_target.region THEN 0
              ELSE 1
            END,
            cloud,
            region
        ),
        jsonb_agg(
          jsonb_build_object('cloud', cloud, 'region', region)
          ORDER BY
            CASE
              WHEN cloud = v_target.cloud AND region = v_target.region THEN 0
              ELSE 1
            END,
            cloud,
            region
        )
      INTO v_placement_blocks, v_replica_regions
      FROM selected_regions;

      IF jsonb_array_length(v_placement_blocks) < v_cfg.replication_factor THEN
        RAISE NOTICE 'Not enough regions to satisfy RF %. Skipping config_id %, region %.',
          v_cfg.replication_factor,
          v_cfg.config_id,
          v_target.region;

        CONTINUE;
      END IF;

      v_replica_placement :=
        jsonb_build_object(
          'num_replicas', v_cfg.replication_factor,
          'placement_blocks', v_placement_blocks
        )::text;

      /*
       * Check whether the regional tablespace already exists.
       */
      SELECT EXISTS (
        SELECT 1
        FROM pg_tablespace
        WHERE spcname = v_tablespace_name
      )
      INTO v_tablespace_exists;

      /*
       * Check whether the regional duplicate covering index already exists.
       */
      SELECT EXISTS (
        SELECT 1
        FROM pg_class c
        JOIN pg_namespace n
          ON n.oid = c.relnamespace
        WHERE n.nspname = v_cfg.table_schema
          AND c.relname = v_index_name
          AND c.relkind = 'i'
      )
      INTO v_index_exists;

      /*
       * Generate one-line DDL commands.
       * Keeping them one line makes them easier for external automation to consume.
       */
      v_create_tablespace_ddl := format(
        'CREATE TABLESPACE %I WITH (replica_placement = %L);',
        v_tablespace_name,
        v_replica_placement
      );

      v_create_index_ddl := format(
        '%s %I ON %I.%I (%s)%s TABLESPACE %I;',
        v_index_verb,
        v_index_name,
        v_cfg.table_schema,
        v_cfg.table_name,
        v_key_columns_sql,
        v_include_clause,
        v_tablespace_name
      );

      /*
       * Queue CREATE TABLESPACE if needed.
       *
       * The tablespace is per region, not per table. Therefore, only one
       * pending or errored CREATE TABLESPACE command should exist for a region.
       */
      IF NOT v_tablespace_exists THEN
        INSERT INTO yb_tip.dimension_index_ddl_queue (
          config_id,
          cloud,
          region,
          table_schema,
          table_name,
          tablespace_name,
          index_name,
          ddl_type,
          ddl_command
        )
        SELECT
          NULL,
          v_target.cloud,
          v_target.region,
          NULL,
          NULL,
          v_tablespace_name,
          NULL,
          'create_tablespace',
          v_create_tablespace_ddl
        WHERE NOT EXISTS (
          SELECT 1
          FROM yb_tip.dimension_index_ddl_queue
          WHERE cloud = v_target.cloud
            AND region = v_target.region
            AND ddl_type = 'create_tablespace'
            AND status IN ('pending', 'error')
        );
      END IF;

      /*
       * Queue CREATE INDEX if needed.
       */
      IF NOT v_index_exists THEN
        INSERT INTO yb_tip.dimension_index_ddl_queue (
          config_id,
          cloud,
          region,
          table_schema,
          table_name,
          tablespace_name,
          index_name,
          ddl_type,
          ddl_command
        )
        SELECT
          v_cfg.config_id,
          v_target.cloud,
          v_target.region,
          v_cfg.table_schema,
          v_cfg.table_name,
          v_tablespace_name,
          v_index_name,
          'create_index',
          v_create_index_ddl
        WHERE NOT EXISTS (
          SELECT 1
          FROM yb_tip.dimension_index_ddl_queue
          WHERE config_id = v_cfg.config_id
            AND cloud = v_target.cloud
            AND region = v_target.region
            AND ddl_type = 'create_index'
            AND status IN ('pending', 'error')
        );
      END IF;

      /*
       * Record the discovered table + region combination and expected object names.
       */
      INSERT INTO yb_tip.dimension_index_region_registry (
        config_id,
        cloud,
        region,
        table_schema,
        table_name,
        tablespace_name,
        index_name,
        replica_regions,
        created_at,
        last_seen_at
      )
      VALUES (
        v_cfg.config_id,
        v_target.cloud,
        v_target.region,
        v_cfg.table_schema,
        v_cfg.table_name,
        v_tablespace_name,
        v_index_name,
        v_replica_regions,
        now(),
        now()
      )
      ON CONFLICT (config_id, cloud, region)
      DO UPDATE SET
        table_schema = EXCLUDED.table_schema,
        table_name = EXCLUDED.table_name,
        tablespace_name = EXCLUDED.tablespace_name,
        index_name = EXCLUDED.index_name,
        replica_regions = EXCLUDED.replica_regions,
        last_seen_at = now();

    END LOOP;
  END LOOP;

  PERFORM pg_advisory_unlock(hashtext('yb_tip.queue_missing_dimension_local_indexes'));

EXCEPTION
  WHEN OTHERS THEN
    PERFORM pg_advisory_unlock(hashtext('yb_tip.queue_missing_dimension_local_indexes'));
    RAISE;
END;
$$;
				
			
Note: This procedure does not create the tablespace or index directly. It only generates and queues the required DDL. The queued DDL still needs to be executed as top-level SQL.

Run the Queueing Procedure Manually

Before scheduling anything, run the procedure manually:

				
					CALL yb_tip.ensure_store_dimension_local_indexes();
				
			

Then check the registry:

				
					SELECT
  r.config_id,
  r.cloud,
  r.region,
  r.table_name,
  r.tablespace_name,
  r.index_name
FROM yb_tip.dimension_index_region_registry r
ORDER BY r.config_id, r.cloud, r.region;
				
			

Example output:

config_id cloud region table_name tablespace_name index_name
1 aws us-east-1 store_dimension ts_dim_east_1 store_dimension_east_1_idx
1 aws us-east-2 store_dimension ts_dim_east_2 store_dimension_east_2_idx
1 aws us-west-2 store_dimension ts_dim_west_2 store_dimension_west_2_idx

Then check the DDL queue:

				
					SELECT
  queue_id,
  config_id,
  cloud,
  region,
  table_name,
  ddl_type,
  status,
  ddl_command
FROM yb_tip.dimension_index_ddl_queue
ORDER BY
  CASE ddl_type
    WHEN 'create_tablespace' THEN 1
    WHEN 'create_index' THEN 2
  END,
  queue_id;
				
			

If the regional tablespaces and indexes already exist, the queue may be empty. If objects are missing, the procedure queues the required DDL.

Enable pg_cron for Queue Generation

You can use pg_cron to run the detection and queueing procedure.

Before creating the extension, pg_cron must be enabled at the YugabyteDB process level.
For this local yugabyted demo, restart each node with pg_cron enabled:
				
					yugabyted start \
  --advertise_address=127.0.0.1 \
  --cloud_location=aws.us-east-1.us-east-1a \
  --base_dir=~/yb01 \
  --master_flags "enable_pg_cron=true" \
  --tserver_flags "enable_pg_cron=true,ysql_cron_database_name=yugabyte" > start1.log

sleep 4

yugabyted start \
  --advertise_address=127.0.0.2 \
  --cloud_location=aws.us-east-2.us-east-2a \
  --join=127.0.0.1 \
  --base_dir=~/yb02 \
  --master_flags "enable_pg_cron=true" \
  --tserver_flags "enable_pg_cron=true,ysql_cron_database_name=yugabyte" > start2.log

yugabyted start \
  --advertise_address=127.0.0.3 \
  --cloud_location=aws.us-west-2.us-west-2a \
  --join=127.0.0.1 \
  --base_dir=~/yb03 \
  --master_flags "enable_pg_cron=true" \
  --tserver_flags "enable_pg_cron=true,ysql_cron_database_name=yugabyte" > start3.log
				
			

Then connect to the yugabyte database and create the extension:

				
					CREATE EXTENSION IF NOT EXISTS pg_cron;
				
			
Important: In YugabyteDB, pg_cron must be enabled before creating the extension. Set enable_pg_cron=true and set ysql_cron_database_name if you want to control which database owns the pg_cron metadata. In this demo, ysql_cron_database_name=yugabyte, so the extension is created in the yugabyte database.

Schedule the queueing procedure to run once per hour:

				
					SELECT cron.schedule(
  'queue-dimension-local-index-ddl',
  '0 * * * *',
  $$CALL yb_tip.queue_missing_dimension_local_indexes();$$
);
				
			

Then verify:

				
					SELECT jobid, schedule, command, active
FROM cron.job
ORDER BY jobid;
				
			

Check recent run history:

				
					SELECT
  jobid,
  status,
  start_time,
  end_time,
  return_message
FROM cron.job_run_details
ORDER BY start_time DESC
LIMIT 10;
				
			
Tip: In this pattern, pg_cron is responsible for detecting topology changes and queueing DDL. The actual DDL execution still happens as top-level SQL through a separate worker or controlled operations process.

Execute the Queued DDL Manually

For a demo, you can pull the pending DDL and execute it manually.

				
					SELECT ddl_command
FROM yb_tip.dimension_index_ddl_queue
WHERE status = 'pending'
ORDER BY
  CASE ddl_type
    WHEN 'create_tablespace' THEN 1
    WHEN 'create_index' THEN 2
  END,
  queue_id;
				
			

Run the CREATE TABLESPACE statements first.

Then run the CREATE INDEX statements.

After each command succeeds, mark it as done:

				
					UPDATE yb_tip.dimension_index_ddl_queue
SET status = 'done',
    executed_at = now(),
    error_message = NULL
WHERE queue_id = <queue_id>;
				
			
Demo tip: For a blog demo, manual execution is often easier to show. It makes the generated DDL visible and avoids hiding the most important part of the workflow.

Automate DDL Execution with an External Worker

For a local demo, it is fine to run the DDL worker from the same machine where the demo cluster is running.

For production, though, avoid tying the worker to a single YugabyteDB node.

In a distributed system, the DDL worker should be treated as part of your operations layer. It should connect to YugabyteDB through a stable YSQL endpoint, such as:

  • ● a load balancer
  • ● a Kubernetes Service
  • ● a DNS name
  • ● a Smart Driver endpoint
  • ● a CI/CD or automation runner with access to the cluster

For example, in a demo you might use:

				
					YSQLSH="ysqlsh -h 127.0.0.1 -U yugabyte -d yugabyte"
				
			

But in production, use something more like:

				
					YSQLSH="ysqlsh -h ysql-lb.example.com -U yugabyte -d yugabyte"
				
			
Production note: Do not tie the DDL worker to a single YugabyteDB node. Run it from an external automation layer and connect through a stable YSQL endpoint. If high availability is required, run more than one worker and make the queue processing idempotent so a failed worker can be safely replaced by another worker.
Where Should the Worker Run?

The worker can run from several places:

Option How It Works Failure Behavior
Kubernetes CronJob Runs the worker as a short-lived pod that connects through a YSQL Service. If a Kubernetes worker node fails, the next run can be scheduled elsewhere.
CI/CD or automation runner Runs from Jenkins, GitHub Actions, GitLab, Argo, Rundeck, or another automation platform. Availability depends on the automation platform, not on a single database node.
Multiple active workers More than one worker can run the same logic, as long as the queue uses locking or safe status transitions. If one worker fails, another worker can continue processing pending DDL.
Database-only pg_cron dispatcher Uses pg_cron to schedule queued DDL from inside the database. Keeps more of the workflow inside YugabyteDB, but adds more complexity.
Make the Queue Idempotent

The most important part of the design is that the DDL queue must be safe to retry.

For example, this failure case is possible:
  • 1. Worker picks up a pending CREATE TABLESPACE command.
  • 2. Worker executes the DDL successfully.
  • 3. Worker fails before marking the queue row as done.

When the next worker starts, it may see the same queue row still marked as pending.

That is why the worker should check the catalog before executing DDL.

The logic should be:

				
					For each pending queue row:
.  Check whether the target object already exists.
.  If it exists, mark the queue row as done.
.  If it does not exist, execute the DDL.
.  If the DDL succeeds, mark the queue row as done.
.  If the DDL fails, mark the queue row as error.
				
			

This makes the process resumable.

A failed worker does not corrupt the queue. Another worker can safely pick up where it left off.

Example External Worker

Here is a simple worker pattern.

For a demo, 127.0.0.1 is fine. For production, replace it with a stable YSQL endpoint.

				
					#!/usr/bin/env bash

set -uo pipefail

YSQLSH="ysqlsh -h 127.0.0.1 -U yugabyte -d yugabyte"
LOCK_FILE="/tmp/yb_tip_dimension_index_ddl.lock"

exec 200>"$LOCK_FILE"
flock -n 200 || {
  echo "Another local DDL worker is already running. Exiting."
  exit 0
}

# Ask the database to discover regions and queue any missing DDL.
$YSQLSH -v ON_ERROR_STOP=1 -c "CALL yb_tip.queue_missing_dimension_local_indexes();" || exit 1

# Fetch only the queue ids first.
# This avoids shell parsing problems with nullable columns or long DDL text.
mapfile -t queue_ids < <($YSQLSH -At -c "
SELECT queue_id
FROM yb_tip.dimension_index_ddl_queue
WHERE status = 'pending'
ORDER BY
  CASE ddl_type
    WHEN 'create_tablespace' THEN 1
    WHEN 'create_index' THEN 2
  END,
  queue_id;
")

for queue_id in "${queue_ids[@]}"; do

  ddl_type=$($YSQLSH -At -c "
    SELECT ddl_type
    FROM yb_tip.dimension_index_ddl_queue
    WHERE queue_id = ${queue_id};
  ")

  tablespace_name=$($YSQLSH -At -c "
    SELECT COALESCE(tablespace_name, '')
    FROM yb_tip.dimension_index_ddl_queue
    WHERE queue_id = ${queue_id};
  ")

  table_schema=$($YSQLSH -At -c "
    SELECT COALESCE(table_schema, '')
    FROM yb_tip.dimension_index_ddl_queue
    WHERE queue_id = ${queue_id};
  ")

  index_name=$($YSQLSH -At -c "
    SELECT COALESCE(index_name, '')
    FROM yb_tip.dimension_index_ddl_queue
    WHERE queue_id = ${queue_id};
  ")

  ddl_command=$($YSQLSH -At -c "
    SELECT ddl_command
    FROM yb_tip.dimension_index_ddl_queue
    WHERE queue_id = ${queue_id};
  ")

  echo "Processing queue_id=${queue_id}, ddl_type=${ddl_type}"

  if [[ -z "$ddl_command" ]]; then
    echo "No DDL command found for queue_id=${queue_id}. Marking error."

    $YSQLSH -v ON_ERROR_STOP=1 -c "
      UPDATE yb_tip.dimension_index_ddl_queue
      SET status = 'error',
          executed_at = now(),
          error_message = 'DDL command was empty.'
      WHERE queue_id = ${queue_id};
    "

    continue
  fi

  object_exists="f"

  if [[ "$ddl_type" == "create_tablespace" ]]; then
    object_exists=$($YSQLSH -At -c "
      SELECT EXISTS (
        SELECT 1
        FROM pg_tablespace
        WHERE spcname = '${tablespace_name}'
      );
    ")
  fi

  if [[ "$ddl_type" == "create_index" ]]; then
    object_exists=$($YSQLSH -At -c "
      SELECT EXISTS (
        SELECT 1
        FROM pg_class c
        JOIN pg_namespace n
          ON n.oid = c.relnamespace
        WHERE c.relname = '${index_name}'
          AND n.nspname = '${table_schema}'
          AND c.relkind = 'i'
      );
    ")
  fi

  if [[ "$object_exists" == "t" ]]; then
    echo "Target object already exists. Marking queue_id=${queue_id} as done."

    $YSQLSH -v ON_ERROR_STOP=1 -c "
      UPDATE yb_tip.dimension_index_ddl_queue
      SET status = 'done',
          executed_at = now(),
          error_message = NULL
      WHERE queue_id = ${queue_id};
    "

    continue
  fi

  echo "Executing: ${ddl_command}"

  if $YSQLSH -v ON_ERROR_STOP=1 -c "$ddl_command"; then
    $YSQLSH -v ON_ERROR_STOP=1 -c "
      UPDATE yb_tip.dimension_index_ddl_queue
      SET status = 'done',
          executed_at = now(),
          error_message = NULL
      WHERE queue_id = ${queue_id};
    "
  else
    $YSQLSH -v ON_ERROR_STOP=1 -c "
      UPDATE yb_tip.dimension_index_ddl_queue
      SET status = 'error',
          executed_at = now(),
          error_message = 'DDL execution failed. Check external worker logs.'
      WHERE queue_id = ${queue_id};
    "
  fi

done
				
			
Demo tip: For a local blog demo, running the worker from the same host as the demo cluster is fine. For production, run the worker from a highly available automation layer and connect through a stable YSQL endpoint.

Schedule it with operating system cron:

				
					0 * * * * /usr/local/bin/yb_tip_dimension_index_ddl_worker.sh >> /var/log/yb_tip_dimension_index_ddl_worker.log 2>&1
				
			
Demo tip: For a local blog demo, running the worker from the same host as the demo cluster is fine. For production, run the worker from a highly available automation layer and connect through a stable YSQL endpoint.
Add a New Region

Now add a new node in a new region:

				
					yugabyted start \
  --advertise_address=127.0.0.4 \
  --cloud_location=aws.us-west-1.us-west-1a \
  --join=127.0.0.1 \
  --master_flags "enable_pg_cron=true" \
  --tserver_flags "enable_pg_cron=true,ysql_cron_database_name=yugabyte" \
  --base_dir=~/yb04 > start4.log
				
			

Confirm the new region appears:

				
					SELECT host, cloud, region, zone
FROM yb_servers()
ORDER BY cloud, region, zone;
				
			

Expected output:

				
					.  host    | cloud |  region   |    zone
-----------+-------+-----------+------------
 127.0.0.1 | aws   | us-east-1 | us-east-1a
 127.0.0.2 | aws   | us-east-2 | us-east-2a
 127.0.0.4 | aws   | us-west-1 | us-west-1a
 127.0.0.3 | aws   | us-west-2 | us-west-2a
(4 rows)
				
			

Run the queueing procedure again:

				
					CALL yb_tip.queue_missing_dimension_local_indexes();
				
			

Now check the registry:

				
					SELECT
  r.config_id,
  r.cloud,
  r.region,
  r.table_schema,
  r.table_name,
  r.tablespace_name,
  r.index_name,
  r.last_seen_at
FROM yb_tip.dimension_index_region_registry r
ORDER BY r.config_id, r.cloud, r.region;
				
			

You should see the new us-west-1 region for each configured dimension table:

config_id cloud region table_name tablespace_name index_name
1 aws us-east-1 store_dimension ts_dim_east_1 store_dimension_east_1_idx
1 aws us-east-2 store_dimension ts_dim_east_2 store_dimension_east_2_idx
1 aws us-west-1 store_dimension ts_dim_west_1 store_dimension_west_1_idx
1 aws us-west-2 store_dimension ts_dim_west_2 store_dimension_west_2_idx

Then check the DDL queue for the new region:

				
					SELECT
  queue_id,
  config_id,
  cloud,
  region,
  table_name,
  ddl_type,
  status,
  ddl_command
FROM yb_tip.dimension_index_ddl_queue
WHERE region = 'us-west-1'
ORDER BY
  CASE ddl_type
    WHEN 'create_tablespace' THEN 1
    WHEN 'create_index' THEN 2
  END,
  queue_id;
				
			

You should see new pending DDL for the new region if the objects do not already exist.

Example:

queue_id config_id region table_name ddl_type status ddl_command
1 NULL us-west-1 NULL create_tablespace pending CREATE TABLESPACE ts_dim_west_1 WITH (…);
2 1 us-west-1 store_dimension create_index pending CREATE UNIQUE INDEX store_dimension_west_1_idx …;

We can wait for the shell script to be exected via the cron job, or we can simply execute it manually.

				
					[root@localhost ~]# ./yb_tip_dimension_index_ddl_worker.sh
CALL
Processing queue_id=1, ddl_type=create_tablespace
Executing: CREATE TABLESPACE ts_dim_west_1 WITH (...);
CREATE TABLESPACE
UPDATE 1
Processing queue_id=2, ddl_type=create_index
Executing: CREATE UNIQUE INDEX store_dimension_west_1_idx ON public.store_dimension (store_id HASH) INCLUDE (store_name, region_code, manager_name, store_type) TABLESPACE ts_dim_west_1;
CREATE INDEX
UPDATE 1
				
			

Now validate the final objects:

				
					SELECT spcname
FROM pg_tablespace
WHERE spcname LIKE 'ts_dim_%'
ORDER BY spcname;
				
			

Expected:

				
					.   spcname
---------------
 ts_dim_east_1
 ts_dim_east_2
 ts_dim_global
 ts_dim_west_1
 ts_dim_west_2
(5 rows)
				
			

And validate the indexes:

				
					SELECT
  schemaname,
  tablename,
  indexname,
  tablespace
FROM pg_indexes
WHERE schemaname = 'public'
  AND tablename = 'store_dimension'
  AND indexname LIKE 'store_dimension_%_idx'
ORDER BY indexname;
				
			

Expected:

				
					.schemaname |    tablename    |          indexname          |  tablespace
------------+-----------------+-----------------------------+---------------
 public     | store_dimension | store_dimension_east_1_idx  | ts_dim_east_1
 public     | store_dimension | store_dimension_east_2_idx  | ts_dim_east_2
 public     | store_dimension | store_dimension_west_1_idx  | ts_dim_west_1
 public     | store_dimension | store_dimension_west_2_idx  | ts_dim_west_2
				
			

We can also verify the objects by describing the table:

				
					yugabyte=# \d public.store_dimension;
             Table "public.store_dimension"
    Column    |  Type   | Collation | Nullable | Default
--------------+---------+-----------+----------+---------
 store_id     | integer |           | not null |
 store_name   | text    |           | not null |
 region_code  | text    |           | not null |
 manager_name | text    |           | not null |
 store_type   | text    |           | not null |
Indexes:
    "store_dimension_pkey" PRIMARY KEY, lsm (store_id HASH), tablespace "ts_dim_global"
    "store_dimension_east_1_idx" UNIQUE, lsm (store_id HASH) INCLUDE (store_name, region_code, manager_name, store_type), tablespace "ts_dim_east_1"
    "store_dimension_east_2_idx" UNIQUE, lsm (store_id HASH) INCLUDE (store_name, region_code, manager_name, store_type), tablespace "ts_dim_east_2"
    "store_dimension_west_1_idx" UNIQUE, lsm (store_id HASH) INCLUDE (store_name, region_code, manager_name, store_type), tablespace "ts_dim_west_1"
    "store_dimension_west_2_idx" UNIQUE, lsm (store_id HASH) INCLUDE (store_name, region_code, manager_name, store_type), tablespace "ts_dim_west_2"
Tablespace: "ts_dim_global"
				
			

Why Keep RF at 3?

In this example, each duplicate covering index uses replication factor 3.

When a fourth region is added, the new region-specific index is placed in the target region plus two peer regions.

It does not automatically become RF=4.

That is intentional.

Note: Keeping the duplicate covering indexes at RF=3 avoids automatically increasing write cost every time a new region is added. If you want every regional index replicated to every region, that is a separate architecture decision.

Important Considerations

Consideration Why It Matters
CREATE TABLESPACE must be top-level Do not try to run CREATE TABLESPACE inside a PL/pgSQL procedure. Queue the DDL and execute it separately.
Track regions, not nodes Adding another node to an existing region should not create another duplicate covering index.
Use a small config list Only configure tables that truly benefit from regional local reads. This should be intentional, not automatic for every table.
Validate the config The configured key and INCLUDE columns must exist on the target table before index DDL is generated.
DDL is not free Creating a new index on an existing table can consume CPU, IO, and network resources. Schedule DDL execution carefully.
Small tables only This pattern is best for small, frequently read dimension tables. Do not blindly duplicate large fact tables.
Index write amplification Every duplicate covering index adds write overhead. That is usually acceptable for small dimension tables, but it should still be intentional.
Config table permissions Restrict who can insert or update configuration rows. The procedure generates executable DDL from this metadata.
Worker placement Do not run the DDL worker as a single point of failure on one database node. Use an external automation layer, connect through a stable YSQL endpoint, and make the queue safe to retry.
Failure handling Keep failed commands in the queue with an error status so they can be reviewed, corrected, and retried safely.

Final Takeaway

Duplicate covering indexes are a powerful way to keep small dimension-table reads local in a multi-region YugabyteDB cluster.

But when the cluster topology changes, those local access paths need to change too.

A practical automation pattern is to make the process configuration-driven:

  • ● store the list of eligible dimension tables in a config table
  • ● validate that the configured columns exist
  • ● detect new regions from yb_servers()
  • ● generate the required regional tablespace and index DDL
  • ● queue the DDL for review or execution
  • ● execute the DDL as top-level SQL through a reliable operations worker

For demos, that worker can be a simple script.

For production, it should not depend on a single YugabyteDB node. Run it from an external automation layer, connect through a stable YSQL endpoint, and make the queue idempotent so another worker can safely continue if one fails.

For small dimension tables that need fast local reads in every region, this pattern can make the design much easier to maintain as the cluster grows.

Have Fun!

Maui hits different from the front row of a helicopter. 🚁🌺