Simplifying Row-Level Geo-Partitioning in YugabyteDB

Geo-partitioning in YugabyteDB is one of its most powerful capabilities.

It lets you:

  ● Keep data close to users
  ● Reduce latency
  ● Enforce data residency
  ● Build truly global applications

But if you’ve ever implemented it manually… you know the reality:

  👉 It’s a lot of DDL.

Tablespaces. Placement JSON. Partitions. Indexes. Repeat for every region.
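
For a single region, the manual version looks roughly like this (a sketch; the placement JSON and partition syntax mirror the generated DDL shown in the demos later in this post, using the bank_transactions demo table):

-- 1. A tablespace with an explicit placement policy
CREATE TABLESPACE eu_tablespace WITH (replica_placement='{"num_replicas": 3, "placement_blocks": [{"cloud": "aws", "region": "eu-central-1", "zone": "eu-central-1a", "min_num_replicas": 1}, {"cloud": "aws", "region": "eu-central-1", "zone": "eu-central-1b", "min_num_replicas": 1}, {"cloud": "aws", "region": "eu-central-1", "zone": "eu-central-1c", "min_num_replicas": 1}]}');

-- 2. A partition pinned to that tablespace
CREATE TABLE bank_transactions_eu PARTITION OF bank_transactions
    FOR VALUES IN ('EU') TABLESPACE eu_tablespace;

-- 3. A matching local index
CREATE INDEX bank_transactions_eu_idx ON bank_transactions_eu (account_id)
    TABLESPACE eu_tablespace;

Multiply that by every region and every index, and the DDL adds up fast.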

💡 Key Insight
Geo-partitioning isn’t hard conceptually… it’s the repetitive, error-prone DDL that makes it painful.

🧠 The First Breakthrough: Remove the Boilerplate

This is where Hari Krishna Sunder (Principal Engineer at YugabyteDB) stepped in.

He created a helper function that replaces all that repetitive DDL with a single call.

Instead of writing 5–6 statements per region…

  👉 You define your table
  👉 Call a function
  👉 Execute the generated DDL

⚙️ Hari’s Approach

CREATE TABLE bank_transactions (
    user_id       INTEGER NOT NULL,
    account_id    INTEGER NOT NULL,
    geo_partition VARCHAR NOT NULL,
    account_type  VARCHAR NOT NULL,
    amount        NUMERIC NOT NULL,
    txn_type      VARCHAR NOT NULL,
    created_at    TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY (user_id HASH, account_id, geo_partition)
) PARTITION BY LIST (geo_partition);

SELECT generate_geo_partition_ddl(
    'bank_transactions',
    ARRAY[
        ROW('EU',    'aws', 'eu-central-1', ARRAY['eu-central-1a','eu-central-1b','eu-central-1c']),
        ROW('India', 'aws', 'ap-south-1',   ARRAY['ap-south-1a','ap-south-1b','ap-south-1c']),
        ROW('US',    'aws', 'us-west-2',    ARRAY['us-west-2a','us-west-2b','us-west-2c'])
    ]::geo_partition_spec[],
    ARRAY['account_id']
);

\gexec

🚀 What This Solves
  ● Eliminates repetitive DDL
  ● Standardizes geo-partition setup
  ● Makes adoption much easier

⚠️ The Gap: Still Too Manual

Hari’s function is a huge improvement.

But in real-world deployments, a few problems remain:

  ● You still hardcode regions and zones
  ● No validation against the actual cluster
  ● No support for:
    ○ leader preference ranking
    ○ latency-optimized layouts
    ○ RF overrides
    ○ duplicate covering indexes

⚠️ The Real Problem
We simplified DDL… but we didn’t make it topology-aware.

🚀 The Evolution: A Geo-Partitioning Assistant

YugabyteDB already exposes cluster topology via:

SELECT cloud, region, zone FROM yb_servers();

Example:

 cloud |  region   |    zone
-------+-----------+------------
 aws   | us-east-2 | us-east-2a
 aws   | us-east-1 | us-east-1a
 aws   | us-west-2 | us-west-2a

Now we can:

  ● Validate placement against reality
  ● Support advanced placement patterns
  ● Eliminate incorrect configs before they happen

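Aggregating that view per region gives exactly the shape a placement spec needs (this is the same grouping the auto-discovery helper below builds on):

SELECT cloud,
       region,
       array_agg(DISTINCT zone ORDER BY zone) AS zones
FROM yb_servers()
GROUP BY cloud, region;
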
🧩 The Production-Ready Helper

Instead of a single monolithic function, this is now a modular helper package with a clean public API:

geo_partition_apply()

Internally, it is composed of:

  • geo_partition_build_specs_from_yb_servers()
  • geo_partition_create_tablespaces()
  • geo_partition_create_partitions()
  • geo_partition_create_partition_indexes()

🔧 Core Types

Each partition defines its own placement strategy.

DROP TYPE IF EXISTS geo_placement_block_spec CASCADE;
DROP TYPE IF EXISTS geo_partition_spec CASCADE;

CREATE TYPE geo_placement_block_spec AS (
    cloud             TEXT,
    region            TEXT,
    zones             TEXT[],
    min_num_replicas  INT,
    leader_preference INT
);

CREATE TYPE geo_partition_spec AS (
    partition_value TEXT,
    table_rf        INT,
    placements      geo_placement_block_spec[]
);

This lets us support:

  ● RF3 tablespaces in an RF5 universe
  ● One-region locality-optimized partitions
  ● Multi-region latency-optimized partitions
  ● Ranked leader preference (1, 2, 3)

🔎 Why This Matters
The RF of a tablespace does not have to blindly follow the RF of the universe. A universe may be RF5, while a specific geo-partitioned table or index tablespace may intentionally use RF3.
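
With these two types, a spec is just nested ROW constructors and casts. For example, a single RF3 partition spread across three zones of one region (zone names as in Hari’s example above):

SELECT ROW(
    'EU',
    3,
    ARRAY[
        ROW('aws', 'eu-central-1',
            ARRAY['eu-central-1a','eu-central-1b','eu-central-1c'],
            3, 1)::geo_placement_block_spec
    ]
)::geo_partition_spec;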

🔍 Validation Layer (Key for Production)

Before generating any DDL, the helper validates:

  ● Cloud / region / zone exist in yb_servers()
  ● Replica counts align with RF
  ● Placement definitions are sane

CREATE OR REPLACE FUNCTION geo_partition_validate_specs(
    p_specs geo_partition_spec[]
) RETURNS VOID AS $$
DECLARE
    v_spec       geo_partition_spec;
    v_block      geo_placement_block_spec;
    v_zone       TEXT;
    v_found      INT;
    v_rf_sum     INT;
BEGIN
    FOREACH v_spec IN ARRAY p_specs LOOP
        v_rf_sum := 0;

        IF v_spec.placements IS NULL OR array_length(v_spec.placements,1) IS NULL THEN
            RAISE EXCEPTION 'Partition % has no placement blocks', v_spec.partition_value;
        END IF;

        FOREACH v_block IN ARRAY v_spec.placements LOOP

            IF v_block.min_num_replicas <= 0 THEN
                RAISE EXCEPTION 'Invalid replica count in partition %', v_spec.partition_value;
            END IF;

            v_rf_sum := v_rf_sum + v_block.min_num_replicas;

            FOREACH v_zone IN ARRAY v_block.zones LOOP
                SELECT COUNT(*) INTO v_found
                FROM yb_servers()
                WHERE cloud = v_block.cloud
                  AND region = v_block.region
                  AND zone = v_zone;

                IF v_found = 0 THEN
                    RAISE EXCEPTION
                        'Invalid placement: %.%.% not found in yb_servers()',
                        v_block.cloud, v_block.region, v_zone;
                END IF;
            END LOOP;
        END LOOP;

        IF v_spec.table_rf IS NOT NULL AND v_rf_sum <> v_spec.table_rf THEN
            RAISE WARNING
                'Partition % RF mismatch: expected %, got %',
                v_spec.partition_value, v_spec.table_rf, v_rf_sum;
        END IF;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

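A quick way to see the validation in action is to pass a zone that does not exist in the cluster (the zone name below is deliberately bogus):

SELECT geo_partition_validate_specs(ARRAY[
    ROW('EU', 1, ARRAY[
        ROW('aws', 'eu-central-1', ARRAY['no-such-zone'], 1, 1)::geo_placement_block_spec
    ])::geo_partition_spec
]);

-- ERROR:  Invalid placement: aws.eu-central-1.no-such-zone not found in yb_servers()
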
🌐 Auto-Discovery from yb_servers()

CREATE OR REPLACE FUNCTION geo_partition_build_specs_from_yb_servers(
    p_default_rf INT DEFAULT 1
) RETURNS geo_partition_spec[] AS $$
DECLARE
    v_specs geo_partition_spec[];
BEGIN
    WITH regions AS (
        SELECT
            cloud,
            region,
            array_agg(DISTINCT zone ORDER BY zone) AS zones
        FROM yb_servers()
        WHERE cloud IS NOT NULL
          AND region IS NOT NULL
          AND zone IS NOT NULL
        GROUP BY cloud, region
    )
    SELECT array_agg(
        ROW(
            region,
            p_default_rf,
            ARRAY[
                ROW(
                    cloud,
                    region,
                    zones,
                    array_length(zones, 1),
                    1
                )::geo_placement_block_spec
            ]
        )::geo_partition_spec
        ORDER BY region
    )
    INTO v_specs
    FROM regions;

    IF v_specs IS NULL THEN
        RAISE EXCEPTION 'No placement topology found from yb_servers()';
    END IF;

    RETURN v_specs;
END;
$$ LANGUAGE plpgsql;

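Before wiring auto-discovery into anything, you can inspect what it would produce for your cluster:

SELECT unnest(geo_partition_build_specs_from_yb_servers(3));

Each row is a geo_partition_spec: one partition per region, with all of that region’s zones collected into a single placement block.
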
⚙️ Tablespace Generator

CREATE OR REPLACE FUNCTION geo_partition_create_tablespaces(
    p_specs geo_partition_spec[]
) RETURNS SETOF TEXT AS $$
DECLARE
    v_spec  geo_partition_spec;
    v_block geo_placement_block_spec;
    v_json  JSONB;
    v_blocks JSONB;
    v_zone TEXT;
    v_tablespace TEXT;
BEGIN
    PERFORM geo_partition_validate_specs(p_specs);

    FOREACH v_spec IN ARRAY p_specs LOOP

        v_tablespace := lower(v_spec.partition_value) || '_tablespace';
        v_blocks := '[]'::jsonb;

        FOREACH v_block IN ARRAY v_spec.placements LOOP
            FOREACH v_zone IN ARRAY v_block.zones LOOP
                v_blocks := v_blocks || jsonb_build_array(
                    jsonb_build_object(
                        'cloud', v_block.cloud,
                        'region', v_block.region,
                        'zone', v_zone,
                        'min_num_replicas', v_block.min_num_replicas,
                        'leader_preference', v_block.leader_preference
                    )
                );
            END LOOP;
        END LOOP;

        v_json := jsonb_build_object(
            'num_replicas', v_spec.table_rf,
            'placement_blocks', v_blocks
        );

        IF NOT EXISTS (SELECT 1 FROM pg_tablespace WHERE spcname = v_tablespace) THEN
            RETURN NEXT format(
                'CREATE TABLESPACE %I WITH (replica_placement=%s)',
                v_tablespace,
                quote_literal(v_json::text)
            );
        END IF;

    END LOOP;
END;
$$ LANGUAGE plpgsql;

🧱 Partition + Index Generator

CREATE OR REPLACE FUNCTION geo_partition_create_partitions(
    p_parent_table TEXT,
    p_specs        geo_partition_spec[],
    p_index_exprs  TEXT[]
) RETURNS SETOF TEXT AS $$
DECLARE
    v_spec geo_partition_spec;
    v_partition TEXT;
    v_tablespace TEXT;
    v_idx_expr TEXT;
    v_idx_name TEXT;
    v_idx_seq INT;
BEGIN
    FOREACH v_spec IN ARRAY p_specs LOOP

        v_partition := p_parent_table || '_' || lower(v_spec.partition_value);
        v_tablespace := lower(v_spec.partition_value) || '_tablespace';

        RETURN NEXT format(
            'CREATE TABLE IF NOT EXISTS %I PARTITION OF %I FOR VALUES IN (%L) TABLESPACE %I',
            v_partition, p_parent_table, v_spec.partition_value, v_tablespace
        );

        IF p_index_exprs IS NOT NULL THEN
            v_idx_seq := 0;

            FOREACH v_idx_expr IN ARRAY p_index_exprs LOOP
                v_idx_seq := v_idx_seq + 1;
                v_idx_name := v_partition || '_idx' || v_idx_seq;

                RETURN NEXT format(
                    'CREATE INDEX IF NOT EXISTS %I ON %I (%s) TABLESPACE %I',
                    v_idx_name, v_partition, v_idx_expr, v_tablespace
                );
            END LOOP;
        END IF;

    END LOOP;
END;
$$ LANGUAGE plpgsql;

🚀 Main Entry Point

CREATE OR REPLACE FUNCTION geo_partition_apply(
    p_parent_table TEXT,
    p_partition_column TEXT,
    p_specs geo_partition_spec[] DEFAULT NULL,
    p_index_exprs TEXT[] DEFAULT NULL,
    p_default_rf INT DEFAULT 3,
    p_auto_discover BOOLEAN DEFAULT false
) RETURNS SETOF TEXT AS $$
DECLARE
    v_specs geo_partition_spec[];
BEGIN
    IF p_specs IS NULL AND p_auto_discover THEN
        v_specs := geo_partition_build_specs_from_yb_servers(p_default_rf);
    ELSE
        v_specs := p_specs;
    END IF;

    IF v_specs IS NULL THEN
        RAISE EXCEPTION 'No specs provided';
    END IF;

    RETURN QUERY SELECT * FROM geo_partition_create_tablespaces(v_specs);
    RETURN QUERY SELECT * FROM geo_partition_create_partitions(p_parent_table, v_specs, p_index_exprs);

END;
$$ LANGUAGE plpgsql;

🧠 How to Use It (Safe Workflow)

Always do this in two steps:

-- Step 1: run once and review the generated DDL
SELECT geo_partition_apply(...);

-- Step 2: run again, then execute each generated statement
SELECT geo_partition_apply(...);
\gexec

💡 Why This Matters
This isn’t just a convenience function. It turns geo-partitioning into a repeatable, validated deployment pattern instead of fragile hand-written DDL.

🧪 Demo Setup

Base table:

CREATE TABLE bank_transactions (
    user_id       INTEGER NOT NULL,
    account_id    INTEGER NOT NULL,
    geo_partition VARCHAR NOT NULL,
    account_type  VARCHAR NOT NULL,
    amount        NUMERIC NOT NULL,
    txn_type      VARCHAR NOT NULL,
    created_at    TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY (user_id HASH, account_id, geo_partition)
) PARTITION BY LIST (geo_partition);

Demo 1: Locality-Optimized Geo-Partitioning

Each partition lives in a single region.

SELECT geo_partition_apply(
    p_parent_table     => 'bank_transactions',
    p_partition_column => 'geo_partition',
    p_specs => ARRAY[
        ROW('east1',1,ARRAY[
            ROW('aws','us-east-1',ARRAY['us-east-1a'],1,1)
        ]::geo_placement_block_spec[])::geo_partition_spec,

        ROW('east2',1,ARRAY[
            ROW('aws','us-east-2',ARRAY['us-east-2a'],1,1)
        ]::geo_placement_block_spec[])::geo_partition_spec,

        ROW('west2',1,ARRAY[
            ROW('aws','us-west-2',ARRAY['us-west-2a'],1,1)
        ]::geo_placement_block_spec[])::geo_partition_spec
    ],
    p_index_exprs => ARRAY['account_id']
);

\gexec

This generates tablespaces, partitions, and indexes tied to each region.

Example output:

 CREATE TABLESPACE east1_tablespace WITH (replica_placement='{"num_replicas": 1, "placement_blocks": [{"zone": "us-east-1a", "cloud": "aws", "region": "us-east-1", "min_num_replicas": 1, "leader_preference": 1}]}')
 CREATE TABLESPACE east2_tablespace WITH (replica_placement='{"num_replicas": 1, "placement_blocks": [{"zone": "us-east-2a", "cloud": "aws", "region": "us-east-2", "min_num_replicas": 1, "leader_preference": 1}]}')
 CREATE TABLESPACE west2_tablespace WITH (replica_placement='{"num_replicas": 1, "placement_blocks": [{"zone": "us-west-2a", "cloud": "aws", "region": "us-west-2", "min_num_replicas": 1, "leader_preference": 1}]}')
 CREATE TABLE IF NOT EXISTS bank_transactions_east1 PARTITION OF bank_transactions FOR VALUES IN ('east1') TABLESPACE east1_tablespace
 CREATE INDEX IF NOT EXISTS bank_transactions_east1_idx1 ON bank_transactions_east1 (account_id) TABLESPACE east1_tablespace
 CREATE TABLE IF NOT EXISTS bank_transactions_east2 PARTITION OF bank_transactions FOR VALUES IN ('east2') TABLESPACE east2_tablespace
 CREATE INDEX IF NOT EXISTS bank_transactions_east2_idx1 ON bank_transactions_east2 (account_id) TABLESPACE east2_tablespace
 CREATE TABLE IF NOT EXISTS bank_transactions_west2 PARTITION OF bank_transactions FOR VALUES IN ('west2') TABLESPACE west2_tablespace
 CREATE INDEX IF NOT EXISTS bank_transactions_west2_idx1 ON bank_transactions_west2 (account_id) TABLESPACE west2_tablespace

📍 Locality-Optimized Pattern
Use this when the goal is to keep rows close to a specific region, usually for data residency or low-latency regional access.

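After executing the generated DDL, the catalogs confirm that each partition landed in its regional tablespace:

SELECT c.relname AS partition_name,
       t.spcname AS tablespace_name
FROM pg_inherits i
JOIN pg_class c ON c.oid = i.inhrelid
LEFT JOIN pg_tablespace t ON t.oid = c.reltablespace
WHERE i.inhparent = 'bank_transactions'::regclass
ORDER BY c.relname;
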
Demo 2: Latency-Optimized Geo-Partitioning

Leader preference ranking across regions.

SELECT geo_partition_apply(
    p_parent_table     => 'bank_transactions',
    p_partition_column => 'geo_partition',
    p_specs => ARRAY[
        ROW('east_user',3,ARRAY[
            ROW('aws','us-east-1',ARRAY['us-east-1a'],1,1),
            ROW('aws','us-east-2',ARRAY['us-east-2a'],1,2),
            ROW('aws','us-west-2',ARRAY['us-west-2a'],1,3)
        ]::geo_placement_block_spec[])::geo_partition_spec,

        ROW('west_user',3,ARRAY[
            ROW('aws','us-west-2',ARRAY['us-west-2a'],1,1),
            ROW('aws','us-east-2',ARRAY['us-east-2a'],1,2),
            ROW('aws','us-east-1',ARRAY['us-east-1a'],1,3)
        ]::geo_placement_block_spec[])::geo_partition_spec
    ],
    p_index_exprs => ARRAY['account_id']
);

\gexec

This lets each partition define its own leader preference sequence.

Example output:

 CREATE TABLESPACE east_user_tablespace WITH (replica_placement='{"num_replicas": 3, "placement_blocks": [{"zone": "us-east-1a", "cloud": "aws", "region": "us-east-1", "min_num_replicas": 1, "leader_preference": 1}, {"zone": "us-east-2a", "cloud": "aws", "region": "us-east-2", "min_num_replicas": 1, "leader_preference": 2}, {"zone": "us-west-2a", "cloud": "aws", "region": "us-west-2", "min_num_replicas": 1, "leader_preference": 3}]}')
 CREATE TABLESPACE west_user_tablespace WITH (replica_placement='{"num_replicas": 3, "placement_blocks": [{"zone": "us-west-2a", "cloud": "aws", "region": "us-west-2", "min_num_replicas": 1, "leader_preference": 1}, {"zone": "us-east-2a", "cloud": "aws", "region": "us-east-2", "min_num_replicas": 1, "leader_preference": 2}, {"zone": "us-east-1a", "cloud": "aws", "region": "us-east-1", "min_num_replicas": 1, "leader_preference": 3}]}')
 CREATE TABLE IF NOT EXISTS bank_transactions_east_user PARTITION OF bank_transactions FOR VALUES IN ('east_user') TABLESPACE east_user_tablespace
 CREATE INDEX IF NOT EXISTS bank_transactions_east_user_idx1 ON bank_transactions_east_user (account_id) TABLESPACE east_user_tablespace
 CREATE TABLE IF NOT EXISTS bank_transactions_west_user PARTITION OF bank_transactions FOR VALUES IN ('west_user') TABLESPACE west_user_tablespace
 CREATE INDEX IF NOT EXISTS bank_transactions_west_user_idx1 ON bank_transactions_west_user (account_id) TABLESPACE west_user_tablespace

⚡ Latency-Optimized Pattern
Use this when you want each partition to prefer leaders near the application or user population, while still keeping replicas in other regions for availability and fallback.

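To double-check the ranking actually stored on a tablespace, the placement JSON is visible in pg_tablespace:

SELECT spcname, spcoptions
FROM pg_tablespace
WHERE spcname IN ('east_user_tablespace', 'west_user_tablespace');
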
Demo 3: RF Override

In real deployments, your universe RF and workload RF are not always the same.

Example:

  ● Universe = RF5 (for durability)
  ● Specific table = RF3 (for cost + performance)

With the helper, RF is defined per partition.

SELECT geo_partition_apply(
    p_parent_table     => 'bank_transactions',
    p_partition_column => 'geo_partition',
    p_specs => ARRAY[
        ROW('east_user',3,ARRAY[
            ROW('aws','us-east-1',ARRAY['us-east-1a'],1,1),
            ROW('aws','us-east-2',ARRAY['us-east-2a'],1,2),
            ROW('aws','us-west-2',ARRAY['us-west-2a'],1,3)
        ]::geo_placement_block_spec[])::geo_partition_spec,

        ROW('west_user',3,ARRAY[
            ROW('aws','us-west-2',ARRAY['us-west-2a'],1,1),
            ROW('aws','us-east-2',ARRAY['us-east-2a'],1,2),
            ROW('aws','us-east-1',ARRAY['us-east-1a'],1,3)
        ]::geo_placement_block_spec[])::geo_partition_spec
    ],
    p_index_exprs => ARRAY['account_id']
);

\gexec

Example output:

 CREATE TABLESPACE east_user_tablespace WITH (replica_placement='{"num_replicas": 3, "placement_blocks": [{"zone": "us-east-1a", "cloud": "aws", "region": "us-east-1", "min_num_replicas": 1, "leader_preference": 1}, {"zone": "us-east-2a", "cloud": "aws", "region": "us-east-2", "min_num_replicas": 1, "leader_preference": 2}, {"zone": "us-west-2a", "cloud": "aws", "region": "us-west-2", "min_num_replicas": 1, "leader_preference": 3}]}')
 CREATE TABLESPACE west_user_tablespace WITH (replica_placement='{"num_replicas": 3, "placement_blocks": [{"zone": "us-west-2a", "cloud": "aws", "region": "us-west-2", "min_num_replicas": 1, "leader_preference": 1}, {"zone": "us-east-2a", "cloud": "aws", "region": "us-east-2", "min_num_replicas": 1, "leader_preference": 2}, {"zone": "us-east-1a", "cloud": "aws", "region": "us-east-1", "min_num_replicas": 1, "leader_preference": 3}]}')
 CREATE TABLE IF NOT EXISTS bank_transactions_east_user PARTITION OF bank_transactions FOR VALUES IN ('east_user') TABLESPACE east_user_tablespace
 CREATE INDEX IF NOT EXISTS bank_transactions_east_user_idx1 ON bank_transactions_east_user (account_id) TABLESPACE east_user_tablespace
 CREATE TABLE IF NOT EXISTS bank_transactions_west_user PARTITION OF bank_transactions FOR VALUES IN ('west_user') TABLESPACE west_user_tablespace
 CREATE INDEX IF NOT EXISTS bank_transactions_west_user_idx1 ON bank_transactions_west_user (account_id) TABLESPACE west_user_tablespace

🔧 Why RF Override Matters
  ● Tune durability vs cost per workload
  ● Avoid over-replication for non-critical data
  ● Keep global tables lean while the universe stays highly durable

Demo 4: Duplicate Covering Indexes

Duplicate covering indexes are useful when the table is global, but you want each region to have a local index leader for fast, immediately consistent reads.

YugabyteDB notes that duplicate indexes can lower read latency by allowing applications to read from a local duplicate index, although writes must maintain multiple indexes.

Duplicate Covering Index Generator

CREATE OR REPLACE FUNCTION geo_partition_create_duplicate_covering_indexes(
    p_table_name    TEXT,
    p_index_prefix  TEXT,
    p_key_exprs     TEXT[],
    p_include_cols  TEXT[] DEFAULT NULL,
    p_specs         geo_partition_spec[] DEFAULT NULL,
    p_default_rf    INT DEFAULT 1,
    p_auto_discover BOOLEAN DEFAULT false
) RETURNS SETOF TEXT AS $$
DECLARE
    v_specs       geo_partition_spec[];
    v_spec        geo_partition_spec;
    v_tablespace  TEXT;
    v_index_name  TEXT;
    v_include_sql TEXT;
BEGIN
    ----------------------------------------------------------------------
    -- Build specs (manual or auto)
    ----------------------------------------------------------------------
    IF p_specs IS NULL AND p_auto_discover THEN
        v_specs := geo_partition_build_specs_from_yb_servers(p_default_rf);
    ELSE
        v_specs := p_specs;
    END IF;

    IF v_specs IS NULL THEN
        RAISE EXCEPTION
            'No specs provided. Pass p_specs or set p_auto_discover=true.';
    END IF;

    ----------------------------------------------------------------------
    -- Build INCLUDE clause
    ----------------------------------------------------------------------
    IF p_include_cols IS NULL OR array_length(p_include_cols, 1) IS NULL THEN
        v_include_sql := '';
    ELSE
        v_include_sql := format(' INCLUDE (%s)', array_to_string(p_include_cols, ', '));
    END IF;

    ----------------------------------------------------------------------
    -- Generate one index per region / tablespace
    ----------------------------------------------------------------------
    FOREACH v_spec IN ARRAY v_specs LOOP

        v_tablespace :=
            lower(regexp_replace(v_spec.partition_value, '[^a-zA-Z0-9]+', '_', 'g'))
            || '_tablespace';

        v_index_name :=
            lower(regexp_replace(p_index_prefix || '_' || v_spec.partition_value, '[^a-zA-Z0-9]+', '_', 'g'));

        RETURN NEXT format(
            'CREATE INDEX IF NOT EXISTS %I ON %I (%s)%s TABLESPACE %I',
            v_index_name,
            p_table_name,
            array_to_string(p_key_exprs, ', '),
            v_include_sql,
            v_tablespace
        );

    END LOOP;

END;
$$ LANGUAGE plpgsql;

Create a global lookup-style table:

CREATE TABLE product_catalog (
    sku         TEXT PRIMARY KEY,
    name        TEXT NOT NULL,
    category    TEXT NOT NULL,
    price       NUMERIC NOT NULL,
    updated_at  TIMESTAMP DEFAULT NOW()
);

Usage (Auto-Discovery)

SELECT geo_partition_create_duplicate_covering_indexes(
    p_table_name    => 'product_catalog',
    p_index_prefix  => 'category_local',
    p_key_exprs     => ARRAY['category'],
    p_include_cols  => ARRAY['sku','name','price'],
    p_auto_discover => true,
    p_default_rf    => 1
);

\gexec

📌 Duplicate Index Pattern
Duplicate covering indexes are not about redundancy. They are a global read performance pattern. Each index can have a local leader in a different region, allowing consistent reads without always crossing regions.

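Once generated, the per-region indexes and their tablespaces are easy to verify from pg_indexes:

SELECT indexname, tablespace
FROM pg_indexes
WHERE tablename = 'product_catalog'
ORDER BY indexname;
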
Operational Helper: Detect Missing Partitions from Actual Data

One more useful helper: detect partition values in the data that do not have matching partitions.

CREATE OR REPLACE FUNCTION find_missing_geo_partitions(
    p_parent_table      REGCLASS,
    p_partition_column TEXT
) RETURNS TABLE (
    missing_partition_value TEXT
) AS $$
BEGIN
    RETURN QUERY EXECUTE format(
        $SQL$
        WITH existing_partitions AS (
            SELECT
                regexp_replace(
                    pg_get_expr(c.relpartbound, c.oid),
                    '^FOR VALUES IN \(''(.*)''\)$',
                    '\1'
                ) AS partition_value
            FROM pg_class c
            JOIN pg_inherits i
              ON c.oid = i.inhrelid
            WHERE i.inhparent = %L::regclass
        ),
        actual_values AS (
            SELECT DISTINCT %I::text AS partition_value
            FROM %s
            WHERE %I IS NOT NULL
        )
        SELECT a.partition_value
        FROM actual_values a
        LEFT JOIN existing_partitions e
          ON e.partition_value = a.partition_value
        WHERE e.partition_value IS NULL
        ORDER BY a.partition_value
        $SQL$,
        p_parent_table::TEXT,
        p_partition_column,
        p_parent_table,
        p_partition_column
    );
END;
$$ LANGUAGE plpgsql;

Example:

SELECT *
FROM find_missing_geo_partitions(
    'bank_transactions',
    'geo_partition'
);

This is useful when the application starts sending a new geo value before the partition has been created.
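
If the check reports a missing value, the fix is one more geo_partition_apply() call scoped to that value. For a hypothetical new 'apac' geo value (region and zone chosen for illustration):

SELECT geo_partition_apply(
    p_parent_table     => 'bank_transactions',
    p_partition_column => 'geo_partition',
    p_specs => ARRAY[
        ROW('apac', 1, ARRAY[
            ROW('aws', 'ap-south-1', ARRAY['ap-south-1a'], 1, 1)::geo_placement_block_spec
        ])::geo_partition_spec
    ],
    p_index_exprs => ARRAY['account_id']
);

\gexec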

What This Helper Now Supports

Feature                                 | Supported? | Why It Matters
----------------------------------------+------------+------------------------------------------------------------------------
Topology validation using yb_servers()  | Yes        | Avoids invalid cloud, region, and zone values
Locality-optimized geo-partitioning     | Yes        | Pins rows to a specific region
Latency-optimized geo-partitioning      | Yes        | Supports ranked leader preference across regions
Per-partition RF override               | Yes        | Allows RF3 tablespaces in a larger RF universe
Per-partition indexes                   | Yes        | Keeps secondary indexes aligned with partition locality
Duplicate covering indexes              | Yes        | Enables consistent local reads for global tables
Missing partition detection             | Yes        | Detects drift between application geo values and configured partitions

🏁 Final Takeaway

Hari’s original helper solved the first big problem: too much hand-written geo-partitioning DDL.

This evolution goes further… it makes geo-partitioning topology-aware and operationally safe.

By using yb_servers() as the source of truth, the helper can:

  ● Generate placement aligned with the actual cluster
  ● Support locality- and latency-optimized patterns
  ● Apply RF overrides per partition
  ● Create duplicate covering indexes for global reads

And just as importantly…

  👉 It helps you detect when reality drifts from design

The ability to identify missing partitions from actual data closes the loop between:

  ● what your application is writing
  ● and what your database is prepared to handle

🎯 Final Thought

This isn’t just about generating less SQL.

It’s about generating the right architecture… and continuously validating that it stays correct as your system evolves.

Have Fun!

My new toy showed up today! Now comes the fun part... cracking it open to upgrade the SSD from 1TB to 4TB. 😅 Man… everything ($$$) adds up fast... the Legion Go 2, the SSD, all of it. This thing better be worth it. I’ll report back in a future YB Tip or two…