Fixing Unintentionally Range-Sharded Tables in YugabyteDB

If you upgraded into the YugabyteDB 2024.1 era and enabled Enhanced PostgreSQL Compatibility Mode, there is an easy mistake to miss:

You create a table with a plain old PRIMARY KEY, assume YugabyteDB will hash distribute it like before, and later discover the table is actually range sharded.

That can be a nasty surprise.

For many workloads, especially write-heavy or skew-prone ones, that silent shift can mean:

● uneven distribution
● hotspot risk
● performance that no longer matches what you expected from a distributed SQL database

The good news is that if you catch this early, you can audit the affected objects and generate the exact DDL needed to rebuild them with HASH.

In this tip, you’ll learn how to:

● identify range-sharded primary keys
● identify range-sharded secondary indexes
● generate corrective DDL with a helper function
● preserve useful details like DESC, INCLUDE (...), and partial WHERE clauses
● avoid confusing YugabyteDB primary keys with CREATE UNIQUE INDEX syntax

🔎 What this tip covers

If you relied on default sharding behavior instead of explicitly writing HASH, some newer YugabyteDB environments may now create primary keys and indexes as range sharded instead. This tip shows how to find them and generate the DDL to fix them.

Why this happens

A lot of us got used to thinking of YugabyteDB like this:

“If I write PRIMARY KEY (...), it’ll be hash sharded unless I explicitly say otherwise.”

That assumption is no longer always safe.

If the default changes, your schema behavior changes with it.

So this:

				
					CREATE TABLE test (
  c1 int PRIMARY KEY,
  c2 text
);

may not behave the way you think.

What actually gets created depends on the default

If hash splitting is the default, yb_use_hash_splitting_by_default = true, the intent is effectively:

				
					CREATE TABLE test (
  c1 int,
  c2 text,
  PRIMARY KEY (c1 HASH)
);

If range splitting is the default, yb_use_hash_splitting_by_default = false, it behaves more like:

				
					CREATE TABLE test (
  c1 int,
  c2 text,
  PRIMARY KEY (c1 ASC)
);

That is a very different storage and distribution strategy.

Sharding scheme comparison

Sharding Scheme	Best At	Trade-off	Multi-column Behavior
HASH	Even distribution, predictable performance, avoids hotspots, efficient point lookups	Range scans can be less efficient	Hashed key columns drive distribution
Range (ASC/DESC)	Ordered access, efficient range queries, natural clustering by primary key order	Leading-key skew can create hotspots and uneven distribution	Leading column heavily influences distribution pattern

When this tip is appropriate

This tip is for the case where you caught the issue early and want YugabyteDB to generate the corrective DDL for you.

It is most appropriate when:

● the table is still small
● the workload impact of rebuilding the primary key is acceptable
● you want to audit or repair recent schema objects that were created with the wrong default

It is not the right blanket production strategy for every large tables, because recreating a primary key rewrites the table’s storage. YugabyteDB’s table design is index-oriented, and the primary key defines the table’s storage layout.

🛠️ Important production note

Rebuilding a primary key is not just an index change. In YugabyteDB, the primary key defines the table’s storage layout, so dropping and recreating it rewrites table storage. This helper is best for early detection, smaller tables, or carefully planned maintenance windows.

Audit What’s Actually Range Sharded

Before fixing anything, let’s identify which objects are currently range sharded.

We’ll split this into two parts:

● Primary keys → show the actual PK definition
● Secondary indexes → show just the indexed columns

This keeps things clean and avoids confusing PKs with index DDL.

🧱 Find Range-Sharded Primary Keys

				
					SELECT
    c.oid::regclass AS table_name,
    con.conname     AS pk_name,
    'PRIMARY KEY (' ||
    string_agg(
        format(
            '%I %s',
            a.attname,
            CASE
                WHEN (i.indoption[k.ord - 1] & 1) = 1 THEN 'DESC'
                ELSE 'ASC'
            END
        ),
        ', ' ORDER BY k.ord
    ) || ')' AS pk_definition
FROM pg_constraint con
JOIN pg_class c
  ON c.oid = con.conrelid
JOIN pg_namespace n
  ON n.oid = c.relnamespace
JOIN pg_index i
  ON i.indexrelid = con.conindid
CROSS JOIN LATERAL unnest(i.indkey) WITH ORDINALITY AS k(attnum, ord)
JOIN pg_attribute a
  ON a.attrelid = i.indrelid
 AND a.attnum   = k.attnum
WHERE con.contype = 'p'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
  AND k.ord <= i.indnkeyatts
  AND position(' HASH' in upper(pg_get_indexdef(con.conindid))) = 0
GROUP BY c.oid, con.conname
ORDER BY 1;

Example output:

				
					.table_name |  pk_name   |         pk_definition
------------+------------+-------------------------------
 test       | test_pkey  | PRIMARY KEY (c1 ASC)
 test4      | test4_pkey | PRIMARY KEY (c1 ASC, c2 DESC)

🧱 Find Range-Sharded Secondary Indexes

				
					SELECT
    i.indexrelid::regclass AS index_name,
    i.indrelid::regclass   AS table_name,
    string_agg(
        format(
            '%I %s',
            a.attname,
            CASE
                WHEN (i.indoption[k.ord - 1] & 1) = 1 THEN 'DESC'
                ELSE 'ASC'
            END
        ),
        ', ' ORDER BY k.ord
    ) AS index_keys
FROM pg_index i
JOIN pg_class ic
  ON ic.oid = i.indexrelid
JOIN pg_namespace n
  ON n.oid = ic.relnamespace
CROSS JOIN LATERAL unnest(i.indkey) WITH ORDINALITY AS k(attnum, ord)
JOIN pg_attribute a
  ON a.attrelid = i.indrelid
 AND a.attnum   = k.attnum
WHERE NOT i.indisprimary
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
  AND k.ord <= i.indnkeyatts
  AND position(' HASH' in upper(pg_get_indexdef(i.indexrelid))) = 0
GROUP BY i.indexrelid, i.indrelid
ORDER BY 2, 1;

Example output:

				
					.  index_name    | table_name |   index_keys
-----------------+------------+-----------------
 test_c2_idx     | test       | c2 ASC
 test4_c2_c3_idx | test4      | c2 ASC, c3 DESC

🧠 What you are looking for

Any primary key or secondary index that does not include HASH is range sharded. In write-heavy or skew-prone workloads, that can lead to uneven distribution and hotspots.

🧱 Find Range-Sharded Secondary Indexes (Excluding Date Type Leading Columns)

A hash-sharded index on date or timestamp columns in YugabyteDB is highly inefficient because it randomly scatters naturally sequential data across tablets, reducing common range queries and sorting operations to expensive full-cluster scans while offering no practical benefit for exact-match point lookups.

This updated ‘Find Range Sharded Secondary Index’ query excludes indexes that are already correctly defined as range-partitioned:

				
					SELECT
    i.indexrelid::regclass AS index_name,
    i.indrelid::regclass   AS table_name,
    string_agg(
        format(
            '%I %s',
            a.attname,
            CASE
                WHEN (i.indoption[k.ord - 1] & 1) = 1 THEN 'DESC'
                ELSE 'ASC'
            END
        ),
        ', ' ORDER BY k.ord
    ) AS index_keys
FROM pg_index i
JOIN pg_class ic
  ON ic.oid = i.indexrelid
JOIN pg_namespace n
  ON n.oid = ic.relnamespace
CROSS JOIN LATERAL unnest(i.indkey) WITH ORDINALITY AS k(attnum, ord)
JOIN pg_attribute a
  ON a.attrelid = i.indrelid
 AND a.attnum   = k.attnum
WHERE NOT i.indisprimary
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
  AND k.ord <= i.indnkeyatts
  AND position(' HASH' in upper(pg_get_indexdef(i.indexrelid))) = 0
  -- New condition: Exclude the index if the first column is a date/time type
  AND NOT EXISTS (
      SELECT 1
      FROM pg_attribute a_idx
      JOIN pg_type t_idx ON t_idx.oid = a_idx.atttypid
      WHERE a_idx.attrelid = i.indexrelid
        AND a_idx.attnum = 1         -- First column in the index
        AND t_idx.typcategory = 'D'  -- 'D' represents all Date/Time data types
  )
GROUP BY i.indexrelid, i.indrelid
ORDER BY 2, 1;

⚙️ Generate the Repaire DDL

The function below takes the relid of either:

● a table ➔ it inspects the table’s primary key
● an index ➔ it inspects the index directly

If the object is range sharded, it returns the DROP and CREATE DDL needed to rebuild it with HASH added to the first key column.

It also supports the common “pretty manageable” index cases:

● ordinary column-based indexes
● UNIQUE indexes
● INCLUDE (...)
● partial indexes with WHERE
● collations
● opclasses

It intelligently handles constraint-backed indexes:

● Automatically detects if an index belongs to a PRIMARY KEY or UNIQUE constraint.
● Outputs ALTER TABLE ... DROP/ADD CONSTRAINT instead of standard index DDL.
● Safely strips unsupported syntax modifiers (like HASH or ASC/DESC) from constraint definitions to prevent execution errors.

For more complex definitions, such as expression indexes, the function deliberately tells you to handle them manually.

				
					CREATE OR REPLACE FUNCTION yb_get_hash_rebuild_ddl(p_relid regclass)
RETURNS TABLE (
    input_relid          oid,
    object_type          text,
    schema_name          text,
    object_name          text,
    base_table           text,
    is_range_sharded     boolean,
    drop_ddl             text,
    create_ddl           text,
    notes                text
)
LANGUAGE plpgsql
AS $$
DECLARE
    v_relkind           "char";
    v_nspname           text;
    v_relname           text;

    v_index_oid         oid;
    v_index_name        text;
    v_index_nspname     text;
    v_table_oid         oid;
    v_table_name        text;
    v_table_nspname     text;

    v_conname           text;
    v_is_pk             boolean;
    v_has_hash          boolean;
    v_is_unique         boolean;
    v_predicate         text;

    v_keycols            text;
    v_constraint_keycols text;
    v_includecols        text;
BEGIN
    SELECT c.relkind, n.nspname, c.relname
      INTO v_relkind, v_nspname, v_relname
      FROM pg_class c
      JOIN pg_namespace n ON n.oid = c.relnamespace
     WHERE c.oid = p_relid;

    IF NOT FOUND THEN
        RAISE EXCEPTION 'Object % not found', p_relid;
    END IF;

    IF v_relkind IN ('r','p') THEN
        SELECT con.conindid,
               con.conname,
               ct.oid,
               ct.relname,
               nt.nspname
          INTO v_index_oid,
               v_conname,
               v_table_oid,
               v_table_name,
               v_table_nspname
          FROM pg_constraint con
          JOIN pg_class ct
            ON ct.oid = con.conrelid
          JOIN pg_namespace nt
            ON nt.oid = ct.relnamespace
         WHERE con.conrelid = p_relid
           AND con.contype = 'p';

        IF v_index_oid IS NULL THEN
            RAISE EXCEPTION 'Table %.% has no primary key', v_nspname, v_relname;
        END IF;

        v_index_name    := v_conname;
        v_index_nspname := v_table_nspname;
        v_is_unique     := true;
        v_predicate     := NULL;

    ELSIF v_relkind = 'i' THEN
        SELECT c.relname,
               n.nspname,
               i.indrelid,
               ct.relname,
               nt.nspname,
               i.indisunique,
               i.indisprimary,
               pg_get_expr(i.indpred, i.indrelid)
          INTO v_index_name,
               v_index_nspname,
               v_table_oid,
               v_table_name,
               v_table_nspname,
               v_is_unique,
               v_is_pk,
               v_predicate
          FROM pg_class c
          JOIN pg_namespace n ON n.oid = c.relnamespace
          JOIN pg_index i ON i.indexrelid = c.oid
          JOIN pg_class ct ON ct.oid = i.indrelid
          JOIN pg_namespace nt ON nt.oid = ct.relnamespace
         WHERE c.oid = p_relid;

        IF v_table_oid IS NULL THEN
            RAISE EXCEPTION 'Index % not found', p_relid;
        END IF;

        IF v_is_pk THEN
            RAISE EXCEPTION
              'Index %.% is the backing index for a primary key. Pass the table relid instead.',
              v_index_nspname, v_index_name;
        END IF;

        -- Check if this index is actually backing a UNIQUE constraint
        SELECT conname
          INTO v_conname
          FROM pg_constraint
         WHERE conindid = p_relid
           AND contype = 'u';

        v_index_oid := p_relid;

    ELSE
        RAISE EXCEPTION
          'Unsupported relkind % for object %.%. Pass a table or index relid.',
          v_relkind, v_nspname, v_relname;
    END IF;

    SELECT position(' HASH' in upper(pg_get_indexdef(v_index_oid))) > 0
      INTO v_has_hash;

    -- EARLY EXIT: Skip if expression index
    IF EXISTS (
        SELECT 1
          FROM pg_index
         WHERE indexrelid = v_index_oid
           AND indexprs IS NOT NULL
    ) THEN
        input_relid      := p_relid;
        object_type      := CASE 
                                WHEN v_relkind IN ('r','p') THEN 'primary_key' 
                                WHEN v_conname IS NOT NULL THEN 'unique_constraint'
                                ELSE 'index' 
                            END;
        schema_name      := v_index_nspname;
        object_name      := COALESCE(v_index_name, v_conname);
        base_table       := format('%I.%I', v_table_nspname, v_table_name);
        is_range_sharded := NULL;
        drop_ddl         := NULL;
        create_ddl       := NULL;
        notes            := 'Expression index detected. Handle manually.';
        RETURN NEXT;
        RETURN;
    END IF;

    input_relid      := p_relid;
    object_type      := CASE 
                            WHEN v_relkind IN ('r','p') THEN 'primary_key' 
                            WHEN v_conname IS NOT NULL THEN 'unique_constraint'
                            ELSE 'index' 
                        END;
    schema_name      := v_index_nspname;
    object_name      := COALESCE(v_conname, v_index_name);
    base_table       := format('%I.%I', v_table_nspname, v_table_name);
    is_range_sharded := NOT v_has_hash;

    -- EARLY EXIT: Skip if already HASH sharded
    IF v_has_hash THEN
        drop_ddl   := NULL;
        create_ddl := NULL;
        notes      := 'Already HASH sharded. No rebuild DDL generated.';
        RETURN NEXT;
        RETURN;
    END IF;

    -- 1. Key Columns WITH explicit modifiers (for CREATE INDEX and PRIMARY KEY)
    SELECT string_agg(
               CASE
                   WHEN k.ord = 1 THEN
                       -- Force HASH on the first column
                       format('%I HASH', att.attname)
                   ELSE
                       -- Use ASC/DESC for remaining columns
                       trim(both ' ' from concat_ws(' ',
                           format('%I', att.attname),
                           CASE
                               WHEN coll.oid IS NOT NULL
                                    AND coll.collname IS NOT NULL
                                    AND coll.collname <> 'default'
                               THEN format('COLLATE %I', coll.collname)
                               ELSE NULL
                           END,
                           CASE
                               WHEN opc.oid IS NOT NULL
                                    AND opc.opcdefault IS DISTINCT FROM true
                               THEN format('%I.%I', opcnsp.nspname, opc.opcname)
                               ELSE NULL
                           END,
                           CASE
                               WHEN (idx.indoption[k.ord - 1] & 1) = 1 THEN 'DESC'
                               ELSE 'ASC'
                           END
                       ))
               END,
               ', ' ORDER BY k.ord
           )
      INTO v_keycols
      FROM pg_index idx
      CROSS JOIN LATERAL unnest(idx.indkey) WITH ORDINALITY AS k(attnum, ord)
      JOIN pg_attribute att
        ON att.attrelid = idx.indrelid
       AND att.attnum   = k.attnum
      LEFT JOIN pg_collation coll
        ON coll.oid = idx.indcollation[k.ord - 1]
      LEFT JOIN pg_opclass opc
        ON opc.oid = idx.indclass[k.ord - 1]
      LEFT JOIN pg_namespace opcnsp
        ON opcnsp.oid = opc.opcnamespace
     WHERE idx.indexrelid = v_index_oid
       AND k.ord <= idx.indnkeyatts;

    -- 2. Key Columns WITHOUT modifiers (for UNIQUE constraints)
    SELECT string_agg(format('%I', att.attname), ', ' ORDER BY k.ord)
      INTO v_constraint_keycols
      FROM pg_index idx
      CROSS JOIN LATERAL unnest(idx.indkey) WITH ORDINALITY AS k(attnum, ord)
      JOIN pg_attribute att
        ON att.attrelid = idx.indrelid
       AND att.attnum   = k.attnum
     WHERE idx.indexrelid = v_index_oid
       AND k.ord <= idx.indnkeyatts;

    -- 3. INCLUDE Columns
    SELECT string_agg(format('%I', att.attname), ', ' ORDER BY k.ord)
      INTO v_includecols
      FROM pg_index idx
      CROSS JOIN LATERAL unnest(idx.indkey) WITH ORDINALITY AS k(attnum, ord)
      JOIN pg_attribute att
        ON att.attrelid = idx.indrelid
       AND att.attnum   = k.attnum
     WHERE idx.indexrelid = v_index_oid
       AND k.ord > idx.indnkeyatts;

    -- Build output DDL based on object type
    IF object_type = 'primary_key' THEN
        drop_ddl :=
            format('ALTER TABLE %I.%I DROP CONSTRAINT %I;',
                   v_table_nspname, v_table_name, v_conname);

        -- *** UPDATED: Uses v_keycols (Includes HASH/ASC/DESC) ***
        create_ddl :=
            format('ALTER TABLE %I.%I ADD CONSTRAINT %I PRIMARY KEY (%s)%s;',
                   v_table_nspname, v_table_name, v_conname, v_keycols,
                   CASE 
                       WHEN v_includecols IS NOT NULL THEN format(' INCLUDE (%s)', v_includecols) 
                       ELSE '' 
                   END);

        notes :=
            'Supports ordinary column-based PKs. Recreating a PK rewrites table storage. Expression indexes and exotic definitions should be handled manually.';

    ELSIF object_type = 'unique_constraint' THEN
        drop_ddl :=
            format('ALTER TABLE %I.%I DROP CONSTRAINT %I;',
                   v_table_nspname, v_table_name, v_conname);

        -- Uses v_constraint_keycols (No HASH/ASC/DESC)
        create_ddl :=
            format('ALTER TABLE %I.%I ADD CONSTRAINT %I UNIQUE (%s)%s;',
                   v_table_nspname, v_table_name, v_conname, v_constraint_keycols,
                   CASE 
                       WHEN v_includecols IS NOT NULL THEN format(' INCLUDE (%s)', v_includecols) 
                       ELSE '' 
                   END);

        notes :=
            'Supports UNIQUE constraints. Recreating constraints rewrites table storage. Expression indexes and exotic definitions should be handled manually.';

    ELSE
        drop_ddl :=
            format('DROP INDEX %I.%I;',
                   v_index_nspname, v_index_name);

        -- Uses v_keycols (Includes HASH/ASC/DESC)
        create_ddl :=
            format(
                'CREATE %sINDEX %I ON %I.%I USING lsm (%s)%s%s;',
                CASE WHEN v_is_unique THEN 'UNIQUE ' ELSE '' END,
                v_index_name,
                v_table_nspname,
                v_table_name,
                v_keycols,
                CASE
                    WHEN v_includecols IS NOT NULL THEN format(' INCLUDE (%s)', v_includecols)
                    ELSE ''
                END,
                CASE
                    WHEN v_predicate IS NOT NULL THEN format(' WHERE %s', v_predicate)
                    ELSE ''
                END
            );

        notes :=
            'Supports ordinary column indexes, INCLUDE columns, unique indexes, and partial predicates. Expression indexes and exotic definitions should be handled manually.';
    END IF;

    RETURN NEXT;
END;
$$;

🚑 use it

Say we have this table:

				
					yugabyte=# \d test
                Table "public.test"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 c1     | integer |           | not null |
 c2     | text    |           |          |
Indexes:
    "test_pkey" PRIMARY KEY, lsm (c1 ASC)
    "test_c2_idx" lsm (c2 ASC)

We can generate the DDL to fix the PK using:

				
					\x
SELECT * FROM yb_get_hash_rebuild_ddl('public.test'::regclass);

Example output:

				
					input_relid      | 16399
object_type      | primary_key
schema_name      | public
object_name      | test_pkey
base_table       | public.test
is_range_sharded | t
drop_ddl         | ALTER TABLE public.test DROP CONSTRAINT test_pkey;
create_ddl       | ALTER TABLE public.test ADD CONSTRAINT test_pkey PRIMARY KEY (c1 HASH);
notes            | Supports ordinary column-based PKs. Recreating a PK rewrites table storage. Expression indexes and exotic definitions should be handled manually.

And we can do the same with the index:

				
					\x
SELECT * FROM yb_get_hash_rebuild_ddl('public.test_c2_idx'::regclass);

Example output:

				
					input_relid      | 16429
object_type      | index
schema_name      | public
object_name      | test_c2_idx
base_table       | public.test
is_range_sharded | t
drop_ddl         | DROP INDEX public.test_c2_idx;
create_ddl       | CREATE INDEX test_c2_idx ON public.test USING lsm (c2 HASH);
notes            | Supports ordinary column indexes, INCLUDE columns, unique indexes, and partial predicates. Expression indexes and exotic definitions should be handled manually.

To print the rebuild statements in execution order, use:

				
					SELECT step, ddl
FROM yb_get_hash_rebuild_ddl('<<object name>>'::regclass)
CROSS JOIN LATERAL (
  VALUES
    (1, drop_ddl),
    (2, create_ddl)
) AS x(step, ddl)
WHERE ddl IS NOT NULL
ORDER BY step;

Example:

				
					yugabyte=# SELECT step, ddl
yugabyte-# FROM yb_get_hash_rebuild_ddl('public.test4'::regclass)
yugabyte-# CROSS JOIN LATERAL (
yugabyte(#   VALUES
yugabyte(#     (1, drop_ddl),
yugabyte(#     (2, create_ddl)
yugabyte(# ) AS x(step, ddl)
yugabyte-# WHERE ddl IS NOT NULL
yugabyte-# ORDER BY step;
 step |                                        ddl
------+------------------------------------------------------------------------------------
    1 | ALTER TABLE public.test4 DROP CONSTRAINT test4_pkey;
    2 | ALTER TABLE public.test4 ADD CONSTRAINT test4_pkey PRIMARY KEY (c1 HASH, c2 DESC);
(2 rows)

💡 What this helper actually handles (and what it doesn’t)

This helper is designed for the common, safe cases when rebuilding indexes or primary keys with HASH distribution.

It works well with:

● Standard column-based indexes and primary keys
● UNIQUE indexes
● INCLUDE (…) columns
● Partial indexes (WHERE clauses)
● Collations and opclasses

⚠️ Where you need to step in

More complex definitions, like expression indexes, are intentionally skipped.

This prevents generating incorrect DDL, but it also means you will need to review and rebuild those manually.

🔑 Final takeaway

Defaults are convenient until they change.

In YugabyteDB, sharding behavior MUST be treated as part of your schema design, not as an environment assumption.

So the playbook is simple:

1. audit what is currently range sharded
2. generate the corrective DDL
3. review before running it
4. going forward, always write HASH explicitly when you mean hash sharding

🧠 Key insight

If distribution matters, do not rely on defaults. In YugabyteDB, be explicit: choose HASH for distribution, or ASC/DESC when you intentionally want ordered access patterns.

Have Fun!