If you upgraded into the YugabyteDB 2024.1 era and enabled Enhanced PostgreSQL Compatibility Mode, there is an easy mistake to miss:
That can be a nasty surprise.
For many workloads, especially write-heavy or skew-prone ones, that silent shift can mean:
- ● uneven distribution
- ● hotspot risk
- ● performance that no longer matches what you expected from a distributed SQL database
The good news is that if you catch this early, you can audit the affected objects and generate the exact DDL needed to rebuild them with HASH.
In this tip, you’ll learn how to:
- ● identify range-sharded primary keys
- ● identify range-sharded secondary indexes
- ● generate corrective DDL with a helper function
- ● preserve useful details like
DESC,INCLUDE (...), and partialWHEREclauses - ● avoid confusing YugabyteDB primary keys with
CREATE UNIQUE INDEXsyntax
HASH, some newer YugabyteDB environments may now create primary keys and indexes as range sharded instead. This tip shows how to find them and generate the DDL to fix them.
Why this happens
A lot of us got used to thinking of YugabyteDB like this:
- “If I write
PRIMARY KEY (...), it’ll be hash sharded unless I explicitly say otherwise.”
That assumption is no longer always safe.
If the default changes, your schema behavior changes with it.
So this:
CREATE TABLE test (
c1 int PRIMARY KEY,
c2 text
);
may not behave the way you think.
What actually gets created depends on the default
If hash splitting is the default, yb_use_hash_splitting_by_default = true, the intent is effectively:
CREATE TABLE test (
c1 int,
c2 text,
PRIMARY KEY (c1 HASH)
);
If range splitting is the default, yb_use_hash_splitting_by_default = false, it behaves more like:
CREATE TABLE test (
c1 int,
c2 text,
PRIMARY KEY (c1 ASC)
);
That is a very different storage and distribution strategy.
Sharding scheme comparison
| Sharding Scheme | Best At | Trade-off | Multi-column Behavior |
|---|---|---|---|
| HASH | Even distribution, predictable performance, avoids hotspots, efficient point lookups | Range scans can be less efficient | Hashed key columns drive distribution |
| Range (ASC/DESC) | Ordered access, efficient range queries, natural clustering by primary key order | Leading-key skew can create hotspots and uneven distribution | Leading column heavily influences distribution pattern |
When this tip is appropriate
This tip is for the case where you caught the issue early and want YugabyteDB to generate the corrective DDL for you.
It is most appropriate when:
- ● the table is still small
- ● the workload impact of rebuilding the primary key is acceptable
- ● you want to audit or repair recent schema objects that were created with the wrong default
It is not the right blanket production strategy for every large tables, because recreating a primary key rewrites the table’s storage. YugabyteDB’s table design is index-oriented, and the primary key defines the table’s storage layout.
Audit What’s Actually Range Sharded
Before fixing anything, let’s identify which objects are currently range sharded.
We’ll split this into two parts:
- ● Primary keys → show the actual PK definition
- ● Secondary indexes → show just the indexed columns
This keeps things clean and avoids confusing PKs with index DDL.
🧱 Find Range-Sharded Primary Keys
SELECT
c.oid::regclass AS table_name,
con.conname AS pk_name,
'PRIMARY KEY (' ||
string_agg(
format(
'%I %s',
a.attname,
CASE
WHEN (i.indoption[k.ord - 1] & 1) = 1 THEN 'DESC'
ELSE 'ASC'
END
),
', ' ORDER BY k.ord
) || ')' AS pk_definition
FROM pg_constraint con
JOIN pg_class c
ON c.oid = con.conrelid
JOIN pg_namespace n
ON n.oid = c.relnamespace
JOIN pg_index i
ON i.indexrelid = con.conindid
CROSS JOIN LATERAL unnest(i.indkey) WITH ORDINALITY AS k(attnum, ord)
JOIN pg_attribute a
ON a.attrelid = i.indrelid
AND a.attnum = k.attnum
WHERE con.contype = 'p'
AND n.nspname NOT IN ('pg_catalog', 'information_schema')
AND k.ord <= i.indnkeyatts
AND position(' HASH' in upper(pg_get_indexdef(con.conindid))) = 0
GROUP BY c.oid, con.conname
ORDER BY 1;
Example output:
.table_name | pk_name | pk_definition
------------+------------+-------------------------------
test | test_pkey | PRIMARY KEY (c1 ASC)
test4 | test4_pkey | PRIMARY KEY (c1 ASC, c2 DESC)
🧱 Find Range-Sharded Secondary Indexes
SELECT
i.indexrelid::regclass AS index_name,
i.indrelid::regclass AS table_name,
string_agg(
format(
'%I %s',
a.attname,
CASE
WHEN (i.indoption[k.ord - 1] & 1) = 1 THEN 'DESC'
ELSE 'ASC'
END
),
', ' ORDER BY k.ord
) AS index_keys
FROM pg_index i
JOIN pg_class ic
ON ic.oid = i.indexrelid
JOIN pg_namespace n
ON n.oid = ic.relnamespace
CROSS JOIN LATERAL unnest(i.indkey) WITH ORDINALITY AS k(attnum, ord)
JOIN pg_attribute a
ON a.attrelid = i.indrelid
AND a.attnum = k.attnum
WHERE NOT i.indisprimary
AND n.nspname NOT IN ('pg_catalog', 'information_schema')
AND k.ord <= i.indnkeyatts
AND position(' HASH' in upper(pg_get_indexdef(i.indexrelid))) = 0
GROUP BY i.indexrelid, i.indrelid
ORDER BY 2, 1;
Example output:
. index_name | table_name | index_keys
-----------------+------------+-----------------
test_c2_idx | test | c2 ASC
test4_c2_c3_idx | test4 | c2 ASC, c3 DESC
HASH is range sharded. In write-heavy or skew-prone workloads, that can lead to uneven distribution and hotspots.
⚙️ Generate the Repaire DDL
The function below takes the relid of either:
- ● a table → it inspects the table’s primary key
- ● an index → it inspects the index directly
If the object is range sharded, it returns the DROP and CREATE DDL needed to rebuild it with HASH added to the first key column.
It also supports the common “pretty manageable” index cases:
- ● ordinary column-based indexes
- ●
UNIQUE - ●
INCLUDE (...) - ● partial indexes with
WHERE - ● collations
- ● opclasses
For more complex definitions, such as expression indexes, the function deliberately tells you to handle them manually.
CREATE OR REPLACE FUNCTION yb_get_hash_rebuild_ddl(p_relid regclass)
RETURNS TABLE (
input_relid oid,
object_type text,
schema_name text,
object_name text,
base_table text,
is_range_sharded boolean,
drop_ddl text,
create_ddl text,
notes text
)
LANGUAGE plpgsql
AS $$
DECLARE
v_relkind "char";
v_nspname text;
v_relname text;
v_index_oid oid;
v_index_name text;
v_index_nspname text;
v_table_oid oid;
v_table_name text;
v_table_nspname text;
v_conname text;
v_is_pk boolean;
v_has_hash boolean;
v_is_unique boolean;
v_predicate text;
v_keycols text;
v_includecols text;
BEGIN
SELECT c.relkind, n.nspname, c.relname
INTO v_relkind, v_nspname, v_relname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.oid = p_relid;
IF NOT FOUND THEN
RAISE EXCEPTION 'Object % not found', p_relid;
END IF;
IF v_relkind IN ('r','p') THEN
SELECT con.conindid,
con.conname,
ct.oid,
ct.relname,
nt.nspname
INTO v_index_oid,
v_conname,
v_table_oid,
v_table_name,
v_table_nspname
FROM pg_constraint con
JOIN pg_class ct
ON ct.oid = con.conrelid
JOIN pg_namespace nt
ON nt.oid = ct.relnamespace
WHERE con.conrelid = p_relid
AND con.contype = 'p';
IF v_index_oid IS NULL THEN
RAISE EXCEPTION 'Table %.% has no primary key', v_nspname, v_relname;
END IF;
v_index_name := v_conname;
v_index_nspname := v_table_nspname;
v_is_unique := true;
v_predicate := NULL;
ELSIF v_relkind = 'i' THEN
SELECT c.relname,
n.nspname,
i.indrelid,
ct.relname,
nt.nspname,
i.indisunique,
i.indisprimary,
pg_get_expr(i.indpred, i.indrelid)
INTO v_index_name,
v_index_nspname,
v_table_oid,
v_table_name,
v_table_nspname,
v_is_unique,
v_is_pk,
v_predicate
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
JOIN pg_index i ON i.indexrelid = c.oid
JOIN pg_class ct ON ct.oid = i.indrelid
JOIN pg_namespace nt ON nt.oid = ct.relnamespace
WHERE c.oid = p_relid;
IF v_table_oid IS NULL THEN
RAISE EXCEPTION 'Index % not found', p_relid;
END IF;
IF v_is_pk THEN
RAISE EXCEPTION
'Index %.% is the backing index for a primary key. Pass the table relid instead.',
v_index_nspname, v_index_name;
END IF;
v_index_oid := p_relid;
ELSE
RAISE EXCEPTION
'Unsupported relkind % for object %.%. Pass a table or index relid.',
v_relkind, v_nspname, v_relname;
END IF;
SELECT position(' HASH' in upper(pg_get_indexdef(v_index_oid))) > 0
INTO v_has_hash;
IF EXISTS (
SELECT 1
FROM pg_index
WHERE indexrelid = v_index_oid
AND indexprs IS NOT NULL
) THEN
input_relid := p_relid;
object_type := CASE WHEN v_relkind IN ('r','p') THEN 'primary_key' ELSE 'index' END;
schema_name := v_index_nspname;
object_name := COALESCE(v_index_name, v_conname);
base_table := format('%I.%I', v_table_nspname, v_table_name);
is_range_sharded := NULL;
drop_ddl := NULL;
create_ddl := NULL;
notes := 'Expression index detected. Handle manually.';
RETURN NEXT;
RETURN;
END IF;
SELECT string_agg(
CASE
WHEN k.ord = 1 AND NOT v_has_hash THEN
format('%I HASH', att.attname)
ELSE
trim(both ' ' from concat_ws(' ',
format('%I', att.attname),
CASE
WHEN coll.oid IS NOT NULL
AND coll.collname IS NOT NULL
AND coll.collname <> 'default'
THEN format('COLLATE %I', coll.collname)
ELSE NULL
END,
CASE
WHEN opc.oid IS NOT NULL
AND opc.opcdefault IS DISTINCT FROM true
THEN format('%I.%I', opcnsp.nspname, opc.opcname)
ELSE NULL
END,
CASE
WHEN (idx.indoption[k.ord - 1] & 1) = 1 THEN 'DESC'
ELSE 'ASC'
END
))
END,
', ' ORDER BY k.ord
)
INTO v_keycols
FROM pg_index idx
CROSS JOIN LATERAL unnest(idx.indkey) WITH ORDINALITY AS k(attnum, ord)
JOIN pg_attribute att
ON att.attrelid = idx.indrelid
AND att.attnum = k.attnum
LEFT JOIN pg_collation coll
ON coll.oid = idx.indcollation[k.ord - 1]
LEFT JOIN pg_opclass opc
ON opc.oid = idx.indclass[k.ord - 1]
LEFT JOIN pg_namespace opcnsp
ON opcnsp.oid = opc.opcnamespace
WHERE idx.indexrelid = v_index_oid
AND k.ord <= idx.indnkeyatts;
SELECT string_agg(format('%I', att.attname), ', ' ORDER BY k.ord)
INTO v_includecols
FROM pg_index idx
CROSS JOIN LATERAL unnest(idx.indkey) WITH ORDINALITY AS k(attnum, ord)
JOIN pg_attribute att
ON att.attrelid = idx.indrelid
AND att.attnum = k.attnum
WHERE idx.indexrelid = v_index_oid
AND k.ord > idx.indnkeyatts;
input_relid := p_relid;
object_type := CASE WHEN v_relkind IN ('r','p') THEN 'primary_key' ELSE 'index' END;
schema_name := v_index_nspname;
object_name := COALESCE(v_conname, v_index_name);
base_table := format('%I.%I', v_table_nspname, v_table_name);
is_range_sharded := NOT v_has_hash;
IF v_has_hash THEN
drop_ddl := NULL;
create_ddl := NULL;
notes := 'Already HASH sharded. No rebuild DDL generated.';
RETURN NEXT;
RETURN;
END IF;
IF object_type = 'primary_key' THEN
drop_ddl :=
format('ALTER TABLE %I.%I DROP CONSTRAINT %I;',
v_table_nspname, v_table_name, v_conname);
create_ddl :=
format('ALTER TABLE %I.%I ADD CONSTRAINT %I PRIMARY KEY (%s);',
v_table_nspname, v_table_name, v_conname, v_keycols);
notes :=
'Supports ordinary column-based PKs. Recreating a PK rewrites table storage. Expression indexes and exotic definitions should be handled manually.';
ELSE
drop_ddl :=
format('DROP INDEX %I.%I;',
v_index_nspname, v_index_name);
create_ddl :=
format(
'CREATE %sINDEX %I ON %I.%I USING lsm (%s)%s%s;',
CASE WHEN v_is_unique THEN 'UNIQUE ' ELSE '' END,
v_index_name,
v_table_nspname,
v_table_name,
v_keycols,
CASE
WHEN v_includecols IS NOT NULL THEN format(' INCLUDE (%s)', v_includecols)
ELSE ''
END,
CASE
WHEN v_predicate IS NOT NULL THEN format(' WHERE %s', v_predicate)
ELSE ''
END
);
notes :=
'Supports ordinary column indexes, INCLUDE columns, unique indexes, and partial predicates. Expression indexes and exotic definitions should be handled manually.';
END IF;
RETURN NEXT;
END;
$$;
🚑 use it
Say we have this table:
yugabyte=# \d test
Table "public.test"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
c1 | integer | | not null |
c2 | text | | |
Indexes:
"test_pkey" PRIMARY KEY, lsm (c1 ASC)
"test_c2_idx" lsm (c2 ASC)
We can generate the DDL to fix the PK using:
\x
SELECT * FROM yb_get_hash_rebuild_ddl('public.test'::regclass);
Example output:
input_relid | 16399
object_type | primary_key
schema_name | public
object_name | test_pkey
base_table | public.test
is_range_sharded | t
drop_ddl | ALTER TABLE public.test DROP CONSTRAINT test_pkey;
create_ddl | ALTER TABLE public.test ADD CONSTRAINT test_pkey PRIMARY KEY (c1 HASH);
notes | Supports ordinary column-based PKs. Recreating a PK rewrites table storage. Expression indexes and exotic definitions should be handled manually.
And we can do the same with the index:
\x
SELECT * FROM yb_get_hash_rebuild_ddl('public.test_c2_idx'::regclass);
Example output:
input_relid | 16429
object_type | index
schema_name | public
object_name | test_c2_idx
base_table | public.test
is_range_sharded | t
drop_ddl | DROP INDEX public.test_c2_idx;
create_ddl | CREATE INDEX test_c2_idx ON public.test USING lsm (c2 HASH);
notes | Supports ordinary column indexes, INCLUDE columns, unique indexes, and partial predicates. Expression indexes and exotic definitions should be handled manually.
To print the rebuild statements in execution order, use:
SELECT step, ddl
FROM yb_get_hash_rebuild_ddl('<
Example:
yugabyte=# SELECT step, ddl
yugabyte-# FROM yb_get_hash_rebuild_ddl('public.test4'::regclass)
yugabyte-# CROSS JOIN LATERAL (
yugabyte(# VALUES
yugabyte(# (1, drop_ddl),
yugabyte(# (2, create_ddl)
yugabyte(# ) AS x(step, ddl)
yugabyte-# WHERE ddl IS NOT NULL
yugabyte-# ORDER BY step;
step | ddl
------+------------------------------------------------------------------------------------
1 | ALTER TABLE public.test4 DROP CONSTRAINT test4_pkey;
2 | ALTER TABLE public.test4 ADD CONSTRAINT test4_pkey PRIMARY KEY (c1 HASH, c2 DESC);
(2 rows)
HASH distribution.
- ● Standard column-based indexes and primary keys
- ● UNIQUE indexes
- ● INCLUDE (…) columns
- ● Partial indexes (
WHEREclauses) - ● Collations and opclasses
🔑 Final takeaway
Defaults are convenient until they change.
In YugabyteDB, sharding behavior MUST be treated as part of your schema design, not as an environment assumption.
So the playbook is simple:
- 1. audit what is currently range sharded
- 2. generate the corrective DDL
- 3. review before running it
- 4. going forward, always write
HASHexplicitly when you mean hash sharding
HASH for distribution, or ASC/DESC when you intentionally want ordered access patterns.
Have Fun!
