Hot shards happen when too many rows land in a small subset of tablets. That subset becomes a bottleneck while other tablets sit idle.
On newer releases of YugabyteDB you can lean on:
● Active Session History (ASH) to spot hot tablets in near-real time.
● YBA Performance Advisor (Hot Shard) for guided detection and remediation tips.
If you’re on an older release or want a portable SQL-only check, this post gives you a drop-in set of functions to:
● parse the hash key columns for a table's PK and each HASH-partitioned secondary index (including composite keys like (c1, c2)),
● compute per-tablet row counts by slicing the [0..65535] hash space, and
● produce a summary row (the “hottest shard”) per object.
Everything below runs in YSQL and requires no extensions.
Demo setup (optional)
create table if not exists test (
id int primary key,
c1 int,
c2 int
);
create index if not exists test_c1_idx on test (c1 hash);
create index if not exists test_c2_idx1 on test ((c1, c2) hash) include (id);
-- Example data
truncate table test;
insert into test
select g, 1, (g % 10)
from generate_series(1, 1000000) g;
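This data set is deliberately adversarial: every row carries c1 = 1, and c2 cycles through only ten values, so both secondary indexes hash a very low-cardinality key. A quick Python model of the insert (an illustration only, not YugabyteDB code) makes the cardinality problem visible:

```python
from collections import Counter

# Model of the demo insert: select g, 1, (g % 10) from generate_series(1, 1000000) g
N = 1_000_000
c1_values = Counter(1 for g in range(1, N + 1))       # c1 is always 1
c2_values = Counter(g % 10 for g in range(1, N + 1))  # c2 cycles through 0..9

print(len(c1_values))  # 1  -> an index hashed on c1 alone collapses to one hash code
print(len(c2_values))  # 10 -> even (c1, c2) yields only ten distinct hash codes
```

With at most ten distinct hash inputs, no amount of tablet splitting can balance these indexes; that is exactly the skew the functions below will surface.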
Step 1 — Extract the HASH key expressions (robust, composite-aware)
This extractor:
● finds USING lsm ( ... ),
● reads the entire parenthesized list (handles nested (...)),
● splits on commas only at depth 0 (so (c1, c2) stays intact),
● keeps only items ending in HASH,
● and unwraps an outer pair of parentheses for composite keys so we can call yb_hash_code(c1, c2) (not yb_hash_code((c1, c2))).
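The depth-0 comma split is the subtle part: a comma inside (c1, c2) must not break the token. Here is a small Python model of the same scan (a sketch of the algorithm, not the plpgsql itself):

```python
def split_depth0(cols: str) -> list[str]:
    """Split a column list on commas, but only at parenthesis depth 0."""
    tokens, token, depth = [], "", 0
    for ch in cols:
        if ch == "(":
            depth += 1
            token += ch
        elif ch == ")":
            depth -= 1
            token += ch
        elif ch == "," and depth == 0:
            if token.strip():
                tokens.append(token.strip())
            token = ""
        else:
            token += ch
    if token.strip():
        tokens.append(token.strip())
    return tokens

# The composite hash key survives as one token; the range column is separate.
print(split_depth0("(c1, c2) HASH, c3 ASC"))
# ['(c1, c2) HASH', 'c3 ASC']
```

The plpgsql function below applies the same idea with an explicit character loop, then strips the trailing HASH keyword and unwraps the outer parentheses.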
-- Extract HASH key expressions from an LSM index definition (e.g., "... USING lsm (c1 HASH)" or "... USING lsm ((c1, c2) HASH) ...")
create or replace function yb_hash_cols_of_index(p_index oid)
returns text
language plpgsql
stable
as $$
declare
def_text text; def_lc text;
pos_using int; after_using_lc text; after_using_orig text;
pos_open_rel int; scan_lc text; scan_orig text;
i int; ch_lc text; depth int := 1;
buf_orig text := ''; cols text;
token text := ''; tokens text[] := '{}';
item text; cleaned text; outcols text[] := '{}';
begin
select pg_get_indexdef(p_index, 0, true) into def_text;
if def_text is null then return null; end if;
def_lc := lower(def_text);
pos_using := strpos(def_lc, ' using lsm ');
if pos_using = 0 then pos_using := strpos(def_lc, 'using lsm '); end if;
if pos_using = 0 then return null; end if;
after_using_lc := substr(def_lc, pos_using);
after_using_orig := substr(def_text, pos_using);
-- first '(' after "using lsm "
pos_open_rel := strpos(after_using_lc, '(');
if pos_open_rel = 0 then return null; end if;
-- scan from after '(' until the matching ')'
scan_lc := substr(after_using_lc, pos_open_rel + 1);
scan_orig := substr(after_using_orig, pos_open_rel + 1);
for i in 1..length(scan_lc) loop
ch_lc := substr(scan_lc, i, 1);
if ch_lc = '(' then
depth := depth + 1;
buf_orig := buf_orig || substr(scan_orig, i, 1);
elsif ch_lc = ')' then
depth := depth - 1;
exit when depth = 0;
buf_orig := buf_orig || substr(scan_orig, i, 1);
else
buf_orig := buf_orig || substr(scan_orig, i, 1);
end if;
end loop;
cols := btrim(buf_orig);
if cols = '' then return null; end if;
-- split on commas at depth 0
depth := 0; token := '';
for i in 1..length(cols) loop
ch_lc := lower(substr(cols, i, 1));
if ch_lc = '(' then
depth := depth + 1; token := token || substr(cols, i, 1);
elsif ch_lc = ')' then
depth := depth - 1; token := token || substr(cols, i, 1);
elsif ch_lc = ',' and depth = 0 then
if btrim(token) <> '' then tokens := array_append(tokens, btrim(token)); end if;
token := '';
else
token := token || substr(cols, i, 1);
end if;
end loop;
if btrim(token) <> '' then tokens := array_append(tokens, btrim(token)); end if;
-- keep items with HASH; strip trailing "HASH [ASC|DESC]"; unwrap outer () for composite lists
foreach item in array tokens loop
if position(' HASH' in upper(item)) > 0 then
cleaned := regexp_replace(item, '\s+HASH(\s+(ASC|DESC))?$', '', 'i');
cleaned := btrim(cleaned);
if left(cleaned,1) = '(' and right(cleaned,1) = ')' and position(',' in cleaned) > 0 then
cleaned := substr(cleaned, 2, length(cleaned)-2); -- unwrap "(c1, c2)" -> "c1, c2"
end if;
outcols := array_append(outcols, cleaned);
end if;
end loop;
if array_length(outcols, 1) is null then return null; end if;
return array_to_string(outcols, ', ');
end;
$$;
Helper for the table’s primary key:
-- HASH key columns for a table's PRIMARY KEY
create or replace function yb_hash_cols_of_table(p_table regclass)
returns text
language plpgsql
stable
as $$
declare
v_idx oid;
v_cols text;
begin
select i.indexrelid into v_idx
from pg_index i
where i.indrelid = p_table
and i.indisprimary
limit 1;
if v_idx is null then
return null;
end if;
v_cols := yb_hash_cols_of_index(v_idx);
if v_cols is not null and btrim(v_cols) <> '' then
return v_cols;
end if;
-- Fallback: build PK column list from catalog (covers simple PKs)
return (
with idx as (
select i.indkey
from pg_index i
where i.indexrelid = v_idx
),
att as (
select a.attname, a.attnum
from pg_attribute a
where a.attrelid = p_table and a.attnum > 0 and not a.attisdropped
)
select string_agg(att.attname, ', ' order by array_position(idx.indkey, att.attnum))
from idx, att
where att.attnum = any(idx.indkey)
);
end;
$$;
Sanity checks
select yb_hash_cols_of_table('public.test'::regclass); -- e.g., 'id'
select yb_hash_cols_of_index('public.test_c1_idx'::regclass); -- 'c1'
select yb_hash_cols_of_index('public.test_c2_idx1'::regclass); -- 'c1, c2'
Example:
yugabyte=# select yb_hash_cols_of_table('public.test'::regclass); -- e.g., 'id'
yb_hash_cols_of_table
-----------------------
id
(1 row)
yugabyte=# select yb_hash_cols_of_index('public.test_c1_idx'::regclass); -- 'c1'
yb_hash_cols_of_index
-----------------------
c1
(1 row)
yugabyte=# select yb_hash_cols_of_index('public.test_c2_idx1'::regclass); -- 'c1, c2'
yb_hash_cols_of_index
-----------------------
c1, c2
(1 row)
Step 2 — Per-tablet distribution (with hash_cols in output)
This function:
● treats the PK index as the table (we don’t double-count it as an index),
● gets num_tablets from yb_table_properties(...),
● evenly slices [0..65535] into num_tablets ranges via floor(65536*i/n) .. floor(65536*(i+1)/n)-1,
● counts rows per range using yb_hash_code(<hash_cols>), and
● returns a row per tablet including the hash key used.
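The range arithmetic is worth double-checking: floor-based endpoints tile the whole [0..65535] space with no gaps or overlaps, even when 65536 is not a multiple of num_tablets. A Python rendering of the same formula (illustrative, mirroring the plpgsql loop):

```python
def hash_ranges(num_tablets: int) -> list[tuple[int, int]]:
    """Inclusive [start, end] ranges: floor(65536*i/n) .. floor(65536*(i+1)/n) - 1."""
    return [
        (65536 * i // num_tablets, 65536 * (i + 1) // num_tablets - 1)
        for i in range(num_tablets)
    ]

print(hash_ranges(3))
# [(0, 21844), (21845, 43689), (43690, 65535)]
```

One caveat: these even slices match the real tablet boundaries only while the table keeps its initial, evenly spaced split points; if tablets have since been split, the slices are an approximation of the per-tablet load.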
create or replace function yb_hot_shard_distribution(p_schema text, p_table text)
returns table(
object_type text, -- 'table' or 'index'
object_name text, -- qualified name
hash_cols text, -- <== NEW: the hash key used
tablet_no int, -- 0..num_tablets-1
hash_start int, -- inclusive
hash_end int, -- inclusive
row_count bigint,
pct_of_total numeric
)
language plpgsql
stable
as $$
declare
v_rel regclass;
v_tbl_qualified text;
v_tbl_num_tablets int;
v_hash_cols text;
v_total bigint;
r record;
i int; s int; e int; sql text; cnt bigint;
-- Only HASH-sharded secondary indexes (skip PK == table)
cur_indexes cursor for
with idx as (
select i.indexrelid::regclass as idxreg,
n.nspname||'.'||c2.relname as idxname,
i.indisprimary,
yb_hash_cols_of_index(i.indexrelid) as hash_cols
from pg_index i
join pg_class c on c.oid = i.indrelid
join pg_namespace n on n.oid = c.relnamespace
join pg_class c2 on c2.oid = i.indexrelid
where n.nspname = p_schema
and c.relname = p_table
)
select idxreg, idxname, hash_cols
from idx
where indisprimary = false
and hash_cols is not null and btrim(hash_cols) <> '';
begin
v_rel := format('%I.%I', p_schema, p_table)::regclass;
v_tbl_qualified := format('%I.%I', p_schema, p_table);
select (yb_table_properties(v_rel)).num_tablets into v_tbl_num_tablets;
v_tbl_num_tablets := coalesce(nullif(v_tbl_num_tablets, 0), 1);
-- ===== TABLE (via PK hash columns) =====
v_hash_cols := yb_hash_cols_of_table(v_rel);
if v_hash_cols is not null and btrim(v_hash_cols) <> '' then
sql := format('select count(*) from %s where yb_hash_code(%s) is not null',
v_tbl_qualified, v_hash_cols);
execute sql into v_total;
v_total := coalesce(v_total, 0);
for i in 0..(v_tbl_num_tablets-1) loop
s := floor(65536.0 * i / v_tbl_num_tablets);
e := floor(65536.0 * (i+1) / v_tbl_num_tablets) - 1;
sql := format(
'select count(*) from %s where yb_hash_code(%s) between %s and %s',
v_tbl_qualified, v_hash_cols, s, e
);
execute sql into cnt;
cnt := coalesce(cnt, 0);
object_type := 'table';
object_name := v_tbl_qualified;
hash_cols := v_hash_cols; -- <= included
tablet_no := i;
hash_start := s;
hash_end := e;
row_count := cnt;
pct_of_total := case when v_total > 0 then round(100.0 * cnt / v_total, 4) else 0 end;
return next;
end loop;
end if;
-- ===== SECONDARY INDEXES (HASH) =====
for r in cur_indexes loop
declare v_idx_num_tablets int; v_idx_total bigint;
begin
select (yb_table_properties(r.idxreg)).num_tablets into v_idx_num_tablets;
v_idx_num_tablets := coalesce(nullif(v_idx_num_tablets, 0), 1);
sql := format('select count(*) from %s where yb_hash_code(%s) is not null',
v_tbl_qualified, r.hash_cols);
execute sql into v_idx_total;
v_idx_total := coalesce(v_idx_total, 0);
for i in 0..(v_idx_num_tablets-1) loop
s := floor(65536.0 * i / v_idx_num_tablets);
e := floor(65536.0 * (i+1) / v_idx_num_tablets) - 1;
sql := format(
'select count(*) from %s where yb_hash_code(%s) between %s and %s',
v_tbl_qualified, r.hash_cols, s, e
);
execute sql into cnt;
cnt := coalesce(cnt, 0);
object_type := 'index';
object_name := r.idxname;
hash_cols := r.hash_cols; -- <= included
tablet_no := i;
hash_start := s;
hash_end := e;
row_count := cnt;
pct_of_total := case when v_idx_total > 0 then round(100.0 * cnt / v_idx_total, 4) else 0 end;
return next;
end loop;
end;
end loop;
end;
$$;
Step 3 — “Hottest shard” summary (with hash_cols)
create or replace function yb_hot_shard_summary(p_schema text, p_table text)
returns table(
object_type text,
object_name text,
hash_cols text, -- <= included
tablet_no int,
hash_start int,
hash_end int,
row_count bigint,
pct_of_total numeric
)
language sql
stable
as $$
with dist as (
select * from yb_hot_shard_distribution(p_schema, p_table)
)
select distinct on (object_type, object_name)
object_type, object_name, hash_cols, tablet_no, hash_start, hash_end, row_count, pct_of_total
from dist
order by object_type, object_name, pct_of_total desc;
$$;
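distinct on (object_type, object_name) combined with that order by keeps the first row per object in sort order, i.e. the tablet with the highest pct_of_total. The equivalent selection in Python, over a few hypothetical rows shaped like the function's output:

```python
def hottest_per_object(rows):
    """Keep the row with the highest pct_of_total per (object_type, object_name)."""
    best = {}
    for r in rows:
        key = (r["object_type"], r["object_name"])
        if key not in best or r["pct_of_total"] > best[key]["pct_of_total"]:
            best[key] = r
    return sorted(best.values(), key=lambda r: (r["object_type"], r["object_name"]))

# Hypothetical distribution rows, shaped like yb_hot_shard_distribution output
dist = [
    {"object_type": "index", "object_name": "public.test_c1_idx", "tablet_no": 0, "pct_of_total": 99.999},
    {"object_type": "index", "object_name": "public.test_c1_idx", "tablet_no": 2, "pct_of_total": 0.001},
    {"object_type": "table", "object_name": "public.test", "tablet_no": 0, "pct_of_total": 33.37},
    {"object_type": "table", "object_name": "public.test", "tablet_no": 1, "pct_of_total": 33.29},
]
for row in hottest_per_object(dist):
    print(row["object_name"], row["tablet_no"], row["pct_of_total"])
# public.test_c1_idx 0 99.999
# public.test 0 33.37
```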
Usage
-- Full per-tablet distribution for table + indexes
select *
from yb_hot_shard_distribution('public','test')
order by object_type, object_name, pct_of_total desc, tablet_no;
-- Hottest shard per object (table and each index)
select *
from yb_hot_shard_summary('public','test');
Example:
yugabyte=# -- Full per-tablet distribution for table + indexes
yugabyte=# select *
yugabyte-# from yb_hot_shard_distribution('public','test')
yugabyte-# order by object_type, object_name, pct_of_total desc, tablet_no;
 object_type |     object_name     | hash_cols | tablet_no | hash_start | hash_end | row_count | pct_of_total
-------------+---------------------+-----------+-----------+------------+----------+-----------+--------------
 index       | public.test_c1_idx  | c1        |         0 |          0 |    21844 |   1000000 |      99.9990
 index       | public.test_c1_idx  | c1        |         2 |      43690 |    65535 |        10 |       0.0010
 index       | public.test_c1_idx  | c1        |         1 |      21845 |    43689 |         0 |       0.0000
 index       | public.test_c2_idx1 | c1, c2    |         1 |      21845 |    43689 |   1000010 |     100.0000
 index       | public.test_c2_idx1 | c1, c2    |         0 |          0 |    21844 |         0 |       0.0000
 index       | public.test_c2_idx1 | c1, c2    |         2 |      43690 |    65535 |         0 |       0.0000
 table       | public.test         | id        |         0 |          0 |    21844 |    333720 |      33.3717
 table       | public.test         | id        |         2 |      43690 |    65535 |    333425 |      33.3422
 table       | public.test         | id        |         1 |      21845 |    43689 |    332865 |      33.2862
(9 rows)
yugabyte=# -- Hottest shard per object (table and each index)
yugabyte=# select *
yugabyte-# from yb_hot_shard_summary('public','test');
 object_type |     object_name     | hash_cols | tablet_no | hash_start | hash_end | row_count | pct_of_total
-------------+---------------------+-----------+-----------+------------+----------+-----------+--------------
 index       | public.test_c1_idx  | c1        |         0 |          0 |    21844 |   1000000 |      99.9990
 index       | public.test_c2_idx1 | c1, c2    |         1 |      21845 |    43689 |   1000010 |     100.0000
 table       | public.test         | id        |         0 |          0 |    21844 |    333720 |      33.3717
(3 rows)
👉 In this output:
● The base table (public.test on id) is nicely balanced across its three tablets (~33% each).
● The public.test_c1_idx index is extremely skewed — ~99.999% of rows landed in one shard (with just 10 rows trickling into another).
● The public.test_c2_idx1 index is fully skewed — 100% of rows in a single shard.
Both secondary indexes clearly exhibit hot shard problems, even though the base table is well distributed.
Remediation ideas (once you find a hot shard)
● Revisit the hash key: add more entropy (e.g., add a second column) or use a composite key that spreads values more evenly.
● Add a second column to the hash partition that represents a modulus (for example, mod 9) so the rows are spread across nine buckets instead of collapsing into one. There are caveats with this technique, which we’ll cover in a future YugabyteDB Tip.
● If the hot shard is on an index, make sure you need that index! Can it be dropped? Is it redundant?
● For workloads with temporal skew, consider modeling time as part of the hash key (with care), or use range partitioning alongside hash.
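The spreading effect of the modulus trick can be illustrated in Python, using a SHA-256-based bucket function as a stand-in for yb_hash_code (the stand-in is purely illustrative; YugabyteDB's real hash function differs):

```python
import hashlib

def bucket(key: tuple, num_buckets: int = 16) -> int:
    """Illustrative stand-in for yb_hash_code: stable hash of a key tuple -> bucket."""
    digest = hashlib.sha256(repr(key).encode()).hexdigest()
    return int(digest, 16) % num_buckets

# Hash key is just the constant c1 = 1: every row lands in the same bucket.
single = {bucket((1,)) for g in range(1, 1001)}
# Hash key is (c1, g % 9): nine distinct keys, spread across multiple buckets.
spread = {bucket((1, g % 9)) for g in range(1, 1001)}

print(len(single))  # 1
print(len(spread))  # several buckets instead of one
```

The same trade-off noted above applies: queries that filter only on the original column can no longer push a single hash-code predicate and must probe all nine bucket values.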
Summary
● You can detect hot shards with pure SQL by counting rows per hash slice for the table and each HASH-partitioned index.
● The helpers above robustly parse single-column and composite hash keys and show the hash_cols used in the results.
● On modern releases, prefer ASH and YBA Performance Advisor for real-time, guided detection — but these SQL tools are handy everywhere.
Have Fun!
