Detecting Hot Shards in YSQL with SQL

Hot shards happen when too many rows land in a small subset of tablets. That subset becomes a bottleneck while other tablets sit idle.

On newer releases of YugabyteDB you can lean on built-in tooling such as Active Session History (ASH) and the YugabyteDB Anywhere Performance Advisor.

If you’re on an older release, or want a portable SQL-only check, this post gives you a drop-in set of functions to:

  • parse the hash key columns for a table’s PK and each HASH-partitioned secondary index (including composite keys like (c1, c2)),

  • compute per-tablet row counts by slicing the [0..65535] hash space, and

  • produce a summary row (the “hottest shard”) per object.

Everything below runs in YSQL and requires no extensions.
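Everything here builds on the built-in yb_hash_code() function, which returns the hash value (an integer in 0..65535) that YSQL computes over its arguments — the same value used to place a row in the hash-partitioned key space. A quick feel for it:

```sql
-- yb_hash_code() maps its arguments into the 16-bit hash space [0..65535].
-- The same hash is applied to a table's HASH key columns to pick a tablet.
select yb_hash_code(1);        -- a deterministic value in 0..65535
select yb_hash_code(1, 10);    -- composite keys hash all arguments together
```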

Demo setup (optional)

create table if not exists test (
  id int primary key,
  c1 int,
  c2 int
);

create index if not exists test_c1_idx  on test (c1 hash);
create index if not exists test_c2_idx1 on test ((c1, c2) hash) include (id);

-- Example data
truncate table test;
insert into test
select g, 1, (g % 10)
from generate_series(1, 1000000) g;
Step 1 — Extract the HASH key expressions (robust, composite-aware)

This extractor:

  • finds USING lsm ( ... ),

  • reads the entire parenthesized list (handles nested (...)),

  • splits on commas only at depth 0 (so (c1, c2) stays intact),

  • keeps only items ending in HASH,

  • and unwraps an outer pair of parentheses for composite keys so we can call yb_hash_code(c1, c2) (not yb_hash_code((c1, c2))).

-- Extract HASH key expressions from an LSM index definition (e.g., "... USING lsm (c1 HASH)" or "... USING lsm ((c1, c2) HASH) ...")
create or replace function yb_hash_cols_of_index(p_index oid)
returns text
language plpgsql
stable
as $$
declare
  def_text         text;   def_lc text;
  pos_using        int;    after_using_lc text;   after_using_orig text;
  pos_open_rel     int;    scan_lc text;          scan_orig text;
  i int; ch_lc text; depth int := 1;
  buf_orig text := ''; cols text;
  token text := ''; tokens text[] := '{}';
  item text; cleaned text; outcols text[] := '{}';
begin
  select pg_get_indexdef(p_index, 0, true) into def_text;
  if def_text is null then return null; end if;

  def_lc := lower(def_text);
  pos_using := strpos(def_lc, ' using lsm ');
  if pos_using = 0 then pos_using := strpos(def_lc, 'using lsm '); end if;
  if pos_using = 0 then return null; end if;

  after_using_lc   := substr(def_lc,  pos_using);
  after_using_orig := substr(def_text, pos_using);

  -- first '(' after "using lsm "
  pos_open_rel := strpos(after_using_lc, '(');
  if pos_open_rel = 0 then return null; end if;

  -- scan from after '(' until the matching ')'
  scan_lc   := substr(after_using_lc,   pos_open_rel + 1);
  scan_orig := substr(after_using_orig, pos_open_rel + 1);

  for i in 1..length(scan_lc) loop
    ch_lc := substr(scan_lc, i, 1);
    if ch_lc = '(' then
      depth := depth + 1;
      buf_orig := buf_orig || substr(scan_orig, i, 1);
    elsif ch_lc = ')' then
      depth := depth - 1;
      exit when depth = 0;
      buf_orig := buf_orig || substr(scan_orig, i, 1);
    else
      buf_orig := buf_orig || substr(scan_orig, i, 1);
    end if;
  end loop;

  cols := btrim(buf_orig);
  if cols = '' then return null; end if;

  -- split on commas at depth 0
  depth := 0; token := '';
  for i in 1..length(cols) loop
    ch_lc := lower(substr(cols, i, 1));
    if ch_lc = '(' then
      depth := depth + 1; token := token || substr(cols, i, 1);
    elsif ch_lc = ')' then
      depth := depth - 1; token := token || substr(cols, i, 1);
    elsif ch_lc = ',' and depth = 0 then
      if btrim(token) <> '' then tokens := array_append(tokens, btrim(token)); end if;
      token := '';
    else
      token := token || substr(cols, i, 1);
    end if;
  end loop;
  if btrim(token) <> '' then tokens := array_append(tokens, btrim(token)); end if;

  -- keep items with HASH; strip trailing "HASH [ASC|DESC]"; unwrap outer () for composite lists
  foreach item in array tokens loop
    if position(' HASH' in upper(item)) > 0 then
      cleaned := regexp_replace(item, '\s+HASH(\s+(ASC|DESC))?$', '', 'i');
      cleaned := btrim(cleaned);
      if left(cleaned,1) = '(' and right(cleaned,1) = ')' and position(',' in cleaned) > 0 then
        cleaned := substr(cleaned, 2, length(cleaned)-2);  -- unwrap "(c1, c2)" -> "c1, c2"
      end if;
      outcols := array_append(outcols, cleaned);
    end if;
  end loop;

  if array_length(outcols, 1) is null then return null; end if;
  return array_to_string(outcols, ', ');
end;
$$;

Helper for the table’s primary key:

-- HASH key columns for a table's PRIMARY KEY
create or replace function yb_hash_cols_of_table(p_table regclass)
returns text
language plpgsql
stable
as $$
declare
  v_idx oid;
  v_cols text;
begin
  select i.indexrelid into v_idx
  from pg_index i
  where i.indrelid = p_table
    and i.indisprimary
  limit 1;

  if v_idx is null then
    return null;
  end if;

  v_cols := yb_hash_cols_of_index(v_idx);
  if v_cols is not null and btrim(v_cols) <> '' then
    return v_cols;
  end if;

  -- Fallback: build PK column list from catalog (covers simple PKs)
  return (
    with idx as (
      select i.indkey
      from pg_index i
      where i.indexrelid = v_idx
    ),
    att as (
      select a.attname, a.attnum
      from pg_attribute a
      where a.attrelid = p_table and a.attnum > 0 and not a.attisdropped
    )
    select string_agg(att.attname, ', ' order by array_position(idx.indkey, att.attnum))
    from idx, att
    where att.attnum = any(idx.indkey)
  );
end;
$$;
Sanity checks

select yb_hash_cols_of_table('public.test'::regclass);         -- e.g., 'id'
select yb_hash_cols_of_index('public.test_c1_idx'::regclass);  -- 'c1'
select yb_hash_cols_of_index('public.test_c2_idx1'::regclass); -- 'c1, c2'

Example:

yugabyte=# select yb_hash_cols_of_table('public.test'::regclass);         -- e.g., 'id'
 yb_hash_cols_of_table
-----------------------
 id
(1 row)

yugabyte=# select yb_hash_cols_of_index('public.test_c1_idx'::regclass);  -- 'c1'
 yb_hash_cols_of_index
-----------------------
 c1
(1 row)

yugabyte=# select yb_hash_cols_of_index('public.test_c2_idx1'::regclass); -- 'c1, c2'
 yb_hash_cols_of_index
-----------------------
 c1, c2
(1 row)
Step 2 — Per-tablet distribution (with hash_cols in output)

This function:

  • treats the PK index as the table (we don’t double-count it as an index),

  • gets num_tablets from yb_table_properties(...),

  • evenly slices [0..65535] into num_tablets ranges via floor(65536*i/n) .. floor(65536*(i+1)/n)-1,

  • counts rows per range using yb_hash_code(<hash_cols>),

  • and returns a row per tablet including the hash key used.

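The range arithmetic is easy to sanity-check on its own. For three tablets, this plain-SQL query (no YugabyteDB specifics) tiles [0..65535] with no gaps or overlaps:

```sql
-- Slice the 16-bit hash space into n = 3 contiguous ranges,
-- the same way the per-tablet distribution function does.
select i                                     as tablet_no,
       floor(65536.0 * i / 3)::int           as hash_start,  -- inclusive
       floor(65536.0 * (i + 1) / 3)::int - 1 as hash_end     -- inclusive
from generate_series(0, 2) as i;
-- tablet 0: 0..21844, tablet 1: 21845..43689, tablet 2: 43690..65535
```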
create or replace function yb_hot_shard_distribution(p_schema text, p_table text)
returns table(
  object_type  text,     -- 'table' or 'index'
  object_name  text,     -- qualified name
  hash_cols    text,     -- <== NEW: the hash key used
  tablet_no    int,      -- 0..num_tablets-1
  hash_start   int,      -- inclusive
  hash_end     int,      -- inclusive
  row_count    bigint,
  pct_of_total numeric
)
language plpgsql
stable
as $$
declare
  v_rel             regclass;
  v_tbl_qualified   text;
  v_tbl_num_tablets int;
  v_hash_cols       text;
  v_total           bigint;
  r record;

  i int; s int; e int; sql text; cnt bigint;

  -- Only HASH-sharded secondary indexes (skip PK == table)
  cur_indexes cursor for
    with idx as (
      select i.indexrelid::regclass as idxreg,
             n.nspname||'.'||c2.relname as idxname,
             i.indisprimary,
             yb_hash_cols_of_index(i.indexrelid) as hash_cols
      from pg_index i
      join pg_class c  on c.oid  = i.indrelid
      join pg_namespace n on n.oid = c.relnamespace
      join pg_class c2 on c2.oid = i.indexrelid
      where n.nspname = p_schema
        and c.relname = p_table
    )
    select idxreg, idxname, hash_cols
    from idx
    where indisprimary = false
      and hash_cols is not null and btrim(hash_cols) <> '';
begin
  v_rel := format('%I.%I', p_schema, p_table)::regclass;
  v_tbl_qualified := format('%I.%I', p_schema, p_table);
  select (yb_table_properties(v_rel)).num_tablets into v_tbl_num_tablets;
  v_tbl_num_tablets := coalesce(nullif(v_tbl_num_tablets, 0), 1);

  -- ===== TABLE (via PK hash columns) =====
  v_hash_cols := yb_hash_cols_of_table(v_rel);
  if v_hash_cols is not null and btrim(v_hash_cols) <> '' then
    sql := format('select count(*) from %s where yb_hash_code(%s) is not null',
                  v_tbl_qualified, v_hash_cols);
    execute sql into v_total;
    v_total := coalesce(v_total, 0);

    for i in 0..(v_tbl_num_tablets-1) loop
      s := floor(65536.0 * i / v_tbl_num_tablets);
      e := floor(65536.0 * (i+1) / v_tbl_num_tablets) - 1;

      sql := format(
        'select count(*) from %s where yb_hash_code(%s) between %s and %s',
        v_tbl_qualified, v_hash_cols, s, e
      );
      execute sql into cnt;
      cnt := coalesce(cnt, 0);

      object_type  := 'table';
      object_name  := v_tbl_qualified;
      hash_cols    := v_hash_cols;     -- <= included
      tablet_no    := i;
      hash_start   := s;
      hash_end     := e;
      row_count    := cnt;
      pct_of_total := case when v_total > 0 then round(100.0 * cnt / v_total, 4) else 0 end;
      return next;
    end loop;
  end if;

  -- ===== SECONDARY INDEXES (HASH) =====
  for r in cur_indexes loop
    declare v_idx_num_tablets int; v_idx_total bigint;
    begin
      select (yb_table_properties(r.idxreg)).num_tablets into v_idx_num_tablets;
      v_idx_num_tablets := coalesce(nullif(v_idx_num_tablets, 0), 1);

      sql := format('select count(*) from %s where yb_hash_code(%s) is not null',
                    v_tbl_qualified, r.hash_cols);
      execute sql into v_idx_total;
      v_idx_total := coalesce(v_idx_total, 0);

      for i in 0..(v_idx_num_tablets-1) loop
        s := floor(65536.0 * i / v_idx_num_tablets);
        e := floor(65536.0 * (i+1) / v_idx_num_tablets) - 1;

        sql := format(
          'select count(*) from %s where yb_hash_code(%s) between %s and %s',
          v_tbl_qualified, r.hash_cols, s, e
        );
        execute sql into cnt;
        cnt := coalesce(cnt, 0);

        object_type  := 'index';
        object_name  := r.idxname;
        hash_cols    := r.hash_cols;   -- <= included
        tablet_no    := i;
        hash_start   := s;
        hash_end     := e;
        row_count    := cnt;
        pct_of_total := case when v_idx_total > 0 then round(100.0 * cnt / v_idx_total, 4) else 0 end;
        return next;
      end loop;
    end;
  end loop;
end;
$$;
Step 3 — “Hottest shard” summary (with hash_cols)

create or replace function yb_hot_shard_summary(p_schema text, p_table text)
returns table(
  object_type  text,
  object_name  text,
  hash_cols    text,     -- <= included
  tablet_no    int,
  hash_start   int,
  hash_end     int,
  row_count    bigint,
  pct_of_total numeric
)
language sql
stable
as $$
  with dist as (
    select * from yb_hot_shard_distribution(p_schema, p_table)
  )
  select distinct on (object_type, object_name)
         object_type, object_name, hash_cols, tablet_no, hash_start, hash_end, row_count, pct_of_total
  from dist
  order by object_type, object_name, pct_of_total desc;
$$;
Usage

-- Full per-tablet distribution for table + indexes
select *
from yb_hot_shard_distribution('public','test')
order by object_type, object_name, pct_of_total desc, tablet_no;

-- Hottest shard per object (table and each index)
select *
from yb_hot_shard_summary('public','test');
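For monitoring, you can wrap the summary in a simple threshold check. The 50% cutoff here is illustrative — a balanced object with n tablets would show roughly 100/n percent per tablet:

```sql
-- Flag objects whose hottest tablet holds more than half of their rows
-- (threshold is illustrative; tune it to your tablet counts).
select object_type, object_name, hash_cols, tablet_no, pct_of_total
from yb_hot_shard_summary('public', 'test')
where pct_of_total > 50
order by pct_of_total desc;
```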

Example:

yugabyte=# -- Full per-tablet distribution for table + indexes
yugabyte=# select *
yugabyte-# from yb_hot_shard_distribution('public','test')
yugabyte-# order by object_type, object_name, pct_of_total desc, tablet_no;
 object_type |      object_name      | hash_cols | tablet_no | hash_start | hash_end | row_count | pct_of_total
-------------+-----------------------+-----------+-----------+------------+----------+-----------+--------------
 index       | public.test_c1_idx    | c1        |         0 |          0 |    21844 |   1000000 |      99.9990
 index       | public.test_c1_idx    | c1        |         2 |      43690 |    65535 |        10 |       0.0010
 index       | public.test_c1_idx    | c1        |         1 |      21845 |    43689 |         0 |       0.0000
 index       | public.test_c2_idx1   | c1, c2    |         1 |      21845 |    43689 |   1000010 |     100.0000
 index       | public.test_c2_idx1   | c1, c2    |         0 |          0 |    21844 |         0 |       0.0000
 index       | public.test_c2_idx1   | c1, c2    |         2 |      43690 |    65535 |         0 |       0.0000
 table       | public.test           | id        |         0 |          0 |    21844 |    333720 |      33.3717
 table       | public.test           | id        |         2 |      43690 |    65535 |    333425 |      33.3422
 table       | public.test           | id        |         1 |      21845 |    43689 |    332865 |      33.2862
(9 rows)

yugabyte=# -- Hottest shard per object (table and each index)
yugabyte=# select *
yugabyte-# from yb_hot_shard_summary('public','test');
 object_type |     object_name     | hash_cols | tablet_no | hash_start | hash_end | row_count | pct_of_total
-------------+---------------------+-----------+-----------+------------+----------+-----------+--------------
 index       | public.test_c1_idx  | c1        |         0 |          0 |    21844 |   1000000 |      99.9990
 index       | public.test_c2_idx1 | c1, c2    |         1 |      21845 |    43689 |   1000010 |     100.0000
 table       | public.test         | id        |         0 |          0 |    21844 |    333720 |      33.3717
(3 rows)

👉 In this output:

  • The base table (public.test on id) is nicely balanced across its three tablets (~33% each).

  • The public.test_c1_idx index is extremely skewed — ~99.999% of rows landed in one shard (with just 10 rows trickling into another).

  • The public.test_c2_idx1 index is fully skewed — 100% of rows in a single shard.

Both secondary indexes clearly exhibit hot shard problems, even though the base table is well distributed.

Remediation ideas (once you find a hot shard)
  • Revisit the hash key: add more entropy (e.g., add a second column) or use a composite key that spreads values more evenly.

  • Add a second column to the hash partition that represents a modulus (for example, mod 9) so the rows are spread across nine buckets instead of collapsing into one. There are caveats with this technique, which we’ll cover in a future YugabyteDB Tip.

  • If the hot shard is on an index, make sure you need that index! Can it be dropped? Is it redundant?

  • For workloads with temporal skew, consider modeling time as part of the hash key (with care), or use range partitioning alongside hash.

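A minimal sketch of the modulus idea — every name here is hypothetical, and note the read-side caveat in the comments:

```sql
-- Hypothetical schema: a modulus bucket column adds entropy to the hash key,
-- spreading rows for one hot tenant across nine buckets instead of one tablet.
create table events (
  tenant_id bigint      not null,
  bucket    int         not null,   -- application sets this, e.g. some_seq % 9
  ts        timestamptz not null,
  payload   text,
  primary key ((tenant_id, bucket) hash, ts asc)
);

-- Caveat: reads on tenant_id alone must now fan out across all nine buckets.
select * from events
where tenant_id = 42
  and bucket between 0 and 8
  and ts >= now() - interval '1 hour';
```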
Summary
  • You can detect hot shards with pure SQL by counting rows per hash slice for the table and each HASH-partitioned index.

  • The helpers above robustly parse single-column and composite hash keys and show the hash_cols used in the results.

  • On modern releases, prefer ASH and YBA Performance Advisor for real-time, guided detection — but these SQL tools are handy everywhere.

Have Fun!

Standing in awe of Cliff Palace at Mesa Verde… an ancient city tucked into the cliffs, built nearly 800 years ago. Hard to believe people once called this breathtaking spot home! Getting down there was its own adventure ... steep trails, narrow paths, and even ladders to climb, but so worth it.