Detecting Hot Shards in YSQL with SQL

Hot shards happen when too many rows land in a small subset of tablets. That subset becomes a bottleneck while other tablets sit idle.

On newer releases of YugabyteDB you can lean on built-in tooling such as Active Session History (ASH) and the YugabyteDB Anywhere Performance Advisor.

If you’re on an older release, or want a portable SQL-only check, this post gives you a drop-in set of functions to:

  • parse the hash key columns for a table’s PK and each HASH-partitioned secondary index (including composite keys like (c1, c2)),

  • compute per-tablet row counts by slicing the [0..65535] hash space, and

  • produce a summary row (the “hottest shard”) per object.

Everything below runs in YSQL and requires no extensions.
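Everything here builds on the built-in yb_hash_code() function, which returns the hash value (an integer in 0..65535) that YSQL computes over its arguments — the same value used to place a row in the hash-partitioned key space. A quick feel for it:

```sql
-- yb_hash_code() maps its arguments into the 16-bit hash space [0..65535].
-- The same hash is applied to a table's HASH key columns to pick a tablet.
select yb_hash_code(1);        -- a deterministic value in 0..65535
select yb_hash_code(1, 10);    -- composite keys hash all arguments together
```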

Demo setup (optional)

create table if not exists test (
  id int primary key,
  c1 int,
  c2 int
);

create index if not exists test_c1_idx  on test (c1 hash);
create index if not exists test_c2_idx1 on test ((c1, c2) hash) include (id);

-- Example data
truncate table test;
insert into test
select g, 1, (g % 10)
from generate_series(1, 1000000) g;
Step 1 — Extract the HASH key expressions (robust, composite-aware)

This extractor:

  • finds USING lsm ( ... ),

  • reads the entire parenthesized list (handles nested (...)),

  • splits on commas only at depth 0 (so (c1, c2) stays intact),

  • keeps only items ending in HASH,

  • and unwraps an outer pair of parentheses for composite keys so we can call yb_hash_code(c1, c2) (not yb_hash_code((c1, c2))).

-- Extract HASH key expressions from an LSM index definition (e.g., "... USING lsm (c1 HASH)" or "... USING lsm ((c1, c2) HASH) ...")
create or replace function yb_hash_cols_of_index(p_index oid)
returns text
language plpgsql
stable
as $$
declare
  def_text         text;   def_lc text;
  pos_using        int;    after_using_lc text;   after_using_orig text;
  pos_open_rel     int;    scan_lc text;          scan_orig text;
  i int; ch_lc text; depth int := 1;
  buf_orig text := ''; cols text;
  token text := ''; tokens text[] := '{}';
  item text; cleaned text; outcols text[] := '{}';
begin
  select pg_get_indexdef(p_index, 0, true) into def_text;
  if def_text is null then return null; end if;

  def_lc := lower(def_text);
  pos_using := strpos(def_lc, ' using lsm ');
  if pos_using = 0 then pos_using := strpos(def_lc, 'using lsm '); end if;
  if pos_using = 0 then return null; end if;

  after_using_lc   := substr(def_lc,  pos_using);
  after_using_orig := substr(def_text, pos_using);

  -- first '(' after "using lsm "
  pos_open_rel := strpos(after_using_lc, '(');
  if pos_open_rel = 0 then return null; end if;

  -- scan from after '(' until the matching ')'
  scan_lc   := substr(after_using_lc,   pos_open_rel + 1);
  scan_orig := substr(after_using_orig, pos_open_rel + 1);

  for i in 1..length(scan_lc) loop
    ch_lc := substr(scan_lc, i, 1);
    if ch_lc = '(' then
      depth := depth + 1;
      buf_orig := buf_orig || substr(scan_orig, i, 1);
    elsif ch_lc = ')' then
      depth := depth - 1;
      exit when depth = 0;
      buf_orig := buf_orig || substr(scan_orig, i, 1);
    else
      buf_orig := buf_orig || substr(scan_orig, i, 1);
    end if;
  end loop;

  cols := btrim(buf_orig);
  if cols = '' then return null; end if;

  -- split on commas at depth 0
  depth := 0; token := '';
  for i in 1..length(cols) loop
    ch_lc := lower(substr(cols, i, 1));
    if ch_lc = '(' then
      depth := depth + 1; token := token || substr(cols, i, 1);
    elsif ch_lc = ')' then
      depth := depth - 1; token := token || substr(cols, i, 1);
    elsif ch_lc = ',' and depth = 0 then
      if btrim(token) <> '' then tokens := array_append(tokens, btrim(token)); end if;
      token := '';
    else
      token := token || substr(cols, i, 1);
    end if;
  end loop;
  if btrim(token) <> '' then tokens := array_append(tokens, btrim(token)); end if;

  -- keep items with HASH; strip trailing "HASH [ASC|DESC]"; unwrap outer () for composite lists
  foreach item in array tokens loop
    if position(' HASH' in upper(item)) > 0 then
      cleaned := regexp_replace(item, '\s+HASH(\s+(ASC|DESC))?$', '', 'i');
      cleaned := btrim(cleaned);
      if left(cleaned,1) = '(' and right(cleaned,1) = ')' and position(',' in cleaned) > 0 then
        cleaned := substr(cleaned, 2, length(cleaned)-2);  -- unwrap "(c1, c2)" -> "c1, c2"
      end if;
      outcols := array_append(outcols, cleaned);
    end if;
  end loop;

  if array_length(outcols, 1) is null then return null; end if;
  return array_to_string(outcols, ', ');
end;
$$;

Helper for the table’s primary key:

-- HASH key columns for a table's PRIMARY KEY
create or replace function yb_hash_cols_of_table(p_table regclass)
returns text
language plpgsql
stable
as $$
declare
  v_idx oid;
  v_cols text;
begin
  select i.indexrelid into v_idx
  from pg_index i
  where i.indrelid = p_table
    and i.indisprimary
  limit 1;

  if v_idx is null then
    return null;
  end if;

  v_cols := yb_hash_cols_of_index(v_idx);
  if v_cols is not null and btrim(v_cols) <> '' then
    return v_cols;
  end if;

  -- Fallback: build PK column list from catalog (covers simple PKs)
  return (
    with idx as (
      select i.indkey
      from pg_index i
      where i.indexrelid = v_idx
    ),
    att as (
      select a.attname, a.attnum
      from pg_attribute a
      where a.attrelid = p_table and a.attnum > 0 and not a.attisdropped
    )
    select string_agg(att.attname, ', ' order by array_position(idx.indkey, att.attnum))
    from idx, att
    where att.attnum = any(idx.indkey)
  );
end;
$$;
Sanity checks

select yb_hash_cols_of_table('public.test'::regclass);         -- e.g., 'id'
select yb_hash_cols_of_index('public.test_c1_idx'::regclass);  -- 'c1'
select yb_hash_cols_of_index('public.test_c2_idx1'::regclass); -- 'c1, c2'

Example:

yugabyte=# select yb_hash_cols_of_table('public.test'::regclass);         -- e.g., 'id'
 yb_hash_cols_of_table
-----------------------
 id
(1 row)

yugabyte=# select yb_hash_cols_of_index('public.test_c1_idx'::regclass);  -- 'c1'
 yb_hash_cols_of_index
-----------------------
 c1
(1 row)

yugabyte=# select yb_hash_cols_of_index('public.test_c2_idx1'::regclass); -- 'c1, c2'
 yb_hash_cols_of_index
-----------------------
 c1, c2
(1 row)
Step 2 — Per-tablet distribution (with hash_cols in output)

This function:

  • treats the PK index as the table (we don’t double-count it as an index),

  • gets num_tablets from yb_table_properties(...),

  • evenly slices [0..65535] into num_tablets ranges via floor(65536*i/n) .. floor(65536*(i+1)/n)-1,

  • counts rows per range using yb_hash_code(<hash_cols>),

  • and returns a row per tablet including the hash key used.

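The range arithmetic is easy to sanity-check on its own. For three tablets, this plain-SQL query (no YugabyteDB specifics) tiles [0..65535] with no gaps or overlaps:

```sql
-- Slice the 16-bit hash space into n = 3 contiguous ranges,
-- the same way the per-tablet distribution function does.
select i                                     as tablet_no,
       floor(65536.0 * i / 3)::int           as hash_start,  -- inclusive
       floor(65536.0 * (i + 1) / 3)::int - 1 as hash_end     -- inclusive
from generate_series(0, 2) as i;
-- tablet 0: 0..21844, tablet 1: 21845..43689, tablet 2: 43690..65535
```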
create or replace function yb_hot_shard_distribution(p_schema text, p_table text)
returns table(
  object_type  text,     -- 'table' or 'index'
  object_name  text,     -- qualified name
  hash_cols    text,     -- <== NEW: the hash key used
  tablet_no    int,      -- 0..num_tablets-1
  hash_start   int,      -- inclusive
  hash_end     int,      -- inclusive
  row_count    bigint,
  pct_of_total numeric
)
language plpgsql
stable
as $$
declare
  v_rel             regclass;
  v_tbl_qualified   text;
  v_tbl_num_tablets int;
  v_hash_cols       text;
  v_total           bigint;
  r record;

  i int; s int; e int; sql text; cnt bigint;

  -- Only HASH-sharded secondary indexes (skip PK == table)
  cur_indexes cursor for
    with idx as (
      select i.indexrelid::regclass as idxreg,
             n.nspname||'.'||c2.relname as idxname,
             i.indisprimary,
             yb_hash_cols_of_index(i.indexrelid) as hash_cols
      from pg_index i
      join pg_class c  on c.oid  = i.indrelid
      join pg_namespace n on n.oid = c.relnamespace
      join pg_class c2 on c2.oid = i.indexrelid
      where n.nspname = p_schema
        and c.relname = p_table
    )
    select idxreg, idxname, hash_cols
    from idx
    where indisprimary = false
      and hash_cols is not null and btrim(hash_cols) <> '';
begin
  v_rel := format('%I.%I', p_schema, p_table)::regclass;
  v_tbl_qualified := format('%I.%I', p_schema, p_table);
  select (yb_table_properties(v_rel)).num_tablets into v_tbl_num_tablets;
  v_tbl_num_tablets := coalesce(nullif(v_tbl_num_tablets, 0), 1);

  -- ===== TABLE (via PK hash columns) =====
  v_hash_cols := yb_hash_cols_of_table(v_rel);
  if v_hash_cols is not null and btrim(v_hash_cols) <> '' then
    sql := format('select count(*) from %s where yb_hash_code(%s) is not null',
                  v_tbl_qualified, v_hash_cols);
    execute sql into v_total;
    v_total := coalesce(v_total, 0);

    for i in 0..(v_tbl_num_tablets-1) loop
      s := floor(65536.0 * i / v_tbl_num_tablets);
      e := floor(65536.0 * (i+1) / v_tbl_num_tablets) - 1;

      sql := format(
        'select count(*) from %s where yb_hash_code(%s) between %s and %s',
        v_tbl_qualified, v_hash_cols, s, e
      );
      execute sql into cnt;
      cnt := coalesce(cnt, 0);

      object_type  := 'table';
      object_name  := v_tbl_qualified;
      hash_cols    := v_hash_cols;     -- <= included
      tablet_no    := i;
      hash_start   := s;
      hash_end     := e;
      row_count    := cnt;
      pct_of_total := case when v_total > 0 then round(100.0 * cnt / v_total, 4) else 0 end;
      return next;
    end loop;
  end if;

  -- ===== SECONDARY INDEXES (HASH) =====
  for r in cur_indexes loop
    declare v_idx_num_tablets int; v_idx_total bigint;
    begin
      select (yb_table_properties(r.idxreg)).num_tablets into v_idx_num_tablets;
      v_idx_num_tablets := coalesce(nullif(v_idx_num_tablets, 0), 1);

      sql := format('select count(*) from %s where yb_hash_code(%s) is not null',
                    v_tbl_qualified, r.hash_cols);
      execute sql into v_idx_total;
      v_idx_total := coalesce(v_idx_total, 0);

      for i in 0..(v_idx_num_tablets-1) loop
        s := floor(65536.0 * i / v_idx_num_tablets);
        e := floor(65536.0 * (i+1) / v_idx_num_tablets) - 1;

        sql := format(
          'select count(*) from %s where yb_hash_code(%s) between %s and %s',
          v_tbl_qualified, r.hash_cols, s, e
        );
        execute sql into cnt;
        cnt := coalesce(cnt, 0);

        object_type  := 'index';
        object_name  := r.idxname;
        hash_cols    := r.hash_cols;   -- <= included
        tablet_no    := i;
        hash_start   := s;
        hash_end     := e;
        row_count    := cnt;
        pct_of_total := case when v_idx_total > 0 then round(100.0 * cnt / v_idx_total, 4) else 0 end;
        return next;
      end loop;
    end;
  end loop;
end;
$$;
Step 3 — “Hottest shard” summary (with hash_cols)

create or replace function yb_hot_shard_summary(p_schema text, p_table text)
returns table(
  object_type  text,
  object_name  text,
  hash_cols    text,     -- <= included
  tablet_no    int,
  hash_start   int,
  hash_end     int,
  row_count    bigint,
  pct_of_total numeric
)
language sql
stable
as $$
  with dist as (
    select * from yb_hot_shard_distribution(p_schema, p_table)
  )
  select distinct on (object_type, object_name)
         object_type, object_name, hash_cols, tablet_no, hash_start, hash_end, row_count, pct_of_total
  from dist
  order by object_type, object_name, pct_of_total desc;
$$;
Usage

-- Full per-tablet distribution for table + indexes
select *
from yb_hot_shard_distribution('public','test')
order by object_type, object_name, pct_of_total desc, tablet_no;

-- Hottest shard per object (table and each index)
select *
from yb_hot_shard_summary('public','test');
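For monitoring, you can wrap the summary in a simple threshold check. The 50% cutoff here is illustrative — a balanced object with n tablets would show roughly 100/n percent per tablet:

```sql
-- Flag objects whose hottest tablet holds more than half of their rows
-- (threshold is illustrative; tune it to your tablet counts).
select object_type, object_name, hash_cols, tablet_no, pct_of_total
from yb_hot_shard_summary('public', 'test')
where pct_of_total > 50
order by pct_of_total desc;
```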

Example:

yugabyte=# -- Full per-tablet distribution for table + indexes
yugabyte=# select *
yugabyte-# from yb_hot_shard_distribution('public','test')
yugabyte-# order by object_type, object_name, pct_of_total desc, tablet_no;
 object_type |      object_name      | hash_cols | tablet_no | hash_start | hash_end | row_count | pct_of_total
-------------+-----------------------+-----------+-----------+------------+----------+-----------+--------------
 index       | public.test_c1_idx    | c1        |         0 |          0 |    21844 |   1000000 |      99.9990
 index       | public.test_c1_idx    | c1        |         2 |      43690 |    65535 |        10 |       0.0010
 index       | public.test_c1_idx    | c1        |         1 |      21845 |    43689 |         0 |       0.0000
 index       | public.test_c2_idx1   | c1, c2    |         1 |      21845 |    43689 |   1000010 |     100.0000
 index       | public.test_c2_idx1   | c1, c2    |         0 |          0 |    21844 |         0 |       0.0000
 index       | public.test_c2_idx1   | c1, c2    |         2 |      43690 |    65535 |         0 |       0.0000
 table       | public.test           | id        |         0 |          0 |    21844 |    333720 |      33.3717
 table       | public.test           | id        |         2 |      43690 |    65535 |    333425 |      33.3422
 table       | public.test           | id        |         1 |      21845 |    43689 |    332865 |      33.2862
(9 rows)

yugabyte=# -- Hottest shard per object (table and each index)
yugabyte=# select *
yugabyte-# from yb_hot_shard_summary('public','test');
 object_type |     object_name     | hash_cols | tablet_no | hash_start | hash_end | row_count | pct_of_total
-------------+---------------------+-----------+-----------+------------+----------+-----------+--------------
 index       | public.test_c1_idx  | c1        |         0 |          0 |    21844 |   1000000 |      99.9990
 index       | public.test_c2_idx1 | c1, c2    |         1 |      21845 |    43689 |   1000010 |     100.0000
 table       | public.test         | id        |         0 |          0 |    21844 |    333720 |      33.3717
(3 rows)

👉 In this output:

  • The base table (public.test on id) is nicely balanced across its three tablets (~33% each).

  • The public.test_c1_idx index is extremely skewed — ~99.999% of rows landed in one shard (with just 10 rows trickling into another).

  • The public.test_c2_idx1 index is fully skewed — 100% of rows in a single shard.

Both secondary indexes clearly exhibit hot shard problems, even though the base table is well distributed.

Remediation ideas (once you find a hot shard)
  • Revisit the hash key: add more entropy (e.g., add a second column) or use a composite key that spreads values more evenly.

  • Add a second column to the hash partition that represents a modulus (for example, mod 9) so the rows are spread across nine buckets instead of collapsing into one. There are caveats with this technique, which we’ll cover in a future YugabyteDB Tip.

  • If the hot shard is on an index, make sure you need that index! Can it be dropped? Is it redundant?

  • For workloads with temporal skew, consider modeling time as part of the hash key (with care), or use range partitioning alongside hash.

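A minimal sketch of the modulus idea — every name here is hypothetical, and note the read-side caveat in the comments:

```sql
-- Hypothetical schema: a modulus bucket column adds entropy to the hash key,
-- spreading rows for one hot tenant across nine buckets instead of one tablet.
create table events (
  tenant_id bigint      not null,
  bucket    int         not null,   -- application sets this, e.g. some_seq % 9
  ts        timestamptz not null,
  payload   text,
  primary key ((tenant_id, bucket) hash, ts asc)
);

-- Caveat: reads on tenant_id alone must now fan out across all nine buckets.
select * from events
where tenant_id = 42
  and bucket between 0 and 8
  and ts >= now() - interval '1 hour';
```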
Summary
  • You can detect hot shards with pure SQL by counting rows per hash slice for the table and each HASH-partitioned index.

  • The helpers above robustly parse single-column and composite hash keys and show the hash_cols used in the results.

  • On modern releases, prefer ASH and YBA Performance Advisor for real-time, guided detection — but these SQL tools are handy everywhere.

Have Fun!

Standing in awe of Cliff Palace at Mesa Verde… an ancient city tucked into the cliffs, built nearly 800 years ago. Hard to believe people once called this breathtaking spot home! Getting down there was its own adventure ... steep trails, narrow paths, and even ladders to climb, but so worth it.