Demonstrating Hot Shard Detection (and Why a Tablet Can Refuse to Split)

You may see a tablet growing large and repeatedly attempting to split… only to fail with errors like:

    TABLET_SPLIT_KEY_RANGE_TOO_SMALL
    Failed to detect middle key

This can look mysterious, but it often has a simple cause:

  • the tablet’s key space has almost no “splittable” variety, even though its data size is large.

A very common way to get there is a NULL-heavy hash index.

This tip shows a quick way to detect the hot shard and connect it to the root cause.

The Core Pattern

The dangerous pattern looks like this:

    CREATE INDEX idx_fact_lookup_key
    ON some_table (lookup_key HASH);

Where:

  • lookup_key is nullable

  • A large percentage of rows have lookup_key IS NULL

Important fact:

  • All NULL values hash to the same hash code.

That means:

  • All NULL entries go to one hash bucket

  • That bucket belongs to one tablet

  • That tablet has one leader

  • That leader becomes hot
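
You can check the first link in that chain straight from SQL. Here is a minimal sketch using yb_hash_code(), YugabyteDB’s hashing function (the sample values are arbitrary):

    -- Every NULL collapses to one hash code; distinct non-NULL keys spread out
    SELECT yb_hash_code(v) AS hash_code, COUNT(*) AS entries
    FROM (VALUES (NULL::bigint), (NULL::bigint), (NULL::bigint),
                 (1::bigint), (2::bigint), (3::bigint)) AS t(v)
    GROUP BY 1
    ORDER BY entries DESC;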

What we’re going to build

We’ll create:

  • a “fact” table: demo_hot.fact_events

  • a nullable bigint column: lookup_key

  • a hash secondary index: idx_fact_lookup_key

Then we’ll insert a dataset where a big percentage of rows have lookup_key IS NULL.

Even if the table is large, the important property is:

  • All NULL values hash to the same hash code, concentrating into one hash bucket.

For the demo, I have a local three-node cluster:

    SELECT host, cloud, region, zone FROM yb_servers() ORDER BY host;

Sample output:

        host    | cloud |  region   |    zone
    -----------+-------+-----------+------------
     127.0.0.1 | aws   | us-east-1 | us-east-1a
     127.0.0.2 | aws   | us-east-2 | us-east-2a
     127.0.0.3 | aws   | us-west-2 | us-west-2a
    (3 rows)

Step 1: Create the schema and table

    DROP SCHEMA IF EXISTS demo_hot CASCADE;
    CREATE SCHEMA demo_hot;

    CREATE TABLE demo_hot.fact_events (
      row_id      uuid   NOT NULL,
      tenant_id   uuid   NOT NULL,
      lookup_key  bigint NULL,
      created_at  timestamp NOT NULL DEFAULT now(),

      -- hashed primary key (typical “multi-tenant” shape)
      PRIMARY KEY ((tenant_id, row_id) HASH)
    );

    -- Secondary index on a nullable bigint (hash index; HASH is also
    -- the default for the first index column in YSQL)
    CREATE INDEX idx_fact_lookup_key
    ON demo_hot.fact_events (lookup_key HASH);
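
Before loading data, it can help to record how many tablets the index starts with. If your YugabyteDB version exposes the yb_table_properties() function, a quick check looks like this (a sketch; the exact output columns vary by version):

    -- Tablet count and hash-key column count for the new index
    SELECT * FROM yb_table_properties('demo_hot.idx_fact_lookup_key'::regclass);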

Step 2: Insert skewed data (small demo)

This inserts 100,000 rows:

  • 60,000 rows with lookup_key IS NULL (skew)

  • 40,000 rows with lookup_key = 1..40000

(The md5 slicing in the INSERTs below simply fabricates deterministic uuids in the standard 8-4-4-4-12 hex layout.)

    -- 60% NULLs
    INSERT INTO demo_hot.fact_events (row_id, tenant_id, lookup_key, created_at)
    SELECT
      (substr(md5('row-null-' || gs::text),1,8) || '-' ||
       substr(md5('row-null-' || gs::text),9,4) || '-' ||
       substr(md5('row-null-' || gs::text),13,4) || '-' ||
       substr(md5('row-null-' || gs::text),17,4) || '-' ||
       substr(md5('row-null-' || gs::text),21,12))::uuid AS row_id,
      (substr(md5('tenant-' || (gs % 100)::text),1,8) || '-' ||
       substr(md5('tenant-' || (gs % 100)::text),9,4) || '-' ||
       substr(md5('tenant-' || (gs % 100)::text),13,4) || '-' ||
       substr(md5('tenant-' || (gs % 100)::text),17,4) || '-' ||
       substr(md5('tenant-' || (gs % 100)::text),21,12))::uuid AS tenant_id,
      NULL::bigint AS lookup_key,
      now() - (gs % 86400) * interval '1 second' AS created_at
    FROM generate_series(1, 60000) gs;

    -- 40% non-NULL keys
    INSERT INTO demo_hot.fact_events (row_id, tenant_id, lookup_key, created_at)
    SELECT
      (substr(md5('row-key-' || k::text),1,8) || '-' ||
       substr(md5('row-key-' || k::text),9,4) || '-' ||
       substr(md5('row-key-' || k::text),13,4) || '-' ||
       substr(md5('row-key-' || k::text),17,4) || '-' ||
       substr(md5('row-key-' || k::text),21,12))::uuid AS row_id,
      (substr(md5('tenant-' || (k % 100)::text),1,8) || '-' ||
       substr(md5('tenant-' || (k % 100)::text),9,4) || '-' ||
       substr(md5('tenant-' || (k % 100)::text),13,4) || '-' ||
       substr(md5('tenant-' || (k % 100)::text),17,4) || '-' ||
       substr(md5('tenant-' || (k % 100)::text),21,12))::uuid AS tenant_id,
      k::bigint AS lookup_key,
      now() - (k % 86400) * interval '1 second' AS created_at
    FROM generate_series(1, 40000) k;

Step 3: Confirm the skew and compute the NULL hash code

3.1 Basic skew stats

    SELECT
      COUNT(*) AS total_rows,
      COUNT(*) FILTER (WHERE lookup_key IS NULL) AS null_rows,
      ROUND(100.0 * COUNT(*) FILTER (WHERE lookup_key IS NULL) / NULLIF(COUNT(*), 0), 2) AS null_pct,
      COUNT(DISTINCT lookup_key) AS distinct_keys
    FROM demo_hot.fact_events;

Sample output:

     total_rows | null_rows | null_pct | distinct_keys
    ------------+-----------+----------+---------------
         100000 |     60000 |    60.00 |         40000
    (1 row)

That’s your red flag.
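
On a real system you may not know which column to inspect first. The statistics PostgreSQL already collects offer a cheap, sampled shortcut (run ANALYZE first; null_frac is an estimate, and the 0.5 threshold here is just an example):

    -- Columns in the demo schema whose sampled NULL fraction is high
    SELECT schemaname, tablename, attname, null_frac
    FROM pg_stats
    WHERE schemaname = 'demo_hot'
      AND null_frac > 0.5
    ORDER BY null_frac DESC;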

3.2 What hash bucket does NULL map to?

    SELECT
      yb_hash_code(NULL::bigint) AS null_hash_code,
      to_hex(yb_hash_code(NULL::bigint)) AS null_hash_hex;

Sample output:

     null_hash_code | null_hash_hex
    ----------------+---------------
              19780 | 4d44
    (1 row)

This means:

  • All NULL entries land in hash bucket 0x4D44.

Step 4: Show that NULL dominates one hash bucket

This query groups by hash code and shows which hash buckets are heavy:

    SELECT
      yb_hash_code(lookup_key) AS hash_code,
      to_hex(yb_hash_code(lookup_key)) AS hash_hex,
      COUNT(*) AS "rows"
    FROM demo_hot.fact_events
    GROUP BY 1,2
    ORDER BY "rows" DESC
    LIMIT 5;

Sample output:

     hash_code | hash_hex | rows
    -----------+----------+-------
         19780 | 4d44     | 60000
         32526 | 7f0e     |     7
          4613 | 1205     |     5
         60654 | ecee     |     5
         43740 | aadc     |     5
    (5 rows)
			

You should see one row at the top where:

  • lookup_key was NULL

  • hash_code matches yb_hash_code(NULL::bigint)

  • rows is ~60,000 in this demo

You will also likely see a few non-NULL hash buckets with small counts (for example 5–10 rows). That is completely normal.

A better “hot bucket” query

This version explicitly shows how dominant NULL is relative to the rest:

    WITH buckets AS (
      SELECT yb_hash_code(lookup_key) AS hash_code,
             to_hex(yb_hash_code(lookup_key)) AS hash_hex,
             COUNT(*) AS "rows"
      FROM demo_hot.fact_events
      GROUP BY 1,2
    ),
    ranked AS (
      SELECT *,
             "rows"::numeric / SUM("rows") OVER () AS pct_of_rows,
             MAX("rows") OVER ()                   AS max_rows,
             "rows"::numeric / NULLIF(AVG("rows") OVER (), 0) AS vs_avg
      FROM buckets
    )
    SELECT hash_code, hash_hex, "rows",
           ROUND(100*pct_of_rows, 2) AS pct_of_rows,
           ROUND(vs_avg, 2)          AS times_avg_bucket
    FROM ranked
    ORDER BY "rows" DESC
    LIMIT 10;

Sample output:

     hash_code | hash_hex | rows  | pct_of_rows | times_avg_bucket
    -----------+----------+-------+-------------+------------------
         19780 | 4d44     | 60000 |       60.00 |         17917.80
         32526 | 7f0e     |     7 |        0.01 |             2.09
         63180 | f6cc     |     5 |        0.01 |             1.49
         34036 | 84f4     |     5 |        0.01 |             1.49
         59874 | e9e2     |     5 |        0.01 |             1.49
         60654 | ecee     |     5 |        0.01 |             1.49
          4613 | 1205     |     5 |        0.01 |             1.49
         59682 | e922     |     5 |        0.01 |             1.49
         43740 | aadc     |     5 |        0.01 |             1.49
         46072 | b3f8     |     5 |        0.01 |             1.49
    (10 rows)

That makes the point unmissable: the NULL bucket is orders of magnitude above average.

Step 5: Identify the Tablet Owning the Hot Bucket

List index tablets:

    SELECT
      tablet_id,
      leader,
      start_hash_code,
      end_hash_code
    FROM yb_tablet_metadata
    WHERE oid = 'demo_hot.idx_fact_lookup_key'::regclass::oid
    ORDER BY start_hash_code;

Now locate the NULL bucket:

    WITH null_bucket AS (
      SELECT yb_hash_code(NULL::bigint) AS h
    )
    SELECT
      ytm.tablet_id,
      ytm.leader,
      ytm.start_hash_code,
      ytm.end_hash_code
    FROM yb_tablet_metadata ytm
    JOIN null_bucket nb
      ON nb.h >= ytm.start_hash_code
     AND nb.h <  ytm.end_hash_code
    WHERE ytm.oid = 'demo_hot.idx_fact_lookup_key'::regclass::oid;

Sample output:

                tablet_id             |     leader     | start_hash_code | end_hash_code
    ----------------------------------+----------------+-----------------+---------------
     307b1474106c443d8c0a917a111be977 | 127.0.0.1:5433 |               0 |         65536
    (1 row)

This tablet is your hot shard.
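
In this demo the index still has a single tablet covering the whole hash space, so the NULL bucket trivially lands there. As a back-of-envelope sketch, here is where the bucket would fall if the index were split evenly into N tablets (pure arithmetic over the 0..65535 hash space, no catalog access; the 3-way split is an assumption for illustration):

    -- Hypothetical even split into 3 tablets: which range holds the NULL bucket?
    WITH params AS (
      SELECT 3 AS n_tablets, yb_hash_code(NULL::bigint) AS h
    )
    SELECT g AS tablet_no,
           g * 65536 / n_tablets       AS start_hash_code,
           (g + 1) * 65536 / n_tablets AS end_hash_code,
           h >= g * 65536 / n_tablets
             AND h < (g + 1) * 65536 / n_tablets AS holds_null_bucket
    FROM params, generate_series(0, n_tablets - 1) AS g;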

Why TABLET_SPLIT_KEY_RANGE_TOO_SMALL can happen here

Tablet splitting requires finding a “middle key” (a split point) between the tablet’s bounds.

But in the pathological case:

  • the tablet’s effective key space is dominated by one logical value (NULL)

  • there may be no meaningful “middle key” to pick (especially if most entries have identical/near-identical key prefixes)

So despite having lots of bytes, the tablet can have too little key diversity to split cleanly, producing errors like:

  • TABLET_SPLIT_KEY_RANGE_TOO_SMALL

  • “Failed to detect middle key”

The key insight:

  • Splitting depends on key distribution, not just tablet size.
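
You can approximate that key diversity from SQL. In this demo the hot tablet covers the whole index, so the check reduces to counting how many distinct hash codes the rows actually occupy; a low count relative to total rows means few usable split points:

    -- Few distinct hash codes relative to rows = few candidate split points
    SELECT COUNT(*) AS total_rows,
           COUNT(DISTINCT yb_hash_code(lookup_key)) AS distinct_hash_codes
    FROM demo_hot.fact_events;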

Step 6: Show How This Becomes a Leader Bottleneck

Every write of the form:

    INSERT INTO demo_hot.fact_events (..., lookup_key, ...)
    VALUES (..., NULL, ...);

now routes to:

  • One hash bucket

  • One tablet

  • One leader

  • One tablet server
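
As a concrete sketch of such a write (it reuses the md5-to-uuid trick from Step 2 so it runs without extensions; created_at falls back to its default):

    -- One more NULL-keyed row: its index entry lands in bucket 0x4d44 again
    INSERT INTO demo_hot.fact_events (row_id, tenant_id, lookup_key)
    VALUES (
      (substr(md5(random()::text),1,8)  || '-' ||
       substr(md5(random()::text),9,4)  || '-' ||
       substr(md5(random()::text),13,4) || '-' ||
       substr(md5(random()::text),17,4) || '-' ||
       substr(md5(random()::text),21,12))::uuid,
      (substr(md5('tenant-demo'),1,8)  || '-' ||
       substr(md5('tenant-demo'),9,4)  || '-' ||
       substr(md5('tenant-demo'),13,4) || '-' ||
       substr(md5('tenant-demo'),17,4) || '-' ||
       substr(md5('tenant-demo'),21,12))::uuid,
      NULL);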

To see leader distribution:

    SELECT
      leader,
      COUNT(*) AS tablets_led
    FROM yb_tablet_metadata
    WHERE oid = 'demo_hot.idx_fact_lookup_key'::regclass::oid
    GROUP BY leader
    ORDER BY tablets_led DESC;

Sample output:

         leader     | tablets_led
    ----------------+-------------
     127.0.0.1:5433 |           1
    (1 row)

Now isolate the hot tablet’s leader:

    WITH hot_tablet AS (
      SELECT tablet_id, leader
      FROM yb_tablet_metadata
      WHERE oid = 'demo_hot.idx_fact_lookup_key'::regclass::oid
        AND yb_hash_code(NULL::bigint) >= start_hash_code
        AND yb_hash_code(NULL::bigint) <  end_hash_code
    )
    SELECT
      h.tablet_id,
      h.leader,
      COUNT(t.tablet_id) AS total_tablets_led_by_node
    FROM hot_tablet h
    JOIN yb_tablet_metadata t
      ON t.leader = h.leader
    GROUP BY h.tablet_id, h.leader;

Sample output:

                tablet_id             |     leader     | total_tablets_led_by_node
    ----------------------------------+----------------+---------------------------
     307b1474106c443d8c0a917a111be977 | 127.0.0.1:5433 |                        15
    (1 row)

If that node already leads many tablets, the hot shard amplifies node imbalance.
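
To widen the same check beyond one index, drop the oid filter and aggregate across everything the view covers (assuming yb_tablet_metadata exposes the same columns used above):

    -- Leader distribution across all tablets the view reports
    SELECT leader, COUNT(*) AS tablets_led
    FROM yb_tablet_metadata
    GROUP BY leader
    ORDER BY tablets_led DESC;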

What This Looks Like Operationally

You may observe:

  • High CPU on one node

  • High compaction load on one node

  • Large SST growth on one tablet

  • Split retries in the logs

  • Eventually, TABLET_SPLIT_KEY_RANGE_TOO_SMALL

The cluster is balanced by tablet count… but unbalanced by workload.

Production-safe fixes

If the root cause is a NULL-dominated hash index key, the most common fixes are:

1) Partial index (avoid NULL entries entirely)

    CREATE INDEX idx_fact_lookup_key_not_null
    ON demo_hot.fact_events (lookup_key)
    WHERE lookup_key IS NOT NULL;
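
One thing to verify after switching: the planner uses a partial index only when it can prove the index predicate from the query. Equality searches qualify, because lookup_key = <value> implies lookup_key IS NOT NULL:

    -- The equality predicate implies IS NOT NULL, so the partial index is usable
    EXPLAIN (COSTS OFF)
    SELECT * FROM demo_hot.fact_events WHERE lookup_key = 12345;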

2) Populate a real, high-cardinality key (avoid NULL dominance)

If “missing lookup_key” is a normal state, consider:

  • a separate boolean (has_lookup_key) + a different access path

  • using a different indexed attribute

  • or storing a sentinel value, but only if it does not create its own skew (see the check below)
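
A sentinel only moves the problem: every row carrying it still hashes to a single bucket. Before committing to one, check where a candidate sentinel would land:

    -- A sentinel such as 0 or -1 maps to exactly one hash bucket, just like NULL
    SELECT yb_hash_code(0::bigint)  AS sentinel_zero_bucket,
           yb_hash_code(-1::bigint) AS sentinel_minus_one_bucket;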

3) Redesign key usage

  • Avoid hashing low-cardinality columns

  • Avoid indexing heavily NULL columns

  • Use alternative access paths

There is no split workaround if the key distribution itself is skewed.

What This Demonstrates

  • Hash sharding distributes keys, not rows.

  • NULL-heavy hash keys create single dominant hash buckets.

  • A large tablet is not necessarily splittable.

  • Split logic depends on key diversity, not SST size.

  • A skewed key can create:

      ○ a hot tablet

      ○ a hot leader

      ○ and ultimately a hot node

In other words:

  • Hot key → hot bucket → hot tablet → hot leader → hot node.

Splitting cannot rescue low-cardinality or NULL-dominated design.

When one logical key owns a disproportionate share of the data, the system is behaving correctly… it is reflecting your key distribution.

The durable fix is almost always:

  • Fix the data model, not the tablet.

Have Fun!

Brand new plane. Cutting-edge technology. Back-of-seat entertainment at every seat… …except mine.