Choosing the Right Primary Key to Avoid Write Hotspots

When you move from a single-node database to a distributed system like YugabyteDB, one design decision suddenly matters a lot more:

What worked perfectly in PostgreSQL can quietly become your biggest bottleneck in a distributed cluster.

⚠️ The Core Problem: Monotonic Keys Create Hotspots (Sometimes)

In a distributed database, data is split into tablets.

If your primary key is monotonically increasing (like SERIAL or IDENTITY), then:

  • ● Every new insert can go to the same tablet
  • ● That tablet becomes a write hotspot
  • ● Throughput bottlenecks quickly follow

πŸ‘‰ This is a classic distributed database anti-pattern…

…but there’s an important twist in YugabyteDB πŸ‘‡

🧠 Key Insight
In YugabyteDB, your primary key is not just an identifier… it determines how your data is distributed across the cluster.

By default, YugabyteDB uses HASH sharding, which automatically distributes writes… even for sequential keys.
But if you switch to RANGE sharding, your key design becomes critical to avoid hotspots.

βš™οΈ HASH vs RANGE: What Actually Controls Distribution?

YugabyteDB behavior is controlled by this setting:

				
					SELECT name, setting
FROM pg_settings
WHERE name = 'yb_use_hash_splitting_by_default';
				
			

Default:

				
					yb_use_hash_splitting_by_default = on
				
			
What this means
  • βœ… ON (default) β†’ HASH sharding
    • ● Writes are evenly distributed
    • ● Sequential IDs are safe
  • ⚠️ OFF β†’ RANGE sharding
    • ● Inserts follow key order
    • ● Monotonic keys = hotspot risk

πŸ”‘ Option 1: UUID (Best Default for Distribution)

Use when:

  • ● You don’t need human-readable IDs
  • ● You want guaranteed even distribution
				
					CREATE TABLE customer_payments (
    payment_id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
    account_id UUID NOT NULL,
    amount DECIMAL(12,2),
    processed_at TIMESTAMPTZ DEFAULT now()
);
				
			
Why this works
  • ● Random values β†’ naturally distributed
  • ● Works with both HASH and RANGE setups

πŸ‘‰ This is the safest default for most distributed workloads.

πŸ”’ Option 2: IDENTITY (Sequential IDs, Still Distributed by Default)

Use when:

  • ● You need human-readable or sequential IDs
  • ● Example: ticket numbers, invoice numbers
				
					CREATE TABLE support_cases (
    case_number BIGINT GENERATED ALWAYS AS IDENTITY,
    subject TEXT,
    created_at TIMESTAMPTZ DEFAULT now(),
    PRIMARY KEY (case_number)
);
				
			
Why it works
  • ● Even though values are sequential…
  • ● YugabyteDB automatically applies HASH sharding (by default)

πŸ‘‰ You get the best of both worlds:

  • βœ… Sequential IDs
  • βœ… Even write distribution
⚠️ Important Caveat

If this setting is disabled:

				
					SET yb_use_hash_splitting_by_default = off;
				
			

Then:

				
					PRIMARY KEY (case_number)
				
			

πŸ‘‰ becomes range-based

🚨 Now you will get:

  • ● Sequential inserts β†’ same tablet
  • ● Write hotspot

🧩 Option 3: Natural Key (Leverage Your Data Model)

Use when:

  • ● Your application has a well-distributed grouping key
    • Example: org_id, tenant_id, user_id
  • ● And you need uniqueness within that group
				
					CREATE TABLE organization_events (
    org_id UUID,
    event_id UUID,
    event_payload JSONB,
    PRIMARY KEY (org_id, event_id)
);
				
			
Why it works
  • ● Leading column (org_id) drives distribution
  • ● Second column (event_id) provides uniqueness and ordering within that group

πŸ‘‰ In YugabyteDB:

  • ● Data is distributed based on the primary key prefix
  • ● So:
    • β—‹ org_id β†’ spreads data across tablets
    • β—‹ event_id β†’ organizes rows within each org
πŸ’‘ Pro Tip
In composite primary keys, the leading column determines how data is distributed.

Choose a high-cardinality, well-distributed column (like tenant_id) first, and use the remaining columns for uniqueness and access patterns.

πŸ“Š Quick Decision Guide

Scenario Best Choice
No natural key, distribution matters most UUID
Need sequential / human-readable IDs IDENTITY (HASH by default)
Already have a well-distributed key Natural Key

⚠️ Common Mistake to Avoid

				
					PRIMARY KEY (id)
				
			

With:

				
					id BIGINT GENERATED ALWAYS AS IDENTITY
				
			

πŸ‘‰ This is safe by default in YugabyteDB…

…but becomes dangerous if:

				
					yb_use_hash_splitting_by_default = off
				
			

🎯 Final Takeaway

In YugabyteDB:

  • ● Your primary key = your distribution strategy
  • ● HASH sharding (default) protects you from hotspots
  • ● RANGE sharding puts the responsibility back on your schema design

πŸ‘‰ The rule of thumb:

  • ● Use UUID for simplicity and safety
  • ● Use IDENTITY when you need readable IDs (safe with HASH)
  • ● Use natural keys when your data already distributes well

πŸ‘‰ Get this right early, and your cluster scales effortlessly.
πŸ‘‰ Get it wrong (especially with RANGE), and you’ll be chasing hotspots later.

Related YugabyteDB tips

Have Fun!