TL;DR
YugabyteDB splits hash-sharded tables into tablets using a deterministic hash space, replicates each tablet across nodes, and places leaders according to preferred regions. Using yb_tablet_metadata and SQL, you can see exactly where each tablet lives, which replica is the leader, and how leader placement changes without moving data.
Introduction
In YugabyteDB, understanding how tables are split into tablets, where replicas live, and which nodes host tablet leaders is critical for:
- Predictable performance
- Cross-region latency control
- Fault tolerance
- Verifying that data placement and leader preferences are actually being honored
This tip walks through:
- How YugabyteDB uses hash codes to distribute rows into tablets
- How replication factor (RF) creates leaders and followers
- How preferred leader regions influence where traffic goes
- How to observe all of this directly using SQL
This tip focuses on hash-sharded (hash-partitioned) tables in YugabyteDB. The per-tablet row counts and hash-range columns are derived from the yb_hash_code() hash space and the tablet hash boundaries exposed in yb_tablet_metadata. Range-sharded (range-partitioned) tables use different tablet boundaries, so the "hash range → tablet" mapping shown here does not apply.
The examples were run on a recent YugabyteDB release (2025.2.0.0). Configuration defaults and behavior may differ in earlier releases.
How YugabyteDB breaks tables into tablets
Hash-sharded tables in YugabyteDB are distributed using a deterministic hash function.
- YugabyteDB computes a hash for each row using yb_hash_code(...)
- The hash space is 16-bit, ranging from 0 to 65535
- Each tablet owns a contiguous slice of that hash space
- Together, all tablets fully cover [0, 65536)
For example, a table split into 3 tablets might look like this:
Hash Space (0–65535)
┌──────────────┬───────────────┬───────────────┐
│   Tablet A   │   Tablet B    │   Tablet C    │
│  [0,21845)   │ [21845,43690) │ [43690,65536) │
└──────────────┴───────────────┴───────────────┘
Every row hashes into exactly one tablet based on its hash key columns.
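This mapping can be sketched outside the database. The following Python snippet is an illustration only (not YugabyteDB code): it models an even split of the 16-bit hash space and looks up which tablet owns a given hash code.

```python
import bisect

HASH_SPACE = 65536  # yb_hash_code() values fall in [0, 65536)

def tablet_starts(num_tablets):
    """Start hash code of each tablet under an even split of the hash space."""
    return [i * HASH_SPACE // num_tablets for i in range(num_tablets)]

def tablet_for_hash(hash_code, starts):
    """Index of the tablet whose [start, end) slice contains hash_code."""
    return bisect.bisect_right(starts, hash_code) - 1

starts = tablet_starts(3)
print(starts)                          # [0, 21845, 43690]
print(tablet_for_hash(21844, starts))  # 0 -> Tablet A
print(tablet_for_hash(21845, starts))  # 1 -> Tablet B
```

Because the split is deterministic, every client and server agrees on the row-to-tablet mapping without consulting a per-row lookup table.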
Replication factor, leaders, and followers
Each tablet is replicated RF (Replication Factor) times.
- One replica is elected as the leader
- The remaining replicas are followers
- All writes and strongly consistent reads go to the leader
- Followers stay in sync and can take over if the leader fails
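The fault-tolerance arithmetic behind RF is standard majority-quorum (Raft) math; a minimal sketch of that arithmetic (an illustration, not YugabyteDB's internal implementation):

```python
def quorum_size(rf):
    """Replicas that must acknowledge a write: a majority of rf."""
    return rf // 2 + 1

def failures_tolerated(rf):
    """Replica losses a tablet survives while still electing a leader."""
    return rf - quorum_size(rf)

for rf in (1, 3, 5):
    print(f"RF={rf}: quorum={quorum_size(rf)}, "
          f"survives {failures_tolerated(rf)} failure(s)")
# RF=3 needs 2 acknowledgments and survives 1 replica failure
```

This is why the RF=3, three-region demo below keeps working if any one region goes down.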
Preferred zones and leader placement
YugabyteDB allows you to express data placement policies that include:
- Fault tolerance level (zone, region, or cloud)
- Replication factor
- Preferred leader locations
Preferred leader zones influence where tablet leaders should live when possible.
Leaders are not permanently pinned, but YugabyteDB actively rebalances leadership to respect these preferences during normal operation.
Replication factor controls where data exists.
Leader placement controls where traffic goes.
Every tablet is replicated according to the configured replication factor (RF), ensuring durability and fault tolerance.
However, only the leader replica handles writes and strongly consistent reads.
- Changing replication affects data availability
- Changing leader preference affects latency and request routing
You can replicate data across regions for fault tolerance while steering application traffic toward preferred regions by adjusting leader preferences, without moving or rewriting data.
Demo setup: cluster layout
For this demo, the cluster has one node per region:
yugabyte=# SELECT host, cloud, region, zone FROM yb_servers() ORDER BY host;
host | cloud | region | zone
-----------+-------+-----------+------------
127.0.0.1 | aws | us-east-1 | us-east-1a
127.0.0.2 | aws | us-east-2 | us-east-2a
127.0.0.3 | aws | us-west-2 | us-west-2a
(3 rows)
This layout makes leader placement behavior very easy to observe.
Initial data placement configuration
The cluster is configured with:
- Fault tolerance: region
- Replication factor: 3 (one replica per region)
- Leader preference: us-east-1
yugabyted configure data_placement \
--fault_tolerance=region \
--constraint_value=aws.us-east-1.us-east-1a:1,aws.us-east-2.us-east-2a:2,aws.us-west-2.us-west-2a:3 \
--rf=3 \
--base_dir=~/yb01
Because us-east-1 has the highest leader preference, all tablet leaders initially land in us-east-1.
Create a table with visible tablet distribution
CREATE TABLE test (
c1 INT PRIMARY KEY
) SPLIT INTO 9 TABLETS;
INSERT INTO test
SELECT generate_series(1, 1000);
This creates nine tablets covering the full hash space and inserts enough rows to make distribution visible.
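You can predict the nine hash ranges before querying any metadata. An even integer split of [0, 65536) reproduces the tablet boundaries reported later in this tip (a Python illustration, assuming an even split at table creation time):

```python
HASH_SPACE = 65536

def tablet_ranges(num_tablets):
    """[start, end) hash slice for each tablet under an even split."""
    starts = [i * HASH_SPACE // num_tablets for i in range(num_tablets)]
    return list(zip(starts, starts[1:] + [HASH_SPACE]))

for start, end in tablet_ranges(9):
    print(f"[{start:5d}, {end:5d})")
# First range is [0, 7281); last is [58254, 65536)
```

With 1,000 uniformly hashed rows, each tablet should hold roughly 1000/9 ≈ 111 rows, consistent with the 97–119 per-tablet counts in the query output below.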
Visualizing tablets, replicas, and leaders
Conceptually, each tablet looks like this:
Tablet: [start_hash, end_hash)

         Leader (writes)
               │
               ▼
        ┌────────────┐
        │  us-east-1 │
        └────────────┘
          ▲        ▲
          │        │
┌────────────┐  ┌────────────┐
│ us-east-2  │  │ us-west-2  │
│  follower  │  │  follower  │
└────────────┘  └────────────┘
A helper function to inspect tablet placement
To make tablet distribution and leader placement easy to inspect using SQL, we'll create a small helper function.
This function:
- Takes a schema name, a table name, and an optional leaders-only flag
- Joins yb_tablet_metadata with yb_servers()
- Computes:
  - tablet hash ranges
  - leader vs follower replicas
  - row counts per tablet
  - percentage of table rows per tablet
You only need to create this function once per database.
CREATE OR REPLACE FUNCTION public.yb_tablet_placement_details(
schema_name_p text,
table_name_p text,
leaders_only_p boolean DEFAULT false
)
RETURNS TABLE(
schema_name text,
table_name text,
ysql_oid oid,
tablet_id text,
tablet_hash_start integer,
tablet_hash_end integer,
replica_role text,
node_host text,
node_port integer,
node_public_ip text,
node_uuid text,
node_type text,
cloud text,
region text,
zone text,
tablet_row_count bigint,
tablet_pct_of_table numeric
)
LANGUAGE plpgsql
AS $function$
DECLARE
v_relid oid;
v_relkind "char";
v_base_relid oid;
v_base_schema text;
v_base_relname text;
v_pk_index_oid oid;
v_pk_def text;
v_idx_def text;
v_hash_cols_raw text;
v_hash_cols_arr text[];
v_hash_expr text;
v_has_bounds boolean;
v_role_filter text := '';
v_sql text;
BEGIN
-- Resolve relation OID (table or index)
SELECT to_regclass(format('%I.%I', schema_name_p, table_name_p))::oid
INTO v_relid;
IF v_relid IS NULL THEN
RAISE EXCEPTION 'Relation %.% not found', schema_name_p, table_name_p;
END IF;
SELECT c.relkind
INTO v_relkind
FROM pg_class c
WHERE c.oid = v_relid;
IF leaders_only_p THEN
v_role_filter := 'WHERE replica_role = ''LEADER''';
END IF;
-- Do we have hash boundaries for THIS relation (table or index)?
SELECT EXISTS (
SELECT 1
FROM yb_tablet_metadata ytm
WHERE ytm.db_name = current_database()
AND ytm.oid = v_relid
AND (ytm.start_hash_code IS NOT NULL OR ytm.end_hash_code IS NOT NULL)
)
INTO v_has_bounds;
/*
Determine hash expression:
- If TABLE: use PK hash cols
- If INDEX: use index HASH cols, but computed from the BASE TABLE alias "t"
*/
IF v_has_bounds THEN
IF v_relkind = 'i' THEN
-- Base table for this index
SELECT i.indrelid
INTO v_base_relid
FROM pg_index i
WHERE i.indexrelid = v_relid;
IF v_base_relid IS NULL THEN
RAISE EXCEPTION 'Could not resolve base table for index %.%', schema_name_p, table_name_p;
END IF;
SELECT n.nspname, c.relname
INTO v_base_schema, v_base_relname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.oid = v_base_relid;
-- Extract HASH columns from the index definition (lsm (... HASH ...))
v_idx_def := pg_get_indexdef(v_relid);
SELECT string_agg(quote_ident(regexp_replace(trim(part), '\s+HASH.*$', '', 'i')), ',' )
INTO v_hash_cols_raw
FROM regexp_split_to_table(
regexp_replace(v_idx_def, '^.*\((.*)\).*$','\1'),
','
) AS part
WHERE part ~* '\mHASH\M';
IF v_hash_cols_raw IS NULL OR length(btrim(v_hash_cols_raw)) = 0 THEN
RAISE EXCEPTION 'Index %.% does not appear to have HASH columns (indexdef=%)',
schema_name_p, table_name_p, v_idx_def;
END IF;
SELECT array_agg(btrim(x))
INTO v_hash_cols_arr
FROM unnest(string_to_array(v_hash_cols_raw, ',')) AS x;
-- Build: yb_hash_code(t.col1, t.col2, ...)
SELECT
'yb_hash_code(' ||
string_agg('t.' || col, ', ') ||
')'
INTO v_hash_expr
FROM unnest(v_hash_cols_arr) AS col;
ELSE
-- TABLE path: PK-based hash expression
SELECT i.indexrelid
INTO v_pk_index_oid
FROM pg_index i
WHERE i.indrelid = v_relid
AND i.indisprimary
LIMIT 1;
IF v_pk_index_oid IS NULL THEN
RAISE EXCEPTION 'Table %.% has no primary key (required to map rows to tablets)', schema_name_p, table_name_p;
END IF;
v_pk_def := pg_get_indexdef(v_pk_index_oid);
-- Try to extract explicit HASH columns from PK definition: "((c1, c2) HASH ...)"
SELECT (regexp_match(v_pk_def, '\(\((.+?)\)\s+HASH', 'i'))[1]
INTO v_hash_cols_raw;
IF v_hash_cols_raw IS NOT NULL THEN
SELECT array_agg(btrim(x))
INTO v_hash_cols_arr
FROM unnest(string_to_array(v_hash_cols_raw, ',')) AS x;
ELSE
-- Fallback: all PK columns in index order
SELECT array_agg(quote_ident(a.attname) ORDER BY s.ord)
INTO v_hash_cols_arr
FROM pg_index i
JOIN LATERAL unnest(i.indkey) WITH ORDINALITY AS s(attnum, ord) ON true
JOIN pg_attribute a
ON a.attrelid = i.indrelid
AND a.attnum = s.attnum
WHERE i.indexrelid = v_pk_index_oid;
END IF;
IF v_hash_cols_arr IS NULL OR array_length(v_hash_cols_arr, 1) = 0 THEN
RAISE EXCEPTION 'Could not determine hash key columns for %.% (pk_def=%)', schema_name_p, table_name_p, v_pk_def;
END IF;
SELECT
'yb_hash_code(' ||
string_agg('t.' || quote_ident(replace(col,'"','')), ', ') ||
')'
INTO v_hash_expr
FROM unnest(v_hash_cols_arr) AS col;
END IF;
END IF;
/*
Build the query
*/
IF v_has_bounds THEN
v_sql := format($q$
WITH tablet_map AS (
SELECT
ytm.tablet_id,
ytm.oid,
COALESCE(ytm.start_hash_code, 0)::int AS start_hash_code_eff,
COALESCE(ytm.end_hash_code, 65536)::int AS end_hash_code_eff,
ytm.leader,
ytm.replicas
FROM yb_tablet_metadata ytm
WHERE ytm.db_name = current_database()
AND ytm.oid = %s
),
-- Production nicety: count starting from tablet_map so empty tablets appear with 0
counts AS (
SELECT
tm.tablet_id,
COUNT(t.*)::bigint AS tablet_row_count
FROM tablet_map tm
LEFT JOIN %I.%I t
ON %s >= tm.start_hash_code_eff
AND %s < tm.end_hash_code_eff
GROUP BY tm.tablet_id
),
totals AS (
SELECT COALESCE(SUM(tablet_row_count)::numeric, 0::numeric) AS total_rows
FROM counts
),
replica_rows AS (
SELECT
tm.tablet_id,
tm.oid,
tm.start_hash_code_eff,
tm.end_hash_code_eff,
rep AS replica_hostport,
tm.leader,
split_part(rep, ':', 1) AS rep_host,
split_part(rep, ':', 2)::int AS rep_port
FROM tablet_map tm
CROSS JOIN LATERAL unnest(tm.replicas) AS rep
),
base AS (
SELECT
%L::text AS schema_name,
%L::text AS table_name,
rr.oid AS ysql_oid,
rr.tablet_id,
rr.start_hash_code_eff AS tablet_hash_start,
rr.end_hash_code_eff AS tablet_hash_end,
CASE
WHEN rr.replica_hostport = rr.leader THEN 'LEADER'
ELSE 'FOLLOWER'
END AS replica_role,
ys.host AS node_host,
ys.port::int AS node_port,
ys.public_ip AS node_public_ip,
ys.uuid::text AS node_uuid,
ys.node_type AS node_type,
ys.cloud,
ys.region,
ys.zone,
COALESCE(c.tablet_row_count, 0)::bigint AS tablet_row_count,
CASE
WHEN (SELECT total_rows FROM totals) = 0
THEN NULL::numeric(6,2)
ELSE ROUND((COALESCE(c.tablet_row_count, 0)::numeric / (SELECT total_rows FROM totals)) * 100, 2)
END AS tablet_pct_of_table
FROM replica_rows rr
JOIN yb_servers() ys
ON rr.rep_port = ys.port::int
AND (rr.rep_host = ys.host OR rr.rep_host = ys.public_ip)
LEFT JOIN counts c
ON c.tablet_id = rr.tablet_id
)
SELECT *
FROM base
%s
ORDER BY tablet_hash_start, tablet_id, replica_role DESC, node_host, node_port;
$q$,
v_relid,
-- scan base table if index, else scan requested table
(CASE WHEN v_relkind = 'i' THEN v_base_schema ELSE schema_name_p END),
(CASE WHEN v_relkind = 'i' THEN v_base_relname ELSE table_name_p END),
v_hash_expr, v_hash_expr,
schema_name_p, table_name_p,
v_role_filter
);
ELSE
v_sql := format($q$
WITH tablet_map AS (
SELECT
ytm.tablet_id,
ytm.oid,
ytm.leader,
ytm.replicas
FROM yb_tablet_metadata ytm
WHERE ytm.db_name = current_database()
AND ytm.oid = %s
),
replica_rows AS (
SELECT
tm.tablet_id,
tm.oid,
rep AS replica_hostport,
tm.leader,
split_part(rep, ':', 1) AS rep_host,
split_part(rep, ':', 2)::int AS rep_port
FROM tablet_map tm
CROSS JOIN LATERAL unnest(tm.replicas) AS rep
),
base AS (
SELECT
%L::text AS schema_name,
%L::text AS table_name,
rr.oid AS ysql_oid,
rr.tablet_id,
NULL::int AS tablet_hash_start,
NULL::int AS tablet_hash_end,
CASE
WHEN rr.replica_hostport = rr.leader THEN 'LEADER'
ELSE 'FOLLOWER'
END AS replica_role,
ys.host AS node_host,
ys.port::int AS node_port,
ys.public_ip AS node_public_ip,
ys.uuid::text AS node_uuid,
ys.node_type AS node_type,
ys.cloud,
ys.region,
ys.zone,
NULL::bigint AS tablet_row_count,
NULL::numeric(6,2) AS tablet_pct_of_table
FROM replica_rows rr
JOIN yb_servers() ys
ON rr.rep_port = ys.port::int
AND (rr.rep_host = ys.host OR rr.rep_host = ys.public_ip)
)
SELECT *
FROM base
%s
ORDER BY tablet_id, replica_role DESC, node_host, node_port;
$q$,
v_relid,
schema_name_p, table_name_p,
v_role_filter
);
END IF;
RETURN QUERY EXECUTE v_sql;
END;
$function$;
Observing tablet placement using SQL
-- All replicas
SELECT * FROM yb_tablet_placement_details('public','test');
-- Leaders only, sorted by hash range start
SELECT * FROM yb_tablet_placement_details('public','test', true);
-- Verify: total rows
SELECT sum(tablet_row_count) FROM yb_tablet_placement_details('public','test', true);
The leaders-only query returns one row per tablet leader, including:
- Tablet ID and hash range
- Leader region and zone
- Row count per tablet
- Percentage of total table rows
Example:
yugabyte=# SELECT * FROM yb_tablet_placement_details('public','test', true);
schema_name | table_name | ysql_oid | tablet_id | tablet_hash_start | tablet_hash_end | replica_role | node_host | node_port | node_public_ip | node_uuid | node_type | cloud | region | zone | tablet_row_count | tablet_pct_of_table
-------------+------------+----------+----------------------------------+-------------------+-----------------+--------------+-----------+-----------+----------------+----------------------------------+-----------+-------+-----------+------------+------------------+---------------------
public | test | 16391 | a09897953e7c402280a1713e1a181014 | 0 | 7281 | LEADER | 127.0.0.1 | 5433 | 127.0.0.1 | 465326e2b44446359fea51bcba62aa16 | primary | aws | us-east-1 | us-east-1a | 117 | 11.70
public | test | 16391 | 5551450e4fdb4e1ab3fe7bc9cb75c9a5 | 7281 | 14563 | LEADER | 127.0.0.1 | 5433 | 127.0.0.1 | 465326e2b44446359fea51bcba62aa16 | primary | aws | us-east-1 | us-east-1a | 109 | 10.90
public | test | 16391 | c306ea285cd548aab76d098478baa037 | 14563 | 21845 | LEADER | 127.0.0.1 | 5433 | 127.0.0.1 | 465326e2b44446359fea51bcba62aa16 | primary | aws | us-east-1 | us-east-1a | 119 | 11.90
public | test | 16391 | b1c55f2cab1c40c1924a696dd8c49085 | 21845 | 29127 | LEADER | 127.0.0.1 | 5433 | 127.0.0.1 | 465326e2b44446359fea51bcba62aa16 | primary | aws | us-east-1 | us-east-1a | 114 | 11.40
public | test | 16391 | 0cca4664fadf4b9d9166a431667bb747 | 29127 | 36408 | LEADER | 127.0.0.1 | 5433 | 127.0.0.1 | 465326e2b44446359fea51bcba62aa16 | primary | aws | us-east-1 | us-east-1a | 97 | 9.70
public | test | 16391 | a91d4675c60b4b64a4dbb4e796db6afa | 36408 | 43690 | LEADER | 127.0.0.1 | 5433 | 127.0.0.1 | 465326e2b44446359fea51bcba62aa16 | primary | aws | us-east-1 | us-east-1a | 110 | 11.00
public | test | 16391 | 20a04222f8894e3c8d1584efb58998ef | 43690 | 50972 | LEADER | 127.0.0.1 | 5433 | 127.0.0.1 | 465326e2b44446359fea51bcba62aa16 | primary | aws | us-east-1 | us-east-1a | 108 | 10.80
public | test | 16391 | fd5dc0a957504d97b9265ea33ad56786 | 50972 | 58254 | LEADER | 127.0.0.1 | 5433 | 127.0.0.1 | 465326e2b44446359fea51bcba62aa16 | primary | aws | us-east-1 | us-east-1a | 107 | 10.70
public | test | 16391 | 9dd91e5dc58641bdb3483d8d7759215c | 58254 | 65536 | LEADER | 127.0.0.1 | 5433 | 127.0.0.1 | 465326e2b44446359fea51bcba62aa16 | primary | aws | us-east-1 | us-east-1a | 119 | 11.90
(9 rows)
yugabyte=# SELECT sum(tablet_row_count) FROM yb_tablet_placement_details('public','test', true);
sum
------
1000
(1 row)
Changing preferred leader regions
Now update the placement policy to allow leaders in us-east-1 and us-east-2:
[root@localhost yb]# yugabyted configure data_placement \
--fault_tolerance=region \
--constraint_value=aws.us-east-1.us-east-1a:1,aws.us-east-2.us-east-2a:1,aws.us-west-2.us-west-2a:2 \
--rf=3 \
--base_dir=~/yb01
+------------------------------------------------------------------------------------------------+
| yugabyted |
+------------------------------------------------------------------------------------------------+
| Status : Configuration successful. Primary data placement is geo-redundant. |
| Fault Tolerance : Primary Cluster can survive at most any 1 region failure. |
| Zone Preferences : Successful in setting preference for zones. |
| : Following are the preferences for zones |
| : aws.us-east-2.us-east-2a : 1 |
| : aws.us-east-1.us-east-1a : 1 |
| : aws.us-west-2.us-west-2a : 2 |
+------------------------------------------------------------------------------------------------+
Expected behavior
- No data movement
- No table rebuild
- Tablet leaders rebalance across us-east-1 and us-east-2
- Followers remain in all regions
Verifying the new leader distribution
SELECT * FROM yb_tablet_placement_details('public','test', true);
Example:
yugabyte=# SELECT * FROM yb_tablet_placement_details('public','test', true);
schema_name | table_name | ysql_oid | tablet_id | tablet_hash_start | tablet_hash_end | replica_role | node_host | node_port | node_public_ip | node_uuid | node_type | cloud | region | zone | tablet_row_count | tablet_pct_of_table
-------------+------------+----------+----------------------------------+-------------------+-----------------+--------------+-----------+-----------+----------------+----------------------------------+-----------+-------+-----------+------------+------------------+---------------------
public | test | 16391 | a09897953e7c402280a1713e1a181014 | 0 | 7281 | LEADER | 127.0.0.2 | 5433 | 127.0.0.2 | 10d1c6ffc1e04a69a4b22c8edd489c5d | primary | aws | us-east-2 | us-east-2a | 117 | 11.70
public | test | 16391 | 5551450e4fdb4e1ab3fe7bc9cb75c9a5 | 7281 | 14563 | LEADER | 127.0.0.2 | 5433 | 127.0.0.2 | 10d1c6ffc1e04a69a4b22c8edd489c5d | primary | aws | us-east-2 | us-east-2a | 109 | 10.90
public | test | 16391 | c306ea285cd548aab76d098478baa037 | 14563 | 21845 | LEADER | 127.0.0.1 | 5433 | 127.0.0.1 | 465326e2b44446359fea51bcba62aa16 | primary | aws | us-east-1 | us-east-1a | 119 | 11.90
public | test | 16391 | b1c55f2cab1c40c1924a696dd8c49085 | 21845 | 29127 | LEADER | 127.0.0.1 | 5433 | 127.0.0.1 | 465326e2b44446359fea51bcba62aa16 | primary | aws | us-east-1 | us-east-1a | 114 | 11.40
public | test | 16391 | 0cca4664fadf4b9d9166a431667bb747 | 29127 | 36408 | LEADER | 127.0.0.2 | 5433 | 127.0.0.2 | 10d1c6ffc1e04a69a4b22c8edd489c5d | primary | aws | us-east-2 | us-east-2a | 97 | 9.70
public | test | 16391 | a91d4675c60b4b64a4dbb4e796db6afa | 36408 | 43690 | LEADER | 127.0.0.1 | 5433 | 127.0.0.1 | 465326e2b44446359fea51bcba62aa16 | primary | aws | us-east-1 | us-east-1a | 110 | 11.00
public | test | 16391 | 20a04222f8894e3c8d1584efb58998ef | 43690 | 50972 | LEADER | 127.0.0.2 | 5433 | 127.0.0.2 | 10d1c6ffc1e04a69a4b22c8edd489c5d | primary | aws | us-east-2 | us-east-2a | 108 | 10.80
public | test | 16391 | fd5dc0a957504d97b9265ea33ad56786 | 50972 | 58254 | LEADER | 127.0.0.1 | 5433 | 127.0.0.1 | 465326e2b44446359fea51bcba62aa16 | primary | aws | us-east-1 | us-east-1a | 107 | 10.70
public | test | 16391 | 9dd91e5dc58641bdb3483d8d7759215c | 58254 | 65536 | LEADER | 127.0.0.2 | 5433 | 127.0.0.2 | 10d1c6ffc1e04a69a4b22c8edd489c5d | primary | aws | us-east-2 | us-east-2a | 119 | 11.90
(9 rows)
You should now see leaders split between us-east-1 and us-east-2.
Summary and Key Takeaways
- YugabyteDB distributes rows in hash-sharded tables using a deterministic hash space (0–65535), with each tablet owning a contiguous slice of that space.
- Tablets are the fundamental unit of distribution, replication, and leadership in YugabyteDB.
- The replication factor (RF) determines where data is physically stored across zones or regions.
- Leader placement is a separate concern that determines where reads and writes are served from.
- Preferred leader regions allow you to optimize latency and traffic locality without moving data.
- Tablet leaders can change dynamically in response to placement policies, failures, or rebalancing events.
- Using yb_tablet_metadata and yb_servers(), tablet placement and leadership are fully observable using SQL.
- For hash-sharded tables, tablet hash boundaries combined with yb_hash_code() make it possible to compute:
  - per-tablet row counts
  - percentage of table data per tablet
  - leader and follower distribution by region and zone
- Leader placement changes can be verified directly, rather than inferred or guessed.
Bottom line: YugabyteDB's tablet placement and leader behavior are not a black box. With a small amount of metadata and SQL, you can precisely understand how data and traffic are distributed, and confirm that your placement policies are behaving exactly as intended.
Have Fun!
