Exact Per-Tablet Row Counts in YugabyteDB – Hash and Range Sharding in One SQL Function

Sometimes a simple question gets surprisingly interesting in a distributed database:

  • How many rows are in each shard?

In YugabyteDB, the physical unit behind that question is the tablet. A table may be split across multiple tablets using either hash sharding or range sharding, and the logic for counting rows per tablet is different for each.

In this tip, we’ll combine those ideas into a single SQL-only function:

				
					public.yb_tablet_key_counts('schema.table')
				
			

The function automatically detects whether the table is hash-sharded or range-sharded, then returns exact row counts and distinct partition-key counts per tablet.

Key Insight

Hash-sharded tablets can be counted using yb_hash_code() and the tablet hash-code boundaries from yb_tablet_metadata. Range-sharded tablets can be counted using the declared SPLIT AT VALUES boundaries. Both approaches can be combined into one SQL function that generates a UNION ALL query for more parallel-friendly exact counts.

Why This Is Useful

YugabyteDB exposes useful tablet metadata through views and functions such as:

				
					yb_tablet_metadata
yb_local_tablets()
				
			

But knowing the tablet boundaries is only part of the story.

Often, you also want to know:

  • â—Ź How many rows live in each tablet?
  • â—Ź Are rows evenly distributed?
  • â—Ź Are some tablets larger than others?
  • â—Ź How many distinct partition-key values are in each tablet?
  • â—Ź Is the table hash-sharded or range-sharded?
  • â—Ź Can we answer this in SQL only?

That last point is important. You can always build an external script to fan out work across tablets, but sometimes a self-contained SQL utility is easier to test, share, and explain.

What the Function Returns

The function returns one row per tablet.

Column Description
tablet_ordinal Logical tablet order. Use this column for reliable ordering instead of sorting by the display text in split_range.
tablet_id The YugabyteDB tablet ID.
partitioning_scheme Either hash or range.
start_hash_code / end_hash_code Hash-code boundaries for hash-sharded tablets.
start_range_value / end_range_value Range boundary values for range-sharded tablets.
split_range A readable representation of the tablet’s range, such as -inf .. (250000) or (750000) .. +inf.
leader The current tablet leader, when available.
replicas The tablet replica locations, when available.
row_count Exact row count for that tablet range.
distinct_key_count Exact count of distinct partition-key values for that tablet range.

Install the Function

				
					CREATE OR REPLACE FUNCTION public.yb_tablet_key_counts(
    p_table regclass
)
RETURNS TABLE (
    tablet_ordinal int,
    tablet_id text,
    partitioning_scheme text,
    start_hash_code int,
    end_hash_code int,
    start_range_value jsonb,
    end_range_value jsonb,
    split_range text,
    leader text,
    replicas text[],
    row_count bigint,
    distinct_key_count bigint
)
LANGUAGE plpgsql
AS $$
DECLARE
    v_schema_name text;
    v_table_name  text;
    v_indexdef    text;
    v_is_hash     boolean := false;

    v_hash_cols_text text;
    v_hash_expr      text;
    v_distinct_expr  text;

    v_pk_cols text[];
    v_key_tuple text;
    v_range_key_tuple text;
    v_range_key_col_count int;

    v_union_sql text := '';
    v_branch_sql text;

    r record;

    v_tablet_ordinal int := 0;

    v_split_clause text;
    v_values_text text;
    v_inner text;
    v_split_values text[] := ARRAY[]::text[];
    v_normalized_split_values text[] := ARRAY[]::text[];

    v_current text := '';
    v_ch text;
    v_depth int := 0;
    v_in_quote boolean := false;
    v_pos int := 1;
    v_len int := 0;
    v_next_ch text;

    v_start_value text;
    v_end_value text;
    v_tablet_count int;
    v_range_idx int := 0;

    v_predicate text;

    /*
      Helpers for normalizing split tuples like:

        ('uuid-value', MINVALUE)

      into:

        ('uuid-value')
    */
    v_split_value text;
    v_tuple_inner text;
    v_elem_values text[];
    v_elem_current text;
    v_elem text;
    v_norm_elems text[];
    v_norm_tuple text;
    v_elem_count int;
    i int;
BEGIN
    /*
      Resolve schema/table name.
    */
    SELECT n.nspname, c.relname
    INTO v_schema_name, v_table_name
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE c.oid = p_table;

    IF v_schema_name IS NULL THEN
        RAISE EXCEPTION 'Table % not found', p_table;
    END IF;

    /*
      Get primary key definition.
    */
    SELECT pg_get_indexdef(idx.indexrelid)
    INTO v_indexdef
    FROM pg_index idx
    WHERE idx.indrelid = p_table
      AND idx.indisprimary;

    IF v_indexdef IS NULL THEN
        RAISE EXCEPTION 'Table % has no primary key', p_table;
    END IF;

    v_is_hash := v_indexdef ~* '\sHASH';

    /*
      Get primary key columns in PK order.
    */
    SELECT array_agg(a.attname ORDER BY k.ord)
    INTO v_pk_cols
    FROM pg_index idx
    CROSS JOIN LATERAL generate_subscripts(idx.indkey, 1) AS k(ord)
    JOIN pg_attribute a
      ON a.attrelid = idx.indrelid
     AND a.attnum = idx.indkey[k.ord]
    WHERE idx.indrelid = p_table
      AND idx.indisprimary
    GROUP BY idx.indexrelid;

    IF v_pk_cols IS NULL THEN
        RAISE EXCEPTION 'Could not determine primary key columns for %', p_table;
    END IF;

    SELECT '(' || string_agg(format('%I', x.col), ', ') || ')'
    INTO v_key_tuple
    FROM unnest(v_pk_cols) AS x(col);

    /*
      ----------------------------------------------------------------------
      HASH-SHARDED TABLE
      ----------------------------------------------------------------------
    */
    IF v_is_hash THEN

        /*
          Extract the HASH key expression.

          Handles:
            lsm (c1 HASH)
            lsm ((c1, c2) HASH, c3 ASC)
        */
        v_hash_cols_text := substring(v_indexdef FROM '\(\((.*?)\)\s+HASH');

        IF v_hash_cols_text IS NULL THEN
            v_hash_cols_text := substring(v_indexdef FROM '\(([[:alnum:]_".]+)\s+HASH');
        END IF;

        IF v_hash_cols_text IS NULL THEN
            RAISE EXCEPTION 'Could not extract HASH columns from primary key definition: %', v_indexdef;
        END IF;

        v_hash_expr := format('yb_hash_code(%s)', v_hash_cols_text);
        v_distinct_expr := format('(%s)', v_hash_cols_text);

        v_tablet_ordinal := 0;

        FOR r IN
            SELECT
                m.tablet_id,
                m.start_hash_code,
                m.end_hash_code,
                m.leader,
                m.replicas
            FROM yb_tablet_metadata m
            WHERE m.oid = p_table::oid
            ORDER BY m.start_hash_code
        LOOP
            v_tablet_ordinal := v_tablet_ordinal + 1;

            v_branch_sql := format(
                $fmt$
                SELECT
                    %s::int AS tablet_ordinal,
                    %L::text AS tablet_id,
                    'hash'::text AS partitioning_scheme,
                    %s::int AS start_hash_code,
                    %s::int AS end_hash_code,
                    NULL::jsonb AS start_range_value,
                    NULL::jsonb AS end_range_value,
                    NULL::text AS split_range,
                    %L::text AS leader,
                    %L::text[] AS replicas,
                    COUNT(*)::bigint AS row_count,
                    COUNT(DISTINCT %s)::bigint AS distinct_key_count
                FROM %s
                WHERE %s >= %s
                  AND %s <  %s
                $fmt$,
                v_tablet_ordinal,
                r.tablet_id,
                r.start_hash_code,
                r.end_hash_code,
                r.leader,
                r.replicas::text,
                v_distinct_expr,
                p_table,
                v_hash_expr,
                r.start_hash_code,
                v_hash_expr,
                r.end_hash_code
            );

            v_union_sql :=
                v_union_sql ||
                CASE WHEN v_union_sql = '' THEN '' ELSE E'\nUNION ALL\n' END ||
                v_branch_sql;
        END LOOP;

    /*
      ----------------------------------------------------------------------
      RANGE-SHARDED TABLE
      ----------------------------------------------------------------------

      Uses YugabyteDB's reconstructed SPLIT AT VALUES clause instead of
      decoding yb_local_tablets().partition_key_start/end bytea values.
    */
    ELSE
        v_split_clause := NULL;

        BEGIN
            SELECT yb_get_range_split_clause(p_table)
            INTO v_split_clause;
        EXCEPTION
            WHEN OTHERS THEN
                v_split_clause := NULL;
        END;

        IF v_split_clause IS NULL OR btrim(v_split_clause) = '' THEN
            RAISE EXCEPTION
                'Could not get SPLIT AT VALUES metadata for range-sharded table %. This function currently requires declared range splits.',
                p_table;
        END IF;

        v_split_clause := btrim(v_split_clause);
        v_split_clause := regexp_replace(v_split_clause, ';\s*$', '');

        /*
          Strip the leading "SPLIT AT VALUES".

          Example:

            SPLIT AT VALUES (('4000...'), ('8000...'))

          Becomes:

            (('4000...'), ('8000...'))
        */
        v_values_text :=
            btrim(
                regexp_replace(
                    v_split_clause,
                    '^\s*SPLIT\s+AT\s+VALUES\s*',
                    '',
                    'i'
                )
            );

        /*
          Remove the outer wrapper parentheses.

          Example:

            (('4000...'), ('8000...'))

          Becomes:

            ('4000...'), ('8000...')
        */
        IF left(v_values_text, 1) = '('
           AND right(v_values_text, 1) = ')' THEN
            v_inner := substring(v_values_text FROM 2 FOR length(v_values_text) - 2);
        ELSE
            RAISE EXCEPTION 'Could not parse SPLIT AT VALUES clause: %', v_split_clause;
        END IF;

        /*
          Parse top-level split tuples:

            (250000), (500000), (750000)
            ('uuid', MINVALUE), ('uuid2', MINVALUE)
        */
        v_split_values := ARRAY[]::text[];
        v_current := '';
        v_depth := 0;
        v_in_quote := false;
        v_pos := 1;
        v_len := length(v_inner);

        WHILE v_pos <= v_len LOOP
            v_ch := substr(v_inner, v_pos, 1);
            v_next_ch :=
                CASE
                    WHEN v_pos < v_len THEN substr(v_inner, v_pos + 1, 1)
                    ELSE NULL
                END;

            IF v_ch = '''' THEN
                v_current := v_current || v_ch;

                IF v_in_quote AND v_next_ch = '''' THEN
                    /*
                      Escaped quote inside SQL literal.
                    */
                    v_current := v_current || v_next_ch;
                    v_pos := v_pos + 2;
                    CONTINUE;
                ELSE
                    v_in_quote := NOT v_in_quote;
                    v_pos := v_pos + 1;
                    CONTINUE;
                END IF;
            END IF;

            IF NOT v_in_quote THEN
                IF v_ch = '(' THEN
                    v_depth := v_depth + 1;
                    v_current := v_current || v_ch;
                    v_pos := v_pos + 1;
                    CONTINUE;
                ELSIF v_ch = ')' THEN
                    v_depth := v_depth - 1;
                    v_current := v_current || v_ch;
                    v_pos := v_pos + 1;
                    CONTINUE;
                ELSIF v_ch = ',' AND v_depth = 0 THEN
                    IF btrim(v_current) <> '' THEN
                        v_split_values := array_append(v_split_values, btrim(v_current));
                    END IF;

                    v_current := '';
                    v_pos := v_pos + 1;
                    CONTINUE;
                END IF;
            END IF;

            v_current := v_current || v_ch;
            v_pos := v_pos + 1;
        END LOOP;

        IF btrim(v_current) <> '' THEN
            v_split_values := array_append(v_split_values, btrim(v_current));
        END IF;

        IF array_length(v_split_values, 1) IS NULL THEN
            RAISE EXCEPTION 'No split values parsed from clause: %', v_split_clause;
        END IF;

        /*
          Normalize each split tuple by stripping trailing MINVALUE/MAXVALUE.

          Example:

            ('4000...', MINVALUE)

          becomes:

            ('4000...')

          This allows predicates to use only the leading range key columns.
        */
        v_normalized_split_values := ARRAY[]::text[];
        v_range_key_col_count := NULL;

        FOREACH v_split_value IN ARRAY v_split_values LOOP
            v_split_value := btrim(v_split_value);

            IF left(v_split_value, 1) = '('
               AND right(v_split_value, 1) = ')' THEN
                v_tuple_inner := substring(v_split_value FROM 2 FOR length(v_split_value) - 2);
            ELSE
                RAISE EXCEPTION 'Could not parse split tuple: %', v_split_value;
            END IF;

            /*
              Parse elements inside one split tuple.

              Example:

                '4000...', MINVALUE

              becomes array:

                ['4000...', 'MINVALUE']
            */
            v_elem_values := ARRAY[]::text[];
            v_elem_current := '';
            v_depth := 0;
            v_in_quote := false;
            v_pos := 1;
            v_len := length(v_tuple_inner);

            WHILE v_pos <= v_len LOOP
                v_ch := substr(v_tuple_inner, v_pos, 1);
                v_next_ch :=
                    CASE
                        WHEN v_pos < v_len THEN substr(v_tuple_inner, v_pos + 1, 1)
                        ELSE NULL
                    END;

                IF v_ch = '''' THEN
                    v_elem_current := v_elem_current || v_ch;

                    IF v_in_quote AND v_next_ch = '''' THEN
                        v_elem_current := v_elem_current || v_next_ch;
                        v_pos := v_pos + 2;
                        CONTINUE;
                    ELSE
                        v_in_quote := NOT v_in_quote;
                        v_pos := v_pos + 1;
                        CONTINUE;
                    END IF;
                END IF;

                IF NOT v_in_quote THEN
                    IF v_ch = '(' THEN
                        v_depth := v_depth + 1;
                        v_elem_current := v_elem_current || v_ch;
                        v_pos := v_pos + 1;
                        CONTINUE;
                    ELSIF v_ch = ')' THEN
                        v_depth := v_depth - 1;
                        v_elem_current := v_elem_current || v_ch;
                        v_pos := v_pos + 1;
                        CONTINUE;
                    ELSIF v_ch = ',' AND v_depth = 0 THEN
                        v_elem_values := array_append(v_elem_values, btrim(v_elem_current));
                        v_elem_current := '';
                        v_pos := v_pos + 1;
                        CONTINUE;
                    END IF;
                END IF;

                v_elem_current := v_elem_current || v_ch;
                v_pos := v_pos + 1;
            END LOOP;

            IF btrim(v_elem_current) <> '' THEN
                v_elem_values := array_append(v_elem_values, btrim(v_elem_current));
            END IF;

            /*
              Keep values until MINVALUE/MAXVALUE.

              Example:

                ('4000...', MINVALUE)

              becomes:

                ('4000...')
            */
            v_norm_elems := ARRAY[]::text[];

            FOREACH v_elem IN ARRAY v_elem_values LOOP
                IF upper(btrim(v_elem)) IN ('MINVALUE', 'MAXVALUE') THEN
                    EXIT;
                END IF;

                v_norm_elems := array_append(v_norm_elems, btrim(v_elem));
            END LOOP;

            v_elem_count := COALESCE(array_length(v_norm_elems, 1), 0);

            IF v_elem_count = 0 THEN
                RAISE EXCEPTION 'Split tuple % did not contain any usable values before MINVALUE/MAXVALUE', v_split_value;
            END IF;

            IF v_range_key_col_count IS NULL THEN
                v_range_key_col_count := v_elem_count;
            ELSIF v_range_key_col_count <> v_elem_count THEN
                RAISE EXCEPTION
                    'Inconsistent split tuple widths after MINVALUE/MAXVALUE normalization. Expected %, found % in %',
                    v_range_key_col_count,
                    v_elem_count,
                    v_split_value;
            END IF;

            IF v_range_key_col_count > array_length(v_pk_cols, 1) THEN
                RAISE EXCEPTION
                    'Split tuple has more columns (%) than primary key columns (%)',
                    v_range_key_col_count,
                    array_length(v_pk_cols, 1);
            END IF;

            v_norm_tuple := '(' || array_to_string(v_norm_elems, ', ') || ')';
            v_normalized_split_values := array_append(v_normalized_split_values, v_norm_tuple);
        END LOOP;

        /*
          Build the tuple of leading PK columns used by the split values.

          For:

            PRIMARY KEY (customer_id ASC, order_id ASC)
            SPLIT AT VALUES (('uuid', MINVALUE))

          this becomes:

            (customer_id)
        */
        SELECT '(' || string_agg(format('%I', v_pk_cols[g.col_num]), ', ' ORDER BY g.col_num) || ')'
        INTO v_range_key_tuple
        FROM generate_series(1, v_range_key_col_count) AS g(col_num);

        SELECT COUNT(*)
        INTO v_tablet_count
        FROM yb_local_tablets()
        WHERE table_name = v_table_name;

        IF v_tablet_count <> array_length(v_normalized_split_values, 1) + 1 THEN
            RAISE NOTICE
                'Range tablet count (%) does not equal parsed split count + 1 (%). Counts may not reflect the current physical tablet layout.',
                v_tablet_count,
                array_length(v_normalized_split_values, 1) + 1;
        END IF;

        v_range_idx := 0;

        /*
          The ordinal is assigned in physical range order based on
          partition_key_start. The caller should ORDER BY tablet_ordinal.
        */
        FOR r IN
            SELECT
                l.tablet_id,
                l.partition_key_start,
                m.leader,
                m.replicas
            FROM yb_local_tablets() l
            LEFT JOIN yb_tablet_metadata m
              ON m.tablet_id = l.tablet_id
             AND m.oid = p_table::oid
            WHERE l.table_name = v_table_name
            ORDER BY l.partition_key_start NULLS FIRST
        LOOP
            v_range_idx := v_range_idx + 1;

            v_start_value :=
                CASE
                    WHEN v_range_idx = 1 THEN NULL
                    ELSE v_normalized_split_values[v_range_idx - 1]
                END;

            v_end_value :=
                CASE
                    WHEN v_range_idx <= array_length(v_normalized_split_values, 1)
                    THEN v_normalized_split_values[v_range_idx]
                    ELSE NULL
                END;

            v_predicate := 'TRUE';

            IF v_start_value IS NOT NULL THEN
                v_predicate :=
                    v_predicate || format(' AND %s >= %s', v_range_key_tuple, v_start_value);
            END IF;

            IF v_end_value IS NOT NULL THEN
                v_predicate :=
                    v_predicate || format(' AND %s < %s', v_range_key_tuple, v_end_value);
            END IF;

            v_branch_sql := format(
                $fmt$
                SELECT
                    %s::int AS tablet_ordinal,
                    %L::text AS tablet_id,
                    'range'::text AS partitioning_scheme,
                    NULL::int AS start_hash_code,
                    NULL::int AS end_hash_code,
                    %s::jsonb AS start_range_value,
                    %s::jsonb AS end_range_value,
                    %L::text AS split_range,
                    %L::text AS leader,
                    %L::text[] AS replicas,
                    COUNT(*)::bigint AS row_count,
                    COUNT(DISTINCT %s)::bigint AS distinct_key_count
                FROM %s
                WHERE %s
                $fmt$,
                v_range_idx,
                r.tablet_id,
                CASE
                    WHEN v_start_value IS NULL THEN 'NULL'
                    ELSE quote_literal(jsonb_build_object('split_at_value', v_start_value)::text)
                END,
                CASE
                    WHEN v_end_value IS NULL THEN 'NULL'
                    ELSE quote_literal(jsonb_build_object('split_at_value', v_end_value)::text)
                END,
                format(
                    '%s .. %s',
                    COALESCE(v_start_value, '-inf'),
                    COALESCE(v_end_value, '+inf')
                ),
                r.leader,
                r.replicas::text,
                v_range_key_tuple,
                p_table,
                v_predicate
            );

            v_union_sql :=
                v_union_sql ||
                CASE WHEN v_union_sql = '' THEN '' ELSE E'\nUNION ALL\n' END ||
                v_branch_sql;
        END LOOP;
    END IF;

    IF v_union_sql = '' THEN
        RAISE EXCEPTION 'No tablet branches generated for table %', p_table;
    END IF;

    RETURN QUERY EXECUTE v_union_sql;
END;
$$;
				
			

At a high level, it does the following:

For hash-sharded tables:
				
					yb_hash_code(...) >= start_hash_code
AND
yb_hash_code(...) < end_hash_code
				
			
For range-sharded tables:
				
					(primary_key_tuple) >= split_value_1
AND
(primary_key_tuple) < split_value_2
				
			
And it generates a single query shaped like:
				
					SELECT ... COUNT(*) ... WHERE tablet_range_1
UNION ALL
SELECT ... COUNT(*) ... WHERE tablet_range_2
UNION ALL
SELECT ... COUNT(*) ... WHERE tablet_range_3;
				
			

That UNION ALL shape is important because it gives the SQL engine a better opportunity to run the tablet-level counts in parallel.

The function includes a tablet_ordinal column for both hash-sharded and range-sharded tables. That means you can always use ORDER BY tablet_ordinal to display tablets in logical order, instead of relying on ORDER BY split_range or ORDER BY start_hash_code.

Optional Session Tuning

Before running larger tests, I like to use:

				
					SET max_parallel_workers_per_gather = 16;
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
SET min_parallel_table_scan_size = 0;
SET min_parallel_index_scan_size = 0;
SET enable_parallel_append = on;
				
			

These settings are useful for demos where you want the UNION ALL branches to be more parallel-friendly.

Performance Note

This function returns exact counts, not estimates. It still scans the relevant table ranges. The UNION ALL shape helps make the work more parallel-friendly inside SQL, but for absolute maximum wall-clock speed on very large tables, external client-side fan-out across multiple sessions can still be faster.

Demo 1: Hash-Sharded Table

Create a simple hash-sharded table:

				
					DROP TABLE IF EXISTS public.big;

CREATE TABLE public.big (
    c1 bigint NOT NULL,
    PRIMARY KEY (c1 HASH)
) SPLIT INTO 3 TABLETS;
				
			

Check the tablet metadata:

				
					SELECT *
FROM yb_tablet_metadata
WHERE oid = 'public.big'::regclass::oid
ORDER BY start_hash_code;
				
			

Example output:

				
					.            tablet_id            |  oid  | db_name  | relname | start_hash_code | end_hash_code |     leader     |                    replicas
----------------------------------+-------+----------+---------+-----------------+---------------+----------------+------------------------------------------------
 02ed09613bda4599aae4ec8b859da984 | 16401 | yugabyte | big     |               0 |         21845 | 127.0.0.3:5433 | {127.0.0.1:5433,127.0.0.2:5433,127.0.0.3:5433}
 a2d45ff1f16d486b9ad67e5b12a6d68c | 16401 | yugabyte | big     |           21845 |         43690 | 127.0.0.2:5433 | {127.0.0.1:5433,127.0.0.2:5433,127.0.0.3:5433}
 8fda81faf357464987e7edfcdd3b2234 | 16401 | yugabyte | big     |           43690 |         65536 | 127.0.0.1:5433 | {127.0.0.1:5433,127.0.0.2:5433,127.0.0.3:5433}
				
			

Load some rows:

				
					INSERT INTO public.big
SELECT generate_series(1, 1034302);
				
			

Run the function:

				
					SELECT *
FROM public.yb_tablet_key_counts('public.big')
ORDER BY tablet_ordinal;
				
			

Example output:

				
					.            tablet_id            | partitioning_scheme | start_hash_code | end_hash_code | start_range_value | end_range_value | split_range |     leader     |                    replicas.                   | row_count | distinct_key_count
----------------------------------+---------------------+-----------------+---------------+-------------------+-----------------+-------------+----------------+------------------------------------------------+-----------+--------------------
 02ed09613bda4599aae4ec8b859da984 | hash                |               0 |         21845 |                   |                 |             | 127.0.0.3:5433 | {127.0.0.1:5433,127.0.0.2:5433,127.0.0.3:5433} |    345443 |             345443
 a2d45ff1f16d486b9ad67e5b12a6d68c | hash                |           21845 |         43690 |                   |                 |             | 127.0.0.2:5433 | {127.0.0.1:5433,127.0.0.2:5433,127.0.0.3:5433} |    344104 |             344104
 8fda81faf357464987e7edfcdd3b2234 | hash                |           43690 |         65536 |                   |                 |             | 127.0.0.1:5433 | {127.0.0.1:5433,127.0.0.2:5433,127.0.0.3:5433} |    344755 |             344755

				
			

Validate the total:

				
					SELECT
    SUM(row_count) AS total_rows,
    SUM(distinct_key_count) AS total_distinct_partition_keys
FROM public.yb_tablet_key_counts('public.big');
				
			

Result:

				
					.total_rows | total_distinct_partition_keys
------------+-------------------------------
    1034302 |                       1034302
				
			
Important Detail

For hash-sharded tables, tablet boundaries should be treated as half-open intervals: yb_hash_code(...) >= start_hash_code and yb_hash_code(...) < end_hash_code. Using > and <= can miss hash code 0 and produce incorrect totals.

Demo 2: Range-Sharded Table with Integers

Now create a range-sharded table:

				
					DROP TABLE IF EXISTS public.big_range;

CREATE TABLE public.big_range (
    c1 bigint NOT NULL,
    payload text,
    PRIMARY KEY (c1 ASC)
)
SPLIT AT VALUES ((250000), (500000), (750000));
				
			

Load one million and one rows:

				
					INSERT INTO public.big_range
SELECT g, md5(g::text)
FROM generate_series(1, 1000001) AS g;
				
			

Run the same function:

				
					SELECT *
FROM public.yb_tablet_key_counts('public.big_range')
ORDER BY tablet_ordinal;
				
			

Example output:

				
					.tablet_ordinal |            tablet_id             | partitioning_scheme | start_hash_code | end_hash_code |       start_range_value        |        end_range_value         |     split_range      |     leader     |                    replicas                    | row_count | distinct_key_count
----------------+----------------------------------+---------------------+-----------------+---------------+--------------------------------+--------------------------------+----------------------+----------------+------------------------------------------------+-----------+--------------------
              1 | c5b4803a91e8418fbbf2313c6ca0deb2 | range               |                 |               |                                | {"split_at_value": "(250000)"} | -inf .. (250000)     | 127.0.0.3:5433 | {127.0.0.1:5433,127.0.0.2:5433,127.0.0.3:5433} |    249999 |             249999
              2 | 759b42b169534de393cac0d8451e6089 | range               |                 |               | {"split_at_value": "(250000)"} | {"split_at_value": "(500000)"} | (250000) .. (500000) | 127.0.0.3:5433 | {127.0.0.1:5433,127.0.0.2:5433,127.0.0.3:5433} |    250000 |             250000
              3 | a8e3ee598a6540b285c722b93b8e267c | range               |                 |               | {"split_at_value": "(500000)"} | {"split_at_value": "(750000)"} | (500000) .. (750000) | 127.0.0.2:5433 | {127.0.0.1:5433,127.0.0.2:5433,127.0.0.3:5433} |    250000 |             250000
              4 | 937c8e3281e54d29ae48570bd9136f19 | range               |                 |               | {"split_at_value": "(750000)"} |                                | (750000) .. +inf     | 127.0.0.1:5433 | {127.0.0.1:5433,127.0.0.2:5433,127.0.0.3:5433} |    250002 |             250002
				
			

Validate the total:

				
					SELECT
    SUM(row_count) AS total_rows,
    SUM(distinct_key_count) AS total_distinct_partition_keys
FROM public.yb_tablet_key_counts('public.big_range');
				
			

Example output:

				
					.total_rows | total_distinct_partition_keys
------------+-------------------------------
    1000001 |                       1000001
				
			

The range counts are exactly what we expect:

Range Predicate Rows
c1 < 250000 249,999
c1 >= 250000 AND c1 < 500000 250,000
c1 >= 500000 AND c1 < 750000 250,000
c1 >= 750000 250,002

The final range has two extra rows compared with the middle ranges because it includes both 750000 and the final inserted value, 1000001.

Demo 3: UUID Range Sharding with Multiple Rows per Key

Create the range sharded table:

				
					DROP TABLE IF EXISTS public.uuid_range;

CREATE TABLE public.uuid_range (
    customer_id uuid NOT NULL,
    order_id int NOT NULL,
    payload text,
    PRIMARY KEY (customer_id ASC, order_id ASC)
)
SPLIT AT VALUES (
    ('40000000-0000-0000-0000-000000000000'),
    ('80000000-0000-0000-0000-000000000000'),
    ('c0000000-0000-0000-0000-000000000000')
);
				
			

Then insert uneven data:

				
					INSERT INTO public.uuid_range VALUES
-- Range 1: -inf .. 4000...
-- 2 distinct customers, 3 total rows
('00000000-0000-0000-0000-000000000001', 1, 'range 1'),
('00000000-0000-0000-0000-000000000001', 2, 'range 1'),
('10000000-0000-0000-0000-000000000001', 1, 'range 1'),

-- Range 2: 4000... .. 8000...
-- 3 distinct customers, 7 total rows
('40000000-0000-0000-0000-000000000000', 1, 'range 2'),
('40000000-0000-0000-0000-000000000000', 2, 'range 2'),
('40000000-0000-0000-0000-000000000000', 3, 'range 2'),
('50000000-0000-0000-0000-000000000001', 1, 'range 2'),
('50000000-0000-0000-0000-000000000001', 2, 'range 2'),
('7fffffff-ffff-ffff-ffff-ffffffffffff', 1, 'range 2'),
('7fffffff-ffff-ffff-ffff-ffffffffffff', 2, 'range 2'),

-- Range 3: 8000... .. c000...
-- 1 distinct customer, 4 total rows
('80000000-0000-0000-0000-000000000000', 1, 'range 3'),
('80000000-0000-0000-0000-000000000000', 2, 'range 3'),
('80000000-0000-0000-0000-000000000000', 3, 'range 3'),
('80000000-0000-0000-0000-000000000000', 4, 'range 3'),

-- Range 4: c000... .. +inf
-- 4 distinct customers, 9 total rows
('c0000000-0000-0000-0000-000000000000', 1, 'range 4'),
('d0000000-0000-0000-0000-000000000001', 1, 'range 4'),
('d0000000-0000-0000-0000-000000000001', 2, 'range 4'),
('e0000000-0000-0000-0000-000000000001', 1, 'range 4'),
('e0000000-0000-0000-0000-000000000001', 2, 'range 4'),
('e0000000-0000-0000-0000-000000000001', 3, 'range 4'),
('ffffffff-ffff-ffff-ffff-ffffffffffff', 1, 'range 4'),
('ffffffff-ffff-ffff-ffff-ffffffffffff', 2, 'range 4'),
('ffffffff-ffff-ffff-ffff-ffffffffffff', 3, 'range 4');
				
			

Query the function:

				
					SELECT split_range, row_count, distinct_key_count
FROM public.yb_tablet_key_counts('public.uuid_range')
ORDER BY tablet_ordinal;
				
			

Example output:

				
					.                                    split_range                                      | row_count | distinct_key_count
--------------------------------------------------------------------------------------+-----------+--------------------
 -inf .. ('40000000-0000-0000-0000-000000000000')                                     |         3 |                  2
 ('40000000-0000-0000-0000-000000000000') .. ('80000000-0000-0000-0000-000000000000') |         7 |                  3
 ('80000000-0000-0000-0000-000000000000') .. ('c0000000-0000-0000-0000-000000000000') |         4 |                  1
 ('c0000000-0000-0000-0000-000000000000') .. +inf                                     |         9 |                  4
				
			

In this example, row_count is the total number of rows in the tablet range, while distinct_key_count is the number of distinct range-key values. Because the primary key is (customer_id ASC, order_id ASC), each customer_id can have multiple order_id values.

That is why the row count and distinct key count can differ.

Bonus Demo: Composite Hash Key with a Clustered Column

For hash-sharded tables, the hash portion of the primary key determines tablet placement.

For example:

				
					DROP TABLE IF EXISTS public.hash_composite;

CREATE TABLE public.hash_composite (
    tenant_id int NOT NULL,
    customer_id int NOT NULL,
    order_id int NOT NULL,
    payload text,
    PRIMARY KEY ((tenant_id, customer_id) HASH, order_id ASC)
) SPLIT INTO 3 TABLETS;
				
			

In this table, rows are distributed by:

				
					yb_hash_code(tenant_id, customer_id)
				
			

not by:

				
					yb_hash_code(tenant_id, customer_id, order_id)
				
			

The order_id column is part of the primary key, but it is the clustered/range portion inside each hash key.

Insert uneven rows per the (tenant_id, customer_id):

				
					INSERT INTO public.hash_composite VALUES
-- hash key: (1, 101), 3 rows
(1, 101, 1, 'a'),
(1, 101, 2, 'b'),
(1, 101, 3, 'c'),

-- hash key: (1, 102), 1 row
(1, 102, 1, 'a'),

-- hash key: (2, 201), 4 rows
(2, 201, 1, 'a'),
(2, 201, 2, 'b'),
(2, 201, 3, 'c'),
(2, 201, 4, 'd'),

-- hash key: (2, 202), 2 rows
(2, 202, 1, 'a'),
(2, 202, 2, 'b'),

-- hash key: (3, 301), 5 rows
(3, 301, 1, 'a'),
(3, 301, 2, 'b'),
(3, 301, 3, 'c'),
(3, 301, 4, 'd'),
(3, 301, 5, 'e');
				
			

Now query the function:

				
					SELECT tablet_ordinal, start_hash_code, end_hash_code, row_count, distinct_key_count
FROM public.yb_tablet_key_counts('public.hash_composite')
ORDER BY tablet_ordinal;
				
			

Sample output:

				
					.tablet_ordinal | start_hash_code | end_hash_code | row_count | distinct_key_count
----------------+-----------------+---------------+-----------+--------------------
              1 |               0 |         21845 |         4 |                  2
              2 |           21845 |         43690 |         0 |                  0
              3 |           43690 |         65536 |        11 |                  3
				
			

For PRIMARY KEY ((tenant_id, customer_id) HASH, order_id ASC), tablet placement is based on the hash of (tenant_id, customer_id). The order_id column is part of the primary key, but it is the clustered/range portion within each hash key. That is why row_count can be larger than distinct_key_count.

In this example:

				
					Tablet 1 has 4 rows across 2 distinct hash keys.
Tablet 2 has no rows.
Tablet 3 has 11 rows across 3 distinct hash keys.
				
			

Total check:

				
					SELECT
    SUM(row_count) AS total_rows,
    SUM(distinct_key_count) AS total_distinct_hash_keys
FROM public.yb_tablet_key_counts('public.hash_composite');
				
			

Examle output:

				
					.total_rows | total_distinct_hash_keys
------------+--------------------------
         15 |                        5
				
			

This is a nice complement to the UUID range demo because both show the same idea:

				
					row_count = total rows in the tablet
distinct_key_count = distinct partition/range keys in the tablet
				
			

Conclusion

Counting rows per tablet in YugabyteDB sounds simple, but the right approach depends on how the table is sharded.

For hash-sharded tables, the function uses yb_tablet_metadata and applies the tablet’s hash-code boundaries with the correct half-open interval:

				
					yb_hash_code(...) >= start_hash_code
AND
yb_hash_code(...) < end_hash_code
				
			

For range-sharded tables, the function avoids low-level DocDB key decoding and instead uses YugabyteDB’s declared SPLIT AT VALUES metadata. That makes the range path much cleaner and works naturally with SQL literals such as integers, UUIDs, dates, timestamps, and text values.

The end result is one SQL-only utility that works across both sharding styles and returns a tablet-level view of row distribution, distinct partition-key counts, leaders, replicas, and logical tablet ordering.

The tablet_ordinal column is especially useful because it gives you a consistent way to order results for both hash and range sharded tables:

				
					SELECT *
FROM public.yb_tablet_key_counts('public.some_table')
ORDER BY tablet_ordinal;
				
			
Final Takeaway

With one SQL function, you can inspect tablet-level row distribution for both hash-sharded and range-sharded YugabyteDB tables. This makes it easier to validate sharding behavior, spot skew, compare row counts with distinct partition-key counts, and better understand how your logical table maps to physical tablets.

Have Fun!

Spring cleaning has officially reached the “why have I moved this box from house to house for decades?” phase. Today’s archeological discovery: the actual gold piece of snail mail that told me I got accepted to the University of Pittsburgh... complete with a weird typewriter font, dramatic “GOOD NEWS” heading, and the confidence that no further action was required. Apparently, past me thought this was important enough to preserve through multiple moves. Past me was right. Hail to Pitt! 💙💛