How to Emulate PostgreSQL pgstattuple in YugabyteDB

If you come from PostgreSQL, pgstattuple is one of those extensions that feels like it should be part of your regular toolbox.

It gives you a way to inspect tuple density, dead tuples, free space, and some index internals. So naturally, if you are working in YugabyteDB and trying to understand storage behavior, bloat, or row density, you might wonder:

  • Can I use pgstattuple here too?

Interestingly, the answer is yes… but also no.

Yes, you can install the extension in YugabyteDB…. But no, it is not very useful for what PostgreSQL users normally expect it to do.

That is because pgstattuple was designed around PostgreSQL heap pages and local storage internals, while YugabyteDB stores data very differently: YSQL sits on top of DocDB, and DocDB stores data in distributed tablets backed by RocksDB SST files and compaction.

So this tip is really about something more practical:

  • How do you emulate the useful goals of pgstattuple in YugabyteDB using tools that actually match YugabyteDB’s architecture?
TL;DR
You can install pgstattuple in YugabyteDB, but it is mostly pointless. The extension was built to inspect PostgreSQL heap pages and page-level storage structures, while YugabyteDB stores table and index data in distributed DocDB tablets backed by RocksDB SST files and compaction. In practice, the YugabyteDB-native replacements are pg_table_size(), pg_indexes_size(), pg_total_relation_size(), row-count estimation techniques, and SST/compaction metrics from the YugabyteDB UI and metrics pages.

🔍 Yes, pgstattuple installs in YugabyteDB

This is what makes the whole thing a little misleading.

You really can install it:

				
					yugabyte=# CREATE EXTENSION pgstattuple;
CREATE EXTENSION

yugabyte=# \dx
                                              List of installed extensions
        Name        |   Version   |   Schema   |                              Description
--------------------+-------------+------------+------------------------------------------------------------------------
 pg_stat_statements | 1.10-yb-2.0 | pg_catalog | track planning and execution statistics of all SQL statements executed
 pgstattuple        | 1.5         | public     | show tuple-level statistics
 plpgsql            | 1.0         | pg_catalog | PL/pgSQL procedural language
(3 rows)
				
			

And if you inspect the extension objects as described in the YugabyteDB Tip Lists Functions Created by an Extension, you can see the familiar routines are there:

				
					SELECT e.extname, ne.nspname AS extschema, p.proname, np.nspname AS proschema
FROM pg_catalog.pg_extension AS e
    INNER JOIN pg_catalog.pg_depend AS d ON (d.refobjid = e.oid)
    INNER JOIN pg_catalog.pg_proc AS p ON (p.oid = d.objid)
    INNER JOIN pg_catalog.pg_namespace AS ne ON (ne.oid = e.extnamespace)
    INNER JOIN pg_catalog.pg_namespace AS np ON (np.oid = p.pronamespace)
WHERE d.deptype = 'e'
  AND e.extname = 'pgstattuple'
ORDER BY 1, 3;
				
			

Output:

				
					.  extname   | extschema |      proname       | proschema
-------------+-----------+--------------------+-----------
 pgstattuple | public    | pg_relpages        | public
 pgstattuple | public    | pg_relpages        | public
 pgstattuple | public    | pgstatginindex     | public
 pgstattuple | public    | pgstathashindex    | public
 pgstattuple | public    | pgstatindex        | public
 pgstattuple | public    | pgstatindex        | public
 pgstattuple | public    | pgstattuple        | public
 pgstattuple | public    | pgstattuple        | public
 pgstattuple | public    | pgstattuple_approx | public
(9 rows)
				
			

So the problem is not that the extension is missing.

The problem is that the extension expects to inspect PostgreSQL-style physical storage internals, and that is not how YugabyteDB stores data.

Key Insight
pgstattuple makes little sense in YugabyteDB because it was designed to inspect fixed-size PostgreSQL heap pages and local buffer-managed files, while YugabyteDB stores data in distributed DocDB tablets backed by RocksDB SST files and background compaction.

🏗️ Postgres storage vs. YugabyteDB storage

In PostgreSQL, tools like pgstattuple make sense because they inspect structures that are central to how PostgreSQL works:

  • ● heap pages
  • ● free space in pages
  • ● dead tuples
  • ● page density
  • ● local index pages

In YugabyteDB, that mental model breaks down.

YSQL is PostgreSQL-compatible at the SQL layer, but storage lives underneath in DocDB, where data is distributed across tablets and persisted in LSM-tree structures using SST files. Instead of thinking in terms of heap pages and VACUUM-visible free space, it is usually more useful to think in terms of:

  • ● relation size
  • ● distributed row counts
  • ● SST file growth
  • ● number of SST files
  • ● background compaction behavior

That is why pgstattuple may install cleanly while still returning errors, zeros, or results that do not tell you anything useful.

🧪 What happens if you try to use it anyway?

Let’s make this concrete.

Demo setup

				
					CREATE EXTENSION IF NOT EXISTS pgstattuple;

DROP TABLE IF EXISTS yb_demo;

CREATE TABLE yb_demo (
    id         BIGINT PRIMARY KEY,
    category   TEXT,
    payload    JSONB
);

CREATE INDEX yb_demo_category_idx ON yb_demo(category);

CREATE INDEX yb_demo_payload_gin_idx
ON yb_demo
USING GIN (
  jsonb_to_tsvector('simple'::regconfig, payload, '["string"]'::jsonb)
);

INSERT INTO yb_demo (id, category, payload)
SELECT
    g AS id,
    (ARRAY['user','order','device','event'])[1 + floor(random() * 4)::int] AS category,
    jsonb_build_object(
        'name',      'user_' || g,
        'city',      (ARRAY['Boston','Chicago','Seattle','Austin','Miami'])[1 + floor(random() * 5)::int],
        'status',    (ARRAY['active','inactive','pending'])[1 + floor(random() * 3)::int],
        'tier',      (ARRAY['gold','silver','bronze'])[1 + floor(random() * 3)::int],
        'score',     (random() * 1000)::int,
        'tags',      jsonb_build_array(
                        (ARRAY['api','mobile','vip','trial','ops'])[1 + floor(random() * 5)::int],
                        (ARRAY['east','west','central','priority','legacy'])[1 + floor(random() * 5)::int]
                     ),
        'details',   jsonb_build_object(
                        'region', (ARRAY['us-east','us-west','eu-central'])[1 + floor(random() * 3)::int],
                        'flag',   (random() > 0.5)
                     )
    )
FROM generate_series(1, 50000) AS g;
				
			

Now let’s try the PostgreSQL extension functions.

Try pgstattuple()
				
					SELECT * FROM pgstattuple('yb_demo');
				
			

Result:

				
					ERROR:  bad buffer ID: 0
				
			
Try pgstattuple_approx()
				
					SELECT * FROM pgstattuple_approx('yb_demo');
				
			

Result:

				
					.table_len | scanned_percent | approx_tuple_count | approx_tuple_len | approx_tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | approx_free_space | approx_free_percent
-----------+-----------------+--------------------+------------------+----------------------+------------------+----------------+--------------------+-------------------+--------------------
         0 |               0 |                  0 |                0 |                    0 |                0 |              0 |                  0 |                 0 |                  0
(1 row)
				
			
Try related functions
				
					SELECT * FROM pgstatindex('yb_demo_category_idx');
				
			

Result:

				
					ERROR:  relation "yb_demo_category_idx" is not a btree index
				
			
				
					SELECT * FROM pgstatginindex('yb_demo_payload_gin_idx');
				
			

Result:

				
					ERROR:  relation "yb_demo_payload_gin_idx" is not a GIN index
				
			
				
					SELECT * FROM pg_relpages('yb_demo');
				
			

Result:

				
					.pg_relpages
-------------
           0
(1 row)
				
			

That behavior tells the story pretty clearly.

📊 Postgres vs. YugabyteDB: what to use instead

Here is the practical mapping.

What You Want to Inspect PostgreSQL Mental Model YugabyteDB Mental Model Best Tool in YugabyteDB
Table size Heap file size Distributed tablet storage / SST-backed table data pg_table_size()
Index size B-tree index file size Distributed index tablets pg_indexes_size()
Total object footprint Heap + indexes All distributed relation storage pg_total_relation_size()
Live row count Tuple visibility in heap pages Distributed row count across tablets COUNT(*), pg_class.reltuples, or a row-estimate helper
Dead space / bloat Dead tuples, free space, VACUUM behavior SST growth, obsolete versions, compaction behavior SST file metrics and compaction metrics
Page count 8 KB heap pages Not a meaningful storage abstraction in YB Do not rely on pg_relpages()

That is really the heart of the tip:

  • Replace page-level inspection with relation-size functions, row-count tools, and storage metrics that actually match YugabyteDB’s architecture.

📦 1) If you want size, use the YSQL size functions

A lot of the time, people reach for pgstattuple because they really want an answer to a simpler question:

How big is this table or index?

In YugabyteDB, the direct answers are:

				
					SELECT pg_size_pretty(pg_table_size('yb_demo')) AS table_size;

SELECT pg_size_pretty(pg_indexes_size('yb_demo')) AS index_size;

SELECT pg_size_pretty(pg_total_relation_size('yb_demo')) AS total_size;
				
			

These are the YugabyteDB-native tools to use when your real concern is storage footprint.

This is also a great place to cross-link to earlier tips:

Tip
If your real goal is size, not tuple archaeology, start with pg_table_size(), pg_indexes_size(), and pg_total_relation_size(). These line up far better with how YugabyteDB exposes relation footprint than pgstattuple ever will.

🔢 2) If you want row counts, use exact counts or smart estimates

Another reason people use pgstattuple is that they are trying to answer a logical question:

  • How many rows are really in this thing?

For an exact answer:

				
					SELECT count(*) FROM yb_demo;
				
			

For an estimated answer:

				
					ANALYZE yb_demo;

SELECT reltuples
FROM pg_class
WHERE relname = 'yb_demo';
				
			

And for larger tables, check out the YugabyteDB Tip:

That tip is especially relevant here because in YugabyteDB, COUNT(*) is not just a quick local operation. It fans out across tablets and nodes. That makes row-count estimation a much more practical tool in distributed environments than people first realize.

Important
In YugabyteDB, COUNT(*) is a distributed operation. That is one of the reasons row-count estimation techniques matter much more here than they do in a single-node PostgreSQL mindset.

🪨 3) If you want “bloat” insight, think SST files and compaction

This is the biggest mindset shift.

In PostgreSQL, pgstattuple is often part of a “bloat investigation” workflow. You think in terms of:

  • ● dead tuples
  • ● free space
  • ● page density
  • ● VACUUM cleanup

In YugabyteDB, the better question is usually:

  • What is happening with SST files and compaction?

That is much closer to how storage health shows up in YugabyteDB.

Compaction metrics

YugabyteDB also exposes compaction-related metrics such as:

  • rocksdb_compact_read_bytes
  • rocksdb_compact_write_bytes
  • rocksdb_compaction_times_micros
  • rocksdb_numfiles_in_singlecompaction

These help you understand whether the system is actively merging SSTs, reducing levels, and cleaning up obsolete versions in the background.

That is the YugabyteDB-world analogue to what Postgres users loosely think of as “bloat maintenance.”

Think of it this way
In PostgreSQL, you inspect bloat through heap/page-oriented tools. In YugabyteDB, the closest equivalent is watching SST file growth, SST counts, and compaction behavior.

🖥️ Where should you look in practice?

If you are troubleshooting real storage usage in YugabyteDB, the most practical workflow is usually a combination of:

  • ● YSQL size functions for relation footprint
  • ● row-count estimates or exact counts for logical volume
  • ● Master UI / TServer UI / metrics pages for SST files and compaction behavior

That combination gives you a much more truthful picture than pgstattuple() will.

For example:

  • pg_table_size() helps answer, “How large is this relation?”
  • pg_indexes_size() helps answer, “How much space are the indexes using?”
  • pg_total_relation_size() helps answer, “What is the total footprint?”
  • reltuples or your row-estimate helper helps answer, “How many rows are in play?”
  • ● SST and compaction metrics help answer, “What is the storage engine actually doing underneath?”

🧭 A better mental model for YugabyteDB

This is probably the best way to summarize the shift:

  • ● In PostgreSQL, you often reason from pages inward.
  • ● In YugabyteDB, you usually reason from relations, tablets, SSTs, and compaction outward.

That is why pgstattuple feels like it should help, but usually does not.

The extension is present. The SQL layer is compatible. But the storage engine story underneath is fundamentally different.

🎯 Final Takeaway

pgstattuple installs in YugabyteDB, but that is mostly a compatibility artifact, not a sign that it is the right tool for storage analysis.

If you want meaningful answers in YugabyteDB:

  • ● use relation-size functions for footprint
  • ● use exact counts or smart estimates for row counts
  • ● use SST and compaction metrics for storage health

That is the YugabyteDB-native way to emulate the intent behind pgstattuple, even though the extension itself is mostly pointless here.

Final Takeaway
You can install pgstattuple in YugabyteDB, but for real troubleshooting it is mostly a reminder that PostgreSQL-compatible SQL does not always imply PostgreSQL-compatible storage internals. For meaningful answers, use YugabyteDB-native size functions, row-count techniques, and SST/compaction metrics instead.

Have Fun!