If you come from PostgreSQL, pgstattuple is one of those extensions that feels like it should be part of your regular toolbox.
It gives you a way to inspect tuple density, dead tuples, free space, and some index internals. So naturally, if you are working in YugabyteDB and trying to understand storage behavior, bloat, or row density, you might wonder:
- Can I use
pgstattuplehere too?
Interestingly, the answer is yes… but also no.
Yes, you can install the extension in YugabyteDB…. But no, it is not very useful for what PostgreSQL users normally expect it to do.
That is because pgstattuple was designed around PostgreSQL heap pages and local storage internals, while YugabyteDB stores data very differently: YSQL sits on top of DocDB, and DocDB stores data in distributed tablets backed by RocksDB SST files and compaction.
So this tip is really about something more practical:
- How do you emulate the useful goals of
pgstattuplein YugabyteDB using tools that actually match YugabyteDB’s architecture?
🔍 Yes, pgstattuple installs in YugabyteDB
This is what makes the whole thing a little misleading.
You really can install it:
yugabyte=# CREATE EXTENSION pgstattuple;
CREATE EXTENSION
yugabyte=# \dx
List of installed extensions
Name | Version | Schema | Description
--------------------+-------------+------------+------------------------------------------------------------------------
pg_stat_statements | 1.10-yb-2.0 | pg_catalog | track planning and execution statistics of all SQL statements executed
pgstattuple | 1.5 | public | show tuple-level statistics
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
(3 rows)
And if you inspect the extension objects as described in the YugabyteDB Tip Lists Functions Created by an Extension, you can see the familiar routines are there:
SELECT e.extname, ne.nspname AS extschema, p.proname, np.nspname AS proschema
FROM pg_catalog.pg_extension AS e
INNER JOIN pg_catalog.pg_depend AS d ON (d.refobjid = e.oid)
INNER JOIN pg_catalog.pg_proc AS p ON (p.oid = d.objid)
INNER JOIN pg_catalog.pg_namespace AS ne ON (ne.oid = e.extnamespace)
INNER JOIN pg_catalog.pg_namespace AS np ON (np.oid = p.pronamespace)
WHERE d.deptype = 'e'
AND e.extname = 'pgstattuple'
ORDER BY 1, 3;
Output:
. extname | extschema | proname | proschema
-------------+-----------+--------------------+-----------
pgstattuple | public | pg_relpages | public
pgstattuple | public | pg_relpages | public
pgstattuple | public | pgstatginindex | public
pgstattuple | public | pgstathashindex | public
pgstattuple | public | pgstatindex | public
pgstattuple | public | pgstatindex | public
pgstattuple | public | pgstattuple | public
pgstattuple | public | pgstattuple | public
pgstattuple | public | pgstattuple_approx | public
(9 rows)
So the problem is not that the extension is missing.
The problem is that the extension expects to inspect PostgreSQL-style physical storage internals, and that is not how YugabyteDB stores data.
🏗️ Postgres storage vs. YugabyteDB storage
In PostgreSQL, tools like pgstattuple make sense because they inspect structures that are central to how PostgreSQL works:
- ● heap pages
- ● free space in pages
- ● dead tuples
- ● page density
- ● local index pages
In YugabyteDB, that mental model breaks down.
YSQL is PostgreSQL-compatible at the SQL layer, but storage lives underneath in DocDB, where data is distributed across tablets and persisted in LSM-tree structures using SST files. Instead of thinking in terms of heap pages and VACUUM-visible free space, it is usually more useful to think in terms of:
- ● relation size
- ● distributed row counts
- ● SST file growth
- ● number of SST files
- ● background compaction behavior
That is why pgstattuple may install cleanly while still returning errors, zeros, or results that do not tell you anything useful.
🧪 What happens if you try to use it anyway?
Let’s make this concrete.
Demo setup
CREATE EXTENSION IF NOT EXISTS pgstattuple;
DROP TABLE IF EXISTS yb_demo;
CREATE TABLE yb_demo (
id BIGINT PRIMARY KEY,
category TEXT,
payload JSONB
);
CREATE INDEX yb_demo_category_idx ON yb_demo(category);
CREATE INDEX yb_demo_payload_gin_idx
ON yb_demo
USING GIN (
jsonb_to_tsvector('simple'::regconfig, payload, '["string"]'::jsonb)
);
INSERT INTO yb_demo (id, category, payload)
SELECT
g AS id,
(ARRAY['user','order','device','event'])[1 + floor(random() * 4)::int] AS category,
jsonb_build_object(
'name', 'user_' || g,
'city', (ARRAY['Boston','Chicago','Seattle','Austin','Miami'])[1 + floor(random() * 5)::int],
'status', (ARRAY['active','inactive','pending'])[1 + floor(random() * 3)::int],
'tier', (ARRAY['gold','silver','bronze'])[1 + floor(random() * 3)::int],
'score', (random() * 1000)::int,
'tags', jsonb_build_array(
(ARRAY['api','mobile','vip','trial','ops'])[1 + floor(random() * 5)::int],
(ARRAY['east','west','central','priority','legacy'])[1 + floor(random() * 5)::int]
),
'details', jsonb_build_object(
'region', (ARRAY['us-east','us-west','eu-central'])[1 + floor(random() * 3)::int],
'flag', (random() > 0.5)
)
)
FROM generate_series(1, 50000) AS g;
Now let’s try the PostgreSQL extension functions.
Try pgstattuple()
SELECT * FROM pgstattuple('yb_demo');
Result:
ERROR: bad buffer ID: 0
Try pgstattuple_approx()
SELECT * FROM pgstattuple_approx('yb_demo');
Result:
.table_len | scanned_percent | approx_tuple_count | approx_tuple_len | approx_tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | approx_free_space | approx_free_percent
-----------+-----------------+--------------------+------------------+----------------------+------------------+----------------+--------------------+-------------------+--------------------
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
(1 row)
Try related functions
SELECT * FROM pgstatindex('yb_demo_category_idx');
Result:
ERROR: relation "yb_demo_category_idx" is not a btree index
SELECT * FROM pgstatginindex('yb_demo_payload_gin_idx');
Result:
ERROR: relation "yb_demo_payload_gin_idx" is not a GIN index
SELECT * FROM pg_relpages('yb_demo');
Result:
.pg_relpages
-------------
0
(1 row)
That behavior tells the story pretty clearly.
📊 Postgres vs. YugabyteDB: what to use instead
Here is the practical mapping.
| What You Want to Inspect | PostgreSQL Mental Model | YugabyteDB Mental Model | Best Tool in YugabyteDB |
|---|---|---|---|
| Table size | Heap file size | Distributed tablet storage / SST-backed table data | pg_table_size() |
| Index size | B-tree index file size | Distributed index tablets | pg_indexes_size() |
| Total object footprint | Heap + indexes | All distributed relation storage | pg_total_relation_size() |
| Live row count | Tuple visibility in heap pages | Distributed row count across tablets | COUNT(*), pg_class.reltuples, or a row-estimate helper |
| Dead space / bloat | Dead tuples, free space, VACUUM behavior | SST growth, obsolete versions, compaction behavior | SST file metrics and compaction metrics |
| Page count | 8 KB heap pages | Not a meaningful storage abstraction in YB | Do not rely on pg_relpages() |
That is really the heart of the tip:
- Replace page-level inspection with relation-size functions, row-count tools, and storage metrics that actually match YugabyteDB’s architecture.
📦 1) If you want size, use the YSQL size functions
A lot of the time, people reach for pgstattuple because they really want an answer to a simpler question:
How big is this table or index?
In YugabyteDB, the direct answers are:
SELECT pg_size_pretty(pg_table_size('yb_demo')) AS table_size;
SELECT pg_size_pretty(pg_indexes_size('yb_demo')) AS index_size;
SELECT pg_size_pretty(pg_total_relation_size('yb_demo')) AS total_size;
These are the YugabyteDB-native tools to use when your real concern is storage footprint.
This is also a great place to cross-link to earlier tips:
- ● If you want a dedicated walkthrough for table-level size, see How to Display Table Size in YSQL.
- ● If you want to think bigger than a single table, see How to Display YSQL Database Size
🔢 2) If you want row counts, use exact counts or smart estimates
Another reason people use pgstattuple is that they are trying to answer a logical question:
- How many rows are really in this thing?
For an exact answer:
SELECT count(*) FROM yb_demo;
For an estimated answer:
ANALYZE yb_demo;
SELECT reltuples
FROM pg_class
WHERE relname = 'yb_demo';
And for larger tables, check out the YugabyteDB Tip:
That tip is especially relevant here because in YugabyteDB, COUNT(*) is not just a quick local operation. It fans out across tablets and nodes. That makes row-count estimation a much more practical tool in distributed environments than people first realize.
🪨 3) If you want “bloat” insight, think SST files and compaction
This is the biggest mindset shift.
In PostgreSQL, pgstattuple is often part of a “bloat investigation” workflow. You think in terms of:
- ● dead tuples
- ● free space
- ● page density
- ● VACUUM cleanup
In YugabyteDB, the better question is usually:
- What is happening with SST files and compaction?
That is much closer to how storage health shows up in YugabyteDB.
Compaction metrics
YugabyteDB also exposes compaction-related metrics such as:
- ●
rocksdb_compact_read_bytes - ●
rocksdb_compact_write_bytes - ●
rocksdb_compaction_times_micros - ●
rocksdb_numfiles_in_singlecompaction
These help you understand whether the system is actively merging SSTs, reducing levels, and cleaning up obsolete versions in the background.
That is the YugabyteDB-world analogue to what Postgres users loosely think of as “bloat maintenance.”
You can read more here:
🖥️ Where should you look in practice?
If you are troubleshooting real storage usage in YugabyteDB, the most practical workflow is usually a combination of:
- ● YSQL size functions for relation footprint
- ● row-count estimates or exact counts for logical volume
- ● Master UI / TServer UI / metrics pages for SST files and compaction behavior
That combination gives you a much more truthful picture than pgstattuple() will.
For example:
- ●
pg_table_size()helps answer, “How large is this relation?” - ●
pg_indexes_size()helps answer, “How much space are the indexes using?” - ●
pg_total_relation_size()helps answer, “What is the total footprint?” - ●
reltuplesor your row-estimate helper helps answer, “How many rows are in play?” - ● SST and compaction metrics help answer, “What is the storage engine actually doing underneath?”
🧭 A better mental model for YugabyteDB
This is probably the best way to summarize the shift:
- ● In PostgreSQL, you often reason from pages inward.
- ● In YugabyteDB, you usually reason from relations, tablets, SSTs, and compaction outward.
That is why pgstattuple feels like it should help, but usually does not.
The extension is present. The SQL layer is compatible. But the storage engine story underneath is fundamentally different.
🎯 Final Takeaway
pgstattuple installs in YugabyteDB, but that is mostly a compatibility artifact, not a sign that it is the right tool for storage analysis.
If you want meaningful answers in YugabyteDB:
- ● use relation-size functions for footprint
- ● use exact counts or smart estimates for row counts
- ● use SST and compaction metrics for storage health
That is the YugabyteDB-native way to emulate the intent behind pgstattuple, even though the extension itself is mostly pointless here.
Have Fun!
