In YugabyteDB, just like in PostgreSQL, the YSQL API doesn’t require a primary key when creating a table. However, because YugabyteDB distributes and partitions data based on the primary key, it’s crucial to define one during table creation.
If you skip it, YugabyteDB will automatically add a hidden yb_rowid column to evenly distribute the data across the cluster. While this ensures balanced storage, it’s not ideal—since the distribution won’t be aligned with your actual query patterns, it can lead to inefficient data access and suboptimal performance.
Example:
Query plan without PKs:
yugabyte=# CREATE TABLE t1 (c1 INT);
CREATE TABLE
yugabyte=# CREATE TABLE t2 (c1 INT);
CREATE TABLE
yugabyte=# EXPLAIN SELECT t1.c1, t2.c1 FROM t1 JOIN t2 USING (c1);
QUERY PLAN
-------------------------------------------------------------------
Merge Join (cost=299.66..379.66 rows=5000 width=8)
Merge Cond: (t1.c1 = t2.c1)
-> Sort (cost=149.83..152.33 rows=1000 width=4)
Sort Key: t1.c1
-> Seq Scan on t1 (cost=0.00..100.00 rows=1000 width=4)
-> Sort (cost=149.83..152.33 rows=1000 width=4)
Sort Key: t2.c1
-> Seq Scan on t2 (cost=0.00..100.00 rows=1000 width=4)
(8 rows)
Query plan with PKs:
yugabyte=# CREATE TABLE t1_pk (c1 INT PRIMARY KEY);
CREATE TABLE
yugabyte=# CREATE TABLE t2_pk (c1 INT PRIMARY KEY);
CREATE TABLE
yugabyte=# EXPLAIN SELECT t1_pk.c1, t2_pk.c1 FROM t1_pk JOIN t2_pk USING (c1);
QUERY PLAN
------------------------------------------------------------------------------
YB Batched Nested Loop Join (cost=0.00..224.00 rows=1000 width=8)
Join Filter: (t1_pk.c1 = t2_pk.c1)
-> Seq Scan on t1_pk (cost=0.00..100.00 rows=1000 width=4)
-> Index Scan using t2_pk_pkey on t2_pk (cost=0.00..0.11 rows=1 width=4)
Index Cond: (c1 = ANY (ARRAY[t1_pk.c1, $1, $2, ..., $1023]))
(5 rows)
Always define a primary key that reflects your access patterns!
You can identify tables that lack a primary key with the following query:
SELECT t.table_schema, t.table_name
FROM information_schema.tables t
LEFT JOIN information_schema.table_constraints tc
ON t.table_schema = tc.table_schema
AND t.table_name = tc.table_name
AND tc.constraint_type = 'PRIMARY KEY'
WHERE t.table_type = 'BASE TABLE'
AND t.table_schema NOT IN ('pg_catalog', 'information_schema')
AND tc.constraint_name is null
ORDER BY t.table_schema, t.table_name;
Example:
yugabyte=# SELECT t.table_schema, t.table_name
yugabyte-# FROM information_schema.tables t
yugabyte-# LEFT JOIN information_schema.table_constraints tc
yugabyte-# ON t.table_schema = tc.table_schema
yugabyte-# AND t.table_name = tc.table_name
yugabyte-# AND tc.constraint_type = 'PRIMARY KEY'
yugabyte-# WHERE t.table_type = 'BASE TABLE'
yugabyte-# AND t.table_schema NOT IN ('pg_catalog', 'information_schema')
yugabyte-# AND tc.constraint_name is null
yugabyte-# ORDER BY t.table_schema, t.table_name;
table_schema | table_name
table_schema | table_name
--------------+------------
public | t1
public | t2
(2 rows)
You can confirm that these tables lack a primary key constraint by describing them in ysqlsh:
yugabyte=# \d t1
Table "public.t1"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
c1 | integer | | |
yugabyte=# \d t2
Table "public.t2"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
c1 | integer | | |
If you’re planning to add a primary key to a table that already has data, be sure to check out the tip Pitfalls of Adding a PK to an Existing Table in YSQL first!
Performance Update!
To avoid information_schema in PostgreSQL/YugabyteDB, you can query the native system catalogs in pg_catalog.
System catalogs are generally much faster to query than information_schema views, which have to do a lot of complex joins under the hood to comply with the SQL standard.
Here is the exact equivalent of your query using PostgreSQL/YugabyteDB’s internal catalog tables:
SELECT
n.nspname AS table_schema,
c.relname AS table_name
FROM
pg_catalog.pg_class c
JOIN
pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE
c.relkind = 'r' -- 'r' = ordinary relation (base table)
AND n.nspname NOT IN ('pg_catalog', 'information_schema')
AND NOT EXISTS (
SELECT 1
FROM pg_catalog.pg_constraint con
WHERE con.conrelid = c.oid
AND con.contype = 'p' -- 'p' = primary key constraint
)
ORDER BY
n.nspname,
c.relname;
Have Fun!
