In YugabyteDB, just like in PostgreSQL, the YSQL API doesn’t require a primary key when creating a table. However, because YugabyteDB distributes and partitions data based on the primary key, it’s crucial to define one during table creation.
If you skip it, YugabyteDB will automatically add a hidden yb_rowid
column to evenly distribute the data across the cluster. While this ensures balanced storage, it’s not ideal—since the distribution won’t be aligned with your actual query patterns, it can lead to inefficient data access and suboptimal performance.
Example:
Query plan without PKs:
yugabyte=# CREATE TABLE t1 (c1 INT);
CREATE TABLE
yugabyte=# CREATE TABLE t2 (c1 INT);
CREATE TABLE
yugabyte=# EXPLAIN SELECT t1.c1, t2.c1 FROM t1 JOIN t2 USING (c1);
QUERY PLAN
-------------------------------------------------------------------
Merge Join (cost=299.66..379.66 rows=5000 width=8)
Merge Cond: (t1.c1 = t2.c1)
-> Sort (cost=149.83..152.33 rows=1000 width=4)
Sort Key: t1.c1
-> Seq Scan on t1 (cost=0.00..100.00 rows=1000 width=4)
-> Sort (cost=149.83..152.33 rows=1000 width=4)
Sort Key: t2.c1
-> Seq Scan on t2 (cost=0.00..100.00 rows=1000 width=4)
(8 rows)
Query plan with PKs:
yugabyte=# CREATE TABLE t1_pk (c1 INT PRIMARY KEY);
CREATE TABLE
yugabyte=# CREATE TABLE t2_pk (c1 INT PRIMARY KEY);
CREATE TABLE
yugabyte=# EXPLAIN SELECT t1_pk.c1, t2_pk.c1 FROM t1_pk JOIN t2_pk USING (c1);
QUERY PLAN
------------------------------------------------------------------------------
YB Batched Nested Loop Join (cost=0.00..224.00 rows=1000 width=8)
Join Filter: (t1_pk.c1 = t2_pk.c1)
-> Seq Scan on t1_pk (cost=0.00..100.00 rows=1000 width=4)
-> Index Scan using t2_pk_pkey on t2_pk (cost=0.00..0.11 rows=1 width=4)
Index Cond: (c1 = ANY (ARRAY[t1_pk.c1, $1, $2, ..., $1023]))
(5 rows)
Always define a primary key that reflects your access patterns!
You can identify tables that lack a primary key with the following query:
SELECT t.table_schema, t.table_name
FROM information_schema.tables t
LEFT JOIN information_schema.table_constraints tc
ON t.table_schema = tc.table_schema
AND t.table_name = tc.table_name
AND tc.constraint_type = 'PRIMARY KEY'
WHERE t.table_type = 'BASE TABLE'
AND t.table_schema NOT IN ('pg_catalog', 'information_schema')
AND tc.constraint_name is null
ORDER BY t.table_schema, t.table_name;
Example:
yugabyte=# SELECT t.table_schema, t.table_name
yugabyte-# FROM information_schema.tables t
yugabyte-# LEFT JOIN information_schema.table_constraints tc
yugabyte-# ON t.table_schema = tc.table_schema
yugabyte-# AND t.table_name = tc.table_name
yugabyte-# AND tc.constraint_type = 'PRIMARY KEY'
yugabyte-# WHERE t.table_type = 'BASE TABLE'
yugabyte-# AND t.table_schema NOT IN ('pg_catalog', 'information_schema')
yugabyte-# AND tc.constraint_name is null
yugabyte-# ORDER BY t.table_schema, t.table_name;
table_schema | table_name
table_schema | table_name
--------------+------------
public | t1
public | t2
(2 rows)
You can confirm that these tables lack a primary key constraint by describing them in ysqlsh
:
yugabyte=# \d t1
Table "public.t1"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
c1 | integer | | |
yugabyte=# \d t2
Table "public.t2"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
c1 | integer | | |
If you’re planning to add a primary key to a table that already has data, be sure to check out the tip Pitfalls of Adding a PK to an Existing Table in YSQL first!
Have Fun!