Partitioning is the practice of physically dividing a large table in YugabyteDB into smaller, more manageable tables (partitions) to improve performance.
Because a partitioned table looks and behaves just like an ordinary table, applications accessing the database are often unaware that partitioning has taken place.
Example:
yugabyte=# CREATE TABLE TEST (c1 INT, c2 VARCHAR, PRIMARY KEY(c1, c2)) PARTITION BY LIST (c2);
CREATE TABLE
yugabyte=# CREATE TABLE TEST_a PARTITION OF test (c1, c2) FOR VALUES IN ('a');
CREATE TABLE
yugabyte=# CREATE TABLE TEST_b PARTITION OF test (c1, c2) FOR VALUES IN ('b');
CREATE TABLE
yugabyte=# INSERT INTO test SELECT 1, 'a';
INSERT 0 1
yugabyte=# INSERT INTO test SELECT 2, 'b';
INSERT 0 1
yugabyte=# SELECT * FROM test ORDER BY 1, 2;
c1 | c2
----+----
1 | a
2 | b
(2 rows)
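If you want to confirm which partition each row landed in, one quick sketch is to query the standard tableoid system column, which reports the physical table (here, test_a or test_b) storing each row:

-- report the partition that stores each row
SELECT tableoid::regclass AS partition, c1, c2 FROM test ORDER BY c1;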
One caveat to be aware of when using partitioned tables is that every primary key, unique constraint, and unique index must include the partition expression.
Indexes on partitioned tables, including those that enforce constraints, are implemented separately on each partition, so there is no way to enforce uniqueness across partitions.
Using the table TEST created above, you might want the c1 column alone to be unique in the table. But that's not possible, because the primary key must also include the partition column, c2.
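For example, trying to declare c1 alone as the primary key of a new table is rejected at table-creation time (a sketch; the exact error text varies by version, but it will point out that the primary key lacks the partition column):

-- fails: the PRIMARY KEY does not include partition column c2
CREATE TABLE test2 (c1 INT PRIMARY KEY, c2 VARCHAR) PARTITION BY LIST (c2);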
Example:
yugabyte=# TRUNCATE TABLE test;
TRUNCATE TABLE
yugabyte=# INSERT INTO test SELECT 1, 'a';
INSERT 0 1
yugabyte=# INSERT INTO test SELECT 1, 'b';
INSERT 0 1
yugabyte=# SELECT * FROM test ORDER BY 1, 2;
c1 | c2
----+----
1 | a
1 | b
(2 rows)
yugabyte=# INSERT INTO test SELECT 1, 'b';
ERROR: duplicate key value violates unique constraint "test_b_pkey"
As you can see, that last INSERT failed because uniqueness can only be guaranteed within a single partition.
However, if you really need global uniqueness, here is a tip that uses a check constraint to call a function that checks for an existing row. If a matching row already exists, the INSERT fails.
First we need to create the function:
yugabyte=# CREATE FUNCTION lookup_func(
yugabyte(# p_in anyelement,
yugabyte(# table_in VARCHAR,
yugabyte(# column_in VARCHAR
yugabyte(# )
yugabyte-# RETURNS boolean
yugabyte-# LANGUAGE plpgsql AS
yugabyte-# $$
yugabyte$# DECLARE
yugabyte$# result BOOLEAN;
yugabyte$# BEGIN
yugabyte$# EXECUTE format('SELECT NOT EXISTS (SELECT 1 FROM %I WHERE %I = %L)', table_in, column_in, p_in) INTO result;
yugabyte$# RETURN result;
yugabyte$# END;
yugabyte$# $$
yugabyte-# ;
CREATE FUNCTION
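Before wiring the function into a constraint, you can sanity-check it by hand (a quick sketch; at this point in the session the rows from the previous example are still present, so a lookup for c1 = 1 finds a match):

-- false here: rows with c1 = 1 still exist from the earlier example;
-- it returns true once no matching row exists
SELECT lookup_func(1, 'test', 'c1');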
Next we add the check constraint.
yugabyte=# TRUNCATE TABLE test;
TRUNCATE TABLE
yugabyte=# ALTER TABLE test ADD CONSTRAINT test_c1_ck CHECK(lookup_func(c1, 'test', 'c1'));
ALTER TABLE
Now I shouldn’t be able to INSERT duplicate values for column c1:
yugabyte=# INSERT INTO test SELECT 1, 'a';
INSERT 0 1
yugabyte=# INSERT INTO test SELECT 1, 'b';
ERROR: new row for relation "test_b" violates check constraint "test_c1_ck"
DETAIL: Failing row contains (1, b).
Bear in mind that this “solution” might not scale very well. If you need to bulk load or INSERT many rows, you’ll want to drop the check constraint first. Then, once your rows have been loaded or inserted (but before committing), check for duplicates, resolve any you find, and re-create the check constraint.
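A hypothetical sequence along those lines might look like the following sketch (the file path and dedupe handling are placeholders, not part of the example above). Note the NOT VALID clause on the re-created constraint: a plain ADD CONSTRAINT would re-validate every existing row, and lookup_func would then treat each row as its own duplicate and fail.

ALTER TABLE test DROP CONSTRAINT test_c1_ck;
-- bulk load; COPY shown as one option, the path is a placeholder
COPY test FROM '/path/to/data.csv' WITH (FORMAT csv);
-- find duplicates and resolve them before re-enabling the check
SELECT c1, COUNT(*) FROM test GROUP BY c1 HAVING COUNT(*) > 1;
-- NOT VALID skips re-checking existing rows
ALTER TABLE test ADD CONSTRAINT test_c1_ck
  CHECK (lookup_func(c1, 'test', 'c1')) NOT VALID;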