When working with distributed databases like YugabyteDB, writing safe, performant, and transactionally consistent SQL can get tricky — especially when your app logic spans multiple tables and mutation types (like updates and inserts).
This is where Common Table Expressions (CTEs) shine.
CTEs — especially data-modifying CTEs — can give you the power to collapse multiple related operations into a single, atomic SQL statement. And in distributed systems, that single-statement approach matters a lot.
Here’s why…
Stronger Atomicity Across Tables and Rows
In a distributed database, every SQL statement may touch data across multiple nodes. Suppose you execute four separate SQL statements: an UPDATE, a second UPDATE, and two INSERTs. Each of those statements could be routed independently, possibly ending up on different nodes, and could partially succeed or fail without easy rollback. That’s a nightmare for consistency.
Note: YugabyteDB supports distributed transactions, so you can wrap the above DML statements in a BEGIN/END block for atomicity. However, executing multiple separate DML statements introduces additional latency: each one incurs its own round trip and coordination overhead, which quickly adds up.
CTEs let you express all four operations as a single SQL statement. YugabyteDB treats it as one unit, a single distributed transaction, and guarantees atomicity across all operations.
If one part fails, the whole thing rolls back.
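To make that concrete, here’s a minimal sketch of the rollback behavior, using two hypothetical tables (accounts and audit_log) that are not part of the example later in this post. The NOT NULL violation in the INSERT aborts the entire statement, so the UPDATE in the first CTE is rolled back with it:

-- Hypothetical tables, for illustration only (run in autocommit mode,
-- so each statement is its own transaction)
CREATE TABLE accounts (id INT PRIMARY KEY, balance NUMERIC);
CREATE TABLE audit_log (account_id INT NOT NULL, note TEXT);
INSERT INTO accounts VALUES (1, 100);

-- One statement, two mutations: the NULL account_id violates NOT NULL,
-- so the whole statement fails and the balance update never commits
WITH debited AS (
    UPDATE accounts
    SET balance = balance - 40
    WHERE id = 1
    RETURNING id
)
INSERT INTO audit_log (account_id, note)
SELECT NULL, 'debited 40' FROM debited;

SELECT balance FROM accounts WHERE id = 1;  -- still 100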
Cleaner, More Readable Logic
Compare these:
-- Many separate statements
BEGIN TRANSACTION;
UPDATE table_a ...
UPDATE table_b ...
INSERT INTO table_c ...
INSERT INTO table_d ...
COMMIT;
vs.
-- One statement!
WITH
    updated_a AS (...),
    updated_b AS (...),
    inserted_c AS (...),
    inserted_d AS (...)
SELECT 1;
The CTE version is self-contained and declarative. Every data transformation is visible in a single block — which is easier to read, debug, and maintain.
Fewer Network Hops and Lower Latency
In a distributed system like YugabyteDB, each separate SQL statement potentially involves:
- A new round trip from client to coordinator node
- Distributed consensus for write operations (Raft)
- Separate coordination of retries and failure handling
Collapsing everything into a single statement reduces:
- Network overhead
- Coordination time
- Transactional bookkeeping
The result? Faster, leaner, more reliable execution.
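You can verify this on your own cluster. Here’s a rough sketch (it assumes the table_a and table_b tables defined in the Quick Example below; counter names can vary by YugabyteDB version) comparing the Storage Flush Requests that EXPLAIN (ANALYZE, DIST) reports for separate statements versus one combined statement:

-- Note: ANALYZE actually executes the statements
-- Run separately, each DML is its own distributed transaction with its own flush:
EXPLAIN (ANALYZE, DIST) UPDATE table_a SET status = 'x' WHERE id = 1;
EXPLAIN (ANALYZE, DIST) UPDATE table_b SET processed = TRUE WHERE code = 'REF123';

-- Combined into one statement, the writes are batched into a single unit:
EXPLAIN (ANALYZE, DIST)
WITH updated_a AS (
    UPDATE table_a
    SET status = 'x'
    WHERE id = 1
    RETURNING reference_code
)
UPDATE table_b
SET processed = TRUE
WHERE code IN (SELECT reference_code FROM updated_a);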
Safer Transaction Patterns
Using explicit transactions (BEGIN TRANSACTION; ... COMMIT;) works fine, but in high-throughput systems, long-lived or multi-step transactions are harder to manage and can impact performance.
A multi-statement transaction:
- Increases the chance of deadlocks
- May hold locks longer than necessary
- May fail partially if the app crashes midway
CTEs sidestep this by giving you one short, powerful statement, which YugabyteDB internally wraps with the right level of isolation and consistency.
Quick Example: Cascade Updates
Imagine we have the following tables:
-- Table A: Holds main data
CREATE TABLE table_a (
    id SERIAL PRIMARY KEY,
    status TEXT,
    reference_code TEXT
);

-- Table B: Linked by reference_code
CREATE TABLE table_b (
    code TEXT PRIMARY KEY,
    processed BOOLEAN DEFAULT FALSE
);

-- Table C: Receives inserts based on Table B
CREATE TABLE table_c (
    id SERIAL PRIMARY KEY,
    code TEXT,
    source TEXT,
    created_at TIMESTAMP
);

-- Table D: Receives inserts based on Table B
CREATE TABLE table_d (
    id SERIAL PRIMARY KEY,
    code TEXT,
    details TEXT,
    created_at TIMESTAMP
);
-- Insert into Table B first since Table A references its code
INSERT INTO table_b (code) VALUES
    ('REF123'),
    ('REF456'),
    ('REF789');

-- Insert into Table A referencing codes in Table B
INSERT INTO table_a (status, reference_code) VALUES
    ('pending', 'REF123'),
    ('pending', 'REF456'),
    ('pending', 'REF789');
Initially, the tables contain the following data:
SELECT * FROM table_a ORDER BY id;
id | status | reference_code
----+---------+----------------
1 | pending | REF123
2 | pending | REF456
3 | pending | REF789
(3 rows)
SELECT * FROM table_b;
code | processed
--------+-----------
REF456 | f
REF789 | f
REF123 | f
(3 rows)
SELECT * FROM table_c ORDER BY id;
id | code | source | created_at
----+------+--------+------------
(0 rows)
SELECT * FROM table_d ORDER BY id;
id | code | details | created_at
----+------+---------+------------
(0 rows)
We want to update a row in table_a, mark the related row in table_b as processed, and then insert audit rows into table_c and table_d.
We can encapsulate that logic in a stored function, executing a single DML operation per table:
CREATE OR REPLACE FUNCTION update_status_and_cascade(
    target_id INT,
    new_status TEXT
) RETURNS VOID
LANGUAGE plpgsql
AS $$
DECLARE
    ref_code TEXT;
    src TEXT := 'system';
    now_ts TIMESTAMP := CURRENT_TIMESTAMP;
BEGIN
    -- Step 1: Update table_a and fetch reference_code
    UPDATE table_a
    SET status = new_status
    WHERE id = target_id
    RETURNING reference_code INTO ref_code;

    IF ref_code IS NOT NULL THEN
        -- Step 2: Update table_b
        UPDATE table_b
        SET processed = TRUE
        WHERE code = ref_code;

        -- Step 3: Insert into table_c
        INSERT INTO table_c (code, source, created_at)
        SELECT code, src, now_ts
        FROM table_b
        WHERE code = ref_code;

        -- Step 4: Insert into table_d
        INSERT INTO table_d (code, details, created_at)
        SELECT code, CONCAT('Processed at ', now_ts), now_ts
        FROM table_b
        WHERE code = ref_code;
    END IF;
END;
$$;
Let’s execute the function using EXPLAIN (ANALYZE, DIST) to observe storage access patterns and performance metrics.
EXPLAIN (ANALYZE, DIST) SELECT update_status_and_cascade(1, 'approved');
QUERY PLAN
------------------------------------------------------------------------------------------
Result (cost=0.00..0.26 rows=1 width=4) (actual time=2193.689..2193.689 rows=1 loops=1)
Storage Table Read Requests: 2
Storage Table Read Execution Time: 0.519 ms
Storage Table Rows Scanned: 2
Storage Table Write Requests: 4
Storage Flush Requests: 2
Storage Flush Execution Time: 156.895 ms
Planning Time: 0.019 ms
Execution Time: 2461.330 ms
Storage Read Requests: 2
Storage Read Execution Time: 0.519 ms
Storage Rows Scanned: 2
Storage Write Requests: 4
Catalog Read Requests: 20
Catalog Read Execution Time: 1326.103 ms
Catalog Write Requests: 0
Storage Flush Requests: 2
Storage Flush Execution Time: 156.895 ms
Storage Execution Time: 1483.516 ms
Peak Memory Usage: 24 kB
(20 rows)
SELECT * FROM table_a ORDER BY id;
id | status | reference_code
----+----------+----------------
1 | approved | REF123
2 | pending | REF456
3 | pending | REF789
(3 rows)
SELECT * FROM table_b;
code | processed
--------+-----------
REF456 | f
REF789 | f
REF123 | t
(3 rows)
SELECT * FROM table_c ORDER BY id;
id | code | source | created_at
----+--------+--------+----------------------------
1 | REF123 | system | 2025-05-01 20:24:01.025168
(1 row)
SELECT * FROM table_d ORDER BY id;
id | code | details | created_at
----+--------+-----------------------------------------+----------------------------
1 | REF123 | Processed at 2025-05-01 20:24:01.025168 | 2025-05-01 20:24:01.025168
(1 row)
Note that the total execution time was 2.461 seconds, with the four write requests requiring two separate storage flushes.
Next, let’s create a stored function that encapsulates the same logic as above, but this time executes it using a single SQL statement:
CREATE OR REPLACE FUNCTION update_status_and_cascade_cte(
    target_id INT,
    new_status TEXT
) RETURNS VOID
LANGUAGE SQL
AS $$
    WITH updated_a AS (
        UPDATE table_a
        SET status = new_status
        WHERE id = target_id
        RETURNING reference_code
    ),
    updated_b AS (
        UPDATE table_b
        SET processed = TRUE
        WHERE code IN (SELECT reference_code FROM updated_a)
        RETURNING code
    ),
    inserted_c AS (
        INSERT INTO table_c (code, source, created_at)
        SELECT code, 'system', CURRENT_TIMESTAMP
        FROM updated_b
        RETURNING *
    ),
    inserted_d AS (
        INSERT INTO table_d (code, details, created_at)
        SELECT code, 'Processed at ' || CURRENT_TIMESTAMP, CURRENT_TIMESTAMP
        FROM updated_b
        RETURNING *
    )
    SELECT;
$$;
Now, let’s execute this new function using EXPLAIN (ANALYZE, DIST) to observe storage access patterns and performance metrics.
EXPLAIN (ANALYZE, DIST) SELECT update_status_and_cascade_cte(2, 'approved');
QUERY PLAN
----------------------------------------------------------------------------------------
Result (cost=0.00..0.26 rows=1 width=4) (actual time=760.674..760.674 rows=1 loops=1)
Storage Table Read Requests: 1
Storage Table Read Execution Time: 0.233 ms
Storage Table Rows Scanned: 1
Storage Table Write Requests: 4
Planning Time: 666.188 ms
Execution Time: 788.145 ms
Storage Read Requests: 1
Storage Read Execution Time: 0.233 ms
Storage Rows Scanned: 1
Storage Write Requests: 4
Catalog Read Requests: 21
Catalog Read Execution Time: 1397.474 ms
Catalog Write Requests: 0
Storage Flush Requests: 1
Storage Flush Execution Time: 27.391 ms
Storage Execution Time: 1425.099 ms
Peak Memory Usage: 358 kB
(18 rows)
SELECT * FROM table_a ORDER BY id;
id | status | reference_code
----+----------+----------------
1 | approved | REF123
2 | approved | REF456
3 | pending | REF789
(3 rows)
SELECT * FROM table_b;
code | processed
--------+-----------
REF456 | t
REF789 | f
REF123 | t
(3 rows)
SELECT * FROM table_c ORDER BY id;
id | code | source | created_at
----+--------+--------+----------------------------
1 | REF123 | system | 2025-05-01 20:24:01.025168
2 | REF456 | system | 2025-05-01 20:24:04.272399
(2 rows)
SELECT * FROM table_d ORDER BY id;
id | code | details | created_at
----+--------+--------------------------------------------+----------------------------
1 | REF123 | Processed at 2025-05-01 20:24:01.025168 | 2025-05-01 20:24:01.025168
2 | REF456 | Processed at 2025-05-01 20:24:04.272399+00 | 2025-05-01 20:24:04.272399
(2 rows)
Notably, the total execution time for the same workload was reduced to just 788 milliseconds, with all four write requests batched into a single storage flush.
That’s an improvement of roughly 68%!
yugabyte=# SELECT ((2.461 - .788) / 2.461 * 100)::NUMERIC(7,2) "% Improvement";
% Improvement
---------------
67.98
(1 row)
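One last design note: because update_status_and_cascade_cte returns VOID, its final SELECT; discards everything the CTEs produce. If the caller needs feedback, one variation (a sketch, not part of the benchmark above) is to count the rows the cascade touched via the RETURNING clauses:

CREATE OR REPLACE FUNCTION update_status_and_cascade_cte_count(
    target_id INT,
    new_status TEXT
) RETURNS BIGINT
LANGUAGE SQL
AS $$
    WITH updated_a AS (
        UPDATE table_a
        SET status = new_status
        WHERE id = target_id
        RETURNING reference_code
    ),
    updated_b AS (
        UPDATE table_b
        SET processed = TRUE
        WHERE code IN (SELECT reference_code FROM updated_a)
        RETURNING code
    ),
    inserted_c AS (
        INSERT INTO table_c (code, source, created_at)
        SELECT code, 'system', CURRENT_TIMESTAMP
        FROM updated_b
        RETURNING id
    ),
    inserted_d AS (
        INSERT INTO table_d (code, details, created_at)
        SELECT code, 'Processed at ' || CURRENT_TIMESTAMP, CURRENT_TIMESTAMP
        FROM updated_b
        RETURNING id
    )
    -- 0 means target_id matched no row in table_a
    SELECT (SELECT count(*) FROM inserted_c) + (SELECT count(*) FROM inserted_d);
$$;

-- Example call: returns 2 when the cascade inserts one row into each audit table
SELECT update_status_and_cascade_cte_count(3, 'approved');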
Final Thoughts
Using data-modifying CTEs isn’t just a Postgres trick — it’s a best practice when working with distributed systems like YugabyteDB. They give you better performance, stronger guarantees, and cleaner code — all in one go.
If you’re already using YugabyteDB’s Postgres-compatible SQL, you’re just one WITH clause away from safer, smarter data logic!
Have Fun!