Introduction
One of the most common questions about YugabyteDB’s xCluster active-active replication is:
- “What happens if two universes update the same row at nearly the same time?”
The answer: YugabyteDB uses the last-writer-wins (LWW) strategy, driven by the Hybrid Logical Clock (HLC).
● Each write gets a globally comparable HLC timestamp.
● When conflicts arise, the write with the later HLC wins.
● The losing update is overwritten … no merge, no error.
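Conceptually, the resolution rule can be sketched in a few lines. This is illustrative Python only, not YugabyteDB's actual implementation: it just models an HLC timestamp as a (physical microseconds, logical counter) pair, compared lexicographically, with the later pair winning.

```python
# Illustrative last-writer-wins sketch (NOT YugabyteDB internals):
# an HLC timestamp is modeled as (physical_usec, logical_counter),
# compared lexicographically; the write with the later HLC survives.

def resolve_lww(write_a, write_b):
    """Each write is (hlc_tuple, value); return the surviving value."""
    return max(write_a, write_b, key=lambda w: w[0])[1]

# Two conflicting updates to the same row, microseconds apart:
w1 = ((1759267060256437, 0), "Cluster 2 Update")
w2 = ((1759267060257124, 0), "Cluster 1 Update")
print(resolve_lww(w1, w2))  # → Cluster 1 Update (later HLC wins)
```

If the physical components tie, the 12-bit logical counter breaks the tie, which is exactly why the pair is compared lexicographically.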
In this YugabyteDB Tip we’ll demo this behavior step by step, using yugabyted to spin up two local universes.
Special Note:
Before setting up xCluster replication between two “production” class YugabyteDB universes, be sure to read the Inconsistencies affecting transactions section in the official YugabyteDB documentation to understand key transactional nuances. It’s an important read!
1. Setup: Two Local Universes
yugabyted start --base_dir=~/yb01 --advertise_address=127.0.0.1 > start1.log
yugabyted start --base_dir=~/yb02 --advertise_address=127.0.0.2 > start2.log
2. Create a test table on each universe
ysqlsh -h 127.0.0.1 -c "CREATE TABLE test(id INT PRIMARY KEY, c1 TEXT, mofified_ts TIMESTAMP DEFAULT clock_timestamp());"
ysqlsh -h 127.0.0.2 -c "CREATE TABLE test(id INT PRIMARY KEY, c1 TEXT, mofified_ts TIMESTAMP DEFAULT clock_timestamp());"
3. Set up bi-directional xCluster
We’ll reuse the setup commands that were covered in the YugabyteDB Tip
Quickly Test xCluster Active-Active Multi-Master Replication with yugabyted.
First, retrieve the YSQL table ID of the test table from each Universe:
[root@localhost ~]# ysqlsh -h 127.0.0.1 -Atc "SELECT '0000' || lpad(to_hex(d.oid::int), 4, '0') || '00003000800000000000' || lpad(to_hex(c.oid::int), 4, '0') FROM pg_class c, pg_namespace n, pg_database d WHERE c.relname = 'test' AND c.relnamespace = n.oid AND d.datname = current_database();"
000034cb000030008000000000004000
[root@localhost ~]# ysqlsh -h 127.0.0.2 -Atc "SELECT '0000' || lpad(to_hex(d.oid::int), 4, '0') || '00003000800000000000' || lpad(to_hex(c.oid::int), 4, '0') FROM pg_class c, pg_namespace n, pg_database d WHERE c.relname = 'test' AND c.relnamespace = n.oid AND d.datname = current_database();"
000034cb000030008000000000004000
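As a side note, the 32-character table ID that query assembles is just a fixed layout wrapped around the hex-encoded database and relation OIDs. A small illustrative Python equivalent, using the OIDs from this demo run:

```python
# Sketch of how the query above assembles the 32-hex-char YSQL table ID:
# fixed prefix + 4-hex-digit database OID + fixed middle + 4-hex-digit
# relation OID (assumes both OIDs fit in four hex digits, as in this demo).

def ysql_table_id(db_oid: int, rel_oid: int) -> str:
    return ("0000" + format(db_oid, "04x")
            + "00003000800000000000" + format(rel_oid, "04x"))

# OIDs observed in the demo: database 0x34cb, table 0x4000
print(ysql_table_id(0x34CB, 0x4000))  # → 000034cb000030008000000000004000
```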
Next, let’s set up xCluster replication from Source Cluster C1 to Target Cluster C2:
[root@localhost ~]# yb-admin -master_addresses 127.0.0.2:7100 setup_universe_replication universe_c1 127.0.0.1:7100 000034cb000030008000000000004000
Replication setup successfully
Finally, let’s set up xCluster replication from Source Cluster C2 to Target Cluster C1:
[root@localhost ~]# yb-admin -master_addresses 127.0.0.1:7100 setup_universe_replication universe_c2 127.0.0.2:7100 000034cb000030008000000000004000
Replication setup successfully
4. Load some sample data
Insert data into the test table on Universe C1… it will be replicated to Universe C2:
[root@localhost ~]# ysqlsh -h 127.0.0.1 -c "INSERT INTO test (id, c1) SELECT g, md5(random()::TEXT) FROM generate_series(1, 10) g;"
INSERT 0 10
[root@localhost ~]# ysqlsh -h 127.0.0.1 -c "SELECT * FROM test ORDER BY id;"
id | c1 | mofified_ts
----+----------------------------------+----------------------------
1 | 262be0c0ec1ba9b810aa8195c35d07fb | 2025-09-30 19:51:50.390403
2 | fb72dbf6ab3bc1f6ff8251edaa869fa7 | 2025-09-30 19:51:50.393087
3 | d11186f2588b0a352f6fc68102a7c26e | 2025-09-30 19:51:50.393099
4 | 63bb7e02a50a1c7ed3b339031bc8f601 | 2025-09-30 19:51:50.393106
5 | 13e681376ef3b219e1734dc35325e2fb | 2025-09-30 19:51:50.393111
6 | 5286df163f659a56e3cbc6dd8621097f | 2025-09-30 19:51:50.393117
7 | b73c3c5c3d6055a0ddcd6c215a15c411 | 2025-09-30 19:51:50.393122
8 | 3e7d18b88c36051fe2c68382d5e6bdf6 | 2025-09-30 19:51:50.393128
9 | 3deb1124f9ff724dca590f33d59922a7 | 2025-09-30 19:51:50.393137
10 | b10b2b70d8581e746ab9fa14751f867f | 2025-09-30 19:51:50.393143
(10 rows)
[root@localhost ~]# ysqlsh -h 127.0.0.2 -c "SELECT * FROM test ORDER BY id;"
id | c1 | mofified_ts
----+----------------------------------+----------------------------
1 | 262be0c0ec1ba9b810aa8195c35d07fb | 2025-09-30 19:51:50.390403
2 | fb72dbf6ab3bc1f6ff8251edaa869fa7 | 2025-09-30 19:51:50.393087
3 | d11186f2588b0a352f6fc68102a7c26e | 2025-09-30 19:51:50.393099
4 | 63bb7e02a50a1c7ed3b339031bc8f601 | 2025-09-30 19:51:50.393106
5 | 13e681376ef3b219e1734dc35325e2fb | 2025-09-30 19:51:50.393111
6 | 5286df163f659a56e3cbc6dd8621097f | 2025-09-30 19:51:50.393117
7 | b73c3c5c3d6055a0ddcd6c215a15c411 | 2025-09-30 19:51:50.393122
8 | 3e7d18b88c36051fe2c68382d5e6bdf6 | 2025-09-30 19:51:50.393128
9 | 3deb1124f9ff724dca590f33d59922a7 | 2025-09-30 19:51:50.393137
10 | b10b2b70d8581e746ab9fa14751f867f | 2025-09-30 19:51:50.393143
(10 rows)
5. HLC Decoder Helper
In a previous tip, Peeking into YugabyteDB’s Hybrid Logical Clock (HLC), we learned about a handy little helper function that can decode an HLC bigint directly inside YSQL. Let’s create it in each of our universes.
CREATE OR REPLACE FUNCTION decode_hybrid_time(ht_lsn bigint)
RETURNS TABLE(physical_usec bigint, physical_ts timestamptz, logical_counter int) AS $$
BEGIN
-- Extract physical microseconds
physical_usec := ht_lsn >> 12;
-- Convert to timestamp
physical_ts := to_timestamp(physical_usec / 1000000.0);
-- Extract logical counter (lower 12 bits)
logical_counter := ht_lsn & ((1<<12)-1);
RETURN NEXT;
END;
$$ LANGUAGE plpgsql;
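For reference, the same bit arithmetic can be mirrored outside the database. Here is a small Python sketch of the decoder (same shift and mask as the YSQL function above; the sample input is built from a physical timestamp observed later in this demo):

```python
from datetime import datetime, timedelta, timezone

# Python mirror of decode_hybrid_time(): the high bits of an HLC value
# hold physical microseconds since epoch, the low 12 bits hold the
# logical counter.

def decode_hybrid_time(ht: int):
    physical_usec = ht >> 12
    logical_counter = ht & ((1 << 12) - 1)
    # Build the timestamp from integer parts to avoid float rounding
    physical_ts = (datetime.fromtimestamp(physical_usec // 1_000_000, tz=timezone.utc)
                   + timedelta(microseconds=physical_usec % 1_000_000))
    return physical_usec, physical_ts, logical_counter

usec, ts, logical = decode_hybrid_time((1759267060212065 << 12) | 0)
print(usec, ts, logical)  # physical_ts → 2025-09-30 21:17:40.212065+00:00
```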
6. The Conflict Script
Save the following script as lww_test.sh. This will fire an update in Cluster 1 (127.0.0.1:5433) and Cluster 2 (127.0.0.2:5433) at the same time, then capture the HLC before/after each update.
#!/usr/bin/env bash
set -euo pipefail
# Connection endpoints (same port, different loopback IPs)
A_HOST=127.0.0.1 # Cluster 1
B_HOST=127.0.0.2 # Cluster 2
PORT=5433
DB=yugabyte
# Pick a common fire time a few seconds in the future (UTC)
TARGET_UTC=$(date -u -d 'now + 4 seconds' '+%Y-%m-%d %H:%M:%S.%6N+00')
TARGET_EPOCH=$(date -ud "$TARGET_UTC" +%s.%6N)
echo "Target UTC: $TARGET_UTC"
echo "Launching concurrent updates..."
run_race() {
local HOST="$1"
local VALUE="$2"
local LABEL="$3"
ysqlsh -h "$HOST" -p "$PORT" -d "$DB" \
--set=ON_ERROR_STOP=on \
-v target_epoch="$TARGET_EPOCH" \
-v val="$VALUE" \
-v label="$LABEL" <<'SQL_EOF'
\pset pager off
\timing off
SELECT format('=== %s: starting transaction ===', :'label') AS info;
BEGIN;
-- 1) Sleep until the shared target time so both clusters fire together
WITH t AS (SELECT to_timestamp(:'target_epoch')::timestamptz AS tgt),
d AS (
SELECT GREATEST(0, EXTRACT(EPOCH FROM ((SELECT tgt FROM t) - clock_timestamp()))) AS sec_to_sleep
)
SELECT pg_sleep((SELECT sec_to_sleep FROM d));
-- 2) Snapshot local HLC *immediately* before the UPDATE
WITH s AS (SELECT (decode_hybrid_time(yb_get_current_hybrid_time_lsn())).*)
SELECT :'label' AS cluster, 'PRE_HLC' AS tag, s.physical_usec, s.physical_ts, s.logical_counter FROM s;
-- 3) The conflicting UPDATE
UPDATE public.test
SET c1 = :'val',
mofified_ts = clock_timestamp()
WHERE id = 1;
-- 4) Snapshot local HLC again, still inside the txn
WITH s AS (SELECT (decode_hybrid_time(yb_get_current_hybrid_time_lsn())).*)
SELECT :'label' AS cluster, 'POST_HLC' AS tag, s.physical_usec, s.physical_ts, s.logical_counter FROM s;
COMMIT;
-- 5) Show the row and one more HLC sample after commit
SELECT :'label' AS cluster, id, c1, mofified_ts FROM public.test WHERE id = 1;
WITH s AS (SELECT (decode_hybrid_time(yb_get_current_hybrid_time_lsn())).*)
SELECT :'label' AS cluster, 'AFTER_COMMIT_HLC' AS tag, s.physical_usec, s.physical_ts, s.logical_counter FROM s;
SELECT format('=== %s: finished ===', :'label') AS info;
SQL_EOF
}
# Fire both in parallel (label each universe)
run_race "$A_HOST" "Cluster 1 Update" "Cluster 1 (127.0.0.1)" & pidA=$!
run_race "$B_HOST" "Cluster 2 Update" "Cluster 2 (127.0.0.2)" & pidB=$!
wait "$pidA" "$pidB"
echo "Both updates finished."
echo
echo "=== Result after replication settles ==="
sleep 2 # adjust if your xCluster lag is larger
echo "-- Cluster 1 view:"
ysqlsh -h "$A_HOST" -p "$PORT" -d "$DB" -c "SELECT 'Cluster 1 (127.0.0.1)' AS cluster, id, c1, mofified_ts FROM public.test WHERE id = 1;"
echo "-- Cluster 2 view:"
ysqlsh -h "$B_HOST" -p "$PORT" -d "$DB" -c "SELECT 'Cluster 2 (127.0.0.2)' AS cluster, id, c1, mofified_ts FROM public.test WHERE id = 1;"
# Determine the winner by reading the final value (it propagates to both clusters)
FINAL_VAL=$(ysqlsh -qAt -h "$A_HOST" -p "$PORT" -d "$DB" -c "SELECT c1 FROM public.test WHERE id = 1;")
echo
echo "=== Winner (Last Writer Wins) ==="
if [[ "$FINAL_VAL" == "Cluster 1 Update" ]]; then
echo "Winner: Cluster 1 (127.0.0.1) — its HLC was later."
elif [[ "$FINAL_VAL" == "Cluster 2 Update" ]]; then
echo "Winner: Cluster 2 (127.0.0.2) — its HLC was later."
else
echo "Winner could not be determined from value (unexpected c1='$FINAL_VAL')."
fi
Make it executable and run:
chmod +x lww_test.sh
./lww_test.sh
7. Example Output
[root@localhost ~]# ./lww_test.sh
Target UTC: 2025-09-30 21:17:40.201450+00
Launching concurrent updates...
Pager usage is off.
Timing is off.
Pager usage is off.
Timing is off.
info
-----------------------------------------------------
=== Cluster 1 (127.0.0.1): starting transaction ===
(1 row)
BEGIN
info
-----------------------------------------------------
=== Cluster 2 (127.0.0.2): starting transaction ===
(1 row)
BEGIN
pg_sleep
----------
(1 row)
pg_sleep
----------
(1 row)
cluster | tag | physical_usec | physical_ts | logical_counter
-----------------------+---------+------------------+-------------------------------+-----------------
Cluster 1 (127.0.0.1) | PRE_HLC | 1759267060212065 | 2025-09-30 21:17:40.212065+00 | 0
(1 row)
cluster | tag | physical_usec | physical_ts | logical_counter
-----------------------+---------+------------------+-------------------------------+-----------------
Cluster 2 (127.0.0.2) | PRE_HLC | 1759267060212139 | 2025-09-30 21:17:40.212139+00 | 0
(1 row)
UPDATE 1
cluster | tag | physical_usec | physical_ts | logical_counter
-----------------------+----------+------------------+-------------------------------+-----------------
Cluster 2 (127.0.0.2) | POST_HLC | 1759267060256437 | 2025-09-30 21:17:40.256437+00 | 0
(1 row)
UPDATE 1
COMMIT
cluster | tag | physical_usec | physical_ts | logical_counter
-----------------------+----------+------------------+-------------------------------+-----------------
Cluster 1 (127.0.0.1) | POST_HLC | 1759267060257124 | 2025-09-30 21:17:40.257124+00 | 0
(1 row)
COMMIT
cluster | id | c1 | mofified_ts
-----------------------+----+------------------+----------------------------
Cluster 2 (127.0.0.2) | 1 | Cluster 2 Update | 2025-09-30 21:17:40.253473
(1 row)
cluster | tag | physical_usec | physical_ts | logical_counter
-----------------------+------------------+------------------+-------------------------------+-----------------
Cluster 2 (127.0.0.2) | AFTER_COMMIT_HLC | 1759267060258224 | 2025-09-30 21:17:40.258224+00 | 0
(1 row)
info
-----------------------------------------
=== Cluster 2 (127.0.0.2): finished ===
(1 row)
cluster | id | c1 | mofified_ts
-----------------------+----+------------------+----------------------------
Cluster 1 (127.0.0.1) | 1 | Cluster 1 Update | 2025-09-30 21:17:40.254289
(1 row)
cluster | tag | physical_usec | physical_ts | logical_counter
-----------------------+------------------+------------------+------------------------------+-----------------
Cluster 1 (127.0.0.1) | AFTER_COMMIT_HLC | 1759267060258850 | 2025-09-30 21:17:40.25885+00 | 0
(1 row)
info
-----------------------------------------
=== Cluster 1 (127.0.0.1): finished ===
(1 row)
Both updates finished.
=== Result after replication settles ===
-- Cluster 1 view:
cluster | id | c1 | mofified_ts
-----------------------+----+------------------+----------------------------
Cluster 1 (127.0.0.1) | 1 | Cluster 1 Update | 2025-09-30 21:17:40.254289
(1 row)
-- Cluster 2 view:
cluster | id | c1 | mofified_ts
-----------------------+----+------------------+----------------------------
Cluster 2 (127.0.0.2) | 1 | Cluster 1 Update | 2025-09-30 21:17:40.254289
(1 row)
=== Winner (Last Writer Wins) ===
Winner: Cluster 1 (127.0.0.1) — its HLC was later.
🕰️ Clarifying “Last Writer” vs. “Last Timestamp”
It’s important to pause and clarify what “last writer wins” actually means in a distributed database like YugabyteDB.
The term is sometimes misunderstood… especially when physical clocks are involved.
Note:
- The claim that the last writer wins may be misleading depending on how we define last writer. There’s a subtle but important difference.
- YugabyteDB doesn’t synchronize physical clocks across clusters… it uses hybrid logical clocks (HLC) that combine physical and logical components. While HLCs guarantee monotonicity within a node, they don’t eliminate inter-cluster clock drift.
- As a result, in the presence of drift, it’s possible for two writes from the same client to be applied in a different perceived order.
- For example, imagine a client writes (1,1) to Cluster 1, and immediately afterward writes (1,2) to Cluster 2. If Cluster 2’s physical clock is slightly behind, its hybrid timestamp might still be earlier than the first write’s.
- When replication completes, both clusters will converge to the value (1,1)… consistent everywhere, but not the last value from the client’s perspective.
- This is expected behavior: LWW ensures eventual consistency, not necessarily causal correctness. The “last” winner is determined by timestamp order, not by client-observed order.
- Example:
Client
├── Write (1,1) → Cluster 1 @ 10:00:05.500
└── Write (1,2) → Cluster 2 @ 10:00:04.900 ← clock drift (-600 ms)
Cluster 1: HLC = [10:00:05.500, L=0]
Cluster 2: HLC = [10:00:04.900, L=0]
→ Hybrid timestamp order: (1,2) < (1,1)
→ LWW resolves to (1,1)
Result: Both clusters agree on (1,1) ✅
… but the client’s “last” update (1,2) was lost due to clock skew.
- Clock skew can cause hybrid timestamps to reorder client writes. Both clusters converge consistently (LWW), but the final value may not match the client’s perceived last write.
This nuance is subtle but critical… especially when modeling workloads that depend on strict causality across regions.
If application-level order must be preserved, clients or middleware need to provide additional sequencing guarantees (for example, version vectors or per-key write serialization).
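The timeline above can be simulated in a few lines of illustrative Python (hypothetical timestamps expressed as microseconds within the hour; this is a model of the resolution rule, not YugabyteDB code):

```python
# Reproduce the drift example: the client writes (1,1) then (1,2), but
# Cluster 2's clock is ~600 ms behind, so its hybrid timestamp sorts
# earlier and LWW discards the client's chronologically later write.
# Hybrid timestamps here are (physical_usec_within_the_hour, logical).

first_write  = ((5_500_000, 0), (1, 1))  # Cluster 1 @ 10:00:05.500
second_write = ((4_900_000, 0), (1, 2))  # Cluster 2 @ 10:00:04.900 (skewed)

# LWW keeps the write with the larger hybrid timestamp
winner = max([first_write, second_write], key=lambda w: w[0])[1]
print(winner)  # → (1, 1): consistent everywhere, but not the client's last write
```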
8. Summary of Results
In our run, both clusters fired updates within microseconds of each other. Looking at the POST_HLC timestamps:
● Cluster 2 committed at 2025-09-30 21:17:40.256437+00.
● Cluster 1 committed just slightly later at 2025-09-30 21:17:40.257124+00.
That tiny difference, less than a millisecond, was enough to determine the winner.
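In fact, the winning margin from this run works out to well under a millisecond:

```python
# Margin between the two POST_HLC physical timestamps from the run above
delta_usec = 1759267060257124 - 1759267060256437
print(delta_usec, "microseconds")  # → 687 microseconds (~0.7 ms)
```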
After replication caught up, both clusters agreed on the final state of the row:
id | c1 | mofified_ts
----+-----------------+----------------------------
  1 | Cluster 1 Update | 2025-09-30 21:17:40.254289
Cluster 1’s update survived because its HLC was the larger of the two. This demonstrates exactly how last-writer-wins (LWW) works in YugabyteDB’s xCluster active-active mode:
● When conflicting writes occur, each is stamped with an HLC.
● The update with the later HLC becomes the canonical value.
● The earlier write is deterministically overwritten.
Have Fun!
