When Generated Columns and Indexes Turn a Simple JSONB UPDATE into a Distributed Write Amplifier

The the previous tip, Update JSONB the Smart Way with jsonb_set() and the || Merge Operator, we looked at two cleaner ways to update JSONB in YSQL:

  • jsonb_set()
  • ● JSONB merge operator: ||

Those patterns are useful because they let your application update only the part of the JSONB document that actually changed.

That is good application design.

But in YugabyteDB, it can also be very important for write performance.

Why?

Because a simple-looking JSONB update may not be simple under the hood, especially when that JSONB column feeds STORED generated columns, and those generated columns are used in secondary indexes.

Generated columns add some work.

Secondary indexes are usually the real multiplier.

The Pattern

Here is a simplified table that uses JSONB as the source of several STORED generated columns.

				
					DROP TABLE IF EXISTS account_profile_demo;

CREATE TABLE account_profile_demo (
    id UUID PRIMARY KEY,
    data JSONB NOT NULL,
    updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),

    account_id TEXT GENERATED ALWAYS AS (data ->> 'account_id') STORED,
    account_status TEXT GENERATED ALWAYS AS (data ->> 'status') STORED,
    account_name TEXT GENERATED ALWAYS AS (data ->> 'name') STORED,
    plan_name TEXT GENERATED ALWAYS AS (data ->> 'plan') STORED
);
				
			

Insert a row:

				
					INSERT INTO account_profile_demo (id, data)
VALUES (
    '11111111-1111-1111-1111-111111111111',
    '{
       "account_id": "A-100",
       "status": "active",
       "name": "Acme Corp",
       "plan": "basic",
       "notes": "original notes"
     }'
);
				
			

This gives you JSONB flexibility, while also exposing selected JSON values as normal SQL columns.

For example:

				
					SELECT
    id,
    account_id,
    account_status,
    account_name,
    plan_name
FROM account_profile_demo;
				
			

Result:

				
					.                 id                  | account_id | account_status | account_name | plan_name
--------------------------------------+------------+----------------+--------------+-----------
 11111111-1111-1111-1111-111111111111 | A-100      | active         | Acme Corp    | basic
				
			

So far, so good.

Generated columns can be a nice bridge between flexible JSONB storage and normal SQL queries.

Now Add Secondary Indexes

The generated columns become even more useful when they are indexed.

				
					CREATE INDEX account_profile_demo_status_idx
ON account_profile_demo (account_status);

CREATE INDEX account_profile_demo_account_id_idx
ON account_profile_demo (account_id);

CREATE INDEX account_profile_demo_name_idx
ON account_profile_demo (account_name);

CREATE INDEX account_profile_demo_status_plan_idx
ON account_profile_demo (account_status, plan_name);
				
			

These indexes can make read queries faster:

				
					SELECT id, account_name, plan_name
FROM account_profile_demo
WHERE account_status = 'active';
				
			

or…

				
					SELECT id, account_status
FROM account_profile_demo
WHERE account_id = 'A-100';
				
			

That is the read-side benefit.

But every secondary index also adds write-side responsibility.

The Simple-Looking UPDATE

Now consider this update pattern:

				
					UPDATE account_profile_demo
SET updated_at = now(),
    data = '{
       "account_id": "A-100",
       "status": "inactive",
       "name": "Acme Corp",
       "plan": "basic",
       "notes": "changed notes"
     }'::jsonb
WHERE id = '11111111-1111-1111-1111-111111111111';
				
			

At first glance, this looks like a single-row update by primary key.

But it is doing more than just replacing a JSONB value.

Because the generated columns are STORED, YugabyteDB has to calculate the generated column values from the new JSONB document. And because those generated columns are indexed, YugabyteDB also has to keep the secondary indexes transactionally consistent with the base table.

That is where the write amplification comes from.

Generated Columns Are Part of the Cost

A STORED generated column is calculated during writes and physically stored with the row.

For example:

				
					account_status TEXT GENERATED ALWAYS AS (data ->> 'status') STORED
				
			

Whenever the data column changes, YugabyteDB has to evaluate the expression again.

So this JSONB update:

				
					SET data = ...
				
			

also means YugabyteDB must recalculate:

				
					account_id
account_status
account_name
plan_name
				
			

That adds CPU work. But for a single row, recalculating a few generated expressions is usually not the biggest issue.

The bigger issue appears when those generated columns are indexed.

Indexes Are Usually the Real Culprit

When an indexed value changes, the database cannot simply overwrite the index entry. It has to remove the old index entry and insert the new one.

Otherwise, the old index value could still point to a row that no longer matches.

That means an update to an indexed generated column may require this sequence:

				
					Read the old row
  ↓
Find the old indexed values
  ↓
Write delete markers for old secondary index entries
  ↓
Write new secondary index entries
  ↓
Write the updated base row
				
			

Because YugabyteDB uses an LSM-based storage engine, those old index entries are not necessarily removed from disk immediately.

They are logically deleted first, typically using tombstones/delete markers, and are physically cleaned up later during compaction.

In a single-node database, that work is local.

In YugabyteDB, secondary indexes are distributed objects. The base table and index tablets may live on different nodes. So maintaining those indexes can require distributed coordination.

That is why a simple-looking update can become much more expensive than expected.

Key Insight

Generated columns add work because they must be recalculated. Secondary indexes multiply the work because every changed indexed value must be maintained transactionally.

Why Full JSONB Replacement Can Be Expensive

When the application sends the entire JSONB document back to the database:

				
					UPDATE account_profile_demo
SET data = $1::jsonb
WHERE id = $2::uuid;
				
			

…YugabyteDB receives a new JSONB value. If that JSONB column feeds multiple generated columns, and those generated columns feed multiple indexes, the database has to determine what changed and maintain the affected indexes.

The more generated columns and indexes involved, the larger the write multiplier can become.

Demo: See the Write Amplification

Create a small demo table.

				
					DROP TABLE IF EXISTS json_write_amp_demo;

CREATE TABLE json_write_amp_demo (
    id UUID PRIMARY KEY,
    data JSONB NOT NULL,

    key1 TEXT GENERATED ALWAYS AS (data ->> 'key1') STORED,
    key2 TEXT GENERATED ALWAYS AS (data ->> 'key2') STORED,
    key3 TEXT GENERATED ALWAYS AS (data ->> 'key3') STORED,
    key4 TEXT GENERATED ALWAYS AS (data ->> 'key4') STORED
);
				
			

Insert one row:

				
					INSERT INTO json_write_amp_demo (id, data)
VALUES (
    '11111111-1111-1111-1111-111111111111',
    '{
       "key1": "value1",
       "key2": "value2",
       "key3": "value3",
       "key4": "value4"
     }'
);
				
			

Now add indexes on two generated columns:

				
					CREATE INDEX json_write_amp_demo_key1_idx
ON json_write_amp_demo (key1);

CREATE INDEX json_write_amp_demo_key4_idx
ON json_write_amp_demo (key4);
				
			

Now run an update that changes the indexed JSONB keys:

				
					EXPLAIN (ANALYZE, DIST)
UPDATE json_write_amp_demo
SET data = '{
   "key1": "new_value1",
   "key2": "value2",
   "key3": "value3",
   "key4": "new_value4"
 }'::jsonb
WHERE id = '11111111-1111-1111-1111-111111111111';
				
			

The plan shows:

				
					Storage Table Read Requests: 1
Storage Table Write Requests: 1
Storage Index Write Requests: 4
Storage Write Requests: 5
				
			

That is the write amplification. This was one logical row update by primary key.

But because the update changed two generated columns that participate in secondary indexes, YugabyteDB had to maintain those index entries too.

Conceptually, the work looks like this:

				
					1 write  -> updated base table row

2 writes -> delete old key1 index entry
            insert new key1 index entry

2 writes -> delete old key4 index entry
            insert new key4 index entry
				
			

So one logical update became five storage write requests.

Key Insight

This update touched one row, but it changed two indexed generated-column values. YugabyteDB therefore wrote the base table row and maintained both secondary indexes, resulting in five total storage write requests.

What Changes with a Targeted JSONB Update?

Now let’s compare the full JSONB replacement with targeted JSONB updates.

In the previous example, the full JSONB replacement changed key1 and key4.

Both of those JSONB keys feed generated columns that are indexed.

That update produced:

				
					Storage Table Read Requests: 1
Storage Table Write Requests: 1
Storage Index Write Requests: 4
Storage Write Requests: 5
				
			

That is the write amplification.

One logical row update became five storage write requests because YugabyteDB had to update the base table row and maintain two secondary indexes.

Now update only an unindexed key.

In this example, key2 is a generated column, but it is not indexed.

				
					EXPLAIN (ANALYZE, DIST)
UPDATE json_write_amp_demo
SET data = jsonb_set(data, '{key2}', '"safe_update"', true)
WHERE id = '11111111-1111-1111-1111-111111111111';
				
			

The plan shows:

				
					Storage Table Read Requests: 1
Storage Table Write Requests: 1
Storage Write Requests: 1
				
			

Notice what is missing:

				
					Storage Index Write Requests
				
			

That is the key point.

This update changed the JSONB document, and it changed the generated value for key2. But key2 is not part of a secondary index. The secondary indexes are on key1 and key4, and neither of those indexed values changed.

So YugabyteDB only had to write the updated base table row. No secondary index entries had to be marked deleted or inserted.

You can see the same behavior with the JSONB merge operator.

Now update only key3:

				
					EXPLAIN (ANALYZE, DIST)
UPDATE json_write_amp_demo
SET data = data || '{"key3": "merged_update"}'::jsonb
WHERE id = '11111111-1111-1111-1111-111111111111';
				
			

The plan again shows:

				
					Storage Table Read Requests: 1
Storage Table Write Requests: 1
Storage Write Requests: 1
				
			

And again, there are no secondary index write requests.

This update changed the JSONB document and the generated value for key3. But key3 is not indexed.
The indexed generated values are key1 and key4, and neither changed.

So YugabyteDB only had to write the updated base table row. No secondary index entries had to be marked deleted or inserted.

That is the optimization opportunity.

Targeted JSONB updates do not magically make every update cheap. If you update a JSONB key that feeds an indexed generated column, the secondary index still has to be maintained.

But if you update JSONB keys that do not affect indexed generated columns, targeted updates can avoid unnecessary secondary index churn.

Key Insight

Both jsonb_set() and the JSONB merge operator still update the JSONB document. The benefit is that when the changed JSONB keys do not affect indexed generated columns, YugabyteDB can avoid secondary index maintenance and perform only the base table write.

Demo Comparison

Update Changed JSONB Keys Indexed Generated Values Changed? Storage Index Write Requests Total Storage Write Requests
Full JSONB replacement key1, key4 Yes 4 5
jsonb_set() targeted update key2 No 0 1
data || ... merge update key3 No 0 1

The difference is clear.

The full replacement changed indexed generated values, so YugabyteDB had to perform secondary index maintenance. The targeted updates changed only unindexed generated values, so YugabyteDB avoided secondary index writes. 

That is why targeted JSONB updates can matter so much on write-heavy tables with generated columns and secondary indexes.

Final Takeaway

JSONB gives you flexibility.

Generated columns make important JSONB values easier to query.

Secondary indexes make those generated columns fast to search.

But together, they can also make writes more expensive.

In the demo, one full JSONB replacement changed two indexed generated-column values: key1 and key4.

That single-row update required:

  • ● 1 base table write
  • ● 4 secondary index writes
  • ● 5 total storage write requests

The targeted updates were different.

Both jsonb_set() and the JSONB merge operator changed the JSONB document. But they changed only unindexed generated values: key2 and key3. Because the indexed values did not change, YugabyteDB avoided secondary index writes and performed only one storage write request.

That is the main lesson.

Targeted JSONB updates do not make every update cheap. If you change a JSONB key that feeds an indexed generated column, the secondary index still has to be maintained.

But when your application only changes unindexed JSONB fields, avoid replacing the entire document. Use jsonb_set() or || so YugabyteDB can avoid unnecessary index churn.

Key Takeaway

In YugabyteDB, the expensive part of a JSONB update is often not the JSONB column itself. It is the generated columns and secondary indexes connected to it. Update only the JSONB keys that actually changed, and YugabyteDB may be able to avoid unnecessary distributed index writes.

Have Fun!

My wife and I had the opportunity to visit the USS Arizona Memorial at Pearl Harbor. Standing before the list of names was deeply humbling... a quiet reminder of the lives lost, the families forever changed, and the sacrifice that should never be forgotten. A truly sobering experience.