Tuning work_mem in YugabyteDB

When it comes to query performance in PostgreSQL-compatible databases like YugabyteDB, one of the most impactful (and often overlooked) settings is work_mem.

By default, YugabyteDB sets work_mem to 4MB, which is the same as PostgreSQL’s default. But depending on your workload, this default might be far too low, especially for analytical queries involving sorts, joins, or aggregates over large datasets.

What Is Work Mem?

work_mem defines the maximum amount of memory each sort, hash, or aggregation operation within a query may use. If an operation exceeds this limit, PostgreSQL (and YugabyteDB) spills to disk using temporary files, which significantly slows down performance.

In queries with multiple such operations (e.g., joins with multiple hash tables or subqueries), each operation gets its own work_mem budget, so a single query can consume several multiples of the setting.
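
For instance, a single query can combine several of these operations. In the hypothetical sketch below (the orders and customers tables are made up for illustration), the join, the aggregate, and the sort can each use up to work_mem at the same time:

-- Hypothetical example: each of the three operations below has its
-- own work_mem budget, so peak usage can approach 3 x work_mem.
SELECT o.customer_id, SUM(o.amount) AS total
  FROM orders o
  JOIN customers c ON c.id = o.customer_id  -- join: may build an in-memory hash table (one budget)
 GROUP BY o.customer_id                     -- aggregate: another budget
 ORDER BY total DESC;                       -- sort: yet another budget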

How to Know if Work Mem Is Too Low

Here are several common signs:

  1. Queries are slow even with small result sets.

  2. You’re seeing a lot of temporary file creation in logs.

  3. You observe disk I/O spikes during large queries.

  4. EXPLAIN (ANALYZE, BUFFERS) shows:

     Sort Method: external merge  Disk: 10240kB
     Batches: 16 (originally 1)  Memory Usage: 3585kB

You can also monitor temporary file usage per statement via pg_stat_statements:

SELECT query, temp_blks_read, temp_blks_written
  FROM pg_stat_statements
 ORDER BY temp_blks_written DESC
 LIMIT 10;

Work Mem Too Low Example

Create and populate a table:

CREATE TABLE big_table (
    id SERIAL PRIMARY KEY,
    data TEXT,
    val INTEGER
);

-- Insert 100,000 rows
INSERT INTO big_table (data, val)
SELECT md5(random()::text),
       (random() * 100000)::int
  FROM generate_series(1, 100000);

With the default value for work_mem, run a sort-heavy query:

yugabyte=# SHOW work_mem;
 work_mem
----------
 4MB
(1 row)

yugabyte=# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM big_table ORDER BY val;
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1935.27..1937.77 rows=1000 width=40) (actual time=114.833..124.400 rows=100000 loops=1)
   Sort Key: val
   Sort Method: external merge  Disk: 5296kB
   Buffers: temp read=662 written=664
   ->  Seq Scan on big_table  (cost=180.00..1885.44 rows=1000 width=40) (actual time=1.373..65.173 rows=100000 loops=1)
 Planning Time: 0.060 ms
 Execution Time: 133.534 ms
 Peak Memory Usage: 5312 kB
(8 rows)

That Sort Method: external merge Disk: line tells you YugabyteDB couldn’t fit the sort in memory, so it spilled to disk.

If we raise work_mem, YugabyteDB may use an in-memory sort method:

yugabyte=# SET work_mem = '10MB';
SET

yugabyte=# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM big_table ORDER BY val;
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1935.27..1937.77 rows=1000 width=40) (actual time=96.199..111.571 rows=100000 loops=1)
   Sort Key: val
   Sort Method: quicksort  Memory: 10176kB
   ->  Seq Scan on big_table  (cost=180.00..1885.44 rows=1000 width=40) (actual time=1.260..58.354 rows=100000 loops=1)
 Planning Time: 0.056 ms
 Execution Time: 120.150 ms
 Peak Memory Usage: 10695 kB
(7 rows)

The Sort Method: quicksort  Memory: line confirms that the sort now completes entirely in memory.

How to Tune Work Mem
  1. Start small and increase cautiously, especially in multi-user or high-concurrency environments.
  2. Use EXPLAIN (ANALYZE, BUFFERS) to check whether the sort is happening in memory or spilling to disk. Keep increasing work_mem until the sort method changes from external to quicksort (or similar).
  3. Set work_mem for the current session, globally via the ysql_pg_conf_csv YB-TServer flag, for a database, for a role, or even at the query level.
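
The session-level approach is demonstrated next. For the other scopes, here is a minimal sketch (the database name, role name, and values are placeholders; adjust them to your environment):

-- Database level: applies to new sessions connecting to reporting_db.
ALTER DATABASE reporting_db SET work_mem = '64MB';

-- Role level: applies to new sessions started by etl_user.
ALTER ROLE etl_user SET work_mem = '128MB';

-- Query/transaction level: SET LOCAL reverts automatically at COMMIT/ROLLBACK.
BEGIN;
SET LOCAL work_mem = '256MB';
-- ... run the memory-hungry query here ...
COMMIT;

Cluster-wide defaults are typically supplied through the ysql_pg_conf_csv YB-TServer flag (for example, by including work_mem=64MB among its comma-separated entries).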

Example (session level):

Set up:

CREATE TABLE big_a (
    id INTEGER,
    payload TEXT
);

CREATE TABLE big_b (
    id INTEGER,
    info TEXT
);

-- Insert 500,000 rows into each table
INSERT INTO big_a
SELECT i, md5(random()::text)
FROM generate_series(1, 500000) i;

INSERT INTO big_b
SELECT i, md5(random()::text)
FROM generate_series(1, 500000) i;

SET work_mem = '65kB';

Note: For demonstration purposes, work_mem is deliberately set to an artificially low value to force the use of temporary files during query execution.

yugabyte=# EXPLAIN (ANALYZE, BUFFERS)
yugabyte-#   SELECT a.payload, b.info
yugabyte-#     FROM big_a a
yugabyte-#     JOIN big_b b ON a.id = b.id;
                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=3717.95..3797.95 rows=5000 width=64) (actual time=1500.308..1815.907 rows=500000 loops=1)
   Merge Cond: (a.id = b.id)
   Buffers: temp read=34392 written=36286
   ->  Sort  (cost=1858.97..1861.47 rows=1000 width=36) (actual time=700.157..764.158 rows=500000 loops=1)
         Sort Key: a.id
         Sort Method: external merge  Disk: 24496kB
         Buffers: temp read=14947 written=15894
         ->  Seq Scan on big_a a  (cost=180.00..1809.15 rows=1000 width=36) (actual time=1.623..222.854 rows=500000 loops=1)
   ->  Sort  (cost=1858.97..1861.47 rows=1000 width=36) (actual time=800.135..854.408 rows=500000 loops=1)
         Sort Key: b.id
         Sort Method: external sort  Disk: 26448kB
         Buffers: temp read=19445 written=20392
         ->  Seq Scan on big_b b  (cost=180.00..1809.15 rows=1000 width=36) (actual time=1.342..210.705 rows=500000 loops=1)
 Planning Time: 0.110 ms
 Execution Time: 1867.724 ms
 Peak Memory Usage: 1491 kB
(16 rows)

The Sort Method: external merge / external sort ... Disk: lines (together with the Buffers: temp read/written counters) mean YugabyteDB couldn’t fit the sorts feeding the merge join in memory and had to spill to disk.

What happens if we set work_mem back to the default value?

yugabyte=# SET work_mem=DEFAULT;
SET

yugabyte=# EXPLAIN (ANALYZE, BUFFERS)
yugabyte-#   SELECT a.payload, b.info
yugabyte-#     FROM big_a a
yugabyte-#   JOIN big_b b ON a.id = b.id;
                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=2001.65..3808.29 rows=5000 width=64) (actual time=508.341..1061.451 rows=500000 loops=1)
   Hash Cond: (a.id = b.id)
   Buffers: temp read=8509 written=8509
   ->  Seq Scan on big_a a  (cost=180.00..1809.15 rows=1000 width=36) (actual time=1.539..226.256 rows=500000 loops=1)
   ->  Hash  (cost=1809.15..1809.15 rows=1000 width=36) (actual time=506.664..506.664 rows=500000 loops=1)
         Buckets: 65536 (originally 1024)  Batches: 16 (originally 1)  Memory Usage: 3585kB
         Buffers: temp written=3426
         ->  Seq Scan on big_b b  (cost=180.00..1809.15 rows=1000 width=36) (actual time=1.790..345.395 rows=500000 loops=1)
 Planning Time: 0.100 ms
 Execution Time: 1103.742 ms
 Peak Memory Usage: 5849 kB
(11 rows)

Now we see the following:

Buffers: temp read=8509 written=8509

This line tells us how much temporary disk I/O happened during query execution due to insufficient memory (i.e., work_mem was too low to keep everything in memory).

Since each buffer corresponds to one 8KB block, that is 8509 × 8KB ≈ 66MB of temporary data read (and about the same amount written) during execution.
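
For an ongoing, cluster-level view of such spills, the standard pg_stat_database view (also available in YSQL) tracks cumulative temporary file counts and bytes per database:

SELECT datname, temp_files, temp_bytes
  FROM pg_stat_database
 ORDER BY temp_bytes DESC;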

Let’s increase work_mem to roughly match that amount (64MB) and rerun the query:

yugabyte=# SET work_mem = '64MB';
SET

yugabyte=# EXPLAIN (ANALYZE, BUFFERS)
yugabyte-#   SELECT a.payload, b.info
yugabyte-#     FROM big_a a
yugabyte-#   JOIN big_b b ON a.id = b.id;
                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=2001.65..3808.29 rows=5000 width=64) (actual time=568.339..1027.787 rows=500000 loops=1)
   Hash Cond: (a.id = b.id)
   ->  Seq Scan on big_a a  (cost=180.00..1809.15 rows=1000 width=36) (actual time=2.055..135.379 rows=500000 loops=1)
   ->  Hash  (cost=1809.15..1809.15 rows=1000 width=36) (actual time=566.253..566.253 rows=500000 loops=1)
         Buckets: 524288 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 39253kB
         ->  Seq Scan on big_b b  (cost=180.00..1809.15 rows=1000 width=36) (actual time=1.351..428.815 rows=500000 loops=1)
 Planning Time: 0.095 ms
 Execution Time: 1076.926 ms
 Peak Memory Usage: 48701 kB
(9 rows)

The Buffers: temp line goes away and the hash no longer uses batching (Batches: 1): the hash table now fits completely in memory!

Best Practices when Tuning Work Mem
  1. For transactional workloads, stick with conservative values (e.g. 4–16MB).

  2. For reporting/ETL/analytical queries, consider boosting work_mem at the session level (e.g. 64–256MB); see the sketch after this list. (Keep in mind that YugabyteDB is primarily optimized for OLTP workloads, so analytical queries may not perform as well as on systems designed specifically for OLAP.)

  3. Monitor temp file usage and disk I/O to find the sweet spot.

  4. Avoid globally high work_mem on busy clusters — it adds up fast if many queries run concurrently.
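
As a sketch of point 2 above, a reporting or ETL session can raise work_mem just for its own queries and put it back afterwards, leaving the cluster-wide default untouched (128MB is only an example value):

SET work_mem = '128MB';   -- raises the limit for this session only
-- ... run the reporting/ETL queries here ...
RESET work_mem;           -- back to the configured default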

Tuning work_mem can dramatically improve performance for memory-intensive operations in YugabyteDB, especially when dealing with large joins, sorts, or aggregations. The default 4MB works fine for OLTP workloads, but distributed SQL at scale often demands more headroom.

Start small, measure, and iterate. YugabyteDB gives you many of the same powerful knobs as PostgreSQL; use them wisely.

Have Fun!