YugabyteDB supports row-based TTL in YCQL (Cassandra-compatible), but not (yet) in YSQL (Postgres-compatible). This feature request is tracked in GitHub Issue: #3712.
If you’re writing SQL apps today in YSQL and want old rows to “expire,” you need a strategy that fits a distributed system where large DELETEs can be expensive.
Good news: there are several TTL-like patterns that work great in YSQL. Some hide expired rows; others purge them efficiently without hammering your YugabyteDB cluster.
TL;DR … what to use when
● Time-range partitioning + drop/detach → fastest & simplest at scale (ideal for time-series and logs).
● Hash-bucketed deletes with pg_cron → when you can’t partition (or for backfills), slice work with yb_hash_code(...).
● “Soft TTL” via RLS or a view → hide old rows instantly; purge later with #1 or #2.
● Batch deletes via ybctid → tiny, locality-friendly batches that play nicely with compaction.
Option 1: Time-Range Partitioning + Drop/Detach 🚀
Partition by time (daily/weekly/monthly). When data ages out, drop or detach the old partition… this is a near-instant metadata operation (no row-by-row deletes).
1A) DIY with native partitions + pg_cron (step-by-step)
(1) Create the partitioned table
Ensure the PRIMARY KEY includes the partition key (created_at)… PostgreSQL requires every unique constraint on a partitioned table to include the partition column.
CREATE TABLE IF NOT EXISTS events (
id BIGSERIAL,
created_at timestamptz NOT NULL,
payload jsonb NOT NULL,
PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);
(2) Create a monthly partition (manually, example)
CREATE TABLE IF NOT EXISTS events_2025_09 PARTITION OF events
FOR VALUES FROM ('2025-09-01') TO ('2025-10-01');
Repeat as needed for each month you want to keep “online.”
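A quick sanity check (hypothetical sample row) that inserts route to the right child table… tableoid::regclass shows which partition each row actually landed in:

```sql
-- Insert a September row and confirm which partition received it
INSERT INTO events (created_at, payload)
VALUES ('2025-09-15', '{"type":"login"}');

SELECT tableoid::regclass AS partition, count(*)
FROM events
GROUP BY 1;
```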
(3) Rotate out old data: archive (detach) or purge (drop)
-- Archive (detach and move to an archive schema;
-- assumes: CREATE SCHEMA IF NOT EXISTS archive)
ALTER TABLE events DETACH PARTITION events_2025_06;
ALTER TABLE events_2025_06 SET SCHEMA archive;
-- Purge (free disk immediately)
DROP TABLE IF EXISTS events_2025_06;
(4) Automate with pg_cron
With pg_cron, you can (a) pre-create next month’s partition and (b) detach/drop anything older than your retention.
Helper to pre-create a monthly partition when missing:
CREATE OR REPLACE PROCEDURE ensure_events_month(p_month date)
LANGUAGE plpgsql AS $$
DECLARE
v_from date := date_trunc('month', p_month)::date;
v_to date := (date_trunc('month', p_month) + interval '1 month')::date;
v_part text := 'events_' || to_char(v_from, 'YYYY_MM');
v_exists boolean;
BEGIN
-- to_regclass() returns NULL when the relation doesn't exist (schema-qualified check)
v_exists := to_regclass(format('public.%I', v_part)) IS NOT NULL;
IF NOT v_exists THEN
EXECUTE format(
'CREATE TABLE %I PARTITION OF events FOR VALUES FROM (%L) TO (%L)',
v_part, v_from::timestamptz, v_to::timestamptz
);
END IF;
END $$;
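For example, to guarantee the current month plus the next three exist, the helper can be driven from a small anonymous block (a sketch using the procedure above):

```sql
DO $$
DECLARE
m date;
BEGIN
FOR m IN
SELECT generate_series(date_trunc('month', now()),
                       date_trunc('month', now()) + interval '3 months',
                       interval '1 month')::date
LOOP
CALL ensure_events_month(m);
END LOOP;
END $$;
```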
Rotation procedure (calls the helper, then detaches/drops old partitions):
CREATE OR REPLACE PROCEDURE rotate_events(retention_days int DEFAULT 90)
LANGUAGE plpgsql AS $$
DECLARE
next_month date := (date_trunc('month', now()) + interval '1 month')::date;
cut_off timestamptz := now() - make_interval(days => retention_days);
part record;
from_ts timestamptz;
BEGIN
-- Ensure next month’s partition exists
CALL ensure_events_month(next_month);
-- Retire any partition fully older than the cutoff
FOR part IN
SELECT c.relname,
pg_get_expr(c.relpartbound, c.oid) AS bounds
FROM pg_inherits i
JOIN pg_class p ON p.oid = i.inhparent
JOIN pg_class c ON c.oid = i.inhrelid
WHERE p.relname = 'events'
LOOP
-- Extract the FROM ('yyyy-mm-01 ...') lower bound (tolerates a trailing cast)
from_ts := (regexp_match(part.bounds, $$FROM \('([^']+)'$$))[1]::timestamptz;
-- Monthly partitions: retire only when the whole month is past the cutoff
IF from_ts IS NOT NULL AND from_ts + interval '1 month' < cut_off THEN
-- Choose one:
EXECUTE format('ALTER TABLE events DETACH PARTITION %I', part.relname); -- archive
-- EXECUTE format('DROP TABLE IF EXISTS %I', part.relname); -- purge
END IF;
END LOOP;
END $$;
Schedule it (e.g., monthly on the 1st at 00:05):
SELECT cron.schedule(
'events_rotate_monthly',
'5 0 1 * *',
$$CALL rotate_events(90);$$
);
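To verify what’s scheduled and how recent runs went, pg_cron exposes two catalog tables (exact columns vary slightly by pg_cron version):

```sql
-- What jobs are registered?
SELECT jobid, schedule, command FROM cron.job;

-- How did the most recent runs go?
SELECT jobid, status, return_message, start_time
FROM cron.job_run_details
ORDER BY start_time DESC
LIMIT 10;
```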
(5) Inspect what partitions exist
SELECT
c.relname AS partition_name,
pg_get_expr(c.relpartbound, c.oid) AS bounds
FROM pg_inherits i
JOIN pg_class p ON p.oid = i.inhparent
JOIN pg_class c ON c.oid = i.inhrelid
WHERE p.relname = 'events'
ORDER BY c.relname;
1B) Or use pg_partman for hands-off management
pg_partman can pre-create partitions and enforce retention automatically.
-- Enable the extension
CREATE EXTENSION IF NOT EXISTS pg_partman;
-- Parent table (same as above)
CREATE TABLE IF NOT EXISTS events (
id BIGSERIAL,
created_at timestamptz NOT NULL,
payload jsonb NOT NULL,
PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);
-- Hand over to pg_partman (monthly partitions on created_at)
SELECT partman.create_parent(
p_parent_table := 'public.events',
p_control := 'created_at',
p_type := 'native', -- use 'range' on pg_partman 5.x
p_interval := 'monthly',
p_premake := 4 -- pre-create 4 future partitions
);
-- Retention is configured in part_config, not as a create_parent() argument
UPDATE partman.part_config
SET retention = '3 months', -- retire partitions older than 3 months
retention_keep_table = false -- false = drop; true = detach only
WHERE parent_table = 'public.events';
-- Run maintenance periodically to apply retention & premake
SELECT cron.schedule(
'events_partman_maintenance',
'0 * * * *',
$$SELECT partman.run_maintenance();$$
);
-- If your deployment exposes run_maintenance() as a PROCEDURE instead of a FUNCTION:
-- $$CALL partman.run_maintenance();$$
Tip: Prefer DETACH for a warm “archive” tier and DROP when you truly want to reclaim space immediately.
Option 2: Scheduled Deletes with pg_cron + yb_hash_code() ⏱️
When partitioning isn’t feasible, delete in tiny batches and only within a single hash slice per job. This localizes work to a subset of tablets and keeps compaction smooth.
This approach generalizes the “Quicker table row counts” trick: bucket work by yb_hash_code() and walk small ranges.
We’ll use one generic function that:
● auto-detects the table’s HASH key columns from the PRIMARY KEY (e.g., PRIMARY KEY((c1,c2) HASH, c3) → hashes only c1, c2),
● or lets you override the columns to hash (p_hash_cols),
● deletes via ybctid batches (fast, storage-friendly), and
● filters each run to a hash range (p_lo..p_hi).
-- yb_ttl_delete_bucket: delete up to p_batch_limit expired rows
-- from a single yb_hash_code slice [p_lo, p_hi].
-- Uses only HASH-key columns by default; can be overridden with p_hash_cols.
CREATE OR REPLACE FUNCTION yb_ttl_delete_bucket(
p_schema text, -- e.g. 'public'
p_table text, -- e.g. 'events'
p_ts_col text, -- TTL column, e.g. 'expires_at'
p_retention interval, -- e.g. '0 days' if p_ts_col already encodes expiry
p_lo int, -- inclusive hash lower bound (0..65535)
p_hi int, -- inclusive hash upper bound
p_batch_limit int DEFAULT 5000,
p_hash_cols text[] DEFAULT NULL -- optional override: ARRAY['tenant_id','id']
) RETURNS integer
LANGUAGE plpgsql AS $$
DECLARE
v_sql text;
v_deleted int;
v_hash_arg text; -- "col1,col2,...", quoted
v_pkdef text; -- PRIMARY KEY definition
v_csv text; -- HASH key CSV from PK def
BEGIN
IF p_hash_cols IS NOT NULL AND array_length(p_hash_cols, 1) IS NOT NULL THEN
SELECT string_agg(format('%I', col), ',')
INTO v_hash_arg
FROM unnest(p_hash_cols) AS col;
ELSE
SELECT pg_get_indexdef(i.indexrelid)
INTO v_pkdef
FROM pg_index i
JOIN pg_class c ON c.oid = i.indrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = p_schema
AND c.relname = p_table
AND i.indisprimary;
IF v_pkdef IS NULL THEN
RAISE EXCEPTION 'Table %.% has no PRIMARY KEY', p_schema, p_table;
END IF;
-- Prefer explicit HASH key columns if present:
--   PRIMARY KEY ((c1, c2) HASH, ...)  or  PRIMARY KEY (c1 HASH, ...)
v_csv := (regexp_match(v_pkdef, '\(\(([^)]*)\)\s*HASH', 'i'))[1];
IF v_csv IS NULL THEN
-- Single-column hash form
v_csv := (regexp_match(v_pkdef, '\(([^(),\s]+)\s+HASH', 'i'))[1];
END IF;
IF v_csv IS NOT NULL THEN
SELECT string_agg(format('%I', btrim(s)), ',')
INTO v_hash_arg
FROM unnest(string_to_array(v_csv, ',')) AS s;
ELSE
-- Fallback: all PK cols in key order (ORDER BY is inside string_agg)
SELECT string_agg(format('%I', a.attname), ',' ORDER BY k.ord)
INTO v_hash_arg
FROM pg_index i
JOIN pg_class c ON c.oid = i.indrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
JOIN unnest(i.indkey) WITH ORDINALITY k(attnum, ord) ON TRUE
JOIN pg_attribute a ON a.attrelid = c.oid AND a.attnum = k.attnum
WHERE n.nspname = p_schema
AND c.relname = p_table
AND i.indisprimary;
END IF;
END IF;
IF v_hash_arg IS NULL OR v_hash_arg = '' THEN
RAISE EXCEPTION
'yb_ttl_delete_bucket: no hash columns resolved for %.%. Provide p_hash_cols or define a PRIMARY KEY.',
p_schema, p_table;
END IF;
v_sql := format($f$
WITH doomed AS (
SELECT ybctid
FROM %1$I.%2$I
WHERE %3$I < now() - $1
AND yb_hash_code(%4$s) BETWEEN $2 AND $3
ORDER BY ybctid
LIMIT $4
)
DELETE FROM %1$I.%2$I e
USING doomed d
WHERE e.ybctid = d.ybctid
$f$, p_schema, p_table, p_ts_col, v_hash_arg);
EXECUTE v_sql USING p_retention, p_lo, p_hi, p_batch_limit;
-- EXECUTE ... INTO would capture only the first returned row;
-- ROW_COUNT gives the true number of deleted rows
GET DIAGNOSTICS v_deleted = ROW_COUNT;
RETURN COALESCE(v_deleted, 0);
END $$;
Examples:
Auto-use the HASH key from the PK:
-- If table has PRIMARY KEY((c1,c2) HASH, c3), this hashes only (c1,c2).
SELECT yb_ttl_delete_bucket('public','events','created_at',
'90 days', 0, 4095, 5000);
Override columns to hash (multi-tenant):
SELECT yb_ttl_delete_bucket('sales','orders','created_at',
'120 days', 4096, 8191, 4000,
ARRAY['tenant_id','id']);
Schedule 16 tiny jobs with pg_cron
yb_hash_code() returns 0..65535. Sixteen buckets → width 4096 each.
-- Stagger 16 jobs so a different bucket runs each minute
-- (each job fires every 16 minutes at its own minute offset)
-- Buckets 0..3
SELECT cron.schedule('ttl_ev_b0', '0-59/16 * * * *',
$$SELECT yb_ttl_delete_bucket('public','events','created_at','90 days', 0, 4095, 5000);$$);
SELECT cron.schedule('ttl_ev_b1', '1-59/16 * * * *',
$$SELECT yb_ttl_delete_bucket('public','events','created_at','90 days', 4096, 8191, 5000);$$);
SELECT cron.schedule('ttl_ev_b2', '2-59/16 * * * *',
$$SELECT yb_ttl_delete_bucket('public','events','created_at','90 days', 8192, 12287, 5000);$$);
SELECT cron.schedule('ttl_ev_b3', '3-59/16 * * * *',
$$SELECT yb_ttl_delete_bucket('public','events','created_at','90 days', 12288, 16383, 5000);$$);
-- Repeat similarly for buckets 4..15 (or emit via a small generator script)
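The generator script can in fact be plain SQL… a sketch that registers all 16 bucket jobs in one statement, assuming bucket width 4096 and job names ttl_ev_b0..ttl_ev_b15:

```sql
-- One row per bucket: job b runs at minute offset b, every 16 minutes
SELECT cron.schedule(
format('ttl_ev_b%s', b),
format('%s-59/16 * * * *', b),
format($$SELECT yb_ttl_delete_bucket('public','events','created_at','90 days', %s, %s, 5000);$$,
       b * 4096, b * 4096 + 4095)
)
FROM generate_series(0, 15) AS b;
```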
Tuning tips
● Keep each batch small (p_batch_limit of 2k–10k).
● Scale by adding more buckets and more frequent runs, not giant deletes.
● Index the TTL column (created_at) if it’s selective on large tables.
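On large unpartitioned tables, a range index on the TTL column keeps the expired-row scan cheap (a sketch; the index name is illustrative):

```sql
-- ASC makes this a range-sharded secondary index in YSQL,
-- so "created_at < cutoff" scans only the old end of the index
CREATE INDEX IF NOT EXISTS events_created_at_idx
ON events (created_at ASC);
```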
Option 3: Row-Level Security “Soft TTL” 🔒
Sometimes you just need to hide expired rows instead of deleting them:
ALTER TABLE events ENABLE ROW LEVEL SECURITY;

-- Scope the policy to SELECT so a purge job can still DELETE expired rows.
-- Note: table owners and superusers bypass RLS unless it is FORCEd.
CREATE POLICY ttl_only_fresh ON events
FOR SELECT
USING (created_at >= now() - interval '90 days');
Or wrap a simple view around the table:
CREATE VIEW current_events AS
SELECT *
FROM events
WHERE created_at >= now() - interval '90 days';
⚠️ Rows remain on disk until you clean them up later. Great for apps that only need to filter.
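If application roles should only ever see fresh rows, you can route their reads through the view (app_role here is a hypothetical role name):

```sql
-- Force reads through the filtered view instead of the base table
REVOKE SELECT ON events FROM app_role;
GRANT SELECT ON current_events TO app_role;
```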
Option 4: Batch Deletes via ybctid 🎯
ybctid is Yugabyte’s internal row identifier… perfect for deleting rows in small, ordered batches.
WITH doomed AS (
SELECT ybctid
FROM events
WHERE created_at < now() - interval '90 days'
ORDER BY ybctid
LIMIT 5000
)
DELETE FROM events e
USING doomed d
WHERE e.ybctid = d.ybctid;
You can also combine this with the hash slice idea (Option 2) by adding a yb_hash_code(...) BETWEEN ... filter inside the doomed CTE.
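One way to drive the batches to completion is a small plpgsql loop that repeats until a batch comes back empty… a sketch only: inside a single DO block all batches share one transaction, so in production you’d typically let a scheduler (e.g., pg_cron) run each batch as its own statement instead:

```sql
DO $$
DECLARE
n int;
BEGIN
LOOP
WITH doomed AS (
SELECT ybctid
FROM events
WHERE created_at < now() - interval '90 days'
ORDER BY ybctid
LIMIT 5000
)
DELETE FROM events e
USING doomed d
WHERE e.ybctid = d.ybctid;

GET DIAGNOSTICS n = ROW_COUNT;  -- rows deleted by this batch
EXIT WHEN n = 0;

PERFORM pg_sleep(0.1);          -- brief pause between batches
END LOOP;
END $$;
```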
Summary
● Best performance: time-range partitioning + scheduled drop/detach (pg_cron), or fully managed via pg_partman.
● When partitioning isn’t possible (or for backfills): yb_hash_code-bucketed deletes with our yb_ttl_delete_bucket(...) function (tiny, frequent jobs).
● Quick to adopt: RLS or views to hide old data immediately; purge later.
● Low-level control: ybctid batches for smooth compaction and predictable IO.
⚔️✅ These patterns are battle-tested for distributed SQL. Until native YSQL TTL lands, they’ll keep your clusters lean, your queries fast, and your operations predictable.
Have Fun!
