You can still emulate row-level TTL in YSQL by storing each row’s expiry somewhere and running small, hash-bucketed delete jobs.
Three practical patterns:
- Add an expires_at column (recommended)
- Derive expires_at from minimal metadata (e.g., created_at + per-row TTL)
- Use a sidecar TTL registry table (when you can’t change the base schema)
All three plug into the same cleanup engine: tiny, frequent deletes filtered by yb_hash_code(...) buckets.
Related reading: YCQL’s native row-level TTL is described in the YugabyteDB docs: https://docs.yugabyte.com/preview/develop/learn/ttl-data-expiration-ycql/#row-level-ttl
Companion tip: Simulating Row-Level TTL in YSQL … discusses broader TTL strategies in YSQL including partitions + hash-bucket deletes + ybctid.
Why do this?
● Distributed deletes are expensive; big “sweep” deletes create tombstones and compaction spikes.
● The trick is to delete in tiny batches, one hash slice at a time, so work is localized and smooth.
● If you can partition by time and drop/detach partitions, that’s still king. But when you want row-level TTL (per-row expiry) and your table has no timestamp, use one of the patterns below.
Option A (recommended): Add an expires_at column
Adds the most flexibility and mirrors YCQL’s per-row TTL semantics.
-- 1) Add expiry; keep it NULL-safe so you can roll out gradually
ALTER TABLE your_table
ADD COLUMN IF NOT EXISTS expires_at timestamptz;
-- 2) Default behavior: 90 days from INSERT (choose your policy)
CREATE OR REPLACE FUNCTION set_expires_on_insert()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
IF NEW.expires_at IS NULL THEN
NEW.expires_at := now() + interval '90 days';
END IF;
RETURN NEW;
END $$;
DROP TRIGGER IF EXISTS trg_set_expires_ins ON your_table;
CREATE TRIGGER trg_set_expires_ins
BEFORE INSERT ON your_table
FOR EACH ROW EXECUTE FUNCTION set_expires_on_insert();
-- (Optional) If you want updates to refresh TTL (YCQL-like):
CREATE OR REPLACE FUNCTION refresh_expires_on_update()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
-- Only refresh if row TTL is policy-driven, not explicitly set
IF NEW.expires_at IS NULL OR NEW.expires_at = OLD.expires_at THEN
NEW.expires_at := now() + interval '90 days';
END IF;
RETURN NEW;
END $$;
DROP TRIGGER IF EXISTS trg_refresh_expires_upd ON your_table;
-- Enable this only if you want "update resets TTL"
-- CREATE TRIGGER trg_refresh_expires_upd
-- BEFORE UPDATE ON your_table
-- FOR EACH ROW EXECUTE FUNCTION refresh_expires_on_update();
-- 3) (Optional) Index to accelerate filtering (useful on very large tables)
CREATE INDEX IF NOT EXISTS your_table_expires_at_idx ON your_table (expires_at);
Cleanup: Use the generic hash-bucket delete function below and pass 'expires_at' as the TTL column (retention '0 days', since the column already encodes the expiry).
Option B: Derive expires_at from minimal metadata
Keep writes minimal, but still get a true per-row expiry.
-- Add a creation timestamp, defaulted at insert
ALTER TABLE your_table
ADD COLUMN IF NOT EXISTS created_at timestamptz DEFAULT now();
-- Optional: a per-row integer TTL (seconds); adapt if you prefer days
ALTER TABLE your_table
ADD COLUMN IF NOT EXISTS ttl_seconds integer DEFAULT 90*24*60*60;
-- Maintain a stored expiry value via a trigger (portable)
ALTER TABLE your_table
ADD COLUMN IF NOT EXISTS expires_at timestamptz;
CREATE OR REPLACE FUNCTION compute_expires()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
NEW.expires_at := NEW.created_at + make_interval(secs => COALESCE(NEW.ttl_seconds, 0));
RETURN NEW;
END $$;
DROP TRIGGER IF EXISTS trg_compute_expires_ins ON your_table;
CREATE TRIGGER trg_compute_expires_ins
BEFORE INSERT ON your_table
FOR EACH ROW EXECUTE FUNCTION compute_expires();
DROP TRIGGER IF EXISTS trg_compute_expires_upd ON your_table;
CREATE TRIGGER trg_compute_expires_upd
BEFORE UPDATE OF created_at, ttl_seconds ON your_table
FOR EACH ROW EXECUTE FUNCTION compute_expires();
From here, cleanup is identical to Option A.
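The derivation in compute_expires() is plain timestamp arithmetic; a minimal sketch of the same rule outside SQL (function and variable names here are illustrative, not part of any API):

```python
# Mirrors the compute_expires() trigger: expires_at = created_at + ttl_seconds,
# with a NULL ttl treated as 0 (COALESCE(NEW.ttl_seconds, 0) in the trigger).
from datetime import datetime, timedelta, timezone

DEFAULT_TTL_SECONDS = 90 * 24 * 60 * 60  # 90 days, as in the column default

def compute_expires(created_at, ttl_seconds):
    seconds = ttl_seconds if ttl_seconds is not None else 0
    return created_at + timedelta(seconds=seconds)

row_created = datetime(2025, 1, 1, tzinfo=timezone.utc)
expiry = compute_expires(row_created, DEFAULT_TTL_SECONDS)
# 90 days after 2025-01-01 is 2025-04-01 (31 + 28 + 31 days)
```

A NULL ttl_seconds yields expires_at = created_at, i.e., the row is immediately eligible for cleanup; if you prefer "NULL means never expire", adjust both the trigger and this rule accordingly.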
Option C: Sidecar TTL registry (when you can’t change the base schema)
If you must not alter the original table, track expiries in a companion table that shares the base table’s PRIMARY KEY.
-- Base table stays unchanged:
-- CREATE TABLE base_table (..., PRIMARY KEY (...));
CREATE TABLE base_table_ttl (
-- Mirror the base table's PK columns & types:
pk1 ... NOT NULL,
pk2 ... NOT NULL,
expires_at timestamptz NOT NULL,
PRIMARY KEY (pk1, pk2),
FOREIGN KEY (pk1, pk2) REFERENCES base_table(pk1, pk2) ON DELETE CASCADE
);
-- Keep registry updated at INSERT time (extend if you want updates to refresh TTL):
CREATE OR REPLACE FUNCTION ttl_sidecar_on_insert()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
INSERT INTO base_table_ttl(pk1, pk2, expires_at)
VALUES (NEW.pk1, NEW.pk2, now() + interval '90 days')
ON CONFLICT (pk1, pk2) DO UPDATE
SET expires_at = EXCLUDED.expires_at;
RETURN NEW;
END $$;
DROP TRIGGER IF EXISTS trg_sidecar_ins ON base_table;
CREATE TRIGGER trg_sidecar_ins
AFTER INSERT ON base_table
FOR EACH ROW EXECUTE FUNCTION ttl_sidecar_on_insert();
There’s a sidecar-aware cleanup function below as well.
Cleanup engine: tiny, hash-bucketed deletes (generic)
YugabyteDB’s yb_hash_code() returns a value in the 16-bit range 0..65535. We slice that range into buckets (e.g., 16 buckets of width 4096) and delete only one slice per job so work stays localized. We also delete via ybctid in small batches for storage-friendly scans.
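The slice arithmetic is simple enough to sketch in a few lines; this illustrative snippet (names are ours, not an API) computes the (lo, hi) bounds used throughout the examples below:

```python
# Sketch: equal-width slices of the 16-bit yb_hash_code() space (0..65535).
HASH_SPACE = 65536  # yb_hash_code() returns values in 0..65535

def hash_slices(n_buckets):
    """Return inclusive (lo, hi) bounds for each of n_buckets slices."""
    width = HASH_SPACE // n_buckets
    return [(b * width, b * width + width - 1) for b in range(n_buckets)]

def bucket_for(hash_code, n_buckets):
    """Which slice a given yb_hash_code() value falls into."""
    return hash_code // (HASH_SPACE // n_buckets)

slices = hash_slices(16)
# slices[0] == (0, 4095); slices[15] == (61440, 65535)
```

The same arithmetic appears in the one-row smoke test near the end, where bucket = hash_code / 4096 locates the slice to sweep.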
Cleanup function (auto-detects HASH key columns)
● Uses your explicit p_hash_cols if provided.
● Otherwise reads the PRIMARY KEY definition and extracts only the HASH-partitioned columns (e.g., PRIMARY KEY((c1,c2) HASH, c3) → hashes c1, c2).
● Falls back to all PK columns if the PK has no explicit HASH segment.
● Deletes up to p_batch_limit rows per call from one hash slice.
CREATE OR REPLACE FUNCTION yb_ttl_delete_bucket(
p_schema text, -- e.g. 'public'
p_table text, -- e.g. 'your_table'
p_ts_col text, -- 'expires_at' for Options A/B; for sidecar use the helper below
p_retention interval, -- '0 days' when p_ts_col already encodes expiry
p_lo int, -- inclusive hash lower bound (0..65535)
p_hi int, -- inclusive hash upper bound
p_batch_limit int DEFAULT 5000,
p_hash_cols text[] DEFAULT NULL -- optional override: ARRAY['tenant_id','id']
) RETURNS integer
LANGUAGE plpgsql AS $$
DECLARE
v_sql text;
v_deleted int;
v_hash_arg text; -- "col1,col2,..."
v_pkdef text;
v_csv text;
BEGIN
IF p_hash_cols IS NOT NULL AND array_length(p_hash_cols, 1) IS NOT NULL THEN
SELECT string_agg(format('%I', col), ',') INTO v_hash_arg
FROM unnest(p_hash_cols) AS col;
ELSE
SELECT pg_get_indexdef(i.indexrelid) INTO v_pkdef
FROM pg_index i
JOIN pg_class c ON c.oid = i.indrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = p_schema
AND c.relname = p_table
AND i.indisprimary;
IF v_pkdef IS NULL THEN
RAISE EXCEPTION 'Table %.% has no PRIMARY KEY', p_schema, p_table;
END IF;
-- Extract columns inside the "((... ) HASH" section if present
SELECT (regexp_matches(v_pkdef, '\(\(([^)]*)\)\s*HASH', 'i'))[1]
INTO v_csv;
IF v_csv IS NOT NULL THEN
SELECT string_agg(format('%I', btrim(s)), ',')
INTO v_hash_arg
FROM unnest(string_to_array(v_csv, ',')) AS s;
ELSE
-- Fallback: all PK columns (ORDER BY inside string_agg keeps key order)
SELECT string_agg(format('%I', a.attname), ',' ORDER BY k.ord)
INTO v_hash_arg
FROM pg_index i
JOIN pg_class c ON c.oid = i.indrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
JOIN unnest(i.indkey) WITH ORDINALITY k(attnum, ord) ON TRUE
JOIN pg_attribute a ON a.attrelid = c.oid AND a.attnum = k.attnum
WHERE n.nspname = p_schema
AND c.relname = p_table
AND i.indisprimary;
END IF;
END IF;
IF v_hash_arg IS NULL OR v_hash_arg = '' THEN
RAISE EXCEPTION
'yb_ttl_delete_bucket: no hash columns resolved for %.%. Provide p_hash_cols or define a PRIMARY KEY.',
p_schema, p_table;
END IF;
v_sql := format($f$
WITH doomed AS (
SELECT ybctid
FROM %1$I.%2$I
WHERE %3$I < now() - $1
AND yb_hash_code(%4$s) BETWEEN $2 AND $3
ORDER BY ybctid
LIMIT $4
)
DELETE FROM %1$I.%2$I e
USING doomed d
WHERE e.ybctid = d.ybctid
$f$, p_schema, p_table, p_ts_col, v_hash_arg);
EXECUTE v_sql USING p_retention, p_lo, p_hi, p_batch_limit;
-- EXECUTE ... INTO would capture only the first RETURNING row, so read
-- the actual number of deleted rows from the diagnostics instead
GET DIAGNOSTICS v_deleted = ROW_COUNT;
RETURN v_deleted;
END $$;
Example (Options A/B): delete one slice where expires_at is in the past.
SELECT yb_ttl_delete_bucket(
'public','your_table','expires_at',
'0 days', -- the column already encodes expiry
0, 4095, -- bucket slice (0..65535; width 4096 for 16 buckets)
5000
);
Sidecar variant (Option C)
This version joins your sidecar registry to find expired rows, then deletes from the base table in tiny hash slices.
CREATE OR REPLACE FUNCTION yb_ttl_delete_bucket_sidecar(
p_schema_base text,
p_table_base text,
p_schema_ttl text,
p_table_ttl text,
p_retention interval, -- usually '0 days'
p_lo int, -- inclusive 0..65535
p_hi int, -- inclusive
p_batch_limit int DEFAULT 5000,
p_hash_cols text[] DEFAULT NULL -- optional override of hash cols
) RETURNS integer
LANGUAGE plpgsql AS $$
DECLARE
v_sql text;
v_deleted int;
v_pkdef text;
v_csv text;
v_hasharg text; -- "b.col1,b.col2,..."
v_pklist text; -- "pk1,pk2,..."
BEGIN
-- Resolve hash columns: explicit override wins, else derive from PK (prefer HASH key)
IF p_hash_cols IS NOT NULL AND array_length(p_hash_cols,1) IS NOT NULL THEN
SELECT string_agg(format('b.%I', col), ',') INTO v_hasharg
FROM unnest(p_hash_cols) AS col;
ELSE
SELECT pg_get_indexdef(i.indexrelid) INTO v_pkdef
FROM pg_index i
JOIN pg_class c ON c.oid = i.indrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = p_schema_base
AND c.relname = p_table_base
AND i.indisprimary;
IF v_pkdef IS NULL THEN
RAISE EXCEPTION '%.% has no PRIMARY KEY', p_schema_base, p_table_base;
END IF;
-- Try explicit HASH key first
SELECT (regexp_matches(v_pkdef, '\(\(([^)]*)\)\s*HASH', 'i'))[1] INTO v_csv;
IF v_csv IS NOT NULL THEN
SELECT string_agg(format('b.%I', btrim(s)), ',') INTO v_hasharg
FROM unnest(string_to_array(v_csv, ',')) AS s;
ELSE
-- Fallback: all PK cols in key order (ORDER BY inside string_agg)
SELECT string_agg(format('b.%I', a.attname), ',' ORDER BY k.ord) INTO v_hasharg
FROM pg_index i
JOIN pg_class c ON c.oid = i.indrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
JOIN unnest(i.indkey) WITH ORDINALITY k(attnum, ord) ON TRUE
JOIN pg_attribute a ON a.attrelid = c.oid AND a.attnum = k.attnum
WHERE n.nspname = p_schema_base
AND c.relname = p_table_base
AND i.indisprimary;
END IF;
END IF;
IF v_hasharg IS NULL OR v_hasharg = '' THEN
RAISE EXCEPTION 'No hash columns resolved for %.%. Provide p_hash_cols or define a PRIMARY KEY.',
p_schema_base, p_table_base;
END IF;
-- Build USING(pk1,pk2,...) for the equality join (ORDER BY inside string_agg)
SELECT string_agg(format('%I', a.attname), ',' ORDER BY k.ord) INTO v_pklist
FROM pg_index i
JOIN pg_class c ON c.oid = i.indrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
JOIN unnest(i.indkey) WITH ORDINALITY k(attnum, ord) ON TRUE
JOIN pg_attribute a ON a.attrelid = c.oid AND a.attnum = k.attnum
WHERE n.nspname = p_schema_base
AND c.relname = p_table_base
AND i.indisprimary;
v_sql := format($f$
WITH doomed AS (
SELECT b.ybctid
FROM %1$I.%2$I b
JOIN %3$I.%4$I t USING (%5$s) -- USING(pk1, pk2, ...)
WHERE t.expires_at < now() - $1
AND yb_hash_code(%6$s) BETWEEN $2 AND $3
ORDER BY b.ybctid
LIMIT $4
)
DELETE FROM %1$I.%2$I b
USING doomed d
WHERE b.ybctid = d.ybctid
$f$, p_schema_base, p_table_base, p_schema_ttl, p_table_ttl, v_pklist, v_hasharg);
EXECUTE v_sql USING p_retention, p_lo, p_hi, p_batch_limit;
-- Read the real deleted-row count from the diagnostics, not from RETURNING
GET DIAGNOSTICS v_deleted = ROW_COUNT;
RETURN v_deleted;
END $$;
Schedule tiny jobs with pg_cron
Cover the full hash space with small, staggered tasks. Sixteen buckets → width 4096.
-- Example: four of the sixteen; repeat pattern for b4..b15
SELECT cron.schedule('ttl_tbl_b0', '* * * * *',
$$SELECT yb_ttl_delete_bucket('public','your_table','expires_at','0 days', 0, 4095, 5000);$$);
SELECT cron.schedule('ttl_tbl_b1', '*/2 * * * *',
$$SELECT yb_ttl_delete_bucket('public','your_table','expires_at','0 days', 4096, 8191, 5000);$$);
SELECT cron.schedule('ttl_tbl_b2', '*/3 * * * *',
$$SELECT yb_ttl_delete_bucket('public','your_table','expires_at','0 days', 8192, 12287, 5000);$$);
SELECT cron.schedule('ttl_tbl_b3', '*/4 * * * *',
$$SELECT yb_ttl_delete_bucket('public','your_table','expires_at','0 days', 12288, 16383, 5000);$$);
Tuning tips: keep p_batch_limit small (2k–10k) and scale throughput by adding buckets or running jobs more frequently. If you can adopt time-range partitioning later, switch to drop/detach for the fastest expiry.
Soft TTL (hide first, purge later)
If you just need to stop serving expired rows:
-- With Options A/B:
CREATE VIEW your_table_visible AS
SELECT * FROM your_table WHERE expires_at >= now();
-- Or RLS:
ALTER TABLE your_table ENABLE ROW LEVEL SECURITY;
CREATE POLICY keep_fresh_rows ON your_table
USING (expires_at >= now());
With the sidecar, join the registry in the view/RLS predicate.
One-row smoke test (forces a delete in the right slice)
yugabyte=# \d your_table
Table "public.your_table"
Column | Type | Collation | Nullable | Default
------------+--------------------------+-----------+----------+---------
c1 | integer | | not null |
expires_at | timestamp with time zone | | |
Indexes:
"your_table_pkey" PRIMARY KEY, lsm (c1 HASH)
Triggers:
trg_set_expires_ins BEFORE INSERT ON your_table FOR EACH ROW EXECUTE FUNCTION set_expires_on_insert()
yugabyte=# -- 1) Insert an expired row
yugabyte=# INSERT INTO your_table VALUES (1, now() - interval '1 hour');
INSERT 0 1
yugabyte=# SELECT * FROM your_table;
c1 | expires_at
----+-------------------------------
1 | 2025-09-16 14:22:59.211378+00
(1 row)
yugabyte=# -- 2) Find its slice
yugabyte=# WITH h AS (
yugabyte(# SELECT yb_hash_code(c1) AS hc FROM your_table WHERE c1 = 1
yugabyte(# )
yugabyte-# SELECT (hc/4096)::int AS bucket,
yugabyte-# ( (hc/4096)::int * 4096 ) AS lo,
yugabyte-# ( (hc/4096)::int * 4096 + 4095 ) AS hi
yugabyte-# FROM h;
bucket | lo | hi
--------+------+------
1 | 4096 | 8191
(1 row)
-- 3) Run the delete for that slice (using the lo/hi from step 2)
yugabyte=# SELECT yb_ttl_delete_bucket('public','your_table','expires_at','0 days', 4096, 8191, 5000);
yb_ttl_delete_bucket
----------------------
1
(1 row)
yugabyte=# SELECT * FROM your_table;
c1 | expires_at
----+------------
(0 rows)
Wrap-up
● YCQL has native per-row TTL (see: https://docs.yugabyte.com/preview/develop/learn/ttl-data-expiration-ycql/#row-level-ttl).
● In YSQL, emulate it by storing an expiry (column or sidecar) and running tiny, hash-bucketed delete jobs (or go straight to time partitions and drop/detach).
● These approaches are production-ready and keep your cluster healthy while delivering YCQL-style behavior on the SQL side.
Have Fun!
