Handling NaN in NUMERIC Columns (Postgres vs YugabyteDB)

πŸ’‘ TL;DR

PostgreSQL allows NaN in NUMERIC.

YugabyteDB currently does not.

The best solution is to model β€œNaN” explicitly in your schema, not force it into a numeric column.

🚨 The Problem

You might try something like this:

				
					CREATE TABLE _test_numeric (
    id uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
    rating numeric(8,4)
);

INSERT INTO _test_numeric (rating)
VALUES ('NaN'::numeric(8,4));
				
			

And hit:

				
					ERROR:  DECIMAL does not support NaN yet
				
			

πŸ” Why This Happens

PostgreSQL behavior
  • ● numeric supports:
    • β—‹ NaN

    • β—‹ Infinity

    • β—‹ -Infinity

YugabyteDB behavior
  • ● numeric (aka DECIMAL) currently does NOT support:
    • β—‹ NaN

    • β—‹ Infinity

    • β—‹ -Infinity

πŸ‘‰ This is a known limitation (tracked in GitHub issue #23075)

βš–οΈ Quick Comparison

Feature PostgreSQL YugabyteDB
`numeric` supports NaN βœ… Yes ❌ No
`numeric` supports Infinity βœ… Yes ❌ No
`double precision` supports NaN βœ… Yes βœ… Yes

🧠 Key Insight

In distributed systems, it’s better to model invalid or special states explicitly rather than encode them as magic values.

Instead of trying to store NaN inside a numeric column, represent it as part of your data model.

πŸ”„ What About ETL?

You could solve this during ingestion:

  • ● Convert 'NaN' β†’ NULL

  • ● Transform values before they reach YugabyteDB

  • ● Normalize data across pipelines

βš–οΈ Is ETL a Good Idea?
ETL works… but it moves a data modeling problem outside the database.
πŸ‘ When ETL makes sense
  • βœ… One-time Postgres β†’ YugabyteDB migration

  • βœ…Β You already have a centralized ingestion pipeline (Kafka, Spark, etc.)

  • βœ…Β You want consistent normalization across systems

πŸ‘Ž Why it’s often not ideal
  • ❌ Logic is hidden outside the database

  • ❌ Harder to enforce consistency across services

  • ❌ Can drift over time

  • ❌ Makes debugging harder

βœ… Recommended Workarounds

πŸ₯‡ Option 1: Boolean flag (simple and clean)
				
					CREATE TABLE _test_numeric (
    id uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
    rating numeric(8,4),
    rating_is_nan boolean NOT NULL DEFAULT false,
    CHECK (
        (rating_is_nan = false AND rating IS NOT NULL)
        OR
        (rating_is_nan = true AND rating IS NULL)
    )
);
				
			

Usage:

				
					-- Normal value
INSERT INTO _test_numeric (rating, rating_is_nan)
VALUES (1.2345, false);

-- NaN
INSERT INTO _test_numeric (rating, rating_is_nan)
VALUES (NULL, true);

-- All valid numeric values
SELECT *
FROM _test_numeric
WHERE rating_is_nan = false;

-- All NaN values
SELECT *
FROM _test_numeric
WHERE rating_is_nan = true;

-- Safe aggregation (ignore NaN)
SELECT avg(rating)
FROM _test_numeric
WHERE rating_is_nan = false;

-- Replace NaN with label
SELECT
    id,
    CASE
        WHEN rating_is_nan THEN 'NaN'
        ELSE rating::text
    END AS rating_display
FROM _test_numeric;
				
			
πŸ₯ˆ Option 2: Multi-state column (more expressive)
				
					CREATE TABLE _test_numeric (
    id uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
    rating numeric(8,4),
    rating_state text NOT NULL DEFAULT 'OK',
    CHECK (rating_state IN ('OK','NAN','INF','-INF','MISSING')),
    CHECK (
        (rating_state = 'OK' AND rating IS NOT NULL)
        OR
        (rating_state <> 'OK' AND rating IS NULL)
    )
);
				
			

Usage:

				
					-- Normal value
INSERT INTO _test_numeric (rating, rating_state)
VALUES (1.2345, 'OK');

-- NaN
INSERT INTO _test_numeric (rating, rating_state)
VALUES (NULL, 'NAN');

-- Missing / unknown
INSERT INTO _test_numeric (rating, rating_state)
VALUES (NULL, 'MISSING');

-- Future-proof: Infinity (if you ever normalize upstream)
INSERT INTO _test_numeric (rating, rating_state)
VALUES (NULL, 'INF');

-- Only valid numeric values
SELECT *
FROM _test_numeric
WHERE rating_state = 'OK';

-- All special values (NaN, INF, etc.)
SELECT *
FROM _test_numeric
WHERE rating_state <> 'OK';

-- Count by state
SELECT rating_state, count(*)
FROM _test_numeric
GROUP BY rating_state
ORDER BY rating_state;

-- Safe aggregation
SELECT avg(rating)
FROM _test_numeric
WHERE rating_state = 'OK';

-- Human-readable output
SELECT
    id,
    CASE
        WHEN rating_state = 'OK' THEN rating::text
        ELSE rating_state
    END AS rating_display
FROM _test_numeric;
				
			
πŸ₯‰ Option 3: Raw + parsed (great for ingestion pipelines)
				
					CREATE TABLE _test_numeric (
    id uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
    rating_raw text,
    rating numeric(8,4)
);
				
			

Usage:

				
					INSERT INTO _test_numeric (rating_raw, rating)
VALUES
  ('NaN', NULL),
  ('2.5000', 2.5000),
  ('3.7500', 3.7500);

-- Rows where original value was NaN
SELECT *
FROM _test_numeric
WHERE rating_raw = 'NaN';

-- Successfully parsed numeric values
SELECT *
FROM _test_numeric
WHERE rating IS NOT NULL;

-- Failed / special values
SELECT *
FROM _test_numeric
WHERE rating IS NULL;

-- Compare raw vs parsed
SELECT rating_raw, rating
FROM _test_numeric;

-- Safe aggregation
SELECT avg(rating)
FROM _test_numeric
WHERE rating IS NOT NULL;
				
			

This pattern cleanly separates ingestion from computation.

  • ● rating_raw preserves what arrived
  • ● rating enables safe math
⚑ Option 4: Use double precision (only if acceptable)
				
					CREATE TABLE _test_double (
    id uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
    rating double precision
);
				
			

Usage:

				
					-- Insert examples
INSERT INTO _test_double (rating) VALUES (1.23);
INSERT INTO _test_double (rating) VALUES ('NaN'::float8);

-- Find NaN values
SELECT *
FROM _test_double
WHERE rating = 'NaN'::float8;

-- ⚠️ Important: NaN behaves differently!
-- This will NOT return NaN rows:
SELECT *
FROM _test_double
WHERE rating <> rating;

-- Safe aggregation (NaN ignored by filtering)
SELECT avg(rating)
FROM _test_double
WHERE rating = rating;  -- filters out NaN
				
			

⚠️ Trade-off:

  • βœ… Supports NaN

  • ❌ Not exact precision

🏁 Final Takeaway

Don’t force NaN into NUMERIC in YugabyteDB.

Instead:

  • βœ… Model it explicitly in your schema
  • ⚠️ Use ETL only when it already exists
  • 🚫 Avoid fake sentinel values (like -9999)

Have Fun!