Handling NaN in NUMERIC Columns (Postgres vs YugabyteDB)

💡 TL;DR

PostgreSQL allows NaN in NUMERIC.

YugabyteDB currently does not.

The best solution is to model “NaN” explicitly in your schema, not force it into a numeric column.

🚨 The Problem

You might try something like this:

				
					CREATE TABLE _test_numeric (
    id uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
    rating numeric(8,4)
);

INSERT INTO _test_numeric (rating)
VALUES ('NaN'::numeric(8,4));

And hit:

				
					ERROR:  DECIMAL does not support NaN yet

🔍 Why This Happens

PostgreSQL behavior

● numeric supports:
- ○ NaN
- ○ Infinity
- ○ -Infinity

YugabyteDB behavior

● numeric (aka DECIMAL) currently does NOT support:
- ○ NaN
- ○ Infinity
- ○ -Infinity

👉 This is a known limitation (tracked in GitHub issue #23075)

⚖️ Quick Comparison

Feature	PostgreSQL	YugabyteDB
`numeric` supports NaN	✅ Yes	❌ No
`numeric` supports Infinity	✅ Yes	❌ No
`double precision` supports NaN	✅ Yes	✅ Yes

🧠 Key Insight

In distributed systems, it’s better to model invalid or special states explicitly rather than encode them as magic values.

Instead of trying to store NaN inside a numeric column, represent it as part of your data model.

🔄 What About ETL?

You could solve this during ingestion:

● Convert 'NaN' → NULL
● Transform values before they reach YugabyteDB
● Normalize data across pipelines

⚖️ Is ETL a Good Idea?

ETL works… but it moves a data modeling problem outside the database.

👍 When ETL makes sense

✅ One-time Postgres → YugabyteDB migration
✅ You already have a centralized ingestion pipeline (Kafka, Spark, etc.)
✅ You want consistent normalization across systems

👎 Why it’s often not ideal

❌ Logic is hidden outside the database
❌ Harder to enforce consistency across services
❌ Can drift over time
❌ Makes debugging harder

✅ Recommended Workarounds

🥇 Option 1: Boolean flag (simple and clean)

				
					CREATE TABLE _test_numeric (
    id uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
    rating numeric(8,4),
    rating_is_nan boolean NOT NULL DEFAULT false,
    CHECK (
        (rating_is_nan = false AND rating IS NOT NULL)
        OR
        (rating_is_nan = true AND rating IS NULL)
    )
);

Usage:

				
					-- Normal value
INSERT INTO _test_numeric (rating, rating_is_nan)
VALUES (1.2345, false);

-- NaN
INSERT INTO _test_numeric (rating, rating_is_nan)
VALUES (NULL, true);

-- All valid numeric values
SELECT *
FROM _test_numeric
WHERE rating_is_nan = false;

-- All NaN values
SELECT *
FROM _test_numeric
WHERE rating_is_nan = true;

-- Safe aggregation (ignore NaN)
SELECT avg(rating)
FROM _test_numeric
WHERE rating_is_nan = false;

-- Replace NaN with label
SELECT
    id,
    CASE
        WHEN rating_is_nan THEN 'NaN'
        ELSE rating::text
    END AS rating_display
FROM _test_numeric;

🥈 Option 2: Multi-state column (more expressive)

				
					CREATE TABLE _test_numeric (
    id uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
    rating numeric(8,4),
    rating_state text NOT NULL DEFAULT 'OK',
    CHECK (rating_state IN ('OK','NAN','INF','-INF','MISSING')),
    CHECK (
        (rating_state = 'OK' AND rating IS NOT NULL)
        OR
        (rating_state <> 'OK' AND rating IS NULL)
    )
);

Usage:

				
					-- Normal value
INSERT INTO _test_numeric (rating, rating_state)
VALUES (1.2345, 'OK');

-- NaN
INSERT INTO _test_numeric (rating, rating_state)
VALUES (NULL, 'NAN');

-- Missing / unknown
INSERT INTO _test_numeric (rating, rating_state)
VALUES (NULL, 'MISSING');

-- Future-proof: Infinity (if you ever normalize upstream)
INSERT INTO _test_numeric (rating, rating_state)
VALUES (NULL, 'INF');

-- Only valid numeric values
SELECT *
FROM _test_numeric
WHERE rating_state = 'OK';

-- All special values (NaN, INF, etc.)
SELECT *
FROM _test_numeric
WHERE rating_state <> 'OK';

-- Count by state
SELECT rating_state, count(*)
FROM _test_numeric
GROUP BY rating_state
ORDER BY rating_state;

-- Safe aggregation
SELECT avg(rating)
FROM _test_numeric
WHERE rating_state = 'OK';

-- Human-readable output
SELECT
    id,
    CASE
        WHEN rating_state = 'OK' THEN rating::text
        ELSE rating_state
    END AS rating_display
FROM _test_numeric;

🥉 Option 3: Raw + parsed (great for ingestion pipelines)

				
					CREATE TABLE _test_numeric (
    id uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
    rating_raw text,
    rating numeric(8,4)
);

Usage:

				
					INSERT INTO _test_numeric (rating_raw, rating)
VALUES
  ('NaN', NULL),
  ('2.5000', 2.5000),
  ('3.7500', 3.7500);

-- Rows where original value was NaN
SELECT *
FROM _test_numeric
WHERE rating_raw = 'NaN';

-- Successfully parsed numeric values
SELECT *
FROM _test_numeric
WHERE rating IS NOT NULL;

-- Failed / special values
SELECT *
FROM _test_numeric
WHERE rating IS NULL;

-- Compare raw vs parsed
SELECT rating_raw, rating
FROM _test_numeric;

-- Safe aggregation
SELECT avg(rating)
FROM _test_numeric
WHERE rating IS NOT NULL;

This pattern cleanly separates ingestion from computation.

● rating_raw preserves what arrived
● rating enables safe math

⚡ Option 4: Use `double precision` (only if acceptable)

				
					CREATE TABLE _test_double (
    id uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
    rating double precision
);

Usage:

				
					-- Insert examples
INSERT INTO _test_double (rating) VALUES (1.23);
INSERT INTO _test_double (rating) VALUES ('NaN'::float8);

-- Find NaN values
SELECT *
FROM _test_double
WHERE rating = 'NaN'::float8;

-- ⚠️ Important: NaN behaves differently!
-- This will NOT return NaN rows:
SELECT *
FROM _test_double
WHERE rating <> rating;

-- Safe aggregation (NaN ignored by filtering)
SELECT avg(rating)
FROM _test_double
WHERE rating = rating;  -- filters out NaN

⚠️ Trade-off:

✅ Supports NaN
❌ Not exact precision

🏁 Final Takeaway

Don’t force NaN into NUMERIC in YugabyteDB.

Instead:

✅ Model it explicitly in your schema
⚠️ Use ETL only when it already exists
🚫 Avoid fake sentinel values (like -9999)

Have Fun!