Efficient CIDR Range Lookups Using Expression Indexes

🚀 Introduction

YugabyteDB fully supports PostgreSQL’s inet and cidr datatypes, allowing you to store and manipulate IPv4 and IPv6 network addresses. However, because the sort order for these types doesn’t easily map to a byte-comparable canonical format used internally by DocDB (the underlying document storage engine of YugabyteDB), indexing directly on inet or cidr columns isn’t yet supported.

Work is planned to enable this via custom comparators in the storage layer, but until that arrives, you can get efficient “which range contains this IP?” lookups by:

● Normalizing IPv4/IPv6 to a 16-byte (bytea) representation (IPv6; IPv4 as IPv6-mapped).
● Materializing the start and end of each CIDR in generated columns.
● Creating a secondary index on the start key, and including the end key so the storage layer can filter at the index before touching the base table.
● Issuing a query that uses the index to reverse-seek from the target IP and stop at the first match.

This works especially well when your CIDR ranges are non-overlapping.

🧪 Demo: Generated Columns + Index for Range Containment

Below is a full, working example you can paste into YugabyteDB. It uses generated columns to store bytea(16) keys and an index that orders by start address and includes end address, enabling efficient index-level filtering.

				
					DROP TABLE IF EXISTS network_data;
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

--
-- Helper function to normalize IPv4 and IPv6 addresses to a 16-byte (128-bit)
-- format bytea (binary format). This function is marked IMMUTABLE so that it
-- can be used in an expression based index or as a generate column expression.
-- expression.
--
-- Explanation:
--     * network(cidr) → returns the start address of the CIDR.
--     * host(...) → returns the textual representation (e.g., '192.168.1.0').
--     * For IPv4, we prefix with '::ffff:' to make it an IPv6-mapped address.
--     * inet_send() → converts the inet to its raw binary (bytea) representation.
--
CREATE OR REPLACE FUNCTION inet_to_byte16(i inet) RETURNS bytea
 LANGUAGE sql IMMUTABLE RETURNS NULL ON NULL INPUT
AS $$
  SELECT CASE
     WHEN family(i) = 4 THEN
         inet_send(inet('::ffff:' || host(i)))
     ELSE
         inet_send(i)
    END;
$$;

-- Get the start address of a CIDR range and convert it to a 16 byte binary value
CREATE OR REPLACE FUNCTION cidr_start_addr_16(c cidr) RETURNS bytea
 LANGUAGE sql IMMUTABLE RETURNS NULL ON NULL INPUT
AS $$
  SELECT inet_to_byte16(network(c));
$$;

-- Get the end address of a CIDR range and convert it to a 16 byte binary value
CREATE OR REPLACE FUNCTION cidr_end_addr_16(c cidr) RETURNS bytea
 LANGUAGE sql IMMUTABLE RETURNS NULL ON NULL INPUT
AS $$
  SELECT inet_to_byte16(broadcast(c));
$$;

CREATE TABLE network_data (
      id  uuid PRIMARY KEY DEFAULT gen_random_uuid(),
      cidr_col CIDR NOT NULL,
      cidr_start_addr bytea GENERATED ALWAYS AS (cidr_start_addr_16(cidr_col)) STORED,
      cidr_end_addr  bytea GENERATED ALWAYS AS (cidr_end_addr_16(cidr_col)) STORED,
      payload TEXT);

--
-- To implement efficient queries for looking up the the relevant row
-- for a target_ip (i.e. which row has the CIDR range for a IP we
-- want to lookup) we will use an index on the start of the CIDR range.
-- This works fine if the CIDR ranges in the table are non-overlapping.
-- We will include the cidr_end_addr also in the index so that both
-- conditions can be checked and filtered before going to main table
-- for the other columns of the matching row.
--
CREATE INDEX cidr_start_index ON network_data(cidr_start_addr ASC) INCLUDE (cidr_end_addr);

-- Let us load some sample data
INSERT INTO network_data(cidr_col, payload)
VALUES ('10.0.0.0/8', 'row-1'),
       ('172.16.0.0/12', 'row-2'),
       ('192.168.1.0/24', 'row-3'),
       ('198.51.100.0/29', 'row-4'),
       ('127.0.0.0/8', 'row-5'),
       ('2001:db8::/32', 'row-6'),
       ('fe80::/10', 'row-7'),
       ('2607:f8b0:4000::/48', 'row-8'),
       ('::1/128', 'row-9'),
       ('194.155.2.0/24', 'row-10');

-- Verify the test data
SELECT cidr_col,
       network(cidr_col) as ip_start,
       broadcast(cidr_col) as ip_end,
       cidr_start_addr,
       cidr_end_addr
FROM network_data LIMIT 10;

-- Sample query
SELECT *
FROM network_data
WHERE cidr_start_addr <= inet_to_byte16('194.155.2.7')
 AND  cidr_end_addr >= inet_to_byte16('194.155.2.7')
ORDER by cidr_start_addr DESC LIMIT 1;

Example output:

				
					yugabyte=# SELECT *
yugabyte-# FROM network_data
yugabyte-# WHERE cidr_start_addr <= inet_to_byte16('194.155.2.7')
yugabyte-#  AND  cidr_end_addr >= inet_to_byte16('194.155.2.7')
yugabyte-# ORDER by cidr_start_addr DESC LIMIT 1;
                  id                  |    cidr_col    |              cidr_start_addr               |
        cidr_end_addr                | payload
--------------------------------------+----------------+--------------------------------------------+--------------------------------------------+---------
 9f26bde1-4f90-4e69-a9f9-bd0613a285fb | 194.155.2.0/24 | \x0380001000000000000000000000ffffc29b0200 | \x0380001000000000000000000000ffffc29b02ff | row-10
(1 row)

📊 Detailed execution (storage-level goodness)

				
					EXPLAIN (ANALYZE, DIST, DEBUG)
SELECT *
FROM network_data
WHERE cidr_start_addr <= inet_to_byte16('194.155.2.7')
 AND  cidr_end_addr >= inet_to_byte16('194.155.2.7')
ORDER by cidr_start_addr DESC LIMIT 1;

Explain Plan:

				
					 Limit  (cost=0.00..0.54 rows=1 width=144) (actual time=0.750..0.751 rows=1 loops=1)
   ->  Index Scan Backward using cidr_start_index on network_data  (cost=0.00..5.38 rows=10 width=144) (actual time=0.742..0.743 rows=1 loops=1)
         Index Cond: (cidr_start_addr <= '\x0380001000000000000000000000ffffc29b0207'::bytea)
         Storage Index Filter: (cidr_end_addr >= '\x0380001000000000000000000000ffffc29b0207'::bytea)
         Storage Table Read Requests: 1
         Storage Table Read Execution Time: 0.155 ms
         Storage Table Rows Scanned: 1
         Storage Index Read Requests: 1
         Storage Index Read Execution Time: 0.428 ms
         Storage Index Rows Scanned: 1
           Metric rocksdb_number_db_seek: 2.000
           Metric rocksdb_number_db_next: 1.000
           Metric rocksdb_number_db_prev: 3.000
           Metric rocksdb_number_db_seek_found: 2.000
           Metric rocksdb_number_db_next_found: 1.000
           Metric rocksdb_number_db_prev_found: 3.000
           Metric rocksdb_iter_bytes_read: 730.000
           Metric docdb_keys_found: 3.000
           Metric ql_read_latency: sum: 130.000, count: 2.000
 Planning Time: 0.244 ms
 Execution Time: 0.778 ms
 Storage Read Requests: 2
 Storage Read Execution Time: 0.583 ms
 Storage Rows Scanned: 2
   Metric rocksdb_number_db_seek: 2
   Metric rocksdb_number_db_next: 1
   Metric rocksdb_number_db_prev: 3
   Metric rocksdb_number_db_seek_found: 2
   Metric rocksdb_number_db_next_found: 1
   Metric rocksdb_number_db_prev_found: 3
   Metric rocksdb_iter_bytes_read: 730
   Metric docdb_keys_found: 3
   Metric ql_read_latency: sum: 130, count: 2
 Catalog Read Requests: 0
 Catalog Write Requests: 0
 Storage Write Requests: 0
 Storage Flush Requests: 0
 Storage Execution Time: 0.583 ms
 Peak Memory Usage: 49 kB

Why this is fast:

● The secondary index is ordered by cidr_start_addr (DocDB key order).
● The planner performs a backward index scan from the target IP’s key and stops at the first qualifying row.
● Because cidr_end_addr is INCLUDEd, the storage layer can apply a Storage Index Filter (i.e., confirm end >= target) without immediately hopping to the base table. That keeps read-amplification low.

💡 Tip: This pattern is optimal when CIDR ranges are non-overlapping. With overlaps, the scan can still work, but it may have to step through more index entries before the end >= target filter finds the correct match.

🧩 How to think about the predicate

“Is IP :ip contained by some cidr_range?” translates to:

				
					WHERE :ip BETWEEN cidr_range_start AND cidr_range_end

In byte terms:

				
					WHERE cidr_start_addr <= inet_to_byte16(:ip)
  AND cidr_end_addr   >= inet_to_byte16(:ip)
ORDER BY cidr_start_addr DESC
LIMIT 1;

The reverse seek + LIMIT 1 is what gives you the “first matching range” efficiently.

🧭 Summary

● Direct indexing on inet/cidr isn’t available yet in YugabyteDB due to comparator constraints.
● You can normalize IPv4/IPv6 to a canonical 16-byte key and materialize start/end as generated columns.
● A secondary index on start that includes end enables:
- ◦ Reverse index scans from the target IP,
- ◦ Index-level filtering (Storage Index Filter) before base-table reads,
- ◦ Tight, predictable latency, especially with non-overlapping ranges.

This is a practical, production-friendly pattern you can use today for fast IP ∈ CIDR queries in YugabyteD, while we look forward to native support via custom comparators.

Have Fun!