Generate H3 / Geohash Cells for Fast Geo Filtering

This post is the implementation companion to Part 1: Geospatial Queries in YugabyteDB Without PostGIS. In that tip you learned how geospatial queries can be expressed in YugabyteDB using a layered pattern:

				
					cells → bounding box → exact math

This companion post focuses on the missing implementation detail:

How do we generate the spatial cells that power those query patterns?

Specifically, this tip shows how to:

● Convert latitude/longitude points into H3 or geohash cells
● Expand cells for radius queries
● Generate polygon cell coverage
● Store and use these values efficiently in YugabyteDB

👉 If you haven’t read Part 1 yet, start here: Geospatial Queries in YugabyteDB Without PostGIS

YugabyteDB supports many PostgreSQL extensions, but PostGIS is not currently part of the officially supported extension set. This tip shows how to handle geospatial needs without it.

🧠 Assumptions from Part 1

This post assumes you already understand:

● Why cell-based filtering is the first layer of geospatial queries
● How cells are used in indexed equality lookups or joins
● How bounding boxes and exact math fit into the overall query plan

This tip focuses only on generating the cell values themselves.

📍 Example Schema (from Part 1)

				
					CREATE TABLE IF NOT EXISTS geo_points (
  id      BIGSERIAL PRIMARY KEY,
  lat     DOUBLE PRECISION NOT NULL,
  lon     DOUBLE PRECISION NOT NULL,
  cell    TEXT NOT NULL,
  payload JSONB
);

CREATE INDEX IF NOT EXISTS geo_points_cell_idx
  ON geo_points (cell) INCLUDE (lat, lon);

The cell column stores either:

● an H3 index, or
● a geohash string

and is used for coarse spatial filtering in the queries shown in Part 1.

🔍 Choosing Between H3 and Geohash (Quick Recap)

As discussed in Part 1, both approaches work with the same query patterns.

● H3 is generally preferred for:
- ○ Radius searches
- ○ Neighbor expansion (k-rings)
- ○ Polygon coverage
● Geohash can be sufficient for:
- ○ Simpler region grouping
- ○ Prefix-based filtering
- ○ Existing geohash-heavy stacks

This post shows examples for both, but H3 is used in most examples below.

📍 Step 1: Convert a Point into H3 / Geohash Cells

This is the fundamental operation: converting a latitude/longitude pair into a spatial cell.

Save as gen_cells.py:

				
					#!/usr/bin/env python3

"""
Generate H3 and Geohash cells for a single latitude/longitude point.
"""

import h3
import geohash2


def generate_cells(lat: float, lon: float, h3_res: int, geohash_len: int):
    h3_cell = h3.latlng_to_cell(lat, lon, h3_res)
    geohash_cell = geohash2.encode(lat, lon, precision=geohash_len)
    return h3_cell, geohash_cell


if __name__ == "__main__":
    # Example coordinates (Pittsburgh, PA)
    lat = 40.4406
    lon = -79.9959

    # Tunable parameters
    h3_res = 9
    geohash_len = 6

    h3_cell, geohash_cell = generate_cells(
        lat=lat,
        lon=lon,
        h3_res=h3_res,
        geohash_len=geohash_len,
    )

    LABEL_WIDTH = 26

    def line(label, value):
        print(f"  {label.ljust(LABEL_WIDTH)} : {value}")

    print("Input:")
    line("lat/lon", f"{lat}, {lon}")
    print()
    print("Generated spatial cells:")
    line(f"H3 (resolution={h3_res})", h3_cell)
    line(f"Geohash (length={geohash_len})", geohash_cell)

Install and run:

				
					python3 -m venv venv
source venv/bin/activate
pip install h3 geohash2
python gen_cells.py

Example output:

				
					Input:
  lat/lon                    : 40.4406, -79.9959

Generated spatial cells:
  H3 (resolution=9)          : 892a847140bffff
  Geohash (length=6)         : dppn59

These values are what get stored in geo_points.cell.

🏗️ Step 2: Bulk Load Points with Cells

In practice, cell generation happens during data ingestion.

Example Python loader that:

● Generates random points
● Computes H3 cells
● Inserts rows into YugabyteDB

Save file as load_geo_points.py:

				
					#!/usr/bin/env python3

import random
import h3
import psycopg2

# Center point (Pittsburgh, PA)
CENTER_LAT = 40.4406
CENTER_LON = -79.9959

# Tunable parameters
SPREAD_DEG = 0.20        # Geographic spread in degrees
H3_RES = 9               # H3 resolution
NUM_ROWS = 50_000        # Number of points to generate

def rand_point(lat, lon, spread):
    return (
        lat + (random.random() - 0.5) * spread,
        lon + (random.random() - 0.5) * spread,
    )

conn = psycopg2.connect(
    host="127.0.0.1",
    port=5433,
    dbname="yugabyte",
    user="yugabyte",
)
conn.autocommit = True

with conn.cursor() as cur:
    for _ in range(NUM_ROWS):
        lat, lon = rand_point(CENTER_LAT, CENTER_LON, SPREAD_DEG)
        cell = h3.latlng_to_cell(lat, lon, H3_RES)

        cur.execute(
            "INSERT INTO geo_points (lat, lon, cell) VALUES (%s, %s, %s)",
            (lat, lon, cell)
        )

print(f"Loaded {NUM_ROWS} rows into geo_points")

How to run it:

				
					pip install h3 psycopg2-binary
python load_geo_points.py

You should see:

				
					Loaded 50000 rows into geo_points.

These rows are now ready to be queried using the patterns in Part 1.

🔁 Step 3: Expand Cells for Radius Queries (End-to-End Demo)

In Part 1, the radius query begins with a coarse spatial filter:

				
					WHERE p.cell = ANY(:cells)

This step shows exactly how to generate that :cells list, what it looks like in practice, and how it is used directly in YugabyteDB.

The script below:

1. Computes the H3 center cell
2. Expands to neighboring cells using a k-ring
3. Prints a SQL-ready array literal
4. Loads sample data
5. Proves the coarse filter works

File: radius_demo.py

				
					#!/usr/bin/env python3

"""
radius_demo.py

End-to-end radius demo (Part 2 Step 3 + Putting It All Together):

  1) Create geo_points if needed
  2) Generate random points around a center
  3) Compute each point's H3 cell and insert
  4) Compute :cells for a radius search (grid_disk for h3 v4+, k_ring for older)
  5) Print a nicely formatted SQL ARRAY[...] you can paste into YSQL
  6) Compute bounding box for a chosen radius
  7) Run:
       - coarse cell filter count
       - full radius query count (cells + bbox + exact distance)

Requirements:
  pip install h3 psycopg2-binary
"""

import math
import random
import h3
import psycopg2


# ----------------------------
# Tunables
# ----------------------------
CENTER_LAT = 40.4406
CENTER_LON = -79.9959

H3_RES = 9
K_RING = 2

NUM_POINTS = 2000
SPREAD_DEG = 0.10  # synthetic spread (degrees)

RADIUS_METERS = 2000  # used for bounding box + exact distance

DB_HOST = "127.0.0.1"
DB_PORT = 5433
DB_NAME = "yugabyte"
DB_USER = "yugabyte"
DB_PASSWORD = None  # set if needed


# ----------------------------
# Helpers
# ----------------------------
def rand_point(center_lat: float, center_lon: float, spread_deg: float):
    return (
        center_lat + (random.random() - 0.5) * spread_deg,
        center_lon + (random.random() - 0.5) * spread_deg,
    )


def connect():
    kwargs = dict(
        host=DB_HOST,
        port=DB_PORT,
        dbname=DB_NAME,
        user=DB_USER,
    )
    if DB_PASSWORD:
        kwargs["password"] = DB_PASSWORD
    return psycopg2.connect(**kwargs)


def h3_disk(cell: str, k: int):
    """
    Return the set of cells within k steps of `cell`.

    h3-py v4+:  grid_disk(cell, k)
    older:      k_ring(cell, k)
    """
    if hasattr(h3, "grid_disk"):
        return h3.grid_disk(cell, k)
    if hasattr(h3, "k_ring"):
        return h3.k_ring(cell, k)
    raise RuntimeError("Unsupported h3 library: missing grid_disk/k_ring APIs")


def pretty_sql_array(items, per_line=3):
    """
    Produce a readable SQL array literal:

    ARRAY[
      'a','b','c',
      'd','e','f'
    ]
    """
    escaped = [s.replace("'", "''") for s in items]
    lines = []
    for i in range(0, len(escaped), per_line):
        chunk = escaped[i:i + per_line]
        suffix = "," if i + per_line < len(escaped) else ""
        lines.append("    " + ",".join(f"'{x}'" for x in chunk) + suffix)
    return "ARRAY[\n" + "\n".join(lines) + "\n  ]"


def bounding_box(center_lat, center_lon, radius_m):
    """
    Approximate bounding box for a radius in meters.
    Good enough for coarse filtering.

    lat delta: meters / 111_320
    lon delta: meters / (111_320 * cos(lat))
    """
    lat_rad = math.radians(center_lat)
    dlat = radius_m / 111_320.0
    dlon = radius_m / (111_320.0 * max(math.cos(lat_rad), 1e-12))
    return (
        center_lat - dlat,
        center_lat + dlat,
        center_lon - dlon,
        center_lon + dlon,
    )


# ----------------------------
# Main
# ----------------------------
def main():
    # Compute radius cells (these become :cells in Part 1)
    center_cell = h3.latlng_to_cell(CENTER_LAT, CENTER_LON, H3_RES)
    cells = sorted(h3_disk(center_cell, K_RING))

    # Compute bounding box for the chosen radius
    min_lat, max_lat, min_lon, max_lon = bounding_box(CENTER_LAT, CENTER_LON, RADIUS_METERS)

    print("Radius cell expansion (H3):")
    print(f"  center lat/lon          : {CENTER_LAT}, {CENTER_LON}")
    print(f"  H3 resolution           : {H3_RES}")
    print(f"  k (ring distance)       : {K_RING}")
    print(f"  center cell             : {center_cell}")
    print(f"  number of cells         : {len(cells)}")
    print()

    print("SQL-ready array (paste into YSQL):")
    print(pretty_sql_array(cells))
    print()

    print("Bounding box (approx):")
    print(f"  radius_meters           : {RADIUS_METERS}")
    print(f"  lat BETWEEN             : {min_lat:.6f} AND {max_lat:.6f}")
    print(f"  lon BETWEEN             : {min_lon:.6f} AND {max_lon:.6f}")
    print()

    # Connect to YugabyteDB and load demo data
    conn = connect()
    conn.autocommit = True

    with conn.cursor() as cur:
        # Create table + index (idempotent)
        cur.execute("""
          CREATE TABLE IF NOT EXISTS geo_points (
            id   BIGSERIAL PRIMARY KEY,
            lat  DOUBLE PRECISION NOT NULL,
            lon  DOUBLE PRECISION NOT NULL,
            cell TEXT NOT NULL
          );
        """)
        cur.execute("CREATE INDEX IF NOT EXISTS geo_points_cell_idx ON geo_points (cell);")

        # Load synthetic points
        cur.execute("TRUNCATE TABLE geo_points;")

        rows = []
        for _ in range(NUM_POINTS):
            lat, lon = rand_point(CENTER_LAT, CENTER_LON, SPREAD_DEG)
            cell = h3.latlng_to_cell(lat, lon, H3_RES)
            rows.append((lat, lon, cell))

        cur.executemany(
            "INSERT INTO geo_points (lat, lon, cell) VALUES (%s, %s, %s)",
            rows
        )

        print(f"Loaded {NUM_POINTS} points into geo_points (spread_deg={SPREAD_DEG})")
        print()

        # Stage 1: coarse cell filter only
        cur.execute(
            "SELECT count(*) FROM geo_points WHERE cell = ANY(%s);",
            (cells,)
        )
        coarse_count = cur.fetchone()[0]

        # Stage 2+3: full radius query (cells + bbox + exact distance)
        cur.execute(
            """
            SELECT count(*)
            FROM geo_points p
            WHERE p.cell = ANY(%s)
              AND p.lat BETWEEN %s AND %s
              AND p.lon BETWEEN %s AND %s
              AND (
                6371000 * acos(
                  cos(radians(%s)) * cos(radians(p.lat)) *
                  cos(radians(p.lon) - radians(%s)) +
                  sin(radians(%s)) * sin(radians(p.lat))
                )
              ) <= %s;
            """,
            (
                cells,
                min_lat, max_lat,
                min_lon, max_lon,
                CENTER_LAT, CENTER_LON, CENTER_LAT,
                RADIUS_METERS,
            )
        )
        full_count = cur.fetchone()[0]

        print("Validation counts:")
        print(f"  coarse cell filter      : {coarse_count} / {NUM_POINTS}")
        print(f"  full radius query       : {full_count} / {NUM_POINTS}")
        print()

    conn.close()


if __name__ == "__main__":
    main()

How to run it:

				
					pip install h3 psycopg2-binary
python radius_demo.py

Example output:

				
					Radius cell expansion (H3):
  center lat/lon          : 40.4406, -79.9959
  H3 resolution           : 9
  k (ring distance)       : 2
  center cell             : 892a847140bffff
  number of cells         : 19

SQL-ready array (paste into YSQL):
ARRAY[
    '892a8471403ffff','892a8471407ffff','892a847140bffff',
    '892a847140fffff','892a8471413ffff','892a8471417ffff',
    '892a847141bffff','892a847143bffff','892a8471443ffff',
    '892a8471447ffff','892a847144fffff','892a8471453ffff',
    '892a8471457ffff','892a8471463ffff','892a8471473ffff',
    '892a8471477ffff','892a847147bffff','892a84714c7ffff',
    '892a84714cfffff'
  ]

Bounding box (approx):
  radius_meters           : 2000
  lat BETWEEN             : 40.422634 AND 40.458566
  lon BETWEEN             : -80.019506 AND -79.972294

Loaded 2000 points into geo_points (spread_deg=0.1)

Validation counts:
  coarse cell filter      : 39 / 2000
  full radius query       : 39 / 2000

What this output tells us:

● Out of 2,000 total rows, only 39 matched the H3 cell filter.
● The bounding box and exact distance checks did not further reduce the set for this synthetic dataset.
● Most importantly, expensive distance math ran on ~2% of the data, not the full table.

In real workloads with wider geographic spreads or smaller radii, the final distance check typically reduces the result set even further.

Note: Your H3 array will differ based on resolution and libraries

Key takeaway: cell-based filtering dramatically shrinks the candidate set before any costly geospatial math is applied.

🧩 Putting It All Together: Run the Full Radius Query in YugabyteDB

At this point we have everything needed to run a complete “near-me” radius query:

● :cells – generated by H3 k-ring expansion (Step 3)
● Bounding box – a cheap range filter (lat BETWEEN ... AND ..., lon BETWEEN ... AND ...)
● Exact distance – applied only after candidate reduction (as shown in Part 1)

Below is a runnable YSQL example using the exact array and bounding box values generated by the demo script:

Full radius query (cells → bounding box → exact distance):

				
					SELECT
  p.id,
  p.lat,
  p.lon
FROM geo_points p
WHERE
  -- Step 1: coarse spatial filter (fast, indexed)
  p.cell = ANY(
    ARRAY[
      '892a8471403ffff','892a8471407ffff','892a847140bffff',
      '892a847140fffff','892a8471413ffff','892a8471417ffff',
      '892a847141bffff','892a847143bffff','892a8471443ffff',
      '892a8471447ffff','892a847144fffff','892a8471453ffff',
      '892a8471457ffff','892a8471463ffff','892a8471473ffff',
      '892a8471477ffff','892a847147bffff','892a84714c7ffff',
      '892a84714cfffff'
    ]
  )

  -- Step 2: bounding box filter (cheap comparisons)
  AND p.lat BETWEEN 40.422634 AND 40.458566
  AND p.lon BETWEEN -80.019506 AND -79.972294

  -- Step 3: exact distance (meters)
  AND (
    6371000 * acos(
      cos(radians(40.4406)) * cos(radians(p.lat)) *
      cos(radians(p.lon) - radians(-79.9959)) +
      sin(radians(40.4406)) * sin(radians(p.lat))
    )
  ) <= 2000
ORDER BY p.id
LIMIT 25;

Example:

				
					yugabyte=# SELECT
yugabyte-#   p.id,
yugabyte-#   p.lat,
yugabyte-#   p.lon
yugabyte-# FROM geo_points p
yugabyte-# WHERE
yugabyte-#   -- Step 1: coarse spatial filter (fast, indexed)
yugabyte-#   p.cell = ANY(
yugabyte(#     ARRAY[
yugabyte(#       '892a8471403ffff','892a8471407ffff','892a847140bffff',
yugabyte(#       '892a847140fffff','892a8471413ffff','892a8471417ffff',
yugabyte(#       '892a847141bffff','892a847143bffff','892a8471443ffff',
yugabyte(#       '892a8471447ffff','892a847144fffff','892a8471453ffff',
yugabyte(#       '892a8471457ffff','892a8471463ffff','892a8471473ffff',
yugabyte(#       '892a8471477ffff','892a847147bffff','892a84714c7ffff',
yugabyte(#       '892a84714cfffff'
yugabyte(#     ]
yugabyte(#   )
yugabyte-#
yugabyte-#   -- Step 2: bounding box filter (cheap comparisons)
yugabyte-#   AND p.lat BETWEEN 40.422634 AND 40.458566
yugabyte-#   AND p.lon BETWEEN -80.019506 AND -79.972294
yugabyte-#
yugabyte-#   -- Step 3: exact distance (meters)
yugabyte-#   AND (
yugabyte(#     6371000 * acos(
yugabyte(#       cos(radians(40.4406)) * cos(radians(p.lat)) *
yugabyte(#       cos(radians(p.lon) - radians(-79.9959)) +
yugabyte(#       sin(radians(40.4406)) * sin(radians(p.lat))
yugabyte(#     )
yugabyte(#   ) <= 2000
yugabyte-# ORDER BY p.id
yugabyte-# LIMIT 25;
   id   |        lat         |        lon
--------+--------------------+--------------------
 102041 |  40.44048545592179 | -80.00203920045017
 102170 |  40.43895236254896 | -79.98888492554804
 102310 |  40.44243399399757 |  -79.9931120959056
 102313 | 40.438165460439045 | -80.00363590020014
 102394 |  40.44156095577823 | -80.00055097865172
 102466 |  40.44395687559483 | -79.99865994423243
 102531 |  40.43954827998846 | -79.98783857837434
 102617 |  40.43574226774673 | -80.00152976024349
 102639 |  40.44253182084223 | -79.99878397075616
 102671 | 40.435247075845375 | -79.99048789947692
 102702 |  40.44017611873405 | -79.99558718878541
 102716 | 40.441373300960734 | -79.98651820436055
 102725 |  40.43608408122357 |  -79.9938495749125
 102736 |  40.43855217983227 | -79.99565138761412
 102778 |  40.44340663086255 | -79.98846989830469
 102787 | 40.446979140086185 | -80.00062490208099
 102815 |  40.44486309243447 | -80.00336504633111
 102826 |  40.43541374551199 | -80.00097252032793
 102841 | 40.440102355420514 | -79.99445128977621
 102849 | 40.445880633699986 | -79.99579220301376
 102853 |  40.44093337089909 | -79.99088411810804
 102889 |   40.4369217528762 | -79.99991923958389
 102964 |  40.44306805298324 | -79.99700248538768
 103042 | 40.439102073143815 |  -80.0051635970044
 103104 |  40.43741011656871 | -79.99986008512319
(25 rows)

With the demo dataset:

● Coarse cell filter: 39 / 2000
● Full radius query: 39 / 2000

Even on synthetic data, the important takeaway is that YugabyteDB can execute a “near-me” query using standard SQL, and the expensive math runs only after the coarse filter has reduced the candidate set.

(If you want a more dramatic reduction between the coarse stage and the final stage, increase SPREAD_DEG or reduce RADIUS_METERS in the demo script.)

✅ Summary

In this tip, we focused on the data preparation layer that enables geospatial queries in YugabyteDB without PostGIS.

Specifically, we showed how to:

● Convert latitude/longitude points into H3 or geohash cells
● Expand a center point into a set of neighboring cells for radius queries
● Generate SQL-ready cell arrays that plug directly into YSQL
● Use these cells to dramatically reduce the number of rows evaluated by geospatial math

By preparing spatial cells outside YugabyteDB and indexing them, we make high-scale geospatial filtering possible in SQL workloads without a specialized spatial index.

Combined with the query patterns from Part 1, this approach makes it possible to support common geospatial workloads, such as near-me searches and polygon containment, using standard SQL and indexes.

👉 Next: validating these patterns with synthetic benchmarks and p95/p99 latency measurements in Part 3, Benchmarking Geospatial Queries in YugabyteDB.

Have Fun!