Many applications rely on geospatial data for things like:
● “Find all points within X miles”
● “Which records fall inside this city or zip code?”
● Geo-fencing, service areas, and proximity searches
In PostgreSQL, these questions are often answered with PostGIS.
In practice, however, most applications rely on a small, well-defined subset of geospatial functionality, primarily radius searches and polygon containment, rather than the full breadth of spatial features.
This tip shows how those common geospatial queries can be implemented efficiently in YugabyteDB using a SQL-based, scalable approach that does not require PostGIS.
The key idea is to break geospatial queries into explicit, layered steps, separating coarse filtering, candidate reduction, and exact calculations in a way that maps naturally to YugabyteDB’s distributed architecture.
📌 This is Part 1 of a two-part mini-series
Part 1 (this post): The query patterns and overall strategy
Part 2: Generate H3 / Geohash Cells for Fast Geo Filtering
By the end of this post, you should have a clear mental model for:
● Which geospatial workloads YugabyteDB can support today
● How those queries are structured
● Why this approach scales without spatial indexes
⚠️ A quick note on PostGIS and extensions
YugabyteDB supports a defined set of PostgreSQL extensions. As a rule of thumb, extensions that deeply depend on PostgreSQL’s storage or index internals (like PostGIS) should be treated as not supported unless explicitly documented.
Note: The YugabyteDB engineering team is well aware of how important PostGIS support is for many geospatial workloads. As soon as official support becomes available, this tip will be updated with that news.
Until then, the techniques below represent practical, production-tested alternatives commonly used in large-scale systems.
🧠 The Key Idea: Progressive (Layered) Geo Filtering
PostGIS typically combines three responsibilities into a single function call:
1. Spatial indexing
2. Candidate filtering
3. Exact geometric calculation
Without PostGIS, we separate those layers and make each one explicit.
The pattern looks like this:
lat/lon
→ coarse spatial key (H3 or geohash)
→ indexed equality or join
→ bounding-box filter
→ exact distance or point-in-polygon math
Each step dramatically reduces the number of rows processed by the next.
This works well in YugabyteDB because:
● Equality lookups and joins scale linearly
● Secondary indexes are globally distributed
● Final math is applied to small result sets
🧱 Schema: Store a Coarse Spatial Key
Every point stores:
● Latitude
● Longitude
● A coarse spatial key (referred to as a cell, such as an H3 index or geohash)
Example:
CREATE TABLE geo_points (
id BIGSERIAL PRIMARY KEY,
lat DOUBLE PRECISION NOT NULL,
lon DOUBLE PRECISION NOT NULL,
cell TEXT NOT NULL
);
CREATE INDEX geo_points_cell_idx ON geo_points (cell);
Why store cell?
● It allows index-based pruning before any geometry math runs
● It distributes data evenly across tablets
● It gives you control over the granularity of spatial searches
📌 Important:
● YugabyteDB does not generate cell values itself. They are computed outside the database (for example, using Python) and stored as regular columns.
● 👉 That generation process is covered in Part 2 of this series: Generate H3 / Geohash Cells for Fast Geo Filtering
📍 Radius Queries (Near-Me Searches)
Radius queries are implemented as three explicit filtering layers, each progressively more precise and more expensive.
Step 1: Coarse Spatial Filtering (Cell Lookup)
Quickly narrow the candidate set using an indexed equality lookup on a coarse spatial key.
SELECT id, lat, lon
FROM geo_points
WHERE cell = ANY(:cells);
🔍 Where do :cells come from?
This is the critical bridge.
● Given a center point and radius:
○ H3: compute the center cell, then expand to neighboring cells (k-ring)
○ Geohash: compute the prefix and include adjacent prefixes near edges
Those cell lists are generated outside YugabyteDB and passed in as a parameter.
👉 See Part 2 for runnable Python examples: Generate H3 / Geohash Cells for Fast Geo Filtering
Step 2: Bounding Box Filtering (Axis-Aligned Bounding Box)
Apply a rectangular bounding box around the search radius to eliminate points that cannot possibly match, using only simple comparisons:
AND lat BETWEEN :min_lat AND :max_lat
AND lon BETWEEN :min_lon AND :max_lon
This step is orders of magnitude cheaper than trigonometric distance calculations and dramatically reduces the workload of the final step.
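The bounding-box parameters themselves are easy to derive in the application tier. Here is a minimal Python sketch (the helper name and the use of a mean Earth radius are illustrative assumptions, not part of the query):

```python
import math

EARTH_RADIUS_M = 6371000  # mean Earth radius, same constant the distance SQL uses

def bounding_box(center_lat, center_lon, radius_m):
    """Axis-aligned box (in degrees) that fully contains the search circle."""
    # Degrees of latitude are (nearly) constant everywhere on the sphere.
    lat_delta = math.degrees(radius_m / EARTH_RADIUS_M)
    # Degrees of longitude shrink with the cosine of the latitude.
    lon_delta = math.degrees(
        radius_m / (EARTH_RADIUS_M * math.cos(math.radians(center_lat))))
    return (center_lat - lat_delta, center_lat + lat_delta,
            center_lon - lon_delta, center_lon + lon_delta)
```

Note that this simple formula degrades near the poles and across the antimeridian; for those regions the box must be widened or split into two ranges.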
Step 3: Exact Distance Calculation (Great-Circle Distance)
At this stage, we apply an exact great-circle distance calculation to determine precisely which points fall within the requested radius. (The acos-based SQL below is the spherical law of cosines; the closely related Haversine formula computes the same great-circle distance and is numerically sturdier at very small distances.)
AND (
6371000 * acos(
cos(radians(:center_lat)) * cos(radians(lat)) *
cos(radians(lon) - radians(:center_lon)) +
sin(radians(:center_lat)) * sin(radians(lat))
)
) <= :radius_meters;
Performing the Haversine calculation only after coarse spatial filtering ensures the trigonometric cost is applied to a small candidate set, keeping queries fast and predictable at scale.
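As a sanity check, the same great-circle math can be reproduced outside the database. A small Python sketch (the function name is an illustrative assumption) that mirrors the acos expression in Step 3:

```python
import math

def great_circle_m(lat1, lon1, lat2, lon2, radius_m=6371000):
    """Great-circle distance in meters, mirroring the SQL in Step 3."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    cos_central = (math.cos(lat1) * math.cos(lat2) * math.cos(lon2 - lon1)
                   + math.sin(lat1) * math.sin(lat2))
    # Clamp against floating-point drift slightly outside [-1, 1].
    return radius_m * math.acos(max(-1.0, min(1.0, cos_central)))
```

One degree of longitude along the equator comes out to roughly 111.2 km, which is a quick way to validate the SQL's output against known distances.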
🧩 Putting It All Together: Full Radius Query Example
The following example shows a complete radius (“near-me”) query that combines all three layers:
1. Coarse spatial filtering using cells
2. Bounding box reduction
3. Exact distance calculation (Haversine / great-circle)
This is the exact pattern you would use in a real application.
Example: Find All Points Within a Radius
Assumptions:
● :center_lat, :center_lon – center point
● :radius_meters – search radius
● :cells – list of H3 or geohash cells covering the radius
(generated outside YugabyteDB; see Part 2, Generate H3 / Geohash Cells for Fast Geo Filtering)
The query:
SELECT
p.id,
p.lat,
p.lon
FROM geo_points p
WHERE
-- Step 1: Coarse spatial filtering (indexed lookup)
p.cell = ANY(:cells)
-- Step 2: Bounding box filtering (cheap comparisons)
AND p.lat BETWEEN :min_lat AND :max_lat
AND p.lon BETWEEN :min_lon AND :max_lon
-- Step 3: Exact distance check (Haversine / great-circle)
AND (
6371000 * acos(
cos(radians(:center_lat)) * cos(radians(p.lat)) *
cos(radians(p.lon) - radians(:center_lon)) +
sin(radians(:center_lat)) * sin(radians(p.lat))
)
) <= :radius_meters;
🧠 How to Read This Query
● The database does not “know” this is a geospatial query: It sees indexed equality filters, range predicates, and math.
● Most rows are eliminated early
○ cell = ANY(:cells) uses a secondary index
○ Bounding boxes remove obvious non-matches
● Trigonometry runs last
○ Only on a small candidate set
○ Keeps p95 / p99 latency predictable
This is the same execution strategy spatial engines use internally, made explicit.
🗺️ Polygon Queries (Zip Codes, Cities, Regions)
Polygon-based queries, such as determining which points fall within a city boundary, zip code, or service area, can also be implemented using the same progressive filtering approach.
Instead of evaluating geometry directly for every row, we convert polygons into a join problem by precomputing which spatial cells intersect each polygon.
Step 1: Precompute Polygon Cell Coverage (Outside YugabyteDB)
For each polygon, we determine the set of H3 or geohash cells that intersect it and store those relationships in a lookup table.
CREATE TABLE geo_polygon_cells (
polygon_id BIGINT,
cell TEXT,
PRIMARY KEY (polygon_id, cell)
);
🔍 How are polygon cells generated?
● Outside YugabyteDB:
○ Take the polygon geometry
○ Compute all H3 or geohash cells that cover it
○ Store the (polygon_id, cell) pairs in geo_polygon_cells
👉 The same cell-generation logic from Part 2, Generate H3 / Geohash Cells for Fast Geo Filtering, applies here.
This step is performed once per polygon, not per query.
Step 2: Coarse Polygon Filtering via Join
We now reduce the problem to a fast, indexed join that quickly identifies candidate points whose spatial cell overlaps the target polygon.
SELECT p.*
FROM geo_points p
JOIN geo_polygon_cells c
ON p.cell = c.cell
WHERE c.polygon_id = :polygon_id;
This join replaces what would traditionally be a spatial index lookup in PostGIS.
Step 3: Optional Exact Point-in-Polygon Check
Depending on accuracy requirements, the join result may already be sufficient.
If exact geometric correctness is required (for example, to eliminate edge cases where a spatial cell partially overlaps the polygon), apply a final point-in-polygon calculation to the reduced result set.
The precision of this approach depends on the chosen cell resolution: finer resolutions reduce edge over-inclusion at the cost of more cells.
As with radius queries, this exact math is applied only after coarse filtering has dramatically reduced the number of candidate rows.
Example: Point-in-Polygon Using Ray Casting (SQL)
Assume polygon vertices are stored in order, with the ring closed (the first vertex repeated as the final vertex_id) so that the edge join below also covers the closing edge:
CREATE TABLE polygons (
polygon_id BIGINT,
vertex_id INT,
lat DOUBLE PRECISION,
lon DOUBLE PRECISION,
PRIMARY KEY (polygon_id, vertex_id)
);
And candidate points have already been reduced via the cell join:
WITH candidate_points AS (
SELECT p.id, p.lat, p.lon
FROM geo_points p
JOIN geo_polygon_cells c
ON p.cell = c.cell
WHERE c.polygon_id = :polygon_id
)
SELECT cp.*
FROM candidate_points cp
WHERE (
SELECT MOD(COUNT(*), 2)
FROM polygons v1
JOIN polygons v2
ON v1.polygon_id = v2.polygon_id
AND v2.vertex_id = v1.vertex_id + 1
WHERE v1.polygon_id = :polygon_id
AND (
(v1.lat > cp.lat) <> (v2.lat > cp.lat)
)
AND (
cp.lon <
(v2.lon - v1.lon) * (cp.lat - v1.lat)
/ (v2.lat - v1.lat)
+ v1.lon
)
) = 1;
🧠 How This Works (High Level)
● A horizontal “ray” is projected from the point
● Each polygon edge crossing toggles an inside/outside state
● An odd number of crossings means the point is inside
● An even number means it is outside
This is a well-known, deterministic algorithm used in many GIS engines.
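For readers who prefer to run the exact check in the application tier rather than in SQL, the same ray-casting logic is only a few lines of Python (a sketch; the function name and vertex representation are assumptions):

```python
def point_in_polygon(lat, lon, vertices):
    """Ray casting over an ordered list of (lat, lon) vertices."""
    inside = False
    n = len(vertices)
    for i in range(n):
        lat1, lon1 = vertices[i]
        lat2, lon2 = vertices[(i + 1) % n]  # wrap around: closing edge is implicit
        # Does this edge cross the horizontal ray through the point...
        if (lat1 > lat) != (lat2 > lat):
            crossing_lon = (lon2 - lon1) * (lat - lat1) / (lat2 - lat1) + lon1
            # ...to the east of the point? If so, toggle inside/outside.
            if lon < crossing_lon:
                inside = not inside
    return inside
```

Unlike the SQL version, the modular index makes the closing edge implicit, so the ring does not need to repeat its first vertex.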
🔍 Choosing Between H3 and Geohash
Both H3 and geohash can be used to implement the query patterns shown in this post. The right choice depends on how precise and flexible your geospatial queries need to be.
When H3 is the better choice:
● You need accurate radius searches
● You frequently expand searches to neighboring regions
● You want uniform cell sizes across latitudes
● You plan to support both point and polygon workloads
When geohash is the better fit:
● Your use cases are relatively simple
● You prefer a string-based, human-readable key
● You already use geohash in your application stack
● Approximate region grouping is sufficient
In practice, H3 tends to be the better default for most proximity and polygon workloads, while geohash remains a viable option for simpler use cases.
👉 Next step: see how to generate H3 or geohash cells, including radius expansion and polygon coverage, in the companion tip: Generate H3 / Geohash Cells for Fast Geo Filtering
📊 Benchmarking p95 / p99 Latency (What to Measure and Why)
Because geospatial workloads vary widely by data distribution and query shape, there is no single “correct” latency number to publish.
Instead, the goal when benchmarking geospatial queries in YugabyteDB is to measure relative cost and tail-latency behavior as each filtering layer is applied.
What to Benchmark
Start with the same query and progressively add each layer:
1. Cell-only filter
2. Cell + bounding box
3. Cell + bounding box + exact math
Measure:
● p50 latency (baseline)
● p95 latency (steady-state pressure)
● p99 latency (tail behavior)
What You Should Observe
In a correctly layered design:
● The cell-only filter should be extremely fast and stable
● Adding a bounding box should reduce candidate rows with minimal added cost
● The exact distance or point-in-polygon math should:
○ Increase p50 slightly
○ Increase p95 modestly
○ Have limited impact on p99 because it runs on a small result set
If p99 grows significantly after adding exact math, it usually indicates:
● Too many cells are being scanned
● Cell resolution is too coarse
● Bounding boxes are not tight enough
How to Benchmark Without Production Data
You can generate a synthetic dataset that is sufficient for performance validation:
● Randomly distribute latitude/longitude pairs
● Precompute cells (H3 or geohash)
● Store polygons of varying sizes
● Run repeated queries against fixed radii and regions
The absolute numbers don't matter; the shape of the latency curve does.
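Percentile extraction itself needs nothing beyond the standard library. A minimal harness sketch (names are illustrative) that times a query callable and reports p50 / p95 / p99:

```python
import statistics
import time

def percentiles(samples_ms):
    """p50 / p95 / p99 from a list of per-query latencies in milliseconds."""
    cuts = statistics.quantiles(samples_ms, n=100)  # 99 cut points
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}

def measure(run_query, iterations=200):
    """Run the query repeatedly and summarize its latency distribution."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_query()  # e.g. a closure that executes one radius query
        samples.append((time.perf_counter() - start) * 1000.0)
    return percentiles(samples)
```

Run measure once per layer (cell-only, then cell + bounding box, then the full query) and compare the three resulting distributions.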
Why Tail Latency Matters Here
Geospatial queries often sit behind:
● User-facing APIs
● “Near me” searches
● Interactive map experiences
Averages can look fine while p99 tells a very different story.
By explicitly measuring tail latency at each stage, you ensure that:
● Expensive math is never applied at scale
● Performance remains predictable as data grows
Summary: PostGIS Use Cases → YugabyteDB Query Patterns
The table below summarizes how common PostGIS-style geospatial workloads can be implemented in YugabyteDB using the layered approach described in this post.
| PostGIS-Style Use Case | YugabyteDB Pattern |
|---|---|
| Radius / proximity search | Cell filter → bounding box → Haversine / great-circle distance |
| Point-in-polygon | Cell join → optional exact point-in-polygon math |
| Geo-fencing | Polygon cell membership (cell join) → optional exact check for edge cases |
| Service areas / regions | Precomputed geo_polygon_cells table → joins for fast lookup |
| Heatmaps / density analysis | GROUP BY cell (and optionally bucket by time, type, etc.) |
Each pattern:
● Avoids spatial extensions
● Uses standard SQL operators
● Applies expensive math only after candidate reduction
● Scales linearly as data grows
For implementation details on generating and managing spatial cells, see the companion tip: Generate H3 / Geohash Cells for Fast Geo Filtering
⚡ Final thoughts
YugabyteDB can support common geospatial workloads, including proximity searches and polygon-based filtering, without requiring PostGIS.
By making each step explicit:
● Coarse spatial filtering using H3 or geohash cells
● Bounding box reduction
● Exact distance or geometry math only where needed
…you gain predictable performance, horizontal scalability, and full control over query behavior.
This approach has been successfully used for large-scale datasets where PostGIS-style spatial indexing is not available or desirable.
👉 Next: see how to generate H3 or geohash cells, including radius expansion and polygon coverage, in the companion tip: Generate H3 / Geohash Cells for Fast Geo Filtering
Have Fun!
