Measure Node-to-Node Latency (Even When ping Is Disabled)

In many production environments, ICMP (ping, mtr) is blocked, but you still need to understand network latency between YugabyteDB nodes, especially in multi-AZ / multi-region setups.

This tip shows how to:

  • ✔️ Measure TCP connect latency between nodes (works even when ping is disabled)

  • ✔️ Use COPY FROM PROGRAM to call a shell script from inside the database

  • ✔️ Auto-discover nodes using yb_servers()

  • ✔️ Capture cloud / region / zone at measurement time

  • ✔️ Expose a neat N×N “latency matrix” view in YSQL

🔐 Prerequisites
  • ● You’re running this as a superuser (or a role allowed to use COPY FROM PROGRAM).

  • ● YugabyteDB is configured to allow COPY FROM PROGRAM (same requirement as in the “call a shell script from the database” tip).

  • ● Your OS has:

    • bash (with /dev/tcp support — standard in Bash)

    • timeout (from coreutils)

  •  
1️⃣ Create the Tables

We’ll use:

  • ● A staging table for raw script output

  • ● A main table that stores latency + placement snapshot

Staging table (raw TCP measurements)
				
					CREATE TABLE IF NOT EXISTS node_tcp_latency_raw (
    src_host    text,
    dst_host    text,
    dst_port    int,
    rtt_ms      double precision,
    measured_at timestamptz
);
				
			
Main table (historical latency + placement info)
				
					CREATE TABLE IF NOT EXISTS node_tcp_latency (
    src_host    text,
    src_cloud   text,
    src_region  text,
    src_zone    text,
    dst_host    text,
    dst_cloud   text,
    dst_region  text,
    dst_zone    text,
    dst_port    int,
    rtt_ms      double precision,
    measured_at timestamptz
);
				
			

Each row in node_tcp_latency includes:

  • ● Where the latency was measured from (src_*)

  • ● Where it was measured to (dst_*)

  • ● The TCP port tested (e.g., 5433 for YSQL)

  • ● The RTT in ms

  • ● The timestamp of the measurement

2️⃣ Shell Script: TCP Latency via /dev/tcp

On each node where you’ll run YSQL, create this script:

Path:

				
					/home/yugabyte/scripts/yb_tcp_latency.sh
				
			

Contents:

				
					#!/usr/bin/env bash
#
# Usage:
#   yb_tcp_latency.sh host1:port1 host2:port2 ...
#

set -euo pipefail

SRC_HOST=$(hostname -I | awk '{print $1}')
MEASURED_AT=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
SAMPLES=${SAMPLES:-5}
TIMEOUT=${TIMEOUT:-2}

measure_one() {
  local target="$1"
  local host="${target%:*}"
  local port="${target##*:}"

  local sum_ns=0
  local ok=0

  for i in $(seq 1 "$SAMPLES"); do
    local start_ns end_ns

    start_ns=$(date +%s%N)

    # TCP handshake using bash's /dev/tcp
    if timeout "${TIMEOUT}"s bash -c ":</dev/tcp/${host}/${port}" 2>/dev/null; then
      end_ns=$(date +%s%N)
      sum_ns=$((sum_ns + end_ns - start_ns))
      ok=$((ok + 1))
    else
      # connection failed/timed out; stop trying this target
      break
    fi
  done

  if [ "$ok" -gt 0 ]; then
    local avg_ns=$((sum_ns / ok))
    local rtt_ms
    rtt_ms=$(awk -v ns="$avg_ns" 'BEGIN { printf "%.3f", ns/1000000 }')

    # TSV line: src_host  dst_host  dst_port  rtt_ms  measured_at
    printf "%s\t%s\t%s\t%s\t%s\n" \
      "$SRC_HOST" "$host" "$port" "$rtt_ms" "$MEASURED_AT"
  fi
}

if [ "$#" -eq 0 ]; then
  echo "Usage: $0 host1:port1 [host2:port2 ...]" >&2
  exit 1
fi

for t in "$@"; do
  measure_one "$t"
done
				
			

Make it executable:

				
					chmod +x /home/yugabyte/scripts/yb_tcp_latency.sh
				
			
3️⃣ Use yb_servers() to Build the Target List and Run the Script with \gexec

From ysqlsh on a node, we want to:

  1. Get all other nodes from yb_servers()

  2. Build a COPY FROM PROGRAM command using those host:port strings

  3. Execute that command via \gexec so it runs server-side

Step 3.1: Truncate the staging table
				
					TRUNCATE node_tcp_latency_raw;
				
			
Step 3: Generate and run the COPY with \gexec
				
					SELECT format(
  $$COPY node_tcp_latency_raw (src_host, dst_host, dst_port, rtt_ms, measured_at)
    FROM PROGRAM '/home/yugabyte/scripts/yb_tcp_latency.sh %s';$$,
  string_agg(host || ':5433', ' ')
)
FROM yb_servers()
WHERE host <> host(inet_server_addr());
\gexec
				
			

What this does:

  • string_agg(host || ':5433', ' ')172.150.28.137:5433 172.161.27.158:5433

  • format(...) wraps that into a full COPY ... FROM PROGRAM ... command

  • \gexec executes that generated COPY statement

Fyi, my YugabyteDB cluster has three nodes, spread across three regions, one in each region:

				
					yugabyte=# SELECT host, cloud, region, zone FROM yb_servers() ORDER BY cloud, region, zone;
      host      | cloud |  region   |    zone
----------------+-------+-----------+------------
 172.152.18.7   | aws   | us-east-1 | us-east-1a
 172.161.27.158 | aws   | us-east-2 | us-east-2a
 172.150.28.137 | aws   | us-west-1 | us-west-1a
(3 rows)
				
			
🧩 Step 4: Insert Into the Main Table With Placement Metadata
				
					INSERT INTO node_tcp_latency (
    src_host, src_cloud, src_region, src_zone,
    dst_host, dst_cloud, dst_region, dst_zone,
    dst_port, rtt_ms, measured_at
)
SELECT
    r.src_host,
    s.cloud,  s.region, s.zone,
    r.dst_host,
    d.cloud,  d.region, d.zone,
    r.dst_port,
    r.rtt_ms,
    r.measured_at
FROM node_tcp_latency_raw r
LEFT JOIN yb_servers() s ON s.host = r.src_host
LEFT JOIN yb_servers() d ON d.host = r.dst_host;
				
			

Check the data:

				
					SELECT * FROM node_tcp_latency;
				
			
				
					yugabyte=# SELECT * FROM node_tcp_latency;
   src_host   | src_cloud | src_region |  src_zone  |    dst_host    | dst_cloud | dst_region |  dst_zone  | dst_port | rtt_ms |      measured_at
--------------+-----------+------------+------------+----------------+-----------+------------+------------+----------+--------+------------------------
 172.152.18.7 | aws       | us-east-1  | us-east-1a | 172.161.27.158 | aws       | us-east-2  | us-east-2a |     5433 | 14.744 | 2025-11-21 20:23:03+00
 172.152.18.7 | aws       | us-east-1  | us-east-1a | 172.150.28.137 | aws       | us-west-1  | us-west-1a |     5433 | 70.694 | 2025-11-21 20:23:03+00
(2 rows)
				
			

You can periodically re-run the TRUNCATE → \gexec → INSERT sequence to build a history.

5️⃣ Views: Latest Latency + N×N Matrix
5.1 Latest measurement per (src, dst, port)
				
					CREATE OR REPLACE VIEW node_tcp_latency_latest AS
SELECT
    src_host,
    src_cloud,
    src_region,
    src_zone,
    dst_host,
    dst_cloud,
    dst_region,
    dst_zone,
    dst_port,
    rtt_ms,
    measured_at
FROM (
    SELECT *,
        row_number() OVER (
            PARTITION BY src_host, dst_host, dst_port
            ORDER BY measured_at DESC
        ) AS rn
    FROM node_tcp_latency
) t
WHERE rn = 1;
				
			
5.2 N×N “matrix” view (JSON per source node)
				
					CREATE OR REPLACE VIEW node_tcp_latency_matrix AS
SELECT
    src_host,
    src_cloud,
    src_region,
    src_zone,
    jsonb_object_agg(
        dst_host || ':' || dst_port ||
        ' [' || dst_region || '/' || dst_zone || ']',
        rtt_ms
        ORDER BY dst_region, dst_zone, dst_host, dst_port
    ) AS latency_ms_by_target
FROM node_tcp_latency_latest
GROUP BY src_host, src_cloud, src_region, src_zone;
				
			

Example:

				
					SELECT * FROM node_tcp_latency_matrix;
				
			
				
					yugabyte=# SELECT * FROM node_tcp_latency_matrix;
   src_host   | src_cloud | src_region |  src_zone  |                                             latency_ms_by_target
--------------+-----------+------------+------------+--------------------------------------------------------------------------------------------------------------
 172.152.18.7 | aws       | us-east-1  | us-east-1a | {"172.150.28.137:5433 [us-west-1/us-west-1a]": 70.694, "172.161.27.158:5433 [us-east-2/us-east-2a]": 14.744}
(1 row)
				
			

This gives you a compact per-node view of all current latencies to other nodes, with full placement context.

I can connect to a different node, and run a simlar test.

				
					yugabyte=# SELECT host, cloud, region, zone, host = host(inet_server_addr()) "Connected Node?" FROM yb_servers() ORDER BY cloud, region, zone;
      host      | cloud |  region   |    zone    | Connected Node?
----------------+-------+-----------+------------+-----------------
 172.152.18.7   | aws   | us-east-1 | us-east-1a | f
 172.161.27.158 | aws   | us-east-2 | us-east-2a | t
 172.150.28.137 | aws   | us-west-1 | us-west-1a | f
(3 rows)

yugabyte=# TRUNCATE node_tcp_latency_raw;
TRUNCATE TABLE

yugabyte=# SELECT format(
yugabyte(#   $$COPY node_tcp_latency_raw (src_host, dst_host, dst_port, rtt_ms, measured_at)
yugabyte$#     FROM PROGRAM '/home/yugabyte/scripts/yb_tcp_latency.sh %s';$$,
yugabyte(#   string_agg(host || ':5433', ' ')
yugabyte(# )
yugabyte-# FROM yb_servers()
yugabyte-# WHERE host <> host(inet_server_addr());

                                               format
----------------------------------------------------------------------------------------------------
 COPY node_tcp_latency_raw (src_host, dst_host, dst_port, rtt_ms, measured_at)                     +
     FROM PROGRAM '/home/yugabyte/scripts/yb_tcp_latency.sh 172.150.28.137:5433 172.152.18.7:5433';
(1 row)

yugabyte=# \gexec
COPY 2

yugabyte=# INSERT INTO node_tcp_latency (
yugabyte(#     src_host, src_cloud, src_region, src_zone,
yugabyte(#     dst_host, dst_cloud, dst_region, dst_zone,
yugabyte(#     dst_port, rtt_ms, measured_at
yugabyte(# )
yugabyte-# SELECT
yugabyte-#     r.src_host,
yugabyte-#     s.cloud  AS src_cloud,
yugabyte-#     s.region AS src_region,
yugabyte-#     s.zone   AS src_zone,
yugabyte-#     r.dst_host,
yugabyte-#     d.cloud  AS dst_cloud,
yugabyte-#     d.region AS dst_region,
yugabyte-#     d.zone   AS dst_zone,
yugabyte-#     r.dst_port,
yugabyte-#     r.rtt_ms,
yugabyte-#     r.measured_at
yugabyte-# FROM node_tcp_latency_raw r
yugabyte-# LEFT JOIN yb_servers() s ON s.host = r.src_host
yugabyte-# LEFT JOIN yb_servers() d ON d.host = r.dst_host;
INSERT 0 2

yugabyte=# SELECT * FROM node_tcp_latency_matrix;
    src_host    | src_cloud | src_region |  src_zone  |                                             latency_ms_by_target
----------------+-----------+------------+------------+--------------------------------------------------------------------------------------------------------------
 172.152.18.7   | aws       | us-east-1  | us-east-1a | {"172.150.28.137:5433 [us-west-1/us-west-1a]": 70.694, "172.161.27.158:5433 [us-east-2/us-east-2a]": 14.744}
 172.161.27.158 | aws       | us-east-2  | us-east-2a | {"172.152.18.7:5433 [us-east-1/us-east-1a]": 14.741, "172.150.28.137:5433 [us-west-1/us-west-1a]": 52.594}
(2 rows)
				
			
6️⃣ How to Read the Numbers

Rough guidelines:

  • ● 0–1 ms: same rack / same AZ

  • ● 1–5 ms: same region, cross-AZ

  • ● 10–30 ms: typical cross-region in the same continent

  • ● 40–90 ms: US coast-to-coast

  • ● 100+ ms: inter-continental

Because we store cloud/region/zone with each row, you can:

  • See how topology impacts latency

  • Track historical changes (e.g., nodes moved between zones)

  • Group or graph by region/zone over time

Summary

In this tip we built an in-database mechanism to measure node-to-node latency in YugabyteDB, even when ICMP (ping, mtr) is blocked.

We used COPY FROM PROGRAM to run a shell script (yb_tcp_latency.sh) directly from YSQL. The script measures TCP connect latency using Bash’s /dev/tcp, which works reliably in hardened environments where ICMP is disabled. We automatically discovered all other nodes using yb_servers(), then used \gexec to dynamically build and execute a COPY command that feeds the measurement results into a small staging table.

Each measurement was then enriched with placement metadata (cloud, region, zone) pulled from yb_servers() at the time of collection and stored in the node_tcp_latency history table, allowing accurate analysis even if topology changes in the future.

Two views make the data easy to consume:

  • node_tcp_latency_latest shows the most recent RTT for each (src_host → dst_host) pair.

  • node_tcp_latency_matrix provides a compact, placement-aware JSON “latency matrix” that makes it easy to compare cross-AZ and cross-region network characteristics.

Together, this gives you a flexible, SQL-visible, placement-aware view of real network latency inside the cluster, all without relying on OS-level tools or ICMP access.

Have Fun!

Wow!! After searching for this song online for years, I finally found it on YouTube! I had it on an ’80s mix tape that’s disappeared somewhere over the years — just like the memories of freshman year and the girl I met and lost. She was the one who introduced me to the band Synch and their lead singer/drummer Jimmy Harnen (from Pennsylvania). Hearing it again brings everything ... the mix tape, the girl from freshman year, the whole time ... flooding back!