We learned in a previous YugabyteDB Tip that we can Consolidate Data From YB Catalog Tables From All Nodes To One Node.
That solution required a lot of tedious setup using foreign data wrappers. In today’s tip, we explore how a shell script can be used to achieve the same result.
Here is the script:
#!/bin/bash
# Script: yb_cluster_query.sh
# Description: Run a SQL query across all nodes in a YB cluster and aggregate output.
# Created: 04/11/2025
# Last update: 07/08/2025
set -euo pipefail
# ------------------------------
# Usage
# ------------------------------
if [[ $# -lt 4 ]]; then
echo "Usage: $0 [output_file] [delimiter]"
exit 1
fi
# ------------------------------
# Arguments
# ------------------------------
MASTER_HOST="$1"
DATABASE="$2"
USER="$3"
SQL_FILE="$4"
OUTPUT_FILE="${5:-cluster_output.csv}"
DELIM="${6:-|}"
PORT=5433
# ------------------------------
# Read SQL query
# ------------------------------
if [[ ! -f "$SQL_FILE" ]]; then
echo "❌ SQL file not found: $SQL_FILE"
exit 1
fi
QUERY=$(<"$SQL_FILE")
QUERY="${QUERY%;}" # Strip trailing semicolon
# ------------------------------
# Get host list from cluster
# ------------------------------
HOSTS=$(ysqlsh -h "$MASTER_HOST" -p "$PORT" -U "$USER" -d "$DATABASE" -Atc "SELECT host FROM yb_servers();" 2>/dev/null)
if [[ -z "$HOSTS" ]]; then
echo "❌ No hosts found using yb_servers() from $MASTER_HOST"
exit 1
fi
# ------------------------------
# Create temp working dir
# ------------------------------
TMP_DIR=$(mktemp -d)
trap 'rm -rf "$TMP_DIR"' EXIT
# ------------------------------
# Get column headers
# ------------------------------
HEADER_LINE=$(ysqlsh -h "$MASTER_HOST" -p "$PORT" -U "$USER" -d "$DATABASE" -F "$DELIM" -Ac "SELECT * FROM (${QUERY}) AS subq LIMIT 0;" 2>/dev/null | head -n 1)
echo "host${DELIM}${HEADER_LINE}" > "$TMP_DIR/header"
# ------------------------------
# Function to run query on a host
# ------------------------------
run_query() {
local HOST="$1"
local OUT_FILE="$TMP_DIR/${HOST}.out"
ysqlsh -h "$HOST" -p "$PORT" -U "$USER" -d "$DATABASE" -AtF "$DELIM" -c "$QUERY" 2>/dev/null |
sed "s/^/${HOST}${DELIM}/" > "$OUT_FILE"
if [[ ! -s "$OUT_FILE" ]]; then
echo "⚠️ Warning: No output for $HOST" >&2
fi
}
export -f run_query
export QUERY PORT DELIM TMP_DIR USER DATABASE
# ------------------------------
# Execute queries in parallel
# ------------------------------
echo "$HOSTS" | xargs -P 10 -I {} bash -c 'run_query "$@"' _ {}
# ------------------------------
# Compile final output
# ------------------------------
cat "$TMP_DIR/header" > "$OUTPUT_FILE"
find "$TMP_DIR" -name '*.out' -exec cat {} + | sort >> "$OUTPUT_FILE"
echo -e "\n✅ Results written to: $OUTPUT_FILE"
Below is a brief demonstration of the script in use…
On a three node cluster I’ll create a user named lucy and a database named lucy_db, assigning ownership of the database to the new user:
yugabyte=# SELECT host, cloud, region, zone FROM yb_servers() ORDER BY host;
host | cloud | region | zone
-----------+-------+-----------+----------------
127.0.0.1 | aws | us-east-1 | us-east-1-zone
127.0.0.2 | aws | us-east-2 | us-east-2-zone
127.0.0.3 | aws | us-west-2 | us-west-2-zone
(3 rows)
yugabyte=# CREATE USER lucy;
CREATE ROLE
yugabyte=# CREATE DATABASE lucy_db WITH OWNER lucy;
CREATE DATABASE
Next, we’ll have the new user do some work on nodes 127.0.0.1 and 127.0.0.2:
[root@cloud-server-0 ~]# ysqlsh -h 127.0.0.1 -U lucy -d lucy_db -c "CREATE TABLE tab (id INT PRIMARY KEY);"
CREATE TABLE
[root@cloud-server-0 ~]# ysqlsh -h 127.0.0.1 -U lucy -d lucy_db -c "INSERT INTO tab SELECT generate_series(1, 100000);"
INSERT 0 100000
[root@cloud-server-0 ~]# ysqlsh -h 127.0.0.1 -U lucy -d lucy_db -c "SELECT COUNT(*) FROM tab;"
count
--------
100000
(1 row)
[root@cloud-server-0 ~]# ysqlsh -h 127.0.0.2 -U lucy -d lucy_db -c "SELECT COUNT(*) FROM tab WHERE id BETWEEN 10000 AND 100000;"
count
-------
90001
(1 row)
I want to check the pg_stat_statements statistics for this user across the cluster. While this can be done through YugabyteDB Anywhere, it’s also easy to accomplish using the new script!
pg_stat_statements, pg_user, and pg_database.
[root@cloud-server-0 ~]# cat pg_stat_statements.sql
SELECT datname, LEFT(query, 25) query, calls, rows, mean_exec_time FROM pg_stat_statements ss JOIN pg_user u ON u.usesysid = userid JOIN pg_database d ON d.oid = dbid WHERE usename = 'lucy'
Now I’m ready to run the script, passing in all the required parameters, including my SQL file.
[root@cloud-server-0 ~]# ./yb_cluster_query.sh 127.0.0.1 lucy_db yugabyte pg_stat_statements.sql cluster_output.csv "|"
Warning: No output for 127.0.0.3
✅ Results written to cluster_output.csv
[root@cloud-server-0 ~]# cat cluster_output.csv
host|datname|query|calls|rows|mean_exec_time
127.0.0.1|lucy_db|CREATE TABLE tab (id INT |1|0|178.813274
127.0.0.1|lucy_db|INSERT INTO tab SELECT ge|1|100000|838.076101
127.0.0.1|lucy_db|SELECT COUNT(*) FROM tab|1|1|14.20426
127.0.0.2|lucy_db|SELECT COUNT(*) FROM tab |1|1|39.662449
Note: The initial output from the script includes a warning indicating that no results were returned from node 127.0.0.3. In this case, it simply means that the user Lucy hasn’t executed any SQL statements on that node.
For easy analysis, I can load the data from the file into a YugbayteDB temp table:
yugabyte=# CREATE TEMP TABLE pg_stat_statements_lucy(host TEXT, datname TEXT, query TEXT, calls
CREATE TABLE
yugabyte=# \COPY pg_stat_statements_lucy FROM 'cluster_output.csv' WITH (FORMAT 'csv', HEADER TRUE, DELIMITER '|');
WARNING: batched COPY is not supported on temporary tables
DETAIL: Defaulting to using one transaction for the entire copy.
HINT: Either copy onto non-temporary table or set rows_per_transaction option to `0` to disable batching and remove this warning.
COPY 4
yugabyte=# SELECT * FROM pg_stat_statements_lucy;
host | datname | query | calls | rows | mean_exec_time
-----------+---------+---------------------------+-------+--------+----------------
127.0.0.1 | lucy_db | CREATE TABLE tab (id INT | 1 | 0 | 178.813274
127.0.0.1 | lucy_db | INSERT INTO tab SELECT ge | 1 | 100000 | 838.076101
127.0.0.1 | lucy_db | SELECT COUNT(*) FROM tab | 1 | 1 | 14.20426
127.0.0.2 | lucy_db | SELECT COUNT(*) FROM tab | 1 | 1 | 39.662449
(4 rows)
yugabyte=# SELECT * FROM pg_stat_statements_lucy WHERE mean_exec_time > 500;
host | datname | query | calls | rows | mean_exec_time
-----------+---------+---------------------------+-------+--------+----------------
127.0.0.1 | lucy_db | INSERT INTO tab SELECT ge | 1 | 100000 | 838.076101
(1 row)
Have Fun!
