Handling Variable-Length IN (…) Clauses with pg_hint_plan in YugabyteDB

Sometimes you need to force a specific query plan, even when the query includes an IN (...) list whose length changes at runtime.

At first glance, that seems impossible.

After all:

				
					WHERE username IN ('Lucy','Maple')
				
			

is not the same statement as:

				
					WHERE username IN ('Lucy','Maple','Jane','Helen')
				
			

So how could a single pg_hint_plan hint-table entry match both?

Recently I decided to try this out myself… honestly not knowing if it would even work.

The result turned out to be a neat trick that makes variable-length IN lists fully hintable.

đź’ˇ Key Insight
A pg_hint_plan hint can apply to any number of values in an IN list — but only if the query is written so that the statement shape remains constant. The trick is to pass the list as a single array parameter:
username = ANY($1::text[])
Because the SQL statement shape does not change, the query ID stays the same, allowing the hint-table entry to match every time.

What you’ll build in 5 minutes

A tiny demo where:

  • 1. Default plan uses the index

  • 2. You insert one hint-table row to force a SeqScan(users)

  • 3. The hint continues to apply even if you pass 2, 3, or 20 usernames… because the query is executed as ANY($1::text[])

Prereqs

Step 1 – Create table + data

				
					CREATE TABLE users (
  user_id    SERIAL PRIMARY KEY,
  username   TEXT NOT NULL,
  email      TEXT,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_users_username ON users(username);

INSERT INTO users (username, email) VALUES
('Lucy', 'lucy@example.com'),
('Maple', 'maple@example.com'),
('Jane', 'jane@example.com'),
('Helen', 'helen@example.com'),
('Josh', 'josh@example.com'),
('Captain', 'captiain@example.com');
				
			

Step 2 – Enable hinting + hint table + query-id matching

				
					CREATE EXTENSION IF NOT EXISTS pg_hint_plan;
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

SET pg_hint_plan.enable_hint = on;
SET pg_hint_plan.enable_hint_table = on;

-- This is the magic for the hint table to use queryid matching in YugabyteDB
SET pg_hint_plan.yb_use_query_id_for_hinting = on;
				
			

pg_hint_plan + the hint table are documented here.

We will use query-ID based hinting, which is typically more reliable than text matching.

Step 3 – Try the Obvious Approach

Most applications generate SQL like this:

				
					SELECT *
FROM users
WHERE username = ANY(ARRAY['Helen','Josh']::text[]);
				
			

Let’s check the plan.

				
					EXPLAIN (COSTS OFF)
SELECT *
FROM users
WHERE username = ANY(ARRAY['Helen','Josh']::text[]);
				
			

Example output:

				
					Index Scan using idx_users_username on users
  Index Cond: (username = ANY ('{Helen,Josh}'::text[]))
				
			

Now try the same query with three values instead of two.

				
					EXPLAIN (COSTS OFF)
SELECT *
FROM users
WHERE username = ANY(ARRAY['Helen','Josh','Jane']::text[]);
				
			

Example output:

				
					Index Scan using idx_users_username on users
  Index Cond: (username = ANY ('{Helen,Josh,Jane}'::text[]))
				
			

At first glance these queries appear identical. The execution plans are the same, and EXPLAIN does not reveal any differences.

However, this is where things get interesting.

🔍 Important Detail
Even though the SQL appears nearly identical, YugabyteDB may internally normalize these statements differently depending on the number of elements in the array. For example, the database might internally represent them as:
ANY(ARRAY[$1,$2])
ANY(ARRAY[$1,$2,$3])
Because the normalized query shapes are different, they can produce different query IDs. This matters because pg_hint_plan can match hints using the query ID. If the query ID changes, the hint-table entry may not match.

Seeing the Normalized Queries (Proof)

Run both queries first:

				
					SELECT * FROM users WHERE username = ANY(ARRAY['Helen','Josh']::text[]);
SELECT * FROM users WHERE username = ANY(ARRAY['Helen','Josh','Jane']::text[]);
				
			

Now check what pg_stat_statements recorded:

				
					.      queryid       |                                     query
---------------------+--------------------------------------------------------------------------------
 4631546261497781939 | SELECT * FROM users WHERE username = ANY(ARRAY[$1,$2,$3]::text[])
 7235749465602974648 | SELECT * FROM users WHERE username = ANY(ARRAY[$1,$2]::text[])
 (2 rows)
				
			

Even though the original SQL differed only in the number of array values, YugabyteDB recorded different normalized query shapes

Which means the query IDs differ.

đź§  Why This Matters
When using pg_hint_plan hint tables, YugabyteDB can match hints using the query ID. If two queries generate different queryid values, they are treated as different statements, even if the SQL looks almost identical. This means a hint-table entry matching one query will not match the other.

Step 4 – The Fix: Keep the Query Shape Constant

Instead of embedding the values directly in the SQL statement, pass the list as a single bind parameter.

Prepare a statement:

				
					PREPARE users_by_name(text[]) AS
SELECT *
FROM users
WHERE username = ANY($1::text[]);
				
			

Now execute it with different list sizes.

Two names:

				
					EXPLAIN (COSTS OFF) EXECUTE users_by_name(ARRAY['Lucy','Maple']);
				
			

Three names:

				
					EXPLAIN (COSTS OFF) EXECUTE users_by_name(ARRAY['Jane','Helen','Josh']);
				
			

Six names:

				
					EXPLAIN (COSTS OFF) EXECUTE users_by_name(ARRAY['Lucy','Maple','Jane','Helen','Josh','Captain']);
				
			

The normalized query shape remains:

				
					username = ANY($1)
				
			

Which means the query ID remains constant.

Step 5 – Capture the Query ID

We need to capture the query ID of the PREPARE statement. For thatm we can check pg_stat_statements:

But first you have to execute the prepared statement atleast once.  For fun, we can execute all three examples above:

				
					EXECUTE users_by_name(ARRAY['Lucy','Maple']);
EXECUTE users_by_name(ARRAY['Jane','Helen','Josh']);
EXECUTE users_by_name(ARRAY['Lucy','Maple','Jane','Helen','Josh','Captain']);
				
			

Now we can query pg_stat_statements:

				
					SELECT queryid, query, calls
FROM pg_stat_statements
WHERE query ILIKE 'prepare users_by_name%'
ORDER BY calls DESC;
				
			

Example output:

				
					.      queryid       |              query               | calls
---------------------+----------------------------------+-------
 2544768220314887462 | PREPARE users_by_name(text[]) AS+|     3
                     | SELECT *                        +|
                     | FROM users                      +|
                     | WHERE username = ANY($1::text[]) |
				
			

Step 6 – Insert the Hint

				
					INSERT INTO hint_plan.hints
(norm_query_string, application_name, hints)
VALUES
('2544768220314887462','', 'SeqScan(users)');
				
			

Step 7 – Verify the Hint Works

Two names:

				
					EXPLAIN (COSTS OFF) EXECUTE users_by_name(ARRAY['Lucy','Maple']);
				
			

Example output:

				
					 Seq Scan on users
   Storage Filter: (username = ANY ('{Lucy,Maple}'::text[]))
				
			

Three names:

				
					EXPLAIN (COSTS OFF) EXECUTE users_by_name(ARRAY['Jane','Helen','Josh']);
				
			

Example output:

				
					 Seq Scan on users
   Storage Filter: (username = ANY ('{Jane,Helen,Josh}'::text[]))
				
			

Six names:

				
					EXPLAIN (COSTS OFF) EXECUTE users_by_name(ARRAY['Lucy','Maple','Jane','Helen','Josh','Captain']);
				
			

Example output:

				
					 Seq Scan on users
   Storage Filter: (username = ANY ('{Lucy,Maple,Jane,Helen,Josh,Captain}'::text[]))
				
			

App-side: how to bind the array (examples)

Java / JDBC
				
					String sql = "SELECT * FROM users WHERE username = ANY (?::text[])";
PreparedStatement ps = conn.prepareStatement(sql);

Array arr = conn.createArrayOf("text", usernames.toArray());
ps.setArray(1, arr);

ResultSet rs = ps.executeQuery();
				
			
Python / psycopg (psycopg3)
				
					sql = "SELECT * FROM users WHERE username = ANY(%s::text[])"
cur.execute(sql, (["Lucy","Maple","Jane"],))
				
			
.NET / Npgsql
				
					var cmd = new NpgsqlCommand("SELECT * FROM users WHERE username = ANY(@p::text[])", conn);
cmd.Parameters.AddWithValue("p", usernames);   // usernames is string[]
using var reader = cmd.ExecuteReader();
				
			

Key rule: don’t generate IN ('a','b','c') or ANY(ARRAY['a','b','c']). Always bind the list as one parameter.

Conclusion

This experiment started with a simple question:

  • Can pg_hint_plan handle queries with variable-length IN clauses?

At first glance, it seems unlikely. Queries like:

				
					username IN ('Lucy','Maple')
username IN ('Lucy','Maple','Jane')
username IN ('Lucy','Maple','Jane','Helen')
				
			

… are clearly different SQL statements, so it would be reasonable to assume that each variation would generate a different query identifier, making it difficult to match a single hint.

However, once the query is written using a single array bind parameter:

				
					username = ANY($1::text[])
				
			

… the situation changes dramatically.

The query shape becomes constant, regardless of how many values appear in the array. As a result, YugabyteDB generates the same normalized statement and the same queryid every time the query runs.

đź’ˇ Key Takeaway
Because the query shape is stable, the queryid is also stable. That means we only needed to insert the query’s queryid into the pg_hint_plan hint table once. From that point forward, the hint applies automatically to all sessions executing that same parameterized query, regardless of how many values appear in the array.

In other words, even though the application may send different lists of usernames on every request, the database sees a single consistent statement pattern. This allows pg_hint_plan to reliably match the hint and apply the desired plan across the entire workload.

For applications that generate dynamic filter lists, using bind arrays together with hint-table matching provides a powerful and scalable way to influence query planning without modifying application SQL or embedding hints directly in queries.

Have Fun!