Better Performance on JSONB Data Value Search with a GIN Index

YSQL supports the JSON and JSONB data types which both can store JSON (JavaScript Object Notation) data.

JSON stores an exact copy of the input text, where as JSONB does not preserve white space, does not preserve the order of object keys, and does not keep duplicate object keys.

There are many built-in YSQL functions to help query data with JSON and JSONB columns.

For today’s tip, we’ll be looking at the JSONB data type.

Example:

				
					yugabyte=# CREATE TABLE users (username TEXT PRIMARY KEY, phones JSONB);
CREATE TABLE

yugabyte=# INSERT INTO users SELECT 'Jim', '{"Home":"412-123-1234","Cell":"412-124-2134"}';
INSERT 0 1

yugabyte=# INSERT INTO users SELECT 'Mark', '{"Home":"123-234-2311","Cell":"422-231-3333", "Work":"987-291-2929"}';
INSERT 0 1

yugabyte=# SELECT * FROM users WHERE phones->>'Work' = '987-291-2929';
 username |                                  phones
----------+--------------------------------------------------------------------------
 Mark     | {"Cell": "422-231-3333", "Home": "123-234-2311", "Work": "987-291-2929"}
(1 row)
				
			

To help improve the perfomance on similar queries on the “Work” phone, we can add an index on that JSONB attribute.

				
					yugabyte=# CREATE INDEX users_phones_work_idx ON users(((phones->>'Work')::TEXT) ASC);
CREATE INDEX

yugabyte=# EXPLAIN SELECT * FROM users WHERE phones->>'Work' = '987-291-2929';
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Index Scan using users_phones_work_idx on users  (cost=0.00..5.25 rows=10 width=64)
   Index Cond: ((phones ->> 'Work'::text) = '987-291-2929'::text)
(2 rows)
				
			

But what if I wanted to search for other phone numbers (i.e. Cell or Home)?

I’d have to create an index for each of those JSONB key values.

				
					yugabyte=# CREATE INDEX users_phones_cell_idx ON users(((phones->>'Cell')::TEXT) ASC);
CREATE INDEX

yugabyte=# CREATE INDEX users_phones_home_idx ON users(((phones->>'Home')::TEXT) ASC);
CREATE INDEX
				
			

That doesn’t scale very well if I add new phone number types.

Perhaps a better solution is to use a Generalized inverted index (GIN).

A GIN index will allow me to quickly search the JSONB column values for the existence of some particular value!

				
					yugabyte=# CREATE INDEX users_phones_gin_idx ON users USING GIN (jsonb_to_tsvector('simple'::regconfig, phones, '["string"]'::jsonb));
CREATE INDEX

yugabyte=# EXPLAIN SELECT * FROM users WHERE jsonb_to_tsvector('simple', phones, '["string"]') @@ to_tsquery('987-291-2929');
                                                       QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------
 Index Scan using users_phones_gin_idx on users  (cost=12.25..20.26 rows=1000 width=64)
   Index Cond: (jsonb_to_tsvector('simple'::regconfig, phones, '["string"]'::jsonb) @@ to_tsquery('987-291-2929'::text)
)
(2 rows)

yugabyte=# SELECT * FROM users WHERE jsonb_to_tsvector('simple', phones, '["string"]') @@ to_tsquery('987-291-2929');
 username |                                  phones
----------+--------------------------------------------------------------------------
 Mark     | {"Cell": "422-231-3333", "Home": "123-234-2311", "Work": "987-291-2929"}
(1 row)
				
			

Have Fun!

Cool Tree Garage