How to Index Encrypted Data in YugabyteDB Using SHA-256 Hashes

Storing sensitive information like Social Security Numbers (SSNs) securely is non-negotiable… but what happens when you need to search or index that data?

If you use PGP column-level encryption (as supported in YugabyteDB and PostgreSQL), the encryption is non-deterministic, meaning that encrypting the same SSN twice results in different ciphertexts. This is great for security, but it makes indexing or direct lookup impossible.

Why You Can’t Index Encrypted Columns

Consider this:

				
					CREATE INDEX ON employee(ssn_encrypted);

				
			

That might work syntactically, but it’s meaningless for lookups because every encrypted value is unique, even if the underlying SSN is the same.

Even worse:

				
					SELECT * FROM employee WHERE ssn_encrypted = PGP_PUB_ENCRYPT('123-45-6789', ...);

				
			

This won’t match anything, because every encryption call produces different bytes.

The A Solution: Add a Searchable Hash Column

To support fast lookups and keep the original SSN encrypted, store a separate SHA-256 hash of the plaintext SSN and index that.

Example:

				
					CREATE TABLE employee (
  id UUID PRIMARY KEY,
  name TEXT,
  ssn_encrypted BYTEA NOT NULL, --> Stores the PGP-encrypted SSN
  ssn_hash TEXT NOT NULL        --> Stores the hash of the SSN for indexing and searching
);

				
			

An INSERT might look like this:

				
					INSERT INTO employee (id, name, ssn_encrypted, ssn_hash)
VALUES (
  gen_random_uuid(),
  'Loni Anderson',
  PGP_PUB_ENCRYPT('123-45-6789', dearmor($$<public_key_block_here>$$)),
  encode(digest('123-45-6789', 'sha256'), 'hex')
);
				
			

Index the Hash, not the Encrypted Column:

				
					CREATE INDEX employee_ssn_hash_idx ON employee (ssn_hash);
				
			

This gives you the performance and filterability you’d expect, without sacrificing encryption.

A search by SSN (with Decryption) could be accomplished as:

				
					SELECT id, name,
  PGP_PUB_DECRYPT(ssn_encrypted, dearmor($$<private_key_block_here>$$)) AS ssn
FROM employee
WHERE ssn_hash = encode(digest('123-45-6789', 'sha256'), 'hex');
				
			

This pattern is ideal for SSNs, email addresses, account numbers, or any sensitive identifier that must be encrypted but also queried efficiently!

The following is a complete example using public and private keys generated based on the guidelines from the YugabyteDB Tip: Column Level Public-Key Encryption in YSQL.

				
					yugabyte=# CREATE TABLE employee (
yugabyte(#   id UUID PRIMARY KEY,
yugabyte(#   name TEXT,
yugabyte(#   ssn_encrypted BYTEA NOT NULL,
yugabyte(#   ssn_hash TEXT NOT NULL
yugabyte(# );
CREATE TABLE

yugabyte=# \set public_key `cat ./public_key.txt`

yugabyte=# INSERT INTO employee (id, name, ssn_encrypted, ssn_hash)
yugabyte-# VALUES (
yugabyte(#   gen_random_uuid(),
yugabyte(#   'Loni Anderson',
yugabyte(#   PGP_PUB_ENCRYPT('123-45-6789', dearmor(:'public_key')),
yugabyte(#   encode(digest('123-45-6789', 'sha256'), 'hex')
yugabyte(# );
INSERT 0 1

yugabyte=# \set private_key `cat ./private_key.txt`

yugabyte=# EXPLAIN (ANALYZE)
yugabyte-# SELECT id, name,
yugabyte-#   PGP_PUB_DECRYPT(ssn_encrypted, dearmor(:'private_key')) AS ssn
yugabyte-# FROM employee
yugabyte-# WHERE ssn_hash = encode(digest('123-45-6789', 'sha256'), 'hex');
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Seq Scan on employee  (cost=0.00..105.00 rows=1000 width=80) (actual time=3.668..3.672 rows=1 loops=1)
   Storage Filter: (ssn_hash = '01a54629efb952287e554eb23ef69c52097a75aecc0e3a93ca0855ab6d7a31a0'::text)
 Planning Time: 5.096 ms
 Execution Time: 3.737 ms
 Peak Memory Usage: 47 kB
(5 rows)
				
			

Notice that the query triggered a sequential scan on the table. To optimize this, we’ll create an index. 

As an added benefit, we’ll make it a covering index, enabling index-only scans, meaning the query can be satisfied entirely from the index without accessing the base table.

				
					yugabyte=# CREATE INDEX employee_ssn_hash_idx ON employee (ssn_hash) INCLUDE (id, name, ssn_encrypted);
CREATE INDEX

yugabyte=# EXPLAIN (ANALYZE)
yugabyte-# SELECT id, name,
yugabyte-#   PGP_PUB_DECRYPT(ssn_encrypted, dearmor(:'private_key')) AS ssn
yugabyte-# FROM employee
yugabyte-# WHERE ssn_hash = encode(digest('123-45-6789', 'sha256'), 'hex');
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using employee_ssn_hash_idx on employee  (cost=0.00..5.15 rows=10 width=80) (actual time=3.560..3.565 rows=1 loops=1)
   Index Cond: (ssn_hash = '01a54629efb952287e554eb23ef69c52097a75aecc0e3a93ca0855ab6d7a31a0'::text)
   Heap Fetches: 0
 Planning Time: 6.390 ms
 Execution Time: 3.623 ms
 Peak Memory Usage: 51 kB
(6 rows)
				
			
Note

In this tip we used encode(..., 'hex') for convenience… hex strings are easy to log, share, and query without worrying about bytea literal syntax. However, you can just as well store the raw SHA-256 digest in a BYTEA column. This avoids the extra space overhead of hex (32 bytes vs. 64 characters) and can be slightly more efficient. Both approaches work with indexes; the choice depends on whether you prefer developer convenience (hex text) or leaner storage/performance (bytea).

Have Fun!

This weekend, we had the honor of attending (and funding the bar tab for) our daughter’s and future son-in-law’s engagement party in Cincinnati, OH. It was also the first time we met Sam’s parents. Are they our in-laws yet? Pre-in-laws? In-law-adjacent? Whatever the label, they were lovely… and probably just as nervous as we were! We brought the drinks, they brought the snacks, it all worked out beautifully. In truth, our daughter (and Sam) ran the show, we just showed up and followed instructions.