Obfuscate Sensitive Data to Help Enforce Privacy and Compliance Regulations

Organizations handling sensitive data must ensure that personally identifiable information (PII) is anonymized or pseudonymized to protect user privacy and avoid legal penalties.

The PostgreSQL Anonymizer extension is a tool that masks, obfuscates, or pseudonymizes sensitive data to help enforce privacy and compliance with regulations like GDPR and HIPAA.

Version v1.3.1 of the extension is now bundled with YugabyteDB 2.25.1.0 in Tech Preview.

Let’s look at an example where we anonymize employee last names and partially mask their social security numbers.

To enable the Anonymizer extension, add anon to shared_preload_libraries in the PostgreSQL server configuration using the YB-TServer --ysql_pg_conf_csv flag.

				
					[root@cloud-server-0 ~]# yugabyted version

----------------------------------------------------------------------
|                              Version                               |
----------------------------------------------------------------------
| Version        : 2.25.1.0-b381                                     |
| Build Time     : 16 Mar 2025 20:59:55 UTC                          |
| Build Hash     : 98f8845f7828449f31bd4de043d827e8819ef7ac          |
----------------------------------------------------------------------

[root@cloud-server-0 ~]# yugabyted start --tserver_flags="ysql_pg_conf_csv={shared_preload_libraries=anon}" > start.log
				
			

Next, we need to install the extension and initialize a set of “fake” data covering various categories (e.g., first and last names, email addresses, addresses, etc.).

				
					yugabyte=# CREATE EXTENSION anon;
CREATE EXTENSION

yugabyte=# BEGIN;
BEGIN

yugabyte=*# SET LOCAL yb_default_copy_from_rows_per_transaction = 0;
SET

yugabyte=*# SET LOCAL yb_non_ddl_txn_for_sys_tables_allowed TO TRUE;
SET

yugabyte=*# SELECT anon.init(); -- Loads fake data
 init
------
 t
(1 row)

yugabyte=*# COMMIT;
COMMIT
				
			

One feature of the anon extension is that it provides various functions for generating realistic fake data.

				
					yugabyte=# \df anon.fake*
                            List of functions
 Schema |      Name       | Result data type | Argument data types | Type
--------+-----------------+------------------+---------------------+------
 anon   | fake_address    | text             |                     | func
 anon   | fake_city       | text             |                     | func
 anon   | fake_company    | text             |                     | func
 anon   | fake_country    | text             |                     | func
 anon   | fake_email      | text             |                     | func
 anon   | fake_first_name | text             |                     | func
 anon   | fake_iban       | text             |                     | func
 anon   | fake_last_name  | text             |                     | func
 anon   | fake_postcode   | text             |                     | func
 anon   | fake_siret      | text             |                     | func
(10 rows)
				
			
				
					yugabyte=# SELECT anon.fake_first_name() || ' ' || anon.fake_last_name() AS "Full Name"
yugabyte-#   FROM generate_series(1, 10);
    Full Name
------------------
 Jackie Whitney
 Bruce Short
 Nancy Lyons
 Karla Hobbs
 Joann Little
 Ronnie Santana
 Seth Ramirez
 Alejandro Bolton
 Donna Lewis
 Jack King
(10 rows)
				
			

Now let’s create an employee table and poulate it with a few records:

				
					yugabyte=# CREATE TABLE employee (id BIGINT PRIMARY KEY, first_name TEXT, last_name TEXT, ssn TEXT);
CREATE TABLE

yugabyte=# INSERT INTO employee VALUES (1,'Joshua', 'Baskin','123-45-6789'), (2,'Susan', 'Lawrence','111-22-3333')
, (3,'Billy', 'Kopecki','222-33-4444');
INSERT 0 3

yugabyte=# SELECT * FROM employee ORDER BY id;
 id | first_name | last_name |     ssn
----+------------+-----------+-------------
  1 | Joshua     | Baskin    | 123-45-6789
  2 | Susan      | Lawrence  | 111-22-3333
  3 | Billy      | Kopecki   | 222-33-4444
(3 rows)
				
			
Our goal is to dynamillay mask the data in the last_name amd ssn columns for particular users.

We need to declare masking rules for the columns and then apply the new rules to dynamic masking:

				
					yugabyte=# SECURITY LABEL FOR anon ON COLUMN employee.last_name IS 'MASKED WITH FUNCTION anon.fake_last_name()';
SECURITY LABEL

yugabyte=# SECURITY LABEL FOR anon ON COLUMN employee.ssn IS 'MASKED WITH FUNCTION anon.partial(ssn, 0, $$***-**-$$, 4)';
SECURITY LABEL

yugabyte=# SELECT anon.start_dynamic_masking(false);
 start_dynamic_masking
-----------------------
 t
(1 row)
				
			
Next, let’s create a masked user with read access to the employee table:
				
					yugabyte=# CREATE ROLE zoltar WITH LOGIN;
CREATE ROLE

yugabyte=# SECURITY LABEL FOR anon ON ROLE zoltar IS 'MASKED';
SECURITY LABEL

yugabyte=# GRANT SELECT ON employee TO zoltar;
GRANT
				
			

Next, let’s turn on dynamic masking for the yugabyte database:

				
					yugabyte=# ALTER DATABASE yugabyte SET anon.transparent_dynamic_masking TO true;
ALTER DATABASE
				
			
Now, when I connect as the zoltar user and query the employee table, the last_name and ssn columns will have the masking rules applied:
				
					yugabyte=# \c - zoltar

You are now connected to database "yugabyte" as user "zoltar".
yugabyte=> SELECT * FROM employee ORDER BY id;
 id | first_name | last_name |     ssn
----+------------+-----------+-------------
  1 | Joshua     | Long      | ***-**-6789
  2 | Susan      | Bentley   | ***-**-3333
  3 | Billy      | Zamora    | ***-**-4444
(3 rows)
				
			

Here are the SQL commands from today’s YugabyteDB Tip, so you can try this new feature out for yourself:

				
					CREATE EXTENSION anon;

BEGIN;
SET LOCAL yb_default_copy_from_rows_per_transaction = 0;
SET LOCAL yb_non_ddl_txn_for_sys_tables_allowed TO TRUE;
SELECT anon.init(); -- Loads fake data
COMMIT;

\df anon.fake*

SELECT anon.fake_first_name() || ' ' || anon.fake_last_name() AS "Full Name" FROM generate_series(1, 10);

CREATE TABLE employee (id BIGINT PRIMARY KEY, first_name TEXT, last_name TEXT, ssn TEXT);
INSERT INTO employee VALUES (1,'Joshua', 'Baskin','123-45-6789'), (2,'Susan', 'Lawrence','111-22-3333')
, (3,'Billy', 'Kopecki','222-33-4444');
SELECT * FROM employee ORDER BY id;

SECURITY LABEL FOR anon ON COLUMN employee.last_name IS 'MASKED WITH FUNCTION anon.fake_last_name()';
SECURITY LABEL FOR anon ON COLUMN employee.ssn IS 'MASKED WITH FUNCTION anon.partial(ssn, 0, $$***-**-$$, 4)';
SELECT anon.start_dynamic_masking(false);

CREATE ROLE zoltar WITH LOGIN;
SECURITY LABEL FOR anon ON ROLE zoltar IS 'MASKED';
GRANT SELECT ON employee TO zoltar;

ALTER DATABASE yugabyte SET anon.transparent_dynamic_masking TO true;

\c - zoltar
SELECT * FROM employee ORDER BY id;
				
			

Have Fun!

Our "young" Japanese Maple is sprouting its spring buds!