Access Data In An External File Like a Database Table

If you have some data sitting in an external file and would like to use it in your database queries, you can “easily” do that in YSQL!

I have this simple text file with some true facts about our favorite distributed SQL database:

				
					yugabyte=# \! cat /root/data.txt
1|Yugabyte is cool!
2|Yugabyte is awesome!
3|Yugabyte is neat!
				
			

I want to query the data in the files as if it were a table. First I need to read the data file and will use the pg_read_file function to do that.

				
					yugabyte=# SELECT * FROM pg_read_file('/root/data.txt');
      pg_read_file
------------------------
 1|Yugabyte is cool!   +
 2|Yugabyte is awesome!+
 3|Yugabyte is neat!   +

(1 row)
				
			

Next we need to get creative to split the data out into columns and rows. To do that, I will use a combination of the following functions: split_part, unnest and string_to_array.

				
					yugabyte=# SELECT id, fact
yugabyte-#   FROM (SELECT split_part(unnest(string_to_array(pg_read_file('/root/data.txt'), CHR(10))), '|', 1) id,
yugabyte(#                split_part(unnest(string_to_array(pg_read_file('/root/data.txt'), CHR(10))), '|', 2) fact) foo
yugabyte-#  WHERE id <> ''
yugabyte-#    AND fact <> '';
 id |         fact
----+----------------------
 1  | Yugabyte is cool!
 2  | Yugabyte is awesome!
 3  | Yugabyte is neat!
(3 rows)
				
			

Now I can query the external data in the file like I would any other table:

				
					yugabyte=# SELECT id, fact
yugabyte-#   FROM (SELECT split_part(unnest(string_to_array(pg_read_file('/root/data.txt'), CHR(10))), '|', 1) id,
yugabyte(#                split_part(unnest(string_to_array(pg_read_file('/root/data.txt'), CHR(10))), '|', 2) fact) foo
yugabyte-#  WHERE fact ILIKE '%awesome%';
 id |         fact
----+----------------------
 2  | Yugabyte is awesome!
(1 row)
				
			

And I can even join this external data with other “real” tables in my database:

				
					yugabyte=# SELECT * FROM facts_internal; -- This is a "real" database table
 id |     fact
----+---------------
  2 | It really is!
(1 row)

yugabyte=# SELECT facts_external.id, facts_external.fact fact_ext, facts_internal.fact fact_int
yugabyte-#   FROM (SELECT id, fact
yugabyte(#           FROM (SELECT split_part(unnest(string_to_array(pg_read_file('/root/data.txt'), CHR(10))), '|', 1) id,
yugabyte(#                        split_part(unnest(string_to_array(pg_read_file('/root/data.txt'), CHR(10))), '|', 2) fact) foo
yugabyte(#          WHERE id <> ''
yugabyte(#            AND fact <> '') facts_external
yugabyte-#   JOIN facts_internal
yugabyte-#     ON facts_external.id::INT = facts_internal.id;
 id |       fact_ext       |   fact_int
----+----------------------+---------------
 2  | Yugabyte is awesome! | It really is!
(1 row)
				
			

Have Fun!