Load TAB Delimited Files Using the COPY Command in YCQL

You can use the YCQL COPY command to load data from a CSV file into a table. The default expected delimiter of the COPY command is a single comma. However, it is very common to instead use the non-printable tab character as the field delimiter.

You are probably aware that the non-printable tab character can be represented as “\t” in Linux commands.


For example:

				
					[root@localhost ~]# printf "1\t2\tTEST1\tTEST2\n" | cat -t
1^I2^ITEST1^ITEST2
				
			

So if we have a file that is delimited by the non-prinatable tab character, we expect to be able to tell the COPY command to use “\t”.

				
					[root@localhost ~]# cat -t test.csv
1^I2^ITEST1^ITEST2

[root@localhost ~]# ycqlsh -e "CREATE KEYSPACE tabtest;"

[root@localhost ~]# ycqlsh -e "CREATE TABLE tabtest.t1 (c1 INT, c2 INT, c3 VARCHAR, c4 VARCHAR, PRIMARY KEY(c1));"

[root@localhost ~]# ycqlsh -e "COPY tabtest.t1 (c1, c2, c3, c4) FROM 'test.csv' WITH DELIMITER = '\t';"
Using 5 child processes

Starting copy of tabtest.t1 with columns [c1, c2, c3, c4].
<stdin>:1:"delimiter" must be a 1-character string
				
			

Uh oh, that didn’t work. The COPY command is expecting a single character as the delimiter.

So how do we provide a literal tab character to the COPY command?

Easy…


On Windows, use Ctrl+V (or Control+V on a Mac), and then the tab key
.


That key combination will put a literal tab on the command line!

				
					[root@localhost ~]# ycqlsh -e "COPY tabtest.t1 (c1, c2, c3, c4) FROM 'test.csv' WITH DELIMITER = '      ';"
Using 5 child processes

Starting copy of tabtest.t1 with columns [c1, c2, c3, c4].
Processed: 1 rows; Rate:       1 rows/s; Avg. rate:       2 rows/s
1 rows imported from 1 files in 0.447 seconds (0 skipped).

[root@localhost ~]# ycqlsh -e "SELECT * FROM tabtest.t1;"

 c1 | c2 | c3    | c4
----+----+-------+-------
  1 |  2 | TEST1 | TEST2

(1 rows)
				
			

Have Fun!