Search a LIST Collection Data Type in YCQL

YCQL supports collection data types to specify columns for data objects that can contain more than one value.

One such type is a LIST which is an ordered collection of elements of the same primitive type.

Example:

				
					yugabyte@ycqlsh> CREATE TABLE ks.test (pk INT PRIMARY KEY, letters LIST<TEXT>);

yugabyte@ycqlsh> INSERT INTO ks.test (pk, letters) VALUES (1, ['A', 'B', 'C']);

yugabyte@ycqlsh> INSERT INTO ks.test (pk, letters) VALUES (2, ['B', 'C']);

yugabyte@ycqlsh> SELECT * FROM ks.test;

 pk | letters
----+-----------------
  1 | ['A', 'B', 'C']
  2 |      ['B', 'C']

(2 rows)

yugabyte@ycqlsh> SELECT letters[0], letters[1], letters[2] FROM ks.test;

 letters['0'] | letters['1'] | letters['2']
--------------+--------------+--------------
            A |            B |            C
            B |            C |         null

(2 rows)

yugabyte@ycqlsh> SELECT * FROM ks.test WHERE letters[0] = 'A';

 pk | letters
----+-----------------
  1 | ['A', 'B', 'C']

(1 rows)
				
			

You might have a case where you’ll want to show the rows where the LIST contains a particular value.

Cassandra 2.1 introduced the CONTAINS clause to do this.

This feature is not currently avaiable in YugabyteDB but the feature request is being tracked here.

				
					yugabyte@ycqlsh> SELECT * FROM ks.test WHERE letters CONTAINS 'A';
SyntaxException: Invalid SQL Statement. syntax error, unexpected IDENT, expecting end_of_file
SELECT * FROM ks.test WHERE letters CONTAINS 'A';
                                    ^^^^^^^^
 (ql error -11)
				
			

As a work around on smaller data sets, one option is is to manually scan the rows for matches outside of YCQL.

Example:

I have this simple Catalog table with some items. One of the columns is a list of colors that the item is avaiable in.

				
					[root@localhost ~]# alias c
alias c='ycqlsh xxx.xx.xx.xxx -u yugabyte -p password'

[root@localhost ~]# c -e "DESCRIBE ks.catalog;"

CREATE TABLE ks.catalog (
    id int PRIMARY KEY,
    item text,
    colors list<text>
) WITH default_time_to_live = 0
    AND transactions = {'enabled': 'false'};

[root@localhost ~]# c -e "SELECT * FROM ks.catalog;"

 id | item      | colors
----+-----------+---------------------------------------------
  5 |  Pants #2 |                ['Purple', 'Black', 'White']
  1 |  Shirt #1 |            ['Red', 'Blue', 'Green', 'Grey']
  6 |  Pants #3 |           ['Green', 'Black', 'Red', 'Blue']
  7 | Shorts #1 |           ['Green', 'Black', 'Red', 'Blue']
  9 | Shorts #3 |                     ['Red', 'Blue', 'Grey']
 10 | Shorts #4 |                                    ['Blue']
  4 |  Pants #1 | ['Red', 'Grey', 'Purple', 'Black', 'White']
  2 |  Shirt #2 |                   ['Pink', 'Blue', 'Green']
  8 | Shorts #2 |                             ['Red', 'Blue']
  3 |  Shirt #3 |         ['Red', 'Purple', 'Black', 'White']

(10 rows)
				
			

I’d like to list all of the items that come in Grey. To accomplish that I could do something like this from the command line…

				
					[root@localhost ~]# c -e "SELECT id, item, colors FROM ks.catalog;" | awk -F"|" -v env_var="'Grey'" '{if ((NR>1 && NR<4) || (NR>3 && $(NF) ~ env_var)) print}'
 id | item      | colors
----+-----------+---------------------------------------------
  1 |  Shirt #1 |            ['Red', 'Blue', 'Green', 'Grey']
  9 | Shorts #3 |                     ['Red', 'Blue', 'Grey']
  4 |  Pants #1 | ['Red', 'Grey', 'Purple', 'Black', 'White']
				
			

Note, for this to work, the column having the LIST data type needs to be selected last.

For fun, I slapped togther a shell scriptr that you might find useful. It encapsulates the logic in the example above so that you just need to pass in some parameters to do the search.

Here is the shell script code:

				
					#!/bin/bash
shopt -s expand_aliases

# Script:      ycql_contains.sh
# Description: This script will return rows from a YCQL table having a LIST column that contains
#                a specfied value.
# Created:     03/14/2023
# Last update: 03/14/2023

# Reference:   https://docs.yugabyte.com/preview/api/ycql/expr_subscript/

# Set up
# Check if keyspace name was provided
if [ -z "$1" ]
then
  echo "Please provide a keyspace name"
  exit 0
else
  YB_KEYSPACE_NAME=$1;
fi

# Check if table name was provided
if [ -z "$2" ]
then
  echo "Please provide a table name"
  exit 0
else
  YB_TABLE_NAME=$2;
fi

# Check if column name was provided
if [ -z "$3" ]
then
  echo "Please provide a column name"
  exit 0
else
  YB_COLUMN_NAME=$3;
fi

# Check if search value was provided
if [ -z "$4" ]
then
  echo "Please provide a search value"
  exit 0
else
  SEARCH_VALUE=$4;
fi

# Store WHERE clause for later
if [ ! -z "$5" ]
then
  YB_WHERE=$5;
fi

# Set up the YCQLSH alias below per your environment
YB_PATH='/root/yugabyte-2.17.1.0/bin/ycqlsh'
YB_IP='xxx.xx.xx.xxx' # This can be any node runnig the T-Server process
YB_USER='yugabyte'
YB_PASSWORD='password'

alias y="$YB_PATH $YB_IP -u $YB_USER -p $YB_PASSWORD"

# Check if table exists
tc=`y -e "SELECT COUNT(*) FROM system_schema.tables WHERE keyspace_name = '$YB_KEYSPACE_NAME' AND table_name = '$YB_TABLE_NAME';" | awk 'FNR==4 {print}'`

if [ $tc -gt 0 ]
then
  # If the column exists, get the column type
  cc=`y -e "SELECT type FROM system_schema.columns WHERE keyspace_name = '$YB_KEYSPACE_NAME' AND table_name = '$YB_TABLE_NAME' AND column_name = '$YB_COLUMN_NAME';" | awk 'FNR==4 {print}'`

  if [ ! -z "$cc" ]
  then
    #Get the other columns of the table
    c1=`y -e "SELECT COUNT(*) FROM system_schema.columns WHERE keyspace_name = '$YB_KEYSPACE_NAME' AND table_name = '$YB_TABLE_NAME' AND column_name != '$YB_COLUMN_NAME';" | awk 'FNR==4 {print}'`
    c2=`y -e "SELECT column_name FROM system_schema.columns WHERE keyspace_name = '$YB_KEYSPACE_NAME' AND table_name = '$YB_TABLE_NAME' AND column_name != '$YB_COLUMN_NAME';" | awk -v c2_env=$(($c1+4)) '{if (NR>3 && NR<c2_env) print}' | sed -n -e 'H;${x;s/\n/,/g;s/^,//;p;}'`
    c2=${c2//[![:print:]]/}", $YB_COLUMN_NAME"

    cc=`y -e "SELECT type FROM system_schema.columns WHERE keyspace_name = '$YB_KEYSPACE_NAME' AND table_name = '$YB_TABLE_NAME' AND column_name = '$YB_COLUMN_NAME';" | awk 'FNR==4 {print}'`

    if [ $cc == "list<text>" ]; then
      if [ -z "$YB_WHERE" ]
      then
        y -e "SELECT $c2 FROM $YB_KEYSPACE_NAME.$YB_TABLE_NAME;" | awk -F"|" -v env_var="'$SEARCH_VALUE'" '{if ((NR>1 && NR<4) || (NR>3 && $(NF) ~ env_var)) print}'
      else
        y -e "SELECT $c2 FROM $YB_KEYSPACE_NAME.$YB_TABLE_NAME WHERE $YB_WHERE;" | awk -F"|" -v env_var="'$SEARCH_VALUE'" '{if ((NR>1 && NR<4) || (NR>3 && $(NF) ~ env_var)) print}'
      fi
    else
      if [ -z "$YB_WHERE" ]
      then
        y -e "SELECT $c2 FROM $YB_KEYSPACE_NAME.$YB_TABLE_NAME;" | awk -F"|" -v env_var="$SEARCH_VALUE" '{if ((NR>1 && NR<4) || (NR>3 && ($(NF) ~ env_var"," || $(NF) ~ env_var"]"))) print}'
      else
        y -e "SELECT $c2 FROM $YB_KEYSPACE_NAME.$YB_TABLE_NAME WHERE $YB_WHERE;" | awk -F"|" -v env_var="$SEARCH_VALUE" '{if ((NR>1 && NR<4) || (NR>3 && ($(NF) ~ env_var"," || $(NF) ~ env_var"]"))) print}'
      fi
    fi

  else
    echo "Column \"$3\" does not exist."

  fi

else
  echo "Table \"$1.$2\" does not exist."

fi

exit 0
				
			

A call would look like this:

				
					./ycql_contains.sh keyspace_name table_name column_name search_value optional_where_clause
				
			
				
					[root@localhost ~]# ./ycql_contains.sh ks catalog colors Grey
 id | item      | colors
----+-----------+---------------------------------------------
  1 |  Shirt #1 |            ['Red', 'Blue', 'Green', 'Grey']
  9 | Shorts #3 |                     ['Red', 'Blue', 'Grey']
  4 |  Pants #1 | ['Red', 'Grey', 'Purple', 'Black', 'White']

[root@localhost ~]# ./ycql_contains.sh ks catalog colors Grey id=9
 id | item      | colors
----+-----------+-------------------------
  9 | Shorts #3 | ['Red', 'Blue', 'Grey']
				
			

Note that this work around won’t be ideal on large data sets if you don’t provide a WHERE clause with one or more partition key columns.

Have Fun!

Pier 1, Cortez Beach, FL