Identify Unused YCQL Indexes

Yugabyte Cloud Query Language (YCQL) is a semi-relational SQL API that is best fit for internet-scale OLTP and HTAP applications needing massive data ingestion and blazing-fast queries. It supports strongly consistent secondary indexes, a native JSON column type, and distributed transactions. It has its roots in the Cassandra Query Language (CQL).

Using Secondary indexes in YCQL enhances database performance by enabling the database server to find rows faster.

YCQL supports Unique, Partial, Covering, and Multi-column secondary indexes.

If indexes are so great, why not create more and more of them?

Because indexes come at a cost. Each INSERT operation incurs an overhead.

Developers sometimes add too many indexes in an attempt to boost query performance, but this can lead to inefficiencies.

This is especially true if created indexes are not used by the query workoad!

How do we identify unused indexes in YCQL?

We learned in a previous YugabyteDB tip that this is relatively easy to do in YSQL.

Here’s one solution, using a bash script, for identifying indexes that are not being used by a partitcular set of queries for a given YCQL keyspace.

Bash script code:

				
					#!/bin/bash
shopt -s expand_aliases

# Script:      ycql_unused_indexes.sh
# Description: This script will identify unused indexes for a given keyspace and set of queries
# Created:     07/09/2024
# Last update: 07/09/2024

# Set up
# Check if keyspace name was provided
if [ -z "$1" ]
then
  echo "Please provide a keyspace name"
  exit 0
else
  YB_KEYSPACE_NAME=$1;
fi

# Check if sql file was provided
if [ -z "$2" ]
then
  echo "Please provide a full path to the file containing relevant SQL queries"
  exit 0
else
  YB_SQL_FILE=$2;
fi

# Set up the YCQLSH alias below per your environment
YB_PATH='ycqlsh'
YB_IP='127.0.0.1' # This can be any node runnig the T-Server process
YB_USER='cassandra'
YB_PASSWORD='password'

alias y="$YB_PATH $YB_IP -u $YB_USER -p $YB_PASSWORD"

# Store index usage from explain plans
INDEX_USAGE=$(cat $YB_SQL_FILE | sed 's/sELECT/EXPLAIN SELECT/gI' | y | grep Index)

# Get the indexes in the provided keyspace
indexes=$(y -e "SELECT table_name, index_name FROM system_schema.indexes WHERE keyspace_name = '$YB_KEYSPACE_NAME'" | awk 'FNR>=4 {print}' | head -n -2 | sed "s/ //g")

# Iterate through index list, output those that are not used

echo "The following indexes are not used by the queries provided."
echo "You may want to consider dropping them."
echo ""

not_found_cnt=0
IFS=$'\n'

for index in $indexes
do
  TABLE=$(echo $index | awk -F '|' '{print $1}')
  INDEX=$(echo $index | awk -F '|' '{print $2}')

  if [[ "$INDEX_USAGE" != *"$YB_KEYSPACE_NAME.$INDEX"* ]]; then
    not_found_cnt=$((not_found_cnt+1))
    echo "$not_found_cnt. $TABLE.$INDEX"
  fi

done

echo ""
echo "Total unused indexes: $not_found_cnt"

exit 0

Here is an example…

A table and its indexes:

				
					[root@Cloud_Server_0 ~]# ycqlsh 127.0.0.1 -u cassandra -p password -r -e "DESC TABLE k.t;"

CREATE TABLE k.t (
    c1 int,
    c2 int,
    c3 text,
    PRIMARY KEY (c1, c2)
) WITH CLUSTERING ORDER BY (c2 ASC)
    AND default_time_to_live = 0
    AND transactions = {'enabled': 'true'};
CREATE INDEX t_c2_idx ON k.t (c3, c1, c2)
    WITH CLUSTERING ORDER BY (c1 ASC, c2 ASC)
    AND transactions = {'enabled': 'true'};
CREATE INDEX t_c2_idx2 ON k.t (c3, c1, c2)
    WITH CLUSTERING ORDER BY (c1 ASC, c2 ASC))
    AND transactions = {'enabled': 'true'};

A file containing several sample queries:

				
					[root@Cloud_Server_0 ~]# cat q.sql
SELECT * FROM k.t WHERE c3 = 'A';
SELECT c3 FROM k.t WHERE c1 = 1 AND c3 = 'A';
SELECT c3 FROM k.t WHERE c1 = 1 AND c2 = 1 AND c3 = 'A';

Executing the bash script:

				
					[root@Cloud_Server_0 ~]# ./ycql_unused_indexes.sh k q.sql
The following indexes are not used by the queries provided.
You may want to consider dropping them.

1. t.t_c2_idx2

Total unused indexes: 1

Have Fun!