The DISTINCT ON
clause in YugabyteDB YSQL provides greater flexibility than the standard DISTINCT
clause by allowing us to retrieve unique rows based on specific columns.
It enables us to specify which row to retain for each unique value using an ORDER BY
clause.
Examle:
Let’s say we have a students
table that stores exam scores:
yugabyte=# CREATE TABLE students (
yugabyte(# id SERIAL PRIMARY KEY,
yugabyte(# name TEXT,
yugabyte(# subject TEXT,
yugabyte(# score INT,
yugabyte(# exam_date DATE
yugabyte(# );
CREATE TABLE
yugabyte=# INSERT INTO students (name, subject, score, exam_date) VALUES
yugabyte-# ('Jane', 'Distributed SQL Databases', 85, '2025-03-01'),
yugabyte-# ('Jane', 'Distributed SQL Databases', 90, '2025-03-15'),
yugabyte-# ('Lucy', 'Distributed SQL Databases', 88, '2025-03-10'),
yugabyte-# ('Lucy', 'Distributed SQL Databases', 92, '2025-03-20'),
yugabyte-# ('Alan', 'Distributed SQL Databases', 78, '2025-03-05');
INSERT 0 5
To get the highest score for each student in “Distributed SQL Databases”, we can run:
yugabyte=# SELECT DISTINCT ON (name) name, subject, score, exam_date
yugabyte-# FROM students
yugabyte-# WHERE subject = 'Distributed SQL Databases'
yugabyte-# ORDER BY name, score DESC;
name | subject | score | exam_date
------+---------------------------+-------+------------
Alan | Distributed SQL Databases | 78 | 2025-03-05
Jane | Distributed SQL Databases | 90 | 2025-03-15
Lucy | Distributed SQL Databases | 92 | 2025-03-20
(3 rows)
To achieve the same result using ROW_NUMBER()
, you can use a Common Table Expression (CTE) to rank students based on their highest scores and then retrieve only the top-ranked row for each student.
yugabyte=# WITH ranked_students AS (
yugabyte(# SELECT name,
yugabyte(# subject,
yugabyte(# score,
yugabyte(# exam_date,
yugabyte(# ROW_NUMBER() OVER (PARTITION BY name ORDER BY score DESC) AS rn
yugabyte(# FROM students
yugabyte(# WHERE subject = 'Distributed SQL Databases'
yugabyte(# )
yugabyte-# SELECT name, subject, score, exam_date
yugabyte-# FROM ranked_students
yugabyte-# WHERE rn = 1;
name | subject | score | exam_date
------+---------------------------+-------+------------
Alan | Distributed SQL Databases | 78 | 2025-03-05
Jane | Distributed SQL Databases | 90 | 2025-03-15
Lucy | Distributed SQL Databases | 92 | 2025-03-20
(3 rows)
DISTINCT ON
is often a more elegant and concise way to get the top 1 result per group in YugabyteDB YSQL compared to using ROW_NUMBER()
. Have Fun!
