In YugabyteDB’s YSQL API the WIDTH_BUCKET
function constructs equal width histograms, in which the histogram range is divided into intervals (buckets) of identical sizes.
The syntax of the function is:
WIDTH_BUCKET ( expression, hist_min, hist_max, bucket_count )
Example:
Say we have a table named TV_SHOW
that stores TV show names and their corresponding average viewer age:
yugabyte=# SELECT * FROM tv_show ORDER BY name;
name | age_avg
-----------------------+---------
60 Minutes | 100
American Idol | 25
Andy Griffith Show | 91
Arrested Development | 32
Blue Bloods | 63
CSI: NY | 58
Dallas | 33
Dancing with DBAs | 42
Family Guy | 18
Golden Girls | 63
Magnum P.I. | 60
Modern Family | 32
Murder She Wrote | 53
SpongeBob SquarePants | 11
Wheel of Fortune | 42
Wiggles | 7
(16 rows)
We want to group the viewers into the following ten age groups:
We can do this easily with the WIDTH_BUCKET function:
yugabyte=# SELECT name,
age_avg,
width_bucket(age_grp, 0, 99, 10) agr_grp
FROM tv_show
ORDER BY width_bucket;
name | age_avg | age_grp
-----------------------+---------+---------
Wiggles | 7 | 1
SpongeBob SquarePants | 11 | 2
Family Guy | 18 | 2
American Idol | 25 | 3
Dallas | 33 | 4
Arrested Development | 32 | 4
Modern Family | 32 | 4
Wheel of Fortune | 42 | 5
Dancing with DBAs | 42 | 5
Murder She Wrote | 53 | 6
CSI: NY | 58 | 6
Magnum P.I. | 60 | 7
Blue Bloods | 63 | 7
Golden Girls | 63 | 7
Andy Griffith Show | 91 | 10
60 Minutes | 100 | 11
(16 rows)
Note: The
WIDTH_BUCKET
function will create an over flow bucket automatically, hence “60 Minutes” falling into group 11… We can also count the number of shows that fall in to each of the ranges:
yugabyte=# SELECT width_bucket(age_avg, 0, 99, 10) age_grp,
yugabyte-# COUNT(*) cnt
yugabyte-# FROM tv_show
yugabyte-# GROUP BY 1
yugabyte-# ORDER BY 1;
age_grp | cnt
---------+-----
1 | 1
2 | 2
3 | 1
4 | 3
5 | 2
6 | 2
7 | 3
10 | 1
11 | 1
(9 rows)
Have Fun!