Using the WIDTH_BUCKET Function

In YugabyteDB’s YSQL API the WIDTH_BUCKET function constructs equal width histograms, in which the histogram range is divided into intervals (buckets) of identical sizes.

The syntax of the function is:

				
					WIDTH_BUCKET ( expression, hist_min, hist_max, bucket_count )

Example:

Say we have a table named TV_SHOW that stores TV show names and their corresponding average viewer age:

				
					yugabyte=# SELECT * FROM tv_show ORDER BY name;
         name          | age_avg
-----------------------+---------
 60 Minutes            |     100
 American Idol         |      25
 Andy Griffith Show    |      91
 Arrested Development  |      32
 Blue Bloods           |      63
 CSI: NY               |      58
 Dallas                |      33
 Dancing with DBAs     |      42
 Family Guy            |      18
 Golden Girls          |      63
 Magnum P.I.           |      60
 Modern Family         |      32
 Murder She Wrote      |      53
 SpongeBob SquarePants |      11
 Wheel of Fortune      |      42
 Wiggles               |       7
(16 rows)

We want to group the viewers into the following ten age groups:

We can do this easily with the WIDTH_BUCKET function:

				
					yugabyte=# SELECT name,
       age_avg,
       width_bucket(age_grp, 0, 99, 10) agr_grp
  FROM tv_show
 ORDER BY width_bucket;
         name          | age_avg | age_grp
-----------------------+---------+---------
 Wiggles               |       7 |        1
 SpongeBob SquarePants |      11 |        2
 Family Guy            |      18 |        2
 American Idol         |      25 |        3
 Dallas                |      33 |        4
 Arrested Development  |      32 |        4
 Modern Family         |      32 |        4
 Wheel of Fortune      |      42 |        5
 Dancing with DBAs     |      42 |        5
 Murder She Wrote      |      53 |        6
 CSI: NY               |      58 |        6
 Magnum P.I.           |      60 |        7
 Blue Bloods           |      63 |        7
 Golden Girls          |      63 |        7
 Andy Griffith Show    |      91 |       10
 60 Minutes            |     100 |       11
(16 rows)

Note: The WIDTH_BUCKET function will create an over flow bucket automatically, hence “60 Minutes” falling into group 11…

We can also count the number of shows that fall in to each of the ranges:

				
					yugabyte=# SELECT width_bucket(age_avg, 0, 99, 10) age_grp,
yugabyte-#        COUNT(*) cnt
yugabyte-#   FROM tv_show
yugabyte-#  GROUP BY 1
yugabyte-#  ORDER BY 1;
 age_grp | cnt
---------+-----
       1 |   1
       2 |   2
       3 |   1
       4 |   3
       5 |   2
       6 |   2
       7 |   3
      10 |   1
      11 |   1
(9 rows)

Have Fun!