yb_hash_code is a function that returns the hash of a set of given input values using the hash function DocDB uses to shard its data.
In effect, it provides direct access to the hash value of any given row of a YSQL table, allowing one to infer a row’s physical location.
This enables an application to specify queries based on the physical location of a row or set of rows.
The hashed value returned from yb_hash_code is influenced by the data type of the value passed to it.
Example:
yugabyte=# \x
Expanded display is on.
yugabyte=# SELECT YB_HASH_CODE(12345::INT) "12345::INT",
yugabyte-# YB_HASH_CODE(12345::VARCHAR(10)) "12345::VARCHAR(10)",
yugabyte-# YB_HASH_CODE(12345::VARCHAR(100)) "12345::VARCHAR(100)",
yugabyte-# YB_HASH_CODE(12345::NUMERIC(100,0)) "12345::NUMERIC(100,0)",
yugabyte-# YB_HASH_CODE(12345::NUMERIC(100,5)) "12345::NUMERIC(100,5)",
yugabyte-# YB_HASH_CODE(12345::NUMERIC(100,10)) "12345::NUMERIC(100,10)",
yugabyte-# YB_HASH_CODE(12345::FLOAT) "12345::FLOAT";
-[ RECORD 1 ]----------+------
12345::INT | 42669
12345::VARCHAR(10) | 22093
12345::VARCHAR(100) | 22093
12345::NUMERIC(100,0) | 11640
12345::NUMERIC(100,5) | 11640
12345::NUMERIC(100,10) | 11640
12345::FLOAT | 9431
When joining two tables a column that is part of the partition key, it’s important that the columns have the same data type.
Otherwise there is a chance the the data from one table might reside on a different node than the other table.
Example:
yugabyte=# CREATE TABLE a (c INT PRIMARY KEY);
CREATE TABLE
yugabyte=# INSERT INTO a SELECT 1;
INSERT 0 1
yugabyte=# CREATE TABLE b ( c NUMERIC(10, 0) PRIMARY KEY);
CREATE TABLE
yugabyte=# INSERT INTO b SELECT 1;
INSERT 0 1
yugabyte=# SELECT COUNT(*) FROM a JOIN b ON b.c = a.c; -- This JOIN is okay...
count
-------
1
(1 row)
yugabyte=# SELECT COUNT(*) FROM a JOIN b ON YB_HASH_CODE(b.c) = YB_HASH_CODE(a.c); -- But the data is on different nodes!
count
-------
0
(1 row)
This scenario could result in higher query latency dependning on the network between the nodes.
It’s best practice to join tables on columns that have the same data type – especially in a distributed SQL database like YugabyteDB!
Have Fun!
