sql

This post is a short story about a bug that we found a couple of days ago in our backend application. The bug that originated with a simple mistake in an SQL statement, but it was the one we'd expect to be handled by the database itself. Some time ago we considered ditching MySQL for Postgres, and as a matter of fact, that switch would have saved us this time. Let's see how both products handle something we can call casted joins.

What happened exactly? In one of our applications we've prepared an SQL statement which looked more or less like this:

SELECT product.id, product.name, category.name FROM product JOIN category ON product.category_id = category.id;

This doesn't look strange (and it shouldn't), but the bug was hidden in tables' structures (it's simplified since we don't need to know too much):

mysql> describe product;
+-------------+------+------+-----+---------+-------+
| Field       | Type | Null | Key | Default | Extra |
+-------------+------+------+-----+---------+-------+
| id          | text | YES  |     | NULL    |       |
| category_id | text | YES  |     | NULL    |       |
| name        | text | YES  |     | NULL    |       |
+-------------+------+------+-----+---------+-------+    

mysql> describe category;
+-------------+---------+------+-----+---------+-------+
| Field       | Type    | Null | Key | Default | Extra |
+-------------+---------+------+-----+---------+-------+
| id          | int(11) | YES  |     | NULL    |       |
| category_id | text    | YES  |     | NULL    |       |
| name        | text    | YES  |     | NULL    |       |
+-------------+---------+------+-----+---------+-------+
3 rows in set (0.00 sec)

Now you can see what was wrong - we tried to join product.category_id of type text with category.id which was an int! We should've used category_id field from each table! Let's just say that the result of the JOIN was later used for updating something somewhere else. A couple days ago we noticed some strange values being updated, so we wanted to examine what was going on. It turned out that some values actually matched in this query (and vast majority didn't) and it always resulted in broken updates. What was going on?

It turned out, that our textual IDs (we use UUIDs) matched numerical IDs when they start with a sequence of digits was equal to that numerical one. It surprised us because since we rely strongly on strict typing (we have 100% Go backend and 100% Typescript frontend in our projects) we expected our database to behave similarly. That's why I wanted to compare how MySQL and Postgres handle such situation.

In the samples below I create two tables with one value field each: ints with numerical value and strings with textual. Then I try to insert two entries to each table: number 1 and text "1a". Finally, I make a JOIN on them and see what the result is.

Casted join in MySQL

Preparing the field:

mysql> create database playground;
Query OK, 1 row affected (0.00 sec)
mysql> use playground;
Database changed
mysql> CREATE TABLE ints (value int);
Query OK, 0 rows affected (0.03 sec)
mysql> CREATE TABLE strings (value text);
Query OK, 0 rows affected (0.02 sec)

Now we insert values to numerical table:

mysql> INSERT INTO ints VALUES (1);
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO ints VALUES ("1a");
ERROR 1265 (01000): Data truncated for column 'value' at row 1

Cool! We cannot insert a text value as integer because it just is not one! We verify that:

mysql> SELECT * FROM ints;
+-------+
| value |
+-------+
|     1 |
+-------+
1 row in set (0.01 sec)

Now we do the same with strings table:

mysql> INSERT INTO strings VALUES ("1a");
Query OK, 1 row affected (0.01 sec)
mysql> INSERT INTO strings VALUES (1);
Query OK, 1 row affected (0.01 sec)

Well, we inserted a number as text which might be bothering a bit, but in fact, all programming languages I know allow me to convert number to string pretty easily, so I won't be devastated. We can verify both values got inserted:

mysql> SELECT * FROM strings;
+-------+
| value |
+-------+
| 1a    |
| 1     |
+-------+
2 rows in set (0.00 sec)

Now our dreadful join:

mysql> SELECT ints.value, strings.value FROM ints JOIN strings USING (value);
+-------+-------+
| value | value |
+-------+-------+
|     1 | 1a    |
|     1 | 1     |
+-------+-------+
2 rows in set (0.00 sec)

Bleh!

Casted join in Postgres

Again, start with the obvious part:

postgres=# create database playground;
CREATE DATABASE
postgres=# CREATE TABLE ints (value int);
CREATE TABLE
postgres=# CREATE TABLE strings (value text);
CREATE TABLE

By the way, Postgres didn't require me to change database explicitly - I like that! Now let's insert values to ints and see what got in:

postgres=# INSERT INTO ints VALUES (1);
INSERT 0 1
postgres=# INSERT INTO ints VALUES ('1a');
ERROR:  invalid input syntax for integer: "1a"
LINE 1: INSERT INTO ints VALUES ('1a');
                                        ^
postgres=# SELECT * FROM ints;
value 
-------
    1
(1 row)

So far so good, let's see the second table:

postgres=# INSERT INTO strings VALUES ('1a');
INSERT 0 1
postgres=# INSERT INTO strings VALUES (1);
INSERT 0 1
postgres=# SELECT * FROM strings;
value 
-------
1a
1
(2 rows)

Again we inserted number as text, but we'll be fine I guess. Now here comes our real test:

postgres=# SELECT ints.value, strings.value FROM ints JOIN strings USING (value);
ERROR:  JOIN/USING types integer and text cannot be matched

Yay! That's what I'd expect from an SQL database to do in this case. Don't assume I did what I wanted, rather make me cast the values myself (to make sure I know what I'm doing, which I wasn't in our application's code).

Advantage: Postgres.