How do you implement True/False columns?
There are many ways to do it, each with its own pros and cons.
ENUM
Create you column as ENUM(‘F’, ‘T’), or ENUM(‘N’,’Y’) or ENUM(‘0’, ‘1’).
This is the method used in the mysql tables (e.g. mysql.user privileges table). It’s very simple and intuitive. It truly restricts the values to just two options, which serves well. It’s compact (just one byte).
A couple disadvantages to this method:
- Enums are represented by numerical values (which is good) and start with 1 instead of 0. This means ‘F’ is 1, and ‘T’ is 2, and they both translate to True when directly used in a booleanic expression (e.g. IF(val, ‘True’, ‘False’) always yields ‘True’)
- There’s no real convention. Is it ‘Y’/’N’? ‘T’/’F’? ‘P’/’N’? ‘1’/’0′?
CHAR(1)
Simple again. Proposed values are, as before, ‘F’, ‘T’ etc. This time there’s no way to limit the range of values. You cannot (in MySQL, unless using triggers) prevent an ‘X’.
Watch out for the charset! If it’s utf8 you pay with 3 bytes instead of just 1. And, again, ‘T’, ‘F’, ‘Y’, ‘N’ values all evaluate as True. It is possible to use the zero-valued character, but it defeats the purpose of using CHAR.
CHAR(0)
Many are unaware that it’s even valid to make this definition. What does it mean? Take a look at the following table:
CREATE TABLE `t1` ( `bval` char(0) DEFAULT NULL ); mysql> INSERT INTO t1 VALUES (''); mysql> INSERT INTO t1 VALUES (''); mysql> INSERT INTO t1 VALUES (NULL); mysql> SELECT * FROM t1; +------+ | bval | +------+ | | | | | NULL | +------+
NULLable columns cause for an additional storage per row. There’s one bit per NULLable column which notes down whether the column’s value is NULL or not. If you only have one NULLable column, you must pay for this bit with 1 byte. If you have two NULLable columns, you still only pay with 1 byte.
Furthermore:
mysql> SELECT bval IS NOT NULL FROM t1; +------------------+ | bval IS NOT NULL | +------------------+ | 1 | | 1 | | 0 | +------------------+
So this plays somewhat nicely into booleanic expressions.
However, this method is unintuitive and confusing. I personally don’t use it.
TINYINT
With integer values, we can get down to 0 and 1. With TINYINT, we only pay with 1 byte of storage. As with CHAR(1), we cannot prevent anyone from INSERTing other values. But that doesn’t really matter, if we’re willing to accept that 0 evaluates as False, and all other values as True. In this case, boolean expressions work very well with your column values.
BOOL/BOOLEAN
These are just synonyms to TINYINT. I like to define my boolean values as such. Alas, when issuing a SHOW CREATE TABLE the definition is just a normal TINYINT. Still, it is clearer to look at if you’re storing your table schema under your version control.
@Henrik,
Yes, in case of multiple columns and uneven distribution. Actually, I find this kind of indexing pretty popular. e.g. key(is_unhandled, customer_id).
When distribution is really uneven, even a single-column index on the boolean type can be useful (although my experience shows that you can usually extend to add other columns and benefit more queries using the index).
You can use (and we mostly do) type ENUM(”,’Y’) so ” translates to False.
@Pavel: great!
It’s also relevant to mention that creating a [VAR]CHAR(5) column to store ‘True’ or ‘False’ is not efficient due to extra bytes. I’ve seen this example before, and changing to ENUM(‘True’,’False’) both addresses no code changes, and reduces the size to 1 byte.
@Ronald,
Good point.