Choosing MySQL boolean data types

June 3, 2010

How do you implement True/False columns?

There are many ways to do it, each with its own pros and cons.

ENUM

Create you column as ENUM('F', 'T'), or ENUM('N','Y') or ENUM('0', '1').

This is the method used in the mysql tables (e.g. mysql.user privileges table). It's very simple and intuitive. It truly restricts the values to just two options, which serves well. It's compact (just one byte).

A couple disadvantages to this method:

  1. Enums are represented by numerical values (which is good) and start with 1 instead of 0. This means 'F' is 1, and 'T' is 2, and they both translate to True when directly used in a booleanic expression (e.g. IF(val, 'True', 'False') always yields 'True')
  2. There's no real convention. Is it 'Y'/'N'? 'T'/'F'? 'P'/'N'? '1'/'0'?

CHAR(1)

Simple again. Proposed values are, as before, 'F', 'T' etc. This time there's no way to limit the range of values. You cannot (in MySQL, unless using triggers) prevent an 'X'.

Watch out for the charset! If it's utf8 you pay with 3 bytes instead of just 1. And, again, 'T', 'F', 'Y', 'N' values all evaluate as True. It is possible to use the zero-valued character, but it defeats the purpose of using CHAR.

CHAR(0)

Many are unaware that it's even valid to make this definition. What does it mean? Take a look at the following table:

CREATE TABLE `t1` (
 `bval` char(0) DEFAULT NULL
);
mysql> INSERT INTO t1 VALUES ('');
mysql> INSERT INTO t1 VALUES ('');
mysql> INSERT INTO t1 VALUES (NULL);

mysql> SELECT * FROM t1;
+------+
| bval |
+------+
|      |
|      |
| NULL |
+------+

NULLable columns cause for an additional storage per row. There's one bit per NULLable column which notes down whether the column's value is NULL or not. If you only have one NULLable column, you must pay for this bit with 1 byte. If you have two NULLable columns, you still only pay with 1 byte.

Furthermore:

mysql> SELECT bval IS NOT NULL FROM t1;
+------------------+
| bval IS NOT NULL |
+------------------+
|                1 |
|                1 |
|                0 |
+------------------+

So this plays somewhat nicely into booleanic expressions.

However, this method is unintuitive and confusing. I personally don't use it.

TINYINT

With integer values, we can get down to 0 and 1. With TINYINT, we only pay with 1 byte of storage. As with CHAR(1), we cannot prevent anyone from INSERTing other values. But that doesn't really matter, if we're willing to accept that 0 evaluates as False, and all other values as True. In this case, boolean expressions work very well with your column values.

BOOL/BOOLEAN

These are just synonyms to TINYINT. I like to define my boolean values as such. Alas, when issuing a SHOW CREATE TABLE the definition is just a normal TINYINT. Still, it is clearer to look at if you're storing your table schema under your version control.

tags: ,
posted in MySQL by shlomi

« | »

Follow comments via the RSS Feed | Leave a comment | Trackback URL

12 Comments to "Choosing MySQL boolean data types"

  1. Andrew Hutchings wrote:

    It may be interesting to note that if you need multiple T/F columns in a table you can pack them all in the BIT data type. This would only take approximately (M+7)/8 bytes where M is the number of bits (although you could probably do this if you bitmask a TINYINT too).

  2. shlomi wrote:

    @Andrew,
    Cool. This is DBA-ing C-style. But makes for complicated code and impossible indexing. When considering such approach, I think the CHAR(0) provides with better (less worse?) design.

  3. Andrew Hutchings wrote:

    @Shlomi
    Ah yes, good points, I didn't consider indexing :)

    The other advantage of BIT data type is you can have multiple BIT(1) columns and if you are using NDB it will pack them into a 4 byte block automatically. I don't think other storage engines do this unfortunately.

  4. shlomi wrote:

    @Andrew,
    Didn't know cluster does that (don't know much about cluster, for that matter...). That's cool.

  5. Henrik Ingo wrote:

    Hmm... So what is a good use case where you even want to index a boolean column? Ok, so as part of a multi-column index it makes sense.

    Depending on the app, packing several T/F values into one BIT field may even be an advantage, if there is a case where you use *all* of the boolean values to select rows.

  6. shlomi wrote:

    @Henrik,

    Yes, in case of multiple columns and uneven distribution. Actually, I find this kind of indexing pretty popular. e.g. key(is_unhandled, customer_id).
    When distribution is really uneven, even a single-column index on the boolean type can be useful (although my experience shows that you can usually extend to add other columns and benefit more queries using the index).

  7. Pavel Francírek wrote:

    You can use (and we mostly do) type ENUM('','Y') so '' translates to False.

  8. shlomi wrote:

    @Pavel: great!

  9. Ronald Bradford wrote:

    It's also relevant to mention that creating a [VAR]CHAR(5) column to store 'True' or 'False' is not efficient due to extra bytes. I've seen this example before, and changing to ENUM('True','False') both addresses no code changes, and reduces the size to 1 byte.

  10. shlomi wrote:

    @Ronald,
    Good point.

  11. shlomi wrote:

    @Andrew,
    My bad: with InnoDB, you cannot create an index on CHAR(0) columns anyhow. So my comment #2 becomes irrelevant.

  12. Janis wrote:

    So it is about to guessing when mysql will have simple boolean data type? It is rather funny to see how all of us make workarounds of design fault or bug in mysql.

Leave Your Comment

 

 
Powered by Wordpress and MySQL. Theme by openark.org