Data Types – code.openark.org http://shlomi-noach.github.io/blog/ Blog by Shlomi Noach Thu, 24 Jan 2013 08:08:40 +0000 en-US hourly 1 https://wordpress.org/?v=5.3.3 32412571 Re: MySQL 5.1 vs. MySQL 5.5: Floats, Doubles, and Scientific Notation https://shlomi-noach.github.io/blog/mysql/re-mysql-5-1-vs-mysql-5-5-floats-doubles-and-scientific-notation https://shlomi-noach.github.io/blog/mysql/re-mysql-5-1-vs-mysql-5-5-floats-doubles-and-scientific-notation#comments Thu, 24 Jan 2013 08:08:40 +0000 https://shlomi-noach.github.io/blog/?p=6028 Reading Sheeri’s MySQL 5.1 vs. MySQL 5.5: Floats, Doubles, and Scientific Notation, I was baffled at this change of floating point number notation.

However, I was also concerned about the final action taken: using “–ignore-columns” to avoid comparing the FLOAT/DOUBLE types.

The –float-precision option for pt-table-checksum currently only uses ROUND() so as to disregard minor rounding issues. But it can very easily extend to handle the difference in floating point notation. Consider again the problem:

mysql> create table tf(f float);
Query OK, 0 rows affected (0.11 sec)

mysql> insert into tf values(0.0000958084);
Query OK, 1 row affected (0.04 sec)

mysql-5.1> select * from tf;
+-------------+
| f           |
+-------------+
| 9.58084e-05 |
+-------------+

mysql-5.5> select * from tf;
+--------------+
| f            |
+--------------+
| 0.0000958084 |
+--------------+

How can we normalize the notation?

Easily: CAST it as DECIMAL. Consider:

mysql-5.1> SELECT f,ROUND(IF(f BETWEEN -1 AND 1, CAST(f AS DECIMAL(65,30)), f), 10) as fn from tf;
+-------------+--------------+
| f           | fn           |
+-------------+--------------+
| 9.58084e-05 | 0.0000958084 |
+-------------+--------------+

mysql-5.5> SELECT f,ROUND(IF(f BETWEEN -1 AND 1, CAST(f AS DECIMAL(65,30)), f), 10) as fn from tf;
+--------------+--------------+
| f            | fn           |
+--------------+--------------+
| 0.0000958084 | 0.0000958084 |
+--------------+--------------+

The normalization works well in both cases; also, taking care to only normalize values in the range [-1, 1].

The change in pt-table-checksum? One line of code:

         elsif ( $float_precision && $type =~ m/float|double/ ) {
            $result = "ROUND($result, $float_precision)";
         }

Turns to

         elsif ( $float_precision && $type =~ m/float|double/ ) {
            $result = "ROUND(IF($result BETWEEN -1 AND 1, CAST($result AS DECIMAL(65,30)), $result), $float_precision)";
         }

I’ve just submitted a blueprint, but I would think changing one line of code shouldn’t wait till next release of Percona Toolkit: one can edit in-place their /usr/bin/pt-table-checksum and proceed to validate their database integrity.

]]>
https://shlomi-noach.github.io/blog/mysql/re-mysql-5-1-vs-mysql-5-5-floats-doubles-and-scientific-notation/feed 10 6028
CHAR and spaces https://shlomi-noach.github.io/blog/mysql/char-and-spaces https://shlomi-noach.github.io/blog/mysql/char-and-spaces#comments Tue, 12 Jun 2012 05:28:08 +0000 https://shlomi-noach.github.io/blog/?p=4528 I know about it, I knew about it all along, but… it’s so easy to fall for it; there’s just so much absurdity!

A CHAR type has a known number of characters. For example, the column:

CountryCode CHAR(3) CHARSET ascii NOT NULL

– is known to have exactly three characters. These could be ‘USA’, ‘FRA’, etc.

What happens with spaces? A SQL CHAR type ignores any trailing spaces; thus, the code ‘GB ‘ (the characters ‘G’, ‘B’, and the space ‘ ‘) is interpreted as ‘GB’. Trailing spaces are not regarded as part of the text. Want to see some absurdity?

CREATE TABLE `c_test` (
  `c` char(1) DEFAULT NULL
);
INSERT INTO c_test VALUES ('a');
INSERT INTO c_test VALUES ('b');
INSERT INTO c_test VALUES (' ');

SELECT c, LENGTH(c), c = 'a', c = 'b', c = ' ', LENGTH(' ') FROM c_test;
+------+-----------+---------+---------+---------+-------------+
| c    | LENGTH(c) | c = 'a' | c = 'b' | c = ' ' | LENGTH(' ') |
+------+-----------+---------+---------+---------+-------------+
| a    |         1 |       1 |       0 |       0 |           1 |
| b    |         1 |       0 |       1 |       0 |           1 |
|      |         0 |       0 |       0 |       1 |           1 |
+------+-----------+---------+---------+---------+-------------+

Note that LENGTH(c) is zero on third row, but nevertheless it equals the space character (c = ‘ ‘). Although, the length of the space character is… 1 (and this is because the constant ‘ ‘ is interpreted as VARCHAR).

To emphasize the absurdity, now take a closer look:

SELECT c, LENGTH(c), c = ' ', LENGTH(' '), c = '', LENGTH('') FROM c_test;
+------+-----------+---------+-------------+--------+------------+
| c    | LENGTH(c) | c = ' ' | LENGTH(' ') | c = '' | LENGTH('') |
+------+-----------+---------+-------------+--------+------------+
| a    |         1 |       0 |           1 |      0 |          0 |
| b    |         1 |       0 |           1 |      0 |          0 |
|      |         0 |       1 |           1 |      1 |          0 |
+------+-----------+---------+-------------+--------+------------+

I add a comparison with the empty string , and it turns out the space character equals both a space character and the empty string, though their lengths do not agree.

Now, I usually don’t bother with CHAR types, and prefer the VARCHAR or TEXT types, but every once in a while I do use it. Most recently, in a stored routine code in common_schema, I iterate & parse some text. I iterated the text char by char, and declared a local CHAR(1) variable to assign iterated value to. It sure bit me when I got to a space character which insisted it was the empty text (indicating end of iteration).

Of course the immediate and correct solution was to change the local variable’s type to VARCHAR(1) instead. But the time it consumed to find the problem… Be warned and be aware!

]]>
https://shlomi-noach.github.io/blog/mysql/char-and-spaces/feed 12 4528
23:59:59 is not the end of the day. No, really! https://shlomi-noach.github.io/blog/mysql/235959-is-not-the-end-of-the-day-no-really https://shlomi-noach.github.io/blog/mysql/235959-is-not-the-end-of-the-day-no-really#comments Wed, 27 Jul 2011 18:16:00 +0000 https://shlomi-noach.github.io/blog/?p=3839 How would you check whether some TIMESTAMP column falls within a given day, say July 26th, 2011?

This is a question I tend to ask students, and usually I get the same range of answers. Some are wrong; some cannot utilize an index, some are correct, and some are temporarily correct. I wish to take a closer look at the last. Such an answer looks like this:

SELECT * FROM some_table WHERE timstamp_column BETWEEN '2011-07-26 00:00:00' AND '2011-07-26 23:59:59'

Yikes! I get my allergies when I see this one.

Technically this seems correct. And it seems to work so far for people. There are two things that disturb me:

  1. ’23:59:59′ refers to the beginning of the last second of the day. This means a full second is missing in concept. No one would write such a code on a C, Java or PHP application: that would be a newbie’s mistake. But people feel at ease doing it with SQL
  2. Of course, this works when timestamps are in a one second resolution. Which leads me to:

  3. One second resolution is a temporary issue. Already MariaDB 5.3 presents with microsecond support for NOW() and TIMESTAMP, TIME and DATETIME columns. Sometime in the not-so-far future the standard MySQL distribution will have that, too.

Soon people will start losing data, reporting wrong numbers, perform incorrect aggregations. My advise:

grep -iR "23:59:59" /path/to/application/code/*

And get rid of it. A good alternative would be:

SELECT * FROM some_table WHERE timstamp_column >= DATE('2011-07-26') AND timestamp_column < (DATE('2011-07-26') + INTERVAL 1 DAY)

(Why use all this INTERVAL stuff? Think 28/29/30/31 days in month and get the answer)

And do not use LIKE ‘2011-07-26 %’. Read this if you’re not sure why.

]]>
https://shlomi-noach.github.io/blog/mysql/235959-is-not-the-end-of-the-day-no-really/feed 13 3839
Announcing common_schema: common views & routines for MySQL https://shlomi-noach.github.io/blog/mysql/announcing-common_schema-common-views-routines-for-mysql https://shlomi-noach.github.io/blog/mysql/announcing-common_schema-common-views-routines-for-mysql#comments Wed, 13 Jul 2011 04:25:24 +0000 https://shlomi-noach.github.io/blog/?p=3794 Today I have released common_schema, a utility schema for MySQL which includes many views and functions, and is aimed to be installed on any MySQL server.

What does it do?

There are views answering for all sorts of useful information: stuff related to schema analysis, data dimensions, monitoring, processes & transactions, security, internals… There are basic functions answering for common needs.

Some of the views/routines simply formalize those queries we tend to write over and over again. Others take the place of external tools, answering complex questions via SQL and metadata. Still others help out with SQL generation.

Here are a few highlights:

There’s more. Take a look at the common_schema documentation for full listing. And it’s evolving: I’ve got quite a few ideas already for future components.

Some of these views rely on heavyweight INFORMATION_SCHEMA tables. You should be aware of the impact and risks.

What do I need to install?

There’s no script or executable file. It’s just a schema. The distribution in an SQL file which generates common_schema. Much like a dump file.

What are the system requirements?

It’s just between you and your MySQL. There are currently three distribution files, dedicated for different versions of MySQL (and allowing for increased functionality):

  • common_schema_mysql_51: fits all MySQL >= 5.1 distributions
  • common_schema_innodb_plugin: fits MySQL >= 5.1, with InnoDB plugin + INFORMATION_SCHEMA tables enabled
  • common_schema_percona_server: fits Percona Server >= 5.1

Refer to the documentation for more details.

What are the terms of use?

common_schema is released under the BSD license.

Where can I download it?

On the common_schema project page. Enjoy it!

]]>
https://shlomi-noach.github.io/blog/mysql/announcing-common_schema-common-views-routines-for-mysql/feed 7 3794
TIMESTAMP vs. DATETIME, which should I be using? https://shlomi-noach.github.io/blog/mysql/timestamp-vs-datetime-which-should-i-be-using https://shlomi-noach.github.io/blog/mysql/timestamp-vs-datetime-which-should-i-be-using#comments Sun, 22 May 2011 04:11:47 +0000 https://shlomi-noach.github.io/blog/?p=3361 They both look the same, have roughly the same use. Nevertheless, they differ in many respects. I’ll note the difference, and note a few pitfalls and peculiarities.

Range

TIMESTAMP starts with the epoch, ‘1970-01-01 00:00:01’ UTC and ends with ‘2038-01-19 03:14:07’ UTC. This is all very nice today, and may actually hold up till our retirement, but the younger readers may yet have to deal with the Bug2K+38™, certain to arrive.

DATETIME starts with ‘1000-01-01 00:00:00’ and lasts up to ‘9999-12-31 23:59:59’. More on that later.

In respect of range, your current event logs may well use a TIMESTAMP value, while your grandfather’s and granddaughter’s birth dates may require DATETIME.

In general I would suggest that anything that relates to now, can be a TIMESTAMP. A new entry is added? It is added now. It can be represented by a TIMESTAMP. Anything has an expiry time of a few minutes, perhaps a few days or a month? You’ll be safe using it up till late 2037. Anything else had better use a DATETIME. In particular, dates relating to issues such as birth, insurance, the market etc. fall into this category.

History, however, does not even hold up with DATETIME. Rome fell long before MIN(DATETIME). You will have to manage your own. Not even YEAR will help you out.

Storage

TIMESTAMP makes for 4 bytes. DATETIME makes for 8 bytes. Now that we have this behind us, let’s see why.

Internal representation

A TIMESTAMP is merely the number of elapsed seconds since the epoch. It is a number. Not only is this a number, it is an incrementing number, and without gaps. Ever ascending. It actually equals the UNIX_TIMESTAMP() function.

A DATETIME is more of an oddity. From the manual:

  • DATETIME: Eight bytes:
    • A four-byte integer packed as YYYY×10000 + MM×100 + DD
    • A four-byte integer packed as HH×10000 + MM×100 + SS

Huh?

There’s nothing sequential about a DATETIME. The value which follows 20110307095159 is 20110307095200. It’s like the string representation without any delimiters. See the following (ignore the fraction, it’s not really there):

root@mysql-5.1.51> SELECT NOW()+0;
+-----------------------+
| NOW()+0               |
+-----------------------+
| 20110307095238.000000 |
+-----------------------+

The implication of this internal representation is that conversions must be made back and forth. When you want to add 5 seconds to a DATETIME, MySQL cannot simply add 5 to some number, but must make for a more complicated calculation.

Peculiarities

Now here’s a question: why does DATETIME start with ‘1000-01-01 00:00:00’? Can you provide with a convincing argument? The only argument I can find is hardly convincing.

Did you ever need to type in a password with your phone? Say, for your Credit Card company, or your Bank? Did you get recorded instructions saying “You must type six digits. You must not repeat the same digits three successive times“? That’s somewhat nice, and probably a good advice. Did you ever get told “The first digit must not be 0“? That’s just stupid. This means we’re giving up on 100,000 valid passwords, thereby reducing our search space by 10%!

I suspect in both cases the reasoning is the same: if you start with a 0, and we represent it as a number, it won’t make up for the same number of digits we intended it to.

In the case of your Credit Card company, there is no excuse. What’s the problem with padding with zeros till you make those six digits?

In the case of MySQL’s DATETIME, I can see a weak reasoning. Read this page on the docs to find out that a DATETIME can be represented in many forms, and can be automatically deduced from a string in many formats. Both strings ‘110307095100’ and ‘20110307095100’ make for the same DATETIME value. It is based on the number of characters in your text, that the parsing decides how to act.

The fact the first year is 1000 makes the number of digits predictable.

My thoughts?

I don’t see why I should care about automatically converting texts of different formats to a DATETIME. As a programmer, I’m perfectly content with strict typing. I don’t mind passing around only texts of the form ‘YYYY-MM-DD HH:MM:SS’ (or, better yet, passing date objects and letting my connector do the translation). Moreover, I prefer it that way! It makes me feel safer, that I haven’t passed a wrong text by mistake, to be silently accepted.

For this reason I don’t like the idea of losing the ability to use DATETIME on 1,000 lost years. As far as I can tell, this is a MySQL specific issue; there is no ANSI SQL for DATETIME.

[UPDATE:  Sheeri notes the biggest difference: DATETIME does not support time zones. A good discussion is also available on OurSQL]

 

]]>
https://shlomi-noach.github.io/blog/mysql/timestamp-vs-datetime-which-should-i-be-using/feed 9 3361
Implicit casting you don’t want to see around https://shlomi-noach.github.io/blog/mysql/implicit-casting-you-dont-want-to-see-around https://shlomi-noach.github.io/blog/mysql/implicit-casting-you-dont-want-to-see-around#comments Wed, 07 Jul 2010 08:53:37 +0000 https://shlomi-noach.github.io/blog/?p=2344 In Beware of implicit casting, I have outlined the dangers of implicit casting. Here’s a few more real-world examples I have tackled:

Number-String comparisons

Much like in programming languages, implicit casting is made to numbers when at least one of the arguments is a number. Thus:

mysql> SELECT 3 = '3.0';
+-----------+
| 3 = '3.0' |
+-----------+
|         1 |
+-----------+
1 row in set (0.00 sec)

mysql> SELECT '3' = '3.0';
+-------------+
| '3' = '3.0' |
+-------------+
|           0 |
+-------------+

The second query consists of pure strings comparison. It has no way to determine that number comparison should be made.

Direct DATE arithmetics

The first query seems to work, but is completely incorrect. The second explains why. The third is a total mess.

mysql> SELECT DATE('2010-01-01')+3;
+----------------------+
| DATE('2010-01-01')+3 |
+----------------------+
|             20100104 |
+----------------------+
1 row in set (0.00 sec)

mysql> SELECT DATE('2010-01-01')-3;
+----------------------+
| DATE('2010-01-01')-3 |
+----------------------+
|             20100098 |
+----------------------+
1 row in set (0.00 sec)

mysql> SELECT '2010-01-01' - 3;
+------------------+
| '2010-01-01' - 3 |
+------------------+
|             2007 |
+------------------+
1 row in set, 1 warning (0.00 sec)

Number-String comparisons, big integers

Look at the following crazy comparisons:

mysql> SELECT 1234 = '1234';
+---------------+
| 1234 = '1234' |
+---------------+
|             1 |
+---------------+

mysql> SELECT 123456789012345678 = '123456789012345678';
+-------------------------------------------+
| 123456789012345678 = '123456789012345678' |
+-------------------------------------------+
|                                         0 |
+-------------------------------------------+

mysql> SELECT 123456789012345678 = '123456789012345677';
+-------------------------------------------+
| 123456789012345678 = '123456789012345677' |
+-------------------------------------------+
|                                         1 |
+-------------------------------------------+

The amazing result of the last two comparisons may strike as odd. Actually, it may strike as a bug, and indeed when a customer approached me with this behavior I was at loss for words. But this is documented. The manual describes the cases for casting, then states: “… In all other cases, the arguments are compared as floating-point (real) numbers. …”

Lessons learned:

  • Be careful when comparing strings with floating point values. Matching depends on how both are represented.
  • Avoid converting temporal types to strings when doing date manipulation.
  • Avoid direct math on temporal types.
  • Avoid casting BIGINTs represented by strings. Casting will turn out to use FLOATs and may be incorrect.

Last but not least:

  • Use the proper data types for your data’s representation. When dealing with numbers, use numbers. When dealing with temporal values, use temporal types.
]]>
https://shlomi-noach.github.io/blog/mysql/implicit-casting-you-dont-want-to-see-around/feed 3 2344
Choosing MySQL boolean data types https://shlomi-noach.github.io/blog/mysql/choosing-mysql-boolean-data-types https://shlomi-noach.github.io/blog/mysql/choosing-mysql-boolean-data-types#comments Thu, 03 Jun 2010 05:24:11 +0000 https://shlomi-noach.github.io/blog/?p=2181 How do you implement True/False columns?

There are many ways to do it, each with its own pros and cons.

ENUM

Create you column as ENUM(‘F’, ‘T’), or ENUM(‘N’,’Y’) or ENUM(‘0’, ‘1’).

This is the method used in the mysql tables (e.g. mysql.user privileges table). It’s very simple and intuitive. It truly restricts the values to just two options, which serves well. It’s compact (just one byte).

A couple disadvantages to this method:

  1. Enums are represented by numerical values (which is good) and start with 1 instead of 0. This means ‘F’ is 1, and ‘T’ is 2, and they both translate to True when directly used in a booleanic expression (e.g. IF(val, ‘True’, ‘False’) always yields ‘True’)
  2. There’s no real convention. Is it ‘Y’/’N’? ‘T’/’F’? ‘P’/’N’? ‘1’/’0′?

CHAR(1)

Simple again. Proposed values are, as before, ‘F’, ‘T’ etc. This time there’s no way to limit the range of values. You cannot (in MySQL, unless using triggers) prevent an ‘X’.

Watch out for the charset! If it’s utf8 you pay with 3 bytes instead of just 1. And, again, ‘T’, ‘F’, ‘Y’, ‘N’ values all evaluate as True. It is possible to use the zero-valued character, but it defeats the purpose of using CHAR.

CHAR(0)

Many are unaware that it’s even valid to make this definition. What does it mean? Take a look at the following table:

CREATE TABLE `t1` (
 `bval` char(0) DEFAULT NULL
);
mysql> INSERT INTO t1 VALUES ('');
mysql> INSERT INTO t1 VALUES ('');
mysql> INSERT INTO t1 VALUES (NULL);

mysql> SELECT * FROM t1;
+------+
| bval |
+------+
|      |
|      |
| NULL |
+------+

NULLable columns cause for an additional storage per row. There’s one bit per NULLable column which notes down whether the column’s value is NULL or not. If you only have one NULLable column, you must pay for this bit with 1 byte. If you have two NULLable columns, you still only pay with 1 byte.

Furthermore:

mysql> SELECT bval IS NOT NULL FROM t1;
+------------------+
| bval IS NOT NULL |
+------------------+
|                1 |
|                1 |
|                0 |
+------------------+

So this plays somewhat nicely into booleanic expressions.

However, this method is unintuitive and confusing. I personally don’t use it.

TINYINT

With integer values, we can get down to 0 and 1. With TINYINT, we only pay with 1 byte of storage. As with CHAR(1), we cannot prevent anyone from INSERTing other values. But that doesn’t really matter, if we’re willing to accept that 0 evaluates as False, and all other values as True. In this case, boolean expressions work very well with your column values.

BOOL/BOOLEAN

These are just synonyms to TINYINT. I like to define my boolean values as such. Alas, when issuing a SHOW CREATE TABLE the definition is just a normal TINYINT. Still, it is clearer to look at if you’re storing your table schema under your version control.

]]>
https://shlomi-noach.github.io/blog/mysql/choosing-mysql-boolean-data-types/feed 14 2181
But I DO want MySQL to say “ERROR”! https://shlomi-noach.github.io/blog/mysql/but-i-do-want-mysql-to-say-error https://shlomi-noach.github.io/blog/mysql/but-i-do-want-mysql-to-say-error#comments Fri, 12 Mar 2010 04:53:28 +0000 https://shlomi-noach.github.io/blog/?p=2005 MySQL is known for its willingness to accept invalid queries, data values. It can silently commit your transaction, truncate your data.

  • Using GROUP_CONCAT with a small group_concat_max_len setting? Your result will be silently truncated (make sure to check the warnings though).
  • Calling CREATE TEMPORARY TABLE? You get silent commit.
  • Issuing a ROLLBACK on non-transactional involved engines? Have a warning; no error.
  • Using LOCK IN SHARE MODE on non transactional tables? Not a problem. Nothing reported.
  • Adding a FOREIGN KEY on a MyISAM table? Good for you; no action actually taken.
  • Inserting 300 to a TINYINT column in a relaxed sql_mode? Give me 255, I’ll silently drop the remaining 45. I owe you.

Warnings and errors

It would be nice to:

  • Have an auto_propagate_warning_to_error server variable (global/session/both) which, well, does what it says.
  • Have an i_am_really_not_a_dummy server variable which implies stricter checks for all the above and prevents you from doing with anything that may be problematic (or rolls back your transactions on your invalid actions).

Connectors may be nice enough to propagate warnings to errors – that’s good. But not enough: since data is already committed in MySQL.

If I understand correctly, and maybe it’s just a myth, it all relates to the times where MySQL had interest in a widespread adoption across the internet, in such way that it does not interfere too much with the users (hence leading to the common myth that “MySQL just works out of the box and does not require me to configure or understand anything”).

MySQL is a database system, and is now widespread, and is used by serious companies and products. It is time to stop play nice to everyone and provide with strict integrity — or, be nice to everyone, just allow me to specify what “nice” means for me.

]]>
https://shlomi-noach.github.io/blog/mysql/but-i-do-want-mysql-to-say-error/feed 18 2005
Useful temporal functions & queries https://shlomi-noach.github.io/blog/mysql/useful-temporal-functions-queries https://shlomi-noach.github.io/blog/mysql/useful-temporal-functions-queries#respond Tue, 08 Dec 2009 09:46:24 +0000 https://shlomi-noach.github.io/blog/?p=1666 Here’s a complication of some common and useful time & date calculations and equations. Some, though very simple, are often misunderstood, leading to inefficient or incorrect implementations.

There are many ways to solve such problems. I’ll present my favorites.

Querying for time difference

Given two timestamps: ts1 (older) and ts2 (newer), how much time has passed between them?

One can use TIMEDIFF() & DATEDIFF(), or compare two UNIX_TIMESTAMP() values. My personal favorite is to use TIMESTAMPDIFF(). Reason being that I’m usually interested in a specific metric, like the number of hours which have passed, or the number of days, disregarding the smaller minute/second resolution. Which allows one to:

SELECT TIMESTAMPDIFF(HOUR, ts1, ts2)

Take, for example:

SELECT TIMESTAMPDIFF(MONTH, '2008-10-07 00:00:00', '2009-12-06 00:00:00')

The function correctly identifies the number of days per month, and provides with 13, being the truncated number of full months.

Doing arithmetics

One can use TIMESTAMPADD(), or DATE_SUB(), but, again, when dealing with specific resolutions, I find “+ INTERVAL” to be the most convenient:

SELECT ts1 + INTERVAL 10 HOUR

This allows me to only add by a specific unit: SECOND, MINUTE, HOUR, DAY, WEEK, etc. Many times I find this is exactly what I want.

SELECT TIMESTAMP('2009-12-06 20:14:52') + INTERVAL 4 WEEK AS ts2;
+---------------------+
| ts2                 |
+---------------------+
| 2010-01-03 20:14:52 |
+---------------------+

Checking if a timestamp is in a given date

This one is very popular, and most poorly treated.

Say we have a sales table, with some ts column. We want to SELECT all sales on Dec 25th, 2008. I’ve seen so many solutions, many in writing. Let’s look at them:

Wrong:

SELECT * FROM sales WHERE ts BETWEEN '2008-12-25' AND '2008-12-26'

Why is this wrong? Because BETWEEN is inclusive. A sale taking place on ‘2008-12-26 00:00:00‘ will match our condition.

Correct but inefficient:

SELECT * FROM sales WHERE DATE(ts) = DATE('2008-12-25')

Why is this inefficient? Because a function is used over the ts column. This disables use of any index we might have on ts, leading to full table scan.

Correct but inefficient:

SELECT * FROM sales WHERE ts LIKE '2008-12-25 %'

Why is this inefficient? Because a function is used over the ts column. Can you see it? It’s an implicit CAST function, which casts the TIMESTAMP value to a character value, so as to perform a string comparison.

Correct but ugh:

SELECT * FROM sales WHERE ts BETWEEN '2008-12-25 00:00:00' AND '2008-12-25 23:59:59'

Why is it ugh? Because, well, …Ugh!

Correct:

SELECT * FROM sales WHERE ts >= DATE('2008-12-25') AND ts < DATE('2008-12-26')

This allows for indexing to be used properly. The DATE() casting is not strictly required here, but is generally safer.

Truncating to last midnight

Surprisingly, this simple question sees a lot of incorrect solution attempts. The quickest, safest way to get “last midnight” is:

SELECT DATE(ts)

or, if you like to be stricter:

SELECT TIMESTAMP(DATE(ts))

For example:

SELECT TIMESTAMP(DATE('2009-12-06 20:14:52')) AS midnight;
+---------------------+
| midnight            |
+---------------------+
| 2009-12-06 00:00:00 |
+---------------------+

Truncating to last round hour

Similar to the above, but utilizes arithmetic:

SELECT DATE(ts) + INTERVAL HOUR(ts) HOUR

For example:

SELECT ts, DATE(ts) + INTERVAL HOUR(ts) HOUR FROM sales LIMIT 5;
+---------------------+-----------------------------------+
| ts                  | DATE(ts) + INTERVAL HOUR(ts) HOUR |
+---------------------+-----------------------------------+
| 2009-01-05 05:17:00 | 2009-01-05 05:00:00               |
| 2009-03-09 00:49:00 | 2009-03-09 00:00:00               |
| 2009-02-20 00:14:00 | 2009-02-20 00:00:00               |
| 2009-02-14 22:42:00 | 2009-02-14 22:00:00               |
| 2009-03-14 04:50:00 | 2009-03-14 04:00:00               |
+---------------------+-----------------------------------+

Round to closest round hour

Taking the classic round() implementation, which states:

round(x) := int(x + 0.5)

We write:

SELECT DATE(ts + INTERVAL 30 MINUTE) + INTERVAL HOUR(ts + INTERVAL 30 MINUTE) HOUR

Example:

SELECT ts, DATE(ts + INTERVAL 30 MINUTE) + INTERVAL HOUR(ts + INTERVAL 30 MINUTE) HOUR AS rounded FROM sales ORDER BY HOUR(ts) DESC LIMIT 5;
+---------------------+---------------------+
| ts                  | rounded             |
+---------------------+---------------------+
| 2009-03-25 23:54:00 | 2009-03-26 00:00:00 |
| 2009-03-13 23:45:00 | 2009-03-14 00:00:00 |
| 2009-01-29 22:53:00 | 2009-01-29 23:00:00 |
| 2009-01-18 22:22:00 | 2009-01-18 22:00:00 |
| 2009-01-14 22:16:00 | 2009-01-14 22:00:00 |
+---------------------+---------------------+

Count number of midnights between two timestamps, inclusive

Given two timestamps, ts1 and ts2, what is the number of midnights between them?

SELECT TIMESTAMPDIFF(DAY, DATE(ts1), ts2) + IF(DATE(ts1) = ts1, 1, 0);

Example:

SELECT ts, ts2, TIMESTAMPDIFF(DAY, DATE(ts), ts2) + IF(DATE(ts) = ts, 1, 0) AS number_of_midnights FROM sales LIMIT 10;
+---------------------+---------------------+---------------------+
| ts                  | ts2                 | number_of_midnights |
+---------------------+---------------------+---------------------+
| 2009-01-05 05:17:00 | 2009-01-05 19:17:00 |                   0 |
| 2009-03-09 00:49:00 | 2009-03-11 15:49:00 |                   2 |
| 2009-02-20 00:14:00 | 2009-02-23 02:14:00 |                   3 |
| 2009-02-14 22:42:00 | 2009-02-18 07:42:00 |                   4 |
| 2009-03-14 04:50:00 | 2009-03-17 16:50:00 |                   3 |
| 2009-02-16 04:01:00 | 2009-02-19 08:01:00 |                   3 |
| 2009-01-20 05:36:00 | 2009-01-21 08:36:00 |                   1 |
| 2009-02-07 15:57:00 | 2009-02-07 22:57:00 |                   0 |
| 2009-02-13 14:59:00 | 2009-02-15 22:59:00 |                   2 |
| 2009-01-11 03:02:00 | 2009-01-13 11:02:00 |                   2 |
+---------------------+---------------------+---------------------+

Further notes

A full listing of temporal functions can be found on the MySQL documentation. There’s almost always more than one way to solve a problem. I’ve seen (and done, in the past) many calculations done on the application side due to lack of familiarity with the available functions.

Please share your own common solutions below!

]]>
https://shlomi-noach.github.io/blog/mysql/useful-temporal-functions-queries/feed 0 1666
Character sets: latin1 vs. ascii https://shlomi-noach.github.io/blog/mysql/character-sets-latin1-vs-ascii https://shlomi-noach.github.io/blog/mysql/character-sets-latin1-vs-ascii#comments Wed, 08 Jul 2009 07:39:02 +0000 https://shlomi-noach.github.io/blog/?p=828 Unless specified otherwise, latin1 is the default character set in MySQL.

What I usually find in schemes are columns which are either utf8 or latin1. The utf8 columns being those which need to contain multilingual characters (user names, addresses, articles etc.), and latin1 column being all the rest (passwords, digests, email addresses, hard-coded values etc.)

I find latin1 to be improper for such purposes and suggest that ascii be used instead. The reason being that latin1 implies a European text (with swedish collation). It is unclear for an outsider, when finding a latin1 column, whether it should actually contain West European characters, or is it just being used for ascii text, utilizing the fact that a character in latin1 only requires 1 byte of storage.

Well, this is what the ascii character set is for. When I see an ascii column, I know for sure no West European characters are allowed; just the plain old a-zA-Z0-9 etc. It is clearer from the schema’s definition what the stored values should be.

A note to MySQL

It’s been long since the Swedish roots of the company have dictated defaults. New instances should default to either ascii or  utf8 (the latter being the most common and space efficient unicode protocol): character sets that are locale-neutral. Really, how many people realize that when they ORDER BY a text column, rows are sorted according to Swedish dictionary ordering?

]]>
https://shlomi-noach.github.io/blog/mysql/character-sets-latin1-vs-ascii/feed 8 828