Useful temporal functions & queries

December 8, 2009

Here's a complication of some common and useful time & date calculations and equations. Some, though very simple, are often misunderstood, leading to inefficient or incorrect implementations.

There are many ways to solve such problems. I'll present my favorites.

Querying for time difference

Given two timestamps: ts1 (older) and ts2 (newer), how much time has passed between them?

One can use TIMEDIFF() & DATEDIFF(), or compare two UNIX_TIMESTAMP() values. My personal favorite is to use TIMESTAMPDIFF(). Reason being that I'm usually interested in a specific metric, like the number of hours which have passed, or the number of days, disregarding the smaller minute/second resolution. Which allows one to:

SELECT TIMESTAMPDIFF(HOUR, ts1, ts2)

Take, for example:

SELECT TIMESTAMPDIFF(MONTH, '2008-10-07 00:00:00', '2009-12-06 00:00:00')

The function correctly identifies the number of days per month, and provides with 13, being the truncated number of full months.

Doing arithmetics

One can use TIMESTAMPADD(), or DATE_SUB(), but, again, when dealing with specific resolutions, I find "+ INTERVAL" to be the most convenient:

SELECT ts1 + INTERVAL 10 HOUR

This allows me to only add by a specific unit: SECOND, MINUTE, HOUR, DAY, WEEK, etc. Many times I find this is exactly what I want.

SELECT TIMESTAMP('2009-12-06 20:14:52') + INTERVAL 4 WEEK AS ts2;
+---------------------+
| ts2                 |
+---------------------+
| 2010-01-03 20:14:52 |
+---------------------+

Checking if a timestamp is in a given date

This one is very popular, and most poorly treated.

Say we have a sales table, with some ts column. We want to SELECT all sales on Dec 25th, 2008. I've seen so many solutions, many in writing. Let's look at them:

Wrong:

SELECT * FROM sales WHERE ts BETWEEN '2008-12-25' AND '2008-12-26'

Why is this wrong? Because BETWEEN is inclusive. A sale taking place on '2008-12-26 00:00:00' will match our condition.

Correct but inefficient:

SELECT * FROM sales WHERE DATE(ts) = DATE('2008-12-25')

Why is this inefficient? Because a function is used over the ts column. This disables use of any index we might have on ts, leading to full table scan.

Correct but inefficient:

SELECT * FROM sales WHERE ts LIKE '2008-12-25 %'

Why is this inefficient? Because a function is used over the ts column. Can you see it? It's an implicit CAST function, which casts the TIMESTAMP value to a character value, so as to perform a string comparison.

Correct but ugh:

SELECT * FROM sales WHERE ts BETWEEN '2008-12-25 00:00:00' AND '2008-12-25 23:59:59'

Why is it ugh? Because, well, ...Ugh!

Correct:

SELECT * FROM sales WHERE ts >= DATE('2008-12-25') AND ts < DATE('2008-12-26')

This allows for indexing to be used properly. The DATE() casting is not strictly required here, but is generally safer.

Truncating to last midnight

Surprisingly, this simple question sees a lot of incorrect solution attempts. The quickest, safest way to get "last midnight" is:

SELECT DATE(ts)

or, if you like to be stricter:

SELECT TIMESTAMP(DATE(ts))

For example:

SELECT TIMESTAMP(DATE('2009-12-06 20:14:52')) AS midnight;
+---------------------+
| midnight            |
+---------------------+
| 2009-12-06 00:00:00 |
+---------------------+

Truncating to last round hour

Similar to the above, but utilizes arithmetic:

SELECT DATE(ts) + INTERVAL HOUR(ts) HOUR

For example:

SELECT ts, DATE(ts) + INTERVAL HOUR(ts) HOUR FROM sales LIMIT 5;
+---------------------+-----------------------------------+
| ts                  | DATE(ts) + INTERVAL HOUR(ts) HOUR |
+---------------------+-----------------------------------+
| 2009-01-05 05:17:00 | 2009-01-05 05:00:00               |
| 2009-03-09 00:49:00 | 2009-03-09 00:00:00               |
| 2009-02-20 00:14:00 | 2009-02-20 00:00:00               |
| 2009-02-14 22:42:00 | 2009-02-14 22:00:00               |
| 2009-03-14 04:50:00 | 2009-03-14 04:00:00               |
+---------------------+-----------------------------------+

Round to closest round hour

Taking the classic round() implementation, which states:

round(x) := int(x + 0.5)

We write:

SELECT DATE(ts + INTERVAL 30 MINUTE) + INTERVAL HOUR(ts + INTERVAL 30 MINUTE) HOUR

Example:

SELECT ts, DATE(ts + INTERVAL 30 MINUTE) + INTERVAL HOUR(ts + INTERVAL 30 MINUTE) HOUR AS rounded FROM sales ORDER BY HOUR(ts) DESC LIMIT 5;
+---------------------+---------------------+
| ts                  | rounded             |
+---------------------+---------------------+
| 2009-03-25 23:54:00 | 2009-03-26 00:00:00 |
| 2009-03-13 23:45:00 | 2009-03-14 00:00:00 |
| 2009-01-29 22:53:00 | 2009-01-29 23:00:00 |
| 2009-01-18 22:22:00 | 2009-01-18 22:00:00 |
| 2009-01-14 22:16:00 | 2009-01-14 22:00:00 |
+---------------------+---------------------+

Count number of midnights between two timestamps, inclusive

Given two timestamps, ts1 and ts2, what is the number of midnights between them?

SELECT TIMESTAMPDIFF(DAY, DATE(ts1), ts2) + IF(DATE(ts1) = ts1, 1, 0);

Example:

SELECT ts, ts2, TIMESTAMPDIFF(DAY, DATE(ts), ts2) + IF(DATE(ts) = ts, 1, 0) AS number_of_midnights FROM sales LIMIT 10;
+---------------------+---------------------+---------------------+
| ts                  | ts2                 | number_of_midnights |
+---------------------+---------------------+---------------------+
| 2009-01-05 05:17:00 | 2009-01-05 19:17:00 |                   0 |
| 2009-03-09 00:49:00 | 2009-03-11 15:49:00 |                   2 |
| 2009-02-20 00:14:00 | 2009-02-23 02:14:00 |                   3 |
| 2009-02-14 22:42:00 | 2009-02-18 07:42:00 |                   4 |
| 2009-03-14 04:50:00 | 2009-03-17 16:50:00 |                   3 |
| 2009-02-16 04:01:00 | 2009-02-19 08:01:00 |                   3 |
| 2009-01-20 05:36:00 | 2009-01-21 08:36:00 |                   1 |
| 2009-02-07 15:57:00 | 2009-02-07 22:57:00 |                   0 |
| 2009-02-13 14:59:00 | 2009-02-15 22:59:00 |                   2 |
| 2009-01-11 03:02:00 | 2009-01-13 11:02:00 |                   2 |
+---------------------+---------------------+---------------------+

Further notes

A full listing of temporal functions can be found on the MySQL documentation. There's almost always more than one way to solve a problem. I've seen (and done, in the past) many calculations done on the application side due to lack of familiarity with the available functions.

Please share your own common solutions below!

tags: , ,
posted in MySQL by shlomi

« | »

Follow comments via the RSS Feed | Leave a comment | Trackback URL

Leave Your Comment

 
Powered by Wordpress and MySQL. Theme by openark.org