Contest for Glory: write a self throttling MySQL query

What’s all this about?

I’ve you’ve been to my talk in London, then you’ve already got this as homework. If not, allow me to fill in the details:

I was speaking about MySQL’s programmatic nature in many aspects (best if you read the slides!). We discussed user defined variables, derived tables, ordering constructs, order of evaluation, time suspension and time calculation.

An issue I presented was that of a very long running query. Say it runs for 20 minutes. It kills your I/O. Do you KILL the query or wait? It could terminate in 5 seconds from now, and if you kill it now, you lose everything. But it may yet run for 3 more hours!

We discussed a futuristic feature where the query would gracefully terminate after some designated time. However, futuristic wishes do not help us.

A self throttling query

I suggested the idea for a self throttling query. We know how to throttle writing queries, such as DELETE queries: we break them into small chunks, then work each chunk at a time, setting ourselves to sleep in between chunks. This is how –sleep and –sleep-ratio work in oak-chunk-update. It is how –sleep and –sleep-coef work in pt-archiver.

But can the same be done for SELECT queries?

The answer is: yes. And it can be done from within the query itself. And it’s your job to do it.

So here are the rules:

For simplicity, assume the SELECT executes over a single large table.
We assume no GROUP BY, DISTINCT and the like.
There can be WHERE, ORDER BY etc.

As an example:

SELECT rental_id, TIMESTAMPDIFF(DAY, rental_date, return_date) AS rental_days FROM sakila.rental

Here is the problem:

The query takes too much disk I/O

Suggestion: make the query self throttling. We’re going to make it run for a longer time, allowing for non-I/O periods within the query. We assume we are not bothered with locks.

Task #1

Write a self throttling query: rewrite the query above such that for every 1,000 rows, it goes to sleep for 2 seconds.

Task #2

Improve upon the previous solution: rewrite the query such that for every 1,000 rows, it goes to sleep for the same amount of time it took for those 1,000 rows to execute (effectively doubling the execution time of the query).

Advanced

There are issues with security, but in principal it is possible to have the query go to sleep when actual disk I/O exceeds some value. Code it!

Prove it

devise a method to prove that sleeps do occur as required.

How to submit a solution?

Simplest is to add as comment on this post. Use <pre>…</pre> for formatted code.
You may also email me directly.

I will pick up the best solution according to correctness, elegance and time of submission.

The winners get glory: their names to be mentioned on this blog. Shall we save the discussion on the exact font face and size for later on?

UPDATE: it should have been clear from my talk: no stored routines involved!

UPDATE: solution is on: Self throttling MySQL queries

9 thoughts on “Contest for Glory: write a self throttling MySQL query”

shlomi says:

October 29, 2011 at 6:38 am

@Baron,
This contest is so desolate! Apart from Marc no one has submitted anything…
I’ll give it a couple more days, then publish a solution. Do feel free to post your ideas.
Pingback: Self throttling MySQL queries | code.openark.org
harald says:

November 24, 2011 at 6:28 pm

how many days are “a couple of days”? 😉

thanks!
shlomi says:

November 24, 2011 at 8:27 pm

@harald,

The solution has long been published. You should check on my site more often!
Here’s the link:
Self throttling MySQL queries

This site uses Akismet to reduce spam. Learn how your comment data is processed.