What's all this about?
I've you've been to my talk in London, then you've already got this as homework. If not, allow me to fill in the details:
I was speaking about MySQL's programmatic nature in many aspects (best if you read the slides!). We discussed user defined variables, derived tables, ordering constructs, order of evaluation, time suspension and time calculation.
An issue I presented was that of a very long running query. Say it runs for 20 minutes. It kills your I/O. Do you KILL the query or wait? It could terminate in 5 seconds from now, and if you kill it now, you lose everything. But it may yet run for 3 more hours!
We discussed a futuristic feature where the query would gracefully terminate after some designated time. However, futuristic wishes do not help us.
A self throttling query
I suggested the idea for a self throttling query. We know how to throttle writing queries, such as DELETE queries: we break them into small chunks, then work each chunk at a time, setting ourselves to sleep in between chunks. This is how --sleep and --sleep-ratio work in oak-chunk-update. It is how --sleep and --sleep-coef work in pt-archiver.
But can the same be done for SELECT queries?
The answer is: yes. And it can be done from within the query itself. And it's your job to do it.
So here are the rules:
- For simplicity, assume the SELECT executes over a single large table.
- We assume no GROUP BY, DISTINCT and the like.
- There can be WHERE, ORDER BY etc.
As an example:
SELECT rental_id, TIMESTAMPDIFF(DAY, rental_date, return_date) AS rental_days FROM sakila.rental
Here is the problem:
- The query takes too much disk I/O
Suggestion: make the query self throttling. We're going to make it run for a longer time, allowing for non-I/O periods within the query. We assume we are not bothered with locks.
Task #1
- Write a self throttling query: rewrite the query above such that for every 1,000 rows, it goes to sleep for 2 seconds.
Task #2
- Improve upon the previous solution: rewrite the query such that for every 1,000 rows, it goes to sleep for the same amount of time it took for those 1,000 rows to execute (effectively doubling the execution time of the query).
Advanced
- There are issues with security, but in principal it is possible to have the query go to sleep when actual disk I/O exceeds some value. Code it!
Prove it
devise a method to prove that sleeps do occur as required.
How to submit a solution?
- Simplest is to add as comment on this post. Use <pre>...</pre> for formatted code.
- You may also email me directly.
I will pick up the best solution according to correctness, elegance and time of submission.
The winners get glory: their names to be mentioned on this blog. Shall we save the discussion on the exact font face and size for later on?
UPDATE: it should have been clear from my talk: no stored routines involved!
UPDATE: solution is on: Self throttling MySQL queries
Sounds fun, so I take the bait
This proposed solution assumes 5.6, with the performance schema.
create function tools.throttle(every_n integer, delay integer)
returns varchar(20)
begin
declare sent integer;
declare dummy integer;
if (every_n 0) and (sent % every_n = 0)) then
set dummy = (select sleep(delay));
end if;
return sysdate();
end
Execute this once in the current session:
set @my_tid= (select THREAD_ID from performance_schema.threads
where PROCESSLIST_ID = connection_id());
Then, rewrite queries like this:
select * from big_table;
select from big_table;
to the throttled version:
select big_table.*, tools.throttle(1000, 2) from big_table;
select , tools.throttle(1000, 2) from big_table;
To prove it works, the short answer is to look at the timestamps returned by tools.throttle.
The long version is to look at the performance schema waits in events_waits_history_long and see the sleeps there.
Regards,
-- Marc
Link | October 28th, 2011 at 12:22 am
Correction (due to brackets tags lost in post)
select anything-you-want from big_table
becomes
select anything-you-want, tools.throttle(1000, 2) from big_table.
Link | October 28th, 2011 at 12:25 am
Proposed entry, which assumes mysql 5.6 and the performance schema:
-- helper
create function tools.throttle(every_n integer, delay integer)
returns varchar(20)
begin
declare sent integer;
declare dummy integer;
if (every_n <= 0) then
signal sqlstate 'HUMMM' set message_text = "Very funny. Try harder.",
mysql_errno = 50000;
end if;
set sent =
(select ROWS_SENT from performance_schema.events_statements_current
where THREAD_ID = @my_tid);
if ((sent > 0) and (sent % every_n = 0)) then
set dummy = (select sleep(delay));
end if;
return sysdate();
end
$$
-- once in the session, do this:
set @my_tid= (select THREAD_ID from performance_schema.threads
where PROCESSLIST_ID = connection_id());
-- before
select * from big_table;
select anything_you_want from big_table;
-- after
select big_table.*, tools.throttle(1000, 2) from big_table;
select anything_you_want, tools.throttle(1000, 2) from big_table;
Link | October 28th, 2011 at 12:55 am
@Marc,
very cool. Actually, I forgot to add an important thing: my talk suggested no stored routines are required ...
Link | October 28th, 2011 at 7:38 am
I guess I can say that I figured it out in a conversation with Shlomi after recalling something funny about most SHOW commands... and maybe that is a hint. But it's better than posting the solution and depriving others of the fun
Link | October 28th, 2011 at 9:44 pm
@Baron,
This contest is so desolate! Apart from Marc no one has submitted anything...
I'll give it a couple more days, then publish a solution. Do feel free to post your ideas.
Link | October 29th, 2011 at 6:38 am
Self throttling MySQL queries | code.openark.org wrote:
[...] Contest for Glory: write a self throttling MySQL query [...]
Link | November 1st, 2011 at 9:58 am
how many days are "a couple of days"?
thanks!
Link | November 24th, 2011 at 6:28 pm
@harald,
The solution has long been published. You should check on my site more often!
Here's the link:
Self throttling MySQL queries
Link | November 24th, 2011 at 8:27 pm