What’s all this about?
I’ve you’ve been to my talk in London, then you’ve already got this as homework. If not, allow me to fill in the details:
I was speaking about MySQL’s programmatic nature in many aspects (best if you read the slides!). We discussed user defined variables, derived tables, ordering constructs, order of evaluation, time suspension and time calculation.
An issue I presented was that of a very long running query. Say it runs for 20 minutes. It kills your I/O. Do you KILL the query or wait? It could terminate in 5 seconds from now, and if you kill it now, you lose everything. But it may yet run for 3 more hours!
We discussed a futuristic feature where the query would gracefully terminate after some designated time. However, futuristic wishes do not help us.
A self throttling query
I suggested the idea for a self throttling query. We know how to throttle writing queries, such as DELETE queries: we break them into small chunks, then work each chunk at a time, setting ourselves to sleep in between chunks. This is how –sleep and –sleep-ratio work in oak-chunk-update. It is how –sleep and –sleep-coef work in pt-archiver.
But can the same be done for SELECT queries?
The answer is: yes. And it can be done from within the query itself. And it’s your job to do it.
So here are the rules:
- For simplicity, assume the SELECT executes over a single large table.
- We assume no GROUP BY, DISTINCT and the like.
- There can be WHERE, ORDER BY etc.
As an example:
SELECT rental_id, TIMESTAMPDIFF(DAY, rental_date, return_date) AS rental_days FROM sakila.rental
Here is the problem:
- The query takes too much disk I/O
Suggestion: make the query self throttling. We’re going to make it run for a longer time, allowing for non-I/O periods within the query. We assume we are not bothered with locks.
Task #1
- Write a self throttling query: rewrite the query above such that for every 1,000 rows, it goes to sleep for 2 seconds.
Task #2
- Improve upon the previous solution: rewrite the query such that for every 1,000 rows, it goes to sleep for the same amount of time it took for those 1,000 rows to execute (effectively doubling the execution time of the query).
Advanced
- There are issues with security, but in principal it is possible to have the query go to sleep when actual disk I/O exceeds some value. Code it!
Prove it
devise a method to prove that sleeps do occur as required.
How to submit a solution?
- Simplest is to add as comment on this post. Use <pre>…</pre> for formatted code.
- You may also email me directly.
I will pick up the best solution according to correctness, elegance and time of submission.
The winners get glory: their names to be mentioned on this blog. Shall we save the discussion on the exact font face and size for later on?
UPDATE: it should have been clear from my talk: no stored routines involved!
UPDATE: solution is on: Self throttling MySQL queries
Sounds fun, so I take the bait 😉
This proposed solution assumes 5.6, with the performance schema.
create function tools.throttle(every_n integer, delay integer)
returns varchar(20)
begin
declare sent integer;
declare dummy integer;
if (every_n 0) and (sent % every_n = 0)) then
set dummy = (select sleep(delay));
end if;
return sysdate();
end
Execute this once in the current session:
set @my_tid= (select THREAD_ID from performance_schema.threads
where PROCESSLIST_ID = connection_id());
Then, rewrite queries like this:
select * from big_table;
select from big_table;
to the throttled version:
select big_table.*, tools.throttle(1000, 2) from big_table;
select , tools.throttle(1000, 2) from big_table;
To prove it works, the short answer is to look at the timestamps returned by tools.throttle.
The long version is to look at the performance schema waits in events_waits_history_long and see the sleeps there.
Regards,
— Marc
Correction (due to brackets tags lost in post)
select anything-you-want from big_table
becomes
select anything-you-want, tools.throttle(1000, 2) from big_table.
Proposed entry, which assumes mysql 5.6 and the performance schema:
— helper
create function tools.throttle(every_n integer, delay integer)
returns varchar(20)
begin
declare sent integer;
declare dummy integer;
if (every_n <= 0) then
signal sqlstate ‘HUMMM’ set message_text = “Very funny. Try harder.”,
mysql_errno = 50000;
end if;
set sent =
(select ROWS_SENT from performance_schema.events_statements_current
where THREAD_ID = @my_tid);
if ((sent > 0) and (sent % every_n = 0)) then
set dummy = (select sleep(delay));
end if;
return sysdate();
end
$$
— once in the session, do this:
set @my_tid= (select THREAD_ID from performance_schema.threads
where PROCESSLIST_ID = connection_id());
— before
select * from big_table;
select anything_you_want from big_table;
— after
select big_table.*, tools.throttle(1000, 2) from big_table;
select anything_you_want, tools.throttle(1000, 2) from big_table;
@Marc,
very cool. Actually, I forgot to add an important thing: my talk suggested no stored routines are required … 😉
I guess I can say that I figured it out in a conversation with Shlomi after recalling something funny about most SHOW commands… and maybe that is a hint. But it’s better than posting the solution and depriving others of the fun 🙂