Contest for Glory: write a self throttling MySQL query

What’s all this about?

I’ve you’ve been to my talk in London, then you’ve already got this as homework. If not, allow me to fill in the details:

I was speaking about MySQL’s programmatic nature in many aspects (best if you read the slides!). We discussed user defined variables, derived tables, ordering constructs, order of evaluation, time suspension and time calculation.

An issue I presented was that of a very long running query. Say it runs for 20 minutes. It kills your I/O. Do you KILL the query or wait? It could terminate in 5 seconds from now, and if you kill it now, you lose everything. But it may yet run for 3 more hours!

We discussed a futuristic feature where the query would gracefully terminate after some designated time. However, futuristic wishes do not help us.

A self throttling query

I suggested the idea for a self throttling query. We know how to throttle writing queries, such as DELETE queries: we break them into small chunks, then work each chunk at a time, setting ourselves to sleep in between chunks. This is how –sleep and –sleep-ratio work in oak-chunk-update. It is how –sleep and –sleep-coef work in pt-archiver.

But can the same be done for SELECT queries?

The answer is: yes. And it can be done from within the query itself. And it’s your job to do it.

So here are the rules:

  • For simplicity, assume the SELECT executes over a single large table.
  • We assume no GROUP BY, DISTINCT and the like.
  • There can be WHERE, ORDER BY etc.

As an example:

SELECT rental_id, TIMESTAMPDIFF(DAY, rental_date, return_date) AS rental_days FROM sakila.rental

Here is the problem:

  • The query takes too much disk I/O

Suggestion: make the query self throttling. We’re going to make it run for a longer time, allowing for non-I/O periods within the query. We assume we are not bothered with locks.

Task #1

  • Write a self throttling query: rewrite the query above such that for every 1,000 rows, it goes to sleep for 2 seconds.

Task #2

  • Improve upon the previous solution: rewrite the query such that for every 1,000 rows, it goes to sleep for the same amount of time it took for those 1,000 rows to execute (effectively doubling the execution time of the query).

Advanced

  • There are issues with security, but in principal it is possible to have the query go to sleep when actual disk I/O exceeds some value. Code it!

Prove it

devise a method to prove that sleeps do occur as required.

How to submit a solution?

  • Simplest is to add as comment on this post. Use <pre>…</pre> for formatted code.
  • You may also email me directly.

I will pick up the best solution according to correctness, elegance and time of submission.

The winners get glory: their names to be mentioned on this blog. Shall we save the discussion on the exact font face and size for later on?

UPDATE: it should have been clear from my talk: no stored routines involved!

UPDATE: solution is on: Self throttling MySQL queries

9 thoughts on “Contest for Glory: write a self throttling MySQL query

  1. Sounds fun, so I take the bait 😉

    This proposed solution assumes 5.6, with the performance schema.

    create function tools.throttle(every_n integer, delay integer)
    returns varchar(20)
    begin
    declare sent integer;
    declare dummy integer;

    if (every_n 0) and (sent % every_n = 0)) then
    set dummy = (select sleep(delay));
    end if;

    return sysdate();
    end

    Execute this once in the current session:

    set @my_tid= (select THREAD_ID from performance_schema.threads
    where PROCESSLIST_ID = connection_id());

    Then, rewrite queries like this:

    select * from big_table;
    select from big_table;

    to the throttled version:

    select big_table.*, tools.throttle(1000, 2) from big_table;
    select , tools.throttle(1000, 2) from big_table;

    To prove it works, the short answer is to look at the timestamps returned by tools.throttle.
    The long version is to look at the performance schema waits in events_waits_history_long and see the sleeps there.

    Regards,
    — Marc

  2. Correction (due to brackets tags lost in post)

    select anything-you-want from big_table
    becomes
    select anything-you-want, tools.throttle(1000, 2) from big_table.

  3. Proposed entry, which assumes mysql 5.6 and the performance schema:

    — helper

    create function tools.throttle(every_n integer, delay integer)
    returns varchar(20)
    begin
    declare sent integer;
    declare dummy integer;

    if (every_n <= 0) then
    signal sqlstate ‘HUMMM’ set message_text = “Very funny. Try harder.”,
    mysql_errno = 50000;
    end if;

    set sent =
    (select ROWS_SENT from performance_schema.events_statements_current
    where THREAD_ID = @my_tid);

    if ((sent > 0) and (sent % every_n = 0)) then
    set dummy = (select sleep(delay));
    end if;

    return sysdate();
    end
    $$

    — once in the session, do this:

    set @my_tid= (select THREAD_ID from performance_schema.threads
    where PROCESSLIST_ID = connection_id());

    — before

    select * from big_table;
    select anything_you_want from big_table;

    — after

    select big_table.*, tools.throttle(1000, 2) from big_table;
    select anything_you_want, tools.throttle(1000, 2) from big_table;

  4. @Marc,

    very cool. Actually, I forgot to add an important thing: my talk suggested no stored routines are required … 😉

  5. I guess I can say that I figured it out in a conversation with Shlomi after recalling something funny about most SHOW commands… and maybe that is a hint. But it’s better than posting the solution and depriving others of the fun 🙂

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.