Easy SELECT COUNT(*) with split()

The two conservative ways of getting the number of rows in an InnoDB table are:

  • SELECT COUNT(*) FROM my_table:
    provides with an accurate number, but makes for a long running transaction which take ages on large tables. Long transactions make for locks
  • SELECT TABLE_ROWS FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA=’my_schema’ AND TABLE_NAME=’my_table’, or get same info via SHOW TABLE STATUS.
    Gives immediate response, but the value can be way off; it can be two times as large as real value, or half the value. For query execution plans this may be a “good enough” estimation, but typically you just can’t trust it for your own purposes.

Get a good estimate using chunks

You can get a good estimate by calculating the total number of rows in steps. Walk the table 1,000 rows at a time, and keep a counter. Each chunk is its own transaction, so, if the table is modified while counting, the final value does not make for an accurate account at any point in time. Typically this should be a far better estimate than TABLE_ROWS.

QueryScript’s split() construct provides you with the means to work this out. Consider this script:

set @total := 0;

split(SELECT COUNT(*) FROM world.City INTO @chunk) {
  set @total = @total + @chunk;
}

select @total;

split() breaks the above SELECT COUNT(*) into distinct chunks, like:

SELECT COUNT(*) FROM world.City WHERE ((((`City`.`ID` > '3000'))) AND (((`City`.`ID` < '4000')) OR ((`City`.`ID` = '4000')))) INTO @chunk

You can make this a one liner like this:

call common_schema.run(“set @total := 0;split(SELECT COUNT(*) FROM world.City INTO @chunk) set @total = @total + @chunk; select @total;”);

If you like to watch the progress, add some verbose:

call common_schema.run("set @total := 0;split(SELECT COUNT(*) FROM world.City INTO @chunk) {set @total = @total + @chunk; select $split_step, @total} select @total;");

QueryScript is available via common_schema.

4
Leave a Reply

avatar
4 Comment threads
0 Thread replies
0 Followers
 
Most reacted comment
Hottest comment thread
2 Comment authors
shlomiJannes Recent comment authors

This site uses Akismet to reduce spam. Learn how your comment data is processed.

  Subscribe  
Notify of
Jannes
Guest
Jannes

What’s the reason for WHERE ((((`City`.`ID` > ‘3000’))) AND (((`City`.`ID` < '4000')) OR ((`City`.`ID` = '4000')))) ?

Why not City.ID BETWEEN 3000 AND 3999 ? (and then start the next chunk at 4000)

Honest question

Jannes
Guest
Jannes

Thanks, I knew there would be a good reason.

1 I knew, but didn’t think of 2. Still find it amazing that they haven’t fixed 3 after all these years.

And now that I see how you actually find a minimum and maximum value such that the chunk size is 1000, I understand your remark about sparse too.