In some of my previous posts I’ve used a numbers table, like one holding values 1, 2, 3, …, 255. Such table can be used for string walking, joining with other tables, performing iterations.
The existence of number tables has always been a little pain. Yes, they’re very, very simple, but they need to be there. So if you just need to script some SQL query, you may find that you need to create such tables. Ummm… this means you need to have privileges (at least CREATE TEMPORARY and INSERT, if not CREATE).
The other day, Baron Schwartz posted How to round to the nearest whole multiple or fraction in SQL. In an offhand way, he generated some random numbers using the mysql.help_topic table. I then realized that post solved something I’ve been looking for: using a sure-to-exist table on any MySQL installation.
What does the table consist of? It consists, among other columns, an incrementing help_topic_id column:
SELECT help_topic_id FROM mysql.help_topic LIMIT 10; +---------------+ | help_topic_id | +---------------+ | 0 | | 1 | | 2 | | 3 | | 4 | | 5 | | 6 | | 7 | | 8 | | 9 | +---------------+
Still feels unsafe?
The above result provides with sequential integers. But can we guarantee this? Will the numbers never have skipped values? We don’t have to rely on these values. We can force them to our liking:
SELECT @counter := @counter+1 AS value FROM mysql.help_topic, (SELECT @counter := 0) AS sel1 LIMIT 10; +-------+ | value | +-------+ | 1 | | 2 | | 3 | | 4 | | 5 | | 6 | | 7 | | 8 | | 9 | | 10 | +-------+
All we actually need is the existence of rows within this table. We don’t care which columns, what their names are, and of which data types they are. Said table currently has 484 rows. One can use CROSS JOIN to achieve more than that:
SELECT @counter := @counter+1 AS value FROM mysql.help_topic t1, mysql.help_topic t2, (SELECT @counter := 0) AS sel1 LIMIT 20000; +-------+ | value | +-------+ | 1 | | 2 | | 3 | | 4 | | 5 | ... | 19992 | | 19993 | | 19994 | | 19995 | | 19996 | | 19997 | | 19998 | | 19999 | | 20000 | +-------+
Number generation
We are now in full control of generated numbers. We don’t have to generate sequential numbers. We can generate odd numbers only; multiples of 10, of PI… Following I’ll be generating the Fibonacci series:
SELECT @c3 := @c1 + @c2 AS value, @c1 := @c2, @c2 := @c3 FROM mysql.help_topic, (SELECT @c1 := 1, @c2 := 0) sel1 LIMIT 15; +-------+------------+------------+ | value | @c1 := @c2 | @c2 := @c3 | +-------+------------+------------+ | 1 | 0 | 1 | | 1 | 1 | 1 | | 2 | 1 | 2 | | 3 | 2 | 3 | | 5 | 3 | 5 | | 8 | 5 | 8 | | 13 | 8 | 13 | | 21 | 13 | 21 | | 34 | 21 | 34 | | 55 | 34 | 55 | | 89 | 55 | 89 | | 144 | 89 | 144 | | 233 | 144 | 233 | | 377 | 233 | 377 | | 610 | 377 | 610 | +-------+------------+------------+
Conclusion
Using 5.0 and above, you can also use the various INFORMATION_SCHEMA tables (e.g. INFORMATION_SCHEMA.COLLATIONS). Some of these may be slow to load, though.
When you can (and need), have a prepared numbers table. When unable to create one, you can generate such numbers using tables which are certain to exist (at least until the next major version).
On the same vein, you may like these two posts:
http://datacharmer.blogspot.com/2007/12/data-from-nothing-solution-to-pop-quiz.html
http://datacharmer.blogspot.com/2007/12/pop-quiz-with-prize-generate-4-billion.html
Giuseppe
Interesting stuff about the help_topic table, thanks. I’d just worked out how to calculate the Fibonacci sequence on my blog (using one of my defined tables to join against) as part of my investigation into solving problems on the Euler project. http://www.oxfordtechnotes.co.uk/sqlblog/blog4.php/2009/08/29/project-euler-q2-with-mysql I may well use the help_topic table in future.
I mostly use UNIONs for this purpose:
SELECT a.x*100+b.x*10+c.x
FROM (
SELECT 0 x UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
) a, (
SELECT 0 x UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
) b, (
SELECT 0 x UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
) c
I don’t know what’s better. You don’t need more priviledges for this than SELECT and it works with 4.1, but of course that puts more load onto the parser and you have much more UNIONs. I’m hoping that generating the bulk of the data with JOINs uses optimized code in the MySQL server like the JOIN buffer and temporary tables and that the overhead doesn’t matter any more.
@Giuseppe,
Thanks for the references; Very interesting!
@Mark,
What a coincidence!
@strcmp,
My personal view is that shorter is usually better. Your solution has the property of choosing exactly 1000 thousand values, though.
as a former assembler and C programmer and number cruncher iterative code like “@counter := @counter + 1” looks suboptimal to me, because it is a data dependency between the iterations, forcing the code to be executed serially. of course that’s just me and a totally useless comment right now, but even MySQL may once leave the stone ages and execute JOINs and table/index scans in parallel… bulk operations ‘feel’ better.