Generating numbers out of seemingly thin air

In some of my previous posts I’ve used a numbers table, like one holding values 1, 2, 3, …, 255. Such a table can be used for string walking, joining with other tables, or performing iterations.

Having to keep a numbers table around has always been a bit of a pain. Yes, such tables are very, very simple, but they need to be there. So if you just need to script some SQL query, you may find that you first need to create one. Ummm… this means you need to have privileges (at least CREATE TEMPORARY TABLES and INSERT, if not CREATE).

The other day, Baron Schwartz posted How to round to the nearest whole multiple or fraction in SQL. In an offhand way, he generated some random numbers using the mysql.help_topic table. I then realized that post solved something I’ve been looking for: using a sure-to-exist table on any MySQL installation.

What does the table consist of? It contains, among other columns, an incrementing help_topic_id column:

SELECT help_topic_id FROM mysql.help_topic LIMIT 10;
+---------------+
| help_topic_id |
+---------------+
|             0 |
|             1 |
|             2 |
|             3 |
|             4 |
|             5 |
|             6 |
|             7 |
|             8 |
|             9 |
+---------------+

Still feels unsafe?

The result above gives us sequential integers. But can we guarantee this? Will the values never skip? We don’t have to rely on these values. We can force them to our liking:

SELECT @counter := @counter+1 AS value
FROM mysql.help_topic, (SELECT @counter := 0) AS sel1
LIMIT 10;
+-------+
| value |
+-------+
|     1 |
|     2 |
|     3 |
|     4 |
|     5 |
|     6 |
|     7 |
|     8 |
|     9 |
|    10 |
+-------+
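
As a rough sketch of how such generated values can stand in for a numbers table, the query below walks a string one character per row. The alias numbers and the literal 'MySQL' are illustrative assumptions, not part of the original queries. The sketch also assumes that mysql.help_topic holds at least as many rows as the string has characters, and that the server evaluates the user variable once per row, as in the query above.

```sql
-- String walking: one row per character of the literal 'MySQL'.
-- "numbers" is a hypothetical alias for the generated sequence.
SELECT numbers.value AS position,
       SUBSTRING('MySQL', numbers.value, 1) AS ch
FROM (
  SELECT @counter := @counter+1 AS value
  FROM mysql.help_topic, (SELECT @counter := 0) AS sel1
  LIMIT 255
) AS numbers
WHERE numbers.value <= CHAR_LENGTH('MySQL')
ORDER BY numbers.value;
```

The inner LIMIT 255 only caps the size of the generated sequence. The WHERE clause then trims it to the length of the string.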

All we actually need is the existence of rows within this table. We don’t care which columns it has, what they are named, or what their data types are. Said table currently has 484 rows. One can use CROSS JOIN to achieve more than that:

SELECT @counter := @counter+1 AS value
FROM mysql.help_topic t1, mysql.help_topic t2, (SELECT @counter := 0) AS sel1
LIMIT 20000;
+-------+
| value |
+-------+
|     1 |
|     2 |
|     3 |
|     4 |
|     5 |
...
| 19992 |
| 19993 |
| 19994 |
| 19995 |
| 19996 |
| 19997 |
| 19998 |
| 19999 |
| 20000 |
+-------+
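
The row count of mysql.help_topic differs between versions and installations, so it may be worth checking how many values a single scan or a two-way cross join can supply before relying on a large LIMIT. A minimal check, with hypothetical column aliases:

```sql
-- Rows available from a single scan of the table.
SELECT COUNT(*) AS available_rows
FROM mysql.help_topic;

-- Upper bound for the two-table cross join used above.
SELECT COUNT(*) * COUNT(*) AS max_cross_join_rows
FROM mysql.help_topic;
```

With the 484 rows mentioned above, the cross join tops out at 484 × 484 = 234,256 rows. A LIMIT larger than that returns fewer rows than requested, without raising an error.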

Number generation

We are now in full control of the generated numbers. We don’t have to generate sequential numbers. We can generate odd numbers only; multiples of 10, of PI… Below I generate the Fibonacci series:

SELECT @c3 := @c1 + @c2 AS value, @c1 := @c2, @c2 := @c3
FROM mysql.help_topic, (SELECT @c1 := 1, @c2 := 0) sel1
LIMIT 15;
+-------+------------+------------+
| value | @c1 := @c2 | @c2 := @c3 |
+-------+------------+------------+
|     1 |          0 |          1 |
|     1 |          1 |          1 |
|     2 |          1 |          2 |
|     3 |          2 |          3 |
|     5 |          3 |          5 |
|     8 |          5 |          8 |
|    13 |          8 |         13 |
|    21 |         13 |         21 |
|    34 |         21 |         34 |
|    55 |         34 |         55 |
|    89 |         55 |         89 |
|   144 |         89 |        144 |
|   233 |        144 |        233 |
|   377 |        233 |        377 |
|   610 |        377 |        610 |
+-------+------------+------------+
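
The odd numbers and multiples of 10 mentioned above can be sketched the same way. The variable names @odd and @m are arbitrary choices for illustration. Like the Fibonacci query, these rely on the server evaluating the assignment once per row. Newer MySQL versions deprecate assigning user variables inside SELECT, so treat this as a sketch for servers where that behaviour holds.

```sql
-- Odd numbers: start at -1 so the first increment yields 1.
SELECT @odd := @odd + 2 AS value
FROM mysql.help_topic, (SELECT @odd := -1) AS sel1
LIMIT 5;

-- Multiples of 10: start at 0 and step by 10.
SELECT @m := @m + 10 AS value
FROM mysql.help_topic, (SELECT @m := 0) AS sel1
LIMIT 5;
```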

Conclusion

With MySQL 5.0 and above, you can also use the various INFORMATION_SCHEMA tables (e.g. INFORMATION_SCHEMA.COLLATIONS). Some of these may be slow to load, though.
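
For instance, the counter query should work unchanged against INFORMATION_SCHEMA.COLLATIONS. This is only a sketch: the number of collations, and therefore the number of values a single scan can provide, depends on the server version and build.

```sql
-- Same counter technique, reading rows from an INFORMATION_SCHEMA table.
SELECT @counter := @counter+1 AS value
FROM INFORMATION_SCHEMA.COLLATIONS, (SELECT @counter := 0) AS sel1
LIMIT 10;
```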

When you can (and need to), keep a prepared numbers table. When you are unable to create one, you can generate such numbers using tables which are certain to exist (at least until the next major version).
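
If you do have the privileges, the same trick can populate a prepared numbers table in one go. The table name numbers and its column n are hypothetical, and the sketch assumes mysql.help_topic holds at least 255 rows:

```sql
-- A small, permanent numbers table holding 1..255.
CREATE TABLE numbers (
  n SMALLINT UNSIGNED NOT NULL,
  PRIMARY KEY (n)
);

-- Fill it from the generated sequence.
INSERT INTO numbers (n)
SELECT @counter := @counter+1
FROM mysql.help_topic, (SELECT @counter := 0) AS sel1
LIMIT 255;
```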

9 thoughts on “Generating numbers out of seemingly thin air”

Giuseppe Maxia says:

September 1, 2009 at 9:54 am

In the same vein, you may like these two posts:

http://datacharmer.blogspot.com/2007/12/data-from-nothing-solution-to-pop-quiz.html
http://datacharmer.blogspot.com/2007/12/pop-quiz-with-prize-generate-4-billion.html

Giuseppe

Mark says:

September 1, 2009 at 10:03 am

Interesting stuff about the help_topic table, thanks. I’d just worked out how to calculate the Fibonacci sequence on my blog (using one of my defined tables to join against) as part of my investigation into solving problems on the Euler project. http://www.oxfordtechnotes.co.uk/sqlblog/blog4.php/2009/08/29/project-euler-q2-with-mysql I may well use the help_topic table in future.

strcmp says:

September 1, 2009 at 11:14 am

I mostly use UNIONs for this purpose:

SELECT a.x*100+b.x*10+c.x
FROM (
SELECT 0 x UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
) a, (
SELECT 0 x UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
) b, (
SELECT 0 x UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
) c

I don’t know what’s better. You don’t need more privileges for this than SELECT and it works with 4.1, but of course that puts more load onto the parser and you have many more UNIONs. I’m hoping that generating the bulk of the data with JOINs uses optimized code in the MySQL server, like the JOIN buffer and temporary tables, and that the overhead doesn’t matter any more.

shlomi says:

September 1, 2009 at 1:15 pm

@Giuseppe,
Thanks for the references; Very interesting!

@Mark,
What a coincidence!

@strcmp,
My personal view is that shorter is usually better. Your solution has the property of choosing exactly 1000 values, though.

strcmp says:

September 1, 2009 at 2:52 pm

As a former assembler and C programmer and number cruncher, iterative code like “@counter := @counter + 1” looks suboptimal to me, because it creates a data dependency between the iterations, forcing the code to be executed serially. Of course that’s just me and a totally useless comment right now, but even MySQL may one day leave the stone age and execute JOINs and table/index scans in parallel… bulk operations ‘feel’ better.
