Generating numbers out of seemingly thin air

September 1, 2009

In some of my previous posts I've used a numbers table, like one holding values 1, 2, 3, ..., 255. Such table can be used for string walking, joining with other tables, performing iterations.

The existence of number tables has always been a little pain. Yes, they're very, very simple, but they need to be there. So if you just need to script some SQL query, you may find that you need to create such tables. Ummm... this means you need to have privileges (at least CREATE TEMPORARY and INSERT, if not CREATE).

The other day, Baron Schwartz posted How to round to the nearest whole multiple or fraction in SQL. In an offhand way, he generated some random numbers using the mysql.help_topic table. I then realized that post solved something I've been looking for: using a sure-to-exist table on any MySQL installation.

What does the table consist of? It consists, among other columns, an incrementing help_topic_id column:

SELECT help_topic_id FROM mysql.help_topic LIMIT 10;
+---------------+
| help_topic_id |
+---------------+
|             0 |
|             1 |
|             2 |
|             3 |
|             4 |
|             5 |
|             6 |
|             7 |
|             8 |
|             9 |
+---------------+

Still feels unsafe?

The above result provides with sequential integers. But can we guarantee this? Will the numbers never have skipped values? We don't have to rely on these values. We can force them to our liking:

SELECT @counter := @counter+1 AS value
FROM mysql.help_topic, (SELECT @counter := 0) AS sel1
LIMIT 10;
+-------+
| value |
+-------+
|     1 |
|     2 |
|     3 |
|     4 |
|     5 |
|     6 |
|     7 |
|     8 |
|     9 |
|    10 |
+-------+

All we actually need is the existence of rows within this table. We don't care which columns, what their names are, and of which data types they are. Said table currently has 484 rows. One can use CROSS JOIN to achieve more than that:

SELECT @counter := @counter+1 AS value
FROM mysql.help_topic t1, mysql.help_topic t2, (SELECT @counter := 0) AS sel1
LIMIT 20000;
+-------+
| value |
+-------+
|     1 |
|     2 |
|     3 |
|     4 |
|     5 |
...
| 19992 |
| 19993 |
| 19994 |
| 19995 |
| 19996 |
| 19997 |
| 19998 |
| 19999 |
| 20000 |
+-------+

Number generation

We are now in full control of generated numbers. We don't have to generate sequential numbers. We can generate odd numbers only; multiples of 10, of PI... Following I'll be generating the Fibonacci series:

SELECT @c3 := @c1 + @c2 AS value, @c1 := @c2, @c2 := @c3
FROM mysql.help_topic, (SELECT @c1 := 1, @c2 := 0) sel1
LIMIT 15;
+-------+------------+------------+
| value | @c1 := @c2 | @c2 := @c3 |
+-------+------------+------------+
|     1 |          0 |          1 |
|     1 |          1 |          1 |
|     2 |          1 |          2 |
|     3 |          2 |          3 |
|     5 |          3 |          5 |
|     8 |          5 |          8 |
|    13 |          8 |         13 |
|    21 |         13 |         21 |
|    34 |         21 |         34 |
|    55 |         34 |         55 |
|    89 |         55 |         89 |
|   144 |         89 |        144 |
|   233 |        144 |        233 |
|   377 |        233 |        377 |
|   610 |        377 |        610 |
+-------+------------+------------+

Conclusion

Using 5.0 and above, you can also use the various INFORMATION_SCHEMA tables (e.g. INFORMATION_SCHEMA.COLLATIONS). Some of these may be slow to load, though.

When you can (and need), have a prepared numbers table. When unable to create one, you can generate such numbers using tables which are certain to exist (at least until the next major version).

tags:
posted in MySQL by shlomi

« | »

Follow comments via the RSS Feed | Leave a comment | Trackback URL

9 Comments to "Generating numbers out of seemingly thin air"

  1. Giuseppe Maxia wrote:

    On the same vein, you may like these two posts:

    http://datacharmer.blogspot.com/2007/12/data-from-nothing-solution-to-pop-quiz.html
    http://datacharmer.blogspot.com/2007/12/pop-quiz-with-prize-generate-4-billion.html

    Giuseppe

  2. Mark wrote:

    Interesting stuff about the help_topic table, thanks. I'd just worked out how to calculate the Fibonacci sequence on my blog (using one of my defined tables to join against) as part of my investigation into solving problems on the Euler project. http://www.oxfordtechnotes.co.uk/sqlblog/blog4.php/2009/08/29/project-euler-q2-with-mysql I may well use the help_topic table in future.

  3. strcmp wrote:

    I mostly use UNIONs for this purpose:

    SELECT a.x*100+b.x*10+c.x
    FROM (
    SELECT 0 x UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
    ) a, (
    SELECT 0 x UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
    ) b, (
    SELECT 0 x UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
    ) c

    I don't know what's better. You don't need more priviledges for this than SELECT and it works with 4.1, but of course that puts more load onto the parser and you have much more UNIONs. I'm hoping that generating the bulk of the data with JOINs uses optimized code in the MySQL server like the JOIN buffer and temporary tables and that the overhead doesn't matter any more.

  4. shlomi wrote:

    @Giuseppe,
    Thanks for the references; Very interesting!

    @Mark,
    What a coincidence!

    @strcmp,
    My personal view is that shorter is usually better. Your solution has the property of choosing exactly 1000 thousand values, though.

  5. strcmp wrote:

    as a former assembler and C programmer and number cruncher iterative code like "@counter := @counter + 1" looks suboptimal to me, because it is a data dependency between the iterations, forcing the code to be executed serially. of course that's just me and a totally useless comment right now, but even MySQL may once leave the stone ages and execute JOINs and table/index scans in parallel... bulk operations 'feel' better.

  6. Toby wrote:

    strcmp, It can be phrased as a self join, but may not be efficient (even quadratic): https://slashdot.org/~toby/journal/199210

  7. Rob Wultsch wrote:

    A much cleaner way of doing what is described above can be found in generate_series ( http://www.postgresql.org/docs/8.4/static/functions-srf.html ) which is found in a different free RDMS...

  8. ajd4096 wrote:

    Don't be so sure that the help tables are *always* installed.

  9. shlomi wrote:

    @adj4096

    Would you care to elaborate?
    If so, you can use the INFORMATION_SCHEMA tables.

Leave Your Comment

 
Powered by Wordpress and MySQL. Theme by openark.org