On stored routines and dynamic statements

June 18, 2012

I very much enjoyed reading Overloading Procedures by Michael McLaughlin: good stuff!

I'm dealing with similar issues in common_schema/QueryScript, where I implement a whole new scripting language within MySQL, interpreted by stored routines. I am now finalizing the next version of common_schema/QueryScript, with a major addition to the scripting language to put yet even more power at the hands of the programmer/DBA using simple, clean syntax.

Still hush hush, the development of that feature touched at the very same issues described in Michael's post. Present in current release, these issues are intensified by the use and complexity of the new development. Here are a few insights of mine:

Internal array implementation

Like Michael, I started by implementing arrays through tables. That is, create a (temporary, in my case) table, wrap it up with a lot of stored routine code, and simulate an array. This array is not yet provided to the user, but is used internally for QueryScript's own code.

Well, disappointment here: during load tests on intense structures, such as a foreach loop, where each iteration of the loop requires the creation of an array, I found that the current solution does not hold well on busy servers.

Seemingly, there's nothing wrong with the creation of a new table every once in a while -- and in particular a temporary table. However, I quickly found out that a busy server thrashes the table cache with such intense rate of creation/dropping of tables. The competition over the table cache mutex becomes intolerable and hogs not only the script's execution but the entire server's.

There's also the issue of the type of array values -- no going around using textual columns, of course, but -- how long? A VARCHAR(32767) should be enough for any reasonable implementation, but -- how much memory would that consume? Both MEMORY and standard temporary tables (Percona Server has that partially resolved) use a fixed row format, which means a 32K text is actually allocated in memory even when your value is 'x'.

The next release completely rewrites the internal array implementation. How does it work now?

By avoiding using tables in the first place. To implement arrays now, I chose to use a well formatted text, such that can be easily parsed and manipulated.

I chose XML format, for which MySQL provides with a couple functions. Far from covering real XML manipulation requirements, but with a little effort and hacking, can provide with a reasonable base for work.

This also means the entire array resides in a single string, and this can be implemented by a User Defined Variable. So -- no tables!

Moreover, there is now no need for stored procedures, since no dynamic SQL is sued. It's all functions, which makes for an easier access. And to wrap it up -- no need to decide about value limits. What fits in the string is what you get.

This solution is not perfect, either, of course. There's more CPU used to parse/manipulate the XML (not a bug problem on my work since arrays are typically small), allocation/reallocation of memory (any manipulation assigns a new value to the User Defined Variable), and handling of NULLs is more complex.

Bottom line: overdoing CREATE TABLE/DROP TABLE, even for TEMPORARY, can be a pain. Avoid it.

Dyamic SQL

You can't invoke dynamic SQL from within dynamic SQL. There is a wide variety of additional commands that cannot be invoked by dynamic SQL.

This calls for some attention. If you want to be able to dynamically invoke server commands issued by the user, you either limit the user's choice of commands, or you find another way.

QueryScript is a classic "interpreted by an interpreter" code. I wish there was a better way, but I use stored routines to interpret QueryScript text. So the QueryScript interpreter takes care not to over do it. While just reading the script's text and sending it for dynamic invocation is certainly easier to code, it makes for said limitations. Thanks to Roland's contributions, common_schema parses user's text, analyses and find intended queries, and -- sometimes -- simply runs it without dynamic invocation. Think nested if-else statement.

Still, most user's code is indeed invoked dynamically.

Parsing by comma

If a user provides a comma delimited string, it is very dangerous to parse it via comma. The text 'one,two,three,four' seems fine, but what about 'one,two,"hi, dad"'?

So common_schema does a lot of such parsing. It takes care to properly analyze quoted text. But more than that: since the parsing of such text is costly, common_schema translates such text to a well-formed delimited text, such that there is a known, unique delimiter, not to be found within quotes, such that be splitting using said delimiter we get the definite tokenizing of the text.

To illustrate how it works: it attempts an unlikely single-character delimiter. Consider '\0'. Is '\0' found in the original text? If not, we can once analyze the proper tokenization, then replace appropriate commas with '\0'. Wait, is '\0' found anywhere in the text? Then we try the even more unlikely delimiter '\0\b', then the yet unlikelier '\0\b\r', etc.


QueryScript is a way to write stored SQL code "the way it ought to be": simple, clean, useful, dynamic, flexible, wicked cool. A lot is going on, and I have enough ideas to last a couple years worth of development. No intention to create a heavyweight, full blown language. On the contrary: the language is intended to be simple, lightweight, and provide with easy to access interface to complex SQL operations. Start the journey here.

tags: , , ,
posted in MySQL by shlomi

« | »

Follow comments via the RSS Feed | Leave a comment | Trackback URL

Powered by Wordpress and MySQL. Theme by openark.org