Why delegating code to MySQL Stored Routines is poor engineering practice

I happen to use stored routines with MySQL. In fact, my open source project common_schema heavily utilizes them. DBA-wise, I think they provide with a lot of power (alas, the ANSI:SQL 2003 syntax feels more like COBOL than a sane programming language, which is why I use QueryScript instead).

However I wish to discuss the use of stored routines as integral part of your application code, which I discourage.

The common discussion on whether to use or not use stored routines typically revolves around data transfer (with stored routines you transfer less data since it’s being processed on server side), security (with stored routines you can obfuscate/hide internal datasets, and provide with limited and expected API) and performance (with MySQL this is not what you would expect, as routines are interpreted & their queries re-evaluated, as opposed to other RDBMS you may be used to).

But I wish to discuss the use of stored routines from an engineering standpoint. The first couple of points I raise are cultural/behavioural.

2nd grade citizens

Your stored routines are not likely to integrate well with your IDE. While your Java/Scala/PHP/Ruby/whatnot code comfortably lies within your home directory, the stored routines live in their own space: a database container. They’re not as visible to you as your standard code. Your IDE is unaware of their existence and is unlikely to have the necessary plugin/state of mind to be able to view these.

This leads to difficulty in maintaining the code. People typically resort to using some SQL-oriented GUI tool such as MySQL Workbench, SequelPro or other, commercial tools. But these tools, while make it easy to edit your routine code, do not integrate (well?) with your source control. I can’t say I’ve used all GUI tools; but how many of them will have Git/SVN/Mercurial connectors? How many of them will keep local history changes once you edit a routine? I’m happy to get introduced to such a tool.

Even with such integration, you’re split between two IDEs. And if you’re the command line enthusiast, well, you can’t just svn ci -m “fixed my stored procedure bug”. Your code is simply not in your trunk directory.

It can be done. You could maintain the entire routine code from within your source tree, and hats off to all those who do it. Most will not. See later on about deployments for more on this.

Testing

While engineers are keen on writing unit tests for every class and method they create, they are less keen on doing the same for stored routines. This is an observation, having seen many instalments. And I can tell you why: your stored routine testing will not integrate well with your JUnit/PHPUnit/…

There are testing frameworks for databases, and indeed I hacked my own mini unit testing code with common_schema. But it’s a different testing framework. You might also have realized by now that testing databases is somewhat different. It can be done, and hats off again to those that implement it as common practice. Many don’t. Database are often more heavyweight to test. Not all operations done by routines are easily rolled back, which leads to having to rebuild the entire dataset before tests. This in itself leads to longer test periods and a need for multiple test databases so as to allow for concurrent builds.

How many companies practice both version control and unit testing over their routine code? I believe not many (and am happy to hear about those who do). To be more direct, of all the companies I ever consulted to: I have never seen one that does both.

Debugging

MySQL stored routines do not have built in debugging capabilities. To debug your routines, you will have to use one of two methods:

Simulate your routine code (ie mimic their execution on top of some interpreter). There are tools to do that. For me this is a complete NO GO and utterly untrustworthy. You can mimic what you think is how the routine should behave, but never they full behaviour. While developing common_schema I came upon plenty weird behaviour, some of it bugs, that you just can’t build into your emulation.
Inject debugging code into your routine code. I do that with RDebug. You can do breakpoints, step into, step out, most of the interesting stuff. Other tools do that as well. It is not the right way to go: you’re essentially modifying your code, placing more locks, communicating, and losing some functionality. It is a necessary evil solution for a necessary evil programming method… How good can that be?

The right way to go would be to have debugging API built into the MySQL server.

But, wait, that would still be next to worthless, since our discussion is over programming with stored routines: letting your application call upon stored routines in your database. Until the day where I could use my IntelliJ debugger to step from my java method which calls upon a stored procedure, and into the stored procedure itself, debugging your code is completely detached from your stored routine debugging.

Refactoring & deploying

Say you wanted to add a column to your table: you would go ahead and add it, and perhaps populate it. You would then modify your application code to support this new column, and deploy. Say you wanted to drop a table column. You would first deploy changes to your application code that ignore said column, and once the code is in place you would go and actually make the DROP.

How do you do the same with a stored routine? Support your routine accepts two parameters, and you wish to add a third?

There is no support for optional parameters. Your routine either accepts two parameters or three. Your application code will have to provide the exact number of parameters. You will have to deploy both your SQL changes and your application changes at the same time. This is by definition impossible, unless you are OK with a stop the world approach, which is unlikely in production.

Code constraints

One solution to the above is to create a new routines. Somehow “overload” it. But you can’t overload a stored routine; you’ll have to create a routine by a new name. This will allow you to slowly and smoothly migrate between the two.

Ahem, smoothly? How easy is it to find all invocations of a certain routines from your code? It will be typically lie in some String, or within some XML config file. There is no safe “find references to this procedure” IDE mechanism. There is no constraint in your IDE that will tell you “there is no such procedure” if you misspell the name.

Trash bin

Suppose you overcame the above. You now have two routines. You need to remember to DROP the old one, right? Will you?

When presenting common_schema, a common question I ask the audience is as follows:

Suppose I accessed your database and listed the entire set of stored functions and procedures. How many of them are you not even sure are in use anymore? How many of them you think you can DROP, but are too afraid to, and keep them in just in case?

I wouldn’t commonly ask that question had it not always provides a common nodding and smiling in the audience. People forget to drop their routines, and then forget about them, and are never sure whether they are used (your IDE doesn’t easily tell you that, remember? Sure, you can grep around; that’s not what most engineers would do). And those routines pile up to become trash.

Data or code?

Last but not least: a stored routine is a piece of code, right? Well, as far as the database is concerned, it’s really a piece of data. It’s located within a schema. It’s stored. It is an integral part of your data set: when you back up your data, you’re most likely to backup the code as well. When you restore, you’re likely to restore both. There are obvious advantages to that, DB-wise. Or should I say, DBA-wise. Engineering-wise? Does a database-restore operation count as code deployment? We can argue over beer.

Final notes

Having said all that: yes, I’m using an occasional stored routine. I see these occasions as a necessary evil, and sometimes it’s just the correct solution.

I’m happy to know what methods have been developed out there to overcome the above, please share; and please feel free to contradict the above.

11 thoughts on “Why delegating code to MySQL Stored Routines is poor engineering practice”

Mark Leith says:

February 7, 2014 at 12:46 am

For debugging (and even tracing usage pre 5.7 and the new stored program instrumentation that Marc mentions above), I don’t know why more people don’t instrument stored programs like they do applications – add logging.

I’m a great proponent of having the database manage itself for routine tasks, and have written routines that do things like daily partition management, killing long running transactions, etc. (many things like common schema), and have always built logging in to them.

The current “framework” I use is a standard InnoDB table with timestamp, program name, log level (debug, notice, warning, critical) and a string message. A procedure and or function that wraps inserting the messages (takes level, program name and message), and a table that stores the config for each program (name, log level set). Each program fetches it’s own config and then starts to log at the appropriate level depending on the flag. These can range from logging actions taken, to performance debugging (because the logs have a millisecond resolution timestamp).

Now, the extra overhead this might add to performance critical routines may not be insignificant, but you can set it to not log too, and skip most of it (other than config check).

The log table itself is partitioned, and managed with events with a rolling window stored (that log to the log table!)
Justin Swanhart says:

February 7, 2014 at 3:07 am

Mark,

Flexviews does just that. The logging function checks a session variable and only does the inserts in debug mode to reduce overhead.
Bill Karwin says:

February 9, 2014 at 2:15 am

I have also observed heavy reliance on stored procedures become a bottleneck for performance.

The environment had many PHP app servers that were mostly idle, while the single database server whose CPU load was pegged.

It turned out the project had a policy that all SQL code was mandated to be implemented as stored procedures. Many of the procedures did dynamic SQL, and had to do complex validation of input parameters against INFORMATION_SCHEMA to assure against SQL injection.

So a disproportionate amount of application load was focused on the single db server, instead of distributed over many app servers where it could be scaled better.
Daniël van Eeden says:

February 11, 2014 at 10:47 pm

I think Stored routines should be used with care and only in cases which would heavily bennefit from running close to the data or which can’t easily be done otherwise. And also for things which need to be scheduled with the event scheduler.

There are a few other alternatives to SQL/PSM: Using UDFs (also a great way to crash a server!) or something which is ran by a UDF like the V8 UDF from Roland Bouman: http://www.slideshare.net/rpbouman/my-sq-lv8udfs
Shantanu Oak says:

March 6, 2014 at 3:57 pm

PHP has great testing/ debugging tools and cool IDE is available for free. And I still see a lot of sloppy code. For simple tasks I do always insist on procedures. As noted by Bill Karwin I have seen companies where they had a policy that all SQL (MS-SQL) code was mandated to be implemented as stored procedures. I do not agree with such an aggressive view.