Why delegating code to MySQL Stored Routines is poor engineering practice

I happen to use stored routines with MySQL. In fact, my open source project common_schema heavily utilizes them. DBA-wise, I think they provide with a lot of power (alas, the ANSI:SQL 2003 syntax feels more like COBOL than a sane programming language, which is why I use QueryScript instead).

However I wish to discuss the use of stored routines as integral part of your application code, which I discourage.

The common discussion on whether to use or not use stored routines typically revolves around data transfer (with stored routines you transfer less data since it’s being processed on server side), security (with stored routines you can obfuscate/hide internal datasets, and provide with limited and expected API) and performance (with MySQL this is not what you would expect, as routines are interpreted & their queries re-evaluated, as opposed to other RDBMS you may be used to).

But I wish to discuss the use of stored routines from an engineering standpoint. The first couple of points I raise are cultural/behavioural.

2nd grade citizens

Your stored routines are not likely to integrate well with your IDE. While your Java/Scala/PHP/Ruby/whatnot code comfortably lies within your home directory, the stored routines live in their own space: a database container. They’re not as visible to you as your standard code. Your IDE is unaware of their existence and is unlikely to have the necessary plugin/state of mind to be able to view these.

This leads to difficulty in maintaining the code. People typically resort to using some SQL-oriented GUI tool such as MySQL Workbench, SequelPro or other, commercial tools. But these tools, while make it easy to edit your routine code, do not integrate (well?) with your source control. I can’t say I’ve used all GUI tools; but how many of them will have Git/SVN/Mercurial connectors? How many of them will keep local history changes once you edit a routine? I’m happy to get introduced to such a tool.

Even with such integration, you’re split between two IDEs. And if you’re the command line enthusiast, well, you can’t just svn ci -m “fixed my stored procedure bug”. Your code is simply not in your trunk directory.

It can be done. You could maintain the entire routine code from within your source tree, and hats off to all those who do it. Most will not. See later on about deployments for more on this.

Testing

While engineers are keen on writing unit tests for every class and method they create, they are less keen on doing the same for stored routines. This is an observation, having seen many instalments. And I can tell you why: your stored routine testing will not integrate well with your JUnit/PHPUnit/…

There are testing frameworks for databases, and indeed I hacked my own mini unit testing code with common_schema. But it’s a different testing framework. You might also have realized by now that testing databases is somewhat different. It can be done, and hats off again to those that implement it as common practice. Many don’t. Database are often more heavyweight to test. Not all operations done by routines are easily rolled back, which leads to having to rebuild the entire dataset before tests. This in itself leads to longer test periods and a need for multiple test databases so as to allow for concurrent builds.

How many companies practice both version control and unit testing over their routine code? I believe not many (and am happy to hear about those who do). To be more direct, of all the companies I ever consulted to: I have never seen one that does both.

Debugging

MySQL stored routines do not have built in debugging capabilities. To debug your routines, you will have to use one of two methods:

  • Simulate your routine code (ie mimic their execution on top of some interpreter). There are tools to do that. For me this is a complete NO GO and utterly untrustworthy. You can mimic what you think is how the routine should behave, but never they full behaviour. While developing common_schema I came upon plenty weird behaviour, some of it bugs, that you just can’t build into your emulation.
  • Inject debugging code into your routine code. I do that with RDebug. You can do breakpoints, step into, step out, most of the interesting stuff. Other tools do that as well. It is not the right way to go: you’re essentially modifying your code, placing more locks, communicating, and losing some functionality. It is a necessary evil solution for a necessary evil programming method… How good can that be?

The right way to go would be to have debugging API built into the MySQL server.

But, wait, that would still be next to worthless, since our discussion is over programming with stored routines: letting your application call upon stored routines in your database. Until the day where I could use my IntelliJ debugger to step from my java method which calls upon a stored procedure, and into the stored procedure itself, debugging your code is completely detached from your stored routine debugging.

Refactoring & deploying

Say you wanted to add a column to your table: you would go ahead and add it, and perhaps populate it. You would then modify your application code to support this new column, and deploy. Say you wanted to drop a table column. You would first deploy changes to your application code that ignore said column, and once the code is in place you would go and actually make the DROP.

How do you do the same with a stored routine? Support your routine accepts two parameters, and you wish to add a third?

There is no support for optional parameters. Your routine either accepts two parameters or three. Your application code will have to provide the exact number of parameters. You will have to deploy both your SQL changes and your application changes at the same time. This is by definition impossible, unless you are OK with a stop the world approach, which is unlikely in production.

Code constraints

One solution to the above is to create a new routines. Somehow “overload” it. But you can’t overload a stored routine; you’ll have to create a routine by a new name. This will allow you to slowly and smoothly migrate between the two.

Ahem, smoothly? How easy is it to find all invocations of a certain routines from your code? It will be typically lie in some String, or within some XML config file. There is no safe “find references to this procedure” IDE mechanism. There is no constraint in your IDE that will tell you “there is no such procedure” if you misspell the name.

Trash bin

Suppose you overcame the above. You now have two routines. You need to remember to DROP the old one, right? Will you?

When presenting common_schema, a common question I ask the audience is as follows:

Suppose I accessed your database and listed the entire set of stored functions and procedures. How many of them are you not even sure are in use anymore? How many of them you think you can DROP, but are too afraid to, and keep them in just in case?

I wouldn’t commonly ask that question had it not always provides a common nodding and smiling in the audience. People forget to drop their routines, and then forget about them, and are never sure whether they are used (your IDE doesn’t easily tell you that, remember? Sure, you can grep around; that’s not what most engineers would do). And those routines pile up to become trash.

Data or code?

Last but not least: a stored routine is a piece of code, right? Well, as far as the database is concerned, it’s really a piece of data. It’s located within a schema. It’s stored. It is an integral part of your data set: when you back up your data, you’re most likely to backup the code as well. When you restore, you’re likely to restore both. There are obvious advantages to that, DB-wise. Or should I say, DBA-wise. Engineering-wise? Does a database-restore operation count as code deployment? We can argue over beer.

Final notes

Having said all that: yes, I’m using an occasional stored routine. I see these occasions as a necessary evil, and sometimes it’s just the correct solution.

I’m happy to know what methods have been developed out there to overcome the above, please share; and please feel free to contradict the above.

11 thoughts on “Why delegating code to MySQL Stored Routines is poor engineering practice

  1. Having used extensively pl/sql in the past, it seems to me that most of these points are either moot or only apply to the particular flavour of stored-procedures available in mysql, but not in other rdbms.

    – code versioning, debugging: there is a plethora of tools and techniques to help you do that. Moot point.
    I personally do not use a single IDE all day but switch constantly between a couple of programs, and do not find it cumbersome at all. Most developers I know do not use the ide for git operations either, but resort to command line (most IDEs integrations with scm tools have a habit of sucking badly, starting from the first implementations of support for svn in eclipse, and ongoing up to today).
    A good engineer has a firm understanding of the underlying principles and can adapt to different tools or craft his own. A slave to the IDE is a code monkey 😉

    – refactoring: plain wrong.
    In fact stored routines help shield the app from the db schema implementation, as they expose an API which is closer to the business logic than to storage concerns.
    When you mention adding a column, you forget to mention that all app code written directly accessing the table must not have a select *, or there are still chances it might break.
    And what about changing the semantics/type of a column or table, or dropping it altogether? Could you do an upgrade-without-stop-the-world when NOT using stored procedures? Would direct-table-access help?
    Even without taking into account the fact that other dbs support optional params and even named ones, versioning the db API just like you’d version a rest API is just so much better.
    And there is a good chance that grepping through the code for a proc invocation is easier than to find all access to a table, if code creates sql in a structured way instead of hardcoding it everywhere

    – testing: lazy devs do not write tests for their app code, either.
    And let’s not even start on the holy wars of unit-testing-vs-functional-testing, the neverending quest for perfect-mock-objects, the optimize-code-for-unit-test-execution-time-instead-of-runtime-speed conundrum.
    Definitely not a solved problem in application-code land.

    – trash bin: doesn’t it apply equally to table-based access. Actually, in a stored proc you can add a logging statement and find out when it is used and by which db session, including the ip and name of the client app. Not sure this is as easy to do with a table (maybe via a trigger? which is itself a stored proc 😉 )

    – data/or/code: this one is also an actual advantage. The other approach is to actually store the ddl for creating the schema along with the code in the scm system, since both have to be versioned and in sync. If the code for accessing the schema is backed up along the schema itself, you can just forget about that

    The benefits you seem to ignore:

    – efficiency. Putting code close to the data gives you order of magnitude improvements over using an ORM in speed of execution and resource consumption

    – multi-app access. Encapsulating data access in an API layer means you can write apps in java, php, c#, without having to reimplement the same data consistency checks and table joins in all of them

    And the elephant in the room:

    – migrating to a different database vendor.
    Basically impossible to do when you’re based on stored procs. Doable with a non-negligible effort when your using an ORM or db-abstraction layer and always being extra careful you do not use specific features of a db, which includes of course all tricks to make it fast (simple examples: is your mysql treating ” and NULL as semantically different? can not go to oracle! and what if your table names are longer than 30 chars? or your IN clauses contain more than 1000 elements? etc…).
    The main question here is: will my app ever need to migrate?
    In 16 years of practice I have seen this happen once (for project-apps, of course product-apps can benefit from day 1 of multi-db support). It took more than 3 years of work.
    So just evaluate carefully the chances that this will happen to your application.

    To sum it up: in my experience Stored Procedures are actually BETTER Engineering Practice, but quite underestimated by uneducated developers who only think in terms of their current language/framework/tool

  2. ps: not to sound too much like a fanboy / grumpy old fart:

    a) big drawback of using stored procs for your whole application: yes, the db-based languages suck big time compared to ruby/php/python

    b) I am not against IDEs in general. When coding on Oracle, I always asked my employer to shell out for a license of TOAD before I’d sign the contract

  3. @gggeek,

    Thank you. Good comments; allow me to reply to some:

    – “Having used extensively pl/sql in the past, it seems to me that most of these points are either moot or only apply to the particular flavour of stored-procedures available in mysql, but not in other rdbms.”

    -> Exactly so. The title of this post clearly states “MySQL”, and specifically indicates the use of MySQL stored routines.

    – Debugging: please suggest “a plethora of tools and techniques to help you do that” *on MySQL* 🙂

    – “In fact stored routines help shield the app from the db schema implementation, as they expose an API which is closer to the business logic than to storage concerns.”

    -> agreed. I mention this in the text: “…security (with stored routines you can obfuscate/hide internal datasets, and provide with limited and expected API)… ”

    – “When you mention adding a column, you forget to mention that all app code written directly accessing the table must not have a select *, or there are still chances it might break.
    And what about changing the semantics/type of a column or table, or dropping it altogether? Could you do an upgrade-without-stop-the-world when NOT using stored procedures? Would direct-table-access help?”

    -> Yes, there are ways to refactor a table without-top-the-world when not using stored procedures. Ironically, I am author of the first such tool for MySQL called oak-online-alter-table, later forked by Facebook and re-implemented by Percona, and it uses triggers, which are some form of stored routines in MySQL as you indicated. This solution allows direct table access even while table is being altered.

    – “testing: lazy devs do not write tests for their app code, either.”

    -> I agree. However, when you throw a developer into an environment that has poor support for testing, as opposed to throwing her into an environment with strong support for testing, where would you expect said developer to be more tests-productive?

    – “efficiency. Putting code close to the data gives you order of magnitude improvements over using an ORM in speed of execution and resource consumption”

    -> I did in fact mention that; I also stated that this post does not zoom into this.

    – multi-app access. Encapsulating data access in an API layer means you can write apps in java, php, c#, without having to reimplement the same data consistency checks and table joins in all of them

    -> Absolutely agree. But also to the point of having a single server possibly do the CPU computation that multiple other servers might have done themselves.

    – migrating to a different database vendor.

    -> Hear here. I have yet to see a large scale database use that is completely abstractized.

    I believe you come from the world of Oracle DB. I am well accustomed to training Oracle or SQL Server DBAs migrating to MySQL. The MySQL stored routines are not nearly as strong or fast as those in Oracle & SQL Server. There is a painful switch such DBAs have to make when switching over to MySQL.

    Thanks again

  4. Just a couple of points to notice.

    About debugging, one can (agreed, it takes some extra work) add asserts — yes, assert — in the stored program logic.

    http://stackoverflow.com/questions/9238978/sql-scripts-does-the-equivalent-of-a-define-exist/9276721#9276721

    And up vote me while you are at it. Yes, shameless 😉

    About the trash bin, the performance schema can help.

    SELECT OBJECT_TYPE, OBJECT_SCHEMA, OBJECT_NAME
    FROM events_statements_summary_by_program
    WHERE COUNT_STAR = 0

    Regards,
    — Marc

  5. @Marc,
    LOL, precompile for SQL is hack of the day!

    Thanks for the performance_schema pointer. It seems like p_s turns into the answer for the universe.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.