Why delegating code to MySQL Stored Routines is poor engineering practice

February 6, 2014

I happen to use stored routines with MySQL. In fact, my open source project common_schema heavily utilizes them. DBA-wise, I think they provide a lot of power (alas, the SQL:2003 syntax feels more like COBOL than a sane programming language, which is why I use QueryScript instead).

However, I wish to discuss the use of stored routines as an integral part of your application code, a practice I discourage.

The common discussion on whether or not to use stored routines typically revolves around data transfer (with stored routines you transfer less data, since it is processed server-side), security (with stored routines you can obfuscate/hide internal datasets, and provide a limited, well-defined API) and performance (with MySQL this is not what you would expect: routines are interpreted and their queries re-evaluated on each invocation, as opposed to other RDBMSs you may be used to).

But I wish to discuss the use of stored routines from an engineering standpoint. The first couple of points I raise are cultural/behavioural.

Second-class citizens

Your stored routines are not likely to integrate well with your IDE. While your Java/Scala/PHP/Ruby/whatnot code comfortably lies within your home directory, the stored routines live in their own space: a database container. They're not as visible to you as your standard code. Your IDE is unaware of their existence and is unlikely to have a plugin that can even view them.

This leads to difficulty in maintaining the code. People typically resort to using some SQL-oriented GUI tool such as MySQL Workbench, SequelPro or other, commercial tools. But these tools, while they make it easy to edit your routine code, do not integrate (well?) with your source control. I can't say I've used all GUI tools, but how many of them have Git/SVN/Mercurial connectors? How many of them keep local change history once you edit a routine? I'm happy to be introduced to such a tool.

Even with such integration, you're split between two IDEs. And if you're a command line enthusiast, well, you can't just svn ci -m "fixed my stored procedure bug". Your code is simply not in your trunk directory.

It can be done. You could maintain the entire routine code from within your source tree, and hats off to all those who do it. Most will not. See later on about deployments for more on this.
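One way to do it is to keep each routine as an idempotent .sql file inside the repository and re-apply it on deploy. A sketch, with hypothetical names (note the brief window during which the routine does not exist):

```sql
-- db/routines/customer_lifetime_value.sql
-- Lives in the source tree and is versioned like any other code;
-- re-applying the file on deploy is idempotent: drop, then recreate.
DROP PROCEDURE IF EXISTS customer_lifetime_value;

DELIMITER $$
CREATE PROCEDURE customer_lifetime_value(IN p_customer_id INT, OUT p_total DECIMAL(12,2))
BEGIN
  SELECT IFNULL(SUM(amount), 0) INTO p_total
  FROM orders
  WHERE customer_id = p_customer_id;
END $$
DELIMITER ;
```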

Testing

While engineers are keen on writing unit tests for every class and method they create, they are less keen on doing the same for stored routines. This is an observation, having seen many installations. And I can tell you why: your stored routine testing will not integrate well with your JUnit/PHPUnit/...

There are testing frameworks for databases, and indeed I hacked my own mini unit testing code with common_schema. But it's a different testing framework. You might also have realized by now that testing databases is somewhat different. It can be done, and hats off again to those who implement it as common practice. Many don't. Databases are often more heavyweight to test. Not all operations done by routines are easily rolled back, which leads to having to rebuild the entire dataset before tests. This in itself leads to longer test runs and a need for multiple test databases so as to allow for concurrent builds.

How many companies practice both version control and unit testing over their routine code? I believe not many (and am happy to hear about those who do). To be more direct: of all the companies I have ever consulted for, I have never seen one that does both.

Debugging

MySQL stored routines have no built-in debugging capabilities. To debug your routines, you will have to use one of two methods:

  • Simulate your routine code (i.e. mimic its execution on top of some interpreter). There are tools to do that. For me this is a complete NO GO and utterly untrustworthy. You can mimic what you think is how the routine should behave, but never the full behaviour. While developing common_schema I came upon plenty of weird behaviour, some of it bugs, that you just can't build into your emulation.
  • Inject debugging code into your routine code. I do that with RDebug. You get breakpoints, step-into, step-out, most of the interesting stuff. Other tools do that as well. It is not the right way to go: you're essentially modifying your code, placing more locks, communicating, and losing some functionality. It is a necessary-evil solution for a necessary-evil programming method... How good can that be?
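A minimal sketch of the injection approach: a hand-rolled trace table, rather than RDebug's actual mechanism (all names here are illustrative):

```sql
-- Illustrative only: a debug trace injected into a routine by hand.
CREATE TABLE IF NOT EXISTS debug_trace (
  ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  routine_name VARCHAR(128),
  message TEXT
);

DELIMITER $$
CREATE PROCEDURE process_payment(IN p_order_id INT)
BEGIN
  -- Injected debug statement: record entry and input.
  INSERT INTO debug_trace (routine_name, message)
    VALUES ('process_payment', CONCAT('entered, order_id=', p_order_id));

  -- ... actual routine logic here ...

  -- Injected debug statement: record completion.
  INSERT INTO debug_trace (routine_name, message)
    VALUES ('process_payment', 'done');
END $$
DELIMITER ;
```

The cost is exactly as described above: the routine's code is no longer the routine's code, and the extra writes add locks and latency of their own.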

The right way to go would be to have a debugging API built into the MySQL server.

But, wait, that would still be next to worthless, since our discussion is about programming with stored routines: letting your application call upon stored routines in your database. Until the day I can use my IntelliJ debugger to step from the Java method that calls a stored procedure into the stored procedure itself, debugging your application code remains completely detached from debugging your stored routines.

Refactoring & deploying

Say you wanted to add a column to your table: you would go ahead and add it, and perhaps populate it. You would then modify your application code to support the new column, and deploy. Say you wanted to drop a table column: you would first deploy application code changes that ignore said column, and once that code is in place you would actually issue the DROP.

How do you do the same with a stored routine? Suppose your routine accepts two parameters, and you wish to add a third?

There is no support for optional parameters. Your routine either accepts two parameters or three. Your application code has to provide the exact number of parameters. You would have to deploy both your SQL changes and your application changes at the very same moment, which is by definition impossible, unless you are OK with a stop-the-world approach; unlikely in production.

Code constraints

One solution to the above is to create a new routine; somehow "overload" it. But you can't overload a stored routine in MySQL; you'll have to create a routine under a new name. This would allow you to slowly and smoothly migrate between the two.
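A sketch of that migration path (hypothetical names): create the new signature under a versioned name, and keep the old routine as a thin wrapper preserving the original behaviour until all callers have moved over:

```sql
DELIMITER $$

-- New signature, new name: the third parameter is added here.
CREATE PROCEDURE get_order_total_v2(IN p_order_id INT,
                                    IN p_include_tax BOOLEAN,
                                    OUT p_total DECIMAL(12,2))
BEGIN
  SELECT SUM(IF(p_include_tax, amount_with_tax, amount)) INTO p_total
  FROM order_lines
  WHERE order_id = p_order_id;
END $$

-- Old routine becomes a wrapper that fixes the new parameter
-- to the previous behaviour, so existing callers keep working.
CREATE PROCEDURE get_order_total(IN p_order_id INT, OUT p_total DECIMAL(12,2))
BEGIN
  CALL get_order_total_v2(p_order_id, FALSE, p_total);
END $$

DELIMITER ;
```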

Ahem, smoothly? How easy is it to find all invocations of a certain routine in your code? The call will typically lie in some string, or within some XML config file. There is no safe "find references to this procedure" IDE mechanism. There is no constraint in your IDE that will tell you "there is no such procedure" if you misspell the name.

Trash bin

Suppose you overcame the above. You now have two routines. You need to remember to DROP the old one, right? Will you?

When presenting common_schema, a common question I ask the audience is as follows:

Suppose I accessed your database and listed the entire set of stored functions and procedures. How many of them are you not even sure are in use anymore? How many of them do you think you could DROP, but are too afraid to, and keep around just in case?

I wouldn't keep asking that question had it not always provoked nodding and smiling in the audience. People forget to drop their routines, then forget about them, and are never sure whether they are still used (your IDE doesn't easily tell you that, remember? Sure, you can grep around; that's not what most engineers would do). And those routines pile up to become trash.
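The closest the server itself offers is an inventory: you can list what exists and when it was last altered, but not whether anything still calls it:

```sql
-- List all user stored routines; MySQL tells you they exist,
-- but not whether any application still invokes them.
SELECT ROUTINE_SCHEMA, ROUTINE_TYPE, ROUTINE_NAME, CREATED, LAST_ALTERED
FROM information_schema.ROUTINES
WHERE ROUTINE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema')
ORDER BY LAST_ALTERED;
```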

Data or code?

Last but not least: a stored routine is a piece of code, right? Well, as far as the database is concerned, it's really a piece of data. It's located within a schema. It's stored. It is an integral part of your dataset: when you back up your data, you're most likely to back up the code as well. When you restore, you're likely to restore both. There are obvious advantages to that, DB-wise. Or should I say, DBA-wise. Engineering-wise? Does a database-restore operation count as code deployment? We can argue over beer.

Final notes

Having said all that: yes, I'm using an occasional stored routine. I see these occasions as a necessary evil, and sometimes it's just the correct solution.

I'm happy to hear what methods have been developed out there to overcome the above; please share, and please feel free to contradict any of it.

posted in MySQL by shlomi


11 Comments to "Why delegating code to MySQL Stored Routines is poor engineering practice"

  1. gggeek wrote:

    Having used extensively pl/sql in the past, it seems to me that most of these points are either moot or only apply to the particular flavour of stored-procedures available in mysql, but not in other rdbms.

    - code versioning, debugging: there is a plethora of tools and techniques to help you do that. Moot point.
    I personally do not use a single IDE all day but switch constantly between a couple of programs, and do not find it cumbersome at all. Most developers I know do not use the ide for git operations either, but resort to command line (most IDEs integrations with scm tools have a habit of sucking badly, starting from the first implementations of support for svn in eclipse, and ongoing up to today).
    A good engineer has a firm understanding of the underlying principles and can adapt to different tools or craft his own. A slave to the IDE is a code monkey ;-)

    - refactoring: plain wrong.
    In fact stored routines help shield the app from the db schema implementation, as they expose an API which is closer to the business logic than to storage concerns.
    When you mention adding a column, you forget to mention that all app code written directly accessing the table must not have a select *, or there are still chances it might break.
    And what about changing the semantics/type of a column or table, or dropping it altogether? Could you do an upgrade-without-stop-the-world when NOT using stored procedures? Would direct-table-access help?
    Even without taking into account the fact that other dbs support optional params and even named ones, versioning the db API just like you'd version a rest API is just so much better.
    And there is a good chance that grepping through the code for a proc invocation is easier than to find all access to a table, if code creates sql in a structured way instead of hardcoding it everywhere

    - testing: lazy devs do not write tests for their app code, either.
    And let's not even start on the holy wars of unit-testing-vs-functional-testing, the neverending quest for perfect-mock-objects, the optimize-code-for-unit-test-execution-time-instead-of-runtime-speed conundrum.
    Definitely not a solved problem in application-code land.

    - trash bin: doesn't it apply equally to table-based access? Actually, in a stored proc you can add a logging statement and find out when it is used and by which db session, including the ip and name of the client app. Not sure this is as easy to do with a table (maybe via a trigger? which is itself a stored proc ;-) )

    - data/or/code: this one is also an actual advantage. The other approach is to actually store the ddl for creating the schema along with the code in the scm system, since both have to be versioned and in sync. If the code for accessing the schema is backed up along the schema itself, you can just forget about that

    The benefits you seem to ignore:

    - efficiency. Putting code close to the data gives you order of magnitude improvements over using an ORM in speed of execution and resource consumption

    - multi-app access. Encapsulating data access in an API layer means you can write apps in java, php, c#, without having to reimplement the same data consistency checks and table joins in all of them

    And the elephant in the room:

    - migrating to a different database vendor.
    Basically impossible to do when you're based on stored procs. Doable with a non-negligible effort when you're using an ORM or db-abstraction layer and always being extra careful you do not use specific features of a db, which includes of course all tricks to make it fast (simple examples: is your mysql treating '' and NULL as semantically different? can not go to oracle! and what if your table names are longer than 30 chars? or your IN clauses contain more than 1000 elements? etc...).
    The main question here is: will my app ever need to migrate?
    In 16 years of practice I have seen this happen once (for project-apps, of course product-apps can benefit from day 1 of multi-db support). It took more than 3 years of work.
    So just evaluate carefully the chances that this will happen to your application.

    To sum it up: in my experience Stored Procedures are actually BETTER Engineering Practice, but quite underestimated by uneducated developers who only think in terms of their current language/framework/tool

  2. gggeek wrote:

    ps: not to sound too much like a fanboy / grumpy old fart:

    a) big drawback of using stored procs for your whole application: yes, the db-based languages suck big time compared to ruby/php/python

    b) I am not against IDEs in general. When coding on Oracle, I always asked my employer to shell out for a license of TOAD before I'd sign the contract

  3. shlomi wrote:

    @gggeek,

    Thank you. Good comments; allow me to reply to some:

    - "Having used extensively pl/sql in the past, it seems to me that most of these points are either moot or only apply to the particular flavour of stored-procedures available in mysql, but not in other rdbms."

    -> Exactly so. The title of this post clearly states "MySQL", and specifically indicates the use of MySQL stored routines.

    - Debugging: please suggest "a plethora of tools and techniques to help you do that" *on MySQL* :)

    - "In fact stored routines help shield the app from the db schema implementation, as they expose an API which is closer to the business logic than to storage concerns."

    -> agreed. I mention this in the text: "...security (with stored routines you can obfuscate/hide internal datasets, and provide with limited and expected API)... "

    - "When you mention adding a column, you forget to mention that all app code written directly accessing the table must not have a select *, or there are still chances it might break.
    And what about changing the semantics/type of a column or table, or dropping it altogether? Could you do an upgrade-without-stop-the-world when NOT using stored procedures? Would direct-table-access help?"

    -> Yes, there are ways to refactor a table without-stop-the-world when not using stored procedures. Ironically, I am author of the first such tool for MySQL called oak-online-alter-table, later forked by Facebook and re-implemented by Percona, and it uses triggers, which are some form of stored routines in MySQL as you indicated. This solution allows direct table access even while the table is being altered.

    - "testing: lazy devs do not write tests for their app code, either."

    -> I agree. However, when you throw a developer into an environment that has poor support for testing, as opposed to throwing her into an environment with strong support for testing, where would you expect said developer to be more tests-productive?

    - "efficiency. Putting code close to the data gives you order of magnitude improvements over using an ORM in speed of execution and resource consumption"

    -> I did in fact mention that; I also stated that this post does not zoom into this.

    - multi-app access. Encapsulating data access in an API layer means you can write apps in java, php, c#, without having to reimplement the same data consistency checks and table joins in all of them

    -> Absolutely agree. But also to the point of having a single server possibly do the CPU computation that multiple other servers might have done themselves.

    - migrating to a different database vendor.

    -> Hear, hear. I have yet to see a large scale database use that is completely abstracted.

    I believe you come from the world of Oracle DB. I am well accustomed to training Oracle or SQL Server DBAs migrating to MySQL. The MySQL stored routines are not nearly as strong or fast as those in Oracle & SQL Server. There is a painful switch such DBAs have to make when switching over to MySQL.

    Thanks again

  4. Marc Alff wrote:

    Just a couple of points to notice.

    About debugging, one can (agreed, it takes some extra work) add asserts -- yes, assert -- in the stored program logic.

    http://stackoverflow.com/questions/9238978/sql-scripts-does-the-equivalent-of-a-define-exist/9276721#9276721

    And up vote me while you are at it. Yes, shameless ;)

    About the trash bin, the performance schema can help.

    SELECT OBJECT_TYPE, OBJECT_SCHEMA, OBJECT_NAME
    FROM events_statements_summary_by_program
    WHERE COUNT_STAR = 0

    Regards,
    -- Marc

  5. shlomi wrote:

    @Marc,
    LOL, precompile for SQL is hack of the day!

    Thanks for the performance_schema pointer. It seems like p_s turns into the answer for the universe.

  6. Mark Leith wrote:

    For debugging (and even tracing usage pre 5.7 and the new stored program instrumentation that Marc mentions above), I don't know why more people don't instrument stored programs like they do applications - add logging.

    I'm a great proponent of having the database manage itself for routine tasks, and have written routines that do things like daily partition management, killing long running transactions, etc. (many things like common schema), and have always built logging in to them.

    The current "framework" I use is a standard InnoDB table with timestamp, program name, log level (debug, notice, warning, critical) and a string message. A procedure and or function that wraps inserting the messages (takes level, program name and message), and a table that stores the config for each program (name, log level set). Each program fetches it's own config and then starts to log at the appropriate level depending on the flag. These can range from logging actions taken, to performance debugging (because the logs have a millisecond resolution timestamp).

    Now, the extra overhead this might add to performance critical routines may not be insignificant, but you can set it to not log too, and skip most of it (other than config check).

    The log table itself is partitioned, and managed with events with a rolling window stored (that log to the log table!)
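    A rough sketch of that scheme (table and procedure names here are illustrative, not my actual implementation):

    ```sql
    -- TIMESTAMP(3) gives millisecond resolution (MySQL 5.6.4+).
    CREATE TABLE sp_log (
      ts TIMESTAMP(3) DEFAULT CURRENT_TIMESTAMP(3),
      program VARCHAR(64),
      level ENUM('debug', 'notice', 'warning', 'critical'),
      message TEXT
    );

    -- Per-program log-level config.
    CREATE TABLE sp_log_config (
      program VARCHAR(64) PRIMARY KEY,
      min_level ENUM('debug', 'notice', 'warning', 'critical') NOT NULL DEFAULT 'notice'
    );

    DELIMITER $$
    CREATE PROCEDURE sp_log_msg(IN p_program VARCHAR(64),
                                IN p_level ENUM('debug', 'notice', 'warning', 'critical'),
                                IN p_message TEXT)
    BEGIN
      -- ENUM + 0 yields the index, so this compares severity:
      -- only log when the message level meets the configured threshold.
      IF (SELECT p_level + 0 >= min_level + 0
          FROM sp_log_config WHERE program = p_program) THEN
        INSERT INTO sp_log (program, level, message)
          VALUES (p_program, p_level, p_message);
      END IF;
    END $$
    DELIMITER ;
    ```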

  7. Justin Swanhart wrote:

    Mark,

    Flexviews does just that. The logging function checks a session variable and only does the inserts in debug mode to reduce overhead.

  8. Bill Karwin wrote:

    I have also observed heavy reliance on stored procedures become a bottleneck for performance.

    The environment had many PHP app servers that were mostly idle, while the single database server whose CPU load was pegged.

    It turned out the project had a policy that all SQL code was mandated to be implemented as stored procedures. Many of the procedures did dynamic SQL, and had to do complex validation of input parameters against INFORMATION_SCHEMA to assure against SQL injection.

    So a disproportionate amount of application load was focused on the single db server, instead of distributed over many app servers where it could be scaled better.

  9. Daniƫl van Eeden wrote:

    I think Stored routines should be used with care and only in cases which would heavily benefit from running close to the data or which can't easily be done otherwise. And also for things which need to be scheduled with the event scheduler.

    There are a few other alternatives to SQL/PSM: Using UDFs (also a great way to crash a server!) or something which is run by a UDF like the V8 UDF from Roland Bouman: http://www.slideshare.net/rpbouman/my-sq-lv8udfs

  10. Shantanu Oak wrote:

    PHP has great testing/ debugging tools and cool IDE is available for free. And I still see a lot of sloppy code. For simple tasks I do always insist on procedures. As noted by Bill Karwin I have seen companies where they had a policy that all SQL (MS-SQL) code was mandated to be implemented as stored procedures. I do not agree with such an aggressive view.

  11. Bob Hairgrove wrote:

    Good article with food for thought! Here are my 2 cents:

    Stored procedures are really important for implementing security features IMHO. For example, you can give your MySQL user which logs into the database from a web application just SELECT and EXECUTE privileges. Any CRUD operations will have to go through SP's running with DEFINER privileges (assuming that the definer has CRUD privileges), but for most adequately normalized schemata, these usually require operations on multiple tables which need to be executed within a transaction as a logical unit. Keeping that logic inside a SP makes application development much simpler IMHO and eliminates the need for CRUD privileges at the table level. SELECT statements can be executed normally by client application code since they are read-only. And since MySQL lacks the ROLE features of Oracle, PostgreSQL et al., you can mimic that by implementing login validation checks within a SP.
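    A minimal illustration of that setup (all names are hypothetical):

    ```sql
    -- The web application's account can read and execute, but not modify tables directly.
    GRANT SELECT, EXECUTE ON app_db.* TO 'web_user'@'%';

    DELIMITER $$
    -- Runs with the definer's privileges, so web_user needs no INSERT/UPDATE grants;
    -- the multi-table write happens as one transactional unit inside the SP.
    CREATE DEFINER = 'app_owner'@'localhost'
    PROCEDURE place_order(IN p_customer_id INT, IN p_amount DECIMAL(10,2))
    SQL SECURITY DEFINER
    BEGIN
      START TRANSACTION;
      INSERT INTO orders (customer_id, amount) VALUES (p_customer_id, p_amount);
      UPDATE customers SET order_count = order_count + 1 WHERE id = p_customer_id;
      COMMIT;
    END $$
    DELIMITER ;
    ```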

    And speaking of table-level access, dealing with adding and deleting columns in a table, etc. -- here is where you would find views useful. Views and SP's provide the best means of implementation and data abstraction.
