Limiting table disk quota in MySQL

Question asked by a student: is there a way to limit a table’s quote on disk? Say, limit a table to 2GB, after which it will refuse to grow? Note that the requirement is that rows are never DELETEd. The table must simply refuse to be updated once it reaches a certain size.

There is no built-in way to limit a table’s quota on disk. First thing to observe is that MySQL has nothing to do with this. It is entirely up to the storage engine to provide with such functionality. The storage engine is the one to handle data storage: how table and keys are stored on disk. Just consider the difference between MyISAM’s .MYD & .MYI to InnoDB’s shared tablespace ibdata1 to InnoDB’s file-per table .ibd files.

The only engine I know of that has a quota is the MEMORY engine: it accepts the max_heap_table_size, which limits the size of a single table in memory. Hrmmm… In memory…

Why limit?

I’m not as yet aware of the specific requirements of said company, but this is not the first time I heard this question.

The fact is: when MySQL runs out of disk space, it goes with a BOOM. It crashed ungracefully, with binary logs being out of sync, replication being out of sync. To date, and I’ve seen some cases, InnoDB merely crashes and manages to recover once disk space is salvaged, but I am not certain this is guaranteed to be the case. Anyone?

And, with MyISAM…, who knows?

Rule #1 of MySQL disk usage: don’t run out of disk space.

Workarounds

I can think of two workarounds, none of which is pretty. The first involves triggers (actually, a few variations for this one), the second involves privileges. Continue reading » “Limiting table disk quota in MySQL”

Thoughts and ideas for Online Schema Change

Here’s a few thoughts on current status and further possibilities for Facebook’s Online Schema Change (OSC) tool. I’ve had these thoughts for months now, pondering over improving oak-online-alter-table but haven’t got around to implement them nor even write them down. Better late than never.

The tool has some limitations. Some cannot be lifted, some could. Quoting from the announcement and looking at the code, I add a few comments. I conclude with a general opinion on the tool’s abilities.

“The original table must have PK. Otherwise an error is returned.”

This restriction could be lifted: it’s enough that the table has a UNIQUE KEY. My original oak-online-alter-table handled that particular case. As far as I see from their code, the Facebook code would work just as well with any unique key.

However, this restriction is of no real interest. As we’re mostly interested in InnoDB tables, and since any InnoDB table should have a PRIMARY KEY, we shouldn’t care too much.

“No foreign keys should exist. Otherwise an error is returned.”

Tricky stuff. With oak-online-alter-table, changes to the original table were immediately reflected in the ghost table. With InnoDB tables, that meant same transaction. And although I never got to update the text and code, there shouldn’t be a reason for not using child-side foreign keys (the child-side is the table on which the FK constraint is defined).

The Facebook patch works differently: it captures changes and writes them to a delta table,  to be later (asynchronously) analyzed and make for a replay of actions on the ghost table. Continue reading » “Thoughts and ideas for Online Schema Change”

Triggers Use Case Compilation, Part III

The previous two parts have looked at some solutions offered by triggers. Let’s look now at some wishful triggers solutions, which are currently unavailable because of triggers limitations.

Triggers Use Case Compilation, Part I

Triggers Use Case Compilation, Part II

Limitations and wishful features

Triggers are slow

The overhead of adding triggers is usually an even breaker. But I would like to believe speed will improve in time!

Triggers cannot act on the same table which activated them.

A thing I would like to do is have a rotating table. A log table is a perfect example: I only want to store logs up to 7 days back, or up to 1M rows. ON INSERT, (or once every 1000 inserts or so), I wish to remove oldest rows. This is not possible today since I can’t DELETE rows from the same table which caused the ON INSERT trigger to run. It can’t be hacked by calling on another table, then doing a circular trigger trick. MySQL will raise an error on run time, complaining about a loop.

Continue reading » “Triggers Use Case Compilation, Part III”

Triggers Use Case Compilation, Part II

In Triggers Use Case Compilation, Part I, I’ve demonstrated some triggers use scenarios.

We continue our examples of triggers usage.

Counters and aggregations bookkeeping

Consider the City table: each city belongs to a certain country. Some questions we may be interested in are:

  • How many cities are there per country?
  • What’s the sum of cities population per country?
  • What’s the population of the largest city per country?

Answering any of these questions is an easy SQL excercise. But aggregation is required, and full table scan (or full index scan, if we’re lucky) is essentially part of any execution plan. What if we can’t pay the price for these queries? What if we need immediate, or near immediate response?

Continue reading » “Triggers Use Case Compilation, Part II”

Triggers Use Case Compilation, Part I

I’ve run by quite a few triggers lately on production systems. In previous posts, I’ve written about problems solved with triggers. So here’s a compilation of some solutions based on triggers; and some problems which are not (yet?) solvable due to current triggers limitations.

Triggers can be used to:

  • Maintain integrity
  • Enhance security
  • Enhance logging
  • Assist with archiving
  • Restrict table size
  • Manage caching
  • Manage counters

Triggers are not fast. In fact, they can add quite an overhead if misused. Some of the triggers presented here are known to work on real life production systems, though, and work well. But make sure you benchmark before embarking on extensive application changes. Continue reading » “Triggers Use Case Compilation, Part I”

Using triggers to block malicious code: an example

Web applications face constant exploitation attempts. Those with a user base must keep their users’ private data, well… private.

While the MySQL security model allows restricting users access to databases, tables and even columns, it has no built in feature for restricting the rows access within the given table.

One cannot allow a user to only update rows 0 through 99, but restrict that user from updating rows 100 to 199. Such restrictions are usually managed in the application level, by adding a necessary “… AND filtering_column = some_value…”

Many web application have the notion of an ‘admin’ account, or several such accounts, which provide greater control over the application. The ‘admin’ account is one account to which many attacks are targeted. One such attack is an attempt to modify the admin’s password, such that the attacker can later log in with and access restricted data. Continue reading » “Using triggers to block malicious code: an example”

Using memcached functions for MySQL; an automated alternative to Query Cache

There’s a lot of buzz around memcached. memcached is widely used, and has clients for many programming languages and platforms. TangentOrg have developed a memcached client in the form of MySQL UDFs (User Defined Functions).

I wish to discuss the memcached functions for MySQL: if and how they should be used.

Disclaimer: I do not work with memcached functions for MySQL on a production system; all that is written here reflects my opinion on how things should be done.

With memcached functions for MySQL, we can do the following:

SELECT memc_set('mykey', 'The answer is 42');
SELECT memc_get('mykey');

(See my previous post on how to install memcached functions for MySQL).

In what scenario should we use these functions?

I believe memcached is the right tool for the application level. I am less enthusiastic about using it from MySQL. Sure, pushing it down to MySQL centralizes everything. Instead of having all my application code (PHP, Java etc.) access memcached separately, they can all access one single MySQL node, which gets to access memcached. I see two problems with this approach: Continue reading » “Using memcached functions for MySQL; an automated alternative to Query Cache”