Getting rid of huge ibdata file, no dump required, part II

This post continues Getting rid of huge ibdata file, no dump required, part I, where I describe a way of converting your single-tablespace InnoDB database into a file-per-table one, without the pain of exporting and importing everything at once.

In the previous part we put aside the issue of foreign keys. We address it now.

What if my InnoDB tables have foreign keys?

MyISAM does not support them, so you can’t just ALTER an InnoDB table to MyISAM and back into InnoDB and expect everything to work.

Alas, this calls for additional steps (i.e. additional ALTER commands). However, these still fall well under the concept of “do it one table at a time, then take time to recover your breath and replication lag”.
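
The gist, sketched on a sakila table. This is a hedged sketch: the constraint definition below is quoted from the sakila sample database as I recall it; verify against SHOW CREATE TABLE before relying on it:

    -- drop the foreign key (after saving its definition!), convert, then restore
    ALTER TABLE sakila.film DROP FOREIGN KEY fk_film_language;
    ALTER TABLE sakila.film ENGINE=MyISAM;
    -- ... the per-table procedure from part I goes here ...
    ALTER TABLE sakila.film ENGINE=InnoDB;
    ALTER TABLE sakila.film
      ADD CONSTRAINT fk_film_language FOREIGN KEY (language_id)
      REFERENCES language (language_id) ON UPDATE CASCADE;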

Save, drop and restore your Foreign Keys setup

You can use common_schema‘s sql_foreign_keys to get the full listing and CREATE definitions of your foreign keys. For example, assume we use the sakila database:
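
A hedged sketch of retrieving those definitions (the column names here are my assumption of common_schema’s interface; check your installed version):

    -- list DROP/CREATE statements for all of sakila's foreign keys
    -- (column names assumed; verify against your common_schema version)
    SELECT drop_statement, create_statement
      FROM common_schema.sql_foreign_keys
      WHERE TABLE_SCHEMA = 'sakila';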

Getting rid of huge ibdata file, no dump required

You have been told (guilty as charged), that the only way to get rid of the huge InnoDB tablespace file (commonly named ibdata1), when moving to innodb_file_per_table, is to do a logical dump of your data, completely erase everything, then import the dump.

To quickly reiterate, you can only delete the ibdata1 file when no InnoDB tables exist. Delete this file while even a single InnoDB table exists – even a table in its own tablespace – and nothing will ever work again.

The problem with the dump-based solution

The impact of doing a logical dump is often overwhelming. Well, the dump itself may be tolerable, but the restore takes much longer. The real pain is that you can’t do this one table at a time: you have to destroy everything before dropping the ibdata1 file, and then you have to import everything.
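
By contrast, the method described in part I works one table at a time. A minimal sketch of the gist (table names illustrative; the post spells out the full procedure and its caveats):

    -- a sketch only; see part I for the full procedure
    ALTER TABLE my_schema.my_table ENGINE=MyISAM;  -- repeat per table, at your own pace
    -- ... once no InnoDB tables remain: shut down MySQL, remove ibdata1 and
    --     ib_logfile*, enable innodb_file_per_table, restart ...
    ALTER TABLE my_schema.my_table ENGINE=InnoDB;  -- recreated in its own .ibd file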

Perhaps the most common scenario is that we do the changes on a slave, so as not to completely shut down our database. This is nice; no one is aware of the process. However, Houston, we have a problem: we need to make sure the master retains its binary logs for the duration of the entire process.

Auto caching tables

Is there a way to create a caching table, some sort of a materialized view, such that upon selecting from that table, its data is validated/invalidated?

Hint: yes.

But to elaborate the point: say I have some table data_table. Can I rewrite all my queries which access data_table to read from some autocache_data_table, with nothing else changed in the query itself? No caveats, no additional WHEREs, and still have that autocache_data_table provide the correct data, dynamically updated by some rule of our choice?

And: no crontab, no event scheduler, and no funny triggers on data_table? Such that invalidation/revalidation occurs upon SELECT?

Well, yes.

This post is long, but I suggest you read it through to understand the mechanism; it will be worthwhile.
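
To fix the shape of the idea, here is a minimal sketch of the query-facing arrangement only: a freshness check embedded in a view, so callers simply SELECT from autocache_data_table. All names are illustrative, and this is deliberately not the full mechanism. Note the refresh itself cannot naively live inside the function, since MySQL forbids a stored function from modifying a table the invoking statement reads:

    -- illustrative sketch; all names hypothetical, refresh rule up to you
    CREATE TABLE data_table_cache LIKE data_table;
    CREATE TABLE autocache_info (last_refresh TIMESTAMP NOT NULL);
    -- (populate data_table_cache and autocache_info per your refresh rule)

    DELIMITER $$
    CREATE FUNCTION autocache_is_fresh() RETURNS TINYINT
    READS SQL DATA
    BEGIN
      -- fresh if the cache was refreshed within the last 10 minutes
      RETURN COALESCE(
        (SELECT last_refresh > NOW() - INTERVAL 10 MINUTE FROM autocache_info),
        0);
    END $$
    DELIMITER ;

    -- queries change only the table name; a stale cache falls back to live data
    CREATE VIEW autocache_data_table AS
      SELECT * FROM data_table_cache WHERE autocache_is_fresh()
      UNION ALL
      SELECT * FROM data_table WHERE NOT autocache_is_fresh();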

Background

The following derives from my long research on how to provide better, faster and safer access to INFORMATION_SCHEMA tables. It is, however, not limited to that exact scenario, and in this post I provide a simple, general-purpose example. I’ll have more to share about INFORMATION_SCHEMA-specific solutions shortly.

I was looking for a server-side solution which would not require query changes, apart from directing the query to other tables. The solution has to be supported by all standard MySQL installs: no plugins, no special rebuilds.

Self throttling MySQL queries

Recap on the problem:

  • A query takes a long time to complete.
  • During this time it performs a lot of I/O.
  • The query’s I/O overloads the db, making other queries run slow.

I introduce the notion of self-throttling queries: queries that go to sleep, by themselves, throughout their runtime. While sleeping, the query performs no I/O, which means other queries get their chance to execute.

I present two approaches:

  • The naive approach: for every 1,000 rows, the query sleeps for 1 second (see the sketch after this list).
  • The factor approach: for every 1,000 rows, the query sleeps for the amount of time it took to iterate those 1,000 rows (effectively doubling the total runtime of the query).
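
A minimal sketch of the naive approach (table and column names hypothetical): a user-variable counter in the WHERE clause invokes SLEEP(1) once per 1,000 rows. Since SLEEP() returns 0, the added condition is always true and filters nothing:

    -- names hypothetical; the WHERE clause passes every row
    SET @rows := 0;
    SELECT id, data
      FROM huge_table
      WHERE ((@rows := @rows + 1) % 1000 != 0) OR (SLEEP(1) = 0);

The factor approach replaces the constant SLEEP(1) with a sleep proportional to the measured time the last 1,000 rows took.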

Limiting table disk quota in MySQL

Question asked by a student: is there a way to limit a table’s quota on disk? Say, limit a table to 2GB, after which it will refuse to grow? Note that the requirement is that rows are never DELETEd. The table must simply refuse to be updated once it reaches a certain size.

There is no built-in way to limit a table’s quota on disk. The first thing to observe is that MySQL itself has nothing to do with this. It is entirely up to the storage engine to provide such functionality, since the storage engine is the one to handle data storage: how tables and keys are stored on disk. Just consider the difference between MyISAM’s .MYD & .MYI files, InnoDB’s shared tablespace ibdata1, and InnoDB’s file-per-table .ibd files.

The only engine I know of that has a quota is the MEMORY engine: it respects max_heap_table_size, which limits the size of a single table in memory. Hrmmm… In memory…
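
For illustration, a sketch of that quota in action (the 2GB figure echoes the question above; table name hypothetical):

    -- the limit is captured at table creation time
    SET SESSION max_heap_table_size = 2 * 1024 * 1024 * 1024;
    CREATE TABLE quota_demo (id INT, payload VARCHAR(255)) ENGINE=MEMORY;
    -- once ~2GB is reached, INSERTs fail with "The table 'quota_demo' is full"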

Why limit?

I’m not yet aware of the specific requirements of said company, but this is not the first time I’ve heard this question.

The fact is: when MySQL runs out of disk space, it goes BOOM. It crashes ungracefully, with binary logs out of sync and replication out of sync. To date, in the cases I’ve seen, InnoDB merely crashes and manages to recover once disk space is salvaged, but I am not certain this is guaranteed to be the case. Anyone?

And, with MyISAM…, who knows?

Rule #1 of MySQL disk usage: don’t run out of disk space.

Workarounds

I can think of two workarounds, neither of which is pretty. The first involves triggers (actually, a few variations on this one), the second involves privileges.
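
As a taste of the trigger variation, a hedged sketch (names hypothetical; requires MySQL 5.5+ for SIGNAL, and INFORMATION_SCHEMA size figures are approximate for InnoDB, so the quota is soft):

    -- hypothetical table name; the 2GB threshold is illustrative
    DELIMITER $$
    CREATE TRIGGER limited_table_quota BEFORE INSERT ON limited_table
    FOR EACH ROW
    BEGIN
      DECLARE table_bytes BIGINT;
      SELECT DATA_LENGTH + INDEX_LENGTH INTO table_bytes
        FROM INFORMATION_SCHEMA.TABLES
        WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'limited_table';
      IF table_bytes >= 2*1024*1024*1024 THEN
        SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'table quota exceeded';
      END IF;
    END $$
    DELIMITER ;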

Personal observation: more migrations from MyISAM to InnoDB

I’m witnessing an increase in the planning, confidence & execution of MyISAM to InnoDB migrations.

How much can a single consultant observe? I agree Oracle should not base its PR on my experience alone. But I find that:

  • More companies are now familiar with InnoDB than used to be.
  • More companies are interested in migrating to InnoDB than used to be.
  • More companies feel such a migration is safe.
  • More companies start up with an InnoDB-based solution than with a MyISAM-based solution.

This is the way I see it. No doubt, the Oracle/Sun deal made its impact. The fact that InnoDB is no longer a 3rd party; the fact that Oracle invests in InnoDB and in no other engine (Falcon is down, no real development on MyISAM); the fact that InnoDB is to be the default engine: all these put companies at ease with migration.

A MyISAM backup is blocking as read-only, including mysqldump backup

Actually, this is (almost) all I wanted to say. This is intentionally posted with all related keywords in the title, in the hope that a related search on Google will surface this post on the first page.

I’m still encountering companies that use MyISAM as their storage engine and are unaware that their nightly backup actually blocks their application, basically rendering their product unavailable for long minutes to hours on a nightly basis.

So this is posted as a warning for those who were not aware of this fact.

There is no hot (non-blocking) backup for MyISAM. The closest thing is a file system snapshot, but even this requires flushing of tables, which may take a while to complete. If you must have a hot backup, then either use replication – and take the risk of the slave not being in complete sync with the master – or use another storage engine, i.e. InnoDB.
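
The snapshot flow is roughly this (a sketch):

    FLUSH TABLES WITH READ LOCK;  -- waits for running statements, then blocks writes
    -- take the file system snapshot (e.g. LVM) here, then release:
    UNLOCK TABLES;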

The depth of an index: primer

InnoDB and MyISAM use B+ and B trees, respectively, for indexes (InnoDB also has an internal adaptive hash index).

In both these structures, the depth of the index is an important factor. When looking for an indexed row, a search is made on the index, from root to leaves.

Assuming the index is not in memory, the depth of the index represents the minimal cost (in I/O operations) of an index-based lookup. Of course, most of the time we expect large portions of the index to be cached in memory. Even so, the depth of the index is an important factor: the deeper the index, the worse it performs, since there are simply more lookups on index nodes.

What affects the depth of an index?

There are quite a few structural issues, but it boils down to two important factors:

  1. The number of rows in the table: obviously, more rows make for a larger index, and larger indexes grow in depth.
  2. The size of the indexed column(s). An index on an INT column can be expected to be shallower than an index on a CHAR(32) column (on a very small number of rows they may have the same depth, so we’ll assume a large number of rows); see the back-of-envelope below.

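A back-of-envelope on factor 2 (all figures here are rough assumptions, not measurements): with 16KB InnoDB pages,

    fan-out ≈ usable page bytes / bytes per index entry
    INT key:      ~15,000 / ~15 ≈ 1,000  →  1,000^3 ≈ 10^9 rows at depth 3
    CHAR(32) key: ~15,000 / ~45 ≈   330  →    330^3 ≈ 3.6×10^7 rows at depth 3

so the wider key needs a fourth level long before the INT index does.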

LOCK TABLES in MyISAM is NOT a poor man’s transactions substitute

I get to hear that a lot: that LOCK TABLES with MyISAM is some sort of replacement for transactions; some model we can work with which gives us ‘transactional flavor’.

It isn’t, and here’s why.

When we speak of a transactional database/engine, we check its ACID compliance. So let’s break down ACID and see what LOCK TABLES provides us with:

  • A: Atomicity. MyISAM does not provide atomicity. If we have LOCK TABLES followed by two statements, then closed by UNLOCK TABLES, a crash between the two statements will leave the first one applied and the second one not applied. No mechanism ensures an “all or nothing” behavior (see the sketch after this list).
  • C: Consistency. An error in a statement would roll back the entire transaction in a transactional database. This won’t work with MyISAM: every statement is “committed” immediately.
  • I: Isolation. Without LOCK TABLES, working with MyISAM resembles the READ UNCOMMITTED (dirty read) isolation level. With LOCK TABLES – it depends. If you were to use LOCK TABLES … WRITE on all tables in all statements, you would get the SERIALIZABLE isolation level. Actually, it would be more than serializable: it would be truly serial.
  • D: Durability. Did the INSERT succeed? And did the power go down just after? MyISAM provides no guarantees that the data will be there.
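
To make the atomicity point concrete, here’s the sketch (hypothetical MyISAM table):

    -- hypothetical table; assume ENGINE=MyISAM
    LOCK TABLES account WRITE;
    UPDATE account SET balance = balance - 100 WHERE id = 1;
    -- a crash right here leaves the first UPDATE applied and the second lost;
    -- nothing will ever roll the first one back
    UPDATE account SET balance = balance + 100 WHERE id = 2;
    UNLOCK TABLES;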

MySQL User Group Meetings in Israel

This is a short note that the MySQL User Group Meetings in Israel have been established (well, re-established after a very long break).

Thanks to Eddy Resnick from Sun Microsystems Israel, who has set up the meetings. So far we’ve had two successful meetings, and we intend to have more! The first was in Sun’s offices in Herzlia; the second, held last week, was at Interbit (a MySQL training center) in Ramat Gan. We hope to hold these meetings on a monthly basis, and the next ones are expected to be held at Interbit.

A new (blessed) law in Israel forbids us from sending invitations to these meetings via email without prior consent of the recipient (the law was passed as a means of stopping spam). We do realize there are many users out there who would be interested in these meetings. For those users: please stay tuned to Interbit’s website, where future meetings will be published – or just give them a call!

It was my honor to present a short session, one of three in this last meeting. The other presenters were Erad Deutch, who presented “MySQL Success Stories”, and Moshe Kaplan, who presented “Sharding Solutions”. I presented “MyISAM & InnoDB Tuning Fundamentals”, where I laid down the basics behind parameter tuning for these storage engines.

As per audience request, here’s the presentation in PDF format:

I intend to give sessions in future meetings, and have already started working on my next one. So please come; it’s a fun way to spend a nice afternoon. See you there!