Backup – code.openark.org
Blog by Shlomi Noach

Announcing orchestrator-agent
Mon, 13 Oct 2014

orchestrator-agent is a sidekick, complementary project to orchestrator. It implements a daemon service on one’s MySQL hosts, which communicates with and accepts commands from orchestrator. It was built with the original purpose of providing an automated solution for provisioning new or corrupted slaves.

It was built by Outbrain, with Outbrain’s specific use case in mind. While we release it as open source, only a small part of its functionality will appeal to the public (this is why it’s not strictly part of the orchestrator project, which is a general-purpose, wide-audience solution). Nevertheless, it is a simple daemon implementation, one that can be easily extended by the community. The project is open for pull requests!

A quick breakdown of orchestrator-agent is as follows:

  • Executes as a daemon on Linux hosts
  • Invokes OS commands (via bash)
  • Does not directly interact with a MySQL server running on that host (does not connect via mysql credentials)
  • Expects a single MySQL service on host
  • Can control the MySQL service (e.g. stop, start)
  • Is familiar with LVM layer on host
  • Can take LVM snapshots, mount snapshots, remove snapshots
  • Is familiar with the MySQL data directory, disk usage, file system
  • Can send snapshot data from a mounted snapshot on a running MySQL host
  • Can prepare data directory and receive snapshot data from another host
  • Recognizes local/remote datacenters
  • Controlled by orchestrator, two orchestrator-agents implement an automated and audited solution for seeding a new/corrupted MySQL host based on a running server.

Offline, hard-copy backups aside, Outbrain implements quick backups via LVM snapshots. Some slaves (depending on machine capacity, datacenter etc.) take the role of snapshot servers. These slaves take an LVM snapshot once per day, and keep it open, unmounted. At any given time we might have multiple open snapshots on a snapshot server. At a few minutes’ notice, we are able to restore MySQL to its state of yesterday, the day before, two days before, and so on. This only requires mounting the desired snapshot and starting MySQL over the mounted volume.
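For illustration, the restore flow looks roughly like this; a minimal sketch assuming a hypothetical volume group vg_data, a day-old snapshot named mysql_snap_20141012 and a standard Linux service layout — not the exact commands orchestrator-agent runs:

# stop the running MySQL service before switching data directories
service mysql stop
# mount the existing, day-old LVM snapshot
mkdir -p /mnt/mysql_snap
mount /dev/vg_data/mysql_snap_20141012 /mnt/mysql_snap
# start MySQL over the mounted snapshot volume
mysqld_safe --datadir=/mnt/mysql_snap/mysql &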

We have in fact put this method to practice on at least one major occasion last year, which proved it to be of significant worth: we were able to undo a compromise to a dataset within minutes of taking action.

But we also use these snapshots on a daily basis, putting them constantly to practice (hence validating this backup method routinely): we provision new MySQL servers based on those snapshots. A new or corrupted server can be seeded with snapshot data (preferably from a host in the same datacenter). We don’t need to perform a backup right now; we can take the one from tonight.

There is some fuss around this: we need to verify MySQL is not running on the target host, clean up the target host, choose the best source host, choose the best snapshot on the source host, mount it, deliver the data, accept the data, do post-copy cleanup, start the MySQL service on the target host… Or we might choose to take a fresh, immediate snapshot from a server not configured as a snapshot server… We used to do all of this with shell scripts, but realized this would not sustain us.

Orchestrator-agent was invented with this particular use case in mind, and orchestrator was extended to support these operations. Orchestrator can act as the controller of a seed/provisioning operation, exchanging information with the orchestrator-agents and coordinating two agents to send/receive the data.

The agent still relies on some external commands (via bash) to implement some of its functionality; to some extent, we have broken down complex shell scripts and wrapped them anew, now controlled and audited by the agent. This will slowly change as we move more code into orchestrator-agent itself.

Should you use orchestrator-agent?

If, like us, you use LVM snapshots, at least as a partial form of backup – bingo!

Otherwise, orchestrator-agent can assist in stopping/starting the MySQL service, tailing the error log, and simple stuff like that; probably too much fuss for such basic operations.

But it can be easily extended to support other forms of backup (though do note that orchestrator-agent does not and should not perform scheduled backups; it merely manipulates existing ones); so if you would like to contribute code to orchestrator-agent, please let us know! It would be nice to have a solution that works for multiple use cases.

Get it

orchestrator-agent is released under the Apache 2.0 license, and is free to use.

Pre-compiled binaries, as well as RPM and deb packages, can be found on the releases page.

 

A new MySQL backups temperature scale, with showers
Tue, 26 Jun 2012

We’re used to Cold, Warm and Hot backups. This scale of three temperatures does not quite reflect the impact of backups on your MySQL database.

In this post I offer a new backup temperature scale, and (somewhat seriously) compare it with showers. Call it the backup shower scale.

A database backup is like a shower: the colder it is, the less time you want to spend doing it.

Cold

A cold backup requires taking your database down (i.e. stopping the service).

Example: file system copy

This can work well for replicating slaves, which may not be required for normal operation. You take the slave down, turn off the service, make your backup, turn everything on again, and let the slave catch up. Just make sure its master retains all the necessary binary logs.
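A minimal sketch of such a cold slave backup, assuming a stock Linux layout (service name mysql, datadir /var/lib/mysql) and a local /backup target; all names are placeholders:

mysql -e "STOP SLAVE"                               # stop replication cleanly
service mysql stop                                  # this is the "cold" part
cp -a /var/lib/mysql /backup/mysql-$(date +%Y%m%d)  # plain file system copy
service mysql start
mysql -e "START SLAVE"                              # the slave now catches up from the master's binary logs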

A friend was staying at my place and was lecturing me on the benefits of cold showers; how it was good for your health. A couple hours later, preparing for bed, I hear him exclaiming from the bathroom: “Whaaaaa! There’s no hot water!”

Warm

A backup which requires locking down your database as read-only.

Examples: MyDumper, mysqlhotcopy (MyISAM only), mysqldump (non-transactional)

Again, this may do well on slaves. As for working out a warm backup on the master: I don’t know. Is a mildly-warm shower really any better than a cold one?

Not quite hot

A backup which is supposedly non-interruptive to normal operation. You are allowed to keep writing to the database. But things are not quite as they seem.

Example: mysqldump --single-transaction

mysqldump can make a backup within a single open transaction. With InnoDB this means a consistent snapshot of the data. But a long-lived open transaction has a cost: under MVCC, old row versions cannot be purged back into the baseline while the backup’s transaction still needs them, so they accumulate. Eventually there is so much accumulated overhead that your database is as good as dead. mysqldump does not work well for very large databases under heavy load.
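For reference, a typical invocation; the extra flags are common companions I tend to add, not requirements:

mysqldump --single-transaction --quick --routines my_db > my_db.sql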

[UPDATE: As per Domas’ clarification, MyDumper falls under this category, as well]

In shower terms, you have reasonably hot water for some time, but it eventually runs cold.

Hot

A hot backup is one that does not impose any constraints on the database itself. But it may impose load on other components, such as the disks, memory or CPU. This leads to an implicit impact on MySQL’s performance.

An LVM snapshot works this way. With LVM snapshots, writes to the database trigger copy-on-write at the block level. The disk is busier, and MySQL has to compete for disk use. An LVM backup works just fine, but has a noticeable impact on the database’s ability to keep up as before.

You may have enough hot water for your shower, but there’s not enough pressure. The minute the dishwasher starts working, your shower turns to a sad dripping.

Searing hot

A searing hot backup is one that does not interfere with database operation, explicitly or implicitly, beyond a reasonable extent.

XtraBackup with throttling is such a backup. The impact can be made so low as to be unnoticeable. In return, you must be willing to let the backup take a longer time to complete.
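A hedged example of such a throttled run with Percona’s tooling; the value of 10 I/O operations per second and the target directory are arbitrary placeholders:

innobackupex --throttle=10 /backup/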

Upgrading to Barracuda & getting rid of huge ibdata1 file
Tue, 15 Feb 2011

Some of this is old stuff, but more people are now converting to the InnoDB plugin, so as to enjoy table compression and performance boosts. The same holds for people converting to Percona’s XtraDB. The plugin’s new Barracuda features require innodb_file_per_table: no more shared tablespace file.

So your ibdata1 file is some 150GB, and it won’t shrink. Really, it won’t shrink. You set innodb_file_per_table=1, do ALTER TABLE t ENGINE=InnoDB (optionally ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8), and you get all your tables in file-per-table .ibd files.

But the original ibdata1 file is still there. It has to be there; don’t delete it! It contains more than just your old data.

InnoDB tablespace files never shrink; it’s an old-time annoyance. The only way around it, if you need the space, is to completely drop them and start afresh. That’s one of the things that makes file-per-table so nice: an ALTER TABLE actually creates a new tablespace file and drops the original one.

The procedure

The procedure is somewhat painful (a rough sketch follows the list):

  • Dump everything logically (either use mysqldump, mk-parallel-dump, or do it your own way)
  • Erase your data (literally, delete everything under your datadir)
  • Generate a new empty database
  • Load your dumped data.
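A rough sketch of these four steps; the service name, paths and dump location are placeholders, and a real run deserves far more care (verify the dump before erasing anything!):

mysqldump --all-databases --routines --triggers > /backup/full_dump.sql
service mysql stop
rm -rf /var/lib/mysql/*          # erase everything under the datadir
mysql_install_db --user=mysql    # generate a new, empty database
service mysql start              # with innodb_file_per_table=1 already set in my.cnf
mysql < /backup/full_dump.sql    # load the dumped data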

Using replication

Replication makes this less painful. Set up a slave, and have it keep up with the master.

  • Stop your slave.
  • Make sure to back up the replication position (e.g. write down the SHOW SLAVE STATUS output in a safe location, or copy the master.info file).
  • Work out the dump-erase-generate-load steps on the slave.
  • Reattach the slave to the master using saved data.

For this to succeed you must keep enough binary logs on the master for the entire dump-load period, which could be lengthy.

Upgrading to Barracuda

If you wish to upgrade your InnoDB tables to Barracuda format, my advice is this:

  1. Follow the steps above to generate a file-per-table working slave
  2. Stop the slave
  3. Configure skip_slave_start
  4. Restart MySQL
  5. One by one, do the ALTER TABLE into Barracuda format (ROW_FORMAT=DYNAMIC or ROW_FORMAT=COMPRESSED)

Note that if you’re about to use table compression, the ALTER statements become considerably slower the better the compression is (i.e. the smaller the KEY_BLOCK_SIZE).
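For concreteness, a sketch of steps 3-5 for a single table; the table name and KEY_BLOCK_SIZE are placeholders:

# my.cnf: make sure replication does not resume on restart
[mysqld]
skip_slave_start

-- after restarting MySQL, convert one table at a time:
ALTER TABLE my_large_table ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;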

If your dataset is very large, and you can’t keep so many binary logs, you may wish to break step 5 above into:

  • ALTER a large table
  • Restart MySQL
  • Start slave, wait for it to catch up
  • Restart MySQL again

and do the same for all large tables.

Why all these restarts?

I’ve been upgrading to Barracuda for a long time now, and I have clearly noticed that an ALTER into a COMPRESSED format works considerably slower after the slave has done some “real work”. This in particular relates to the last, “renaming table” stage. Earlier InnoDB plugin versions had a bug which made this stage hang; it was solved. But this last, weird stage still takes time: the new replacement table is complete, it has actually been renamed in place of the old table, the old table has been renamed into something like “#sql-12345.ibd”, and all that remains is to drop it, and… well, it takes time.

My observation is that it works faster on a freshly started server, which is why I take the bother to restart MySQL before each large table conversion.

An argument for using mysqldump
Tue, 09 Nov 2010

I fully agree with Morgan’s An argument for not using mysqldump. This post does not come to contradict it, but rather to shed a positive light on mysqldump.

I usually prefer an LVM-snapshot-based backup, or using XtraBackup. And, with databases as large as dozens of GB and above, I consider mysqldump a poor alternative: poor in runtime, poor in overhead while taking the backup.

However, once in a while I am reminded that mysqldump just works.

As a recent example, I had a server which was killed after an ALTER TABLE statement hung forever (the table was already ALTERed, but the old one was never dropped). The old table’s data still hung around the file system, but was not recognized by InnoDB. Trying out DISCARD TABLESPACE did not do the job, and eventually the file was dropped.

So far, reasonable. InnoDB would complain about some table it never recognized in the first place, but all would work. That is, until backups were concerned. With innobackup or XtraBackup the restore would fail on some internal problem. LVM would work, but would only copy+paste the problem: innobackup would never again be usable on this database.

It turned out that 120GB of InnoDB compressed data (roughly 250GB uncompressed) would dump with --single-transaction in a matter of 4 hours, and would restore in a matter of some 20 hours. A whole lot more than the 3 hours total an LVM backup of that database would take. But the data would load well; no missing tablespaces.

I’ve had similar incidents in the past. Not to mention the issue of compressing the shared tablespace file.

There’s something about being able to say “I’m not sure how long this is going to take; maybe a day or two. But in the end, we will have problems P1, P2 & P3 resolved”.

I like the clean state you get from a mysqldump restore.

mylvmbackup HOWTO: minimal privileges & filesystem copy
Tue, 17 Aug 2010

This HOWTO discusses two (unrelated) issues with mylvmbackup:

  • The minimal privileges required to take MySQL backups with mylvmbackup.
  • Making a (non-compressed) file system copy of one’s data files.

Minimal privileges

Some just give mylvmbackup the root account, which is far too permissive. We now consider what the minimal requirements of mylvmbackup are.

The queries mylvmbackup issues are:

  • FLUSH TABLES
  • FLUSH TABLES WITH READ LOCK
  • SHOW MASTER STATUS
  • SHOW SLAVE STATUS
  • UNLOCK TABLES

Both SHOW MASTER STATUS & SHOW SLAVE STATUS require either the SUPER or the REPLICATION CLIENT privilege. Since SUPER is more powerful than we need, we choose REPLICATION CLIENT.

The FLUSH TABLES variants and UNLOCK TABLES require the RELOAD privilege.

However, we are not done yet. mylvmbackup connects to the mysql database, which means we must have some privilege there, too. We choose the SELECT privilege.

Finally, here are the commands to create a mylvmbackup user with minimal privileges:

CREATE USER 'mylvmbackup'@'localhost' IDENTIFIED BY '12345';
GRANT RELOAD, REPLICATION CLIENT ON *.* TO 'mylvmbackup'@'localhost';
GRANT SELECT ON mysql.* TO 'mylvmbackup'@'localhost';

In the mylvmbackup.conf file, the corresponding lines are:

[mysql]
user=mylvmbackup
password=12345
host=localhost

Filesystem copy

By default, mylvmbackup creates a .tar.gz compressed backup file of your data. This is good if the reason you’re running mylvmbackup is to, well, make a backup. However, as with all backups, one may be making the backup so as to set up a replication server. In this case you don’t really want compressed data: you want the data extracted on the replication server, just as it is on the original host.

mylvmbackup supports backing up the files using rsync.

To copy MySQL data to a remote host, configure the following in the mylvmbackup.conf file:

[fs]
backupdir=shlomi@backuphost:/data/backup/mysql
[misc]
backuptype=rsync

You may be prompted for a password, unless your user’s public key is stored on the remote host.

Normally, rsync is thought of as a remote-sync tool, but it works just as well on local file systems. If you have a remote directory mounted on your file system (e.g. with NFS), you can take advantage of this:

[fs]
backupdir=/mnt/backup/mysql
[misc]
backuptype=rsync

Voila! Your backup is complete.

Tips for taking MySQL backups using LVM
Tue, 03 Aug 2010

LVM uses copy-on-write to implement snapshots. Whenever you write data to some page, LVM copies the original page (the way it looked when the snapshot was taken) to the snapshot volume. The snapshot volume must be large enough to accommodate all pages written to for the duration of the snapshot’s lifetime. In other words, you must be able to copy the data somewhere outside (tape, NFS, rsync, etc.) in less time than it takes for the snapshot to fill up.

While LVM allows for hot backups of MySQL, it still has an impact on the disks. An LVM snapshot backup may not go unnoticed by the MySQL users.

Some general guidelines for making life easier with LVM backups follow.

Lighter, longer snapshots

If you’re confident that you have enough space on your snapshot volume, you may take the opportunity to allow for a longer backup time. Why? Because you can then reduce the stress on the file system. Use ionice when copying your data from the snapshot volume:

ionice -c 2 cp -R /mnt/mysql_snapshot /mnt/backup/daily/20100719/

[Update: this only applies with the cfq I/O scheduler; thanks, Vojtech]
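To check which scheduler is active for a given disk (sda is a placeholder for your device; the bracketed entry is the active one):

cat /sys/block/sda/queue/scheduler
noop anticipatory deadline [cfq]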

Are you running out of space?

Monitor the snapshot’s allocated size. If there’s just one snapshot, do it like this:

lvdisplay | grep Allocated

 Allocated to snapshot  3.63%

Don’t let it reach 100%.
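A simple way to keep an eye on it while the copy is running; the volume path is a placeholder:

watch -n 60 "lvdisplay /dev/vg_data/mysql_snapshot | grep Allocated"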

Avoid running out of space

To make sure you don’t run out of snapshot allocated space, stop all administrative scripts for the duration of the backup:

  • Are you running your weekly purging of old data? You will be writing a lot of pages, and all will have to fit in the snapshot.
  • Building your reports? You may be creating large temporary tables; make sure these are not on the snapshot volume.
  • Rebuilding your Sphinx fulltext index? Make sure it is not on the snapshot volume, or postpone till after backup.

You will gain not only snapshot space, but also faster backups.

Someone did the job before you

Use mylvmbackup, the MySQL LVM backup script by Lenz Grimmer. Or do it manually, following this old-yet-relevant post by Peter Zaitsev.

A MyISAM backup is blocking as read-only, including mysqldump backup
Tue, 18 May 2010

Actually, the title is (almost) all I wanted to say. It is intentionally packed with all related keywords, in the hope that a related Google search will turn up this post on the first page of results.

I still encounter companies who use MyISAM as their storage engine and are unaware that their nightly backup actually blocks their application, basically rendering their product unavailable for long minutes to hours on a nightly basis.

So this is posted as a warning for those who were not aware of this fact.

There is no hot (non-blocking) backup for MyISAM. The closest would be a file system snapshot, but even this requires flushing of tables, which may take a while to complete. If you must have a hot backup, then either use replication – and take the risk of the slave not being in complete sync with the master – or use another storage engine, namely InnoDB.
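For reference, even the snapshot approach needs a global read lock held while the snapshot is created; here is a hedged sketch with placeholder LVM names (note the lock is released when the session disconnects, so keep it open):

mysql> FLUSH TABLES WITH READ LOCK;
-- in another shell, while this session holds the lock:
--   lvcreate --snapshot --size 10G --name mysql_snap /dev/vg_data/mysql
mysql> UNLOCK TABLES;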

Defined your MySQL backup & recovery plan recently?
Wed, 17 Feb 2010

Following up on Ronald Bradford’s Checked your MySQL recovery process recently? post, I wish to add a prequel.

To see whether you have a clear definition of your backup requirements, ask yourself these questions:

  • Is there a backup/restore plan?
  • Is there a written backup/restore plan?
  • How fast do you need to recover a backup? What’s the longest downtime you will allow from the point of failure to the point of restored, functional database?
  • How much data are you willing to lose in case of crash? A second’s worth of data? An hour’s worth? A day’s worth? None?
  • Are you willing to let the database become read-only while taking the backup? Or go completely down?
  • Are you willing to take the risk that the backup will not be 100% identical to the live data? (Backing up your slave carries this risk)
  • Is your manager willing to accept everything that you are willing to accept?
  • Is the backup plan known to the management team, or do they just know that “the database has backups“?

The above checklist is by no means complete.

I have a vivid memory of a very good sysadmin who failed on the last two points. He had some very sour days when recovering a lost file from tape took much longer than some contract could afford.

I have found that technical people rarely share the same views as marketing/management. Make sure to consult with the management team; they will have a clearer view of what the company can and cannot afford.

On restoring a single table from mysqldump
Tue, 01 Dec 2009

Following Restore one table from an ALL database dump and Restore a Single Table From mysqldump, I would like to add my own thoughts and comments on the subject.

I also wish to note performance issues with the two suggested solutions, and offer improvements.

Problem relevance

While the problem is interesting, I just want to note that it is relevant only within specific database dimensions. Too small, and it doesn’t matter how you solve it (e.g. just open vi/emacs and copy+paste). Too big, and it would not be worthwhile to restore from mysqldump anyway. I would suggest that the problem is interesting in the neighborhood of a few dozen GB worth of data.

Problem recap

Given a dump file (generated by mysqldump), how do you restore a single table, without making any changes to other tables?

Let’s review the two referenced solutions. I’ll be using the employees db on mysql-sandbox for testing. I’ll choose a very small table to restore: departments (only a few rows in this table).

Security based solution

Chris suggests creating a special-purpose account, which only has write (CREATE, INSERT, etc.) privileges on the particular table to restore. Cool hack! But, I’m afraid, not too efficient, for two reasons:

  1. MySQL needs to process all irrelevant queries (ALTER, INSERT, …) only to disallow them due to access violation errors.
  2. Assuming the restore is done from a remote host, we overload the network with all of those irrelevant queries.

Just how inefficient? Let’s time it:

mysql> grant usage on *.* to 'restoreuser'@'localhost';
mysql> grant select on *.* to 'restoreuser'@'localhost';
mysql> grant all on employees.departments to 'restoreuser'@'localhost';

$ time mysql --user=restoreuser --socket=/tmp/mysql_sandbox21701.sock --force employees < /tmp/employees.sql
...
ERROR 1142 (42000) at line 343: INSERT command denied to user 'restoreuser'@'localhost' for table 'titles'
ERROR 1142 (42000) at line 344: ALTER command denied to user 'restoreuser'@'localhost' for table 'titles'
...
(lots of these messages)
...

real    0m31.945s
user    0m6.328s
sys     0m0.508s

So, about 30 seconds to restore a 9-row table.

Text filtering based solution

gtowey offers parsing the dump file beforehand:

  • First, parse with grep, to detect the lines where tables are referenced within the dump file
  • Second, parse with sed, extracting the relevant lines.

Let’s time this one:

$ time grep -n 'Table structure' /tmp/employees.sql
23:-- Table structure for table `departments`
48:-- Table structure for table `dept_emp`
89:-- Table structure for table `dept_manager`
117:-- Table structure for table `employees`
161:-- Table structure for table `salaries`
301:-- Table structure for table `titles`

real    0m0.397s
user    0m0.232s
sys     0m0.164s

$ time sed -n 23,48p /tmp/employees.sql | ./use employees

real    0m0.562s
user    0m0.380s
sys     0m0.176s

Much faster: about 1 second, compared to 30 seconds from above.

Nevertheless, I find two issues here:

  1. A correctness problem: this solution somewhat assumes that there’s only a single table with the desired name. I say “somewhat” since it leaves this for the user to verify.
  2. An efficiency problem: it reads the dump file twice: first parsing it with grep, then with sed.

A third solution

sed is much stronger than presented. In fact, the inquiry made by grep in gtowey’s solution can easily be handled by sed itself:

$ time sed -n "/^-- Table structure for table \`departments\`/,/^-- Table structure for table/p" /tmp/employees.sql | ./use employees

real    0m0.573s
user    0m0.416s
sys     0m0.152s

So, the "/^-- Table structure for table \`departments\`/,/^-- Table structure for table/p" part tells sed to print only those lines starting from the departments table structure and ending at the next table structure (this is for clarity: had departments been the last table, there would have been no next table header, but we could nevertheless solve this using other anchors).

And we only do it in 0.57 seconds: about half the time of the previous attempt.

Now, just to be more correct, we only wish to consider the employees.departments table. So, assuming more than one database was dumped (and, by consequence, there are USE statements in the dump file), we use:

cat /tmp/employees.sql | sed -n "/^USE \`employees\`/,/^USE \`/p" | sed -n "/^-- Table structure for table \`departments\`/,/^-- Table structure for table/p" | ./use employees
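As an aside, the same single-pass extraction can also be written with awk (praised below); this is my own hedged variant, not part of the referenced solutions:

awk '/^-- Table structure for table `departments`/{p=1} /^-- Table structure for table/ && !/`departments`/{p=0} p' /tmp/employees.sql | ./use employees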

Further notes

  • All tests used warmed-up caches.
  • The sharp-eyed readers will notice that departments is the first table in the dump file. Would that give an unfair advantage to the parsing-based restore methods? The answer is no. I’ve created an xdepartments table, located at the end of the dump. The difference in time is negligible and inconclusive; we’re still at ~0.58-0.59 seconds. The effect would be more visible on really large dumps; but then, so would the security-based solution’s overhead.

[UPDATE: see also the following similar post: Extract a Single Table from a mysqldump File]

Conclusion

It is always best to test on large datasets, to get a feel for performance.

It’s best to save MySQL the trouble of parsing & ignoring statements. Scripting utilities like sed, awk & grep have been around for ages, and are well optimized. They excel at text processing.

I’ve used sed many times in transforming dump outputs: for example, to convert MyISAM tables to InnoDB, or to convert Antelope InnoDB tables to Barracuda format. grep & awk are also very useful.

May I recommend, at this point, reading Classic Shell Scripting, a very easy-to-follow book, which covers the most popular command line utilities like grep, sed, awk, sort (and countless more) and shell scripting in general. While most of these utilities are well known, the book excels at providing surprisingly practical, simple solutions to common tasks.

Parameters to use on mysqldump
Mon, 13 Oct 2008

mysqldump is commonly used for making a MySQL database backup or for setting up replication.

As with all MySQL binaries, there are quite a few parameters to mysqldump. Some are just niceties, but some flags are a must. Of course, choosing the parameters to use greatly depends on your requirements, database setup, network capacity etc.

Here is my usual setup for mysqldump. The parameters below apply to an InnoDB-based schema (no MyISAM or MEMORY tables). Parameters can be specified on the command line, or under the [mysqldump] scope in the MySQL configuration file.

mysqldump -u dump_user -p -h db_host --routines --master-data --single-transaction  --skip-add-locks --skip-lock-tables --default-character-set=utf8 --compress my_db

Let’s review these parameters and see their effect:

  • -u or --user: This is the user which initiates the dump. Depending on other parameters, the user may need quite a few privileges, such as SELECT, RELOAD, FILE, REPLICATION CLIENT etc. Since I do not usually allow remote root access into MySQL, I create a temporary user solely for the purpose of the dump (many times it’s a one-time action), restricted to the specific machine from which the dump is run, and provide this user with all necessary permissions (a sketch of such a grant follows this list).
  • -h or --host: I try not to dump from the same machine on which MySQL is running. If I do, I prefer to dump onto a different disk from the one on which the data and log files reside. The dump itself may create a heavy load on the machine (setting locks, performing lots of non-cached IO operations). Since the target of the dump is mostly to create a backup on another machine, or to set up replication on another machine, the dump had better not run on the MySQL machine itself.
  • --routines: It is really an annoyance to have to remember this flag. In contrast to --triggers, which is TRUE by default, the --routines parameter is FALSE by default, which means that if you forget it, you don’t get the stored functions and procedures in your dump.
  • --master-data: I always enable binary logs on the MySQL nodes I work on, even though binary logs may lead to more IO operations (writing binary logs makes for more disk writes, obviously, but also disables some InnoDB optimizations) and may consume more disk space (I once worked with a company which had such a burst of traffic that the binary logs completely filled their disk in less than one day). If binary logs are enabled, the --master-data parameter allows for easy replication setup: the dump includes the CHANGE MASTER TO MASTER_LOG_FILE='...', MASTER_LOG_POS=... statement, so there is no need to do stuff like SHOW MASTER STATUS on the dumped node. Optionally, you can set --master-data=2 to have the statement commented out.
  • --single-transaction --skip-add-locks --skip-lock-tables: When working with transactional-only storage engines (InnoDB is the most popular choice, but new engines are coming: Falcon, PBXT, Transactional-Maria, SolidDB and more), these parameters allow for a non-interruptive backup which does not place read locks on all tables. It is possible to keep reading from and writing to the database while mysqldump is running with --single-transaction. Running in this mode does have its penalty: more IO operations (due to MVCC’s duplication of data while many transactions access the same data for read/write). The server is likely to perform more slowly during the dump.
  • --default-character-set=utf8: I’ve seen more MySQL installations in which world-wide textual data was stored in the Latin1 charset than I can remember. Many developers, who test using plain English data, are not even aware of the issues arising from later changing the data to utf8. But even those who are, are usually unaware of the necessity to configure the character set on a per-connection basis, or for their specific clients (JDBC or PHP connectors, etc.). mysqldump is no different, and if you have non-Latin text in your tables, always remember to set this option.
  • --compress: when dumping to another machine, especially a remote one, using this option to GZIP the data between the MySQL server and the mysqldump client. This will make for more CPU operations, but CPU is usually cheap nowdays, and the compression may well save you hours of network transfer time.
]]>