mysqldump – code.openark.org http://shlomi-noach.github.io/blog/ Blog by Shlomi Noach Wed, 06 Jul 2011 06:41:00 +0000 en-US hourly 1 https://wordpress.org/?v=5.3.3 32412571 Useful sed / awk liners for MySQL https://shlomi-noach.github.io/blog/mysql/useful-sed-awk-liners-for-mysql https://shlomi-noach.github.io/blog/mysql/useful-sed-awk-liners-for-mysql#comments Wed, 06 Jul 2011 06:41:00 +0000 https://shlomi-noach.github.io/blog/?p=3685 Listing some useful sed / awk liners to use with MySQL. I use these on occasion.

sed, awk & grep have many overlapping features. Some simple tasks can be performed by either. For example, stripping empty lines can be performed by either:

grep '.'
awk '/./'
sed '/./!d'
grep -v '^$'
awk '!/^$/'
sed '/^$/d'

It’s a matter of taste & convention which tool and variation to use. So for any script I suggest, there may be many variations, possibly cleaner, shorter; feel free to comment.

mysqldump

The output of mysqldump is in particular useful when one wishes to make transformation on data or metadata.

  • Convert MyISAM tables to InnoDB:
mysqldump | sed -e 's/^) ENGINE=MyISAM/) ENGINE=InnoDB/'

I’ve had several occasion when people said this type of conversion assumes no ‘ENGINE=MyISAM’ snippet exists within row data. This is not so. The ‘^) ENGINE=MyISAM/’ pattern strictly requires that this text is outside row data. No row data begins with a ‘)’. This is a safe conversion.

  • Convert InnoDB to InnoDB plugin, compressed tables:
mysqldump | sed -e 's/^) ENGINE=InnoDB/) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8/'
  • Slice out a specific database (assumes existence of the USE statement):
sed -n "/^USE \`employees\`/,/^USE \`/p"
  • Slice out a specific table:
sed -n "/^-- Table structure for table \`departments\`/,/^-- Table structure for table/p"
  • Combine the above two statements to slice a specific table from a specific database:
sed -n "/^USE \`employees\`/,/^USE \`/p" | sed -n "/^-- Table structure for table \`departments\`/,/^-- Table structure for table/p"

See also On restoring a single table from mysqldump.

my.cnf

Some my.cnf files are just a mess to read. Here’s some normalizing scripts:

  • Strip a my.cnf file from comments, remove blank lines, normalize spaces:
cat my.sandbox.cnf | sed '/^#/d' | sed '/^$/d' | sed -e 's/[ \t]\+//g'
  • Same, but only present [mysqld] section parameters:
cat my.sandbox.cnf | sed -n '/^\[mysqld\]/,/^\[/p' | sed '/^\[/d' | sed '/^#/d' | sed '/^$/d' | sed -e 's/[ \t]\+//g'
  • Only present [mysqld] section parameters, tab delimited (this is useful in exporting and comparing instance parameters):
cat my.sandbox.cnf | sed -n '/^\[mysqld\]/,/^\[/p' | sed '/^\[/d' | sed '/^#/d' | sed '/^$/d' | sed -e 's/[ \t]\+//g' | sed -e 's/=/\t/'
  • Multi-word parameters in my.cnf can be written with either hyphens or underscores. innodb_file_per_table is the same as innodb-file-per-table, as well as innodb_file-per_table. The following normalizes the parameter names to using underscores only, keeping from changing values (e.g. ‘mysql-bin’ parameter value should not change). It isn’t pretty!
cat my.sandbox.cnf | awk -F "=" 'NF < 2 {print} sub("=", "=~placeholder~=") {print}' | awk -F "=~placeholder~=" 'NF < 2 {gsub("-", "_", $0); print} NF==2 {gsub("-", "_", $1); print $1 "=" $2}'
grep “.”
awk ‘/./’
sed ‘/./!d’
grep -v ‘^$’
awk ‘!/^$/’
sed ‘/^$/d’
]]>
https://shlomi-noach.github.io/blog/mysql/useful-sed-awk-liners-for-mysql/feed 10 3685
Upgrading to Barracuda & getting rid of huge ibdata1 file https://shlomi-noach.github.io/blog/mysql/upgrading-to-barracuda-getting-rid-of-huge-ibdata1-file https://shlomi-noach.github.io/blog/mysql/upgrading-to-barracuda-getting-rid-of-huge-ibdata1-file#comments Tue, 15 Feb 2011 08:01:15 +0000 https://shlomi-noach.github.io/blog/?p=3304 Some of this is old stuff, but more people are now converting to InnoDB plugin, so as to enjoy table compression, performance boosts. Same holds for people converting to Percona’s XtraDB. InnoDB plugin requires innodb_file_per_table. No more shared tablespace file.

So your ibdata1 file is some 150GB, and it won’t reduce. Really, it won’t reduce. You set innodb_file_per_table=1, do ALTER TABLE t ENGINE=InnoDB (optionally ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8), and you get all your tables in file-per-table .ibd files.

But the original ibdata1 file is still there. It has to be there, don’t delete it! It contains more than your old data.

InnoDB tablespace files never reduce in size, it’s an old-time annoyance. The only way to go round it, if you need the space, is to completely drop them and start afresh. That’s one of the things so nice about file-per-table: an ALTER TABLE actually creates a new tablespace file and drops the original one.

The procedure

The procedure is somewhat painful:

  • Dump everything logically (either use mysqldump, mk-parallel-dump, or do it your own way)
  • Erase your data (literally, delete everything under your datadir)
  • Generate a new empty database
  • Load your dumped data.

Using replication

Replication makes this less painful. Set up a slave, have it follow up on the master.

  • Stop your slave.
  • Make sure to backup the replication position (e.g. write SHOW SLAVE STATUS on a safe location, or copy master.info file).
  • Work out the dump-erase-generate-load steps on the slave.
  • Reattach the slave to the master using saved data.

For this to succeed you must keep enough binary logs on the master for the entire dump-load period, which could be lengthy.

Upgrading to barracuda

If you wish to upgrade your InnoDB tables to Barracuda format, my advice is this:

  1. Follow the steps above to generate a file-per-table working slave
  2. Stop the slave
  3. Configure skip_slave_start
  4. Restart MySQL
  5. One by one do the ALTER TABLE into Barracuda format (ROW_FORMAT=COMPACT or ROW_FORMAT=COMPRESSED)

Note that if you’re about to do table compression, the ALTER statements become considerably slower the better the compression is.

If your dataset is very large, and you can’t keep so many binary logs, you may wish to break step 5 above into:

  • ALTER a large table
  • Restart MySQL
  • Start slave, wait for it to catch up
  • Restart MySQL again

and do the same for all large tables.

Why all these restarts?

I’ve been upgrading to Barracuda for a long time now. I have clearly noticed that ALTER into a COMPRESSED format works considerably slower after the slave has done some “real work”. This in particular relates to the last “renaming table” stage. There was a bug with earlier InnoDB plugin versions which made this stage hang. It was solved. But it still takes some time for this last, weird stage, where the new replacement table is complete, and it’s actually been renamed in place of the old table, and the old table renamed into something like “#sql-12345.ibd”, and all that needs to be done is have it dropped, and… Well, it takes time.

My observation is it works faster on a freshly started server. Which is why I take the bother to restart MySQL before each large table conversion.

]]>
https://shlomi-noach.github.io/blog/mysql/upgrading-to-barracuda-getting-rid-of-huge-ibdata1-file/feed 16 3304
An argument for using mysqldump https://shlomi-noach.github.io/blog/mysql/an-argument-for-using-mysqldump https://shlomi-noach.github.io/blog/mysql/an-argument-for-using-mysqldump#comments Tue, 09 Nov 2010 04:29:54 +0000 https://shlomi-noach.github.io/blog/?p=3080 I fully agree with Morgan’s An argument for not using mysqldump. This post does not come to contradict it, but rather shed a positive light on mysqldump.

I usually prefer an LVM snapshot based backup, or using XtraBackup. And, with databases as large as dozens of GB and above, I consider mysqldump to be a poor alternative. Poor in runtime, poor in overhead while taking the backup.

However once in a while I get to be reminded that mysqldump just works.

As a recent example, I had a server which was killed after an ALTER TABLE statement hanged forever (table already ALTERed, but old scheme never dropped). The old table data still hanged around the file system, but was not recognized by InnoDB. Trying out DISCARD TABLESPACE did not do the job, and eventually file was dropped.

So far, reasonable. InnoDB would complain about some table it never recognized in the first place, but all would work. That is, until backup was concerned. With innobackup or XtraBackup the restore would fail on some internal problem. LVM would work, but would only copy+paste the problem: innobackup would never again be able to be used on this database.

It turned out a 120GB InnoDB compressed data (roughly 250GB uncompressed) would dump in –single-transaction in a matter of 4 hours and would restore in a matter of some 20 hours. A whole lot more than the 3 hours total it would take for an LVM backup for that database. But the data would load well; no missing tablespaces.

I’ve had similar incidents in the past. Not to mention the issue of compressing shared tablespace file.

There’s something about being able to say “I’m not sure how long this is going to take; maybe a day or two. But in the end, we will have problems P1, P2 & P3 resolved“.

I like the clean state you get from a mysqldump restore.

]]>
https://shlomi-noach.github.io/blog/mysql/an-argument-for-using-mysqldump/feed 9 3080
On restoring a single table from mysqldump https://shlomi-noach.github.io/blog/mysql/on-restoring-a-single-table-from-mysqldump https://shlomi-noach.github.io/blog/mysql/on-restoring-a-single-table-from-mysqldump#comments Tue, 01 Dec 2009 08:25:00 +0000 https://shlomi-noach.github.io/blog/?p=1630 Following Restore one table from an ALL database dump and Restore a Single Table From mysqldump, I would like to add my own thoughts and comments on the subject.

I also wish to note performance issues with the two suggested solutions, and offer improvements.

Problem relevance

While the problem is interesting, I just want to note that it is relevant in very specific database dimensions. Too small – and it doesn’t matter how you solve it (e.g. just open vi/emacs and copy+paste). Too big – and it would not be worthwhile to restore from mysqldump anyway. I would suggest that the problem is interesting in the whereabouts of a few dozen GB worth of data.

Problem recap

Given a dump file (generated by mysqldump), how do you restore a single table, without making any changes to other tables?

Let’s review the two referenced solutions. I’ll be using the employees db on mysql-sandbox for testing. I’ll choose a very small table to restore: departments (only a few rows in this table).

Security based solution

Chris offers to create a special purpose account, which will only have write (CREATE, INSERT, etc.) privileges on the particular table to restore. Cool hack! But, I’m afraid, not too efficient, for two reasons:

  1. MySQL needs to process all irrelevant queries (ALTER, INSERT, …) only to disallow them due to access violation errors.
  2. Assuming restore is from remote host, we overload the network with all said irrelevant queries.

Just how inefficient? Let’s time it:

mysql> grant usage on *.* to 'restoreuser'@'localhost';
mysql> grant select on *.* to 'restoreuser'@'localhost';
mysql> grant all on employees.departments to 'restoreuser'@'localhost';

$ time mysql --user=restoreuser --socket=/tmp/mysql_sandbox21701.sock --force employees < /tmp/employees.sql
...
ERROR 1142 (42000) at line 343: INSERT command denied to user 'restoreuser'@'localhost' for table 'titles'
ERROR 1142 (42000) at line 344: ALTER command denied to user 'restoreuser'@'localhost' for table 'titles'
...
(lot's of these messages)
...

real    0m31.945s
user    0m6.328s
sys     0m0.508s

So, at about 30 seconds to restore a 9 rows table.

Text filtering based solution.

gtowey offers parsing the dump file beforehand:

  • First, parse with grep, to detect rows where tables are referenced within dump file
  • Second, parse with sed, extracting relevant rows.

Let’s time this one:

$ time grep -n 'Table structure' /tmp/employees.sql
23:-- Table structure for table `departments`
48:-- Table structure for table `dept_emp`
89:-- Table structure for table `dept_manager`
117:-- Table structure for table `employees`
161:-- Table structure for table `salaries`
301:-- Table structure for table `titles`

real    0m0.397s
user    0m0.232s
sys     0m0.164s

$ time sed -n 23,48p /tmp/employees.sql | ./use employees

real    0m0.562s
user    0m0.380s
sys     0m0.176s

Much faster: about 1 second, compared to 30 seconds from above.

Nevertheless, I find two issues here:

  1. A correctness problem: this solution somewhat assumes that there’s only a single table with desired name. I say “somewhat” since it leaves this for the user.
  2. An efficiency problem: it reads the dump file twice. First parsing it with grep, then with sed.

A third solution

sed is much stronger than presented. In fact, the inquiry made by grep in gtowey’s solution can be easily handled by sed:

$ time sed -n "/^-- Table structure for table \`departments\`/,/^-- Table structure for table/p" /tmp/employees.sql | ./use employees

real    0m0.573s
user    0m0.416s
sys     0m0.152s

So, the “/^– Table structure for table \`departments\`/,/^– Table structure for table/p” part tells sed to only print those rows starting from the departments table structure, and ending in the next table structure (this is for clarity: had department been the last table, there would not be a next table, but we could nevertheless solve this using other anchors).

And, we only do it in 0.57 seconds: about half the time of previous attempt.

Now, just to be more correct, we only wish to consider the employees.department table. So, assuming there’s more than one database dumped (and, by consequence, USE statements in the dump-file), we use:

cat /tmp/employees.sql | sed -n "/^USE \`employees\`/,/^USE \`/p" | sed -n "/^-- Table structure for table \`departments\`/,/^-- Table structure for table/p" | ./use employees

Further notes

  • All tests used warmed-up caches.
  • The sharp eyed readers would notice that departments is the first table in the dump file. Would that give an unfair advantage to the parsing-based restore methods? The answer is no. I’ve created an xdepartments table, to be located at the end of the dump. The difference in time is neglectful and inconclusive; we’re still at ~0.58-0.59 seconds. The effect will be more visible on really large dumps; but then, so would the security-based effects.

[UPDATE: see also following similar post: Extract a Single Table from a mysqldump File]

Conclusion

classic-shell-scriptingIts is always best to test on large datasets, to get a feel on performance.

It’s best to save MySQL the trouble of parsing & ignoring statements. Scripting utilities like sed, awk & grep have been around for ages, and are well optimized. They excel at text processing.

I’ve used sed many times in transforming dump outputs; for example, in converting MyISAM to InnoDB tables; to convert Antelope InnoDB tables to Barracuda format, etc. grep & awk are also very useful.

May I recommend, at this point, reading Classic Shell Scripting, a very easy to follow book, which lists the most popular command line utilities like grep, sed, awk, sort, (countless more) and shell scripting in general. While most of these utilities are well known, the book excels in providing suprisingly practical, simple solution to common tasks.

]]>
https://shlomi-noach.github.io/blog/mysql/on-restoring-a-single-table-from-mysqldump/feed 14 1630
Reasons to use innodb_file_per_table https://shlomi-noach.github.io/blog/mysql/reasons-to-use-innodb_file_per_table https://shlomi-noach.github.io/blog/mysql/reasons-to-use-innodb_file_per_table#comments Thu, 21 May 2009 03:40:42 +0000 https://shlomi-noach.github.io/blog/?p=614 When working with InnoDB, you have two ways for managing the tablespace storage:

  1. Throw everything in one big file (optionally split).
  2. Have one file per table.

I will discuss the advantages and disadvantages of the two options, and will strive to convince that innodb_file_per_table is preferable.

A single tablespace

Having everything in one big file means all tables and indexes, from all schemes, are ‘mixed’ together in that file.

This allows for the following nice property: free space can be shared between different tables and different schemes. Thus, if I purge many rows from my log table, the now unused space can be occupied by new rows of any other table.

This same nice property also translates to a not so nice one: data can be greatly fragmented across the tablespace.

An annoying property of InnoDB’s tablespaces is that they never shrink. So after purging those rows from the log table, the tablespace file (usually ibdata1) still keeps the same storage. It does not release storage to the file system.

I’ve seen more than once how certain tables are left unwatched, growing until disk space reaches 90% and SMS notifications start beeping all around.

There’s little to do in this case. Well, one can always purge the rows. Sure, the space would be reused by InnoDB. But having a file which consumes some 80-90% of disk space is a performance catastrophe. It means the disk needle needs to move large distances. Overall disk performance runs very low.

The best way to solve this is to setup a new slave (after purging of the rows), and dump the data into that slave.

InnoDB Hot Backup

The funny thing is, the ibbackup utility will copy the tablespace file as it is. If it was 120GB, of which only 30GB are used, you still get a 120GB backed up and restored.

mysqldump, mk-parallel-dump

mysqldump would be your best choice if you only had the original machine to work with. Assuming you’re only using InnoDB, a dump with –single-transaction will do the job. Or you can utilize mk-parallel-dump to speed things up (depending on your dump method and accessibility needs, mind the locking).

innodb_file_per_table

With this parameter set, a .ibd file is created per table. What we get is this:

  • Tablespace is not shared among different tables, and certainly not among different schemes.
  • Each file is considered a tablespace of its own.
  • Again, tablespace never reduces in size.
  • It is possible to regain space per tablespace.

Wait. The last two seem conflicting, don’t they? Let’s explain.

In our log table example, we purge many rows (up to 90GB of data is removed). The .ibd file does not shrink. But we can do:

ALTER TABLE log ENGINE=InnoDB

What will happen is that a new, temporary file is created, into which the table is rebuilt. Only existing data is added to the new table. Once comlete, the original table is removed, and the new table renamed as the original table.

Sure, this takes a long time, during which the table is completely locked: no writes and no reads allowed. But still – it allows us to regain disk space.

With the new InnoDB plugin, disk space is also regained when execuing a TRUNCATE TABLE log statement.

Fragmentation is not as bad as in a single tablespace: the data is limited within the boundaries of a smaller file.

Monitoring

One other nice thing about innodb_file_per_table is that it is possible to monitor table size on the file system level. You don’t need access to MySQL, to use SHOW TABLE STATUS or to query the INFORMATION_SCHEMA. You can just look up the top 10 largest files under your MySQL data directory (and subdirectories), and monitor their size. You can see which table grows fastest.

Backup

Last, it is not yet possible to backup single InnoDB tables by copying the .ibd files. But hopefully work will be done in this direction.

]]>
https://shlomi-noach.github.io/blog/mysql/reasons-to-use-innodb_file_per_table/feed 37 614
Parameters to use on mysqldump https://shlomi-noach.github.io/blog/mysql/parameters-to-use-on-mysqldump https://shlomi-noach.github.io/blog/mysql/parameters-to-use-on-mysqldump#comments Mon, 13 Oct 2008 07:03:50 +0000 https://shlomi-noach.github.io/blog/?p=4 mysqldump is commonly used for making a MySQL database backup or for setting up a replication.

As in all mysql binaries, there are quite a few parameters to mysqldump. Some are just niceties but some flags are a must. Of course, choosing the parameters to use greatly depends on your requirements, database setup, network capacity etc.

Here is my usual setup for mysqldump. The parameters below apply for an InnoDB based schema (no MyISAM, Memory tables). Parameters can be specified on the command line, or under the [mysqld] scope in the MySQL configuration file.

mysqldump -u dump_user -p -h db_host --routines --master-data --single-transaction  --skip-add-locks --skip-lock-tables --default-character-set=utf8 --compress my_db

Let’s review these parameters and see their effect:

  • -u or --user: This is the user which initiates the dump. Depending on other parameters, the user may need to have quite a few privileges, such as SELECT, RELOAD, FILE, REPLICATION CLIENT etc. Since I do not usually allow for remote root access into mysql, I create a temporary user solely for the purpose of the dump (many times it’s a one-time action), for the specific machine from which the dump is run, and provide this user with all necessary permissions.
  • -h or --host: I try not to dump from the same machine on which MySQL is running. If I do, I prefer to dump into a different disk from that on which the data and log files reside. The dump itself may create a heavy load on the machine (setting locks, performing lots of non cached IO operations). Since the target of the dump is mostly to create a backup on another machine, or set up replication on another machine, the dump has better not run from the MySQL machine.
  • --routines: It is really an annoyance to have to remember this flag. In contrast to –triggers, which is by default TRUE, the --routines parameter is by default FALSE, which means if you forget it – you don’t get the stored functions and procedures in your schema.
  • --master-data: I always enable binary logs on the MySQL nodes I work on. While binary logs may lead to more IO operations (writing binary logs make for more disk writes, obviously, but also disable some InnoDB optimizations), may consume more disk space (once I’ve worked with a company which had such a burst of traffic, that the binary logs to completely filled their disk in less than one day). If binary logs are enabled, the --master-data parameter allows for easy replication setup: the dump includes the CHANGE MASTER TO MASTER_LOG_FILE='...', MASTER_LOG_POS=... statement, so no need to do stuff like SHOW MASTER STATUS on the dumped node. Optionally, you can set --master-data=2 to have the statement commented.
  • --single-transaction --skip-add-locks --skip-lock-tables: When working with transactional-only storage engines (InnoDB is the most popular choice, but new engines are coming: Falcon, PBXT, Transactional-Maria, SolidDB and more), these parameters allow for a non-interruptive backup, which does not place read locks on all tables. It is possible to keep on reading and writing to the database while mysqldump is running with single transaction. Running in this mode does have its penalty: more IO operations (due to MVCC’s duplication of data while many transactions access the same data for Read/Write). The server is likely to perform more slowly during the dump time.
  • --default-character-set=utf8: I’ve seen so many MySQL installations in which world-wide textual data was stored in the Latin1 charset than I can remember. Many developers, who are testing using standard English data, are not even aware of the issues arrising from changing the data later on to utf8. But even those who are, are usually unaware of the necessity to configure the character set on a per connection basis, or for their specific clients (JDBC or PHP connectors, etc). mysqldump is no different, and if you have non-latin text in your tables, always remember to set this option.
  • --compress: when dumping to another machine, especially a remote one, using this option to GZIP the data between the MySQL server and the mysqldump client. This will make for more CPU operations, but CPU is usually cheap nowdays, and the compression may well save you hours of network transfer time.
]]>
https://shlomi-noach.github.io/blog/mysql/parameters-to-use-on-mysqldump/feed 10 4