On restoring a single table from mysqldump

December 1, 2009

Following Restore one table from an ALL database dump and Restore a Single Table From mysqldump, I would like to add my own thoughts and comments on the subject.

I also wish to note performance issues with the two suggested solutions, and offer improvements.

Problem relevance

While the problem is interesting, I just want to note that it is relevant in very specific database dimensions. Too small - and it doesn't matter how you solve it (e.g. just open vi/emacs and copy+paste). Too big - and it would not be worthwhile to restore from mysqldump anyway. I would suggest that the problem is interesting in the whereabouts of a few dozen GB worth of data.

Problem recap

Given a dump file (generated by mysqldump), how do you restore a single table, without making any changes to other tables?

Let's review the two referenced solutions. I'll be using the employees db on mysql-sandbox for testing. I'll choose a very small table to restore: departments (only a few rows in this table).

Security based solution

Chris offers to create a special purpose account, which will only have write (CREATE, INSERT, etc.) privileges on the particular table to restore. Cool hack! But, I'm afraid, not too efficient, for two reasons:

  1. MySQL needs to process all irrelevant queries (ALTER, INSERT, ...) only to disallow them due to access violation errors.
  2. Assuming restore is from remote host, we overload the network with all said irrelevant queries.

Just how inefficient? Let's time it:

mysql> grant usage on *.* to 'restoreuser'@'localhost';
mysql> grant select on *.* to 'restoreuser'@'localhost';
mysql> grant all on employees.departments to 'restoreuser'@'localhost';

$ time mysql --user=restoreuser --socket=/tmp/mysql_sandbox21701.sock --force employees < /tmp/employees.sql
...
ERROR 1142 (42000) at line 343: INSERT command denied to user 'restoreuser'@'localhost' for table 'titles'
ERROR 1142 (42000) at line 344: ALTER command denied to user 'restoreuser'@'localhost' for table 'titles'
...
(lot's of these messages)
...

real    0m31.945s
user    0m6.328s
sys     0m0.508s

So, at about 30 seconds to restore a 9 rows table.

Text filtering based solution.

gtowey offers parsing the dump file beforehand:

  • First, parse with grep, to detect rows where tables are referenced within dump file
  • Second, parse with sed, extracting relevant rows.

Let's time this one:

$ time grep -n 'Table structure' /tmp/employees.sql
23:-- Table structure for table `departments`
48:-- Table structure for table `dept_emp`
89:-- Table structure for table `dept_manager`
117:-- Table structure for table `employees`
161:-- Table structure for table `salaries`
301:-- Table structure for table `titles`

real    0m0.397s
user    0m0.232s
sys     0m0.164s

$ time sed -n 23,48p /tmp/employees.sql | ./use employees

real    0m0.562s
user    0m0.380s
sys     0m0.176s

Much faster: about 1 second, compared to 30 seconds from above.

Nevertheless, I find two issues here:

  1. A correctness problem: this solution somewhat assumes that there's only a single table with desired name. I say "somewhat" since it leaves this for the user.
  2. An efficiency problem: it reads the dump file twice. First parsing it with grep, then with sed.

A third solution

sed is much stronger than presented. In fact, the inquiry made by grep in gtowey's solution can be easily handled by sed:

$ time sed -n "/^-- Table structure for table \`departments\`/,/^-- Table structure for table/p" /tmp/employees.sql | ./use employees

real    0m0.573s
user    0m0.416s
sys     0m0.152s

So, the "/^-- Table structure for table \`departments\`/,/^-- Table structure for table/p" part tells sed to only print those rows starting from the departments table structure, and ending in the next table structure (this is for clarity: had department been the last table, there would not be a next table, but we could nevertheless solve this using other anchors).

And, we only do it in 0.57 seconds: about half the time of previous attempt.

Now, just to be more correct, we only wish to consider the employees.department table. So, assuming there's more than one database dumped (and, by consequence, USE statements in the dump-file), we use:

cat /tmp/employees.sql | sed -n "/^USE \`employees\`/,/^USE \`/p" | sed -n "/^-- Table structure for table \`departments\`/,/^-- Table structure for table/p" | ./use employees

Further notes

  • All tests used warmed-up caches.
  • The sharp eyed readers would notice that departments is the first table in the dump file. Would that give an unfair advantage to the parsing-based restore methods? The answer is no. I've created an xdepartments table, to be located at the end of the dump. The difference in time is neglectful and inconclusive; we're still at ~0.58-0.59 seconds. The effect will be more visible on really large dumps; but then, so would the security-based effects.

[UPDATE: see also following similar post: Extract a Single Table from a mysqldump File]

Conclusion

classic-shell-scriptingIts is always best to test on large datasets, to get a feel on performance.

It's best to save MySQL the trouble of parsing & ignoring statements. Scripting utilities like sed, awk & grep have been around for ages, and are well optimized. They excel at text processing.

I've used sed many times in transforming dump outputs; for example, in converting MyISAM to InnoDB tables; to convert Antelope InnoDB tables to Barracuda format, etc. grep & awk are also very useful.

May I recommend, at this point, reading Classic Shell Scripting, a very easy to follow book, which lists the most popular command line utilities like grep, sed, awk, sort, (countless more) and shell scripting in general. While most of these utilities are well known, the book excels in providing suprisingly practical, simple solution to common tasks.

tags: , , , ,
posted in MySQL by shlomi

« | »

Follow comments via the RSS Feed | Leave a comment | Trackback URL

14 Comments to "On restoring a single table from mysqldump"

  1. Chris wrote:

    yet another great solution, thanks for the comparison and improvements!

  2. Chris wrote:

    I just used your method in my PROD environment and it is defiantly easier and more efficient than my method with grants. Again, thanks for the slick method.

  3. WhirCat · restoring a single table from mysqldump wrote:

    [...] via On restoring a single table from mysqldump | code.openark.org. [...]

  4. shlomi wrote:

    @Chris

    Good! Credits due to gtowey

    Regards

  5. Gavin Towey wrote:

    Nice! The sed only method is a good improvement.

  6. Log Buffer #171: a Carnival of the Vanities for DBAs | The Pythian Blog wrote:

    [...] Noach shares his thoughts on restoring a single table from mysqldump. “Given a dump file (generated by mysqldump), how do you restore a single table, without [...]

  7. TVNshack wrote:

    Watchout the problem of the automatically generated AUTO_INCREMENT table option that mysqldump generates and that MATCH only with the INSERT statements that may come right after in the dump.

    Even if you specify a --no-data, that AUTO_INCREMENT=nnn table option will still appear. If you intend to create a table from that extracted definition and for which you may not want to insert the exact same data, LAST_INSERT_ID will not be adapted for the need of your empty table. Your table first row inserted relying on AUTO_INCREMENT column definition option will start with an offset AUTO_INCREMENT value.

    I would recommend to edit the mysqldump output to reset each AUTO_INCREMENT table option to AUTO_INCREMENT=1 and let the eventual INSERTs manage LAST_INSERT_ID

    (see bug #44425)

  8. shlomi wrote:

    @TVNshack

    sed can come in handy here, too.

  9. Vitaly wrote:

    Even it's not directly related to this topic, I noticed that if we cutting specific table dump from the whole DB dump there is a problem with restoring of TIMESTAMP fields - they become UTC instead of local time.

    adding

    /*!40103 SET @OLD_TIME_ZONE=@@TIME_ZONE */;
    /*!40103 SET TIME_ZONE='+00:00' */;

    (as we have in a dump produced by mysqldump) before table data fixes this problem.

    Regards,
    Vitaly Karasik

  10. Extract satu tabel dari file backup mysqldump « Setsuga's Weblog wrote:

    [...] tidak lupa saya cantumkan link asli GA_googleAddAttr("AdOpt", "1"); GA_googleAddAttr("Origin", "other"); [...]

  11. Useful sed / awk liners for MySQL | code.openark.org wrote:

    [...] On restoring a single table from mysqldump [...]

  12. 使用tbdba-restore-mysqldump.pl切割mysqldump文件 - 一个故事@MySQL DBA wrote:

    [...] On restoring a single table from mysqldump:这篇文章则对比了使用grep sed 和“权限控制”三种方法的速度。 [...]

  13. 使用tbdba-restore-mysqldump.pl切割mysqldump文件 | 岭南六少 - 一朵在LAMP架构下挣扎的云 wrote:

    [...] On restoring a single table from mysqldump:这篇文章则对比了使用grep sed 和“权限控制”三种方法的速度。 [...]

  14. 切割mysqldump文件工具 wrote:

    [...] On restoring a single table from mysqldump:这篇文章则对比了使用grep sed 和“权限控制”三种方法的速度。 [...]

Leave Your Comment

 

 
Powered by Wordpress and MySQL. Theme by openark.org