{"id":1630,"date":"2009-12-01T10:25:00","date_gmt":"2009-12-01T08:25:00","guid":{"rendered":"http:\/\/code.openark.org\/blog\/?p=1630"},"modified":"2009-12-16T09:10:05","modified_gmt":"2009-12-16T07:10:05","slug":"on-restoring-a-single-table-from-mysqldump","status":"publish","type":"post","link":"https:\/\/code.openark.org\/blog\/mysql\/on-restoring-a-single-table-from-mysqldump","title":{"rendered":"On restoring a single table from mysqldump"},"content":{"rendered":"<p>Following <a href=\"http:\/\/everythingmysql.ning.com\/profiles\/blogs\/restore-one-table-from-an-all\">Restore one table from an ALL database dump<\/a> and <a href=\"http:\/\/gtowey.blogspot.com\/2009\/11\/restore-single-table-from-mysqldump.html\">Restore a Single Table From mysqldump<\/a>, I would like to add my own thoughts and comments on the subject.<\/p>\n<p>I also wish to note performance issues with the two suggested solutions, and offer improvements.<\/p>\n<h4>Problem relevance<\/h4>\n<p>While the problem is interesting, I just want to note that it is relevant in very specific database dimensions. Too small &#8211; and it doesn&#8217;t matter how you solve it (e.g. just open vi\/emacs and copy+paste). Too big &#8211; and it would not be worthwhile to restore from <em>mysqldump<\/em> anyway. I would suggest that the problem is interesting in the whereabouts of a few dozen GB worth of data.<\/p>\n<h4>Problem recap<\/h4>\n<p>Given a dump file (generated by mysqldump), how do you restore a single table, without making any changes to other tables?<\/p>\n<p>Let&#8217;s review the two referenced solutions. I&#8217;ll be using the <a href=\"http:\/\/dev.mysql.com\/doc\/employee\/en\/employee.html\">employees db<\/a> on <a href=\"https:\/\/launchpad.net\/mysql-sandbox\">mysql-sandbox<\/a> for testing. I&#8217;ll choose a very small table to restore: <strong>departments<\/strong> (only a few rows in this table).<\/p>\n<h4>Security based solution<\/h4>\n<p><a href=\"http:\/\/everythingmysql.ning.com\/profiles\/blogs\/restore-one-table-from-an-all\"><strong>Chris<\/strong><\/a> offers to create a special purpose account, which will only have write (CREATE, INSERT, etc.) privileges on the particular table to restore. Cool hack! But, I&#8217;m afraid, not too efficient, for two reasons:<!--more--><\/p>\n<ol>\n<li>MySQL needs to process all irrelevant queries (ALTER, INSERT, &#8230;) only to disallow them due to access violation errors.<\/li>\n<li>Assuming restore is from remote host, we overload the network with all said irrelevant queries.<\/li>\n<\/ol>\n<p>Just how inefficient? 
Let&#8217;s time it:<\/p>\n<blockquote>\n<pre>mysql&gt; grant usage on *.* to 'restoreuser'@'localhost';\r\nmysql&gt; grant select on *.* to 'restoreuser'@'localhost';\r\nmysql&gt; grant all on employees.departments to 'restoreuser'@'localhost';\r\n\r\n$ time mysql --user=restoreuser --socket=\/tmp\/mysql_sandbox21701.sock --force employees &lt; \/tmp\/employees.sql\r\n...\r\nERROR 1142 (42000) at line 343: INSERT command denied to user 'restoreuser'@'localhost' for table 'titles'\r\nERROR 1142 (42000) at line 344: ALTER command denied to user 'restoreuser'@'localhost' for table 'titles'\r\n...\r\n(lot's of these messages)\r\n...\r\n\r\nreal\u00a0\u00a0\u00a0 <strong>0m31.945s<\/strong>\r\nuser\u00a0\u00a0\u00a0 0m6.328s\r\nsys\u00a0\u00a0\u00a0\u00a0 0m0.508s<\/pre>\n<\/blockquote>\n<p>So, at about <strong>30<\/strong> seconds to restore a 9 rows table.<\/p>\n<h4>Text filtering based solution.<\/h4>\n<p><a href=\"http:\/\/gtowey.blogspot.com\/2009\/11\/restore-single-table-from-mysqldump.html\"><strong>gtowey<\/strong><\/a> offers parsing the dump file beforehand:<\/p>\n<ul>\n<li>First, parse with <em>grep<\/em>, to detect rows where tables are referenced within dump file<\/li>\n<li>Second, parse with <em>sed<\/em>, extracting relevant rows.<\/li>\n<\/ul>\n<p>Let&#8217;s time this one:<\/p>\n<blockquote>\n<pre>$ time grep -n 'Table structure' \/tmp\/employees.sql\r\n23:-- Table structure for table `departments`\r\n48:-- Table structure for table `dept_emp`\r\n89:-- Table structure for table `dept_manager`\r\n117:-- Table structure for table `employees`\r\n161:-- Table structure for table `salaries`\r\n301:-- Table structure for table `titles`\r\n\r\nreal\u00a0\u00a0\u00a0 <strong>0m0.397s<\/strong>\r\nuser\u00a0\u00a0\u00a0 0m0.232s\r\nsys\u00a0\u00a0\u00a0\u00a0 0m0.164s\r\n\r\n$ time sed -n 23,48p \/tmp\/employees.sql | .\/use employees\r\n\r\nreal\u00a0\u00a0\u00a0 <strong>0m0.562s<\/strong>\r\nuser\u00a0\u00a0\u00a0 0m0.380s\r\nsys\u00a0\u00a0\u00a0\u00a0 0m0.176s<\/pre>\n<\/blockquote>\n<p>Much faster: about <strong>1<\/strong> second, compared to <strong>30<\/strong> seconds from above.<\/p>\n<p>Nevertheless, I find two issues here:<\/p>\n<ol>\n<li>A correctness problem: this solution somewhat assumes that there&#8217;s only a single table with desired name. I say &#8220;somewhat&#8221; since it leaves this for the user.<\/li>\n<li>An efficiency problem: it reads the dump file <em>twice<\/em>. First parsing it with <em>grep<\/em>, then with <em>sed<\/em>.<\/li>\n<\/ol>\n<h4>A third solution<\/h4>\n<p><em>sed<\/em> is much stronger than presented. 
#### A third solution

*sed* is much stronger than presented. In fact, the inquiry made by *grep* in gtowey's solution can easily be handled by *sed* itself:

```
$ time sed -n "/^-- Table structure for table \`departments\`/,/^-- Table structure for table/p" /tmp/employees.sql | ./use employees

real    0m0.573s
user    0m0.416s
sys     0m0.152s
```

The **"/^-- Table structure for table \`departments\`/,/^-- Table structure for table/p"** part tells *sed* to only print those lines starting from the **departments** table structure, and ending at the next table structure. (This anchor choice is for clarity: had **departments** been the last table, there would be no next table structure, but we could nevertheless solve this using other anchors; see the sketch at the end of this section.)

And we only do it in **0.57** seconds: about half the time of the previous attempt.

Now, just to be more correct, we only wish to consider the **employees.departments** table. So, *assuming* there's more than one database dumped (and, by consequence, there are **USE** statements in the dump file), we use:

```
sed -n "/^USE \`employees\`/,/^USE \`/p" /tmp/employees.sql | sed -n "/^-- Table structure for table \`departments\`/,/^-- Table structure for table/p" | ./use employees
```
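As for the other-anchors remark above: when the desired table is the last one in the dump, the end of the file itself can serve as the closing anchor. A minimal sketch, using `titles` (the last table in the grep listing above):

```bash
# Print from the `titles` table structure through the end of the dump file.
# $ (end of input) replaces the "next table structure" anchor; it is escaped
# so the shell does not treat it as a variable inside double quotes.
sed -n "/^-- Table structure for table \`titles\`/,\$p" /tmp/employees.sql | ./use employees
```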
grep &amp; awk are also very useful.<\/p>\n<p>May I recommend, at this point, reading <a href=\"http:\/\/www.amazon.com\/Classic-Shell-Scripting-Arnold-Robbins\/dp\/0596005954\/ref=sr_1_1\">Classic Shell Scripting<\/a>, a very easy to follow book, which lists the most popular command line utilities like <em>grep<\/em>, <em>sed<\/em>, <em>awk<\/em>, <em>sort<\/em>, (countless more) and shell scripting in general. While most of these utilities are well known, the book excels in providing suprisingly practical, simple solution to common tasks.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Following Restore one table from an ALL database dump and Restore a Single Table From mysqldump, I would like to add my own thoughts and comments on the subject. I also wish to note performance issues with the two suggested solutions, and offer improvements. Problem relevance While the problem is interesting, I just want to [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"enabled":false},"version":2}},"categories":[5],"tags":[7,45,9,52,50],"class_list":["post-1630","post","type-post","status-publish","format-standard","hentry","category-mysql","tag-backup","tag-books","tag-mysqldump","tag-performance","tag-scripts"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p2bZZp-qi","_links":{"self":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/1630","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/comments?post=1630"}],"version-history":[{"count":33,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/1630\/revisions"}],"predecessor-version":[{"id":1750,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/1630\/revisions\/1750"}],"wp:attachment":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/media?parent=1630"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/categories?post=1630"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/tags?post=1630"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}