Books – code.openark.org
http://shlomi-noach.github.io/blog/
Blog by Shlomi Noach
Wed, 16 Dec 2009 07:10:05 +0000

On restoring a single table from mysqldump
https://shlomi-noach.github.io/blog/mysql/on-restoring-a-single-table-from-mysqldump
Tue, 01 Dec 2009 08:25:00 +0000

Following Restore one table from an ALL database dump and Restore a Single Table From mysqldump, I would like to add my own thoughts and comments on the subject.

I also wish to note performance issues with the two suggested solutions, and offer improvements.

Problem relevance

While the problem is interesting, I just want to note that it is relevant only for a specific range of database sizes. Too small – and it doesn’t matter how you solve it (e.g. just open vi/emacs and copy+paste). Too big – and it would not be worthwhile to restore from mysqldump anyway. I would suggest that the problem is interesting in the neighborhood of a few dozen GB worth of data.

Problem recap

Given a dump file (generated by mysqldump), how do you restore a single table, without making any changes to other tables?

Let’s review the two referenced solutions. I’ll be using the employees db on mysql-sandbox for testing. I’ll choose a very small table to restore: departments (only a few rows in this table).

Security based solution

Chris suggests creating a special-purpose account that has write privileges (CREATE, INSERT, etc.) only on the particular table to restore. Cool hack! But, I’m afraid, not too efficient, for two reasons:

  1. MySQL needs to process all irrelevant queries (ALTER, INSERT, …) only to disallow them due to access violation errors.
  2. Assuming the restore is run from a remote host, we overload the network with all said irrelevant queries.

Just how inefficient? Let’s time it:

mysql> grant usage on *.* to 'restoreuser'@'localhost';
mysql> grant select on *.* to 'restoreuser'@'localhost';
mysql> grant all on employees.departments to 'restoreuser'@'localhost';

$ time mysql --user=restoreuser --socket=/tmp/mysql_sandbox21701.sock --force employees < /tmp/employees.sql
...
ERROR 1142 (42000) at line 343: INSERT command denied to user 'restoreuser'@'localhost' for table 'titles'
ERROR 1142 (42000) at line 344: ALTER command denied to user 'restoreuser'@'localhost' for table 'titles'
...
(lots of these messages)
...

real    0m31.945s
user    0m6.328s
sys     0m0.508s

So, about 30 seconds to restore a 9-row table.

Text filtering based solution

gtowey suggests parsing the dump file beforehand:

  • First, parse with grep, to detect rows where tables are referenced within dump file
  • Second, parse with sed, extracting relevant rows.

Let’s time this one:

$ time grep -n 'Table structure' /tmp/employees.sql
23:-- Table structure for table `departments`
48:-- Table structure for table `dept_emp`
89:-- Table structure for table `dept_manager`
117:-- Table structure for table `employees`
161:-- Table structure for table `salaries`
301:-- Table structure for table `titles`

real    0m0.397s
user    0m0.232s
sys     0m0.164s

$ time sed -n 23,48p /tmp/employees.sql | ./use employees

real    0m0.562s
user    0m0.380s
sys     0m0.176s

Much faster: about 1 second, compared to 30 seconds from above.

Nevertheless, I find two issues here:

  1. A correctness problem: this solution somewhat assumes that there’s only a single table with desired name. I say “somewhat” since it leaves this for the user.
  2. An efficiency problem: it reads the dump file twice. First parsing it with grep, then with sed.

A third solution

sed is much more powerful than presented. In fact, the lookup done by grep in gtowey’s solution can easily be handled by sed itself:

$ time sed -n "/^-- Table structure for table \`departments\`/,/^-- Table structure for table/p" /tmp/employees.sql | ./use employees

real    0m0.573s
user    0m0.416s
sys     0m0.152s

So, the "/^-- Table structure for table \`departments\`/,/^-- Table structure for table/p" part tells sed to only print those rows starting from the departments table structure, and ending in the next table structure (this is for clarity: had departments been the last table, there would not be a next table, but we could nevertheless solve this using other anchors).

And, we do it in only 0.57 seconds: about half the time of the previous attempt.
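To see the range address at work in isolation, here is a minimal sketch that runs the same sed expression against a tiny stand-in dump. The file content and the extract_table helper name are made up for illustration; only the sed expression comes from the command above.

```shell
# Build a tiny stand-in for a mysqldump file (hypothetical content, for illustration only).
dump=$(mktemp)
trap 'rm -f "$dump"' EXIT
cat > "$dump" <<'EOF'
-- Table structure for table `departments`
CREATE TABLE `departments` (`dept_no` char(4));
INSERT INTO `departments` VALUES ('d001');
-- Table structure for table `dept_emp`
CREATE TABLE `dept_emp` (`emp_no` int);
INSERT INTO `dept_emp` VALUES (10001);
EOF

# Print from the header of the requested table up to and including
# the next table header. If there is no next header, sed prints to end of file.
extract_table() {
  # $1 = table name, $2 = dump file
  sed -n "/^-- Table structure for table \`$1\`/,/^-- Table structure for table/p" "$2"
}

extract_table departments "$dump"
```

The output is the three departments lines plus the header comment of dept_emp, which is harmless when piped into mysql since it is only a comment.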

Now, just to be more correct, we only wish to consider the employees.departments table. So, assuming there’s more than one database dumped (and, as a consequence, USE statements in the dump file), we use:

cat /tmp/employees.sql | sed -n "/^USE \`employees\`/,/^USE \`/p" | sed -n "/^-- Table structure for table \`departments\`/,/^-- Table structure for table/p" | ./use employees
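The two-stage filter can be tried out on a small file as well. The sketch below assumes a made-up dump with two databases that both contain a departments table; the extract_db_table helper and the archive database are hypothetical names used only for this illustration.

```shell
# Hypothetical two-database dump; both databases contain a `departments` table.
dump=$(mktemp)
trap 'rm -f "$dump"' EXIT
cat > "$dump" <<'EOF'
USE `employees`;
-- Table structure for table `departments`
INSERT INTO `departments` VALUES ('d001');
-- Table structure for table `titles`
INSERT INTO `titles` VALUES (1);
USE `archive`;
-- Table structure for table `departments`
INSERT INTO `departments` VALUES ('old');
EOF

# Stage 1 keeps only the section of the requested database,
# stage 2 keeps only the requested table within that section.
extract_db_table() {
  # $1 = database, $2 = table, $3 = dump file
  sed -n "/^USE \`$1\`/,/^USE \`/p" "$3" |
    sed -n "/^-- Table structure for table \`$2\`/,/^-- Table structure for table/p"
}

extract_db_table employees departments "$dump"
```

Only the row from employees.departments comes through; the row from the archive database is filtered out by the first stage. If the requested table happens to be the last one in its database, the second range runs to the end of the first stage's output, so the closing USE line is passed along as well.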

Further notes

  • All tests used warmed-up caches.
  • Sharp-eyed readers will notice that departments is the first table in the dump file. Would that give an unfair advantage to the parsing-based restore methods? The answer is no. I’ve created an xdepartments table, to be located at the end of the dump. The difference in time is negligible and inconclusive; we’re still at ~0.58-0.59 seconds. The effect will be more visible on really large dumps; but then, so would the security-based effects.
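On really large dumps, one possible refinement is to stop reading as soon as the wanted table has been printed. This is not something measured in this post; it is a sketch using awk, with a hypothetical helper name and made-up file content, that exits when it reaches the header of the next table.

```shell
# Hypothetical stand-in dump, for illustration only.
dump=$(mktemp)
trap 'rm -f "$dump"' EXIT
cat > "$dump" <<'EOF'
-- Table structure for table `departments`
CREATE TABLE `departments` (`dept_no` char(4));
INSERT INTO `departments` VALUES ('d001');
-- Table structure for table `dept_emp`
CREATE TABLE `dept_emp` (`emp_no` int);
INSERT INTO `dept_emp` VALUES (10001);
EOF

# Print the section of one table and exit at the next table header,
# so the remainder of the file is never read.
extract_table_early() {
  # $1 = table name, $2 = dump file
  awk -v t="$1" '
    index($0, "-- Table structure for table `" t "`") == 1 { on = 1; print; next }
    on && index($0, "-- Table structure for table ") == 1 { exit }
    on { print }
  ' "$2"
}

extract_table_early departments "$dump"
```

Unlike the sed range, this version does not print the header of the following table, and it uses plain string comparison rather than a regular expression for the table name.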

[UPDATE: see also following similar post: Extract a Single Table from a mysqldump File]

Conclusion

It is always best to test on large datasets, to get a feel for performance.

It’s best to save MySQL the trouble of parsing & ignoring statements. Scripting utilities like sed, awk & grep have been around for ages, and are well optimized. They excel at text processing.

I’ve used sed many times in transforming dump outputs; for example, in converting MyISAM to InnoDB tables; to convert Antelope InnoDB tables to Barracuda format, etc. grep & awk are also very useful.
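As an illustration of such a transformation, here is a minimal sketch of a MyISAM to InnoDB rewrite. The exact commands used in practice are not shown in this post, so the expression, the to_innodb helper and the sample content below are assumptions. The substitution is anchored to lines starting with ")" (the line that closes a CREATE TABLE statement in mysqldump output), so that row data mentioning the engine name is left alone.

```shell
# Hypothetical dump fragment, for illustration only.
dump=$(mktemp)
trap 'rm -f "$dump"' EXIT
cat > "$dump" <<'EOF'
CREATE TABLE `int_table` (
  `int_col` int(11) NOT NULL,
  PRIMARY KEY (`int_col`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `notes` VALUES ('ENGINE=MyISAM is mentioned in this row');
EOF

# Change the storage engine only on the closing line of CREATE TABLE statements.
to_innodb() {
  # $1 = dump file
  sed 's/^) ENGINE=MyISAM/) ENGINE=InnoDB/' "$1"
}

to_innodb "$dump"
```

The result can be piped into the mysql client in the same way as the extraction commands above.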

May I recommend, at this point, reading Classic Shell Scripting, a very easy to follow book, which lists the most popular command line utilities like grep, sed, awk, sort, (countless more) and shell scripting in general. While most of these utilities are well known, the book excels in providing surprisingly practical, simple solutions to common tasks.

High Performance MySQL – a book to re-read
https://shlomi-noach.github.io/blog/mysql/high-performance-mysql-a-book-to-re-read
Sun, 27 Sep 2009 07:56:59 +0000

I first read High Performance MySQL, 2nd edition about a year ago, when it first came out. I have since re-read a few pages on occasion.

In my previous posts I’ve suggested ways to improve upon the common ranking solution. Very innovative stuff! Or… so I thought.

I happened to browse through the book today, and a section on User Variables caught my eye. “Let’s see if I can get some insight“, I thought to myself. Imagine my surprise when I realized almost everything I’ve suggested is discussed in this modest section, in black and white, sitting on my bookshelf for over a year!

I read it a year ago, forgot all about it, and re-invented stuff already solved and discussed… Oh, for more brain capacity…

To be honest, this has happened to me more than once in the past few months; I’ve gotten into the habit of browsing the web when I’m looking for answers to my problems; I forget that this book contains the answers to so many common, practical MySQL problems, and does so in a very direct and helpful manner.

So, yet again, thumbs up to High Performance MySQL. Really a must-have book. Get it if you haven’t already!

Unwalking a string with GROUP_CONCAT
https://shlomi-noach.github.io/blog/mysql/unwalking-a-string-with-group_concat
Tue, 16 Jun 2009 05:54:49 +0000

“Walking a string” is an SQL technique to convert a single value into a multi-row result set. For example, walking the string ‘hello’ results in 5 rows, each of which contains a single character from the text.

I’ll present a brief example of walking a string, and then show how to “unwalk” the string: do the reverse operation.

To walk a string, an integers table is required (or this could be a good use for SeqEngine):

CREATE TABLE `int_table` (
  `int_col` int(11) NOT NULL,
  PRIMARY KEY  (`int_col`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

-- ...
-- INSERTS follow here
-- ...

mysql> SELECT * FROM int_table;
+---------+
| int_col |
+---------+
|       0 |
|       1 |
|       2 |
|       3 |
|       4 |
|       5 |
|       6 |
|       7 |
|       8 |
|       9 |
+---------+
10 rows in set (0.00 sec)

To convert a string to rows of characters, we join the text with the integers table (we assume there are enough numbers for covering the length of the text):

mysql> SELECT
         SUBSTRING(s, int_col+1, 1) AS c
       FROM int_table, (SELECT 'hello' AS s) sel1
       WHERE int_col < char_length(s);
+---+
| c |
+---+
| h |
| e |
| l |
| l |
| o |
+---+
5 rows in set (0.00 sec)

More on this can be found in the excellent SQL Cookbook.

Unwalking the string

The inverse action, combining the string back from the multiple rows, can easily be done using GROUP_CONCAT. It’s interesting to learn that GROUP_CONCAT does not actually require any GROUP BY clause. When no such clause is provided in the SQL query, all matching rows are aggregated as a single group.

Let’s assume now that we have a table of character values, which we want to concatenate back to a complete string. We can easily build this table:

CREATE TABLE characters AS
  SELECT
    SUBSTRING(s, int_col+1, 1) AS c
  FROM int_table, (SELECT 'hello' AS s) sel1
  WHERE int_col < char_length(s);

To reconstruct the text, we simply use MySQL’s GROUP_CONCAT with an empty separator:

mysql> SELECT GROUP_CONCAT(c separator '') AS s FROM characters;
+-------+
| s     |
+-------+
| hello |
+-------+
1 row in set (0.00 sec)
7 ways to convince MySQL to use the right index
https://shlomi-noach.github.io/blog/mysql/7-ways-to-convince-mysql-to-use-the-right-index
Thu, 02 Apr 2009 16:06:32 +0000

Sometimes MySQL gets it wrong. It doesn’t use the right index.

It happens that MySQL generates a query plan which is really bad (EXPLAIN says it’s going to explore some 10,000,000 rows), when another plan (I’ll soon show how it was generated) says: “Sure, I can do that with 100 rows using a key”.

A true story

A customer had issues with his database. Queries were taking 15 minutes to complete, and the db in general was not responsive. Looking at the slow query log, I found the criminal query. Allow me to bring you up to speed:

A table is defined like this:

CREATE TABLE data (
  id INT UNSIGNED AUTO_INCREMENT,
  type INT UNSIGNED,
  level TINYINT UNSIGNED,
  ...
  PRIMARY KEY(id),
  KEY `type` (type)
) ENGINE=InnoDB;

The offending query was this:

SELECT id FROM data
WHERE type=12345 AND level > 3
ORDER BY id

The facts were:

  • `data` has about 10,000,000 rows.
  • The index on `type` is selective: about 100 rows per value on average.
  • The query took a long time to complete.
  • EXPLAIN showed that MySQL uses the PRIMARY KEY, hence searches 10,000,000 rows, filtered “using where”.
  • The other EXPLAIN showed that by using the `type` key, only 110 rows are expected, to be filtered “using where”, then sorted “using filesort”.

So MySQL acknowledged it was generating the wrong plan. The other plan was better by its own standards.

Solving the problem

Let’s walk through 7 ways to solve the problem, starting with the more aggressive solutions, refining to achieve desired behavior through subtle changes.

Solution #1: OPTIMIZE

If MySQL got it wrong, it may be because the table was frequently changed. This affects the statistics. If we can spare the time (the table is locked during that time), we could help out by rebuilding the table with OPTIMIZE TABLE.

Solution #2: ANALYZE

ANALYZE TABLE is less time consuming, in particular on InnoDB, where it is barely noticed. An ANALYZE will update the index statistics and help out in generating better query plans.

But hold on: the above two solutions are fine in general, but in the given case MySQL already acknowledges that a better plan is at hand. In fact, I ran ANALYZE a few times, to no avail.

Solution #3: USE INDEX

Since the issue was urgent, my first thought went for the ultimate weapon:

SELECT id FROM data USE INDEX(type)
WHERE type=12345 AND level > 3
ORDER BY id

This instructs MySQL to only consider the indexes listed; in our example, I only want MySQL to consider using the `type` index. This is the method that generated the other (good) EXPLAIN result. I could have gone even more ruthless and asked for FORCE INDEX.

Solution #4: IGNORE INDEX

A similar approach would be to explicitly negate the use of the PRIMARY KEY, like this:

SELECT id FROM data IGNORE INDEX(PRIMARY)
WHERE type=12345 AND level > 3
ORDER BY id

A moment of thinking

The above solutions are “ugly”, in the sense that this is not standard SQL. It’s too MySQL specific.

I’ve asked the programmers to do a quick rewrite, and had a few moments to consider: why did MySQL insist on using the PRIMARY KEY? Was it because I’ve asked it for the `id` column only? I rewrote as follows:

SELECT id, type, level FROM data
WHERE type=12345 AND level > 3
ORDER BY id

Nope. EXPLAIN got me the same bad plan. Then it must be the ORDER BY clause:

SELECT id FROM data
WHERE type=12345 AND level > 3

Sure enough, EXPLAIN now indicates using the `type` index, only reading 110 rows. So MySQL preferred to scan 10,000,000 rows, just so that the rows are generated in the right ORDER, and so no sorting is required, when it could have read 110 rows (where each row is a mere INT) and sorted them in no time.

Armed with this knowledge, a few more options come to hand.

Solution #5: Move some logic to the application

At about that point I got a message that the programmers were unable to add the USE INDEX part. Why? They were using the EJB framework, which limits your SQL-like queries to something very generic. Well, you can always drop the ORDER BY part and sort on the application side. That isn’t fun, but it’s been done.

Solution #6: Negate use of PRIMARY KEY

Can we force MySQL to use the `type` index, retain the ORDER BY, and do it all with standard SQL? Sure. The following query does this:

SELECT id, type, level FROM data
WHERE type=12345 AND level > 3
ORDER BY id+0

id+0 is an expression on the `id` column. This makes MySQL unable to utilize the PRIMARY KEY (or any other index on `id`, had there been one) for the ordering.

In his book “SQL Tuning“, Dan Tow dedicates a chapter to hints and tips like the above. He shows how to control the use or non-use of indexes, the order by which subqueries are calculated, and more.

Unfortunately, the EJB specification said this was not allowed. You could not ORDER BY a function, only by a plain column.

Solution #7: Make MySQL think the problem is harder than it really is

Almost out of options. Just a moment before settling for sorting on the application side, another issue can be considered: since MySQL was fooled once, can it be fooled again to make things right? Can we fool it to believe that the PRIMARY KEY would not be worthwhile to use? The following query does this:

SELECT id, type, level FROM data
WHERE type=12345 AND level > 3
ORDER BY id, type, level

Let’s reflect on this one. What is the order by which the rows are returned now? Answer: exactly as before. Since `id` is PRIMARY KEY, it is also UNIQUE, so no two `id` values are the same. Therefore, the secondary sorting column is redundant, and so is the following one. We get exactly the same result as “ORDER BY id”.

But MySQL didn’t catch this. This query caused MySQL to say: “Mmmmm. ‘ORDER BY id, type, level’ is not doable with the PRIMARY KEY only. Well, in this case, I had better use the `type` index”. Is this a weakness of MySQL? I guess so. Maybe it will be fixed in the future. But this was the fix that saved the day.
