Merging tables with INSERT…ON DUPLICATE KEY UPDATE

Had a case recently where I had to merge data from two identically structured tables containing nearly identical data.

“Nearly identical” meaning most table data is identical in both; sometimes a row is missing from one of the tables; sometimes same row (according to PK) appears in both, but some columns are NULL is one tables (while others can be NULL in the second).

Otherwise no contradicting data: it was not possible for some data to be “3” in one table and “4” in the other.

How do you create a merge of the tables, such that all missing rows are completed, and NULLs replaced by actual values when possible?

pt-table-sync comes to mind: one can do a bidirectional syncing of two tables, and actually stating how to resolve ambiguities (like “greater value wins”). Very powerful! An example would be:

pt-table-sync --bidirectional --conflict-column=a --conflict-comparison=greatest --tables ...

However I didn’t actually have any problem with the tables themselves. The two tables were just fine as they were; missing or NULL data does not indicate an error on their part. I wanted to get their merge. pt-table-sync is still up for the job: we can duplicate them, merge on the copy… But I prefer a query over an external script when possible.

INSERT…ON DUPLICATE KEY UPDATE

This MySQL-specific syntax is actually quite powerful. It basically says “if the insert fails due to unique constraint, you get a chance to update the row causing the failure”. But it also allows for smart setting of the column via the VALUES() clause. Let’s present some sample data and then see the solution. Continue reading » “Merging tables with INSERT…ON DUPLICATE KEY UPDATE”

MySQL Stored Routines Debugger & Debugging API: sneak preview video

This is a sneak peek video introduction/preview of an in-development free and open source server side debugger & debugging API for MySQL stored routines.

MySQL does not provide server side debugging capabilities for stored routines. Some tools exist, including MySQL’s own, that assist in stored routine debugging. These are all GUI based and, to the best of my knowledge, MS Windows based. There is one solution in alpha stage that is developed for Java/eclipse; I did not look at the code. See discussion here and here.

An ideal solution would be to have debugging API in the server itself – independently of your client, programming language or operating system. To the best of my knowledge, nothing like that is being developed.

I’m now presenting a rdebug: a stored routines server-side debugger, Pure-SQL, based on stored routines. rdebug is developed as part of common_schema, and actually relies on some of its power.

Like some other tools, it uses code injection and manipulation: it injects debugging info into your stored routine. You need to “compile” your routine with debugging info.

Unlike some other tools, it actually runs your stored routines. It does not mimic or simulate them on client side. It does not break them into smaller routines, attempting to assemble the original behavior from lego bricks.

The quick technical overview is that you use two processes (MySQL threads): the worker process running the routine (your natural call my_routine()), and the debugger process. The debugger process attaches itself to the worker process; it controls the worker by commands like “step over”; it gets data from the worker: what’s the current stack trace? What variables are now available and what are their values?; it manipulates the worker’s data: it can utilize breakpoints to modify worker’s local & session variables. Continue reading » “MySQL Stored Routines Debugger & Debugging API: sneak preview video”

Win a free Percona Live 2013 pass — unveiling riddle hints

Apparently my first attempt at rhyming proved to be unsuccessful: only two courageous men attempted solving the riddle. As I’m pretty sure a free pass would appeal to many, and I do have a few readers for my blog, I must conclude my riddle was just too hard. Obscure, perhaps.

Hope I didn’t scare anyone off. Without further ado I present some hints. This post will update with more hints as the day progresses — please refresh to see changes. I start with two hints.

But first, recap of the riddle:

Who will open your present,

Make you play pleasant,

Tidy your mess,

Do the same for all else?

It has something to do with the MySQL world. Continue reading » “Win a free Percona Live 2013 pass — unveiling riddle hints”

Sessions of interest in Percona Live 2013

Percona Live 2013 is shortly upon us, and it might be a good idea to watch for what’s ahead of us.

Talks of interest

There is no way I can do justice to all. I wish to point out a small number of sessions I am personally interested in attending. I will not be able to attend them all, since there are too many sessions of interest and too few instances of myself (merely one).

I’ve tried to list some talks which are not absolutely obvious (when Peter Zaitsev speaks of MySQL performance, or Monty speaks about MariaDB, or Robert Hodges or Domas speak about replication — well — you’re certain to have the ins and outs, right?). I can also expect Galera or Percona XtraDB Cluster talks to attract a lot of attention. There is a lot of good content for each.

But I was happy to find some very special talks this year, which are not the “every conference has got to have one talk about this” type. Here’s a hybrid collection of both types.

After constructing the list I’ve intentionally dropped two random sessions. If you are speaking, and not mentioned here, your talk must be one of those two! Continue reading » “Sessions of interest in Percona Live 2013”

Looking for a hack: share data between MySQL sessions

I’m looking for a way to share data between two MySQL connections/sessions. Obviously tables are the trivial answer, but for reasons of security (possibly insufficient privileges) I wish to avoid that.

The type of data to be passed can vary. Ideally I would be able to pass multiple pieces of information (dates, texts, integers, etc.) around. If impossible, I would do with texts only, and if impossible yet, I could do with a single text (but reasonably long).

There is a way to do so: by writing to the file system (SELECT INTO OUTFILE + LOAD_FILE()). However I wish to avoid it, since writing to files from within MySQL requires creation of a new file each time; no overwrite and no purging; this litters the file system.

So: any other tricks? Is there some way to pass data via GET_LOCK()/RELEASE_LOCK() (none that I can see other than Morse code)?

Is there some global variable that is unused, can be changed dynamically, and has enough space? (I couldn’t find one)

I appreciate any input.

Re: MySQL 5.1 vs. MySQL 5.5: Floats, Doubles, and Scientific Notation

Reading Sheeri’s MySQL 5.1 vs. MySQL 5.5: Floats, Doubles, and Scientific Notation, I was baffled at this change of floating point number notation.

However, I was also concerned about the final action taken: using “–ignore-columns” to avoid comparing the FLOAT/DOUBLE types.

The –float-precision option for pt-table-checksum currently only uses ROUND() so as to disregard minor rounding issues. But it can very easily extend to handle the difference in floating point notation. Consider again the problem:

mysql> create table tf(f float);
Query OK, 0 rows affected (0.11 sec)

mysql> insert into tf values(0.0000958084);
Query OK, 1 row affected (0.04 sec)

mysql-5.1> select * from tf;
+-------------+
| f           |
+-------------+
| 9.58084e-05 |
+-------------+

mysql-5.5> select * from tf;
+--------------+
| f            |
+--------------+
| 0.0000958084 |
+--------------+

How can we normalize the notation?

Easily: CAST it as DECIMAL. Consider: Continue reading » “Re: MySQL 5.1 vs. MySQL 5.5: Floats, Doubles, and Scientific Notation”

MySQL security tasks easily solved with common_schema

Here are three security tasks I handled, which I’m happy to say were easily solved with common_schema‘s views and routines (with no prior planning). Two are so easy, that I actually now integrated them into common_schema 1.3:

  • Duplicate a user (create new user with same privileges as another’s)
  • Find users with identical set of grants (same roles)
  • Finding redundant users (users who only have privileges on non-existing objects); I was approached on this by Sheeri K. Cabral from Mozilla.

Duplicate user

How would you duplicate a grantee? That’s easy! Just get the SHOW GRANTS output, then do text search and replace: replace the existing account (e.g. ‘existing’@’localhost’) with the new account (e.g. ‘newcomer’@’localhost’).

Ahem. And how would you get the output of SHOW GRANTS? That’s right: you can’t do this from within the server. You have to go outside the server, incoke mysql client, sed your way into it, then connect to MySQL again to invoke the GRANT query… Or you can do this by hand, of course, or you can use the new mysqluserclone tool from MySQL utilities. Bottom line: you have to go outside the server. You can’t directly do this with your favorite GUI tool unless it has this function integrated.

But to have a truly automated, scriptable, server-side user-duplication you don’t need to go far, since the sql_show_grants view simulates a SHOW GRANTS output, but using plain old SQL. It produces the GRANT statement as SQL output. Which means you can REPLACE() the grantee. It’s actually a one liner, but is such a common operation that I created the duplicate_grantee() function for convenience. Just: Continue reading » “MySQL security tasks easily solved with common_schema”

common_schema: 1.3: security goodies, parameterized split(), json-to-xml, query checksum

common_schema 1.3 is released and is available for download. New and noteworthy in this version:

  • Parameterized split(): take further control over huge transactions by breaking them down into smaller chunks, now manually tunable if needed
  • duplicate_grantee(): copy+paste existing accounts along with their full set of privileges
  • similar_grants: find which accounts share the exact same set of privileges (i.e. have the same role)
  • json_to_xml(): translate any valid JSON object into its equivalent XML form
  • extract_json_value(): use XPath notation to extract info from JSON data, just as you would from XML
  • query_checksum(): given a query, calculate a checksum on the result set
  • random_hash(): get a 40 hexadecimal digits random hash, using a reasonably large changing input

Let’s take a closer look at the above:

Parameterized split()

split takes your bulk query and automagically breaks it down into smaller pieces. So instead of one huge UPDATE or DELETE or INSERT..SELECT transaction, you get many smaller transactions, each with smaller impact on I/O, locks, CPU.

As of 1.3, split() gets more exposed: you can have some control on its execution, and you also get a lot of very interesting info during operation.

Here’s an example of split() control:

set @script := "
  split({start:7015, step:2000} : UPDATE sakila.rental SET return_date = return_date + INTERVAL 1 DAY) 
    throttle 1;
";
call common_schema.run(@script);

In the above we choose a split size of 2,000 rows at a time; but we also choose to only start with 7015, skipping all rows prior to that value. Just what is that value? It depends on the splitting key (and see next example for just that); but in this table we can safely assume this is the rental_id PRIMARY KEY of the table.

You don’t have to use these control parameters. But they can save you some time and effort. Continue reading » “common_schema: 1.3: security goodies, parameterized split(), json-to-xml, query checksum”

Hierarchical data in INFORMATION_SCHEMA and derivatives

Just how often do you encounter hierarchical data? Consider a table with some parent-child relation, like the this classic employee table:

CREATE TABLE employee (
  employee_id INT UNSIGNED PRIMARY KEY,
  employee_name VARCHAR(100),
  manager_id INT UNSIGNED,
  CONSTRAINT `employee_manager_fk` FOREIGN KEY (manager_id) REFERENCES employee (employee_id)
) engine=innodb
;
+-------------+---------------+------------+
| employee_id | employee_name | manager_id |
+-------------+---------------+------------+
|           1 | Rachel        |       NULL |
|           2 | John          |          1 |
|           3 | Stan          |          1 |
|           4 | Naomi         |          2 |
+-------------+---------------+------------+

Questions like “What is the (nested) list of employees managed by Rachel?” or “Get Naomi’s managers chain of command” are classical questions. There are sometimes dependencies: if John leaves, does that mean Naomi loses her job, or does she get promoted, or something else? If John and Stan are on sick leave, does Rachel have any reason to come to work?

Hierarchical data is not limited to a single-table structure. Sometimes it takes a combination of a few tables (it’s just a JOIN) to make out the parent-child relation.

Hierarchical data is difficult to manage with SQL. This is especially true for MySQL, which does not support the WITH recursive query syntax.

Where can you find hierarchical data?

Even if you do not provide it by yourself, MySQL’s INFORMATION_SCHEMA has some for you. Partly obvious, partly implicit, here are some examples: Continue reading » “Hierarchical data in INFORMATION_SCHEMA and derivatives”

Percona Live 2013 schedule released!

I’m happy to report the release of the program schedule for the next Percona Live event, April 2013.

While you can see there are still some “TODO”s, and minor scheduling changes still to go, the overall reviewing process is complete, and the schedule is sound.

The committee has reviewed over 250 submissions in total, sessions and tutorials. It took two months to review the sessions and finalize the schedule. There were very good submissions, and it wasn’t till I started building the schedule that I realized just how many.

You know the feeling: you’re at a conference, and there’s this time slot where you just can’t make up your mind which session to attend. As it turns out, anyway I look at it, I get this at each and every single time slot in this schedule. Hopefully all attendees should feel similarly.

What can you expect to find in the Santa Clara, April conference?

As always, we get variety of topics. First day is tutorials, which I already mentioned, followed by three days of sessions and keynotes.

  • Some topics you can guess: always place for good backup practice, performance optimizations for your InnoDB deployment, good indexing strategies and more.
  • Open source and emerging technologies are well covered: from Tungsten, Galera/Percona XtraDB Cluster, NDB Cluster to MHA, HAProxy, Hadoop, Percona Toolkit and more.
  • MariaDB has good coverage, as well as Percona Server.
  • Scale topics such as sharding, building large deployments,Big Data.
  • Cloud is well represented, too. Deployments on Amazon, Google Cloud, OpenStack, SkySQL Cloud Data Suite; interesting content on virtualization.
  • Optimizer, better queries, query evaluation plans.
  • Replication, of course, is as always a hot topic, this year in particular due to anticipated 5.6 release
  • Speaking of 5.6, a lot of content on that, too. PERFORMANCE_SCHEMA, InnoDB, partitions, GIS and more…
  • Case studies: can you do without them? Get to hear how big companies used product/project X, what it took to make it work, how it helped them grow.
  • Chance to get up to speed on third party solutions: Akiban, Tokutek, ScaleDB…
  • Hardware: SSDs are all the rage. How do you nest choose and configure your hardware and deployment?
  • Best practices, security, monitoring, NoSQL…
  • And, I can’t list everything, but a LOT of good, unique content!

I will write some more on specific sessions I find that are of unique interest.

Finally, this is my first time acting as committee chairman, and so much of this work was new to me, and I got to learn it on the fly. Thankfully, the committee is composed of experienced conference veterans, not to mention experts, who gave good advice. I was happy to ask for their advice and happy to accept. I wish to thank the committee members for their kindness and openness! (That doesn’t mean I’m buying you all your beer though!)