Merging tables with INSERT…ON DUPLICATE KEY UPDATE

Had a case recently where I had to merge data from two identically structured tables containing nearly identical data.

“Nearly identical” meaning most table data is identical in both; sometimes a row is missing from one of the tables; sometimes same row (according to PK) appears in both, but some columns are NULL is one tables (while others can be NULL in the second).

Otherwise no contradicting data: it was not possible for some data to be “3” in one table and “4” in the other.

How do you create a merge of the tables, such that all missing rows are completed, and NULLs replaced by actual values when possible?

pt-table-sync comes to mind: one can do a bidirectional syncing of two tables, and actually stating how to resolve ambiguities (like “greater value wins”). Very powerful! An example would be:

pt-table-sync --bidirectional --conflict-column=a --conflict-comparison=greatest --tables ...

However I didn’t actually have any problem with the tables themselves. The two tables were just fine as they were; missing or NULL data does not indicate an error on their part. I wanted to get their merge. pt-table-sync is still up for the job: we can duplicate them, merge on the copy… But I prefer a query over an external script when possible.

INSERT…ON DUPLICATE KEY UPDATE

This MySQL-specific syntax is actually quite powerful. It basically says “if the insert fails due to unique constraint, you get a chance to update the row causing the failure”. But it also allows for smart setting of the column via the VALUES() clause. Let’s present some sample data and then see the solution. Continue reading » “Merging tables with INSERT…ON DUPLICATE KEY UPDATE”

Pop quiz: funny syntax

The following questions are of little importance, yet I find them entertaining. I stumbled upon these while developing QueryScript.

Can you guess the results of the following statements?

Pop quiz 1

SET @x := 7;
SELECT ++@x;

What is the computation result? What will be the value of @x?

Pop quiz 2

SET @ := 4;
SELECT @ + @'' + @``

What is the computation result?

Pop quiz 3

SET @a := 2;
SELECT @a = @'a' = @`a`

Do we get TRUE or FALSE? When? Continue reading » “Pop quiz: funny syntax”

MySQL joins: ON vs. USING vs. Theta-style

What is the difference between the following three syntaxes?

SELECT * FROM film JOIN film_actor ON (film.film_id = film_actor.film_id)
SELECT * FROM film JOIN film_actor USING (film_id)
SELECT * FROM film, film_actor WHERE film.film_id = film_actor.film_id

The difference is mostly syntactic sugar, but with a couple interesting notes.

To put names, the first two are called “ANSI-style” while the third is called “Theta-style”.

Theta style

On the FROM clause, tables are listed as if with Cartesian products, and the WHERE clause specifies how the join should take place.

This is considered to be the “old” style. It is somewhat confusing to read. Consider the following query: Continue reading » “MySQL joins: ON vs. USING vs. Theta-style”

Syntax of the day: IS TRUE and IS FALSE

What makes for a true statement?

We usually test statements using a WHERE clause:

SELECT * FROM world.City WHERE Population > 1000000

The “Population > 1000000” statement makes for a boolean expression. Using WHERE is just one way of evaluating it. One can also test with IF():

SET @val := 7;
SELECT IF(@val > 2, 'Yes', 'No')

TRUE and FALSE

The two are keywords. They also map for the numerals 1 and 0, as follows:

mysql> SELECT TRUE, FALSE;
+------+-------+
| TRUE | FALSE |
+------+-------+
|    1 |     0 |
+------+-------+

Like in the C programming language, a nonzero value evaluates to a true value. A zero evaluates to false. A NULL evaluates to… well, NULL. But aside from 3-valued logic, what’s important in our case is that it is not true. Continue reading » “Syntax of the day: IS TRUE and IS FALSE”

Quoting text JavaScript/Python style

Unless your MySQL is configured to use ANSI_QUOTES in sql_mode, you are able to quote your text in one of two forms: using single quotes or double quotes:

UPDATE world.Country SET HeadOfState = 'Willy Wonka' WHERE Code='USA'
UPDATE world.Country SET HeadOfState = "Willy Wonka" WHERE Code="USA"

This makes for JavaScript- or Python-style quoting: you quote by your needs. Say you have a text which includes single quotes:

It is what you read when you don’t have to that determines what you will be when you can’t help it. – Oscar Wilde

You wish to insert this text to some tables. You could go through the trouble of escaping it:

INSERT INTO quotes (quote, author) VALUES (
  'It is what you read when you don\'t have to that determines what you will be when you can\'t help it.', 'Oscar Wilde');

or you could just wrap it in double quotes:

INSERT INTO quotes (quote, author) VALUES (
  "It is what you read when you don't have to that determines what you will be when you can't help it.", 'Oscar Wilde');

I find this useful when using SQL to generate queries. Take, for example, eval() for MySQL: the statement: Continue reading » “Quoting text JavaScript/Python style”

MySQL terminology: processes, threads & connections

There’s some confusion in the MySQL terminology for processes, threads & connections, which I will try to resolve. I can’t explain the full nature of what processes and threads are; please see Wikipedia [1] [2] for that. But here’s some basics with regard to MySQL:

  • MySQL server is a single process application.
  • It is multithreaded.
  • It (usually) acts as a TCP/IP server, accepting connections.
  • Each connection gets a dedicated thread.
  • These threads are sometimes named processes, and sometimes they’re referred to as connections.

The last part is where confusion arises, so let me discuss again the use of threads and connections in MySQL.

Continue reading » “MySQL terminology: processes, threads & connections”

SQL: good comments conventions

I happened upon a customer who left me in awe and admiration. The reason: excellent comments for their SQL code.

I list four major places where SQL comments are helpful. I’ll use the sakila database. It is originally scarcely commented; I’ll present it now enhanced with comments, to illustrate.

Table definitions

The CREATE TABLE statement allows for a comment, intended to describe the nature of the table:

CREATE TABLE `film_text` (
 `film_id` smallint(6) NOT NULL,
 `title` varchar(255) NOT NULL,
 `description` text,
 PRIMARY KEY (`film_id`),
 FULLTEXT KEY `idx_title_description` (`title`,`description`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COMMENT='Reflection of `film`, used for FULLTEXT search.'

It’s too bad the comment’s max length is 60 characters, though. However, it’s a very powerful field.

Column definitions

One may comment particular columns: Continue reading » “SQL: good comments conventions”

Views: better performance with condition pushdown

Justin’s A workaround for the performance problems of TEMPTABLE views post on mysqlperformanceblog.com reminded me of a solution I once saw on a customer’s site.

The customer was using nested views structure, up to depth of some 8-9 views. There were a lot of aggregations along the way, and even the simplest query resulted with a LOT of subqueries, temporary tables, and vast amounts of data, even if only to return with a couple of rows.

While we worked to solve this, a developer showed me his own trick. His trick is now impossible to implement, but there’s a hack around this.

Let’s use the world database to illustrate. Look at the following view definition: Continue reading » “Views: better performance with condition pushdown”

Discovery of the day: GROUP BY … DESC

I happened on a query where, by mistake, an

SELECT ... ORDER BY x DESC LIMIT 1

was written as

SELECT ... GROUP BY x DESC LIMIT 1

And it took me by surprise to realize GROUP BY x DESC is a valid statement. I looked it up: yep! It’s documented.

In MySQL, GROUP BY results are sorted according to the group statement. You can override this by adding ORDER BY NULL (see past post). I wasn’t aware you can actually control the sort order.

But I DO want MySQL to say “ERROR”!

MySQL is known for its willingness to accept invalid queries, data values. It can silently commit your transaction, truncate your data.

  • Using GROUP_CONCAT with a small group_concat_max_len setting? Your result will be silently truncated (make sure to check the warnings though).
  • Calling CREATE TEMPORARY TABLE? You get silent commit.
  • Issuing a ROLLBACK on non-transactional involved engines? Have a warning; no error.
  • Using LOCK IN SHARE MODE on non transactional tables? Not a problem. Nothing reported.
  • Adding a FOREIGN KEY on a MyISAM table? Good for you; no action actually taken.
  • Inserting 300 to a TINYINT column in a relaxed sql_mode? Give me 255, I’ll silently drop the remaining 45. I owe you.

Warnings and errors

It would be nice to: Continue reading » “But I DO want MySQL to say “ERROR”!”