Syntax – code.openark.org http://shlomi-noach.github.io/blog/ Blog by Shlomi Noach Thu, 21 Feb 2013 12:12:48 +0000 en-US hourly 1 https://wordpress.org/?v=5.3.3 32412571 Merging tables with INSERT…ON DUPLICATE KEY UPDATE https://shlomi-noach.github.io/blog/mysql/merging-tables-with-insert-on-duplicate-key-update https://shlomi-noach.github.io/blog/mysql/merging-tables-with-insert-on-duplicate-key-update#comments Thu, 21 Feb 2013 12:12:48 +0000 https://shlomi-noach.github.io/blog/?p=6120 Had a case recently where I had to merge data from two identically structured tables containing nearly identical data.

“Nearly identical” meaning most table data is identical in both; sometimes a row is missing from one of the tables; sometimes same row (according to PK) appears in both, but some columns are NULL is one tables (while others can be NULL in the second).

Otherwise no contradicting data: it was not possible for some data to be “3” in one table and “4” in the other.

How do you create a merge of the tables, such that all missing rows are completed, and NULLs replaced by actual values when possible?

pt-table-sync comes to mind: one can do a bidirectional syncing of two tables, and actually stating how to resolve ambiguities (like “greater value wins”). Very powerful! An example would be:

pt-table-sync --bidirectional --conflict-column=a --conflict-comparison=greatest --tables ...

However I didn’t actually have any problem with the tables themselves. The two tables were just fine as they were; missing or NULL data does not indicate an error on their part. I wanted to get their merge. pt-table-sync is still up for the job: we can duplicate them, merge on the copy… But I prefer a query over an external script when possible.

INSERT…ON DUPLICATE KEY UPDATE

This MySQL-specific syntax is actually quite powerful. It basically says “if the insert fails due to unique constraint, you get a chance to update the row causing the failure”. But it also allows for smart setting of the column via the VALUES() clause. Let’s present some sample data and then see the solution.

Assume the following table definition:

create table t1 (
  pkdt datetime,
  pki int,
  a int,
  b int,
  c int,
  primary key (pkdt, pki)
);

Same structure holds for t1, t2 and tmerge — our target table. Looking at table data we have:

> select * from t1;
+---------------------+-----+------+------+------+
| pkdt                | pki | a    | b    | c    |
+---------------------+-----+------+------+------+
| 2012-01-01 00:00:00 |   1 |    4 |    5 |    6 |
| 2012-01-02 00:00:00 |   2 | NULL |    5 |    6 |
| 2012-01-04 00:00:00 |   4 |    4 | NULL | NULL |
| 2012-01-05 00:00:00 |   5 | NULL |    8 | NULL |
+---------------------+-----+------+------+------+

> select * from t2;
+---------------------+-----+------+------+------+
| pkdt                | pki | a    | b    | c    |
+---------------------+-----+------+------+------+
| 2012-01-01 00:00:00 |   1 |    4 | NULL |    6 |
| 2012-01-03 00:00:00 |   3 |    4 |    5 |    6 |
| 2012-01-04 00:00:00 |   4 |    4 |    5 |    6 |
| 2012-01-05 00:00:00 |   5 |    7 | NULL |    9 |
+---------------------+-----+------+------+------+

We can see a row with pki=2 appears in t1 but not in t2; a row with pki=3 appears in t2 but not in t2, and various NULLs appear throughout the rows that are shared.

To get the shared table, we throw in the data from t1 and t2 into tmerge, in such way that a real value overwrites a NULL, like this:

insert into tmerge select * from t1;

insert into tmerge select * from t2
on duplicate key update
  a = ifnull(tmerge.a, values(a)),
  b = ifnull(tmerge.b, values(b)),
  c = ifnull(tmerge.c, values(c))
;

So even while I’m inserting values to tmerge, I’m able to check for current value, compared to the value I wish to insert, and have time to make a decision. This is really cool! The result:

> select * from tmerge;
+---------------------+-----+------+------+------+
| pkdt                | pki | a    | b    | c    |
+---------------------+-----+------+------+------+
| 2012-01-01 00:00:00 |   1 |    4 |    5 |    6 |
| 2012-01-02 00:00:00 |   2 | NULL |    5 |    6 |
| 2012-01-03 00:00:00 |   3 |    4 |    5 |    6 |
| 2012-01-04 00:00:00 |   4 |    4 |    5 |    6 |
| 2012-01-05 00:00:00 |   5 |    7 |    8 |    9 |
+---------------------+-----+------+------+------+

Just what we wanted: all possible rows inside; real value takes over NULL whenever possible.

]]>
https://shlomi-noach.github.io/blog/mysql/merging-tables-with-insert-on-duplicate-key-update/feed 3 6120
Pop quiz: funny syntax https://shlomi-noach.github.io/blog/mysql/pop-quiz-funny-syntax https://shlomi-noach.github.io/blog/mysql/pop-quiz-funny-syntax#comments Mon, 05 Nov 2012 18:22:15 +0000 https://shlomi-noach.github.io/blog/?p=5445 The following questions are of little importance, yet I find them entertaining. I stumbled upon these while developing QueryScript.

Can you guess the results of the following statements?

Pop quiz 1

SET @x := 7;
SELECT ++@x;

What is the computation result? What will be the value of @x?

Pop quiz 2

SET @ := 4;
SELECT @ + @'' + @``

What is the computation result?

Pop quiz 3

SET @a := 2;
SELECT @a = @'a' = @`a`

Do we get TRUE or FALSE? When?

Hints

Consider the following queries as hints to the above questions:

SELECT +++++-@x;
SELECT @ = @'', @ = @``
SELECT (@a = @'a') = @`a`

Answers

  • Pop quiz 1

++@x is interpreted as +(+(@x)), which is in turn evaluated as 0+(0+(@x)). No relation to C‘s ++ operator. @x is unchanged.

  • Pop quiz 2

@x is the same as @’x’ and as @`x`. What’s funny is that one is allowed to create the empty-named user defined variable @”. Makes for a weird looking syntax, but nothing special about it. Computation result is 12.

  • Pop quiz 3

We’ve already established that @a, @’a’ and @`a` are the same, I just used this notation for adding pepper to the quiz. The real question is whether 2 = 2 = 2 holds true.  It does not. There is no 3-way comparison. All comparisons are in pairs, which is why the expression evaluates as (2 = 2) = 2, leading to 1 = 2, “1” being the TRUE value of (2 = 2). The only value of @a for which the expression holds true is 1.

]]>
https://shlomi-noach.github.io/blog/mysql/pop-quiz-funny-syntax/feed 1 5445
MySQL joins: ON vs. USING vs. Theta-style https://shlomi-noach.github.io/blog/mysql/mysql-joins-on-vs-using-vs-theta-style https://shlomi-noach.github.io/blog/mysql/mysql-joins-on-vs-using-vs-theta-style#comments Wed, 11 Jul 2012 04:54:20 +0000 https://shlomi-noach.github.io/blog/?p=5050 What is the difference between the following three syntaxes?

SELECT * FROM film JOIN film_actor ON (film.film_id = film_actor.film_id)
SELECT * FROM film JOIN film_actor USING (film_id)
SELECT * FROM film, film_actor WHERE film.film_id = film_actor.film_id

The difference is mostly syntactic sugar, but with a couple interesting notes.

To put names, the first two are called “ANSI-style” while the third is called “Theta-style”.

Theta style

On the FROM clause, tables are listed as if with Cartesian products, and the WHERE clause specifies how the join should take place.

This is considered to be the “old” style. It is somewhat confusing to read. Consider the following query:

SELECT * FROM film, film_actor WHERE film.film_id = film_actor.film_id AND actor_id = 17 AND film.length > 120

The above lists films over 120 minutes in length, in which actor #17 plays. Never mind the results; what about the query? Being just one part of the WHERE clause, a one out of three elements in the AND expression, the join equation gets lost. It is difficult to find and isolate the terms which make for table joins as opposed to terms which filter out rows. In the above example it is still relatively easy to point out. How about a query with 5 tables and a 20 terms WHERE clause?

ANSI style: ON

With JOINON, one separates the join terms from the filtering terms. Rewriting the previous example:

SELECT * FROM film JOIN film_actor ON (film.film_id = film_actor.film_id) WHERE actor_id = 17 AND film.length > 120

It is quite clear now what belongs to what.

Note: the parenthesis are not strictly required in the ON clause. I personally like to use them: it makes for an even greater distinction between query parts. SQL syntax is such a mess!

ANSI style: USING

Is the special case where we join tables on columns of the same name, we can make a shortcut and use USING:

SELECT * FROM film JOIN film_actor USING (film_id) WHERE actor_id = 17 AND film.length > 120

This time the parenthesis are required (I’m not sure why the difference on that part).

This is mainly a nicety, less words to type, and a resulting prettified query. But also note a couple differences:

USING vs. ON

The following is valid:

SELECT film.title, film_id FROM film JOIN film_actor USING (film_id) WHERE actor_id = 17 AND film.length > 120;

But the following is not:

SELECT film.title, film_id FROM film JOIN film_actor ON (film.film_id = film_actor.film_id) WHERE actor_id = 17 AND film.length > 120;
ERROR 1052 (23000): Column 'film_id' in field list is ambiguous

Since USING “knows” the film_id column is shared between both tables, it doesn’t mind if we ask it without specifying an exact table. It would be the same value anyway!

ON is not as smart and requires further clarifications: which table exactly do you want?

And the above is actually the result of this interesting phenomena: when using USING, the column only appears once in the result set:

SELECT * FROM film JOIN film_actor USING (film_id) WHERE actor_id = 17 AND film.length > 120 LIMIT 1\G
*************************** 1. row ***************************
             film_id: 96
               title: BREAKING HOME
         description: A Beautiful Display of a Secret Agent And a Monkey who must Battle a Sumo Wrestler in An Abandoned Mine Shaft
        release_year: 2006
         language_id: 1
original_language_id: NULL
     rental_duration: 4
         rental_rate: 2.99
              length: 169
    replacement_cost: 21.99
              rating: PG-13
    special_features: Trailers,Commentaries
         last_update: 2006-02-15 05:03:42
            actor_id: 17
         last_update: 2006-02-15 05:05:03

But joining on ON, we get this column twice:

SELECT * FROM film JOIN film_actor ON film.film_id = film_actor.film_id WHERE actor_id = 17 AND film.length > 120 LIMIT 1\G
*************************** 1. row ***************************
             film_id: 96
               title: BREAKING HOME
         description: A Beautiful Display of a Secret Agent And a Monkey who must Battle a Sumo Wrestler in An Abandoned Mine Shaft
        release_year: 2006
         language_id: 1
original_language_id: NULL
     rental_duration: 4
         rental_rate: 2.99
              length: 169
    replacement_cost: 21.99
              rating: PG-13
    special_features: Trailers,Commentaries
         last_update: 2006-02-15 05:03:42
            actor_id: 17
             film_id: 96
         last_update: 2006-02-15 05:05:03

Behind the scenes

The news is that MySQL treats all in the exact same way. With the kind help of EXPLAIN EXTENDED, we see that:

EXPLAIN EXTENDED SELECT film.title, film_id FROM film JOIN film_actor USING (film_id) WHERE actor_id = 17 AND film.length > 120\G
*************************** 1. row ***************************
...
2 rows in set, 1 warning (0.00 sec)

root@mysql-5.1.51> SHOW WARNINGS\G
*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: select `sakila`.`film`.`title` AS `title`,`sakila`.`film`.`film_id` AS `film_id` 
         from `sakila`.`film` join `sakila`.`film_actor` 
         where (
                 (`sakila`.`film`.`film_id` = `sakila`.`film_actor`.`film_id`) 
                 and (`sakila`.`film_actor`.`actor_id` = 17) 
                 and (`sakila`.`film`.`length` > 120)
               )

All queries are translated internally to theta-style.

This post only discusses inner joins. With outer joins the situation is somewhat different. Read this post for more insight.

]]>
https://shlomi-noach.github.io/blog/mysql/mysql-joins-on-vs-using-vs-theta-style/feed 15 5050
Syntax of the day: IS TRUE and IS FALSE https://shlomi-noach.github.io/blog/mysql/syntax-of-the-day-is-true-and-is-false https://shlomi-noach.github.io/blog/mysql/syntax-of-the-day-is-true-and-is-false#comments Thu, 26 Jan 2012 04:09:23 +0000 https://shlomi-noach.github.io/blog/?p=4198 What makes for a true statement?

We usually test statements using a WHERE clause:

SELECT * FROM world.City WHERE Population > 1000000

The “Population > 1000000” statement makes for a boolean expression. Using WHERE is just one way of evaluating it. One can also test with IF():

SET @val := 7;
SELECT IF(@val > 2, 'Yes', 'No')

TRUE and FALSE

The two are keywords. They also map for the numerals 1 and 0, as follows:

mysql> SELECT TRUE, FALSE;
+------+-------+
| TRUE | FALSE |
+------+-------+
|    1 |     0 |
+------+-------+

Like in the C programming language, a nonzero value evaluates to a true value. A zero evaluates to false. A NULL evaluates to… well, NULL. But aside from 3-valued logic, what’s important in our case is that it is not true.

However, simple value comparison is incorrect:

mysql> SELECT @val, @val > 3, @val > 3 = TRUE as result;
+------+----------+--------+
| @val | @val > 3 | result |
+------+----------+--------+
|    7 |        1 |      1 |
+------+----------+--------+

mysql> SELECT @val, @val = TRUE as result;
+------+--------+
| @val | result |
+------+--------+
|    7 |      0 |
+------+--------+

To test for the truth value of an expression, the correct syntax is by using IS TRUE:

SELECT @val, @val IS TRUE as result;
+------+--------+
| @val | result |
+------+--------+
|    7 |      1 |
+------+--------+

Likewise, one may use IS FALSE to test for falsehood. However, if you wish to note NULL as a false value this does not work:

SELECT @empty, @empty IS TRUE, @empty IS FALSE;
+--------+----------------+-----------------+
| @empty | @empty IS TRUE | @empty IS FALSE |
+--------+----------------+-----------------+
| NULL   |              0 |               0 |
+--------+----------------+-----------------+

If you’re unsure why, you should read more on three-valued logic in SQL. To solve the above, simply use IS NOT TRUE:

SELECT @empty, @empty IS NOT TRUE;
+--------+--------------------+
| @empty | @empty IS NOT TRUE |
+--------+--------------------+
| NULL   |                  1 |
+--------+--------------------+

In summary, use IS TRUE and IS NOT TRUE so as to normalize truth values into a 0, 1 value range, C style, including handling of NULLs.

]]>
https://shlomi-noach.github.io/blog/mysql/syntax-of-the-day-is-true-and-is-false/feed 4 4198
Quoting text JavaScript/Python style https://shlomi-noach.github.io/blog/mysql/quoting-text-javascriptpython-style https://shlomi-noach.github.io/blog/mysql/quoting-text-javascriptpython-style#comments Tue, 15 Nov 2011 06:11:37 +0000 https://shlomi-noach.github.io/blog/?p=4364 Unless your MySQL is configured to use ANSI_QUOTES in sql_mode, you are able to quote your text in one of two forms: using single quotes or double quotes:

UPDATE world.Country SET HeadOfState = 'Willy Wonka' WHERE Code='USA'
UPDATE world.Country SET HeadOfState = "Willy Wonka" WHERE Code="USA"

This makes for JavaScript- or Python-style quoting: you quote by your needs. Say you have a text which includes single quotes:

It is what you read when you don’t have to that determines what you will be when you can’t help it. – Oscar Wilde

You wish to insert this text to some tables. You could go through the trouble of escaping it:

INSERT INTO quotes (quote, author) VALUES (
  'It is what you read when you don\'t have to that determines what you will be when you can\'t help it.', 'Oscar Wilde');

or you could just wrap it in double quotes:

INSERT INTO quotes (quote, author) VALUES (
  "It is what you read when you don't have to that determines what you will be when you can't help it.", 'Oscar Wilde');

I find this useful when using SQL to generate queries. Take, for example, eval() for MySQL: the statement:

CALL eval('select concat(\'KILL \',id) from information_schema.processlist where user=\'webuser\'');

is just so more easily written this way:

CALL eval("select concat('KILL ',id) from information_schema.processlist where user='webuser'");

I don’t suggest one should use this method throughout her application code. Application code works great with auto-escaping string literals. But for the handy DBA or developer, who needs to work some quick queries by hand, this makes for an easier syntax to use.

]]>
https://shlomi-noach.github.io/blog/mysql/quoting-text-javascriptpython-style/feed 2 4364
MySQL terminology: processes, threads & connections https://shlomi-noach.github.io/blog/mysql/mysql-terminology-processes-threads-connections https://shlomi-noach.github.io/blog/mysql/mysql-terminology-processes-threads-connections#comments Wed, 03 Nov 2010 06:38:03 +0000 https://shlomi-noach.github.io/blog/?p=2465 There’s some confusion in the MySQL terminology for processes, threads & connections, which I will try to resolve. I can’t explain the full nature of what processes and threads are; please see Wikipedia [1] [2] for that. But here’s some basics with regard to MySQL:

  • MySQL server is a single process application.
  • It is multithreaded.
  • It (usually) acts as a TCP/IP server, accepting connections.
  • Each connection gets a dedicated thread.
  • These threads are sometimes named processes, and sometimes they’re referred to as connections.

The last part is where confusion arises, so let me discuss again the use of threads and connections in MySQL.

MySQL truly is a single process server. It is multi threaded, in that there are many obvious and less obvious threads comprising the server. Such threads are the InnoDB I/O threads, the DELAYED INSERT thread, etc. Oh, and of course: the connection threads. More on this in a short while.

On older Linux versions or on glibc-static versions, one may view MySQL as a multi-process server. This is not so: it is merely because threads are mapped to OS processes. For the sake of this discussion this is irrelevant. mysqld is a single process.

So, every new connection gets its own thread. Assuming no thread pool is in use, every new connection makes for the creation of a new thread, and a disconnect causes for that thread’s destruction. Hence, there is a 1-1 mapping between connections and active threads. But then, there is a thread pool, which means there can be threads which are not associated with any connection. So, the number of threads is greater than or equal to the number of connections.

Here’s where terminology gets confusing. When you want to see what’s executing on the server, you issue SHOW PROCESSLIST:

mysql> SHOW PROCESSLIST\G
*************************** 1. row ***************************
     Id: 4
   User: root
   Host: localhost
     db: mycheckpoint
Command: Query
   Time: 0
  State: NULL
   Info: SHOW PROCESSLIST
1 row in set (0.02 sec)

Perhaps this should have been called SHOW THREADLIST; the acting queries are not really processes.

OK, so there’s process #4 which is executing a query. What’s my process id? Turns out I don’t have a process id. I do get to have a CONNECTION_ID():

mysql> SELECT CONNECTION_ID();
+-----------------+
| CONNECTION_ID() |
+-----------------+
|               4 |
+-----------------+

So how many processes or connections are now actually doing anything? We now must check for ‘Threads_running’.

mysql> SHOW GLOBAL STATUS LIKE 'Threads_running';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| Threads_running | 1     |
+-----------------+-------+

And so we have ‘Threads_cached’, ‘Threads_connected’ & ‘Max_used_connections’.

Confusing?

Most of the time one can simply think of processes, threads and connections as 1-1-1 mapped, and not bother with it.

]]>
https://shlomi-noach.github.io/blog/mysql/mysql-terminology-processes-threads-connections/feed 3 2465
SQL: good comments conventions https://shlomi-noach.github.io/blog/mysql/sql-good-comments-conventions https://shlomi-noach.github.io/blog/mysql/sql-good-comments-conventions#comments Thu, 01 Jul 2010 07:36:32 +0000 https://shlomi-noach.github.io/blog/?p=2581 I happened upon a customer who left me in awe and admiration. The reason: excellent comments for their SQL code.

I list four major places where SQL comments are helpful. I’ll use the sakila database. It is originally scarcely commented; I’ll present it now enhanced with comments, to illustrate.

Table definitions

The CREATE TABLE statement allows for a comment, intended to describe the nature of the table:

CREATE TABLE `film_text` (
 `film_id` smallint(6) NOT NULL,
 `title` varchar(255) NOT NULL,
 `description` text,
 PRIMARY KEY (`film_id`),
 FULLTEXT KEY `idx_title_description` (`title`,`description`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COMMENT='Reflection of `film`, used for FULLTEXT search.'

It’s too bad the comment’s max length is 60 characters, though. However, it’s a very powerful field.

Column definitions

One may comment particular columns:

CREATE TABLE `film` (
 `film_id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
 `title` varchar(255) NOT NULL,
 `description` text,
 `release_year` year(4) DEFAULT NULL,
 `language_id` tinyint(3) unsigned NOT NULL COMMENT 'Soundtrack spoken language',
 `original_language_id` tinyint(3) unsigned DEFAULT NULL COMMENT 'Filmed spoken language',
 `rental_duration` tinyint(3) unsigned NOT NULL DEFAULT '3',
 `rental_rate` decimal(4,2) NOT NULL DEFAULT '4.99',
 `length` smallint(5) unsigned DEFAULT NULL,
 `replacement_cost` decimal(5,2) NOT NULL DEFAULT '19.99',
  ...
) ENGINE=InnoDB AUTO_INCREMENT=1001 DEFAULT CHARSET=utf8

Stored routines definitions

Here’s an original sakila procedure, untouched. It is already commented:

CREATE DEFINER=`root`@`localhost` PROCEDURE `rewards_report`(
 IN min_monthly_purchases TINYINT UNSIGNED
 , IN min_dollar_amount_purchased DECIMAL(10,2) UNSIGNED
 , OUT count_rewardees INT
)
 READS SQL DATA
 COMMENT 'Provides a customizable report on best customers'
BEGIN

 DECLARE last_month_start DATE;
 DECLARE last_month_end DATE;
 ...

SQL queries

Last but not least, while not part of the schema, SQL queries define the use of the schema. That is, the schema exists for the sole reason of being able to query it.

Where did that query come from? Which piece of code issued it? Why? What’s its purpose?

Looking at the PROCESSLIST, the slow log, etc., it is easier when the queries are commented:

SELECT
 /* List film details along with participating actors */
 /* Issued by analytics module */
 film.*,
 COUNT(*) AS count_actors,
 GROUP_CONCAT(CONCAT(actor.first_name, ' ', actor.last_name))
FROM
 film
 JOIN film_actor USING(film_id)
 JOIN actor USING(actor_id)
GROUP BY film.film_id;

Conclusion

Source code commenting is an important practice, and usually watched out for. SQL & table definitions commenting are often scarce or non-existent. I urge DBAs to adopt a comments coding convention for SQL, and apply it whenever they can.

]]>
https://shlomi-noach.github.io/blog/mysql/sql-good-comments-conventions/feed 5 2581
Views: better performance with condition pushdown https://shlomi-noach.github.io/blog/mysql/views-better-performance-with-condition-pushdown https://shlomi-noach.github.io/blog/mysql/views-better-performance-with-condition-pushdown#comments Thu, 20 May 2010 05:17:05 +0000 https://shlomi-noach.github.io/blog/?p=1328 Justin’s A workaround for the performance problems of TEMPTABLE views post on mysqlperformanceblog.com reminded me of a solution I once saw on a customer’s site.

The customer was using nested views structure, up to depth of some 8-9 views. There were a lot of aggregations along the way, and even the simplest query resulted with a LOT of subqueries, temporary tables, and vast amounts of data, even if only to return with a couple of rows.

While we worked to solve this, a developer showed me his own trick. His trick is now impossible to implement, but there’s a hack around this.

Let’s use the world database to illustrate. Look at the following view definition:

CREATE
  ALGORITHM=TEMPTABLE
VIEW country_languages AS
  SELECT
    Country.CODE, Country.Name AS country,
    GROUP_CONCAT(CountryLanguage.Language) AS languages
  FROM
    world.Country
    JOIN world.CountryLanguage ON (Country.CODE = CountryLanguage.CountryCode)
  GROUP BY
    Country.CODE;

The view presents with a list of spoken languages per country. The execution plan for querying this view looks like this:

mysql> EXPLAIN SELECT * FROM country_languages;
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------------------------+------+----------------------------------------------+
| id | select_type | table           | type   | possible_keys | key     | key_len | ref                               | rows | Extra                                        |
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------------------------+------+----------------------------------------------+
|  1 | PRIMARY     | <derived2>      | ALL    | NULL          | NULL    | NULL    | NULL                              |  233 |                                              |
|  2 | DERIVED     | CountryLanguage | index  | PRIMARY       | PRIMARY | 33      | NULL                              |  984 | Using index; Using temporary; Using filesort |
|  2 | DERIVED     | Country         | eq_ref | PRIMARY       | PRIMARY | 3       | world.CountryLanguage.CountryCode |    1 |                                              |
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------------------------+------+----------------------------------------------+

And, even if we only want to filter out a single country, we still get the same plan:

mysql> EXPLAIN SELECT * FROM country_languages WHERE Code='USA';
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------------------------+------+----------------------------------------------+
| id | select_type | table           | type   | possible_keys | key     | key_len | ref                               | rows | Extra                                        |
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------------------------+------+----------------------------------------------+
|  1 | PRIMARY     | <derived2>      | ALL    | NULL          | NULL    | NULL    | NULL                              |  233 | Using where                                  |
|  2 | DERIVED     | CountryLanguage | index  | PRIMARY       | PRIMARY | 33      | NULL                              |  984 | Using index; Using temporary; Using filesort |
|  2 | DERIVED     | Country         | eq_ref | PRIMARY       | PRIMARY | 3       | world.CountryLanguage.CountryCode |    1 |                                              |
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------------------------+------+----------------------------------------------+

So, we need to scan the entire country_language and country tables in order to return results for just one row.

A non-working solution

The solution offered by the developer was this:

CREATE
  ALGORITHM=MERGE
  VIEW country_languages_non_working AS
  SELECT
    Country.CODE, Country.Name AS country,
    GROUP_CONCAT(CountryLanguage.Language) AS languages
  FROM
    world.Country
    JOIN world.CountryLanguage ON
      (Country.CODE = CountryLanguage.CountryCode)
  WHERE
    Country.CODE = @country_code
  GROUP BY Country.CODE;

And follow by:

mysql> SET @country_code='USA';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT * FROM country_languages_2;
+------+---------------+----------------------------------------------------------------------------------------------------+
| CODE | country       | languages                                                                                          |
+------+---------------+----------------------------------------------------------------------------------------------------+
| USA  | United States | Chinese,English,French,German,Italian,Japanese,Korean,Polish,Portuguese,Spanish,Tagalog,Vietnamese |
+------+---------------+----------------------------------------------------------------------------------------------------+

So, pushdown a WHERE condition into the view’s definition. The session variable @country_code is used to filter rows. In the above simplified code the value is assumed to be set; tweak it as you see fit (using IFNULL, for example, or OR statements) to allow for full scan in case the variable is undefined.

This doesn’t work. It used to work a couple years back; but today you cannot create a view which uses session variables or parameters. It is a restriction imposed by views.

A workaround

Justin showed a workaround using an additional table. There is another workaround which does not involve tables, but rather stored routines. Now, this is a patch, and an ugly one. It may not work in future versions of MySQL for all I know. But, here it goes:

DELIMITER $$
CREATE DEFINER=`root`@`localhost` FUNCTION `get_session_country`() RETURNS CHAR(3)
    NO SQL
    DETERMINISTIC
BEGIN
  RETURN @country_code;
END $$
DELIMITER ;

CREATE
  ALGORITHM=MERGE
  VIEW country_languages_2 AS
  SELECT
    Country.CODE, Country.Name AS country,
    GROUP_CONCAT(CountryLanguage.Language) AS languages
  FROM
    world.Country
    JOIN world.CountryLanguage ON
      (Country.CODE = CountryLanguage.CountryCode)
  WHERE
    Country.CODE = get_session_country()
  GROUP BY Country.CODE;

And now:

mysql> SET @country_code='USA';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT * FROM country_languages_2;
+------+---------------+----------------------------------------------------------------------------------------------------+
| CODE | country       | languages                                                                                          |
+------+---------------+----------------------------------------------------------------------------------------------------+
| USA  | United States | Chinese,English,French,German,Italian,Japanese,Korean,Polish,Portuguese,Spanish,Tagalog,Vietnamese |
+------+---------------+----------------------------------------------------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> EXPLAIN SELECT * FROM country_languages_2;
+----+-------------+-----------------+--------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table           | type   | possible_keys | key     | key_len | ref  | rows | Extra                    |
+----+-------------+-----------------+--------+---------------+---------+---------+------+------+--------------------------+
|  1 | PRIMARY     | <derived2>      | system | NULL          | NULL    | NULL    | NULL |    1 |                          |
|  2 | DERIVED     | Country         | const  | PRIMARY       | PRIMARY | 3       |      |    1 |                          |
|  2 | DERIVED     | CountryLanguage | ref    | PRIMARY       | PRIMARY | 3       |      |    8 | Using where; Using index |
+----+-------------+-----------------+--------+---------------+---------+---------+------+------+--------------------------+

Since views are allowed to call stored routines (Justing used this to call upon CONNECTION_ID()), and since stored routines can use session variables, we can take advantage and force the view into filtering out irrelevant rows before these accumulate to temporary tables and big joins.

Back in the customer’s office, we witnessed, what with their real data and multiple views, a reduction of query times from ~30 minutes to a few seconds.

Another kind of use

Eventually we worked to make better view definitions and query splitting, resulting in clearer code and fast queries, but this solution plays nicely into another kind of problem:

Can we force different customers to see different parts of a given table? e.g., only those rows that relate to the customers?

There can be many solutions: different tables; multiple views (one per customer), stored procedures, what have you. The above provides a solution, and I’ve seen it in use.

]]>
https://shlomi-noach.github.io/blog/mysql/views-better-performance-with-condition-pushdown/feed 1 1328
Discovery of the day: GROUP BY … DESC https://shlomi-noach.github.io/blog/mysql/discovery-of-the-day-group-by-desc https://shlomi-noach.github.io/blog/mysql/discovery-of-the-day-group-by-desc#comments Tue, 04 May 2010 09:38:38 +0000 https://shlomi-noach.github.io/blog/?p=2381 I happened on a query where, by mistake, an

SELECT ... ORDER BY x DESC LIMIT 1

was written as

SELECT ... GROUP BY x DESC LIMIT 1

And it took me by surprise to realize GROUP BY x DESC is a valid statement. I looked it up: yep! It’s documented.

In MySQL, GROUP BY results are sorted according to the group statement. You can override this by adding ORDER BY NULL (see past post). I wasn’t aware you can actually control the sort order.

]]>
https://shlomi-noach.github.io/blog/mysql/discovery-of-the-day-group-by-desc/feed 4 2381
But I DO want MySQL to say “ERROR”! https://shlomi-noach.github.io/blog/mysql/but-i-do-want-mysql-to-say-error https://shlomi-noach.github.io/blog/mysql/but-i-do-want-mysql-to-say-error#comments Fri, 12 Mar 2010 04:53:28 +0000 https://shlomi-noach.github.io/blog/?p=2005 MySQL is known for its willingness to accept invalid queries, data values. It can silently commit your transaction, truncate your data.

  • Using GROUP_CONCAT with a small group_concat_max_len setting? Your result will be silently truncated (make sure to check the warnings though).
  • Calling CREATE TEMPORARY TABLE? You get silent commit.
  • Issuing a ROLLBACK on non-transactional involved engines? Have a warning; no error.
  • Using LOCK IN SHARE MODE on non transactional tables? Not a problem. Nothing reported.
  • Adding a FOREIGN KEY on a MyISAM table? Good for you; no action actually taken.
  • Inserting 300 to a TINYINT column in a relaxed sql_mode? Give me 255, I’ll silently drop the remaining 45. I owe you.

Warnings and errors

It would be nice to:

  • Have an auto_propagate_warning_to_error server variable (global/session/both) which, well, does what it says.
  • Have an i_am_really_not_a_dummy server variable which implies stricter checks for all the above and prevents you from doing with anything that may be problematic (or rolls back your transactions on your invalid actions).

Connectors may be nice enough to propagate warnings to errors – that’s good. But not enough: since data is already committed in MySQL.

If I understand correctly, and maybe it’s just a myth, it all relates to the times where MySQL had interest in a widespread adoption across the internet, in such way that it does not interfere too much with the users (hence leading to the common myth that “MySQL just works out of the box and does not require me to configure or understand anything”).

MySQL is a database system, and is now widespread, and is used by serious companies and products. It is time to stop play nice to everyone and provide with strict integrity — or, be nice to everyone, just allow me to specify what “nice” means for me.

]]>
https://shlomi-noach.github.io/blog/mysql/but-i-do-want-mysql-to-say-error/feed 18 2005