MySQL joins: ON vs. USING vs. Theta-style

What is the difference between the following three syntaxes?

SELECT * FROM film JOIN film_actor ON (film.film_id = film_actor.film_id)
SELECT * FROM film JOIN film_actor USING (film_id)
SELECT * FROM film, film_actor WHERE film.film_id = film_actor.film_id

The difference is mostly syntactic sugar, but with a couple interesting notes.

To put names, the first two are called “ANSI-style” while the third is called “Theta-style”.

Theta style

On the FROM clause, tables are listed as if with Cartesian products, and the WHERE clause specifies how the join should take place.

This is considered to be the “old” style. It is somewhat confusing to read. Consider the following query: Continue reading » “MySQL joins: ON vs. USING vs. Theta-style”

Auto caching tables

Is there a way to create a caching table, some sort of a materialized view, such that upon selecting from that table, its data is validated/invalidated?

Hint: yes.

But to elaborate the point: say I have some table data_table. Can I rewrite all my queries which access data_table to read from some autocache_data_table, but have nothing changed in the query itself? No caveats, no additional WHEREs, and still have that autocache_data_table provide with the correct data, dynamically updated by some rule of our choice?

And: no crontab, no event scheduler, and no funny triggers on data_table? In such way that invalidation/revalidation occurs upon SELECT?

Well, yes.

This post is long, but I suggest you read it through to understand the mechanism, it will be worthwhile.

Background

The following derives from my long research on how to provide better, faster and safer access to INFORMATION_SCHEMA tables. It is however not limited to this exact scenario, and in this post I provide with a simple, general purpose example. I’ll have more to share about INFORMATION_SCHEMA specific solutions shortly.

I was looking for a server side solution which would not require query changes, apart from directing the query to other tables. Solution has to be supported by all standard MySQL installs; so: no plugins, no special rebuilds. Continue reading » “Auto caching tables”

MySQL/QueryScript use case: DELETE all but top N records per group

Some administrative tasks can be simplified by using common_schema/QueryScript. I’m collecting a bunch of these for documentation. Here’s one for example:

The DBA/developer has the task of retaining only top 3 most populated countries per continent. That is, she has to DELETE 4th, 5th, 6th, … most populated counties in each continent.

Is it possible to work out with a single query? Yes. But the query is not pretty. In fact, it is quite complicated, and either involves unintuitive subqueries, or unintuitive hacks. A normal DBA would not want to write, neither maintain this kind of query, unless top-notch-geek, which is fine.

Since this is a one time job, we just need to get it done. And common_schema/QueryScript provide with the intuitive solution: if we read our demand aloud, we realize we want to delete 4th, 5th, 6th, … populated countries for each continent.

I present a solution made available by QueryScript, and discuss the ways in which the code overcomes limitations, or simplifies complexity:

var $num_countries_to_delete;
foreach($continent, $num_countries: SELECT continent, COUNT(*) FROM world.Country GROUP BY continent)
{
  if ($num_countries > 3)
  {
    set $num_countries_to_delete := $num_countries - 3;
    DELETE FROM world.Country WHERE Continent = $continent ORDER BY Population ASC LIMIT :$num_countries_to_delete;
  }
}

Discussion

The first thing that should be apparent from the above is that this is a programmatic solution. Queries are declarative, which is why complex ones sometimes look incomprehensible. The above is more straightforward. Continue reading » “MySQL/QueryScript use case: DELETE all but top N records per group”

Syntax of the day: IS TRUE and IS FALSE

What makes for a true statement?

We usually test statements using a WHERE clause:

SELECT * FROM world.City WHERE Population > 1000000

The “Population > 1000000” statement makes for a boolean expression. Using WHERE is just one way of evaluating it. One can also test with IF():

SET @val := 7;
SELECT IF(@val > 2, 'Yes', 'No')

TRUE and FALSE

The two are keywords. They also map for the numerals 1 and 0, as follows:

mysql> SELECT TRUE, FALSE;
+------+-------+
| TRUE | FALSE |
+------+-------+
|    1 |     0 |
+------+-------+

Like in the C programming language, a nonzero value evaluates to a true value. A zero evaluates to false. A NULL evaluates to… well, NULL. But aside from 3-valued logic, what’s important in our case is that it is not true. Continue reading » “Syntax of the day: IS TRUE and IS FALSE”

More MySQL foreach()

In my previous post I’ve shown several generic use cases for foreach(), a new scripting functionality introduced in common_schema.

In this part I present DBA’s handy syntax for schema and table operations and maintenance.

Confession: while I love INFORMATION_SCHEMA‘s power, I just hate writing queries against it. It’s just so much typing! Just getting the list of tables in a schema makes for this heavy duty query:

SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA='sakila' AND TABLE_TYPE='BASE TABLE';

When a join is involved this really becomes a nightmare. I think it’s cumbersome, and as result, many do not remember the names and meaning of columns, making for “oh, I need to read the manual all over again just to get that query right”. Anyway, that’s my opinion.

A SHOW TABLES statement is easier to type, but cannot be integrated into a SELECT query (though we have a partial solution for that, too), and besides, when filtering out the views, the SHOW statement becomes almost as cumbersome as the one on INFORMATION_SCHEMA.

Which is why foreach() offers handy shortcuts to common iterations on schemata and tables, as follows:

Use case: iterate all databases

call foreach('schema', 'CREATE TABLE ${schema}.event(event_id INT, msg VARCHAR(128))');

In the above we execute a query on each database. Hmmm, maybe not such a good idea to perform this operation on all databases? Let’s filter them:

Use case: iterate databases by name match

call foreach('schema like wordpress_%', 'ALTER TABLE ${schema}.wp_posts MODIFY COLUMN comment_author VARCHAR(96) NOT NULL');

The above will only iterate my WordPress databases (I have several of these), performing an ALTER on wp_posts for each of those databases. Continue reading » “More MySQL foreach()”

MySQL foreach()

A new routine is now available in common_schema, which makes for an easier execution syntax for some operations:

foreach(collection_to_iterate_over, queries_to_execute_per_iteration_step);

To illustrate what it can do, consider:

call foreach('table in sakila', 'ALTER TABLE ${schema}.${table} ENGINE=InnoDB ROW_FORMAT=COMPACT');

call $('schema like shard_%', 'CREATE TABLE ${schema}.messages (id INT)');

call $('2000:2009', 'INSERT IGNORE INTO report (report_year) VALUES (${1})');

$() stands as a synonym to foreach(). I suspect it should look familiar to web programmers.

The idea for foreach() was introduced by Giuseppe Maxia during a correspondence. At first I was skeptic: this isn’t jQuery; this is SQL. Why would I want to use foreach()?

Then Giuseppe provided some use cases, and as I started thinking about it, I found more and more cases where such a tool might considerably reduce scripting overhead and avoid requiring SQL-fu skills. In fact, I have been using it myself for the past few weeks

I provide examples which I hope will convince the reader as for the simplicity of using such syntax. Showing off the types of input foreach() accepts (query, table search, schema search, set of constants, single or double numbers range), and the types of queries it can execute (single, multiple, using placeholders).

I stress that this is not a replacement for common queries (i.e. does not come to replace your common SELECT, UPDATE, DELETE), but more for working out administrative tasks. Nevertheless, the last example in this post does provide with an interesting insight on possible “normal” use. Continue reading » “MySQL foreach()”

Reading results of SHOW statements, on server side

SHOW statements are show stoppers on server side. While clients can get a SHOW statement as a result set just as any normal SELECT, things are not as such on server side.

On server side, that is, from within MySQL itself, one cannot:

SELECT `Database` FROM (SHOW DATABASES);

One cannot:

DECLARE show_cursor CURSOR FOR SHOW TABLES;

One cannot:

SHOW TABLES INTO OUTFILE '/tmp/my_file.txt';

So it is impossible to get the results with a query; impossible to get the results from a stored routine; impossible to get the results by file reading…

Bwahaha! A hack!

For some SHOW statements, there is a way around this. I’ve been banging my head against the wall for weeks now on this. Now I have a partial solution: I’m able to read SHOW output for several SHOW statements. Namely, those SHOW statements which allow a LIKE or a WHERE clause.

For example, most are familiar with the following syntax:

USE mysql;
SHOW TABLE STATUS LIKE 'user';

However not so many know that any SHOW statement which accepts LIKE, can also accept WHERE: Continue reading » “Reading results of SHOW statements, on server side”

Quoting text JavaScript/Python style

Unless your MySQL is configured to use ANSI_QUOTES in sql_mode, you are able to quote your text in one of two forms: using single quotes or double quotes:

UPDATE world.Country SET HeadOfState = 'Willy Wonka' WHERE Code='USA'
UPDATE world.Country SET HeadOfState = "Willy Wonka" WHERE Code="USA"

This makes for JavaScript- or Python-style quoting: you quote by your needs. Say you have a text which includes single quotes:

It is what you read when you don’t have to that determines what you will be when you can’t help it. – Oscar Wilde

You wish to insert this text to some tables. You could go through the trouble of escaping it:

INSERT INTO quotes (quote, author) VALUES (
  'It is what you read when you don\'t have to that determines what you will be when you can\'t help it.', 'Oscar Wilde');

or you could just wrap it in double quotes:

INSERT INTO quotes (quote, author) VALUES (
  "It is what you read when you don't have to that determines what you will be when you can't help it.", 'Oscar Wilde');

I find this useful when using SQL to generate queries. Take, for example, eval() for MySQL: the statement: Continue reading » “Quoting text JavaScript/Python style”

Self throttling MySQL queries

Recap on the problem:

  • A query takes a long time to complete.
  • During this time it makes for a lot of I/O.
  • Query’s I/O overloads the db, making for other queries run slow.

I introduce the notion of self-throttling queries: queries that go to sleep, by themselves, throughout the runtime. The sleep period means the query does not perform I/O at that time, which then means other queries can have their chance to execute.

I present two approaches:

  • The naive approach: for every 1,000 rows, the query sleep for 1 second
  • The factor approach: for every 1,000 rows, the query sleeps for the amount of time it took to iterate those 1,000 rows (effectively doubling the total runtime of the query). Continue reading » “Self throttling MySQL queries”

Contest for Glory: write a self throttling MySQL query

What’s all this about?

I’ve you’ve been to my talk in London, then you’ve already got this as homework. If not, allow me to fill in the details:

I was speaking about MySQL’s programmatic nature in many aspects (best if you read the slides!). We discussed user defined variables, derived tables, ordering constructs, order of evaluation, time suspension and time calculation.

An issue I presented was that of a very long running query. Say it runs for 20 minutes. It kills your I/O. Do you KILL the query or wait? It could terminate in 5 seconds from now, and if you kill it now, you lose everything. But it may yet run for 3 more hours!

We discussed a futuristic feature where the query would gracefully terminate after some designated time. However, futuristic wishes do not help us.

A self throttling query

I suggested the idea for a self throttling query. We know how to throttle writing queries, such as DELETE queries: we break them into small chunks, then work each chunk at a time, setting ourselves to sleep in between chunks. This is how –sleep and –sleep-ratio work in oak-chunk-update. It is how –sleep and –sleep-coef work in pt-archiver.

But can the same be done for SELECT queries? Continue reading » “Contest for Glory: write a self throttling MySQL query”