MySQL joins: ON vs. USING vs. Theta-style

What is the difference between the following three syntaxes?

SELECT * FROM film JOIN film_actor ON (film.film_id = film_actor.film_id)
SELECT * FROM film JOIN film_actor USING (film_id)
SELECT * FROM film, film_actor WHERE film.film_id = film_actor.film_id

The difference is mostly syntactic sugar, but with a couple interesting notes.

To put names, the first two are called “ANSI-style” while the third is called “Theta-style”.

Theta style

On the FROM clause, tables are listed as if with Cartesian products, and the WHERE clause specifies how the join should take place.

This is considered to be the “old” style. It is somewhat confusing to read. Consider the following query:

SELECT * FROM film, film_actor WHERE film.film_id = film_actor.film_id AND actor_id = 17 AND film.length > 120

The above lists films over 120 minutes in length, in which actor #17 plays. Never mind the results; what about the query? Being just one part of the WHERE clause, a one out of three elements in the AND expression, the join equation gets lost. It is difficult to find and isolate the terms which make for table joins as opposed to terms which filter out rows. In the above example it is still relatively easy to point out. How about a query with 5 tables and a 20 terms WHERE clause?

ANSI style: ON

With JOINON, one separates the join terms from the filtering terms. Rewriting the previous example:

SELECT * FROM film JOIN film_actor ON (film.film_id = film_actor.film_id) WHERE actor_id = 17 AND film.length > 120

It is quite clear now what belongs to what.

Note: the parenthesis are not strictly required in the ON clause. I personally like to use them: it makes for an even greater distinction between query parts. SQL syntax is such a mess!

ANSI style: USING

Is the special case where we join tables on columns of the same name, we can make a shortcut and use USING:

SELECT * FROM film JOIN film_actor USING (film_id) WHERE actor_id = 17 AND film.length > 120

This time the parenthesis are required (I’m not sure why the difference on that part).

This is mainly a nicety, less words to type, and a resulting prettified query. But also note a couple differences:

USING vs. ON

The following is valid:

SELECT film.title, film_id FROM film JOIN film_actor USING (film_id) WHERE actor_id = 17 AND film.length > 120;

But the following is not:

SELECT film.title, film_id FROM film JOIN film_actor ON (film.film_id = film_actor.film_id) WHERE actor_id = 17 AND film.length > 120;
ERROR 1052 (23000): Column 'film_id' in field list is ambiguous

Since USING “knows” the film_id column is shared between both tables, it doesn’t mind if we ask it without specifying an exact table. It would be the same value anyway!

ON is not as smart and requires further clarifications: which table exactly do you want?

And the above is actually the result of this interesting phenomena: when using USING, the column only appears once in the result set:

SELECT * FROM film JOIN film_actor USING (film_id) WHERE actor_id = 17 AND film.length > 120 LIMIT 1\G
*************************** 1. row ***************************
             film_id: 96
               title: BREAKING HOME
         description: A Beautiful Display of a Secret Agent And a Monkey who must Battle a Sumo Wrestler in An Abandoned Mine Shaft
        release_year: 2006
         language_id: 1
original_language_id: NULL
     rental_duration: 4
         rental_rate: 2.99
              length: 169
    replacement_cost: 21.99
              rating: PG-13
    special_features: Trailers,Commentaries
         last_update: 2006-02-15 05:03:42
            actor_id: 17
         last_update: 2006-02-15 05:05:03

But joining on ON, we get this column twice:

SELECT * FROM film JOIN film_actor ON film.film_id = film_actor.film_id WHERE actor_id = 17 AND film.length > 120 LIMIT 1\G
*************************** 1. row ***************************
             film_id: 96
               title: BREAKING HOME
         description: A Beautiful Display of a Secret Agent And a Monkey who must Battle a Sumo Wrestler in An Abandoned Mine Shaft
        release_year: 2006
         language_id: 1
original_language_id: NULL
     rental_duration: 4
         rental_rate: 2.99
              length: 169
    replacement_cost: 21.99
              rating: PG-13
    special_features: Trailers,Commentaries
         last_update: 2006-02-15 05:03:42
            actor_id: 17
             film_id: 96
         last_update: 2006-02-15 05:05:03

Behind the scenes

The news is that MySQL treats all in the exact same way. With the kind help of EXPLAIN EXTENDED, we see that:

EXPLAIN EXTENDED SELECT film.title, film_id FROM film JOIN film_actor USING (film_id) WHERE actor_id = 17 AND film.length > 120\G
*************************** 1. row ***************************
...
2 rows in set, 1 warning (0.00 sec)

root@mysql-5.1.51> SHOW WARNINGS\G
*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: select `sakila`.`film`.`title` AS `title`,`sakila`.`film`.`film_id` AS `film_id` 
         from `sakila`.`film` join `sakila`.`film_actor` 
         where (
                 (`sakila`.`film`.`film_id` = `sakila`.`film_actor`.`film_id`) 
                 and (`sakila`.`film_actor`.`actor_id` = 17) 
                 and (`sakila`.`film`.`length` > 120)
               )

All queries are translated internally to theta-style.

This post only discusses inner joins. With outer joins the situation is somewhat different. Read this post for more insight.

16 thoughts on “MySQL joins: ON vs. USING vs. Theta-style

  1. Roland,

    Great examples on multiple choices (multiple FKs, multiple columns names address_id).

    With those examples it becomes very clear how being declarative SQL becomes ambiguous or undefined.

    I agree to the notion of RELATE using a FK.

    To be honest, I think SQL is a mess. It also uses ancient syntax rules. This doesn’t mean I don’t enjoy it, though 🙂

    But it is my belief that ANSI SQL could have been much better if redesigned, in many aspects.

  2. I think that the ANSIstyle UISNG syntax requires parenthesis to distinguish the anbiguity:

    FROM film JOIN film_actor USING (film_id, x)

    (where x is a column)

    vs

    FROM (film JOIN film_actor USING film_id) , x

    (where x is a table)

  3. Sorry, I wasn’t clear. I meant that without parenthesis, this would be ambiguous:

    FROM film JOIN film_actor USING film_id, x

    as to which of the two parenthesized versions above it should be parsed.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.