{"id":5050,"date":"2012-07-11T06:54:20","date_gmt":"2012-07-11T04:54:20","guid":{"rendered":"http:\/\/code.openark.org\/blog\/?p=5050"},"modified":"2012-07-11T06:54:20","modified_gmt":"2012-07-11T04:54:20","slug":"mysql-joins-on-vs-using-vs-theta-style","status":"publish","type":"post","link":"https:\/\/code.openark.org\/blog\/mysql\/mysql-joins-on-vs-using-vs-theta-style","title":{"rendered":"MySQL joins: ON vs. USING vs. Theta-style"},"content":{"rendered":"<p>What is the difference between the following three syntaxes?<\/p>\n<blockquote>\n<pre>SELECT * FROM film JOIN film_actor ON (film.film_id = film_actor.film_id)\r\nSELECT * FROM film JOIN film_actor USING (film_id)\r\nSELECT * FROM film, film_actor WHERE film.film_id = film_actor.film_id<\/pre>\n<\/blockquote>\n<p>The difference is mostly syntactic sugar, but with a couple interesting notes.<\/p>\n<p>To put names, the first two are called <strong>&#8220;ANSI-style&#8221;<\/strong> while the third is called <strong>&#8220;Theta-style&#8221;<\/strong>.<\/p>\n<h4>Theta style<\/h4>\n<p>On the <strong>FROM<\/strong> clause, tables are listed as if with Cartesian products, and the <strong>WHERE<\/strong> clause specifies how the join should take place.<\/p>\n<p>This is considered to be the &#8220;old&#8221; style. It is somewhat confusing to read. Consider the following query:<!--more--><\/p>\n<blockquote>\n<pre><strong>SELECT<\/strong> * <strong>FROM<\/strong> film, film_actor <strong>WHERE<\/strong> film.film_id = film_actor.film_id <strong>AND<\/strong> actor_id = 17 <strong>AND<\/strong> film.length &gt; 120<\/pre>\n<\/blockquote>\n<p>The above lists films over <strong>120<\/strong> minutes in length, in which actor <strong>#17<\/strong> plays. Never mind the results; what about the query? Being just one part of the <strong>WHERE<\/strong> clause, a one out of three elements in the <strong>AND<\/strong> expression, the join equation gets lost. It is difficult to find and isolate the terms which make for table joins as opposed to terms which filter out rows. In the above example it is still relatively easy to point out. How about a query with <strong>5<\/strong> tables and a <strong>20<\/strong> terms <strong>WHERE<\/strong> clause?<\/p>\n<h4>ANSI style: ON<\/h4>\n<p>With <strong>JOIN<\/strong> &#8230; <strong>ON<\/strong>, one separates the join terms from the filtering terms. Rewriting the previous example:<\/p>\n<blockquote>\n<pre><strong>SELECT<\/strong> * <strong>FROM<\/strong> film <strong>JOIN<\/strong> film_actor <strong>ON<\/strong> (film.film_id = film_actor.film_id) <strong>WHERE<\/strong> actor_id = 17 <strong>AND<\/strong> film.length &gt; 120<\/pre>\n<\/blockquote>\n<p>It is quite clear now what belongs to what.<\/p>\n<p>Note: the parenthesis are not strictly required in the <strong>ON<\/strong> clause. I personally like to use them: it makes for an even greater distinction between query parts. SQL syntax is such a mess!<\/p>\n<h4>ANSI style: USING<\/h4>\n<p>Is the special case where we join tables on columns of the same name, we can make a shortcut and use <strong>USING<\/strong>:<\/p>\n<blockquote>\n<pre><strong>SELECT<\/strong> * <strong>FROM<\/strong> film <strong>JOIN<\/strong> film_actor <strong>USING<\/strong> (film_id) <strong>WHERE<\/strong> actor_id = 17 <strong>AND<\/strong> film.length &gt; 120<\/pre>\n<\/blockquote>\n<p>This time the parenthesis are required (I&#8217;m not sure why the difference on that part).<\/p>\n<p>This is mainly a nicety, less words to type, and a resulting prettified query. But also note a couple differences:<\/p>\n<h4>USING vs. ON<\/h4>\n<p>The following is valid:<\/p>\n<blockquote>\n<pre>SELECT film.title, film_id FROM film JOIN film_actor USING (film_id) WHERE actor_id = 17 AND film.length &gt; 120;<\/pre>\n<\/blockquote>\n<p>But the following is not:<\/p>\n<blockquote>\n<pre>SELECT film.title, film_id FROM film JOIN film_actor ON (film.film_id = film_actor.film_id) WHERE actor_id = 17 AND film.length &gt; 120;\r\n<strong>ERROR 1052 (23000): Column 'film_id' in field list is ambiguous<\/strong><\/pre>\n<\/blockquote>\n<p>Since <strong>USING<\/strong> &#8220;knows&#8221; the film_id column is shared between both tables, it doesn&#8217;t mind if we ask it without specifying an exact table. It would be the same value anyway!<\/p>\n<p><strong>ON<\/strong> is not as smart and requires further clarifications: which table exactly do you want?<\/p>\n<p>And the above is actually the result of this interesting phenomena: when using <strong>USING<\/strong>, the column only appears <em>once<\/em> in the result set:<\/p>\n<blockquote>\n<pre>SELECT * FROM film JOIN film_actor USING (film_id) WHERE actor_id = 17 AND film.length &gt; 120 LIMIT 1\\G\r\n*************************** 1. row ***************************\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <strong>film_id<\/strong>: 96\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 title: BREAKING HOME\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 description: A Beautiful Display of a Secret Agent And a Monkey who must Battle a Sumo Wrestler in An Abandoned Mine Shaft\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 release_year: 2006\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 language_id: 1\r\noriginal_language_id: NULL\r\n\u00a0\u00a0\u00a0\u00a0 rental_duration: 4\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 rental_rate: 2.99\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 length: 169\r\n\u00a0\u00a0\u00a0 replacement_cost: 21.99\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 rating: PG-13\r\n\u00a0\u00a0\u00a0 special_features: Trailers,Commentaries\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 last_update: 2006-02-15 05:03:42\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 actor_id: 17\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 last_update: 2006-02-15 05:05:03<\/pre>\n<\/blockquote>\n<p>But joining on <strong>ON<\/strong>, we get this column <em>twice<\/em>:<\/p>\n<blockquote>\n<pre>SELECT * FROM film JOIN film_actor ON film.film_id = film_actor.film_id WHERE actor_id = 17 AND film.length &gt; 120 LIMIT 1\\G\r\n*************************** 1. row ***************************\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <strong>film_id<\/strong>: 96\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 title: BREAKING HOME\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 description: A Beautiful Display of a Secret Agent And a Monkey who must Battle a Sumo Wrestler in An Abandoned Mine Shaft\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 release_year: 2006\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 language_id: 1\r\noriginal_language_id: NULL\r\n\u00a0\u00a0\u00a0\u00a0 rental_duration: 4\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 rental_rate: 2.99\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 length: 169\r\n\u00a0\u00a0\u00a0 replacement_cost: 21.99\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 rating: PG-13\r\n\u00a0\u00a0\u00a0 special_features: Trailers,Commentaries\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 last_update: 2006-02-15 05:03:42\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 actor_id: 17\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <strong>film_id<\/strong>: 96\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 last_update: 2006-02-15 05:05:03<\/pre>\n<\/blockquote>\n<h4>Behind the scenes<\/h4>\n<p>The news is that MySQL treats all in the exact same way. With the kind help of <strong>EXPLAIN EXTENDED<\/strong>, we see that:<\/p>\n<blockquote>\n<pre>EXPLAIN EXTENDED SELECT film.title, film_id FROM film JOIN film_actor <strong>USING<\/strong> (film_id) WHERE actor_id = 17 AND film.length &gt; 120\\G\r\n*************************** 1. row ***************************\r\n...\r\n2 rows in set, 1 warning (0.00 sec)\r\n\r\nroot@mysql-5.1.51&gt; SHOW WARNINGS\\G\r\n*************************** 1. row ***************************\r\n\u00a0 Level: Note\r\n\u00a0\u00a0 Code: 1003\r\nMessage: select `sakila`.`film`.`title` AS `title`,`sakila`.`film`.`film_id` AS `film_id` \r\n         from `sakila`.`film` join `sakila`.`film_actor` \r\n         where (\r\n                 <strong>(`sakila`.`film`.`film_id` = `sakila`.`film_actor`.`film_id`)<\/strong> \r\n                 and (`sakila`.`film_actor`.`actor_id` = 17) \r\n                 and (`sakila`.`film`.`length` &gt; 120)\r\n               )<\/pre>\n<\/blockquote>\n<p>All queries are translated internally to <em>theta-style<\/em>.<\/p>\n<p>This post only discusses inner joins. With outer joins the situation is somewhat different. Read <a href=\"http:\/\/www.mysqldiary.com\/mysql-left-join\/\">this post<\/a> for more insight.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>What is the difference between the following three syntaxes? SELECT * FROM film JOIN film_actor ON (film.film_id = film_actor.film_id) SELECT * FROM film JOIN film_actor USING (film_id) SELECT * FROM film, film_actor WHERE film.film_id = film_actor.film_id The difference is mostly syntactic sugar, but with a couple interesting notes. To put names, the first two are [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"enabled":false},"version":2}},"categories":[5],"tags":[21,20],"class_list":["post-5050","post","type-post","status-publish","format-standard","hentry","category-mysql","tag-sql","tag-syntax"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p2bZZp-1js","_links":{"self":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/5050","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/comments?post=5050"}],"version-history":[{"count":18,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/5050\/revisions"}],"predecessor-version":[{"id":5069,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/5050\/revisions\/5069"}],"wp:attachment":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/media?parent=5050"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/categories?post=5050"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/tags?post=5050"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}