<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: SQL: finding a user&#039;s country/region based on IP</title>
	<atom:link href="http://code.openark.org/blog/mysql/sql-finding-a-users-countryregion-based-on-ip/feed" rel="self" type="application/rss+xml" />
	<link>http://code.openark.org/blog/mysql/sql-finding-a-users-countryregion-based-on-ip</link>
	<description>Blog by Shlomi Noach</description>
	<lastBuildDate>Wed, 01 Feb 2012 20:47:51 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<item>
		<title>By: Joshua K Roberson</title>
		<link>http://code.openark.org/blog/mysql/sql-finding-a-users-countryregion-based-on-ip/comment-page-1#comment-2948</link>
		<dc:creator>Joshua K Roberson</dc:creator>
		<pubDate>Sun, 02 Aug 2009 04:32:26 +0000</pubDate>
		<guid isPermaLink="false">http://code.openark.org/blog/?p=705#comment-2948</guid>
		<description>Here are different ways to store an IP or an IP range as well as query for the IPs.
&lt;a href=&quot;http://strictcoder.blogspot.com/2009/08/different-ways-to-query-for-ip-in-your.html&quot; rel=&quot;nofollow&quot;&gt;http://strictcoder.blogspot.com/2009/08/different-ways-to-query-for-ip-in-your.html&lt;/a&gt;</description>
		<content:encoded><![CDATA[<p>Here are different ways to store an IP or an IP range as well as query for the IPs.<br />
<a href="http://strictcoder.blogspot.com/2009/08/different-ways-to-query-for-ip-in-your.html" rel="nofollow">http://strictcoder.blogspot.com/2009/08/different-ways-to-query-for-ip-in-your.html</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: shlomi</title>
		<link>http://code.openark.org/blog/mysql/sql-finding-a-users-countryregion-based-on-ip/comment-page-1#comment-2113</link>
		<dc:creator>shlomi</dc:creator>
		<pubDate>Thu, 28 May 2009 03:13:33 +0000</pubDate>
		<guid isPermaLink="false">http://code.openark.org/blog/?p=705#comment-2113</guid>
		<description>@Michael,

Great! Thanks for the benchmarks!</description>
		<content:encoded><![CDATA[<p>@Michael,</p>
<p>Great! Thanks for the benchmarks!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Michael</title>
		<link>http://code.openark.org/blog/mysql/sql-finding-a-users-countryregion-based-on-ip/comment-page-1#comment-2110</link>
		<dc:creator>Michael</dc:creator>
		<pubDate>Wed, 27 May 2009 21:02:58 +0000</pubDate>
		<guid isPermaLink="false">http://code.openark.org/blog/?p=705#comment-2110</guid>
		<description>@shlomi,

D&#039;oh! Yes, you&#039;re completely right. I wasn&#039;t reading the SQL statement correctly, and there was a typo in it (using &gt;= instead of &lt;=) in my test cases. (Note to self: do not comment until after coffee has been had.)

I did do some benchmarking on some stuff here.

The winner is definitely the &quot;SELECT * FROM (SELECT * .. WHERE my_ip &lt;= ip_begin..) AS T ...&quot; query, at least on my box&#039;s configuration: approx. 5K queries/sec (single client).

The worst is, of course, the &quot;ip BETWEEN start AND end&quot; at a whopping 3 queries/sec, and that&#039;s with the optimizer claiming to use the start ip index.

My version using classb was able to crank out 2K queries/sec. Not bad, but less than half as fast as the SELECT(SELECT *) AS T.. one.</description>
		<content:encoded><![CDATA[<p>@shlomi,</p>
<p>D'oh! Yes, you're completely right. I wasn't reading the SQL statement correctly, and there was a typo in it (using &gt;= instead of &lt;=) in my test cases. (Note to self: do not comment until after coffee has been had.)</p>
<p>I did do some benchmarking on some stuff here.</p>
<p>The winner is definitely the "SELECT * FROM (SELECT * .. WHERE my_ip &lt;= ip_begin..) AS T ..." query, at least on my box's configuration: approx. 5K queries/sec (single client).</p>
<p>The worst is, of course, the "ip BETWEEN start AND end" at a whopping 3 queries/sec, and that's with the optimizer claiming to use the start ip index.</p>
<p>My version using classb was able to crank out 2K queries/sec. Not bad, but less than half as fast as the SELECT(SELECT *) AS T.. one.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: shlomi</title>
		<link>http://code.openark.org/blog/mysql/sql-finding-a-users-countryregion-based-on-ip/comment-page-1#comment-2109</link>
		<dc:creator>shlomi</dc:creator>
		<pubDate>Wed, 27 May 2009 19:29:45 +0000</pubDate>
		<guid isPermaLink="false">http://code.openark.org/blog/?p=705#comment-2109</guid>
		<description>@Michael,

I disagree. The &#039;229512&#039; rows reported for the first version&#039;s subquery do not really hold. That&#039;s what&#039;s peculiar about EXPLAIN: it does not (maybe cannot?) take the LIMIT into consideration. I assure you (by testing it and seeing the performance, that is) that the number reported is not actually the number of rows to be scanned in this plan. The real number is 1.

Please notice that this is regardless of the subquery: see the EXPLAIN plan discussion within my post.

The second step involved is really nothing, since it only needs to scan a table of a single row. So no real impact on memory or CPU here.

Whereas in the second version you&#039;ve provided, and as I explained in my post, 229512 may actually be a reasonable estimate of the number of rows to scan.

Regards</description>
		<content:encoded><![CDATA[<p>@Michael,</p>
<p>I disagree. The '229512' rows reported for the first version's subquery do not really hold. That's what's peculiar about EXPLAIN: it does not (maybe cannot?) take the LIMIT into consideration. I assure you (by testing it and seeing the performance, that is) that the number reported is not actually the number of rows to be scanned in this plan. The real number is 1.</p>
<p>Please notice that this is regardless of the subquery: see the EXPLAIN plan discussion within my post.</p>
<p>The second step involved is really nothing, since it only needs to scan a table of a single row. So no real impact on memory or CPU here.</p>
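<p>A rough way to picture the single-row claim above: with an index on the range start, "the greatest start that is not above my_ip" is one ordered lookup, after which only that one candidate's end needs checking. The Python sketch below uses a sorted list and <code>bisect</code> as a stand-in for the index. It is an analogy only, not MySQL's actual execution, and the data and the <code>lookup</code> helper are made up for illustration.</p>

```python
from bisect import bisect_right

# Hypothetical, non-overlapping ranges sorted by start.
# The sorted STARTS list stands in for an index on the range start.
STARTS = [100, 300, 500]
ENDS = [199, 399, 599]


def lookup(my_ip):
    """Return the position of the range containing my_ip, or None."""
    # One ordered probe: position of the greatest start not above my_ip.
    pos = bisect_right(STARTS, my_ip) - 1
    if pos < 0:
        return None  # my_ip is below every range start
    # Only this single candidate needs its end checked.
    return pos if my_ip <= ENDS[pos] else None


print(lookup(350))  # falls inside the second range
print(lookup(250))  # falls in the gap between ranges
```

<p>The point of the sketch is that the cost does not depend on how many ranges start below the address, which is consistent with the measured performance rather than with the EXPLAIN estimate.</p>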
<p>Whereas in the second version you've provided, and as I explained in my post, 229512 may actually be a reasonable estimate of the number of rows to scan.</p>
<p>Regards</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Michael</title>
		<link>http://code.openark.org/blog/mysql/sql-finding-a-users-countryregion-based-on-ip/comment-page-1#comment-2108</link>
		<dc:creator>Michael</dc:creator>
		<pubDate>Wed, 27 May 2009 17:15:12 +0000</pubDate>
		<guid isPermaLink="false">http://code.openark.org/blog/?p=705#comment-2108</guid>
		<description>@jason

Unfortunately, that expression is worse. The sub-select will grab ALL rows above the starting IP, create a derived table, and then scan through that derived table, which *does not* have an index. The lower the IP number is, the more rows will have been dumped into the derived table.

Under MySQL, derived tables do not inherit any indexes from the original table, or tables, they were created from. It would be awesome if they did, but they don&#039;t.

The MySQL optimizer ain&#039;t too happy about it:

explain SELECT * FROM (select * from geo_ip_blocks WHERE 412306898 &gt;= ip_begin ORDER BY ip_begin limit 1) AS T where 412306898 &lt;= ip_end\G


*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: NULL
         type: NULL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: Impossible WHERE noticed after reading const tables
*************************** 2. row ***************************
           id: 2
  select_type: DERIVED
        table: geo_ip_blocks
         type: range
possible_keys: ip_begin,index_on_ip_begin,classb_begin
          key: ip_begin
      key_len: 4
          ref: NULL
         rows: 229512
        Extra: Using where

Now look at the following SQL which is what we all naturally want to try first when attacking this problem:

explain SELECT * FROM geo_ip_blocks WHERE 412306898 BETWEEN  ip_begin AND  ip_end\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: geo_ip_blocks
         type: range
possible_keys: ip_begin,ip_end,index_on_ip_begin,classb_begin
          key: ip_begin
      key_len: 4
          ref: NULL
         rows: 229512
        Extra: Using where

In both cases the estimated number of rows to scan is the same: 229512. The first SQL statement is going to be worse since there&#039;s a second step involved with having to create a derived table.

The second SQL only has one action involved, which is exactly the same as the one in the first SQL.</description>
		<content:encoded><![CDATA[<p>@jason</p>
<p>Unfortunately, that expression is worse. The sub-select will grab ALL rows above the starting IP, create a derived table, and then scan through that derived table, which *does not* have an index. The lower the IP number is, the more rows will have been dumped into the derived table.</p>
<p>Under MySQL, derived tables do not inherit any indexes from the original table, or tables, they were created from. It would be awesome if they did, but they don't.</p>
<p>The MySQL optimizer ain't too happy about it:</p>
<p>explain SELECT * FROM (select * from geo_ip_blocks WHERE 412306898 &gt;= ip_begin ORDER BY ip_begin limit 1) AS T where 412306898 &lt;= ip_end\G</p>
<p>*************************** 1. row ***************************<br />
           id: 1<br />
  select_type: PRIMARY<br />
        table: NULL<br />
         type: NULL<br />
possible_keys: NULL<br />
          key: NULL<br />
      key_len: NULL<br />
          ref: NULL<br />
         rows: NULL<br />
        Extra: Impossible WHERE noticed after reading const tables<br />
*************************** 2. row ***************************<br />
           id: 2<br />
  select_type: DERIVED<br />
        table: geo_ip_blocks<br />
         type: range<br />
possible_keys: ip_begin,index_on_ip_begin,classb_begin<br />
          key: ip_begin<br />
      key_len: 4<br />
          ref: NULL<br />
         rows: 229512<br />
        Extra: Using where</p>
<p>Now look at the following SQL which is what we all naturally want to try first when attacking this problem:</p>
<p>explain SELECT * FROM geo_ip_blocks WHERE 412306898 BETWEEN  ip_begin AND  ip_end\G<br />
*************************** 1. row ***************************<br />
           id: 1<br />
  select_type: SIMPLE<br />
        table: geo_ip_blocks<br />
         type: range<br />
possible_keys: ip_begin,ip_end,index_on_ip_begin,classb_begin<br />
          key: ip_begin<br />
      key_len: 4<br />
          ref: NULL<br />
         rows: 229512<br />
        Extra: Using where</p>
<p>In both cases the estimated number of rows to scan is the same: 229512. The first SQL statement is going to be worse since there's a second step involved with having to create a derived table.</p>
<p>The second SQL only has one action involved, which is exactly the same as the one in the first SQL.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: shlomi</title>
		<link>http://code.openark.org/blog/mysql/sql-finding-a-users-countryregion-based-on-ip/comment-page-1#comment-2101</link>
		<dc:creator>shlomi</dc:creator>
		<pubDate>Wed, 27 May 2009 08:40:00 +0000</pubDate>
		<guid isPermaLink="false">http://code.openark.org/blog/?p=705#comment-2101</guid>
		<description>@Jason,

This seems right to me, and was what I meant... :)</description>
		<content:encoded><![CDATA[<p>@Jason,</p>
<p>This seems right to me, and was what I meant... <img src='http://code.openark.org/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jason Stubbs</title>
		<link>http://code.openark.org/blog/mysql/sql-finding-a-users-countryregion-based-on-ip/comment-page-1#comment-2100</link>
		<dc:creator>Jason Stubbs</dc:creator>
		<pubDate>Wed, 27 May 2009 06:52:10 +0000</pubDate>
		<guid isPermaLink="false">http://code.openark.org/blog/?p=705#comment-2100</guid>
		<description>I didn&#039;t consider the case of a no-match when I attacked this problem myself way back when. Although the SQL starts to get a little ugly, using a sub-query should fix the performance hit there too.

SELECT * FROM (
    SELECT * FROM regions_ip_range
    WHERE my_ip &gt;= start_ip
    ORDER BY start_ip DESC LIMIT 1
) AS t
WHERE my_ip &lt;= end_ip;

I haven&#039;t tested this performance-wise but it seems right theoretically... Either way, you&#039;re spot on about both of your &quot;wrong ways&quot;. :)</description>
		<content:encoded><![CDATA[<p>I didn't consider the case of a no-match when I attacked this problem myself way back when. Although the SQL starts to get a little ugly, using a sub-query should fix the performance hit there too.</p>
<p>SELECT * FROM (<br />
    SELECT * FROM regions_ip_range<br />
    WHERE my_ip &gt;= start_ip<br />
    ORDER BY start_ip DESC LIMIT 1<br />
) AS t<br />
WHERE my_ip &lt;= end_ip;</p>
<p>I haven't tested this performance-wise but it seems right theoretically... Either way, you're spot on about both of your "wrong ways". <img src='http://code.openark.org/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
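<p>For anyone who wants to try the sub-query idea quickly, here is a minimal sketch using Python's built-in <code>sqlite3</code> module instead of MySQL, so it says nothing about MySQL performance. The sample rows, the <code>region</code> column and the <code>find_region</code> helper are assumptions made up for the illustration. Note that the sketch orders by <code>start_ip DESC</code>, so that <code>LIMIT 1</code> picks the closest range start at or below the address.</p>

```python
import sqlite3

# Hypothetical sample data: (start_ip, end_ip, region), with gaps between ranges.
ROWS = [
    (100, 199, "region-a"),
    (300, 399, "region-b"),
    (500, 599, "region-c"),
]

# Inner query: the single closest range starting at or below the address.
# Outer query: keep it only if the address is also within its end.
QUERY = """
SELECT * FROM (
    SELECT * FROM regions_ip_range
    WHERE ? >= start_ip
    ORDER BY start_ip DESC LIMIT 1
) AS t
WHERE ? <= end_ip
"""


def make_db():
    """Create an in-memory table holding the sample ranges."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE regions_ip_range ("
        "start_ip INTEGER PRIMARY KEY, end_ip INTEGER, region TEXT)"
    )
    conn.executemany("INSERT INTO regions_ip_range VALUES (?, ?, ?)", ROWS)
    return conn


def find_region(conn, my_ip):
    """Return the region containing my_ip, or None when there is no match."""
    row = conn.execute(QUERY, (my_ip, my_ip)).fetchone()
    return row[2] if row else None


conn = make_db()
print(find_region(conn, 350))  # address inside a range
print(find_region(conn, 250))  # address in a gap: the no-match case
```

<p>The no-match case is handled by the outer <code>WHERE</code>: the inner query still returns one candidate row, and the outer filter discards it when the address lies beyond that row's end.</p>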
]]></content:encoded>
	</item>
	<item>
		<title>By: shlomi</title>
		<link>http://code.openark.org/blog/mysql/sql-finding-a-users-countryregion-based-on-ip/comment-page-1#comment-2099</link>
		<dc:creator>shlomi</dc:creator>
		<pubDate>Wed, 27 May 2009 06:23:08 +0000</pubDate>
		<guid isPermaLink="false">http://code.openark.org/blog/?p=705#comment-2099</guid>
		<description>@Michael,
very nice!
Out of curiosity: is there no way a range can span more than one class B?</description>
		<content:encoded><![CDATA[<p>@Michael,<br />
very nice!<br />
Out of curiosity: is there no way a range can span more than one class B?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: shlomi</title>
		<link>http://code.openark.org/blog/mysql/sql-finding-a-users-countryregion-based-on-ip/comment-page-1#comment-2098</link>
		<dc:creator>shlomi</dc:creator>
		<pubDate>Wed, 27 May 2009 06:20:39 +0000</pubDate>
		<guid isPermaLink="false">http://code.openark.org/blog/?p=705#comment-2098</guid>
		<description>@PB - #9
suspiciously identical comment to Jason #3??
Anyway, see my comment, #4</description>
		<content:encoded><![CDATA[<p>@PB - #9<br />
suspiciously identical comment to Jason #3??<br />
Anyway, see my comment, #4</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Michael</title>
		<link>http://code.openark.org/blog/mysql/sql-finding-a-users-countryregion-based-on-ip/comment-page-1#comment-2097</link>
		<dc:creator>Michael</dc:creator>
		<pubDate>Wed, 27 May 2009 06:14:44 +0000</pubDate>
		<guid isPermaLink="false">http://code.openark.org/blog/?p=705#comment-2097</guid>
		<description>FYI - the geo_ip_blocks table has 4M rows.</description>
		<content:encoded><![CDATA[<p>FYI - the geo_ip_blocks table has 4M rows.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

