Generic, auto scaling, scaled SQL graphs

In Rotating SQL graphs horizontally, I have shown how to rotate an ASCII SQL graph into horizontal position.

I was dissatisfied with some parts of the solution, which I will show now how to fix:

  • I had to manually scale the graph values so as to fit nicely into screen.
  • I had to rely on hard coded scaling schemes.
  • I had to rely on hard coded column names.
  • I had no y-axis legend.

I will now present an SQL query which allows for pluggable queries, which creates self, auto scaling graphs, along with y-axis scales.

Using deeply nested subqueries, we will evolve a simple SELECT query into an elaborate graph. I will present the many steps required, followed by explanations and sample results. But in the end – the steps are unimportant. I’ll present a generic query, into which your own SELECT can be embedded, and which will provide you with the graph.

We’ll use the same example, found in graph.sql.

Continue reading » “Generic, auto scaling, scaled SQL graphs”

Rotating SQL graphs horizontally

We all love graphs. We all love SQL hacks. We all know the SQL hack which displays a character-based graph (example follows for those unfamiliar).

But we all love horizontal graphs, not vertical ones. We are used to the X axis being horizontal, Y being vertical. Not vice versa.

In this post I’ll present a SQL hack which rotates a vertical graph to horizontal. In fact, the technique shown will rotate any ‘textual image’; but graphs are a nice example.

A vertical graph example

What’s prettier than a sinus curve? I have prepared a simple table that will serve nicely, and can be found in graph_tables.sql.

Continue reading » “Rotating SQL graphs horizontally”

Unwalking a string with GROUP_CONCAT

“Walking a string” is an SQL technique to convert a single value into multiple rows result set. For example, walking the string ‘hello’ results with 5 rows, each of which contains a single character from the text.

I’ll present a brief example of walking a string, and then show how to “unwalk” the string: do the reverse operation.

To walk a string, an integers table is required (or this could be a good use for SeqEngine): Continue reading » “Unwalking a string with GROUP_CONCAT”

SQL: finding a user’s country/region based on IP

I’ve encountered the same problem twice for different customers, so I guess it’s worth a discussion.

A common task for web applications is to find out the country/region of a user, based on her IP address, as can be detected in the HTTP request. Depending on the country of origin, the website can translate dates for different time zones, can change locale settings, and, perhaps most commonly, show advertisements in her native language.

To start with, there’s a table which lists the IP ranges per country/region. Let’s assume we’re only dealing with IPv4:

CREATE TABLE regions_ip_range (
  regions_ip_range_id INT UNSIGNED AUTO_INCREMENT,
  country VARCHAR(64) CHARSET utf8,
  region VARCHAR(64) CHARSET utf8,
  start_ip INT UNSIGNED,
  end_ip INT UNSIGNED,
  …
  PRIMARY KEY(regions_ip_range_id),
  ...
);

The table is fixed, and is populated. Now the question arises: how do we query this table, and which indexes should be created?

The wrong way

The form I’ve encountered is as follows: an index is declared on regions_ip_range:

KEY ip_range_idx (start_ip, end_ip)

And the query goes like this:

SELECT * FROM regions_ip_range
WHERE my_ip BETWEEN start_ip AND end_ip

Continue reading » “SQL: finding a user’s country/region based on IP”

7 ways to convince MySQL to use the right index

Sometimes MySQL gets it wrong. It doesn’t use the right index.

It happens that MySQL generates a query plan which is really bad (EXPLAIN says it’s going to explore some 10,000,000 rows), when another plan (soon to show how was generated) says: “Sure, I can do that with 100 rows using a key”.

A true story

A customer had issues with his database. Queries were taking 15 minutes to complete, and the db in general was not responsive. Looking at the slow query log, I found the criminal query. Allow me to bring you up to speed:

A table is defined like this:

CREATE TABLE t (
  id INT UNSIGNED AUTO_INCREMENT,
  type INT UNSIGNED,
  level TINYINT unsigned,
  ...
  PRIMARY KEY(id),
  KEY `type` (type)
) ENGINE=InnoDB;

The offending query was this:

SELECT id FROM data
WHERE type=12345 AND level > 3
ORDER BY id

The facts were:

  • `t` has about 10,000,000 rows.
  • The index on `type` is selective: about 100 rows per value on average.
  • The query took a long time to complete.
  • EXPLAIN has shown that MySQL uses the PRIMARY KEY, hence searches 10,000,000 rows, filtered “using where”.
  • The other EXPLAIN has shown that by using the `type` key, only 110 rows are expected, to be filtered “using where”, then sorted “using filesort”

So MySQL acknowledged it was generating the wrong plan. The other plan was better by its own standards.

Solving the problem

Let’s walk through 7 ways to solve the problem, starting with the more aggressive solutions, refining to achieve desired behavior through subtle changes. Continue reading » “7 ways to convince MySQL to use the right index”

Using triggers to block malicious code: an example

Web applications face constant exploitation attempts. Those with a user base must keep their users’ private data, well… private.

While the MySQL security model allows restricting users access to databases, tables and even columns, it has no built in feature for restricting the rows access within the given table.

One cannot allow a user to only update rows 0 through 99, but restrict that user from updating rows 100 to 199. Such restrictions are usually managed in the application level, by adding a necessary “… AND filtering_column = some_value…”

Many web application have the notion of an ‘admin’ account, or several such accounts, which provide greater control over the application. The ‘admin’ account is one account to which many attacks are targeted. One such attack is an attempt to modify the admin’s password, such that the attacker can later log in with and access restricted data. Continue reading » “Using triggers to block malicious code: an example”

MySQL’s character sets and collations demystified

MySQL’s character sets and collations are often considered as a mystery, and many users either completely disregard them and keep with the defaults, or set everything to UTF8.

This post will attempt to shed some light on the mystery, and provide with some best practices for use with text columns with regard to character sets. Continue reading » “MySQL’s character sets and collations demystified”

Dynamic sequencing with a single query

It is a known trick to use a session variables for dynamically counting/sequencing rows. The way to go is to SET a variable to zero, then use arithmetic within assignment to increment its value for each row in the SELECTed rows.

For example, the following query lists the top 10 populated countries, using MySQL’s world database:

SELECT Code, Name, Population
FROM Country ORDER BY Population DESC LIMIT 10;

+------+--------------------+------------+
| Code | Name               | Population |
+------+--------------------+------------+
| CHN  | China              | 1277558000 |
| IND  | India              | 1013662000 |
| USA  | United States      |  278357000 |
| IDN  | Indonesia          |  212107000 |
| BRA  | Brazil             |  170115000 |
| PAK  | Pakistan           |  156483000 |
| RUS  | Russian Federation |  146934000 |
| BGD  | Bangladesh         |  129155000 |
| JPN  | Japan              |  126714000 |
| NGA  | Nigeria            |  111506000 |
+------+--------------------+------------+

The results do not provide any sequence number. Nor does the table have an AUTO_INCREMENT or otherwise unique row number. If I were to rate the countries by population, the common trick is:

SET @rank := 0;
SELECT
  @rank := @rank+1 AS rank,
  Code, Name, Population
FROM Country ORDER BY Population DESC LIMIT 10;

+------+------+--------------------+------------+
| rank | Code | Name               | Population |
+------+------+--------------------+------------+
|    1 | CHN  | China              | 1277558000 |
|    2 | IND  | India              | 1013662000 |
|    3 | USA  | United States      |  278357000 |
|    4 | IDN  | Indonesia          |  212107000 |
|    5 | BRA  | Brazil             |  170115000 |
|    6 | PAK  | Pakistan           |  156483000 |
|    7 | RUS  | Russian Federation |  146934000 |
|    8 | BGD  | Bangladesh         |  129155000 |
|    9 | JPN  | Japan              |  126714000 |
|   10 | NGA  | Nigeria            |  111506000 |
+------+------+--------------------+------------+

The first query sets the @rank to zero, so that it is not NULL (since no arithmetic can be done with NULL).  The second query relies on its success.

Can the same be achieved with one query only? That’s more of a problem. Continue reading » “Dynamic sequencing with a single query”

Selecting a specific non aggregated column data in GROUP BY

In a GROUP BY query, MySQL may allow specifying non aggregated columns. For example, using MySQL’s world database, I wish to get the number of countries per continent, along with a “sample” country:

SELECT Continent, COUNT(*), Name
FROM `Country` GROUP BY Continent

+---------------+----------+----------------+
| Continent     | COUNT(*) | Name           |
+---------------+----------+----------------+
| Asia          |       51 | Afghanistan    |
| Europe        |       46 | Albania        |
| North America |       37 | Aruba          |
| Africa        |       58 | Angola         |
| Oceania       |       28 | American Samoa |
| Antarctica    |        5 | Antarctica     |
| South America |       14 | Argentina      |
+---------------+----------+----------------+

What if I want to choose that “sample” country? For example, for each continent, I wish to show the country with the largest population. To simply see the largest population, I would use MAX(Population). But which country is referred? I wish to provide a solution which does not involve sub-queries, HAVING or JOINs. Continue reading » “Selecting a specific non aggregated column data in GROUP BY”

Useful database analysis queries with INFORMATION_SCHEMA

A set of useful queries on INFORMATION_SCHEMA follows. These queries can be used when approaching a new database, to learn about some of its properties, or they can be regularly used on an existing schema, so as to verify its integrity.

I will present queries for: