Generating Google line charts with SQL, part II

This post continues Generating Google line charts with SQL, part I, in pursue of generating time series based image charts.

We ended last post with the following chart:


http://chart.apis.google.com/chart?cht=lc&chs=400x200&chtt=SQL%20chart&chxt=x,y&chxr=1,-4716.6,5340.0&chd=s:dddddddddeeeeeefffffffffeeeedddcccbbaaZZZYYYXXXXXXXXXYYYZZabbcdeefghhijkkllmmmmmmmmllkkjihgfedcbZYXWVUTSRRQQPPPPQQQRSTUVWXZacdfgijlmnpqrssttuuuttssrqonmkigfdbZXVTSQONMLKJIIIIIIJKLMOPRTVXZbegilnprtvwyz01111110zyxvtrpnkifcaXUSPNLJHFECBBAAABBCEFHJLNQTWZcfilortwy1346789999876420yvspmjfcYVSOL

which has a nice curve, and a proper y-legend, but incorrect x-legend and no ticks nor grids.

To date, Google Image Charts do not support time-series charts. We can’t just throw timestamp values and expect the chart to properly position them. We need to work these by hand.

This is not easily done; if our input consists of evenly spread timestamp values, we are in a reasonable position. If not, what do we do?

There are several solutions to this:

  • We can present whatever points we have on the chart, making sure to position them properly. This makes for an uneven distribution of ticks on the x-axis, and is not pleasant to watch.
  • We can extrapolate values for round hours (or otherwise round timestamp resolutions), and so show evenly spread timestamps. I don’t like this solution one bit, since we’re essentially inventing values here. Extrapolation is nice when you know you have nice curves, but not when you’re doing database monitoring, for example. You must have the precise values.
  • We can do oversampling, then group together several measurements within round timestamp resolutions. For example, we can make a measurement every 2 minutes, yet present only 6 measurements per hour, each averaging up 10 round minutes. This is the approach I take with mycheckpoint.

The latest approach goes even beyond that: what if we missed 30 minutes of sampling? Say the server was down. We then need to “invent” the missing timestamps. Note that we invent the timestamps, we do not invent values. We must present the chart with missing values on our invented timestamps.

I may show how to do this in a future post. Meanwhile, let’s simplify and assume our values are evenly spread. Continue reading » “Generating Google line charts with SQL, part II”

Generating Google line charts with SQL, part I

In this series of posts I wish to show how Google Charts can be generated via SQL. We discuss the Google Charts limitations which must be challenged, and work towards a simple chart.

I’m going to present the algorithm I use in mycheckpoint, a MySQL monitoring utility, which generates Google charts by raw data using views. An example of such chart follows:


http://chart.apis.google.com/chart?cht=lc&chs=370x180&chts=303030,12&chtt=Latest+24+hours:+Nov+9,+05:50++-++Nov+10,+05:50&chf=c,s,ffffff&chdl=Rentals+rate:+custom_1_psec&chdlp=b&chco=ff8c00&chd=s:GDGKGFLFGMJHRLMPPNULJRPLTOPRUMYPPVRNbQUSUSbSNWUOfSWTObVSUVWSVYVPbTPjfTbRTdXReUWhcTQRQZbTWYVYPaVZXdYYWPTabYUTbW99QLgLNIOIRNNMIKRJEHGFHGJGGFIFDFGDK&chxt=x,y&chxr=1,0,8.720000&chxl=0:|+||08:00||+||12:00||+||16:00||+||20:00||+||00:00||+||04:00||&chxs=0,505050,10,0,lt&chg=4.17,25,1,2,0.69,0&chxp=0,0.69,4.86,9.03,13.20,17.37,21.54,25.71,29.88,34.05,38.22,42.39,46.56,50.73,54.90,59.07,63.24,67.41,71.58,75.75,79.92,84.09,88.26,92.43,96.60&tsstart=2010-11-09+05:50:00&tsstep=600

mycheckpoint does not actually call on Google to do the chart rendering, but invokes its own JavaScript code to visualize the URL locally.

Here are some downsides for using Google charts:

  • The URL cannot be as long as you like. 2048 characters is an upper bound you’ll want to keep behind. [Google charts POST method calls are available, which leads to 16K equivalent of URL length — this is still not too helpful due to the nature of POST calls]
  • Features are inconsistent. To specify label or tick positions, one must specify exact positions. To specify grid positions, one must supply with step, offset, etc. There are more such inconsistencies.
  • Google charts are not too friendly. Taking the ticks and grids example from above, there really shouldn’t be a reason why grids would not be automatically generated according to ticks definitions. But we are required to specify positions for the ticks as well as for the grids.
  • There is no support for time-series. One must translate time as x-axis values.
  • Perhaps most intimidating to many people: to generate a Google chart, once must send data to Google. Which is the main reason I used local JavaScript rendering.

Anyway, let’s build a very simple chart. Since I will not cover everything in this post, we make for some relaxed conditions. Continue reading » “Generating Google line charts with SQL, part I”

Multi condition UPDATE query

A simple question I’ve been asked:

Is it possible to merge two UPDATE queries, each on different WHERE conditions, into a single query?

For example, is it possible to merge the following two UPDATE statements into one?

mysql> UPDATE film SET rental_duration=rental_duration+1 WHERE rating = 'G';
Query OK, 178 rows affected (0.01 sec)

mysql> UPDATE film SET rental_rate=rental_rate-0.5 WHERE length < 90;
Query OK, 320 rows affected (0.01 sec)

To verify our tests, we take a checksum:

mysql> pager md5sum
PAGER set to 'md5sum'
mysql> SELECT film_id, title, rental_duration, rental_rate FROM film ORDER BY film_id;
c2d253c3919efaa6d11487b1fd5061f3  -

Obviously, the following query is incorrect: Continue reading » “Multi condition UPDATE query”

Another use for “top N records per group” query

A few days ago I published SQL: selecting top N records per group. It just dawned on me that the very same query solved another type of problem I was having a couple years ago.

BTW, for reference, Baron Schwartz posted this 4 years ago. There are very interesting approaches in text and in comments. Good to see the ancients already knew of such problems, I should study my history better.

(Kidding, kidding! This is self criticism of course)

In this case I present now I have a table with recurring data, which, to some extent, represents revision of data. For the sake of simplicity let’s describe it as a simple simulation of a revision system:

  • I have text files, whose content I store within a row
  • Each text file uses at least one row in the table
  • Text files can be edited, whereas an edited file is written in a new row (never UPDATEd).

Hold your horses: I’m not really implementing a revision system, I just can’t get into the actual details here.

The table becomes large quickly, and it’s desired to purge rows from the table. The rule is: we must obtain the most recent two versions for each file (if there are indeed more than one). All others can be purged. So the question is: Continue reading » “Another use for “top N records per group” query”

SQL: selecting top N records per group

A while back I presented(*) an SQL trick to present with non-aggregated column on a GROUP BY query, without use of subquery or derived tables.

Based on a similar concept, combined with string walking, I now present a query which selects top-n records for each group, ordered by some condition. It will require no subqueries. It executes faster than its more conventional alternatives.

[UPDATE: this is MySQL only. Others can use Window Functions where available]

Using the simple world database, we answer the following question:

What are the top 5 largest (by area) countries for each continent? What are their names, surface area and population?

Similar questions would be:

What were the latest 5 films rented by each customer?

What were the most presented advertisements for each user?

etc.

Step 1: getting the top

We already know how to get a single column’s value for the top country, as presented in the aforementioned post: Continue reading » “SQL: selecting top N records per group”

SQL trick: overcoming GROUP_CONCAT limitation in special cases

In Verifying GROUP_CONCAT limit without using variables, I have presented a test to verify if group_concat_max_len is sufficient for known limitations. I will follow the path where I assume I cannot control group_concat_max_len, not even in session scope, and show an SQL solution, dirty as it is, to overcome the GROUP_CONCAT limitation, under certain conditions.

Sheeri rightfully asks why I wouldn’t just set group_concat_max_len in session scope. The particular case I have is that I’m providing a VIEW definition. I’d like users to “install” that view, i.e. to CREATE it on their database. The VIEW does some logic, and uses GROUP_CONCAT to implement that logic.

Now, I have no control on the DBA or developer who created the view. The creation of the view has nothing to do with the group_concat_max_len setting on her database instance.

An example

OK, apologies aside. Using the sakila database, I execute:

mysql> SELECT GROUP_CONCAT(last_name) FROM actor \G
*************************** 1. row ***************************
GROUP_CONCAT(last_name): AKROYD,AKROYD,AKROYD,ALLEN,ALLEN,ALLEN,ASTAIRE,BACALL,BAILEY,BAILEY,BALE,BALL,BARRYMORE,BASINGER,BENING,BENING,BERGEN,BERGMAN,BERRY,BERRY,BERRY,BIRCH,BLOOM,BOLGER,BOLGER,BRIDGES,BRODY,BRODY,BULLOCK,CAGE,CAGE,CARREY,CHAPLIN,CHASE,CHASE,CLOSE,COSTNER,CRAWFORD,CRAWFORD,CRONYN,CRONYN,CROWE,CRUISE,CRUZ,DAMON,DAVIS,DAVIS,DAVIS,DAY-LEWIS,DEAN,DEAN,DEE,DEE,DEGENERES,DEGENERES,DEGENERES,DENCH,DENCH,DEPP,DEPP,DERN,DREYFUSS,DUKAKIS,DUKAKIS,DUNST,FAWCETT,FAWCETT,GABLE,GARLAND,GARLAND,GARLAND,GIBSON,GOLDBERG,GOODING,GOODING,GRANT,GUINESS,GUINESS,GUINESS,HACKMAN,HACKMAN,HARRIS,HARRIS,HARRIS,HAWKE,HESTON,HOFFMAN,HOFFMAN,HOFFMAN,HOPE,HOPKINS,HOPKINS,HOPKINS,HOPPER,HOPPER,HUDSON,HUNT,HURT,JACKMAN,JACKMAN,JOHANSSON,JOHANSSON,JOHANSSON,JOLIE,JOVOVICH,KEITEL,KEITEL,KEITEL,KILMER,KILMER,KILMER,KILMER,KILMER,LEIGH,LOLLOBRIGIDA,MALDEN,MANSFIELD,MARX,MCCONAUGHEY,MCCONAUGHEY,MCDORMAND,MCKELLEN,MCKELLEN,MCQUEEN,MCQUEEN,MIRANDA,MONROE,MONROE,MOSTEL,MOSTEL,NEESON,NEESON,NICHOLSON,NOLTE,NOLTE,NOLTE,NOLTE,OLIVIER,OLIVIER,PALTROW,PALTROW,P
1 row in set, 1 warning (0.00 sec)

mysql> SHOW WARNINGS;
+---------+------+--------------------------------------+
| Level   | Code | Message                              |
+---------+------+--------------------------------------+
| Warning | 1260 | 1 line(s) were cut by GROUP_CONCAT() |
+---------+------+--------------------------------------+
1 row in set (0.00 sec)

Continue reading » “SQL trick: overcoming GROUP_CONCAT limitation in special cases”

Implicit casting you don’t want to see around

In Beware of implicit casting, I have outlined the dangers of implicit casting. Here’s a few more real-world examples I have tackled:

Number-String comparisons

Much like in programming languages, implicit casting is made to numbers when at least one of the arguments is a number. Thus:

mysql> SELECT 3 = '3.0';
+-----------+
| 3 = '3.0' |
+-----------+
|         1 |
+-----------+
1 row in set (0.00 sec)

mysql> SELECT '3' = '3.0';
+-------------+
| '3' = '3.0' |
+-------------+
|           0 |
+-------------+

The second query consists of pure strings comparison. It has no way to determine that number comparison should be made.

Direct DATE arithmetics

The first query seems to work, but is completely incorrect. The second explains why. The third is a total mess. Continue reading » “Implicit casting you don’t want to see around”

SQL: good comments conventions

I happened upon a customer who left me in awe and admiration. The reason: excellent comments for their SQL code.

I list four major places where SQL comments are helpful. I’ll use the sakila database. It is originally scarcely commented; I’ll present it now enhanced with comments, to illustrate.

Table definitions

The CREATE TABLE statement allows for a comment, intended to describe the nature of the table:

CREATE TABLE `film_text` (
 `film_id` smallint(6) NOT NULL,
 `title` varchar(255) NOT NULL,
 `description` text,
 PRIMARY KEY (`film_id`),
 FULLTEXT KEY `idx_title_description` (`title`,`description`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COMMENT='Reflection of `film`, used for FULLTEXT search.'

It’s too bad the comment’s max length is 60 characters, though. However, it’s a very powerful field.

Column definitions

One may comment particular columns: Continue reading » “SQL: good comments conventions”

SQL: forcing single row tables integrity

Single row tables are used in various cases. Such tables can be used for “preferences” or “settings”; for managing counters (e.g. summary tables), for general-purpose administration tasks (e.g. heartbeat table) etc.

The problem with single row tables is that, well, they must have s single row. And the question is: how can you force them to have just one row?

The half-baked solution

The common solution is to create a PRIMARY KEY and always use the same value for that key. In addition, using REPLACE or INSERT INTO ON DUPLICATE KEY UPDATE helps out in updating the row. For example:

CREATE TABLE heartbeat (
 id int NOT NULL PRIMARY KEY,
 ts datetime NOT NULL
 );

The above table definition is taken from mk-heartbeat. It should be noted that mk-heartbeat in itself does not require that the table has a single row, so it is not the target of this post. I’m taking the above table definition as a very simple example.

So, we assume we want this table to have a single row, for whatever reasons we have. We would usually do:

REPLACE INTO heartbeat (id, ts) VALUES (1, NOW());

or

INSERT INTO heartbeat (id, ts) VALUES (1, NOW()) ON DUPLICATE KEY UPDATE ts = NOW();

Why is the above a “half baked solution”? Because it is up to the application to make sure it reuses the same PRIMARY KEY value. There is nothing in the database to prevent the following: Continue reading » “SQL: forcing single row tables integrity”