Generating Google line charts with SQL, part II

This post continues Generating Google line charts with SQL, part I, in pursue of generating time series based image charts.

We ended last post with the following chart:


http://chart.apis.google.com/chart?cht=lc&chs=400x200&chtt=SQL%20chart&chxt=x,y&chxr=1,-4716.6,5340.0&chd=s:dddddddddeeeeeefffffffffeeeedddcccbbaaZZZYYYXXXXXXXXXYYYZZabbcdeefghhijkkllmmmmmmmmllkkjihgfedcbZYXWVUTSRRQQPPPPQQQRSTUVWXZacdfgijlmnpqrssttuuuttssrqonmkigfdbZXVTSQONMLKJIIIIIIJKLMOPRTVXZbegilnprtvwyz01111110zyxvtrpnkifcaXUSPNLJHFECBBAAABBCEFHJLNQTWZcfilortwy1346789999876420yvspmjfcYVSOL

which has a nice curve, and a proper y-legend, but incorrect x-legend and no ticks nor grids.

To date, Google Image Charts do not support time-series charts. We can’t just throw timestamp values and expect the chart to properly position them. We need to work these by hand.

This is not easily done; if our input consists of evenly spread timestamp values, we are in a reasonable position. If not, what do we do?

There are several solutions to this:

  • We can present whatever points we have on the chart, making sure to position them properly. This makes for an uneven distribution of ticks on the x-axis, and is not pleasant to watch.
  • We can extrapolate values for round hours (or otherwise round timestamp resolutions), and so show evenly spread timestamps. I don’t like this solution one bit, since we’re essentially inventing values here. Extrapolation is nice when you know you have nice curves, but not when you’re doing database monitoring, for example. You must have the precise values.
  • We can do oversampling, then group together several measurements within round timestamp resolutions. For example, we can make a measurement every 2 minutes, yet present only 6 measurements per hour, each averaging up 10 round minutes. This is the approach I take with mycheckpoint.

The latest approach goes even beyond that: what if we missed 30 minutes of sampling? Say the server was down. We then need to “invent” the missing timestamps. Note that we invent the timestamps, we do not invent values. We must present the chart with missing values on our invented timestamps.

I may show how to do this in a future post. Meanwhile, let’s simplify and assume our values are evenly spread.

Sample data

We use google_charts.sql. Note that the timestamp values provided in Part I of this post is skewed, so make sure to use this file.

x-axis values

We use chxl to present with x-axis values. We may be tempted to just list all values. Would that work?

Sadly, no, for two reasons:

  1. Google is not smart enough; whatever we throw at it, it will try to present. So, if we have 288 rows, that’s 288 x-axis values. Not enough room, to be sure! Smarter implementations would automatically hide some values, so as only to present with non-overlapping values.
  2. Our URL will turn out to be too long. Remember: 2048 characters is our maximum limit for GET request!

Also, we must format our timestamp to be of minimal width. In our example, we have a 24 hour range. We therefore present timestamps in hh:MM format. So, a naive approach would be to:

SELECT
  CONCAT(
    'http://chart.apis.google.com/chart?cht=lc&chs=400x200&chtt=SQL%20chart&chxt=x,y&chxr=1,',
    ROUND(min_value, 1), ',',
    ROUND(max_value, 1),
    '&chd=s:',
    GROUP_CONCAT(
      IF(
        data IS NULL,
        '_',
        SUBSTRING(
          'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789',
          1+61*(data - min_value)/(max_value - min_value),
          1
        )
      )
      SEPARATOR ''
    ),
   '&chxl=0:|',
    GROUP_CONCAT(
      DATE_FORMAT(ts, '%H:%i')
      SEPARATOR '|'
    )
  ) FROM chart_data, chart_data_minmax

The resulting URL is just too long.

Solution? Let’s only consider round hour timestamps! Our next attempt looks like this (we also throw in chxs, to show ticks):

SELECT
  CONCAT(
    'http://chart.apis.google.com/chart?cht=lc&chs=400x200&chtt=SQL%20chart&chxt=x,y&chxr=1,',
    ROUND(min_value, 1), ',',
    ROUND(max_value, 1),
    '&chd=s:',
    GROUP_CONCAT(
      IF(
        data IS NULL,
        '_',
        SUBSTRING(
          'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789',
          1+61*(data - min_value)/(max_value - min_value),
          1
        )
      )
      SEPARATOR ''
    ),
   '&chxs=0,505050,10,0,lt',
   '&chxl=0:|',
    GROUP_CONCAT(
      IF(
        MINUTE(ts) = 0,
        DATE_FORMAT(ts, '%H:%i'),
        NULL
      )
      SEPARATOR '|'
    )
  ) FROM chart_data, chart_data_minmax

and results with:


http://chart.apis.google.com/chart?cht=lc&chs=400x200&chtt=SQL%20chart&chxt=x,y&chxr=1,-4716.6,5340.0&chd=s:dddddddddeeeeeefffffffffeeeedddcccbbaaZZZYYYXXXXXXXXXYYYZZabbcdeefghhijkkllmmmmmmmmllkkjihgfedcbZYXWVUTSRRQQPPPPQQQRSTUVWXZacdfgijlmnpqrssttuuuttssrqonmkigfdbZXVTSQONMLKJIIIIIIJKLMOPRTVXZbegilnprtvwyz01111110zyxvtrpnkifcaXUSPNLJHFECBBAAABBCEFHJLNQTWZcfilortwy1346789999876420yvspmjfcYVSOL&chxs=0,505050,10,0,lt&chxl=0:|00:00|01:00|02:00|03:00|04:00|05:00|06:00|07:00|08:00|09:00|10:00|11:00|12:00|13:00|14:00|15:00|16:00|17:00|18:00|19:00|20:00|21:00|22:00|23:00

Too messy, isn’t it?

A word about ticks

You would think: OK, then, let’s just present every 4 round hours timestamps. But there’s a catch: a tick will show only when there’s an x-axis value. It’s nice to have a tick for every hour, but we only want to present values every 4 hours.

Fortunately, we can provide with an unseen value: a space (URL encoded as ‘+‘). So we complicate things up a bit on the chxl to read:

SELECT
  CONCAT(
    'http://chart.apis.google.com/chart?cht=lc&chs=400x200&chtt=SQL%20chart&chxt=x,y&chxr=1,',
    ROUND(min_value, 1), ',',
    ROUND(max_value, 1),
    '&chd=s:',
    GROUP_CONCAT(
      IF(
        data IS NULL,
        '_',
        SUBSTRING(
          'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789',
          1+61*(data - min_value)/(max_value - min_value),
          1
        )
      )
      SEPARATOR ''
    ),
   '&chxs=0,505050,10,0,lt',
   '&chxl=0:|',
    GROUP_CONCAT(
      IF(
        MINUTE(ts) = 0,
        IF(
          HOUR(ts) MOD 4 = 0,
          DATE_FORMAT(ts, '%H:%i'),
          '+'
        ),
        NULL
      )
      SEPARATOR '|'
    )
  ) FROM chart_data, chart_data_minmax

and get:


http://chart.apis.google.com/chart?cht=lc&chs=400x200&chtt=SQL%20chart&chxt=x,y&chxr=1,-4716.6,5340.0&chd=s:dddddddddeeeeeefffffffffeeeedddcccbbaaZZZYYYXXXXXXXXXYYYZZabbcdeefghhijkkllmmmmmmmmllkkjihgfedcbZYXWVUTSRRQQPPPPQQQRSTUVWXZacdfgijlmnpqrssttuuuttssrqonmkigfdbZXVTSQONMLKJIIIIIIJKLMOPRTVXZbegilnprtvwyz01111110zyxvtrpnkifcaXUSPNLJHFECBBAAABBCEFHJLNQTWZcfilortwy1346789999876420yvspmjfcYVSOL&chxs=0,505050,10,0,lt&chxl=0:|00:00|+|+|+|04:00|+|+|+|08:00|+|+|+|12:00|+|+|+|16:00|+|+|+|20:00|+|+|+

OK, I cheated

Who says sample data starts with a round hour? We have that hidden assumption here, since the first tick is necessarily a round hour in our code. Yet our data may start at 12:35, for example. Sorry, you’ll have to dig into mycheckpoint’s source code to see a thorough solution. It’s just too much for this post.

Grids

Let’s wrap this up with grids. Grids work by specifying the step size (in percent of overall height/width) and initial offset (again, in percent).

Wouldn’t it be nicer if grids were automatically attached to ticks? I mean, REALLY! What were those guys thinking? (I know, they’re doing great work. Keep it up!)

Problem is, I have no idea how Google chooses to distribute values on the y-axis. I don’t know where y-axis ticks will be placed. So on y-axis, I just choose to split charts to 4 even parts, and draw horizontal grids between them. Percent is 25 (100/4), offset is 0.

But I do have control over the x-axis. In our case, I know how many ticks we’ll be having. Plus, I made life easier by assuming we start with a round hour, so no offset is required.

Umm… How many ticks do we have? Easy: the number of round hours. This can be calculated by: SUM(MINUTE(ts) = 0. Actually, we need to take 1 off.

We now build the chg parameter:

SELECT
  CONCAT(
    'http://chart.apis.google.com/chart?cht=lc&chs=400x200&chtt=SQL%20chart&chxt=x,y&chxr=1,',
    ROUND(min_value, 1), ',',
    ROUND(max_value, 1),
    '&chd=s:',
    GROUP_CONCAT(
      IF(
        data IS NULL,
        '_',
        SUBSTRING(
          'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789',
          1+61*(data - min_value)/(max_value - min_value),
          1
        )
      )
      SEPARATOR ''
    ),
   '&chxs=0,505050,10,0,lt',
   '&chxl=0:|',
    GROUP_CONCAT(
      IF(
        MINUTE(ts) = 0,
        IF(
          HOUR(ts) MOD 4 = 0,
          DATE_FORMAT(ts, '%H:%i'),
          '+'
        ),
        NULL
      )
      SEPARATOR '|'
    ),
   '&chg=', ROUND(100.0/((SUM(MINUTE(ts) = 0) -1)), 2), ',25,1,2,0,0'
  ) FROM chart_data, chart_data_minmax

and get:


http://chart.apis.google.com/chart?cht=lc&chs=400x200&chtt=SQL%20chart&chxt=x,y&chxr=1,-4716.6,5340.0&chd=s:dddddddddeeeeeefffffffffeeeedddcccbbaaZZZYYYXXXXXXXXXYYYZZabbcdeefghhijkkllmmmmmmmmllkkjihgfedcbZYXWVUTSRRQQPPPPQQQRSTUVWXZacdfgijlmnpqrssttuuuttssrqonmkigfdbZXVTSQONMLKJIIIIIIJKLMOPRTVXZbegilnprtvwyz01111110zyxvtrpnkifcaXUSPNLJHFECBBAAABBCEFHJLNQTWZcfilortwy1346789999876420yvspmjfcYVSOL&chxs=0,505050,10,0,lt&chxl=0:|00:00|+|+|+|04:00|+|+|+|08:00|+|+|+|12:00|+|+|+|16:00|+|+|+|20:00|+|+|+&chg=4.35,25,1,2,0,0

Phew!

Conclusion

So we haven’t worked on offsets. And, this is a single line chart. What about multiple lines? Legend? The following chart:


is harder to achieve. I’m leaving this up to you!

2 thoughts on “Generating Google line charts with SQL, part II

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.