MySQL monitoring: storing, not caching

I’ve followed with interest Baron’s Why don’t our new Nagios plugins use caching? and Sheeri’s Caching for Monitoring: Timing is Everything. I wish to present my take on this from mycheckpoint‘s point of view.

mycheckpoint works in a completely different way. On one hand, it doesn’t bother with caching. On the other hand, it doesn’t bother with re-reading data.

There are no staleness issues, the data is as consistent as it can get (you can never get a completely atomic read of everything in MySQL), and you can issue as many calculations as you want at the price of a single monitoring sample. As in Sheeri’s example, you can run Threads_connected/max_connections*100, mix status variables, system variables, meta-variables (e.g. Seconds_behind_master), user-created variables (e.g. number of purchases in your online shop), etc.

mycheckpoint‘s concept is to store data. And store it in relational format. That is, INSERT it into a table.

A sample run generates a single row, which lists all status, server, OS, user and meta variables. It’s a huge row, with hundreds of columns. Columns like threads_connected, max_connections, innodb_buffer_pool_size, seconds_behind_master, etc.

mycheckpoint hardly cares about these columns. It identifies them dynamically. Have you just upgraded to MySQL 5.5? Oh, there’s a new bunch of server and status variables? No problem, mycheckpoint will notice it doesn’t have the matching columns and will add them via ALTER TABLE. There you go, now we have a place to store them.
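The dynamic-columns idea can be sketched in a few lines. The following is a minimal illustration, not mycheckpoint’s actual code or schema: it uses SQLite for brevity (mycheckpoint itself targets MySQL), and the table and column names are assumptions for the example.

```python
import sqlite3

def store_sample(conn, sample):
    """INSERT a monitoring sample, first adding any columns we've never seen."""
    cur = conn.cursor()
    # PRAGMA table_info yields (cid, name, type, ...) per column.
    existing = {row[1] for row in cur.execute("PRAGMA table_info(status_variables)")}
    for name in sample:
        if name not in existing:
            # New variable (e.g. after a MySQL upgrade): grow the table.
            cur.execute(f"ALTER TABLE status_variables ADD COLUMN {name} REAL")
    cols = ", ".join(sample)
    placeholders = ", ".join("?" for _ in sample)
    cur.execute(f"INSERT INTO status_variables ({cols}) VALUES ({placeholders})",
                list(sample.values()))
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE status_variables (id INTEGER PRIMARY KEY)")
store_sample(conn, {"threads_connected": 12, "max_connections": 151})
# A later sample carries a variable we have never seen before:
store_sample(conn, {"threads_connected": 15, "max_connections": 151,
                    "innodb_buffer_pool_size": 134217728})
```

Older rows simply hold NULL in columns that were added later, so historical data is never rewritten.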

Running a formula like Threads_connected/max_connections*100 is as easy as issuing the following query:

SELECT Threads_connected/max_connections*100 FROM status_variables WHERE id = ...

Hmmm. This means I can run this formula on the most recent row I’ve just added. But wait, this also means I can run this formula on any row I’ve ever gathered.
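To make the point concrete, here is a self-contained sketch (again using SQLite and an invented two-column schema, not mycheckpoint’s real one) showing the identical SQL expression evaluated once on the newest sample and once, retroactively, on an older one:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE status_variables
                (id INTEGER PRIMARY KEY,
                 threads_connected REAL, max_connections REAL)""")
# Two samples taken at different times:
conn.executemany("INSERT INTO status_variables VALUES (?, ?, ?)",
                 [(1, 30, 150), (2, 75, 150)])

# The formula on the most recent sample:
latest = conn.execute("""SELECT threads_connected/max_connections*100
                         FROM status_variables
                         ORDER BY id DESC LIMIT 1""").fetchone()[0]
# ...and the very same formula on any row ever gathered:
oldest = conn.execute("""SELECT threads_connected/max_connections*100
                         FROM status_variables WHERE id = 1""").fetchone()[0]
print(latest, oldest)  # 50.0 20.0
```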

With mycheckpoint you can generate graphs retroactively using new formulas. The data is there, vanilla style. Any formula which can be calculated via SQL is good to go with. Plus, you get the benefit of cross-referencing in fun ways: against the timestamp at which the sample was taken (so, for example, ignore the spikes generated at this or that timeframe due to maintenance, and don’t alert me on those), or against system metrics like load average or CPU usage (show me the average Seconds_behind_master when load average is over 8, or the average load average when the slow query rate is over some threshold). You don’t do that all the time, but when you need it, you can get all the insight you ever wanted.
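A cross-referencing query of that kind reduces to a plain WHERE clause. The sketch below is hypothetical (column names like os_loadavg are assumptions, and SQLite stands in for MySQL): it computes the average replication lag restricted to samples taken under high load.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE status_variables
                (id INTEGER PRIMARY KEY, ts TEXT,
                 seconds_behind_master REAL, os_loadavg REAL)""")
conn.executemany("INSERT INTO status_variables VALUES (?, ?, ?, ?)", [
    (1, "2012-02-01 10:00", 0,  1.2),
    (2, "2012-02-01 10:05", 40, 9.7),   # high load, lagging
    (3, "2012-02-01 10:10", 60, 11.3),  # high load, lagging badly
    (4, "2012-02-01 10:15", 5,  2.0),
])
# Average replication lag, considering only high-load samples:
row = conn.execute("""SELECT AVG(seconds_behind_master)
                      FROM status_variables
                      WHERE os_loadavg > 8""").fetchone()
print(row[0])  # 50.0
```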

Actually storing the monitored data in an easy-to-access format allows one to query, re-query and re-formulate. No worries about caching: you only sample once.

For completeness, all the above is relevant when the data is of numeric types. Other types are far more complicated to manage (the list of running queries is a common example).

Peter Laursen

And in this respect MONyog and mycheckpoint seem to use the same concept as regards storing versus caching. However the schemas are different and the ‘division of labor’ between SQL and javascript is also not exactly the same. Refer: http://www.webyog.com/blog/2009/07/29/monyog-database-schema-explained/ .. (slightly abbreviated) 1) Every time MONyog sends a statement like SHOW VARIABLES/STATUS of some kind (or fetches an OS metric from the Linux /proc folder) one row is INSERTED into [timestamp_master] table with information about current time and the metrics retrieved for each such statement will be INSERTED into [snapshot_master]. The [snapshot_master] table contains the metrics details. 2) And actually we do… Read more »

Sheeri Cabral

It sounds like mycheckpoint has a very wide table – one column for each status variable? There are over 300 of those….and another 300+ system variables. I think it makes sense that 3 tools that do different things approach the problem in 3 different ways: – The Percona Nagios plugin set is not concerned with caching, because it does not reuse any of the information it retrieves, it only uses the information once. – The PalominoDB Nagios plugin caches because the same plugin can be used over and over to check different calculations of variables, and is designed so that… Read more »

Jonathan Levin

I use mycheckpoint and it’s brilliant.
I set it up to send me emails with reports about the servers I monitor instead of me logging into trending tools (I’m lazy that way).

I have already been using it with the user_statistics patch to see trends in some of my top tables.
I guess I could also monitor mycheckpoint data itself to get more info from it, should I choose to.

Sheeri Cabral

Shlomi – ah, that’s good! One day I’ll get around to implementing your tools in my environment (now that I HAVE one that I’m in charge of at Mozilla! Consulting is great but it means you never really own anything and research isn’t always easy to do)….for now it’s back to basics for me, I’m still figuring out which machines I’m responsible for, and it’s not a small amount!

Rick James

300 columns today, 305 tomorrow when you upgrade MySQL. Are you ready to generate the 5 ALTER TABLE ADD COLUMN clauses? A couple of years ago, I looked at that and said yuck. I looked at key-value stores, but that performs terribly at scale. Instead, I stuffed the values into a JSON structure in a single column. Then the scripts can either do a json_decode, or a fairly simple regexp to reach into the JSON. So, currently I provide the expression, the host(s) desired, timeframe, etc., and PHP does all the grunt work. It performs quite… Read more »