I've followed with interest Baron's Why don’t our new Nagios plugins use caching? and Sheeri's Caching for Monitoring: Timing is Everything. I'd like to present my take on this, from mycheckpoint's point of view.
So mycheckpoint works in a completely different way. On one hand, it doesn't bother with caching. On the other hand, it doesn't bother with re-reads of data.
There are no staleness issues, the data is as consistent as it can get (you can never get a completely atomic read of everything in MySQL), and you can issue as many calculations as you want at the price of a single monitoring sample. As in Sheeri's example, you can run Threads_connected/max_connections*100, mix status variables, system variables, meta-variables (e.g. Seconds_behind_master), user-created variables (e.g. number of purchases in your online shop) etc.
mycheckpoint's concept is to store data. And store it in relational format. That is, INSERT it into a table.
A sample run generates a row, which lists all status, server, OS, user and meta variables. It's a huge row, with hundreds of columns. Columns like threads_connected, max_connections, innodb_buffer_pool_size, seconds_behind_master, etc.
mycheckpoint hardly cares about these columns. It identifies them dynamically. Have you just upgraded to MySQL 5.5? Oh, there's a bunch of new server and status variables? No problem, mycheckpoint will notice it doesn't have the matching columns and will add them via ALTER TABLE. There you go, now we have a place to store them.
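To make the idea concrete, here's a toy sketch of that grow-the-table-on-demand behavior, in Python with sqlite3. The table name matches the one used later in this post, but the function and column handling are my own illustration, not mycheckpoint's actual code:

```python
import sqlite3

def store_sample(conn, sample):
    """INSERT a monitoring sample as one row, adding any columns we've never seen."""
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS status_variables (id INTEGER PRIMARY KEY)")
    # Which columns does the table already have?
    existing = {row[1] for row in cur.execute("PRAGMA table_info(status_variables)")}
    for name in sample:
        if name not in existing:
            # A new variable showed up (say, after a MySQL upgrade): grow the table.
            cur.execute(f"ALTER TABLE status_variables ADD COLUMN {name} NUMERIC")
    cols = ", ".join(sample)
    marks = ", ".join("?" for _ in sample)
    cur.execute(f"INSERT INTO status_variables ({cols}) VALUES ({marks})",
                list(sample.values()))
    conn.commit()

conn = sqlite3.connect(":memory:")
store_sample(conn, {"threads_connected": 12, "max_connections": 150})
# A later sample brings a variable the table has never seen:
store_sample(conn, {"threads_connected": 30, "max_connections": 150,
                    "seconds_behind_master": 4})
```

Earlier rows simply hold NULL in columns that were added after they were written.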
Running a formula like Threads_connected/max_connections*100 is as easy as issuing the following query:
SELECT Threads_connected/max_connections*100 FROM status_variables WHERE id = ...
Hmmm. This means I can run this formula on the most recent row I've just added. But wait, this also means I can run this formula on any row I've ever gathered.
With mycheckpoint you can generate graphs retroactively using new formulas. The data is there, vanilla style. Any formula which can be calculated via SQL is good to go. Plus, you get the benefit of cross-referencing in fun ways: cross-reference to the timestamp at which the sample was taken (so, for example, ignore the spikes generated at this or that timeframe due to maintenance; don't alert me on these), or to system metrics like load average or CPU usage (show me the average Seconds_behind_master when load average is over 8, or the average load average when the slow query rate is over some threshold). You don't do that all the time, but when you need it, well, you can get all the insight you ever wanted.
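Here's a small sketch of both tricks, retroactive formulas and cross-referencing, again in Python with sqlite3. The schema, timestamps and numbers are made up for illustration; mycheckpoint itself runs against MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE status_variables (
    id INTEGER PRIMARY KEY,
    ts TEXT,
    threads_connected NUMERIC,
    max_connections NUMERIC,
    seconds_behind_master NUMERIC,
    load_average NUMERIC)""")
conn.executemany(
    "INSERT INTO status_variables "
    "(ts, threads_connected, max_connections, seconds_behind_master, load_average) "
    "VALUES (?, ?, ?, ?, ?)",
    [("2012-01-01 10:00", 20, 100, 0, 1.5),
     ("2012-01-01 10:05", 85, 100, 30, 9.2),
     ("2012-01-01 10:10", 40, 100, 10, 8.5)])

# A formula invented long after the samples were taken still works on every row:
pct = conn.execute(
    "SELECT ts, threads_connected * 100.0 / max_connections "
    "FROM status_variables ORDER BY id").fetchall()

# Cross-reference: average replication lag on samples where load average exceeded 8.
lag = conn.execute(
    "SELECT AVG(seconds_behind_master) FROM status_variables "
    "WHERE load_average > 8").fetchone()[0]
```

Because every sample is an ordinary row, any new question is just another SELECT over history you already have.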
Actually storing the monitored data in an easy-to-access format allows one to query, re-query, re-formulate. No worries about caching: you only sample once.
For completeness, all of the above applies when the data is numeric. Other types are far more complicated to manage (the list of running queries is a common example).