While working on mycheckpoint, I intend to add custom monitoring: that is, letting the user define things to monitor. I have my own thoughts, but I would be grateful for more input!
What would the user want to monitor?
Monitoring the number of SELECT statements per second, InnoDB locks, slave replication lag etc. is very important, and monitoring utilities provide this information. But what does that tell the end user? Not much.
The experienced DBA may gain a lot from it. The user, however, would be more interested in a completely different kind of information. In between, some information is relevant to both.
Say we were managing an on-line store. We want to monitor the health of the database. But the health of the database is inseparable from the health of the application. I mean, having little to no disk usage is fine, unless... something is wrong with the application, which leads to no new purchases.
And so a user would be interested in monitoring the number of purchases per hour, or the time passed since the last successful purchase. This kind of data can only be generated by a user-specific query. Looking at the charts, the user would then feel safer and more confident in the health of his store app.
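To make this concrete, here is a rough sketch of what such a user-defined query could compute. Everything in it is an assumption for illustration: the `purchases` table, its `purchased_at` column (epoch seconds) and the `purchase_metrics` function are hypothetical, and an in-memory SQLite database stands in for the MySQL server so the example runs on its own.

```python
import sqlite3

def purchase_metrics(conn, now):
    """Return (purchases in the last hour, seconds since the last purchase).

    `purchases` and `purchased_at` are hypothetical names; `now` is epoch seconds.
    """
    cur = conn.cursor()
    cur.execute(
        "SELECT COUNT(*) FROM purchases WHERE purchased_at > ?",
        (now - 3600,),
    )
    last_hour = cur.fetchone()[0]
    cur.execute("SELECT MAX(purchased_at) FROM purchases")
    latest = cur.fetchone()[0]
    # MAX() yields NULL (None) on an empty table: no purchase has ever been made.
    seconds_since_last = None if latest is None else now - latest
    return last_hour, seconds_since_last

# Stand-in database with three sample purchases.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (id INTEGER PRIMARY KEY, purchased_at INTEGER)")
now = 1_000_000
conn.executemany(
    "INSERT INTO purchases (purchased_at) VALUES (?)",
    [(now - 7200,), (now - 1800,), (now - 60,)],
)
print(purchase_metrics(conn, now))  # (2, 60)
```

A monitoring tool would run such a query on every sample and chart the two numbers over time; a growing "seconds since last purchase" value is the kind of signal that the standard server counters cannot give.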
But let's dig further. We want the store's website to respond quickly. In particular, the query which returns the items in a customer's cart must react quickly. Our user would not only want to see that purchases are going through, but also that page load times (as in our example) are short for those critical parts. And so a user should be able to monitor the time it took to execute a given query.
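Measuring a query's execution time from the client side could look roughly like the sketch below. The `cart_items` table and the `timed_query` helper are hypothetical, and SQLite again stands in for MySQL. Note that timing on the client includes the round trip and the time to fetch the rows, not only the server's execution time.

```python
import sqlite3
import time

def timed_query(conn, sql, params=()):
    """Run a query and return (rows, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    elapsed = time.perf_counter() - start
    return rows, elapsed

# Hypothetical cart table used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cart_items (customer_id INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO cart_items VALUES (?, ?)",
    [(1, "book"), (1, "lamp"), (2, "chair")],
)

rows, elapsed = timed_query(
    conn,
    "SELECT item FROM cart_items WHERE customer_id = ? ORDER BY item",
    (1,),
)
print(rows, f"{elapsed:.6f}s")
```

The elapsed value is what would be stored per sample and charted, so that a slowdown in the "cart items" query shows up even while everything else looks healthy.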
It can be of further interest to know how many times per second a given query is executed. This part is not easily done on the server side, and requires the user's cooperation (or else we must analyze the general log, sniff, or set up a proxy). If the user is willing, she can log to some table each time she executes a certain query. Then we're back to monitoring a regular table, as with the first example.
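The cooperative approach described above could be sketched like this: the application writes one row to a log table per execution, and the monitor counts rows within a time window. The `query_log` table and both function names are hypothetical, timestamps are plain epoch seconds, and SQLite stands in for MySQL.

```python
import sqlite3

def log_execution(conn, query_name, executed_at):
    """Called by the application each time it runs the query of interest."""
    conn.execute(
        "INSERT INTO query_log (query_name, executed_at) VALUES (?, ?)",
        (query_name, executed_at),
    )

def executions_per_second(conn, query_name, window_start, window_end):
    """Average executions per second within [window_start, window_end)."""
    count = conn.execute(
        "SELECT COUNT(*) FROM query_log "
        "WHERE query_name = ? AND executed_at >= ? AND executed_at < ?",
        (query_name, window_start, window_end),
    ).fetchone()[0]
    return count / (window_end - window_start)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE query_log (query_name TEXT, executed_at INTEGER)")

# Simulate 3 executions of "cart_items" per second for 10 seconds,
# plus one unrelated query that must not be counted.
for second in range(10):
    for _ in range(3):
        log_execution(conn, "cart_items", second)
log_execution(conn, "checkout", 5)

print(executions_per_second(conn, "cart_items", 0, 10))  # 3.0
```

The obvious cost is one extra write per execution, so this only makes sense for a handful of queries the user really cares about; the log table would also need to be purged periodically.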
It is also possible to monitor a query's execution plan. Is it a full scan? How many rows are expected? But given that we can monitor the time it took to execute a query, I'm not sure this is useful. If everything runs fast enough, who cares how it executes?
Some of the above can be monitored on an altogether higher level: if we're talking about some web application, then we can use our Apache logs to determine load time for pages, or the number of requests to our "cart items" page. But we do not always work with web servers, and we may be interested in checking the specific queries behind the scenes.
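For the Apache route, a rough sketch follows. It assumes the access log uses the common log format with `%D` (time taken to serve the request, in microseconds) appended as the last field, which is not the default and must be configured. The sample lines, the `/cart/items` path and the `page_stats` helper are made up for illustration.

```python
import re

# Matches: "METHOD PATH PROTOCOL" STATUS BYTES MICROSECONDS (end of line)
LINE_RE = re.compile(
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ (?P<micros>\d+)$'
)

def page_stats(lines, path):
    """Return (request count, average serve time in seconds) for one path."""
    times = []
    for line in lines:
        match = LINE_RE.search(line.strip())
        if match and match.group("path") == path:
            times.append(int(match.group("micros")))
    if not times:
        return 0, None
    return len(times), sum(times) / len(times) / 1_000_000

# Hypothetical log lines; the trailing number is the assumed %D field.
sample_log = [
    '127.0.0.1 - - [10/Oct/2009:13:55:36 +0000] "GET /cart/items HTTP/1.1" 200 2326 40000',
    '127.0.0.1 - - [10/Oct/2009:13:55:37 +0000] "GET /index.html HTTP/1.1" 200 1024 12000',
    '127.0.0.1 - - [10/Oct/2009:13:55:38 +0000] "GET /cart/items HTTP/1.1" 200 2410 60000',
]

count, avg_seconds = page_stats(sample_log, "/cart/items")
print(count, avg_seconds)
```

This gives the page-level view (requests and serve time per page), but says nothing about which query behind the page is responsible, which is why the query-level monitoring is still of interest.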
Custom monitoring can include:
- User-defined queries (number of concurrent visitors; count of successful operations per second; number of rows per given table or condition; ...)
- Execution time for user-defined queries (time it takes to return cart items; find rows matching condition; sort a table; ...)
- Number of executions for a given query, per second.
I intend to incorporate the above into mycheckpoint as part of its standard monitoring scheme.
Please share your thoughts below.