mycheckpoint (rev. 132): custom monitoring, custom charts, process list dump

Revision 132 of mycheckpoint has been released. New and updated in this revision:

  • Custom monitoring: monitoring & charting for user defined queries
  • HTML reports for custom monitoring
  • Process list dump upon alert notifications

Custom monitoring & charts

Custom monitoring allows the user to supply with a query, the results of which will be monitored.

That is, mycheckpoint monitors the status variables, replication status, OS metrics. But it cannot by itself monitor one’s application. Which is why a user may supply with such query as:

SELECT COUNT(*) FROM shopping_cart WHERE is_pending=1

Such a query will tell an online store how many customers are in the midst of shopping. There is no argument that this number is worth monitoring for. Given the above query, mycheckpoint will execute it per sample, and store the query’s result along with all sampled data, to be then aggregated by complex views to answer for:

  • What was the value per given sample?
  • What is the value difference for each sample?
  • What is the change per second, i.e. the rate?

mycheckpoint goes one step forward, and explicity records another metric:

  • How much time did it take to take that sample?

Continue reading » “mycheckpoint (rev. 132): custom monitoring, custom charts, process list dump”

mycheckpoint (Rev. 118): alerts, email notifications and more

Revision 118 of mycheckpoint has been released. New and updated in this revision:

  • Conditional alerts
  • Email notifications
  • Revised HTML reports, including 24/7 reports.
  • Updated documentation

With this new revision mycheckpoint turns into a monitoring solution for MySQL. One can now:

  • Store measure metrics
  • Query for raw, aggregated or digested metrics
  • Generate charts for selected metrics
  • View HTML reports for selecetd metrics
  • Define alerts conditions, query for pending alerts
  • Be notified via email on raised or resolved alerts.

Conditional alerts

mycheckpoint is SQL oriented. As such, it allows for creation of alert conditions, which are nothing more than SQL conditions.

Continue reading » “mycheckpoint (Rev. 118): alerts, email notifications and more”

Things to monitor on MySQL, the user’s perspective

Working on mycheckpoint, I have the intention of adding custom monitoring. That is, letting the user define things to monitor. I have my own thoughts, I would be grateful to get more input!

What would the user want to monitor?

Monitoring for the number of SELECT statements per second, InnoDB locks, slave replication lag etc. is very important, and monitoring utilities provide with this information. But what does that tell the end user? Not much.

The experienced DBA may gain a lot. The user would be more interested in completely other kind of information. In between, some information is relevant to both.

Say we were managing an on-line store. We want to monitor the health of the database. But the health of the database is inseparable from the health of the application. I mean, having little to no disk usage is fine, unless… something is wrong with the application, which leads to no new purchases.

And so a user would be interested in monitoring the number of purchases per hour, or the time passed since last successful purchase. This kind of data can only be generated by a user’s specific query. Looking at the charts, the user would then feel safer and confident in the wellness of his store app.

Continue reading » “Things to monitor on MySQL, the user’s perspective”

Static charts vs. interactive charts

I’m having my usual fun with charts. Working on mycheckpoint, I’ve generated monitoring charts using the Google Chars API. But I’ve also had chance to experiment and deploy interactive charts, JavaScript based. In particular, I used and tweaked dygraphs.

I’d like to note some differences in using charts of both kinds. And I think it makes a very big difference.

Static charts

I’ll call any image-based chart by “static chart”. It’s just a static image. Example of such charts are those generated by Google Image Charts (they now also have new, interactive charts), or RRDtool. Show below is an example of a static chart; in this example, generated by Google: Continue reading » “Static charts vs. interactive charts”

mycheckpoint (rev. 88): mount points monitoring, improved charting, enhanced auto-deploy

Revision #88 of mycheckpoint is released. In this revision:

  • Disk space monitoring
  • Improved charting
  • Enhanced auto-deploy
  • And more…

Disk space monitoring

mycheckpoint now monitors (on Linux only) three mount points:

  1. The “/” (root) mount point
  2. The datadir mount point
  3. The tmpdir mount point

It may well be the case that two of the above (or perhaps all three of them) share the same mount point. For example, if there isn’t any particular partition for “/tmp“, it is possible that the tmpdir (by default “/tmp“) is on the same mount point as “/“. mycheckpoint does not care.

mycheckpoint monitors and reports the mount point’s used percent, in a similar algorithm df uses.

Disk space monitoring is only possible when monitoring the local machine (i.e. mycheckpoint runs on the same machine as the monitored MySQL server). In the future mycheckpoint may also monitor additional mount points, such as the various logs mount points.

Continue reading » “mycheckpoint (rev. 88): mount points monitoring, improved charting, enhanced auto-deploy”

mycheckpoint rev. 76: OS monitoring, auto deploy, brief HTML and 24/7 reports

Revision 76 of mycheckpoint comes with quite a few improvements, including:

  • OS monitoring (CPU, load average, memory)
  • Auto-deploy
  • Improved charting
  • Brief HTML reports
  • 24/7 charts

OS Monitoring

When monitoring the local machine, mycheckpoint now monitors CPU utilization, load average, memory and swap space.

This only applies to the Linux operating system; there is currently no plan to work this out for other operating systems.

Examples:

mysql> SELECT os_cpu_utilization_percent FROM sv_report_chart_sample;

mycheckpoint-chart-cpu-sample
mysql> SELECT ts, os_loadavg FROM mycheckpoint.sv_report_sample;
+---------------------+------------+
| 2009-12-27 11:45:01 |       1.78 |
| 2009-12-27 11:50:01 |       2.48 |
| 2009-12-27 11:55:01 |       2.35 |
...
+---------------------+------------+
mysql> SELECT report FROM mycheckpoint.sv_report_human_sample ORDER BY id DESC LIMIT 1 \G
*************************** 1. row ***************************
report:
Report period: 2009-12-27 13:20:01 to 2009-12-27 13:25:01. Period is 5 minutes (0.08 hours)
Uptime: 100.0% (Up: 334 days, 06:37:28 hours)

OS:
 Load average: 1.67
 CPU utilization: 25.2%
 Memory: 7486.4MB used out of 7985.6484MB (Active: 6685.8906MB)
 Swap: 3835.2MB used out of 8189.3750MB
...

Auto-deploy

mycheckpoint now has a version recognition mechanism. There is no need to call mycheckpoint with the “deploy” argument on first install or after upgrade. mycheckpoint will recognize a change of version and will auto-deploy before moving on to monitoring your system.

Continue reading » “mycheckpoint rev. 76: OS monitoring, auto deploy, brief HTML and 24/7 reports”

New and noteworthy in mycheckpoint (rev. 57)

Rev. 57 of mycheckpoint has been released and is available for download.

New and updated in this revision:

Remote host monitoring

It is now possible to monitor one host, while writing into another. Either or both could be remote hosts:

mycheckpoint --host=localhost --monitored-host=192.168.10.178

The above monitors the MySQL server on 192.168.10.178, and writes down to localhost (to be queried later)

mycheckpoint --monitored-host=127.0.0.1 --host=192.168.10.178

The above monitors the MySQL server on 127.0.0.1, and writes down to 192.168.10.178.

Continue reading » “New and noteworthy in mycheckpoint (rev. 57)”

questions or queries?

I’ve hit a recent change which took me by surprise.

I was used to checking for the ‘questions‘ global status variables to see the total amount of queries the server performs. So, for example, I could run com_select/questions to learn the SELECT ratio out of all queries.

Apparently, as of 5.0.725.0.76 & 5.1.31 this has changed. A new status variable was introduced, called ‘queries‘.

The change being? questions does not any longer indicate the number of queries the server has executed: only the number of queries requested by the client (so, calling on a stored routine only counts as 1, regardless of how many queries the routine executes). The new queries variable indicates the amount of server queries issued (see the 5.0 and 5.1 docs for details).

So, as of 5.0.72 or 5.1.31, the calculation should be com_select/com_queries (or com_select_diff/com_queries_diff) to learn the SELECT ratio of all queries. I learned this due to a bug report on mycheckpoint, which presented some 10265% SELECT ratio…

My take on this is that it could have been worked out differently: instead of changing the meaning of an existing variable, questions could have remained as it was, with the introduction of, say, client_questions, which would only indicate client number of issued queries.

I believe changing the meaning of status variables at such late versions (5.0.76 is quite late!) invites trouble: code that used to work on already then-stable versions (e.g. 5.0.51) would behave differently after upgrade. Such changes should best take place while still in BETA phase.

Performance analysis with mycheckpoint

mycheckpoint (see announcement) allows for both graph presentation and quick SQL access to monitored & analyzed data. I’d like to show the power of combining them both.

InnoDB performance

Taking a look at one of the most important InnoDB metrics: the read hit ratio (we could get the same graph by looking at the HTML report):

SELECT innodb_read_hit_percent FROM sv_report_chart_sample \G
*************************** 1. row ***************************
innodb_read_hit_percent: http://chart.apis.google.com/chart?cht=lc&chs=400x200&chts=303030,12&chtt=Nov+10,+11:40++-++Nov+11,+08:55+(0+days,+21+hours)&chdl=innodb_read_hit_percent&chdlp=b&chco=ff8c00&chd=s:400664366P6674y7176677677u467773y64ux166666764366646y616666666666644444434444s6u4S331444404433341334433646777666666074736777r1777767764776666F667777617777777777777777yaRi776776mlf667676xgx776766rou67767777u37797777x76676776u6A737464y67467761777666643u66446&chxt=x,y&chxr=1,99.60,100.00&chxl=0:||Nov+10,+15:55|Nov+10,+20:10|Nov+11,+00:25|Nov+11,+04:40|&chxs=0,505050,10

We see that read hit is usually high, but occasionally drops low, down to 99.7, or even 99.6. But it seems like most of the time we are above 99.95% read hit ratio. It’s hard to tell about 99.98%.

Can we know for sure?

We can stress our eyes, yet be certain of little. It’s best if we just query for the metrics! mycheckpoint provides with all data, accessible by simple SQL queries: Continue reading » “Performance analysis with mycheckpoint”

Replication analysis with mycheckpoint

I would like to show how mycehckpoint (see announcement) can be put to use for analyzing various replication metrics.

Lagging slaves

A slave has been monitored. Monitoring started at a time when it was way behind master (about two days lag), but it has since caught up. This can be easily verified by the following chart:

The above chart can be obtained by viewing the HTML report:

SELECT html FROM sv_report_html

Or by directly issuing the query:

mysql> SELECT seconds_behind_master FROM sv_report_chart_hour\G
*************************** 1. row ***************************
seconds_behind_master: http://chart.apis.google.com/chart?cht=lc&chs=400x200&chts=303030,12&chtt=Nov+5,+10:00++-++Nov+10,+08:00+(4+days,+22+hours)&chdl=seconds_behind_master&chdlp=b&chco=ff8c00&chd=s:976431zzzywutrpnliiifdbZYXVTRRRPNLJHEBAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA&chxt=x,y&chxr=1,0,169811&chxl=0:||Nov+6,+09:00|Nov+7,+09:00|Nov+8,+08:00|Nov+9,+08:00|&chxs=0,505050,10

This is all nice. But I’m also interested in the rate at which slave lag decreased. Many ignore this important metric: just how fast does your slave replicate?

Continue reading » “Replication analysis with mycheckpoint”