questions or queries?

I’ve hit a recent change which took me by surprise.

I was used to checking for the ‘questions‘ global status variables to see the total amount of queries the server performs. So, for example, I could run com_select/questions to learn the SELECT ratio out of all queries.

Apparently, as of 5.0.725.0.76 & 5.1.31 this has changed. A new status variable was introduced, called ‘queries‘.

The change being? questions does not any longer indicate the number of queries the server has executed: only the number of queries requested by the client (so, calling on a stored routine only counts as 1, regardless of how many queries the routine executes). The new queries variable indicates the amount of server queries issued (see the 5.0 and 5.1 docs for details).

So, as of 5.0.72 or 5.1.31, the calculation should be com_select/com_queries (or com_select_diff/com_queries_diff) to learn the SELECT ratio of all queries. I learned this due to a bug report on mycheckpoint, which presented some 10265% SELECT ratio…

My take on this is that it could have been worked out differently: instead of changing the meaning of an existing variable, questions could have remained as it was, with the introduction of, say, client_questions, which would only indicate client number of issued queries.

I believe changing the meaning of status variables at such late versions (5.0.76 is quite late!) invites trouble: code that used to work on already then-stable versions (e.g. 5.0.51) would behave differently after upgrade. Such changes should best take place while still in BETA phase.

Performance analysis with mycheckpoint

mycheckpoint (see announcement) allows for both graph presentation and quick SQL access to monitored & analyzed data. I’d like to show the power of combining them both.

InnoDB performance

Taking a look at one of the most important InnoDB metrics: the read hit ratio (we could get the same graph by looking at the HTML report):

SELECT innodb_read_hit_percent FROM sv_report_chart_sample \G
*************************** 1. row ***************************
innodb_read_hit_percent: http://chart.apis.google.com/chart?cht=lc&chs=400x200&chts=303030,12&chtt=Nov+10,+11:40++-++Nov+11,+08:55+(0+days,+21+hours)&chdl=innodb_read_hit_percent&chdlp=b&chco=ff8c00&chd=s:400664366P6674y7176677677u467773y64ux166666764366646y616666666666644444434444s6u4S331444404433341334433646777666666074736777r1777767764776666F667777617777777777777777yaRi776776mlf667676xgx776766rou67767777u37797777x76676776u6A737464y67467761777666643u66446&chxt=x,y&chxr=1,99.60,100.00&chxl=0:||Nov+10,+15:55|Nov+10,+20:10|Nov+11,+00:25|Nov+11,+04:40|&chxs=0,505050,10

We see that read hit is usually high, but occasionally drops low, down to 99.7, or even 99.6. But it seems like most of the time we are above 99.95% read hit ratio. It’s hard to tell about 99.98%.

Can we know for sure?

We can stress our eyes, yet be certain of little. It’s best if we just query for the metrics! mycheckpoint provides with all data, accessible by simple SQL queries: Continue reading » “Performance analysis with mycheckpoint”

Replication analysis with mycheckpoint

I would like to show how mycehckpoint (see announcement) can be put to use for analyzing various replication metrics.

Lagging slaves

A slave has been monitored. Monitoring started at a time when it was way behind master (about two days lag), but it has since caught up. This can be easily verified by the following chart:

The above chart can be obtained by viewing the HTML report:

SELECT html FROM sv_report_html

Or by directly issuing the query:

mysql> SELECT seconds_behind_master FROM sv_report_chart_hour\G
*************************** 1. row ***************************
seconds_behind_master: http://chart.apis.google.com/chart?cht=lc&chs=400x200&chts=303030,12&chtt=Nov+5,+10:00++-++Nov+10,+08:00+(4+days,+22+hours)&chdl=seconds_behind_master&chdlp=b&chco=ff8c00&chd=s:976431zzzywutrpnliiifdbZYXVTRRRPNLJHEBAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA&chxt=x,y&chxr=1,0,169811&chxl=0:||Nov+6,+09:00|Nov+7,+09:00|Nov+8,+08:00|Nov+9,+08:00|&chxs=0,505050,10

This is all nice. But I’m also interested in the rate at which slave lag decreased. Many ignore this important metric: just how fast does your slave replicate?

Continue reading » “Replication analysis with mycheckpoint”

Announcing mycheckpoint: lightweight, SQL oriented monitoring for MySQL

I’m proud to announce mycheckpoint, a monitoring utility for MySQL, with strong emphasis on user accessibility to monitored data.

mycheckpoint is a different kind of monitoring tool. It leaves the power in the user’s hand. It’s power is not with script-based calculations of recorded data. It’s with the creation of a view hierarchy, which allows the user to access computed metrics directly.

mycheckpoint is needed first, to deploy a monitoring schema. It may be needed next, so as to INSERT recorded data (GLOBAL STATUS, GLOBAL VARIABLES, MASTER STATUS, SLAVE STATUS) — but this is just a simple INSERT; anyone can do that, even another monitoring tool.

It is then that you do not need it anymore: everything is laid at your fingertips. Consider:

SELECT innodb_read_hit_percent, DML FROM sv_report_chart_hour;

mycheckpoint provides the views which take raw data (just innodb_buffer_pool_read_requests, com_select, innodb_buffer_pool_size, table_open_cache, seconds_behind_master etc.) and generate Google Charts URLs, HTML reports, human readable reports, or otherwise easily accessible data.

Continue reading » “Announcing mycheckpoint: lightweight, SQL oriented monitoring for MySQL”

SQL: querying for status difference over time

The InnoDB plugin has a nice INFORMATION_SCHEMA concept: resetting tables. For example, the INNODB_CMP table lists information about compression operation. A similar table, INNODB_CMP_RESET, provides the same information, but resets the values. The latter can be used to measure, for example, number of compression operations over time.

I wish to present a SQL trick which does the same, without need for resetting tables. Suppose you have some status table, and you wish to measure the change in status per second, per minute etc. The trick is to query for the value twice in the same query, with some pause in between, and make the difference calculation.

For sake of simplicity, I’ll demonstrate using 5.1’s INFORMATION_SCHEMA.GLOBAL_STATUS. Please refer to INFORMATION_SCHEMA.GLOBAL_STATUS: watch out for some discussion on this.

Continue reading » “SQL: querying for status difference over time”

How NOT to test that mysqld is alive

I had a call from a new customer last week. They had issues with their MySQL server. Apparently, it was repeatingly crashing, every few hours. To have their production system kept alive, they used a script to periodically see if MySQL was still alive, and if not – start it.

I was first thinking in directions of old installations; wrong memory allocations (too little memory for large content; to large memory allocation for weak machine). When I reached the customer’s premises, I quickly reviewed general settings, while explaining some basic configuration guidelines. There were peculiarities (e.g. query_cache_limit being larger than query_cache_size), but nothing to obviously suggest a reason for crash.

I then observed the error log. Took some time to find it, since the log_error parameter appeared twice in the my.cnf file; first one appearing at the very beginning, the second (overwriting the first) was far into the file.

Sure enough, there were a lot of startup messages. And… shutdown messages. In fact, the log looked something like: Continue reading » “How NOT to test that mysqld is alive”