Baffling 5.7 global/status variables issues, unclean migration path
Fri, 07 Aug 2015

MySQL 5.7 introduces a change in the way we query for global variables and status variables: the INFORMATION_SCHEMA.(GLOBAL|SESSION)_(VARIABLES|STATUS) tables are now deprecated and empty. Instead, we are to use the respective performance_schema.(global|session)_(variables|status) tables.

But the change goes further than that; there is also a security change. Oracle has created a pitfall by making two changes at the same time:

  1. Variables/status moved to a different table
  2. Privileges required on said table

As an example, my non-root user gets:

mysql> show session variables like 'tx_isolation';
ERROR 1142 (42000): SELECT command denied to user 'normal_user'@'my_host' for table 'session_variables'

Who gets affected by this? Nearly everyone and everything.

  • Your Nagios will not be able to read status variables
  • Your ORM will not be able to determine session variables
  • Your replication user will fail connecting (see this post by Giuseppe)
  • And almost everyone else.

The problem with the above is that it involves two unrelated changes to your setup, which are not entirely simple to coordinate:

  1. Change your app code to choose the correct schema (information_schema vs. performance_schema)
  2. GRANT the permissions on your database

Perhaps at this point you still do not consider this to be a problem. You may be thinking: well, let’s first prepare by creating the GRANTs, and once that is in place, we can, at our leisure, modify the code.

Not so fast. Can you really create those GRANTs that simply?

Migration woes

How do you migrate to a new MySQL version? You do not reinstall all your servers. You want an easy migration path, and that path is: introduce one or two slaves of the newer version, see that everything works to your satisfaction, slowly upgrade all your other slaves, and eventually switch over/upgrade your master.

This should not be any different for 5.7. We would like to provision a 5.7 slave in our topologies and just see that everything works. Well, we have, and things don’t just work. Our Nagios stopped working for that 5.7 slave. Orchestrator started complaining (by that time I had already fixed it to be more tolerant of the 5.7 problems, so no crashes there).

I hope you see the problem by now.

You cannot issue a GRANT SELECT ON performance_schema.global_variables TO ‘…’ on your 5.6 master.

The table simply does not exist there, which means the statement will not go to binary logs, which means it will not replicate on your 5.7 slave, which means you will not be able to SHOW GLOBAL VARIABLES on your slave, which means everything remains broken.

Yes, you can issue this directly on your 5.7 slaves. It’s doable, but undesired. It’s ugly in terms of automation (it will quite possibly break some assumptions and sanity checks your automation uses) and in terms of validity testing. It’s unfriendly to GTID (make sure to SET SQL_LOG_BIN=0 before issuing it).
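For illustration, a minimal sketch of doing this locally on a 5.7 slave; the credentials, user and host are placeholders:

# hypothetical: run on each 5.7 slave; disabling SQL_LOG_BIN for the
# session keeps the GRANTs out of the binary log (GTID-friendly)
mysql -umyself -psecret -e "
    SET SESSION SQL_LOG_BIN = 0;
    GRANT SELECT ON performance_schema.global_variables TO 'normal_user'@'my_host';
    GRANT SELECT ON performance_schema.session_variables TO 'normal_user'@'my_host';
"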

WHY in the first place?

It seems like a security thing. I’m not sure whether this was intended. So you prevent a SHOW GLOBAL VARIABLES for a normal user. Makes sense. And yet:

mysql> show global variables like 'hostname';
ERROR 1142 (42000): SELECT command denied to user 'normal_user'@'my_host' for table 'global_variables'

mysql> select @@global.hostname;
+---------------------+
| @@global.hostname   |
+---------------------+
| myhost.mydomain.com |
+---------------------+

mysql> select @@version;
+--------------+
| @@version    |
+--------------+
| 5.7.8-rc-log |
+--------------+

Seems like I’m allowed access to that info after all. So it’s not strictly a security design decision. For status variables, I admit, I don’t have a similar workaround.

Solutions?

The following are meant to be solutions, but do not really solve the problem:

  • SHOW commands. SHOW GLOBAL|SESSION VARIABLES|STATUS will work properly, and will implicitly know whether to provide the results via information_schema or performance_schema tables.
    • But, aren’t we meant to be happier with SELECT queries? So that I can really do stuff that is smarter than LIKE ‘variable_name%’?
    • And of course you cannot use SHOW in server side cursors. Your stored routines are in a mess now.
    • This does not solve the GRANTs problem.
  • show_compatibility_56: a boolean variable introduced in 5.7. It truly is a time-travel-paradox novel in disguise, in multiple respects.
    • The documentation introduces it, and in the same breath says it is deprecated.
      • time-travel-paradox :O
    • But it actually works in 5.7.8 (latest)
      • time-travel-paradox plot thickens
    • Your automation scripts do not know in advance whether your MySQL has this variable (see the probe sketch after this list)
      • Hence SELECT @@global.show_compatibility_56 will produce an error on 5.6
      • But the “safe” way of SHOW GLOBAL VARIABLES LIKE ‘show_compatibility_56’ will fail on a privilege error on 5.7
      • time-travel-paradox :O
    • As advised by my colleague Simon J. Mudd, show_compatibility_56 defaults to OFF. I support this line of thought; or else it’s old_passwords=1 all over again.
    • show_compatibility_56 doesn’t solve the GRANTs problem.
    • This does not solve any migration path. It just postpones the moment when I will hit the same problem. When I flip the variable from “1” to “0”, I’m back at square one.
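As for the probing problem above, here is a hypothetical sketch (credentials are placeholders): since @@version is readable on both 5.6 and 5.7 without special privileges, as shown earlier, automation can branch on it rather than on show_compatibility_56:

# probe the server version instead of the privileged variables tables
major_version=$(mysql -umyself -psecret --silent --silent --raw -e "select substring_index(@@version, '.', 2)")
if [ "$major_version" = "5.7" ] ; then
    variables_table="performance_schema.global_variables"
else
    variables_table="information_schema.global_variables"
fi

This solves detection only; the GRANTs problem remains, as noted.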

Suggestion

I claim security is not the issue, as presented above. I claim that, if the current solution remains unchanged, Oracle will yet again fall into the same trap as 5.6’s no-easy-way-to-migrate-to-GTID. I claim that there have been too many changes at once. Therefore, I suggest one of the following two alternative flows:

  1. Flow 1: keep information_schema, migrate into performance_schema later
    • In 5.7, the information_schema tables should still produce the data.
    • No security constraints on information_schema.
    • Generate WARNINGs on reading from information_schema (“…this will be deprecated…”).
    • performance_schema tables are also available, with security constraints, whatever.
    • In 5.8, remove the information_schema tables; we are left with performance_schema only.
  2. Flow 2: easy migration into performance_schema:
    • In 5.7, performance_schema tables should not require any special privileges. Any user can read from them.
    • Keep show_compatibility_56 as it is.
    • SHOW commands choose between information_schema and performance_schema on their own — just as things are done now.
    • In 5.8, performance_schema tables will require SELECT privileges.

As always, I love the work done by the engineers; and I love how they listen to the community.

Comments are most welcome. Have I missed the simple solution here? Are there even more complications to these features? Thoughts on my suggested two flows?

[UPDATE 2015-08-19]

Please see this followup by Morgan Tocker of Oracle.

Monitoring DML/slow queries with graphite
Sat, 19 Apr 2014

pt-query-digest, Anemometer or “Anemomaster” do a great job of analysing your queries and giving you visibility into what’s going on with your MySQL servers. However, the place where the query digests are written is just some MySQL tables on some server. Do you have monitoring/alerts on those tables? How will you verify that a specific query does not exceed some runtime/execution count threshold, and get notified when it does?

At Outbrain we use Graphite to collect almost all of our data. We like it for its simplicity and for the fact it has a “push” strategy as opposed to a “pull” strategy: every service/server/collectd writes (pushes) its own data to Graphite, as opposed to having some centralized monitoring service trying to pull data from thousands of servers & services. We also have a great Graphite dashboard (developed at our company by Erez Mazor) called graphitus, which is a very sophisticated and easily configurable visualization solution (see documentation).

Our love/hate relationship with Nagios boils down to having a single Nagios plugin: one that reads data from Graphite. We use Nagios to generate our alerts, and dream of the day we will substitute it with something else (there’s not too much love in this love/hate relationship).

Graphite is a numeric timeseries data monitoring solution. How do you throw MySQL query analysis into Graphite, then?

The answer lies within the flexible structure of a Graphite metric entry, which is a freely composed path, such as collectd.hosts.us-east.myhost01.mysql.gauge-Threads_running.value. Graphite does not require you to pre-define paths, and you can use anything that makes sense to you. Thus, you can use a slow query’s text, for example, as part of the Graphite entry path. This is not entirely simple, as the Graphite path limits the allowed characters. So this is what we do:

Any query that is written to Graphite is transformed into a “canonical form”. We strip it of excessive information and keep just enough of it to still make sense to us. Actually, we found out that we usually do well with just the bare bones of “what type of query this is and what tables are involved”. For better drill-down we then go to Anemometer/Anemomaster. Hence, the canonical form of the following query:

UPDATE my_documents SET document_owner='Wallace'  WHERE document_domain='Gromit'

is simply

update_my_documents

Thankfully the pt-query-digest report tables are already timestamp based, and are already aggregated by query “fingerprints”. This makes writing this data to Graphite just a matter of text normalization. The following script is a slightly modified version of what we use. Do note that we have the notion of “clustername”, which is the name of the replication topology we’re looking at. We have many topologies, like OLTP, OLAP, Metadata, etc. We support this notion by adding a clustername_max column to the report tables and instructing pt-query-digest to fill in this value, as sketched below.
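A hypothetical sketch of that schema change (the column type is illustrative): pt-query-digest’s --history mechanism populates a column named <attribute>_max when a --filter assigns the matching event attribute, just as is done for hostname in the script below:

ALTER TABLE global_query_review_history
    ADD COLUMN clustername_max VARCHAR(64) DEFAULT NULL;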

We run the following shell script via cron every 10 minutes (matching the 10 minute interval at which we analyse our masters’ DML):

#!/bin/bash

#
# This script should run on the anemomaster machine every 10 minutes, shortly after
# binary logs / relay logs are analyzed via pt-query-digest.
#

unixtime=$(date +%s)

# Get stats for the last round 10 minutes
# The query only takes one representative from each cluster
query=" select clustername_max, sum(ts_cnt), replace(fingerprint, '\n', ' ') from global_query_review_history join global_query_review using (checksum), (select date(now()) + interval hour(now()) hour + interval (minute(now()) div 10 *10) minute as search_to_timestamp) as search_to_timestamp_sel where ts_min >= search_to_timestamp - interval 10 minute and ts_min < search_to_timestamp and hostname_max in ( select min(hostname_max) from global_query_review_history where ts_min >= search_to_timestamp - interval 10 minute and ts_min < search_to_timestamp group by clustername_max) group by clustername_max, fingerprint order by sum(ts_cnt) desc "

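# note: --silent given twice (-ss) also suppresses the column names header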
mysql -umyself -psecret anemomaster --silent --silent --raw -e "$query" | while IFS=$'\t' read -r -a result_values
    do
        fingerprint_cluster=${result_values[0]} ;
        fingerprint_count=${result_values[1]} ;
        fingerprint_query=${result_values[2]} ;
        fingerprint_query=$(echo $fingerprint_query | sed -r -e "s/^(-- .*)]//g")
        fingerprint_query=$(echo $fingerprint_query | tr '\n' ' ' | tr '\r' ' ' | tr '\t' ' ')
        fingerprint_query=${fingerprint_query%%(*}
        fingerprint_query=${fingerprint_query%%,*}
        fingerprint_query=${fingerprint_query%% set *}
        fingerprint_query=${fingerprint_query%% SET *}
        fingerprint_query=${fingerprint_query%% where *}
        fingerprint_query=${fingerprint_query%% WHERE *}
        fingerprint_query=${fingerprint_query%% join *}
        fingerprint_query=${fingerprint_query%% JOIN *}
        fingerprint_query=${fingerprint_query%% using *}
        fingerprint_query=${fingerprint_query%% USING *}
        fingerprint_query=${fingerprint_query%% select *}
        fingerprint_query=${fingerprint_query%% SELECT *}
        fingerprint_query=$(echo $fingerprint_query | tr -d "\`")
        fingerprint_query=$(echo $fingerprint_query | tr -d "*")
        fingerprint_query=$(echo $fingerprint_query | tr -d "?")
        fingerprint_query=$(echo $fingerprint_query | tr " " "_")
        fingerprint_query=$(echo $fingerprint_query | tr "." "__")
        echo "data.mysql.dml.${fingerprint_cluster}.${fingerprint_query}.count ${fingerprint_count} $unixtime" | nc -w 1 my.graphite.server 2003
    done

If you don’t need the “clustername stuff”, modify the query to read:

select 'mysql' as clustername_max, sum(ts_cnt), replace(fingerprint, '\n', ' ') from global_query_review_history join global_query_review using (checksum), (select date(now()) + interval hour(now()) hour + interval (minute(now()) div 10 *10) minute as search_to_timestamp) as search_to_timestamp_sel where ts_min >= search_to_timestamp - interval 10 minute and ts_min < search_to_timestamp and hostname_max in ( select min(hostname_max) from global_query_review_history where ts_min >= search_to_timestamp - interval 10 minute and ts_min < search_to_timestamp) group by fingerprint order by sum(ts_cnt) desc

The Graphite metric path will look like data.mysql.dml.oltp.update_my_documents.count, which makes for a perfectly valid metric to monitor, graphically visualize and get alerts on.

“Anemomaster”: DML visibility. Your must-do for tomorrow
Fri, 18 Apr 2014

Here’s our take on master DML query monitoring at Outbrain (presented April 2014). It took half a day to code, implement, automate and deploy, and within the first hour of work we managed to catch multiple ill-doing services and scripts. You might want to try this out for yourself.

What’s this about?

What queries do you monitor on your MySQL servers? Many don’t monitor queries at all, and only look up slow queries on occasion, using pt-query-digest. Some monitor slow queries, where Anemometer (relying on pt-query-digest) is a very good tool. To the extreme, some monitor TCP traffic on all MySQL servers — good for you! In between, there’s a particular type of queries that are of special interest: DML (INSERT/UPDATE/DELETE) queries issued against the master.

They are of particular interest because they are only issued once against the master, yet propagate through the replication topology to execute on all slaves. These queries have a direct impact on your slave lag and on your overall replication capacity. I suggest you be familiar with your DMLs just as you are with your slow queries.

In particular, we had multiple occasions in the past where all or most slaves started lagging. Frantically we would go to our metrics; yes! We would see a spike in com_insert. Someone (some service) was obviously generating more INSERTs than usual, at a rate so high that the slaves could not keep up. But which INSERT was that? Blindly, we would look at the binary logs. Well, erm, what are we looking for, exactly?

Two such occasions convinced us that there should be a solution, but it took some time until it hit us. We were already using Anemometer for monitoring our slow logs; we could do the same for monitoring our binary logs. Thus was born “Anemomaster”.

Quick recap on how Anemometer works: you issue pt-query-digest on your slow logs on all MySQL hosts (we actually first ship the slow logs to a central place where we analyse them; same thing). This is done periodically, and slow logs are then rotated. You throw the output of pt-query-digest into a central database (this is built into pt-query-digest; it doesn’t necessarily produce human readable reports). Anemometer then reads this central database and visualizes the slow queries.

Analysing DMLs

But then, pt-query-digest doesn’t only parse slow logs. It can parse binary logs. Instead of asking for total query time, we ask for query count, and on we go to establish the same mechanism, using the same pt-query-digest and the same Anemometer to store and visualize the DMLs issued on our masters.

When analysing DMLs we’re interested in parsing binary logs — and it makes no sense to do the same on all slaves. All slaves have the same copy of binlog entries the master produces. It only takes one server to get an accurate picture of the DMLs on your replication topology.

One server could be the master, and this can indeed be done: just FLUSH BINARY LOGS, parse the binary logs with pt-query-digest, and you’re done. But like others, we tend to look at our masters as tender babies. We care for them, and do not wish to overload them unnecessarily. We chose to get the binlog entries from our slaves instead. We also chose to get the entries from the relay logs, since these are unaffected by slave performance, and as long as the network is good we can expect the relay logs to be very up to date. At any given time we have two slaves that take this role (this is automated and verified). Every 10 minutes we flush the relay logs on these servers, and analyse whatever relay logs we have not analysed yet.

The script below is a slightly modified version of our own, and should work for the standard installation. Modify to fit your own data (in particular, it assumes relay logs are named mysqld-relay-bin; datadir is specified in /etc/my.cnf, and please don’t ask me how to do this on Windows):

#!/bin/bash

# 
# Digest latest relay logs file, write results to "anemomaster"
#
# This script can run from any machine; it only needs to execute on a single machine per mysql cluster, but for
# analysis availability it should execute on at least two hosts per cluster.
#

DATADIR=`grep datadir /etc/my.cnf|awk -F= '{print $2}'`
TMPDIR=/tmp
DIGESTED_RELAY_LOGS_FILE=${DATADIR}/digested_relay_logs.txt

touch $DIGESTED_RELAY_LOGS_FILE
chown mysql:mysql $DIGESTED_RELAY_LOGS_FILE

hostname=$(hostname)

echo "deleting old relay logs from ${TMPDIR}"
rm ${TMPDIR}/mysqld-relay-bin.[0-9]*

echo "Getting current relay log files"
existing_relay_log_files=$(ls -tr ${DATADIR}/mysqld-relay-bin.[0-9]* | head -n -1)
for existing_relay_log_file in $existing_relay_log_files
do
    cp -u $existing_relay_log_file $TMPDIR
done
echo "flushing relay logs"
/usr/bin/mysql -umyself -psecret -e 'flush relay logs\G;' 2>/dev/null
# sleep because the log file takes some time to disappear
sleep 1

echo "Getting current relay log files"
existing_relay_log_files=$(ls -tr ${DATADIR}/mysqld-relay-bin.[0-9]* | head -n -1)
for existing_relay_log_file in $existing_relay_log_files
do
    cp -u $existing_relay_log_file $TMPDIR
done

cd $TMPDIR
for relay_log_file in mysqld-relay-bin.[0-9]*
do
    # Filter this relay log file, since it's already been digested
    grep $relay_log_file $DIGESTED_RELAY_LOGS_FILE && rm $relay_log_file
done

for relay_log_file in mysqld-relay-bin.[0-9]*
do
    echo "digesting $relay_log_file"
    mysqlbinlog $relay_log_file | /usr/bin/pt-query-digest \
      --type binlog --order-by Query_time:cnt --group-by fingerprint --limit 100 \
      --review  P=3306,u=anemomaster,p=secret,h=anemomaster_host,D=anemomaster,t=global_query_review \
      --history P=3306,u=anemomaster,p=secret,h=anemomaster_host,D=anemomaster,t=global_query_review_history \
      --filter=" \$event->{Bytes} = length(\$event->{arg}) and \$event->{hostname}=\"$(hostname)\" " \
      --no-report
    echo "$relay_log_file" >> $DIGESTED_RELAY_LOGS_FILE
    rm $relay_log_file
done
# make sure the file does not bloat. 20 entries is more than enough.
tail -n 20 $DIGESTED_RELAY_LOGS_FILE > ${TMPDIR}/DIGESTED_RELAY_LOGS_FILE
cat ${TMPDIR}/DIGESTED_RELAY_LOGS_FILE > $DIGESTED_RELAY_LOGS_FILE
echo "done"

As for Anemometer, we patched it to support multiple environments (“clusters”) — but that is irrelevant to the DML change. If you just want to make it visualize DMLs, here are the major configuration changes to config.inc.php:

$conf['history_defaults'] = array(
	'output'		=> 'table',
	'fact-group'	=> 'date',
	'fact-order'	=> 'date DESC',
	'fact-limit' => '90',
	'dimension-ts_min_start' => date("Y-m-d H:i:s", strtotime( '-90 day')),
	'dimension-ts_min_end'	=> date("Y-m-d H:i:s"),
	'table_fields' => array('date', 'query_time_avg','ts_cnt','Query_time_sum')
);

$conf['report_defaults'] = array(
	'fact-group'	=> 'checksum',
	'fact-order'	=> 'ts_cnt DESC',
	'fact-limit' => '20',
	'dimension-ts_min_start' => date("Y-m-d H:i:s", strtotime( '-1 day')),
	'dimension-ts_min_end'	=> date("Y-m-d H:i:s"),
	'table_fields' => array('checksum','snippet', 'query_time_avg','ts_cnt','Query_time_sum'),
	'dimension-pivot-hostname_max' => null,
	'dimension-pivot-clustername_max' => null
);

$conf['graph_defaults'] = array(
	'fact-group'	=> 'minute_ts',
	'fact-order'	=> 'minute_ts',
	'fact-limit' => '',
	'dimension-ts_min_start' => date("Y-m-d H:i:s", strtotime( '-7 day')),
	'dimension-ts_min_end'	=> date("Y-m-d H:i:s"),
	'table_fields' => array('minute_ts'),
	// hack ... fix is to make query builder select the group and order fields,
	// then table fields only has to contain the plot_field
	'plot_field' => 'ts_cnt',
);

With a 10 minute rotation & digestion, we are able to analyze, in near real time, what’s been done on our masters. If we see a spike in com_insert/com_update/com_delete, or just see slave lag, we turn to Anemomaster and within a couple of minutes know exactly which service is guilty of abusing our database. We are also working to protect our database against abuse, but that’s for another discussion.

mycheckpoint, discontinued
Thu, 06 Mar 2014

Time to admit to myself: mycheckpoint has to be discontinued.

I started mycheckpoint back in 2009, as a free & open source lightweight monitoring tool for MySQL. Over some years it evolved and became an actual (lightweight) monitoring solution, used by many. It has a unique and original design, which, alas, is also its bane.

mycheckpoint uses the relational model & SQL to store and query monitored metrics. This leads to quite a sophisticated service, which can make practically anything visible to the user. The raw data is just numbers, but with some SQL-fu one can generate charts out of it (interactive ones as well), human readable reports and full blown email messages. It is still the only common solution I’m aware of that keeps track of variable changes and provides a clear “what changed, when, from_value & to_value”. I caught many deployment bugs by just observing this. It’s a single file that provides a full blown HTTP service, alerting, mail notifications, multi-database monitoring, custom monitoring queries, query execution time monitoring, OS metrics, …

While developing mycheckpoint I learned a lot on MySQL status & configuration, complex SQL queries, Python, linux, packaging and more. I got a lot of feedback from users, as I still do (thank you!). Didn’t always manage to fix all bugs or answer all questions.

The design of mycheckpoint does not meet today’s reality. Heck, today there are more counters & variables than possible table columns. The schema-per-monitored-instance design makes for simplicity, but does not fare well with dozens or hundreds of servers to monitor. There is no cross-instance aggregation or visualization of data. The per-10-minute aggregation is too coarse. There isn’t a test suite.

Some of the above issues can be fixed, and if you like, the source code is still freely available. I’ll even migrate the entire SVN repository to GitHub at some stage. But I believe the current state might only be good for small scale deployments; not something you would consider scaling up with.

For me, there’s nothing more motivating in code development than knowing the code will go public. Making the code look the best it can, be as easily deployable as it can, with good documentation, takes a lot of effort – but it is very satisfying. Open Source FTW!!!1


Seconds_behind_master vs. Absolute slave lag
Fri, 24 Jan 2014

I am unable to bring myself to trust the Seconds_behind_master value in SHOW SLAVE STATUS. Even with MySQL 5.5’s CHANGE MASTER TO … MASTER_HEARTBEAT_PERIOD (a good thing, applied when no traffic goes from master to slave), it’s easy and common to find fluctuations in the Seconds_behind_master value.

And, when sampled by your favourite monitoring tool, this often leads to many false negatives.

At Outbrain we use HAProxy as a proxy to our slaves, on multiple clusters. More about that in a future post. What’s important here is that our decision whether a slave enters or leaves a certain pool (i.e. gets UP or DOWN status in HAProxy) is based on replication lag. Taking slaves out when they are actually replicating well is bad, since this reduces the number of serving instances. Putting slaves in the pool when they are actually lagging too much is bad, as they serve invalid, irrelevant data.

To top it all, even when correct, the Seconds_behind_master value is practically irrelevant on 2nd-level slaves. In a Master -> Slave1 -> Slave2 setup, what does it mean that Slave2 has Seconds_behind_master = 0? Not much to the application: Slave1 might be lagging an hour behind the master, or might not be replicating at all. Slave2 might have an hour’s worth of data missing even though it says its own replication is fine.

None of the above is news, and yet many fall into this pitfall. The solution is quite old as well; it is also very simple: run your own heartbeat mechanism, at your favourite time resolution, and measure slave lag by a timestamp you yourself update on the master.

Maatkit/percona-toolkit did this a long time ago with mk-heartbeat/pt-heartbeat. We’re doing it in a very similar manner. The benefit is obvious. Consider the following two graphs; the first shows Seconds_behind_master, the second shows our own Absolute_slave_lag measurement.

[Graph: seconds_behind_master]

[Graph: absolute_slave_lag]

The two graphs were taken simultaneously on the same set of servers. Excuse me for not having the same colours for the same slaves; I blame graphite. Some small gaps are seen here that are irrelevant to our discussion (yes, we had some graphite delivery issues).

As you can see, Absolute_slave_lag does not (and cannot!) fluctuate. With our 10 second heartbeat resolution it always shows an accurate value. In fact, within the heartbeat resolution, it shows the de facto replication lag. Let’s stress this one:

When you implement your own heartbeat mechanism, your own measured slave lag makes for the de facto slave replication lag within your heartbeat interval.

As another example, consider what happens when a slave stops replicating (i.e. someone issued STOP SLAVE, or replication failed). The Seconds_behind_master value is NULL, which is a good indication of error, and easy to monitor. But how does it present visually? Not too well. It is usually just not rendered at all.

But, consider: a slave that stops for 1 minute for whatever reason is still only 1 minute behind the master. That is, it is just 60 seconds away from being up to date with the master’s data. If we decide a slave may serve with up to 5 minutes of lag, then our slave can still be used for serving for 4 more minutes! Seconds_behind_master does not provide us with this helpful information; Absolute_slave_lag does. Consider the same system status when measured by Absolute_slave_lag.

We now get good insight into how far behind our slave is. Of course we monitor Seconds_behind_master to find out when replication is not working; but our HAProxy only cares about Absolute_slave_lag.

How does it work?

Very similarly to pt-heartbeat, there’s a dedicated table which we update with the current timestamp. We read that timestamp on the slave and compare it with the actual time on that host.

We use the following DDL:

create table my_heartbeat (
  id int unsigned not null primary key,
  master_ts timestamp,
  update_by varchar(128) default NULL
);

create or replace view my_heartbeat_status_v as
  select 
    master_ts,
    now() as time_now,
    timestampdiff(SECOND, master_ts, now()) as slave_lag_seconds,
    update_by
  from my_heartbeat
;

insert into my_heartbeat (id, master_ts, update_by) values (1, NOW(), 'init') on duplicate key update master_ts=NOW(), update_by=VALUES(update_by);

create event 
  update_heartbeat_event
  on schedule every 10 second starts current_timestamp 
  on completion preserve
  enable
  do  
    insert into my_heartbeat (id, master_ts, update_by) values (1, NOW(), 'event_scheduler') on duplicate key update master_ts=NOW(), update_by=VALUES(update_by);

We use both the event scheduler and an external script to pump the heartbeat value.

On the slave, utilize the view:

select slave_lag_seconds from my_heartbeat_status_v

The above query answers: “how long ago did I (the slave) get a timestamp update from the master?”. The result is correct to within 10 seconds resolution, in our example.
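For illustration, a minimal health check along these lines could look as follows; the threshold and credentials are hypothetical, not our actual production check:

#!/bin/bash

# exit 0 (healthy) if absolute slave lag is within the allowed threshold
max_lag_seconds=300
lag=$(mysql -umonitor -psecret --silent --silent --raw -e "select slave_lag_seconds from my_heartbeat_status_v")
if [ -n "$lag" ] && [ "$lag" -le "$max_lag_seconds" ] ; then
    exit 0
fi
exit 1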

Not new, but not well known. I hope the above provides you with better visibility into your replication lag.

Bash script: report largest InnoDB files
Thu, 19 Dec 2013

The following script will report the largest InnoDB tables under the data directory: schema, table & length in bytes. The tables could be non-partitioned, in which case this is simply the size of the corresponding .ibd file, or they can be partitioned, in which case the reported size is the sum of all partition files. It is assumed tables reside in their own tablespace files, i.e. created with innodb_file_per_table=1.

(
    mysql_datadir=$(grep datadir /etc/my.cnf | cut -d "=" -f 2)
    cd $mysql_datadir
    for frm_file in $(find . -name "*.frm")
    do
        tbl_file=${frm_file//.frm/.ibd}
        table_schema=$(echo $frm_file | cut -d "/" -f 2)
        table_name=$(echo $frm_file | cut -d "/" -f 3 | cut -d "." -f 1)
        if [ -f $tbl_file ]
        then
            # unpartitioned table
            file_size=$(du -cb $tbl_file 2> /dev/null | tail -n 1) 
        else
            # attempt partitioned innodb table
            tbl_file_partitioned=${frm_file//.frm/#*.ibd}
            file_size=$(du -cb $tbl_file_partitioned 2> /dev/null | tail -n 1)
        fi
        file_size=${file_size//total/}
        # Replace the below with whatever action you want to take,
        # for example, push the values into graphite.
        echo $file_size $table_schema $table_name
    done
) | sort -k 1 -nr | head -n 20

We use this to push table statistics to our graphite service; we keep an eye on table growth (we actually do not limit ourselves to the top 20, but monitor them all). File size does not report the real table data size (which can be smaller due to tablespace fragmentation). It does give the correct information if you’re concerned about disk space. For table data we also monitor SHOW TABLE STATUS / INFORMATION_SCHEMA.TABLES, themselves being inaccurate. Gotta go by something.
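For example, the echo line in the script above could be replaced with a Graphite push over the plaintext protocol, as in the DML monitoring post; the metric path and server name below are illustrative:

# hypothetical: push each table's file size to Graphite instead of echoing
echo "data.mysql.tables.${table_schema}.${table_name}.file_size ${file_size} $(date +%s)" | nc -w 1 my.graphite.server 2003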

Trick: recovering from “no space left on device” issues with MySQL
Fri, 23 Aug 2013

Just read Ronald Bradford’s post on an unnecessary 3am (emergency) call. I sympathize! Running out of disk space makes for some weird MySQL behaviour, and in fact whenever I encounter weird behaviour I verify disk space.

But here’s a trick I’ve been using for years to avoid such cases and to be able to recover quickly. It has helped me in such events as running out of disk space during an ALTER TABLE, or when purging of binary logs was avoided while a slave was known to be under maintenance.

Ronald suggested it — just put a dummy file in your @@datadir! I like putting a 1GB dummy file: I typically copy+paste a 1GB binary log file and call it “placeholder.tmp”. Then I forget all about it. My disk space should not run out — if it does, it’s a cause for emergency. I have monitoring, but sometimes I’m hoping to make an operation at 97%–99% utilization.

If I do run out of disk space: well, MySQL won’t let me connect, won’t complete an important statement, won’t sync a transaction to disk — a bad situation. Not a problem in our case: we can magically recover 1GB worth of space from the @@datadir, buying us enough time (maybe just minutes) to gracefully complete the necessary operations: connect, KILL, shutdown, abort etc.
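A minimal sketch of the trick; the datadir path is illustrative, and fallocate is assumed available (copying an existing 1GB binary log works just as well, as described above):

# reserve 1GB in the datadir as an emergency buffer
fallocate -l 1G /var/lib/mysql/placeholder.tmp

# when disk fills up and MySQL misbehaves, reclaim the space:
rm /var/lib/mysql/placeholder.tmp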

mycheckpoint revision 231 released
Thu, 23 May 2013

A new release of mycheckpoint: a lightweight, SQL oriented MySQL monitoring solution.

If you’re unfamiliar with mycheckpoint, well, the one minute sales pitch is: it’s a free and open source monitoring tool for MySQL, which is extremely easy to install and execute, and which includes custom queries, alerts (via email), and an out of the box HTTP server and charting.

This is mostly a maintenance release, with some long-time requested features, and of course solved bugs. Here are a few highlights:

  • Supports MariaDB and MySQL 5.6 (issues with new variables, space padded variables, text-valued variables)
  • Supports alerts via function invocation on the monitored host (so not only checking alerts via aggregated data like ‘Seconds_behind_master’, but also via SELECT my_sanity_check_function() on the monitored instance). See alerts.
  • Supports single-running-instance via “–single” command line argument
  • Supports strict sql_mode, including ONLY_FULL_GROUP_BY, overcoming bug #69310.
  • Supports sending of pending email HTML report
  • Better re-deployment process
  • Better recognition of SIGNED/UNSIGNED values
  • Some other improvements in charting, etc.

mycheckpoint is released under the BSD license.

Downloads are available from the project’s page.

MySQL monitoring: storing, not caching
Wed, 22 Feb 2012

I’ve followed with interest Baron’s Why don’t our new Nagios plugins use caching? and Sheeri’s Caching for Monitoring: Timing is Everything. I wish to present my take on this, from mycheckpoint’s point of view.

So mycheckpoint works in a completely different way. On one hand, it doesn’t bother with caching. On the other hand, it doesn’t bother with re-reads of data.

There are no staleness issues, the data is as consistent as it can get (you can never get a completely atomic read of everything in MySQL), and you can issue as many calculations as you want at the price of one monitoring sample. As in Sheeri’s example, you can run Threads_connected/max_connections*100, mix status variables, system variables, meta-variables (e.g. Seconds_behind_master), user-created variables (e.g. number of purchases in your online shop), etc.

mycheckpoint’s concept is to store data. And store it in relational format. That is, INSERT it to a table.

A sample-run generates a row, which lists all status, server, OS, user, meta variables. It’s a huge row, with hundreds of columns. Columns like threads_connected, max_connections, innodb_buffer_pool_size, seconds_behind_master, etc.

mycheckpoint hardly cares about these columns. It identifies them dynamically. Have you just upgraded to MySQL 5.5? Oh, there’s a new bunch of server and status variables? No problem, mycheckpoint will notice it doesn’t have the matching columns and will add them via ALTER TABLE. There you go, now we have a place to store them.

Running a formula like Threads_connected/max_connections*100 is as easy as issuing the following query:

SELECT Threads_connected/max_connections*100 FROM status_variables WHERE id = ...

Hmmm. This means I can run this formula on the most recent row I’ve just added. But wait, this also means I can run this formula on any row I’ve ever gathered.

With mycheckpoint you can generate graphs retroactively using new formulas. The data is there, vanilla style. Any formula which can be calculated via SQL is good to go with. Plus, you get the benefit of cross-referencing in fun ways: cross-reference to the timestamp at which the sample was taken (so, for example, ignore the spikes generated at this and that timeframe due to maintenance; don’t alert me on these), or to system issues like load average or CPU usage (show me the average Seconds_behind_master when load average is over 8, or the average load average when the slow query rate is over some threshold). You don’t do that all the time, but when you need it, well, you can get all the insight you ever wanted.
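A sketch of such a retroactive cross-referencing query; the column names (os_loadavg, seconds_behind_master) are illustrative rather than mycheckpoint’s exact schema:

-- average replication lag over all samples taken under heavy load
SELECT AVG(seconds_behind_master)
FROM status_variables
WHERE os_loadavg > 8;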

Actually storing the monitored data in an easy to access format allows one to query, re-query, re-formulate. No worries about caching, you only sample once.

For completeness, all the above is relevant when the data is of numeric types. Other types are far more complicated to manage (the list of running queries is a common example).

Announcing common_schema: common views & routines for MySQL
Wed, 13 Jul 2011

Today I have released common_schema, a utility schema for MySQL which includes many views and functions, and is aimed to be installed on any MySQL server.

What does it do?

There are views answering for all sorts of useful information: stuff related to schema analysis, data dimensions, monitoring, processes & transactions, security, internals… There are basic functions answering for common needs.

Some of the views/routines simply formalize those queries we tend to write over and over again. Others take the place of external tools, answering complex questions via SQL and metadata. Still others help out with SQL generation.

There’s more than can be listed here: take a look at the common_schema documentation for the full listing. And it’s evolving: I’ve got quite a few ideas already for future components.

Some of these views rely on heavyweight INFORMATION_SCHEMA tables. You should be aware of the impact and risks.

What do I need to install?

There’s no script or executable file. It’s just a schema. The distribution is an SQL file which generates common_schema, much like a dump file.
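A hypothetical installation one-liner (the exact file name depends on the distribution you pick; see below):

# import the distribution file, just as you would a dump file
mysql -uroot -p < common_schema_mysql_51.sql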

What are the system requirements?

It’s just between you and your MySQL. There are currently three distribution files, dedicated to different versions of MySQL (and allowing for increased functionality):

  • common_schema_mysql_51: fits all MySQL >= 5.1 distributions
  • common_schema_innodb_plugin: fits MySQL >= 5.1, with InnoDB plugin + INFORMATION_SCHEMA tables enabled
  • common_schema_percona_server: fits Percona Server >= 5.1

Refer to the documentation for more details.

What are the terms of use?

common_schema is released under the BSD license.

Where can I download it?

On the common_schema project page. Enjoy it!
