Bash script: report largest InnoDB files

December 19, 2013

The following script will report the largest InnoDB tables under the data directory: schema, table & length in bytes. The tables could be non-partitioned, in which case this is simply the size of the corresponding .ibd file, or they can be partitioned, in which case the reported size is the sum of all partition files. It is assumed tables reside in their own tablespace files, i.e. created with innodb_file_per_table=1.

    mysql_datadir=$(grep datadir /etc/my.cnf | cut -d "=" -f 2)
    cd $mysql_datadir
    for frm_file in $(find . -name "*.frm")
        table_schema=$(echo $frm_file | cut -d "/" -f 2)
        table_name=$(echo $frm_file | cut -d "/" -f 3 | cut -d "." -f 1)
        if [ -f $tbl_file ]
            # unpartitioned table
            file_size=$(du -cb $tbl_file 2> /dev/null | tail -n 1) 
            # attempt partitioned innodb table
            file_size=$(du -cb $tbl_file_partitioned 2> /dev/null | tail -n 1)
        # Replace the below with whatever action you want to take,
        # for example, push the values into graphite.
        echo $file_size $table_schema $table_name
) | sort -k 1 -nr | head -n 20

We use this to push table statistics to our graphite service; we keep an eye on table growth (we actually do not limit to top 20 but just monitor them all). File size does not report the real table data size (this can be smaller due to tablespace fragmentation). It does give the correct information if you're concerned about disk space. For table data we also monitor SHOW TABLE STATUS / INFORMATION_SCHEMA.TABLES, themselves being inaccurate. Gotta go by something.

Why a professional conference must have a committee, and what that committee does

December 14, 2013

What exactly is it that a conference committee does? This post comes as response to a comment on A sneak peek at the Percona Live MySQL Conference & Expo 2014, reading:

Why the same committee each year? Community should vote on proposals and committee should just work schedule,etc.

I'll pick the glove and shed some light into the work of the committee. While this specific comment related to the Percona Live conference, I trust that my opinions expressed below apply just as well to any (technical?) professional conference; the point below can equally apply to conferences from Oracle MySQL Connect, O'Reilly Velocity to FOSDEM & PyCon.

I can sum up the entire answer with one word: "Discussion". For a breakdown, please read through.

First, what's not feasible with community-based voting, and what looks very wrong

So why not open up a voting system and let the community do the rating? I always disliked the "send an SMS to this number to vote for X" approach. It is so unbalanced and unreliable: if I were to submit a proposal describing how my company invented/develops/uses X to do great things, I can expect my co-workers to vote for me. In fact, my company would possibly ask my co-workers to do so. I stand a better chance if I work in a large company; less so in a small company.

Anonymous votes tend to be touched by politics. I could vote for my company, against a competing product, for my friends, against people I dislike, and none the wiser. We can take away anonymity, which means my votes will be public, which means they are visible to all. In which case my ranking will be affected by what people I rate would think of me; which means my rating would not be based on strictly professional/technical grounds.

But before we drop into this endless pit, let's consider: will I, as a KMyPyVelocirails community member, really engage in reviewing over 300 submissions? How many members of my community would take so many hours of their time to do so? Let me clarify, this is a part-time job. It requires time, and it requires a mindset. I'm guessing here that you cannot count on everyone rating all talks. Some more prominent talks will be reviewed by more people, others may be left little to not reviewed in the first place.

The idea of a purely community based rating is romantic and beautiful, but not feasible.

And then there's the discussion. Let's look at some of the things the committee is engaged in to clarify.

Duties, responsibility and actions of a conference committee

The following discussion cannot be an exhaustive description of a committee's work, but it can give a good glimpse into its scope. We begin with the commitment the members take upon themselves: to invest their time and will in the committee's duties. Once you join in, you are expected to work and deliver. Continue Reading »

Percona Live: MySQL Conference & Expo 2014: call for papers & guidelines

November 14, 2013

Call for papers for Percona Live MySQL Conference & Expo 2014 is open. As in previous year, I have the honour of being conference committee chairman, which means I'm in particular part of the reviewing committee. I wish to add to Giuseppe's fine observations and suggestions. On submitting a talk, please consider the following:

  • Make it right the first time (an old advice by Baron). We will not be able to review your proposal more than once. We will not be iterating the proposals again and again to see what's changed, nor will we have a feed for updated content. We will not be able to diff any changes you made to your proposal. Get it right the first time, this is the one time we will read your proposal.

On rare occasions we may think your proposal is just too important to be thrown just for bad description/grammar/language, and we may contact you to refine.

  • The proposal must be as clear to us as for your target audience. Good proposals are easy to understand. Make it clear.

Being super-famous does not grant you immediate approval. It helps if you've given the talk 10 times in the past, and we've all seen it and it was widely acknowledged as one of the best talks ever. But then again, if that's the kind of speaker you are, you probably know how to write a good proposal.

The committee seeks the audience's best interest. We assume the audience's common sense is similar, via extrapolation, to our own. We therefore assume that if we think a proposal is bad, so will the audience.

  • If you're a commercial vendor and you want to make a proposal, that's fine.
    • I suppose the best way would be through the sponsor program
    • But otherwise we actually accept talks on commercial/closed source solutions. Be advised, however, that the committee members will typically wish to promote open source & free solutions, and so commercial/closed source talks are at some disadvantage.
  • If you are owner/employee of commercial/closed source solution, and you're going to speak on a subject that is related to your product, please be very explicit:
    • either note (in private notes, if you like) that you will be, say, comparing your own product along with other products, and provide full disclosure
    • or let us know you will in fact not speak about your own product, and make us sigh in relief.
    • In short, don't let us suspect this is going to be a sales pitch in disguise. It is one of the quickest ways to get your proposal rejected.

You might be interested to learn that last year, a proposal made by an influential and a well known speaker was turned down for that exact reason. Even if he had no intention of doing a sales pitch, that's what it looked like to us, and by extrapolation to the audience. Continue Reading »

common_schema: speaking at Percona Live London, Nov. 2013

November 4, 2013

In one week's time I'll be presenting common_schema: DBA's framework for MySQL at Percona Live, London.

This talk introduces the rich toolset known as common_schema. It is free, open source, extremely useful in solving DBA & developer tasks, and is the next best thing ever invented next to SQL pie charts.

I'll introduce:

  • Views, with which you can detect and terminate idle transactions, blocking transactions; roll your range partitions; find duplicate keys; block/unblock accounts; get fine grained privileges per account; find AUTO_ICNREMENT free space; ...
  • Routines: do meta executions such as eval(); get query checksums; duplicating accounts; killing suspicious connections; security auditing; parsing JSON data; ...
  • QueryScript: if you're not using it, you're missing on a powerful scripting language tightly integrated with SQL/MySQL. We'll see the basic constructs, variables, loops; the more sophisticated MySQL/locks/overhead/danger aware constructs such as foreach & split; throttling, exceptions, it's all in there. I'll present real scripts that saved the day and challenge you to implement them in another scripting language.
  • Briefly introducing rdebug: stored routine debugger and debugging API
  • Roadmap (some cool things coming along) Continue Reading »

TokuDB configuration variables of interest

October 23, 2013

During our experiments I came upon a few TokuDB variables of interest; if you are using TokuDB you might want to look into these:

  • tokudb_analyze_time

This is a boundary on the number of seconds an ANALYZE TABLE will operate on each index on each partition on a TokuDB table.

That is, if tokudb_analyze_time = 5, and your table has 4 indexes (including PRIMARY) and 7 partitions, then the total runtime is limited to 5*4*7 = 140 seconds.

Default in 7.1.0: 5 seconds

  • tokudb_cache_size

Similar to innodb_buffer_pool_size, this variable sets the amount of memory allocated by TokuDB for caching pages. Like InnoDB the table is clustered within the index, so the cache includes pages for both indexes and data.

Default: 50% of total memory

  • tokudb_directio

Boolean, values are 0/1. Setting tokudb_directio = 1 is like specifying innodb_flush_method = O_DIRECT. Which in turn means the OS should not cache pages requested by TokuDB. Default: 0.

Now here's the interesting part: we are used to tell InnoDB to get the most memory we can provide (because we want it to cache as much as it can) and to avoid OS caching (because that would mean a page would appear both in the buffer pool and in OS memory, which is a waste). So the following setup is common: Continue Reading »

5 years of blogging

October 22, 2013 blog is now five years old. Hurrah!

Throughout these five years I posted almost exclusively MySQL oriented blogs, though I had every intention of writing on various engineering topics.

I still see blogging as one of the most important forms of knowledge sharing, and indeed for me the blogs aggregated at Planet MySQL are my main source of MySQL information. I especially like to read technical content straight from the developer; but am also keen on being updated with news on conferences, releases, distributions and some insightful opinions.

The advent of the Stack Exchange websites makes for a common place where people get to learn "how to issue this query" or "how replication works". I can see the DBA site gaining popularity, though still going slow. Eventually I suspect it will be the one place where people share their questions and answers.

Nevertheless blogging is a great way of sharing a new finding of yours; express an opinion; review a product. So I hope for five more years of active blogging, and thank the readers of this blog for keeping watch!

Some numbers: to date I've posted 296 MySQL related posts (so 1.13 blog posts per week, evenly spread). The vast majority were technical; and a few opinions. I posted 6 comic strips; most of which were, I'm afraid, largely misunderstood  (the blame is on me) and projected exactly the opposite perception I was trying to express. I'm bound to fail again.

Traffic can be better (not sharing all my secrets). Been a slow down in the past months. I know, I haven't produced much Pie charts lately.

Again, thank you for reading or commenting. No point in blogging if no one pays attention to your blabbering! And do keep coming :)


Converting an OLAP database to TokuDB, part 3: operational stuff

October 14, 2013

This is the third post in a series of posts describing our experience in migrating a large DWH server to TokuDB (see 1st and 2nd parts). This post discusses operations; namely ALTER TABLE operations in TokuDB. We ran into quite a few use cases by this time that we can shed light on.

Quick recap: we've altered one of out DWH slaves to TokuDB, with the goal of migrating most of out servers, including the master, to TokuDB.

Adding an index

Shortly after migrating our server to TokuDB we noticed an unreasonably disproportionate slave lag on our TokuDB slave (red line in chart below) as compared to other slaves.


Quick investigation led to the fact that, coincidentally, a manual heavy-duty operation was just taking place, which updated some year's worth of data retroactively. OK, but why so slow on TokuDB? Another quick investigation led to an apples vs. oranges problem: as depicted in part 1, our original setup included MONTHly partitioning on our larger tables, whereas we could not do the same in TokuDB, where we settled for YEARly partitioning.

The heavy-duty operation included a query that was relying on the MONTHly partitioning to do reasonable pruning: a WHERE condition on a date column did the right partition pruning; but where on InnoDB that would filter 1 month's worth of data, on TokuDB it would filter 1 year.

Wasn't it suggested that TokuDB has online table operations? I decided to give it a shot, and add a proper index on our date column (I actually created a compound index, but irrelevant).

It took 13 minutes to add an index on a 1GB TokuDB table (approx. 20GB InnoDB uncompressed equivalent):

  • The ALTER was non blocking: table was unlocked at that duration
  • The client issuing the ALTER was blocked (I thought it would happen completely in the background) -- but who cares?
  • I would say 13 minutes is fast

Not surprisingly adding the index eliminated the problem altogether.

Modifying a PRIMARY KEY

It was suggested by our DBA that there was a long time standing need to modify our PRIMARY KEY. It was impossible to achieve with our InnoDB setup (not enough disk space for the operation, would take weeks to complete if we did have the disk space). Would it be possible to modify our TokuDB tables? On some of our medium-sized tables we issued an ALTER of the form: Continue Reading »

On MySQL plugin configuration

October 1, 2013

MySQL offers plugin API, with which you can add different types of plugins to the server. The API is roughly the same for all plugin types: you implement an init() function, a deinit(); you declare status variables and global variables associated with your plugin, and of course you implement the particular implementation of plugin call.

I wish to discuss the creation and use of global variables for plugins.

Consider the following declaration of a global variable in audit_login:

static MYSQL_SYSVAR_BOOL(enabled, plugin_enabled, PLUGIN_VAR_NOCMDARG,
"enable/disable the plugin's operation, namely writing to file", NULL, NULL, 1);

static struct st_mysql_sys_var * audit_login_sysvars[] = {

The above creates a new global variables called "simple_login_audit_enabled": it is composed of the plugin name (known to be "simple_login_audit" in our example) and declared name ("enabled"). It is a boolean, defaults to 1, and is associated with the internal plugin_enabled variable.

Once this variable is declared, you can expect to be able to: Continue Reading »

Introducing audit_login: simple MySQL login logfile based auditing

September 17, 2013

audit_login is a simple MySQL login auditing plugin, logging any login or login attempt to log file in JSON format.

It seems that audit plugins are all the rage lately... We've developed out simple plugin a month ago as part of our database securing efforts; by auditing any login or login attempt we could either intercept or later investigate suspicious logins.

However we quickly realized there is much more to be gathered by this info.

In very short, you install this plugin onto your MySQL server, and your server starts writing into a text file called audit_login.log entries such as follows:

{"ts":"2013-09-11 09:11:47","type":"successful_login","myhost":"gromit03","thread":"74153868","user":"web_user","priv_user":"web_user","host":"web-87.localdomain","ip":""}
{"ts":"2013-09-11 09:11:55","type":"failed_login","myhost":"gromit03","thread":"74153869","user":"backup_user","priv_user":"","host":"web-32","ip":""}
{"ts":"2013-09-11 09:11:57","type":"failed_login","myhost":"gromit03","thread":"74153870","user":"backup_user","priv_user":"","host":"web-32","ip":""}
{"ts":"2013-09-11 09:12:48","type":"successful_login","myhost":"gromit03","thread":"74153871","user":"root","priv_user":"root","host":"localhost","ip":""}
{"ts":"2013-09-11 09:13:26","type":"successful_login","myhost":"gromit03","thread":"74153872","user":"web_user","priv_user":"web_user","host":"web-11.localdomain","ip":""}
{"ts":"2013-09-11 09:13:44","type":"successful_login","myhost":"gromit03","thread":"74153873","user":"web_user","priv_user":"web_user","host":"web-40.localdomain","ip":""}
{"ts":"2013-09-11 09:13:51","type":"successful_login","myhost":"gromit03","thread":"74153874","user":"web_user","priv_user":"web_user","host":"web-03.localdomain","ip":""}
{"ts":"2013-09-11 09:14:09","type":"successful_login","myhost":"gromit03","thread":"74153875","user":"web_user","priv_user":"web_user","host":"web-40.localdomain","ip":""}
{"ts":"2013-09-11 10:55:25","type":"successful_login","myhost":"gromit03","thread":"74153876","user":"web_user","priv_user":"web_user","host":"web-87.localdomain","ip":""}
{"ts":"2013-09-11 10:55:59","type":"successful_login","myhost":"gromit03","thread":"74153877","user":"web_user","priv_user":"web_user","host":"web-12.localdomain","ip":""}
{"ts":"2013-09-11 10:55:59","type":"failed_login","myhost":"gromit03","thread":"74153878","user":"(null)","priv_user":"(null)","host":"(null)","ip":""}

In the above your MySQL server is on gromit03, and is accepting connections from other hosts; some successful, some not. What kind of information can you gather from the above?

  • You can tell how many connections are being created on your server
  • Where they came from
  • Where 'root' connections come from
  • Port scans (see last row) can be identified by no credentials. These don't have to be port scans per se; any telnet localhost 3006 followed by Ctrl+D will show the same. Typically these would be either load balancer or monitoring tools checks to see that the 3306 port is active.
  • You can tell which accounts connect, and how many times
  • And you can infer which accounts are stale and can be dropped -- if an account does not connect within a week's time, it's probably stale (pick your own timeframe)

The above is quite interesting on one host; but we have dozens. We've installed this plugin on all our MySQL servers, and we use logstash to aggregate them. We aggregate to two destinations: Continue Reading »

Converting an OLAP database to TokuDB, part 2: the process of migration

September 9, 2013

This is a second in a series of posts describing our experience in migrating a large DWH server to TokuDB. This post discusses the process of migration itself.

As a quick recap (read part 1 here), we have a 2TB compressed InnoDB (4TB uncompressed) based DWH server. Space is running low, and we're looking at TokuDB for answers. Early experiments show that TokuDB's compression could make a good impact on disk space usage. I'm still not discussing performance -- keeping this till later post.

Those with weak hearts can skip right to the end, where we finally have a complete conversion. You can also peek at the very end to find out how much 4TB uncompressed InnoDB data is worth in TokuDB. But you might want to read through. The process was not smooth, and not as expected (it's a war story thing). Throughout the migration we got a lot of insight on TokuDB's behaviour, limitations, conveniences, inconveniences and more.

Disclosure: I have no personal interests and no company interests; throughout the process we were in touch with Tokutek engineers, getting free, friendly & professional advice and providing with input of our own. Most of this content has already been presented to Tokutek throughout the process. TokuDB is open source and free to use, though commercial license is also available.

How do you convert 4TB worth of data to TokuDB?

Obviously one table at a time. But we had another restriction: you may recall I took a live slave for the migration process. And we wanted to end the process with a live slave. So the restriction was: keep it replicating!

How easy would that be? Based on our initial tests, I extrapolated over 20 days of conversion from InnoDB to TokuDB. Even with one table at a time, our largest table was expected to convert in some 12-14 days. Can we retain 14 days of binary logs on a server already running low on disk space? If only I knew then what I know today :) Continue Reading »

Powered by Wordpress and MySQL. Theme by