Bash script: report largest InnoDB files

The following script will report the largest InnoDB tables under the data directory: schema, table & length in bytes. The tables could be non-partitioned, in which case this is simply the size of the corresponding .ibd file, or they can be partitioned, in which case the reported size is the sum of all partition files. It is assumed tables reside in their own tablespace files, i.e. created with innodb_file_per_table=1.

(
    mysql_datadir=$(grep datadir /etc/my.cnf | cut -d "=" -f 2)
    cd $mysql_datadir
    for frm_file in $(find . -name "*.frm")
    do
        tbl_file=${frm_file//.frm/.ibd}
        table_schema=$(echo $frm_file | cut -d "/" -f 2)
        table_name=$(echo $frm_file | cut -d "/" -f 3 | cut -d "." -f 1)
        if [ -f $tbl_file ]
        then
            # unpartitioned table
            file_size=$(du -cb $tbl_file 2> /dev/null | tail -n 1) 
        else
            # attempt partitioned innodb table
            tbl_file_partitioned=${frm_file//.frm/#*.ibd}
            file_size=$(du -cb $tbl_file_partitioned 2> /dev/null | tail -n 1)
        fi
        file_size=${file_size//total/}
        # Replace the below with whatever action you want to take,
        # for example, push the values into graphite.
        echo $file_size $table_schema $table_name
    done
) | sort -k 1 -nr | head -n 20

We use this to push table statistics to our graphite service; we keep an eye on table growth (we actually do not limit to top 20 but just monitor them all). File size does not report the real table data size (this can be smaller due to tablespace fragmentation). It does give the correct information if you’re concerned about disk space. For table data we also monitor SHOW TABLE STATUS / INFORMATION_SCHEMA.TABLES, themselves being inaccurate. Gotta go by something.

Why a professional conference must have a committee, and what that committee does

What exactly is it that a conference committee does? This post comes as response to a comment on A sneak peek at the Percona Live MySQL Conference & Expo 2014, reading:

Why the same committee each year? Community should vote on proposals and committee should just work schedule,etc.

I’ll pick the glove and shed some light into the work of the committee. While this specific comment related to the Percona Live conference, I trust that my opinions expressed below apply just as well to any (technical?) professional conference; the point below can equally apply to conferences from Oracle MySQL Connect, O’Reilly Velocity to FOSDEM & PyCon.

I can sum up the entire answer with one word: “Discussion”. For a breakdown, please read through.

First, what’s not feasible with community-based voting, and what looks very wrong

So why not open up a voting system and let the community do the rating? I always disliked the “send an SMS to this number to vote for X” approach. It is so unbalanced and unreliable: if I were to submit a proposal describing how my company invented/develops/uses X to do great things, I can expect my co-workers to vote for me. In fact, my company would possibly ask my co-workers to do so. I stand a better chance if I work in a large company; less so in a small company.

Anonymous votes tend to be touched by politics. I could vote for my company, against a competing product, for my friends, against people I dislike, and none the wiser. We can take away anonymity, which means my votes will be public, which means they are visible to all. In which case my ranking will be affected by what people I rate would think of me; which means my rating would not be based on strictly professional/technical grounds.

But before we drop into this endless pit, let’s consider: will I, as a KMyPyVelocirails community member, really engage in reviewing over 300 submissions? How many members of my community would take so many hours of their time to do so? Let me clarify, this is a part-time job. It requires time, and it requires a mindset. I’m guessing here that you cannot count on everyone rating all talks. Some more prominent talks will be reviewed by more people, others may be left little to not reviewed in the first place.

The idea of a purely community based rating is romantic and beautiful, but not feasible.

And then there’s the discussion. Let’s look at some of the things the committee is engaged in to clarify.

Duties, responsibility and actions of a conference committee

The following discussion cannot be an exhaustive description of a committee’s work, but it can give a good glimpse into its scope. We begin with the commitment the members take upon themselves: to invest their time and will in the committee’s duties. Once you join in, you are expected to work and deliver. Continue reading » “Why a professional conference must have a committee, and what that committee does”