MySQL Community Awards 2014: Call for Nominations!

The 2014 MySQL Community Awards event will take place, as usual, in Santa Clara, April 2014, during the Percona Live MySQL Conference & Expo (currently scheduled at Thursday, April 3rd 2014).

The MySQL Community Awards is a community based initiative. The idea is to publicly recognize contributors to the MySQL ecosystem. The entire process of discussing, voting and awarding is controlled by an independent group of community members, typically based of past winners or their representatives, as well as known contributors.

It is a self-appointed, self-declared, self-making-up-the-rules-as-it-goes committee. It is also very aware of the importance of the community; a no-nonsense, non-political, adhering to tradition, self criticizing committee.

The Call for Nominations is open. We are seeking the community’s assistance in nominating candidates in the following categories:

MySQL Community Awards: Community Contributor of the year 2014

This is a personal award; a winner would a person who has made contribution to the MySQL ecosystem. This could be via development, advocating, blogging, speaking, supporting, etc. All things go.

MySQL Community Awards: Application of the year 2014

An application, project, product etc. which supports the MySQL ecosystem by either contributing code, complementing its behaviour, supporting its use, etc. This could range from a one man open source project to a large scale social service.

MySQL Community Awards: Corporate Contributor of the year 2013

A company who made contribution to the MySQL ecosystem. This might be a corporate which released major open source code; one that advocates for MySQL; one that help out community members by… anything.

Continue reading » “MySQL Community Awards 2014: Call for Nominations!”

Why delegating code to MySQL Stored Routines is poor engineering practice

I happen to use stored routines with MySQL. In fact, my open source project common_schema heavily utilizes them. DBA-wise, I think they provide with a lot of power (alas, the ANSI:SQL 2003 syntax feels more like COBOL than a sane programming language, which is why I use QueryScript instead).

However I wish to discuss the use of stored routines as integral part of your application code, which I discourage.

The common discussion on whether to use or not use stored routines typically revolves around data transfer (with stored routines you transfer less data since it’s being processed on server side), security (with stored routines you can obfuscate/hide internal datasets, and provide with limited and expected API) and performance (with MySQL this is not what you would expect, as routines are interpreted & their queries re-evaluated, as opposed to other RDBMS you may be used to).

But I wish to discuss the use of stored routines from an engineering standpoint. The first couple of points I raise are cultural/behavioural.

2nd grade citizens

Your stored routines are not likely to integrate well with your IDE. While your Java/Scala/PHP/Ruby/whatnot code comfortably lies within your home directory, the stored routines live in their own space: a database container. They’re not as visible to you as your standard code. Your IDE is unaware of their existence and is unlikely to have the necessary plugin/state of mind to be able to view these.

This leads to difficulty in maintaining the code. People typically resort to using some SQL-oriented GUI tool such as MySQL Workbench, SequelPro or other, commercial tools. But these tools, while make it easy to edit your routine code, do not integrate (well?) with your source control. I can’t say I’ve used all GUI tools; but how many of them will have Git/SVN/Mercurial connectors? How many of them will keep local history changes once you edit a routine? I’m happy to get introduced to such a tool.

Even with such integration, you’re split between two IDEs. And if you’re the command line enthusiast, well, you can’t just svn ci -m “fixed my stored procedure bug”. Your code is simply not in your trunk directory.

It can be done. You could maintain the entire routine code from within your source tree, and hats off to all those who do it. Most will not. See later on about deployments for more on this. Continue reading » “Why delegating code to MySQL Stored Routines is poor engineering practice”

Seconds_behind_master vs. Absolute slave lag

I am unable to bring myself to trust the Seconds_behind_master value on SHOW SLAVE STATUS. Even with MySQL 5.5‘s CHANGE MASTER TO … MASTER_HEARTBEAT_PERIOD (good thing, applied when no traffic goes from master to slave) it’s easy and common to find fluctuations in Seconds_behind_master value.

And, when sampled by your favourite monitoring tool, this often leads to many false negatives.

At Outbrain we use HAProxy as proxy to our slaves, on multiple clusters. More about that in a future post. What’s important here is that our decision whether a slave enters or leaves a certain pool (i.e. gets UP or DOWN status in HAProxy) is based on replication lag. Taking slaves out when they are actually replicating well is bad, since this reduces the amount of serving instances. Putting slaves in the pool when they are actually lagging too much is bad as they contain invalid, irrelevant data.

To top it all, even when correct, the Seconds_behind_master value is practically irrelevant on 2nd level slaves. In a Master -> Slave1 -> Slave2 setup, what does it mean that Slave2 has Seconds_behind_master = 0? Nothing much to the application: Slave1 might be lagging an hour behind the master, or may not be replicating at all. Slave2 might have an hour’s data missing even though it says its own replication is fine.

None of the above is news, and yet many fall in this pitfall. The solution is quite old as well; it is also very simple: do your own heartbeat mechanism, at your favourite time resolution, and measure slave lag by timestamp you yourself updated on the master.

Maatkit/percona-toolkit did this long time ago with mk-heartbeat/pt-heartbeat. We’re doing it in a very similar manner. The benefit is obvious. Consider the following two graphs; the first shows Seconds_behind_master, the seconds shows our own Absolute_slave_lag measurement. Continue reading » “Seconds_behind_master vs. Absolute slave lag”

Percona Live 2014 schedule released; BoF and Lightning Talks Call for Papers continues

The complete tutorial & session schedule for Percona Live MySQL Conference & Expo 2014 is released. This schedule offers both a sense of achievement as well as a sense of regret; for I believe the schedule is very good, and yet some good proposals had to be left out.

This is an inevitable result of a conference that is popular and receives far more proposals than can fit within the time frames. This conference offers 96 session slots and 16 3-hour tutorial slots. We got well over 300 proposals — I’m not even sure how to count them — and they just can’t all fit in. My sincere apologies to all those left out. A proposal of mine was just rejected yesterday from another conference; I can sympathize and empathize with all turned down.

As part of our interest in having a diversity of talks and speakers, we have promoted talks by less frequent speakers and newly presenting companies. We are happy to grow the community!

Although titled “Percona Live”, this conference’s program is managed by a diverse and independent committee. We had good discussions and some very good thinking and advice were offered. I’m happy to acknowledge and thank the committee members:

  • Cédric Peintre, Dailymotion
  • Giuseppe Maxia, Continuent
  • Ivan Zoratti, SkySQL
  • Jay Janssen, Percona
  • Jeremy Cole, Google
  • Laine Campbell, PalominoDB (now Blackbird, congrats!)
  • Liz van Dijk, Percona
  • Roland Bouman, Pentaho
  • Tim Callaghan, Tokutek
  • Todd Farmer, Oracle
  • myself, Outbrain

Looking at the schedule I’m as always eager to attend many more sessions than I can; until I get more replicas of myself, It’s again down to choosing between multiple prominent talks at each time slot.

Thank you to all those who submitted a proposal! (It’s cool, just saying)

Birds of a Feather, Lightning Talks

Call for papers continues! You are encouraged to submit your proposals until end of January. These proposals are reviewed by the committee, and eventually chosen and scheduled by Giuseppe Maxia. See also:

Bash script: report largest InnoDB files

The following script will report the largest InnoDB tables under the data directory: schema, table & length in bytes. The tables could be non-partitioned, in which case this is simply the size of the corresponding .ibd file, or they can be partitioned, in which case the reported size is the sum of all partition files. It is assumed tables reside in their own tablespace files, i.e. created with innodb_file_per_table=1.

(
    mysql_datadir=$(grep datadir /etc/my.cnf | cut -d "=" -f 2)
    cd $mysql_datadir
    for frm_file in $(find . -name "*.frm")
    do
        tbl_file=${frm_file//.frm/.ibd}
        table_schema=$(echo $frm_file | cut -d "/" -f 2)
        table_name=$(echo $frm_file | cut -d "/" -f 3 | cut -d "." -f 1)
        if [ -f $tbl_file ]
        then
            # unpartitioned table
            file_size=$(du -cb $tbl_file 2> /dev/null | tail -n 1) 
        else
            # attempt partitioned innodb table
            tbl_file_partitioned=${frm_file//.frm/#*.ibd}
            file_size=$(du -cb $tbl_file_partitioned 2> /dev/null | tail -n 1)
        fi
        file_size=${file_size//total/}
        # Replace the below with whatever action you want to take,
        # for example, push the values into graphite.
        echo $file_size $table_schema $table_name
    done
) | sort -k 1 -nr | head -n 20

We use this to push table statistics to our graphite service; we keep an eye on table growth (we actually do not limit to top 20 but just monitor them all). File size does not report the real table data size (this can be smaller due to tablespace fragmentation). It does give the correct information if you’re concerned about disk space. For table data we also monitor SHOW TABLE STATUS / INFORMATION_SCHEMA.TABLES, themselves being inaccurate. Gotta go by something.

Why a professional conference must have a committee, and what that committee does

What exactly is it that a conference committee does? This post comes as response to a comment on A sneak peek at the Percona Live MySQL Conference & Expo 2014, reading:

Why the same committee each year? Community should vote on proposals and committee should just work schedule,etc.

I’ll pick the glove and shed some light into the work of the committee. While this specific comment related to the Percona Live conference, I trust that my opinions expressed below apply just as well to any (technical?) professional conference; the point below can equally apply to conferences from Oracle MySQL Connect, O’Reilly Velocity to FOSDEM & PyCon.

I can sum up the entire answer with one word: “Discussion”. For a breakdown, please read through.

First, what’s not feasible with community-based voting, and what looks very wrong

So why not open up a voting system and let the community do the rating? I always disliked the “send an SMS to this number to vote for X” approach. It is so unbalanced and unreliable: if I were to submit a proposal describing how my company invented/develops/uses X to do great things, I can expect my co-workers to vote for me. In fact, my company would possibly ask my co-workers to do so. I stand a better chance if I work in a large company; less so in a small company.

Anonymous votes tend to be touched by politics. I could vote for my company, against a competing product, for my friends, against people I dislike, and none the wiser. We can take away anonymity, which means my votes will be public, which means they are visible to all. In which case my ranking will be affected by what people I rate would think of me; which means my rating would not be based on strictly professional/technical grounds.

But before we drop into this endless pit, let’s consider: will I, as a KMyPyVelocirails community member, really engage in reviewing over 300 submissions? How many members of my community would take so many hours of their time to do so? Let me clarify, this is a part-time job. It requires time, and it requires a mindset. I’m guessing here that you cannot count on everyone rating all talks. Some more prominent talks will be reviewed by more people, others may be left little to not reviewed in the first place.

The idea of a purely community based rating is romantic and beautiful, but not feasible.

And then there’s the discussion. Let’s look at some of the things the committee is engaged in to clarify.

Duties, responsibility and actions of a conference committee

The following discussion cannot be an exhaustive description of a committee’s work, but it can give a good glimpse into its scope. We begin with the commitment the members take upon themselves: to invest their time and will in the committee’s duties. Once you join in, you are expected to work and deliver. Continue reading » “Why a professional conference must have a committee, and what that committee does”

Percona Live: MySQL Conference & Expo 2014: call for papers & guidelines

Call for papers for Percona Live MySQL Conference & Expo 2014 is open. As in previous year, I have the honour of being conference committee chairman, which means I’m in particular part of the reviewing committee. I wish to add to Giuseppe’s fine observations and suggestions. On submitting a talk, please consider the following:

  • Make it right the first time (an old advice by Baron). We will not be able to review your proposal more than once. We will not be iterating the proposals again and again to see what’s changed, nor will we have a feed for updated content. We will not be able to diff any changes you made to your proposal. Get it right the first time, this is the one time we will read your proposal.

On rare occasions we may think your proposal is just too important to be thrown just for bad description/grammar/language, and we may contact you to refine.

  • The proposal must be as clear to us as for your target audience. Good proposals are easy to understand. Make it clear.

Being super-famous does not grant you immediate approval. It helps if you’ve given the talk 10 times in the past, and we’ve all seen it and it was widely acknowledged as one of the best talks ever. But then again, if that’s the kind of speaker you are, you probably know how to write a good proposal.

The committee seeks the audience’s best interest. We assume the audience’s common sense is similar, via extrapolation, to our own. We therefore assume that if we think a proposal is bad, so will the audience.

  • If you’re a commercial vendor and you want to make a proposal, that’s fine.
    • I suppose the best way would be through the sponsor program
    • But otherwise we actually accept talks on commercial/closed source solutions. Be advised, however, that the committee members will typically wish to promote open source & free solutions, and so commercial/closed source talks are at some disadvantage.
  • If you are owner/employee of commercial/closed source solution, and you’re going to speak on a subject that is related to your product, please be very explicit:
    • either note (in private notes, if you like) that you will be, say, comparing your own product along with other products, and provide full disclosure
    • or let us know you will in fact not speak about your own product, and make us sigh in relief.
    • In short, don’t let us suspect this is going to be a sales pitch in disguise. It is one of the quickest ways to get your proposal rejected.

You might be interested to learn that last year, a proposal made by an influential and a well known speaker was turned down for that exact reason. Even if he had no intention of doing a sales pitch, that’s what it looked like to us, and by extrapolation to the audience. Continue reading » “Percona Live: MySQL Conference & Expo 2014: call for papers & guidelines”

common_schema: speaking at Percona Live London, Nov. 2013

In one week’s time I’ll be presenting common_schema: DBA’s framework for MySQL at Percona Live, London.

This talk introduces the rich toolset known as common_schema. It is free, open source, extremely useful in solving DBA & developer tasks, and is the next best thing ever invented next to SQL pie charts.

I’ll introduce:

  • Views, with which you can detect and terminate idle transactions, blocking transactions; roll your range partitions; find duplicate keys; block/unblock accounts; get fine grained privileges per account; find AUTO_ICNREMENT free space; …
  • Routines: do meta executions such as eval(); get query checksums; duplicating accounts; killing suspicious connections; security auditing; parsing JSON data; …
  • QueryScript: if you’re not using it, you’re missing on a powerful scripting language tightly integrated with SQL/MySQL. We’ll see the basic constructs, variables, loops; the more sophisticated MySQL/locks/overhead/danger aware constructs such as foreach & split; throttling, exceptions, it’s all in there. I’ll present real scripts that saved the day and challenge you to implement them in another scripting language.
  • Briefly introducing rdebug: stored routine debugger and debugging API
  • Roadmap (some cool things coming along) Continue reading » “common_schema: speaking at Percona Live London, Nov. 2013”

TokuDB configuration variables of interest

During our experiments I came upon a few TokuDB variables of interest; if you are using TokuDB you might want to look into these:

  • tokudb_analyze_time

This is a boundary on the number of seconds an ANALYZE TABLE will operate on each index on each partition on a TokuDB table.

That is, if tokudb_analyze_time = 5, and your table has 4 indexes (including PRIMARY) and 7 partitions, then the total runtime is limited to 5*4*7 = 140 seconds.

Default in 7.1.0: 5 seconds

  • tokudb_cache_size

Similar to innodb_buffer_pool_size, this variable sets the amount of memory allocated by TokuDB for caching pages. Like InnoDB the table is clustered within the index, so the cache includes pages for both indexes and data.

Default: 50% of total memory

  • tokudb_directio

Boolean, values are 0/1. Setting tokudb_directio = 1 is like specifying innodb_flush_method = O_DIRECT. Which in turn means the OS should not cache pages requested by TokuDB. Default: 0.

Now here’s the interesting part: we are used to tell InnoDB to get the most memory we can provide (because we want it to cache as much as it can) and to avoid OS caching (because that would mean a page would appear both in the buffer pool and in OS memory, which is a waste). So the following setup is common: Continue reading » “TokuDB configuration variables of interest”

5 years of blogging

openark.org blog is now five years old. Hurrah!

Throughout these five years I posted almost exclusively MySQL oriented blogs, though I had every intention of writing on various engineering topics.

I still see blogging as one of the most important forms of knowledge sharing, and indeed for me the blogs aggregated at Planet MySQL are my main source of MySQL information. I especially like to read technical content straight from the developer; but am also keen on being updated with news on conferences, releases, distributions and some insightful opinions.

The advent of the Stack Exchange websites makes for a common place where people get to learn “how to issue this query” or “how replication works”. I can see the DBA site gaining popularity, though still going slow. Eventually I suspect it will be the one place where people share their questions and answers.

Nevertheless blogging is a great way of sharing a new finding of yours; express an opinion; review a product. So I hope for five more years of active blogging, and thank the readers of this blog for keeping watch!

Some numbers: to date I’ve posted 296 MySQL related posts (so 1.13 blog posts per week, evenly spread). The vast majority were technical; and a few opinions. I posted 6 comic strips; most of which were, I’m afraid, largely misunderstood  (the blame is on me) and projected exactly the opposite perception I was trying to express. I’m bound to fail again.

Traffic can be better (not sharing all my secrets). Been a slow down in the past months. I know, I haven’t produced much Pie charts lately.

Again, thank you for reading or commenting. No point in blogging if no one pays attention to your blabbering! And do keep coming 🙂

</meta-post>