Some anecdotes I learned at Percona Live

April 8, 2014

While on the plane back home I wrote down all my impressions from Percona Live 2014. Have lots of TODOs and great ideas to implement. Among all my impressions, there were a few anecdotes worth noting.

  • 5.6 GTID is still unfriendly. It will require complete shutdown & reconfiguration of your entire replication topology; and some companies present hacks around this. Notable, Facebook recoded GTID related code (slave agrees to replicate with GTID even though its master still uses binlog coordinates). Booking.com have their own hack around slowly migrating their topologies. And in a great lightning talk we were shown how to patch MySQL such that the relay logs turn into a  consistent GTID-like coordinate system.
  • Galera replication has been implemented for TokuDB (only active-passive mode, not active-active). This came as a surprise to Tokutek ; apparently Codership did this within a few hours of work. The rest is up for Tokutek!
  • WebScaleSQL is a cool initiative that aims to assist in pushing commonly desired featured back upstream. It is Web Scale. It is also not a productio0n distribution and I do not expect it to be, It is not a fork that is meant for the common DBA to download and deploy.
  • Tungsten replicator has MySQL to Hadoop replication using staging tables - an auditing of changes that are group-deployed on Hadoop.
  • Still so many people unfamiliar with MySQLSandbox. It's such a basic tool, especially for testing and local installations.
  • Still misconceptions about common_schema. Yes, I do use it on production.
  • Everyone has the same problems :)
  • Replication is still queen of MySQL's featureset. We're still all about failing over, promoting, scaling via replication.
  • Linux rules. Where two MacBooks failed to connect to the projector, my Lenovo/Ubuntu Linux did the job just fine and saved the day.

MySQL Community Awards 2014: the Winners

April 4, 2014

The MySQL Community Awards initiative is an effort to acknowledge and thank individuals and corporates for their contributions to the MySQL ecosystem. It is a from-the-community, by-the-community and for-the-community effort. The committee is composed of an independent group of community members of different orientation and opinion, themselves past winners or known contributors to the community.

The 2014 community awards were presented on April 3rd, 2014, during the community event at the Percona Live conference. The winners are:

MySQL Community Awards: Community Contributor of the year 2014

  • Colin Charles
    Colin's list of service to the MySQL Community goes back almost 10 years. He was a community engineer starting in 2005, chaired some of the O'Reilly MySQL conferences, ran the MySQL projects for Google Summer of Code. As a partner and Chief Evangelist for Monty program, he continues to promote and grow the MySQL ecosystem. Though it's his job, he goes above and beyond, driven by his passion for open source and MySQL.
  • Frédéric Descamps
    Frederic organizes the MySQL & Friends Devroom at FOSDEM every year. He worked towards making a true community driven event participated by all key players. in 2014 the MySQL & Friends devroom also presented a shared booth/stand regrouping Oracle, MariaDB/SkySQL and Percona engineers and developers.
  • Geoffrey Anderson
    Geoffrey is organizor & moderator at DBHangops: a bi-weekly community hangout discussing all things MySQL. The hangouts are free to join, and are streamed live and stored for later playback via YouTube.
    Geoffrey works to provide with content, and leads the hangouts with good spirit. DBHangOps makes for a good knowledge sharing media.

MySQL Community Awards: Application of the year 2014

  • Galera
    Galera provides MySQL with synchronous replication, offering a long time sought for High Availability solution for MySQL. It is fast becoming one of the main product/technology everyone is building next-gen businesses around. It is an important part of the renaissance of what's next for the "ecosystem".
  • Random Query Generator
    Random Query Generator has found hundreds of problems with its random generation of queries, because often the most serious bugs are outside of developers' expectations and their test cases. Engineers who work on MySQL and its derivatives have been able to vastly improve the quality of the database because of it. Random Query Generator has positively effected every single user in the MySQL ecosystem, mostly without them even knowing it.
    RQG was originally created by Philip Stoev, and has attracted further contributors over the years.

MySQL Community Awards: Corporate Contributor of the year 2014

  • Oracle
    Oracle has continued to show its commitment to MySQL with the 5.6 release and work in progress on 5.7. MySQL 5.6 might be the best release ever with significant improvements for InnoDB and replication (performance, GTIDs, parallel apply) including significant changes from the community. The 5.7 release has equally high goals for making replication better (enhanced semi-sync) and making InnoDB faster on big multi-core servers. We also appreciate the continued involvement, excellence and passion of the support and community teams.
  • BinLogic
    BinLogic has been a great steward of the MySQL Community in one of the fastest growing technology hubs, Latin America. BinLogic created the MySQL, NoSQL and Cloud Solutions day in 2012 and repeated the event in 2013, drawing in notable speakers from the community.

Continue Reading »

Introducing Propagator: multi-everything deployment made easy

March 11, 2014

I'm happy to release Outbrain's Propagator as open source. Propagator is a schema & data deployment tool which makes it easy to deploy, review, audit & fix deployments to your database servers.

What does multi-everything mean? It is:

  • Multi-server: push your schema & data changes to multiple instances in parallel
  • Multi-role: different servers have different schemas
  • Multi-environment: recognizes the differences between development, QA, build & production servers
  • Multi-technology: supports MySQL, Hive (Cassandra on the TODO list)
  • Multi-user: allows users authenticated and audited access
  • Multi-planetary: TODO

With dozens of database servers in our company (and these are master database servers), from development machines to testing machines, through build machines to production servers, and with a growing team of over 70 engineers, we faced the growing problem of controlling our database schema evolution. Controlling creation of tables, columns, keys, foreign keys; controlling creation of data that must be consistent across all servers became an infeasible task. Some changes were lost; some servers forgotten along the way, and inconsistencies blocked our development & deployments again and again.

We have reviewed schema-versioning tools (e.g. FlywayDB and Liquibase) only to conclude they solve a fraction of the problem. We looked at some GUI tools that promise to deliver the solution, but frankly any Windows Desktop GUI application is by definition the wrong tool for the job, and not (only) because of the "Windows" part.

Not all deployments are the same. Not all servers are the same. You don't ALTER a big table on production. You may be using different schema names on different servers. You may have multiple schemas on a single server with identical structure. You may wish to only deploy to some development servers, possibly to a test server, but not to all, and yet be able to pick up on where you left a few days later on to complete your deployment. Some deployments fail, and for various reasons (e.g. John created that table manually on this particular test server, so obviously you can't CREATE it again), and you want to be able to skip it, or mark it as "OK", or put some comments, or hint that you need assistance from a DBA. You want to be able to quickly add new servers to your deployment group.

Above all, you want every single deployment to be fully audit-able. You want to know exactly who did what and when. If their deployment failed, you want to know that. You want to know why it failed. You want to be able to pick up and try it again, after your DBA found the problem. You want to be able to review yesterday's deployments, and be able to contact Jane and say "hey, I see you hit a problem here; I know what the problem is; you should do this and that, then please try to deploy again".

There's so much more, but I'll stop telling you what you want to have, since there's a good manual available. Continue Reading »

Speaking at Percona Live: common_schema, MySQL DevOps

March 10, 2014

In less than a month I'll be giving these two talks at Percona Live:

If you are still unfamiliar with common_schema, this will make for a good introduction. I'll give you multiple reasons why you would want to use it, and how it would come to immediate use at your company. I do mean immediate, as in previous common_schema presentations I happened to get feedback emails from attendees within the same or next day letting me know how common_schema solved an insistent problem of theirs or how it exposed an unknown status.

I'll review some useful views & routines, and discuss the ease and power of QueryScript. common_schema is a Swiss-knife of solutions, and all from within your MySQL server.

I am using common_schema in production on a regular basis, and it happened to be hero of the day in multiple occasions. I'll present a couple such cases.

This is a technical talk touching at some cultural issues.

At Outbrain, where I work, we have two blessings: a large group of engineers and a large dataset. We at the infrastructure team, together with the ops team, are responsible for the availability of the data. What we really like is technology which lets the owners of a problem be able to recognize it and take care of it. We want ops guys to do ops, and engineers to do engineering. And we want them to be able to talk to each other and understand each other.

What tools can you use to increase visibility? To allow sharing of data between the teams? I'll share some tools and techniques that allow us to automate deployments, detect a malfunctioning/abusing service, deploy schema changes across dozens of hosts, control data retention, monitor connections, and more.

We like open source. The tools discussed are mostly open source, or open sourced by Outbrain.

I'll explain why these tools matter, and how they serve the purpose of removing friction between teams, allowing for quick analysis of problems and overall visibility on all things that happen.

Do come by!

Percona Live - call for "Hall of Shame" talks

March 8, 2014

We've got some spare time on Percona Live during the lightning talks session, and are spontaneously calling for "Hall of Shame" submissions.

What is this about?

We just had a wonderful Reversim Summit a couple weeks back, where we held the "Hall of Shame" session. We are used to hear talks about success stories and great new technologies. Well, this session is your chance to come up and say: "I messed up, and I'm proud of it!"

You will have 3-4 minutes to tell us about how you once accidentally dropped your database; corrupted your data; brought your company's service down. The greater the damage, the greater the appeal! But we're looking for the funny edge - not for a tragedy. There are no slides. Just a "Hall of Shame" screen behind you.

The response we got on Reversim Summit? It was amazing. The audience was literally in tears; there were such hilarious stories that we could hardly keep up. People were spontaneously offering their stories and the organizers had to hold them back.

And yet, you will be telling about your mess up - so please make sure you feel OK about this. For what it's worth, I will contribute my own shameful, shameful story.

So, this is new & experimental for the Percona Live conference, and we don't have many slots. If no one submits - that's OK. If too many submit, we'll have to cut most. As conferences go, we may end up with a last moment open timeslot, so if you're spontaneous that could be your chance.

Ready to submit?

Please send an email to mysql.hallofshame@gmail.com with a brief description of what you want to share. I'll be reviewing these submissions and either approve, reject or hold you on a waiting list. I assume this will go by First Come First Served. The deadline for submissions is Friday, Mar 14th.

mycheckpoint, discontinued

March 6, 2014

Time to admit to myself: mycheckpoint has to be discontinued.

I started mycheckpoint back in 2009, as a free & open source lightweight monitoring tool for MySQL. Over some years it evolved and became an actual (lightweight) monitoring solution, used by many. It has a unique and original design, which, alas, is also its bane.

mycheckpoint uses the relational model & SQL to store and query monitored metrics. This leads to quite a sophisticated service, which can make practically anything visible to the user. The raw data is just numbers. but with some SQL-Fu one can generate charts out of it,  (interactive ones as well), human readable reports and full blown email messages. It is still the only common solution I'm aware of that keeps track of variable changes and provides with clear "what changed, when, from value & to_value". I caught many deployment bugs by just observing this. It's a single file that provides with full blown HTTP service, alerting, mail notifications, multi-database monitoring, custom monitoring queries, query execution time monitoring, OS metrics, ...

While developing mycheckpoint I learned a lot on MySQL status & configuration, complex SQL queries, Python, linux, packaging and more. I got a lot of feedback from users, as I still do (thank you!). Didn't always manage to fix all bugs or answer all questions.

The design of mycheckpoint does not meet today's reality. Heck, today there are more counters & variables than possible table columns. The schema-per-monitored-instance design makes for simplicity, but does not fare well with dozens or hundreds of servers to monitor. There is no cross-instance aggregation or visualization of data. The per-10 minute aggregation is too rough. There isn't a test suite.

Some of the above issues can be fixed, and if you like, the source code is still freely available. I'll even migrate the entire SVN to GitHub at some stage. But I believe the current state might only be good for small scale deployments;  not something you would consider to scale up with.

For me, there's nothing more motivating in code development than knowing the code will go public. The efforts in making the code look as best it can, as easily deployable as possibly can, with good documentation, makes for a lot of effort - but very satisfying. Open Source FTW!!!1

 

The "once and for all" SHOW SLAVE STATUS log files & positions explained

March 6, 2014

True, GTID is upon us whether via MySQL 5.6 or Tungsten Replicator (and wasn't it in Google Patches since 2009?).

But some of us are still using standard replication with MySQL 5.5, and the "what's with all these binary log files and positions" question is ever erupting. The output of SHOW SLAVE STATUS confuses people new to it. It confuses me time and again.

So here's the semi visual guide to interpreting the SHOW SLAVE STATUS.

About binary logs and relay logs

A master writes binary logs. These are typically and conventionally called mysql-bin.##### or mysqld-bin.##### (replace ##### with digits).

A slave connects to its master, and reads entries from the master's binary logs. The slave writes those entries into its own relay logs. These are typically and conventionally called mysql-relay.##### or mysqld-relay.##### (replace ##### with digits).

There is nothing at all that connects the name or number of a slave's relay log with the master's binary log. There is nothing at all that connects the position within the relay log with the position within the master binary log. Files are flushed/rotated; have different size configuration; are re-created. However the slave does keep track on the current relay-log entry: it knows what's the matching entry on the master's binary logs. This is an important piece of information.

While the slave fetches entries and writes them into the relay log (via the IO_THREAD), it also reads the relay log to replay those entries (via the SQL_THREAD).

And so at each point in time we are interested in the following "coordinates":

  • What are we fetching from the master? Which file are we fetching and from which position?
  • Where are we writing this to? (This is implicitly the latest relay log file and its size)
  • What's the position of currently executing slave query, in relay-log coordinates? As the slave lags these coordinates are farther (smaller) than the written-to position.
  • What's the position of currently executing slave query, in master binary-log coordinates? This information really tells us how far apart we are from the master. Continue Reading »

MySQL Community Awards 2014: Call for Nominations!

February 9, 2014

The 2014 MySQL Community Awards event will take place, as usual, in Santa Clara, April 2014, during the Percona Live MySQL Conference & Expo (currently scheduled at Thursday, April 3rd 2014).

The MySQL Community Awards is a community based initiative. The idea is to publicly recognize contributors to the MySQL ecosystem. The entire process of discussing, voting and awarding is controlled by an independent group of community members, typically based of past winners or their representatives, as well as known contributors.

It is a self-appointed, self-declared, self-making-up-the-rules-as-it-goes committee. It is also very aware of the importance of the community; a no-nonsense, non-political, adhering to tradition, self criticizing committee.

The Call for Nominations is open. We are seeking the community’s assistance in nominating candidates in the following categories:

MySQL Community Awards: Community Contributor of the year 2014

This is a personal award; a winner would a person who has made contribution to the MySQL ecosystem. This could be via development, advocating, blogging, speaking, supporting, etc. All things go.

MySQL Community Awards: Application of the year 2014

An application, project, product etc. which supports the MySQL ecosystem by either contributing code, complementing its behaviour, supporting its use, etc. This could range from a one man open source project to a large scale social service.

MySQL Community Awards: Corporate Contributor of the year 2013

A company who made contribution to the MySQL ecosystem. This might be a corporate which released major open source code; one that advocates for MySQL; one that help out community members by... anything.

Continue Reading »

Why delegating code to MySQL Stored Routines is poor engineering practice

February 6, 2014

I happen to use stored routines with MySQL. In fact, my open source project common_schema heavily utilizes them. DBA-wise, I think they provide with a lot of power (alas, the ANSI:SQL 2003 syntax feels more like COBOL than a sane programming language, which is why I use QueryScript instead).

However I wish to discuss the use of stored routines as integral part of your application code, which I discourage.

The common discussion on whether to user or not use stored routines typically revolves around data transfer (with stored routines you transfer less data since it's being processed on server side), security (with stored routines you can obfuscate/hide internal datasets, and provide with limited and expected API) and performance (with MySQL this is not what you would expect, as routines are interpreted & their queries re-evaluated, as opposed to other RDBMS you may be used to).

But I wish to discuss the use of stored routines from an engineering standpoint. The first couple of points I raise are cultural/behavioural.

2nd grade citizens

Your stored routines are not likely to integrate well with your IDE. While your Java/Scala/PHP/Ruby/whatnot code comfortably lies within your home directory, the stored routines live in their own space: a database container. They're not as visible to you as your standard code. Your IDE is unaware of their existence and is unlikely to have the necessary plugin/state of mind to be able to view these.

This leads to difficulty in maintaining the code. People typically resort to using some SQL-oriented GUI tool such as MySQL Workbench, SequelPro or other, commercial tools. But these tools, while make it easy to edit your routine code, do not integrate (well?) with your source control. I can't say I've used all GUI tools; but how many of them will have Git/SVN/Mercurial connectors? How many of them will keep local history changes once you edit a routine? I'm happy to get introduced to such a tool.

Even with such integration, you're split between two IDEs. And if you're the command line enthusiast, well, you can't just svn ci -m "fixed my stored procedure bug". Your code is simply not in your trunk directory.

It can be done. You could maintain the entire routine code from within your source tree, and hats off to all those who do it. Most will not. See later on about deployments for more on this. Continue Reading »

Seconds_behind_master vs. Absolute slave lag

January 24, 2014

I am unable to bring myself to trust the Seconds_behind_master value on SHOW SLAVE STATUS. Even with MySQL 5.5's CHANGE MASTER TO ... MASTER_HEARTBEAT_PERIOD (good thing, applied when no traffic goes from master to slave) it's easy and common to find fluctuations in Seconds_behind_master value.

And, when sampled by your favourite monitoring tool, this often leads to many false negatives.

At Outbrain we use HAProxy as proxy to our slaves, on multiple clusters. More about that in a future post. What's important here is that our decision whether a slave enters or leaves a certain pool (i.e. gets UP or DOWN status in HAProxy) is based on replication lag. Taking slaves out when they are actually replicating well is bad, since this reduces the amount of serving instances. Putting slaves in the pool when they are actually lagging too much is bad as they contain invalid, irrelevant data.

To top it all, even when correct, the Seconds_behind_master value is practically irrelevant on 2nd level slaves. In a Master -> Slave1 -> Slave2 setup, what does it mean that Slave2 has Seconds_behind_master = 0? Nothing much to the application: Slave1 might be lagging an hour behind the master, or may not be replicating at all. Slave2 might have an hour's data missing even though it says its own replication is fine.

None of the above is news, and yet many fall in this pitfall. The solution is quite old as well; it is also very simple: do your own heartbeat mechanism, at your favourite time resolution, and measure slave lag by timestamp you yourself updated on the master.

Maatkit/percona-toolkit did this long time ago with mk-heartbeat/pt-heartbeat. We're doing it in a very similar manner. The benefit is obvious. Consider the following two graphs; the first shows Seconds_behind_master, the seconds shows our own Absolute_slave_lag measurement. Continue Reading »

 
Powered by Wordpress and MySQL. Theme by openark.org