Opinions – code.openark.org
Blog by Shlomi Noach
http://shlomi-noach.github.io/blog/

Three wishes for a new year
https://shlomi-noach.github.io/blog/mysql/three-wishes-for-a-new-year-4
Wed, 28 Sep 2016

(Almost) another new year by Jewish calendar. What do I wish for the following year?

  1. World peace
  2. Good health to all
  3. Relaxed GTID constraints

I’m still not using GTID, and still see operational issues with working with GTID. As a recent example, our new schema migration solution, gh-ost, allows us to test migrations in production, on replicas. The GTID catch? gh-ost has to write something to the binary log. It thus “corrupts” the replica with a bogus GTID entry that will never be found on any other server, making said replica unsafe to promote. We can work around this, but…
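
Here is a minimal sketch of how one might spot such an errant entry (the GTID set strings are placeholders to fill in), by comparing the replica’s executed GTID set against its master’s:

-- on the replica, and again on the master, capture the executed set:
SELECT @@global.gtid_executed;

-- any GTIDs present on the replica but absent on the master are "errant"
-- and make that replica unsafe to promote:
SELECT GTID_SUBTRACT('<replica gtid_executed>', '<master gtid_executed>') AS errant_gtids;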

I understand the idea and need for the Executed GTID Set. It will certainly come in handy with multi-writer InnoDB Cluster. However, for most use cases GTID poses a burden. The reason is that our topologies are imperfect, we as humans are imperfect, and operations are most certainly imperfect. We may wish to operate on a replica: test something, whether by intention or by mistake. We may wish to use a subchain as the seed for a new cluster split. We may wish to be able to write to downstream replicas. We may use a 3rd party tool that issues a flush tables with read lock without disabling sql_log_bin. Things just happen.

For that, I would like to suggest GTID control levels, such as:

  1. Strict: same as Oracle’s existing implementation. Executed sets, purged sets, whatnot.
  2. Last executed: a mode where the only thing that counts is the last executed GTID value. If I repoint a replica, all it needs to check is “hey, this is my last executed GTID entry; give me the coordinates of yours. And, no, I don’t care about comparing executed and purged sets, I will trust you and keep running from that point on”
  3. Declarative: GTIDs are generated, are visible in each and every binary log entry, but are completely ignored.

I realize Oracle MySQL GTID has been out for over three years now, but I’m sorry – I still have reservations and see use cases where I fear it will not serve me right.

How about my previous years’ wishes? World peace and good health never came true, however:

  • My 2015 wish for “decent, operations friendly built in online table refactoring” was unmet; however, gh-ost is a thing now and exceeds my expectations. No, really. Please come see Tom & myself present gh-ost and how it changed our migration paradigm.
  • My 2012 wish for “decent, long waited for, implementation of Window Functions (aka Analytic Functions) for MySQL” was met by MariaDB’s window functions.
    Not strictly Window Functions, but Oracle MySQL 8.0 will support CTEs (hierarchical/recursive), worth a mention.

See you in Amsterdam!

Thoughts on MaxScale automated failover (and Orchestrator)
https://shlomi-noach.github.io/blog/mysql/thoughts-on-maxscale-automated-failover-and-orchestrator
Wed, 18 Nov 2015

Having attended a talk (as part of the MariaDB Developer Meeting in Amsterdam) about recent developments of MaxScale in executing automated failovers, here are some (late) observations of mine.

I will begin by noting that the project is stated to be pre-production, and so of course none of the below are complaints, but rather food for thought, points for action and otherwise recommendations.

Some functionality of the MaxScale failover is also implemented by orchestrator, which I author. Orchestrator was built in production environments by and for operational people. In this respect it has gained many insights and had to cope with many real-world cases, special cases & Murphy’s law cases. This post compares logic, feature set and capabilities of the two where relevant. To some extent the below will read as “hey, I’ve already implemented this; shame to re-implement the same”, and indeed I think that way; but it wouldn’t be the first time code of mine was re-implemented by someone else, and I’ve done the same myself.

I’m describing the solution the way I understood it from the talk. If I’m wrong on any account I’m happy to be corrected via comments below. Edit: please see comment by Dipti Joshi

General overview

The idea is that MaxScale operates as a proxy to your topology: you do not connect to your master directly, but rather through MaxScale, which acts as a proxy to your master.

The next phase is that MaxScale would also auto-detect master failure, fix the topology for you, promote a new master, and keep your application unaware of all the complexity, without the app having to change setup/DNS/whatever. Of course some write downtime is implied.

Now for some breakdown.

Detection

The detection of a dead master, the check by which a failover is initiated, is based on MaxScale not being able to query the master. This calls for some points for consideration:

  • Typically, I would see “I can’t connect to the master therefore failover” as too hysterical, and the basis for a lot of false positives.
  • However, since in the discussed configuration MaxScale is the only access point to the master, the fact that MaxScale cannot connect to the master means the master is de facto inaccessible.
  • In light of the above, the decision makes sense – but I still hold that it would produce false positives.
  • I’m unsure (I think not; can anyone comment?) if MaxScale would make multiple attempts over time and only reach the conclusion after X successive failures. This would reduce the false positives.
  • I have a growing dislike for the “check 4 successive times, then alert/failover” Nagios-style behavior. Orchestrator takes a different approach: it recognizes a master’s death by not being able to connect to the master, while also being able to connect to its 1st tier slaves, check their status, and observe that they are unable to connect to the master either. See What makes a MySQL server failure/recovery case?. This approach still calls for further refinement (what if the master is temporarily deadlocked? Is this a failover or not?).

Assumed topology

MaxScale assumes the topology is all MariaDB, and that all slaves are using (MariaDB) GTID replication. Well, MaxScale does not actually assume that; it is assumed by the MariaDB Replication Manager, which MaxScale invokes. But I’m getting ahead of myself here.

Topology detection

MaxScale does not recognize the master by configuration but rather by state. It observes the servers it should observe, and concludes which is the master.

I use a similar approach in orchestrator. I maintain that this approach works well and opens the Chakras for complex recovery options.

Upon failure detection

When MaxScale detects failure, it invokes external scripts to fix the problem. There are some similar and different particulars here as compared to orchestrator, and I will explain what’s wrong with the MaxScale approach:

  • Although MaxScale observes the topology and understands who is the master and who isn’t, the executed scripts do not. They need to re-discover everything by themselves.
  • This implies the scripts start without memory of “what was last observed”. This is one of the greatest strengths of orchestrator: it knows what the state was just before the failure, and, having the bigger picture, can make informed decisions.
    • As a nasty example, what do you do when some of the first tier slaves also happen to be inaccessible at that time? What if one of those happens to further have slaves of its own?
  • The MariaDB Replication Manager script (to be referenced as repmgr) assumes all instances to be MariaDB with GTID.
    • It is also implied that all my slaves are configured with binary logs & log-slave-updates
    • That’s way too restrictive.
      • Orchestrator handles all following topologies: Oracle MySQL with/out GTID, MariaDB with/out GTID, MariaDB hybrid GTID & non-GTID replication, Pseudo-GTID (MySQL and/or MariaDB), hybrid normal & binlog servers topologies, slaves with/out log-slave-updates, hybrid Oracle & MariaDB & Binlog Servers & Pseudo-GTID.
  • repmgr is unaware of data centers & physical environments. You want failover to be as local to your datacenters as possible. Avoid too many cross-DC replication streams.

Failover invocation

MaxScale invokes the failover scripts asynchronously. This is a major flaw imho, as the decoupling between the monitoring and acting processes leads to further problems; see below.

After failover

MaxScale continuously scans the topology and observes that some other server has been promoted. This behavior is similar to orchestrator’s. But the following differences are noteworthy:

  • Because of both the decoupling as well as the asynchronous invocation by MaxScale, it doesn’t really have any idea if and how the promotion resolved.
  • I don’t know that there’s any anti-flapping mechanism, nor that there could be. If MaxScale doesn’t care what happened to the failover script, it cannot be expected to cope with flapping scenarios.
  • Nor is there a minimal suspend period between any two failure recoveries, that I know of. MaxScale can actually have an easier life than orchestrator in this regard, as it is (I suspect) strictly associated with a single topology; it’s not like there’s a single MaxScale handling multiple topologies. So it should be very easy to keep track of failures.
  • Or, if there is a minimal period and I’m just uninformed — what makes sure it is not smaller than the time it takes for the failover?

Further on failover

I wish to point out that one component of the system analyses a failure scenario, and another one fixes it. I suggest this is an undesirable design. The “fixer” must have its own ability to diagnose problems as it makes progress (or else it is naive and would fail in many production cases), and the “analyzer” part must have some wisdom of its own, so as to suggest a course of action or to understand the consequences of the recovery done by the “fixer”.

Use of shell scripts

Generally speaking, the use of shell scripts as external hooks is evil:

  • Shell scripts tend to be poorly audited
  • With poor clarity as for what went wrong
  • Killing them is operationally difficult (detecting the shell script, finding possible children, handling detached children)
  • The approach of “if you want something else, just write a shell script for it” is nice for some things, but as the problem turns complex, you end up writing big parts of the solution in shell. This decouples your code to an unwanted degree.

At this time, orchestrator also uses external hooks. However:

  • Fixing the topology happens within orchestrator, not by external scripts. There is an elaborate, auditable, visible decision making.
    • Decision making includes data center considerations, different configuration of servers involved, servers hinted as candidates, servers configured to be ignored, servers known to be downtimed.
  • The external scripts are left with the task of managing DNS changes or what have you.
    • Today, at Booking.com, we have a special operational tool (called the dba tool) which does that, manages rosters, issues puppet etc. This tool is itself well audited. Granted, there is still decoupling, but information does not just get lost.
    • Sometime in the future I suspect I will extend orchestrator-agent to participate in failovers, which means the entire flow is within orchestrator’s scope.

High availability

All the above is only available via a single MaxScale server. What happens if it dies?

There is a MaxScale/pacemaker setup I’m aware of. If one MaxScale dies, pacemaker takes charge and starts another on another box.

  • But this means real downtime
  • There are no multiple-MaxScale servers to load-balance on
  • The MaxScale started by pacemaker is newly born, and does not have the big picture of the topology. It needs to go through a “peaceful time” to understand what’s going on.

More High Availability

At a time when MaxScale is able to load-balance and run on multiple nodes, MariaDB will have to further tackle:

  • Leader election
  • Avoiding concurrent initiation of failovers
    • Either via group communication
    • Or via shared datastore
  • Taking over a failed/crashed MaxScale server’s work
    • Or rolling it back
    • Or just cleaning it up
  • And generally share all those little pieces of information, such as “Hey, now this server is the master” (are all MaxScales in complete agreement on the topology?) or “I have failed over this topology, we should avoid failing it over again for the next 10 minutes” and more.

The above are supported by orchestrator. It provides leader election, automated leader promotion, fair recognition of various failure scenarios, picking up a failed recovery from a failed orchestrator. Data is shared by a backend MySQL datastore, and before you shout SPOF, make it Galera/NDB.

Further little things that can ruin your day

How about having a delayed replica?

Here’s an operational use case we had to tackle.

  • You have a slave configured to lag by 24 hours. You know the drill: hackers / accidental DROP TABLE
  • How much time will an automated tool spend on reconnecting this slave to the topology?
    • This could take long minutes
    • Will your recovery hang till this is resolved?
  • Since orchestrator heals the topology in-house, it knows how to defer certain operations until after specific other operations have taken place. For example, orchestrator wants to heal the entire topology, but pushes the delayed replicas aside, under the assumption that it will be able to fix them later (a fair assumption, because they are known to be behind our promoted master); it will proceed to fix everything else, execute external hooks (change DNS etc.) and only then come back to the delayed replica. All the while, the process is audited.
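
For reference, such a delayed replica is typically configured with MySQL 5.6+ delayed replication; a minimal sketch, with the 24-hour figure purely illustrative:

STOP SLAVE SQL_THREAD;
CHANGE MASTER TO MASTER_DELAY = 86400;  -- lag by 24 hours, expressed in seconds
START SLAVE SQL_THREAD;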

Flapping ruins your day

  • Not only do you want some stall period between two failovers, you also want your team to respond to a failover and acknowledge it. Or clear up the stall period having verified the source of the problem. Or force the next failover even if it comes sooner than the stall period termination.

Binlog formats

It is still not uncommon to have Statement Based Replication running. And then it is also not uncommon to have one or two slaves translating to Row Based Replication because of:

  • Some app that has to read ROW based format
  • Experimenting with RBR for purposes of upgrade

You just can’t promote such an RBR slave on top of SBR slaves; it wouldn’t work. Orchestrator is aware of such rules. I still need to integrate this particular consideration into the promotion algorithm.

Versions

Likewise, not all your slaves are of the same version. You should not promote a newer-version slave on top of an older-version slave. Again, orchestrator will not allow building such a topology, and again, I still need to integrate this consideration into the promotion algorithm.

In summary

There is a long way for MaxScale failover to go. When you consider the simplest, all-MariaDB-GTID-equal-slaves small topology case, things are kept simple and probably sustainable. But issues like complex topologies, flapping, special slaves, different configuration, high availability, leadership, acknowledgements, and more, call for a more advanced solution.

New statements I’d like to see in MySQL 5.8
https://shlomi-noach.github.io/blog/mysql/new-statements-id-like-to-see-in-mysql-5-8
Thu, 08 Oct 2015

Following up on New features I’d like to see in MySQL 5.8, here are statements I would like to see in MySQL 5.8:

  • ENABLE EVENTS;
    When promoting a slave to master, I want to be able to enable all those events that are in SLAVESIDE_DISABLED state. Today I script an iteration over the events and construct the ALTER EVENT…ENABLE statements one by one (see the sketch after this list). Just activate those!
  • SKIP GTID TRANSACTION;
    I have a transaction on a slave that I want to skip, and GTID is in use. The sequence of

    STOP SLAVE;
    SET GTID_NEXT="...";
    BEGIN;
    COMMIT;
    SET GTID_NEXT="AUTOMATIC";
    START SLAVE;

    is just something I don’t want to do. Computing GTID_NEXT; opening a transaction; using session variables: it may seem straightforward to issue the above from a shell script, but calculating the next GTID is not entirely trivial, and issuing the above from your programming language makes for a weird “all of these have to be in the same session AND you’re going to run a transaction meanwhile”. With golang it’s actually a problem.
    Make it simple for me. I’m willing to do the STOP/START SLAVE.

  • BINLOG ENTRY ‘<arbitrary text>’;
    Very selfishly, I want to be able to inject a comment into the binary log, of arbitrary text. I want this comment to appear in SBR format, as if it were a DDL.
    My selfish reason: injection of Pseudo-GTID. But I can see various other use cases, such as application level injection of app-logic checkpointing; chef/glu injection of “code deployed at this time”; application injection of “daily audit done to this point”. This is too cool and too easy to skip.
  • SHOW RELAY LOGS;
    Similar to SHOW BINARY LOGS;
  • PURGE RELAY LOGS TO ‘…’;
    Similar to PURGE BINARY LOGS TO ‘…’;
    It’s time relay logs stopped being 2nd class citizens.
  • SHOW NONBLOCKING [GLOBAL|SESSION] STATUS;
    Issue a SHOW GLOBAL|SESSION STATUS query that only shows those variables for which it does not need to block. i.e. this is a safe, fast “show me everything you’ve got that I won’t need to pay for”.
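
As referenced in the ENABLE EVENTS item above, here is a minimal sketch of that workaround: generate the statements, then feed the output back to the server (schema and event names are whatever your server happens to contain):

SELECT CONCAT('ALTER EVENT ', EVENT_SCHEMA, '.', EVENT_NAME, ' ENABLE;')
FROM information_schema.EVENTS
WHERE STATUS = 'SLAVESIDE_DISABLED';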

Yes, yes, statements are also features, I know.

 

New features I’d like to see in MySQL 5.8
https://shlomi-noach.github.io/blog/mysql/new-features-id-like-to-see-in-mysql-5-8
Wed, 07 Oct 2015

Following up on Morgan Tocker’s What would you like to see in MySQL 5.8?, having attended and participated at the brainstorming at Percona Live Amsterdam, and publishing this post while failing to comply with any of Morgan’s suggested media, these are the features I would like to see in MySQL 5.8:

  • Dynamically enable/disable log-bin and log-slave-updates
    Today, when changing the chef/puppet role of a server from a simple slave to an intermediate master and vice versa, a MySQL restart is required. This is a very big pain which makes replication automation complex, not to mention warmup times.
  • “nice” queries
    I want to be able to execute a query that is nice, i.e. has lower priority; will not consume all resources; will stall/throttle so as to allow other queries to complete. Luis asked and I said this could be on a per-statement basis, e.g. add a SQL_NICE query hint. But I’m unsure that would be the correct behavior. It also makes sense to do so on a per-connection basis (perhaps provide a connection attribute to hint niceness?).
  • Online-ier ALTER TABLE. I would in particular want it to apply the nice feature, above. Otherwise throttle by user defined metrics.
  • Online-ier ALTER TABLE in replication stream.  Can the slaves run the ALTER statement in parallel?
  • Re-Group Commit: in MTS, and when intermediate masters are involved, copy+paste the group commit grouping as applied on the master when applying downstream. I suspect this is easily achievable. The result: same parallelism for replicating slaves at all levels, whether they replicate directly from the master or from 2nd, 3rd tier intermediate masters. Today parallelism decreases as we go downstream.
  • Global user-defined-variables. I want to be able to define arbitrary (global) variables that I can later query via SELECT @@global.arbitrary. This would be similar to HTML 5‘s “data-*” attributes. I often wish I could tell & ask MySQL my puppet role; or the server status (is it live? Is it offline? Does it belong to a specific pool? etc.). Similar to “loose-*” syntax, this could be a “data-*” or “user-*” name prefix system.

I will follow up on new statements I would like to see in MySQL 5.8.

The brainstorming session at PerconaLive, I should note, was pure joy, and apart from getting two nice furry dolphins I enjoyed the engagement, the diversity of ideas, and the fact Oracle engineers (Mark in particular) were very busy taking notes or otherwise openly discussing the viability of some requested features.

 

Baffling 5.7 global/status variables issues, unclean migration path
https://shlomi-noach.github.io/blog/mysql/baffling-5-7-globalstatus-variables-issues-unclean-migration-path
Fri, 07 Aug 2015

MySQL 5.7 introduces a change in the way we query for global variables and status variables: the INFORMATION_SCHEMA.(GLOBAL|SESSION)_(VARIABLES|STATUS) tables are now deprecated and empty. Instead, we are to use the respective performance_schema.(global|session)_(variables|status) tables.
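
By way of illustration (the variable picked here is arbitrary), a query that used to read:

SELECT * FROM information_schema.GLOBAL_VARIABLES WHERE VARIABLE_NAME = 'tx_isolation';

now has to become:

SELECT * FROM performance_schema.global_variables WHERE VARIABLE_NAME = 'tx_isolation';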

But the change goes further than that; there is also a security change. Oracle created a pitfall of two changes at the same time:

  1. Variables/status moved to a different table
  2. Privileges required on said table

As an example, my non-root user gets:

mysql> show session variables like 'tx_isolation';
ERROR 1142 (42000): SELECT command denied to user 'normal_user'@'my_host' for table 'session_variables'

Who gets affected by this? Nearly everyone and everything.

  • Your Nagios will not be able to read status variables
  • Your ORM will not be able to determine session variables
  • Your replication user will fail connecting (see this post by Giuseppe)
  • And most everyone else.

The problem with the above is that it involves two unrelated changes to your setup, which are not entirely simple to coordinate:

  1. Change your app code to choose the correct schema (information_schema vs. performance_schema)
  2. GRANT the permissions on your database

Perhaps at this point you still do not consider this to be a problem. You may be thinking: well, let’s first prepare by creating the GRANTs, and once that is in place, we can, at our leisure, modify the code.

Not so fast. Can you really that simply create those GRANTs?

Migration woes

How do you migrate to a new MySQL version? You do not reinstall all your servers. You want an easy migration path, and that path is: introduce one or two slaves of a newer version, see that everything works to your satisfaction, slowly upgrade all your other slaves, eventually switchover/upgrade your master.

This should not be any different for 5.7. We would like to provision a 5.7 slave in our topologies and just see that everything works. Well, we have, and things don’t just work. Our Nagios stops working for that 5.7 slave. Orchestrator started complaining (by this time I’ve already fixed it to be more tolerant for the 5.7 problems so no crashes here).

I hope you see the problem by now.

You cannot issue a GRANT SELECT ON performance_schema.global_variables TO ‘…’ on your 5.6 master.

The table simply does not exist there, which means the statement will not go to binary logs, which means it will not replicate on your 5.7 slave, which means you will not be able to SHOW GLOBAL VARIABLES on your slave, which means everything remains broken.

Yes, you can issue this directly on your 5.7 slaves. It’s doable, but undesired. It’s ugly in terms of automation (and will quite possibly break some assumptions and sanity checks your automation uses); in terms of validity testing. It’s unfriendly to GTID (make sure to SET SQL_LOG_BIN=0 before that).
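
For completeness, the direct-on-the-5.7-slave route would look roughly like this; a sketch only, with the account and grant scope illustrative:

-- on each 5.7 slave, locally and without writing to the binary log:
SET SESSION SQL_LOG_BIN = 0;
GRANT SELECT ON performance_schema.* TO 'normal_user'@'my_host';
SET SESSION SQL_LOG_BIN = 1;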

WHY in the first place?

It seems like a security thing. I’m not sure whether this was intended. So you prevent a SHOW GLOBAL VARIABLES for a normal user. Makes sense. And yet:

mysql> show global variables like 'hostname';
ERROR 1142 (42000): SELECT command denied to user 'normal_user'@'my_host' for table 'global_variables'

mysql> select @@global.hostname;
+---------------------+
| @@global.hostname   |
+---------------------+
| myhost.mydomain.com |
+---------------------+

mysql> select @@version;
+--------------+
| @@version    |
+--------------+
| 5.7.8-rc-log |
+--------------+

Seems like I’m allowed access to that info after all. So it’s not strictly a security design decision. For status variables, I admit, I don’t have a similar workaround.

Solutions?

The following are meant to be solutions, but do not really solve the problem:

  • SHOW commands. SHOW GLOBAL|SESSION VARIABLES|STATUS will work properly, and will implicitly know whether to provide the results via information_schema or performance_schema tables.
    • But, aren’t we meant to be happier with SELECT queries? So that I can really do stuff that is smarter than LIKE ‘variable_name%’?
    • And of course you cannot use SHOW in server side cursors. Your stored routines are in a mess now.
    • This does not solve the GRANTs problem.
  • show_compatibility_56: an introduced variable in 5.7, boolean. It truly is a time-travel-paradox novel in disguise, in multiple respects.
    • Documentation introduces it, and says it is deprecated.
      • time-travel-paradox :O
    • But it actually works in 5.7.8 (latest)
      • time-travel-paradox plot thickens
    • Your automation scripts do not know in advance whether your MySQL has this variable
      • Hence SELECT @@global.show_compatibility_56 will produce an error on 5.6
      • But the “safe” way of SHOW GLOBAL VARIABLES LIKE ‘show_compatibility_56’ will fail on a privilege error on 5.7
      • time-travel-paradox :O
    • Actually advised by my colleague Simon J. Mudd, show_compatibility_56 defaults to OFF. I support this line of thought. Or else it’s old_passwords=1 all over again.
    • show_compatibility_56 doesn’t solve the GRANTs problem.
    • This does not solve any migration path. It just postpones the moment when I will hit the same problem. When I flip the variable from “1” to “0”, I’m back at square one.

Suggestion

I claim security is not the issue, as presented above. I claim Oracle will yet again fall into the trap of no-easy-way-to-migrate-to-GTID in 5.6 if the current solution is unchanged. I claim that there have been too many changes at once. Therefore, I suggest one of the following two alternative flows:

  1. Flow 1: keep information_schema, later migration into performance_schema
    • In 5.7, information_schema tables should still produce the data.
    • No security constraints on information_schema
    • Generate WARNINGs on reading from information_schema (“…this will be deprecated…”)
    • performance_schema also available. With security constraints, whatever.
    • In 5.8 remove information_schema tables; we are left with performance_schema only.
  2. Flow 2: easy migration into performance_schema:
    • In 5.7, performance_schema tables should not require any special privileges. Any user can read from them.
    • Keep show_compatibility_56 as it is.
    • SHOW commands choose between information_schema or performance_schema on their own — just as things are done now.
    • In 5.8, performance_schema tables will require SELECT privileges.

As always, I love the work done by the engineers; and I love how they listen to the community.

Comments are most welcome. Have I missed the simple solution here? Are there even more complications to these features? Thoughts on my suggested two flows?

[UPDATE 2015-08-19]

Please see this followup by Morgan Tocker of Oracle.

Percona Live 2015: Reflections
https://shlomi-noach.github.io/blog/mysql/percona-live-2015-reflections
Sat, 18 Apr 2015

Some personal reflections on PerconaLive 2015:

Percona acquires Tokutek

Well done! Tokutek develops the TokuDB storage engine for MySQL and TokuMX engine for MongoDB. I will discuss the MySQL aspect only.

TokuDB was released as open source in 2013. It has attained a lot of traction and I have used it myself for some time. I met issues with locking and other operational difficulties, which I reported, and otherwise was fascinated by such features as great compression, online schema changes, and more.

Recently another company, InfiniDB, that also released its MySQL-backed codebase as open source, went out of business. I was afraid the same might happen to Tokutek.

I see Percona’s purchase as a very good move for the community. I have seen a lot of TokuDB interest at Percona for some time now, and it is clearly interested in the technology. I expect they will bring their own hands-on experience into the development of more operations-friendly features and put effort into solving locking issues (it’s been a while since I last checked, and of course some of these may have been addressed by now). I am guessing they will work on a Galera/TokuDB integration and offer a “Toku-XtraDB-Cluster”.

TokuDB can compete with InnoDB in many places, while in others each will have its distinct advantage.

I see this as good news for the community.

Community Awards and Lightning Talks

On a completely different subject, I believe it is commonly accepted that this year’s setup for the community awards & lightning talks was unsuccessful. The noise was astounding, human traffic was interrupting and overall this was a poor experience. We (Giuseppe Maxia, Kortney Runyan & myself) held a quick, informal brainstorming session and came up with a couple of ideas, one of which we hope to try at the upcoming Percona Live Europe – Amsterdam.

We apologize to the speakers for the difficulties.

Percona Live Europe – Amsterdam

Haha! Having recently relocated to the Netherlands I’m of course very happy. But regardless, Percona Live London was fun – and yet running on low fuel. I think it was a great idea to change location (and more locations expected in the future). This is the path taken by such conferences as OSCon, Velocity, Strata and more. Amsterdam in particular, as I’ve recently learned, is especially appreciated by many. I think this conf will do great!

Woz

And now for something completely different. Woz’ talk was that. I’m happy he came; I appreciate that he discussed education; and it was fun.

Percona Live 2015: Reflections; the Apache CCLA offer
https://shlomi-noach.github.io/blog/mysql/percona-live-2015-reflections-the-apache-ccla-offer
Sat, 18 Apr 2015

Facebook, Google, Twitter, LinkedIn, Alibaba, MariaDB, Percona team up and offer Oracle all public changes under the Apache CCLA

Read again please.

My one word summary of this is: Romantic. In the most positive sense.

Disclaimer: I am not a lawyer; this is my understanding of the current status and of the offer.

Summarizing the deal: the teams participating with WebScaleSQL would like to push code upstream. Current legal issues limit their options. Existing patches/contributions from Percona & MariaDB are licensed by GPLv2, which Oracle cannot import as it distributes a commercial, closed source, edition, in addition to its open source MySQL community edition.

So what happens is that there is a lot of free code, great patches, new features out there, that are only available via MariaDB or WebscaleSQL or Percona Server, but not in the Oracle MySQL code base. This, in turn, means Oracle re-implements many features originating from said companies. And, more importantly, said companies need to routinely rebase their code on new Oracle releases, repeating tedious work.

The offer is that Oracle agrees to the Apache CCLA as a license by which it would be able to incorporate contributions. Oracle would then be able to use incorporated code in both the open source and commercial editions. Oracle will choose what code to incorporate; hopefully many patches will be accepted upstream, and the community will benefit from a rich featureset and a rapidly developed MySQL server.

Clearly a lot of work, persuasion, lawyer time, discussions etc. have been invested in this effort. I would like to add my humble +1/like/favorite/whathaveyou. You may add yours by letting Oracle know your opinion on the subject. Media tools are great for this.

 

 

Three wishes for a new year
https://shlomi-noach.github.io/blog/mysql/three-wishes-for-a-new-year-3
Tue, 23 Sep 2014

Another new year by Jewish calendar. What do I wish for the following year?

  1. World peace
  2. Good health to all
  3. Multi-core execution for queries

After having evaluated a few columnar databases, and having seen how a single query gets 24 cores busy, I can’t look at MySQL the same way again. The fact that a single query consumes a single core only doesn’t seem right in the year 2014. Shard-query is a cool application-level attempt to solve the above; I would like to see stuff like this implemented inside the server (or inside the storage engine where possible).

None of my wishes in previous years [2010], [2011], [2012], [2013] came true (and things have mostly gone worse). I’m still willing to settle for two out of three.

Some anecdotes I learned at Percona Live
https://shlomi-noach.github.io/blog/mysql/some-anecdotes-i-learned-at-percona-live
Tue, 08 Apr 2014

While on the plane back home I wrote down all my impressions from Percona Live 2014. Have lots of TODOs and great ideas to implement. Among all my impressions, there were a few anecdotes worth noting.

  • 5.6 GTID is still unfriendly. It will require complete shutdown & reconfiguration of your entire replication topology; and some companies present hacks around this. Notably, Facebook recoded GTID-related code (a slave agrees to replicate with GTID even though its master still uses binlog coordinates). Booking.com have their own hack around slowly migrating their topologies. And in a great lightning talk we were shown how to patch MySQL such that the relay logs turn into a consistent GTID-like coordinate system.
  • Galera replication has been implemented for TokuDB (only active-passive mode, not active-active). This came as a surprise to Tokutek; apparently Codership did this within a few hours of work. The rest is up to Tokutek!
  • WebScaleSQL is a cool initiative that aims to assist in pushing commonly desired features back upstream. It is Web Scale. It is also not a production distribution and I do not expect it to be. It is not a fork that is meant for the common DBA to download and deploy.
  • Tungsten replicator has MySQL to Hadoop replication using staging tables – an auditing of changes that are group-deployed on Hadoop.
  • Still so many people unfamiliar with MySQLSandbox. It’s such a basic tool, especially for testing and local installations.
  • Still misconceptions about common_schema. Yes, I do use it on production.
  • Everyone has the same problems 🙂
  • Replication is still queen of MySQL’s featureset. We’re still all about failing over, promoting, scaling via replication.
  • Linux rules. Where two MacBooks failed to connect to the projector, my Lenovo/Ubuntu Linux did the job just fine and saved the day.
Why delegating code to MySQL Stored Routines is poor engineering practice
https://shlomi-noach.github.io/blog/mysql/why-delegating-code-to-mysql-stored-routines-is-poor-engineering-practice
Thu, 06 Feb 2014

I happen to use stored routines with MySQL. In fact, my open source project common_schema heavily utilizes them. DBA-wise, I think they provide a lot of power (alas, the ANSI:SQL 2003 syntax feels more like COBOL than a sane programming language, which is why I use QueryScript instead).

However I wish to discuss the use of stored routines as integral part of your application code, which I discourage.

The common discussion on whether or not to use stored routines typically revolves around data transfer (with stored routines you transfer less data, since it’s being processed on the server side), security (with stored routines you can obfuscate/hide internal datasets, and provide a limited and expected API) and performance (with MySQL this is not what you would expect, as routines are interpreted & their queries re-evaluated, as opposed to other RDBMSs you may be used to).

But I wish to discuss the use of stored routines from an engineering standpoint. The first couple of points I raise are cultural/behavioural.

2nd class citizens

Your stored routines are not likely to integrate well with your IDE. While your Java/Scala/PHP/Ruby/whatnot code comfortably lies within your home directory, the stored routines live in their own space: a database container. They’re not as visible to you as your standard code. Your IDE is unaware of their existence and is unlikely to have the necessary plugin/state of mind to be able to view these.

This leads to difficulty in maintaining the code. People typically resort to using some SQL-oriented GUI tool such as MySQL Workbench, SequelPro or other, commercial tools. But these tools, while they make it easy to edit your routine code, do not integrate (well?) with your source control. I can’t say I’ve used all GUI tools; but how many of them have Git/SVN/Mercurial connectors? How many of them keep local history changes once you edit a routine? I’m happy to get introduced to such a tool.

Even with such integration, you’re split between two IDEs. And if you’re the command line enthusiast, well, you can’t just svn ci -m “fixed my stored procedure bug”. Your code is simply not in your trunk directory.

It can be done. You could maintain the entire routine code from within your source tree, and hats off to all those who do it. Most will not. See later on about deployments for more on this.

Testing

While engineers are keen on writing unit tests for every class and method they create, they are less keen on doing the same for stored routines. This is an observation, having seen many instalments. And I can tell you why: your stored routine testing will not integrate well with your JUnit/PHPUnit/…

There are testing frameworks for databases, and indeed I hacked my own mini unit testing code with common_schema. But it’s a different testing framework. You might also have realized by now that testing databases is somewhat different. It can be done, and hats off again to those that implement it as common practice. Many don’t. Databases are often more heavyweight to test. Not all operations done by routines are easily rolled back, which leads to having to rebuild the entire dataset before tests. This in itself leads to longer test periods and a need for multiple test databases so as to allow for concurrent builds.

How many companies practice both version control and unit testing over their routine code? I believe not many (and am happy to hear about those who do). To be more direct, of all the companies I ever consulted to: I have never seen one that does both.

Debugging

MySQL stored routines do not have built in debugging capabilities. To debug your routines, you will have to use one of two methods:

  • Simulate your routine code (i.e. mimic its execution on top of some interpreter). There are tools to do that. For me this is a complete NO GO and utterly untrustworthy. You can mimic what you think is how the routine should behave, but never the full behaviour. While developing common_schema I came upon plenty of weird behaviour, some of it bugs, that you just can’t build into your emulation.
  • Inject debugging code into your routine code. I do that with RDebug. You can do breakpoints, step into, step out, most of the interesting stuff. Other tools do that as well. It is not the right way to go: you’re essentially modifying your code, placing more locks, communicating, and losing some functionality. It is a necessary evil solution for a necessary evil programming method… How good can that be?

The right way to go would be to have debugging API built into the MySQL server.

But, wait, that would still be next to worthless, since our discussion is about programming with stored routines: letting your application call upon stored routines in your database. Until the day when I can use my IntelliJ debugger to step from my Java method which calls upon a stored procedure into the stored procedure itself, debugging your code is completely detached from your stored routine debugging.

Refactoring & deploying

Say you wanted to add a column to your table: you would go ahead and add it, and perhaps populate it. You would then modify your application code to support this new column, and deploy. Say you wanted to drop a table column. You would first deploy changes to your application code that ignore said column, and once the code is in place you would go and actually make the DROP.

How do you do the same with a stored routine? Suppose your routine accepts two parameters, and you wish to add a third?

There is no support for optional parameters. Your routine either accepts two parameters or three. Your application code will have to provide the exact number of parameters. You will have to deploy both your SQL changes and your application changes at the same time. This is by definition impossible, unless you are OK with a stop the world approach, which is unlikely in production.
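
To make the point concrete, here is a hypothetical sketch (routine, table and parameter names are invented for illustration); adding a third parameter in place would break every existing two-argument call, so a new routine name, deployed alongside the old one, is the only smooth path:

CREATE PROCEDURE count_orders(IN in_customer_id INT, OUT out_order_count INT)
  SELECT COUNT(*) INTO out_order_count FROM orders WHERE customer_id = in_customer_id;

-- no optional parameters: to add a "since" argument you would create, say,
-- count_orders_since(in_customer_id, in_since, out_order_count), migrate all
-- callers, and remember to eventually DROP PROCEDURE count_orders.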

Code constraints

One solution to the above is to create a new routine: somehow “overload” it. But you can’t overload a stored routine; you’ll have to create a routine by a new name. This will allow you to slowly and smoothly migrate between the two.

Ahem, smoothly? How easy is it to find all invocations of a certain routine in your code? They will typically lie in some String, or within some XML config file. There is no safe “find references to this procedure” IDE mechanism. There is no constraint in your IDE that will tell you “there is no such procedure” if you misspell the name.

Trash bin

Suppose you overcame the above. You now have two routines. You need to remember to DROP the old one, right? Will you?

When presenting common_schema, a common question I ask the audience is as follows:

Suppose I accessed your database and listed the entire set of stored functions and procedures. How many of them are you not even sure are still in use? How many of them do you think you can DROP, but are too afraid to, and keep around just in case?

I wouldn’t commonly ask that question had it not always produced a round of nodding and smiling in the audience. People forget to drop their routines, and then forget about them, and are never sure whether they are used (your IDE doesn’t easily tell you that, remember? Sure, you can grep around; that’s not what most engineers would do). And those routines pile up to become trash.

Data or code?

Last but not least: a stored routine is a piece of code, right? Well, as far as the database is concerned, it’s really a piece of data. It’s located within a schema. It’s stored. It is an integral part of your data set: when you back up your data, you’re most likely to backup the code as well. When you restore, you’re likely to restore both. There are obvious advantages to that, DB-wise. Or should I say, DBA-wise. Engineering-wise? Does a database-restore operation count as code deployment? We can argue over beer.

Final notes

Having said all that: yes, I’m using an occasional stored routine. I see these occasions as a necessary evil, and sometimes it’s just the correct solution.

I’m happy to know what methods have been developed out there to overcome the above, please share; and please feel free to contradict the above.
