{"id":7848,"date":"2018-02-20T09:29:58","date_gmt":"2018-02-20T07:29:58","guid":{"rendered":"http:\/\/code.openark.org\/blog\/?p=7848"},"modified":"2020-05-11T08:28:06","modified_gmt":"2020-05-11T06:28:06","slug":"using-dbdeployer-in-ci-tests","status":"publish","type":"post","link":"https:\/\/code.openark.org\/blog\/mysql\/using-dbdeployer-in-ci-tests","title":{"rendered":"Using dbdeployer in CI tests"},"content":{"rendered":"<p>I was very pleased when Giuseppe Maxia (aka <a href=\"http:\/\/datacharmer.blogspot.co.il\/\">datacharmer<\/a>) unveiled <a href=\"https:\/\/github.com\/datacharmer\/dbdeployer\">dbdeployer<\/a> in his talk at <a href=\"http:\/\/lefred.be\/content\/pre-fosdem-mysql-day-2018-the-schedule\/\">pre-FOSDEM MySQL day<\/a>. The announcement came just at the right time. I wish to briefly describe how we use <code>dbdeployer<\/code> (work in progress).<\/p>\n<h3>The case for gh-ost<\/h3>\n<p>A user opened <a href=\"https:\/\/github.com\/github\/gh-ost\/issues\/538\">an issue<\/a> on <a href=\"https:\/\/github.com\/github\/gh-ost\"><code>gh-ost<\/code><\/a>; the user was running MySQL <code>5.5<\/code>. <code>gh-ost<\/code> is tested on <code>5.7<\/code>, where the problem does not reproduce. A discussion with Gillian Gunson raised the concern of not testing on all versions. Can we run <code>gh-ost<\/code> tests for all MySQL\/Percona\/MariaDB versions? Should we? How easy would it be?<\/p>\n<h3>gh-ost tests<\/h3>\n<p><code>gh-ost<\/code> has three different test types:<\/p>\n<ul>\n<li>Unit tests: plain <code>golang<\/code> logic tests which are easy and quick to run.<\/li>\n<li>Integration tests: the topic of this post; see below. 
Today these do not run as part of automated CI testing.<\/li>\n<li>System tests: putting our production tables to the test, continuously migrating our production data on dedicated replicas, verifying checksums are identical and data is intact, <a href=\"https:\/\/githubengineering.com\/mysql-testing-automation-at-github\/#schema-migrations\">read more<\/a>.<\/li>\n<\/ul>\n<p>Unit tests are already running as part of automated CI (every PR is subjected to those tests). System tests are clearly tied to our production servers. What&#8217;s the deal with the integration tests?<!--more--><\/p>\n<h3>gh-ost integration tests<\/h3>\n<p>The <code>gh-ost<\/code> <a href=\"https:\/\/github.com\/github\/gh-ost\/tree\/master\/localtests\">integration tests<\/a> are a suite of scenarios which verify that <code>gh-ost<\/code>&#8217;s operation is sound. These scenarios are mostly concerned with data types, special <code>alter<\/code> statements, etc. Is converting <code>DATETIME<\/code> to <code>TIMESTAMP<\/code> working properly? Are <code>latin1<\/code> columns being updated correctly? How about renaming a column? Changing a <code>PRIMARY KEY<\/code>? Column reorder? <code>5.7<\/code> JSON values? And so on. Each test will recreate the table, run a migration, stop replication, check the result, resume replication&#8230;<\/p>\n<p>The environment for these tests is a master-replica setup, where <code>gh-ost<\/code> modifies the table on the replica and can then checksum or compare both the original and the altered <em>ghost<\/em> table.<\/p>\n<p>We develop <code>gh-ost<\/code> internally at GitHub, but it&#8217;s also an open source project. We have our own internal CI environment, but we also wish the public to have visibility into test failures (so that a user can submit a PR and get reliable automated feedback). 
We use <a href=\"https:\/\/travis-ci.org\/\">Travis CI<\/a> for the public facing tests.<\/p>\n<p>To run <code>gh-ost<\/code>&#8217;s integration tests as described above as part of our CI tests, we should be able to:<\/p>\n<ul>\n<li>Create a master\/replica setup in CI.<\/li>\n<li>Actually, create a master\/replica setup in <em>any<\/em> CI, and namely in Travis CI.<\/li>\n<li>Actually, create multiple master\/replica setups, of varying versions and vendors, in any CI, including both our internal CI and Travis CI.<\/li>\n<\/ul>\n<p>I was about to embark on a MySQL Sandbox setup, which I was not keen on. But FOSDEM was around the corner and I had other things to complete beforehand. Lucky me, <code>dbdeployer<\/code> stepped in.<\/p>\n<h3>dbdeployer<\/h3>\n<p><code>dbdeployer<\/code> is a rewrite of, and replacement for, <a href=\"https:\/\/mysqlsandbox.net\/\">MySQL Sandbox<\/a>. I&#8217;ve been using MySQL Sandbox for many years, and my laptop is running two sandboxes at this very moment. But MySQL Sandbox has a few limitations or complications:<\/p>\n<ul>\n<li>Perl. Versions of Perl. Dependencies of packages of Perl. I mean, it&#8217;s fine, we can automate that.<\/li>\n<li>Command line flag complexity: I always get lost in the complexity of the flags.<\/li>\n<li>Get it right or prepare for battle: if you deployed something, but not the way you wanted, there are sometimes limbo situations where you cannot re-deploy the same sandbox again, or you have to start deleting files everywhere.<\/li>\n<li>Deploy, not remove. Adding a sandbox is one thing. How about removing it?<\/li>\n<\/ul>\n<p><code>dbdeployer<\/code> is a <code>golang<\/code> rewrite, which solves the dependency problem. It ships as a single binary and nothing more is needed. It is simple to use. While it generates the equivalent of a MySQL Sandbox, it does so with fewer command line flags and less confusion. 
There&#8217;s first class handling of the MySQL binaries: you unpack MySQL tarballs, and you can list what&#8217;s available. You can then create sandbox environments: replication, standalone, etc. You can then delete those.<\/p>\n<p>It&#8217;s pretty simple and I don&#8217;t have much more to add &#8212; which is the best thing about it.<\/p>\n<p>So, with <code>dbdeployer<\/code> it is easy to create a master\/replica setup. Something like:<\/p>\n<blockquote>\n<pre>dbdeployer unpack path\/to\/5.7.21.tar.gz --unpack-version=5.7.21 --sandbox-binary <span class=\"pl-smi\">${PWD}<\/span>\/sandbox\/binary\ndbdeployer replication 5.7.21 --nodes 2 --sandbox-binary <span class=\"pl-smi\">${PWD}<\/span>\/sandbox\/binary --sandbox-home <span class=\"pl-smi\">${PWD}<\/span>\/sandboxes --gtid --my-cnf-options log_slave_updates --my-cnf-options log_bin --my-cnf-options binlog_format=ROW<\/pre>\n<\/blockquote>\n<h3>Where does it all fit in, and what about the MySQL binaries?<\/h3>\n<p>So, should <code>dbdeployer<\/code> be part of the <code>gh-ost<\/code> repo? And where does one get those MySQL binaries from? Are they to be part of the <code>gh-ost<\/code> repo? Aren&#8217;t they a few GB to extract?<\/p>\n<p>Neither <code>dbdeployer<\/code> nor the MySQL binaries should be added to the <code>gh-ost<\/code> repo. And fortunately, Giuseppe also <a href=\"https:\/\/github.com\/datacharmer\/mysql-docker-minimal\">solved<\/a> the MySQL binaries problem.<\/p>\n<p>The scheme I&#8217;m looking at right now is as follows:<\/p>\n<ul>\n<li>A new public repo, <a href=\"https:\/\/github.com\/github\/gh-ost-ci-env\">gh-ost-ci-env<\/a>, is created. This repo includes:\n<ul>\n<li><code>dbdeployer<\/code> compiled binaries<\/li>\n<li>Minimal MySQL tarballs for selected versions. 
Those tarballs are reasonably small: between <code>14MB<\/code> and <code>44MB<\/code> at this time.<\/li>\n<\/ul>\n<\/li>\n<li><code>gh-ost<\/code>&#8217;s CI to <code>git clone https:\/\/github.com\/github\/gh-ost-ci-env.git<\/code> (<a href=\"https:\/\/github.com\/github\/gh-ost\/pull\/546\/files#diff-d78e7acd07ce8a2aad92026ae10cbec2R7\">code<\/a>)<\/li>\n<li><code>gh-ost<\/code>&#8217;s CI to set up a master\/replica sandbox (<a href=\"https:\/\/github.com\/github\/gh-ost\/pull\/546\/files#diff-d78e7acd07ce8a2aad92026ae10cbec2R13\">one<\/a>, <a href=\"https:\/\/github.com\/github\/gh-ost\/pull\/546\/files#diff-d78e7acd07ce8a2aad92026ae10cbec2R15\">two<\/a>).<\/li>\n<li><a href=\"https:\/\/github.com\/github\/gh-ost\/pull\/546\/files#diff-d78e7acd07ce8a2aad92026ae10cbec2R30\">Kick off the tests<\/a>.<\/li>\n<\/ul>\n<p>The above is a work in progress:<\/p>\n<ul>\n<li>At this time it only runs a single MySQL version.<\/li>\n<li>There is a known issue where, after a test, replication may take time to resume. Currently, on slower boxes (such as the Travis CI containers) this leads to failures.<\/li>\n<\/ul>\n<p>Another concern I have at this time is build time. For a single MySQL version, it takes some <code>5-7<\/code> minutes on my local laptop to run all integration tests. It will be faster on our internal CI. It will be considerably <em>slower<\/em> on Travis CI; I expect between <code>10m<\/code> and <code>15m<\/code>. Add multiple versions and we&#8217;re looking at a <code>1hr<\/code> build. Such long build times would affect our development and delivery times, and so we will split them off the main build. I need to consider what the best approach is.<\/p>\n<p>That&#8217;s all for now. 
I&#8217;m pretty excited about the potential of <code>dbdeployer<\/code> and will be looking into incorporating it into <code>orchestrator<\/code>&#8217;s CI tests as well.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I was very pleased when Giuseppe Maxia (aka datacharmer) unveiled dbdeployer in his talk at pre-FOSDEM MySQL day. The announcement came just at the right time. I wish to briefly describe how we use dbdeployer (work in progress). The case for gh-ost A user opened an issue on gh-ost, and the user was using MySQL [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"enabled":false},"version":2}},"categories":[5],"tags":[136,125,57,8,131],"class_list":["post-7848","post","type-post","status-publish","format-standard","hentry","category-mysql","tag-dbdeployer","tag-gh-ost","tag-open-source","tag-replication","tag-testing"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p2bZZp-22A","_links":{"self":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/7848","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/users\/
2"}],"replies":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/comments?post=7848"}],"version-history":[{"count":5,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/7848\/revisions"}],"predecessor-version":[{"id":7853,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/7848\/revisions\/7853"}],"wp:attachment":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/media?parent=7848"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/categories?post=7848"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/tags?post=7848"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}