common_schema roadmap thoughts

I’m happy with common_schema; it is in fact a tool I use myself on an almost daily basis. I’m also happy to see that it is gaining traction, which is why I’m sharing some of my thoughts on general future development. I’d love to get feedback.

Supported versions

At this moment, common_schema supports MySQL >= 5.1, all variants: 5.1, 5.5 and 5.6, whether MySQL, Percona Server or MariaDB.

5.1 is today past its end of life, and I’m really missing the SIGNAL/RESIGNAL syntax that I would like to use; in the meanwhile I can make do with version-specific code such as /*!50500 … */ (see the sketch at the end of this section). Nevertheless, I’m wondering whether I will eventually have to:

  • Support different branches of common_schema (one that supports 5.1, one that supports >= 5.5)
  • Stop support for 5.1

Of course, community-wise, the former is preferred; but I have limited resources, so I would like to take a quick poll here:

[poll id="3"]

I’ll use the poll’s results as a vague indication of what people use and want. Or please use the comments below to make your voice heard!
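For illustration, here’s a minimal sketch of the version-specific workaround (the routine name and error message are hypothetical, not taken from common_schema): on 5.5 and above, the content of the /*!50500 … */ comment is executed, raising a proper error via SIGNAL; on 5.1 it is parsed away as a comment, leaving only the fallback SET.

DELIMITER $$
CREATE PROCEDURE assert_positive(val INT)
BEGIN
  IF val <= 0 THEN
    -- 5.1 fallback: leave a trace in a user variable (also keeps the branch non-empty)
    SET @last_assert_error := 'assert_positive: got a non-positive value';
    -- 5.5 and above: raise a real error
    /*!50500 SIGNAL SQLSTATE '45000'
      SET MESSAGE_TEXT = 'assert_positive: got a non-positive value'; */
  END IF;
END $$
DELIMITER ;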

rdebug

This was a crazy jump at providing a stored routine debugger and debugging API. From talks I’ve given, I don’t see this getting traction, so for the time being I don’t plan to concentrate my efforts on it. It is actually almost complete: you can step-into, step-out, step-over, set breakpoints, read variables, modify variables — it’s pretty cool.

common_schema 2.1 released: advanced & improved split(), persistent script tables, more schema analysis, and (ahem) charts!

common_schema 2.1 is released! common_schema is your free & open source companion schema within your MySQL server, providing you with a function library, scripting capabilities, powerful routines and ready-to-apply information and recommendations.

New and noteworthy in version 2.1:

  • Better QueryScript split() functionality
  • Persistent tables for QueryScript: no long-held temporary tables
  • Index creation analysis, further range partition analysis
  • grant_access(): allow everyone to use common_schema
  • ASCII charts, Google charts
  • debugged_routines: show routines with debug code

Other minor enhancements and bugfixes are not listed here.

Here’s a breakdown of the above:

split() enhancements

split is one of those parts of common_schema that (should) appeal to every DBA. Break a huge transaction automagically into smaller chunks, and don’t worry about how it’s done. If you like, throttle execution, or print progress, or…
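As a minimal illustration (a sketch based on my reading of the QueryScript manual, not an excerpt from it; the sakila table is just an example), here is a script that rewrites a large table in small chunks, sleeping between chunks for twice each chunk’s execution time:

call common_schema.run('
  split (update sakila.rental set return_date = return_date)
  {
    throttle 2;
  }
');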

split enhancements include:

  • A much better auto-detection-and-selection of the chunking index. split now consults all columns covered by the index, and uses realistic heuristics to decide which UNIQUE KEY on your table is best for the chunking process. A couple of bugs were solved along the way; split is much smarter now.
  • Better support for multi-column chunking keys. You may now utilize the start/stop parameters even on multi-column keys, passing a comma-delimited list of values for the split operation to start/end with, respectively. Also fixed an issue with nonexistent start/stop values, which are now valid: split will just keep to the given range.
  • split no longer requires a temporary table to be kept open for the duration of its operation. See next section.

common_schema & openark-kit in the media: #DBHangOps, OurSQL

#DBHangOps

I had the pleasure of joining @DBHangOps today to speak about common_schema and openark-kit. What was meant to be a 15-minute session turned out to be 50 — sorry, people, I don’t talk as much at home, but when it comes to my pet projects…

I also realized I had been missing out on a great event: DBHangOps is a hangout where you can chat and discuss MySQL & related technologies with friends and colleagues whom you typically only meet at conferences. I will certainly want to attend future events.

Thanks to John Cesario and Geoffrey Anderson who invited me to talk, and to the friends and familiar faces who attended; I was happy to talk about my work, and very interested in hearing about how it’s being put to use. We also had time to discuss ps_helper with none other than Mark Leith!

The video is available on Twitter/YouTube.

OurSQL

openark-kit has also been featured on the OurSQL podcast by Sheeri & Gerry, who gave great coverage of some of the tools. I will disclose that more is to come; I’m happy this is in capable hands and look forward to hearing the next episode!


Working at Outbrain

I’m starting my new position as Senior Software Engineer at Outbrain. While I have yet to fully grasp the scope of my work, I will be handling data on all things MySQL, Hadoop, Cassandra, Kafka, more, more and more, at very large volumes.

I find Outbrain to be a great supporter of community and open source:

  • Open source solutions are highly preferred over commercial solutions
  • Outbrain is an early adopter for many open source technologies
  • Employees are encouraged to contribute back as much as possible to the open source community
  • Outbrain is happy to pay for support from companies developing open source
  • Outbrain is a major participant in the ILTechTalks initiative: free, volunteer, professional exchange of knowledge

I can already testify that I will be working with very smart and knowledgeable people; I expect to learn a lot and, of course, to contribute from my own knowledge and skills.

I’ve been in process with some other companies; I’d like to kindly thank those companies I’ve been in touch with for their goodwill.


Easy SELECT COUNT(*) with split()

The two conservative ways of getting the number of rows in an InnoDB table are:

  • SELECT COUNT(*) FROM my_table:
    provides an accurate number, but makes for a long-running transaction which takes ages on large tables. Long transactions make for locks
  • SELECT TABLE_ROWS FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA='my_schema' AND TABLE_NAME='my_table', or get the same info via SHOW TABLE STATUS.
    Gives an immediate response, but the value can be way off; it can be twice as large as the real value, or half of it. For query execution plans this may be a “good enough” estimation, but typically you just can’t trust it for your own purposes.

Get a good estimate using chunks

You can get a good estimate by calculating the total number of rows in steps. Walk the table 1,000 rows at a time, and keep a counter. Each chunk is its own transaction, so if the table is modified while counting, the final value does not make for an accurate account at any single point in time. Still, it is typically a far better estimate than TABLE_ROWS.

QueryScript’s split() construct provides you with the means to work this out. Consider this script:
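In lieu of the original script (not shown in this teaser), here is a rough plain-SQL sketch of the same chunk-walking idea; my_table and its AUTO_INCREMENT PRIMARY KEY id are hypothetical, whereas split() figures out the chunking key for you:

SET @total := 0, @cursor := 0;

-- Repeat the following two statements until @chunk_count comes back 0;
-- each iteration reads one 1,000-row chunk in its own short transaction:
SELECT COUNT(*), MAX(id) INTO @chunk_count, @max_id
  FROM (SELECT id FROM my_table WHERE id > @cursor ORDER BY id LIMIT 1000) AS sel;
SET @total := @total + @chunk_count, @cursor := @max_id;

SELECT @total AS estimated_row_count;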

Converting compressed InnoDB tables to TokuDB 7.0.1

Or: how to make it work in TokuDB version 7.0.1. This is a follow-up on a discussion on the tokudb-user group.

Background

I wanted to test TokuDB’s compression. I took a staging machine of mine, with production data, and migrated it from Percona Server 5.5 to MariaDB 5.5 + TokuDB 7.0.1. The migration went well, with no problems.

To my surprise, when I converted tables from InnoDB to TokuDB, I saw an increase in table file size on disk. As explained by Tim Callaghan, this was due to TokuDB interpreting my compressed table’s “KEY_BLOCK_SIZE=4” as an instruction for TokuDB’s page size: TokuDB should be using a 4MB block size, but thinks it’s being instructed to use 4KB. The problem is that you can’t get rid of table options: when one converts a table to InnoDB in ROW_FORMAT=COMPACT, or even to MyISAM, the KEY_BLOCK_SIZE option keeps lurking in the dark.
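To see the lurking option in action (a sketch; the table is hypothetical, and ROW_FORMAT=COMPRESSED assumes innodb_file_per_table with the Barracuda file format):

CREATE TABLE t (id INT PRIMARY KEY)
  ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4;

-- Convert away from the compressed format...
ALTER TABLE t ROW_FORMAT=COMPACT;

-- ...and SHOW CREATE TABLE still lists KEY_BLOCK_SIZE=4 among the table
-- options: exactly the value TokuDB 7.0.1 picks up as a page size instruction.
SHOW CREATE TABLE t;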

So until this is hopefully resolved in TokuDB’s next version, here’s a way to work around the problem.

mycheckpoint revision 231 released

A new release for mycheckpoint: a lightweight, SQL-oriented MySQL monitoring solution.

If you’re unfamiliar with mycheckpoint, the one-minute sales pitch is: it’s a free and open source monitoring tool for MySQL, which is extremely easy to install and execute, and which includes custom queries, alerts (via email), and an out-of-the-box HTTP server and charting.

This is mostly a maintenance release, with some long-requested features, and of course solved bugs. Here are a few highlights:

  • Supports MariaDB and MySQL 5.6 (issues with new variables, space-padded variables, text-valued variables)
  • Supports alerts via function invocation on the monitored host: not only checking alerts via aggregated data like 'Seconds_behind_master', but also via SELECT my_sanity_check_function() on the monitored instance (see the sketch after this list). See alerts.
  • Supports single-running-instance enforcement via the "--single" command line argument
  • Supports strict sql_mode, including ONLY_FULL_GROUP_BY, overcoming bug #69310.
  • Supports sending of a pending HTML email report
  • Better re-deployment process
  • Better recognition of SIGNED/UNSIGNED values
  • Some other improvements in charting, etc.
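By way of example, such a sanity-check function on the monitored host might look like the following sketch (the name my_sanity_check_function comes from the list above; the condition itself is purely illustrative, and any function returning nonzero to signal a problem would do):

-- Returns 1 (raise alert) when the server carries an excessive number of connections
CREATE FUNCTION my_sanity_check_function() RETURNS TINYINT
READS SQL DATA
RETURN (SELECT COUNT(*) > 100 FROM information_schema.processlist);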

mycheckpoint is released under the BSD license.

Downloads are available from the project’s page.

openark-kit revision 196 released

This is a long overdue maintenance release of openark-kit. This release includes bugfixes and some enhancements, mainly to oak-online-alter-table.

oak-online-alter-table changes / bug fixes include:

  • Support for keyword-named columns
  • Use of FORCE INDEX due to MySQL’s inability to always figure out the chunking key on its own
  • --sleep-ratio option added; allows for sleep time proportional to execution time (as opposed to constant sleep time with --sleep)
  • Support for chunk retry (in case of deadlock) via --max-lock-retries
  • Fixed order of cleanup
  • Fixed bug with verbose messages with non-integer chunking key
  • Fixed bug with single-row tables (people, no need for this tool for single row tables :))
  • Friendly verbose messages to remind you what’s being executed

oak-chunk-update changes include:

  • Verbose printing of the query comment, if one exists (friendly printing of what’s being executed)


On compiling TokuDB from source

Sharing my experience of compiling TokuDB + MariaDB 5.5. Why? Because I must have this patch to Sphinx 2.0.4.

Note: I was using what seems to be the “old” method of compiling; quoting Leif Walsh:

… We are looking at deprecating that method of building (MariaDB source plus binary fractal tree handlerton).  It only really needed to be that complex when we were closed source.

I also tried the “new” method of compiling, which I couldn’t work out.

Here’s how it goes: TokuDB is newly released as open source. As such, it got a lot of attention and many downloads, and I hope it will succeed.

However stable the product may be, it’s new to open source, which means anyone compiling it from source is an early adopter (at least of the compilation process).

Installation process

This is an unorthodox, and actually weird, process. See section 6 of the Tokutek docs. In order to compile the project you must download:

Percona Live 2013 keynotes: followup questions and discussion

Here are a few questions that remained open for me from Percona Live 2013, about things that were said during the keynotes; I will appreciate discussion in the comments. Here goes:

Question #1

Brian Aker (HP) asks Simone Brunozzi (Amazon) what the underlying technology for DynamoDB is. Simone says he can’t disclose. Brian says: “it’s MySQL!!”. Simone repeats: “can’t disclose”. Brian insists: “it’s MySQL!!”

Seriously? I would be very much surprised to learn that DynamoDB uses MySQL; it doesn’t make sense to me. Why would Brian Aker say that, though? Did he just mean to tease Simone, or is there something I just don’t get?