common_schema & openark-kit in the media: #DBHangOps, OurSQL

#DBHangOps

I had the pleasure of joining @DBHangOps today and speaking about common_schema and openark-kit. What was meant to be a 15-minute session turned into 50 — sorry, people, I don’t talk as much at home, but when it comes to my pet projects…

I also realized I had been missing out on a great event: DBHangOps is a hangout where you can chat and discuss MySQL & related technologies with friends and colleagues whom you typically only meet at conferences. I will certainly want to attend future events.

Thanks to John Cesario and Geoffrey Anderson who invited me to talk, and to the friends and familiar faces who attended; I was happy to talk about my work, and very interested in hearing about how it’s being put to use. We also had time to discuss ps_helper with none other than Mark Leith!

The video is available on Twitter/YouTube.

OurSQL

openark-kit has also been featured on the OurSQL podcast by Sheeri & Gerry, who gave great coverage of some of the tools. I will disclose that more is to come; I’m happy this is in capable hands and look forward to hearing the next episode!


Working at Outbrain

I’m starting my new position as Senior Software Engineer at Outbrain. While I have yet to fully grasp the scope of my work, I will be handling data on all things MySQL, Hadoop, Cassandra, Kafka, more, more and more, at very large volumes.

I find Outbrain a great supporter of community and open source:

  • Open source solutions are highly preferred over commercial solutions
  • Outbrain is an early adopter of many open source technologies
  • Employees are encouraged to contribute back as much as possible to the open source community
  • The company is happy to pay for support from companies developing open source software
  • Outbrain is a major participant in the ILTechTalks initiative: free, volunteer, professional exchange of knowledge

I can already testify that I will be working with very smart and knowledgeable people; I expect to learn a lot, and of course to contribute my own knowledge and skills.

I have also been in the hiring process with some other companies; I’d like to kindly thank those companies I’ve been in touch with for their goodwill.


Easy SELECT COUNT(*) with split()

The two conservative ways of getting the number of rows in an InnoDB table are:

  • SELECT COUNT(*) FROM my_table:
    provides an accurate number, but makes for a long-running transaction which takes ages on large tables. Long transactions also make for locks
  • SELECT TABLE_ROWS FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA='my_schema' AND TABLE_NAME='my_table', or get the same info via SHOW TABLE STATUS.
    Gives an immediate response, but the value can be way off; it can be twice as large as the real value, or half of it. For query execution plans this may be a “good enough” estimate, but typically you just can’t trust it for your own purposes. (Both statements are spelled out below.)
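For reference, here are the two statements spelled out as runnable SQL (my_schema and my_table are placeholder names):

  -- Accurate, but scans the whole table inside one long transaction:
  SELECT COUNT(*) FROM my_schema.my_table;

  -- Immediate, but only a rough estimate (can be off by a factor of two either way):
  SELECT TABLE_ROWS
    FROM INFORMATION_SCHEMA.TABLES
   WHERE TABLE_SCHEMA = 'my_schema'
     AND TABLE_NAME = 'my_table';

  -- The same estimate, via SHOW TABLE STATUS:
  SHOW TABLE STATUS FROM my_schema LIKE 'my_table';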

Get a good estimate using chunks

You can get a good estimate by calculating the total number of rows in steps: walk the table 1,000 rows at a time and keep a running counter. Each chunk is its own transaction, so if the table is modified while counting, the final value is not an accurate count as of any single point in time. Still, it is typically a far better estimate than TABLE_ROWS.
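To make the idea concrete, here is a minimal hand-rolled sketch of one counting step in plain SQL (not the QueryScript solution discussed next). It assumes the table has an integer primary key named id, and that you drive the loop from a client or stored routine, repeating until the chunk comes back empty:

  -- my_schema.my_table and the `id` primary key are placeholder assumptions.
  SET @last_id := 0, @total := 0;

  -- One iteration: count the next 1,000 rows by primary key and remember where we stopped.
  SELECT COUNT(*), IFNULL(MAX(id), @last_id)
    INTO @chunk_count, @last_id
    FROM (
      SELECT id
        FROM my_schema.my_table
       WHERE id > @last_id
       ORDER BY id
       LIMIT 1000
    ) AS next_chunk;

  SET @total := @total + @chunk_count;

  -- Repeat the SELECT/SET pair until @chunk_count = 0; @total then holds the estimate.
  SELECT @total AS estimated_row_count;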

QueryScript’s split() construct provides you with the means to work this out. Consider this script: Continue reading » “Easy SELECT COUNT(*) with split()”

Converting compressed InnoDB tables to TokuDB 7.0.1

Or: how to make it work in TokuDB version 7.0.1. This is a follow-up to a discussion on the tokudb-user group.

Background

I wanted to test TokuDB’s compression. I took a staging machine of mine, with production data, and migrated it from Percona Server 5.5 to MariaDB 5.5 + TokuDB 7.0.1. The migration went well, with no problems.

To my surprise, when I converted tables from InnoDB to TokuDB, I saw an increase in table file size on disk. As explained by Tim Callaghan, this was due to TokuDB interpreting my compressed table’s “KEY_BLOCK_SIZE=4” as an instruction for TokuDB’s own page size. TokuDB should be using a 4MB block size, but thinks it is being instructed to use 4KB. The problem is, you can’t get rid of table options: when one converts a table to InnoDB in ROW_FORMAT=COMPACT, or even to MyISAM, the KEY_BLOCK_SIZE option keeps lurking in the dark.
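To illustrate the symptom (not the workaround the post goes on to describe), here is the kind of check one might run; my_table is a placeholder, and this is only a sketch of how the lingering option shows up:

  -- The table was originally compressed InnoDB, e.g.:
  --   CREATE TABLE my_table (...) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4;

  -- Convert away from the compressed row format:
  ALTER TABLE my_table ENGINE=InnoDB ROW_FORMAT=COMPACT;

  -- KEY_BLOCK_SIZE=4 still appears among the table options in the output:
  SHOW CREATE TABLE my_table;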

So until this is hopefully resolved in TokuDB’s next version, here’s a way to work around the problem. Continue reading » “Converting compressed InnoDB tables to TokuDB 7.0.1”