Converting an OLAP database to TokuDB, part 1

This is the first in a series of posts describing my impressions of converting a large OLAP server to TokuDB. There’s a lot to tell, and the experiment is not yet complete, so this is an ongoing blogging. In this post I will describe the case at hand and out initial reasons for looking at TokuDB.

Disclosure: I have no personal interests and no company interests; we did get friendly, useful and free advice from Tokutek engineers. TokuDB is open source and free to use, though commercial license is also available.

The case at hand

We have a large and fast growing DWH MySQL setup. This data warehouse is but one component in a larger data setup, which includes Hadoop, Cassandra and more. For online dashboards and most reports, MySQL is our service. We populate this warehouse mainly via Hive/Hadoop. Thus, we have an hourly load of data from Hive, as well as a larger daily load.

There are some updates on the data, but the majority of writes are just mysqlimports of Hive queries.

Usage of this database is OLAP: no concurrency issues here; we have some should-be-fast-running queries issued by our dashboards, as well as ok-to-run-longer queries issued for reports.

Our initial and most burning trouble is with size. Today we use COMPRESSED InnoDB tables (KEY_BLOCK_SIZE is default, i.e. 8). Our data volume sums right now at about 2TB. I happen to know this translates as 4TB of uncompressed data.

However growth of data is accelerating. A year ago we would capture a dozen GB per month. Today it is a 100GB per month, and by the end of this year it may climb to 150GB per month or more.

Our data is not sharded. We have a simple replication topology of some 6 servers. Machines are quite generous as detailed following. And yet, we will be running out of resources shortly: disk space (total 2.7TB) is now running low and is expected to run out in about six months. One of my first tasks in Outbrain is to find a solution to our DWH growth problem. The solution could be sharding; it could be a commercial DWH product; anything that works.

The approach we experiment with

It was at my initial interview that I suggested TokuDB might be a good solution, with the primary reason of being so good with compression. And we decided to experiment with this simple (setup-wise) solution of compression. If we could compress the data even by 50%, that would buy us considerable time. And it’s the simplest approach as we would need to change nothing at the application side, nor add additional frameworks.

Of course, we were already using InnoDB COMPRESSED tables. How about just improving the compression? And here I thought to myself: we can try KEY_BLOCK_SIZE=4, which I know would generally compress by 50% as compared to KEY_BLOCK_SIZE=8 (not always, but in many use cases). We’re already using InnoDB so this isn’t a new beast; it will be “more of the same”. It would work.

I got myself a dedicated machine: a slave in our production topology I am free to play with. I installed TokuDB 7.0.1, later upgraded to 7.0.3, based on MySQL 5.5.30.

The machine is a Dell Inc. PowerEdge R510 machine, with 16 CPUs @ 2.1 GHz and 126 GiB RAM, 16 GiB Swap. OS is CentOS 5.7, kernel 2.6.18. We have RAID 10 over local 10k RPM SAS disks (10x600GB disks)

How to compare InnoDB & TokuDB?

2TB of compressed data (for absolute measurement I consider it to be a 4TB worth of data) is quite a large setup. How do I do the comparison? I don’t even have too much disk space here…

We have tables of various size. Our largest is in itself 1TB (2TB uncompressed) – half of the entire volume. The rest ranging 330GB, 140GB, 120GB, 90GB, 50GB and below. We have MONTHly partitioning schemes on most tables and obviously on our larger tables.

For our smaller tables, we could just CREATE TABLE test_table LIKE small_table, populating it and comparing compression. However, the really interesting question (and perhaps the only interesting question compression-wise) is how well would our larger (and specifically largest) tables would compress.

Indeed, for our smaller tables we saw between 20% to 70% reduction in size when using stronger InnoDB compression: KEY_BLOCK_SIZE=4/2/1. How well would that work on our larger tables? How much slower would it be?

We know MySQL partitions are implemented by actual independent tables. Our testing approach was: let’s build a test_table from a one month worth of data (== one single partition) of our largest table. We tested:

The time it takes to load the entire partition (about 120M rows, 100GB COMPRESSED data as seen on .idb file)
The time it would take to load a single day’s worth of data from Hive/Hadoop (loading real data, as does our nightly import)
The time it would take for various important SELECT query to execute on this data.

InnoDB vs. TokuDB comparison

In this post I will only describe our impressions of compression size. I have a lot to say about TokuDB vs InnoDB partitioning and queries; this will wait till later post.

So here goes:

Engine	Compression	Time to Insert 1 month	Table size (optimized)	Time to import 1 day
InnoDB	8k	10.5h	58GB	32m
InnoDB	4k	48h	33GB	unknown (too long)
TokuDB	quicklz	14h	17GB	40m
TokuDB	lzma (small/aggresive)	15h	7.5GB	42m

Some comments and insights:

Each test was performed 3-4 times. There were no significant differences on the various cycles.
The 1 month insert was done courtesy QueryScript split, 5,000 rows at a time, no throttling.
The 1 day import via mysqlimport. There were multiple files imported. Each file is sorted by PRIMARY KEY ASC.
Isn’t it nice to know that your 100GB InnoDB table actually fits within 58GB when rebuilt?
For InnoDB flush_logs_at_trx_commit=2, flush_method=O_DIRECT.
I used default configuration to TokuDB — touched nothing. More on this in later post.
InnoDB 4k was prohibitively slow to load data. It was so slow so as to be unacceptable. For the 1 day load it took 1 hour for a mere 20% of data to load. 1 hour was already marginal for our requirements; waiting for 5 hours was out of the question. I tested several times, never got to wait for completion. Did I say it would just be “more of the same”? 4k turned to be “not an option”.
I saw almost no difference in load time between the two TokuDB compression formats. Both somewhat (30%) longer than InnoDB to load, but comparable.
TokuDB compression: nothing short of amazing.

With InnoDB 4k being “not an option”, and with both TokuDB compressions being similar in load time yet so different in compression size, we are left with the following conclusion: if we want to compress more than our existing 8k (and we have to) – TokuDB’s agressive compression (aka small, aka lzma) is our only option.

Shameless plug

common_schema turned to be quite the “save the day” tool here. Not only did we use it to extract 100GB of data from a large dataset and load it onto our tables, it also helped out in the ALTER process for TokuDB: at this time (<= 7.0.4) TokuDB still has a bug with KEY_BLOCK_SIZE: when this option is found in table definition, it impacts TokuDB’s indexes by bloating them. This is how sql_alter_table_tokudb was born. Hopefully it will be redundant shortly.

More to come

Was our test fair? Should we have configure TokuDB differently? Is loading via small 5,000 row chunks the right way?

In the next post I will describe the process of migrating our 4TB worth of data to TokuDB, pitfalls, issues, party crushers, sport spoilers, configuration, recovery, cool behaviour and general advice you should probably want to embrace. At later stage I’ll describe how our DWH looks after migration. Finally I’ll share some (ongoing) insights on performance.

You’ll probably want to know “How much is (non compressed) 4TB of data worth in TokuDB?” Let’s keep the suspense 🙂

8 thoughts on “Converting an OLAP database to TokuDB, part 1”

Peter (Stig) Edwards says:

September 10, 2013 at 3:27 pm

Hello Shlomi and thank you for the write-ups, both part 1 and 2.

As you mentioned the machine is a Dell Inc. PowerEdge R510 and the workload is likely CPU heavy because of the compression…

The Dell BIOS for PowerEdge R510 needs to be Version 1.9.0 for the Intel Xeon Processor 5600 Series B1 stepping microcode fix.
For code where CPU frequency stepping settings or latency can limit performance you may want to check the bios version with

/opt/dell/srvadmin/bin/omreport chassis bios

and the power profile with:

/opt/dell/srvadmin/bin/omreport chassis pwrmanagement config=profile

as root you can set the power profile (on next reboot) with:

/opt/dell/srvadmin/bin/omconfig chassis pwrmanagement config=profile profile=maxperformance

If you do update the bios then watch out for the power profile settings reverting to the default.

http://downloads.dell.com/FOLDER00792833M/1/PER510-011100BIOS.TXT

Sorry if this is old hat and somewhat off topic, I thought it may be of interest if it was news to you.
shlomi says:

September 17, 2013 at 11:11 am

@Peter,

Thanks for all this info. I am not familiar with it — passing to our operations groups for review.
Pingback: Converting an OLAP database to TokuDB, part 3: operational stuff | code.openark.org

This site uses Akismet to reduce spam. Learn how your comment data is processed.