TokuDB – code.openark.org
Blog by Shlomi Noach

Percona Live 2015: Reflections
https://shlomi-noach.github.io/blog/mysql/percona-live-2015-reflections
Sat, 18 Apr 2015 01:41:07 +0000

Some personal reflections on Percona Live 2015:

Percona acquires Tokutek

Well done! Tokutek develops the TokuDB storage engine for MySQL and TokuMX engine for MongoDB. I will discuss the MySQL aspect only.

TokuDB was released as open source in 2013. It has gained a lot of traction, and I have used it myself for some time. I ran into locking issues and other operational difficulties, which I reported; beyond those, I was fascinated by features such as great compression, online schema changes, and more.

Recently another company, InfiniDB, which also released its MySQL-backed codebase as open source, went out of business. I was afraid the same might happen to Tokutek.

I see Percona’s purchase as a very good move for the community. I have seen a lot of TokuDB interest within Percona for some time now, and it is clearly interested in the technology. I expect they will bring their own hands-on experience into the development of more operations-friendly features, and put effort into solving the locking issues (it’s been a while since I last checked; of course some of these may have been addressed by now). I am guessing they will work on a Galera/TokuDB integration and offer a “Toku-XtraDB-Cluster”.

TokuDB can compete with InnoDB in many places, while in others each will have its distinct advantage.

I see this as good news for the community.

Community Awards and Lightning Talks

On a completely different subject, I believe it is commonly accepted that this year’s setup for the community awards & lightning talks was unsuccessful. The noise was astounding, passing foot traffic was disruptive, and overall this was a poor experience. We (Giuseppe Maxia, Kortney Runyan & myself) held a quick, informal brainstorm on this and came up with a couple of ideas, one of which we hope to try at the upcoming Percona Live Europe in Amsterdam.

We apologize to the speakers for the difficulties.

Percona Live Europe – Amsterdam

Haha! Having recently relocated to the Netherlands, I’m of course very happy. But regardless: Percona Live London was fun, yet running on low fuel. I think it was a great idea to change location (with more locations expected in the future). This is the path taken by conferences such as OSCon, Velocity, Strata and more. Amsterdam in particular, as I’ve recently learned, is especially appreciated by many. I think this conference will do great!

Woz

And now for something completely different. Woz’ talk was that. I’m happy he came; I appreciate that he discussed education; and it was fun.

TokuDB configuration variables of interest
https://shlomi-noach.github.io/blog/mysql/tokudb-configuration-variables-of-interest
Wed, 23 Oct 2013 17:42:12 +0000

During our experiments I came upon a few TokuDB variables of interest; if you are using TokuDB you might want to look into these:

  • tokudb_analyze_time

This bounds the number of seconds an ANALYZE TABLE will spend on each index of each partition of a TokuDB table.

That is, if tokudb_analyze_time = 5, and your table has 4 indexes (including PRIMARY) and 7 partitions, then the total runtime is limited to 5*4*7 = 140 seconds.
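In other words, the worst-case runtime multiplies out as follows (a trivial sanity check of the per-index, per-partition semantics described above):

```python
def analyze_worst_case_seconds(tokudb_analyze_time: int,
                               num_indexes: int,
                               num_partitions: int) -> int:
    """Upper bound on ANALYZE TABLE runtime: the per-index, per-partition
    time limit multiplied by the number of indexes and partitions."""
    return tokudb_analyze_time * num_indexes * num_partitions

# The example from the text: 5 seconds, 4 indexes (including PRIMARY), 7 partitions
print(analyze_worst_case_seconds(5, 4, 7))  # → 140
```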

Default in 7.1.0: 5 seconds

  • tokudb_cache_size

Similar to innodb_buffer_pool_size, this variable sets the amount of memory TokuDB allocates for caching pages. As with InnoDB, the table is clustered within its primary index, so the cache holds pages for both indexes and data.

Default: 50% of total memory

  • tokudb_directio

Boolean, values are 0/1. Setting tokudb_directio = 1 is like specifying innodb_flush_method = O_DIRECT, which in turn means the OS should not cache pages requested by TokuDB. Default: 0.

Now here’s the interesting part: we are used to telling InnoDB to take as much memory as we can provide (because we want it to cache as much as it can) and to avoid OS caching (because that would mean a page appears both in the buffer pool and in OS memory, which is a waste). So the following setup is common:

innodb_buffer_pool_size = [as much as you can allocate while leaving room for connection memory]G
innodb_flush_method = O_DIRECT

And my first instinct was to do the same for TokuDB. But after speaking with Gerry Narvaja of Tokutek, I realized it is not that simple. The reason TokuDB’s default memory allocation is 50% and not, say, 90%, is that the OS cache holds data in compressed form, while the TokuDB cache holds it in uncompressed form. If you limit the TokuDB cache, you leave more room for the OS cache, which caches compressed data; that means more data (duplicates aside, hopefully) fits in memory.

I did try both options and did not see an obvious difference, but did not test this thoroughly. My current setup is:

#No setup. just keep to the default for both:
#tokudb_cache_size
#tokudb_directio
  • tokudb_commit_sync

  • tokudb_fsync_log_period

These two variables are similar in essence to innodb_flush_log_at_trx_commit, but allow for finer tuning. With innodb_flush_log_at_trx_commit you choose between syncing the transaction log to disk upon each commit and once per second. With tokudb_commit_sync = 1 (the default) you get a transaction log sync to disk per commit. With tokudb_commit_sync = 0, tokudb_fsync_log_period dictates the interval between flushes; a value of tokudb_fsync_log_period = 1000 means once per second.

Since our original InnoDB installation used innodb_flush_log_at_trx_commit = 2, our TokuDB setup is:

tokudb_commit_sync = 0
tokudb_fsync_log_period = 1000
  • tokudb_load_save_space

Turned on (value 1) by default as of TokuDB 7.1.0, this parameter decides whether temporary files created during bulk load operations (e.g. ALTER TABLE) are compressed or uncompressed. Do yourself a big favour (why? read here) and keep it on. Our setup is:

tokudb_load_save_space = 1

TokuDB’s general recommendation is: don’t change the variables; the engine should work well right out of the box. I like the approach (by MySQL 5.5 I had already lost count of the InnoDB variables that can have a noticeable impact; with 5.6 I’m all but lost). The complete list of configuration variables is found in TokuDB’s Users Guide.
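For reference, a quick way to review the current TokuDB settings on a running server (standard MySQL syntax):

```sql
-- List all TokuDB configuration variables and their current values
SHOW GLOBAL VARIABLES LIKE 'tokudb%';
```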

Converting an OLAP database to TokuDB, part 3: operational stuff
https://shlomi-noach.github.io/blog/mysql/converting-an-olap-database-to-tokudb-part-3-operational-stuff
Mon, 14 Oct 2013 10:03:43 +0000

This is the third post in a series describing our experience in migrating a large DWH server to TokuDB (see the 1st and 2nd parts). This post discusses operations, namely ALTER TABLE operations in TokuDB. By now we have run into quite a few use cases we can shed light on.

Quick recap: we’ve altered one of our DWH slaves to TokuDB, with the goal of migrating most of our servers, including the master, to TokuDB.

Adding an index

Shortly after migrating our server to TokuDB we noticed an unreasonably disproportionate slave lag on our TokuDB slave (red line in chart below) as compared to other slaves.

[chart: tokudb-slave-lag]

A quick investigation revealed that, coincidentally, a manual heavy-duty operation was taking place at the time, updating about a year’s worth of data retroactively. OK, but why so slow on TokuDB? Another quick investigation led to an apples vs. oranges problem: as described in part 1, our original setup included MONTHly partitioning on our larger tables, whereas we could not do the same in TokuDB, where we had settled for YEARly partitioning.

The heavy-duty operation included a query that was relying on the MONTHly partitioning to do reasonable pruning: a WHERE condition on a date column did the right partition pruning; but where on InnoDB that would filter 1 month’s worth of data, on TokuDB it would filter 1 year.

Wasn’t it suggested that TokuDB supports online table operations? I decided to give it a shot and add a proper index on our date column (I actually created a compound index, but that’s beside the point).

It took 13 minutes to add an index on a 1GB TokuDB table (approx. 20GB InnoDB uncompressed equivalent):

  • The ALTER was non-blocking: the table remained unlocked for the duration
  • The client issuing the ALTER was blocked (I thought it would happen completely in the background) — but who cares?
  • I would say 13 minutes is fast
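As a sketch, the statement was of this form (table and column names here are hypothetical; the actual compound index differed):

```sql
-- Hot index creation: the table stays readable and writable while the
-- index builds; only the issuing client blocks until completion.
ALTER TABLE my_large_table ADD KEY date_idx (date_col, account_id);
```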

Not surprisingly, adding the index eliminated the problem altogether.

Modifying a PRIMARY KEY

Our DBA pointed out a long-standing need to modify our PRIMARY KEY. It had been impossible to achieve with our InnoDB setup (not enough disk space for the operation, and it would have taken weeks to complete even if we had the disk space). Would it be possible to modify our TokuDB tables? On some of our medium-sized tables we issued an ALTER of the form:

ALTER TABLE my_table DROP PRIMARY KEY, ADD PRIMARY KEY (c1, c2, c3, ...);

Time-wise the operation completed in good time. We did note, however, that the disk space consumed by the new table was doubled. Was it due to the fact we added two columns to our PK? Did that account for the bloated space? I did not believe so, and decided to rebuild the table:

OPTIMIZE TABLE my_table

Nope; space was not reduced. Still unconvinced, we asked. As usual, we got a quick response from the Tokutek team: this was a bug. While our original table used the TOKUDB_SMALL row format (high compression), the table rebuild reset it to TOKUDB_FAST (normal compression), which makes for roughly twice the file size. The bug was filed as: alter table operations that rebuild the table lose the original tokudb compression.

Now, we were altering the PRIMARY KEY. We were not expecting an online operation anyhow, and didn’t mind blocking the table; hence the solution was simple: make sure to specify the row format:

ALTER TABLE my_table DROP PRIMARY KEY, ADD PRIMARY KEY (c1, c2, c3, ...) ENGINE=TokuDB ROW_FORMAT=TOKUDB_SMALL;

This worked in terms of disk space — but we only later realized it would still cause us trouble.

Modifying a PRIMARY KEY on our largest table

We moved on to our largest table: originally 1TB InnoDB COMPRESSED, worth 2TB uncompressed. With TokuDB it went down to 100GB. Converting this table to TokuDB took about 40 hours, which is simply fast. We issued an ALTER TABLE modifying the PRIMARY KEY as above, and waited.

The operation did not complete after 40 hours. Nor after 3 days. By day 4 we thought we might look into this. Fortunately, TokuDB is friendly in SHOW PROCESSLIST and provides useful information, such as “Fetched about 1234567890 rows, loading data still remains”. Yikes! We extrapolated the values and realized it would take 2 weeks to complete! The weekend went by and we decided to find a better way. Again, posting on the tokudb-user group, we got a definitive answer: a table rebuild does not utilize the bulk loader (you really want to be friends with the bulk loader; it’s the process that loads your data quickly).

And so we chose to KILL the ALTER process and go another way; again, KILLs are very easy with TokuDB ALTER operations: took 3 minutes to abort this week old operation. The alternative operation was:

CREATE TABLE my_table_New LIKE my_table;
ALTER TABLE my_table_New DROP PRIMARY KEY, ADD PRIMARY KEY (c1, c2, c3, ...) ENGINE=TokuDB ROW_FORMAT=TOKUDB_SMALL;
INSERT INTO my_table_New SELECT * FROM my_table;
RENAME TABLE my_table TO my_table_Old, my_table_New TO my_table;
DROP TABLE my_table_Old;

The INSERT INTO … SELECT operation does use the bulk loader when you do it on an empty table. It completed within merely 30 hours. Hurrah!

DROPping a TABLE

It was an immediate operation to drop our “Old” table — subsecond. Nothing like your InnoDB DROP.

Converting an OLAP database to TokuDB, part 2: the process of migration
https://shlomi-noach.github.io/blog/mysql/converting-an-olap-database-to-tokudb-part-2-the-process-of-migration
Mon, 09 Sep 2013 03:29:30 +0000

This is the second in a series of posts describing our experience in migrating a large DWH server to TokuDB. This post discusses the process of migration itself.

As a quick recap (read part 1 here), we have a 2TB compressed InnoDB (4TB uncompressed) DWH server. Space is running low, and we’re looking at TokuDB for answers. Early experiments show that TokuDB’s compression could make a good impact on disk space usage. I’m still not discussing performance — keeping that for a later post.

Those with weak hearts can skip right to the end, where we finally have a complete conversion. You can also peek at the very end to find out how much 4TB uncompressed InnoDB data is worth in TokuDB. But you might want to read through. The process was not smooth, and not as expected (it’s a war story thing). Throughout the migration we got a lot of insight on TokuDB’s behaviour, limitations, conveniences, inconveniences and more.

Disclosure: I have no personal interests and no company interests; throughout the process we were in touch with Tokutek engineers, getting free, friendly & professional advice and providing input of our own. Most of this content has already been presented to Tokutek throughout the process. TokuDB is open source and free to use, though a commercial license is also available.

How do you convert 4TB worth of data to TokuDB?

Obviously one table at a time. But we had another restriction: you may recall I took a live slave for the migration process. And we wanted to end the process with a live slave. So the restriction was: keep it replicating!

How easy would that be? Based on our initial tests, I extrapolated over 20 days of conversion from InnoDB to TokuDB. Even with one table at a time, our largest table was expected to convert in some 12-14 days. Can we retain 14 days of binary logs on a server already running low on disk space? If only I knew then what I know today 🙂

I recently claimed (as I recall it was in one of the @DBHangops meetings) I was done with ALTER TABLE statements. I would not touch them again: with oak-online-alter-table and pt-online-schema-change I can get better control of my server (and my sleep). Can I use an online migration tool?

Fortunately we are using Statement Based Replication on this MySQL topology. This makes for good news, because triggers are activated on the slave as it replicates its master’s statements. You can run an online migration tool on the slave while it keeps replicating. This is so cool. I don’t need to worry too much about binary logs and relay logs. I can take my time with the conversion…

I chose to use pt-online-schema-change. Why choose Percona’s tool over my own? Percona’s tool supports the –check-slave-lag option, which throttles the operation should the server’s slaves start lagging. Wait, I’m running the tool on a slave, so what’s the point? Well, you can cheat and provide –check-slave-lag=h=127.0.0.1 so that the tool assumes the localhost is the slave (while it is actually the server being altered); which means it will check on its own slave lag to do the throttling. This works well and is fun to watch.
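The invocation looked roughly like this (schema and table names are hypothetical; the flags are pt-online-schema-change’s documented options):

```
pt-online-schema-change \
  --alter "ENGINE=TokuDB ROW_FORMAT=TOKUDB_SMALL" \
  --check-slave-lag h=127.0.0.1 \
  --execute \
  D=my_schema,t=my_table
```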

Starting the migration

Some of our tables had the KEY_BLOCK_SIZE explicitly declared. As I mentioned in previous post, for TokuDB <= 7.0.4 this causes problems by bloating the indexes instead of compressing them (and Tim Callaghan of Tokutek notes this is fixed in next version). common_schema to the rescue: the ALTER statement has to include a DROP and recreate of all indexes.

This is the place to mention our tables are heavily partitioned. This will play a crucial role in the next events. We use RANGE partitions over dates. We have a monthly partitioning scheme on all partitioned tables. And we have partitions to spare: starting a few years back from today (depending on the table) and made until Dec. 2025 — making for some 170 – 200 partitions per table.

Starting from smallest table (a few rows) and increasing in size, we migrated tables one by one to TokuDB.

Party crashers

A few party crashers were made obvious right from the start (they are all being addressed by Tokutek as far as I know):

  1. @@datadir: all TokuDB files reside in @@datadir. You get a pile of files in the same directory where you would find your ib_logfile*, master.info, etc. files.
  2. File names: you do not get file names matching table names. Instead, you get cryptic names like ./_myschema_sql_7fda_8e73_*. You would suspect there is some uniqueness to the 7fda_8e73 part, that it relates to a single table; it doesn’t. The same table gets different file names, different tables get similar names; there is no single regular expression to differentiate tables (and I do know my regexes).
  3. INFORMATION_SCHEMA.Tokudb_file_map doesn’t make it much easier, either. It is meant to tell you about table-to-file-name mappings. But the mappings are not laid out in easy TABLE_SCHEMA, TABLE_NAME columns; they are denormalized themselves, and can be vague and almost ambiguous, to some extent. common_schema to the rescue: its rewritten tokudb_file_map maps your tables to an aggregated list of file names, along with shell commands you would typically want to issue.

But let me emphasize these issues, before you think I’m just spoiled. TokuDB creates multiple files per table: other than the standard .frm file, you get one “main” file for each table, and then another file for each index. So it’s quite possible your table will consist of some 7 files.

Well, as you may know, MySQL’s implementation of partitioning is that each partition is made of its own standalone table, hidden from the user (but easily viewable on the filesystem). This means a single table with 170 partitions and a few indexes can make for over 1,000 files. That’s right – for a single table. We have a few dozen tables like this.

Now consider:

  • You may need to accommodate thousands or tens of thousands of files in your @@datadir
  • There is no easy way to tell which file relates to which table.

So common_schema‘s tokudb_file_map gives you this crazy list of 1,000 files which make up your single table. This isn’t too friendly, either.

I think Tokutek are missing here on one of the greatest advantages they bring to the table. The one thing a DBA wants to know when looking at her MySQL’s filesystem is: how much disk space is used by a schema/table? And this information becomes hard to get. Again, common_schema‘s view will provide you with the script to do it (du -c … | tail -1) — but you would have to go into MySQL, out to shell… Bother.
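As a sketch of what that generated script boils down to (the file names below are made up; the real list has to come from tokudb_file_map, since names do not follow table names):

```
du -c ./_myschema_main_a1b2_c3d4_1_19.tokudb ./_myschema_key_a1b2_c3d4_2_19.tokudb | tail -1
```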

Crashes and failures

The above did not stop at inconveniences. Soon enough, and while still altering my smaller tables, I would get failures from pt-online-schema-change. Checking manually to eliminate the possibility of a bug in Percona’s tool, I got:

mysql> alter table my_schema.my_table engine=tokudb row_format=tokudb_small;
ERROR 1016 (HY000): Can't open file: './my_schema/#sql-80d_2.frm' (errno: 24)

This would happen again and again and again. What’s the deal here?

Let me cut short on this one: we got over 20,000 files in @@datadir. And MySQL was unable to open any more files. Mind you, we had:

  • open_files_limit=30000
  • table_open_cache=16000
  • table_definition_cache=3000

Quite generous numbers (also backed by ulimit, to be on the safe side; and note we’re using the XFS file system). And yet, open files were an issue. To prove the point: it was possible to ALTER a table with a smaller number of partitions. It was then possible to ALTER another, smaller table without partitions. It was then impossible to alter any additional partitioned table. Once I dropped partitioning for some very small table, some room was made and I was again able to ALTER a partitioned table… And this would happen for completely empty tables — no data involved. We were hitting some file limit here.

Another such ALTER and the server crashed. This was quite unceremonious: the error log produced nothing; no stack trace. Zilch.

I was fervently querying the Tokudb_file_map to get a picture of what’s going on. I would need to do a self join on the table (as common_schema‘s view does) to get a per-table listing of files. This would occasionally crash the server. I guess I had 3 or 4 such crashes.

Recovery

On this I feel I’m quite the veteran by now :D. TokuDB recovery works well. As with InnoDB, TokuDB recognizes there has been a crash, and before allowing MySQL to open connections it restores itself to a stable state.

ALTER TABLE in TokuDB

Here I found two comforting features (the third and fourth yet to be discovered). As indicated above, I did turn to issue a manual ALTER TABLE. What I found was:

  1. A TokuDB ALTER TABLE statement lets you know its progress. This is no little thing! Your SHOW PROCESSLIST output shows messages like “Fetched about 312724000 rows, loading data still remains”, or “Loading of data about 66.1% done”, or “Queried about 33113001 rows, Inserted about 33113000 rows”. Cool!
  2. Even better, the crash I had during the ALTER TABLE? I thought that would be the end of it. If you ever had a crash while ALTERing an InnoDB table, you know how it goes: InnoDB will forever complain about some table existing but not listed (or the other way around). And don’t get me started with DISCARD TABLESPACE; when InnoDB decides it is upset about something – you cannot convince it otherwise.
    Thankfully, TokuDB completely reverted the ALTER operation. It removed what temporary files were created (further notes following) and forgot all about it. No complaints, no ghosts. Great! Back to consistency!

What do we do now?

Throwing my hands up in the air, having worked on this for many days, I thought to myself: OK, I still have this server all to myself. If TokuDB is not going to work out, I have some time to come up with a sharding/commercial solution. Let’s use up this time and learn something about TokuDB. So I decided to re-create all tables without partitions. My colleague argued that she was not ready to give up on partitioning altogether, and we decided to try again with a YEARly partitioning scheme. This would reduce the number of files by a factor of 12. Also, 2025 being so far away, we agreed to settle for 2015, reducing the number of files by a factor of 25-30 overall.
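A sketch of what the YEARly scheme looks like (table and column names here are hypothetical; note that in MySQL the partitioning column must be part of the table’s primary key):

```sql
-- Yearly RANGE partitions, ending at 2015 instead of 2025
ALTER TABLE my_large_table
PARTITION BY RANGE (TO_DAYS(event_date)) (
  PARTITION p2010 VALUES LESS THAN (TO_DAYS('2011-01-01')),
  PARTITION p2011 VALUES LESS THAN (TO_DAYS('2012-01-01')),
  PARTITION p2012 VALUES LESS THAN (TO_DAYS('2013-01-01')),
  PARTITION p2013 VALUES LESS THAN (TO_DAYS('2014-01-01')),
  PARTITION p2014 VALUES LESS THAN (TO_DAYS('2015-01-01')),
  PARTITION p2015 VALUES LESS THAN (TO_DAYS('2016-01-01')),
  PARTITION pmax  VALUES LESS THAN MAXVALUE
);
```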

And this made all the difference in the world. Reducing the number of files had the impact we were hoping for. Suddenly all went well: no crashes, no weird complaints, and little proliferation of files in @@datadir.

ALTER TABLE

And I did notice that a manual ALTER TABLE went considerably faster than I would expect. And by far faster than the pt-online-schema-change pace. I tried a couple more — sure thing. ALTERing a table from InnoDB to TokuDB is fast.

How fast?

  • I converted a 47GB InnoDB COMPRESSED table to TokuDB in 73 minutes. By the way, resulting table size measured 3.4GB.
  • A 330GB InnoDB COMPRESSED table converted to TokuDB in little over 9 hours. I dare you to alter 600GB worth of uncompressed data into InnoDB (COMPRESSED) in less than a few days. It went down to 31GB.
  • And our largest, 1TB COMPRESSED table (2TB worth of uncompressed data)? There’s yet another story here.

Altering 1TB (2TB uncompressed) of data

Here’s a tip that will save you some exhaustion: SET tokudb_load_save_space := 1.

While ALTERing our largest table, I was concerned to find our disk space running low. Plenty of temporary TokuDB files were created. I assumed these would consume only so much disk space, but to my surprise they accumulated and accumulated… It turns out that to ALTER a table, TokuDB creates the equivalent of the table in temporary files, and only then generates the new table. This means you need enough room for your original table, the equivalent in temporary files, and your new table, altogether.

With great compression that would be nothing. However, you’ll be surprised to learn that by default those temporary files are not compressed. Thus, the ALTER operation consumed more than 1.3TB of disk space in temporary files, until I had no choice and (36 hours into the operation) had to KILL it before it consumed the entire 3TB of disk space.

With the variable set as specified, the next attempt was far more successful: the temporary files were created with the same compression algorithm as the target table, which left us with plenty of free space to work with.

ALTER time took about 40 hours.

Well, what’s the resulting size?

And we were finally done! It took the better part of three weeks to work through all the pitfalls, the pt-online-schema-change attempts, the crashes, the tests, the no-partitions, the YEARly partitions… Finally we have a TokuDB version of our data warehouse.

The suspense is over. We got from 2TB of InnoDB COMPRESSED (KEY_BLOCK_SIZE=8) down to 200GB of TokuDB_SMALL (aka aggressive, aka lzma) tables.

I mean, this is beyond expectations. It is ridiculously small. From 80% disk space utilization down to 8% disk space utilization. Absolutely ridiculous!
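For the record, the arithmetic behind these compression factors, using the round figures from this series:

```python
# Round figures from this series, in GB:
# 4TB of uncompressed data, stored as 2TB of InnoDB COMPRESSED
# (KEY_BLOCK_SIZE=8), converted to 200GB of TokuDB_SMALL (lzma).
uncompressed_gb = 4000
innodb_8k_gb = 2000
tokudb_small_gb = 200

print(uncompressed_gb / tokudb_small_gb)  # 20.0: x20 vs uncompressed InnoDB
print(innodb_8k_gb / tokudb_small_gb)     # 10.0: x10 vs KEY_BLOCK_SIZE=8
```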

Conclusions

  • TokuDB does not play well with many partitions.
  • Crashes encountered. Recovery is fine.
  • Good ALTER TABLE experience
  • SET tokudb_load_save_space := 1
  • Great compression (x20 from uncompressed InnoDB; x10 from KEY_BLOCK_SIZE=8)

Next

In the following post I’ll share some observations on how well our newly converted TokuDB slave performs as compared to our equivalent InnoDB slaves; some configuration you might care about; and some things you can do with TokuDB that would be so very painful with InnoDB. Stay tuned!

Converting an OLAP database to TokuDB, part 1
https://shlomi-noach.github.io/blog/mysql/converting-an-olap-database-to-tokudb-part-1
Tue, 03 Sep 2013 07:04:12 +0000

This is the first in a series of posts describing my impressions of converting a large OLAP server to TokuDB. There’s a lot to tell, and the experiment is not yet complete, so this is ongoing blogging. In this post I will describe the case at hand and our initial reasons for looking at TokuDB.

Disclosure: I have no personal interests and no company interests; we did get friendly, useful and free advice from Tokutek engineers. TokuDB is open source and free to use, though a commercial license is also available.

The case at hand

We have a large and fast growing DWH MySQL setup. This data warehouse is but one component in a larger data setup, which includes Hadoop, Cassandra and more. For online dashboards and most reports, MySQL is our service. We populate this warehouse mainly via Hive/Hadoop. Thus, we have an hourly load of data from Hive, as well as a larger daily load.

There are some updates on the data, but the majority of writes are just mysqlimports of Hive queries.

Usage of this database is OLAP: no concurrency issues here; we have some should-be-fast-running queries issued by our dashboards, as well as ok-to-run-longer queries issued for reports.

Our initial and most burning trouble is with size. Today we use COMPRESSED InnoDB tables (KEY_BLOCK_SIZE is default, i.e. 8). Our data volume currently sums to about 2TB. I happen to know this translates to 4TB of uncompressed data.

However, data growth is accelerating. A year ago we would capture a dozen GB per month. Today it is 100GB per month, and by the end of this year it may climb to 150GB per month or more.

Our data is not sharded. We have a simple replication topology of some 6 servers. Machines are quite generous, as detailed below. And yet, we will run out of resources shortly: disk space (2.7TB total) is running low and is expected to run out in about six months. One of my first tasks at Outbrain is to find a solution to our DWH growth problem. The solution could be sharding; it could be a commercial DWH product; anything that works.

The approach we experiment with

It was at my initial interview that I suggested TokuDB might be a good solution, with the primary reason of being so good with compression. And we decided to experiment with this simple (setup-wise) solution of compression. If we could compress the data even by 50%, that would buy us considerable time. And it’s the simplest approach as we would need to change nothing at the application side, nor add additional frameworks.

Of course, we were already using InnoDB COMPRESSED tables. How about just improving the compression? And here I thought to myself: we can try KEY_BLOCK_SIZE=4, which I know would generally compress by 50% as compared to KEY_BLOCK_SIZE=8 (not always, but in many use cases). We’re already using InnoDB so this isn’t a new beast; it will be “more of the same”. It would work.
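For context, such a change is a plain table rebuild (the table name here is hypothetical):

```sql
-- Tighten InnoDB compression from the 8KB default to 4KB pages
ALTER TABLE my_table ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4;
```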

I got myself a dedicated machine: a slave in our production topology I am free to play with. I installed TokuDB 7.0.1, later upgraded to 7.0.3, based on MySQL 5.5.30.

The machine is a Dell PowerEdge R510, with 16 CPUs @ 2.1GHz, 126GiB RAM and 16GiB swap. The OS is CentOS 5.7, kernel 2.6.18. We have RAID 10 over 10 local 10K RPM SAS disks (10x600GB).

How to compare InnoDB & TokuDB?

2TB of compressed data (for absolute measurement I consider it to be a 4TB worth of data) is quite a large setup. How do I do the comparison? I don’t even have too much disk space here…

We have tables of various size. Our largest is in itself 1TB (2TB uncompressed) – half of the entire volume. The rest ranging 330GB, 140GB, 120GB, 90GB, 50GB and below. We have MONTHly partitioning schemes on most tables and obviously on our larger tables.

For our smaller tables, we could just CREATE TABLE test_table LIKE small_table, populate it, and compare compression. However, the really interesting question (and perhaps the only interesting question, compression-wise) is how well our larger (and specifically our largest) tables would compress.

Indeed, for our smaller tables we saw between 20% to 70% reduction in size when using stronger InnoDB compression: KEY_BLOCK_SIZE=4/2/1. How well would that work on our larger tables? How much slower would it be?

We know MySQL partitions are implemented as actual independent tables. Our testing approach was: let’s build a test_table from one month’s worth of data (== one single partition) of our largest table. We tested:

  • The time it takes to load the entire partition (about 120M rows, 100GB of COMPRESSED data as seen on the .ibd file)
  • The time it would take to load a single day’s worth of data from Hive/Hadoop (loading real data, as does our nightly import)
  • The time it would take for various important SELECT queries to execute on this data.

InnoDB vs. TokuDB comparison

In this post I will only describe our impressions of compression size. I have a lot to say about TokuDB vs InnoDB partitioning and queries; this will wait till later post.

So here goes:

| Engine | Compression             | Time to insert 1 month | Table size (optimized) | Time to import 1 day |
|--------|-------------------------|------------------------|------------------------|----------------------|
| InnoDB | 8k                      | 10.5h                  | 58GB                   | 32m                  |
| InnoDB | 4k                      | 48h                    | 33GB                   | unknown (too long)   |
| TokuDB | quicklz                 | 14h                    | 17GB                   | 40m                  |
| TokuDB | lzma (small/aggressive) | 15h                    | 7.5GB                  | 42m                  |

Some comments and insights:

  • Each test was performed 3-4 times. There were no significant differences on the various cycles.
  • The 1 month insert was done courtesy of QueryScript’s split, 5,000 rows at a time, no throttling.
  • The 1 day import was done via mysqlimport. Multiple files were imported; each file is sorted by PRIMARY KEY ASC.
  • Isn’t it nice to know that your 100GB InnoDB table actually fits within 58GB when rebuilt?
  • For InnoDB: innodb_flush_log_at_trx_commit=2, innodb_flush_method=O_DIRECT.
  • I used the default configuration for TokuDB — touched nothing. More on this in a later post.
  • InnoDB 4k was prohibitively slow to load data; so slow as to be unacceptable. For the 1 day load, it took 1 hour for a mere 20% of the data to load. 1 hour was already marginal for our requirements; waiting 5 hours was out of the question. I tested several times, and never waited for completion. Did I say it would just be “more of the same”? 4k turned out to be “not an option”.
  • I saw almost no difference in load time between the two TokuDB compression formats. Both took somewhat (~30%) longer than InnoDB 8k to load, but were comparable.
  • TokuDB compression: nothing short of amazing.
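The chunked load described above can also be illustrated without QueryScript; this is roughly what its split statement automates (table and column names are hypothetical, no throttling):

```sql
-- Copy rows in 5,000-row primary-key chunks; repeat, advancing the
-- range, until the source table is exhausted:
INSERT INTO test_table
  SELECT * FROM largest_table
  WHERE id >= 1 AND id < 5001;
-- next chunk: WHERE id >= 5001 AND id < 10001; and so on.
```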

With InnoDB 4k being “not an option”, and with both TokuDB compression formats being similar in load time yet so different in compression size, we are left with the following conclusion: if we want to compress more than our existing 8k (and we have to) – TokuDB’s aggressive compression (aka small, aka lzma) is our only option.

Shameless plug

common_schema turned out to be quite the “save the day” tool here. Not only did we use it to extract 100GB of data from a large dataset and load it onto our tables, it also helped out in the ALTER process for TokuDB: at this time (<= 7.0.4) TokuDB still has a bug with KEY_BLOCK_SIZE: when this option is found in the table definition, it bloats TokuDB’s indexes. This is how sql_alter_table_tokudb was born. Hopefully it will be redundant shortly.

More to come

Was our test fair? Should we have configured TokuDB differently? Is loading via small 5,000-row chunks the right way?

In the next post I will describe the process of migrating our 4TB worth of data to TokuDB: pitfalls, issues, party crashers, spoilsports, configuration, recovery, cool behaviour and general advice you will probably want to embrace. At a later stage I’ll describe how our DWH looks after migration. Finally, I’ll share some (ongoing) insights on performance.

You’ll probably want to know “How much is (non compressed) 4TB of data worth in TokuDB?” Let’s keep the suspense 🙂

Converting compressed InnoDB tables to TokuDB 7.0.1 https://shlomi-noach.github.io/blog/mysql/converting-compressed-innodb-tables-to-tokudb-7-0-1 https://shlomi-noach.github.io/blog/mysql/converting-compressed-innodb-tables-to-tokudb-7-0-1#comments Wed, 05 Jun 2013 07:10:12 +0000 https://shlomi-noach.github.io/blog/?p=6368 Or: how to make it work in TokuDB version 7.0.1. This is a follow up on a discussion on the tokudb-user group.

Background

I wanted to test TokuDB’s compression. I took a staging machine of mine, with production data, and migrated it from Percona Server 5.5 to MariaDB 5.5 + TokuDB 7.0.1. The migration went well; no problems.

To my surprise, when I converted tables from InnoDB to TokuDB, I saw an increase in table file size on disk. As explained by Tim Callaghan, this was due to TokuDB interpreting my compressed table’s “KEY_BLOCK_SIZE=4” as an instruction for TokuDB’s page size. TokuDB should be using a 4MB block size, but thinks it’s being instructed to use 4KB. Problem is, you can’t get rid of table options: when one converts a table to InnoDB in ROW_FORMAT=COMPACT, or even to MyISAM, the KEY_BLOCK_SIZE option keeps lurking in the dark.
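To illustrate the lurking option (per the behavior described above; output abbreviated and illustrative):

```sql
-- Convert a compressed table back to COMPACT row format;
-- the KEY_BLOCK_SIZE option survives in the table options:
ALTER TABLE t ENGINE=InnoDB ROW_FORMAT=COMPACT;
SHOW CREATE TABLE t \G
-- ... ) ENGINE=InnoDB ... ROW_FORMAT=COMPACT KEY_BLOCK_SIZE=4
```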

So until this is hopefully resolved in TokuDB’s next version, here’s a way to go around the problem.

The case at hand

Consider the following table:

 CREATE TABLE `t` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `c1` int(10) unsigned NOT NULL DEFAULT '0',
  `c2` int(10) unsigned NOT NULL DEFAULT '0',
  `c3` int(10) unsigned NOT NULL DEFAULT '0',
  `c4` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `c5` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `c6` smallint(10) unsigned NOT NULL DEFAULT '0',
  `c7` smallint(10) unsigned NOT NULL DEFAULT '0',
  `c8` smallint(10) unsigned NOT NULL DEFAULT '0',
  `c9` smallint(10) unsigned NOT NULL DEFAULT '0',
  `c10` smallint(10) unsigned NOT NULL DEFAULT '0',
  `c11` smallint(10) NOT NULL DEFAULT '0',
  `c12` smallint(10) NOT NULL DEFAULT '0',
  `c13` smallint(10) NOT NULL DEFAULT '0',
  `c14` smallint(10) NOT NULL DEFAULT '0',
  `ct` text NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `c1c4` (`c1`,`c4`),
  KEY `c4` (`c4`)
) ENGINE=InnoDB AUTO_INCREMENT=4688271 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4

Note that it is in COMPRESSED format, with KEY_BLOCK_SIZE=4. It mostly has INT columns, so I don’t expect it to compress by much.

On disk, the .ibd file amounts to 160MB. The table has 3,587,488 rows. The same table in InnoDB COMPACT row format amounts to 412MB on disk.
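File sizes here were measured directly on disk; a rough in-server alternative is to query information_schema (note that for compressed InnoDB tables these numbers are approximations):

```sql
SELECT table_name, engine, row_format,
       ROUND((data_length + index_length) / 1024 / 1024) AS total_mb
FROM information_schema.TABLES
WHERE table_schema = DATABASE() AND table_name = 't';
```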

Converting the table to TokuDB with aggressive compression resulted in:

mysql> alter table t engine=tokudb row_format=tokudb_lzma;
Query OK, 3587488 rows affected (29 min 48.79 sec)
Records: 3587488  Duplicates: 0  Warnings:

And over 873MB of combined files on disk! Also note it took nearly 30 minutes to ALTER. Clearly this is not the expected outcome.

Attempt to make it work

I tried both the following approaches:

  • alter table t engine=tokudb row_format=tokudb_lzma key_block_size=4096: I thought this would fool TokuDB into thinking it should create a 4MB key block size.
  • alter table t engine=tokudb row_format=tokudb_lzma key_block_size=0: try to reset the key block size.

Both of the above attempts resulted in the same bloat in the resulting table.

The reason? When ALTERing a table with another KEY_BLOCK_SIZE, the table’s keys retain their old KEY_BLOCK_SIZE; they are unaffected by the ALTER. As suggested by Nail Kashapov, the indexes must be rebuilt as well.

Making it work

The next ALTER modifies the KEY_BLOCK_SIZE and rebuilds all the indexes on the table:

mysql> alter table t drop primary key, add primary key(id), drop key c1c4, add unique key `c1c4` (c1, c4), drop key c4, add key `c4` (c4), engine=tokudb row_format=tokudb_lzma key_block_size=0;
Query OK, 3587488 rows affected (2 min 7.97 sec)
Records: 3587488  Duplicates: 0  Warnings: 0

Yep! The runtime seems much more agreeable. Total size on disk? A little over 26MB. Did I say I wasn’t expecting a good reduction in terms of compression?

I have done the same for multiple tables; compression is consistently strong (e.g. 16MB InnoDB compressed -> 3.5MB TokuDB aggressive; 548MB InnoDB non-compressed -> 36MB TokuDB aggressive), on varying table schemata. A very impressive reduction in disk space!

Conclusion

The next version of TokuDB is expected to ignore the KEY_BLOCK_SIZE table option; until then, converting compressed tables to TokuDB is a pain in terms of syntax — but worthwhile in terms of disk space.


On compiling TokuDB from source https://shlomi-noach.github.io/blog/mysql/on-compiling-tokudb-from-source https://shlomi-noach.github.io/blog/mysql/on-compiling-tokudb-from-source#comments Thu, 02 May 2013 05:14:55 +0000 https://shlomi-noach.github.io/blog/?p=6320 Sharing my experience of compiling TokuDB + MariaDB 5.5. Why? Because I must have this patch to Sphinx 2.0.4.

Note: I was using what seems to be the “old” method of compiling; quoting Leif Walsh:

… We are looking at deprecating that method of building (MariaDB source plus binary fractal tree handlerton).  It only really needed to be that complex when we were closed source.

I also tried the “new” method of compiling, which I couldn’t get to work.

Here’s how it goes: TokuDB is newly released as open source. As such, it has gotten a lot of attention and many downloads, and I hope it will succeed.

However stable the product may be, it’s new to open source, which means anyone compiling it from source is an early adopter (at least for the compilation process).

Installation process

This is an unorthodox, and actually weird, process. See section 6 in the Tokutek docs. In order to compile the project you must download:

  • The source code tar.gz
  • And the binary (?!) tar.gz
  • And the binary checksum
  • And the Tokutek patches
  • And the patches checksum

You extract the source tarball. But instead of doing the standard “./configure && make && sudo make install” you need to copy a shell script called tokudb.build.bash one directory level up, and run it from there.

tokudb.build.bash lists gcc47 and g++47 on lines 3 and 4. Modify “gcc47” to “gcc” and “g++47” to “g++”. I’m assuming you don’t have a binary called gcc47. Why would you?
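The edit itself is a sed one-liner; here it is demonstrated on a stand-in file, since the real tokudb.build.bash comes from the Tokutek tarball:

```shell
# Create a stand-in with the two offending compiler lines, then swap
# in the unversioned system compiler names:
printf 'CC=gcc47\nCXX=g++47\n' > tokudb.build.bash
sed -i 's/gcc47/gcc/g; s/g++47/g++/g' tokudb.build.bash
cat tokudb.build.bash
# CC=gcc
# CXX=g++
```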

Dependencies

You will need CMake >= 2.8

This means Ubuntu 10.04 LTS users are unable to compile out of the box; they will need to manually install a later version of CMake.

Also needed are zlib1g-dev and rpmbuild.

While compiling

I ran out of disk space. What? I was using the 10GB partition I use for my compilations. Looking at “df -h” I see that:

  • The source tarball is extracted (I did it)
  • The binary tarball is also extracted (someone has to explain this for me)
  • And inside the source directory we have:
bash$ df -h
...
1484    build.RelWithDebInfo.rpms
5540    build.RelWithDebInfo

At about 7GB (and counting) of build… stuff?

UPDATE: I just ran out of disk space again. Is this an incremental thing? Like every time my compilation fails and I recompile, some files are not cleaned up? If so, put them in /tmp! OK, moving everything to a 300GB partition and starting all over.

More while compiling

I got errors on missing libraries, like libssl and rpmbuild. This is what the “configure” script is for: to test for dependencies. It’s really a bummer to have to recompile 4-5 times (and it’s a long compilation), only to find out there’s another missing package.

After compiling

What is the result of the compilation? Not a “make install”-prepared binary. The result is a MySQL binary package. We need to extract it and put it under /usr/local/somewhere, etc.

Conclusions

The compilation process is unexpected and non-standard. The output is unexpected.

The correct way of doing this is a “./configure && make && sudo make install”. I don’t understand the need for a binary package while compiling from source. Isn’t this the chicken and the egg?

A source distribution is no different from a binary distribution: you must have a testing environment to verify the source distribution actually works. This test environment is typically a bare new RedHat or a bare new Ubuntu, etc. The machines at Tokutek already have the needed packages installed; not so on my compilation machine. I suggest that apt-get and yum installs for dependencies be added to the source distribution testing. This is the only reliable way for you guys at Tokutek to know that clients will actually be able to install from source.
