On compiling TokuDB from source

Sharing my experience of compiling TokuDB + MariaDB 5.5. Why? Because I must have this patch to Sphinx 2.0.4.

Note: I was using what seems to be the “old” method of compiling; quoting Leif Walsh:

… We are looking at deprecating that method of building (MariaDB source plus binary fractal tree handlerton).  It only really needed to be that complex when we were closed source.

I also tried the “new” method of compiling, which I couldn’t work out.

Here’s how it goes: TokuDB is newly released as open source. As such, it got a lot of attention, many downloads and I hope it will succeed.

However, as stable as the product may be, it’s new to open source, which means anyone compiling it from source is an early adopter (at least for the compilation process).

Installation process

This is an unorthodox, and frankly weird, process. See section 6 of the Tokutek docs. In order to compile the project you must download:

  • The source code tar.gz
  • And the binary (?!) tar.gz
  • And the binary checksum
  • And the Tokutek patches
  • And the patches checksum
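Since two of those downloads are checksums, it is worth actually checking them before building. A minimal sketch, assuming the checksum files hold MD5 sums with the hash as the first field (the docs referred to here do not say which algorithm is used, and verify_md5 is a hypothetical helper name, not something shipped by Tokutek):

```shell
#!/usr/bin/env bash
# verify_md5 FILE SUMFILE
# Succeeds only if the MD5 of FILE equals the first field of SUMFILE.
# Assumption: SUMFILE holds an MD5 hash as its first whitespace-separated field.
verify_md5() {
    local file="$1" sumfile="$2"
    local expected actual
    expected=$(awk '{print $1; exit}' "$sumfile")
    actual=$(md5sum "$file" | awk '{print $1}')
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}
```

Call it once per download, with your own file names, and stop if it fails, for example: verify_md5 source.tar.gz source.tar.gz.md5 || echo "checksum mismatch".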

You extract the source tarball. But instead of doing the standard “./configure && make && sudo make install” you need to copy a shell script called tokudb.build.bash one directory level up, and run it from there.

tokudb.build.bash lists gcc47 and g++47 on lines 3 and 4. Change “gcc47” to “gcc” and “g++47” to “g++”. I’m assuming you don’t have a binary called gcc47. Why would you?
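The compiler rename can be scripted rather than done by hand. A small sketch, assuming the strings gcc47 and g++47 appear nowhere else in the script in a way you want to keep (fix_compilers is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# fix_compilers SCRIPT
# Rewrites the versioned compiler names to the unversioned ones, in place.
# A backup of the original is kept next to it as SCRIPT.bak.
fix_compilers() {
    local script="$1"
    # In sed's basic regex syntax "+" is a literal character, so g++47 needs no escaping.
    sed -i.bak -e 's/gcc47/gcc/g' -e 's/g++47/g++/g' "$script"
}
```

Run it as fix_compilers tokudb.build.bash, then diff the result against the .bak copy to confirm only the compiler lines changed.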

Dependencies

You will need CMake >= 2.8

This means Ubuntu 10.04 LTS users cannot compile out of the box; they will need to manually install a later version of CMake.

Also needed are zlib1g-dev and rpmbuild.
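Before starting a long build it is cheap to check the CMake version up front. A rough sketch, assuming GNU sort (for its -V version ordering) and the usual “cmake version X.Y.Z” first line from cmake --version; version_ge and check_cmake are hypothetical helper names:

```shell
#!/usr/bin/env bash
# version_ge A B: succeeds when dotted version A is >= B.
# Relies on GNU sort -V: the smaller of the two versions sorts first.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# check_cmake [REQUIRED]: report whether the installed cmake is new enough.
# REQUIRED defaults to 2.8, the minimum mentioned above.
check_cmake() {
    local required="${1:-2.8}" found
    if ! command -v cmake >/dev/null 2>&1; then
        echo "cmake not found"
        return 1
    fi
    # The first line of output looks like "cmake version X.Y.Z".
    found=$(cmake --version | head -n1 | awk '{print $3}')
    if version_ge "$found" "$required"; then
        echo "cmake $found is recent enough (need >= $required)"
    else
        echo "cmake $found is too old (need >= $required)"
        return 1
    fi
}
```

Calling check_cmake 2.8 prints a one-line verdict and returns non-zero when CMake is missing or too old.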

While compiling

I ran out of disk space. What? I was using a 10G partition I use for my compilations. Looking at “df -h” I get that:

  • The source tarball is extracted (I did it)
  • The binary tarball is also extracted (someone has to explain this for me)
  • And inside the source directory we have:
bash$ df -h
...
1484    build.RelWithDebInfo.rpms
5540    build.RelWithDebInfo

At about 7GB (and counting) of build… stuff?
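To keep an eye on this while the build runs, the per-directory numbers can be collected with du. A minimal sketch, assuming the build directories sit inside the source directory and are named build.* as in the listing above (report_build_usage is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# report_build_usage [SRC_DIR]
# Prints the size in megabytes of every build.* entry under SRC_DIR,
# smallest first. Prints nothing if there are no such entries.
report_build_usage() {
    local src_dir="${1:-.}"
    du -sm "$src_dir"/build.* 2>/dev/null | sort -n
}
```

Running it every few minutes from another terminal shows whether the directories keep growing across failed attempts.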

UPDATE: just ran out of disk space again. Is this an incremental thing? As in, every time my compilation fails and I recompile, some files are not cleaned up? If so, put them on /tmp! OK, moving everything to a 300GB partition and starting all over.

More while compiling

I got errors about missing libraries and tools: I was missing libssl and rpmbuild, for example. This is what the “configure” script is for: to test for dependencies. It’s really a bummer to have to recompile 4-5 times (and it’s a long compilation), only to find out there’s another missing package.
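Lacking a configure step, a tiny preflight check can at least catch missing tools before the long compilation starts. A sketch only: missing_tools is a hypothetical helper, and it checks for executables on the PATH, so it will not detect missing libraries or headers such as libssl or zlib.

```shell
#!/usr/bin/env bash
# missing_tools TOOL...
# Prints the names of the given tools that are not found on the PATH,
# separated by spaces. Prints an empty line when everything is present.
missing_tools() {
    local tool missing=""
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
    done
    # Drop the leading space left by the first append.
    echo "${missing# }"
}
```

For this build, something like missing_tools cmake gcc g++ rpmbuild would list whatever still needs installing.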

After compiling

What is the result of the compilation? Not a “make install” prepared binary. The result is a MySQL binary package, so you need to extract it and put it under /usr/local/somewhere, etc.
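Unpacking such a package into a prefix can look roughly like this. It is a sketch under assumptions: the package is a gzipped tarball with a single top-level directory, and both the tarball name and the target prefix are yours to choose (install_tarball is a hypothetical helper name).

```shell
#!/usr/bin/env bash
# install_tarball TARBALL PREFIX
# Extracts TARBALL into PREFIX, dropping the archive's top-level directory
# so that bin/, lib/, etc. end up directly under PREFIX.
install_tarball() {
    local tarball="$1" prefix="$2"
    mkdir -p "$prefix"
    tar -xzf "$tarball" -C "$prefix" --strip-components=1
}
```

Writing under /usr/local will normally require root, and the usual post-extraction MySQL setup steps still apply.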

Conclusions

The compilation process is unexpected and non-standard. The output is unexpected.

The correct way of doing this is a “./configure && make && sudo make install”. I don’t understand the need for a binary package while compiling from source. Isn’t this the chicken and the egg?

A source distribution is no different from a binary distribution: you must have a testing environment to verify that it actually works. This test environment is typically a bare, freshly installed RedHat or Ubuntu, etc. The machines at Tokutek already have the needed packages installed. Not so on my compilation machine. I suggest that the apt-get and yum installs for dependencies be added to the source distribution testing. This is the only reliable way for you guys at Tokutek to know that clients will actually be able to install from source.

14 thoughts on “On compiling TokuDB from source”

  1. Hello Shlomi,
    Thanks for the feedback on the build process. Sorry for the unneeded complexity. We are working to make this much simpler in the next point release.

  2. Shlomi, I hate to be critical, but this whole blog posting smells like pilot error.

    I don’t think it makes much sense to start out knowing that we support only build from git, and then to complain that the old closed-source way of compiling doesn’t work. Yes the old closed-source way needed the fractal tree library binary, because you couldn’t build it from source. No you shouldn’t use that any more. As Leif told you.

    Since mysql uses cmake, why would we add autotools?

    We clearly state our build dependencies in the README and Leif apparently pointed them out to you.

    You don’t need to build rpms, which seem to take quite a bit of space to build. Build a tarball so you can build on one machine and install on another. You can always type “make install” instead of “make package” if you simply want to directly install.

    It’s true that without installing some new tools, you may have trouble building tokudb on a 3-year old OS (the desktop version of which will no longer be supported as of late next week.) I can suggest only that you either install the right tools or upgrade to a more recent OS.

  3. Hi Bradley,

    Very good that you’re being critical. Here are a few thoughts:

    – I’m perfectly OK to define this as a pilot error: but this is my pilot error, and might be the next person’s pilot error. I’ve used the online documentation to work out the build process; the new compilation method is not described in the online documentation, so this is what I had to work with.

    – Sure, I don’t need to build RPMs; is there a “--skip-rpm” flag to pass so that the compilation process does not terminate with error? Since the compilation did exit with error, I had no way of telling I’m through with all necessary steps.

    – I’m using Ubuntu Server 10.04, which I believe is a common version to be found. It’s fine if you do not support it; but it’s common. I wasn’t accusing you (and I get that I may have sounded negative, I apologize for that) of not supporting said version. It’s valuable information for those trying to achieve the same, or for the future me.

    – With regard to “We clearly state our build dependencies in the README”, please note I’ve listed a couple of packages that were not listed in your README; perhaps it would be good to add them.

    – With regard to “knowing that we support only build from git”, I did not know that, actually. See, I followed this link: http://www.tokutek.com/resources/support/gadownloads/ to download “MariaDB 5.5.30 sources (patched for TokuDB)”. This is titled “The following two downloads are only necessary if you want to build MySQL from source.”. I understand it as “this is the source package you want to use if you want to compile TokuDB from source”. I’m happy to stand corrected.

    I am actually very enthusiastic about using TokuDB; I have some good reasons to try it out on production; if all works well, I will write on that as well. I don’t mean to offend or do damage; yes, I have been critical. Again, apologies. Hopefully I have presented some insight.

    Regards,
    Shlomi

  4. Hi Shlomi,

    Thanks for the feedback so far. I think the compilation process will become clearer soon. For now, the best documentation is in the readmes in the github repo: http://github.com/Tokutek/ft-engine/blob/master/README.md . This describes building using the make.mysql.bash script, which is what the rest of my advice applies to.

    The script has a “--build_rpm=0” parameter, which is now the default. This will skip building rpms.

    Running on Ubuntu 10.04 is certainly fine, and it should work. Building without a C++11 compiler or a recent cmake is not going to work though, sorry. We describe the build requirements in the readme, I’ll add a note about cmake 2.8.8, thanks for the catch. You shouldn’t need rpmbuild unless you’re trying to build rpms.

    The organization we have in github is very new, and we’re still hunting down all the old cruft, which includes the old way we built source tarballs. The next release should make this a lot simpler, even if you just download the source tarball (it would basically skip all the git cloning and get you straight to the end build stage).

    About having the build install binaries directly to the filesystem: I actually think this is a bad idea, but I’m willing to be convinced. Most people that want to patch the build and run it will probably build on one machine and then install on other machines for evaluation or production (and if you’re running Ubuntu 10.04, I expect you should be in this category too). Having the default target be “make package” makes this a lot easier. You can always go to the build directory and “make install” yourself, that’s the beauty of having the source code available.

    Anyway, thanks for the input as we iron out this workflow.

  5. Hi Leif,

    With regard to “make install”: great, but I never got to see a Makefile in the installation process. There is none. There are plenty of Makefile instances in inner directories, but no one Makefile for the entire project.
    Of course if you generate a Makefile then a “make install” is wonderful. That’s what I was suggesting myself: to have “./configure && make && make install” (the last can be skipped).

    Thanks for the link! Will look more closely into it. Question: since it builds directly from Git, does that mean I get a source code which is modified constantly? Or does it get a particular version?
