{"id":6613,"date":"2013-10-23T19:42:12","date_gmt":"2013-10-23T17:42:12","guid":{"rendered":"http:\/\/code.openark.org\/blog\/?p=6613"},"modified":"2013-10-24T07:27:22","modified_gmt":"2013-10-24T05:27:22","slug":"tokudb-configuration-variables-of-interest","status":"publish","type":"post","link":"https:\/\/code.openark.org\/blog\/mysql\/tokudb-configuration-variables-of-interest","title":{"rendered":"TokuDB configuration variables of interest"},"content":{"rendered":"<p>During our experiments I came upon a few TokuDB variables of interest; if you are using TokuDB you might want to look into these:<\/p>\n<ul>\n<li>\n<h4>tokudb_analyze_time<\/h4>\n<\/li>\n<\/ul>\n<p style=\"padding-left: 30px;\">This is a boundary on the number of seconds an <strong>ANALYZE TABLE<\/strong> will operate on each index on each partition on a TokuDB table.<\/p>\n<p style=\"padding-left: 30px;\">That is, if <strong>tokudb_analyze_time = 5<\/strong>, and your table has <strong>4<\/strong> indexes (including <strong>PRIMARY<\/strong>) and <strong>7<\/strong> partitions, then the total runtime is limited to <strong>5*4*7 = 140<\/strong> seconds.<\/p>\n<p style=\"padding-left: 30px;\">Default in <strong>7.1.0<\/strong>: <strong>5<\/strong> seconds<\/p>\n<ul>\n<li>\n<h4>tokudb_cache_size<\/h4>\n<\/li>\n<\/ul>\n<p style=\"padding-left: 30px;\">Similar to <strong>innodb_buffer_pool_size<\/strong>, this variable sets the amount of memory allocated by TokuDB for caching pages. Like InnoDB the table is clustered within the index, so the cache includes pages for both indexes and data.<\/p>\n<p style=\"padding-left: 30px;\">Default: <strong>50%<\/strong> of total memory<\/p>\n<ul>\n<li>\n<h4>tokudb_directio<\/h4>\n<\/li>\n<\/ul>\n<p style=\"padding-left: 30px;\">Boolean, values are <strong>0\/1<\/strong>. Setting <strong>tokudb_directio = 1<\/strong> is like specifying <strong>innodb_flush_method = O_DIRECT<\/strong>. Which in turn means the OS should not cache pages requested by TokuDB. 
Default: <strong>0<\/strong>.<\/p>\n<p style=\"padding-left: 30px;\">Now here&#8217;s the interesting part: we are used to telling InnoDB to take as much memory as we can provide (because we want it to cache as much as it can) and to avoid OS caching (because that would mean a page would appear both in the buffer pool and in OS memory, which is a waste). So the following setup is common:<!--more--><\/p>\n<blockquote style=\"padding-left: 30px;\">\n<pre style=\"padding-left: 30px;\"><strong>innodb_buffer_pool_size<\/strong> = [as much as you can allocate while leaving room for connection memory]G\r\n<strong>innodb_flush_method<\/strong> = O_DIRECT<\/pre>\n<\/blockquote>\n<p style=\"padding-left: 30px;\">And my first instinct was to do the same for TokuDB. But after speaking to Gerry Narvaja of Tokutek, I realized it was not that simple. The reason TokuDB&#8217;s default memory allocation is <strong>50%<\/strong> and not, say, <strong>90%<\/strong>, is that the OS cache holds the data in compressed form, while the TokuDB cache holds it in uncompressed form. So if you limit the TokuDB cache, you leave more memory to the OS cache, which holds compressed data, which means <em>more data<\/em> (hopefully, pending duplicates) in memory.<\/p>\n<p style=\"padding-left: 30px;\">I did try both options and did not see an obvious difference, but did not test this thoroughly. My current setup is:<\/p>\n<blockquote style=\"padding-left: 30px;\">\n<pre style=\"padding-left: 30px;\"><strong>#No setup. Just keep to the default for both:<\/strong>\r\n#tokudb_cache_size\r\n#tokudb_directio<\/pre>\n<\/blockquote>\n<ul>\n<li>\n<h4>tokudb_commit_sync<\/h4>\n<\/li>\n<\/ul>\n<ul>\n<li>\n<h4>tokudb_fsync_log_period<\/h4>\n<\/li>\n<\/ul>\n<p style=\"padding-left: 30px;\">These two variables are similar in essence to <strong>innodb_flush_log_at_trx_commit<\/strong>, but allow for finer tuning. 
With <strong>innodb_flush_log_at_trx_commit<\/strong> you choose between syncing the transaction log to disk upon each commit and once per second. With <strong>tokudb_commit_sync = 1<\/strong> (the default) the transaction log is synced to disk on every commit. When <strong>tokudb_commit_sync = 0<\/strong>, <strong>tokudb_fsync_log_period<\/strong> dictates the interval, in milliseconds, between flushes. So a value of <strong>tokudb_fsync_log_period = 1000<\/strong> means once per second.<\/p>\n<p style=\"padding-left: 30px;\">Since our original InnoDB installation used <strong>innodb_flush_log_at_trx_commit = 2<\/strong>, our TokuDB setup is:<\/p>\n<blockquote style=\"padding-left: 30px;\">\n<pre style=\"padding-left: 30px;\"><strong>tokudb_commit_sync<\/strong> = 0\r\n<strong>tokudb_fsync_log_period<\/strong> = 1000<\/pre>\n<\/blockquote>\n<ul>\n<li>\n<h4>tokudb_load_save_space<\/h4>\n<\/li>\n<\/ul>\n<p style=\"padding-left: 30px;\">Turned on (value <strong>1<\/strong>) by default as of TokuDB <strong>7.1.0<\/strong>, this parameter decides whether temporary files created during bulk load operations (e.g. ALTER TABLE) are compressed or uncompressed. Do yourself a big favour (why? <a href=\"http:\/\/code.openark.org\/blog\/mysql\/converting-an-olap-database-to-tokudb-part-2-the-process-of-migration\">read here<\/a>) and keep it on. Our setup is:<\/p>\n<blockquote>\n<pre><strong>tokudb_load_save_space<\/strong> = 1<\/pre>\n<\/blockquote>\n<p>TokuDB&#8217;s general recommendation is: don&#8217;t change the variables; the engine should work well right out of the box. I like the approach (by MySQL <strong>5.5<\/strong> I had already lost count of InnoDB variables that can have noticeable impact; with <strong>5.6<\/strong> I&#8217;m all but lost). 
The complete list of configuration variables is found in <a href=\"http:\/\/www.tokutek.com\/wp-content\/uploads\/2013\/10\/mysql-5.5.30-tokudb-7.1.0-users-guide.pdf\">TokuDB&#8217;s Users Guide<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>During our experiments I came upon a few TokuDB variables of interest; if you are using TokuDB you might want to look into these: tokudb_analyze_time This is a boundary on the number of seconds an ANALYZE TABLE will operate on each index on each partition on a TokuDB table. That is, if tokudb_analyze_time = 5, [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"enabled":false},"version":2}},"categories":[5],"tags":[11,14,52,102],"class_list":["post-6613","post","type-post","status-publish","format-standard","hentry","category-mysql","tag-configuration","tag-innodb","tag-performance","tag-tokudb"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p2bZZp-1IF","_links":{"self":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/6613","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/users\/2"}],"repli
es":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/comments?post=6613"}],"version-history":[{"count":17,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/6613\/revisions"}],"predecessor-version":[{"id":6642,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/6613\/revisions\/6642"}],"wp:attachment":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/media?parent=6613"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/categories?post=6613"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/tags?post=6613"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}