{"id":7180,"date":"2015-01-26T17:50:46","date_gmt":"2015-01-26T15:50:46","guid":{"rendered":"http:\/\/code.openark.org\/blog\/?p=7180"},"modified":"2016-03-04T13:43:18","modified_gmt":"2016-03-04T11:43:18","slug":"reading-rbr-binary-logs-with-pt-query-digest","status":"publish","type":"post","link":"https:\/\/code.openark.org\/blog\/mysql\/reading-rbr-binary-logs-with-pt-query-digest","title":{"rendered":"Reading RBR binary logs with pt-query-digest"},"content":{"rendered":"<p>For purposes of auditing anything that goes on our servers we&#8217;re looking to parse the binary logs of all servers (masters), as with &#8220;<a href=\"http:\/\/code.openark.org\/blog\/mysql\/anemomaster-dml-visibility-your-must-do-for-tomorrow\">Anemomaster<\/a>&#8220;. With Row Based Replication this is problematic since <strong>pt-query-digest<\/strong> <a href=\"https:\/\/bugs.launchpad.net\/percona-toolkit\/+bug\/1377887\">does not support parsing RBR binary logs<\/a> (true for <strong>2.2.12<\/strong>, latest at this time).<\/p>\n<p>I&#8217;ve written a simple script that translates RBR logs to SBR-like logs, with a little bit of cheating. My interest is that <strong>pt-query-digest<\/strong> is able to capture and count the queries, nothing else. By doing some minimal text manipulation on the binary log I&#8217;m able to now feed it to <strong>pt-query-digest<\/strong> which seems to be happy.<\/p>\n<p>The script of course does not parse the binary log directly; furthermore, it requires the binary log to be extracted via:<\/p>\n<blockquote>\n<pre class=\"brush: bash; title: ; notranslate\" title=\"\">mysqlbinlog --verbose --base64-output=DECODE-ROWS your-mysql-binlog-filemame.000001<\/pre>\n<\/blockquote>\n<p>The above adds the interpretation of the RBR entires in the form of (unconventional) statements, commented, and strips out the cryptic RBR text. All that is left is to do a little manipulation on entry headers and uncomment the interpreted queries.<\/p>\n<p>The script can be found in <a href=\"https:\/\/gist.github.com\/shlomi-noach\/cc243fd690403e7617e3\">my gist repositories<\/a>. Current version is as follows:<!--more--><\/p>\n<blockquote>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\n#!\/usr\/bin\/python\r\n#\r\n# Convert a Row-Based-Replication binary log to Statement-Based-Replication format, cheating a little.\r\n# This script exists since Percona Toolkit's pt-query-digest cannot digest RBR format. The script\r\n# generates enough for it to work with.\r\n# Expecting standard input\r\n# Expected input is the output of &quot;mysqlbinlog --verbose --base64-output=DECODE-ROWS &lt;binlog_file_name&gt;&quot;\r\n# For example:\r\n# $ mysqlbinlog --verbose --base64-output=DECODE-ROWS mysql-bin.000006 | python binlog-rbr-to-sbr.py | pt-query-digest --type=binlog --order-by Query_time:cnt --group-by fingerprint\r\n#\r\n\r\nimport fileinput\r\n\r\ndef convert_rbr_to_pseudo_sbr():\r\n    inside_rbr_statement = False\r\n    for line in fileinput.input():\r\n        line = line.strip()\r\n        if line.startswith(&quot;#&quot;) and &quot;end_log_pos&quot; in line:\r\n            for rbr_token in &#x5B;&quot;Update_rows:&quot;, &quot;Write_rows:&quot;, &quot;Delete_rows:&quot;, &quot;Rows_query:&quot;, &quot;Table_map:&quot;,]:\r\n                if rbr_token in line:\r\n                    line = &quot;%s%s&quot; % (line.split(rbr_token)&#x5B;0], &quot;Query\\tthread_id=1\\texec_time=0\\terror_code=0&quot;)\r\n        if line.startswith(&quot;### &quot;):\r\n            inside_rbr_statement = True\r\n            # The &quot;### &quot; commented rows are the pseudo-statement interpreted by mysqlbinlog's &quot;--verbose&quot;,\r\n            # and which we will feed into pt-query-digest\r\n            line = line&#x5B;4:]\r\n        else:\r\n            if inside_rbr_statement:\r\n                print(&quot;\/*!*\/;&quot;)\r\n            inside_rbr_statement = False\r\n        print(line) \r\n\r\nconvert_rbr_to_pseudo_sbr()\r\n<\/pre>\n<\/blockquote>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>For purposes of auditing anything that goes on our servers we&#8217;re looking to parse the binary logs of all servers (masters), as with &#8220;Anemomaster&#8220;. With Row Based Replication this is problematic since pt-query-digest does not support parsing RBR binary logs (true for 2.2.12, latest at this time). I&#8217;ve written a simple script that translates RBR [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"enabled":false},"version":2}},"categories":[5],"tags":[25,96,31,8,50],"class_list":["post-7180","post","type-post","status-publish","format-standard","hentry","category-mysql","tag-analysis","tag-percona-toolkit","tag-python","tag-replication","tag-scripts"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p2bZZp-1RO","_links":{"self":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/7180","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/comments?post=7180"}],"version-history":[{"count":9,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/7180\/revisions"}],"predecessor-version":[{"id":7547,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/7180\/revisions\/7547"}],"wp:attachment":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/media?parent=7180"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/categories?post=7180"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/tags?post=7180"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}