{"id":7717,"date":"2017-06-20T06:05:39","date_gmt":"2017-06-20T04:05:39","guid":{"rendered":"http:\/\/code.openark.org\/blog\/?p=7717"},"modified":"2017-06-20T06:05:39","modified_gmt":"2017-06-20T04:05:39","slug":"observations-on-the-hashicorpraft-library-and-notes-on-rdbms","status":"publish","type":"post","link":"https:\/\/code.openark.org\/blog\/mysql\/observations-on-the-hashicorpraft-library-and-notes-on-rdbms","title":{"rendered":"Observations on the hashicorp\/raft library, and notes on RDBMS"},"content":{"rendered":"<p>The <a href=\"https:\/\/github.com\/hashicorp\/raft\">hashicorp\/raft<\/a> library is a Go library to provide consensus via Raft protocol implementation. It is the underlying library behind Hashicorp&#8217;s <a href=\"https:\/\/github.com\/hashicorp\/consul\">Consul<\/a>.<\/p>\n<p>I&#8217;ve had the opportunity to work with this library a couple projects, namely <a href=\"https:\/\/github.com\/github\/freno\">freno<\/a> and <a href=\"https:\/\/github.com\/github\/orchestrator\/pull\/183\/\">orchestrator<\/a>. Here are a few observations on working with this library:<\/p>\n<ul>\n<li>TL;DR on Raft: a group communication protocol; multiple nodes communicate, elect a leader. A leader leads a <em>consensus<\/em> (any subgroup of more than half the nodes of the original group, or hopefully all of them). Nodes may leave and rejoin, and will remain consistent with consensus.<\/li>\n<li>The hashicorp\/raft library is an implementation of the Raft protocol. There are <a href=\"https:\/\/raft.github.io\/#implementations\">other implementations<\/a>, and different implementations support different features.<\/li>\n<li>The most basic premise is leader election. This is pretty straightforward to implement; you set up nodes to communicate to each other, and they elect a leader. 
You may query for the leader's identity via <a href="https://godoc.org/github.com/hashicorp/raft#Raft.Leader">Leader()</a>, <a href="https://godoc.org/github.com/hashicorp/raft#Raft.VerifyLeader">VerifyLeader()</a>, or by observing <a href="https://godoc.org/github.com/hashicorp/raft#Raft.LeaderCh">LeaderCh</a>.</li>
<li>You have no control over the identity of the leader. You cannot "prefer" one node to be the leader. You cannot <em>grab</em> leadership from an elected leader, and you cannot demote a leader except by killing it.</li>
<li>The next premise is gossip: sending messages between the raft nodes. With <code>hashicorp/raft</code>, only the leader may send messages to the group. This is done via the <a href="https://godoc.org/github.com/hashicorp/raft#Raft.Apply">Apply()</a> function.</li>
<li>Messages are nothing but blobs. Your app encodes a message into <code>[]byte</code> and ships it via raft. Receiving ends need to decode the bytes back into a meaningful message.</li>
<li>You will check the result of Apply(), an <a href="https://godoc.org/github.com/hashicorp/raft#ApplyFuture">ApplyFuture</a>. The call to <a href="https://godoc.org/github.com/hashicorp/raft#Future">Error()</a> will wait for consensus.</li>
<li>Just what is consensus on a message? It's a guarantee that a majority of the nodes has received and registered the message.</li>
<li>Messages form the raft log.</li>
<li>Messages are guaranteed to be handled in order across all nodes.</li>
<li>The leader is satisfied once the followers receive the messages/log, but it cares not for their interpretation of the log.</li>
<li>The leader does not collect the output, or return value, of the followers' applying of the log.</li>
<li>Consequently, your followers may not abort a message. They may not cast an opinion.
They must adhere to the instruction received from the leader.</li>
<li><code>hashicorp/raft</code> uses either an <a href="http://github.com/hashicorp/raft-mdb">LMDB-based</a> store or <a href="https://github.com/boltdb/bolt">BoltDB</a> for persisting your messages. Both are transactional stores.</li>
<li>Messages are expected to be idempotent: a node that, say, happens to restart will request to rejoin the consensus (or to form a consensus with some other node). To do that, it will have to reapply historical messages that it may already have applied in the past.</li>
<li>The number of messages (log entries) grows indefinitely. Snapshots are taken so as to truncate the log history. You will implement the snapshot dump &amp; load.</li>
<li>A snapshot includes the log index up to which it covers.</li>
<li>Upon startup, your node will look for the most recent snapshot. It will read it, then resume replication from the aforementioned log index.</li>
<li><code>hashicorp/raft</code> provides a file-system based snapshot implementation.</li>
</ul>
<p>One of my use cases is completely satisfied with the existing implementations of <code>BoltDB</code> and of the filesystem snapshot.</p>
<p>However, in another (<code>orchestrator</code>), my app stores its state in a relational backend. To that effect, I've modified the log store and the snapshot store. I'm using either MySQL or <code>sqlite</code> as backend stores for my app. How does that affect my <code>raft</code> use?</p>
<ul>
<li>My backend RDBMS is the de-facto state of my <code>orchestrator</code> app. Anything written to this DB is persisted and durable.</li>
<li>When <code>orchestrator</code> applies a raft log/message, it runs some app logic which ends with a write to the backend DB. At that point, the raft log is effectively no longer required for persistence.
I care not for the history of logs.</li>
<li>Moreover, I care not for snapshotting. To elaborate: I care not for snapshot <em>data</em>. My backend RDBMS <em>is</em> the snapshot data.</li>
<li>Since I'm already running an RDBMS, I find <code>BoltDB</code> to be wasteful: an additional transactional store on top of a transactional store I already have.</li>
<li>Likewise, the filesystem snapshots are yet another form of store.</li>
<li>The log store (including the stable store) is <a href="https://github.com/github/orchestrator/blob/222e5b55ee51c89c39b2876c774364baecc01878/go/raft/rel_store.go">easily re-implemented</a> on top of an RDBMS. The log is a classic relational entity.</li>
<li>Snapshotting is <a href="https://github.com/github/orchestrator/blob/222e5b55ee51c89c39b2876c774364baecc01878/go/raft/rel_snapshot.go">also implemented</a> on top of an RDBMS; however, I only care for the snapshot metadata (what log entry is covered by a snapshot) and completely discard storing/loading snapshot <em>state</em> or <em>content</em>.</li>
<li>With all these in place, I have a single entity that defines:
<ul>
<li>What my data looks like</li>
<li>Where my node stands in the group gossip</li>
</ul>
</li>
<li>A single RDBMS restore yields a dataset that will catch up with the raft log correctly. However, my restore window is limited by the number of snapshots I keep and their frequency.</li>
</ul>