Implementing non re-entrant functions in Golang
Thu, 04 Jan 2018 11:34:13 +0000

A non re-entrant function is a function that can only be executing once at any point in time, regardless of how many times it is invoked and by how many goroutines.

This post illustrates blocking non re-entrant functions and yielding non re-entrant functions implementations in golang.

A use case

A service is polling for some conditions, monitoring some statuses once per second. We want each status to be checked independently of others without blocking. An implementation might look like:

import "time"

func main() {
    tick := time.Tick(time.Second)
    go func() {
        for range tick {
            go CheckSomeStatus()
            go CheckAnotherStatus()
        }
    }()
    // ... rest of the service runs here; main must not return for the checks to keep firing ...
}

We choose to run each status check in its own goroutine so that CheckAnotherStatus() doesn’t wait upon CheckSomeStatus() to complete.

Each of these checks typically takes a very short amount of time, much less than a second. What happens, though, if CheckAnotherStatus() itself takes more than one second to run? Perhaps there’s an unexpected network or disk latency affecting the execution time of the check.

Does it make sense for the function to be executed twice at the same time? If not, we want it to be non re-entrant.

Blocking, non re-entrant functions

The simple way to prevent a function from operating multiple times concurrently is using a sync.Mutex.

Assuming we only care about calling this function from the loop above, we can implement the lock from outside the function:

import (
    "sync"
    "time"
)

func main() {
    tick := time.Tick(time.Second)
    var mu sync.Mutex
    go func() {
        for range tick {
            go CheckSomeStatus()
            go func() {
                mu.Lock()
                defer mu.Unlock()
                CheckAnotherStatus()
            }()
        }
    }()
    // ... rest of the service runs here; main must not return for the checks to keep firing ...
}

The above ensures CheckAnotherStatus() is not executed by multiple iterations of our loop concurrently. Any subsequent iteration of the loop that fires while a previous execution of CheckAnotherStatus() is still running will block on the mutex.

The blocking solution has the following properties:

  • It ensures exactly as many `CheckAnotherStatus()` executions as loop iterations.
  • Should one execution of `CheckAnotherStatus()` stall, subsequent iterations make for a pileup of requests to invoke the same function.

Yielding, non re-entrant functions

In our status check story, it may not make sense for some 10 subsequent calls to pile up. Once the stalled CheckAnotherStatus() execution completes, all 10 calls suddenly execute, sequentially, and possibly all complete within the next second, making for 10 identical checks in that same second.

Another solution would be to yield. A yielding solution will:

  • Abort execution of `CheckAnotherStatus()` if it is already being executed.
  • Run at most one execution of `CheckAnotherStatus()` per second.
  • Possibly run fewer `CheckAnotherStatus()` invocations than the number of loop iterations.

The solution is achieved via:

import (
    "sync/atomic"
    "time"
)

func main() {
    tick := time.Tick(time.Second)
    var reentranceFlag int64
    go func() {
        for range tick {
            go CheckSomeStatus()
            go func() {
                if atomic.CompareAndSwapInt64(&reentranceFlag, 0, 1) {
                    defer atomic.StoreInt64(&reentranceFlag, 0)
                } else {
                    return
                }
                CheckAnotherStatus()
            }()
        }
    }()
    // ... rest of the service runs here; main must not return for the checks to keep firing ...
}

atomic.CompareAndSwapInt64(&reentranceFlag, 0, 1) returns true only when reentranceFlag == 0, and in that case atomically sets it to 1. Entry is then allowed and the function executes. reentranceFlag is kept at 1 until CheckAnotherStatus() completes, at which time it is reset to 0. When CompareAndSwapInt64(...) returns false, reentranceFlag != 0, meaning the function is already being executed by another goroutine. The code yields and silently returns.

We have chosen to implement the non re-entrant code outside the function in question; we could have implemented it within the function itself. Also, I have a thing for int64. An int32 will of course suffice.

Observations on the hashicorp/raft library, and notes on RDBMS
Tue, 20 Jun 2017 04:05:39 +0000

The hashicorp/raft library is a Go library that provides consensus via an implementation of the Raft protocol. It is the underlying library behind Hashicorp’s Consul.

I’ve had the opportunity to work with this library on a couple of projects, namely freno and orchestrator. Here are a few observations on working with this library:

  • TL;DR on Raft: a group communication protocol; multiple nodes communicate, elect a leader. A leader leads a consensus (any subgroup of more than half the nodes of the original group, or hopefully all of them). Nodes may leave and rejoin, and will remain consistent with consensus.
  • The hashicorp/raft library is an implementation of the Raft protocol. There are other implementations, and different implementations support different features.
  • The most basic premise is leader election. This is pretty straightforward to implement; you set up nodes to communicate to each other, and they elect a leader. You may query for the leader identity via Leader(), VerifyLeader(), or observing LeaderCh.
  • You have no control over the identity of the leader. You cannot “prefer” one node to be the leader. You cannot grab leadership from an elected leader, and you cannot demote a leader other than by killing it.
  • The next premise is gossip, sending messages between the raft nodes. With hashicorp/raft, only the leader may send messages to the group. This is done via the Apply() function.
  • Messages are nothing but blobs. Your app encodes a message into []byte and ships it via raft. The receiving ends need to decode the bytes back into a meaningful message.
  • You will check the result of Apply(), an ApplyFuture. The call to Error() will wait for consensus.
  • Just what is a message consensus? It’s a guarantee that the consensus of nodes has received and registered the message.
  • Messages form the raft log.
  • Messages are guaranteed to be handled in-order across all nodes.
  • The leader is satisfied when the followers receive the messages/log, but it cares not for their interpretation of the log.
  • The leader does not collect the output, or return value, of the followers applying of the log.
  • Consequently, your followers may not abort the message. They may not cast an opinion. They must adhere to the instruction received from the leader.
  • hashicorp/raft uses either an LMDB-based store or BoltDB for persisting your messages. Both are transactional stores.
  • Messages are expected to be idempotent: a node that, say, happens to restart will request to rejoin the consensus (or to form a consensus with some other node). To do that, it will have to reapply historical messages it may have already applied in the past.
  • The number of messages (log entries) will grow indefinitely. Snapshots are taken so as to truncate the log history. You will implement the snapshot dump & load.
  • A snapshot includes the log index up to which it covers.
  • Upon startup, your node will look for the most recent snapshot. It will read it, then resume replication from the aforementioned log index.
  • hashicorp/raft provides a file-system based snapshot implementation.

One of my use cases is completely satisfied with the existing implementations of BoltDB and of the filesystem snapshot.

However, in another (orchestrator), my app stores its state in a relational backend. To that effect, I’ve modified the log store and the snapshot store. I’m using either MySQL or SQLite as backend stores for my app. How does that affect my raft use?

  • My backend RDBMS is the de-facto state of my orchestrator app. Anything written to this DB is persisted and durable.
  • When orchestrator applies a raft log/message, it runs some app logic which ends with a write to the backend DB. At that time, the raft log is effectively not required anymore to persist. I care not for the history of logs.
  • Moreover, I care not for snapshotting. To elaborate, I care not for snapshot data. My backend RDBMS is the snapshot data.
  • Since I’m running an RDBMS, I find BoltDB to be wasteful: an additional transactional store on top of a transactional store I already have.
  • Likewise, the filesystem snapshots are yet another form of store.
  • Log Store (including Stable Store) are easily re-implemented on top of RDBMS. The log is a classic relational entity.
  • Snapshot is also implemented on top of the RDBMS; however, I only care for the snapshot metadata (what log entry is covered by a snapshot) and completely discard storing/loading snapshot state or content.
  • With all these in place, I have a single entity that defines:
    • What my data looks like
    • Where my node fares in the group gossip
  • A single RDBMS restore returns a dataset that will catch up with raft log correctly. However my restore window is limited by the number of snapshots I store and their frequency.
Forking Golang repositories on GitHub and managing the import path
Mon, 23 Nov 2015 12:22:34 +0000

Problem: there’s an awesome Golang project on GitHub which you want to fork. You want to develop & collaborate on that fork, but the golang import path in your source code still references the original path, breaking everything.

A couple of solutions are offered below. First, though, let’s establish some names.

A sample case, the problem at hand

There’s an awesome tool on http://github.com/awesome-org/tool. You successfully fork it onto http://github.com/awesome-you/tool.

You want to collaborate on http://github.com/awesome-you/tool; you wish to pull, commit & push. Maybe you want to send pull requests to the origin.

The following is commonly found throughout .go files in the repository:

import (
    "github.com/awesome-org/tool/config"
    "github.com/awesome-org/tool/driver"
    "github.com/awesome-org/tool/net"
    "github.com/awesome-org/tool/util"
)

If you:

go get github.com/awesome-you/tool

golang creates your $GOPATH/src/github.com/awesome-you/tool/, which is awesome. However, as you resolve dependencies via

cd $GOPATH/src/github.com/awesome-you/tool/ ; go get ./...

golang digs into the source code, finds references to github.com/awesome-org/tool/config, github.com/awesome-org/tool/driver etc., and fetches those from http://github.com/awesome-org/tool and onto $GOPATH/src/github.com/awesome-org/tool/, which is not awesome. You actually have two copies of the code, one from your fork, one from the origin, and your own fork will be largely ignored as it mostly points back to the origin.

A bad solution

The dirty, bad solution would be for you to go over the source code and replace “github.com/awesome-org/tool” entries with “github.com/awesome-you/tool”. It is bad for two reasons:

  • You will not be able to further pull changes from upstream
  • You will not be able to pull-request and push your own changes upstream

When I say “You will not be able” I mean “in a reasonable, developer-friendly manner”. The code will be incompatible with upstream and you have effectively detached your code. You will need to keep editing and re-editing those entries anytime you wish to pull/push upstream.

Solution #1: add remote

As described in “GitHub and Go: forking, pull requests, and go-getting”, the procedure is:

go get github.com/awesome-org/tool
cd $GOPATH/src/github.com/awesome-org/tool
git remote add awesome-you-fork https://github.com/awesome-you/tool

You’re adding your repository as a remote. You will from now on need to explicitly:

git pull --rebase awesome-you-fork
git push awesome-you-fork

If you forget to add the “awesome-you-fork” argument, you are pulling and pushing from upstream.

Solution #2: cheat “go get”, DIY

The problem began with the go get command, which copied the URI path onto $GOPATH/src. However, go get implicitly issues a git clone, and we can do the same ourselves. We will dirty our hands just once, and then benefit from an unambiguous environment.

We will now create our git repository in the name of awesome-org but with the contents of awesome-you:

cd $GOPATH
mkdir -p {src,bin,pkg}
mkdir -p src/github.com/awesome-org/
cd src/github.com/awesome-org/
git clone git@github.com:awesome-you/tool.git # OR: git clone https://github.com/awesome-you/tool.git
cd tool/
go get ./...

The mkdir -p {src,bin,pkg} is there just in case you do not have anything set up in your $GOPATH. We then create the repository path under the name of awesome-org, but once inside clone from awesome-you.

The source code’s import path fits your directory layout now, but as you push/pull you are only speaking to your own awesome-you repository.
