¹¹institutetext: Faculty of Computer Science and Business Information Systems
Karlsruhe University of Applied Sciences
Moltkestrasse 30, 76133 Karlsruhe, Germany
¹¹email: martin.sulzmann@hs-karlsruhe.de
¹¹email: kai.stadtmueller@live.de

Trace-Based Run-Time Analysis of Message-Passing Go Programs

Martin Sulzmann and Kai Stadtmüller

Abstract

We consider the task of analyzing message-passing programs by observing their run-time behavior. We introduce a purely library-based instrumentation method to trace communication events during execution. A model of the dependencies among events can be constructed to identify potential bugs. Compared to the vector clock method, our approach is much simpler and has in general a significant lower run-time overhead. A further advantage is that we also trace events that could not commit. Thus, we can infer more alternative communications. This provides the user with additional information to identify potential bugs. We have fully implemented our approach in the Go programming language and provide a number of examples to substantiate our claims.

1 Introduction

We consider run-time analysis of programs that employ message-passing. Specifically, we consider the Go programming language [4] which integrates message-passing in the style of Communicating Sequential Processes (CSP) [6] into a C style language. We assume the program is instrumented to trace communication events that took place during program execution. Our objective is to analyze program traces to assist the user in identifying potential concurrency bugs.

Motivating Example

⬇

func reuters(ch chan string) { ch <- "REUTERS" } // r!

func bloomberg(ch chan string) { ch <- "BLOOMBERG" } // b!

func newsReader(rCh chan string, bCh chan string) {

ch := make(chan string)

go func() { ch <- (<-rCh) }() // r?; ch!

go func() { ch <- (<-bCh) }() // b?; ch!

x := <-ch // ch?

}

func main() {

reutersCh := make(chan string)

bloombergCh := make(chan string)

go reuters(reutersCh)

go bloomberg(bloombergCh)

go newsReader(reutersCh, bloombergCh) // N1

newsReader(reutersCh, bloombergCh) // N2

}

Listing 1: Message passing in Go

In Listing 1 we find a Go program implementing a system of newsreaders. The main function creates two synchronous channels, one for each news agency. Go supports (a limited form of) type inference and therefore no type annotations are required. Next, we create one thread per news agency via the keyword go. Each news agency transmits news over its own channel. In Go, we write ch <- "REUTERS" to send value "REUTERS" via channel ch. We write <-ch to receive a value via channel ch. As we assume synchronous channels, both operations block and only unblock once a sender finds a matching receiver. We find two newsreader instances. Each newsreader creates two helper threads that wait for news to arrive and transfer any news that has arrived to a common channel. The intention is that the newsreader wishes to receive any news whether it be from Reuters or Bloomberg. However, there is a subtle bug (to be explained shortly).

Trace-Based Run-Time Verification

We only consider finite program runs and therefore each of the news agencies supplies only a finite number of news (exactly one in our case) and then terminates. During program execution, we trace communication events, e.g. send and receive, that took place. Due to concurrency, a bug may not manifest itself because a certain ‘bad’ schedule is rarely taken in practice.

Here is a possible trace resulting from a ‘good’ program run.

r!; N1.r?; N1.ch!; N1.ch?; b!; N2.b?; N2.ch!; N2.ch?

We write r! to denote that a send event via the Reuters channel took place. As there are two instances of the newsReader function, we write N1.r? to denote that a receive event via the local channel took place in case of the first newsReader call. From the trace we can conclude that the Reuters news was consumed by the first newsreader and the Bloomberg news by the second newsreader.

Here is a trace resulting from a bad program run.

r!; b!; N1.r?; N1.b?; N1.ch!; N1.ch?; DEADLOCK

The helper thread of the first newsreader receives the Reuters and the Bloomberg news. However, only one of these messages will actually be read (consumed). This is the bug! Hence, the second newsreader gets stuck and we encounter a deadlock. The issue is that such a bad program run may rarely show up. So, the question is how can we assist the user based on the trace information resulting from a good program run? How can we infer that alternative schedules and communications may exist?

Event Order via Vector Clock Method

A well-established approach is to derive a partial order among events. This is usually achieved via a vector of (logical) clocks. The vector clock method was independently developed by Fidge [1] and Mattern [8]. For the above good program run, we obtain the following partial order among events.

r! < N1.r?         b! < N2.b?
N1.r? < N1.ch!     N2.b? < N2.ch!  (1)
N1.ch! < N1.ch?    N2.ch! < N2.ch? (2)

For example, (1) arises because N2.ch! happens (sequentially) after N2.b? For synchronous send/receive, we assume that receive happens after send. See (2). Based on the partial order, we can conclude that alternative schedules are possible. For example, b! could take place before r!. However, it is not clear how to infer alternative communications. Recall that the issue is that one of the newsreaders may consume both news messages. Our proposed method is able to clearly identify this issue and has the advantage to require a much simpler instrumentation We discuss these points shortly. First, we take a closer look at the details of instrumentation for the vector clock method.

Vector clocks are a refinement of Lamport’s time stamps [7]. Each thread maintains a vector of (logical) clocks of all participating partner threads. For each communication step, we advance and synchronize clocks. In pseudo code, the vector clock instrumentation for event sndR.

⬇

vc[reutersThread]++

ch <- ("REUTERS", vc, vcCh)

vc’ := max(vc, <-vcCh)’

We assume that vc holds the vector clock. The clock of the Reuters thread is incremented. Besides the original value, we transmit the sender’s vector clock and a helper channel vcCh. For convenience, we use tuple notation. The sender’s vector clock is updated by building the maximum among all entries of its own vector clock and the vector clock of the receiving party. The same vector clock update is carried out on the receiver side.

Our Method

We propose a much simpler instrumentation and tracing method to obtain a partial order among events. Instead of a vector clock, each thread traces the events that might happen and have happened. We refer to them as pre and post events. In pseudo code, our instrumentation for sndR looks like follows.

⬇

pre(hash(ch), "!")

ch <- ("REUTERS", threadId)

post(hash(ch), "!")

The bang symbol (‘!’) indicates a send operation. Function hash builds a hash index of channel names. The sender transmits its thread id number to the receiver. This is the only intra-thread overhead. No extra communication link is necessary.

Here are the traces for individual threads resulting from the above good program run.

R:           pre(r!); post(r!)
N1_helper1:  pre(r?); post(R#r?); pre(ch1!); post(ch1!)
N1_helper2:  pre(b?)
N1:          pre(ch1?); post(N1_helper1#ch1?)
B:           pre(b!); post(b!)
N2_helper1:  pre(r?)
N2_helper2:  pre(b?); post(B#b?); pre(ch2!); post(ch2!)
N2:          pre(ch2?); post(N2_helper2#ch2?)

We write pre(r!) to indicate that a send via the Reuters channel might happen. We write post(R#r?) to indicate that a receive has happened via thread R. The partial order among events is obtained by a simple post-processing phase where we linearly scan through traces. For example, within a trace there is a strict order and therefore

N2_helper2:  pre(b?); post(B#b?); pre(ch2!); post(ch2!)

implies N2.b? < N2.ch!. Across threads we check for matching pre/post events. Hence,

R:           pre(r!); post(r!)
N1_helper1:  pre(r?); post(R#r?); ...

implies r! < N1.r?. So, we obtain the same (partial order) information as the vector clock approach but with less overhead.

The reduction in terms of tracing overhead compared to the vector clock method is rather drastic assuming a library-based tracing scheme with no access to the Go run-time system. For each communication event we must exchange vector clocks, i.e. $n$ additional (time stamp) values need to be transmitted where $n$ is the number of threads. Besides extra data to be transmitted, we also require an extra communication link because the sender requires the receivers vector clock. In contrast, our method incurs a constant tracing overhead. Each sender transmits in addition its thread id. No extra communication link is necessary. This results in much less run-time overhead as we will see later.

The vector clock tracing method can be improved assuming we extend the Go run-time system. For example, by maintaining a per-thread vector clock and having the run-time system carrying out the exchange of vector clocks for each send/receive communication. There is still the $O(n)$ space overhead. Our method does not require any extension of the Go run-time system to be efficient and therefore is also applicable to other languages that offer similar features as found in Go.

A further advantage of our method is that we also trace (via pre) events that could not commit (post is missing). Thus, we can easily infer alternative communications. For example, for R: pre(r!); ... there is the alternative match N2_helper1: pre(r?). Hence, instead of r! < N1.r? also r! < N2.r? is possible. This indicates that one newsreader may consume both news message. The vector clock method, only traces events that could commit, post events in our notation. Hence, the above alternative communication could not be derived.

Contributions

Compared to earlier works based on the vector clock method, we propose a much more light-weight and more informative instrumentation and tracing scheme. Specifically, we make the following contributions:

•

We give a precise account of our run-time tracing method (Section 3) for message-passing as found in the Go programming language (Section 2) where for space reasons we only formalize the case of synchronous channels and selective communications.
•

A simple analysis of the resulting traces allows us to detect alternative schedules and communications (Section 4). For efficiency reasons, we employ a directed dependency graph to represent happens-before relations (Section 4.1).
•

We show that vector clocks can be easily recovered based on our tracing method (Section 5). We also discuss the pros and cons of both methods for analysis purposes.
•

Our tracing method can be implemented efficiently as a library. We have fully implemented the approach supporting all Go language features dealing with message-passing such as buffered channels, select with default or timeout and closing of channels (Section 6).
•

We provide experimental results measuring the often significantly lower overhead of our method compared to the vector clock method assuming based methods are implemented as libraries (Section 6.2).

The online version of this paper contains an appendix with further details.¹¹1https://arxiv.org/abs/1709.01588

2 Message-Passing Go

Syntax

For brevity, we consider a much simplified fragment of the Go programming language. We only cover straight-line code, i.e. omitting procedures, if-then-else etc. This is not an onerous restriction as we only consider finite program runs. Hence, any (finite) program run can be represented as a program consisting of straight-line code only.

Definition 1 (Program Syntax)

\begin{array}[]{lcll}x,y,&\dots&&\mbox{Variables, Channel Names}\\ i,j,&\dots&&\mbox{Integers}\\ b&::=&x\mid i\mid{\sf hash}(x)\mid{\sf head}(b)\mid{\sf last}(b)\mid\textit{bs}\mid\mbox{\sf{tid}}&\mbox{Expressions}\\ \textit{bs}&::=&[]\mid b:\textit{bs}\\ e,f&::=&x\leftarrow b\mid y:=\leftarrow x&\mbox{Transmit/Receive}\\ c&::=&y:=b\mid y:=\mbox{\sf makeChan}\mid\mbox{\sf go}\ p\mid\mbox{\sf select}\ [e_{i}\Rightarrow p_{i}]_{i\in I}&\mbox{Commands}\\ p,q,r&::=&[]\mid c:p&\mbox{Program}\end{array}

For our purposes, values are integers or lists (slices in Go terminology). For lists we follow Haskell style notation and write $b:bs$ to refer to a list with head element $b$ and tail $bs$ . We can access the head and last element in a list via primitives ${\sf head}$ and ${\sf last}$ . We often write $[b_{1},\dots,b_{n}]$ as a shorthand $b_{1}:\dots:[]$ . Primitive tid yields the thread id number of the current thread. We assume that the main thread always has thread id number $1$ and new thread id numbers are generated in increasing order. Primitive ${\sf hash}()$ yields a unique hash index for each variable name. Both primitives show up in our instrumentation.

A program is a sequence of commands where commands are stored in a list. Primitive makeChan creates a new synchronous channel. Primitive go creates a new go routine (thread). For send and receive over a channel we follow Go notation. We assume that a receive is always tied to an assignment. For assignment we use symbol $:=$ to avoid confusion with the mathematical equality symbol $=$ . In Go, symbol $:=$ declares a new variable with some initial value. We also use $:=$ to overwrite the value of existing variables. As a message passing command we only support selective communication via select. Thus, we can fix the bug in our newsreader example.

⬇

func newsReaderFixed(rCh chan string, bCh chan string) {

ch := make(chan string)

select {

case x := <-rCh:

case x := <-bCh:

}

The select statement guarantees that at most one news message will be consumed and blocks if no news are available. In our simplified language, we assume that the $x\leftarrow b$ command is a shorthand for $\mbox{\sf select}\ [x\leftarrow b\Rightarrow[]]$ . For space reasons, we omit buffered channels, select paired with a default/timeout case and closing of channels. All three features are fully supported by our implementation.

Trace-Based Semantics

The semantics of programs is defined via a small-step operational semantics. The semantics keeps track of the trace of channel-based communications that took place. This allows us to relate the traces obtained by our instrumentation with the actual run-time traces.

We support multi-threading via a reduction relation

(S,[i_{1}\sharp p_{1},\dots,i_{n}\sharp p_{n}])\ext@arrow 0359\Rightarrowfill@{}{T}(S^{\prime},[j_{1}\sharp q_{1},\dots,j_{n}\sharp q_{n}]).

We write $i\sharp p$ to denote a program $p$ that runs in its own thread with thread id $i$ . We use lists to store the set of program threads. The state of program variables, before and after execution, is recorded in $S$ and $S^{\prime}$ . We assume that threads share the same state. Program trace $T$ records the sequence of communications that took place during execution. We write $x!$ to denote a send operation on channel $x$ and $x?$ to denote a receiver operation on channel $x$ . The semantics of expressions is defined in terms a big-step semantics. We employ a reduction relation $(i,S)\,\vdash\,b\Downarrow v$ where $S$ is the current state, $b$ the expression and $v$ the result of evaluating $b$ . The formal details follow.

Definition 2 (State)

\begin{array}[]{lcll}v&::=&x\mid i\mid[]\mid\textit{vs}&\mbox{Values}\\ \textit{vs}&::=&[]\mid v:\textit{vs}\\ s&::=&v\mid\mathit{Chan}&\mbox{Storables}\\ S&::=&()\mid(x\mapsto s)\mid S\lhd S&\mbox{State}\end{array}

A state $S$ is either empty, a mapping, or an override of two states. Each state maps variables to storables. A storable is either a plain value or a channel. Variable names may appear as values. In an actual implementation, we would identify the variable name by a unique hash index. We assume that mappings in the right operand of the map override operator $\lhd$ take precedence. They overwrite any mappings in the left operand. That is, $(x\mapsto v_{1})\lhd(x\mapsto v_{2})=(x\mapsto v_{2})$ .

Definition 3 (Expression Semantics $(i,S)\,\vdash\,b\Downarrow v$ )

\begin{array}[]{c}{\begin{array}[]{c}S(x)=v\\ \hline\cr(i,S)\,\vdash\,x\Downarrow v\end{array}}\ \ \ \ (i,S)\,\vdash\,j\Downarrow j\ \ \ \ (i,S)\,\vdash\,[]\Downarrow[]\ \ \ \ {\begin{array}[]{c}(i,S)\,\vdash\,b\Downarrow v\ (i,S)\,\vdash\,\textit{bs}\Downarrow\textit{vs}\\ \hline\cr(i,S)\,\vdash\,b:\textit{bs}\Downarrow v:\textit{vs}\end{array}}\\ {\begin{array}[]{c}(i,S)\,\vdash\,b\Downarrow v:\textit{vs}\\ \hline\cr(i,S)\,\vdash\,{\sf head}(b)\Downarrow v\end{array}}\ \ \ \ {\begin{array}[]{c}(i,S)\,\vdash\,b\Downarrow[v_{1},\dots,v_{n}]\\ \hline\cr(i,S)\,\vdash\,{\sf last}(b)\Downarrow v_{n}\end{array}}\ \ \ \ (i,S)\,\vdash\,\mbox{\sf{tid}}\Downarrow i\ \ \ \ (i,S)\,\vdash\,{\sf hash}(x)\Downarrow x\end{array}

Definition 4 (Program Execution $(S,P)\ext@arrow 0359\Rightarrowfill@{}{T}(S^{\prime},Q)$ )

\begin{array}[]{lcll}i\sharp p&&&\mbox{Single program thread}\\ P,Q&::=&[]\mid i\sharp p:P&\text{Program threads}\\ t&:=&i\sharp x!\mid i\leftarrow j\sharp x?&\text{Send and receive event}\\ T&::=&[]\mid t:T&\mbox{Trace}\end{array}

We write $(S,P)\ext@arrow 0359\Rightarrowfill@{}{}(S^{\prime},Q)$ as a shorthand for $(S,P)\ext@arrow 0359\Rightarrowfill@{}{[]}(S^{\prime},Q)$ .

Definition 5 (Single Step)

\begin{array}[]{c}\mbox{(Terminate)}\ (S,i\sharp[]:P)\ext@arrow 0359\Rightarrowfill@{}{}(S,P)\\ \\ \mbox{(Assign)}\ {\begin{array}[]{c}(i,S)\,\vdash\,b\Downarrow v\ \ \ S^{\prime}=S\lhd(y\mapsto v)\\ \hline\cr(S,i\sharp(y:=b:p):P)\ext@arrow 0359\Rightarrowfill@{}{}(S^{\prime},i\sharp p:P)\end{array}}\\ \\ \mbox{(MakeChan)}\ {\begin{array}[]{c}S^{\prime}=S\lhd(y\mapsto\mathit{Chan}\\ \hline\cr(S,i\sharp(y:=\mbox{\sf makeChan}:p):P)\ext@arrow 0359\Rightarrowfill@{}{}(S^{\prime},i\sharp p:P)\end{array}}\end{array}

Definition 6 (Multi-Threading and Synchronous Message-Passing)

\begin{array}[]{c}\mbox{(Go)}\ {\begin{array}[]{c}i\not\in\{i_{1},\dots,i_{n}\}\\ \hline\cr(S,i_{1}\sharp(\mbox{\sf go}\ p:p_{1}):P)\ext@arrow 0359\Rightarrowfill@{}{}(S,i\sharp p:i_{1}\sharp p_{1}:P)\end{array}}\\ \\ \mbox{(Sync)}\ {\begin{array}[]{c}\exists l\in J,m\in K.e_{l}=x\leftarrow b\ \ \ f_{m}=y:=\leftarrow x\ \ \ S(x)=\mathit{Chan}\\ (i_{1},S)\,\vdash\,b\Downarrow v\ \ \ S^{\prime}=S\lhd(y\mapsto v)\\ \hline\cr(S,i_{1}\sharp(\mbox{\sf select}\ [e_{j}\Rightarrow q_{j}]_{j\in J}:p_{1}):i_{2}\sharp(\mbox{\sf select}\ [f_{k}\Rightarrow r_{k}]_{k\in K}:p_{2}):P)\\ \ext@arrow 0359\Rightarrowfill@{}{[i_{1}\sharp x!,i_{2}\leftarrow i_{1}\sharp x?]}\\ (S^{\prime},i_{1}\sharp(q_{l}\ \texttt{++}\ p_{1}):i_{2}\sharp(r_{m}\ \texttt{++}\ p_{2}):P)\end{array}}\end{array}

Definition 7 (Scheduling)

\begin{array}[]{c}\mbox{(Schedule)}\ {\begin{array}[]{c}\mbox{$\pi$ permutation on $\{1,\dots,n\}$}\\ \hline\cr(S,[i_{1}\sharp p_{1},\dots,i_{n}\sharp p_{n}])\ext@arrow 0359\Rightarrowfill@{}{}(S,[\pi(i_{1})\sharp p_{\pi(1)},\dots,\pi(i_{n})\sharp p_{\pi(n)}])\end{array}}\\ \\ \mbox{(Closure)}\ {\begin{array}[]{c}(S,P)\ext@arrow 0359\Rightarrowfill@{}{T}(S^{\prime},P^{\prime})\ \ \ (S^{\prime},P^{\prime})\ext@arrow 0359\Rightarrowfill@{}{T^{\prime}}(S^{\prime\prime},P^{\prime\prime})\\ \hline\cr(S,P)\ext@arrow 0359\Rightarrowfill@{}{T\ \texttt{++}\ T^{\prime}}(S^{\prime\prime},P^{\prime\prime})\end{array}}\end{array}

3 Instrumentation and Run-Time Tracing

For each message passing primitive (send/receive) we log two events. In case of send, (1) a pre event to indicate the message is about to be sent, and (2) a post event to indicate the message has been sent. The treatment is analogous for receive. In our instrumentation, we write $x!$ to denote a single send event and $x?$ to denote a single receive event. These notations are shorthands and can be expressed in terms of the language described so far. We use $\equiv$ to define short-forms and their encodings. We define $x!\equiv[{\sf hash}(x),1]$ and $x?\equiv[{\sf hash}(x),0]$ . That is, send is represented by the number $1$ and receive by the number $0$ .

As we support non-deterministic selection, we employ a list of pre events to indicate that one of several events may be chosen For example, $\mathit{pre}([x!,y?])$ indicates that there is the choice among sending over channel $x$ and receiving over channel $y$ . This is again a shorthand notation where we assume $\mathit{pre}([b_{1},\dots,b_{n}])\equiv[0,b_{1},\dots,b_{n}]$ .

A post event is always singleton as at most one of the possible communications is chosen. As we also trace communication partners, we assume that the sending party transmits its identity, the thread id, to the receiving party. We write $\mathit{post}(i\sharp x?)$ to denote reception via channel $x$ where the sender has thread id $i$ . In case of a post send event, we simply write $\mathit{post}(x!)$ . The above are yet again shorthands where $i\sharp x?\equiv[{\sf hash}(x),0,i]$ and $\mathit{post}(b)\equiv[1,b]$ .

Pre and post events are written in a fresh thread local variable, denoted by $x_{tid}$ where $tid$ refers to the thread’s id number. At the start of the thread the variable is initialized by $x_{tid}:=[]$ . Instrumentation ensures that pre and post events are appropriately logged. As we keep track of communication partners, we must also inject and project messages with additional information (the sender’s thread id).

We consider instrumentation of $\mbox{\sf select}\ [x\leftarrow 1\Rightarrow[],y:=\leftarrow x\Rightarrow[z\leftarrow y]].$ We assume the above program text is part of a thread with id number $1$ . We non-deterministically choose between a send an receive operation. In case of receive, the received value is further transmitted. Instrumentation yields the following.

\begin{array}[]{l}[x_{1}:=x_{1}\ \texttt{++}\ \mathit{pre}([x!,x?]),\\ \ \mbox{\sf select}\ [x\leftarrow[\mbox{\sf{tid}},1]\Rightarrow[x_{1}:=x_{1}\ \texttt{++}\ \mathit{post}(x!)],\\ \ \ \ \ \ \ \ \ \ \ y^{\prime}:=\leftarrow x\Rightarrow[x_{1}:=x_{1}\ \texttt{++}\ \mathit{post}({\sf head}(y^{\prime})\sharp x?),y:={\sf last}(y^{\prime}),\\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ z\leftarrow[\mbox{\sf{tid}},y]]]\end{array}

We first store the pre events, either a read or send via channel $x$ . The send is instrumented by additionally transmitting the senders thread id. The post event for this case simply logs that a send took place. Instrumentation of receive is slightly more involved. As senders supply their thread id, we introduce a fresh variable $y^{\prime}$ . Via ${\sf head}(y^{\prime})$ we extract the senders thread id to properly record the communication partner in the post event. The actual value transmitted is accessed via ${\sf last}(y^{\prime})$ .

Definition 8 (Instrumentation of Programs)

We write $\mathit{instr}(p)=q$ to denote the instrumentation of program $p$ where $q$ is the result of instrumentation. Function $\mathit{instr}(\cdot)$ is defined by structural induction on a program. We assume a similar instrumentation function for commands.

\begin{array}[]{lcl}\mathit{instr}([])&=&[]\\ \mathit{instr}(c:p)&=&\mathit{instr}(c):\mathit{instr}(p)\\ \\ \mathit{instr}(y:=b)&=&[y:=b]\\ \mathit{instr}(y:=\mbox{\sf makeChan})&=&[y:=\mbox{\sf makeChan}]\\ \mathit{instr}(\mbox{\sf go}\ p)&=&[\mbox{\sf go}\ ([x_{tid}:=[]\ \texttt{++}\ \mathit{instr}(p)])]\\ \mathit{instr}(\mbox{\sf select}\ [e_{i}\Rightarrow p_{i}]_{i\in\{1,\dots,n\}})&=&[x_{tid}:=x_{tid}\ \texttt{++}\ [\mathit{pre}([\mathit{retr}(e_{1}),\dots,\mathit{retr}(e_{n})])],\\ &&\mbox{\sf select}\ [\mathit{instr}(e_{i}\Rightarrow p_{i})]_{i\in\{1,\dots,n\}}]\\ \mathit{instr}(x\leftarrow b\Rightarrow p)&=&x\leftarrow[\mbox{\sf{tid}},b]\Rightarrow(x_{tid}:=x_{tid}\ \texttt{++}\ [\mathit{post}(x!)])\ \texttt{++}\ \mathit{instr}(p)\\ \mathit{instr}(y:=\leftarrow x\Rightarrow p)&=&y^{\prime}:=\leftarrow x\Rightarrow[x_{tid}:=x_{tid}\ \texttt{++}\ [\mathit{post}({\sf head}(y^{\prime})\sharp x?)],\\ &&\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ y:={\sf last}(y^{\prime})]\ \texttt{++}\ \mathit{instr}(p)\end{array}

\begin{array}[]{c}\mathit{retr}(x\leftarrow b)=x!\ \ \ \ \mathit{retr}(y=\leftarrow x)=x?\end{array}

Run-time tracing proceeds as follows. We simply run the instrumented program and extract the local traces connected to variables $x_{tid}$ . We assume that thread id numbers are created during program execution and can be enumerated by $1\dots n$ for some $n>0$ where thread id number $1$ belongs to the main thread.

Definition 9 (Run-Time Tracing)

Let $p$ and $q$ be programs such that $\mathit{instr}(p)=q$ . We consider a specific instrumented program run where $((),[1\sharp[x_{1}:=[]]\ \texttt{++}\ q])\ext@arrow 0359\Rightarrowfill@{}{T}(S,1\sharp[]:P)$ for some $S$ , $T$ and $P$ . Then, we refer to $T$ as $p$ ’s actual run-time trace. We refer to the list $[1\sharp S(x_{1}),\dots,n\sharp S(x_{n})]$ as the local traces obtained via the instrumentation of $p$ .

Command $x_{1}:=[]$ is added to the instrumented program to initialize the trace of the main thread. Recall that main has thread id number $1$ . This extra step is necessary because our instrumentation only initializes local traces of threads generated via go. The final configuration $(S,1\sharp[]:P)$ indicates that the main thread has run to full completion. This is a realistic assumption as we assume that programs exhibit no obvious bug during execution. There might still be some pending threads, in case $P$ differs from the empty list.

4 Trace Analysis

We assume that the program has been instrumented and after some program run we obtain a list of local traces. We show that the actual run-time trace can be recovered and we are able to point out alternative behaviors that could have taken place. Alternative behaviors are either due alternative schedules or different choices among communication partners.

We consider the list of local traces $[1\sharp S(x_{1}),\dots,n\sharp S(x_{n})]$ . Their shape can be characterized as follows.

Definition 10 (Local Traces)

\begin{array}[]{lcl}U,V&::=&[]\mid i\sharp L:U\\ L&::=&[]\mid\mathit{pre}(as):M\\ as&::=&[]\mid x!:as\mid x?:as\\ M&::=&[]\mid\mathit{post}(x!):L\mid\mathit{post}(i\sharp x?):L\end{array}

We refer to $U=[1\sharp L_{1},\dots,n\sharp L_{n}]$ as a residual list of local traces if for each $L_{i}$ either $L_{i}=[]$ or $L_{i}=[\mathit{pre}(\dots)]$ .

To recover the communications that took place we check for matching pre and post events recorded in the list of local traces. For this purpose, we introduce a relation $U\ext@arrow 0359\Rightarrowfill@{}{T}V$ to denote that ‘replaying’ of $U$ leads to $V$ where communications $T$ took place. Valid replays are defined via the following rules.

Definition 11 (Replay $U\ext@arrow 0359\Rightarrowfill@{}{T}V$ )

\begin{array}[]{c}\mbox{(Sync)}\ {\begin{array}[]{c}L_{1}=\mathit{pre}([\dots,x!,\dots]):\mathit{post}(x!):L_{1}^{\prime}\\ L_{2}=\mathit{pre}([\dots,x?,\dots]):\mathit{post}(i_{1}\sharp x?):L_{2}^{\prime}\\ \hline\cr i_{1}\sharp L_{1}:i_{2}\sharp L_{2}:U\ext@arrow 0359\Rightarrowfill@{}{[i_{1}\sharp x!,i_{2}\leftarrow i_{1}\sharp x?]}i_{1}\sharp L_{1}^{\prime}:i_{2}\sharp L_{2}^{\prime}:U\end{array}}\\ \\ \mbox{(Schedule)}\ {\begin{array}[]{c}\mbox{$\pi$ permutation on $\{1,\dots,n\}$}\\ \hline\cr[i_{1}\sharp L_{1},\dots,i_{n}\sharp L_{n}]\ext@arrow 0359\Rightarrowfill@{}{[]}[i_{\pi(1)}\sharp L_{\pi(1)},\dots,i_{\pi(n)}\sharp L_{\pi(n)}]\end{array}}\\ \\ \mbox{(Closure)}\ {\begin{array}[]{c}U\ext@arrow 0359\Rightarrowfill@{}{T}U^{\prime}\ \ U^{\prime}\ext@arrow 0359\Rightarrowfill@{}{T^{\prime}}U^{\prime\prime}\\ \hline\cr U\ext@arrow 0359\Rightarrowfill@{}{T\ \texttt{++}\ T^{\prime}}U^{\prime\prime}\end{array}}\end{array}

Rule (Sync) checks for matching communication partners. In each trace, we must find complementary pre events and the post events must match as well. Recall that in the instrumentation the sender transmits its thread id to the receiver. Rule (Schedule) shuffles the local traces as rule (Sync) only considers the two leading local traces. Via rule (Closure) we perform repeated replay steps.

We can state that the actual run-time trace can be obtained via the replay relation $U\ext@arrow 0359\Rightarrowfill@{}{T}V$ but further run-time traces are possible. This is due to alternative schedules.

Proposition 1 (Replay Yields Run-Time Traces)

Let $p$ be a program and $q$ its instrumentation where for a specific program run we observe the actual behavior $T$ and the list $[1\sharp L_{1},\dots,n\sharp L_{n}]$ of local traces. Let ${\cal T}=\{T^{\prime}\mid[1\sharp L_{1},\dots,n\sharp L_{n}]\ext@arrow 0359\Rightarrowfill@{}{T^{\prime}}1\sharp[]:U\ \mbox{for some residual $U$}\}$ . Then, we find that $T\in{\cal T}$ and for each $T^{\prime}\in{\cal T}$ we have that $((),p)\ext@arrow 0359\Rightarrowfill@{}{T^{\prime}}(S,1\sharp[]:P)$ for some $S$ and $P$ .

Definition 12 (Alternative Schedules)

We say $[1\sharp L_{1},\dots,n\sharp L_{n}]$ contains alternative schedules iff the cardinality of the set $\{T^{\prime}\mid[1\sharp L_{1},\dots,n\sharp L_{n}]\ext@arrow 0359\Rightarrowfill@{}{T^{\prime}}1\sharp[]:U\ \mbox{for some residual $U$}\}$ is greater than one.

We can also check if even further run-time traces might have been possible by testing for alternative communications.

Definition 13 (Alternative Communications)

We say $[1\sharp L_{1},\dots,n\sharp L_{n}]$ contains alternative matches iff for some $i,j,x,L,L^{\prime}$ we have that (1) $L_{i}=\mathit{pre}([\dots,x!,\dots]):L$ , (2) $L_{j}=\mathit{pre}([\dots,x?,\dots]):L^{\prime}$ , and (3) if $L=\mathit{post}(x!):L^{\prime\prime}$ for some $L^{\prime\prime}$ then $L^{\prime}\not=\mathit{post}(j\sharp x?):L^{\prime\prime\prime}$ for any $L^{\prime\prime\prime}$ .

We say $U=[1\sharp L_{1},\dots,n\sharp L_{n}]$ contains alternative communications iff $U$ contains alternative matches or there exists $T$ and $V$ such that $U\ext@arrow 0359\Rightarrowfill@{}{T}V$ and $V$ contains alternative matches.

The alternative match condition states that a sender could synchronize with a receiver (see (1) and (2)) but this synchronization did not take place (see (3)). For an alternative match to result in an alternative communication, the match must be along a possible run-time trace.

4.1 Dependency Graph for Efficient Trace Analysis

Refer to caption — Figure 1: Dependency Graph among Events

Instead of replaying traces to check for alternative schedules and communications, we build a dependency graph where the graph captures the partial order among events. It is much more efficient to carry out the analysis on the graph than replaying traces. Figure 1 shows a simple example.

We find a program that makes use of two channels and four threads. For reference, send/receive events are annotated (as subscript) with unique numbers. We omit the details of instrumentation and assume that for a specific program run we find the list of given traces on the left. Pre events consist of singleton lists as there is no select. Hence, we write $\mathit{pre}((y?)_{6})$ as a shorthand for $\mathit{pre}([(y?)_{6}])$ . Replay of the trace shows that the following locations synchronize with each other: $(4,6)$ , $(3,1)$ and $(5,2)$ . This information as well as the order among events can be captured by a dependency graph. Nodes are obtained by a linear scan through the list of traces. To derive edges, we require another scan for each element in a trace as we need to find pre/post pairs belonging to matching synchronizations. This results overall in $O(m*m)$ for the construction of the graph where $m$ is the number of elements found in each trace. To avoid special treatment of dangling pre events (with not subsequent post event), we assume that some dummy post events are added to the trace.

Definition 14 (Construction of Dependency Graph)

Each node corresponds to a send or a receive operation in the program text. Edges are constructed by observing events recorded in the list of traces. We draw a (directed) edge among nodes if either

•

the pre and post events of one node precede the pre and post events of another node in the trace, or
•

the pre and post events belonging to both nodes can be synchronized. See rule (Sync) in Definition 11. We assume that the edge starts from the node with the send operation.

Applied to our example, this results in the graph on the right. See Figure 1. For example, $x!|3$ denotes a send communication over channel $x$ at program location $3$ . As send precedes receive we find an edge from $x!|3$ to $x?|1$ . In general, there may be several initial nodes. By construction, each node has at most one outgoing edge but may have multiple incoming edges.

The trace analysis can be carried out directly on the dependency graph. To check if one event happens-before another event we seek for a path from one event to the other. This can be done via a depth-first search and takes time $O(v+e)$ where $v$ is the number of nodes and $e$ the number of edges. Two events are concurrent if neither happens-before the other. To check for alternative communications, we check for matching nodes that are concurrent to each other. By matching we mean that one of the nodes is a send and the other is a receive over the same channel. For our example, we find that $x!|5$ and $x?|1$ represents an alternative communication as both nodes are matching and concurrent to each other.

To derive (all) alternative schedules, we perform a backward traversal of the graph. Backward in the sense that we traverse the graph by moving from children to parent node. We start with some final node (no outgoing edge). Each node visited is marked. We proceed to the parent if all children are marked. Thus, we guarantee that the happens-before relation is respected. For our example, suppose we visit first $y?{6}$ . We cannot visit its parent $y!{4}$ until we have visited $x?{2}$ and $x!{5}$ . Via a (backward) breadth-first search we can ‘accumulate’ all schedules.

5 Comparison to Vector Clock Method

Via a simple adaptation of the Replay Definition 11 we can attach vector clocks to each send and receive event. Hence, our tracing method strictly subsumes the vector clock method as we are also able to trace events that could not commit.

Definition 15 (Vector Clock)

\begin{array}[]{lcll}cs&::=&[]\mid n:cs\end{array}

For convenience, we represent a vector clock as a list of clocks where the first position belongs to thread 1 etc. We write $cs[i]$ to retrieve the $i$ -th component in $cs$ . We write ${\sf inc}(i,cs)$ to denote the vector clock obtained from $cs$ where all elements are the same but at index $i$ the element is incremented by one. We write ${\sf max}(cs_{1},cs_{2})$ to denote the vector clock where we per-index take the greater element. We write $i^{cs}$ to denote thread $i$ with vector clock $cs$ . We write $i\sharp x!^{cs}$ to denote a send over channel $x$ in thread $i$ with vector clock $cs$ . We write $i\leftarrow j\sharp x?^{cs}$ to denote a receive over channel $x$ in thread $i$ from thread $j$ with vector clock $cs$ .

Definition 16 (From Trace Replay to Vector Clocks)

\begin{array}[]{c}\mbox{(Sync)}\ {\begin{array}[]{c}L_{1}=\mathit{pre}([\dots,x!,\dots]):\mathit{post}(x!):L_{1}^{\prime}\\ L_{2}=\mathit{pre}([\dots,x?,\dots]):\mathit{post}(i_{1}\sharp x?):L_{2}^{\prime}\\ cs={\sf max}({\sf inc}(i_{1},cs_{1}),{\sf inc}(i_{2},cs_{2}))\\ \hline\cr i_{1}^{cs_{1}}\sharp L_{1}:i_{2}^{cs_{2}}\sharp L_{2}:U\ext@arrow 0359\Rightarrowfill@{}{[i_{1}\sharp x!^{cs},i_{2}\leftarrow i_{1}\sharp x?^{cs}]}i_{1}^{cs}\sharp L_{1}^{\prime}:i_{2}^{cs}\sharp L_{2}^{\prime}:U\end{array}}\end{array}

Like the construction of the dependency graph, the (re)construction of vector clocks takes time $O(m*m)$ where $m$ is the number of elements found in each trace.

To check for an alternative communication, the vector clock method seeks for matching events. This incurs the same (quadratic in the size of the trace) cost as for our method. However, the check that these two events are concurrent to each other can be performed more efficiently via vector clocks. Comparison of vector clocks takes time $O(n)$ where $n$ is the number of threads. Recall that our graph-based method requires time $O(v+e)$ where $v$ is the number of nodes and $e$ the number of edges. The number $n$ is smaller than $v+e$ .

However, our dependency graph representation is more efficient in case of exploring alternative schedules. In case of the vector clock method, we need to continuously compare vector clocks whereas we only require a (backward) traversal of the graph. We believe that the dependency graph has further advantages in case of user interaction and visualization as it is more intuitive to navigate through the graph. This is something we intend to investigate in future work.

6 Implementation

We have fully integrated the approach laid out in the earlier sections into the Go programming language and have built a prototype tool. We give an overview of our implementation which can be found here [5]. A detailed treatment of all of Go’s message-passing features can be found in the extended version of this paper.

6.1 Library-Based Instrumentation and Tracing

We use a pre-processor to carry out the instrumentation as described in Section 3. In our implementation, each thread maintains an entry in a lock-free hashmap where each entry represents a thread (trace). The hashmap is written to file either at the end of the program or when a deadlock occurs. We currently do not deal with the case that the program crashes as we focus on the detection of potential bugs in programs that do not show any abnormal behavior.

6.2 Measurement of Run-Time Overhead Library-Based Tracing

Figure 2: Performance overhead using Pre/Post vs Vector clocks(VC) in ms.

We measure the run-time overhead of our method against the vector clock method. Both methods are implemented as libraries assuming no access to the Go run-time system. For experimentation we use three programs where each program exercises some of the factors that have an impact on tracing. For example, dynamic versus static number of threads and channels. Low versus high amount of communication among threads.

The Add-Pipe (AP) example uses $n$ threads where the first $n-1$ threads receive on an input channel, add one to the received value and then send the new value on their output channel to the next thread. The first thread sends the initial value and receives the result from the last thread.

In the Primesieve (PS) example, the communication among threads is similar to the Add-Pipe example. The difference is that threads and channels are dynamically generated to calculate the first $n$ prime numbers. For each found prime number a ‘filter’ thread is created. Each thread has an input channel to receive new possible prime numbers $v$ and an output channel to report each number for which $v\mod{\mathit{p}rime}\neq 0$ where ${\mathit{p}rime}$ is the prime number associated with this filter thread. The filter threads are run in a chain where the first thread stores the prime number 2.

The Collector (C) example creates $n$ threads that produce a number which is then sent to the main thread for collection. This example has much fewer communications compared to the other examples but uses a high number of threads.

Figure 2 summarizes our results. Results are carried out on some commodity hardware (Intel i7-6600U with 12 GB RAM, a SSD and Go 1.8.3 running on Windows 10 was used for the tests). Our results show that a library-based implementation of the vector clock method does not scale well for examples with a dynamic number of threads and/or a high amount communication among threads. See examples Primesieve and Add-Pipe. None of the vector clock optimizations [3] apply here because of the dynamic number of threads and channels. Our method performs much better. This is no surprise as we require less (tracing) data and no extra communication links. We believe that the overhead can still be further reduced as access to the thread id in Go is currently rather cumbersome and expensive.

7 Conclusion

One of the challenges of run-time verification in the concurrent setting is to establish a partial order among recorded events. Thus, we can identify potential bugs due to bad schedules that are possible but did not take place in some specific program run. Vector clocks are the predominant method to achieve this task. For example, see work by Vo [11] in the MPI setting and work by Tasharofi [10] in the actor setting. There are several works that employ vector clocks in the shared memory setting For example, see Pozniansky’s and Schuster’s work [9] on data race detection. Some follow-up work by Flanagan and Freund [2] employs some optimizations to reduce the tracing overhead by recording only a single clock instead of the entire vector. We leave to future work to investigate whether such optimizations are applicable in the message-passing setting and how they compare to existing optimizations such as [3].

We have introduced a novel tracing method that has much less overhead compared to the vector clock method. Our method can deal with all of Go’s message-passing language features and can be implemented efficiently as a library. We have built a prototype that can automatically identify alternative schedules and communications. In future work we plan to conduct some case studies and integrate heuristics for specific scenarios, e.g. reporting a send operation on a closed channel etc.

Acknowledgments

We thank some HVC’17 reviewers for their constructive feedback on an earlier version of this paper.

References

[1] C. J. Fidge. Timestamps in message-passing systems that preserve the partial ordering. 10(1):56–66, 1987.
[2] C. Flanagan and S. N. Freund. Fasttrack: Efficient and precise dynamic race detection. In Proc. of PLDI ’09, pages 121–133. ACM, 2009.
[3] V. K. Garg, C. Skawratananond, and N. Mittal. Timestamping messages and events in a distributed system using synchronous communication. Distributed Computing, 19(5-6):387–402, 2007.
[4] The Go programming language. https://golang.org/.
[5] Trace-based run-time analysis of message-passing Go programs. https://github.com/KaiSta/gopherlyzer-GoScout.
[6] C. A. R. Hoare. Communicating sequential processes. Commun. ACM, 21(8):666–677, Aug. 1978.
[7] L. Lamport. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, 21(7):558–565, 1978.
[8] F. Mattern. Virtual time and global states of distributed systems. In Parallel and Distributed Algorithms, pages 215–226. North-Holland, 1989.
[9] E. Pozniansky and A. Schuster. Multirace: efficient on-the-fly data race detection in multithreaded C++ programs. Concurrency and Computation: Practice and Experience, 19(3):327–340, 2007.
[10] S. Tasharofi. Efficient testing of actor programs with non-deterministic behaviors. PhD thesis, University of Illinois at Urbana-Champaign, 2013.
[11] A. Vo. Scalable Formal Dynamic Verification of Mpi Programs Through Distributed Causality Tracking. PhD thesis, University of Utah, 2011. AAI3454168.

Appendix 0.A Further Go Message-Passing Features

0.A.1 Overview

⬇ func A(x chan int) { x <- 1 // A1 } func bufferedChan() { x := make(chan int,1) go A(x) x <- 1 // A2 <-x } func closedChan() { x := make(chan int) go A(x) go B(x) close(x) } ⬇ func B(x chan int) { <-x } func selDefault() { x := make(chan int) go A(x) select { case <-x: // A3 fmt.Println("received from x") default: fmt.Println("default") } }

Figure 3: Further Go Features

Besides selective synchronous message-passing, Go supports some further message passing features that can be easily dealt with by our approach and are fully supported by our implementation. Figure 3 shows such examples where we put the program text in two columns.

Buffered Channels

Go also supports buffered channels where send is asynchronous assuming sufficient buffer space exists. See function buffered in Figure 3. Depending on the program run, our analysis reports that either A1 or A2 are alternative matches for the receive operation.

In terms of the instrumentation and tracing, we treat each asynchronous send as if the send is executed in its own thread. This may lead to some slight inaccuracies. Consider the following variant.

⬇

func buffered2() {

x := make(chan int,1)

x <- 1 // B1

go A(x) // B2

<-x // B3

}

Our analysis reports that B2 and B3 form an alternative match. However, in the Go semantics, buffered messages are queued. Hence, for every program run the only possibility is that B1 synchronizes with B3. B3 never takes place! As our main objective is bug finding, we argue that this loss of accuracy is justifiable. How to eliminate such false positives is subject of future work.

Select with default/timeout

Another feature in Go is to include a default/timeout case to select. See selDefault in Figure 3. The purpose is to avoid (indefinite) blocking if none of the other cases are available. For the user it is useful to find out if other alternatives are available in case the default case is selected. The default case applies for most program runs. Our analysis reports that A1 and A3 are an alternative match.

To deal with default/timeout we introduce a new post event $\mathit{post}(\mathit{select})$ . To carry out the analysis in terms of the dependency graph, each subtrace $\dots,\mathit{pre}([\dots,\mathit{select},\dots]),\mathit{post}(\mathit{select}),\dots$ creates a new node. Construction of edges remains unchanged.

Closing of Channels

Another feature in Go is the ability to close a channel. See closedChan in Figure 3. Once a channel is closed, each send on a closed channel leads to failure (the program crashes). On the other hand, each receive on a closed channel is always successful, as we receive a dummy value. A run of is successful if the close operation of the main thread happens after the send in thread A. As the close and send operations happen concurrently, our analysis reports that the send A1 may take place after close.

For instrumentation/tracing, we introduce event $\mathit{close}(x)$ . It is easy to identify a receive on a closed channel, as we receive a dummy thread id. So, for each subtrace $[\dots,\mathit{pre}([\dots,x?,\dots]),\mathit{post}(i\sharp x?),\dots]$ where $i$ is a dummy value we draw an edge from $\mathit{close}(x)$ to $x?$ .

Here are the details of how to include buffered channels, select and closing of channels.

0.A.2 Buffered Channels

Consider the following Go program.

⬇

x := make(chan, 2)

x <- 1 // E1

x <- 1 // E2

<- x // E3

<- x // E4

We create a buffer of size 2. The two send operations will then be carried out asynchronously and the subsequent receive operations will pick up the buffered values. We need to take special care of buffered send operations. If we would treat them like synchronous send operations, their respective pre and post events would be recorded in the same trace as the pre and post events of the receive operations. This would have the consequence that our trace analysis does not find out that events E1 and E2 happen before E3 and E4.

Our solution to this issue is to treat each send operation on a buffered channel as if the send operation is carried out in its own thread. Thus, our trace analysis is able to detect that E1 and E2 take place before E3 and E4. This is achieved by marking each send on a buffered channel in the instrumentation. After tracing, pre and post events will then be moved to their own trace. From the viewpoint of our trace analysis, a buffered channel then appears as having infinite buffer space. Of course, when running the program a send operation may still block if all buffer space is occupied.

Here are the details of the necessary adjustments to our method. During instrumentation/tracing, we simply record if a buffered send operation took place. The only affected case in the instrumentation of commands (Definition 8) is $x\leftarrow b\Rightarrow p$ . We assume a predicate $\sf{isBuffered}(\cdot)$ to check if a channel is buffered or not. In terms of the actual implementation this is straightforward to implement. We write $\mathit{postB}(x,n)$ to indicate a buffered send operation via $x$ where $n$ is a fresh thread id. We create fresh thread id numbers via tidB.

Definition 17 (Instrumentation of Buffered Channels)

Let $x$ be a buffered channel.

\begin{array}[]{lcl}\mathit{instr}(x\leftarrow b\Rightarrow p)&&\\ \ \ \mid\sf{isBuffered}(x)&=&x\leftarrow[n,b]\Rightarrow(x_{tid}:=x_{tid}\ \texttt{++}\ [\mathit{postB}(x!{n},]))\ \texttt{++}\ \mathit{instr}(p)\\ &&\mbox{where}\ \ n=\mbox{\sf{tidB}}\\ \ \ \mid\mbox{otherwise}&=&x\leftarrow[\mbox{\sf{tid}},b]\Rightarrow(x_{tid}:=x_{tid}\ \texttt{++}\ [\mathit{post}(x!)])\ \texttt{++}\ \mathit{instr}(p)\end{array}

The treatment of buffered channels has no overhead on the instrumentation and tracing. However, we require a post-processing phase where marked events will be then moved to their own trace. This can be achieved via a linear scan through each trace. Hence, requires time complexity $O(k)$ where $k$ is the overall size of all (initially recorded) traces. For the sake of completeness, we give below a declarative description of post-processing in terms of relation $U\Rightarrow V$ .

Definition 18 (Post-Processing for Buffered Channels $U\Rightarrow V$ )

\begin{array}[]{c}\mbox{(MovePostB)}\ {\begin{array}[]{c}L=\mathit{pre}(as):\mathit{postB}(x!,n):L^{\prime}\\ \hline\cr i\sharp L:U\Rightarrow i\sharp L^{\prime}:n\sharp[\mathit{pre}(as),\mathit{postB}(x!,n)]:U\end{array}}\\ \\ \mbox{(Shift)}\ {\begin{array}[]{c}L=\mathit{pre}(as):\mathit{post}(a):L^{\prime}\\ (a=x!\ \vee\ a=j\sharp x?)\\ i\sharp L^{\prime}:U\Rightarrow i\sharp L^{\prime\prime}:U^{\prime}\\ \hline\cr i\sharp L:U\Rightarrow i\sharp\mathit{pre}(as):\mathit{post}(a):L^{\prime\prime}:U^{\prime}\end{array}}\\ \\ \mbox{(Schedule)}\ {\begin{array}[]{c}\mbox{$\pi$ permutation on $\{1,\dots,n\}$}\\ \hline\cr[i_{1}\sharp L_{1},\dots,i_{n}\sharp L_{n}]\Rightarrow[i_{\pi(1)}\sharp L_{\pi(1)},\dots,i_{\pi(n)}\sharp L_{\pi(n)}]\end{array}}\\ \\ \mbox{(Closure)}\ {\begin{array}[]{c}U\Rightarrow U^{\prime}\ \ U^{\prime}\Rightarrow U^{\prime\prime}\\ \hline\cr U\Rightarrow U^{\prime\prime}\end{array}}\end{array}

Subsequent analysis steps will be carried out on the list of traces obtained via post-processing.

There is some space for improvement. Consider the following program text.

⬇

func A(x chan int) {

x <- 1 // A1

}

func buffered2() {

x := make(chan int,1)

x <- 1 // B1

go A(x) // B2

<-x // B3

}

Our analysis (for some program run) reports that B2 and B3 is an alternative match. However, in the Go semantics, buffered messages are queued. Hence, for every program run the only possibility is that B1 synchronizes with B3. B3 never takes place. As our main objective is bug finding, we can live with this inaccuracy. We will investigate in future work how to eliminate this false positive.

Appendix 0.B Select with default/timeout

In terms of the instrumentation/tracing, we introduce a new special post event $\mathit{post}(\mathit{select})$ . For the trace analysis (Definition 11), we require a new rule.

\begin{array}[]{c}\mbox{(Default/Timeout)}\ i\sharp\mathit{pre}([\dots]):\mathit{post}(\mathit{select}):L:U\ext@arrow 0359\Rightarrowfill@{}{[i\sharp\mathit{select}]}i\sharp L:U\end{array}

This guarantees that in case default or timeout is chosen, select acts as if asynchronous.

The dependency graph construction easily takes care of this new feature. For each default/timeout case we introduce a node. Construction of edges remains unchanged.

Appendix 0.C Closing of Channels

For instrumentation/tracing of the $\mathit{close}(x)$ operation on channel $x$ , we introduce a special pre and post event. Our trace analysis keeps track of closed channels. As a receive on a closed channel yields some dummy values, it is easy to distinguish this case from the regular (Sync). Here are the necessary adjustments to our replay relation from Definition 11.

\begin{array}[]{lcl}C&::=&[]\mid i\sharp\mathit{close}(x):C\end{array}

\begin{array}[]{c}\mbox{(Close)}\ (i\sharp\mathit{pre}(\mathit{close}(x)):\mathit{post}(\mathit{close}(x)):L:U\mid C)\ext@arrow 0359\Rightarrowfill@{}{[]}(i\sharp L:U\mid i\sharp\mathit{close}(x):C)\\ \\ \mbox{(RcvClosed)}\ {\begin{array}[]{c}Q=j\sharp\mathit{close}(x):Q^{\prime}\\ \hline\cr(i\sharp\mathit{pre}([\dots,x?,\dots]):\mathit{post}(j^{\prime}\sharp x?):L:U\mid Q)\ext@arrow 0359\Rightarrowfill@{}{[j\sharp\mathit{close}(x),i\leftarrow j\sharp x?]}(i\sharp L:U\mid Q)\end{array}}\end{array}

For the construction of the dependency graph, we create a node for each close statement. For each receive on a closed channel $x$ at program location $l$ , we draw an edge from $\mathit{close}(x)$ to $x?|l$ .

Appendix 0.D Codes used for the Experimental results

Add-Pipe

⬇

func add1(in chan int) chan int {

out := make(chan int)

go func() {

for {

n := <-in

out <- n + 1

}

}()

return out

}

func main() {

in := make(chan int)

c1 := add1(in)

for i := 0; i < 19; i++ {

c1 = add1(c1)

}

for n := 1; n < 1000; n++ {

in <- n

<-c1

}

Primesieve

⬇

func generate(ch chan int) {

for i := 2; ; i++ {

ch <- i

}

func filter(in chan int, out chan int, prime int) {

for {

tmp := <-in

if tmp%prime != 0 {

out <- tmp

}

func main() {

ch := make(chan int)

go generate(ch)

for i := 0; i < 100; i++ {

prime := <-ch

ch1 := make(chan int)

go filter(ch, ch1, prime)

ch = ch1

}

Collector

⬇

func collect(x chan int, v int) {

x <- v

}

func main() {

x := make(chan int)

for i := 0; i < 1000; i++ {

go collect(x, i)

}

for i := 0; i < 1000; i++ {

<-x

}

Trace-Based Run-Time Analysis of Message-Passing Go Programs

Abstract

1 Introduction

Motivating Example

Trace-Based Run-Time Verification

Event Order via Vector Clock Method

Our Method

Contributions

2 Message-Passing Go

Syntax

Definition 1 (Program Syntax)

Trace-Based Semantics

Definition 2 (State)

Definition 3 (Expression Semantics (i,S)⊢b⇓v(i,S)\,\vdash\,b\Downarrow v)

Definition 4 (Program Execution (S,P)​\ext@arrow​0359​\Rightarrowfill@​T​(S′,Q)(S,P)\ext@arrow 0359\Rightarrowfill@{}{T}(S^{\prime},Q))

Definition 5 (Single Step)

Definition 6 (Multi-Threading and Synchronous Message-Passing)

Definition 7 (Scheduling)

3 Instrumentation and Run-Time Tracing

Definition 8 (Instrumentation of Programs)

Definition 9 (Run-Time Tracing)

4 Trace Analysis

Definition 10 (Local Traces)

Definition 11 (Replay U​\ext@arrow​0359​\Rightarrowfill@​T​VU\ext@arrow 0359\Rightarrowfill@{}{T}V)

Proposition 1 (Replay Yields Run-Time Traces)

Definition 12 (Alternative Schedules)

Definition 13 (Alternative Communications)

4.1 Dependency Graph for Efficient Trace Analysis

Definition 14 (Construction of Dependency Graph)

5 Comparison to Vector Clock Method

Definition 15 (Vector Clock)

Definition 16 (From Trace Replay to Vector Clocks)

6 Implementation

6.1 Library-Based Instrumentation and Tracing

6.2 Measurement of Run-Time Overhead Library-Based Tracing

7 Conclusion

Acknowledgments

References

Appendix 0.A Further Go Message-Passing Features

0.A.1 Overview

Buffered Channels

Select with default/timeout

Closing of Channels

0.A.2 Buffered Channels

Definition 17 (Instrumentation of Buffered Channels)

Definition 18 (Post-Processing for Buffered Channels U⇒VU\Rightarrow V)

Appendix 0.B Select with default/timeout

Appendix 0.C Closing of Channels

Appendix 0.D Codes used for the Experimental results

Add-Pipe

Primesieve

Collector

Definition 3 (Expression Semantics $(i,S)\,\vdash\,b\Downarrow v$ )

Definition 4 (Program Execution $(S,P)\ext@arrow 0359\Rightarrowfill@{}{T}(S^{\prime},Q)$ )

Definition 11 (Replay $U\ext@arrow 0359\Rightarrowfill@{}{T}V$ )

Definition 18 (Post-Processing for Buffered Channels $U\Rightarrow V$ )