¹ University of Waterloo, Canada  ² MPI-SWS, Germany  ³ SRI International, USA  ⁴ ConsenSys, Germany

Compositional Verification of Smart Contracts Through Communication Abstraction (Extended)

Scott Wesley¹, Maria Christakis², Jorge A. Navas³, Richard Trefler¹,
Valentin Wüstholz⁴, Arie Gurfinkel¹

Abstract

Solidity smart contracts are programs that manage up to $2^{160}$ users on a blockchain. Verifying a smart contract relative to all users is intractable due to state explosion. Existing solutions either restrict the number of users to under-approximate behaviour, or rely on manual proofs. In this paper, we present local bundles that reduce contracts with arbitrarily many users to sequential programs with a few representative users. Each representative user abstracts concrete users that are locally symmetric to each other relative to the contract and the property. Our abstraction is semi-automated. The representatives depend on communication patterns, and are computed via static analysis. A summary for the behaviour of each representative is provided manually, but a default summary is often sufficient. Once obtained, a local bundle is amenable to sequential static analysis. We show that local bundles are relatively complete for parameterized safety verification, under moderate assumptions. We implement local bundle abstraction in SmartACE, and show order-of-magnitude speedups compared to a state-of-the-art verifier.

1 Introduction

Solidity smart contracts are distributed programs that facilitate information flow between users. Users alternate and execute predefined transactions, each of which terminates within a predetermined number of steps. Each user (and contract) is assigned a unique $160$-bit address, which the smart contract uses to map the user to that user’s data. In theory, smart contracts are finite-state systems with $2^{160}$ users. In practice, however, the state space of a smart contract is huge, with at least $2^{2^{160}}$ states to accommodate all users and their data (conservatively counting one bit per user). In this paper, we consider the challenge of automatically verifying Solidity smart contracts that rely on user data.

A naive solution for smart contract verification is to verify the finite-state system directly. However, verifying systems with at least $2^{2^{160}}$ states is intractable. The naive solution fails because the state space is exponential in the number of users. Instead, we infer correctness from a small number of representative users to ameliorate state explosion. To restrict a contract to fewer users, we first generalize to a family of finite-state systems parameterized by the number of users. In this way, smart contract verification is reduced to parameterized verification.

contract Auction {
    mapping(address => uint) bids;
    address manager; uint leadingBid; bool stopped;
    uint _sum;                                        // highlighted

    constructor(address mgr) public { manager = mgr; }

    function bid(uint amount) public {
        require(msg.sender != manager);
        require(amount > leadingBid);
        require(!stopped);
        _sum = _sum + amount - bids[msg.sender];      // highlighted
        bids[msg.sender] = amount;
        leadingBid = amount;
    }

    function withdraw() public {
        require(msg.sender != manager);
        require(bids[msg.sender] != leadingBid);
        require(!stopped);
        _sum = _sum + 0 - bids[msg.sender];           // highlighted
        bids[msg.sender] = 0;
    }

    function stop() public {
        require(msg.sender == manager);
        stopped = true;
    }
}

Figure 1: A smart contract that implements a simple auction.

 1  Auction _a = new Auction(address(2));
 2  _a.address = address(1);
 3
 4  while (true) {
 5    // Applies an interference invariant.
 6    uint bid = *;                                        // highlighted
 7    uint maxBid = _a.leadingBid;                         // highlighted
 8    require(bid <= maxBid);                              // highlighted
 9    require(bid == maxBid || bid + maxBid <= _a._sum);   // highlighted
10    _a.bids[address(3)] = bid;                           // highlighted
11    // Selects a sender.
12    msg.sender = *;
13    require(msg.sender > address(1));
14    require(msg.sender < address(5));
15    require(msg.sender < address(4));                    // highlighted
16    // Selects a call.
17    if (*) _a.bid(*);
18    else if (*) _a.withdraw();
19    else if (*) _a.stop();
20  }

Figure 2: A harness to verify Prop. 1 (ignore the highlighted lines) and Prop. 2.

For example, consider \code{Auction} in \cref{Fig:Auction} (for now, ignore the highlighted lines). In \mbox{\code{Auction},} each user starts with a bid of $0$. Users alternate, and submit increasingly larger bids, until a designated manager stops the auction. While the auction is not stopped, a non-leading user may withdraw their bid\footnote{For simplicity of presentation, we do not use Ether, Ethereum's native currency.}. \code{Auction} satisfies \prop{1}: ``\emph{Once \mbox{\code{stop()}} is called, all bids are immutable}.'' \prop{1} is satisfied since \mbox{\code{stop()}} sets \code{stopped} to true, no function sets \code{stopped} to false, and while \code{stopped} is true neither \mbox{\code{bid()}} nor \mbox{\code{withdraw()}} is enabled. Formally, \prop{1} is initially true, and remains true due to \prop{1b}: ``\emph{Once \mbox{\code{stop()}} is called, \code{stopped} remains true}.'' \prop{1} is said to be inductive relative to its \emph{inductive strengthening} \prop{1b}. A \emph{Software Model Checker (SMC)} can establish \prop{1} by an exhaustive search for its inductive strengthening. However, this requires a bound on the number of addresses, since a search with all $2^{160}$ addresses is intractable.

A bound of at least four addresses is necessary to represent the zero-account (i.e., a null user that cannot send transactions), the smart contract account, the manager, and an arbitrary sender. However, once the arbitrary sender submits a bid, the sender is now the leading bidder, and cannot withdraw its bid. To enable \mbox{\code{withdraw()},} a fifth user is required. It follows by applying the results of~\cite{KaiserKroening2010}, that a bound of five addresses is also sufficient, since users do not read each other's bids, and adding a sixth user does not enable additional changes to \mbox{\code{leadingBid}}~\cite{KaiserKroening2010}.
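To make the bounded search concrete, the following is a minimal Python sketch (not SmartACE, and not the harness of \cref{Fig:Harness}) that enumerates every reachable state of \code{Auction} under the five-address bound and checks \prop{1} together with \prop{1b}. The cap \code{MAX\_BID}, the tuple encoding of states, and the helper names are assumptions introduced only for illustration. The zero-account and the contract account never send transactions, so they are left implicit.

```python
SENDERS = (3, 4)  # the two arbitrary senders of the five-address bound
MAX_BID = 3       # hypothetical cap on bid amounts; keeps the search finite


def successors(state):
    """Return every state reachable from `state` by one successful transaction."""
    bids, leading, stopped = state
    # stop(): only the manager (address 2) passes require(msg.sender == manager).
    result = [(bids, leading, True)]
    if stopped:
        # bid() and withdraw() both fail require(!stopped).
        return result
    for i in range(len(SENDERS)):
        # bid(amount): requires amount > leadingBid.
        for amount in range(leading + 1, MAX_BID + 1):
            result.append((bids[:i] + (amount,) + bids[i + 1:], amount, False))
        # withdraw(): requires bids[msg.sender] != leadingBid.
        if bids[i] != leading:
            result.append((bids[:i] + (0,) + bids[i + 1:], leading, False))
    return result


def reachable():
    """Collect all states reachable from the initial state (all bids are 0)."""
    init = ((0,) * len(SENDERS), 0, False)
    seen, frontier = {init}, [init]
    while frontier:
        for nxt in successors(frontier.pop()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


def prop1_holds(states):
    """Prop. 1 and Prop. 1b: once stopped, bids and `stopped` never change."""
    for bids, leading, stopped in states:
        if not stopped:
            continue
        for next_bids, _, next_stopped in successors((bids, leading, stopped)):
            if next_bids != bids or not next_stopped:
                return False
    return True
```

In this model, a state in which some sender holds a non-zero, non-leading bid (so that \code{withdraw()} changes a bid) is reachable only because there are two senders, which mirrors the need for the fifth address.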
The bounded system, known as a harness, in \cref{Fig:Harness} assigns the zero-account to address 0, the smart contract account to address 1, the manager to address 2, the arbitrary senders to addresses 3 and 4, and then executes an unbounded sequence of arbitrary function calls. Establishing \prop{1} on the harness requires finding its inductive strengthening. A strengthening such as \prop{1b} (or, in general, a counterexample violating \prop{1}) can be found by an SMC, directly on the harness code.

The above bound for \prop{1} also works for checking all control-reachability properties of \code{Auction}. This, for example, follows by applying the results of~\cite{KaiserKroening2010}. That is, \code{Auction} has a \emph{Small Model Property (SMP)}~(e.g.,~\cite{KaiserKroening2010,AbdullaHH13}) for such properties. However, not all contracts enjoy an SMP. Consider \prop{2}: ``\emph{The sum of all active bids is at least \code{leadingBid}}.'' \code{Auction} satisfies \prop{2} since the leading bid is never withdrawn. To prove \code{Auction} satisfies \prop{2}, we instrument the code to track the current sum, through the highlighted lines in \cref{Fig:Auction}.

With the addition of \code{\_sum}, \code{Auction} no longer enjoys an SMP. Intuitively, each user enables new combinations of \code{\_sum} and \code{leadingBid}. As a proof, assume that there are $N$ users (other than the zero-account, the smart contract account, and the manager) and let $S_N = 1 + 2 + \cdots + N$. In every execution with $N$ users, if \code{leadingBid} is $N + 1$, then \code{\_sum} is less than $S_{N + 1}$, since active bids are unique and $S_{N + 1}$ is the sum of $N + 1$ bids from $1$ to $N + 1$. However, in an execution with $N + 1$ users, if the $i$-th user has a bid of $i$, then \code{leadingBid} is $N + 1$ and \code{\_sum} is $S_{N + 1}$. Therefore, increasing $N$ extends the reachable combinations of \code{\_sum} and \code{leadingBid}.
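The counting argument can be replayed by brute force. The sketch below is a Python model of the instrumented \code{Auction} (an illustration, not the paper's tooling): it explores all reachable states for a given number of arbitrary senders and reports the largest \code{\_sum} observed when \code{leadingBid} equals a chosen value. The function names and the cap on bid amounts are assumptions made to keep the state space finite, and unbounded Python integers stand in for \code{uint}.

```python
from collections import deque


def reachable_pairs(num_senders, max_bid):
    """Return all reachable (leadingBid, _sum) pairs of a bounded Auction model.

    A state is (bids, leading, stopped, total), where `total` mirrors `_sum`.
    """
    init = ((0,) * num_senders, 0, False, 0)
    seen = {init}
    queue = deque([init])
    while queue:
        bids, leading, stopped, total = queue.popleft()
        # stop(): the manager may always call it; bids are untouched.
        successors = [(bids, leading, True, total)]
        if not stopped:
            for user in range(num_senders):
                # bid(amount): requires amount > leadingBid.
                for amount in range(leading + 1, max_bid + 1):
                    new_bids = bids[:user] + (amount,) + bids[user + 1:]
                    new_total = total + amount - bids[user]
                    successors.append((new_bids, amount, False, new_total))
                # withdraw(): requires bids[msg.sender] != leadingBid.
                if bids[user] != leading:
                    new_bids = bids[:user] + (0,) + bids[user + 1:]
                    successors.append((new_bids, leading, False, total - bids[user]))
        for nxt in successors:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return {(leading, total) for _, leading, _, total in seen}


def max_sum_at(num_senders, leading_bid):
    """Largest reachable `_sum` when `leadingBid == leading_bid`."""
    pairs = reachable_pairs(num_senders, leading_bid)
    return max(total for leading, total in pairs if leading == leading_bid)
```

Comparing \code{max\_sum\_at(N, N + 1)} with \code{max\_sum\_at(N + 1, N + 1)} for small $N$ exhibits the gap claimed above: the extra sender makes a combination of \code{\_sum} and \code{leadingBid} reachable that was unreachable before.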
For example, if $N = 2$, then $S_3 = 1 + 2 + 3 = 6$. If the leading bid is $3$, then the second highest bid is at most $2$, and, therefore, $\mathsol{\_sum} \le 5 < S_3$. However, when $N = 3$, if the three active bids are $\{ 1, 2, 3 \}$, then \code{\_sum} is $S_3$. Therefore, instrumenting \code{Auction} with \code{\_sum} violates the SMP of the original \code{Auction}.

Despite the absence of such an SMP, each function of \code{Auction} interacts with at most one user per transaction. Each user is classified as either the zero-account, the smart contract, the manager, or an arbitrary sender. In fact, all arbitrary senders are indistinguishable with respect to \prop{2}. For example, if there are exactly three active bids, $\{ 2, 4, 8 \}$, it does not matter which user placed which bid. The leading bid is $8$ and the sum of all bids is $14$. On the other hand, if the leading bid is $8$, then each participant of \code{Auction} must have a bid in the range of $0$~to~$8$. To take advantage of these classes, rather than analyze \code{Auction} relative to all $2^{160}$ users, it is sufficient to analyze \code{Auction} relative to a representative user from each class. In our running example, there must be representatives for the zero-account, the smart contract account, the manager, and an (arbitrary) sender. The key idea is that each representative user can correspond to one or \emph{many} concrete users. Intuitively, each representative user summarizes the concrete users in its class.
If a representative's class contains a single concrete user, then there is no difference between the concrete user and the representative user. For example, the zero-account, the smart contract account, and the manager each correspond to single concrete users. The addresses of these users, and in turn, their bids, are known with absolute certainty. On the other hand, there are many arbitrary senders. Since senders are indistinguishable from each other, the precise address of the representative sender is unimportant. What matters is that the representative sender does not share an address with the zero-account, the smart contract account, or the manager. However, this means that at the start of each transaction the location of the representative sender is not absolute, and, therefore, the sender has a range of possible bids. To account for this, we introduce a predicate that is true of all initial bids, and holds inductively across all transactions. We provide this predicate manually, and use it to over-approximate all possible bids. An obvious predicate for \code{Auction} is that all bids are at most \mbox{\code{leadingBid},} but this predicate is not strong enough to prove \prop{2}. For example, the representative sender could first place a bid of $10$, and then (spuriously) withdraw a bid of $5$, resulting in a sum of $5$ but a leading bid of $10$. A stronger predicate, that is adequate to prove \prop{2}, is given by $\theta_U$: ``\emph{Each bid is at most \mbox{\code{leadingBid}.} If a bid is not \mbox{\code{leadingBid},} then its sum with \code{leadingBid} is at most \mbox{\code{\_sum}.}}''

Given $\theta_U$, \prop{2} can be verified by an SMC. This requires a new harness, with representative, rather than concrete, users. The new harness, \cref{Fig:Harness} (now including the highlighted lines), is similar to the SMP harness in that the zero-account, the smart contract account, and the manager account are assigned to addresses 0, 1, and 2, respectively, followed by an unbounded sequence of arbitrary calls. However, there is now a single sender that is assigned to address~3 (line~\ref{line:harness-instr-sender}). That is, the harness uses a fixed configuration of representatives in which the fourth representative is the sender. Before each function call, the sender's bid is set to a non-deterministic value that satisfies $\theta_U$ (lines~\ref{line:harness-instr-start}--\ref{line:harness-instr-end}). If the new harness and \prop{2} are provided to an SMC, the SMC will find an inductive strengthening such as, ``\emph{The leading bid is at most the sum of all bids}.''

The harness in \cref{Fig:Harness} differs from existing smart contract verification techniques in two ways. First, each address in \cref{Fig:Harness} is an abstraction of one or more concrete users. Second, \code{msg.sender} is restricted to a finite address space by lines~\ref{line:harness-instr-sender-start}~to~\ref{line:harness-instr-sender}. If these lines are removed, then an inductive invariant must constrain all cells of \code{bids}, to accommodate \code{bids[msg.sender]}. This requires quantified invariants over arrays, which are challenging to automate. By introducing lines~\ref{line:harness-instr-sender-start}~to~\ref{line:harness-instr-sender}, a quantifier-free predicate, such as our $\theta_U$, can directly constrain cell \code{bids[msg.sender]} instead.

Adding lines~\ref{line:harness-instr-sender-start}--\ref{line:harness-instr-sender} makes the contract finite state. Thus, its verification problem is decidable and can be handled by existing SMCs. However, as illustrated by \prop{2}, the restriction on each user must not exclude feasible counterexamples. Finding such a restriction is the focus of this paper.

In this paper, we present a new approach to smart contract verification.
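The effect of the interference predicate on the harness in \cref{Fig:Harness} can be reproduced with a small explicit-state search. The following Python sketch is an assumption-laden stand-in for an SMC: it abstracts the harness state to (\code{leadingBid}, \code{stopped}, \code{\_sum}), havocs the representative sender's bid under a given predicate before every call, and searches for a reachable state that violates \prop{2}. The names \code{theta\_weak}, \code{theta\_u}, and \code{check\_harness}, as well as the cap \code{max\_bid}, are hypothetical, and unbounded Python integers replace \code{uint} arithmetic.

```python
from collections import deque


def theta_weak(bid, leading, total):
    """The 'obvious' predicate: every bid is at most leadingBid."""
    return bid <= leading


def theta_u(bid, leading, total):
    """theta_U: bid <= leadingBid, and a non-leading bid plus leadingBid <= _sum."""
    return bid <= leading and (bid == leading or bid + leading <= total)


def check_harness(predicate, max_bid):
    """Search the abstract harness for a violation of Prop. 2.

    Returns a reachable (leading, stopped, total) with total < leading,
    or None if no such state exists up to the bid cap `max_bid`.
    """
    init = (0, False, 0)
    seen = {init}
    queue = deque([init])
    while queue:
        leading, stopped, total = queue.popleft()
        if total < leading:
            return (leading, stopped, total)  # Prop. 2 is violated here
        # msg.sender == address(2): the manager calls stop().
        successors = [(leading, True, total)]
        if not stopped:
            # msg.sender == address(3): havoc bids[address(3)] under the predicate.
            for bid in range(max_bid + 1):
                if not predicate(bid, leading, total):
                    continue
                # bid(amount): requires amount > leadingBid.
                for amount in range(leading + 1, max_bid + 1):
                    successors.append((amount, False, total + amount - bid))
                # withdraw(): requires bids[msg.sender] != leadingBid.
                if bid != leading:
                    successors.append((leading, False, total - bid))
        for nxt in successors:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None
```

Under these assumptions, the search with \code{theta\_weak} returns a state in which \code{\_sum} is below \code{leadingBid}, the bounded analogue of the spurious withdrawal described earlier, whereas the search with \code{theta\_u} finds no violation.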
We construct finite-state abstractions of parameterized smart contracts, known as \emph{local bundles}. A local bundle generalizes the harness in \cref{Fig:Harness}, and is constructed from a set of representatives and their predicates. When a local bundle and a property are provided to an SMC, there are three possible outcomes. First, if a predicate does not over-approximate its representative, a counterexample to the predicate is returned. Second, if the predicates do not entail the property, then a counterexample to verification is returned (this counterexample refutes the proof, rather than the property itself). Finally, if the predicates do entail the property, then an inductive invariant is returned. As opposed to deductive smart contract solutions, our approach finds inductive strengthenings automatically~\cite{HajduJovanovic2019,ZhongCheang2020}. As opposed to other model checking solutions for smart contracts, our approach is not limited to pre- and post-conditions~\cite{KalraGoel2018}, and can scale to $2^{160}$ users~\cite{Kolb2020}.

The key theoretical contributions of this paper are to show that verification with local bundle abstraction is an instance of Parameterized Compositional Model Checking (PCMC)~\cite{NamjoshiTrefler2016}, and to automate the side-conditions for its applicability. Specifically, \cref{Thm:SafetyCheck} shows that the local bundle abstraction is a