
¹ Instituto de Tecnología del Conocimiento, Spain
² Complutense University of Madrid, Spain
³ Universidad Politécnica de Madrid, Spain

Analyzing Smart Contracts: From EVM to a sound Control-Flow Graph

Elvira Albert¹,², Jesús Correas², Pablo Gordillo², Alejandro Hernández-Cerezo², Guillermo Román-Díez³, Albert Rubio¹,²

Abstract

The EVM language is a simple stack-based language with words of 256 bits, with one significant difference between the EVM and other virtual machine languages (like Java Bytecode or CLI for .Net programs): the use of the stack for saving the jump addresses instead of having it explicit in the code of the jumping instructions. Static analyzers need the complete control flow graph (CFG) of the EVM program in order to be able to represent all its execution paths. This report addresses the problem of obtaining a precise and complete stack-sensitive CFG by means of a static analysis, cloning the blocks that might be executed using different states of the execution stack. The soundness of the analysis presented is proved.

1 EVM Language

The EVM language is a simple stack-based language with words of 256 bits, with a local volatile memory that behaves as a simple word-addressed array of bytes, and a persistent storage that is part of the blockchain state. A more detailed description of the language and the complete set of operation codes can be found in [6]. In this section, we focus only on the characteristics of the EVM that are needed for describing our work. We consider EVM programs that satisfy two constraints: (1) jump addresses are constants, i.e., they are introduced by a PUSH operation, they do not depend on input values, and they are stored neither in memory nor in storage; and (2) the size of the stack when executing a jump instruction can be bounded by a constant. Programs that violate these two constraints are mostly produced by the use of recursion and higher-order programming in the high-level language that compiles to EVM, such as Solidity.

Example 1

In order to describe our techniques, we use as a running example a simplified version (without calls to the external service Oraclize and the authenticity proof verifier) of the contract [1] that implements a lottery system. During a game, players call a method joinPot to buy lottery tickets; each player’s address is appended to an array addresses of current players, and the number of tickets is appended to an array slots, both having variable length. After some time has elapsed, anyone can call rewardWinner, which calls the Oraclize service to obtain a random number for the winning ticket. If all goes according to plan, the Oraclize service then responds by calling the __callback method with this random number and the authenticity proof as arguments. A new instance of the game is then started, and the winner is allowed to withdraw her balance using a withdraw method. Figure 2 shows an excerpt of the Solidity code (including the public function findWinner) and a fragment of the EVM code produced by the compiler. The Solidity source code is shown for readability, as our analysis works directly on the EVM code.

    contract EthereumPot {
      address[] public addresses;
      address public winnerAddress;
      uint[] public slots;
      function __callback (bytes32 _queryId, string _result, bytes _proof){
        if (msg.sender != oraclize_cbAddress()) throw;
        random_number = uint(sha3(_result));
        winnerAddress = findWinner(random_number);
        amountWon = this.balance * 98 / 100 ;
        winnerAnnounced(winnerAddress, amountWon);
        if (winnerAddress.send(amountWon)) {
          if (owner.send(this.balance)) {
            openPot();
          }
        }
      }
      function findWinner (uint random) constant returns (address winner) {
        for (uint i = 0; i < slots.length; i++) {
          if (random <= slots[i]) {
            return addresses[i];
          }
        }
      }
      // Other functions
    }

    ⋯
    64B: JUMPDEST
    64C: PUSH1 0x00
    64E: DUP1
    64F: PUSH1 0x00
    651: SWAP1
    652: POP
    653: JUMPDEST
    654: PUSH1 0x03
    656: DUP1
    657: SLOAD
    658: SWAP1
    659: POP
    65A: DUP2
    65B: LT
    65C: ISZERO
    65D: PUSH2 0x06D0
    660: JUMPI
    661: PUSH1 0x03
    663: DUP2
    664: DUP2
    665: SLOAD
    666: DUP2
    ⋯
    941: JUMPDEST
    942: MOD
    943: ADD
    944: PUSH1 0x0A
    946: DUP2
    947: SWAP1
    948: SSTORE
    949: POP
    94A: PUSH2 0x0954
    94D: PUSH1 0x0A
    94F: SLOAD
    950: PUSH2 0x064B
    953: JUMP
    ⋯

Figure 2: Excerpt of Solidity code for EthereumPot contract (left), and fragment of EVM code for function findWinner (right)

To the right of Figure 2 we show a fragment of the EVM code of method findWinner. It can be seen that the EVM has instructions for operating with the stack contents, like DUPx or SWAPx; for comparisons, like LT, GT; for accessing the storage (memory) of the contract, like SSTORE, SLOAD (MLOAD, MSTORE); to add/remove elements to/from the stack, like PUSHx/POP; and many others (we again refer to [6] for details). Some instructions increment the program counter in several units (e.g., PUSHx Y adds a word with the constant Y of x bytes to the stack and increments the program counter by $x+1$). In what follows, we use $size(b)$ to refer to the number of units by which instruction $b$ increments the value of the program counter. For instance, $size(\texttt{POP})=1$, $size(\texttt{PUSH1})=2$ or $size(\texttt{PUSH3})=4$. $\blacksquare$
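As an illustration of $size(b)$, the following minimal Python sketch computes it from a raw opcode byte. The function name `size` mirrors the notation above; the opcode values used (PUSH1 = 0x60 through PUSH32 = 0x7F, POP = 0x50) are the standard EVM encodings and are not taken from this report.

```python
# Standard EVM opcode range for PUSH1..PUSH32 (an assumption of this sketch,
# not stated in the text above).
PUSH1, PUSH32 = 0x60, 0x7F


def size(opcode: int) -> int:
    """Number of units by which the instruction advances the program counter."""
    if PUSH1 <= opcode <= PUSH32:
        x = opcode - PUSH1 + 1  # PUSHx carries x immediate bytes
        return x + 1            # one byte for the opcode plus x bytes of data
    return 1                    # every other instruction occupies one byte


print(size(0x50))  # POP   -> 1
print(size(0x60))  # PUSH1 -> 2
print(size(0x62))  # PUSH3 -> 4
```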

One significant difference between the EVM and other virtual machine languages (like Java Bytecode or CLI for .Net programs) is the use of the stack for saving the jump addresses instead of having them explicit in the code of the jumping instructions. In EVM, instructions JUMP and JUMPI jump, unconditionally and conditionally respectively, to the program counter stored at the top of the execution stack. Because of this feature, obtaining the control flow graph of the program requires keeping track of the information stored in the stack. Let us illustrate it with an example.

Example 2

In the EVM code to the right of Figure 2 we can see two jump instructions, at program points 953 (JUMP) and 660 (JUMPI). Their jump addresses (64B and 6D0, respectively) are pushed by the instruction immediately before each of them, at program points 950 and 65D. The destination of each jump is marked by a JUMPDEST instruction (program points 941, 64B, 653).

$\blacksquare$

We start our analysis by defining the set $\mathcal{J}$ , which contains all possible jump destinations in an EVM program $P\equiv b_{0},\dots,b_{p}$ :

$$\mathcal{J}(P)=\{pc \mid b_{pc}\in P\wedge b_{pc}\equiv\texttt{JUMPDEST}\}.$$

We use $b_{pc}\in P$ for referring to the instruction at program counter $pc$ in the EVM program $P$ . In what follows, we omit $P$ from definitions when it is clear from the context, e.g., we use $\mathcal{J}$ to refer to $\mathcal{J}(P)$ .
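The computation of $\mathcal{J}$ can be sketched as a single linear scan of the bytecode. The Python fragment below is only an illustration under stated assumptions: the helper names `decode` and `jump_destinations` are hypothetical, the opcode values (JUMPDEST = 0x5B, PUSH1..PUSH32 = 0x60..0x7F) are the standard EVM encodings, and the five-byte program is made up for the example. The scan skips the immediate bytes of PUSHx so that a data byte equal to 0x5B is not mistaken for a JUMPDEST instruction.

```python
from typing import Dict, Set

JUMPDEST = 0x5B
PUSH1, PUSH32 = 0x60, 0x7F


def decode(code: bytes) -> Dict[int, int]:
    """Map each program counter that starts an instruction to its opcode."""
    program: Dict[int, int] = {}
    pc = 0
    while pc < len(code):
        op = code[pc]
        program[pc] = op
        # PUSHx occupies x + 1 bytes; any other instruction occupies one byte.
        pc += (op - PUSH1 + 2) if PUSH1 <= op <= PUSH32 else 1
    return program


def jump_destinations(code: bytes) -> Set[int]:
    """Program counters pc such that b_pc is a JUMPDEST instruction."""
    return {pc for pc, op in decode(code).items() if op == JUMPDEST}


# Made-up program: JUMPDEST; PUSH1 0x5B; JUMPDEST; STOP
example = bytes([0x5B, 0x60, 0x5B, 0x5B, 0x00])
print(sorted(jump_destinations(example)))  # [0, 3]
```

The byte at offset 2 is the operand of PUSH1, so it is not reported even though its value is 0x5B.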

Example 3

Given the EVM code that corresponds to function findWinner, we get the following set:

$$\mathcal{J}=\{\texttt{123},\texttt{142},\texttt{954},\texttt{64B},\texttt{6D0},\texttt{66F},\texttt{653},\texttt{6C3},\texttt{691},\texttt{6D1},\texttt{6BA}\}$$

$\blacksquare$

The first step in the computation of the CFG is to define the notion of block. In general [2], given a program $P$ , a block is a maximal sequence of straight-line consecutive code in the program with the properties that the flow of control can only enter the block through the first instruction in the block, and can only leave the block at the last instruction. Let us define the concept of block in an EVM program:

Definition 1 (blocks)

Given an EVM program $P\equiv b_{0},\ldots,b_{p}$ , we define

$$blocks(P)=\left\{B_{i}\equiv b_{i},\ldots,b_{j}\ \middle|\ \begin{array}{l}(\forall k.\ i<k<j,\ b_{k}\notin Jump\cup End\cup\{\texttt{JUMPDEST}\})\ \wedge\\ (i=0\ \vee\ b_{i}\equiv\texttt{JUMPDEST}\ \vee\ b_{i-1}=\texttt{JUMPI})\ \wedge\\ (j=p\ \vee\ b_{j}\in Jump\ \vee\ b_{j}\in End\ \vee\ b_{j+1}\equiv\texttt{JUMPDEST})\end{array}\right\}$$

where

$$\begin{array}{rcl}Jump&=&\{\texttt{JUMP},\texttt{JUMPI}\}\\ End&=&\{\texttt{REVERT},\texttt{STOP},\texttt{INVALID}\}\end{array}$$
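Definition 1 can be read operationally as a single pass over the instruction sequence. The following Python sketch is one possible reading, not the authors' implementation: the function name `blocks`, the representation of instructions as (program counter, mnemonic) pairs, and the decision to skip instructions that follow a JUMP or End instruction without an intervening JUMPDEST (they start no block under the definition) are assumptions of this illustration. The input is the first part of the EVM fragment of findWinner shown earlier.

```python
from typing import List, Optional, Tuple

Instr = Tuple[int, str]  # (program counter, mnemonic)

JUMP_OPS = {"JUMP", "JUMPI"}
END_OPS = {"REVERT", "STOP", "INVALID"}


def blocks(program: List[Instr]) -> List[List[Instr]]:
    """Split a program into blocks following the three conditions of Definition 1."""
    result: List[List[Instr]] = []
    current: Optional[List[Instr]] = None  # None: not inside any block
    for k, (pc, op) in enumerate(program):
        starts = k == 0 or op == "JUMPDEST" or program[k - 1][1] == "JUMPI"
        if starts:
            if current:
                result.append(current)  # previous block ends right before a JUMPDEST
            current = []
        if current is None:
            continue  # this instruction starts no block and belongs to none
        current.append((pc, op))
        if op in JUMP_OPS or op in END_OPS:
            result.append(current)
            current = None
    if current:
        result.append(current)  # last block ends at the end of the program
    return result


fragment = [
    (0x64B, "JUMPDEST"), (0x64C, "PUSH1"), (0x64E, "DUP1"), (0x64F, "PUSH1"),
    (0x651, "SWAP1"), (0x652, "POP"), (0x653, "JUMPDEST"), (0x654, "PUSH1"),
    (0x656, "DUP1"), (0x657, "SLOAD"), (0x658, "SWAP1"), (0x659, "POP"),
    (0x65A, "DUP2"), (0x65B, "LT"), (0x65C, "ISZERO"), (0x65D, "PUSH2"),
    (0x660, "JUMPI"), (0x661, "PUSH1"), (0x663, "DUP2"),
]
print([format(b[0][0], "X") for b in blocks(fragment)])  # ['64B', '653', '661']
```

Block 64B ends at the instruction that precedes a JUMPDEST, block 653 ends in a JUMPI, and block 661 starts right after that JUMPI.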

Figure 3: Fragment of the CFG of findWinner

Example 4

Figure 3 shows the blocks (nodes) obtained for findWinner and their corresponding jump invocations. Solid and dashed edges represent the two possible execution paths depending on the entry block: solid edges represent the path that starts from block 941 and dashed edges the path that starts from 123. Note that most of the blocks start with a JUMPDEST instruction (123, 941, 64B, 653, 66F, 954, 6C3, 691, 142, 6D1, 6D0). The rest of the blocks start with instructions that come right after a JUMPI instruction (661, 683). Analogously, most blocks end in a JUMP (941, 6C3, 123, 691, 6D1), JUMPI (653, 661, 66F, 683) or RETURN (142) instruction or in the instruction that precedes JUMPDEST (64B). $\blacksquare$

Observing the blocks in Figure 3, we can see that most JUMP instructions use the address introduced by the PUSH instruction executed immediately before the JUMP. However, in general, in EVM code it is possible to find a JUMP whose address has been stored in a different block. This happens, for instance, when a public function is invoked privately from other methods of the same contract: the return program counter is pushed by the invokers at different program points, and it is used by a single JUMP instruction when the invoked method finishes, in order to return to the particular caller that invoked the function.

Example 5

In Figure 3, at block 6D1 we have a JUMP (marked with ✩) whose address is not pushed in the same block. This JUMP takes the return address of function findWinner. If findWinner is publicly invoked, it jumps to address 142 (pushed at block 123 at $\star$), and if it is invoked from __callback, it jumps to 954 (pushed at block 941 at $\star$).

1.1 Operational Semantics

Figure 4 shows the semantics of some instructions involved in the computation of the values stored in the stack for handling jumps. The state of the program $S$ is a tuple $\langle pc,\langle n,\sigma\rangle\rangle$ where $pc$ is the value of the program counter with the index of the next instruction to be executed, and $\langle n,\sigma\rangle$ is a stack state as defined in Section 2 ($n$ is the number of elements in the stack, and $\sigma$ is a partial mapping that relates some stack positions with a set of jump destinations). The interesting rules are the ones that deal with jump destination addresses on the stack: Rule (4) adds a new address on the stack, and Rules (6) and (8-10) copy or exchange existing addresses on top of the stack, respectively. Rules (1) to (3) perform a jump in the program and therefore consume the address placed on top of the stack, plus an additional word in the case of JUMPI. If the instructions considered in this simplified semantics do not handle jump addresses, the corresponding rules just update the stack size in the program state $S$ (Rules (5), (7) and (11)). The remaining EVM instructions not explicitly considered in this simplified semantics are generically represented by Rule (12) with $b_{pc}^{\delta,\alpha}$, where $\delta$ is the number of items removed from the stack when $b_{pc}$ is executed, and $\alpha$ is the number of additional items placed on the stack. Complete executions are traces of the form $S_{0}\Rightarrow S_{1}\Rightarrow\dots\Rightarrow S_{n}$ where $S_{0}\equiv\langle 0,\langle 0,\sigma_{\emptyset}\rangle\rangle$ is the initial state, $\sigma_{\emptyset}$ is the empty mapping, and $S_{n}$ corresponds to the last state. There are no infinite traces, as any transaction that executes EVM code has a finite gas limit and every instruction executed consumes some amount of gas. When the gas limit is exceeded, an out-of-gas exception occurs and the program halts immediately.

(1)	$\dfrac{b_{pc}=\texttt{JUMP}}{\langle pc,\langle n,\sigma\rangle\rangle\Rightarrow\langle\sigma(s_{n-1}),\langle n-1,\sigma\backslash[s_{n-1}]\rangle\rangle}$
(2)	$\dfrac{b_{pc}=\texttt{JUMPI}}{\langle pc,\langle n,\sigma\rangle\rangle\Rightarrow\langle\sigma(s_{n-1}),\langle n-2,\sigma\backslash[s_{n-1},s_{n-2}]\rangle\rangle}$
(3)	$\dfrac{b_{pc}=\texttt{JUMPI}}{\langle pc,\langle n,\sigma\rangle\rangle\Rightarrow\langle pc+size(b_{pc}),\langle n-2,\sigma\backslash[s_{n-1},s_{n-2}]\rangle\rangle}$
(4)	$\dfrac{b_{pc}=\texttt{PUSH}x\ v,\quad v\in\mathcal{J}}{\langle pc,\langle n,\sigma\rangle\rangle\Rightarrow\langle pc+size(b_{pc}),\langle n+1,\sigma[s_{n}\mapsto\{v\}]\rangle\rangle}$
(5)	$\dfrac{b_{pc}=\texttt{PUSH}x\ v,\quad v\notin\mathcal{J}}{\langle pc,\langle n,\sigma\rangle\rangle\Rightarrow\langle pc+size(b_{pc}),\langle n+1,\sigma\rangle\rangle}$
(6)	$\dfrac{b_{pc}=\texttt{DUP}x,\quad s_{n-x}\in dom(\sigma)}{\langle pc,\langle n,\sigma\rangle\rangle\Rightarrow\langle pc+size(b_{pc}),\langle n+1,\sigma[s_{n}\mapsto\sigma(s_{n-x})]\rangle\rangle}$
(7)	$\dfrac{b_{pc}=\texttt{DUP}x,\quad s_{n-x}\notin dom(\sigma)}{\langle pc,\langle n,\sigma\rangle\rangle\Rightarrow\langle pc+size(b_{pc}),\langle n+1,\sigma\rangle\rangle}$
(8)	$\dfrac{b_{pc}=\texttt{SWAP}x,\quad s_{n-1}\in dom(\sigma),\quad s_{n-x-1}\in dom(\sigma)}{\langle pc,\langle n,\sigma\rangle\rangle\Rightarrow\langle pc+size(b_{pc}),\langle n,\sigma[s_{n-x-1}\mapsto\sigma(s_{n-1}),s_{n-1}\mapsto\sigma(s_{n-x-1})]\rangle\rangle}$
(9)	$\dfrac{b_{pc}=\texttt{SWAP}x,\quad s_{n-1}\in dom(\sigma),\quad s_{n-x-1}\notin dom(\sigma)}{\langle pc,\langle n,\sigma\rangle\rangle\Rightarrow\langle pc+size(b_{pc}),\langle n,\sigma[s_{n-x-1}\mapsto\sigma(s_{n-1})]\backslash[s_{n-1}]\rangle\rangle}$
(10)	$\dfrac{b_{pc}=\texttt{SWAP}x,\quad s_{n-1}\notin dom(\sigma),\quad s_{n-x-1}\in dom(\sigma)}{\langle pc,\langle n,\sigma\rangle\rangle\Rightarrow\langle pc+size(b_{pc}),\langle n,\sigma[s_{n-1}\mapsto\sigma(s_{n-x-1})]\backslash[s_{n-x-1}]\rangle\rangle}$
(11)	$\dfrac{b_{pc}=\texttt{SWAP}x,\quad s_{n-1}\notin dom(\sigma),\quad s_{n-x-1}\notin dom(\sigma)}{\langle pc,\langle n,\sigma\rangle\rangle\Rightarrow\langle pc+size(b_{pc}),\langle n,\sigma\backslash[s_{n-1},s_{n-x-1}]\rangle\rangle}$
(12)	$\dfrac{b_{pc}^{\delta,\alpha}\notin End\cup Jump\cup\{\texttt{PUSH}x,\texttt{DUP}x,\texttt{SWAP}x\}}{\langle pc,\langle n,\sigma\rangle\rangle\Rightarrow\langle pc+size(b_{pc}),\langle n-\delta+\alpha,\sigma\backslash[s_{n-1},\ldots,s_{n-\delta}]\rangle\rangle}$

Figure 4: Simplified EVM semantics for handling jumps

2 From EVM to a Sound CFG

As we have seen in the previous section, the addresses used by the jumping instructions are stored in the execution stack. In EVM, blocks can be reached with different stack sizes and contents. As in other tools [4, 3, 5], to precisely infer the possible addresses at jumping program points, we need a context-sensitive static analysis that analyzes each block separately for each possible stack that can reach it (only considering the addresses stored in the stack). This section presents an address analysis of EVM programs which allows us to compute a complete CFG of the EVM code. To compute the addresses involved in the jumping instructions, we define a static analysis which soundly infers all possible addresses that a JUMP instruction could use.

In our address analysis we aim at having the stack represented by explicit variables. Given the characteristics of EVM programs, the execution stack of EVM programs produced from Solidity programs without recursion can be flattened. Besides, as the size of the stack of the Ethereum Virtual Machine is bounded to 1024 elements (see [6]), the number of stack variables is limited. We use $\mathcal{V}$ to represent the set of all possible stack variables that may be used in the program. The first element we define for our analysis is its abstract state:

The abstract state

Our analysis uses a partial representation of the execution stack as its basic element. To this end, we use the notion of stack state as a pair $\langle n,\sigma\rangle$, where $n$ is the number of elements in the stack, and $\sigma$ is a partial mapping that relates some stack positions with a set of jump destinations. A position in the stack is referred to as $s_{i}$ with $0\leq i<n$, and $s_{n-1}$ is the position at the top of the stack. The abstract state of the analysis is defined on the set of all stack states $\mathcal{S}=\{\langle n,\sigma\rangle \mid 0\leq n\leq|\mathcal{V}|\wedge\sigma\in\Sigma_{n}\}$, where $\Sigma_{n}$ is the set of all mappings using up to $n$ stack variables.

Definition 2 (abstract state)

The abstract state is a partial mapping $\pi$ of the form $\mathcal{S}\mapsto\mathcal{P}(\mathcal{S})$ .

The application of $\sigma$ to an element $s_{i}$ , that is, $\sigma(s_{i})$ , corresponds to the set of jump destinations that a stack variable $s_{i}$ can contain. The first element of the tuple, that is, $n$ , stores the size of the stack in the different abstract states.

The abstract domain is the lattice $\langle AS,\pi_{\top},\pi_{\bot},\sqcup,\sqsubseteq\rangle$, where $AS$ is the set of abstract states and $\pi_{\top}$ is the top of the lattice, defined as the mapping such that $\forall s\in\mathcal{S},\pi_{\top}(s)=\mathcal{S}$. The bottom element of the lattice, $\pi_{\bot}$, is the empty mapping. Now, to define $\sqcup$ and $\sqsubseteq$, we first define the function $img(\pi,s)$ as $\pi(s)$ if $s\in dom(\pi)$ and $\emptyset$ otherwise. Given two abstract states $\pi_{1}$ and $\pi_{2}$, we use $\pi=\pi_{1}\sqcup\pi_{2}$ to denote that $\pi$ is the least upper bound, defined as follows: $\forall s\in dom(\pi_{1})\cup dom(\pi_{2}),\pi(s)=img(\pi_{1},s)\cup img(\pi_{2},s)$. At this point, $\pi_{1}\sqsubseteq\pi_{2}$ holds iff $dom(\pi_{1})\subseteq dom(\pi_{2})$ and $\forall s\in dom(\pi_{1}),\pi_{1}(s)\subseteq\pi_{2}(s)$.
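The lattice operations can be illustrated with a small Python sketch. It is only an illustration under assumptions that do not come from the report: abstract states are plain dictionaries, a stack state $\langle n,\sigma\rangle$ is encoded as a hashable pair built by the hypothetical helper `stack_state`, and the names `img`, `join` and `leq` stand for $img$, $\sqcup$ and $\sqsubseteq$. The sample states are made up, using addresses 954 and 64B from the running example.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Tuple

# Encoding assumed by this sketch: <n, sigma> becomes (n, frozenset of
# (stack index, frozenset of jump destinations)) so that it can be a dict key.
StackState = Tuple[int, FrozenSet[Tuple[int, FrozenSet[int]]]]
AbstractState = Dict[StackState, FrozenSet[StackState]]


def stack_state(n: int, sigma: Optional[Dict[int, Iterable[int]]] = None) -> StackState:
    sigma = sigma or {}
    return (n, frozenset((i, frozenset(d)) for i, d in sigma.items()))


def img(pi: AbstractState, s: StackState) -> FrozenSet[StackState]:
    """pi(s) if s is in dom(pi), and the empty set otherwise."""
    return pi.get(s, frozenset())


def join(pi1: AbstractState, pi2: AbstractState) -> AbstractState:
    """Least upper bound: pointwise union over the union of the domains."""
    return {s: img(pi1, s) | img(pi2, s) for s in pi1.keys() | pi2.keys()}


def leq(pi1: AbstractState, pi2: AbstractState) -> bool:
    """pi1 is below pi2: domain inclusion plus pointwise set inclusion."""
    return all(s in pi2 and pi1[s] <= pi2[s] for s in pi1)


entry = stack_state(8)
a = stack_state(6, {5: {0x954}})
b = stack_state(8, {5: {0x954}, 7: {0x64B}})
pi1: AbstractState = {entry: frozenset({a})}
pi2: AbstractState = {entry: frozenset({b})}
pi = join(pi1, pi2)
print(leq(pi1, pi), leq(pi2, pi), leq(pi, pi1))  # True True False
```

The empty dictionary plays the role of $\pi_{\bot}$: it is below every abstract state and is the neutral element of `join`.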

Transfer function

One of the ingredients of our analysis is a transfer function that models the effect of each EVM instruction on the abstract state. Given a stack state $s$ of the form $\langle n,\sigma\rangle$, Figure 5 defines the updating function $\lambda(b_{pc}^{\delta,\alpha},s)$, where $b$ is the EVM instruction to be applied, $pc$ is the program counter of the instruction, and $\delta$ and $\alpha$ are the number of elements removed from and placed on the EVM stack when executing $b$, respectively. Given a map $m$, we use $m[x\mapsto y]$ to denote the result of updating $m$ by making $m(x)=y$ while $m$ stays the same for all locations different from $x$, and we use $m\backslash[x]$ to denote the partial mapping that is the same as $m$ for all locations different from $x$ and is undefined for $x$. By means of $\lambda$, we define the transfer function of our analysis.

Definition 3 (transfer function)

Given the set of abstract states $AS$ and the set of EVM instructions $Ins$, the transfer function $\tau$ is a mapping of the form

$$\tau : Ins\times AS\mapsto AS$$

defined as follows:

$$\tau(b,\pi)=\pi^{\prime}\quad\text{where}\quad\forall s\in dom(\pi),\ \pi^{\prime}(s)=\lambda(b,\pi(s))$$

	$b^{\delta,\alpha}$	$\lambda(b,\langle n,\sigma\rangle)$	condition
(1)	PUSH$x$ $v$	$\langle n+1,\sigma[s_{n}\mapsto\{v\}]\rangle$	when $v\in\mathcal{J}$
		$\langle n+1,\sigma\rangle$	when $v\not\in\mathcal{J}$
(2)	DUP$x$	$\langle n+1,\sigma\rangle$	when $s_{n-x}\not\in dom(\sigma)$
		$\langle n+1,\sigma[s_{n}\mapsto\sigma(s_{n-x})]\rangle$	when $s_{n-x}\in dom(\sigma)$
(3)	SWAP$x$	$\langle n,\sigma\rangle$	when $s_{n-1}\not\in dom(\sigma)\wedge s_{n-x-1}\not\in dom(\sigma)$
		$\langle n,\sigma[s_{n-x-1}\mapsto\sigma(s_{n-1}),s_{n-1}\mapsto\sigma(s_{n-x-1})]\rangle$	when $s_{n-1}\in dom(\sigma)\wedge s_{n-x-1}\in dom(\sigma)$
		$\langle n,\sigma[s_{n-1}\mapsto\sigma(s_{n-x-1})]\backslash[s_{n-x-1}]\rangle$	when $s_{n-1}\not\in dom(\sigma)\wedge s_{n-x-1}\in dom(\sigma)$
		$\langle n,\sigma[s_{n-x-1}\mapsto\sigma(s_{n-1})]\backslash[s_{n-1}]\rangle$	when $s_{n-1}\in dom(\sigma)\wedge s_{n-x-1}\not\in dom(\sigma)$
(4)	otherwise	$\langle n-\delta+\alpha,\sigma\backslash[s_{n-1},\dots,s_{n-\delta}]\rangle$

Figure 5: Updating function
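The updating function of Figure 5 and the transfer function of Definition 3 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the tuple encoding of instructions, the names `update`, `transfer`, `JUMP_DESTS` and `BLOCK_941`, and the choice of frozen sets to make stack states hashable are all hypothetical. The set `JUMP_DESTS` plays the role of $\mathcal{J}$ and is assumed to contain only the two addresses that appear in the block.

```python
from typing import Dict, FrozenSet, Optional, Set, Tuple

# sigma is a partial map {stack index -> set of jump addresses}, frozen so it is hashable.
Sigma = FrozenSet[Tuple[int, FrozenSet[int]]]
StackState = Tuple[int, Sigma]                     # <n, sigma>
AbstractState = Dict[StackState, Set[StackState]]  # pi
# Hypothetical instruction encoding: (mnemonic, delta, alpha, pushed value or None).
Instr = Tuple[str, int, int, Optional[int]]


def update(instr: Instr, state: StackState, jump_dests: Set[int]) -> StackState:
    """Updating function lambda of Figure 5, applied to one stack state."""
    op, delta, alpha, value = instr
    n, frozen = state
    sigma = dict(frozen)
    if op.startswith("PUSH"):          # case (1): record the value only if it is a jump address
        if value in jump_dests:
            sigma[n] = frozenset({value})
        return n + 1, frozenset(sigma.items())
    if op.startswith("DUP"):           # case (2): copy the entry of s_{n-x}, if any
        x = int(op[3:])
        if n - x in sigma:
            sigma[n] = sigma[n - x]
        return n + 1, frozenset(sigma.items())
    if op.startswith("SWAP"):          # case (3): exchange the entries of s_{n-1} and s_{n-x-1}
        x = int(op[4:])
        top, other = n - 1, n - x - 1
        at_top, at_other = sigma.pop(top, None), sigma.pop(other, None)
        if at_top is not None:
            sigma[other] = at_top
        if at_other is not None:
            sigma[top] = at_other
        return n, frozenset(sigma.items())
    for i in range(n - delta, n):      # case (4): forget the delta consumed positions
        sigma.pop(i, None)
    return n - delta + alpha, frozenset(sigma.items())


def transfer(instr: Instr, pi: AbstractState, jump_dests: Set[int]) -> AbstractState:
    """Transfer function tau: apply lambda to every stack state stored in pi."""
    return {s: {update(instr, t, jump_dests) for t in targets} for s, targets in pi.items()}


# Instructions of the block that starts at 941, with the usual (delta, alpha) of each opcode.
JUMP_DESTS = {0x954, 0x64B}
BLOCK_941 = [
    ("JUMPDEST", 0, 0, None), ("MOD", 2, 1, None), ("ADD", 2, 1, None),
    ("PUSH1", 0, 1, 0x0A), ("DUP2", 2, 3, None), ("SWAP1", 2, 2, None),
    ("SSTORE", 2, 0, None), ("POP", 1, 0, None), ("PUSH2", 0, 1, 0x954),
    ("PUSH1", 0, 1, 0x0A), ("SLOAD", 1, 1, None), ("PUSH2", 0, 1, 0x64B),
    ("JUMP", 1, 0, None),
]

initial = (8, frozenset())
final = {initial: {initial}}
for instr in BLOCK_941:
    final = transfer(instr, final, JUMP_DESTS)
```

After the loop, `final` maps the entry stack state $\langle 8,\{\}\rangle$ to a single stack state of size 7 in which only position $s_{5}$ holds a jump address, namely 954.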

Example 6

Given the initial abstract state $\{\langle 8,\{\}\rangle\mapsto\{\langle 8,\{\}\rangle\}\}$ , which corresponds to the initial stack state for executing block 941, the application of the transfer function $\tau$ to the block that starts at EVM instruction 941 produces the following results (the program point is shown in parentheses). The second listing shows the application of the transfer function to block 123 with its initial abstract state $\{\langle 2,\{\}\rangle\mapsto\{\langle 2,\{\}\rangle\}\}$ .

\begin{array}{lll}
(\text{941})&\texttt{JUMPDEST}&\{\langle 8,\{\}\rangle\mapsto\{\langle 8,\{\}\rangle\}\}\\
(\text{942})&\texttt{MOD}&\{\langle 8,\{\}\rangle\mapsto\{\langle 7,\{\}\rangle\}\}\\
(\text{943})&\texttt{ADD}&\{\langle 8,\{\}\rangle\mapsto\{\langle 6,\{\}\rangle\}\}\\
(\text{944})&\texttt{PUSH1 0A}&\{\langle 8,\{\}\rangle\mapsto\{\langle 7,\{\}\rangle\}\}\\
(\text{946})&\texttt{DUP2}&\{\langle 8,\{\}\rangle\mapsto\{\langle 8,\{\}\rangle\}\}\\
(\text{947})&\texttt{SWAP1}&\{\langle 8,\{\}\rangle\mapsto\{\langle 8,\{\}\rangle\}\}\\
(\text{948})&\texttt{SSTORE}&\{\langle 8,\{\}\rangle\mapsto\{\langle 6,\{\}\rangle\}\}\\
(\text{949})&\texttt{POP}&\{\langle 8,\{\}\rangle\mapsto\{\langle 5,\{\}\rangle\}\}\\
(\text{94A})&\texttt{PUSH2 0954}&\{\langle 8,\{\}\rangle\mapsto\{\langle 6,\{s_{5}\mapsto\text{954}\}\rangle\}\}\\
(\text{94D})&\texttt{PUSH1 0A}&\{\langle 8,\{\}\rangle\mapsto\{\langle 7,\{s_{5}\mapsto\text{954}\}\rangle\}\}\\
(\text{94F})&\texttt{SLOAD}&\{\langle 8,\{\}\rangle\mapsto\{\langle 7,\{s_{5}\mapsto\text{954}\}\rangle\}\}\\
(\text{950})&\texttt{PUSH2 064B}&\{\langle 8,\{\}\rangle\mapsto\{\langle 8,\{s_{5}\mapsto\text{954},s_{7}\mapsto\text{64B}\}\rangle\}\}\\
(\text{953})&\texttt{JUMP}&\{\langle 8,\{\}\rangle\mapsto\{\langle 7,\{s_{5}\mapsto\text{954}\}\rangle\}\}\\
\end{array}

\begin{array}{lll}
(\text{123})&\texttt{JUMPDEST}&\{\langle 2,\{\}\rangle\mapsto\{\langle 2,\{\}\rangle\}\}\\
(\text{124})&\texttt{POP}&\{\langle 2,\{\}\rangle\mapsto\{\langle 1,\{\}\rangle\}\}\\
(\text{125})&\texttt{PUSH2 0142}&\{\langle 2,\{\}\rangle\mapsto\{\langle 2,\{s_{1}\mapsto\text{142}\}\rangle\}\}\\
(\text{128})&\texttt{PUSH1 04}&\{\langle 2,\{\}\rangle\mapsto\{\langle 3,\{s_{1}\mapsto\text{142}\}\rangle\}\}\\
(\text{12A})&\texttt{DUP1}&\{\langle 2,\{\}\rangle\mapsto\{\langle 4,\{s_{1}\mapsto\text{142}\}\rangle\}\}\\
(\text{12B})&\texttt{CALLDATASIZE}&\{\langle 2,\{\}\rangle\mapsto\{\langle 5,\{s_{1}\mapsto\text{142}\}\rangle\}\}\\
(\text{12C})&\texttt{SUB}&\{\langle 2,\{\}\rangle\mapsto\{\langle 4,\{s_{1}\mapsto\text{142}\}\rangle\}\}\\
&\vdots\\
(\text{13A})&\texttt{SWAP1}&\{\langle 2,\{\}\rangle\mapsto\{\langle 5,\{s_{1}\mapsto\text{142}\}\rangle\}\}\\
(\text{13B})&\texttt{POP}&\{\langle 2,\{\}\rangle\mapsto\{\langle 4,\{s_{1}\mapsto\text{142}\}\rangle\}\}\\
(\text{13C})&\texttt{POP}&\{\langle 2,\{\}\rangle\mapsto\{\langle 3,\{s_{1}\mapsto\text{142}\}\rangle\}\}\\
(\text{13D})&\texttt{POP}&\{\langle 2,\{\}\rangle\mapsto\{\langle 2,\{s_{1}\mapsto\text{142}\}\rangle\}\}\\
(\text{13E})&\texttt{PUSH2 064B}&\{\langle 2,\{\}\rangle\mapsto\{\langle 3,\{s_{1}\mapsto\text{142},s_{2}\mapsto\text{64B}\}\rangle\}\}\\
(\text{141})&\texttt{JUMP}&\{\langle 2,\{\}\rangle\mapsto\{\langle 2,\{s_{1}\mapsto\text{142}\}\rangle\}\}\\
\end{array}

$\blacksquare$

2.1 Addresses equation system

The next step consists in defining, by means of the transfer and the updating functions, a constraint equation system to represent all possible jumping addresses that could be valid for executing a jump instruction in the program.

Definition 4 (addresses equation system)

Given an EVM program $P$ of the form $b_{0},\ldots,b_{p}$ , its addresses equation system, $\mathcal{E}(P)$ , includes the following equations for every EVM bytecode instruction $b_{pc}\in P$ :

	$b_{pc}$	$C_{pc}$
(1)	JUMP	${\mathcal{X}}_{\sigma(s_{n-1})}$	$\sqsupseteq$	$\text{{idmap}}(\lambda(b_{pc},\langle n,\sigma\rangle))$	$\forall s\in dom({\mathcal{X}}_{pc}),\langle n,\sigma\rangle\in{\mathcal{X}}_{pc}(s)$
(2)	JUMPI	${\mathcal{X}}_{\sigma(s_{n-1})}$	$\sqsupseteq$	$\text{{idmap}}(\lambda(b_{pc},\langle n,\sigma\rangle))$	$\forall s\in dom({\mathcal{X}}_{pc}),\langle n,\sigma\rangle\in{\mathcal{X}}_{pc}(s)$
(2)	JUMPI	${\mathcal{X}}_{pc+1}$	$\sqsupseteq$	$\text{{idmap}}(\lambda(b_{pc},\langle n,\sigma\rangle))$	$\forall s\in dom({\mathcal{X}}_{pc}),\langle n,\sigma\rangle\in{\mathcal{X}}_{pc}(s)$
(3)	$b_{pc}\not\in End\wedge$ $b_{pc+size(b_{pc})}=\texttt{JUMPDEST}$	${\mathcal{X}}_{pc+size(b_{pc})}$	$\sqsupseteq$	$\text{{idmap}}(\lambda(b_{pc},\langle n,\sigma\rangle))$	$\forall s\in dom({\mathcal{X}}_{pc}),\langle n,\sigma\rangle\in{\mathcal{X}}_{pc}(s)$
(4)	$b_{pc}\not\in End$	${\mathcal{X}}_{pc+size(b_{pc})}$	$\sqsupseteq$	$\tau(b_{pc},{\mathcal{X}}_{pc})$
(5)	otherwise	${\mathcal{X}}_{pc+size(b_{pc})}$	$\sqsupseteq$	$\tau(b_{pc},{\mathcal{X}}_{pc})$

where $idmap(s)$ returns a map $\pi$ such that $dom(\pi)=\{s\}$ and $\pi(s)=\{s\}$ and $size(b_{pc})$ returns the number of bytes of the instruction $b_{pc}$ .

Observe that the addresses equation system has equations for all program points of the program. Concretely, variables of the form ${\mathcal{X}}_{pc}$ store the jumping addresses saved in the stack after executing $b_{pc}$ for all possible entry stacks. This information is used to compute all possible jump destinations when executing JUMP or JUMPI instructions. Most instructions, cases (4) and (5), just apply the transfer function $\tau$ to compute the possible stack states of the subsequent instruction. Note that the expression $pc+size(b_{pc})$ used in cases (3) to (5) just computes the position of the next instruction in the EVM program. Jumping instructions, cases (1) and (2), compute the initial state of the invoked blocks, thus they produce a map with all possible input stack states that can reach one block. JUMP and JUMPI instructions produce, for each stack state $\langle n,\sigma\rangle$ , one equation for the variable ${\mathcal{X}}_{\sigma(s_{n-1})}$ , that is, the variable indexed by the jump address stored at the top of the stack, $s_{n-1}$ . JUMPI, case (2), produces an extra equation for ${\mathcal{X}}_{pc+1}$ to capture the possibility of continuing to the next instruction instead of jumping to the destination address. Additionally, instructions that immediately precede a JUMPDEST, case (3), produce initial states for the block that starts at that JUMPDEST. When the constraint equation system is solved, the constraint variables over-approximate the jumping information for the program.
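To make the role of $size(b_{pc})$ concrete, the following Python sketch computes the program points of a block from its raw bytes. It is only an illustration: the function names `instruction_size` and `program_points` are hypothetical, and the byte string is our own encoding of the instructions of block 941 shown in Example 6, using the standard EVM opcode values (PUSH1 to PUSH32 occupy opcodes 0x60 to 0x7F and carry 1 to 32 immediate bytes).

```python
from typing import List


def instruction_size(opcode: int) -> int:
    """Number of bytes of an instruction: one opcode byte plus the PUSH immediate, if any."""
    if 0x60 <= opcode <= 0x7F:        # PUSH1 .. PUSH32
        return 1 + (opcode - 0x5F)
    return 1


def program_points(code: bytes, start: int) -> List[int]:
    """Program counters of the instructions in `code`, which is loaded at address `start`."""
    pcs = []
    offset = 0
    while offset < len(code):
        pcs.append(start + offset)
        offset += instruction_size(code[offset])   # pc + size(b_pc)
    return pcs


# JUMPDEST MOD ADD PUSH1 0A DUP2 SWAP1 SSTORE POP PUSH2 0954 PUSH1 0A SLOAD PUSH2 064B JUMP
block_941 = bytes.fromhex("5b 06 01 60 0a 81 90 55 50 61 09 54 60 0a 54 61 06 4b 56")
points = program_points(block_941, 0x941)
```

Here `points` contains the thirteen program points of Example 6, from 941 to 953. Skipping the immediates matters: the byte 0x54 inside `PUSH2 0954` would otherwise be decoded as an SLOAD instruction.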

Example 7

As can be seen in Figure 3, we can jump to block 64B from two different blocks, 941 and 123. The computation of the addresses equation system produces the following equations for the entry program points of these two blocks:

\begin{array}{rcl}
{\mathcal{X}_{\text{941}}}&\sqsupseteq&\{\langle 8,\{\}\rangle\mapsto\{\langle 8,\{\}\rangle\}\}\\
&\vdots\\
{\mathcal{X}_{\text{950}}}&\sqsupseteq&\{\langle 8,\{\}\rangle\mapsto\{\langle 8,\{s_{5}\mapsto\text{954},s_{7}\mapsto\text{64B}\}\rangle\}\}\\
{\mathcal{X}_{\text{64B}}}^{(1)}&\sqsupseteq&\{\langle 7,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\{\langle 7,\{s_{5}\mapsto\text{954}\}\rangle\}\}
\end{array}

\begin{array}{rcl}
{\mathcal{X}_{\text{123}}}&\sqsupseteq&\{\langle 2,\{\}\rangle\mapsto\{\langle 2,\{\}\rangle\}\}\\
&\vdots\\
{\mathcal{X}_{\text{318}}}&\sqsupseteq&\{\langle 2,\{\}\rangle\mapsto\{\langle 4,\{s_{1}\mapsto\text{142},s_{3}\mapsto\text{64B}\}\rangle\}\}\\
{\mathcal{X}_{\text{64B}}}^{(2)}&\sqsupseteq&\{\langle 3,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\{\langle 3,\{s_{1}\mapsto\text{142}\}\rangle\}\}
\end{array}

Observe that two different stack contents reach the same program point: two equations for ${\mathcal{X}_{\text{64B}}}$ are produced by two different blocks, the JUMP at the end of block 941, identified by ${\mathcal{X}_{\text{64B}}}^{(1)}$ , and the JUMP at the end of block 123, identified by ${\mathcal{X}_{\text{64B}}}^{(2)}$ . Thus the equation that must hold for ${\mathcal{X}_{\text{64B}}}$ is obtained by applying the operation ${\mathcal{X}_{\text{64B}}}^{(1)}\sqcup{\mathcal{X}_{\text{64B}}}^{(2)}$ , as follows:

{\mathcal{X}_{\text{{64B}}}}\sqsupseteq\{\langle 7,\left\{{s_{5}\mapsto\text{{954}}}\right\}\rangle\mapsto\langle 7,\left\{{s_{5}\mapsto\text{{954}}}\right\}\rangle,\langle 3,\left\{{s_{1}\mapsto\text{{142}}}\right\}\rangle\mapsto\langle 3,\left\{{s_{1}\mapsto\text{{142}}}\right\}\rangle\}

Note that the application of the transfer function $\tau$ for all instructions of block 64B applies function $\lambda$ to all elements in the abstract state and updates the stack state accordingly:

\begin{array}{rrcll}
(\texttt{JUMPDEST})&{\mathcal{X}_{\text{64B}}}&\sqsupseteq&\{\langle 7,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 7,\{s_{5}\mapsto\text{954}\}\rangle,&\langle 3,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 3,\{s_{1}\mapsto\text{142}\}\rangle\}\\
(\texttt{PUSH1 00})&{\mathcal{X}_{\text{64C}}}&\sqsupseteq&\{\langle 7,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 8,\{s_{5}\mapsto\text{954}\}\rangle,&\langle 3,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 4,\{s_{1}\mapsto\text{142}\}\rangle\}\\
(\texttt{DUP1})&{\mathcal{X}_{\text{64E}}}&\sqsupseteq&\{\langle 7,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 9,\{s_{5}\mapsto\text{954}\}\rangle,&\langle 3,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 5,\{s_{1}\mapsto\text{142}\}\rangle\}\\
(\texttt{PUSH1 00})&{\mathcal{X}_{\text{64F}}}&\sqsupseteq&\{\langle 7,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 10,\{s_{5}\mapsto\text{954}\}\rangle,&\langle 3,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 6,\{s_{1}\mapsto\text{142}\}\rangle\}\\
(\texttt{SWAP1})&{\mathcal{X}_{\text{651}}}&\sqsupseteq&\{\langle 7,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 10,\{s_{5}\mapsto\text{954}\}\rangle,&\langle 3,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 6,\{s_{1}\mapsto\text{142}\}\rangle\}\\
(\texttt{POP})&{\mathcal{X}_{\text{652}}}&\sqsupseteq&\{\langle 7,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 9,\{s_{5}\mapsto\text{954}\}\rangle,&\langle 3,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 5,\{s_{1}\mapsto\text{142}\}\rangle\}\\
\end{array}

$\blacksquare$
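The combination of the two equations for ${\mathcal{X}_{\text{64B}}}$ in Example 7 can be sketched in Python. The sketch assumes that $\sqcup$ on abstract states is the pointwise union of the sets stored under each stack state, and that $\sqsubseteq$ is the corresponding pointwise inclusion; the order is not restated in this section, so this reading is an assumption. The names `idmap`, `join` and `leq` and the tuple encoding of stack states are hypothetical, and each position is mapped to a single address as in the examples.

```python
from typing import Dict, FrozenSet, Set, Tuple

StackState = Tuple[int, FrozenSet[Tuple[int, int]]]   # <n, sigma>, sigma as frozen (index, address) pairs
AbstractState = Dict[StackState, Set[StackState]]


def idmap(s: StackState) -> AbstractState:
    """Abstract state whose only entry maps the stack state s to {s}."""
    return {s: {s}}


def join(p1: AbstractState, p2: AbstractState) -> AbstractState:
    """Assumed least upper bound: union of the sets stored under each key."""
    result = {s: set(targets) for s, targets in p1.items()}
    for s, targets in p2.items():
        result.setdefault(s, set()).update(targets)
    return result


def leq(p1: AbstractState, p2: AbstractState) -> bool:
    """Assumed order: every entry of p1 is contained in the matching entry of p2."""
    return all(s in p2 and targets <= p2[s] for s, targets in p1.items())


from_941 = (7, frozenset({(5, 0x954)}))   # stack state reaching 64B from block 941
from_123 = (3, frozenset({(1, 0x142)}))   # stack state reaching 64B from block 123
x_64b = join(idmap(from_941), idmap(from_123))
```

The resulting `x_64b` has two entries, one per incoming stack state, which is what later allows block 64B to be cloned once for each of them.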

Solving the addresses equation system of a program $P$ can be done iteratively. A naïve algorithm first initializes the constraint variables, setting ${\mathcal{X}}_{0}\sqsupseteq\pi_{\emptyset}[\langle 0,\sigma_{\emptyset}\rangle\mapsto\{\langle 0,\sigma_{\emptyset}\rangle\}]$ , where $\pi_{\emptyset}$ and $\sigma_{\emptyset}$ are empty mappings, and ${\mathcal{X}}_{pc}\sqsupseteq\pi_{\bot}$ for all $pc\in P,pc\neq 0$ , and then iteratively refines the values of these variables as follows:

1. substitute the current values of the constraint variables in the right-hand side of each constraint, and then evaluate the right-hand side if needed;

2. if each constraint ${\mathcal{X}}\sqsupseteq E$ holds, where $E$ is the value of the evaluation of the right-hand side of the previous step, then the process finishes; otherwise

3. for each ${\mathcal{X}}\sqsupseteq E$ which does not hold, let $E^{\prime}$ be the current value of ${\mathcal{X}}$ . Then update the current value of ${\mathcal{X}}$ to $E\sqcup E^{\prime}$ . Once all these updates are (iteratively) applied we repeat the process at step 1.
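The three steps above can be sketched as a generic Python loop. This is an illustrative sketch rather than the authors' solver: the names `solve`, `join`, `leq` and `shift`, the representation of a constraint as a pair (variable, right-hand-side function), and the toy two-variable system at the end are all hypothetical, and the order on abstract states is assumed to be pointwise set inclusion.

```python
from typing import Callable, Dict, Hashable, List, Set, Tuple

Value = Dict[Hashable, Set[Hashable]]             # abstract state: key -> set of elements
Env = Dict[str, Value]                            # current value of every constraint variable
Constraint = Tuple[str, Callable[[Env], Value]]   # X ⊒ rhs(env)


def leq(a: Value, b: Value) -> bool:
    """Assumed order: pointwise set inclusion."""
    return all(k in b and v <= b[k] for k, v in a.items())


def join(a: Value, b: Value) -> Value:
    """Assumed least upper bound: pointwise set union."""
    result = {k: set(v) for k, v in a.items()}
    for k, v in b.items():
        result.setdefault(k, set()).update(v)
    return result


def solve(variables: List[str], constraints: List[Constraint], initial: Env) -> Env:
    env: Env = {x: {} for x in variables}          # every variable starts at bottom
    env.update(initial)
    changed = True
    while changed:
        changed = False
        # Step 1: evaluate every right-hand side with the current values.
        evaluated = [(x, rhs(env)) for x, rhs in constraints]
        for x, e in evaluated:
            # Steps 2 and 3: if X ⊒ E fails, replace the value of X by E ⊔ E'.
            if not leq(e, env[x]):
                env[x] = join(e, env[x])
                changed = True
    return env


def shift(value: Value) -> Value:
    """Toy transfer function: increment every element, saturating at 3."""
    return {k: {min(t + 1, 3) for t in v} for k, v in value.items()}


# Toy cyclic system: Xa ⊒ {0 -> {0}} (initial value), Xb ⊒ shift(Xa), Xa ⊒ Xb.
solution = solve(
    ["a", "b"],
    [("b", lambda env: shift(env["a"])), ("a", lambda env: env["b"])],
    {"a": {0: {0}}},
)
```

The toy system contains a cycle, yet the loop stops because `shift` saturates, so the values can only grow finitely many times. This mirrors the role that the bounded stack size plays in the actual analysis.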

Termination is guaranteed because the abstract domain has no infinite ascending chains: both the number of jump destinations and the stack size are finite. This holds for the programs that satisfy the constraints stated in Section 1.

\begin{array}{rclll}
\text{(A)}\ {\mathcal{X}_{\text{64B}}}&\sqsupseteq&\{\langle 7,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 7,\{s_{5}\mapsto\text{954}\}\rangle,&\langle 3,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 3,\{s_{1}\mapsto\text{142}\}\rangle&\}\\
&\vdots\\
{\mathcal{X}_{\text{652}}}&\sqsupseteq&\{\langle 7,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 9,\{s_{5}\mapsto\text{954}\}\rangle,&\langle 3,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 5,\{s_{1}\mapsto\text{142}\}\rangle&\}\\
\text{(A)}\ {\mathcal{X}_{\text{653}}}&\sqsupseteq&\{\langle 9,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 9,\{s_{5}\mapsto\text{954}\}\rangle,&\langle 5,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 5,\{s_{1}\mapsto\text{142}\}\rangle&\}\\
&\vdots\\
{\mathcal{X}_{\text{660}}}&\sqsupseteq&\{\langle 9,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 11,\{s_{5}\mapsto\text{954},s_{10}\mapsto\text{6D0}\}\rangle,&\langle 5,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 7,\{s_{1}\mapsto\text{142},s_{6}\mapsto\text{6D0}\}\rangle&\}\\
\text{(A)}\ {\mathcal{X}_{\text{661}}}&\sqsupseteq&\{\langle 9,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 9,\{s_{5}\mapsto\text{954}\}\rangle,&\langle 5,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 5,\{s_{1}\mapsto\text{142}\}\rangle&\}\\
{\mathcal{X}_{\text{6D0}}}&\sqsupseteq&\{\langle 9,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 9,\{s_{5}\mapsto\text{954}\}\rangle,&\langle 5,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 5,\{s_{1}\mapsto\text{142}\}\rangle&\}\\
&\vdots\\
{\mathcal{X}_{\text{66D}}}&\sqsupseteq&\{\langle 9,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 13,\{s_{5}\mapsto\text{954},s_{12}\mapsto\text{66F}\}\rangle,&\langle 5,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 9,\{s_{1}\mapsto\text{142},s_{8}\mapsto\text{66F}\}\rangle&\}\\
{\mathcal{X}_{\text{66E}}}&\sqsupseteq&\{\langle 11,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 11,\{s_{5}\mapsto\text{954}\}\rangle,&\langle 7,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 7,\{s_{1}\mapsto\text{142}\}\rangle&\}\\
\text{(A)}\ {\mathcal{X}_{\text{66F}}}&\sqsupseteq&\{\langle 11,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 11,\{s_{5}\mapsto\text{954}\}\rangle,&\langle 7,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 7,\{s_{1}\mapsto\text{142}\}\rangle&\}\\
&\vdots\\
{\mathcal{X}_{\text{682}}}&\sqsupseteq&\{\langle 11,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 11,\{s_{5}\mapsto\text{954},s_{10}\mapsto\text{6C3}\}\rangle,&\langle 7,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 7,\{s_{1}\mapsto\text{142},s_{6}\mapsto\text{6C3}\}\rangle&\}\\
\text{(A)}\ {\mathcal{X}_{\text{6C3}}}&\sqsupseteq&\{\langle 9,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 9,\{s_{5}\mapsto\text{954}\}\rangle,&\langle 5,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 5,\{s_{1}\mapsto\text{142}\}\rangle&\}\\
&\vdots\\
{\mathcal{X}_{\text{6CF}}}&\sqsupseteq&\{\langle 9,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 10,\{s_{5}\mapsto\text{954},s_{9}\mapsto\text{653}\}\rangle,&\langle 5,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 6,\{s_{1}\mapsto\text{142},s_{5}\mapsto\text{653}\}\rangle&\}\\[5.69046pt]
\text{(A)}\ {\mathcal{X}_{\text{683}}}&\sqsupseteq&\{\langle 9,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 9,\{s_{5}\mapsto\text{954}\}\rangle,&\langle 5,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 5,\{s_{1}\mapsto\text{142}\}\rangle&\}\\
&\vdots\\
{\mathcal{X}_{\text{68F}}}&\sqsupseteq&\{\langle 9,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 13,\{s_{5}\mapsto\text{954},s_{12}\mapsto\text{691}\}\rangle,&\langle 5,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 9,\{s_{1}\mapsto\text{142},s_{8}\mapsto\text{691}\}\rangle&\}\\
\text{(A)}\ {\mathcal{X}_{\text{690}}}&\sqsupseteq&\{\langle 11,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 11,\{s_{5}\mapsto\text{954}\}\rangle,&\langle 7,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 7,\{s_{1}\mapsto\text{142}\}\rangle&\}\\
\text{(A)}\ {\mathcal{X}_{\text{691}}}&\sqsupseteq&\{\langle 11,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 11,\{s_{5}\mapsto\text{954}\}\rangle,&\langle 7,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 7,\{s_{1}\mapsto\text{142}\}\rangle&\}\\
&\vdots\\
{\mathcal{X}_{\text{6C2}}}&\sqsupseteq&\{\langle 11,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 10,\{s_{5}\mapsto\text{954},s_{9}\mapsto\text{6D1}\}\rangle,&\langle 7,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 6,\{s_{1}\mapsto\text{142},s_{5}\mapsto\text{6D1}\}\rangle&\}\\
\text{(A)}\ {\mathcal{X}_{\text{6D1}}}&\sqsupseteq&\{\langle 9,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 9,\{s_{5}\mapsto\text{954}\}\rangle,&\langle 5,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 5,\{s_{1}\mapsto\text{142}\}\rangle&\}\\
{\mathcal{X}_{\text{6D2}}}&\sqsupseteq&\{\langle 9,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 8,\{s_{5}\mapsto\text{954}\}\rangle,&\langle 5,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 4,\{s_{1}\mapsto\text{142}\}\rangle&\}\\
{\mathcal{X}_{\text{6D3}}}&\sqsupseteq&\{\langle 9,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 8,\{s_{7}\mapsto\text{954}\}\rangle,&\langle 5,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 4,\{s_{3}\mapsto\text{142}\}\rangle&\}\\
{\mathcal{X}_{\text{6D4}}}&\sqsupseteq&\{\langle 9,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 8,\{s_{6}\mapsto\text{954}\}\rangle,&\langle 5,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 4,\{s_{2}\mapsto\text{142}\}\rangle&\}\\
{\mathcal{X}_{\text{6D5}}}&\sqsupseteq&\{\langle 9,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 7,\{s_{6}\mapsto\text{954}\}\rangle,&\langle 5,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 3,\{s_{2}\mapsto\text{142}\}\rangle&\}\\
\text{(B)}\ {\mathcal{X}_{\text{954}}}&\sqsupseteq&\{\langle 6,\{\}\rangle\mapsto\langle 6,\{\}\rangle&&\}\\
&\vdots\\
\text{(B)}\ {\mathcal{X}_{\text{142}}}&\sqsupseteq&\{\langle 2,\{\}\rangle\mapsto\langle 2,\{\}\rangle&&\}\\
&\vdots
\end{array}

Figure 6: Jumps equations system of __callback function

Example 8

Figure 6 shows the equations produced by Definition 4 for the first and the last instruction of every block shown in Figure 3. The first equation shown in the system is that of ${\mathcal{X}_{\text{{64B}}}}$ , computed in Example 7. Observe that the application of $\tau$ stores the jump addresses in the corresponding abstract states after PUSH instructions (see ${\mathcal{X}_{\text{{660}}}}$ , ${\mathcal{X}_{\text{{66D}}}}$ , ${\mathcal{X}_{\text{{6CF}}}}$ , ${\mathcal{X}_{\text{{68C}}}}$ , …). These addresses are later used to produce the equations at the JUMP or JUMPI instructions. A JUMP is unconditional, so it produces only one equation, e.g. ${\mathcal{X}_{\text{{66E}}}}$ consumes address 66F to produce the input state of ${\mathcal{X}_{\text{{66F}}}}$ , or ${\mathcal{X}_{\text{{6C2}}}}$ produces the input abstract state for ${\mathcal{X}_{\text{{6D1}}}}$ . JUMPI instructions produce two different equations: (1) one equation that corresponds to the jump address stored in the stack, e.g. equations ${\mathcal{X}_{\text{{6D0}}}}$ and ${\mathcal{X}_{\text{{66F}}}}$ produced by the jumps of the equations ${\mathcal{X}_{\text{{660}}}}$ and ${\mathcal{X}_{\text{{66D}}}}$ , respectively; and (2) one equation that corresponds to the next instruction, e.g. ${\mathcal{X}_{\text{{661}}}}$ and ${\mathcal{X}_{\text{{66E}}}}$ produced by ${\mathcal{X}_{\text{{660}}}}$ and ${\mathcal{X}_{\text{{66D}}}}$ , respectively. Finally, note equation ${\mathcal{X}_{\text{{6D5}}}}$ : two possible jump addresses are stored in the stack and either can be used by the JUMP at the end of the block, so we produce one input for each of them, ${\mathcal{X}_{\text{{954}}}}$ and ${\mathcal{X}_{\text{{142}}}}$ , capturing the two possible branches from block 6D1 (see Figure 3). $\blacksquare$
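The way JUMP and JUMPI consume a stored address can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `push`, `pop`, `jump_successors` and `jumpi_successors` are hypothetical, an abstract stack $\langle n,\sigma\rangle$ is modelled as a pair `(n, sigma)` in which `sigma` maps a position `i` (standing for $s_{i}$) to a single address, and `push` records every pushed value, whereas the analysis only tracks jump addresses. The two calls in the test reproduce the states of ${\mathcal{X}_{\text{{66D}}}}$ and ${\mathcal{X}_{\text{{6D5}}}}$ from Figure 6.

```python
from typing import Dict, Tuple

# Abstract stack <n, sigma>: n is the stack size, sigma maps a stack
# position i (standing for s_i) to the jump address stored there.
AbstractStack = Tuple[int, Dict[int, int]]


def push(state: AbstractStack, value: int) -> AbstractStack:
    """PUSH: store `value` at the new top position s_n (simplification:
    every pushed value is recorded, not only jump addresses)."""
    n, sigma = state
    return n + 1, {**sigma, n: value}


def pop(state: AbstractStack, count: int) -> AbstractStack:
    """Drop the `count` topmost positions of the abstract stack."""
    n, sigma = state
    kept = {i: addr for i, addr in sigma.items() if i < n - count}
    return n - count, kept


def jump_successors(pc: int, state: AbstractStack) -> Dict[int, AbstractStack]:
    """JUMP: a single successor, the address stored at s_{n-1}."""
    n, sigma = state
    return {sigma[n - 1]: pop(state, 1)}


def jumpi_successors(pc: int, state: AbstractStack) -> Dict[int, AbstractStack]:
    """JUMPI: the address stored at s_{n-1} and the fall-through pc + 1."""
    n, sigma = state
    after = pop(state, 2)
    return {sigma[n - 1]: after, pc + 1: after}
```

Each returned dictionary maps a destination program counter to the abstract stack that flows into it, which corresponds to one generated equation per key.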

Theorem 2.1 (Soundness of the addresses equation system)

Let $P\equiv b_{0},\dots,b_{p}$ be a program, ${\mathcal{X}}_{1},\dots,{\mathcal{X}}_{n}$ the solution of the jumps equations system of $P$ , and $pc$ the program counter of a jump instruction. Then for any execution trace $t$ of $P$ , there exists $s\in dom({\mathcal{X}}_{pc})$ such that $\langle n,\sigma\rangle\in{\mathcal{X}}_{pc}(s)$ and $\sigma(s_{n-1})$ contains all jump addresses that instruction $b_{pc}$ jumps to in $t$ .

We prove this theorem in the following steps:

1.

We first define an EVM collecting semantics for the operational semantics of Figure 4. Such collecting semantics gathers all transitions that can be produced by the execution of a program $P$ .
2.

We continue by defining the jumps-to property as a property of this collecting semantics.
3.

Then we prove Lemma 1 below that states that the least solution of the addresses equation system generated from the EVM program as described in Definition 4 is a safe approximation of the EVM collecting semantics w.r.t. the jumps-to property.
4.

Finally, Theorem 2.1 trivially follows from Lemma 1.

Definition 5 (EVM collecting semantics)

Given an EVM program $P$ , the EVM collecting semantics operator $\mathcal{C}_{P}$ is defined as follows:

\mathcal{C}_{P}(X)=\left\{{\langle S,S^{\prime}\rangle\mid\langle\_,S\rangle\in X\wedge S\Rightarrow S^{\prime}}\right\}

The EVM semantics is defined as $\xi_{P}=\bigcup_{n>0}\mathcal{C}_{P}^{n}(X_{0})$ , where $X_{0}\equiv\left\{{\langle 0,\langle 0,\sigma_{\emptyset}\rangle\rangle}\right\}$ is the initial configuration.
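As an illustration of Definition 5, the following Python sketch accumulates the union of $\mathcal{C}_{P}^{n}(X_{0})$ for a toy program. Everything in it is an assumption made for the example: the four-instruction `PROGRAM`, the `step` function (a stand-in for the rules of Figure 4), and the representation of a state as `(pc, stack)` with the stack as a tuple, so that $n$ is its length. The loop terminates only because the toy program has finitely many reachable states.

```python
# Hypothetical toy program: pc -> instruction. PUSH occupies two bytes.
PROGRAM = {0: ("PUSH", 3), 2: ("JUMP",), 3: ("JUMPDEST",), 4: ("STOP",)}


def step(state):
    """Successor states of `state` under a toy transition relation S => S'."""
    pc, stack = state
    op = PROGRAM[pc]
    if op[0] == "PUSH":
        return [(pc + 2, stack + (op[1],))]
    if op[0] == "JUMP":
        return [(stack[-1], stack[:-1])]  # jump to the address on top
    if op[0] == "JUMPDEST":
        return [(pc + 1, stack)]
    return []  # STOP has no successor


def collecting_semantics(initial, step):
    """All transitions <S, S'> reachable from `initial`."""
    transitions = set()
    seen = {initial}
    frontier = {initial}
    while frontier:
        next_frontier = set()
        for s in frontier:  # one application of the operator C_P
            for s2 in step(s):
                transitions.add((s, s2))
                if s2 not in seen:
                    seen.add(s2)
                    next_frontier.add(s2)
        frontier = next_frontier
    return transitions


xi = collecting_semantics((0, ()), step)
```

For this toy program `xi` contains three transitions, one for each executed instruction other than STOP.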

Definition 6 (jumps-to property)

Let $P$ be an EVM program, $\xi_{P}=\bigcup_{n>0}\mathcal{C}_{P}^{n}(X_{0})$ , and $b$ an instruction at program point $pc$ . We say that $\xi_{P}\vDash_{pc}T$ if $T=\left\{{\langle n,\sigma\rangle\mid\langle S,S^{\prime}\rangle\in\xi_{P}\wedge\langle pc,\langle n,\sigma\rangle\rangle\in S^{\prime}}\right\}$ .

The following lemma states that the least solution of the constraint equation system defined in Definition 4 is a safe approximation of $\xi_{P}$ :

Lemma 1

Let $P\equiv b_{0},\dots,b_{p}$ be a program, $pc$ a program point and ${\mathcal{X}}_{0},\dots,{\mathcal{X}}_{p}$ the least solution of the constraints equation system of Definition 4. The following holds:

If $\xi_{P}\vDash_{pc}T$ , then for all $\langle n,\sigma\rangle\in T$ there exists $s\in dom({\mathcal{X}}_{pc})$ such that $\langle n,\sigma\rangle\in{\mathcal{X}}_{pc}(s)$ .

Proof

We use ${\mathcal{X}}_{pc}^{m}$ to refer to the value obtained for ${\mathcal{X}}_{pc}$ after $m$ iterations of the algorithm for solving the equation system depicted in Section 2. We say that ${\mathcal{X}}_{pc}$ covers $\langle n,\sigma\rangle$ in $\mathcal{C}_{P}^{m}(X_{0})$ at program point $pc$ when this lemma holds for the result of computing $\mathcal{C}_{P}^{m}(X_{0})$ . In order to prove this lemma, we can reason by induction on the value of $m$ , the length of the traces $S_{0}\Rightarrow^{m}S_{m}$ considered in $\mathcal{C}_{P}^{m}(X_{0})$ .

Base case: if $m=0$ , $S_{0}=\langle 0,\langle 0,\sigma_{\emptyset}\rangle\rangle$ and the Lemma trivially holds as $\langle 0,\sigma_{\emptyset}\rangle\in{\mathcal{X}}_{0}^{0}(\langle 0,\sigma_{\emptyset}\rangle)$ .

Induction Hypothesis: we assume Lemma 1 holds for all traces of length $m\geq 0$ .

Inductive Case: Let us consider traces of length $m+1$ , which are of the form $S_{0}\Rightarrow^{m}S_{m}\Rightarrow S_{m+1}$ , where $S_{m}$ is a program state of the form $S_{m}=\langle pc,\langle n,\sigma\rangle\rangle$ . We can apply the induction hypothesis to $S_{m}$ : there exists some $s\in dom({\mathcal{X}}_{pc}^{m})$ such that $\langle n,\sigma\rangle\in{\mathcal{X}}_{pc}^{m}(s)$ . To extend the lemma to $S_{m+1}$ , we reason on all the rules of the simplified EVM semantics (Fig. 4) that may be applied from $S_{m}$ to $S_{m+1}$ :

•

Rule (1): After executing a JUMP instruction $S_{m+1}$ is of the form $\langle\sigma(s_{n-1}),\langle n-1,\sigma\backslash[s_{n-1}]\rangle\rangle$ . In iteration $m+1$ , the following set of equations corresponding to $b_{pc}$ is evaluated:

\begin{array}[]{rcll}{\mathcal{X}}_{\sigma(s_{n-1})}&\sqsupseteq&\textit{idmap}(\lambda(b_{pc},\langle n^{\prime},\sigma^{\prime}\rangle))&\forall s^{\prime}\in dom({\mathcal{X}}_{pc}),\langle n^{\prime},\sigma^{\prime}\rangle\in{\mathcal{X}}_{pc}(s^{\prime})\end{array}

where $\textit{idmap}(\lambda(b_{pc},\langle n^{\prime},\sigma^{\prime}\rangle))=\pi_{\bot}[\langle n^{\prime}-1,\sigma^{\prime}\backslash[s_{n-1}]\rangle\mapsto\left\{{\langle n^{\prime}-1,\sigma^{\prime}\backslash[s_{n-1}]\rangle}\right\}]$ (Case (4) in Fig. 5). The induction hypothesis guarantees that there exists some $s^{\prime\prime}\in dom({\mathcal{X}}_{pc}^{m})$ such that $\langle n,\sigma\rangle\in{\mathcal{X}}_{pc}^{m}(s^{\prime\prime})$ , where $S_{m}=\langle pc,\langle n,\sigma\rangle\rangle$ . Therefore, at Iteration $m+1$ , the following must hold:

{\mathcal{X}}_{\sigma(s_{n-1})}^{m+1}\sqsupseteq\pi_{\bot}[\langle n-1,\sigma\backslash[s_{n-1}]\rangle\mapsto\left\{{\langle n-1,\sigma\backslash[s_{n-1}]\rangle}\right\}]

so $\langle n-1,\sigma\backslash[s_{n-1}]\rangle\in{\mathcal{X}}_{\sigma(s_{n-1})}^{m+1}(\langle n-1,\sigma\backslash[s_{n-1}]\rangle)$ and thus Lemma 1 holds.

•

Rules (2) and (3): After executing a JUMPI instruction, $S_{m+1}$ is either $\langle\sigma(s_{n-1}),\langle n-2,\sigma\backslash[s_{n-1},s_{n-2}]\rangle\rangle$ or $\langle pc+size(b_{pc}),\langle n-2,\sigma\backslash[s_{n-1},s_{n-2}]\rangle\rangle$ , respectively. In either case the following sets of equations are evaluated:

\begin{array}[]{rcll}{\mathcal{X}}_{\sigma(s_{n-1})}&\sqsupseteq&\textit{idmap}(\lambda(\texttt{JUMPI},\langle n^{\prime},\sigma^{\prime}\rangle))&\forall s^{\prime}\in dom({\mathcal{X}}_{pc}),\langle n^{\prime},\sigma^{\prime}\rangle\in{\mathcal{X}}_{pc}(s^{\prime})\\ {\mathcal{X}}_{pc+1}&\sqsupseteq&\textit{idmap}(\lambda(\texttt{JUMPI},\langle n^{\prime},\sigma^{\prime}\rangle))&\forall s^{\prime}\in dom({\mathcal{X}}_{pc}),\langle n^{\prime},\sigma^{\prime}\rangle\in{\mathcal{X}}_{pc}(s^{\prime})\end{array}

where
$\textit{idmap}(\lambda(b_{pc},\langle n^{\prime},\sigma^{\prime}\rangle))=\pi_{\bot}[\langle n^{\prime}-2,\sigma^{\prime}\backslash[s_{n-1},s_{n-2}]\rangle\mapsto\left\{{\langle n^{\prime}-2,\sigma^{\prime}\backslash[s_{n-1},s_{n-2}]\rangle}\right\}]$ (Case (4) of the definition of the update function $\lambda$ in Fig. 5). As in the previous case, the induction hypothesis guarantees that at Iteration $m$ there exists $s^{\prime\prime}\in dom({\mathcal{X}}_{pc}^{m})$ such that $\langle n,\sigma\rangle\in{\mathcal{X}}_{pc}^{m}(s^{\prime\prime})$ . Therefore, in Iteration $m+1$ , the following must hold:

\begin{array}[]{rcl}{\mathcal{X}}_{\sigma(s_{n-1})}^{m+1}&\sqsupseteq&\pi_{\bot}[\langle n-2,\sigma\backslash[s_{n-1},s_{n-2}]\rangle\mapsto\left\{{\langle n-2,\sigma\backslash[s_{n-1},s_{n-2}]\rangle}\right\}]\\ {\mathcal{X}}_{pc+1}^{m+1}&\sqsupseteq&\pi_{\bot}[\langle n-2,\sigma\backslash[s_{n-1},s_{n-2}]\rangle\mapsto\left\{{\langle n-2,\sigma\backslash[s_{n-1},s_{n-2}]\rangle}\right\}]\end{array}

and thus Lemma 1 holds for these cases as well.

•

Rules (4) - (12): We will first consider the case in which any of these rules corresponds to an EVM instruction followed by an instruction different from JUMPDEST. All rules are similar, as they all use the set of equations generated by Case (4) in Definition 4. We will see Rule (4) in detail.

After executing a $\texttt{PUSH}x\ v$ instruction, $S_{m+1}$ is $\langle pc+size(b_{pc}),\langle n+1,\sigma[s_{n}\mapsto\left\{{v}\right\}]\rangle\rangle$ . We have to prove that there exists some $s\in dom({\mathcal{X}}_{pc+size(b_{pc})})$ such that $\langle n+1,\sigma[s_{n}\mapsto\left\{{v}\right\}]\rangle\in{\mathcal{X}}_{pc+size(b_{pc})}(s)$ . The following set of equations is evaluated:

\begin{array}[]{rcll}{\mathcal{X}}_{pc+size(b_{pc})}&\sqsupseteq&\tau(\texttt{PUSH}x,{\mathcal{X}}_{pc})\end{array}

(1)

By Definition 3, $\tau(\texttt{PUSH}x,{\mathcal{X}}_{pc})=\pi^{\prime}$ , where $\forall s^{\prime}\in dom({\mathcal{X}}_{pc}),$ $\pi^{\prime}(s^{\prime})=\lambda(\texttt{PUSH}x,{\mathcal{X}}_{pc}(s^{\prime}))$ . By Case (1) of the definition of the update function $\lambda$ , we have that:

\forall\langle n^{\prime\prime},\sigma^{\prime\prime}\rangle\in dom({\mathcal{X}}_{pc}),\pi^{\prime}(\langle n^{\prime\prime},\sigma^{\prime\prime}\rangle)=\langle n^{\prime\prime}+1,\sigma^{\prime\prime}[s_{n}\mapsto\left\{{v}\right\}]\rangle

(2)

By the induction hypothesis, at Iteration $m$ there exists some $s\in dom({\mathcal{X}}^{m}_{pc})$ such that $\langle n,\sigma\rangle\in{\mathcal{X}}^{m}_{pc}(s)$ . Therefore, by (1) and (2), at Iteration $m+1$ the following holds:

s\in dom({\mathcal{X}}^{m+1}_{pc+size(b_{pc})})\ \text{and}\ \langle n+1,\sigma[s_{n}\mapsto\left\{{v}\right\}]\rangle\in{\mathcal{X}}^{m+1}_{pc+size(b_{pc})}(s)

and thus Lemma 1 holds for Rule (4).

•

Rules (4) - (12), followed by a JUMPDEST instruction. After executing any of these instructions, $S_{m+1}$ is $\langle pc+size(b_{pc}),\langle n^{\prime\prime\prime},\sigma^{\prime\prime\prime}\rangle\rangle$ , where $\langle n^{\prime\prime\prime},\sigma^{\prime\prime\prime}\rangle$ is obtained according to the corresponding rule of Figure 4. We have to prove that there exists some $s\in dom({\mathcal{X}}_{pc+size(b_{pc})})$ such that $\langle n^{\prime\prime\prime},\sigma^{\prime\prime\prime}\rangle\in{\mathcal{X}}_{pc+size(b_{pc})}(s)$ . The following set of equations is evaluated:

\begin{array}[]{rcll}{\mathcal{X}}_{pc+size(b_{pc})}&\sqsupseteq&\textit{idmap}(\lambda(b_{pc},\langle n^{\prime},\sigma^{\prime}\rangle))&\forall s^{\prime}\in dom({\mathcal{X}}_{pc}),\langle n^{\prime},\sigma^{\prime}\rangle\in{\mathcal{X}}_{pc}(s^{\prime})\end{array}

(3)

where $\textit{idmap}(\lambda(b_{pc},\langle n^{\prime},\sigma^{\prime}\rangle))=\pi_{\bot}[\langle n^{\prime\prime},\sigma^{\prime\prime}\rangle\mapsto\left\{{\langle n^{\prime\prime},\sigma^{\prime\prime}\rangle}\right\}]$ , and $n^{\prime\prime}$ and $\sigma^{\prime\prime}$ are obtained according to the cases of the update function detailed in Figure 5. We can see that $\langle n^{\prime\prime},\sigma^{\prime\prime}\rangle$ matches the modification made to the state $S_{m+1}$ by the corresponding rule of the semantics. Therefore, at Iteration $m+1$ there exists an $s=\langle n^{\prime\prime},\sigma^{\prime\prime}\rangle$ such that $\langle n^{\prime\prime},\sigma^{\prime\prime}\rangle\in{\mathcal{X}}_{pc+size(b_{pc})}^{m+1}(s)$ , and Lemma 1 also holds.

Since ${\mathcal{X}}_{pc}^{m+1}\sqsupseteq{\mathcal{X}}_{pc}^{m}$ holds for every $pc$ and every iteration of the algorithm for solving the equation system of Section 2, any state covered at some iteration remains covered in all later ones, and Lemma 1 holds when the algorithm stops. $\square$

3 Stack-Sensitive Control Flow Graph

Using the solution of the addresses equation system, we now compute the control flow graph of the program. In order to simplify the notation, given a block $B_{i}$ , we define the function $getId(i,\langle n,\sigma\rangle)$ , which receives the block identifier $i$ and an abstract stack $\langle n,\sigma\rangle\in dom({\mathcal{X}}_{i})$ and returns a unique identifier for that abstract stack. Similarly, $getStack(i,id)$ returns the abstract stack $\langle n,\sigma\rangle$ that corresponds to the identifier $id$ of block $B_{i}$ . In addition, we define the function $getSize(pc,id)$ that, given a program point $pc\in B_{i}$ and a unique identifier $id$ for $B_{i}$ , returns the value $n^{\prime}$ s.t. $\langle n,\sigma\rangle=getStack(i,id)$ and ${\mathcal{X}}_{pc}(\langle n,\sigma\rangle)=\langle n^{\prime},\sigma^{\prime}\rangle$ .

Example 9

Given the equation:

{\mathcal{X}_{\text{{64B}}}}\sqsupseteq\{\underbrace{\langle 7,\left\{{s_{5}\mapsto\text{{954}}}\right\}\rangle}_{1}\mapsto\langle 7,\left\{{s_{5}\mapsto\text{{954}}}\right\}\rangle,\underbrace{\langle 3,\left\{{s_{1}\mapsto\text{{142}}}\right\}\rangle}_{2}\mapsto\langle 3,\left\{{s_{1}\mapsto\text{{142}}}\right\}\rangle\},

if we compute the functions $getId$ and $getSize$ , we have that $getId(\text{{64B}},\langle 7,\left\{{s_{5}\mapsto\text{{954}}}\right\}\rangle)=1$ and $getId(\text{{64B}},\langle 3,\left\{{s_{1}\mapsto\text{{142}}}\right\}\rangle)=2$ . Analogously, $getSize(\text{{64B}},1)=7$ and $getSize(\text{{64B}},2)=3$ . $\blacksquare$
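The three helper functions can be sketched in Python as follows. This is only one possible realisation under stated assumptions: identifiers are assigned in the order in which the input stacks appear in $dom({\mathcal{X}}_{i})$ , starting at 1 (the text only requires them to be unique); `get_size` takes the block identifier `i` explicitly instead of deriving it from `pc`; and the helper `stack` and the dictionary `X` are hypothetical names, with `X` holding just the equation of Example 9.

```python
def stack(n, sigma):
    """Hashable encoding of an abstract stack <n, sigma>."""
    return (n, tuple(sorted(sigma.items())))


# X[pc] maps each input stack of the enclosing block to the stack at pc.
X = {
    0x64B: {
        stack(7, {5: 0x954}): stack(7, {5: 0x954}),
        stack(3, {1: 0x142}): stack(3, {1: 0x142}),
    },
}


def get_id(X, i, s):
    """Identifier of input stack `s` of block i (insertion order, from 1)."""
    return list(X[i]).index(s) + 1


def get_stack(X, i, ident):
    """Inverse of get_id: the input stack of block i with identifier ident."""
    return list(X[i])[ident - 1]


def get_size(X, i, pc, ident):
    """Stack size n' reached at pc when block i is entered with replica ident."""
    n_prime, _sigma_prime = X[pc][get_stack(X, i, ident)]
    return n_prime
```

With this encoding, `get_id(X, 0x64B, stack(7, {5: 0x954}))` yields 1 and `get_size(X, 0x64B, 0x64B, 2)` yields 3, in agreement with Example 9.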

Definition 7 (stack-sensitive control flow graph)

Given an EVM program $P$ , its blocks $B_{i}\equiv b_{i}\dots b_{j}\in blocks(P)$ and its flow analysis results provided by a set of variables of the form ${\mathcal{X}}_{pc}$ for all $pc\in P$ , we define the control flow graph of $P$ as a directed graph $\textit{CFG}=\langle V,E\rangle$ with a set of vertices

V=\{B_{i{:}id}\mid B_{i}\in blocks(P)\wedge\langle n,\sigma\rangle\in dom({\mathcal{X}}_{i})\wedge id=getId(i,\langle n,\sigma\rangle)\}

and a set of edges $E=E_{jump}\cup E_{next}$ such that:

\begin{array}[]{rcll}E_{jump}&=&\{B_{i{:}id}\to B_{d{:}id_{2}}\mid&b_{j}\in Jump\ \wedge\\ &&&\langle n,\sigma\rangle\in dom({\mathcal{X}}_{j})\wedge id=getId(i,\langle n,\sigma\rangle)\ \wedge\\ &&&\langle n^{\prime},\sigma^{\prime}\rangle\in{\mathcal{X}}_{j}(\langle n,\sigma\rangle)\ \wedge\ d=\sigma^{\prime}(s_{n^{\prime}-1})\ \wedge\\ &&&\langle n^{\prime\prime},\sigma^{\prime\prime}\rangle=\lambda(b_{j},\langle n^{\prime},\sigma^{\prime}\rangle)\wedge id_{2}=getId(d,\langle n^{\prime\prime},\sigma^{\prime\prime}\rangle)\ \}\\ E_{next}&=&\{B_{i{:}id}\to B_{d{:}id_{2}}\mid&b_{j}\neq\texttt{JUMP}\ \wedge b_{j}\not\in End\ \wedge\\ &&&\langle n,\sigma\rangle\in dom({\mathcal{X}}_{j})\wedge id=getId(i,\langle n,\sigma\rangle)\ \wedge\\ &&&\langle n^{\prime},\sigma^{\prime}\rangle\in{\mathcal{X}}_{j}(\langle n,\sigma\rangle)\wedge d=j+size(b_{j})\ \wedge\\ &&&\langle n^{\prime\prime},\sigma^{\prime\prime}\rangle=\lambda(b_{j},\langle n^{\prime},\sigma^{\prime}\rangle)\wedge id_{2}=getId(d,\langle n^{\prime\prime},\sigma^{\prime\prime}\rangle)\ \}\end{array}

The first relevant point of the control flow graph (CFG) we produce is that, to build the set of vertices $V$ , we replicate each block once for each different stack state with which it can be entered. Analogously, the different entry stack states are used to produce different edges, one for each corresponding replicated block. Note that the definition distinguishes between two kinds of edges: (1) edges produced by JUMP or JUMPI instructions at the end of a block, whose destination is taken from the values stored in the stack state of the instruction before the jump, with $d=\sigma^{\prime}(s_{n^{\prime}-1})$ ; and (2) edges produced by continuations to the next instruction, whose destination is computed as $d=j+size(b_{j})$ . For both kinds of edges, as the destination block may be replicated, we apply function $\lambda$ and take the identifier of the resulting state as the $id$ of the destination: $\langle n^{\prime\prime},\sigma^{\prime\prime}\rangle=\lambda(b_{j},\langle n^{\prime},\sigma^{\prime}\rangle)\wedge id_{2}=getId(d,\langle n^{\prime\prime},\sigma^{\prime\prime}\rangle)$ .
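The construction of Definition 7 can be sketched in Python under several simplifying assumptions, none of which comes from the text: `blocks` maps a block start to its last program counter, the opcode found there and that opcode's size; `X[pc]` maps each input stack to a single reached stack; `lam` covers only block-ending instructions that pop a jump address (JUMP, JUMPI) or leave the stack untouched; and blocks 954 and 142 are pretended to consist of a single STOP, since their real contents are not given here. The equations for 6D1, 6D5, 954 and 142 are the ones shown in Figure 6.

```python
END = {"STOP", "RETURN", "REVERT"}  # assumed set of terminating opcodes


def stack(n, sigma):
    """Hashable encoding of an abstract stack <n, sigma>."""
    return (n, tuple(sorted(sigma.items())))


def get_id(X, i, s):
    """Replica identifier of input stack `s` of block i (starting at 1)."""
    return list(X[i]).index(s) + 1


def lam(op, st):
    """Stack after the block-ending instruction (JUMP/JUMPI pops only)."""
    n, sigma = st
    pops = {"JUMP": 1, "JUMPI": 2}.get(op, 0)
    return (n - pops, tuple((p, a) for p, a in sigma if p < n - pops))


def build_cfg(blocks, X):
    V = {(i, get_id(X, i, s)) for i in blocks for s in X[i]}
    E = set()
    for i, (j, op, size) in blocks.items():
        for entry, reached in X[j].items():
            n, sigma = reached
            targets = []
            if op in ("JUMP", "JUMPI"):
                targets.append(dict(sigma)[n - 1])  # E_jump: address on top
            if op != "JUMP" and op not in END:
                targets.append(j + size)            # E_next: fall-through
            after = lam(op, reached)
            for d in targets:
                E.add(((i, get_id(X, i, entry)), (d, get_id(X, d, after))))
    return V, E


blocks = {0x6D1: (0x6D5, "JUMP", 1),
          0x954: (0x954, "STOP", 1),   # hypothetical single-STOP block
          0x142: (0x142, "STOP", 1)}   # hypothetical single-STOP block
X = {
    0x6D1: {stack(9, {5: 0x954}): stack(9, {5: 0x954}),
            stack(5, {1: 0x142}): stack(5, {1: 0x142})},
    0x6D5: {stack(9, {5: 0x954}): stack(7, {6: 0x954}),
            stack(5, {1: 0x142}): stack(3, {2: 0x142})},
    0x954: {stack(6, {}): stack(6, {})},
    0x142: {stack(2, {}): stack(2, {})},
}
V, E = build_cfg(blocks, X)
```

In this fragment block 6D1 is replicated twice, and each replica gets its own outgoing edge: replica 1 leads to block 954 and replica 2 to block 142.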

Example 10

Considering the blocks shown in Figure 3 and the equations shown in Figure 6, the CFG of the program includes non-replicated nodes for those blocks that only receive one possible stack state (white nodes in Figure 3). However, the nodes that can be reached with two different stack states (gray nodes in Figure 3) are replicated in the CFG:

\begin{array}[]{rcl}V&=\{B_{\text{{941}}},B_{\text{{123}}},B_{\text{{954}}},B_{\text{{142}}},&B_{\text{{64B{:}1}}},B_{\text{{653{:}1}}},B_{\text{{661{:}1}}},B_{\text{{66F{:}1}}},B_{\text{{6C3{:}1}}},B_{\text{{66E{:}1}}},B_{\text{{690{:}1}}},B_{\text{{683{:}1}}},B_{\text{{691{:}1}}},B_{\text{{6D0{:}1}}},B_{\text{{6D1{:}1}}},\\ &&B_{\text{{64B{:}2}}},B_{\text{{653{:}2}}},B_{\text{{661{:}2}}},B_{\text{{66F{:}2}}},B_{\text{{6C3{:}2}}},B_{\text{{66E{:}2}}},B_{\text{{690{:}2}}},B_{\text{{683{:}2}}},B_{\text{{691{:}2}}},B_{\text{{6D0{:}2}}},B_{\text{{6D1{:}2}}}\}\end{array}

Analogously, our CFG replicates the edges according to the nodes replicated (solid and dashed edges in Figure 3):

\begin{array}[]{l}\{\ B_{\text{{941}}}\rightarrow B_{\text{{64B{:}1}}},\ B_{\text{{64B{:}1}}}\rightarrow B_{\text{{653{:}1}}},\ B_{\text{{653{:}1}}}\rightarrow B_{\text{{661{:}1}}},\ B_{\text{{661{:}1}}}\rightarrow B_{\text{{66F{:}1}}},\ B_{\text{{66F{:}1}}}\rightarrow B_{\text{{6C3{:}1}}},\ B_{\text{{6C3{:}1}}}\rightarrow B_{\text{{653{:}1}}},\\ \ \ B_{\text{{66D{:}1}}}\rightarrow B_{\text{{66E{:}1}}},\ B_{\text{{66F{:}1}}}\rightarrow B_{\text{{690{:}1}}},\ B_{\text{{66F{:}1}}}\rightarrow B_{\text{{683{:}1}}},\ B_{\text{{683{:}1}}}\rightarrow B_{\text{{691{:}1}}},\ B_{\text{{691{:}1}}}\rightarrow B_{\text{{6D1{:}1}}},\ B_{\text{{6D1{:}1}}}\rightarrow B_{\text{{954}}},\\ \ \ B_{\text{{123}}}\dashrightarrow B_{\text{{64B{:}2}}},\ B_{\text{{64B{:}2}}}\dashrightarrow B_{\text{{653{:}2}}},\ B_{\text{{653{:}2}}}\dashrightarrow B_{\text{{661{:}2}}},\ B_{\text{{661{:}2}}}\dashrightarrow B_{\text{{66F{:}2}}},\ B_{\text{{66F{:}2}}}\dashrightarrow B_{\text{{6C3{:}2}}},\ B_{\text{{6C3{:}2}}}\dashrightarrow B_{\text{{653{:}2}}},\\ \ \ B_{\text{{66D{:}2}}}\dashrightarrow B_{\text{{66E{:}2}}},\ B_{\text{{66F{:}2}}}\dashrightarrow B_{\text{{690{:}2}}},\ B_{\text{{66F{:}2}}}\dashrightarrow B_{\text{{683{:}2}}},\ B_{\text{{683{:}2}}}\dashrightarrow B_{\text{{691{:}2}}},\ B_{\text{{691{:}2}}}\dashrightarrow B_{\text{{6D1{:}2}}},\ B_{\text{{6D1{:}2}}}\dashrightarrow B_{\text{{142}}}\ \}\end{array}

Note that, in Figure 3, we distinguish dashed and solid edges only to highlight the two possible execution paths: if the call to findWinner comes from block $B_{\text{{941}}}$ , it will return to block $B_{\text{{954}}}$ and, if the execution comes from a public invocation, i.e. block $B_{\text{{123}}}$ , it will return to block $B_{\text{{142}}}$ . $\blacksquare$
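The effect of replication on the return addresses can be checked mechanically. The Python sketch below copies the solid edges listed in this example, derives the dashed ones by renaming replica 1 to replica 2 (and 941, 954 to 123, 142), and computes the nodes reachable from each entry point. The node names as strings and the helpers `second_replica` and `reachable` are conventions of this sketch only.

```python
from collections import deque

# Solid edges of Example 10, written as "block:replica" strings.
SOLID = [
    ("941", "64B:1"), ("64B:1", "653:1"), ("653:1", "661:1"),
    ("661:1", "66F:1"), ("66F:1", "6C3:1"), ("6C3:1", "653:1"),
    ("66D:1", "66E:1"), ("66F:1", "690:1"), ("66F:1", "683:1"),
    ("683:1", "691:1"), ("691:1", "6D1:1"), ("6D1:1", "954"),
]


def second_replica(node):
    """Name of the corresponding node on the dashed path."""
    return {"941": "123", "954": "142"}.get(node, node.replace(":1", ":2"))


DASHED = [(second_replica(a), second_replica(b)) for a, b in SOLID]


def reachable(edges, start):
    """Nodes reachable from `start` by a directed walk (breadth-first)."""
    successors = {}
    for a, b in edges:
        successors.setdefault(a, set()).add(b)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in successors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


from_internal = reachable(SOLID + DASHED, "941")
from_public = reachable(SOLID + DASHED, "123")
```

In this graph a walk starting at block 941 can reach block 954 but never block 142, and symmetrically for block 123, which is the separation that a CFG without replicated blocks would lose.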

Theorem 3.1 (Soundness of the stack-sensitive control flow graph)

Let $P$ be an EVM program. If a stack-sensitive control flow graph CFG can be generated, then for any execution trace $t$ of $P$ there exists a directed walk that visits, in the same order, nodes in the CFG that correspond to replicas of the blocks executed in $t$ .

Proof

We prove this theorem reasoning by induction on the value of $m$ , the length of the trace $t\equiv S_{0}\Rightarrow^{m}S_{m}$ . We will assume that a directed walk of the CFG is of the form $B_{0:0}\cdot\ldots\cdot B_{n:id_{n}}$ , where $B_{0:0}$ is a replica of the block that contains the first instruction in the program $b_{0}$ .

Base case: if $m=0$ , $S_{0}=\langle 0,\langle 0,\sigma_{\emptyset}\rangle\rangle$ and the theorem trivially holds as $b_{0}$ is the first instruction of block $B_{0}$ .

Induction Hypothesis: we assume Theorem 3.1 holds for all traces of length $m\geq 0$ .

Inductive Case: Let us consider a trace of length $m+1$ , $t\equiv S_{0}\Rightarrow^{m}S_{m}\Rightarrow S_{m+1}$ . $S_{m}$ is a program state of the form $S_{m}=\langle pc,\langle n_{m},\sigma_{m}\rangle\rangle$ . We can apply the induction hypothesis to $S_{m}$ : there exists a directed walk in CFG, $w\equiv B_{0:0}\cdot\ldots\cdot B_{j:id}$ that visits nodes corresponding to replicas of the blocks executed in $t$ in the same order, and $b_{pc}\in B_{j:id}$ . There may be two cases:

a)

Instruction $b_{pc}$ is not the last instruction in $B_{j:id}$ . By Definition 1, $b_{pc+size(b_{pc})}$ is also in $B_{j:id}$ , and $b_{pc}\not\in Jump$ . The applicable rules of the semantics of Figure 4 are Rules (4) to (12). In all cases, $S_{m+1}=\langle pc+size(b_{pc}),\langle n_{m+1},\sigma_{m+1}\rangle\rangle$ and Theorem 3.1 holds, since the same directed walk $w$ already visits a replica of the node that contains the instruction executed in $S_{m+1}$ .
b)

Instruction $b_{pc}$ is the last instruction in block $B_{j:id}$ . We reason on all possible instructions that can be the last instruction of a block:
- –
  
  $b_{pc}\not\in Jump$ . This case is the result of the application of Rules (4)-(12) of the semantics in Figure 4. Therefore, $S_{m+1}$ is of the form $S_{m+1}=\langle d,\langle n_{m+1},\sigma_{m+1}\rangle\rangle$ , where $d=pc+size(b_{pc})$ . Lemma 1 guarantees that there exists a stack state $\langle n,\sigma\rangle$ such that $\langle n_{m},\sigma_{m}\rangle\in{\mathcal{X}}_{pc}(\langle n,\sigma\rangle)$ . By Definition 7 there is an edge in $E_{next}$ of the form $B_{j:id}\to B_{d:id_{2}}$ for each element in ${\mathcal{X}}_{pc}(\langle n,\sigma\rangle)$ . Therefore, there exists a directed walk $w^{\prime}=w\cdot B_{d:id_{2}}$ that visits nodes in CFG corresponding to replicas of the blocks executed in $t$ , and Theorem 3.1 holds.
- –
  
  $b_{pc}\in Jump$ . This case is the result of the application of Rules (1-3) of the Semantics in Figure 4.
  
  The application of Rules (1)-(2) corresponds to a jump in the code, and $S_{m+1}$ is of the form $S_{m+1}=\langle d_{m+1},\langle n_{m+1},\sigma_{m+1}\rangle\rangle$ , where $d_{m+1}=\sigma_{m}(s_{n_{m}-1})$ . Lemma 1 guarantees that there exists a stack state $\langle n,\sigma\rangle$ such that $\langle n_{m},\sigma_{m}\rangle\in{\mathcal{X}}_{pc}(\langle n,\sigma\rangle)$ . By Definition 7, $E_{jump}$ contains an edge $B_{j{:}id}\to B_{d:id_{2}}$ for each element in ${\mathcal{X}}_{pc}(\langle n,\sigma\rangle)$ , such that $d=\sigma_{m}(s_{n_{m}-1})$ and $id_{2}$ is the replica identifier of block $d$ corresponding to $\lambda(b_{pc},\langle n_{m},\sigma_{m}\rangle)$ . Therefore, $d=d_{m+1}$ , and the directed walk $w^{\prime}=w\cdot B_{d:id_{2}}$ visits nodes in CFG corresponding to replicas of the blocks executed in $t$ , so Theorem 3.1 holds.
  
  The application of Rule (3) corresponds to a JUMPI instruction in the code that does not jump to its destination address. This case is equal to the previous case in which $b_{pc}\not\in Jump$ .

$\square$

References

[1] The EthereumPot contract, 2017. https://etherscan.io/address/0x5a13caa82851342e14cd2ad0257707cddb8a31b7.
[2] A. V. Aho, M. S. Lam, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2nd edition, 2006.
[3] Lexi Brent, Anton Jurisevic, Michael Kong, Eric Liu, Francois Gauthier, Vincent Gramoli, Ralph Holz, and Bernhard Scholz. Vandal: A Scalable Security Analysis Framework for Smart Contracts, 2018. arXiv:1809.03981.
[4] Neville Grech, Lexi Brent, Bernhard Scholz, and Yannis Smaragdakis. Gigahorse: thorough, declarative decompilation of smart contracts. In Joanne M. Atlee, Tevfik Bultan, and Jon Whittle, editors, Proceedings of the 41st International Conference on Software Engineering, ICSE 2019, Montreal, QC, Canada, May 25-31, 2019, pages 1176–1186. IEEE / ACM, 2019.
[5] Neville Grech, Michael Kong, Anton Jurisevic, Lexi Brent, Bernhard Scholz, and Yannis Smaragdakis. Madmax: surviving out-of-gas conditions in ethereum smart contracts. PACMPL, 2(OOPSLA):116:1–116:27, 2018.
[6] Gavin Wood. Ethereum: A secure decentralised generalised transaction ledger, 2014.