
1 Instituto de Tecnología del Conocimiento, Spain
2 Complutense University of Madrid, Spain
3 Universidad Politécnica de Madrid, Spain

Analyzing Smart Contracts: From EVM to a sound Control-Flow Graph

Elvira Albert (1,2), Jesús Correas (2), Pablo Gordillo (2), Alejandro Hernández-Cerezo (2), Guillermo Román-Díez (3), Albert Rubio (1,2)
Abstract

The EVM language is a simple stack-based language with words of 256 bits, with one significant difference between the EVM and other virtual machine languages (like Java Bytecode or CLI for .Net programs): the use of the stack for saving the jump addresses instead of having it explicit in the code of the jumping instructions. Static analyzers need the complete control flow graph (CFG) of the EVM program in order to be able to represent all its execution paths. This report addresses the problem of obtaining a precise and complete stack-sensitive CFG by means of a static analysis, cloning the blocks that might be executed using different states of the execution stack. The soundness of the analysis presented is proved.

1 EVM Language

The EVM language is a simple stack-based language with words of 256 bits, a local volatile memory that behaves as a simple word-addressed array of bytes, and a persistent storage that is part of the blockchain state. A more detailed description of the language and the complete set of operation codes can be found in [6]. In this section, we focus only on the characteristics of the EVM that are needed to describe our work. We consider EVM programs that satisfy two constraints: (1) jump addresses are constants, i.e., they are introduced by a PUSH operation, they do not depend on input values, and they are stored neither in memory nor in storage; and (2) the size of the stack when executing a jump instruction can be bounded by a constant. Programs that violate these constraints are mostly produced by the use of recursion and higher-order programming in the high-level language that compiles to EVM, e.g., Solidity.

Example 1

In order to describe our techniques, we use as a running example a simplified version (without calls to the external service Oraclize and the authenticity proof verifier) of the contract [1] that implements a lottery system. During a game, players call a method joinPot to buy lottery tickets; each player’s address is appended to an array addresses of current players, and the number of tickets is appended to an array slots, both having variable length. After some time has elapsed, anyone can call rewardWinner, which calls the Oraclize service to obtain a random number for the winning ticket. If all goes according to plan, the Oraclize service then responds by calling the __callback method with this random number and the authenticity proof as arguments. A new instance of the game is then started, and the winner is allowed to withdraw her balance using a withdraw method. Figure 2 shows an excerpt of the Solidity code (including the public function findWinner) and a fragment of the EVM code produced by the compiler. The Solidity source code is shown only for readability, as our analysis works directly on the EVM code.

contract EthereumPot {
  address[] public addresses;
  address public winnerAddress;
  uint[] public slots;
  function __callback (bytes32 _queryId, string _result, bytes _proof) {
    if (msg.sender != oraclize_cbAddress()) throw;
    random_number = uint(sha3(_result));
    winnerAddress = findWinner(random_number);
    amountWon = this.balance * 98 / 100;
    winnerAnnounced(winnerAddress, amountWon);
    if (winnerAddress.send(amountWon)) {
      if (owner.send(this.balance)) {
        openPot();
      }
    }
  }

  function findWinner (uint random) constant returns (address winner) {
    for (uint i = 0; i < slots.length; i++) {
      if (random <= slots[i]) {
        return addresses[i];
      }
    }
  }
  // Other functions
}

64B: JUMPDEST
64C: PUSH1 0x00
64E: DUP1
64F: PUSH1 0x00
651: SWAP1
652: POP
653: JUMPDEST
654: PUSH1 0x03
656: DUP1
657: SLOAD
658: SWAP1
659: POP
65A: DUP2
65B: LT
65C: ISZERO
65D: PUSH2 0x06D0
660: JUMPI
661: PUSH1 0x03
663: DUP2
664: DUP2
665: SLOAD
666: DUP2
941: JUMPDEST
942: MOD
943: ADD
944: PUSH1 0x0A
946: DUP2
947: SWAP1
948: SSTORE
949: POP
94A: PUSH2 0x0954
94D: PUSH1 0x0A
94F: SLOAD
950: PUSH2 0x064B
953: JUMP
Figure 2: Excerpt of Solidity code for EthereumPot contract (left), and fragment of EVM code for function findWinner (right)

To the right of Figure 2 we show a fragment of the EVM code of method findWinner. It can be seen that the EVM has instructions for operating with the stack contents, like DUPx or SWAPx; for comparisons, like LT, GT; for accessing the storage (memory) of the contract, like SSTORE, SLOAD (MLOAD, MSTORE); to add/remove elements to/from the stack, like PUSHx/POP; and many others (we again refer to [6] for details). Some instructions increment the program counter by several units (e.g., PUSHx Y adds a word with the constant Y of x bytes to the stack and increments the program counter by $x+1$). In what follows, we use $size(b)$ to refer to the number of units by which instruction $b$ increments the value of the program counter. For instance, $size(\texttt{POP})=1$, $size(\texttt{PUSH1})=2$ or $size(\texttt{PUSH3})=4$. $\blacksquare$
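As a minimal sketch of the function $size$, assuming instructions are given by their mnemonics, the following Python code computes the program-counter increment of an instruction and the program points of a straight-line sequence. The names `size` and `program_counters` are illustrative and not part of any tool.

```python
import re


def size(mnemonic):
    """Number of units by which `mnemonic` advances the program counter."""
    match = re.fullmatch(r"PUSH(\d+)", mnemonic)
    if match:
        # PUSHx takes one unit for the opcode plus x units for the constant.
        return int(match.group(1)) + 1
    # Every other instruction takes a single unit.
    return 1


def program_counters(mnemonics, start=0):
    """Yield (pc, mnemonic) pairs for consecutive instructions starting at `start`."""
    pc = start
    for mnemonic in mnemonics:
        yield pc, mnemonic
        pc += size(mnemonic)
```

Starting at program point 64B, the first instructions of findWinner in Figure 2 (JUMPDEST, PUSH1, DUP1, PUSH1, SWAP1, POP, JUMPDEST) are placed by this sketch at 64B, 64C, 64E, 64F, 651, 652 and 653, as in the figure.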

One significant difference between the EVM and other virtual machine languages (like Java Bytecode or CLI for .Net programs) is that jump addresses are saved on the stack instead of being explicit in the code of the jumping instructions. In EVM, the instructions JUMP and JUMPI jump, unconditionally and conditionally respectively, to the program counter stored on top of the execution stack. Because of this feature, obtaining the control flow graph of the program requires keeping track of the information stored in the stack. Let us illustrate it with an example.

Example 2

In the EVM code to the right of Figure 2 we can see two jump instructions, at program points 953 and 660, whose jump addresses (64B and 6D0, respectively) are pushed by the instructions immediately before them, at program points 950 and 65D. The destination of a jump must be a JUMPDEST instruction (as at program points 941, 64B and 653).

$\blacksquare$

We start our analysis by defining the set $\mathcal{J}$, which contains all possible jump destinations in an EVM program $P\equiv b_{0},\dots,b_{p}$:

$$\mathcal{J}(P)=\{pc \mid b_{pc}\in P\wedge b_{pc}\equiv\texttt{JUMPDEST}\}.$$

We use $b_{pc}\in P$ to refer to the instruction at program counter $pc$ in the EVM program $P$. In what follows, we omit $P$ from definitions when it is clear from the context, e.g., we use $\mathcal{J}$ to refer to $\mathcal{J}(P)$.
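The set $\mathcal{J}$ can be sketched in Python as a single linear scan of the raw bytecode. This is a hedged illustration, not the authors' implementation: the function name `jump_destinations` is hypothetical, and the only facts assumed are the standard opcode values of JUMPDEST (0x5B) and PUSH1 to PUSH32 (0x60 to 0x7F). The scan skips the constant of each PUSH so that a byte 0x5B inside a pushed constant is not mistaken for an instruction.

```python
JUMPDEST = 0x5B
PUSH1, PUSH32 = 0x60, 0x7F


def jump_destinations(code):
    """Return the set of program counters that hold a JUMPDEST instruction."""
    dests = set()
    pc = 0
    while pc < len(code):
        opcode = code[pc]
        if opcode == JUMPDEST:
            dests.add(pc)
        if PUSH1 <= opcode <= PUSH32:
            # PUSHx: one byte for the opcode plus x bytes for the constant.
            pc += (opcode - PUSH1 + 1) + 1
        else:
            pc += 1
    return dests
```

For instance, on the three-byte program `60 5B 5B` (PUSH1 0x5B followed by JUMPDEST), only program counter 2 is reported.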

Example 3

Given the EVM code that corresponds to function findWinner, we get the following set:

$$\mathcal{J}=\{\texttt{123},\texttt{142},\texttt{954},\texttt{64B},\texttt{6D0},\texttt{66F},\texttt{653},\texttt{6C3},\texttt{691},\texttt{6D1},\texttt{6BA}\}$$

$\blacksquare$

The first step in the computation of the CFG is to define the notion of block. In general [2], given a program $P$, a block is a maximal sequence of straight-line consecutive code in the program with the properties that the flow of control can only enter the block through the first instruction in the block, and can only leave the block at the last instruction. Let us define the concept of block in an EVM program:

Definition 1 (blocks)

Given an EVM program $P\equiv b_{0},\ldots,b_{p}$, we define

$$blocks(P)=\left\{B_{i}\equiv b_{i},\ldots,b_{j}\ \middle|\ \begin{array}{l}(\forall k.\ i<k<j,\ b_{k}\notin Jump\cup End\cup\{\texttt{JUMPDEST}\})\ \wedge\\ (i=0\vee b_{i}\equiv\texttt{JUMPDEST}\vee b_{i-1}=\texttt{JUMPI})\ \wedge\\ (j=p\vee b_{j}\in Jump\vee b_{j}\in End\vee b_{j+1}\equiv\texttt{JUMPDEST})\end{array}\right\}$$

where

$$\begin{array}{rcl}Jump&=&\{\texttt{JUMP},\texttt{JUMPI}\}\\ End&=&\{\texttt{REVERT},\texttt{STOP},\texttt{INVALID}\}\end{array}$$
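Definition 1 can be read operationally as a single pass over the instruction list. The following Python sketch is one possible reading under stated assumptions: the program is given as a list of (pc, mnemonic) pairs in increasing order of pc, the name `blocks` and the constants `JUMP_OPS` and `END_OPS` are illustrative, and code that follows a JUMP or an End instruction without starting at a JUMPDEST is skipped, since it satisfies none of the start conditions of the definition.

```python
JUMP_OPS = {"JUMP", "JUMPI"}
END_OPS = {"REVERT", "STOP", "INVALID"}
CLOSING_OPS = JUMP_OPS | END_OPS


def blocks(program):
    """Map the pc of each block's first instruction to the block's instructions."""
    result = {}
    current = None   # block under construction, or None if no block is open
    prev_op = None
    for pc, op in program:
        # Start conditions: i = 0, b_i is JUMPDEST, or b_{i-1} is JUMPI.
        if prev_op is None or op == "JUMPDEST" or prev_op == "JUMPI":
            if current:
                # The open block ends because b_{j+1} is JUMPDEST.
                result[current[0][0]] = current
            current = [(pc, op)]
        elif current is not None:
            current.append((pc, op))
        # End conditions: b_j is a jump or an End instruction.
        if current is not None and op in CLOSING_OPS:
            result[current[0][0]] = current
            current = None
        prev_op = op
    if current:
        # End condition j = p: the program finishes inside an open block.
        result[current[0][0]] = current
    return result
```

Applied to the instructions of Figure 2 between program points 64B and 666, this sketch produces three blocks, starting at 64B, 653 and 661; the block at 653 ends with the JUMPI at 660.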

Figure 3: Fragment of the CFG of findWinner
Example 4

Figure 3 shows the blocks (nodes) obtained for findWinner and their corresponding jump invocations. Solid and dashed edges represent the two possible execution paths depending on the entry block: solid edges represent the path that starts from block 941 and dashed edges the path that starts from 123. Note that most of the blocks start with a JUMPDEST instruction (123, 941, 64B, 653, 66F, 954, 6C3, 691, 142, 6D1, 6D0). The rest of the blocks start with instructions that come right after a JUMPI instruction (661, 683). Analogously, most blocks end in a JUMP (941, 6C3, 123, 691, 6D1), JUMPI (653, 661, 66F, 683) or RETURN (142) instruction, or in the instruction that precedes a JUMPDEST (64B). $\blacksquare$

Observing the blocks in Figure 3, we can see that most JUMP instructions use the address introduced by the PUSH instruction executed immediately before the JUMP. However, in general, EVM code may contain a JUMP whose address has been pushed in a different block. This happens, for instance, when a public function is invoked privately from other methods of the same contract: the return program counter is pushed by each invoker at a different program point, and it is consumed by a single JUMP instruction when the invoked method finishes, in order to return to the particular caller that invoked the function.

Example 5

In Figure 3, at block 6D1 we have a JUMP (marked with ✩) whose address is not pushed in the same block. This JUMP takes the return address of function findWinner. If findWinner is publicly invoked, it jumps to address 142 (pushed at block 123, at the point marked $\star$), and if it is invoked from __callback, it jumps to 954 (pushed at block 941, at the point marked $\star$).

1.1 Operational Semantics

Figure 4 shows the semantics of some instructions involved in the computation of the values stored in the stack for handling jumps. The state of the program $S$ is a tuple $\langle pc,\langle n,\sigma\rangle\rangle$ where $pc$ is the value of the program counter with the index of the next instruction to be executed, and $\langle n,\sigma\rangle$ is a stack state as defined in Section 2 ($n$ is the number of elements in the stack, and $\sigma$ is a partial mapping that relates some stack positions with a set of jump destinations). The interesting rules are the ones that deal with jump destination addresses on the stack: Rule (4) adds a new address on the stack, and Rules (6) and (8)-(10) copy or exchange existing addresses on top of the stack, respectively. Rules (1) to (3) perform a jump in the program and therefore consume the address placed on top of the stack, plus an additional word in the case of JUMPI. If the instructions considered in this simplified semantics do not handle jump addresses, the corresponding rules only update the stack in the program state $S$ and record no address in $\sigma$ (Rules (5), (7) and (11)). The remaining EVM instructions not explicitly considered in this simplified semantics are generically represented by Rule (12) with $b_{pc}^{\delta,\alpha}$, where $\delta$ is the number of items removed from the stack when $b_{pc}$ is executed, and $\alpha$ is the number of additional items placed on the stack. Complete executions are traces of the form $S_{0}\Rightarrow S_{1}\Rightarrow\dots\Rightarrow S_{n}$ where $S_{0}\equiv\langle 0,\langle 0,\sigma_{\emptyset}\rangle\rangle$ is the initial state, $\sigma_{\emptyset}$ is the empty mapping, and $S_{n}$ corresponds to the last state. There are no infinite traces, as any transaction that executes EVM code has a finite gas limit and every instruction executed consumes some amount of gas. When the gas limit is exceeded, an out-of-gas exception occurs and the program halts immediately.

(1) $$\frac{b_{pc}=\texttt{JUMP}}{\langle pc,\langle n,\sigma\rangle\rangle\Rightarrow\langle\sigma(s_{n-1}),\langle n-1,\sigma\backslash[s_{n-1}]\rangle\rangle}$$
(2) $$\frac{b_{pc}=\texttt{JUMPI}}{\langle pc,\langle n,\sigma\rangle\rangle\Rightarrow\langle\sigma(s_{n-1}),\langle n-2,\sigma\backslash[s_{n-1},s_{n-2}]\rangle\rangle}$$
(3) $$\frac{b_{pc}=\texttt{JUMPI}}{\langle pc,\langle n,\sigma\rangle\rangle\Rightarrow\langle pc+size(b_{pc}),\langle n-2,\sigma\backslash[s_{n-1},s_{n-2}]\rangle\rangle}$$
(4) $$\frac{b_{pc}=\texttt{PUSH}x\ v,\quad v\in\mathcal{J}}{\langle pc,\langle n,\sigma\rangle\rangle\Rightarrow\langle pc+size(b_{pc}),\langle n+1,\sigma[s_{n}\mapsto\{v\}]\rangle\rangle}$$
(5) $$\frac{b_{pc}=\texttt{PUSH}x\ v,\quad v\notin\mathcal{J}}{\langle pc,\langle n,\sigma\rangle\rangle\Rightarrow\langle pc+size(b_{pc}),\langle n+1,\sigma\rangle\rangle}$$
(6) $$\frac{b_{pc}=\texttt{DUP}x,\quad s_{n-x}\in dom(\sigma)}{\langle pc,\langle n,\sigma\rangle\rangle\Rightarrow\langle pc+size(b_{pc}),\langle n+1,\sigma[s_{n}\mapsto\sigma(s_{n-x})]\rangle\rangle}$$
(7) $$\frac{b_{pc}=\texttt{DUP}x,\quad s_{n-x}\notin dom(\sigma)}{\langle pc,\langle n,\sigma\rangle\rangle\Rightarrow\langle pc+size(b_{pc}),\langle n+1,\sigma\rangle\rangle}$$
(8) $$\frac{b_{pc}=\texttt{SWAP}x,\quad s_{n-1}\in dom(\sigma),\quad s_{n-x-1}\in dom(\sigma)}{\langle pc,\langle n,\sigma\rangle\rangle\Rightarrow\langle pc+size(b_{pc}),\langle n,\sigma[s_{n-x-1}\mapsto\sigma(s_{n-1}),s_{n-1}\mapsto\sigma(s_{n-x-1})]\rangle\rangle}$$
(9) $$\frac{b_{pc}=\texttt{SWAP}x,\quad s_{n-1}\in dom(\sigma),\quad s_{n-x-1}\notin dom(\sigma)}{\langle pc,\langle n,\sigma\rangle\rangle\Rightarrow\langle pc+size(b_{pc}),\langle n,\sigma[s_{n-x-1}\mapsto\sigma(s_{n-1})]\backslash[s_{n-1}]\rangle\rangle}$$
(10) $$\frac{b_{pc}=\texttt{SWAP}x,\quad s_{n-1}\notin dom(\sigma),\quad s_{n-x-1}\in dom(\sigma)}{\langle pc,\langle n,\sigma\rangle\rangle\Rightarrow\langle pc+size(b_{pc}),\langle n,\sigma[s_{n-1}\mapsto\sigma(s_{n-x-1})]\backslash[s_{n-x-1}]\rangle\rangle}$$
(11) $$\frac{b_{pc}=\texttt{SWAP}x,\quad s_{n-1}\notin dom(\sigma),\quad s_{n-x-1}\notin dom(\sigma)}{\langle pc,\langle n,\sigma\rangle\rangle\Rightarrow\langle pc+size(b_{pc}),\langle n,\sigma\backslash[s_{n-1},s_{n-x-1}]\rangle\rangle}$$
(12) $$\frac{b_{pc}^{\delta,\alpha}\notin End\cup Jump\cup\{\texttt{PUSH}x,\texttt{DUP}x,\texttt{SWAP}x\}}{\langle pc,\langle n,\sigma\rangle\rangle\Rightarrow\langle pc+size(b_{pc}),\langle n-\delta+\alpha,\sigma\backslash[s_{n-1},\dots,s_{n-\delta}]\rangle\rangle}$$
Figure 4: Simplified EVM semantics for handling jumps

2 From EVM to a Sound CFG

As we have seen in the previous section, the addresses used by the jumping instructions are stored in the execution stack. In EVM, blocks can be reached with different stack sizes and contents. As is done in other tools [4, 3, 5], to precisely infer the possible addresses at jumping program points, we need a context-sensitive static analysis that separately analyzes each block for every possible stack that can reach it (considering only the addresses stored in the stack). This section presents an address analysis of EVM programs which allows us to compute a complete CFG of the EVM code. To compute the addresses involved in the jumping instructions, we define a static analysis which soundly infers all possible addresses that a JUMP instruction could use.

In our address analysis we aim at having the stack represented by explicit variables. Given the characteristics of EVM programs, the execution stack of EVM programs produced from Solidity programs without recursion can be flattened. Besides, as the size of the stack of the Ethereum Virtual Machine is bounded by 1024 elements (see [6]), the number of stack variables is limited. We use $\mathcal{V}$ to represent the set of all possible stack variables that may be used in the program. The first element we define for our analysis is its abstract state:

The abstract state

Our analysis uses a partial representation of the execution stack as its basic element. To this end, we use the notion of stack state as a pair $\langle n,\sigma\rangle$, where $n$ is the number of elements in the stack, and $\sigma$ is a partial mapping that relates some stack positions with a set of jump destinations. A position in the stack is referred to as $s_{i}$ with $0\leq i<n$, and $s_{n-1}$ is the position at the top of the stack. The abstract state of the analysis is defined on the set of all stack states $\mathcal{S}=\{\langle n,\sigma\rangle \mid 0\leq n\leq|\mathcal{V}|\wedge\sigma(s)\in\Sigma_{n}\}$ where $\Sigma_{n}$ is the set of all mappings using up to $n$ stack variables.

Definition 2 (abstract state)

The abstract state is a partial mapping $\pi$ of the form $\mathcal{S}\mapsto\mathcal{P}(\mathcal{S})$.

The application of $\sigma$ to an element $s_{i}$, that is, $\sigma(s_{i})$, corresponds to the set of jump destinations that a stack variable $s_{i}$ can contain. The first element of the tuple, that is, $n$, stores the size of the stack in the different abstract states.

The abstract domain is the lattice $\langle AS,\pi_{\top},\pi_{\bot},\sqcup,\sqsubseteq\rangle$, where $AS$ is the set of abstract states and $\pi_{\top}$ is the top of the lattice, defined as the mapping such that $\forall s\in\mathcal{S},\pi_{\top}(s)=\mathcal{S}$. The bottom element of the lattice $\pi_{\bot}$ is the empty mapping. Now, to define $\sqcup$ and $\sqsubseteq$, we first define the function $img(\pi,s)$ as $\pi(s)$ if $s\in dom(\pi)$ and $\emptyset$ otherwise. Given two abstract states $\pi_{1}$ and $\pi_{2}$, we use $\pi=\pi_{1}\sqcup\pi_{2}$ to denote that $\pi$ is the least upper bound, defined as follows: $\forall s\in dom(\pi_{1})\cup dom(\pi_{2}),\pi(s)=img(\pi_{1},s)\cup img(\pi_{2},s)$. At this point, $\pi_{1}\sqsubseteq\pi_{2}$ holds iff $dom(\pi_{1})\subseteq dom(\pi_{2})$ and $\forall s\in dom(\pi_{1}),\pi_{1}(s)\subseteq\pi_{2}(s)$.
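The operations $img$, $\sqcup$ and $\sqsubseteq$ translate almost literally into code. The Python sketch below assumes one particular encoding that is not prescribed by the text: a stack state $\langle n,\sigma\rangle$ is a hashable pair built by the hypothetical helper `stack_state`, and an abstract state $\pi$ is a dictionary from stack states to sets of stack states. The names `join` and `leq` are likewise illustrative.

```python
def stack_state(n, sigma=None):
    """Build a hashable stack state <n, sigma>; sigma maps positions to address sets."""
    sigma = sigma or {}
    return (n, frozenset((i, frozenset(addrs)) for i, addrs in sigma.items()))


def img(pi, s):
    """pi(s) if s is in dom(pi), and the empty set otherwise."""
    return pi.get(s, frozenset())


def join(pi1, pi2):
    """Least upper bound: pointwise union over the union of both domains."""
    return {s: frozenset(img(pi1, s) | img(pi2, s)) for s in pi1.keys() | pi2.keys()}


def leq(pi1, pi2):
    """pi1 is below pi2 iff dom(pi1) is included in dom(pi2) and images are included pointwise."""
    return all(s in pi2 and pi1[s] <= pi2[s] for s in pi1)
```

With this encoding the empty dictionary plays the role of $\pi_{\bot}$: `join(pi, {})` returns a mapping equal to `pi`, and `leq({}, pi)` holds for every `pi`.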

Transfer function

One of the ingredients of our analysis is a transfer function that models the effect of each EVM instruction on the abstract state. Given a stack state $s$ of the form $\langle n,\sigma\rangle$, Figure 5 defines the updating function $\lambda(b_{pc}^{\delta,\alpha},s)$ where $b$ corresponds to the EVM instruction to be applied, $pc$ corresponds to the program counter of the instruction, and $\alpha$ and $\delta$ to the number of elements placed on and removed from the EVM stack when executing $b$, respectively. Given a map $m$ we use $m[x\mapsto y]$ to indicate the result of updating $m$ by making $m(x)=y$ while $m$ stays the same for all locations different from $x$, and we use $m\backslash[x]$ to refer to the partial mapping that stays the same for all locations different from $x$ and in which $m(x)$ is undefined. By means of $\lambda$, we define the transfer function of our analysis.

Definition 3 (transfer function)

Given the set of abstract states $AS$ and the set of EVM instructions $Ins$, the transfer function $\tau$ is a mapping of the form

$$\tau : Ins\times AS\mapsto AS$$

defined as follows:

$$\tau(b,\pi)=\pi' \quad\text{where}\quad \forall s\in dom(\pi),\ \pi'(s)=\lambda(b,\pi(s))$$
$$\begin{array}{lll}
 & b^{\delta,\alpha} & \lambda(b,\langle n,\sigma\rangle)\\
\hline
(1) & \texttt{PUSH}x\ v & \langle n+1,\sigma[s_{n}\mapsto\{v\}]\rangle \quad \text{when } v\in\mathcal{J}\\
 & & \langle n+1,\sigma\rangle \quad \text{when } v\notin\mathcal{J}\\
(2) & \texttt{DUP}x & \langle n+1,\sigma\rangle \quad \text{when } s_{n-x}\notin dom(\sigma)\\
 & & \langle n+1,\sigma[s_{n}\mapsto\sigma(s_{n-x})]\rangle \quad \text{when } s_{n-x}\in dom(\sigma)\\
(3) & \texttt{SWAP}x & \langle n,\sigma\rangle \quad \text{when } s_{n-1}\notin dom(\sigma)\wedge s_{n-x-1}\notin dom(\sigma)\\
 & & \langle n,\sigma[s_{n-x-1}\mapsto\sigma(s_{n-1}),s_{n-1}\mapsto\sigma(s_{n-x-1})]\rangle \quad \text{when } s_{n-1}\in dom(\sigma)\wedge s_{n-x-1}\in dom(\sigma)\\
 & & \langle n,\sigma[s_{n-1}\mapsto\sigma(s_{n-x-1})]\backslash[s_{n-x-1}]\rangle \quad \text{when } s_{n-1}\notin dom(\sigma)\wedge s_{n-x-1}\in dom(\sigma)\\
 & & \langle n,\sigma[s_{n-x-1}\mapsto\sigma(s_{n-1})]\backslash[s_{n-1}]\rangle \quad \text{when } s_{n-1}\in dom(\sigma)\wedge s_{n-x-1}\notin dom(\sigma)\\
(4) & \text{otherwise} & \langle n-\delta+\alpha,\sigma\backslash[s_{n-1},\dots,s_{n-\delta}]\rangle
\end{array}$$
Figure 5: Updating function
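
The updating and transfer functions can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: a stack state $\langle n,\sigma\rangle$ is encoded as a pair `(n, sorted tuple of (index, address) pairs)` so that it is hashable, jump addresses are stored as plain integers (as the examples do) rather than as singleton sets, the set `JUMP_DESTS` standing for $\mathcal{J}$ is made up for the example, and the names `freeze`, `lam` and `tau` are hypothetical.

```python
JUMP_DESTS = {0x954, 0x64B}  # hypothetical set J of valid jump destinations


def freeze(n, sigma):
    """Build a hashable stack state <n, sigma> from a dict sigma."""
    return (n, tuple(sorted(sigma.items())))


def lam(op, state, arg=None, delta=0, alpha=0):
    """Updating function of Figure 5 applied to one stack state.

    op is "PUSH", "DUP", "SWAP" or any other mnemonic; arg is the pushed
    value for PUSH and the index x for DUPx/SWAPx; delta and alpha are the
    numbers of removed and placed elements, used only in case (4).
    """
    n, items = state
    sigma = dict(items)
    if op == "PUSH":                      # case (1)
        if arg in JUMP_DESTS:
            sigma[n] = arg
        return freeze(n + 1, sigma)
    if op == "DUP":                       # case (2)
        if n - arg in sigma:
            sigma[n] = sigma[n - arg]
        return freeze(n + 1, sigma)
    if op == "SWAP":                      # case (3): exchange s_{n-1} and s_{n-x-1}
        top, other = n - 1, n - arg - 1
        v_top, v_other = sigma.pop(top, None), sigma.pop(other, None)
        if v_top is not None:
            sigma[other] = v_top
        if v_other is not None:
            sigma[top] = v_other
        return freeze(n, sigma)
    for i in range(n - delta, n):         # case (4): drop s_{n-1}, ..., s_{n-delta}
        sigma.pop(i, None)
    return freeze(n - delta + alpha, sigma)


def tau(op, pi, **kw):
    """Transfer function: apply lam to every stack state in every image set."""
    return {s: frozenset(lam(op, t, **kw) for t in ts) for s, ts in pi.items()}


# Replay the tail of block 941, starting from the state after POP at 949.
entry = freeze(8, {})
pi = {entry: frozenset({freeze(5, {})})}
pi = tau("PUSH", pi, arg=0x954)
pi = tau("PUSH", pi, arg=0x0A)
pi = tau("OTHER", pi, delta=1, alpha=1)   # SLOAD
before_jump = tau("PUSH", pi, arg=0x64B)
after_jump = tau("OTHER", before_jump, delta=1, alpha=0)  # JUMP
```

The single `sigma.pop` pair in the `SWAP` branch covers all four sub-cases of row (3), since an absent position simply yields `None` and is not written back.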
Example 6

Given the initial abstract state $\{\langle 8,\{\}\rangle\mapsto\{\langle 8,\{\}\rangle\}\}$, which corresponds to the initial stack state for executing block 941, the application of the transfer function $\tau$ to the block that starts at EVM instruction 941 produces the following results (the program point is shown in parentheses):

$$\begin{array}{lll}
(941) & \texttt{JUMPDEST} & \{\langle 8,\{\}\rangle\mapsto\{\langle 8,\{\}\rangle\}\}\\
(942) & \texttt{MOD} & \{\langle 8,\{\}\rangle\mapsto\{\langle 7,\{\}\rangle\}\}\\
(943) & \texttt{ADD} & \{\langle 8,\{\}\rangle\mapsto\{\langle 6,\{\}\rangle\}\}\\
(944) & \texttt{PUSH1 0A} & \{\langle 8,\{\}\rangle\mapsto\{\langle 7,\{\}\rangle\}\}\\
(946) & \texttt{DUP2} & \{\langle 8,\{\}\rangle\mapsto\{\langle 8,\{\}\rangle\}\}\\
(947) & \texttt{SWAP1} & \{\langle 8,\{\}\rangle\mapsto\{\langle 8,\{\}\rangle\}\}\\
(948) & \texttt{SSTORE} & \{\langle 8,\{\}\rangle\mapsto\{\langle 6,\{\}\rangle\}\}\\
(949) & \texttt{POP} & \{\langle 8,\{\}\rangle\mapsto\{\langle 5,\{\}\rangle\}\}\\
(94A) & \texttt{PUSH2 0954} & \{\langle 8,\{\}\rangle\mapsto\{\langle 6,\{s_{5}\mapsto 954\}\rangle\}\}\\
(94D) & \texttt{PUSH1 0A} & \{\langle 8,\{\}\rangle\mapsto\{\langle 7,\{s_{5}\mapsto 954\}\rangle\}\}\\
(94F) & \texttt{SLOAD} & \{\langle 8,\{\}\rangle\mapsto\{\langle 7,\{s_{5}\mapsto 954\}\rangle\}\}\\
(950) & \texttt{PUSH2 064B} & \{\langle 8,\{\}\rangle\mapsto\{\langle 8,\{s_{5}\mapsto 954,\ s_{7}\mapsto 64B\}\rangle\}\}\\
(953) & \texttt{JUMP} & \{\langle 8,\{\}\rangle\mapsto\{\langle 7,\{s_{5}\mapsto 954\}\rangle\}\}
\end{array}$$

Similarly, the application of the transfer function to block 123 with its initial abstract state $\{\langle 2,\{\}\rangle\mapsto\{\langle 2,\{\}\rangle\}\}$ produces:

$$\begin{array}{lll}
(123) & \texttt{JUMPDEST} & \{\langle 2,\{\}\rangle\mapsto\{\langle 2,\{\}\rangle\}\}\\
(124) & \texttt{POP} & \{\langle 2,\{\}\rangle\mapsto\{\langle 1,\{\}\rangle\}\}\\
(125) & \texttt{PUSH2 0142} & \{\langle 2,\{\}\rangle\mapsto\{\langle 2,\{s_{1}\mapsto 142\}\rangle\}\}\\
(128) & \texttt{PUSH1 04} & \{\langle 2,\{\}\rangle\mapsto\{\langle 3,\{s_{1}\mapsto 142\}\rangle\}\}\\
(12A) & \texttt{DUP1} & \{\langle 2,\{\}\rangle\mapsto\{\langle 4,\{s_{1}\mapsto 142\}\rangle\}\}\\
(12B) & \texttt{CALLDATASIZE} & \{\langle 2,\{\}\rangle\mapsto\{\langle 5,\{s_{1}\mapsto 142\}\rangle\}\}\\
(12C) & \texttt{SUB} & \{\langle 2,\{\}\rangle\mapsto\{\langle 4,\{s_{1}\mapsto 142\}\rangle\}\}\\
 & \vdots & \\
(13A) & \texttt{SWAP1} & \{\langle 2,\{\}\rangle\mapsto\{\langle 5,\{s_{1}\mapsto 142\}\rangle\}\}\\
(13B) & \texttt{POP} & \{\langle 2,\{\}\rangle\mapsto\{\langle 4,\{s_{1}\mapsto 142\}\rangle\}\}\\
(13C) & \texttt{POP} & \{\langle 2,\{\}\rangle\mapsto\{\langle 3,\{s_{1}\mapsto 142\}\rangle\}\}\\
(13D) & \texttt{POP} & \{\langle 2,\{\}\rangle\mapsto\{\langle 2,\{s_{1}\mapsto 142\}\rangle\}\}\\
(13E) & \texttt{PUSH2 064B} & \{\langle 2,\{\}\rangle\mapsto\{\langle 3,\{s_{1}\mapsto 142,\ s_{2}\mapsto 64B\}\rangle\}\}\\
(141) & \texttt{JUMP} & \{\langle 2,\{\}\rangle\mapsto\{\langle 2,\{s_{1}\mapsto 142\}\rangle\}\}
\end{array}$$

\blacksquare

2.1 Addresses equation system

The next step consists in defining, by means of the transfer and the updating functions, a constraint equation system to represent all possible jumping addresses that could be valid for executing a jump instruction in the program.

Definition 4 (addresses equation system)

Given an EVM program $P$ of the form $b_{0},\ldots,b_{p}$, its addresses equation system $\mathcal{E}(P)$ includes the following equations for each EVM bytecode instruction $b_{pc}\in P$:

$$\begin{array}{llll}
 & b_{pc} & C_{pc} & \\
\hline
(1) & \texttt{JUMP} & \mathcal{X}_{\sigma(s_{n-1})}\sqsupseteq idmap(\lambda(b_{pc},\langle n,\sigma\rangle)) & \forall s\in dom(\mathcal{X}_{pc}),\ \langle n,\sigma\rangle\in\mathcal{X}_{pc}(s)\\
(2) & \texttt{JUMPI} & \mathcal{X}_{\sigma(s_{n-1})}\sqsupseteq idmap(\lambda(b_{pc},\langle n,\sigma\rangle)) & \forall s\in dom(\mathcal{X}_{pc}),\ \langle n,\sigma\rangle\in\mathcal{X}_{pc}(s)\\
 & & \mathcal{X}_{pc+1}\sqsupseteq idmap(\lambda(b_{pc},\langle n,\sigma\rangle)) & \forall s\in dom(\mathcal{X}_{pc}),\ \langle n,\sigma\rangle\in\mathcal{X}_{pc}(s)\\
(3) & b_{pc}\not\in End\ \wedge\ b_{pc+size(b_{pc})}=\texttt{JUMPDEST} & \mathcal{X}_{pc+size(b_{pc})}\sqsupseteq idmap(\lambda(b_{pc},\langle n,\sigma\rangle)) & \forall s\in dom(\mathcal{X}_{pc}),\ \langle n,\sigma\rangle\in\mathcal{X}_{pc}(s)\\
(4) & b_{pc}\not\in End & \mathcal{X}_{pc+size(b_{pc})}\sqsupseteq\tau(b_{pc},\mathcal{X}_{pc}) & \\
(5) & \text{otherwise} & \mathcal{X}_{pc+size(b_{pc})}\sqsupseteq\tau(b_{pc},\mathcal{X}_{pc}) &
\end{array}$$

where $idmap(s)$ returns a map $\pi$ such that $dom(\pi)=\{s\}$ and $\pi(s)=\{s\}$, and $size(b_{pc})$ returns the number of bytes of the instruction $b_{pc}$.

Observe that the addresses equation system has equations for all program points of the program. Concretely, variables of the form $\mathcal{X}_{pc}$ store the jumping addresses saved in the stack after executing $b_{pc}$ for all possible entry stacks. This information is used for computing all possible jump destinations when executing JUMP or JUMPI instructions. For computing the system, most instructions, cases (4) and (5), just apply the transfer function $\tau$ to compute the possible stack states of the subsequent instruction. Note that the expression $pc+size(b_{pc})$ in case (3) just computes the position of the next instruction in the EVM program. Jumping instructions, cases (1) and (2), compute the initial state of the invoked blocks, thus they produce a map with all possible input stack states that can reach one block. JUMP and JUMPI instructions produce, for each stack state, one equation whose constraint variable $\mathcal{X}_{\sigma(s_{n-1})}$ is determined by the jump address on top of the previous stack state. JUMPI, case (2), produces an extra equation for $\mathcal{X}_{pc+1}$ to capture the possibility of continuing to the next instruction instead of jumping to the destination address. Additionally, instructions that precede a JUMPDEST, case (3), produce initial states for the block that starts at that JUMPDEST. When the constraint equation system is solved, the constraint variables over-approximate the jumping information for the program.
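
As an illustration of case (1), the following Python sketch derives, from the value of $\mathcal{X}_{pc}$ at a JUMP, the target variable and the $idmap$ contribution of each stack state. It rests on assumptions made for the example only: stack states are encoded as `(n, sorted tuple of (index, address) pairs)`, the effect of JUMP is taken to be $\delta=1,\alpha=0$, the input values mirror the pre-jump states of blocks 941 and 123 used in Example 7, and the names `idmap`, `jump_equations`, `x_block_941` and `x_block_123` are hypothetical.

```python
def idmap(s):
    """idmap(s): the map with domain {s} that sends s to {s}."""
    return {s: frozenset({s})}


def jump_equations(x_pc):
    """For each stack state <n, sigma> in X_pc, return (target, idmap(state after JUMP)).

    The target is sigma(s_{n-1}), the address on top of the stack. A KeyError
    here would mean the top is not a tracked jump address, which the paper
    rules out by assuming that jump addresses are constants.
    """
    out = []
    for states in x_pc.values():
        for n, items in states:
            sigma = dict(items)
            target = sigma[n - 1]
            # JUMP removes the top element: <n - 1, sigma \ [s_{n-1}]>.
            rest = tuple(sorted((i, a) for i, a in sigma.items() if i != n - 1))
            out.append((target, idmap((n - 1, rest))))
    return out


# Abstract states just before the JUMP at the end of blocks 941 and 123.
x_block_941 = {(8, ()): frozenset({(8, ((5, 0x954), (7, 0x64B)))})}
x_block_123 = {(2, ()): frozenset({(4, ((1, 0x142), (3, 0x64B)))})}
eqs = jump_equations(x_block_941) + jump_equations(x_block_123)
```

Both contributions target the variable for address `0x64B`, so they would be combined with the least upper bound when the system is solved.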

Example 7

As can be seen in Figure 3, we can jump to block 64B from two different blocks, 941 and 123. The computation of the addresses equation system produces the following equations for the entry program points of these two blocks:

$$\begin{array}{rcl}
\mathcal{X}_{941} & \sqsupseteq & \{\langle 8,\{\}\rangle\mapsto\{\langle 8,\{\}\rangle\}\}\\
 & \vdots & \\
\mathcal{X}_{950} & \sqsupseteq & \{\langle 8,\{\}\rangle\mapsto\{\langle 8,\{s_{5}\mapsto 954,\ s_{7}\mapsto 64B\}\rangle\}\}\\
\mathcal{X}_{64B}^{(1)} & \sqsupseteq & \{\langle 7,\{s_{5}\mapsto 954\}\rangle\mapsto\{\langle 7,\{s_{5}\mapsto 954\}\rangle\}\}
\end{array}$$

$$\begin{array}{rcl}
\mathcal{X}_{123} & \sqsupseteq & \{\langle 2,\{\}\rangle\mapsto\{\langle 2,\{\}\rangle\}\}\\
 & \vdots & \\
\mathcal{X}_{318} & \sqsupseteq & \{\langle 2,\{\}\rangle\mapsto\{\langle 4,\{s_{1}\mapsto 142,\ s_{3}\mapsto 64B\}\rangle\}\}\\
\mathcal{X}_{64B}^{(2)} & \sqsupseteq & \{\langle 3,\{s_{1}\mapsto 142\}\rangle\mapsto\{\langle 3,\{s_{1}\mapsto 142\}\rangle\}\}
\end{array}$$

Observe that we have two different stack contents reaching the same program point, e.g. two equations for $\mathcal{X}_{64B}$ are produced by two different blocks: the JUMP at the end of block 941, identified by $\mathcal{X}_{64B}^{(1)}$, and the JUMP at the end of block 123, identified by $\mathcal{X}_{64B}^{(2)}$. Thus the equation that must hold for $\mathcal{X}_{64B}$ is produced by the application of the operation $\mathcal{X}_{64B}^{(1)}\sqcup\mathcal{X}_{64B}^{(2)}$, as follows:

$$\mathcal{X}_{64B}\sqsupseteq\{\langle 7,\{s_{5}\mapsto 954\}\rangle\mapsto\langle 7,\{s_{5}\mapsto 954\}\rangle,\ \langle 3,\{s_{1}\mapsto 142\}\rangle\mapsto\langle 3,\{s_{1}\mapsto 142\}\rangle\}$$

Note that the application of the transfer function $\tau$ for all instructions of block 64B applies function $\lambda$ to all elements in the abstract state and updates the stack state accordingly:

$$\begin{array}{rrcll}
(\texttt{JUMPDEST}) & \mathcal{X}_{64B} & \sqsupseteq & \{\langle 7,\{s_{5}\mapsto 954\}\rangle\mapsto\langle 7,\{s_{5}\mapsto 954\}\rangle, & \langle 3,\{s_{1}\mapsto 142\}\rangle\mapsto\langle 3,\{s_{1}\mapsto 142\}\rangle\}\\
(\texttt{PUSH1 00}) & \mathcal{X}_{64C} & \sqsupseteq & \{\langle 7,\{s_{5}\mapsto 954\}\rangle\mapsto\langle 8,\{s_{5}\mapsto 954\}\rangle, & \langle 3,\{s_{1}\mapsto 142\}\rangle\mapsto\langle 4,\{s_{1}\mapsto 142\}\rangle\}\\
(\texttt{DUP1}) & \mathcal{X}_{64E} & \sqsupseteq & \{\langle 7,\{s_{5}\mapsto 954\}\rangle\mapsto\langle 9,\{s_{5}\mapsto 954\}\rangle, & \langle 3,\{s_{1}\mapsto 142\}\rangle\mapsto\langle 5,\{s_{1}\mapsto 142\}\rangle\}\\
(\texttt{PUSH1 00}) & \mathcal{X}_{64F} & \sqsupseteq & \{\langle 7,\{s_{5}\mapsto 954\}\rangle\mapsto\langle 10,\{s_{5}\mapsto 954\}\rangle, & \langle 3,\{s_{1}\mapsto 142\}\rangle\mapsto\langle 6,\{s_{1}\mapsto 142\}\rangle\}\\
(\texttt{SWAP1}) & \mathcal{X}_{651} & \sqsupseteq & \{\langle 7,\{s_{5}\mapsto 954\}\rangle\mapsto\langle 10,\{s_{5}\mapsto 954\}\rangle, & \langle 3,\{s_{1}\mapsto 142\}\rangle\mapsto\langle 6,\{s_{1}\mapsto 142\}\rangle\}\\
(\texttt{POP}) & \mathcal{X}_{652} & \sqsupseteq & \{\langle 7,\{s_{5}\mapsto 954\}\rangle\mapsto\langle 9,\{s_{5}\mapsto 954\}\rangle, & \langle 3,\{s_{1}\mapsto 142\}\rangle\mapsto\langle 5,\{s_{1}\mapsto 142\}\rangle\}
\end{array}$$

\blacksquare

Solving the addresses equation system of a program $P$ can be done iteratively. A naïve algorithm consists in first creating one constraint variable $\mathcal{X}_{0}\sqsupseteq\pi_{\emptyset}[\langle 0,\sigma_{\emptyset}\rangle\mapsto\{\langle 0,\sigma_{\emptyset}\rangle\}]$, where $\pi_{\emptyset}$ and $\sigma_{\emptyset}$ are empty mappings, and $\mathcal{X}_{pc}\sqsupseteq\pi_{\bot}$ for all $pc\in P$, $pc\neq 0$, and then iteratively refining the values of these variables as follows:

  1. substitute the current values of the constraint variables in the right-hand side of each constraint, and then evaluate the right-hand side if needed;

  2. if each constraint $\mathcal{X}\sqsupseteq E$ holds, where $E$ is the value of the evaluation of the right-hand side in the previous step, then the process finishes; otherwise

  3. for each $\mathcal{X}\sqsupseteq E$ which does not hold, let $E^{\prime}$ be the current value of $\mathcal{X}$. Then update the current value of $\mathcal{X}$ to $E\sqcup E^{\prime}$. Once all these updates are (iteratively) applied, we repeat the process at step 1.

Termination is guaranteed since the abstract domain has no infinite ascending chains, as the number of jump destinations and the stack size are finite. This is the case for programs that satisfy the constraints stated in Section 1.
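
The naïve iteration can be sketched in Python as below. This is a simplified illustration, not the authors' implementation: it assumes every constraint has a fixed left-hand side variable and a right-hand side given as a function of the current values, whereas in the actual system the target of a JUMP equation depends on the stack contents. The toy system, the variable names and the helper `solve` are hypothetical; abstract states are dictionaries from stack states to frozensets of stack states.

```python
def img(pi, s):
    return pi.get(s, frozenset())


def join(pi1, pi2):
    """Least upper bound of two abstract states."""
    return {s: img(pi1, s) | img(pi2, s) for s in pi1.keys() | pi2.keys()}


def leq(pi1, pi2):
    """Lattice ordering on abstract states."""
    return all(s in pi2 and pi1[s] <= pi2[s] for s in pi1)


def solve(variables, constraints, initial):
    """Naive fixpoint iteration; each constraint (x, rhs) means X_x ⊒ rhs(env)."""
    env = {x: {} for x in variables}      # every variable starts at bottom
    env.update(initial)
    changed = True
    while changed:
        changed = False
        # Step 1: evaluate every right-hand side on the current values.
        evaluated = [(x, rhs(env)) for x, rhs in constraints]
        # Steps 2 and 3: join in each right-hand side that is not yet covered.
        for x, e in evaluated:
            if not leq(e, env[x]):
                env[x] = join(e, env[x])
                changed = True
    return env


# Toy system with a cycle: X1 ⊒ X0, X2 ⊒ X1, X1 ⊒ X2.
start = (0, ())                           # <0, sigma_empty>
initial = {"X0": {start: frozenset({start})}}
constraints = [
    ("X1", lambda env: env["X0"]),
    ("X2", lambda env: env["X1"]),
    ("X1", lambda env: env["X2"]),
]
sol = solve(["X0", "X1", "X2"], constraints, initial)
```

The loop stops only because the values can grow finitely many times, which matches the termination argument above; on a domain with infinite ascending chains this sketch would not terminate.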

\[
\begin{array}{lrclll}
\text{A} & \mathcal{X}_{\text{64B}} & \sqsupseteq & \{\langle 7,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 7,\{s_{5}\mapsto\text{954}\}\rangle, & \langle 3,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 3,\{s_{1}\mapsto\text{142}\}\rangle & \}\\
 & & \vdots\\
 & \mathcal{X}_{\text{652}} & \sqsupseteq & \{\langle 7,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 9,\{s_{5}\mapsto\text{954}\}\rangle, & \langle 3,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 5,\{s_{1}\mapsto\text{142}\}\rangle & \}\\
\text{A} & \mathcal{X}_{\text{653}} & \sqsupseteq & \{\langle 9,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 9,\{s_{5}\mapsto\text{954}\}\rangle, & \langle 5,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 5,\{s_{1}\mapsto\text{142}\}\rangle & \}\\
 & & \vdots\\
 & \mathcal{X}_{\text{660}} & \sqsupseteq & \{\langle 9,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 11,\{s_{5}\mapsto\text{954},s_{10}\mapsto\text{6D0}\}\rangle, & \langle 5,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 7,\{s_{1}\mapsto\text{142},s_{6}\mapsto\text{6D0}\}\rangle & \}\\
\text{A} & \mathcal{X}_{\text{661}} & \sqsupseteq & \{\langle 9,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 9,\{s_{5}\mapsto\text{954}\}\rangle, & \langle 5,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 5,\{s_{1}\mapsto\text{142}\}\rangle & \}\\
 & \mathcal{X}_{\text{6D0}} & \sqsupseteq & \{\langle 9,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 9,\{s_{5}\mapsto\text{954}\}\rangle, & \langle 5,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 5,\{s_{1}\mapsto\text{142}\}\rangle & \}\\
 & & \vdots\\
 & \mathcal{X}_{\text{66D}} & \sqsupseteq & \{\langle 9,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 13,\{s_{5}\mapsto\text{954},s_{12}\mapsto\text{66F}\}\rangle, & \langle 5,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 9,\{s_{1}\mapsto\text{142},s_{10}\mapsto\text{66F}\}\rangle & \}\\
 & \mathcal{X}_{\text{66E}} & \sqsupseteq & \{\langle 11,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 11,\{s_{5}\mapsto\text{954}\}\rangle, & \langle 7,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 7,\{s_{1}\mapsto\text{142}\}\rangle & \}\\
\text{A} & \mathcal{X}_{\text{66F}} & \sqsupseteq & \{\langle 11,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 11,\{s_{5}\mapsto\text{954}\}\rangle, & \langle 7,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 7,\{s_{1}\mapsto\text{142}\}\rangle & \}\\
 & & \vdots\\
 & \mathcal{X}_{\text{682}} & \sqsupseteq & \{\langle 11,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 11,\{s_{5}\mapsto\text{954},s_{10}\mapsto\text{6C3}\}\rangle, & \langle 7,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 7,\{s_{1}\mapsto\text{142},s_{6}\mapsto\text{6C3}\}\rangle & \}\\
\text{A} & \mathcal{X}_{\text{6C3}} & \sqsupseteq & \{\langle 9,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 9,\{s_{5}\mapsto\text{954}\}\rangle, & \langle 5,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 5,\{s_{1}\mapsto\text{142}\}\rangle & \}\\
 & & \vdots\\
 & \mathcal{X}_{\text{6CF}} & \sqsupseteq & \{\langle 9,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 10,\{s_{5}\mapsto\text{954},s_{9}\mapsto\text{653}\}\rangle, & \langle 5,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 6,\{s_{1}\mapsto\text{142},s_{5}\mapsto\text{653}\}\rangle & \}\\[5pt]
\text{A} & \mathcal{X}_{\text{683}} & \sqsupseteq & \{\langle 9,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 9,\{s_{5}\mapsto\text{954}\}\rangle, & \langle 5,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 5,\{s_{1}\mapsto\text{142}\}\rangle & \}\\
 & & \vdots\\
 & \mathcal{X}_{\text{68F}} & \sqsupseteq & \{\langle 9,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 13,\{s_{5}\mapsto\text{954},s_{12}\mapsto\text{691}\}\rangle, & \langle 5,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 9,\{s_{1}\mapsto\text{142},s_{8}\mapsto\text{691}\}\rangle & \}\\
\text{A} & \mathcal{X}_{\text{690}} & \sqsupseteq & \{\langle 11,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 11,\{s_{5}\mapsto\text{954}\}\rangle, & \langle 7,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 7,\{s_{1}\mapsto\text{142}\}\rangle & \}\\
\text{A} & \mathcal{X}_{\text{691}} & \sqsupseteq & \{\langle 11,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 11,\{s_{5}\mapsto\text{954}\}\rangle, & \langle 7,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 7,\{s_{1}\mapsto\text{142}\}\rangle & \}\\
 & & \vdots\\
 & \mathcal{X}_{\text{6C2}} & \sqsupseteq & \{\langle 11,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 10,\{s_{5}\mapsto\text{954},s_{9}\mapsto\text{6D1}\}\rangle, & \langle 7,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 6,\{s_{1}\mapsto\text{142},s_{5}\mapsto\text{6D1}\}\rangle & \}\\
\text{A} & \mathcal{X}_{\text{6D1}} & \sqsupseteq & \{\langle 9,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 9,\{s_{5}\mapsto\text{954}\}\rangle, & \langle 5,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 5,\{s_{1}\mapsto\text{142}\}\rangle & \}\\
 & \mathcal{X}_{\text{6D2}} & \sqsupseteq & \{\langle 9,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 8,\{s_{5}\mapsto\text{954}\}\rangle, & \langle 5,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 4,\{s_{1}\mapsto\text{142}\}\rangle & \}\\
 & \mathcal{X}_{\text{6D3}} & \sqsupseteq & \{\langle 9,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 8,\{s_{7}\mapsto\text{954}\}\rangle, & \langle 5,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 4,\{s_{3}\mapsto\text{142}\}\rangle & \}\\
 & \mathcal{X}_{\text{6D4}} & \sqsupseteq & \{\langle 9,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 8,\{s_{6}\mapsto\text{954}\}\rangle, & \langle 5,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 4,\{s_{2}\mapsto\text{142}\}\rangle & \}\\
 & \mathcal{X}_{\text{6D5}} & \sqsupseteq & \{\langle 9,\{s_{5}\mapsto\text{954}\}\rangle\mapsto\langle 7,\{s_{6}\mapsto\text{954}\}\rangle, & \langle 5,\{s_{1}\mapsto\text{142}\}\rangle\mapsto\langle 3,\{s_{2}\mapsto\text{142}\}\rangle & \}\\
\text{B} & \mathcal{X}_{\text{954}} & \sqsupseteq & \{\langle 6,\{\}\rangle\mapsto\langle 6,\{\}\rangle & & \}\\
 & & \vdots\\
\text{B} & \mathcal{X}_{\text{142}} & \sqsupseteq & \{\langle 2,\{\}\rangle\mapsto\langle 2,\{\}\rangle & & \}\\
 & & \vdots
\end{array}
\]

Figure 6: Jumps equations system of __callback function
Example 8

Figure 6 shows the equations produced by Definition 4 for the first and the last instruction of all blocks shown in Figure 3. The first equation shown in the system is $\mathcal{X}_{\text{64B}}$, computed in Example 7. Observe that the application of $\tau$ stores the jump addresses in the corresponding abstract states after PUSH instructions (see $\mathcal{X}_{\text{660}}$, $\mathcal{X}_{\text{66D}}$, $\mathcal{X}_{\text{6CF}}$, $\mathcal{X}_{\text{68C}}$, …). These addresses are used to produce the equations at the JUMP or JUMPI instructions. In the case of JUMP, as the jump is unconditional, only one equation is produced, e.g. $\mathcal{X}_{\text{66E}}$ consumes address 66F to produce the input state of $\mathcal{X}_{\text{66F}}$, and $\mathcal{X}_{\text{6C2}}$ produces the input abstract state for $\mathcal{X}_{\text{6D1}}$. JUMPI instructions produce two different equations: (1) one equation that corresponds to the jump address stored in the stack, e.g. equations $\mathcal{X}_{\text{6D0}}$ and $\mathcal{X}_{\text{66F}}$ produced by the jumps of the equations $\mathcal{X}_{\text{660}}$ and $\mathcal{X}_{\text{66D}}$, respectively; and (2) one equation that corresponds to the next instruction, e.g. $\mathcal{X}_{\text{661}}$ and $\mathcal{X}_{\text{66E}}$ produced by $\mathcal{X}_{\text{660}}$ and $\mathcal{X}_{\text{66D}}$, respectively. Finally, another point to highlight occurs at equation $\mathcal{X}_{\text{6D5}}$: as there are two possible jump addresses in the stack, and both can be used by the JUMP at the end of the block, we produce two inputs for the two possible jump addresses, $\mathcal{X}_{\text{954}}$ and $\mathcal{X}_{\text{142}}$, capturing the two possible branches from block 6D1 (see Figure 3). $\blacksquare$
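The way Example 8 derives successor equations at a jump can be illustrated with a short Python sketch. It is only an approximation under simplifying assumptions: an abstract state $\langle n,\sigma\rangle$ is modelled as a pair `(n, dict)` that keeps a single address per stack position, and the names `push_address` and `jump_successors` are hypothetical helpers, not part of the paper.

```python
from typing import Dict, List, Tuple

# Assumed model: an abstract state <n, sigma> is
# (stack size, {stack index: jump address}).
State = Tuple[int, Dict[int, int]]


def push_address(state: State, address: int) -> State:
    """Record a pushed jump address at the new top position s_n."""
    n, sigma = state
    new_sigma = dict(sigma)
    new_sigma[n] = address
    return n + 1, new_sigma


def jump_successors(pc: int, size: int, state: State,
                    conditional: bool) -> List[Tuple[int, State]]:
    """Return (target pc, state after the jump): one pair for JUMP, two for JUMPI."""
    n, sigma = state
    target = sigma[n - 1]              # address stored at the top of the stack
    popped = 2 if conditional else 1   # JUMPI also consumes the condition
    rest = {i: a for i, a in sigma.items() if i < n - popped}
    successors = [(target, (n - popped, rest))]
    if conditional:
        # Second equation: fall through to the next instruction.
        successors.append((pc + size, (n - popped, dict(rest))))
    return successors
```

Applied to the state before the JUMPI at 66D in Figure 6, stack size 13 with address 66F on top, the sketch yields one successor for 66F and one for 66E, both with stack size 11.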

Theorem 2.1 (Soundness of the addresses equation system)

Let $P\equiv b_{0},\dots,b_{p}$ be a program, $\mathcal{X}_{1},\dots,\mathcal{X}_{n}$ the solution of the jumps equations system of $P$, and $pc$ the program counter of a jump instruction. Then for any execution trace $t$ of $P$, there exists $s\in dom(\mathcal{X}_{pc})$ such that $\langle n,\sigma\rangle\in\mathcal{X}_{pc}(s)$ and $\sigma(s_{n-1})$ contains all jump addresses that instruction $b_{pc}$ jumps to in $t$.

We follow the next steps to prove this theorem:

  1. We first define an EVM collecting semantics for the operational semantics of Figure 4. This collecting semantics gathers all transitions that can be produced by the execution of a program $P$.

  2. We continue by defining the jumps-to property as a property of this collecting semantics.

  3. Then we prove Lemma 1 below, which states that the least solution of the addresses equation system generated from the EVM program as described in Definition 4 is a safe approximation of the EVM collecting semantics w.r.t. the jumps-to property.

  4. Finally, Theorem 2.1 follows trivially from Lemma 1.

Definition 5 (EVM collecting semantics)

Given an EVM program $P$, the EVM collecting semantics operator $\mathcal{C}_{P}$ is defined as follows:

\[\mathcal{C}_{P}(X)=\{\langle S,S'\rangle \mid \langle\_,S\rangle\in X\wedge S\Rightarrow S'\}\]

The EVM semantics is defined as $\xi_{P}=\bigcup_{n>0}\mathcal{C}_{P}^{n}(X_{0})$, where $X_{0}\equiv\{\langle 0,\langle 0,\sigma_{\emptyset}\rangle\rangle\}$ is the initial configuration.

Definition 6 (jumps-to property)

Let $P$ be an IR program, $\xi_{P}=\bigcup_{n>0}\mathcal{C}_{P}^{n}(X_{0})$, and $b$ an instruction at program point $pc$. We say that $\xi_{P}\vDash_{pc}T$ if $T=\{\langle n,\sigma\rangle \mid \langle S,S'\rangle\in\xi_{P}\wedge\langle pc,\langle n,\sigma\rangle\rangle\in S'\}$.
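Definitions 5 and 6 can be read operationally. The following Python sketch is only an illustration: it assumes a program state is a hashable pair `(pc, (n, sigma))` with `sigma` a frozenset of `(index, address)` pairs, that the transition relation $\Rightarrow$ is supplied as a hypothetical function `step`, and that the set of reachable states is finite so the loop terminates.

```python
from typing import Callable, FrozenSet, Iterable, Set, Tuple

# Assumed model of a program state S = <pc, <n, sigma>>.
AbstractStack = Tuple[int, FrozenSet[Tuple[int, int]]]
ProgState = Tuple[int, AbstractStack]
Transition = Tuple[ProgState, ProgState]


def collect(step: Callable[[ProgState], Iterable[ProgState]],
            initial: ProgState) -> Set[Transition]:
    """Accumulate every transition <S, S'> reachable from `initial` via `step`."""
    seen: Set[Transition] = set()
    frontier = {initial}
    while frontier:
        new_frontier = set()
        for s in frontier:
            for s2 in step(s):
                if (s, s2) not in seen:
                    seen.add((s, s2))
                    new_frontier.add(s2)
        frontier = new_frontier
    return seen


def jumps_to(transitions: Set[Transition], pc: int) -> Set[AbstractStack]:
    """Stacks <n, sigma> with which program point `pc` is reached (the set T)."""
    return {stack for (_, (target_pc, stack)) in transitions if target_pc == pc}
```

Here `collect` plays the role of iterating $\mathcal{C}_{P}$ from the initial configuration, and `jumps_to` extracts the set $T$ for a given program point.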

The following lemma states that the least solution of the constraint equation system of Definition 4 is a safe approximation of $\xi_{P}$:

Lemma 1

Let $P\equiv b_{0},\dots,b_{p}$ be a program, $pc$ a program point and $\mathcal{X}_{0},\dots,\mathcal{X}_{p}$ the least solution of the constraints equation system of Definition 4. The following holds:

If $\xi_{P}\vDash_{pc}T$, then for all $\langle n,\sigma\rangle\in T$ there exists $s\in dom(\mathcal{X}_{pc})$ such that $\langle n,\sigma\rangle\in\mathcal{X}_{pc}(s)$.

Proof

We use $\mathcal{X}_{pc}^{m}$ to refer to the value obtained for $\mathcal{X}_{pc}$ after $m$ iterations of the algorithm for solving the equation system depicted in Section 2. We say that $\mathcal{X}_{pc}$ covers $\langle n,\sigma\rangle$ in $\mathcal{C}_{P}^{m}(X_{0})$ at program point $pc$ when this lemma holds for the result of computing $\mathcal{C}_{P}^{m}(X_{0})$. To prove this lemma, we reason by induction on the value of $m$, the length of the traces $S_{0}\Rightarrow^{m}S_{m}$ considered in $\mathcal{C}_{P}^{m}(X_{0})$.

Base case: if $m=0$, $S_{0}=\langle 0,\langle 0,\sigma_{\emptyset}\rangle\rangle$ and the lemma trivially holds, as $\langle 0,\sigma_{\emptyset}\rangle\in\mathcal{X}_{0}^{0}(\langle 0,\sigma_{\emptyset}\rangle)$.

Induction Hypothesis: we assume Lemma 1 holds for all traces of length $m\geq 0$.

Inductive Case: Let us consider traces of length $m+1$, which are of the form $S_{0}\Rightarrow^{m}S_{m}\Rightarrow S_{m+1}$. $S_{m}$ is a program state of the form $S_{m}=\langle pc,\langle n,\sigma\rangle\rangle$. We can apply the induction hypothesis to $S_{m}$: there exists some $s\in dom(\mathcal{X}_{pc}^{m})$ such that $\langle n,\sigma\rangle\in\mathcal{X}_{pc}^{m}(s)$. To extend the lemma, we reason on every rule of the simplified EVM semantics (Fig. 4) that may be applied from $S_{m}$ to $S_{m+1}$:

  • Rule (1): After executing a JUMP instruction, $S_{m+1}$ is of the form $\langle\sigma(s_{n-1}),\langle n-1,\sigma\backslash[s_{n-1}]\rangle\rangle$. In iteration $m+1$, the following set of equations corresponding to $b_{pc}$ is evaluated:

    \[\mathcal{X}_{\sigma(s_{n-1})}\sqsupseteq\textit{idmap}(\lambda(b_{pc},\langle n',\sigma'\rangle))\qquad\forall s'\in dom(\mathcal{X}_{pc}),\ \langle n',\sigma'\rangle\in\mathcal{X}_{pc}(s')\]

    where $\textit{idmap}(\lambda(b_{pc},\langle n',\sigma'\rangle))=\pi_{\bot}[\langle n'-1,\sigma'\backslash[s_{n-1}]\rangle\mapsto\{\langle n'-1,\sigma'\backslash[s_{n-1}]\rangle\}]$ (Case (4) in Fig. 5). The induction hypothesis guarantees that there exists some $s''\in dom(\mathcal{X}_{pc}^{m})$ such that $\langle n,\sigma\rangle\in\mathcal{X}_{pc}^{m}(s'')$, where $S_{m}=\langle pc,\langle n,\sigma\rangle\rangle$. Therefore, at iteration $m+1$, the following must hold:

    \[\mathcal{X}_{\sigma(s_{n-1})}^{m+1}\sqsupseteq\pi_{\bot}[\langle n-1,\sigma\backslash[s_{n-1}]\rangle\mapsto\{\langle n-1,\sigma\backslash[s_{n-1}]\rangle\}]\]

    so $\langle n-1,\sigma\backslash[s_{n-1}]\rangle\in\mathcal{X}_{\sigma(s_{n-1})}^{m+1}(\langle n-1,\sigma\backslash[s_{n-1}]\rangle)$ and thus Lemma 1 holds.

  • Rules (2) and (3): After executing a JUMPI instruction, $S_{m+1}$ is either $\langle\sigma(s_{n-1}),\langle n-2,\sigma\backslash[s_{n-1},s_{n-2}]\rangle\rangle$ or $\langle pc+size(b_{pc}),\langle n-2,\sigma\backslash[s_{n-1},s_{n-2}]\rangle\rangle$, respectively. In either case the following sets of equations are evaluated:

    \[
    \begin{array}{rcll}
    \mathcal{X}_{\sigma(s_{n-1})} & \sqsupseteq & \textit{idmap}(\lambda(\texttt{JUMPI},\langle n',\sigma'\rangle)) & \forall s'\in dom(\mathcal{X}_{pc}),\ \langle n',\sigma'\rangle\in\mathcal{X}_{pc}(s')\\
    \mathcal{X}_{pc+1} & \sqsupseteq & \textit{idmap}(\lambda(\texttt{JUMPI},\langle n',\sigma'\rangle)) & \forall s'\in dom(\mathcal{X}_{pc}),\ \langle n',\sigma'\rangle\in\mathcal{X}_{pc}(s')
    \end{array}
    \]

    where $\textit{idmap}(\lambda(b_{pc},\langle n',\sigma'\rangle))=\pi_{\bot}[\langle n'-2,\sigma'\backslash[s_{n-1},s_{n-2}]\rangle\mapsto\{\langle n'-2,\sigma'\backslash[s_{n-1},s_{n-2}]\rangle\}]$ (Case (4) of the definition of the update function $\lambda$ in Fig. 5). As in the previous case, the induction hypothesis guarantees that at iteration $m$ there exists $s''\in dom(\mathcal{X}_{pc}^{m})$ such that $\langle n,\sigma\rangle\in\mathcal{X}_{pc}^{m}(s'')$. Therefore, in iteration $m+1$, the following must hold:

    \[
    \begin{array}{rcl}
    \mathcal{X}_{\sigma(s_{n-1})}^{m+1} & \sqsupseteq & \pi_{\bot}[\langle n-2,\sigma\backslash[s_{n-1},s_{n-2}]\rangle\mapsto\{\langle n-2,\sigma\backslash[s_{n-1},s_{n-2}]\rangle\}]\\
    \mathcal{X}_{pc+1}^{m+1} & \sqsupseteq & \pi_{\bot}[\langle n-2,\sigma\backslash[s_{n-1},s_{n-2}]\rangle\mapsto\{\langle n-2,\sigma\backslash[s_{n-1},s_{n-2}]\rangle\}]
    \end{array}
    \]

    and thus Lemma 1 holds for these cases as well.

  • Rules (4) - (12): We first consider the case in which any of these rules corresponds to an EVM instruction followed by an instruction different from JUMPDEST. All rules are similar, as they all use the set of equations generated by Case (4) in Definition 4. We examine Rule (4) in detail.

    After executing a PUSH$x$ $v$ instruction, $S_{m+1}$ is $\langle pc+size(b_{pc}),\langle n+1,\sigma[s_{n}\mapsto\{v\}]\rangle\rangle$. We have to prove that there exists some $s\in dom(\mathcal{X}_{pc+size(b_{pc})})$ such that $\langle n+1,\sigma[s_{n}\mapsto\{v\}]\rangle\in\mathcal{X}_{pc+size(b_{pc})}(s)$. The following set of equations is evaluated:

    \[\mathcal{X}_{pc+size(b_{pc})}\sqsupseteq\tau(\texttt{PUSH}x,\mathcal{X}_{pc})\qquad(1)\]

    By Definition 3, $\tau(\texttt{PUSH}x,\mathcal{X}_{pc})=\pi'$, where $\forall s'\in dom(\mathcal{X}_{pc}),\ \pi'(s')=\lambda(\texttt{PUSH}x,\mathcal{X}_{pc}(s'))$. By Case (1) of the definition of the update function $\lambda$, we have that:

    \[\forall\langle n'',\sigma''\rangle\in dom(\mathcal{X}_{pc}),\ \pi'(\langle n'',\sigma''\rangle)=\langle n''+1,\sigma''[s_{n}\mapsto\{v\}]\rangle\qquad(2)\]

    By the induction hypothesis, at iteration $m$ there exists some $s\in dom(\mathcal{X}^{m}_{pc})$ such that $\langle n,\sigma\rangle\in\mathcal{X}^{m}_{pc}(s)$. Therefore, by (1) and (2), at iteration $m+1$ the following holds:

    \[s\in dom(\mathcal{X}^{m+1}_{pc+size(b_{pc})})\ \text{and}\ \langle n+1,\sigma[s_{n}\mapsto\{v\}]\rangle\in\mathcal{X}^{m+1}_{pc+size(b_{pc})}(s)\]

    and thus Lemma 1 holds for Rule (4).

  • Rules (4) - (12), followed by a JUMPDEST instruction. After executing any of these instructions, $S_{m+1}$ is $\langle pc+size(b_{pc}),\langle n''',\sigma'''\rangle\rangle$, where $\langle n''',\sigma'''\rangle$ is obtained according to the corresponding rule of Figure 4. We have to prove that there exists some $s\in dom(\mathcal{X}_{pc+size(b_{pc})})$ such that $\langle n''',\sigma'''\rangle\in\mathcal{X}_{pc+size(b_{pc})}(s)$. The following set of equations is evaluated:

    \[\mathcal{X}_{pc+size(b_{pc})}\sqsupseteq\textit{idmap}(\lambda(b_{pc},\langle n',\sigma'\rangle))\qquad\forall s'\in dom(\mathcal{X}_{pc}),\ \langle n',\sigma'\rangle\in\mathcal{X}_{pc}(s')\qquad(3)\]

    where $\textit{idmap}(\lambda(b_{pc},\langle n',\sigma'\rangle))=\pi_{\bot}[\langle n'',\sigma''\rangle\mapsto\{\langle n'',\sigma''\rangle\}]$, and $n''$ and $\sigma''$ are obtained according to the cases of the update function detailed in Figure 5. We can see that $\langle n'',\sigma''\rangle$ matches the modification made to the state $S_{m+1}$ by the corresponding rule of the semantics. Therefore, at iteration $m+1$ there exists an $s=\langle n'',\sigma''\rangle$ such that $\langle n'',\sigma''\rangle\in\mathcal{X}_{pc+size(b_{pc})}^{m+1}(s)$, and Lemma 1 also holds.

When the algorithm stops, Lemma 1 holds, since for any $pc$ we have $\mathcal{X}_{pc}^{m+1}\sqsupseteq\mathcal{X}_{pc}^{m}$ at each iteration of the algorithm for solving the equation system of Section 2. $\square$
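The closing step of the proof relies on the fact that iterating the equations never loses information. A minimal Python sketch of such an iteration is given below. It is not the algorithm of Section 2: it assumes every variable is a plain set, every constraint $\mathcal{X}_{target}\sqsupseteq rhs$ is given as a hypothetical pair `(target, rhs)`, and the join is set union. It terminates only when the values cannot grow forever, which is what the bounded-stack assumption is meant to provide.

```python
from typing import Callable, Dict, Hashable, List, Set, Tuple

Env = Dict[str, Set[Hashable]]
Constraint = Tuple[str, Callable[[Env], Set[Hashable]]]


def solve(variables: List[str], constraints: List[Constraint]) -> Tuple[Env, int]:
    """Iterate X_target >= rhs(env) from the empty assignment until stable."""
    env: Env = {v: set() for v in variables}
    iterations = 0
    changed = True
    while changed:
        changed = False
        iterations += 1
        for target, rhs in constraints:
            contribution = rhs(env)
            if not contribution <= env[target]:
                env[target] |= contribution   # join: values only ever grow
                changed = True
    return env, iterations
```

Because each update is a union, the value of a variable after iteration $m+1$ always contains its value after iteration $m$, which mirrors the property $\mathcal{X}_{pc}^{m+1}\sqsupseteq\mathcal{X}_{pc}^{m}$ used above.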

3 Stack-Sensitive Control Flow Graph

At this point, by means of the solution of the addresses equation system, we compute the control flow graph of the program. To simplify the notation, given a block $B_{i}$, we define the function $getId(i,\langle n,\sigma\rangle)$, which receives the block identifier $i$ and an abstract stack $\langle n,\sigma\rangle$ and returns a unique identifier for the abstract stack $\langle n,\sigma\rangle\in dom(\mathcal{X}_{i})$. Similarly, $getStack(i,id)$ returns the abstract state $\langle n,\sigma\rangle$ that corresponds to the identifier $id$ of block $B_{i}$. In addition, we define the function $getSize(pc,id)$ that, given a program point $pc\in B_{i}$ and a unique identifier $id$ for $B_{i}$, returns the value $n'$ such that $\langle n,\sigma\rangle=getStack(i,id)$ and $\mathcal{X}_{pc}(\langle n,\sigma\rangle)=\langle n',\sigma'\rangle$.

Example 9

Given the equation:

\[\mathcal{X}_{\text{64B}}\sqsupseteq\{\underbrace{\langle 7,\{s_{5}\mapsto\text{954}\}\rangle}_{1}\mapsto\langle 7,\{s_{5}\mapsto\text{954}\}\rangle,\ \underbrace{\langle 3,\{s_{1}\mapsto\text{142}\}\rangle}_{2}\mapsto\langle 3,\{s_{1}\mapsto\text{142}\}\rangle\},\]

if we compute the functions $getId$ and $getSize$, we have that $getId(\text{64B},\langle 7,\{s_{5}\mapsto\text{954}\}\rangle)=1$ and $getId(\text{64B},\langle 3,\{s_{1}\mapsto\text{142}\}\rangle)=2$. Analogously, $getSize(\text{64B},1)=7$ and $getSize(\text{64B},2)=3$. $\blacksquare$
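The three helper functions can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the solution is assumed to be a dictionary from program points to mappings from entry stacks to a single resulting stack (as in Example 9), identifiers are assigned by the order of $dom(\mathcal{X}_{i})$ starting from 1, and the class `StackIds`, the helper `make_stack` and the `block_of` table are hypothetical names.

```python
from typing import Dict, FrozenSet, Tuple

AbstractStack = Tuple[int, FrozenSet[Tuple[int, int]]]     # <n, sigma>
Solution = Dict[int, Dict[AbstractStack, AbstractStack]]   # X_pc: entry -> stack at pc


def make_stack(n: int, sigma: Dict[int, int]) -> AbstractStack:
    """Build a hashable abstract stack from a size and an index -> address map."""
    return n, frozenset(sigma.items())


class StackIds:
    def __init__(self, solution: Solution, block_of: Dict[int, int]) -> None:
        self.solution = solution
        self.block_of = block_of   # pc -> identifier i of the block containing pc
        # Identifiers follow the insertion order of dom(X_i), starting from 1.
        self.stacks = {i: list(solution[i]) for i in set(block_of.values())}

    def get_id(self, i: int, stack: AbstractStack) -> int:
        return self.stacks[i].index(stack) + 1

    def get_stack(self, i: int, ident: int) -> AbstractStack:
        return self.stacks[i][ident - 1]

    def get_size(self, pc: int, ident: int) -> int:
        entry = self.get_stack(self.block_of[pc], ident)
        n_prime, _ = self.solution[pc][entry]
        return n_prime
```

With the two entries of $\mathcal{X}_{\text{64B}}$ from Example 9 loaded in that order, `get_id` returns 1 and 2 and `get_size(0x64B, ...)` returns 7 and 3, matching the values stated in the example.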

Definition 7 (stack-sensitive control flow graph)

Given an EVM program $P$, its blocks $B_{i}\equiv b_{i}\dots b_{j}\in blocks(P)$ and its flow analysis results provided by a set of variables of the form $\mathcal{X}_{pc}$ for all $pc\in P$, we define the control flow graph of $P$ as a directed graph $\textit{CFG}=\langle V,E\rangle$ with a set of vertices

\[V=\{B_{i:id} \mid B_{i}\in blocks(P)\wedge\langle n,\sigma\rangle\in dom(\mathcal{X}_{i})\wedge id=getId(i,\langle n,\sigma\rangle)\}\]

and a set of edges $E=E_{jump}\cup E_{next}$ such that:

\[
\begin{array}{rcll}
E_{jump} & = & \{B_{i:id}\to B_{d:id_{2}} \mid & b_{j}\in Jump\ \wedge\\
 & & & \langle n,\sigma\rangle\in dom(\mathcal{X}_{j})\wedge id=getId(i,\langle n,\sigma\rangle)\ \wedge\\
 & & & \langle n',\sigma'\rangle\in\mathcal{X}_{j}(\langle n,\sigma\rangle)\ \wedge\ d=\sigma'(s_{n'-1})\ \wedge\\
 & & & \langle n'',\sigma''\rangle=\lambda(b_{j},\langle n',\sigma'\rangle)\wedge id_{2}=getId(d,\langle n'',\sigma''\rangle)\ \}\\
E_{next} & = & \{B_{i:id}\to B_{d:id_{2}} \mid & b_{j}\neq\texttt{JUMP}\ \wedge\ b_{j}\notin End\ \wedge\\
 & & & \langle n,\sigma\rangle\in dom(\mathcal{X}_{j})\wedge id=getId(i,\langle n,\sigma\rangle)\ \wedge\\
 & & & \langle n',\sigma'\rangle\in\mathcal{X}_{j}(\langle n,\sigma\rangle)\wedge d=j+size(b_{j})\ \wedge\\
 & & & \langle n'',\sigma''\rangle=\lambda(b_{j},\langle n',\sigma'\rangle)\wedge id_{2}=getId(d,\langle n'',\sigma''\rangle)\ \}
\end{array}
\]

The first relevant point of the control flow graph (CFG) we produce is that, to build the set of vertices $V$, we replicate each block once for each different stack state with which it can be invoked. Analogously, the different entry stack states are also used to produce different edges, depending on their corresponding replicated blocks. Note that the definition distinguishes between two kinds of edges: (1) edges produced by JUMP or JUMPI instructions at the end of a block, whose destination is taken from the values stored in the stack state of the instruction before the jump, with $d=\sigma'(s_{n'-1})$; and (2) edges produced by continuations to the next instruction, whose destination is computed as $d=j+size(b_{j})$. For both kinds of edges, as blocks may be replicated, we apply function $\lambda$ and take the identifier of the resulting state to compute the $id$ of the destination: $\langle n'',\sigma''\rangle=\lambda(b_{j},\langle n',\sigma'\rangle)\wedge id_{2}=getId(d,\langle n'',\sigma''\rangle)$.
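The construction of Definition 7 can be sketched in Python as shown below. This is a simplified illustration, not the authors' tool: it assumes the solution maps each program point to a dictionary from entry stacks to lists of stacks, that each block is described by the hypothetical triple (pc of its last instruction, opcode name, instruction size), that the update function $\lambda$ is passed in as the parameter `update`, and that `END_OPS` is only an assumed subset of the set $End$.

```python
from typing import Callable, Dict, List, Set, Tuple

Stack = Tuple[int, Tuple[Tuple[int, int], ...]]   # <n, sigma>; sigma as (index, address) pairs
Solution = Dict[int, Dict[Stack, List[Stack]]]     # X_pc: entry stack -> stacks before b_pc
Block = Tuple[int, str, int]                       # (pc of last instruction, opcode, size)
Node = Tuple[int, int]                             # (block id, replica id)

END_OPS = ("STOP", "RETURN", "REVERT")             # assumed subset of the set End


def pop(st: Stack, k: int) -> Stack:
    """Drop the k topmost positions, as JUMP (k=1) and JUMPI (k=2) do."""
    n, sigma = st
    return n - k, tuple(p for p in sigma if p[0] < n - k)


def build_cfg(blocks: Dict[int, Block], sol: Solution,
              update: Callable[[str, Stack], Stack]
              ) -> Tuple[Set[Node], Set[Tuple[Node, Node]]]:
    order = {i: list(sol[i]) for i in blocks}      # fixes an enumeration of dom(X_i)

    def get_id(i: int, st: Stack) -> int:
        return order[i].index(st) + 1

    # One vertex per block and per entry stack of that block.
    vertices = {(i, get_id(i, st)) for i in blocks for st in sol[i]}
    edges: Set[Tuple[Node, Node]] = set()
    for i, (j, op, size) in blocks.items():
        if op in END_OPS:
            continue                               # terminating blocks have no successors
        for entry, reached in sol[j].items():
            src = (i, get_id(i, entry))
            for st in reached:
                n, sigma = st
                out = update(op, st)               # plays the role of lambda(b_j, <n', sigma'>)
                if op in ("JUMP", "JUMPI"):        # E_jump: destination read from the stack top
                    d = dict(sigma)[n - 1]
                    edges.add((src, (d, get_id(d, out))))
                if op != "JUMP":                   # E_next: fall through to the next instruction
                    d = j + size
                    edges.add((src, (d, get_id(d, out))))
    return vertices, edges
```

When a block is entered with two different stacks, it appears twice in the vertex set, and each replica receives its own outgoing edges according to the return address recorded in its stack.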

Example 10

Considering the blocks shown in Figure 3 and the equations shown in Figure 6, the CFG of the program includes non-replicated nodes for those blocks that only receive one possible stack state (white nodes in Figure 3). However, the nodes that can be reached with two different stack states (gray nodes in Figure 3) are replicated in the CFG:

\[
\begin{array}{rcl}
V & = \{B_{\text{941}},B_{\text{123}},B_{\text{954}},B_{\text{142}}, & B_{\text{64B:1}},B_{\text{653:1}},B_{\text{661:1}},B_{\text{66F:1}},B_{\text{6C3:1}},B_{\text{66E:1}},B_{\text{690:1}},B_{\text{683:1}},B_{\text{691:1}},B_{\text{6D0:1}},B_{\text{6D1:1}},\\
 & & B_{\text{64B:2}},B_{\text{653:2}},B_{\text{661:2}},B_{\text{66F:2}},B_{\text{6C3:2}},B_{\text{66E:2}},B_{\text{690:2}},B_{\text{683:2}},B_{\text{691:2}},B_{\text{6D0:2}},B_{\text{6D1:2}}\}
\end{array}
\]

Analogously, our CFG replicates the edges according to the replicated nodes (solid and dashed edges in Figure 3):

\[
\begin{array}{rl}
E = \{ & B_{\text{941}}\rightarrow B_{\text{64B:1}},\ B_{\text{64B:1}}\rightarrow B_{\text{653:1}},\ B_{\text{653:1}}\rightarrow B_{\text{661:1}},\ B_{\text{661:1}}\rightarrow B_{\text{66F:1}},\\
 & B_{\text{66F:1}}\rightarrow B_{\text{6C3:1}},\ B_{\text{6C3:1}}\rightarrow B_{\text{653:1}},\ B_{\text{66D:1}}\rightarrow B_{\text{66E:1}},\ B_{\text{66F:1}}\rightarrow B_{\text{690:1}},\\
 & B_{\text{66F:1}}\rightarrow B_{\text{683:1}},\ B_{\text{683:1}}\rightarrow B_{\text{691:1}},\ B_{\text{691:1}}\rightarrow B_{\text{6D1:1}},\ B_{\text{6D1:1}}\rightarrow B_{\text{954}},\\
 & B_{\text{123}}\dashrightarrow B_{\text{64B:2}},\ B_{\text{64B:2}}\dashrightarrow B_{\text{653:2}},\ B_{\text{653:2}}\dashrightarrow B_{\text{661:2}},\ B_{\text{661:2}}\dashrightarrow B_{\text{66F:2}},\\
 & B_{\text{66F:2}}\dashrightarrow B_{\text{6C3:2}},\ B_{\text{6C3:2}}\dashrightarrow B_{\text{653:2}},\ B_{\text{66D:2}}\dashrightarrow B_{\text{66E:2}},\ B_{\text{66F:2}}\dashrightarrow B_{\text{690:2}},\\
 & B_{\text{66F:2}}\dashrightarrow B_{\text{683:2}},\ B_{\text{683:2}}\dashrightarrow B_{\text{691:2}},\ B_{\text{691:2}}\dashrightarrow B_{\text{6D1:2}},\ B_{\text{6D1:2}}\dashrightarrow B_{\text{142}}\ \}
\end{array}
\]

Note that, in Figure 3, we distinguish dashed and solid edges only to highlight that there are two possible execution paths: if the call to findWinner comes from block $B_{\text{941}}$, it returns to block $B_{\text{954}}$ and, if the execution comes from a public invocation, i.e. block $B_{\text{123}}$, it returns to block $B_{\text{142}}$. $\blacksquare$

Theorem 3.1 (Soundness of the stack-sensitive control flow graph)

Let $P$ be an EVM program. If a stack-sensitive control flow graph CFG can be generated, then for any execution trace $t$ of $P$ there exists a directed walk that visits, in the same order, nodes in the CFG that correspond to replicas of the blocks executed in $t$.

Proof

We prove this theorem by induction on $m$, the length of the trace $t\equiv S_{0}\Rightarrow^{m}S_{m}$. We assume that a directed walk of the CFG is of the form $B_{0:0}\cdot\ldots\cdot B_{n:id_{n}}$, where $B_{0:0}$ is a replica of the block that contains the first instruction in the program, $b_{0}$.

Base case: if $m=0$, then $S_{0}=\langle 0,\langle 0,\sigma_{\emptyset}\rangle\rangle$ and the theorem trivially holds, as $b_{0}$ is the first instruction of block $B_{0}$.

Induction Hypothesis: we assume Theorem 3.1 holds for all traces of length $m$, for some $m\geq 0$.

Inductive Case: Let us consider a trace of length $m+1$, $t\equiv S_{0}\Rightarrow^{m}S_{m}\Rightarrow S_{m+1}$. $S_{m}$ is a program state of the form $S_{m}=\langle pc,\langle n_{m},\sigma_{m}\rangle\rangle$. We can apply the induction hypothesis to $S_{m}$: there exists a directed walk in CFG, $w\equiv B_{0:0}\cdot\ldots\cdot B_{j:id}$, that visits nodes corresponding to replicas of the blocks executed in $t$ in the same order, and $b_{pc}\in B_{j:id}$. There are two cases:

  • a)

    Instruction $b_{pc}$ is not the last instruction in $B_{j:id}$. By Definition 1, $b_{pc+size(b_{pc})}$ is also in $B_{j:id}$, and $b_{pc}\notin Jump$. The applicable rules of the semantics of Figure 4 are Rules (4) to (12). In all cases, $S_{m+1}=\langle pc+size(b_{pc}),\langle n_{m+1},\sigma_{m+1}\rangle\rangle$ and Theorem 3.1 holds, since the same directed walk $w$ already visits a replica of the node that contains the instruction executed in $S_{m+1}$.

  • b)

    Instruction $b_{pc}$ is the last instruction in block $B_{j:id}$. We reason on all possible instructions that can be the last instruction of a block:

    • $b_{pc}\notin Jump$. This case results from the application of Rules (4-12) of the semantics in Figure 4. Therefore, $S_{m+1}$ is of the form $S_{m+1}=\langle d,\langle n_{m+1},\sigma_{m+1}\rangle\rangle$, where $d=pc+size(b_{pc})$. Lemma 1 guarantees that there exists a stack state $\langle n,\sigma\rangle$ such that $\langle n_{m},\sigma_{m}\rangle\in\mathcal{X}_{pc}(\langle n,\sigma\rangle)$. By Definition 7, $E_{next}$ contains an edge of the form $B_{j:id}\to B_{d:id_{2}}$ for each element in $\mathcal{X}_{pc}(\langle n,\sigma\rangle)$. Therefore, there exists a directed walk $w'=w\cdot B_{d:id_{2}}$ that visits nodes in CFG corresponding to replicas of the blocks executed in $t$, and Theorem 3.1 holds.

    • $b_{pc}\in Jump$. This case results from the application of Rules (1-3) of the semantics in Figure 4.

      The application of Rules (1-2) corresponds to a jump in the code, and $S_{m+1}$ is of the form $S_{m+1}=\langle d_{m+1},\langle n_{m+1},\sigma_{m+1}\rangle\rangle$, where $d_{m+1}=\sigma_{m}(s_{n_{m}-1})$. Lemma 1 guarantees that there exists a stack state $\langle n,\sigma\rangle$ such that $\langle n_{m},\sigma_{m}\rangle\in\mathcal{X}_{pc}(\langle n,\sigma\rangle)$. By Definition 7, $E_{Jump}$ contains an edge $B_{j:id}\to B_{d:id_{2}}$ for each element in $\mathcal{X}_{pc}(\langle n,\sigma\rangle)$, such that $d=\sigma_{m}(s_{n_{m}-1})$ and $id_{2}$ is the replica identifier of block $d$ corresponding to $\lambda(b_{pc},\langle n_{m},\sigma_{m}\rangle)$. Therefore, $d=d_{m+1}$, and the directed walk $w'=w\cdot B_{d:id_{2}}$ visits nodes in CFG corresponding to replicas of the blocks executed in $t$, so Theorem 3.1 holds.

      The application of Rule (3) corresponds to a JUMPI instruction in the code that does not jump to its destination address. This case is handled exactly as the previous case in which $b_{pc}\notin Jump$, using the edges in $E_{next}$.
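
As a compact restatement of case b) (not an additional result), write $pc_{m+1}$ for the program counter of $S_{m+1}$; this symbol is introduced here only for convenience. When $b_{pc}$ is the last instruction of $B_{j:id}$, the case analysis above amounts to:

\[
pc_{m+1} =
\begin{cases}
\sigma_{m}(s_{n_{m}-1}) & \text{if Rules (1-2) apply, with the matching edge in } E_{Jump},\\
pc+size(b_{pc}) & \text{if Rules (3-12) apply, with the matching edge in } E_{next}.
\end{cases}
\]

In both cases Lemma 1 and Definition 7 provide the replica $B_{pc_{m+1}:id_{2}}$ that extends the walk $w$.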

$\square$
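
The property stated by Theorem 3.1 can be tested on concrete data. The sketch below is a minimal illustration under two assumptions: the execution trace has already been projected onto the sequence of block addresses it executes, and CFG nodes are written as strings of the form `block:replica`. The functions `block_of` and `find_walk` are hypothetical helpers, and the edge list is a fragment of the example set $E$ shown before the theorem.

```python
def block_of(node):
    """Block address of a CFG node, dropping the replica id ("653:1" -> "653")."""
    return node.split(":")[0]

def find_walk(trace_blocks, edges):
    """Return a directed walk whose nodes are replicas of `trace_blocks`,
    in the same order, or None if the graph contains no such walk."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)
    nodes = {n for edge in edges for n in edge}
    # walks[n]: one walk ending at node n that matches the trace prefix so far.
    walks = {n: [n] for n in nodes if block_of(n) == trace_blocks[0]}
    for block in trace_blocks[1:]:
        step = {}
        for n, walk in walks.items():
            for m in succ.get(n, ()):
                if block_of(m) == block:
                    step.setdefault(m, walk + [m])
        walks = step
    return next(iter(walks.values()), None)

# A fragment of the example edge set E: both replicas of the path
# 64B -> 653 -> 661 -> 66F -> 683 -> 691 -> 6D1, plus the loop through 6C3.
PATH = ["64B", "653", "661", "66F", "683", "691", "6D1"]
E = []
for caller, ret, rid in [("941", "954", 1), ("123", "142", 2)]:
    tagged = [f"{b}:{rid}" for b in PATH]
    E += [(caller, tagged[0]), (tagged[-1], ret)]
    E += list(zip(tagged, tagged[1:]))
    E += [(f"66F:{rid}", f"6C3:{rid}"), (f"6C3:{rid}", f"653:{rid}")]

# Block sequence of a run that enters from 941 and iterates the loop once.
trace = ["941", "64B", "653", "661", "66F", "6C3", "653", "661", "66F",
         "683", "691", "6D1", "954"]
print(find_walk(trace, E))
print(find_walk(trace[:-1] + ["142"], E))  # None
```

The first call finds a walk that stays within replica 1. The second call shows that a block sequence returning to the wrong caller has no corresponding walk, which illustrates the precision gained by cloning rather than the soundness claim itself.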

References

  • [1] The EthereumPot contract, 2017. https://etherscan.io/address/0x5a13caa82851342e14cd2ad0257707cddb8a31b7.
  • [2] A. V. Aho, M. S. Lam, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2nd edition, 2006.
  • [3] Lexi Brent, Anton Jurisevic, Michael Kong, Eric Liu, Francois Gauthier, Vincent Gramoli, Ralph Holz, and Bernhard Scholz. Vandal: A Scalable Security Analysis Framework for Smart Contracts, 2018. arXiv:1809.03981.
  • [4] Neville Grech, Lexi Brent, Bernhard Scholz, and Yannis Smaragdakis. Gigahorse: thorough, declarative decompilation of smart contracts. In Joanne M. Atlee, Tevfik Bultan, and Jon Whittle, editors, Proceedings of the 41st International Conference on Software Engineering, ICSE 2019, Montreal, QC, Canada, May 25-31, 2019, pages 1176–1186. IEEE / ACM, 2019.
  • [5] Neville Grech, Michael Kong, Anton Jurisevic, Lexi Brent, Bernhard Scholz, and Yannis Smaragdakis. MadMax: surviving out-of-gas conditions in Ethereum smart contracts. PACMPL, 2(OOPSLA):116:1–116:27, 2018.
  • [6] Gavin Wood. Ethereum: A secure decentralised generalised transaction ledger, 2014.