An Alternative Paradigm for Developing and Pricing Storage on Smart Contract Platforms

Christos Patsonakis Department of Informatics and Telecommunications
University of Athens
c.patswnakis@di.uoa.gr
   Mema Roussopoulos Department of Informatics and Telecommunications
University of Athens
mema@di.uoa.gr
Abstract

Smart contract platforms, the most notable of which is probably Ethereum, facilitate the development of important and diverse distributed applications (e.g., naming services and fungible tokens) in a simple manner. This simplicity stems from the inherent utility of employing the state of smart contracts to store, query and verify the validity of application data. In Ethereum, data storage incurs an underpriced, non-recurring, predefined fee. Furthermore, as there is no incentive for freeing or minimizing the state of smart contracts, Ethereum is faced with a tragedy of the commons problem with regards to its monotonically increasing state. This issue, if left unchecked, may lead to centralization and directly impact Ethereum’s security and longevity.

In this work, we introduce an alternative paradigm for developing smart contracts in which their state is of constant size and facilitates the verification of application data that are stored to and queried from an external, potentially unreliable, storage network. This approach is relevant for a wide range of applications, such as any key-value store. We evaluate our approach by adapting the most widely deployed standard for fungible tokens, i.e., the ERC20 token standard. We show that Ethereum’s current cost model penalizes our approach, even though it minimizes the overhead to Ethereum’s state and aligns well with Ethereum’s future. We address Ethereum’s monotonically increasing state in a two-fold manner. First, we introduce recurring fees that are proportional to the state of smart contracts and adjustable by the miners that maintain the network. Second, we propose a scheme where the cost of storage-related operations reflects the effort that miners have to expend to execute them. Lastly, we show that under such a pricing scheme that encourages economy in the state consumed by smart contracts, our ERC20 token adaptation reduces the incurred transaction fees by up to an order of magnitude.

I Introduction

Bitcoin ([1]) revolutionized the world of digital payments by allowing untrusted entities to transact securely without relying on trusted, third parties. Its operation is based on a distributed network of peers with open membership that maintains a highly replicated, auditable, append-only log of transactions, which is commonly referred to as a blockchain. A second generation of blockchains allows the development of smart contracts ([2]), i.e., digital agents that encode, execute and enforce arbitrary agreements. Smart contract platforms provide the means of developing diverse and important distributed applications (dApps) in a simple manner that, prior to their introduction, was challenging to implement.

Ethereum ([3]) is probably the most notable smart contract platform. Its live chain features dApps that implement naming services ([4]), multisignature wallets ([5]), a large variety of fungible tokens ([6]) and even crypto-collectibles ([7]), all in just a few lines of code. The simplicity of developing dApps on top of these platforms stems from the inherent utility of employing the state of smart contracts to store, query and verify the validity of application data. For instance, all implementations of the most widely deployed standard for fungible tokens, i.e., the ERC20 token standard ([8]), store each account’s token balance on the contract’s state.

Today, Ethereum’s cost model does not adequately take into account the amount of storage consumed by smart contracts. This is problematic for several reasons. First, in Ethereum, storing data on the state of smart contracts requires paying a single, non-recurring fee at the time the data is stored. Thus, regardless of the amount of state that they consume, contracts have zero maintenance costs and can be part of Ethereum’s state forever. Second, storage-related operations are underpriced, as stated by Ethereum’s creator, Vitalik Buterin, in one of his recent talks ([9]). These two factors favor contracts that gain utility from storing small amounts of data per user and have low computational complexity, such as ERC20 tokens. As a result, such contracts have very low transaction fees for their operations. Third and most importantly, Ethereum’s state must be maintained by all full nodes, yet there is no incentive mechanism in place for freeing storage. If left unchecked, this can have serious consequences. It will diminish the mining population as proportionally fewer and fewer miners will be able to contribute to the network. This will lead to centralization and may prohibit new nodes from joining and syncing to the network. This will have a direct impact on Ethereum’s security and, ultimately, its longevity.

In this work, we introduce an alternative paradigm for developing dApps on top of smart contract platforms by decoupling the issue of storage from verifying the validity of data. The former is handled by an external, potentially unreliable, storage network that allows efficient access to the application’s data. To verify the validity of data obtained from the storage network, we maintain cryptographic accumulators in the smart contract’s state. These are data structures that provide a constant-sized representation of a set of elements and allow for verifiable (non) membership proofs. To evaluate our approach, we present a case study of an accumulator-based implementation of the ERC20 token standard. We choose this standard because it is the most widely deployed token standard for fungible tokens, numbering over 130,000 compliant contracts on Ethereum’s live chain ([6]). Via minor modifications, our construction can be modified to fit other, upcoming standards, such as the ERC721 standard ([10]) for non-fungible tokens. However, we stress that our approach can be adapted to any application that requires a verifiable representation of its application data, e.g., naming services, voting systems or any kind of key-value store.

By requiring only minimal (constant-sized) state to be stored in the contract, our accumulator-based approach promotes diversity, scalability, and security of the Ethereum network. Yet, we show that under Ethereum’s current cost model, this accumulator-based approach is penalized for the security properties it provides; it is much more (almost prohibitively) costly than the approach of storing each account’s token balance in the contract state. This illustrates one of Ethereum’s main incentive misalignments. To address this, we revisit Ethereum’s storage cost model and propose modifications that: 1) price storage-related operations based on the effort that miners have to expend to execute them, 2) ensure that contracts pay recurring fees proportionate to the amount of storage they consume and the system’s overall capacity and, 3) free space consumed by unused/stale contracts. We show that under such a pricing scheme, our accumulator-based ERC20 token construction reduces the incurred transaction fees by up to an order of magnitude. With these modifications, we hope the Ethereum developer community will be encouraged to exercise economy in the state consumed by the smart contracts they develop.

II Ethereum

Ethereum is a blockchain-based, 32-byte word, global computer that allows the development of smart contracts, i.e., stateful agents that “live” in the blockchain and can execute arbitrary state transition functions. Smart contract code is written in a high-level, Turing-complete programming language (e.g., Solidity [11]), which is then compiled down to Ethereum Virtual Machine (EVM) initialization code. Contracts are deployed by wrapping their initialization code in a transaction, signing it and broadcasting it to the network. Users can interact with smart contracts by broadcasting appropriately formatted transactions. Smart contracts are “passive” entities that, as a result of a user’s transaction, can issue message calls, i.e., call functions of other contracts. Ethereum’s cryptocurrency is called ether and serves as a means to incentivize participants (miners) to engage in the protocol. Transaction fees are measured in a unit called gas and are a function of the byte size of a transaction and the complexity of the code it invokes (if any). Each transaction byte and EVM operation costs some predefined amount of gas ([3]). Each transaction specifies a gas price, i.e., the amount of ether paid per unit of gas, which influences the incentive of miners to include it in their next block. A transaction that consumes $g_{cost}$ gas and specifies a gas price of $g_{price}$ will cost $E = g_{cost} \times g_{price}$ units of ether. Lastly, transactions and message calls specify an upper bound on the amount of gas that they can consume. This protects miners from, e.g., getting stuck in an infinite loop, an issue that stems from Ethereum’s Turing-completeness.
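As a small worked instance of the fee formula above, the following Python sketch computes the fee in Wei with integer arithmetic. The function name is illustrative only. The 21,000 gas figure is the base cost of a plain ether transfer, and the gas price of 10 Gwei is an arbitrary assumption chosen for the example.

```python
WEI_PER_ETHER = 10**18  # 1 ether = 10^18 Wei
WEI_PER_GWEI = 10**9    # 1 Gwei = 10^9 Wei

def transaction_fee_wei(gas_used: int, gas_price_wei: int) -> int:
    """E = g_cost * g_price, with the gas price given in Wei per unit of gas."""
    return gas_used * gas_price_wei

# Assumed example values: a plain transfer (21,000 gas) at 10 Gwei per gas.
fee = transaction_fee_wei(21_000, 10 * WEI_PER_GWEI)
print(fee, "Wei =", fee / WEI_PER_ETHER, "ether")
```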

III Hash Tree Universal Accumulator

Cryptographic accumulators provide a constant representation of a set of elements and allow for verifiable membership queries. Universal accumulators also allow for verifiable non membership queries. Proving statements (e.g., element membership) is facilitated via values that are referred to as witnesses. Informally, the security property of accumulators states that an adversary is unable to generate a valid witness value for a false statement, except with negligible probability. For instance, an adversary is not able to generate a valid membership witness for an element $x$ that is not part of the accumulated set of elements $X$. It is common to refer to the party that maintains and manages the accumulator as the accumulator manager. In our accumulator-based ERC20 token, this role is played by the smart contract.

In the following, we provide a high level description of the hash tree, universal accumulator of Camacho et al. [12], whose security is based on collision-resistant hash functions. This accumulator employs a public data structure $m=(T,X)$ (referred to as memory), where $X=\{x_{1},\ldots,x_{n}\}$ is the set of accumulated elements and $T$ is a binary, balanced hash tree. The accumulator’s value (denoted as $Acc$) is the hash of $T$’s root node. Camacho et al. [12] model their accumulator as a tuple of the following algorithms:

  • $\mathsf{Setup(k)}$: On input the security parameter $k \in \mathbb{N}$, it outputs the accumulator’s initial value $Acc_{0} \in \{0,1\}^{k}$, which corresponds to the set $X=\emptyset$, and an initialized memory $m_{0}$.

  • $\mathsf{Witness(Acc, m, x)}$: This algorithm outputs a membership or a non membership witness $W$, if $x \in X$ or if $x \notin X$, respectively.

  • $\mathsf{Belongs(Acc, x, W)}$: This algorithm outputs $1$, if $W$ is a valid witness for $x \in X$, $0$, if $W$ is a valid witness for $x \notin X$, or $\perp$ otherwise.

  • $\mathsf{Update_{op}(Acc_{before}, m_{before}, x)}$: This algorithm updates the accumulator’s value by either adding ($\mathsf{op=add}$) or removing ($\mathsf{op=del}$) the element $x$ to/from the accumulated set $X$. It outputs the updated values of the accumulator ($\mathsf{Acc_{after}}$) and its memory ($\mathsf{m_{after}}$), as well as an update witness $\mathsf{W_{op}}$.

  • $\mathsf{CheckUpdate(Acc_{before}, Acc_{after}, x, W_{op})}$: This algorithm outputs $1$, if $\mathsf{W_{op}}$ is a valid witness for an update operation ($\mathsf{op \in \{add, del\}}$) pertaining to element $x$, which updated the accumulator’s value from $\mathsf{Acc_{before}}$ to $\mathsf{Acc_{after}}$; otherwise, it outputs $0$.

This accumulator is strong, i.e., it does not require a trusted setup nor a trusted accumulator manager. It allows for updates (additions and deletions) that can be performed without having access to secret information and are publicly verifiable. The latter is accomplished via the $\mathsf{CheckUpdate}$ algorithm which, on input a witness returned by $\mathsf{Update_{op}}$ and the accumulator’s values before and after the update, outputs $1$, if the update was performed honestly, and $0$, otherwise. In this accumulator, (non) membership and update witnesses are hash path(s) starting from some node(s) in $T$ (not necessarily leaf node(s)) that lead all the way up to the root node. Thus, their size is $\mathcal{O}(k\log_{2}(n))$, where $n=|X|$.
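To give some intuition for why hash-path witnesses have logarithmic size, the following Python sketch builds a plain Merkle tree and verifies a membership path against its root. It is a deliberately simplified stand-in and not the construction of Camacho et al. [12]: it supports membership only (no non membership or update witnesses), pads odd levels by duplicating the last node, and all function names are hypothetical.

```python
import hashlib

def leaf_hash(element: bytes) -> bytes:
    # Prefix 0x00 separates leaf hashes from internal-node hashes.
    return hashlib.sha256(b"\x00" + element).digest()

def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()

def build_levels(elements):
    """Return all tree levels for a non-empty list, leaves first, root last."""
    level = [leaf_hash(e) for e in elements]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:            # pad odd levels (simplification)
            level = level + [level[-1]]
            levels[-1] = level
        level = [node_hash(level[i], level[i + 1])
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def membership_witness(levels, index):
    """Hash path from leaf `index` to the root: one sibling per level."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        path.append((level[sibling], sibling < index))
        index //= 2
    return path

def verify_membership(root, element, path):
    """Recompute the root from the element and its hash path."""
    node = leaf_hash(element)
    for sibling, sibling_is_left in path:
        node = node_hash(sibling, node) if sibling_is_left else node_hash(node, sibling)
    return node == root

elements = [b"a", b"b", b"c", b"d", b"e"]
levels = build_levels(elements)
root = levels[-1][0]                       # plays the role of Acc
witness = membership_witness(levels, 4)
print(len(witness), verify_membership(root, b"e", witness))
```

The verifier only needs the root, the element and one sibling hash per tree level, which is what keeps the on-chain state constant while proofs grow with $\log_{2}(n)$.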

IV Accumulator-based ERC20 Token

The ERC20 token standard ([8]) describes the functions and events that facilitate the exchange of arbitrary crypto-assets. Each token holder’s account is associated with an Ethereum $\mathbf{address}$ data type. The token balance of each account is commonly represented as a $\mathbf{uint}$ data type, i.e., an unsigned integer. The ERC20 token interface is comprised of the following functions:

  1. $\mathbf{totalSupply()}$: Outputs the total supply of tokens across all accounts.

  2. $\mathbf{balanceOf(address\;owner)}$: Outputs the token balance of the input account.

  3. $\mathbf{approve(address\;spender, uint\;tokens)}$: The account that issues the call (transaction) to this function authorizes the “$\mathbf{spender}$” account to transfer the specified number of $\mathbf{tokens}$ from her account.

  4. $\mathbf{allowance(address\;owner, address\;spender)}$: Outputs the number of tokens that the spender’s account is $\mathbf{approve}$’d to transfer from the owner’s account.

  5. $\mathbf{transfer(address\;to, uint\;tokens)}$: The account that issues the call (transaction) to this function transfers the specified number of tokens to the “$\mathbf{to}$” account.

  6. $\mathbf{transferFrom(address\;from, address\;to, uint\;tokens)}$: Transfers the specified number of $\mathbf{tokens}$ from account “$\mathbf{from}$” to the $\mathbf{approve}$’d account “$\mathbf{to}$”.

To facilitate the aforementioned functionality, ERC20 compliant smart contracts store two mappings in their state: 1) balances, which maps account addresses to token balances and, 2) allowed, which maps account addresses to another mapping where, the latter, maintains the balance that each $\mathbf{approve}$’d account is allowed to transfer from the token owner’s account.

We now illustrate how we employ the hash tree, universal accumulator of Camacho et al. [12] (Section III), to realize an accumulator-based ERC20 token. The core idea is to replace each aforementioned mapping with one accumulator. We replace the balances mapping with an accumulator, balancesAcc, that accumulates (owner,tokens) tuples and allows clients to infer each account’s token balance. For the allowed mapping, which is a “double” mapping, we need two accumulators. The first accumulator, allowedAddressesAcc, accumulates (owner,spender) tuples and allows clients to infer the accounts that token owners have $\mathbf{approve}$’d. The second accumulator, allowedBalancesAcc, accumulates (spender,tokens) tuples and allows clients to infer the token balance that $\mathbf{approve}$’d accounts are allowed to transfer from the owner’s account. Thus, we have a constant-sized and verifiable representation of account balances and allowances.

Our design’s security depends solely on that of the smart contract platform (Ethereum in our case) and the accumulator scheme. This allows us to employ a variety of primitives to realize the storage network, whose concrete specification we leave as future work. For instance, even centralized cloud storage services are a viable option. However, we believe that the best approach is a distributed file storage system, especially one that has “bridges” with the Ethereum network. Some notable examples are Swarm ([13]), Storj ([14]) and IPFS ([15]). The storage network’s state is assumed to be comprised of the memory data structure (see Section III) of each of the aforementioned accumulators. As we show below, the interaction with accumulator-based ERC20 smart contracts requires the construction of (non) membership and update witnesses by the clients which, subsequently, are subject to verification by the smart contract. Clients construct these witnesses by interacting with the storage network. We stress that clients do not need to download the entire memory of accumulators to construct these witnesses. The data that needs to be transmitted from storage nodes to clients are hash paths from the appropriate accumulators’ hash trees, i.e., they are of logarithmic complexity. Thus, from hereon in, we assume that clients can efficiently construct the witness values that are required to realize the ERC20 token interface.

Accumulator-based ERC20 token smart contracts cannot implement the $\mathbf{balanceOf}$ and $\mathbf{allowance}$ functions since they do not store account balances and allowances in their state. Instead, clients are able to infer the information obtained by these functions by interacting with the storage network. To infer the balance $y$ of account $x$, clients construct and verify a membership witness that the tuple $(x,y)$ is accumulated in balancesAcc. To infer the allowance $z$ of a spender’s account $x_{2}$ from an owner’s account $x_{1}$, clients construct and verify two membership witnesses. First, a membership witness that the tuple $(x_{1},x_{2})$ is accumulated in allowedAddressesAcc, which proves that the token owner $x_{1}$ has allowed the spender account $x_{2}$ to transfer some tokens from her account. Second, a membership witness that the tuple $(x_{2},z)$ is accumulated in allowedBalancesAcc, which proves the number of tokens the spender is allowed to transfer from the token owner’s account.

An account $x_{1}$ with balance $y_{1}$ that wishes to $\mathbf{transfer}$ $z$ tokens ($y_{1} \geq z$) to an account $x_{2}$ with balance $y_{2}$ produces the following proofs. First, a membership witness for the tuple $(x_{1},y_{1})$ in balancesAcc, which proves the owner’s account balance. Second, a membership witness for the tuple $(x_{2},y_{2})$, which proves the balance of the destination account. Third, an update witness for the deletion of the tuple $(x_{1},y_{1})$ from balancesAcc. Fourth, an update witness for the deletion of the tuple $(x_{2},y_{2})$ from balancesAcc. Fifth, an update witness for the addition of the tuple $(x_{1},y_{1}-z)$ to balancesAcc. Sixth, an update witness for the addition of the tuple $(x_{2},y_{2}+z)$ to balancesAcc. Notice that the sequence of the involved updates reflects the transfer of $z$ tokens from $x_{1}$ to $x_{2}$.
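The order of the six checks can be sketched in Python as follows. To keep the example runnable, it uses a toy accumulator whose value is a hash over the sorted set and whose "witness" is simply the full set; this is an assumption made purely to exercise the control flow and is neither succinct nor the scheme of [12]. All function names and the layout of the proofs dictionary are hypothetical.

```python
import hashlib

def acc_value(elements):
    """Toy accumulator value: SHA-256 over the sorted set (stand-in only)."""
    digest = hashlib.sha256()
    for element in sorted(elements):
        digest.update(repr(element).encode())
    return digest.digest()

def belongs(acc, x, witness):
    """Toy Belongs: the witness is the whole set, unlike the hash paths of [12]."""
    if acc_value(witness) != acc:
        return None                      # plays the role of the "bottom" output
    return 1 if x in witness else 0

def check_update(acc_before, acc_after, x, witness, op):
    """Toy CheckUpdate: the witness is the set before the update."""
    if acc_value(witness) != acc_before:
        return 0
    if op == "add" and x not in witness:
        return int(acc_value(witness | {x}) == acc_after)
    if op == "del" and x in witness:
        return int(acc_value(witness - {x}) == acc_after)
    return 0

def update_steps(x1, y1, x2, y2, z):
    return [("del", (x1, y1)), ("del", (x2, y2)),
            ("add", (x1, y1 - z)), ("add", (x2, y2 + z))]

def build_transfer_proofs(state, x1, y1, x2, y2, z):
    """Client side: derive the proofs from the (off-chain) memory."""
    state = frozenset(state)
    proofs = {"mem_sender": state, "mem_receiver": state, "updates": []}
    for op, element in update_steps(x1, y1, x2, y2, z):
        after = state | {element} if op == "add" else state - {element}
        proofs["updates"].append((acc_value(after), state))
        state = after
    return proofs, state

def transfer(acc, x1, y1, x2, y2, z, proofs):
    """Contract side: run the six checks in order, return the new balancesAcc."""
    if z > y1:
        raise ValueError("insufficient balance")
    if belongs(acc, (x1, y1), proofs["mem_sender"]) != 1:
        raise ValueError("invalid sender balance proof")
    if belongs(acc, (x2, y2), proofs["mem_receiver"]) != 1:
        raise ValueError("invalid receiver balance proof")
    for (op, element), (acc_after, witness) in zip(
            update_steps(x1, y1, x2, y2, z), proofs["updates"]):
        if check_update(acc, acc_after, element, witness, op) != 1:
            raise ValueError("invalid update witness")
        acc = acc_after
    return acc

state = {("alice", 10), ("bob", 5)}
proofs, new_state = build_transfer_proofs(state, "alice", 10, "bob", 5, 3)
new_acc = transfer(acc_value(state), "alice", 10, "bob", 5, 3, proofs)
print(new_acc == acc_value(new_state))
```

Note that the contract-side function only ever holds a single accumulator value; everything else arrives as transaction input.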

Due to space limitations, we are unable to describe how we realize the $\mathbf{approve}$ and $\mathbf{transferFrom}$ operations. To provide insight with regards to their complexity, we mention the proofs that are involved in each operation. The $\mathbf{approve}$ operation involves two update witnesses and either one non membership, or, one membership witness, depending on whether the token owner approves the spender’s account for the first time or not, respectively. The $\mathbf{transferFrom}$ operation involves four membership witnesses and six update witnesses. Thus, $\mathbf{transferFrom}$ is the most expensive operation, followed by $\mathbf{transfer}$ and, lastly, by $\mathbf{approve}$.

V Evaluation

In this section, we evaluate our accumulator-based ERC20 token construction. We ran our experiments on a private blockchain that is maintained by a single mining node. We use the latest, stable version of geth (v1.8.17, [16]), Ethereum’s official client, that was available at the time of this writing. We conducted our experiments via the truffle suite (v4.1.13, [17]) that employs solc-js (v0.4.24, [18]) to compile smart contracts with optimizations enabled.

Figure 1: Gas cost of the $\mathbf{transfer}$, $\mathbf{approve}$ and $\mathbf{transferFrom}$ operations of our accumulator-based ERC20 token construction for up to a total of 400,000 accounts and 400,000 approvals.

Figure 1 illustrates the gas cost of the $\mathbf{transfer}$, $\mathbf{approve}$ and $\mathbf{transferFrom}$ operations of our accumulator-based ERC20 token for up to a total of 400,000 accounts and 400,000 approvals. Results illustrate that transaction gas costs scale logarithmically, which is expected (same holds for the $\mathbf{approve}$ operation). Recall that all involved proofs are hash path(s) starting from some node(s) (not necessarily leaf node(s)) in the accumulator’s tree. Thus, their size and verification cost varies based on the position of those nodes in the tree. Our construction’s operations consume a large portion of the block’s limit which is, currently, about 8 million gas ([19]). In the following, we discuss a series of improvements that will diminish the cost of our construction’s operations.

The security property of the accumulator of Camacho et al. [12] is based on the presupposition that, prior to an invocation of $\mathsf{CheckUpdate}$ for the deletion or addition of some element $x$, $x \in X$ or $x \notin X$, respectively. Thus, prior to, e.g., verifying the addition of some element $x$ via the $\mathsf{CheckUpdate}$ algorithm, we have to make sure, via a non membership witness verification, that $x \notin X$. Part of an ongoing project is to provide a proof extension that will allow us to lift this assumption from the accumulator’s security property. Consequently, we will be able to eliminate one, two and four invocations of the accumulator’s $\mathsf{Belongs}$ algorithm from the $\mathbf{approve}$, $\mathbf{transfer}$ and $\mathbf{transferFrom}$ operations of the accumulator-based ERC20 token, respectively. Note that one invocation of $\mathsf{Belongs}$ costs, on average, 289,873.23 gas when $|X|=400{,}000$, so eliminating these invocations will provide a substantial improvement.

Our implementation of the hash tree accumulator employs the SHA-256 hash function, which is exposed as a precompiled contract in Ethereum. Precompiled contracts reside on well-known, static addresses and constitute Ethereum’s “standard library”, similar to that of common programming languages. The advantage of precompiled contracts is that their computation incurs low gas costs because their code is executed natively by the miner’s client. The computational cost of the SHA-256 hash function is 60 gas, plus 12 gas per input word (rounded up), and its implementation complies with the NIST standard. In contrast, the KECCAK-256 hash function, whose computational cost is 30 gas, plus 6 gas per input word (rounded up), does not comply with the NIST standard and is, instead, implemented as an EVM opcode. Moreover, precompiled contracts, at each invocation, incur the extra gas cost of a message call, which is 700 gas. That is not the case for EVM opcodes. Thus, Ethereum promotes the use of a hash function that is not standard-compliant. Recently, a proposal has been submitted ([20]) that suggests the removal of the message call gas cost for precompiled contracts, which we believe is fair. Furthermore, we believe that the gas cost of these hash functions should be equalized. Assuming that both of the aforementioned suggestions are applied, the gas cost of the hashing operations will be reduced by $93.69\%$ and, as a result, will further diminish the gas cost of the accumulator-based ERC20 token operations.
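The per-call arithmetic behind this comparison can be sketched in Python using the gas constants quoted above. The sketch assumes that “equalized” means SHA-256 would be charged at the KECCAK-256 rate with no message call overhead, and it uses a 64-byte input (two 32-byte child hashes) as an illustrative case. The saving depends on the input length, so a single input size is not expected to reproduce the aggregate figure reported above.

```python
def words(num_bytes: int) -> int:
    """Number of 32-byte words, rounded up."""
    return (num_bytes + 31) // 32

def sha256_gas(num_bytes: int, call_overhead: int = 700) -> int:
    """Current cost: precompile pricing plus the message call overhead."""
    return 60 + 12 * words(num_bytes) + call_overhead

def keccak256_gas(num_bytes: int) -> int:
    """Current cost of the KECCAK-256 opcode."""
    return 30 + 6 * words(num_bytes)

def reduction(num_bytes: int) -> float:
    """Relative saving if SHA-256 were priced like KECCAK-256 (assumption)."""
    return 1 - keccak256_gas(num_bytes) / sha256_gas(num_bytes)

# Assumed input: two concatenated 32-byte hashes.
print(sha256_gas(64), keccak256_gas(64), round(100 * reduction(64), 2))
```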

To illustrate the overhead of our accumulator-based ERC20 token construction, we implemented a “bare-bones” ERC20 token (where account balances and allowances are stored in the contract’s state [21]) and repeated the same experiment. We measure an average cost of 33,193.12, 42,465.23 and 23,798.35 gas for the $\mathbf{transfer}$, $\mathbf{approve}$ and $\mathbf{transferFrom}$ operations, respectively. Thus, our accumulator-based construction is much more expensive, despite its constant and minimal space overhead on Ethereum’s state. The large discrepancy between the gas cost of the two constructions’ operations, as well as the small and static gas cost of the bare-bones ERC20 token operations, are a by-product of Ethereum’s flat cost model. The fact that storage-related operations are underpriced ([9]) and that contracts do not pay a recurring fee proportional to the size of their state is one of Ethereum’s main incentive misalignments. This issue, if left unchecked, will have severe consequences to the future of, not only Ethereum, but any smart contract platform that employs a flat cost model. Next, we propose modifications to Ethereum’s cost model to deal with this issue.

VI Revisiting Ethereum’s Storage Cost Model

Ethereum employs a flat cost model to price all EVM opcodes ([3]), including storage-related operations. There are two main issues with this approach. First, storing data on the state of smart contracts incurs a one time fee which is underpriced ([9]). To our knowledge, there is no other, real world system that provides such high levels of data replication and availability without a recurring fee that is proportional to the volume of the stored data. Furthermore, as there is no incentive for freeing storage, Ethereum is faced with a tragedy of the commons problem with regards to the monotonically increasing size of its state. Second, Ethereum’s flat cost model does not account for the complexity of executing storage-related operations, which is a function of the size of the state of smart contracts. We propose the following modifications to Ethereum’s pricing of storage to address these issues.

Recurring Storage Fees: The concept of introducing “storage rent”, i.e., a recurring fee that smart contracts have to pay based on the amount of storage they consume has been discussed over the years. Buterin’s original proposal ([22]) has spurred a lot of discussion and has led to the publication of several articles (e.g., [23, 24, 25]) which, in their vast majority, stress how important such a mechanism is for the longevity of public blockchains. An additional use of the rent mechanism is to clean up Ethereum’s state from accounts (contracts are accounts as well) that are not being used anymore.

Our proposal on the subject of storage rent is based on the following points. First, we believe that rent fees should not be rewarded to anyone as that could introduce new attack vectors. Second, since Ethereum is a global computer, it is rational to assume that it has a predefined storage capacity $S_{max}$ (e.g., Buterin has suggested 500 GB [26]). Naturally, this is a conceptual upper bound on the state’s size and will, essentially, reflect an estimate of what is considered reasonable for the average miner. Third, $S_{max}$ should be adjustable by the ones that maintain the network, i.e., the miners, to account for real-world storage trends. This could be achieved via a mechanism similar to the one that is already in use for adjusting block difficulty. We propose that up to a low utilization percentage of the system’s state, e.g., $U_{low}=25\%$, the rent per storage key of a contract’s state should be static to reflect the low burden imposed on miners. When the state’s utilization is between $U_{low}$ and, e.g., $U_{high}=80\%$, the rent per storage key of a contract’s state should increase logarithmically with the total number of keys in the system’s state. This reflects the fact that Ethereum’s state is organized on top of LevelDB, whose complexity we elaborate on in the following paragraph. Beyond $U_{high}$, rent fees should be prohibitive; thus, they should scale linearly with the total number of keys in the system’s state. To derive a base rent fee per storage key, we considered real-world examples of systems that are highly replicated, available and charge for storage. Cloud storage providers are a prime example. For instance, Amazon’s EFS ([27]) charges 0.30 USD per GB per month. At the time of this writing, one unit of Ether corresponds to 202.18 USD ([28]). Based on this analogy, we compute a base rent fee of $R_{base}=530{,}657{,}634.8$ Wei per storage key per year (1 Ether corresponds to $10^{18}$ Wei). Thus, we have an adaptable scheme for computing rent fees that follows the laws of supply and demand by considering the state’s overall utilization and the burden imposed on miners.
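The base rent figure can be retraced with a few lines of Python. The prices are the ones quoted above; the two unit conventions (32 bytes charged per storage key and 1 GB taken as $2^{30}$ bytes) are assumptions that are not stated in the text and are used here because they lead to the reported value of $R_{base}$.

```python
USD_PER_GB_MONTH = 0.30       # Amazon EFS price quoted in the text
USD_PER_ETHER = 202.18        # exchange rate quoted in the text
WEI_PER_ETHER = 10**18
BYTES_PER_GB = 2**30          # assumption: binary gigabyte
BYTES_PER_STORAGE_KEY = 32    # assumption: one 32-byte word per key

def base_rent_wei_per_key_year() -> float:
    """Yearly cloud-storage price of one storage key, expressed in Wei."""
    usd_per_gb_year = USD_PER_GB_MONTH * 12
    wei_per_gb_year = usd_per_gb_year / USD_PER_ETHER * WEI_PER_ETHER
    return wei_per_gb_year * BYTES_PER_STORAGE_KEY / BYTES_PER_GB

print(round(base_rent_wei_per_key_year(), 1))
```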

Scaling Storage Costs: A contract’s state is organized on an on-disk Merkle Patricia (MP) trie ([29]), which is referred to as the storage trie. This is a modified version of a typical radix tree with the added property of Merkle trees, i.e., the root hash uniquely identifies the (key,value) pairs in the tree. The nodes of the storage trie and the smart contract’s state (storage keys) are stored in a LevelDB ([30]) key-value store, whose underlying data structure is a multi-level Log Structured Merge (LSM) tree. As illustrated in a recent study ([31]), due to Ethereum’s authenticated storage (MP trie), one Ethereum read (e.g., reading the root node of a contract’s storage trie) can lead to 64 LevelDB $\mathsf{get()}$ (read) requests. Each $\mathsf{get()}$ may internally involve multiple disk reads due to the large amount of metadata that LevelDB maintains ([32]). Updates to a contract’s storage, e.g., adding/updating storage keys, result in updates to its storage trie that have to be committed on disk. In LevelDB, key-value updates are reinserted into a skip list with a monotonically increasing sequence number along with a “tombstone” flag that invalidates the pair’s prior version. To maintain key-value pairs in sorted order, LevelDB uses a compaction method. This process involves multiple merge sorts (one per LSM tree level) and incurs a write amplification factor, which is the ratio of the amount of data written to the amount of data requested for writing by users, of $\times 11$ ([32]).

Figure 2: Gas cost of the $\mathbf{transfer}$ (2(a)), $\mathbf{transferFrom}$ (2(b)) and $\mathbf{approve}$ (2(c)) operations of the ERC20 token and the accumulator-based ERC20 token for up to a total of 400,000 accounts and 400,000 approvals under the new storage cost model.

Ethereum’s flat cost model does not reflect the aforementioned complexity of storage-related operations. One might assume that an ideal scheme would scale the cost of these operations based on the number of incurred disk operations. However, this is not possible as Ethereum miners do not have a shared hardware configuration, e.g., their physical hard disks and their caches vary significantly. This would interfere with Ethereum’s consensus as the execution of the same transaction would lead to different gas costs across different miners. Instead, we propose a scheme where the cost of storage-related operations is computed on a per transaction basis and scales according to the number of operations to LevelDB’s LSM tree, which is the same across all miners. Fetching one key from an LSM tree involves two binary searches ([33]). Accessing the value of a smart contract’s storage key involves, at minimum, fetching one node of its storage trie and, subsequently, fetching the storage key itself. Thus, it requires a total of four binary searches, i.e., $4\log_{2}(n)$ accesses, where $n$ is the number of storage keys. Updating, or adding, a storage key involves the same number of accesses to infer the value of the tombstone flag. However, since updates are propagated to all levels of LevelDB’s LSM tree during its compaction process, they are subject to LevelDB’s write amplification factor, which we discussed above. Thus, updates incur a total of $11 \times 4\log_{2}(n) = 44\log_{2}(n)$ operations. Currently, reading, storing and updating storage keys costs 200, 20,000 and 5,000 gas, respectively. Thus, under our proposed scheme, the cost of, e.g., reading a storage key is $200 \times 4\log_{2}(n)$ gas.
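A minimal Python sketch of this scaling rule is given below. The read formula follows the text directly. Applying the same pattern to writes, i.e., multiplying the current flat write cost by $44\log_{2}(n)$, is an extrapolation assumed here for illustration, and the function names are hypothetical.

```python
import math

GAS_READ, GAS_STORE, GAS_UPDATE = 200, 20_000, 5_000  # current flat costs
ACCESSES_PER_LOOKUP = 4      # four binary searches per storage-key access
WRITE_AMPLIFICATION = 11     # LevelDB write amplification factor

def lsm_operations(n: int, write: bool) -> float:
    """4*log2(n) operations for reads, 44*log2(n) for writes (n >= 2 keys)."""
    operations = ACCESSES_PER_LOOKUP * math.log2(n)
    return WRITE_AMPLIFICATION * operations if write else operations

def scaled_read_gas(n: int) -> float:
    """Cost of reading a storage key: 200 * 4*log2(n), as stated in the text."""
    return GAS_READ * lsm_operations(n, write=False)

def scaled_write_gas(n: int, new_key: bool) -> float:
    """Assumed analogue for writes: flat write cost times 44*log2(n)."""
    flat = GAS_STORE if new_key else GAS_UPDATE
    return flat * lsm_operations(n, write=True)

# Example with n = 1024 storage keys, so that log2(n) = 10.
print(scaled_read_gas(1024), scaled_write_gas(1024, new_key=False))
```

Because $n$ is derived from the state itself, every miner computes the same cost for the same transaction, which is the property the hardware-based alternative lacks.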

Figure 2 illustrates the gas cost of the $\mathbf{transfer}$, $\mathbf{transferFrom}$ and $\mathbf{approve}$ operations of the bare-bones and our accumulator-based ERC20 token under our proposed cost model. Regarding the bare-bones ERC20 token, we only plot the storage-related cost of its operations, which is the dominant factor. The biggest discrepancy is in the $\mathbf{approve}$ operation (Figure 2(c)), where our accumulator-based construction provides an order of magnitude improvement. Overall, the results illustrate that, under a pricing scheme that reflects the effort that miners have to expend to execute storage-related operations, the programming paradigm that we propose in this work provides reduced gas costs across all ERC20 token operations. Nevertheless, we believe that the most important property of our approach is that it aligns well with the future of smart contract platforms, since it incurs constant storage overhead to miners.

VII Conclusion

We introduce an alternative programming paradigm for developing dApps that promotes diversity and scalability and aligns well with the future of smart contract platforms. Our approach can be adapted to any application that requires a verifiable representation of its application data. We propose a scheme for computing rent fees that follows the laws of supply and demand by considering the state’s overall utilization, as well as the burden imposed on miners. In addition, our scheme is adjustable to real-world storage trends. We introduce scaling of the cost of storage-related operations to account for the effort that miners have to expend to execute them. Lastly, we show that under such a pricing scheme, which encourages economy in the state consumed by smart contracts, our ERC20 token adaptation reduces the incurred transaction fees by up to an order of magnitude.

References

  • [1] S. Nakamoto, “Bitcoin: A peer-to-peer electronic cash system,” http://bitcoin.org/bitcoin.pdf.
  • [2] N. Szabo, “Smart contracts: Building blocks for digital markets,” https://tinyurl.com/ycdbqu9a.
  • [3] G. Wood, “Ethereum yellow paper,” https://tinyurl.com/yaptyawg.
  • [4] “Ethereum name service,” https://ens.domains/.
  • [5] “Consensys: Ethereum multisigwallet,” https://github.com/ConsenSys/MultiSigWallet.
  • [6] “Erc20 token market capitalization,” https://etherscan.io/tokens.
  • [7] “Cryptokitties,” https://www.cryptokitties.co/.
  • [8] “Eip20 - erc20 token standard,” https://tinyurl.com/ycd8mzb3.
  • [9] V. Buterin, “Transaction fee economics,” https://www.youtube.com/watch?v=7vuTtvshR34&t=1213s, August 2018.
  • [10] “Erc721 - a class of unique tokens,” http://erc721.org/.
  • [11] “Solidity,” https://solidity.readthedocs.io/en/v0.4.24/.
  • [12] P. Camacho, A. Hevia, M. A. Kiwi, and R. Opazo, “Strong accumulators from collision-resistant hashing,” in Information Security, 11th International Conference, ISC 2008.
  • [13] “Swarm,” https://tinyurl.com/y7fz8q3u.
  • [14] “Storj - decentralized cloud object storage that is affordable, easy to use, private, and secure,” https://storj.io/.
  • [15] “Ipfs is the distributed web,” https://ipfs.io/.
  • [16] “Go ethereum - releases,” https://tinyurl.com/m8k5gor.
  • [17] “Truffle suite,” https://truffleframework.com/.
  • [18] “solc-js,” https://github.com/ethereum/solc-js.
  • [19] “Etherscan - ethereum average gaslimit chart,” https://tinyurl.com/yaokfvl2.
  • [20] J. Baylina, “Eip 1109: Precompiledcall opcode,” https://tinyurl.com/yckxjogx, May 2018.
  • [21] “Ethereum wiki - erc20 token standard,” https://tinyurl.com/yd9fnw9q.
  • [22] “Eip 103: Blockchain rent,” https://tinyurl.com/yc3uc4ak.
  • [23] “Ethereum’s vitalik buterin wants to create annual ‘rent’ fees,” https://tinyurl.com/yal56med, July 2018.
  • [24] “Vitalik wants you to pay to slow ethereum’s growth,” https://tinyurl.com/y9gj8zvz, March 2018.
  • [25] “Eip 1418 blockchain rent: fixed cost per word-block,” https://github.com/ethereum/EIPs/issues/1418.
  • [26] “A simple and principled way to compute rent fees,” https://tinyurl.com/y9vv6w59, March 2018.
  • [27] “Amazon elastic file system,” https://aws.amazon.com/efs/.
  • [28] “Ethereum price chart us dollar (eth/usd),” https://tinyurl.com/jxsjqqd.
  • [29] “Ethereum - merkle patricia tree,” https://tinyurl.com/zl2z4m8.
  • [30] “Google: Leveldb,” https://github.com/google/leveldb.
  • [31] P. Raju, S. Ponnapalli, E. Kaminsky, G. Oved, Z. Keener, V. Chidambaram, and I. Abraham, “mlsm: Making authenticated storage faster in ethereum,” in 10th USENIX Workshop on Hot Topics in Storage and File Systems, HotStorage 2018.
  • [32] X. Wu, Y. Xu, Z. Shao, and S. Jiang, “Lsm-trie: An lsm-tree-based ultra-large key-value store for small data items,” in USENIX Annual Technical Conference (USENIX ATC 2015).
  • [33] P. Raju, R. Kadekodi, V. Chidambaram, and I. Abraham, “Pebblesdb: Building key-value stores using fragmented log-structured merge trees,” in Proceedings of the 26th Symposium on Operating Systems Principles, ser. SOSP ’17.