# Solidity Internals
In this tutorial, we study the low-level details of Solidity internals: how Solidity realizes its high-level language constructs on top of the EVM that we studied last week.
# Dispatching Calls
>>> s = Storage.deploy()
Transaction sent: 0xed56633bc73ce0068b871ada77cb7bf1c8ccf418df278752b3d7b75cd1481113
Gas price: 0.0 gwei Gas limit: 12000000 Nonce: 657
Storage.constructor confirmed Block: 666 Gas used: 90551 (0.75%)
Storage deployed at: 0xb6dbB5627379c0094408B3dbfE1f802E54b1c68A
>>> code = web3.eth.get_code('0xb6dbB5627379c0094408B3dbfE1f802E54b1c68A')
>>> print(evm_disasm(code))
...
You might also find the pseudocode generated by the ethervm.io website or the panoramix decompiler useful.
We will walk through the entire bytecode of the Storage contract.
We'd like to decipher two magic values, which we call function selectors, namely 0x6057361d and 0x2e64cec1.
In fact, as a low-level bytecode, the EVM does not provide high-level semantic constructs like if/else, and it doesn't even have a notion of a function, much like assembly for C/C++.
To provide an interface for external calls, Ethereum defines an ABI specification that standard smart contracts should follow. The first four bytes of msg.data contain a selector that identifies the function the caller intends to invoke on the smart contract. The selector of each function is the first four bytes of the Keccak-256 hash of the function signature, that is, the function name followed by the parenthesized list of its parameter types.
>>> web3.keccak(text="store(uint256)")[:4]
'0x6057361d'
>>> web3.keccak(text="retrieve()")[:4]
'0x2e64cec1'
Similarly, in the brownie console, you can check these selectors and the ABI as follows:
>>> s.store.signature
'0x6057361d'
>>> s.retrieve.signature
'0x2e64cec1'
>>> s.store.abi
{
'inputs': [
{
'internalType': "uint256",
'name': "num",
'type': "uint256"
}
],
'name': "store",
'outputs': [],
'stateMutability': "nonpayable",
'type': "function"
}
>>> s.retrieve.abi
{
'inputs': [],
'name': "retrieve",
'outputs': [
{
'internalType': "uint256",
'name': "",
'type': "uint256"
}
],
'stateMutability': "view",
'type': "function"
}
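To make the role of the selectors concrete, here is a minimal Python sketch of what the dispatcher at the top of the bytecode does conceptually: it compares the first four bytes of the calldata against each known selector and runs the matching body. Only the two selectors come from the output above; the StorageModel class, its number field, and the error messages are hypothetical stand-ins for illustration.

```python
# Selectors as shown in the console output above.
SEL_STORE = bytes.fromhex("6057361d")     # store(uint256)
SEL_RETRIEVE = bytes.fromhex("2e64cec1")  # retrieve()


class StorageModel:
    """Toy model of the Storage contract (hypothetical, for illustration only)."""

    def __init__(self):
        self.number = 0  # assumed name of the single state variable

    def call(self, calldata: bytes) -> bytes:
        # The dispatcher only looks at the first four bytes of the calldata.
        if len(calldata) < 4:
            raise ValueError("revert: no selector")
        selector, args = calldata[:4], calldata[4:]
        if selector == SEL_STORE:
            if len(args) < 32:
                raise ValueError("revert: missing uint256 argument")
            self.number = int.from_bytes(args[:32], "big")
            return b""
        if selector == SEL_RETRIEVE:
            return self.number.to_bytes(32, "big")
        # No branch matched: this toy model simply rejects the call.
        raise ValueError("revert: unknown selector")


model = StorageModel()
model.call(SEL_STORE + (0xdeadbeef).to_bytes(32, "big"))
print(hex(int.from_bytes(model.call(SEL_RETRIEVE), "big")))  # prints 0xdeadbeef
```

The chain of selector comparisons is the part to look for when reading the disassembly; the real bytecode does the same thing with stack operations and conditional jumps.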
Let's check the actual payload of the transaction.
>>> s = Storage.deploy()
Transaction sent: 0x55259335db83b1d9f77435967fbbb7ede3427e1de09ea0eb3a7f80f953ce7e9a
Gas price: 0.0 gwei Gas limit: 12000000 Nonce: 671
Storage.constructor confirmed Block: 680 Gas used: 90551 (0.75%)
Storage deployed at: 0x8c071390f8f468E3a761e7C8e67AEf4937B61f65
>>> tx = s.store(0xdeadbeef)
Transaction sent: 0xcfbcd5c7527b34f740206457b8be5fe3e337f621ffea6764d42567f2db057f98
Gas price: 0.0 gwei Gas limit: 12000000 Nonce: 673
Storage.store confirmed Block: 682 Gas used: 22252 (0.19%)
>>> tx.input
'0x6057361d00000000000000000000000000000000000000000000000000000000deadbeef'
Do you see that the first four bytes of tx.input contain the selector of store(uint256), followed by the argument, 0xdeadbeef, padded to 32 bytes?
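The transaction input shown above can be rebuilt and taken apart by hand. The following is a small sketch that only handles uint256 arguments (other ABI types, especially dynamic ones, need more work); the helper names encode_call and decode_call are made up for this example.

```python
def _strip0x(text: str) -> str:
    return text[2:] if text.startswith("0x") else text


def encode_call(selector_hex: str, *uint_args: int) -> str:
    """Build calldata: a 4-byte selector followed by one 32-byte word per uint256."""
    data = bytes.fromhex(_strip0x(selector_hex))
    for arg in uint_args:
        data += arg.to_bytes(32, "big")  # left-padded with zeros to 32 bytes
    return "0x" + data.hex()


def decode_call(input_hex: str):
    """Split calldata into its selector and a list of 32-byte words (as ints)."""
    raw = bytes.fromhex(_strip0x(input_hex))
    selector, body = raw[:4], raw[4:]
    words = [int.from_bytes(body[i:i + 32], "big") for i in range(0, len(body), 32)]
    return "0x" + selector.hex(), words


payload = encode_call("0x6057361d", 0xdeadbeef)
print(payload)
print(decode_call(payload))
```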
Wait, so the ABI maps an arbitrary string to a 32-bit integer? Hey, hash collisions!
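How likely is such a collision? As a rough estimate, assume that selectors behave like uniformly random 32-bit values (an idealization of Keccak-256, not a proven property). The sketch below computes the chance of hitting one specific selector after a number of tries, and the birthday-style chance that any two signatures in a set collide. The function names are made up for this example.

```python
import math

SELECTOR_SPACE = 2 ** 32  # number of distinct 4-byte selectors


def p_match_target(tries: int) -> float:
    """Chance that at least one of `tries` random signatures hits a given selector."""
    return -math.expm1(tries * math.log1p(-1.0 / SELECTOR_SPACE))


def p_any_collision(count: int) -> float:
    """Birthday approximation: chance that any two of `count` signatures collide."""
    return -math.expm1(-count * (count - 1) / (2.0 * SELECTOR_SPACE))


for tries in (10 ** 6, 10 ** 9, 2 ** 32):
    print(tries, round(p_match_target(tries), 4))
print(round(p_any_collision(77163), 4))
```

Under this assumption, matching a fixed selector takes on the order of 2**32 candidates, which is well within reach of a brute-force search.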
Task 1
Your first task is to find a function that collides with callMeIfYouCan(uint256,bool)!
function submitProofOfCollision(string memory func2) external {
    string memory func1 = "callMeIfYouCan(uint256,bool)";
    // keccak256 returns bytes32, so convert explicitly before shifting
    uint hash1 = uint(keccak256(abi.encodePacked(func1)));
    uint hash2 = uint(keccak256(abi.encodePacked(func2)));
    // the two signatures must differ ...
    require(hash1 != hash2, "func1 != func2");
    // ... but share the same top 32 bits, i.e., the same selector
    require(hash1 >> (256 - 32) == hash2 >> (256 - 32),
            "Find two functions that have the same selector");
    completed1 = true;
}
# Layout of State Variables
Ethereum provides three types of memory: stack and memory are temporary storage that lives only during the current execution context, and storage is global storage that persists on the ledger. The storage can be seen as a simple key-value store that, given a 256-bit key, stores and retrieves a 256-bit value. Accordingly, it is hard to enumerate the storage (too big!) unless you know ahead of time which keys to look up.
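As a mental model, the storage of a contract can be sketched as a sparse dictionary in which every key that was never written reads as zero. The class below is a toy illustration only; its method names borrow from the SLOAD and SSTORE opcodes, and nothing here reflects how a client actually stores the data.

```python
WORD = 2 ** 256  # keys and values are 256-bit words


class SparseStorage:
    """Toy model of contract storage: 2**256 slots, all implicitly zero."""

    def __init__(self):
        self._slots = {}  # holds only slots with a nonzero value

    def sstore(self, key: int, value: int) -> None:
        key, value = key % WORD, value % WORD
        if value == 0:
            # A zero slot reads the same as a slot that was never written.
            self._slots.pop(key, None)
        else:
            self._slots[key] = value

    def sload(self, key: int) -> int:
        return self._slots.get(key % WORD, 0)


storage = SparseStorage()
storage.sstore(0, 0xdeadbeef)
print(hex(storage.sload(0)), storage.sload(12345))
```

Because untouched slots are indistinguishable from zeroed ones, there is no way to ask which of the 2**256 keys are in use; you have to derive the keys from the layout rules.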
High-level idea. Unlike the stack, memory and storage are like a free canvas; it is up to the high-level language to define its own strategy for using them effectively. Solidity specifies various rules for how to lay out high-level primitives such as mappings and dynamic or static arrays.
Before proceeding further, please take time to digest these two articles, in order: Layout in Memory and Layout of State Variables in Storage.
# Slot
// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.4.0 <0.9.0;

contract A {
    struct S {
        uint128 a;
        uint128 b;
        uint[2] staticArray;
        uint[] dynArray;
    }

    uint public x;
    uint256 public y;
    uint128 public z0;
    uint8 public z1;
    uint16 public z2;
    bool public b;
    S public s;
    address public addr;
    mapping (address => uint) public map1;
    mapping (uint => mapping (address => bool)) public map2;
    uint[] public array;
    string public s1;
    bytes public b1;

    function set_x(uint _x) public { x = _x; }
    function set_y(uint256 _y) public { y = _y; }
    function set_z0(uint128 _z0) public { z0 = _z0; }
    function set_z1(uint8 _z1) public { z1 = _z1; }
    function set_z2(uint16 _z2) public { z2 = _z2; }
    function set_b(bool _b) public { b = _b; }
    function set_s(S memory _s) public { s = _s; }
    function set_addr(address _addr) public { addr = _addr; }
    function set_map1(address _k, uint _v) public { map1[_k] = _v; }
    function set_map2(uint _k1, address _k2, bool _v) public { map2[_k1][_k2] = _v; }
    function add_to_array(uint _v) public { array.push(_v); }
    function set_s1(string memory _s1) public { s1 = _s1; }
    function set_b1(bytes memory _b1) public { b1 = _b1; }
}
>>> src = "..."
>>> dump_layout(src)
[00]@000-031 x : I/032B uint256
[01]@000-031 y : I/032B uint256
[02]@000-015 z0 : I/016B uint128
[02]@016-016 z1 : I/001B uint8
[02]@017-018 z2 : I/002B uint16
[02]@019-019 b : I/001B bool
[03]@000-127 s : I/128B struct A.S
[07]@000-019 addr : I/020B address
[08]@000-031 map1 : M/032B mapping(address => uint256)
[09]@000-031 map2 : M/032B mapping(uint256 => mapping(address => bool))
[10]@000-031 array : D/032B uint256[]
[11]@000-031 s1 : B/032B string
[12]@000-031 b1 : B/032B bytes
struct A.S:
[00]@000-015 a : I/016B uint128
[00]@016-031 b : I/016B uint128
[01]@000-063 staticArray : I/064B uint256[2]
[03]@000-031 dynArray : D/032B uint256[]
The leftmost column shows the slot number, indexed from 0. If the slot is shared with other variables, the byte offset (e.g., @000-015 for z0 and @016-016 for z1) distinguishes them. Variables are stored with one byte as the unit; even a bool takes one byte. The rightmost column shows the type along with its size and encoding scheme. The encoding schemes are denoted I for in place, M for mapping, D for dynamic array, and B for bytes; each is discussed below.
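The slot and offset columns above follow from a simple packing rule, which can be sketched in a few lines of Python. This is a simplified model, not the compiler's implementation: it assumes that each member is described by its size in bytes and a flag saying whether it may share a slot, that value types are packable, and that structs, arrays, mappings, bytes, and strings always start a new slot and occupy whole slots. The function name assign_slots is made up for this example.

```python
def assign_slots(members):
    """members: list of (name, size_in_bytes, packable) -> {name: (slot, offset)}."""
    layout = {}
    slot, offset = 0, 0
    for name, size, packable in members:
        # Start a fresh slot if the item cannot share or does not fit in the rest.
        if offset > 0 and (not packable or offset + size > 32):
            slot, offset = slot + 1, 0
        layout[name] = (slot, offset)
        if packable:
            offset += size
            if offset == 32:
                slot, offset = slot + 1, 0
        else:
            slot += (size + 31) // 32  # whole slots; the next item starts fresh
    return layout


contract_a = [
    ("x", 32, True), ("y", 32, True),
    ("z0", 16, True), ("z1", 1, True), ("z2", 2, True), ("b", 1, True),
    ("s", 128, False), ("addr", 20, True),
    ("map1", 32, False), ("map2", 32, False),
    ("array", 32, False), ("s1", 32, False), ("b1", 32, False),
]
for name, (slot, offset) in assign_slots(contract_a).items():
    print("[%02d]@%03d %s" % (slot, offset, name))
```

The size of the struct s (128 bytes) is supplied by hand here; it comes from applying the same rule to the members of A.S.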
>>> a = A.deploy()
>>> a.set_x(0xdeadbeef)
>>> a.set_y(0x00c0ffee)
>>> a.get_storage_at(0)
'0x00000000000000000000000000000000000000000000000000000000deadbeef'
>>> a.get_storage_at(1)
'0x0000000000000000000000000000000000000000000000000000000000c0ffee'
We can look up the value in each slot of the contract via get_storage_at(), which uses 256 bits (32 bytes) as the storage unit. The type uint is equivalent to uint256, as shown above.
If variables are smaller than 256 bits, they are likely packed together with adjacent variables. For example, slot 2 is shared by four variables.
[02]@000-015 z0 : I/016B uint128
[02]@016-016 z1 : I/001B uint8
[02]@017-018 z2 : I/002B uint16
[02]@019-019 b : I/001B bool
>>> a.set_z0(2**128-1)
>>> a.get_storage_at(2)
'0x00000000000000000000000000000000ffffffffffffffffffffffffffffffff'
>>> a.set_z1(0xee)
>>> a.get_storage_at(2)
'0x000000000000000000000000000000eeffffffffffffffffffffffffffffffff'
>>> a.set_z2(0xdddd)
>>> a.get_storage_at(2)
'0x00000000000000000000000000ddddeeffffffffffffffffffffffffffffffff'
>>> a.set_b(1)
>>> a.get_storage_at(2)
'0x00000000000000000000000001ddddeeffffffffffffffffffffffffffffffff'
As shown above, all four variables are stored in slot 2 because their combined size fits in a single 256-bit slot. Keep in mind that the variables (not their bytes) are laid out in the slot in little-endian order: the first variable occupies the lowest-order (rightmost) bytes of the word, while the bytes within each value keep their usual order. Since get_storage_at can only fetch a whole 256-bit unit, be careful when matching the stored data against the Solidity code.
>>> a.set_z0(int("0x" + "".join("%02x" % c for c in range(0, 128//8)), 16))
>>> a.get_storage_at(2)
'0x00000000000000000000000001ddddee000102030405060708090a0b0c0d0e0f'
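To double-check a packed slot, you can slice the 256-bit word by the byte offsets reported by dump_layout. The sketch below does this for slot 2; the helper name unpack_slot and the SLOT2 table are made up for this example, with the offsets copied from the layout listing above.

```python
def unpack_slot(word_hex: str, fields: dict) -> dict:
    """Extract each field from a 32-byte slot, given (first_byte, last_byte) offsets.

    Offsets count from the lowest-order (rightmost) byte of the word.
    """
    word = int(word_hex, 16)
    values = {}
    for name, (first, last) in fields.items():
        width = last - first + 1
        values[name] = (word >> (8 * first)) & ((1 << (8 * width)) - 1)
    return values


# Byte ranges of slot 2, as reported by dump_layout.
SLOT2 = {"z0": (0, 15), "z1": (16, 16), "z2": (17, 18), "b": (19, 19)}

# The last value of slot 2 shown above, assembled piece by piece.
word = "0x" + "00" * 12 + "01" + "dddd" + "ee" + "000102030405060708090a0b0c0d0e0f"
for name, value in unpack_slot(word, SLOT2).items():
    print(name, hex(value))
```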
# Bytes and String
Solidity stores bytes and string in an optimized way. If the data is at most 31 bytes long, it is encoded in place as an optimization: the data occupies the higher-order bytes of the slot and the last (lowest-order) byte holds length * 2. If the data is longer, the slot holds only length * 2 + 1, and the data itself is stored at the location keccak256(slot) + offset. Let's walk through the example below.
# when size <= 31B, inplace encoding. See, one byte size, 0x06/2 = 3B.
>>> a.set_b1(0xaabbcc)
>>> a.get_storage_at(12)
'0xaabbcc0000000000000000000000000000000000000000000000000000000006'
# size == 31B, 0x3e/2 = 31B
>>> a.set_b1(2**(256-8)-1)
>>> a.get_storage_at(12)
'0xffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff3e'
# size == 32B, overflowed! the slot contains only the size, (0x41 - 1)/2 == 32B
# the lowest bit, 0x41 % 2 == 1, indicates the overflow
>>> a.set_b1(2**256-1)
>>> a.get_storage_at(12)
'0x0000000000000000000000000000000000000000000000000000000000000041'
# this is where the data will be stored. 12 is the slot number.
>>> web3.solidityKeccak(["uint256"], [12])
'0xdf6966c971051c3d54ec59162606531493a51404a002842f56009d7e5cf4a8c7'
# 32B data is stored @ keccak256(12)
>>> a.get_storage_at(web3.solidityKeccak(["uint256"], [12]))
'0xffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff'
# if the data is bigger than 32B, the N-th 32B chunk is stored at keccak256(12) + N
# where N == 0 .. (size - 1) // 32
>>> a.set_b1(2**400-1)
>>> a.get_storage_at(12)
'0x0000000000000000000000000000000000000000000000000000000000000065'
>>> a.get_storage_at(web3.solidityKeccak(["uint256"], [12]))
'0xffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff'
>>> a.get_storage_at(web3.solidityKeccak(["uint256"], [12]).int()+1)
'0xffffffffffffffffffffffffffffffffffff0000000000000000000000000000'
string in Solidity works exactly the same as bytes. Try it yourself!
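The rule for reading the main slot of a bytes or string variable can be summarized as a short decoder. This is a sketch based only on the values shown in the console session above; the function name describe_bytes_slot and the shape of its return value are made up for this example.

```python
def describe_bytes_slot(word_hex: str) -> dict:
    """Interpret the main slot of a `bytes` or `string` state variable."""
    word = int(word_hex, 16)
    if word & 1:
        # Long form: the slot holds length * 2 + 1, the data lives elsewhere.
        length = (word - 1) // 2
        return {"inplace": False, "length": length, "data_slots": (length + 31) // 32}
    # Short form: the lowest byte holds length * 2, the data is left-aligned.
    length = (word & 0xFF) // 2
    data = word.to_bytes(32, "big")[:length]
    return {"inplace": True, "length": length, "data": data}


print(describe_bytes_slot("0x" + "aabbcc" + "00" * 28 + "06"))
print(describe_bytes_slot("0x41"))
print(describe_bytes_slot("0x65"))
```

For the long form, data_slots tells you how many consecutive slots starting at keccak256(slot) hold the data.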
# Struct
The struct is encoded in place and, in this case, it spans 128/32 = 4 slots.
[03]@000-127 s : I/128B struct A.S
struct S {
uint128 a;
uint128 b;
uint[2] staticArray;
uint[] dynArray;
}
>>> a.set_s((1, 2, [3, 4], [5, 6, 7, 8]))
# uint128 a, uint128 b
>>> a.get_storage_at(3)
'0x0000000000000000000000000000000200000000000000000000000000000001'
# staticArray[0]
>>> a.get_storage_at(4)
'0x0000000000000000000000000000000000000000000000000000000000000003'
# staticArray[1]
>>> a.get_storage_at(5)
'0x0000000000000000000000000000000000000000000000000000000000000004'
# dynArray[]; it indicates the size of the dynamic array (discussed below)
>>> a.get_storage_at(6)
'0x0000000000000000000000000000000000000000000000000000000000000004'
# Static and Dynamic Array
The static array is encoded in place; as shown above, its elements are laid out one after another in consecutive slots (one slot per element for uint256). The dynamic array, however, requires additional storage for its elements. Its slot, 10 for array in our example, contains the length of the array, as shown below.
>>> a.add_to_array(0xdeadbeef)
>>> a.array(0)
3735928559
>>> a.get_storage_at(10)
'0x0000000000000000000000000000000000000000000000000000000000000001'
>>> a.add_to_array(0x00c0ffee)
>>> a.array(1)
12648430
>>> a.get_storage_at(10)
'0x0000000000000000000000000000000000000000000000000000000000000002'
The elements of a dynamic array are stored similarly to bytes when overflowed: for 32-byte elements, element i lives at keccak256(slot) + i. For example, the elements at index 0 and index 1 can be found as follows:
>>> a.get_storage_at(web3.solidityKeccak(["uint256"], [10]).int()+0)
'0x00000000000000000000000000000000000000000000000000000000deadbeef'
>>> a.get_storage_at(web3.solidityKeccak(["uint256"], [10]).int()+1)
'0x0000000000000000000000000000000000000000000000000000000000c0ffee'
>>> a.get_storage_of_array(10, element=2)
[0xc65a7bb8d6351c1cf70c95a316cc6a92839c986682d98bc35f958f4883f9d2a8] len=2
[00] 0x00000000000000000000000000000000000000000000000000000000deadbeef
[01] 0x0000000000000000000000000000000000000000000000000000000000c0ffee
# Mapping
Lastly, a mapping follows an encoding scheme similar to bytes and dynamic arrays, but with a different indexing structure: since it is indexed by an arbitrary key of the specified type, the value lives at keccak256(key . slot), where . denotes concatenation. Let's walk through an example.
>>> h = web3.solidityKeccak
>>> a.map1(a.address)
0
>>> a.set_map1(a.address, 0xdeadbeef)
>>> a.map1(a.address)
3735928559
>>> a.get_storage_at(h(["uint256", "uint256"], [int(a.address, 16), 8]))
'0x00000000000000000000000000000000000000000000000000000000deadbeef'
For mapping(address => uint256) at slot 8, the value for a key of type address is stored at the slot keccak256(uint256(address) . uint256(8)) of the storage.
For mapping(uint256 => mapping(address => bool)) at slot 9, the storage slot is calculated by applying the hash in sequence:
slot = 9
-+--
 |
 +-------------------------------------------+
                                             V
slot = keccak256(uint256(uint256) . uint256(slot))
-+--
 |
 +-------------------------------------------+
                                             V
slot = keccak256(uint256(address) . uint256(slot))
Concretely, you can look up the value of the nested mapping in brownie as follows:
>>> a.set_map2(13, a.address, 1)
>>> a.map2(13, a.address)
True
# slot for the first mapping
>>> s = h(["uint256", "uint256"], [13, 9])
# slot for the nested mapping
>>> a.get_storage_at(h(["uint256", "uint256"], [int(a.address, 16), s.int()]))
'0x0000000000000000000000000000000000000000000000000000000000000001'
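If you want to compute these hashed locations outside of the brownie console, the sketch below does it with nothing but the Python standard library. It includes a compact pure-Python Keccak-256 (note that hashlib.sha3_256 uses different padding and gives different results), adapted from the well-known compact reference style; treat it as an illustration rather than a vetted cryptographic implementation. The helper names data_slot and mapping_slot are made up for this example, data_slot with a nonzero index assumes 32-byte elements or chunks, and mapping_slot only covers value-type keys that are padded to 32 bytes.

```python
def _rol64(a, n):
    n %= 64
    return ((a << n) | (a >> (64 - n))) & (2 ** 64 - 1)


def _keccak_f1600(state):
    lanes = [[int.from_bytes(state[8 * (x + 5 * y):8 * (x + 5 * y) + 8], "little")
              for y in range(5)] for x in range(5)]
    r = 1
    for _ in range(24):
        # theta
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ _rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], _rol64(current, (t + 1) * (t + 2) // 2)
        # chi
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota (round constants generated by an LFSR)
        for j in range(7):
            r = ((r << 1) ^ ((r >> 7) * 0x71)) % 256
            if r & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    out = bytearray(200)
    for x in range(5):
        for y in range(5):
            out[8 * (x + 5 * y):8 * (x + 5 * y) + 8] = lanes[x][y].to_bytes(8, "little")
    return out


def keccak256(data: bytes) -> bytes:
    rate = 136  # bytes per block (capacity is 512 bits)
    state = bytearray(200)
    tail = 0
    for start in range(0, len(data), rate):
        block = data[start:start + rate]
        for i, byte in enumerate(block):
            state[i] ^= byte
        tail = len(block)
        if tail == rate:
            state = _keccak_f1600(state)
            tail = 0
    state[tail] ^= 0x01      # original Keccak padding (SHA-3 uses 0x06 here)
    state[rate - 1] ^= 0x80
    return bytes(_keccak_f1600(state)[:32])


def word(value: int) -> bytes:
    return value.to_bytes(32, "big")


def data_slot(slot: int, index: int = 0) -> int:
    """Slot of chunk/element `index` of the bytes or dynamic array rooted at `slot`."""
    return (int.from_bytes(keccak256(word(slot)), "big") + index) % 2 ** 256


def mapping_slot(key: int, slot: int) -> int:
    """Slot of mapping[key] (value-type key) for the mapping rooted at `slot`."""
    return int.from_bytes(keccak256(word(key) + word(slot)), "big")


print(hex(data_slot(12)))                # where the long form of b1 starts
outer = mapping_slot(13, 9)              # map2[13]
print(hex(mapping_slot(0x1234, outer)))  # map2[13][0x1234]; 0x1234 is a made-up address
```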
Task 2
Let's do a hide-and-seek!
struct S {
uint128 a;
uint128 b;
uint[2] staticArray;
uint[] dynArray;
}
// Storage (global) variables.
address public owner;
bool public completed1;
bool public completed2;
mapping(bool => mapping(uint128 => mapping(string => S))) data;
/* hide! */
data[true][0xdeadbeef]["hello"].dynArray.push(0x00c0ffee);
data[false][0xdeadc0de]["world!"].staticArray[1] = 0xdecafbad;
function submitHideAndSeek(uint slot1, uint slot2) external {
/* seek! */
require(readSlot(slot1) == 0x00c0ffee, "Failed to find the answer!");
require(readSlot(slot2) == 0xdecafbad, "Failed to find the answer!");
completed2 = true;
}