Erigon Stage Sync and control flows
A quick follow-up from the previous post: the theoretical figure of 1015 Gb for the database (this does not include the block snapshot of 225 Gb) after a full resync of the latest alpha release 2022.05.02 was not too far from the actual figure, which was 1083 Gb, of which 12 Gb was not yet reused free space.
In this post we will look closer at the internals of the “ETH core” component from the architecture diagram in one of the previous posts. Inside is something we call the “Stage Loop”. Here is the diagram:
There are two main control flows here: one shown by yellow arrows, and another by blue arrows. The stage loop is driven by a single go-routine (thread) that repeatedly goes through so-called stages. Below are short descriptions of what each stage does:
Headers. Requests block headers from ETH sentry, receives them, verifies them, and persists them into 3 tables in the database (one can find the full list of tables in the file tables.go in the package kv of the erigon-lib repository):
Headers: height|hash => RLP encoding of header
HeaderTD: height|hash => RLP encoding of total_difficulty
HeaderCanonical: height => hash
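For illustration, here is a minimal Go sketch (not Erigon's actual code) of how a height|hash composite key can be laid out: encoding the height as big-endian bytes means that, when keys are compared as raw bytes by the database, they sort by block height.

```go
// Illustrative sketch, not Erigon's code: building a height|hash key whose
// byte-wise ordering matches the ordering by block height.
package main

import (
	"encoding/binary"
	"fmt"
)

// headerKey builds an 8+32 byte key: big-endian height followed by the hash.
func headerKey(height uint64, hash [32]byte) []byte {
	key := make([]byte, 8+32)
	binary.BigEndian.PutUint64(key[:8], height) // big-endian keeps keys sorted by height
	copy(key[8:], hash[:])
	return key
}

func main() {
	var hash [32]byte
	fmt.Printf("%x\n", headerKey(14_000_000, hash))
}
```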
Block Hashes. Computes the inverted mapping (and persists it into a database table):
HeaderNumber: hash => height
It is more efficient to perform this separately from the Headers stage, because hashes are not monotonic, so their insertion into a database table is most efficient when the hashes are first pre-sorted.
Bodies. Requests block bodies corresponding to the canonical block hashes, receives them, verifies them, and persists them into 2 tables in the database:
BlockBody: height => (start_tx_id; tx_count)
EthTx: tx_id => RLP encoding of tx
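As an aside, here is a small sketch (with assumed, illustrative types, not Erigon's own) of why storing only (start_tx_id; tx_count) in BlockBody is enough: transactions live in the flat, monotonically keyed EthTx table, and a body simply references a contiguous range of it.

```go
// Illustrative sketch: a block body references a contiguous range of tx_ids
// in the EthTx table rather than embedding the transactions themselves.
package main

import "fmt"

type BlockBody struct {
	StartTxID uint64
	TxCount   uint32
}

// txIDs returns the EthTx keys that belong to this block body.
func (b BlockBody) txIDs() []uint64 {
	ids := make([]uint64, 0, b.TxCount)
	for i := uint64(0); i < uint64(b.TxCount); i++ {
		ids = append(ids, b.StartTxID+i)
	}
	return ids
}

func main() {
	body := BlockBody{StartTxID: 1_000_000, TxCount: 3}
	fmt.Println(body.txIDs()) // [1000000 1000001 1000002]
}
```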
Senders. Processes digital signatures (ECDSA) of transactions and “recovers” the corresponding public keys, and therefore the “From” addresses for each transaction, persisting them into a table in the database:
Senders: height => sender_0|sender_1|…|sender_{tx_count-1}
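To give a sense of what “recovering” a sender involves, here is a hedged sketch using the go-ethereum library rather than Erigon's internal code: it signs a dummy transaction with a throwaway key and then recovers the “From” address from the ECDSA signature, which is the operation the Senders stage performs in bulk for every transaction.

```go
// Sketch using go-ethereum (not Erigon's internals): recover a transaction
// sender from its ECDSA signature.
package main

import (
	"fmt"
	"math/big"

	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/core/types"
	"github.com/ethereum/go-ethereum/crypto"
)

func main() {
	// Generate a throwaway key and sign a dummy transaction with it.
	key, _ := crypto.GenerateKey()
	chainID := big.NewInt(1)
	signer := types.LatestSignerForChainID(chainID)
	tx := types.NewTransaction(0, common.Address{}, big.NewInt(0), 21000, big.NewInt(1), nil)
	signed, err := types.SignTx(tx, signer, key)
	if err != nil {
		panic(err)
	}

	// This is the per-transaction operation of the Senders stage: recover the
	// public key from the signature and derive the sender address from it.
	from, err := types.Sender(signer, signed)
	if err != nil {
		panic(err)
	}
	fmt.Println("recovered sender:", from.Hex(),
		"expected:", crypto.PubkeyToAddress(key.PublicKey).Hex())
}
```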
Execution. Replays all transactions and computes the so-called “Plain State”, as well as 2 types of change logs, receipts, and event logs, persisting all of these into tables in the database. It also creates another, temporary table, which is later used in the Call Trace Index stage.
PlainState: account_address => (balance; nonce; code_hash)
or account_address|incarnation|location => storage_value
PlainContractCode: account_address|incarnation => code_hash
Code: code_hash => contract_bytecode
AccountChangeSet: height => account_address => (prev_balance; prev_nonce; prev_code_hash)
StorageChangeSet: height => account_address|location => prev_storage_value
Receipts: height => CBOR encoding of receipts
Log: height|tx_index => CBOR encoding of event logs
CallTraceSet: height => account_address => (bit_from; bit_to)
Hashed State. Exists only to provide input data for the next stage. Reads new records in the tables AccountChangeSet and StorageChangeSet, to determine new account addresses and storage locations added to the state, and adds entries to two tables, which have similar content to PlainState, except the mapping is from “hashed keys”, and accounts and storage items are in separate tables:
HashedAccounts: keccak256(account_address) => (balance; nonce; code_hash)
HashedStorage: keccak256(account_address)|incarnation|keccak256(location) => storage_value
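Here is a minimal sketch of how such hashed keys can be derived, following the key layout shown above; the helper names are assumptions for illustration, not Erigon's implementation.

```go
// Illustrative sketch: building a keccak256(address)|incarnation|keccak256(location)
// key for the hashed-storage table layout described above.
package main

import (
	"encoding/binary"
	"fmt"

	"golang.org/x/crypto/sha3"
)

func keccak256(data []byte) []byte {
	h := sha3.NewLegacyKeccak256()
	h.Write(data)
	return h.Sum(nil)
}

// hashedStorageKey builds keccak256(address)|incarnation|keccak256(location).
func hashedStorageKey(address [20]byte, incarnation uint64, location [32]byte) []byte {
	key := make([]byte, 0, 32+8+32)
	key = append(key, keccak256(address[:])...)
	var inc [8]byte
	binary.BigEndian.PutUint64(inc[:], incarnation)
	key = append(key, inc[:]...)
	key = append(key, keccak256(location[:])...)
	return key
}

func main() {
	var addr [20]byte
	var loc [32]byte
	fmt.Printf("%x\n", hashedStorageKey(addr, 1, loc))
}
```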
Trie. Calculates the state root hash, and also maintains two tables (TrieOfAccounts and TrieOfStorage) in the database, allowing incremental computation of the state root hash to be done more efficiently.
Call Trace Index. Processes data from the temporary table CallTraceSet and creates two inverted indices (represented by roaring bitmaps):
CallFromIndex: account_address => bitmap of heights where account has “from” traces
CallToIndex: account_address => bitmap of heights where account has “to” traces
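To make the inverted indices more concrete, here is a minimal sketch of a roaring-bitmap index in the spirit of CallFromIndex (the same idea applies to the History Index and Log Index stages below); the in-memory map stands in for a database table, and all names are illustrative.

```go
// Minimal sketch of an inverted index backed by roaring bitmaps:
// account address => set of block heights where that account appears.
package main

import (
	"fmt"

	"github.com/RoaringBitmap/roaring/roaring64"
)

type callFromIndex map[string]*roaring64.Bitmap

// add records that addr appeared as "from" in a call trace at height.
func (idx callFromIndex) add(addr string, height uint64) {
	bm, ok := idx[addr]
	if !ok {
		bm = roaring64.New()
		idx[addr] = bm
	}
	bm.Add(height)
}

func main() {
	idx := callFromIndex{}
	idx.add("0xaaaa...", 100)
	idx.add("0xaaaa...", 105)
	idx.add("0xbbbb...", 101)

	// Query: all heights where 0xaaaa... initiated a call.
	fmt.Println(idx["0xaaaa..."].ToArray()) // [100 105]
}
```

Roaring bitmaps are a good fit here because sets of block heights compress well and support fast membership and range queries, which is what RPC methods that filter by address or topic need.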
History Index. Processes data from the AccountChangeSet and StorageChangeSet tables and creates two inverted indices (as roaring bitmaps):
AccountHistory: account_address => bitmap of heights where account was modified
StorageHistory: account_address|location => bitmap of heights where storage item was modified
Log Index. Processes data from the Log table and creates two inverted indices (as roaring bitmaps):
LogAddressIndex: account_address => bitmap of heights where account is mentioned in any event logs
LogTopicIndex: topic => bitmap of heights where topic is mentioned in any event logs
Tx Lookup. Processes data from the BlockBody and EthTx tables and persists a mapping that allows finding transactions by their tx hash:
TxLookup: tx_hash => height
Whenever control flow reaches any stage, it attempts to process all the data available for that stage at the moment. This means that during the initial launch, the Headers stage will attempt to download all existing block headers, the Bodies stage will attempt to download all corresponding block bodies, and so on. It can take many hours before control returns to the Headers stage again, by which time more headers are available, so the process repeats, but with a much smaller number of headers, and then blocks. Eventually, these repetitions converge to processing 1 block (or sometimes a few) at a time, as they are being produced by the network.
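A highly simplified sketch of this yellow control flow is shown below; the Stage type and its methods are stand-ins assumed for illustration, not Erigon's actual stage definitions.

```go
// Simplified sketch of the stage loop: one goroutine repeatedly drives all
// stages in order, each stage processing everything available to it.
package main

import (
	"fmt"
	"time"
)

// Stage processes all data currently available for it when Run is called.
type Stage struct {
	Name string
	Run  func() error
}

func stageLoop(stages []Stage) {
	for { // each iteration of the outer loop is one pass over all stages
		for _, s := range stages {
			if err := s.Run(); err != nil {
				fmt.Println("stage", s.Name, "failed:", err)
				break // abandon this pass; the next pass starts from the first stage
			}
		}
		// Near the chain tip each pass handles only one or a few new blocks;
		// a small delay avoids spinning when there is nothing new to do.
		time.Sleep(time.Second)
	}
}

func main() {
	stages := []Stage{
		{Name: "Headers", Run: func() error { return nil }},
		{Name: "Bodies", Run: func() error { return nil }},
		{Name: "Execution", Run: func() error { return nil }},
		// ... remaining stages
	}
	stageLoop(stages)
}
```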
Another control flow, shown in blue, is driven by data arriving from other peers, as managed by sentry. Receiving newly produced blocks, or responses to requested headers and block bodies, may happen while the Stage Loop (yellow control flow) is away from the “Headers” or “Bodies” stages, which are the ones able to “ingest” new headers and block bodies respectively. In order not to block the blue control flow, so-called “Exchange” data structures are introduced. They allow the blue control flow to deposit data and the yellow control flow to pick the data up. Of course, these exchanges need to be designed so that they do not consume unlimited memory, but at the same time minimise the number of repeated requests and the latency of delivery of the latest headers and blocks. In other words, it is not trivial, and there are still bugs in the implementation, unfortunately.
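As an illustration of the idea (not of Erigon's actual implementation), such an exchange can be sketched as a bounded buffer: the blue flow deposits into it without blocking, and the yellow flow drains it when it next reaches the corresponding stage. The overflow policy used here (dropping, which forces a later re-request) is just one possible choice, and all names are assumed for the example.

```go
// Sketch of an "Exchange": a bounded, mutex-protected buffer between the
// sentry goroutine (producer) and the stage loop (consumer).
package main

import (
	"fmt"
	"sync"
)

type HeaderExchange struct {
	mu      sync.Mutex
	buf     [][]byte // RLP-encoded headers, capped at maxSize
	maxSize int
}

// Deposit is called from the blue control flow; it never blocks.
// If the buffer is full, the payload is dropped and must be re-requested later.
func (e *HeaderExchange) Deposit(header []byte) bool {
	e.mu.Lock()
	defer e.mu.Unlock()
	if len(e.buf) >= e.maxSize {
		return false
	}
	e.buf = append(e.buf, header)
	return true
}

// Drain is called from the yellow control flow (the Headers stage);
// it takes everything deposited since the last visit.
func (e *HeaderExchange) Drain() [][]byte {
	e.mu.Lock()
	defer e.mu.Unlock()
	out := e.buf
	e.buf = nil
	return out
}

func main() {
	ex := &HeaderExchange{maxSize: 1024}
	ex.Deposit([]byte("header RLP"))
	fmt.Println("drained", len(ex.Drain()), "headers")
}
```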
The Merge
The design of “The Merge” (a.k.a. the PoS transition) introduced another, third control flow into the diagram above. It is the control flow communicating with the Consensus Layer (CL). The details of the interaction between Staged Sync, the Sentry flow, and the CL flow are still being worked out, and hopefully will be described in another post. As one may suspect, things are getting more complicated…