Erigon Blog

Erigon Stage Sync and control flows

Alex Sharp
May 4, 2022
Quick follow-up from the previous post: the theoretical figure of 1015 Gb for the database (this does not include block snapshots of 225 Gb) after a full resync of the latest alpha release 2022.05.02 was not too far from the actual figure, which was 1083 Gb, out of which 12 Gb was not-yet-reused free space.

In this post we will take a closer look at the internals of the “ETH core” component from the architecture diagram in one of the previous posts. Inside is something we call the “Stage Loop”. Here is the diagram:

There are two main control flows here: one shown by yellow arrows, and the other by blue arrows. The Stage Loop is driven by a single go-routine (thread) that repeatedly goes through the so-called stages. Below are short descriptions of what each stage does:

  • Headers. Requests block headers from the ETH sentry, receives them, verifies them, and persists them into 3 tables in the database (one can find the full list of tables in the file tables.go in the kv package of the erigon-lib repository; a sketch of how the height|hash keys might be encoded follows this list):

    • Headers: height|hash => RLP encoding of header

    • HeaderTD: height|hash => RLP encoding of total_difficulty

    • HeaderCanonical: height => hash

  • Block Hashes. Computes the inverted mapping (and persists it into a database table): HeaderNumber: hash => height. It is more efficient to perform this separately from the Headers stage, because hashes are not monotonic, so their insertion into a database table is most efficient when the hashes are first pre-sorted.

  • Bodies. Requests block bodies corresponding to canonical block hashes, receives them, verifies and persists in 2 tables in the database:

    • BlockBody: height => (start_tx_id; tx_count)

    • EthTx: tx_id => RLP encoding of tx

  • Senders. Processes the digital signatures (ECDSA) of transactions and “recovers” the corresponding public keys, and therefore the “From” addresses of each transaction, and persists them into a table in the database:

    • Senders: height => sender_0|sender_1|…|sender_{tx_count-1}

  • Execution. Replays all transactions and computes the so-called “Plain State”, as well as 2 types of change logs, and receipts and event logs, persisting all of these into tables in the database. It also creates another, temporary table, which is later used in the Call Trace Index stage.

    • PlainState: account_address => (balance; nonce; code_hash) or account_address|incarnation|location => storage_value

    • PlainContractCode: account_address|incarnation => code_hash

    • Code: code_hash => contract_bytecode

    • AccountChangeSet: height => account_address => (prev_balance; prev_nonce; prev_code_hash)

    • StorageChangeSet: height => account_address|location => prev_storage_value

    • Receipts: height => CBOR encoding of receipts

    • Log: height|tx_index => CBOR encoding of event logs

    • CallTraceSet: height => account_address => (bit_from; bit_to)

  • Hashed State. Exists only to provide input data for the next stage. Reads new records in the AccountChangeSet and StorageChangeSet tables to determine the new account addresses and storage locations added to the state, and adds entries to two tables, which have similar content to PlainState, except that the mapping is from “hashed keys”, and accounts and storage items are kept in separate tables (see the hashed-key sketch after this list):

    • HashedAccounts: keccak256(account_address) => (balance; nonce; code_hash)

    • HashedStorage: keccak256(account_address)|incarnation|keccak256(location) => storage_value

  • Trie. Calculates the state root hash, and also maintains two tables (TrieOfAccounts and TrieOfStorage) in the database that allow incremental computation of the state root hash to be done more efficiently.

  • Call Trace Index. Processes data from temporary table CallTraceSet and creates two inverted indices (represented by roaring bitmaps):

    • CallFromIndex: account_address => bitmap of heights where the account has “from” traces

    • CallToIndex: account_address => bitmap of heights where the account has “to” traces

  • History Index. Processes data from the AccountChangeSet and StorageChangeSet tables and creates two inverted indices (as roaring bitmaps; see the sketch after this list):

    • AccountHistory: account_address => bitmap of heights where account was modified

    • StorageHistory: account_address|location => bitmap of heights where storage item was modified

  • Log Index. Processes data from Log table and creates two inverted indices (as roaring bitmaps):

    • LogAddressIndex: account_address => bitmap of heights where account is mentioned in any event logs

    • LogTopicIndex: topic => bitmap of heights where topic is mentioned in any event logs

  • Tx Lookup. Processes data from the BlockBody and EthTx tables and persists a mapping that allows finding transactions by their tx hash:

    • TxLookup: tx_hash => height
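
To make the table layouts above a bit more concrete, here is a minimal sketch (not the actual Erigon code; the function names are made up) of how a height|hash key could be encoded: the block height as an 8-byte big-endian integer followed by the 32-byte hash, so that lexicographic key order matches numeric order of heights.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// headerKey builds a key of the form height|hash: an 8-byte big-endian
// block height followed by the 32-byte block hash. Lexicographic order of
// such keys matches numeric order of heights, which suits B-tree storage.
// (Illustrative sketch, not the actual Erigon encoding helpers.)
func headerKey(height uint64, hash [32]byte) []byte {
	key := make([]byte, 8+32)
	binary.BigEndian.PutUint64(key[:8], height)
	copy(key[8:], hash[:])
	return key
}

// canonicalKey sketches a HeaderCanonical-style key: just the 8-byte
// big-endian height, mapping to the canonical hash as the value.
func canonicalKey(height uint64) []byte {
	key := make([]byte, 8)
	binary.BigEndian.PutUint64(key, height)
	return key
}

func main() {
	var hash [32]byte
	hash[31] = 0xab // dummy hash for the example
	fmt.Printf("height|hash key: %x\n", headerKey(14_000_000, hash))
	fmt.Printf("canonical key:   %x\n", canonicalKey(14_000_000))
}
```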
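
Similarly, the “hashed keys” used by the Hashed State and Trie stages are just keccak256 hashes of the plain keys. A hedged sketch of forming a HashedStorage-style key, assuming golang.org/x/crypto/sha3 for the legacy Keccak-256 variant (the helper names here are illustrative, not Erigon's own):

```go
package main

import (
	"encoding/binary"
	"fmt"

	"golang.org/x/crypto/sha3"
)

// keccak256 returns the legacy Keccak-256 hash used by Ethereum.
func keccak256(data []byte) [32]byte {
	h := sha3.NewLegacyKeccak256()
	h.Write(data)
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

// hashedStorageKey sketches a HashedStorage-style key:
// keccak256(account_address) | incarnation | keccak256(location).
// (Illustrative layout, not the exact Erigon helper.)
func hashedStorageKey(addr [20]byte, incarnation uint64, location [32]byte) []byte {
	key := make([]byte, 0, 32+8+32)
	addrHash := keccak256(addr[:])
	locHash := keccak256(location[:])
	key = append(key, addrHash[:]...)
	key = binary.BigEndian.AppendUint64(key, incarnation)
	key = append(key, locHash[:]...)
	return key
}

func main() {
	var addr [20]byte
	var loc [32]byte
	loc[31] = 1
	fmt.Printf("%x\n", hashedStorageKey(addr, 1, loc))
}
```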
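
The inverted indices built by the Call Trace Index, History Index and Log Index stages all follow the same idea: map a key (address, location or topic) to a compressed set of block heights. Below is a hedged sketch of that idea using the github.com/RoaringBitmap/roaring library; the real stages add their own batching, chunking and serialization on top of roaring bitmaps.

```go
package main

import (
	"fmt"

	"github.com/RoaringBitmap/roaring"
)

func main() {
	// Toy AccountHistory-style index: account address => bitmap of heights
	// at which the account was modified. Keys and heights are made up.
	history := map[string]*roaring.Bitmap{}

	record := func(addr string, height uint32) {
		bm, ok := history[addr]
		if !ok {
			bm = roaring.New()
			history[addr] = bm
		}
		bm.Add(height)
	}

	record("0xaaaa", 100)
	record("0xaaaa", 105)
	record("0xbbbb", 105)

	// Answering "at which heights was 0xaaaa touched?" is a bitmap lookup.
	fmt.Println(history["0xaaaa"].ToArray()) // [100 105]

	// The bitmap serializes compactly for persisting as a table value.
	data, err := history["0xaaaa"].ToBytes()
	if err != nil {
		panic(err)
	}
	fmt.Println("serialized size:", len(data), "bytes")
}
```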

Whenever the control flow reaches any stage, it attempts to process all data available for that stage at the moment. This means that during the initial launch, the Headers stage will attempt to download all existing block headers, the Bodies stage will attempt to download all corresponding block bodies, and so on. It can take many hours before control returns to the Headers stage again, by which time more headers are available, so the process repeats, but with a much smaller number of headers, and then blocks. Eventually, these repetitions converge to processing 1 (or sometimes more) blocks at a time, as they are being produced by the network.
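
As a rough illustration of this control flow, the Stage Loop can be thought of as a single goroutine iterating over an ordered list of stages, where each stage processes everything available to it before handing over to the next one. This is only a simplified sketch; the names and interfaces below are made up and are not Erigon's actual staged-sync API.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Stage processes all data currently available to it and then returns.
// (Illustrative interface, not Erigon's actual staged-sync types.)
type Stage struct {
	Name    string
	Forward func(ctx context.Context) error
}

// runStageLoop is the single goroutine driving the yellow control flow:
// it repeatedly walks through all stages in order, then starts over.
func runStageLoop(ctx context.Context, stages []Stage) error {
	for {
		for _, s := range stages {
			if err := s.Forward(ctx); err != nil {
				return fmt.Errorf("stage %s: %w", s.Name, err)
			}
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(time.Second): // wait a bit before the next cycle
		}
	}
}

func main() {
	stages := []Stage{
		{Name: "Headers", Forward: func(ctx context.Context) error { return nil }},
		{Name: "Bodies", Forward: func(ctx context.Context) error { return nil }},
		{Name: "Execution", Forward: func(ctx context.Context) error { return nil }},
	}
	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()
	_ = runStageLoop(ctx, stages)
}
```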

Another control flow, shown in blue, is driven by data arriving from other peers, as managed by the sentry. Receiving newly produced blocks, or responses to requests for headers and block bodies, may happen while the Stage Loop (yellow control flow) is away from the “Headers” or “Bodies” stages, i.e. the stages that are able to “ingest” new headers and block bodies respectively. In order not to block the blue control flow, so-called “Exchange” data structures are introduced. They allow the blue control flow to deposit data and the yellow control flow to pick that data up. Of course, these exchanges need to be designed such that they do not consume unlimited memory, but at the same time minimise the number of repeated requests and the latency of delivery of the latest headers and blocks. In other words, it is not trivial, and unfortunately there are still bugs in the implementation.
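
One simple way to picture such an “Exchange” is a bounded buffer: the sentry-driven goroutine deposits whatever headers arrive (dropping them when the buffer is full, so they have to be re-requested later), while the Stage Loop drains everything accumulated when it next reaches the Headers stage. This is only a hedged sketch of the idea using a buffered Go channel; the actual Erigon structures are more elaborate.

```go
package main

import "fmt"

// HeaderExchange is a bounded hand-off point between the sentry-driven
// control flow (producer) and the Stage Loop (consumer).
// (Illustrative sketch; not Erigon's actual exchange implementation.)
type HeaderExchange struct {
	ch chan []byte // e.g. RLP-encoded headers
}

func NewHeaderExchange(capacity int) *HeaderExchange {
	return &HeaderExchange{ch: make(chan []byte, capacity)}
}

// Deposit never blocks the sentry flow: if the buffer is full, the header
// is dropped and will have to be requested again later.
func (x *HeaderExchange) Deposit(header []byte) bool {
	select {
	case x.ch <- header:
		return true
	default:
		return false
	}
}

// Drain is called by the Headers stage and picks up everything deposited
// so far without waiting for more.
func (x *HeaderExchange) Drain() [][]byte {
	var out [][]byte
	for {
		select {
		case h := <-x.ch:
			out = append(out, h)
		default:
			return out
		}
	}
}

func main() {
	x := NewHeaderExchange(2)
	x.Deposit([]byte("header-1"))
	x.Deposit([]byte("header-2"))
	x.Deposit([]byte("header-3")) // dropped: buffer is full
	fmt.Println(len(x.Drain()))   // 2
}
```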

The Merge

The design of “The Merge” (a.k.a. the PoS transition) introduces another, third control flow into the diagram above: the control flow communicating with the Consensus Layer (CL). The details of the interaction between Staged Sync, the Sentry flow, and the CL flow are still being worked out, and will hopefully be described in another post. As one may suspect, things are getting more complicated…
