Post-merge release of Erigon, dropping alpha designation, and progress of Erigon2

Sep 17, 2022

It has been a while since the last update, but summer always messes up the schedule. Now the weather has cooled and the routine has set in. Ready? Let’s dive in :)

Merge

On the 15th of September 2022, Ethereum transitioned to Proof of Stake (POS) by merging two of its networks/chains, and from then on, their blocks are synchronised. What was known as “beacon chain” or “Ethereum 2” (and is now called “Consensus Layer”, or CL) allocates block slots, which are filled by the blocks produced by what was known as “Ethereum 1” (and is now called “Execution Layer”, or EL). You can see this now by inspecting any POS block in a block explorer.

Previously, blocks had one numerical identifier, called “block number” or “block height”. Now each block has three: block height, slot number, and epoch number. Epochs are relevant when we talk about finalisation, as Ethereum POS finalises blocks in whole epochs.
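To make the relationship between slots and epochs concrete, here is a minimal sketch. The constants are the Ethereum mainnet consensus-layer parameters (32 slots per epoch, 12 seconds per slot, and the beacon chain genesis time); the function names and the example slot number are ours, chosen purely for illustration.

```python
# Ethereum mainnet consensus-layer parameters.
SLOTS_PER_EPOCH = 32
SECONDS_PER_SLOT = 12
BEACON_GENESIS_TIME = 1606824023  # Unix time of beacon chain genesis (1 Dec 2020)


def epoch_of_slot(slot: int) -> int:
    """Epoch that a given slot belongs to."""
    return slot // SLOTS_PER_EPOCH


def slot_start_time(slot: int) -> int:
    """Unix timestamp at which a given slot begins."""
    return BEACON_GENESIS_TIME + slot * SECONDS_PER_SLOT


def slot_at_time(unix_time: int) -> int:
    """Slot that is in progress at a given Unix timestamp."""
    return (unix_time - BEACON_GENESIS_TIME) // SECONDS_PER_SLOT


# Illustrative slot number from mid-September 2022 (an assumption for the demo).
example_slot = 4_700_013
print("epoch:", epoch_of_slot(example_slot))
print("starts at:", slot_start_time(example_slot))
```

Note that block height cannot be computed from the slot number in the same way: a slot may stay empty if its proposer misses it, so block height and slot number drift apart over time.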

Although the merge event has been seen as very successful, we see quite a few user issues at the moment, which is somewhat understandable, because the complexity of the system running a node did go up: two pieces of software are required to run a node instead of one, there are multiple combinations of these pieces, and these pieces have their own “sync” lifecycles.

Given the above, some of us think that there will be a need for an embedded CL at some point. Of course, as far as Erigon architecture is concerned, such an embedded component would also be separable, working via gRPC. Perhaps a standard gRPC specification will emerge for EL ↔ CL communication.

Alpha and beta designations

Starting with the release that will be published shortly after this article (2022.09.02, which is not a date but the second release of September 2022), we are dropping alpha and beta designations from our releases. They did make sense in the past, as we were experimenting with a few approaches to managing the code. But recently we decided that designating our releases as alpha and beta does not serve us very well. Firstly, to the best of our knowledge, no other Ethereum implementation uses such designations, and users therefore assume that all other implementations are “production ready”. What “production ready” actually means is relatively subjective. Secondly, there were instances where our alpha designation was used to dissuade users from using our software. We would like node operators to use Erigon if they find it useful, if it provides them with functionality or cost efficiency that other implementations do not provide. As described in the section below, there will be big changes in Erigon for a while, as we are gradually rolling out Erigon 2, which we believe will make Erigon competitive with other implementations according to most criteria. For now, it is up to the users to make their choices.

Erigon 2 progress

As described in one of the previous posts, the plan was for Erigon 2 to include three upgrades, which we now call Erigon 2.1, Erigon 2.2, and Erigon 2.3. That is still the plan. We can say that Erigon 2.1 has been shipped; it is now part of all releases. There are still issues once in a while, but for most users, the functionality works. To recap, Erigon 2.1 has brought:

Infrastructure for using BitTorrent to download and seed block snapshots.
Block snapshots contain information about block headers, block bodies (including transaction payloads), and pre-computed transaction senders.
Compression used inside snapshot files (mostly for transaction payloads) reduces the total disk footprint of a node.
Indexing based on minimal perfect hash tables for looking up transactions by their hashes further reduces the total disk footprint of a node.
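To illustrate why a minimal perfect hash keeps the index small, here is a toy sketch. It is not the algorithm Erigon uses: it simply brute-forces a seed under which every transaction hash lands in its own slot, which is only practical for a handful of keys. All names (`slot_for`, `find_seed`, `TxIndex`) are hypothetical, and the in-memory `records` list stands in for a snapshot file. The point is that the index holds one seed and one position per transaction, without the 32-byte hashes themselves.

```python
import hashlib
from itertools import count


def slot_for(key: bytes, seed: int, n: int) -> int:
    """Map a key to one of n slots using a seeded hash."""
    digest = hashlib.blake2b(
        key, digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little") % n


def find_seed(keys: list[bytes]) -> int:
    """Brute-force a seed under which all keys occupy distinct slots."""
    n = len(keys)
    for seed in count():
        if len({slot_for(k, seed, n) for k in keys}) == n:
            return seed


class TxIndex:
    """Toy index over a non-empty list of (tx_hash, payload) records."""

    def __init__(self, records: list[tuple[bytes, bytes]]):
        self.records = records  # stands in for the snapshot file
        hashes = [tx_hash for tx_hash, _ in records]
        self.seed = find_seed(hashes)
        # positions[slot] = position of the record in the "file".
        self.positions = [0] * len(records)
        for pos, tx_hash in enumerate(hashes):
            self.positions[slot_for(tx_hash, self.seed, len(records))] = pos

    def lookup(self, tx_hash: bytes):
        pos = self.positions[slot_for(tx_hash, self.seed, len(self.positions))]
        stored_hash, payload = self.records[pos]
        # Unknown keys still map to some slot, so verify against the record.
        return payload if stored_hash == tx_hash else None


records = [(hashlib.sha256(bytes([i])).digest(), b"tx%d" % i) for i in range(6)]
index = TxIndex(records)
print(index.lookup(records[3][0]))
```

Because the index does not store the keys, a lookup for a hash that is not in the set still returns some position; the final comparison against the record read from the file is what rejects it.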

Erigon 2.2 is currently out of the prototype phase and is being integrated into the code base. It is not developed in a separate branch, but instead is “hiding” behind an experimental command line flag. Here is what Erigon 2.2 will be bringing:

In addition to block snapshots, all historical data will be provided as snapshots, delivered and seeded via BitTorrent.
State (accounts, contract storage, contract byte code) will still live only in MDBX and will not be delivered via BitTorrent (this should happen in Erigon 2.3).
History of state will become more granular. Instead of being able to query the state of any account, contract storage item, or byte code “as of” a certain block height, it will be possible to query these things “as of” the point before any transaction in any block.
Transaction-level granularity of history will improve the performance of queries like debug_traceTransaction and trace_transaction, but more importantly, of trace_filter, which currently has to re-execute a lot of irrelevant transactions to advance the state from the beginning of a block.
Improved performance of trace_filter may allow block explorers like Otterscan to offer a smoother user experience when displaying the list of transactions for a certain account.
A slightly experimental but promising feature: state reconstitution. Given downloaded history in the form of snapshot files, it is possible to reconstitute recent state by re-executing only part of the transactions that occurred since Genesis. Such reconstitution may also exploit multiple CPU cores. So far, experiments on a relatively powerful machine (16 cores, 128 GB RAM) showed reconstitution for Ethereum mainnet up to block 15m within 11.5 hours. An experiment is now being prepared for BSC (but it takes a long time to generate the history snapshot files).
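The “as of before any transaction” query described above can be sketched as follows. This is a simplified model under our own assumptions, not the Erigon 2.2 storage layout: every transaction gets a global sequence number (`tx_num` here), each key keeps a sorted list of the transaction numbers that changed it, and a binary search finds the last change strictly before the requested transaction. The class and method names are hypothetical.

```python
from bisect import bisect_left


class History:
    """Toy per-key change history indexed by a global transaction number."""

    def __init__(self):
        self._tx_nums = {}  # key -> sorted list of tx numbers that changed it
        self._values = {}   # key -> value written by the matching tx number

    def record(self, key, tx_num: int, value) -> None:
        """Record that tx_num set key to value (call in increasing tx_num order)."""
        self._tx_nums.setdefault(key, []).append(tx_num)
        self._values.setdefault(key, []).append(value)

    def as_of(self, key, tx_num: int):
        """Value of key just before transaction tx_num executes, or None."""
        tx_nums = self._tx_nums.get(key, [])
        # Number of recorded changes made by transactions strictly before tx_num.
        i = bisect_left(tx_nums, tx_num)
        return self._values[key][i - 1] if i else None


history = History()
history.record("alice.balance", 3, 100)
history.record("alice.balance", 7, 50)
print(history.as_of("alice.balance", 7))
```

With block-level granularity, answering the same question for a transaction in the middle of a block means loading the state at the start of the block and re-executing every preceding transaction in it. With transaction-level granularity, it is a single lookup.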

There is still a lot of work to be done for Erigon 2.2, including, but not limited to, proper parallelism of snapshot creation and rewriting most of the RPC methods to support the new state history layout.

Erigon 2.3 is still in the prototype phase (and will likely stay there until Erigon 2.2 is rolled out), though a lot of progress has been made here too. According to the current vision, Erigon 2.3 will bring:

State of accounts, contract storage items, and contract byte code will be split between snapshot files and MDBX (the MDBX part will contain recent updates).
The separation into snapshot files and MDBX is similar to the idea used in LSM (Log Structured Merge) databases, such as LevelDB, RocksDB, etc. Recently updated entries live in “Level 0” (which is usually an in-memory skip list, analogous to MDBX in Erigon 2.3). As they get “more mature”, they migrate to “Level 1” (a bunch of files), and then to “Level 2” (larger files), etc.
The main difference between LSM organisation and Erigon 2.3 is that snapshot files are produced in a deterministic way, and therefore can be shared around the network in the form of BitTorrent swarms.
There will be further decoupling of “commitment” from “state”, with the code designed to easily support multiple commitments simultaneously. This will be quite useful for easier support of Verkle tree commitments, and also for alternative commitment schemes, such as those used in StarkNet.
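The LSM-like split and the role of determinism can be sketched as below. This is a toy model under our own assumptions, not the Erigon 2.3 design: a dictionary stands in for MDBX, a list of frozen dictionaries stands in for snapshot files, and the serialisation format in `freeze` is invented for the example. Deletions and merging of snapshots into larger files are left out.

```python
import hashlib


def freeze(entries: dict[bytes, bytes]) -> bytes:
    """Serialise entries in sorted key order, so equal content gives equal bytes."""
    out = bytearray()
    for key in sorted(entries):
        value = entries[key]
        out += len(key).to_bytes(4, "big") + key
        out += len(value).to_bytes(4, "big") + value
    return bytes(out)


class LayeredState:
    def __init__(self):
        self.recent = {}     # recent updates; stands in for MDBX
        self.snapshots = []  # immutable layers, oldest first; stand in for files

    def put(self, key: bytes, value: bytes) -> None:
        self.recent[key] = value

    def seal(self) -> str:
        """Move recent updates into a new snapshot and return its content hash."""
        snapshot = dict(self.recent)
        self.snapshots.append(snapshot)
        self.recent = {}
        return hashlib.sha256(freeze(snapshot)).hexdigest()

    def get(self, key: bytes):
        """Look in recent updates first, then in snapshots from newest to oldest."""
        if key in self.recent:
            return self.recent[key]
        for snapshot in reversed(self.snapshots):
            if key in snapshot:
                return snapshot[key]
        return None


# Two nodes apply the same updates in a different order.
node_a, node_b = LayeredState(), LayeredState()
node_a.put(b"acc1", b"10")
node_a.put(b"acc2", b"20")
node_b.put(b"acc2", b"20")
node_b.put(b"acc1", b"10")
print(node_a.seal() == node_b.seal())
```

Because the snapshot bytes depend only on the content and not on the order in which a node received the updates, independent nodes end up with identical files, which is what allows them to join the same BitTorrent swarm.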

After the Erigon 2.3 upgrade, we see the Erigon archive node being mostly bootstrapped via snapshot files, and then relatively quickly catching up to the more recent state. Also, the size of the MDBX database file will be quite limited, perhaps fitting entirely in RAM on relatively powerful hardware.

Erigon Blog
