Staged Sync and short history of Silkworm project
At the beginning of 2020, a decision was made to introduce so-called Staged Sync into Erigon (back then, it was called “Turbo-Geth”). The idea is simple but counter-intuitive. In the picture above, it is assumed that an Ethereum implementation processes the blocks received from the network as a series of pipeline stages. The picture shows 7 stages in such a pipeline, marked by 7 colours (in reality, the number does not have to be 7). Intuitively, one would want to maximise the utilisation of multi-core or multi-CPU computers and therefore run many pipelines concurrently. Staged Sync goes against this intuition and proposes a processing model in which all available blocks first go through stage 1 of the pipeline, then all of them go through stage 2 of the pipeline, and so on. Within each stage, concurrency is allowed, but more often than not it is actually parallelism rather than concurrency that matters. Certain stages in the pipeline, for example the verification of signatures, are so-called “embarrassingly parallel” tasks. Others are mostly I/O bound and do not benefit from parallelism; for those, concurrency may even degrade performance due to context switching and lost locality of data access.
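The processing model described above can be sketched in a few lines of Go. This is a minimal illustration, not Erigon's actual code: the type and function names (`Block`, `Stage`, `recoverSenders`, `executeBlocks`) are hypothetical, and the per-block work is replaced by placeholders. The point is the shape: the whole batch of blocks passes through one stage before the next stage begins, and an embarrassingly parallel stage may fan out across goroutines internally while an I/O-bound stage stays strictly sequential.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// Block is a stand-in for a downloaded block; only the fields needed
// for this sketch are present (hypothetical, not Erigon's real types).
type Block struct {
	Number   uint64
	SigValid bool
}

// A Stage runs over the entire batch before the next stage starts.
type Stage struct {
	Name string
	Run  func(blocks []Block)
}

// recoverSenders models an "embarrassingly parallel" stage: each block's
// signature can be checked independently, so the batch is split into
// chunks processed by worker goroutines.
func recoverSenders(blocks []Block) {
	var wg sync.WaitGroup
	workers := runtime.NumCPU()
	chunk := (len(blocks) + workers - 1) / workers
	for start := 0; start < len(blocks); start += chunk {
		end := start + chunk
		if end > len(blocks) {
			end = len(blocks)
		}
		wg.Add(1)
		go func(part []Block) {
			defer wg.Done()
			for i := range part {
				part[i].SigValid = true // placeholder for real ECDSA sender recovery
			}
		}(blocks[start:end])
	}
	wg.Wait()
}

// executeBlocks models an I/O-bound stage: blocks are applied strictly
// in order, with no intra-stage concurrency.
func executeBlocks(blocks []Block) {
	for i := range blocks {
		_ = blocks[i] // placeholder for state-transition execution
	}
}

func main() {
	blocks := make([]Block, 1000)
	for i := range blocks {
		blocks[i].Number = uint64(i)
	}

	// Staged Sync: the whole batch goes through stage 1, then stage 2, etc.
	pipeline := []Stage{
		{"Senders", recoverSenders},
		{"Execution", executeBlocks},
	}
	for _, s := range pipeline {
		s.Run(blocks)
		fmt.Printf("stage %s done for %d blocks\n", s.Name, len(blocks))
	}
}
```

Batching the work this way trades pipeline overlap for data locality: each stage touches its tables in one sequential sweep instead of interleaving reads and writes from many concurrent pipelines.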