The Ethereum beaconchain has been live since the 1st of December 2020. It has over 128,000 validators staking a total of more than 4.1M Ether.
First incident on ETH2
About two weeks ago, on the 24th of April, the beaconchain experienced its first hiccup in operation after running perfectly fine for almost 5 months since inception. Validators running the Prysm implementation were unable to propose blocks due to an issue with processing ETH1 staking deposits. The problem was quickly identified and a bug fix was released about 12 hours after the first problems occurred. Throughout this incident the ETH2 chain kept running and the only consequence was that validators lost about 0.52 on average in rewards. You can find more information about the background and lessons learnt in Prysm’s post-mortem. From the perspective of beaconchain health, there are two main takeaways from this incident:
- Having four production-ready ETH2 implementations turned out to be critical once again, as we have already seen in Ethereum’s history: non-Prysm validators were able to keep producing blocks and keep the chain running. If ETH2 had had only the Prysm implementation, the beaconchain would have come to a halt.
- Furthermore, this incident gave us a good overview of the distribution of client implementations among the ETH2 validators. During the two hiccup periods of this incident about 71% of blocks were not proposed. Given that under normal circumstances about 1% of blocks are not proposed, we can estimate that the share of validators running a Prysm setup is likely more than ⅔.
Given that we have four excellent ETH2 implementations (Lighthouse, Nimbus, Prysm and Teku) but more than ⅔ of the ETH2 validators are running the Go implementation Prysm, it is fair to ask whether the beaconchain is in a healthy state and whether we as the Ethereum staking community need to take action.
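The estimate in the second takeaway can be reproduced with a back-of-the-envelope calculation. The sketch below is only an illustration: it assumes that block proposers are drawn evenly from all validators, that every validator on the affected client missed its proposals during the incident, and that everyone else kept the normal miss rate of about 1%. The function name is made up for this example.

```python
def estimate_client_share(missed_during_incident: float, baseline_missed: float) -> float:
    """Estimate the share of validators running the affected client.

    Assumed model: missed = p + (1 - p) * baseline, where p is the share of
    validators on the affected client. Solving for p gives the formula below.
    """
    if not 0 <= baseline_missed < 1:
        raise ValueError("baseline_missed must be in [0, 1)")
    return (missed_during_incident - baseline_missed) / (1 - baseline_missed)


# Numbers quoted in the text: ~71% missed during the incident, ~1% normally.
share = estimate_client_share(0.71, 0.01)
print(f"Estimated share of the affected client: {share:.1%}")
print("More than two thirds:", share > 2 / 3)
```

With the numbers from the incident this gives roughly 70.7%, which is where the "more than ⅔" estimate comes from. If some Prysm validators were still able to propose, the real share would be higher than this estimate.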
What does ‘healthy’ actually mean in the context of a consensus system?
The ETH2 beaconchain is a distributed consensus system. As Leslie Lamport showed in 1977 in ‘Proving the Correctness of Multiprocess Programs’ there are two properties needed to achieve correctness in distributed systems: liveness and safety.
In layman’s terms, liveness is the property that eventually something good will happen, that we are making progress. In the context of the beaconchain it means that we will eventually reach finality.
On the other hand, safety means that nothing bad will ever happen. In other words, for ETH2 it means that we don’t make inconsistent decisions and that we don’t finalize two inconsistent epochs.
So in order to keep the beaconchain healthy, we need to avoid liveness and safety failures. But what can we, the Ethereum staking community, do to keep the beaconchain healthy and how can we best avoid liveness and safety failures?
Diversity in staking setup matters
There are many ways to look at this topic; I’d like to approach it from the validator’s standpoint.
At the beginning of the staking journey each Ethereum staker has to make some important decisions. If they have 32 ETH or more and want to stake by themselves, the first questions will be about their staking setup. ‘Where to host, what type of hardware or which ETH2 implementation to use?’ will come to mind.
On the other hand, everyone who doesn’t have 32 ETH but wants to take part in the staking experience needs to choose a staking provider. There are many aspects to finding the staking provider that fits your needs. One aspect that is often overlooked but is very relevant for the beaconchain health (and thus also for the expected rewards, see anticorrelation incentives below) is the staking setup of the staking provider.
The staking setup is mostly relevant because correlated failures are the biggest threats to safety and liveness. As a rule, multiple individual failures spread out over time are not nearly as bad for liveness and safety as many failures occurring at about the same time.
A good overview of the type of failures that can occur to each individual staker can be found in Carl’s blog post: Validated, staking on eth2: #6 — Perfect is the enemy of the good
Carl’s article also highlights the anti-correlation incentives that are in place in ETH2. Mechanisms like an increasing penalty for offline validators when ETH2 is not finalizing, or increased slashing penalties when other validators are slashed in a similar timeframe, incentivize you to have a staking setup that doesn’t fail when other validators fail.
One way to look at it is ‘diversity increases robustness’. The idea is the following: the less similar the validators’ setups are, the less likely they are to fail at the same time.
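To make the slashing part of these incentives more tangible, here is a simplified sketch of how a correlation penalty can scale with the amount of stake slashed around the same time. It is loosely modelled on the slashing logic in the beacon chain specification, but it is not a copy of it: balances are plain ETH floats rather than Gwei integers, the initial slashing penalty and the offline penalties are left out, and the function name and the default multiplier of 1 are assumptions made for this example.

```python
def correlation_penalty(
    effective_balance: float,
    slashed_balance_in_window: float,
    total_active_balance: float,
    multiplier: int = 1,  # assumed value; the real parameter is set by the spec
) -> float:
    """Extra penalty for a slashed validator, growing with correlated slashings.

    The penalty is the validator's balance times the (capped) fraction of the
    total stake that was slashed in the same window.
    """
    adjusted = min(slashed_balance_in_window * multiplier, total_active_balance)
    return effective_balance * adjusted / total_active_balance


TOTAL_STAKE = 4_100_000  # ETH, roughly the figure quoted at the top of the post

# Compare an isolated slashing with increasingly correlated ones.
for slashed_fraction in (32 / TOTAL_STAKE, 0.01, 0.10, 1 / 3):
    penalty = correlation_penalty(32, slashed_fraction * TOTAL_STAKE, TOTAL_STAKE)
    print(f"{slashed_fraction:.4%} of stake slashed -> extra penalty {penalty:.4f} ETH")
```

The point of the sketch is the shape of the curve: an isolated slashing costs next to nothing extra, while being slashed together with a large part of the network costs a large part of your balance.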
Diversity can be achieved not only across the overall ETH2 network but also within a single staking setup.
I’d like to go over the 3 most important aspects for the staking setup when it comes to avoiding correlated failures.
1. Hosting location: If ETH2 validators are distributed among many different hosting providers in various geopolitical regions, it is no big threat to the liveness of the beaconchain if a single cloud hosting provider, say AWS, has a major downtime or decides to ban crypto services. But if the majority staked on AWS, this would have a strong impact on the liveness of the beaconchain. The ETH2 chain would stop finalizing, and the longer the chain stayed in this state, the larger the penalties for offline validators would get. Therefore it is a good idea not to host your validators at a location where a lot of other validators are running their staking setups. Hint: although cloud providers might be the most convenient option, they are also likely to be the most populated by other validators. Solo stakers could consider staking from home or with local VPS hosters, whereas professional and institutional stakers should consider having systems in place that host on multiple providers, or even in local data centers or on their own servers.
2. ETH2 software stack: If the ETH2 validators use diverse software stacks, such as different beacon node or validator client implementations, this reduces the threat to liveness and safety if one of the implementations has a bug. The incident mentioned earlier would have had almost no impact if the ETH2 clients were distributed more evenly between validators. Anti-correlation incentives in ETH2 discourage staking setups with the same software stack as the majority of the network.
In addition to the built-in ETH2 incentives, the Ethereum staking community should encourage software stack diversity even more strongly, at least the usage of a non-majority Ethereum software stack (for now that means non-Prysm, non-Geth). Improving the UX for switching from one ETH2 implementation to another would be an important step towards a more evenly distributed use of implementations. More professional setups or solo-stakers with multiple validators should also consider setups with multiple clients, for example running multiple beacon nodes (setting one as failover beacon node) or multi-client software like Vouch.
3. Centralization & mono-cultures: If there is a single entity running a lot of validators, and particularly with the same staking setup, it can become a threat to liveness as well as safety for the beaconchain. Control over many validators can lead to byzantine behavior and malicious attacks. Likewise, if one entity runs many validators with the same setup, then a failure of that setup can result in a correlated failure caused by just that single entity.
Currently the stake distribution looks as follows (source: beaconcha.in):
The biggest ETH2 staking entities we can identify right now are two exchanges:
Kraken with a little less than 17% and Binance with about 7% of the total stake. The other high-stake entities are mostly staking pools like Bitcoin Suisse, Staked.us, Lido and Stakefish.
Given that a collusion of malicious actors would need about ⅓ of the total stake at worst (less with weaker synchrony assumptions / high network latency) in order to stall the chain and attack liveness, we need to keep an eye on the stake distribution.
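As a quick sanity check on how far the largest entities are from that threshold, here is a small sketch. It assumes the approximate shares quoted above (17% for Kraken, which rounds up "a little less than 17%", and 7% for Binance) and treats "more than one third of the stake" as the point at which finality can be stalled. The names and the helper function are made up for this example.

```python
from fractions import Fraction

# Finality needs two thirds of the stake to attest, so withholding more than
# one third is enough to stall it (simplified, assuming a synchronous network).
STALL_THRESHOLD = Fraction(1, 3)

# Approximate shares quoted in the text; treat them as rough inputs.
stake_shares = {
    "Kraken": Fraction(17, 100),
    "Binance": Fraction(7, 100),
}


def can_stall_finality(share: Fraction) -> bool:
    """Return True if the given stake share alone exceeds the stall threshold."""
    return share > STALL_THRESHOLD


combined = sum(stake_shares.values(), Fraction(0))
gap = STALL_THRESHOLD - combined

print(f"Combined share of the two exchanges: {float(combined):.1%}")
print("Enough to stall finality:", can_stall_finality(combined))
print(f"Additional stake needed to reach one third: {float(gap):.1%}")
```

Under these assumptions the two exchanges together hold 24% and would need a bit more than 9% of additional stake to cross the line, which is not much once you add a few of the larger pools.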
If you are considering staking with a staking service since you can’t run a validator by yourself, please keep in mind that it is not only healthiest for the beaconchain but also in your best interest to avoid centralization of stake.
Staking providers should act responsibly and should keep in mind that a failure in their system can have a high impact on the ETH2 network. Especially since anti-correlation incentives will result in higher offline and slashing penalties when nodes fail together (crash and safety faults), pools should consider architectures that have diverse setups and spread control over the validators to multiple entities, thus keeping the number of validators run by one entity low.
Secret shared validators (SSVs) are a very promising technology that will help increase the robustness of all types of staking setups: solo-stakers, staking providers as well as institutional stakers. SSV nodes are currently under development by the Blox team, who received a grant from the Ethereum Foundation earlier this year. Roughly speaking, SSVs work by splitting the validator key into multiple pieces and then running a Byzantine fault tolerant protocol so that the holders of the pieces come to consensus internally first. That means that you can run each SSV node (each of which holds one piece of the validator key) on a different staking setup configuration or on a different hosting provider.
Please also check out this excellent blog post by Mara written earlier this year on how SSVs work and how they can help make staking setups more robust. SSVs can not only increase robustness of staking setups but might also bring new pooling options to the table like friends & family pools or trust-minimized staking pools.
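To give a feeling for what "splitting the validator key in multiple pieces" can mean, here is a toy sketch of Shamir secret sharing, where any 3 out of 4 pieces are enough to recover a secret number. This is only an illustration of the key-splitting idea and not how SSV nodes are implemented: the field size, the function names and the 3-of-4 setup are assumptions for this example, and to my understanding real designs combine partial signatures so that the full key never has to be put back together in one place.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime, used as a toy field size


def split_secret(secret: int, threshold: int, num_shares: int) -> list[tuple[int, int]]:
    """Split `secret` into `num_shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in [0, PRIME)")
    if not 1 <= threshold <= num_shares:
        raise ValueError("need 1 <= threshold <= num_shares")
    # Random polynomial of degree threshold - 1 with the secret as constant term.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares: list[tuple[int, int]]) -> int:
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


toy_key = 123456789  # stand-in for a validator key, not a real key format
pieces = split_secret(toy_key, threshold=3, num_shares=4)
print("Recovered from pieces 1-3:", recover_secret(pieces[:3]) == toy_key)
print("Recovered from pieces 2-4:", recover_secret(pieces[1:]) == toy_key)
```

In a setup like this, one of the four nodes can be offline or misbehaving and the remaining three can still act, which is exactly why each piece can live on a different client or hosting provider.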
Overall we can say that, despite the fact that we still need to work on client diversity and pay attention to possible centralization risks, the beaconchain has been running almost perfectly for its first 5 months. Huge shout out to all client developers and researchers who were involved in bringing us this far and who are currently working hard on the merge.
We have four excellent ETH2 implementations in our community maintained by four awesome teams. That said, Prysm, as much as we love you, for the sake of a healthy network let us try to give Lighthouse, Teku and Nimbus some more love. ❤
As always, feel free to reach out to me on Twitter @phil_eth or on the Ethstaker discord @phil.eth. Ethstaker is the warm and welcoming home for all Ethereum stakers and future stakers. If you haven’t joined yet, come by and say hello (: