Blockchain Intro Series: Consensus Mechanisms

Disclaimer: The content of these research summaries has been written after a year of reading, researching and writing about blockchain technologies and applications. Definitions may vary depending on the paper cited. The summaries provided are subject to further iterations; whereby, the first version relies on my personal understanding of the industry and the technologies. Most of it is based on informal discussions, academic papers, industry whitepapers and primary research. These research summaries may foster from previous research but do not replicate any ideas or content created previously.

Overview

Consensus mechanisms are the backbone of replicated ledgers to ensure consistency between existing blocks and new data appended to the ledger. In the case of centralised systems, the state is dependent on one main authority or a centralised group, which controls the information flow and the validity of data appended to the ledger. Assuming that a blockchain is a replicated ledger across all individual nodes in the network, whereby each node is able to append new data to the network, the system will require protocols to verify new information submitted by individual nodes. Otherwise, all nodes would have to trust each other to act honestly. In that case, the system would not be trustless nor tamper proof.

Replicated systems, such as blockchains, cannot depend on a central authority; otherwise, the ledger would not be trustless since all nodes in the system would have to trust this centralised point of validation.

Consensus mechanisms are defined within consensus protocols that guide trustless decision making. Protocols provide a set of guidelines on how processes are executed and decisions are made between various participants. The protocols that keep the ledger in synchrony by verifying new information appended to the ledger may be referred to as consensus protocols.

In the case of blockchain, node Alice may have 0.5 Ether (eth) in her wallet and wishes to transfer 0.1 eth to node Bob. Bob would be quite happy about the additional funds provided by Alice. However, every other node in the system wants to ensure that Alice is not malicious and creates the 0.1 eth transferred to Bob out of thin air if she does not have the funds. In our case Alice has the sufficient amount of funds available to transfer 0.1 eth to Bob. If Alice might not have enough funds available, she might wish to bribe other nodes in the network to transfer funds, which she does not have. Consensus protocols require nodes in the network, called validators to check whether or not Alice, or any other nodes that submits a transaction, has enough funds available to do so. Only if ⅔ of the nodes, i.e. the majority, agree that Alice has enough funds to make the transaction and that the transaction is valid, Alice’s and Bob’s account will be updated.

Depending on the consensus protocol in place, the requirements and processes to validate new transactions will vary. Generally, consensus protocols define the following:

Who gets to be a validator and participate in consensus,
For how long validators are responsible to verify transactions,
What happens if they behave maliciously, and
The processes that validators have to participate in.

The goal of consensus protocols is to allow a majority of honest nodes in the system to reach consensus even in the case of an adversary.

This summary on Consensus Mechanisms provides an overview of the design trade-offs that consensus protocols aim to optimise, a summary of different types of consensus protocols, and obstacles that consensus protocols are confronted with.

Consensus design and implications

Leaderless vs. leader models

In the case of public blockchains, all nodes should be able to participate in the consensus protocol. The consensus design should not place trust on individual nodes more than other nodes in the system. In comparison, consortium blockchains generally rely on a selected group of nodes to reach consensus. This group may either be selected by all nodes in the network or by the development team of the ledger. The latter also applies for private blockchains. The selected group or individuals who decide upon the consensus within the network may be referred to as leaders.

There are various tradeoffs between leaderless and leader based consensus mechanisms. As mentioned earlier, the more a consensus protocol relies on a central authority to reach consensus, the more centralised the protocol will be. The implications of a centralised ledger have been discussed in the summary of Replicated Systems. While leaderless systems tend to be more transparent and trustworthy since no single authority gets to make the decisions, leader based systems do not struggle with the same design obstacles, such as scalability. In fact, leader based consensus protocols tend to be highly scalable but also less decentralised and secure. If the elected leader is malicious, they will be able to manipulate the ledger and append false information or tamper with existing ones. While leaderless systems are still subject to attacks as outlined in the overview on Replicated Systems, they benefit more from blockchain design properties such as transparency, tamer-resistance, and trustlessness.

As a result of the centralisation of leader based consensus protocols, consensus can be reached faster per consensus round since they require less communication between all nodes in the network. In the case of a leader based system, nodes only have to talk to the other leaders, and once they reached agreement, communicate the result to the crowd. In comparison, a leaderless system requires hundreds of nodes to provide their sentiment to their peers and once there is a majority, the results may be broadcasted to all other nodes. Depending on the size of the network, the broadcasting of messages will take time. (For more detail, please refer to the summary on Scalability Solutions). In contrast, in leader based systems, all network participants will have a central point of reference and nodes are more likely to be in synchrony with each other than in leaderless systems.

Figure 1 Leader model overview

Figure 2 Leaderless model overview

Trade-offs & theories

Like described in the overview of Replicated Systems, blockchains aim to optimise between decentralisation, security, and scalability. If a blockchain is highly scalable it might either be in return less secure and/or less decentralised, and vice versa. Generally, it is thought that blockchains can reach at most two of the described properties. The difference is that properties, such as decentralisation and scalability, are constantly visible in the network. In every transaction, users will realise how much contribution power they have and how fast transactions are processed and finalised.

Furthermore, the consensus protocol design may also influence design trade-offs that are not directly visible to every node but become crucial in the case of a network partition. An example is the trade-off between availability and consistency. When a network is partitioned, one part of the network may not be able to communicate with the other part(s) of the network. Resulting, both parts are separated from each other. This type of network failure might be caused by a network fault, i.e. some processes did not work as intended, or because of an attack. Once this happens, the network, such as a blockchain, has no choice other than optimising for either availability or consistency. In case the network optimises for availability, both partitions can still access their part of the network as usual, and submit transactions, however, they will not be able to access the other part of the network. This optimisation may result in network inconsistencies since transactions cannot be synchronised across all nodes. Once both partitions reunite, the entire network has to figure out what information remain valid and which ones are reversed or disregarded.

Centralised systems usually prioritise consistency since the service is provided by a central data point. In case all servers, which provide the service fail, the users cannot access the service anymore. If nodes are unable to submit transactions during server failure, there is little to no risk that inconsistencies may arise. In contrast, one of the goals of replicated ledgers, such as blockchains, is to allow users to manage their funds 24/7 and submit transactions. In the case of a partition, these transactions might not be acted upon. Ultimately, the ledger is available but might not be consistent across nodes. Note that there are also blockchains that priorities consistency once a network fault, such as a partition, is detected.

The FLP impossibility theorem defines the effect of a partition on an asynchronous network more specifically. Broadly defined, an asynchronous network allows nodes to send and receive messages in an unordered manner. In contrast, a synchronous network requires users to wait for a message request to trigger a response. The messages are ordered within their transaction flow. Overall, asynchronous networks require a messaging overhead. If node A sends a message to node B and does not receive a reply, then it simply resends the message over and over again until it can be assured that one of the messages must have been received by B. In contrast, in a synchronous network A would wait for a failure message by B instead of bombarding B with messages. The FLP theorem states that in an asynchronous network, the network can reach at most two of the following properties during a partition: Safety (the majority of nodes will eventually agree on the same value), Liveness (agreement will be reached so that the state of the network can evolve), and Fault-tolerance (even during a network fault, the system can still operate and eventually recover from the network fault).

Fault tolerant hereby refers to the malfunctioning of entities within the network, this may vary between stagnation of the network by slowing down consensus up to a complete halt of all operations in the network. Note that a system is not automatically fault tolerant by design. Instead, the network design has to priorities either of the three properties described by the FLP theorem. Arguably, it should always aim to ensure fault tolerance to allow for fault recovery and to prevent a complete crash of the network, in combination with either Safety (similar to consistency) or Liveness (similar to availability.)

Figure 3 FLP theorem

The following section provides a list of various consensus protocols, their mechanisms, and the trade-offs, they prioritise.

Types of consensus protocols

Within the past ten years, more and more projects evolved around replicated, blockchain based, ledgers; each of which has their own variation of consensus protocols. This section will not provide a detailed analysis of each consensus protocol but rather an overview.

Note that most consensus protocols rely either on the Proof of Work (PoW) protocol, which was first implemented by Bitcoin, or the Proof of Stake (PoS) protocol. While PoW and variations of PoW are already running on several blockchains, PoS mainly has test implementations and design proposals. Overall, all consensus protocols in public ledgers rely on the assumption that nodes, who are willing to provide more resources than other nodes in the system, are more trustworthy. Meaning, if a users puts more stake in the game, she/he has more to lose than other users, who do not provide as many resources. Thus, it is assumed that this user will be more cautious in their decision making, not to lose the provided stake. If a node behaves maliciously and the network detects such behaviour, the stake of the node may be slashed; i.e. the funds provided by the node are taken away and get redistributed across other nodes in the network.

PoW

The PoW protocol relies on nodes, called miners, to provide computing power to the network by solving cryptographic difficult puzzles. Miners will take a list of transactions from a transaction pool and bundle those together into the next block. However, the block is only valid if the miner finds the hash of the block which references all transactions within the current block and the previous block. The hash of the block has to start with a predefined set of zeros. To generate a valid hash, miners have to find the right nonce to append to the block data. The nonce can be changed to identify the valid hash. (For more information on what a hash etc. is, please refer to the summary on Cryptography.)

In return of solving the puzzle, miners will receive a block reward and transaction fees from the transactions included into the block. Each block has a size limit. Therefore, the miner will have to decide whether to include into the block large transactions with high transaction fees or a bundle of smaller transactions with a set of smaller fees. This is up to the miners. A miner could also publish an empty block by giving up on potential transaction fees. Thus, transaction fees serve as a reward for miners for including a transaction into the block.

In the case of Bitcoin, the block header is based on the SHA-256 hash. Generally, the hash of the next block should be relatively difficult to find but easy to verify by other miners and nodes in the network. The difficulty of finding the right nonce increases over time to ensure that new blocks are mined within the same time interval. In the case of Bitcoin, this is approximately every ten minutes. Once a miner finds the next block, she/he will broadcast the block to the network and collect the transaction fee and the block reward. All other nodes in the network (full nodes and miners) will have to verify the validity of the block first, by computing the hash, before they continue mining the next block etc. It is assumed that nodes will verify new blocks without receiving any rewards since it is in the interest of every honest user on the chain to ensure the validity and consistency of transactions in the network.

PoW faces some criticism, mainly that it is time and energy expensive. Thus, mining blocks does not pay off for most users. Additionally, depending on the hash algorithm used in PoW (for more information, please refer to the summary on Cryptography), users are able to build custom hardware to solve the puzzle faster than it can be done with the use of normal hardware. This hardware is specialized in running reoccurring calculations, such as trying different nonces for finding the hash, faster than normal hardware. PoS consensus protocols are supposed to provide an improvement upon PoW protocols by making the validation and block discovery more accessible to users.

PoS

PoS requires users to place a stake in the network to participate in consensus. The stake is generally the tokens used in the network. Once a user has placed a stake, it is locked up during the duration of the current consensus round. Summarising, all nodes in the network, who wish to participate in the consensus, lock up a stake. Either all nodes who have locked up a stake, the n number of nodes with the highest stake, or all nodes who have submitted a minimum stake, are chosen as validators. Validators are usually in place for one or a certain number of consensus rounds. These consensus rounds are generally referred to as epochs. Epochs may be divided into additional consensus steps. Once the majority of nodes have reached consensus on the next block, it is appended to the ledger. The nodes, who have followed the protocol honestly and submitted the right responses, may receive their stake back and a portion of the transaction fees. The portion they receive may either be for all nodes the same or depend on other parameters, such as the amount of stake they locked up. The latter suggests that nodes have taken higher risks than other nodes and thus, should receive a higher reward.

Depending on the PoS protocol in place, the number of nodes, who are able to participate in consensus, will vary. Also, consensus may be reached faster with a higher level of finality in some protocols over others. Some PoS protocols rely on a leader/block proposer who is randomly selected from the set of validators. Any validator may be chosen to be the block proposer. Thus, all validators may have to have a set of transactions ready to propose in the next consensus round.

Tendermint

Depending on the variation of PoS protocol, each epoch will have various steps. An example of a PoS consensus protocol is Tendermint. Tendermint is divided into Tendermit Core and Tendermint ABCI. The latter is the modular application interface. Tendermint Core is the consensus protocol of Tendermit, which ensures that all transactions are appended to the ledge in sequential order.

The consensus of Tendermint in based on Byzantine Fault Tolerance (BFT). Validators have to provide a stake to the network to participate in consensus. In case the validators behave maliciously, their stake is slashed. Depending on the implementation of Tendermint, either a subset of nodes, who provided a stake, all nodes that provided a stake, or only the nodes, who provided at least a minimum stake, are chosen as validators. Each consensus round (epoch) is divided into four distinct consensus steps: (1) propose, (2) prevote, (3) precommit and (4) commit. The stake of a node is split up and distributed across the different consensus rounds. After each successful consensus round, part of the stake will be unlocked and given back to the validator nodes.

First, out of the set of available validators, a block proposer is chosen, whose responsibility is to propose the next block to the network by broadcasting the information of the block to all other nodes in the network. This is the propose step (1). In case the chosen block proposer fails to broadcast a block to the network, her/his stake is slashed and a new block proposer is selected. Moving on to the prevote step (2). Once a validator node receives the proposed block, they will sign and broadcast the block to other nodes in the network. Note that a validator may receive proposed blocks from previous consensus rounds. If those are valid, the validator may sign those blocks, too. Thus, not all nodes are in synchrony on the state of the network. At least ⅔ of validators have to sign the block in the prevote step (3). If a validator node x receives a block in the prevote phase that does not have ⅔ of validator signatures from the propose stage, the validator node x may not sign the block at the prevote step. In contrast, if the block broadcasted in the propose step has received ⅔ of signatures from all validators, then the validator nodes can sign and rebroadcast the block to the other validators. Lastly, nodes may only sign the block in the commit step (4) if they have received ⅔ of signatures in the precommit step. This is similar to the precommit step. Furthermore, the commit step is dependent on a time interval. Once validators have received the block, they have a pre-defined time period to sign off the block and declare it to be final.

Similar to other consensus protocols, Tendermit has several disadvantages. First, the protocol has to be able to identify roughly the size of the network to ensure that consensus is reached by at least ⅔ of nodes in the network. This is relatively easy if only a subset of randomly selected nodes are allowed to participate in the consensus. Also, as mentioned earlier, nodes may not be in the same consensus round. Resulting, the network has to deal with latency between consensus nodes. Depending on the level of fault in the network this latency may increase arbitrary. However, nodes may still be available to submit transactions to the network, even if the network will not reach consensus. Therefore, Tendermint prioritises availability over consistency.

Figure 4 Simplified overview on the different steps of Tendermint consensus rounds

Delegated Proof of Stake (DPoS)

DPoS is the most similar to current representative politics structures. Nodes do not place a stake on themselves but instead on representatives. The n number of nodes who got the most stake assigned to them will be the representatives, also called witnesses. Resulting, nodes, who have a higher stake, are able to influence the election of representatives more than nodes who have a smaller stake. The nodes, who received the most stake in the election, will become the validators or representatives in the system. The problem with most implementations of DPoS is that the representatives remain ‘in power’ not only for one validation round or epoch but usually for several validation rounds. Each time validators reach consensus on the next block, they will receive a reward for their work. Resulting, with each round validators will get richer and richer, causing a plutocracy. This is one of the main criticisms of DPoS.

Note that in most blockchain networks, it is highly difficult to proof someone's identity. Since no one can proof that one of the staking nodes is the same as a validator node, nodes can also place a stake on themselves. Once validators have already a large stake in the network, there is little that other, smaller nodes in the network, can do to counteract this effect and elect someone else to be the representative. The rich will remain in power until they decide to delegate the decision making rights to someone else. Arguably, similar can be said for PoS systems. However, DPoS protocols usually only rely on a few validator nodes (e.g. 9, 21, 71). Since the transaction fees and validator rewards only have to be shared between a few nodes, each individual node will receive more funds. Generally, it is easier to over-stake one node (as in PoS) instead of a group of nodes that rely on their ‘leader’ (DPoS).

Figure 5 A representation of the staking dynamics in DPoS.

Figure 6 In comparison to DPoS, PoS allows individual nodes to compete for a validator position.

Furthermore, DPoS systems have been associated to bribing attacks, in which powerful validators bribed other users to stake on them to remain validator. In return for staking on the validator, nodes would receive a percentage of the validation reward. You can read more on this here.

Additional variations of consensus protocols

Obstacles of Consensus protocols

Consensus Protocols have different mechanisms to declare the finality of transactions. This makes it difficult to reach cross chain interoperability if transactions are declared final on one chain but not on the other. In PoW, with every block added on top of block x, the probability of the transactions being final just increases; it is theoretical still possible to mine on the first or n block in the chain and overtake the current chain as the longest chain. It probably be perceived as a fork from the current chain and disregarded but in theory, another chain could overtake the current chain and make all transactions invalid. In contrast, BFT based consensus protocols, such as Tendermint offer high finality of the transactions once they are appended in a block on the current chain.
Ultimately, consensus protocols have to optimise for design trade-offs described earlier. The most decentralised protocols generally process the fewest transactions per second but provide higher transparency and security in the network.
Consensus Protocols enhance various attack vectors, which will be discussed in a future iteration of this summary.

Main Points

Consensus mechanisms are defined within consensus protocols that guide trustless decision making.
The more a consensus protocols relies on a central authority to reach consensus, the more centralised the protocol will be.
The FLP theorem states that in an asynchronous network, the network can reach at most two of the following properties during a partition: Safety (the majority of nodes will eventually agree on the same value), Liveness (agreement will be reached so that the state of the network can evolve), and Fault-tolerance (even during a network fault, the system can still operate and eventually recover from the fault).
If a node behaves maliciously and the network detects such behaviour, the stake of the node may be slashed; i.e. the funds provided by the node are taken away and get redistributed across other nodes in the network.
Transaction fees serve as a reward for miners/verifiers for including a transaction into the block.
The PoW protocol relies on nodes, called miners, to provide computing power to the network by solving cryptographic difficult puzzles.
PoS requires users to place a stake in the network to participate in consensus.
Depending on the variation of PoS protocol, each epoch will have various steps. An example of a PoS consensus protocol is Tendermint. Tendermit prioritises Availability over Consistency.
DPoS is the most similar to current representative politics structures; but has been associated to enhancing inequalities between nodes and various attacks.

Update 22.07.2019:

A new paper outlines the design of a distributed ledger, which orders transactions by decoupling consensus mechanisms and transaction validity. Summarising, nodes will are merely required to validate the transactions that validate the state of the application in use. “The ‘data availability problem’ asks how a client– such as a light client–that only downloads block headers, but not the corresponding block data (e.g., list of transactions), can satisfy itself that the block data is not being withheld by the producer of the block (e.g., a miner), and that the full data is indeed available to the network” -- a solution to this problem has been proposed by Mustafa Al-Bassam et al. via erasure coding.

LazyLedger divides the network into three different types of nodes: Consensus Nodes, Storage Nodes, and Client Nodes, which are all connected in a peer to peer network. If A is not a dependency for B, then A is irrelevant for the users of B. For the contrary, an application may only be executed if another application is within a certain state.

Furthermore, this paper outlines a consensus mechanism design that does not depend on token holders to collateralise value in the network. Instead it implements similar principles as the web of trust. Thus, allowing more participants to contribute in the consensus, making the network overall more decentralised.