The Family of Asynchronous BFT Consensus Protocols

This survey is inspired and supported by the Dora Factory team during the research into practical ABFT implementations on orbital communication and computation systems.

In an asynchronous environment, there is no bounded time for a message sent from one to another. In this article, we survey the family of asynchronous BFT consensus protocols, analyze security and performance of several important protocols.

Introduction of blockchain as a distributed system

In this article, we will go through the family of Asynchronous Byzantine Fault Tolerant (ABFT) consensus in blockchain systems. So let's briefly (re)-introduce blockchain.

A transaction-based state machine

As a state machine, a blockchain system stores the complete states for multiple objects (expressed by addresses) from the beginning time (birth of the objects in the system) to the present. The system also stores deterministic logics (e.g. encapsulated in smart contracts) that can be invoked by transactions.

Transactions are the inputs of the system. Each transaction encapsulates a set of data. The data can be logics, addresses, or state data that has been recorded in the system before. Each transaction is sent to the system to invoke one or more certain logics that have been deployed in the system. The contents of the transaction are treated as inputs to the certain logics to be invoked. After that, the system exports a set of outputs that will be regarded as the new states. The transition process can be described by the following equation:

=

where is the state transition function; the transaction and the new states are the input and outputs of this function.

Next, we should identify the actors in the system.

The actors of the system

Nodes. We assume the system (blockchain) has nodes. Each node can communicate with other nodes to exchange information. They record and update the system's state. Importantly, they are responsible for maintaining the security of the system.

Users. Participants of the system. Each user is an object with a unique identity (address), although in reality a user often has more than one identity in the system.

Consensus protocol. Conceptually, consensus protocol is a set of ordered instructions that stipulate the behaviors of every honest node. In order to keep the system secure and high-performant, we must figure out how to coordinate the nodes via consensus protocols.

The assumptions of an asynchronous system

In an asynchronous network, we must assume:

  • Asynchrony. there is no bounded time for a message sent from one to another;
  • Unforgeability. if a node sends a message to another it connects, no one can modify the message.
  • Termination. if a message has been sent from one to another, the another will eventually receive this message.

The three assumptions are crucial to guide us to construct an ABFT consensus in an asynchronous environment. (1) is the main assumption of ABFT protocols, a property that differentiates ABFT from other BFT protocols; (2) is easy to achieve through asymmetric cryptographic technologies such as digital signatures; (3) is an assumption that corresponds to an asynchronous environment. We will see that without this assumption, it's going to be very hard to construct a feasible ABFT consensus protocol.

One more (advanced) assumption

  • Adaptive adversary: An adaptive adversary is not restricted to choose which parties to corrupt at the beginning of an execution, but is free to corrupt (up to ) parties during a consensus process.

Some important metrics

  • Quality: the probability that the decision value was proposed by an honest node.

Here, "propose or proposed" means the opinion of a node to an object.

  • Asymptotically optimal time: if Byzantine Agreement is solved using an expected O(1) running time, we say asymptotically optimal time.

expected O(1) means 1) only one instance is operated; and 2) the instance can terminate within certain steps with overwhelming probability.

  • Asymptotically optimal word communication: An ABFT consensus protocol solves Byzantine agreement using an expected message and each message contains just a single word where we assume a word contains a constant number of signatures and domain values.

The goals of the system

For security, we must guarantee:

  • (1) the inputs of the system must be verifiable and only the inputs that are verified as valid will be accepted by the system;
  • (2) the logics that are invoked by the inputs are executed correctly.
  • (3) if (1) and (2) hold, all honest nodes in the system must output the outputs of (2).
  • (4) the system must guarantee liveness, i.e., as long as a correct input has been sent by a user to an honest node of the system, the input will be eventually executed to export its output (new state).

Actually, (1) is not explicitly emphasized in current ABFT consensus. However, (1) is important to let honest nodes to only send valid messages to others at the beginning of executing an ABFT consensus protocol.

(2) responds to the execution of smart contracts or the instructions defined by an ABFT consensus protocol.

(3) emphasizes the consistency, i.e., every node has the same view on the corrected outputs. (some paper says (3) as totality property)

(4) signifies the liveness property. It means an unified consensus will be eventually reached by all honest nodes if all of them executes the consensus protocol.

The two properties, consistency and liveness, define the security of the system. Actually, (1), (2), (3) is used to guarantee consistency.

Performance metrics of the system

  • Throughput. The number of transactions the system can process at a certain time slot, e.g., 1 seconds, denoted by tps.
  • Latency. the duration from the time (a correct transaction sent to an honest node) to the time (the transaction is recorded in the system).

We always hope for high throughput and low latency.

Constraints of a system

Since the system is operated and maintained by its nodes, the hardware bottlenecks of the nodes actually indirectly constrain the performance of the system. Generally, three capacities mainly determine a node's hardware capacity.

  • Processor (CPU): CPU is responsible for performing various computational tasks. In blockchain networks, nodes need to perform tasks such as verifying transactions and processing consensus algorithms, which require the computational power. Powerful CPUs can increase the speed and efficiency of nodes when processing tasks. In general, ABFT consensus assumes the computational capability is enough to all the nodes in the system.

  • Memory: Memory is used to store the programs and data being executed. Blockchain nodes need enough memory to store transaction pools (Txpool), handle consensus algorithms and execute smart contracts, among other tasks. Higher memory capacity and speed can improve the performance and responsiveness of nodes.

  • Bandwidth: Bandwidth refers to the data transmission rate of a network connection and is used to measure the amount of data a node can send and receive per unit of time. Blockchain nodes need to communicate a lot with other nodes, including broadcasting transactions, transmitting blocks and sharing data. Higher bandwidth can increase the speed of communication between nodes, thus improving the performance of the entire network.

Other capacities:

  • Storage: Storage is a persistent storage area of a computer that is used to keep data. Blockchain nodes need to store data for the entire blockchain, including all blocks and transaction records. As the blockchain grows, the storage requirement will also increase. Therefore, nodes need enough storage space to store the data of the entire blockchain.

  • Network Interface: A network interface is a hardware device that connects a computer to an external network, such as an Ethernet card or a wireless network card. Blockchain nodes need a stable and high-speed network interface to stay connected to other nodes for communication and data transfer.

Bandwidth may be the most scarce resource of the system. We assume a node with a least bandwidth capacity has 1000 Mps bandwidth, which means it can upload or download at most 1000/8=125MB data in a second. We assume one transaction has 500 bytes; assume for each second, 50MB are used to receive user's transactions and other node's transactions and 75MB are used to send messages for consensus. On this basis, a node can receive at most pieces of transactions. Therefore, the maximum tps of this system is .

Memory is also critical. Each node must store necessary state data of the system and the transactions received as valid but not yet recorded in the system into its memory. As the system grows, the size of state data will be increased continually. If the size of state data is too large, it will centralize the system.

Computation capacity (CPU/GPU) is also important for the nodes and the system. Since more and more complicated logics have been deployed to blockchains, if a node has poor computation capacity, it will take a long time for transaction verification and logic execution. Other nodes configured with high computation capacity will wait for the low-bounded nodes.

In summary, ABFT consensus is constructed to fully use the hardware capability of the participating nodes in the system to let them efficiently and quickly reach an agreement on a block with the preservation of consistency and liveness.

A naive example

Conceptually, asynchronous Byzantine Fault Tolerant (ABFT) Consensus processes transactions in ordered epoches. one ABFT process is handled in one epoch and each ABFT process can be generally divided into three stages, i.e., transactions collection and block formulation (TCBF), message broadcast, and steps to reach agreement on the broadcasted message. After the process, a block is generated and chained. How does that happen? How is it different from a normal BFT epoch? Let's dive into a complete but naive ABFT epoch.

Stage 1: Transaction collection and block formulation (TCBF)

Each node in the system has a unique terminal that everyone can send messages to it.

  1. Each node listens, receives and verifies the transactions sent by users or other nodes. Valid transactions will be recorded in the node's local transaction pool (Txpool). The standard of "valid transaction" varies. For example, one common rule is that the signature w.r.t the transaction and its sender must be verified as correct.
  2. Before the beginning of an epoch, each node should select a certain number of transactions in Txpool and orderly package them into a sub-block. The selection operation can be randomly or deliberately performed by the node. Also, the exact sequence of the transactions in the sub-block depends on the nodes as well.

Generally, the sub-block can be further processed Erasure coding. However for simplicity, we temporarily let it aside. Also, we denote the block formulated by as .

Stage 2: Byzantine Broadcast

Byzantine Broadcast can be achieved by Reliable Broadcast (RBC). Here, we look at the first RBC protocol.

At the beginning of an ABFT epoch, each node initiates its RBC instance:

  1. broadcast its sub-block (“broadcast”means the message is sent to all the nodes the connects).
    After this step, all the nodes will eventually receive .
  2. Once receives 's sub-block , it directly broadcasts it.
    After this step, all the nodes will receive sent by . As a result, each honest node will broadcast once and each node will eventually receive for at least times.
  3. If a node like receives the same for at least times, it broadcasts . If it receives for at least times and it still has not broadcasted , it broadcasts .
    After this step, waits for pieces of . If it receives pieces of , it assures that every honest node will receive pieces of . So it can assure that every honest node will reach an agreement on sub-block . So we say “ delivers ”and ’s RBC instance terminates for .

We must emphasize that receiving is equal to . If a node receives , it means at least is from honest nodes. This is the reason why the node can broadcast (since honest nodes have a view on and is the majority of ). At the same time, means at least one honest node broadcasts . It says that the honest node has been received . It then means at least that the honest node has been received will be eventually received by itself. So once it receives , it actually reaches the state of the honest node that broadcasted .

Additionally, if a honest node receives , it means honest nodes have broadcasted , so every honest node will eventually receive the and every honest node will eventually broadcast . Finally, every honest node will eventually receive at least .

Since there are nodes in the system, pieces of RBC instances will process following the above three steps at Stage 2. Theoretically, if all the nodes in the system are honest, each node will deliver RBC instances. However, in reality, we have to assume there are up to Byzantine nodes, i.e. , so we can rule that if an honest node has delivered different RBC instance, the node can enter the next stage. In an asynchronous consensus protocol, the agreement is achieved via Stage 3: asynchronous (Byzantine) binary agreement (ABBA or ABA).

Why do we need ABA?

Considering that both and have delivered different RBC instances. It is clear that we cannot guarantee that the different RBC instances delivered by the two honest nodes are consistent. As a result, after RBC, the two nodes cannot reach an agreement on the RBC instances. We need one more stage to let all honest nodes reach an agreement on the RBC instances. The third stage is called asynchronous Byzantine binary agreement.

Stage 3. Asynchronous (Byzantine) binary agreement (ABBA or ABA)

Now we introduce an ABA protocol presented in this paper, which was later used by Honey Badger BFT protocol. There are in total ABA instances to process in this stage. Take as an example. An ABA instance progresses in rounds, like . To make it clear, we can pick up a round and dive into it. For example, progressed in to illustrate the steps of stage 3.

  1. At the beginning of for , from the perspective of , assuming is honest and has entered this stage, if delivered at stage 2, then broadcasts in , where represents the opinion of "agree" on 's RBC instance, and represents . If did not deliver at stage 2, there will have two situations: 1) if has not decided on different ABA instances, does nothing at this moment; 2) if has decided different ABA instances in which is not included, broadcasts in . We will explain the meaning of “decide” shortly.
  2. If receives the same and it has not broadcasted , it will broadcast .
  3. If receives the same , and it has not added to a set called , it add to .
  4. When the set of is not empty, broadcasts . (Apparently, is the value that is the first to add to the set of ).
  5. If receives valid , it broadcasts its threshold signature shares. (Here, we must emphasize that a “valid ”for means in has been added to ’s ).
  6. if receives valid threshold signature shares from other nodes, it generates a unique threshold signature . It then calculates or and create a common coin =the first binary number of or . Note that there are three cases:
    (1) values from the received valid are the same:
    (1.1) If the first binary number of or is equal to that has been added to ’s , it decides as .
    (1.2) otherwise, broadcasts in .
    (2) if the values from the received valid are not same (i.e., both and are added to and received both and ), broadcasts at .
    Repeat until decides .

Apparently, "decide or decided" here means the node has determined whether to include a sub_block into the final block to generate in this epoch. decides as 1 means that agrees to add to the block generated in this epoch. decides as 0 means does not agree to add to the block generated in this epoch.

Protocol Security

Security considerations of Stage 1

At Stage 1, the nodes of the system mainly perform the block formulation operations, i.e., packaging valid transactions from the Txpools into their sub_blocks. Hence, in Stage 1, we should consider whether the node should validate the transactions before it adds them to its sub_block. The validity of a transaction should be carefully defined. Moreover, it is important to prevent censorship attacks - every valid transaction should be included in the blockchain within a certain time slot.

Security analysis on Stage 2

At Stage 2, each node will initiate, progress and complete its RBC instance. In order to make a RBC instance secure, the RBC instance must satisfy the following properties:

  • Agreement. If a correct node delivers , then every node eventually delivers .
  • Validity. If a correct node broadcasts correct , then eventually delivers .
  • Integrity. For any sub_block , every correct node only delivers at once. Moreover, if a node delivers , then was previously broadcasted by a honest node.

As for validity, because an honest node will broadcast to all the remaining honest nodes. After that, those honest nodes will broadcast . As a result, every honest node will receive at least . On this basis, every honest node will broadcast and every node will receive at least . Therefore, validity is satisfied.

As for agreement, if an honest node delivered , it must have received at least . On this basis, there are at least is broadcasted by honest nodes. As a result, every honest node will receive the , which means every honest node to broadcast . Therefore, every honest node will receive at least .

Integrity is trivial to prove.

Security risks

An adversary can deliberately delay the communication and increase latency (network latency caused by DDoS attack and so forth) of some honest nodes. On this basis, an adversary can launch an attack to censor some honest node's sub_blocks.

The adversary can first delay some honest nodes’ RBC instances. After that, an honest node may deliver RBC instances. However, some honest nodes’ RBC instances are not included in the RBC instances. The reason why this attack can be launched is due to the fact that a node only is only required to deliver RBC instances to enter the next stage. The adversary can censor at most RBC instances of honest nodes.

Based on this attack, the adversary can further let the sub_blocks of Byzantine nodes tain invalid transactions. Due to the mechanism of ABFT protocols, the transactions encapsulated in each sub_block will not be verified until the final block of this epoch is generated. At this time, honest nodes will find some of sub_block's transactions invalid. As a result, those invalid transactions will be deleted from the block.

Since an adversary can at most censor honest node's sub_blocks, the final block in other word can contain at most Byzantine node's sub_blocks. As a result, in the final block, only valid sub_blocks are included in this context, losing nearly 50% of all valid transactions.

Security analysis on Stage 3

Similarly, in order to make an ABA instance secure, we require each ABA instance satisfy the following three properties:

  • Validity: If all honest nodes propose for an ABA instance, then any honest node that terminates decides the ABA instance as .
  • Agreement: If an honest node decides a ABA instance as , then any honest node that terminates decides the ABA instance as .
  • Termination: Every honest node eventually decides 0 or 1 for each ABA instance.
  • Integrity: No honest node decides twice.

Proving validity is trivial: all honest nodes propose means all honest nodes broadcast . Therefore, will be added to every honest node’s and will not be added to every honest node’s . As a result, every honest node will broadcast and every honest node will receive at least . Also, every node will not accept since is not included in its . So every honest node will eventually decide the ABA instance as when .

As for agreement, if an honest node decides for an ABA instance, it must have received at and . So there are at least honest node broadcast at . As a result, each honest node will receive in . Therefore, in , every honest node will broadcast , which leads to the same situation as the one in validity. Then agreement property is guaranteed.

Proving integrity property is trivial since it is impossible for a Byzantine node to simultaneously receive 2f+1 and (an honest node only broadcasts at once).

A security glitch

In the naive scheme, termination can be violated in the following scenario:

Assume there are honest nodes proposing for an ABA instance (such as ) in and there are honest nodes propose for in . This scenario is possible in an asynchronous network environment (e.g. there are honest nodes delivering so they propose for . However, due to network latency, the other honest nodes do not deliver before they have decided ABA instances for 1, so the honest nodes propose for ).

So every honest node will receive and . Then, every honest node will broadcast and every honest node will add to its . However, Byzantine nodes also broadcasted some , which makes every honest node receive at least and let them also broadcast . After that, will also be added to each honest node’s .

Some honest nodes (assuming honest nodes) broadcast before they add to their . Then, Byzantine nodes send to some of those honest nodes. Also, the adversaries can control that those honest nodes cannot receive at least during a certain time slot (adding latency), so they will not accept during this time. As a result, those honest nodes will broadcast no matter what common coin at . However, Byzantine nodes send to the remaining honest nodes, which makes both and are received and accepted by those honest nodes. So those honest nodes will broadcast .

As long as adversaries let , the situation can continue infinitely (in each round , there are some honest nodes broadcast and some nodes broadcast , where ). To let , the adversaries divide the honest nodes into three groups and let group1’s honest nodes accept both and ; group2’s honest nodes only accept and group3’s honest nodes only accept . Then, adversary let the group1’s honest nodes receive . After that, the adversary can know when an honest node broadcasts its TS share. Then, adversaries let some honest nodes only receive 2f+1 , where .

Performance analysis on the naive example

Performance analysis on Stage 1

Specifically, a node's sub_block contains pieces of transactions where is the total number of transactions to be packaged in a block. If we assume each honest node randomly selects pieces of transactions to its sub_block, it has to expect that there must be a certain number of transactions that will be deleted to the final block due to repetition. For example, if we assume packages transactions denoted by () into its sub_block . In the meantime, ,