Byzantine fault-tolerant deferred update replication

Pedone, Fernando; Schiper, Nicolas

doi:10.1007/s13173-012-0060-z

Volume 18 Supplement 1

LADC'2011

Open access
Published: 07 February 2012

Journal of the Brazilian Computer Society, volume 18, pages 3–18 (2012)


Abstract

Replication is a well-established approach to increasing database availability. Many database replication protocols have been proposed for the crash-stop failure model, in which servers fail silently. Fewer database replication protocols have been proposed for the byzantine failure model, in which servers may fail arbitrarily. This paper considers deferred update replication, a popular database replication technique, under byzantine failures. The paper makes three contributions. First, it shows that making deferred update replication tolerate byzantine failures is quite simple. Second, the paper presents a byzantine-tolerant mechanism to execute read-only transactions at a single server. Third, we consider byzantine client attacks against deferred update replication and discuss effective countermeasures against these attacks.

1 Introduction

Replication is a well-established approach to increasing database availability. By replicating data items in multiple servers, the failure of some servers does not prevent clients from executing transactions against the system. Database replication in the context of crash-stop failures has been studied extensively in recent years (e.g., [4, 10, 13, 18, 19]). When a crash-stop server fails, it silently stops its execution. More recently, a few works have considered database replication under byzantine failures (e.g., [20, 23]). Byzantine failures are more severe than crash-stop failures since failed servers can present arbitrary behavior.

Several protocols for the crash-stop failure model are based on deferred update replication. According to deferred update replication, to execute a transaction, a client first picks a server and submits to this server its transaction commands. The execution of a transaction does not cause any communication among servers until after the client requests the transaction’s commit, at which point the transaction enters the termination phase and is propagated to all servers. As part of termination, each server certifies the transaction and commits it, if doing so induces a serializable execution, i.e., one in which transactions appear to have been executed in some serial order.

Deferred update replication scales better than state-machine replication and primary-backup replication. With state-machine replication, every update transaction must be executed by all servers. Thus, adding servers does not increase the throughput of update transactions. With primary-backup replication, the primary first executes update transactions and then propagates the database changes to the backups, which apply them without reexecuting the transactions. The throughput of update transactions is limited by the capacity of the primary, not by the number of replicas. Deferred update replication scales better because it allows all servers to act as “primaries,” locally executing transactions and then propagating the modifications to the other servers. As applying transaction modifications to the database is usually cheaper than executing transactions, the technique provides better throughput and scalability.

Ensuring strong consistency despite multiple co-existing primaries requires servers to synchronize. This is typically done by means of an atomic broadcast protocol to order transactions and a certification test to ensure the consistency criterion of interest. One of the key properties of deferred update replication is that read-only transactions can be executed by a single server, without communication across servers. This property has two implications. First, in geographically distributed networks it can substantially reduce the latency of read-only transactions. Second, it enables read-only transactions to scale perfectly with the number of servers in the system.

This paper considers deferred update replication under byzantine failures. It proposes the first byzantine fault-tolerant deferred update replication protocol that is faithful to its crash-stop counterpart: (i) the execution of a transaction does not require communication across servers, only its termination does, and (ii) only one server executes the transaction commands, but all correct servers apply the updates of a committing transaction. Our protocol is surprisingly simple and similar to a typical crash-stop deferred update replication protocol, although based on a stricter certification procedure to guarantee that transactions only commit if they do not violate consistency and read valid data (i.e., data that was not fabricated by a byzantine server).

Our most significant result is a mechanism to execute read-only transactions at a single server under byzantine failures. Some protocols in the crash-stop model achieve this property by carefully scheduling transactions so that they observe a consistent database view. In the byzantine failure model, however, clients may inadvertently execute a read-only transaction against a byzantine server that fabricates a bogus database view. In brief, our solution to the problem consists in providing enough information for clients to efficiently tell whether the data items read form a valid and consistent view of the database. Clients are still subject to malicious servers executing read-only transactions against old, but consistent, database views. We briefly discuss in the paper the extent of the problem and remedies to such attacks.

Finally, we also consider byzantine client attacks against deferred update replication. Since concurrency control is optimistic in deferred update replication, byzantine clients can launch attacks against honest clients by trying to maximize the abort rate of the transactions submitted by the latter. We discuss different such attacks, present countermeasures to them, and evaluate through simulation byzantine client attacks and our countermeasures.

The remainder of this paper is organized as follows. Section 2 describes the system model. Sections 3 and 4 discuss deferred update replication in the crash-stop failure model and in the byzantine failure model, respectively. Section 5 considers byzantine clients and Sect. 6 assesses different byzantine client attacks and the effectiveness of our countermeasures. Section 7 discusses related work. Section 8 concludes the paper.

2 System model and definitions

In this section, we detail the system model and assumptions common to both the crash-stop failure model and the byzantine failure model. Further assumptions, specific to each model, are detailed in Sects. 3 and 4.

2.1 Clients, servers and communication

Let C={c₁,c₂,…} be the set of client processes and S={s₁,s₂,…,s_n} the set of server processes. Processes are either correct, if they follow their specification and never fail, or faulty, otherwise. We distinguish two classes of faulty processes: crash-stop and byzantine. Crash-stop processes eventually stop their execution but never misbehave; byzantine processes may present arbitrary behavior.

We study deferred update replication in two models: in one model faulty servers are crash-stop; in the other model faulty servers are byzantine. In either case, there are at most f faulty servers and an unbounded number of clients. We initially consider crash-stop clients (Sects. 3 and 4) and then byzantine clients (Sects. 5 and 6).

Processes communicate by message passing, using either one-to-one or one-to-many communication. One-to-one communication is through primitives send(m) and receive(m), where m is a message. If sender and receiver are correct, then every message sent is eventually received. One-to-many communication is based on atomic broadcast, through the primitives abcast(m) and deliver(m), and used by clients to propagate messages to the group of servers.

In the crash-stop model, atomic broadcast ensures that (i) if one server delivers a broadcast message, then all correct servers also deliver the message; and (ii) no two servers deliver any two messages in different orders. In the byzantine model, atomic broadcast ensures that (i) if one correct server delivers a broadcast message, then all correct servers also deliver the message; and (ii) no two correct servers deliver any two messages in different orders.

2.2 Transactions and serializability

Let X={x₁,x₂,…} be the set of data items, i.e., the database, Cmd={commit,abort}∪({r,w}×X×V) the set of commands, where V is the set of possible values of a data item, and S=C×Cmd the set of statements. Statement (c,(r,x,v)) means that client c has read item x with value v; statement (c,(w,x,v)) means that c has modified the state of x to v.

We define a history h as a finite sequence of statements in S. We define the projection h|_c of history h on c∈C as the longest subsequence h′ of h such that every statement in h′ is in c×Cmd. In a projection h|_c=σ₀…σ_m, statement σ_i is finishing in h|_c if it is a commit or an abort; σ_i is initiating if it is the first statement in h|_c or the previous statement σ_{i−1} is a finishing statement.

A sequence of commands t=σ₀…σ_m in h|_c is a transaction issued by c if (i) σ₀ is initiating in h|_c and (ii) σ_m is either finishing in h|_c or it is the last statement in h|_c. Transaction t is committing if σ_m is a commit statement. We denote as com(h) the longest subsequence h′ of h such that every statement in h′ is part of a committing transaction in h. In other words, com(h) is the committed projection of h, with all statements of all committed transactions in h.

Let t and u be transactions in a history h. We say that t precedes u in h, t <_h u, if the finishing statement of t occurs before the initiating statement of u in h. A history h is serial if for every pair (t,u) of transactions in h, either t <_h u or u <_h t. History h is legal if in h (i) every read statement (c_i,(r,x,v_j)) is preceded by a write statement (c_j,(w,x,v_j)) and (ii) in between the two there is no statement (c_k,(w,x,v_k)), v_j≠v_k.

History h is serializable if there is a serial permutation h′ of com(h) such that for each data item x, h′|_x is legal, where h′|_x is the projection of h′ on x. Serializability is the set of all serializable histories.
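The legality condition can be illustrated with a small checker. This is a minimal sketch, not part of the protocol: the tuple layout `(client, op, item, value)` and the function name `is_legal` are assumptions made for illustration. It relies on the observation that a read of value v is legal exactly when the most recent preceding write on the same item wrote v.

```python
from typing import Dict, List, Tuple

# A statement is (client, op, item, value), with op "r" (read) or "w" (write).
Statement = Tuple[str, str, str, int]

def is_legal(history: List[Statement]) -> bool:
    """Return True if every read sees the value of the latest preceding write."""
    last_written: Dict[str, int] = {}  # item -> value of the most recent write
    for _client, op, item, value in history:
        if op == "w":
            last_written[item] = value
        elif op == "r":
            # A read with no preceding write, or of a different value, is illegal.
            if item not in last_written or last_written[item] != value:
                return False
    return True
```

Checking serializability would additionally require searching for a serial permutation of the committed projection whose per-item projections pass this check.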

3 Deferred update replication

In this section, we review deferred update replication in the crash-stop failure model. In this model, some atomic broadcast protocols require a majority of correct processes [11]. Thus, we assume the existence of at least 2f+1 servers in the system, of which at most f are faulty. The database is fully replicated, that is, every server has a complete copy of the database.

3.1 Overview

In deferred update replication, transactions pass through two phases in their lifetime: the execution phase and the termination phase. The execution phase starts when the client issues the first transaction command; it finishes with a client’s request to commit or abort the transaction, when the termination phase starts. The termination phase finishes when the transaction is committed or aborted.

Before starting a transaction t, a client c must select the server s that will receive and execute t’s commands; other servers will not be involved in t’s execution. Each data item in the server’s database is a tuple (x,v,i), where x is the item’s unique identifier, v is x’s value and i is the value’s version. We assume that read and write commands on database tuples are atomic operations. When s receives a read command for x from c, it returns the current value of x (or the most up-to-date value if the database is multiversion) and its corresponding version. Write commands are locally stored by c. It is only during transaction termination that updates are propagated to the servers.

In the termination phase, the client atomically broadcasts t’s readset and writeset, denoted respectively by t.rs and t.ws—for simplicity, we say that “c broadcasts t.” The readset of t is the set of all tuples (x,i), where x is a data item read by t and i the version of the value read; the writeset of t is the set of all tuples (x,v), where x is a data item written by t and v is x’s new value. Notice that the readset does not contain the values read.

Upon delivering t’s termination request, s certifies t. Certification ensures a serializable execution; it essentially checks whether t’s read commands have seen values that are still up-to-date when t is certified. If t passes certification, then s executes t’s writes against the database and assigns each new value the same version number k, reflecting the fact that t is the k-th committed transaction at s.

To certify t, s maintains a set CT of tuples (i,up), where up is a set with the data items written by the i-th committed transaction at s. We state the certification test of t, C_cs(t.rs,CT), more formally with the predicate below.

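\(C_{cs}(t.\mathit{rs}, \mathit{CT}) \equiv \forall (x,i) \in t.\mathit{rs} : \nexists (j,\mathit{up}) \in \mathit{CT} \text{ s.t. } (x \in \mathit{up}) \wedge (j > i)\) (1)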

If t passes certification, then s updates the database and CT. Certifying a transaction and creating new database items is an atomic operation. When a new version of x is created, the server can decide to keep older versions of x or not. If multiple versions of a data item exist, then we say the database is multi-version; if not it is single-version.
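The certification and commit steps described in this section can be sketched as follows. This is an illustrative reading rather than the paper's Algorithm 2: the names `certify_cs` and `terminate`, the in-memory dictionary used as the database, and the choice of `len(ct) + 1` as the next version number are assumptions.

```python
from typing import Dict, List, Set, Tuple

ReadSet = Set[Tuple[str, int]]               # tuples (x, i): item and version read
CommittedLog = List[Tuple[int, Set[str]]]    # tuples (i, up): version and items written

def certify_cs(rs: ReadSet, ct: CommittedLog) -> bool:
    """t passes if no committed transaction wrote a newer version of an item t read."""
    return not any(x in up and j > i for (x, i) in rs for (j, up) in ct)

def terminate(rs: ReadSet, ws: Dict[str, int],
              db: Dict[str, Tuple[int, int]], ct: CommittedLog) -> bool:
    """Certify t and, on success, apply its writes with a fresh version number k."""
    if not certify_cs(rs, ct):
        return False                          # abort: some value read is stale
    k = len(ct) + 1                           # t is the k-th committed transaction
    for x, v in ws.items():
        db[x] = (v, k)                        # every new value gets the same version k
    ct.append((k, set(ws)))
    return True
```

In a real server, the body of `terminate` would have to run atomically, as the text requires.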

3.2 Algorithm in detail

Algorithms 1 and 2 are high level descriptions of the client’s and server’s protocol. Notice that to determine the outcome of a commit request, the client waits for a reply from a single server. For brevity, we do not show in Algorithm 1 the case in which the client decides to abort a transaction. In Algorithm 2, items(t.ws) returns the set of data items in t.ws, without the values written by t, that is, items(t.ws)={x | (x,v)∈t.ws}.
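The client side of the execution phase might look like the sketch below. It is a simplification under stated assumptions, not a transcription of Algorithm 1: the class `ClientTransaction`, the callable standing in for the read request to the chosen server, and the rule that a transaction sees its own buffered writes are all hypothetical.

```python
from typing import Callable, Dict, Set, Tuple

class ClientTransaction:
    """Client-side state of one transaction: a readset and a buffered writeset."""

    def __init__(self, read_from_server: Callable[[str], Tuple[int, int]]):
        self._read = read_from_server         # hypothetical RPC: item -> (value, version)
        self.rs: Set[Tuple[str, int]] = set()
        self.ws: Dict[str, int] = {}

    def read(self, x: str) -> int:
        if x in self.ws:                      # assumption: a transaction sees its own writes
            return self.ws[x]
        value, version = self._read(x)
        self.rs.add((x, version))             # the readset keeps versions, not values
        return value

    def write(self, x: str, v: int) -> None:
        self.ws[x] = v                        # buffered locally until termination

    def commit_request(self):
        """Payload the client would atomically broadcast at termination."""
        return (frozenset(self.rs), dict(self.ws))
```

No server other than the one behind `read_from_server` is contacted before `commit_request` is broadcast, which is the property that lets execution scale with the number of servers.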

3.3 Read-only transactions

We describe two mechanisms to allow read-only transactions to be executed by a single server only. One mechanism is based on multiversion databases and does not require updates from committing transactions to be synchronized with on-going read-only transactions at a server; the other mechanism assumes a single-version database but synchronizes the updates of committing transactions with read-only transactions at servers.

With multiversion databases, each server stores multiple versions of each data item (limited by a system parameter). When a transaction t issues its first read command, the client takes the version of the value returned as a reference for future read commands. This version number specifies the view of the database that the transaction will see, t.view. Every future read command must contain t.view. Upon receiving a read command with t.view, the server returns the most recent value of the item read whose version is equal to or smaller than t.view. If no such value is available, the server tells the client that t must be aborted. This technique is sometimes called multiversion timestamps [9].
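A server-side multiversion read can be sketched as below. The storage layout (a list of `(version, value)` pairs per item) and the name `mv_read` are assumptions for illustration; passing `None` as the view models the first read of a transaction, which simply returns the newest version.

```python
from typing import Dict, List, Optional, Tuple

# store[x] is the list of (version, value) pairs the server keeps for item x.
Store = Dict[str, List[Tuple[int, int]]]

def mv_read(store: Store, x: str, view: Optional[int]) -> Optional[Tuple[int, int]]:
    """Return (version, value) of the newest version of x not exceeding view.

    Returns None when no such version is stored, in which case the
    transaction must abort.
    """
    candidates = [(i, v) for (i, v) in store.get(x, []) if view is None or i <= view]
    return max(candidates) if candidates else None
```

The client would set t.view to the version returned by the first call and pass it to every later call.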

If a single version of each data item exists, then read commands must be synchronized (e.g., through two-phase locking [2]) with the updates of committing transactions. During the execution of a transaction t, each read command of t must first acquire a read lock on the data item. If a transaction u passes certification, then the server must acquire write locks on all data items in u’s writeset and then commit u. Since read and write locks cannot be acquired simultaneously, this technique may block transactions. This mechanism has been used by some protocols based on the deferred update replication model (e.g., [17]).

3.4 Correctness

To reason about correctness, we must define when read, write and commit commands of a transaction t take place. For read and write commands this is simple: a read happens when the client receives the corresponding reply from the server that executes t; a write happens after the client updates t’s writeset. It seems natural to assume that the commit of t also happens when the client receives the first commit reply from a server. In our model, however, clients are allowed to crash and so, the commit may never take place, despite the fact that some databases consider the transaction committed. To avoid such cases, without assuming correct clients, we define the commit event of t as taking place when the first server applies t’s updates to its database.

We initially argue that Algorithm 1 is correct for update transactions only. We then extend our argument to include read-only transactions.

(1) Update transactions: Let h₀ be a history created by Algorithm 1. We must show that we can permute the statements of committed transactions in h₀ such that the resulting history, h_s, is serial and legal. Our strategy is to create a series of histories h₀,…,h_i,…,h_s where h_i is created after swapping two statements in h_{i−1}.

Let t and u be two committing transactions in h_i such that t commits before u, which implies that t’s commit request was delivered before u’s. We show that all of t’s statements that succeed u’s statements in h_i can be placed before u’s statements. There are two cases to consider.

(a) Some statement σ_u of u precedes t’s read statement (c_t,(r,x,v)). If σ_u is a read on x, or a read or a write on an item different from x, then we can trivially swap the two. Assume now that σ_u is a write on x. Since t is delivered before u and only values of a delivered transaction can be read by other transactions, we know that t’s read statement did not read the value written by σ_u, and thus, the two can be swapped.

(b) Some statement σ_u of u precedes t’s write statement (c_t,(w,x,v)). If σ_u is a statement on an item different from x, then the two can obviously be swapped. Assume then that σ_u is a statement on x. If σ_u is a write, then we can swap the two as no read statement from other transactions has seen them; they only take effect after a transaction is delivered and t is delivered before u. Finally, let σ_u be a read. We show by way of contradiction that (c_u,(r,x,v_u)) cannot precede (c_t,(w,x,v_t)). Since u is certified after t and u passes certification, from the certification test, it must be that the version read by u is still up-to-date. But since t modifies x, it updates x’s version, a contradiction that concludes our argument.

(2) Read-only transactions: We consider first read-only transactions in the presence of multiversion databases. Let t be a read-only transaction in some history h_i of the system. We extend the argument presented in the previous section to show that all of t’s read commands can be placed in between two update transactions, namely, after the transaction u that created the first item read by t and before any other update transaction. From Sect. 3.3, every future read of t will return a version that is equal to or precedes t.view. Therefore, every read command issued by t can be placed before any command that succeeds u’s write commands.

We claim now that read-only transactions are also serializable with single-version databases. In this case, the local execution at a server follows the two-phase locking protocol, which is serializable [2]. Notice that although read and write commands are synchronized in a server, certification of update transactions is still needed since two update transactions executing on different servers may see inconsistent reads. For example, transaction t may read data item x and write y, while transaction u, executing at a different server, may read y and write x. In such a case, certification will abort one of the transactions.
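The example of t and u can be replayed with a few lines of code. This is a sketch under assumptions: both transactions read version 0 of their item, atomic broadcast happens to deliver t before u, and the helper `certify` encodes the version check described in Sect. 3.1.

```python
def certify(rs, ct):
    """Pass if no committed transaction wrote a newer version of an item read."""
    return not any(x in up and j > i for (x, i) in rs for (j, up) in ct)

ct = []                                   # committed transactions, in delivery order
t = {"rs": {("x", 0)}, "ws": {"y"}}       # t reads x and writes y
u = {"rs": {("y", 0)}, "ws": {"x"}}       # u reads y and writes x

outcomes = {}
for name, txn in (("t", t), ("u", u)):    # assumed delivery order: t, then u
    ok = certify(txn["rs"], ct)
    if ok:
        ct.append((len(ct) + 1, txn["ws"]))
    outcomes[name] = "commit" if ok else "abort"
```

Here t commits as version 1 and u aborts because y, which u read at version 0, was overwritten by t. Had u been delivered first, the roles would be reversed.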

4 BFT deferred update replication

To adapt our protocol to the byzantine failure model, we make a few extra assumptions. We first assume that the number of servers is at least 3f+1. This is the minimum to solve atomic broadcast with malicious faults [12]. We make use of message digests produced by collision-resistant hash functions to ensure data integrity [21], and public-key signatures to ensure communication authenticity [22]. We follow the common practice of signing message digests instead of signing messages directly. We also assume that each client-server pair and each pair of servers communicate using private channels that can be implemented using symmetric keys [5]. Finally, clients are authenticated and servers enforce access control. This forbids unauthorized clients from accessing the database, and prevents potentially byzantine servers from issuing transactions that may compromise the integrity of the database. Obviously, a byzantine server can compromise its local copy of the database, but this behavior is handled by our protocols.

4.1 Overview

A byzantine server could easily corrupt the execution of the algorithm presented in the previous section. For example, it could hamper the execution of the atomic broadcast protocol or answer client commands with incorrect data. While the first problem can be solved by replacing the atomic broadcast algorithm for crash-stop failures with one that tolerates byzantine failures [16], the second problem is less obvious to address.

A byzantine server may return two types of “incorrect data”: invalid or stale. Invalid data, as opposed to valid data, is fabricated by the server and does not correspond to any value created by a committed transaction. Stale data is too old, although it may be valid. We address the problem of stale data with the certification test by checking that the values read are still up-to-date. To guarantee that transactions read valid data, each database tuple is redefined as (x,v,i,d), where in addition to x’s value v and version i, it contains a digest d of v. A read command returns the value read and its digest, and readsets include the digest of values.

We decompose the certification test into a component that checks the validity of data read, \(C^{v}_{b}(t.\mathit{rs}, \mathit{CT})\), and another component that checks whether the items read are up to date, \(C^{u}_{b}(t.\mathit{rs},\mathit{CT})\).^{Footnote 1} Moreover, tuples (i,up) in CT contain in set up elements (x,d), that is, the items written and the digest of the values written.

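\(C^{v}_{b}(t.\mathit{rs}, \mathit{CT}) \equiv \forall (x,i,d) \in t.\mathit{rs} : \exists (i,\mathit{up}) \in \mathit{CT} \text{ s.t. } (x,d) \in \mathit{up}\) (2)

\(C^{u}_{b}(t.\mathit{rs}, \mathit{CT}) \equiv \forall (x,i,d) \in t.\mathit{rs} : \nexists (j,\mathit{up}) \in \mathit{CT} \text{ s.t. } ((x,\cdot) \in \mathit{up}) \wedge (j > i)\) (3)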
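The two components of the certification test can be sketched as follows. This is one plausible reading of the prose, not the authors' code: readset entries are assumed to be triples (x, i, d), SHA-256 stands in for the collision-resistant hash, and the function names `valid` and `up_to_date` are hypothetical.

```python
import hashlib
from typing import List, Set, Tuple

def digest(value: bytes) -> str:
    """Digest of a value; SHA-256 is an assumed choice of hash function."""
    return hashlib.sha256(value).hexdigest()

ReadSet = Set[Tuple[str, int, str]]                      # (x, i, d)
CommittedLog = List[Tuple[int, Set[Tuple[str, str]]]]    # (i, up), up holds (x, d)

def valid(rs: ReadSet, ct: CommittedLog) -> bool:
    """Every (x, i, d) read must match what the i-th committed transaction wrote."""
    return all(any(j == i and (x, d) in up for (j, up) in ct) for (x, i, d) in rs)

def up_to_date(rs: ReadSet, ct: CommittedLog) -> bool:
    """No committed transaction newer than i may have written x."""
    return not any(j > i and any(y == x for (y, _d) in up)
                   for (x, i, _) in rs for (j, up) in ct)
```

A transaction would pass certification only if both functions return True. A fabricated value fails `valid` because its digest matches no logged write, while an old but genuine value fails `up_to_date`.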

Additionally, to shield clients from byzantine servers that would commit a transaction t that violates serializability, clients wait for more than one server’s reply before concluding the outcome of transactions. It turns out that at commit time, by waiting for a set of f+1 identical replies, t’s outcome can be safely determined by the clients since only up to f servers can be compromised.

The left side of Fig. 1 illustrates the protocol for update transactions. Note that step 5 in the illustration will be explained as part of the protocol for read-only transactions (cf. Sect. 4.3).

4.2 Algorithm in detail

Algorithms 3 and 4 present the client’s and server’s protocol. To execute a read command, a client c contacts a server s and stores the version, the value, and the digest in t.rs. Write operations are buffered in t.ws as before. When c wishes to commit t, c atomically broadcasts a signed message to all servers. This is denoted by \(\langle m \rangle_{\sigma_{c}}\), where m is a message and σ_c is c’s signature. Signing messages guarantees that only authenticated clients issue commit requests. After receiving f+1 identical replies from servers, c can determine t’s outcome.
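The rule of waiting for f+1 identical replies can be sketched as a small tally. The function name `decide` and the representation of replies as `(server, outcome)` pairs are assumptions; authentication of replies is taken for granted here.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional, Tuple

def decide(replies: Iterable[Tuple[str, Hashable]], f: int) -> Optional[Hashable]:
    """Return the first outcome reported by f+1 distinct servers, else None."""
    seen = set()
    counts: Counter = Counter()
    for server, outcome in replies:
        if server in seen:                # count at most one reply per server
            continue
        seen.add(server)
        counts[outcome] += 1
        if counts[outcome] >= f + 1:      # at least one correct server vouches for it
            return outcome
    return None
```

Because at most f servers are byzantine, any outcome backed by f+1 distinct servers was reported by at least one correct server.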

Besides a change in the certification test and in the data sent back to the client when answering read requests, the server’s code is similar to the crash-stop case. In Algorithm 4, we must instantiate items(t.ws) (see line 11) as items(t.ws)={(x,v’s digest) | (x,v)∈t.ws}.

4.3 Read-only transactions

A simple way to handle read-only transactions is to execute and terminate them in the same way as update transactions. This leads to a simple solution but increases the latency of read-only transactions since they need to be atomically broadcast and certified by servers. In the following, we describe a mechanism that allows read-only transactions to be executed locally to a server only, just like deferred update replication in the crash-stop failure model.

To allow read-only transactions to execute at a single server, without interserver communication, clients must be able to tell unilaterally whether (i) a value returned by a server as a response to a read command is valid (cf. Sect. 4.1) and (ii) any set of valid values read by the client belongs to a consistent view of the database. If the client determines that a value returned by the server is invalid or inconsistent, it aborts the transaction and retries using another server.

A set of values read by a client is a consistent view of the database if the values could be the result of a serial execution of the committed transactions. For example, assume that transactions t and u modify the values of data items x and y. Any transaction that reads x and y must see either the values created by t or the ones created by u or none, but not a mix of the two (e.g., x from t and y from u).

We ensure proper transaction execution by letting the client ask the server, at the end of the transaction execution, for a proof that the values read are valid and consistent. A validity and consistency proof for a transaction t, denoted as vcp(t), consists of all elements of CT whose version lies between the lowest and highest data item version read by t. Moreover, to ensure that byzantine servers do not fabricate data, every element (i,up) in vcp(t) must be signed by f+1 servers, denoted \(\langle i, \mathit{up} \rangle_{\Sigma_{f+1}}\). Given a validity and consistency proof vcp(t), the client decides to commit t if the following conditions hold:

(1) The proof vcp(t) is valid: If i_min and i_max are, respectively, the minimum and maximum data item versions read by t, then vcp(t) contains all tuples (i,up) such that i_min≤i≤i_max. Moreover, each element of vcp(t) is signed by f+1 servers, which guarantees that at least one correct server abides by this element.

(2) The values read by t are valid: Each data item with version i read by t matches its corresponding digest in vcp(t) with version i.

(3) The values read by t are consistent: For each item x read with version i, no newer version i′>i of x exists in vcp(t).

Conditions 2 and 3 can be stated more precisely with predicates (4) and (5), respectively. Note that these predicates are similar to those used to certify update transactions (cf. Sect. 4.1). The main difference is that read-only transactions do not need to be certified against elements in CT whose version is newer than the highest data item version t read.

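\(C^{v}_{r}(t.\mathit{rs}, \mathit{vcp}(t)) \equiv \forall (x,i,d) \in t.\mathit{rs} : \exists (i,\mathit{up}) \in \mathit{vcp}(t) \text{ s.t. } (x,d) \in \mathit{up}\) (4)

\(C^{c}_{r}(t.\mathit{rs}, \mathit{vcp}(t)) \equiv \forall (x,i,d) \in t.\mathit{rs} : \nexists (j,\mathit{up}) \in \mathit{vcp}(t) \text{ s.t. } ((x,\cdot) \in \mathit{up}) \wedge (j > i)\) (5)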
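The three conditions a client checks before committing a read-only transaction can be sketched as below. This is an illustration under assumptions: the proof is modeled as a dictionary from version to a pair (up, signers), where `signers` is the set of servers whose signatures the client has already verified, and the name `accept_read_only` is hypothetical.

```python
from typing import Dict, Set, Tuple

Up = Set[Tuple[str, str]]                      # elements (x, d)
Proof = Dict[int, Tuple[Up, Set[str]]]         # version -> (up, verified signers)

def accept_read_only(rs: Set[Tuple[str, int, str]], vcp: Proof, f: int) -> bool:
    """Check proof completeness, read validity, and read consistency."""
    if not rs:
        return True                            # nothing read, nothing to check
    versions = [i for (_x, i, _d) in rs]
    needed = range(min(versions), max(versions) + 1)
    # Condition 1: the proof is complete and every element has f+1 signatures.
    if any(i not in vcp or len(vcp[i][1]) < f + 1 for i in needed):
        return False
    for (x, i, d) in rs:
        # Condition 2: the value read matches the digest logged at version i.
        if (x, d) not in vcp[i][0]:
            return False
        # Condition 3: no newer version of x appears within the proof.
        if any(j > i and any(y == x for (y, _) in vcp[j][0]) for j in needed):
            return False
    return True
```

If the check fails, the client would abort the transaction and retry on another server, as described earlier in this section.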

To build a validity and consistency proof from CT, we add to each CT entry a certificate of f+1 server signatures. Database servers build certificates asynchronously. When the i-th update transaction t commits on s, the value, version, and digest of each data item x written by t are updated. Periodically, server s signs new tuples (i,up) and sends this information to all servers. When s gathers f+1 signatures of the tuple (i,up), s inserts this new element in CT. This asynchronous scheme does not add communication overhead to update transactions. However, a read-only transaction t may stall until the server answering t’s requests gathers enough signatures to provide a validity and consistency proof for t. The protocol for read-only transactions is illustrated on the right side of Fig. 1.
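The asynchronous construction of certificates can be sketched as follows. The class `CertificateBuilder` and its method are hypothetical, signatures are assumed to have been verified before `on_signature` is called, and the network layer that delivers signed tuples between servers is left out.

```python
from typing import Dict, FrozenSet, Set, Tuple

Up = FrozenSet[Tuple[str, str]]            # elements (x, d)

class CertificateBuilder:
    """Collects signatures on tuples (i, up); admits a tuple to CT at f+1 signers."""

    def __init__(self, f: int):
        self.f = f
        self.pending: Dict[Tuple[int, Up], Set[str]] = {}
        self.ct: Dict[int, Tuple[Up, FrozenSet[str]]] = {}

    def on_signature(self, i: int, up: Up, server: str) -> bool:
        """Record one verified signature; return True once (i, up) is in CT."""
        signers = self.pending.setdefault((i, up), set())
        signers.add(server)                # repeated signatures from one server count once
        if i not in self.ct and len(signers) >= self.f + 1:
            self.ct[i] = (up, frozenset(signers))
        return i in self.ct and self.ct[i][0] == up
```

A read-only transaction whose versions include i would stall until `on_signature` has returned True for that version.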

Algorithms 5 and 6 present the client and server protocols to execute read-only transactions. In the algorithms, given a transaction t and a validity and consistency proof vcp(t) for t, read validity and consistency are expressed by predicates \(C^{v}_{r}(t.\mathit{rs}, \mathit{vcp}(t))\) and \(C^{c}_{r}(t.\mathit{rs}, \mathit{vcp}(t))\), respectively.

Table 1 summarizes the costs of the proposed protocols for the crash-stop and byzantine failure models. To compute these costs, we consider a transaction t that performs r reads and present the latency and number of messages sent for the execution and termination phases, when t is an update and a read-only transaction.

Table 1 The cost of the proposed protocols (n is the number of servers, f is the maximum number of faulty servers, r denotes the number of items the transaction reads, abcast_cs denotes an atomic broadcast algorithm for crash-stop failures, and abcast_byz is an atomic broadcast algorithm for byzantine failures)


4.4 Liveness issues

Byzantine servers may compromise the progress of the above protocol by being nonresponsive or slow. Besides attacks that would slow down the delivery of atomic broadcast messages [1], byzantine servers may also not answer client read requests or slow down their execution. The first case can be treated as in the crash-stop case, that is, the client may simply consider that the server has crashed and restart executing the transaction on another server. The second case is more problematic since it may not be possible to distinguish between a slow honest server and a malicious one. To avoid such an attack, the client can execute the transaction on two (or more) servers and abort the transaction on the slower server as soon as the faster server is ready to commit.

A more subtle attack is for a byzantine server to provide read-only transactions with old, but valid and consistent, database views. Although serializability allows old database views to be seen by transactions (i.e., strictly speaking it is not an attack), useful implementations try to reduce the staleness of the views provided to transactions. There are (at least) two ways to confront such server misbehavior. First, clients can broadcast read-only transactions and ask servers to certify them, just like update transactions. If the transaction fails certification, the client can retry using a different server. Second, clients may submit a read command to more than one server and compare their versions. Submitting read commands to f+1 servers ensures the “freshness” of reads, but may be overkill. More appropriate policies would be to send multiple read commands only when server misbehavior is suspected, and possibly to try first with a small subset of servers.

4.5 Optimizations

Our protocols can be optimized in many ways. In the following, we briefly present three optimizations.

Client caches

To increase scalability, clients can cache data item values, versions, digests, and elements of CT. In doing so, clients can execute queries without contacting any server, provided that the necessary items are in the cache. Before inserting a tuple (x,v,i) in the cache, where v and i are x’s value and version, respectively, we verify the validity of v and make sure that version i of x has value v by using the appropriate element of CT. We proceed similarly with elements of CT by verifying their signatures before inserting them in the cache. At the end of the execution of a read-only transaction t, the consistency of the values read can be checked using cached elements of CT. If some elements of CT are missing, they are retrieved from a server. If t is an update transaction, the consistency of the values read by t is checked by the certification test. To avoid reading arbitrarily old values, cache entries are evicted after some time (e.g., a few hundred milliseconds).

Limiting the size of CT

We limit the number of elements in CT by some value K to reduce the space overhead of the set of committed transactions on the servers. After the k-th transaction commits and server s inserts tuple \(\langle k, up \rangle_{\Sigma_{f+1}}\) into CT, s checks whether CT contains more than K elements. If so, the element of CT with the lowest version is removed. This scheme may force servers to unnecessarily abort transactions due to missing versions in CT. Choosing K must thus be done carefully.
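Bounding CT can be sketched as below. The class `BoundedCT` and the method `must_abort` are hypothetical, signatures are omitted, and versions are assumed to be consecutive integers starting at 1, so that an evicted entry can be detected from the oldest retained version alone.

```python
from typing import Dict, Set

class BoundedCT:
    """Keeps at most K entries (k, up), evicting the oldest version first."""

    def __init__(self, K: int):
        self.K = K
        self.entries: Dict[int, Set[str]] = {}

    def insert(self, k: int, up: Set[str]) -> None:
        self.entries[k] = up
        if len(self.entries) > self.K:
            del self.entries[min(self.entries)]   # drop the lowest version

    def must_abort(self, i: int) -> bool:
        """True if some entry newer than version i has been evicted."""
        return bool(self.entries) and i < min(self.entries) - 1
```

A transaction that read version i cannot be certified once an entry newer than i is gone, since the server can no longer tell whether that entry overwrote an item the transaction read. This is the unnecessary abort the text warns about, and it becomes more frequent as K shrinks.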

Efficient message authentication

In the above protocol, public-key signatures are used by servers to sign elements of the set of committed transactions and by clients to authenticate their commit messages. Public-key signatures are expensive, however: they are orders of magnitude slower than message authentication codes (MACs). In contrast to public-key signatures, a MAC cannot prove the authenticity of a message to a third party. We thus replace signatures by vectors of MACs [3]. This vector contains one entry per machine of the system, that is, clients and servers. In doing so, any vector of MACs can be verified by any client or server.
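As an illustration, a vector of MACs can be built with Python's standard hmac module. The sketch assumes that the sender shares a distinct secret key with every machine; the use of HMAC-SHA256 and the function names are choices made here, not prescribed by the protocol.

```python
import hashlib
import hmac
from typing import Dict


def make_mac_vector(message: bytes,
                    keys_by_receiver: Dict[str, bytes]) -> Dict[str, bytes]:
    """One MAC per machine, each computed with the key shared with it."""
    return {receiver: hmac.new(key, message, hashlib.sha256).digest()
            for receiver, key in keys_by_receiver.items()}


def verify_own_entry(message: bytes, vector: Dict[str, bytes],
                     receiver: str, shared_key: bytes) -> bool:
    """A machine checks only the entry computed with its own shared key."""
    entry = vector.get(receiver)
    if entry is None:
        return False
    expected = hmac.new(shared_key, message, hashlib.sha256).digest()
    return hmac.compare_digest(entry, expected)
```

Each receiver can verify only its own entry, which is why the vector needs one entry per machine to stand in for a signature that anyone can check.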

4.6 Correctness

Correctness of the protocol for update transactions relies on the fact that at certification, we use value digests to check for data integrity and versions to check for data staleness. The rest of the correctness argument is based on the same principles as the crash-stop case and we thus omit it.

Proving the correctness of read-only transactions is more subtle. To be serializable, read-only transactions must read values from a consistent view of the database that is the result of a serial execution of some finite sequence of transactions. Let i_max and i_min, respectively, be the highest and lowest data item versions read by a transaction t. We claim that if t commits, then the view of the database t observes is the result of the execution, in version order, of all transactions whose corresponding tuple in CT has a version that is smaller than, or equal to i_max. Let h_t be the sequence of such transactions sorted in ascending version order.

We first note that since t commits, the server s on which t executed behaved as prescribed by the protocol. This is because each element of the validity and consistency proof vcp(t) of t is signed by f+1 servers, and thus ensures that the values and versions read by t are those that would be returned by a correct server.

Since the client checks that (i) vcp(t) contains all versions between i_min and i_max and (ii) for any data item x read by t, no newer version of x exists in vcp(t), t reads a consistent view of the database that is the result of the serial execution of transactions in h_t.
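The two client-side checks can be sketched as follows, under simplifying assumptions: each element of vcp(t) is modeled as a version mapped to the set of items updated by the transaction that created it, and all signatures are taken to be already verified. The function name and data layout are hypothetical.

```python
from typing import Dict, Set


def read_only_view_is_consistent(reads: Dict[str, int],
                                 vcp: Dict[int, Set[str]]) -> bool:
    """reads maps each item to the version read; vcp maps versions to updates."""
    if not reads:
        return True
    i_min, i_max = min(reads.values()), max(reads.values())
    # (i) vcp(t) must contain all versions between i_min and i_max.
    if any(version not in vcp for version in range(i_min, i_max + 1)):
        return False
    # (ii) No newer version of an item read by t may appear in vcp(t).
    for item, version_read in reads.items():
        for version, updated_items in vcp.items():
            if version > version_read and item in updated_items:
                return False
    return True
```

For example, a transaction that reads x at version 1 and z at version 3 passes only if version 2 is present in vcp(t) and did not update x.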

5 Tolerating byzantine clients

In this section, we discuss serializability in light of byzantine clients and consider some attacks that could be perpetrated by byzantine clients against the deferred update replication technique. We conclude the section with countermeasures to these attacks.

5.1 Consistency issues

In order to accommodate byzantine clients in our failure model, we must reconsider our consistency criterion since with the current definition, it would be easy for a byzantine client to generate histories that are not serializable. One simple attack is a byzantine client who issues read operations that return bogus information. Another attack is a byzantine client who executes read operations (e.g., against a byzantine server) and does not include them in the transaction’s readset. As a consequence, honest servers would certify the transaction using a subset of the transaction’s read operations, which could violate serializability.

Our proposal to accommodate byzantine clients in our model is similar to [15], developed in the context of linearizability: Correctness must take into consideration correct clients only. More precisely, (1) correct clients in a history must observe values that could be generated in a serializable history where all clients are correct, and (2) if at some point all byzantine clients stop execution (e.g., the access rights of byzantine clients are revoked), then the system eventually recovers, that is, the number of fictitious writes needed to justify the values read by correct clients is bounded (see footnote 2 in the Notes).

Interestingly, the algorithm presented in Sect. 4 ensures the first property of the modified serializability commented above, but does not guarantee the second property: byzantine clients can leave an unbounded number of “bogus transactions” (denoted “lurking writes” in [14]) in the system after they stop. To see why, notice that a byzantine client can atomically broadcast an unbounded number of transactions before it stops, and it is perfectly possible for these transactions to be delivered after the client stops. In Sect. 5.3, we describe a mechanism to guarantee the second property presented above.

5.2 Byzantine client attacks

Byzantine clients can launch a number of attacks to disrupt the execution of correct clients. The attacks we discuss next target certification by increasing the chances of aborting transactions issued by correct clients.

The most obvious attack to be attempted by a byzantine client is to submit transactions with many write operations and few read operations or no read operation at all (i.e., a “blind transaction”). A reduced number of read operations would increase the chances of the byzantine transaction to commit, while at the same time increasing the probability that legitimate transactions abort. Moreover, as commented in the previous section, a byzantine client can submit an unbounded number of such transactions concurrently.

A more subtle strategy is for multiple byzantine clients to collude and coordinate their attack so as to minimize the chances that one byzantine transaction aborts another byzantine transaction while together maximizing the chances of aborting legitimate transactions. More precisely, this can be obtained as follows. Let \(\mathcal{T}_{B}\) be a set of byzantine transactions created by colluding byzantine clients. For any two transactions t_i and t_j in \(\mathcal{T}_{B}\), the readset and the writeset of t_i do not intersect the writeset of t_j.
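The condition on \(\mathcal{T}_{B}\) can be stated as a short predicate. The Txn type and the function name below are hypothetical, introduced only to make the condition concrete.

```python
from itertools import permutations
from typing import FrozenSet, List, NamedTuple


class Txn(NamedTuple):
    readset: FrozenSet[str]
    writeset: FrozenSet[str]


def is_non_interfering(transactions: List[Txn]) -> bool:
    """True if no transaction's readset or writeset meets another's writeset."""
    for t_i, t_j in permutations(transactions, 2):
        if (t_i.readset | t_i.writeset) & t_j.writeset:
            return False
    return True
```

A set of transactions satisfying this predicate cannot abort one another at certification, whatever the order in which they are delivered.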

At the very extreme, coordinated byzantine clients could divide the database into disjoint sets, each set associated with one coordinated byzantine client. Then, each byzantine client would permanently submit blind transactions that modify all items in its set.

5.3 Countermeasures against byzantine clients

In this section, we discuss mechanisms to address the attacks presented in the previous section. Our first countermeasure limits the number of concurrent transactions that a client can submit to the system to K and addresses bogus transactions. This countermeasure also reduces the negative impact byzantine clients can have on the abort rate of legitimate transactions.

CM 1. At most K concurrent transactions per client. Two transactions t and t′ are concurrent if t starts before t′ has terminated (i.e., committed or aborted).

To implement CM 1, we assume that clients are authenticated. Thus, an honest server can unmistakably tell which client has submitted a transaction. Each client c appends a sequence number to each of its submitted transactions. This sequence number is provided together with the result of the previous transaction submitted by c and is signed by f+1 servers. The emission of such sequence numbers is used to limit the number of concurrent transactions clients can submit to K. We now explain in detail the implementation of CM 1.

Initially, each server s sends signed sequence numbers 1 through K to each client c, and sets the next sequence number to emit, \(N_{s}^{c}\), to K+1. Whenever s accepts to certify a new transaction, s increments \(N_{s}^{c}\) and sends \(N_{s}^{c}\) signed to c. Server s only accepts to certify transactions whose sequence numbers belong to set \(V_{s}^{c}\). Initially, this set contains sequence numbers 1 to K. Each time s accepts to certify a transaction with sequence number seq, seq is removed from \(V_{s}^{c}\) and \(N_{s}^{c}\) is added to the set after being incremented.
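The bookkeeping a server performs for one client can be sketched as below. Signatures and the messages themselves are omitted. One detail is an assumption: the sketch emits the current value of \(N_{s}^{c}\) and then advances it, so that no sequence number is skipped; the class and attribute names are hypothetical.

```python
from typing import Optional, Set


class ClientWindow:
    """State kept by server s for client c: the set V and the counter N."""

    def __init__(self, k: int):
        self.valid: Set[int] = set(range(1, k + 1))  # V: numbers 1 to K
        self.next_to_emit = k + 1                    # N: next number to emit

    def accept(self, seq: int) -> Optional[int]:
        """Return the fresh number to send to c, or None if seq is rejected."""
        if seq not in self.valid:
            return None          # unknown or already used sequence number
        self.valid.remove(seq)
        emitted = self.next_to_emit
        self.valid.add(emitted)
        self.next_to_emit += 1   # assumption: emit first, then advance
        return emitted
```

The set of valid numbers always has exactly K elements, so a client can never have more than K transactions accepted for certification without first receiving results for earlier ones.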

Countermeasure 1 bounds the number of bogus transactions: if a byzantine client stops, it can leave at most K transactions, which will eventually be processed. Another way to address bogus transactions is to blacklist clients that were identified as byzantine. A violation of one of the countermeasures proposed here is a way to identify malicious behavior. For instance, a violation of CM 1 happens when f+1 servers detect that a client c submitted a transaction with an invalid sequence number. A proof of this violation is obtained by receiving f+1 signed messages asserting that c misbehaved.

We now consider two strategies to reduce the number of aborts due to byzantine clients. Notice that, as commented earlier, aborts due to concurrent transactions are inherent to the deferred update replication technique. This fact makes it impossible to completely countermeasure attacks targeting certification. The techniques we describe for such attacks are best effort, that is, they will reduce the effectiveness of an attack, but they will not avoid it altogether.

CM 2. At most L writes per transaction. The number of transaction writes is upper bounded by simply aborting transactions whose number of writes exceeds some constant L at certification time.

CM 3. No blind transactions. Servers enforce no blind transactions by aborting transactions whose writeset is not included in their readset.
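Both checks amount to simple tests on a transaction's readset and writeset at certification time. The sketch below uses hypothetical function names and models the sets as Python sets of item identifiers.

```python
from typing import Set


def violates_cm2(writeset: Set[str], max_writes: int) -> bool:
    """CM 2: the number of writes exceeds the constant L (max_writes)."""
    return len(writeset) > max_writes


def violates_cm3(readset: Set[str], writeset: Set[str]) -> bool:
    """CM 3: the writeset is not included in the readset."""
    return not writeset <= readset


def passes_client_checks(readset: Set[str], writeset: Set[str],
                         max_writes: int) -> bool:
    """A server would abort the transaction when this returns False."""
    return not (violates_cm2(writeset, max_writes)
                or violates_cm3(readset, writeset))
```

For instance, with L = 2, a transaction that reads {x, y} and writes {x} passes both checks, whereas one that writes {z} without reading it is rejected by CM 3.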

The countermeasures proposed here limit the impact that each individual byzantine client can have on the abort rate but do not address the byzantine client collusion attack described above. In fact, it is unclear how such an attack could be dealt with. Luckily, Sect. 6 shows that this attack has little impact on the abort rate. Determining whether there exist other collusion attacks that would negatively impact the performance of byzantine deferred update replication protocols is an open question.

6 Performance assessment

We now assess the attacks described in Sect. 5.2 and the effectiveness of the countermeasures proposed in Sect. 5.3 through a simple simulation model. We start by describing the simulator and then evaluate the various attacks individually and combined.

6.1 The simulation model

A run of the simulator is a sequence of steps, where a step can be a read operation or a certification operation. Our goal is to evaluate variations in the abort rates of deferred update replication caused by byzantine clients under data contention, not due to communication and processing delays. Therefore, for simplicity, we assume that each step takes a single time unit.

Table 2 shows the parameters that can be configured in the simulator and the ranges of values used in our experiments. We consider a large and a small database in the experiments. All read and write operations are uniformly distributed throughout the database, unless noted otherwise in the text. Clients submit a new transaction as soon as the previous one has terminated.

Table 2 Simulation parameters


In all experiments reported, there are no blind honest transactions, that is, every write operation on item x is preceded by a read operation on x. Each point in the graphs is the result of a run of the simulator in which one million honest transactions were executed.
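To make the model concrete, the following is a minimal sketch of a step-based simulation with honest clients only. It is not the simulator used in the experiments: the random choice of which client takes the next step, the rule that a transaction writes exactly the items it read, and the function name are all assumptions.

```python
import random
from typing import Dict, List


def simulate_abort_rate(db_size: int, clients: int, ops_per_txn: int,
                        total_txns: int, seed: int = 0) -> float:
    """Fraction of aborted transactions in one run of the sketch."""
    if ops_per_txn > db_size:
        raise ValueError("a transaction cannot read more items than exist")
    rng = random.Random(seed)
    versions = [0] * db_size  # current version of each data item
    # Per client: item -> version observed by its in-progress transaction.
    reads: List[Dict[int, int]] = [{} for _ in range(clients)]
    finished = aborted = 0
    while finished < total_txns:
        c = rng.randrange(clients)  # one client takes one step
        if len(reads[c]) < ops_per_txn:
            item = rng.randrange(db_size)  # read step, uniform access
            reads[c].setdefault(item, versions[item])
        else:
            # Certification step: abort if an item read was overwritten.
            if any(versions[i] != v for i, v in reads[c].items()):
                aborted += 1
            else:
                for i in reads[c]:  # no blind writes: update what was read
                    versions[i] += 1
            finished += 1
            reads[c] = {}  # the client starts a new transaction
    return aborted / total_txns
```

With a single client the sketch never aborts, since no other transaction can overwrite an item between a read and the certification step; contention appears only as clients and transaction sizes grow relative to the database.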

6.2 Abort rates under normal conditions

Our first set of experiments assesses abort rates in the absence of byzantine clients (see Fig. 2). In these experiments, each client submits one transaction at a time (i.e., K=1) and transactions have the same number of read and write operations. In the experiments reported on the left of Fig. 2 each transaction has 8 read and 8 write operations in the small database configuration, and 32 reads and 32 writes in the large database configuration. In the experiments reported on the right of the figure, there are 44 and 256 clients, respectively, in the small and in the large database configurations.

As shown in the graphs, deferred update replication is sensitive to the number of clients and the size of transactions. Intuitively, as the number of clients and the number of reads and writes in a transaction increase, the probability of conflicts increases due to data contention, resulting in more aborts. In the space of parameters searched, contention is more important in the small database than in the large database.

6.3 The “concurrent-transactions” and “write-set” attacks

We initially look at the concurrent-transactions and the write-set attacks. The former attack corresponds to the case in which byzantine clients submit multiple concurrent transactions. The latter attack corresponds to the case in which byzantine clients submit transactions larger than the transactions submitted by honest clients. In both cases, honest and byzantine clients do not submit blind transactions. The abort rates reported in Fig. 3 are for honest transactions only.

In both experiments, we set the number of operations in honest transactions to 8 reads and 8 writes in the small database and 32 reads and 32 writes in the large database. There are 44 and 256 clients accessing the small and the large databases, respectively. In the absence of byzantine clients, these configurations lead to abort rates of 10% for both database sizes. In these experiments, we configured 25% of clients to act maliciously.

The number of concurrent transactions submitted by byzantine clients, graph on the left of Fig. 3, has almost identical effect on both database setups. Notice, however, that with 25% of byzantine clients, there are 11 byzantine clients in the small database setup and 64 byzantine clients in the large database setup. Thus, increasing the number of concurrent transactions per byzantine client represents a larger number of byzantine transactions in the large database than in the small database.

The graph on the right of Fig. 3 shows that increasing the size of byzantine transactions leads to more aborted honest transactions. While this attack seems to be more effective in the small database than in the large database, we note that when byzantine transactions are twice the size of honest transactions (i.e., 16 operations and 64 operations in the small and large database configurations), the abort rate of honest transactions is 12% in both cases.

In both experiments, countermeasures CM1 and CM2 are effective in reducing the abort rate by limiting the number of transactions issued by byzantine clients to one at a time (CM1), as done by honest clients (graph on the left of Fig. 4), and by keeping the number of reads and writes in byzantine transactions equal to the number of such operations in honest transactions (CM2) (graph on the right of Fig. 4). Effectiveness is the ratio between the abort rate with an attack and without it, after the corresponding countermeasure was applied.

6.4 The impact of the “blind-transactions” attack

We consider now the blind-transactions attack when combined with the concurrent-transactions attack (graph on the left of Fig. 5) and the write-set attack (graph on the right of Fig. 5). To evaluate the blind-transactions attack, we ran the experiments presented in the previous section after removing all read operations from byzantine transactions.

Blind transactions, a powerful mechanism to boost byzantine client attacks, benefit from two properties: (a) they never abort and (b) they execute more quickly than honest transactions, which contain read operations. From property (a), every transaction created by a byzantine client may cause honest transactions to abort. Property (b) stems from the fact that without reads, byzantine transactions are executed in a single step. As a consequence, more transactions can be generated by byzantine clients in an execution. We have assessed that property (b) can create more damage to honest transactions than property (a). In our assessment, we configured byzantine transactions to issue read operations on items that no transaction updates. Thus, such transactions never abort, although they issue read operations. It turned out that in this case, byzantine transactions were much less harmful than blind transactions.

Figure 6 assesses the effectiveness of countermeasures CM1 and CM3 combined, CM2 and CM3 combined, and CM3 alone, targeting blind transactions only. Effectiveness of CM3 alone is calculated as the ratio between the abort rate of the combined attacks (e.g., write-set and blind transactions attacks) and the abort rate without blind transactions (e.g., write-set attack alone).

The effectiveness of CM3 alone decreases with the increase of the number of concurrent byzantine transactions and the number of writes in a byzantine transaction, regardless of the database size. This happens because as the abort rate increases, due to more concurrent byzantine transactions per client and more writes in byzantine transactions, there are fewer honest transactions to be aborted by blind transactions. Therefore, there is less gain in preventing blind transactions. Combining CM3 with CM1 and CM2 can neutralize both client attacks.

6.5 The impact of the “colluded-clients” attack

Figure 7 shows the abort rate of the colluded-clients attack combined with the concurrent-transactions and the write-set attacks. In the colluded-clients attack, we assigned each byzantine client to a range of database keys. Ranges were of same size and nonoverlapping. A byzantine client creates transactions that read and write data items within its range.

The colluded-clients attack led to a small increase in the abort rate of the two other attacks: the differences between the abort rates reported in Figs. 3 and 7 are barely noticeable. The colluded-clients attack aims to reduce the number of byzantine transactions aborted by other byzantine transactions. Since the overall abort rate is low, this technique leads to few additional committed byzantine transactions, and these transactions are not enough to abort significantly more honest transactions.

We also ran experiments combining the blind-transactions attack and the colluded-clients attack but do not report the results as they provided no new insight. Since blind transactions are never aborted, there was no benefit from preventing byzantine transactions from overlapping.

6.6 Vulnerability to percentage of byzantine clients

Our last experiments target the effect of the number of byzantine clients. The y-axis in Fig. 8 is normalized by the abort rate of the corresponding configuration without byzantine clients. While the effect of a larger number of byzantine clients is more pronounced in the large database configuration than in the small database configuration, in both cases more byzantine clients imply more aborted transactions.

From the graphs, the effect of the number of byzantine clients is limited in two cases: (a) when there are few writes per transaction and (b) when there are approximately 34 writes in the small database and 80 writes in the large database. In case (a), the abort rate is quite low, thus the number of byzantine clients is not important. In case (b), the number of aborts is very high, and there is not much more damage that can be caused by adding byzantine clients.

6.7 Summary

To conclude, we have assessed four attacks that can be perpetrated by byzantine clients against certification: (a) write set, (b) concurrent transactions, (c) blind transactions, and (d) colluded clients attacks. We determined that the first three attacks can increase the number of aborts in honest transactions, but not the last. Luckily, the countermeasures proposed in Sect. 5.3 target attacks (a)–(c), and can prevent them. Had the colluded byzantine clients attack proved more effective, deferred update replication would suffer from a serious vulnerability since it is not clear what countermeasure could be used against collusion.

7 Related work

Database replication and deferred update replication have been largely studied under benign faults (e.g., non-byzantine servers subject to crash-stop failures). Few works have considered the effects of byzantine servers on database replication. The first paper to consider the problem is [8], written more than two decades ago. This paper investigates the use of byzantine agreement and state machine replication in the context of databases. It proposes to execute transactions atop serializable databases at the expense of limiting transaction concurrency. The deferred update replication protocol we propose allows concurrency among transactions in that multiple transactions can be simultaneously executed by different servers. Only transaction termination needs to be serialized.

More recently, Vandiver et al. [23] proposed a system that allows more concurrency between transactions. Clients communicate through a central coordinator that chooses a replica as primary and the rest as secondaries. Transactions are first executed on the primary to determine which transactions can be executed in parallel. When the result of a query is returned to the coordinator, the latter ships this query to the secondaries. A commit barrier, maintained by the coordinator and incremented whenever a transaction commits, is used to determine which transactions can be executed in parallel at the secondaries. That is, two transactions that executed in parallel on the primary will be executed in parallel on the secondaries, provided that no transaction commits in the meantime. This is, roughly speaking, the basis of the Commit Barrier Scheduling protocol proposed in [23]. This approach is similar to ours in the sense that it allows the concurrent execution of transactions, although our protocol allows the execution of transactions at any replica. Moreover, in contrast to [23], our protocol does not require a trusted coordinator, a strong assumption.

Byzantium [20] considers byzantine failures of servers that guarantee snapshot isolation, as opposed to serializability. Under snapshot isolation, transactions observe a committed instant of the database, which may correspond to the database state when the transaction started or, more likely in a distributed environment, to some earlier state of the database [6]. Transactions that execute concurrently can only commit if they do not modify the same data items, a policy sometimes referred to as the first-committer-wins rule. In Byzantium, a client first selects a replica that will act as the coordinator for its transaction and then atomically broadcasts the begin operation to all replicas so that they all use the same database snapshot for the transaction. The transaction is entirely executed by the coordinator. At commit time, the operations along with their results are atomically broadcast to all replicas. If the transaction was executed in a correct coordinator, then a quorum of servers will obtain the same results and the transaction can commit; otherwise, the transaction is aborted and the client is notified about the byzantine coordinator. Both update and read-only transactions are atomically broadcast.

Byzantium is extended in [7] to a more efficient protocol that allows read-only transactions to be executed on any subset of f+1 replicas and propagates operations before commit time for better performance. Their empirical assessment shows that the extended version of Byzantium introduces a moderate performance overhead of roughly 30% over nonreplicated solutions in the TPC-C benchmark.

8 Final remarks

This paper has considered the deferred update replication technique in the byzantine failure model. Deferred update replication has been largely used to implement database replication in the crash-stop failure model. It is more scalable than other replication techniques such as state machine replication and primary-backup since transactions may be executed at any server. Moreover, it allows read-only transactions to be executed at a single replica. The paper shows that it is surprisingly simple to use deferred update replication under byzantine failures—in fact, our protocol only requires a small modification of the certification procedure and an additional check, performed by clients, to filter out transaction outcomes sent by byzantine servers. The paper also shows that even though some servers may behave maliciously, read-only transactions can be executed at a single server—the execution must be certified by the client at the end of the transaction however. Finally, the paper considers the effects of byzantine client attacks against certification, presents countermeasures against these attacks, and analyzes their performance and effectiveness.

Notes

1. The decomposition of the certification test into two components is done for clarity purposes. An implementation would probably combine the two to speed up the execution.
2. Notice that a byzantine client may issue commands that violate the integrity of the database. Defining mechanisms that preserve database integrity despite malicious clients is out of the scope of this work.

References

1. Amir Y, Coan BA, Kirsch J, Lane J (2008) Byzantine replication under attack. In: DSN, pp 197–206
2. Bernstein P, Hadzilacos V, Goodman N (1987) Concurrency control and recovery in database systems. Addison-Wesley, Reading
3. Castro M, Liskov B (2002) Practical byzantine fault tolerance and proactive recovery. ACM Trans Comput Syst 20(4):398–461
4. Cecchet E, Marguerite J, Zwaenepoel W (2004) C-JDBC: flexible database clustering middleware. In: USENIX annual technical conference, FREENIX track
5. Diffie W, Hellman ME (1976) Multiuser cryptographic techniques. In: AFIPS ’76: proceedings of the June 7–10, 1976, national computer conference and exposition. ACM, New York, pp 109–112
6. Elnikety S, Pedone F, Zwaenepoel W (2005) Database replication using generalized snapshot isolation. In: Symposium on reliable distributed systems (SRDS’2005), Orlando, USA
7. Garcia R, Rodrigues R, Preguiça N (2011) Efficient middleware for byzantine fault tolerant database replication. In: Proceedings of the sixth conference on computer systems (EuroSys ’11). ACM, New York, pp 107–122
8. Garcia-Molina H, Pittelli FM, Davidson SB (1986) Applications of byzantine agreement in database systems. ACM Trans Database Syst 11(1):27–47
9. Garcia-Molina H, Ullman JD, Widom J (2008) Database systems: the complete book. Prentice Hall, New York
10. Kemme B, Alonso G (2000) A new approach to developing and implementing eager database replication protocols. ACM Trans Database Syst 25(3):333–379
11. Lamport L (1998) The part-time parliament. ACM Trans Comput Syst 16(2):133–169
12. Lamport L, Shostak R, Pease M (1982) The Byzantine generals problem. ACM Trans Program Lang Syst 4(3):382–401
13. Lin Y, Kemme B, Patino-Martinez M, Jimenez-Peris R (2005) Middleware based data replication providing snapshot isolation. In: International conference on management of data (SIGMOD), Baltimore, Maryland, USA
14. Liskov B, Rodrigues R (2005) Byzantine clients rendered harmless. Technical report MIT-CSAIL-TR-2005-047, MIT, July 2005
15. Malkhi D, Reiter M, Lynch N (1998) A correctness condition for memory shared by byzantine processes. Unpublished manuscript, Sept 1998
16. Martin J-P, Alvisi L (2005) Fast byzantine consensus. In: DSN’05, pp 402–411
17. Pedone F (1999) The database state machine and group communication issues. PhD thesis, École Polytechnique Fédérale de Lausanne, Switzerland, Number 2090
18. Pedone F, Guerraoui R, Schiper A (1997) Transaction reordering in replicated databases. In: Proceedings of the 16th IEEE symposium on reliable distributed systems, Durham, USA
19. Plattner C, Alonso G (2004) Ganymed: scalable replication for transactional web applications. In: Proceedings of the 5th ACM/IFIP/USENIX international conference on middleware, pp 155–174
20. Preguiça NM, Rodrigues R, Honorato C, Lourenço J (2008) Byzantium: Byzantine-fault-tolerant database replication providing snapshot isolation. In: HotDep
21. Rivest RL (1992) The MD5 message-digest algorithm. Internet RFC 1321
22. Rivest RL, Shamir A, Adleman L (1978) A method for obtaining digital signatures and public-key cryptosystems. Commun ACM 21(2):120–126
23. Vandiver B, Balakrishnan H, Liskov B, Madden S (2007) Tolerating byzantine faults in transaction processing systems using commit barrier scheduling. In: SOSP, pp 59–72


Acknowledgements

The authors wish to thank Antonio Carzaniga, Rui Oliveira, Ricardo Padilha, José Orlando Pereira, José Enrique Armendáriz-Iñigo, and the LADC 2011 and JBCS anonymous reviewers for the insightful comments about this work.

This work was supported in part by the Hasler Foundation, Switzerland, under grant number 2316.

Author information

Authors and Affiliations

University of Lugano (USI), Lugano, Switzerland
Fernando Pedone & Nicolas Schiper

Authors

Fernando Pedone
Nicolas Schiper

Corresponding author

Correspondence to Fernando Pedone.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Pedone, F., Schiper, N. Byzantine fault-tolerant deferred update replication. J Braz Comput Soc 18, 3–18 (2012). https://doi.org/10.1007/s13173-012-0060-z


Received: 03 November 2011
Accepted: 13 January 2012
Published: 07 February 2012
Issue Date: March 2012
DOI: https://doi.org/10.1007/s13173-012-0060-z
