Multicoordinated agreement for groups of agents

Abstract

Agents in agreement protocols play distinct roles. Proposers propose values to the acceptors, which accept proposals and inform the learners so that they can detect when an agreement has been reached. A fourth role is that of the coordinator, which filters the proposals sent from proposers to acceptors. While proposers, learners, and coordinators are easily replaced, replacing an acceptor is prohibitively expensive. Protocols that do not employ a coordinator are less resilient to acceptor failures. Protocols that use one coordinator are more resilient to acceptor failures, at the cost of one extra communication step even in the absence of failures. Moreover, they require replacing the coordinator as soon as it fails, a reconfiguration that, although relatively inexpensive, reduces the protocol’s availability. Hence both options, zero coordinators or one, have drawbacks. In previous work, we presented an alternative: multicoordinated agreement protocols. Such protocols are as resilient as single-coordinated protocols but require less reconfiguration to cope with coordinator failures. In fact, most reconfiguration can be done in parallel with the execution of the protocol’s normal steps. Multicoordination can be applied to several problems. In this paper we illustrate its use in solving consensus and then introduce a fast multicoordinated agreement protocol for agents organized in groups, an abstraction for fast local area networks interconnected by slower links.
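To make the roles concrete, the following is a minimal sketch of a single failure-free round, not the protocol presented in the paper. It assumes that an acceptor accepts a value only after a quorum of coordinators has forwarded the same value, and that a learner learns a value once a quorum of acceptors reports it; the abstract does not spell out these rules. The class names, the `run_round` helper, and the majority quorum sizes are all hypothetical, and round changes, collisions, and recovery are left out.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Acceptor:
    """Accepts a value once enough coordinators have forwarded the same value."""
    coordinator_quorum: int  # assumed quorum size, e.g. a majority of coordinators
    forwarded: dict = field(default_factory=dict)  # coordinator id -> value
    accepted: object = None

    def on_forward(self, coordinator_id, value):
        if self.accepted is not None:
            return self.accepted
        self.forwarded[coordinator_id] = value
        top_value, votes = Counter(self.forwarded.values()).most_common(1)[0]
        if votes >= self.coordinator_quorum:
            self.accepted = top_value
        return self.accepted


@dataclass
class Learner:
    """Learns a value once enough acceptors report having accepted it."""
    acceptor_quorum: int  # assumed quorum size, e.g. a majority of acceptors
    reports: dict = field(default_factory=dict)  # acceptor id -> value
    learned: object = None

    def on_accepted(self, acceptor_id, value):
        self.reports[acceptor_id] = value
        votes = sum(1 for v in self.reports.values() if v == value)
        if self.learned is None and votes >= self.acceptor_quorum:
            self.learned = value
        return self.learned


def run_round(value, coordinators, alive, acceptors, learner):
    """Simulate one round: live coordinators forward `value` to every acceptor."""
    for acceptor_id, acceptor in acceptors.items():
        for coordinator_id in coordinators:
            if coordinator_id in alive:  # crashed coordinators stay silent
                acceptor.on_forward(coordinator_id, value)
        if acceptor.accepted is not None:
            learner.on_accepted(acceptor_id, acceptor.accepted)
    return learner.learned


if __name__ == "__main__":
    coordinators = ["c1", "c2", "c3"]
    acceptors = {name: Acceptor(coordinator_quorum=2) for name in ("a1", "a2", "a3")}
    learner = Learner(acceptor_quorum=2)
    # One coordinator has crashed; a quorum of two remains, so the round proceeds.
    print(run_round("x", coordinators, {"c1", "c2"}, acceptors, learner))
```

In this sketch, losing one of the three coordinators does not stall the round, because a quorum of two is still available. With a single coordinator, the same crash would halt progress until the coordinator was replaced.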

Author information

Corresponding author

Correspondence to Lasaro Camargos.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Camargos, L., Schmidt, R., Madeira, E. et al. Multicoordinated agreement for groups of agents. J Braz Comput Soc 16, 49–68 (2010). https://doi.org/10.1007/s13173-010-0001-7

Keywords

  • Multicoordinated
  • Agreement
  • Consensus
  • Broadcast
  • Groups