Efficient and robust adaptive consensus services based on oracles

Abstract

Because of their fundamental role in the design of fault-tolerant distributed systems, consensus protocols have been widely studied. Most research in this area has focused on circumventing the impossibility of reaching consensus in a purely asynchronous system subject to failures. Of particular interest are indulgent consensus protocols based on weak failure detection oracles. Early work was concerned mainly with the correctness of such protocols; their performance has since gained considerable attention. In particular, a few studies have analyzed the impact that the quality of service of the underlying failure detection oracle has on the performance of consensus protocols. To achieve better performance, adaptive failure detectors have been proposed. Slowness oracles have also been proposed to allow consensus protocols to adapt to changing environmental conditions, improving their performance when the load on the system changes substantially. In this paper we further investigate the use of these oracles to design efficient consensus services. In particular, we provide efficient and robust implementations of slowness oracles based on techniques previously used to implement adaptive failure detection oracles. Our experiments on a wide-area distributed system show that, by using a slowness oracle that is well matched with a failure detection oracle, one can achieve performance as much as 53.5% better than the alternative that does not use a slowness oracle.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Sampaio, L., Brasileiro, F., Nunes, R.C. et al. Efficient and robust adaptive consensus services based on oracles. J Braz Comp Soc 10, 31–41 (2004). https://doi.org/10.1007/BF03192364

  • DOI: https://doi.org/10.1007/BF03192364