A generalized model for distributed comparison-based system-level diagnosis

Abstract

This work introduces a new system-level diagnosis model and an algorithm based on it: Hi-Comp (Hierarchical Comparison-based Adaptive Distributed System-Level Diagnosis algorithm). The algorithm diagnoses systems that can be represented by a complete graph. Hi-Comp is the first diagnosis algorithm that is simultaneously hierarchical, distributed, and comparison-based. Because its tests are based on comparisons, the algorithm is not limited to crash fault diagnosis. To perform a test, a processor sends a task to two processors of the system, which execute the task and send their outputs back to the tester. The tester then compares the two outputs: if they match, the tester considers both tested processors fault-free; if they do not match, the tester concludes that at least one of the two tested processors is faulty, but cannot determine which one. For a system of N nodes, the algorithm’s diagnosability is proved to be (N-1) and its latency to be log₂ N testing rounds. A formal proof of the maximum number of tests required per testing round, which can be O(N³), is also presented, along with simulation results.
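The comparison test described above can be illustrated with a minimal Python sketch. It is not the Hi-Comp algorithm itself: the names `make_node`, `comparison_test`, and `latency_rounds` are hypothetical, the "task" (squaring an integer) is arbitrary, and the sketch assumes that two faulty nodes never return the same output for the same task and that N is a power of two. Neither assumption is stated in the abstract.

```python
from typing import Callable

Node = Callable[[int], int]


def make_node(node_id: int, faulty: bool) -> Node:
    """Build a hypothetical node that executes a task on an integer input.

    Assumption: a faulty node corrupts its output in a way that depends on
    its id, so two distinct faulty nodes never agree with each other.
    """
    def run(task_input: int) -> int:
        correct = task_input * task_input  # the illustrative "task"
        return correct + 1 + node_id if faulty else correct
    return run


def comparison_test(task_input: int, node_a: Node, node_b: Node) -> bool:
    """Send the same task to two nodes and compare their outputs.

    True (match): the tester considers both nodes fault-free.
    False (mismatch): at least one node is faulty, but the tester
    cannot tell which one.
    """
    return node_a(task_input) == node_b(task_input)


def latency_rounds(n: int) -> int:
    """Latency of log2(N) testing rounds, as stated in the abstract.

    Assumption of this sketch: N is a power of two.
    """
    if n < 1 or (n & (n - 1)) != 0:
        raise ValueError("this sketch assumes N is a power of two")
    return n.bit_length() - 1


if __name__ == "__main__":
    good_1 = make_node(1, faulty=False)
    good_2 = make_node(2, faulty=False)
    bad_3 = make_node(3, faulty=True)
    print(comparison_test(7, good_1, good_2))  # two fault-free nodes
    print(comparison_test(7, good_1, bad_3))   # one faulty node
    print(latency_rounds(8))
```

Under these assumptions, a mismatch only tells the tester that one of the two nodes is faulty, which is why a single test is not enough to locate the fault.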

Rights and permissions

Open Access. This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Albini, L.C.P., Duarte, E.P. & Ziwich, R.P. A generalized model for distributed comparison-based system-level diagnosis. J Braz Comp Soc 10, 42–54 (2004). https://doi.org/10.1007/BF03192365

  • DOI: https://doi.org/10.1007/BF03192365
