Skip to main content

Advertisement

You are viewing the new article page. Let us know what you think. Return to old version

| Open | Published:

A generalized model for distributed comparison-based system-level diagnosis

Abstract

This work introduces a new system-level diagnosis model and an algorithm based on this model: Hi-Comp (Hierarchical Comparison-based Adaptive Distributed System-Level Diagnosis algorithm). This algorithm allows the diagnosis of systems that can be represented by a complete graph. Hi-Comp is the first diagnosis algorithm that is, at the same time, hierarchical, distributed and comparison-based. The algorithm is not limited to crash fault diagnosis, because its tests are based on comparisons. To perform a test, a processor sends a task to two processors of the system that, after executing the task, send their outputs back to the tester. The tester compares the two outputs; if the comparison produces a match, the tester considers the tested processors fault-free; on the other hand, if the comparison produces a mismatch, the tester considers that at least one of the two tested processors is faulty, but can not determine which one. Considering a system of N nodes, it is proved that the algorithm’s diagnosability is (N-1) and the latency is log2N testing rounds. Furthermore, a formal proof of the maximum number of tests required per testing round is presented, which can be O(N3). Simulation results are also presented.

References

  1. [1]

    A. Subbiah, and D.M. Blough, “Distributed Diagnosis in Dynamic Fault Environments,”IEEE Transactions on Paralel and Distributed Systems, Vol. 15 No. 5, pp. 453–467, 2004.

  2. [2]

    G. Masson, D. Blough, and G. Sullivan, “System Diagnosis,”Fault-Tolerant Computer System Design, ed. D.K. Pradhan, Prentice-Hall, 1996.

  3. [3]

    F. Preparata, G. Metze, and R.T. Chien, “On The Connection Assignment Problem of Diagnosable Systems,”IEEE Transactions on Electronic Computers, Vol. 16, pp. 848–854, 1968.

  4. [4]

    S.L. Hakimi, and A.T. Amin, “Characterization of Connection Assignments of Diagnosable Systems,”IEEE Transactions on Computers, Vol. 23, pp. 86–88, 1974.

  5. [5]

    S.L. Hakimi, and K. Nakajima, “On Adaptive System Diagnosis,”IEEE Transactions on Computers, Vol. 33, pp. 234–240, 1984.

  6. [6]

    S.H. Hosseini, J.G. Kuhl, and S.M. Reddy, “A Diagnosis Algorithm for Distributed Computing Systems with Failure and Repair,”IEEE Transactions on Computers, Vol. 33, pp. 223–233, 1984.

  7. [7]

    E.P. Duarte Jr., and T. Nanya, “A Hierarchical Adaptive Distributed System-Level Diagnosis Algorithm,”IEEE Transactions on Computers, Vol.47, pp. 34–45, 1998.

  8. [8]

    R.P. Bianchini, and R. Buskens, “Implementation of On-Line Distributed System-Level Diagnosis Theory,”IEEE Transactions on Computers, Vol. 41, pp. 616–626, 1992.

  9. [9]

    A. Brawerman, and E.P. Duarte Jr., “A Synchronous Testing Strategy for Hierarchical Adaptive Distributed System-Level Diagnosis,”Journal of Electronic Testing Theory and Applications, Vol. 17, No. 2, pp. 185–195, 2001.

  10. [10]

    E.P. Duarte Jr., A. Brawerman, and L.C.P. Albini, “An Algorithm for Distributed Hierarchical Diagnosis of Dynamic Fault and Repair Events,”Proc. IEEE ICPADS’00, pp. 299–306, 2000.

  11. [11]

    S. Lee, and K.G. Shin, “Probabilistic Diagnosis of Multiprocessor Systems,”ACM Computing Surveys, Vol. 26, No. 1, pp. 121–139, 1994.

  12. [12]

    M. Malek, “A Comparison Connection Assignment for Diagnosis of Multiprocessor Systems,”Proc. Seventh Int’l Symp. Computer Architecture, pp. 31–36, 1980.

  13. [13]

    K.Y. Chwa, and S.L. Hakimi, “Schemes for Fault-Tolerant Computing: A Comparison of Modularly Redundant and t-Diagnosable Systems,”Information and Control, Vol. 49, pp. 212–238, 1981.

  14. [14]

    J. Maeng, and M. Malek, “A Comparison Connection Assignment for Self-Diagnosis of Multiprocessor Systems,”Digest 11th Int’l Symp. Fault Tolerant Computing, pp. 173–175, 1981.

  15. [15]

    A. Sengupta, and A.T. Dahbura, “On Self-Diagnosable Multiprocessor Systems: Diagnosis by Comparison Approach,”IEEE Transactions on Computers, Vol. 41, No. 11, pp. 1386–1396, 1992.

  16. [16]

    D.M. Blough, and H.W. Brown, “The Broadcast Comparison Model for On-Line Fault Diagnosis in Multicomputer Systems: Theory and Implementation,”IEEE Transactions on Computers, Vol. 48, pp. 470–493, 1999.

  17. [17]

    D. Wang, “Diagnosability of Hipercubes and Enhanced Hypercubes under the Comparison Diagnosis Model,”IEEE Transactions on Computers, Vol. 48, No. 12, pp. 1369–1374, 1999.

  18. [18]

    G.S. Almasi, and A. Gottlieb, Highly Parallel Computing, The Benjamim/Commings Publishing Company Inc., 1994.

  19. [19]

    C. Xavier, and S.S. Iyengar,Introduction to Parallel Algorithms, Wiley-Intersciense Publication, 1998.

  20. [20]

    N.F. Tzeng, and S. Wei, “Enhanced Hypercubes,”IEEE Transactions on Computers, Vol. 40, No. 3, pp. 284–294, Mar. 1991.

  21. [21]

    T. Araki, and Y. Shibata, “Diagnosability of Butterfly Networks under the Comparison Approach,”IEICE Trans. Fundamentals, Vol E85-A, No. 5, Maio 2002.

  22. [22]

    F.T. Leighton,Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes, Morgan Kaufmann, San Mateo, CA, 1992.

  23. [23]

    J. Fan, “Diagnosability of Crossed Cubes,”IEEE Transactions on Computers, Vol. 13, No. 10, pp. 1099–1104, Out. 2002.

  24. [24]

    F. Harary,Graph Theory, Addison-Wesley Publishing Company, 1971.

  25. [25]

    S. Rangarajan, A.T. Dahbura, and E.A. Ziegler, “A Distributed System-Level Diagnosis for Arbitrary Network Topologies,”IEEE Transactions on Computers, Vol. 44, No. 2, pp. 312–333, 1995.

  26. [26]

    M.H. MacDougall,Simulating Computer Systems: Techniques and Tools, The MIT Press, Cambridge, MA, 1987.

Download references

Author information

Rights and permissions

Reprints and Permissions

About this article

Keywords

  • Distributed Diagnosis
  • System-Level Diagnosis
  • Comparison-Based Diagnosis