A generalized model for distributed comparison-based system-level diagnosis

Albini, Luiz Carlos Pessoa; Duarte, Elias Procópio; Ziwich, Roverli Pereira

doi:10.1007/BF03192365

Open access
Published: October 2004

A generalized model for distributed comparison-based system-level diagnosis

Luiz Carlos Pessoa Albini¹,
Elias Procópio Duarte² &
Roverli Pereira Ziwich²

Journal of the Brazilian Computer Society volume 10, pages 42–54 (2004)Cite this article

485 Accesses
2 Citations
Metrics details

Abstract

This work introduces a new system-level diagnosis model and an algorithm based on this model: Hi-Comp (Hierarchical Comparison-based Adaptive Distributed System-Level Diagnosis algorithm). This algorithm allows the diagnosis of systems that can be represented by a complete graph. Hi-Comp is the first diagnosis algorithm that is, at the same time, hierarchical, distributed and comparison-based. The algorithm is not limited to crash fault diagnosis, because its tests are based on comparisons. To perform a test, a processor sends a task to two processors of the system that, after executing the task, send their outputs back to the tester. The tester compares the two outputs; if the comparison produces a match, the tester considers the tested processors fault-free; on the other hand, if the comparison produces a mismatch, the tester considers that at least one of the two tested processors is faulty, but can not determine which one. Considering a system of N nodes, it is proved that the algorithm’s diagnosability is (N-1) and the latency is log₂N testing rounds. Furthermore, a formal proof of the maximum number of tests required per testing round is presented, which can be O(N³). Simulation results are also presented.

References

A. Subbiah, and D.M. Blough, “Distributed Diagnosis in Dynamic Fault Environments,”IEEE Transactions on Paralel and Distributed Systems, Vol. 15 No. 5, pp. 453–467, 2004.
Article Google Scholar
G. Masson, D. Blough, and G. Sullivan, “System Diagnosis,”Fault-Tolerant Computer System Design, ed. D.K. Pradhan, Prentice-Hall, 1996.
F. Preparata, G. Metze, and R.T. Chien, “On The Connection Assignment Problem of Diagnosable Systems,”IEEE Transactions on Electronic Computers, Vol. 16, pp. 848–854, 1968.
Article Google Scholar
S.L. Hakimi, and A.T. Amin, “Characterization of Connection Assignments of Diagnosable Systems,”IEEE Transactions on Computers, Vol. 23, pp. 86–88, 1974.
Article MATH MathSciNet Google Scholar
S.L. Hakimi, and K. Nakajima, “On Adaptive System Diagnosis,”IEEE Transactions on Computers, Vol. 33, pp. 234–240, 1984.
Article MATH MathSciNet Google Scholar
S.H. Hosseini, J.G. Kuhl, and S.M. Reddy, “A Diagnosis Algorithm for Distributed Computing Systems with Failure and Repair,”IEEE Transactions on Computers, Vol. 33, pp. 223–233, 1984.
Article MATH Google Scholar
E.P. Duarte Jr., and T. Nanya, “A Hierarchical Adaptive Distributed System-Level Diagnosis Algorithm,”IEEE Transactions on Computers, Vol.47, pp. 34–45, 1998.
Article Google Scholar
R.P. Bianchini, and R. Buskens, “Implementation of On-Line Distributed System-Level Diagnosis Theory,”IEEE Transactions on Computers, Vol. 41, pp. 616–626, 1992.
Article Google Scholar
A. Brawerman, and E.P. Duarte Jr., “A Synchronous Testing Strategy for Hierarchical Adaptive Distributed System-Level Diagnosis,”Journal of Electronic Testing Theory and Applications, Vol. 17, No. 2, pp. 185–195, 2001.
Article Google Scholar
E.P. Duarte Jr., A. Brawerman, and L.C.P. Albini, “An Algorithm for Distributed Hierarchical Diagnosis of Dynamic Fault and Repair Events,”Proc. IEEE ICPADS’00, pp. 299–306, 2000.
S. Lee, and K.G. Shin, “Probabilistic Diagnosis of Multiprocessor Systems,”ACM Computing Surveys, Vol. 26, No. 1, pp. 121–139, 1994.
Article MathSciNet Google Scholar
M. Malek, “A Comparison Connection Assignment for Diagnosis of Multiprocessor Systems,”Proc. Seventh Int’l Symp. Computer Architecture, pp. 31–36, 1980.
K.Y. Chwa, and S.L. Hakimi, “Schemes for Fault-Tolerant Computing: A Comparison of Modularly Redundant and t-Diagnosable Systems,”Information and Control, Vol. 49, pp. 212–238, 1981.
Article MATH MathSciNet Google Scholar
J. Maeng, and M. Malek, “A Comparison Connection Assignment for Self-Diagnosis of Multiprocessor Systems,”Digest 11th Int’l Symp. Fault Tolerant Computing, pp. 173–175, 1981.
A. Sengupta, and A.T. Dahbura, “On Self-Diagnosable Multiprocessor Systems: Diagnosis by Comparison Approach,”IEEE Transactions on Computers, Vol. 41, No. 11, pp. 1386–1396, 1992.
Article MathSciNet Google Scholar
D.M. Blough, and H.W. Brown, “The Broadcast Comparison Model for On-Line Fault Diagnosis in Multicomputer Systems: Theory and Implementation,”IEEE Transactions on Computers, Vol. 48, pp. 470–493, 1999.
Article MathSciNet Google Scholar
D. Wang, “Diagnosability of Hipercubes and Enhanced Hypercubes under the Comparison Diagnosis Model,”IEEE Transactions on Computers, Vol. 48, No. 12, pp. 1369–1374, 1999.
Article Google Scholar
G.S. Almasi, and A. Gottlieb, Highly Parallel Computing, The Benjamim/Commings Publishing Company Inc., 1994.
C. Xavier, and S.S. Iyengar,Introduction to Parallel Algorithms, Wiley-Intersciense Publication, 1998.
N.F. Tzeng, and S. Wei, “Enhanced Hypercubes,”IEEE Transactions on Computers, Vol. 40, No. 3, pp. 284–294, Mar. 1991.
Article Google Scholar
T. Araki, and Y. Shibata, “Diagnosability of Butterfly Networks under the Comparison Approach,”IEICE Trans. Fundamentals, Vol E85-A, No. 5, Maio 2002.
F.T. Leighton,Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes, Morgan Kaufmann, San Mateo, CA, 1992.
MATH Google Scholar
J. Fan, “Diagnosability of Crossed Cubes,”IEEE Transactions on Computers, Vol. 13, No. 10, pp. 1099–1104, Out. 2002.
Google Scholar
F. Harary,Graph Theory, Addison-Wesley Publishing Company, 1971.
S. Rangarajan, A.T. Dahbura, and E.A. Ziegler, “A Distributed System-Level Diagnosis for Arbitrary Network Topologies,”IEEE Transactions on Computers, Vol. 44, No. 2, pp. 312–333, 1995.
Article MATH Google Scholar
M.H. MacDougall,Simulating Computer Systems: Techniques and Tools, The MIT Press, Cambridge, MA, 1987.
Google Scholar

Download references

Author information

Authors and Affiliations

Unicenp, Dept. Informatics R. Prof. Pedro Viriato Parigot de Souza, 5300, 81280-330, Curitiba, PR, Brazil
Luiz Carlos Pessoa Albini
Federal University of Paraná, Dept. Informatics, P.O.Box 19081, 81531-990, Curitiba, PR, Brazil
Elias Procópio Duarte & Roverli Pereira Ziwich

Authors

Luiz Carlos Pessoa Albini
View author publications
You can also search for this author in PubMed Google Scholar
Elias Procópio Duarte
View author publications
You can also search for this author in PubMed Google Scholar
Roverli Pereira Ziwich
View author publications
You can also search for this author in PubMed Google Scholar

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Reprints and permissions

About this article

Cite this article

Albini, L.C.P., Duarte, E.P. & Ziwich, R.P. A generalized model for distributed comparison-based system-level diagnosis. J Braz Comp Soc 10, 42–54 (2004). https://doi.org/10.1007/BF03192365

Download citation

Issue Date: October 2004
DOI: https://doi.org/10.1007/BF03192365

A generalized model for distributed comparison-based system-level diagnosis

Abstract

References

Author information

Authors and Affiliations

Rights and permissions

About this article

Cite this article

Share this article

Keywords