| Open | Published:
A generalized model for distributed comparison-based system-level diagnosis
Journal of the Brazilian Computer Societyvolume 10, pages42–54 (2004)
This work introduces a new system-level diagnosis model and an algorithm based on this model: Hi-Comp (Hierarchical Comparison-based Adaptive Distributed System-Level Diagnosis algorithm). This algorithm allows the diagnosis of systems that can be represented by a complete graph. Hi-Comp is the first diagnosis algorithm that is, at the same time, hierarchical, distributed and comparison-based. The algorithm is not limited to crash fault diagnosis, because its tests are based on comparisons. To perform a test, a processor sends a task to two processors of the system that, after executing the task, send their outputs back to the tester. The tester compares the two outputs; if the comparison produces a match, the tester considers the tested processors fault-free; on the other hand, if the comparison produces a mismatch, the tester considers that at least one of the two tested processors is faulty, but can not determine which one. Considering a system of N nodes, it is proved that the algorithm’s diagnosability is (N-1) and the latency is log2N testing rounds. Furthermore, a formal proof of the maximum number of tests required per testing round is presented, which can be O(N3). Simulation results are also presented.
A. Subbiah, and D.M. Blough, “Distributed Diagnosis in Dynamic Fault Environments,”IEEE Transactions on Paralel and Distributed Systems, Vol. 15 No. 5, pp. 453–467, 2004.
G. Masson, D. Blough, and G. Sullivan, “System Diagnosis,”Fault-Tolerant Computer System Design, ed. D.K. Pradhan, Prentice-Hall, 1996.
F. Preparata, G. Metze, and R.T. Chien, “On The Connection Assignment Problem of Diagnosable Systems,”IEEE Transactions on Electronic Computers, Vol. 16, pp. 848–854, 1968.
S.L. Hakimi, and A.T. Amin, “Characterization of Connection Assignments of Diagnosable Systems,”IEEE Transactions on Computers, Vol. 23, pp. 86–88, 1974.
S.L. Hakimi, and K. Nakajima, “On Adaptive System Diagnosis,”IEEE Transactions on Computers, Vol. 33, pp. 234–240, 1984.
S.H. Hosseini, J.G. Kuhl, and S.M. Reddy, “A Diagnosis Algorithm for Distributed Computing Systems with Failure and Repair,”IEEE Transactions on Computers, Vol. 33, pp. 223–233, 1984.
E.P. Duarte Jr., and T. Nanya, “A Hierarchical Adaptive Distributed System-Level Diagnosis Algorithm,”IEEE Transactions on Computers, Vol.47, pp. 34–45, 1998.
R.P. Bianchini, and R. Buskens, “Implementation of On-Line Distributed System-Level Diagnosis Theory,”IEEE Transactions on Computers, Vol. 41, pp. 616–626, 1992.
A. Brawerman, and E.P. Duarte Jr., “A Synchronous Testing Strategy for Hierarchical Adaptive Distributed System-Level Diagnosis,”Journal of Electronic Testing Theory and Applications, Vol. 17, No. 2, pp. 185–195, 2001.
E.P. Duarte Jr., A. Brawerman, and L.C.P. Albini, “An Algorithm for Distributed Hierarchical Diagnosis of Dynamic Fault and Repair Events,”Proc. IEEE ICPADS’00, pp. 299–306, 2000.
S. Lee, and K.G. Shin, “Probabilistic Diagnosis of Multiprocessor Systems,”ACM Computing Surveys, Vol. 26, No. 1, pp. 121–139, 1994.
M. Malek, “A Comparison Connection Assignment for Diagnosis of Multiprocessor Systems,”Proc. Seventh Int’l Symp. Computer Architecture, pp. 31–36, 1980.
K.Y. Chwa, and S.L. Hakimi, “Schemes for Fault-Tolerant Computing: A Comparison of Modularly Redundant and t-Diagnosable Systems,”Information and Control, Vol. 49, pp. 212–238, 1981.
J. Maeng, and M. Malek, “A Comparison Connection Assignment for Self-Diagnosis of Multiprocessor Systems,”Digest 11th Int’l Symp. Fault Tolerant Computing, pp. 173–175, 1981.
A. Sengupta, and A.T. Dahbura, “On Self-Diagnosable Multiprocessor Systems: Diagnosis by Comparison Approach,”IEEE Transactions on Computers, Vol. 41, No. 11, pp. 1386–1396, 1992.
D.M. Blough, and H.W. Brown, “The Broadcast Comparison Model for On-Line Fault Diagnosis in Multicomputer Systems: Theory and Implementation,”IEEE Transactions on Computers, Vol. 48, pp. 470–493, 1999.
D. Wang, “Diagnosability of Hipercubes and Enhanced Hypercubes under the Comparison Diagnosis Model,”IEEE Transactions on Computers, Vol. 48, No. 12, pp. 1369–1374, 1999.
G.S. Almasi, and A. Gottlieb, Highly Parallel Computing, The Benjamim/Commings Publishing Company Inc., 1994.
C. Xavier, and S.S. Iyengar,Introduction to Parallel Algorithms, Wiley-Intersciense Publication, 1998.
N.F. Tzeng, and S. Wei, “Enhanced Hypercubes,”IEEE Transactions on Computers, Vol. 40, No. 3, pp. 284–294, Mar. 1991.
T. Araki, and Y. Shibata, “Diagnosability of Butterfly Networks under the Comparison Approach,”IEICE Trans. Fundamentals, Vol E85-A, No. 5, Maio 2002.
F.T. Leighton,Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes, Morgan Kaufmann, San Mateo, CA, 1992.
J. Fan, “Diagnosability of Crossed Cubes,”IEEE Transactions on Computers, Vol. 13, No. 10, pp. 1099–1104, Out. 2002.
F. Harary,Graph Theory, Addison-Wesley Publishing Company, 1971.
S. Rangarajan, A.T. Dahbura, and E.A. Ziegler, “A Distributed System-Level Diagnosis for Arbitrary Network Topologies,”IEEE Transactions on Computers, Vol. 44, No. 2, pp. 312–333, 1995.
M.H. MacDougall,Simulating Computer Systems: Techniques and Tools, The MIT Press, Cambridge, MA, 1987.