Robust assertions and fail-bounded behavior

Abstract

This paper characterizes the behavior of assertion-based error detection mechanisms under faults injected according to a quite general fault model. Assertions based on knowledge of the application can be very effective at detecting corruption of critical data caused by hardware faults. We identify two main drawbacks of this approach: data outside the section covered by the assertions, namely during input and output, is left unprotected, and the assertions themselves may be executed incorrectly.
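
The following minimal Python sketch illustrates both the strength and the first weakness of a plain output assertion. The function names, the tolerance, and the single-bit-flip fault model are illustrative assumptions, not the paper's experimental setup. The assertion checks that a computed square root squares back to its input, so a bit flip in the result is caught. A bit flip in the input that occurs before the computation is not caught, because the assertion can only relate the values it is handed.

```python
import math
import struct


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit of an IEEE 754 double, a simple model of a transient hardware fault."""
    (raw,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return flipped


def sqrt_assertion(x: float, y: float, rel_tol: float = 1e-12) -> bool:
    """Application-level output assertion: y must be a non-negative square root of x."""
    return y >= 0.0 and math.isclose(y * y, x, rel_tol=rel_tol)


x = 2.0
y = math.sqrt(x)
print(sqrt_assertion(x, y))                      # True: correct result accepted
print(sqrt_assertion(x, flip_bit(y, 52)))        # False: exponent bit flipped, result halved

# A fault that hits the input *before* the computation is invisible to the
# assertion, because it only checks consistency between the data it receives.
x_bad = flip_bit(x, 52)                          # 2.0 becomes 4.0
print(sqrt_assertion(x_bad, math.sqrt(x_bad)))   # True: corruption goes undetected
```

Bit 52 is the lowest exponent bit of a binary64 value, so flipping it halves or doubles the number; that makes the effect easy to see, but any bit could be chosen.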

To address these weak points, we propose the Robust Assertions technique and demonstrate its effectiveness through extensive fault injection experiments. With this technique a system follows a new failure model, which we call Fail-Bounded: with high probability, every result produced is either correct or, if wrong, within a certain bound of the correct value, where the size of that bound depends on the output assertions used.
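
The abstract does not describe how Robust Assertions are built, so what follows is only a hedged sketch of one way to address the two weak points. It rests on two assumptions of ours. First, input and output data carry an error-detecting code (CRC-32 via Python's `zlib.crc32`). Second, the output assertion is evaluated twice, each time reading its operands back from the protected messages, before any result is released. The names `pack`, `unpack`, `robust_sqrt` and the `compute` hook are hypothetical. The assertion's relative tolerance plays the role of the bound in the Fail-Bounded model.

```python
import math
import struct
import zlib
from typing import Callable


class DetectedError(Exception):
    """A fault was detected; the computation releases no output."""


def pack(values: list[float]) -> bytes:
    """Serialize doubles and append a CRC-32 so corruption outside the computation is detectable."""
    payload = struct.pack(f"<{len(values)}d", *values)
    return payload + struct.pack("<I", zlib.crc32(payload))


def unpack(message: bytes) -> list[float]:
    """Verify the CRC-32 and deserialize; raise DetectedError on any mismatch."""
    if len(message) < 4 or (len(message) - 4) % 8:
        raise DetectedError("malformed message")
    payload = message[:-4]
    (crc,) = struct.unpack("<I", message[-4:])
    if zlib.crc32(payload) != crc:
        raise DetectedError("checksum mismatch")
    return list(struct.unpack(f"<{len(payload) // 8}d", payload))


def sqrt_assertion(x: float, y: float, rel_tol: float) -> bool:
    """Output assertion; rel_tol bounds the error that can go undetected."""
    return y >= 0.0 and math.isclose(y * y, x, rel_tol=rel_tol)


def robust_sqrt(message: bytes, rel_tol: float = 1e-9,
                compute: Callable[[float], float] = math.sqrt) -> bytes:
    """Hypothetical robust-assertion wrapper around an element-wise square root."""
    xs = unpack(message)                     # input verified end to end
    out = pack([compute(x) for x in xs])     # protect results before checking them
    # Assumption: evaluate the assertion twice, re-deriving its operands from the
    # CRC-protected messages, so the check covers exactly the bytes released.
    for _ in range(2):
        inputs, outputs = unpack(message), unpack(out)
        if len(inputs) != len(outputs) or not all(
            sqrt_assertion(x, y, rel_tol) for x, y in zip(inputs, outputs)
        ):
            raise DetectedError("output assertion failed")
    return out
```

For example, `robust_sqrt(pack([2.0, 9.0]), compute=lambda x: math.sqrt(x) * (1 + 1e-12))` is accepted even though its results are slightly wrong, because the error lies inside the assertion's bound; this is the fail-bounded case. Scaling by `1.001` instead raises `DetectedError`, as does flipping any single bit of the input message.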

Any kind of assertion can be used, from simple likelihood tests to high-coverage assertions such as those used in the Algorithm-Based Fault Tolerance (ABFT) paradigm. We claim that this failure model is very useful for describing the behavior of many low-cost fault-tolerant systems with little hardware and software redundancy, such as embedded systems, where cost is a severe restriction yet full availability is expected.
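
For the high-coverage end of the spectrum, the sketch below applies the checksum idea of Huang and Abraham's Algorithm-Based Fault Tolerance to matrix multiplication. A is extended with a row of column sums, B with a column of row sums, and every row and column of the product must then agree with its checksum. Because floating-point rounding perturbs the sums, the comparison needs a tolerance; the default value here is an arbitrary assumption, and choosing it is what sets the bound on undetected errors. The `fault` hook is a hypothetical fault-injection aid, and plain lists keep the block self-contained.

```python
import math
from typing import Callable, Optional

Matrix = list[list[float]]


class ChecksumMismatch(RuntimeError):
    """Raised when the ABFT checksums of the product are inconsistent."""


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def abft_matmul(a: Matrix, b: Matrix, tol: float = 1e-9,
                fault: Optional[Callable[[Matrix], None]] = None) -> Matrix:
    """Multiply a (n x m) by b (m x p) and verify the result with full checksums."""
    n, p = len(a), len(b[0])
    a_c = a + [[sum(col) for col in zip(*a)]]   # column-checksum matrix, (n+1) x m
    b_r = [row + [sum(row)] for row in b]        # row-checksum matrix, m x (p+1)
    c_f = matmul(a_c, b_r)                       # full-checksum product, (n+1) x (p+1)
    if fault is not None:
        fault(c_f)                               # hypothetical fault-injection hook
    for i in range(n):                           # each data row must match its row checksum
        if not math.isclose(sum(c_f[i][:p]), c_f[i][p], rel_tol=tol, abs_tol=tol):
            raise ChecksumMismatch(f"row {i}")
    for j in range(p):                           # each data column must match its column checksum
        col = sum(c_f[i][j] for i in range(n))
        if not math.isclose(col, c_f[n][j], rel_tol=tol, abs_tol=tol):
            raise ChecksumMismatch(f"column {j}")
    return [row[:p] for row in c_f[:n]]
```

With `a = [[1.0, 2.0], [3.0, 4.0]]` and `b = [[5.0, 6.0], [7.0, 8.0]]` the call returns `[[19.0, 22.0], [43.0, 50.0]]`, while a hook that adds 0.5 to one element of the product raises `ChecksumMismatch`. On its own this check still leaves the inputs and the returned copy unprotected, which is the gap the abstract attributes to plain assertions.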

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Prata, P., Rela, M., Madeira, H. et al. Robust assertions and fail-bounded behavior. J Braz Comp Soc 10, 18–30 (2004). https://doi.org/10.1007/BF03192363

  • DOI: https://doi.org/10.1007/BF03192363

Keywords

  • Hardware faults
  • Error detection
  • ABFT
  • Robust assertions
  • Failure models
  • Fail-bounded