- Open Access
Robust assertions and fail-bounded behavior
Journal of the Brazilian Computer Society volume 10, pages 18–30 (2004)
In this paper the behavior of assertion-based error detection mechanisms is characterized under faults injected according to a quite general fault model. Assertions based on the knowledge of the application can be very effective at detecting corruption of critical data caused by hardware faults. The main drawbacks of that approach are identified as being the lack of protection of data outside the section covered by assertions, namely during input and output, and the possible incorrect execution of the assertions.
To handle those weak points, the Robust Assertions technique is proposed, and its effectiveness is shown by extensive fault injection experiments. With this technique, a system follows a new failure model, called Fail-Bounded, in which, with high probability, every result produced is either correct or, if wrong, within a certain bound of the correct value; the exact distance depends on the output assertions used.
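One way to read the Fail-Bounded property is as a probabilistic bound on the error of any delivered result. The formulation below is only an illustrative sketch: the symbols $y$ (correct result), $\hat{y}$ (result actually delivered), $\varepsilon$ (bound determined by the output assertions) and $\delta$ (small residual probability) are notation assumed here, not taken from the paper.

```latex
\Pr\bigl[\, \lVert \hat{y} - y \rVert \le \varepsilon \;\bigm|\; \text{a result } \hat{y} \text{ is delivered} \,\bigr] \;\ge\; 1 - \delta
```

The correct case corresponds to $\hat{y} = y$; a wrong result that escapes detection still satisfies $\lVert \hat{y} - y \rVert \le \varepsilon$, apart from a residual probability of at most $\delta$.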
Any kind of assertion can be considered, from simple likelihood tests to high-coverage assertions such as those used in the Algorithm Based Fault Tolerance paradigm. We claim that this failure model is very useful for describing the behavior of many low-cost fault-tolerant systems that have low hardware and software redundancy, such as embedded systems, where cost is a severe restriction yet full availability is expected.
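To make the idea of an output assertion with a bound concrete, here is a minimal sketch of a checksum assertion in the style of Algorithm Based Fault Tolerance for a matrix product. It is not the Robust Assertions technique itself; the function names `matmul` and `column_checksum_ok` and the tolerance `tol` are hypothetical and chosen only for illustration.

```python
def matmul(a, b):
    """Plain matrix product of two lists-of-lists (a is n x m, b is m x p)."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def column_checksum_ok(a, b, c, tol=1e-9):
    """Assertion on the result c, which is claimed to equal a * b.

    The column sums of c must match (column sums of a) * b within tol.
    tol is an assumed tolerance that absorbs rounding error; it also
    bounds the checksum deviation of any wrong result that passes.
    """
    inner, cols = len(b), len(b[0])
    col_sums_a = [sum(row[k] for row in a) for k in range(inner)]
    expected = [sum(col_sums_a[k] * b[k][j] for k in range(inner))
                for j in range(cols)]
    actual = [sum(row[j] for row in c) for j in range(cols)]
    return all(abs(e - x) <= tol for e, x in zip(expected, actual))
```

A result corrupted by a large error fails the check, while a corruption smaller than `tol` can pass. In this sketch the tolerance limits how far the checksums of an accepted result can deviate, which is the kind of dependence on the output assertions that the Fail-Bounded model describes.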
Cite this article
Prata, P., Rela, M., Madeira, H. et al. Robust assertions and fail-bounded behavior. J Braz Comp Soc 10, 18–30 (2004). https://doi.org/10.1007/BF03192363
- Hardware faults
- Error detection
- Robust assertions
- Failure models