Table 1 Proposals and main strategies

From: Running resilient MPI applications on a Dynamic Group of Recommended Processes

| Related work | Main strategies adopted |
| --- | --- |
| FT-MPI (fault-tolerant MPI) [21], Non-Stop and Fault-Resilient MPI (NR-MPI) [23], Run-Through Stabilization (RTS) [25], User Level Failure Mitigation (ULFM) [9], Consensus Protocol [26–28], Adaptive MPI (AMPI) [37] | Primitives for dealing with fault tolerance at the application level |
| Fenix [31, 32] | Checkpoint-restart at the application level |
| Dealing with process faults using ABFT [30] | Algorithm-Based Fault Tolerance (ABFT) |
| Ferreira et al. [34], P2P-MPI [35], Fiala et al. [36], Silent error [36] | State-machine replication |
| Gioiosa et al. [5], Aguilar et al. [40], TAUoverSupermon [42] | Monitoring system for performance |
| DGRP | Monitoring system that recommends a group of processes to run an application |