Table 1 Proposals and main strategies

From: Running resilient MPI applications on a Dynamic Group of Recommended Processes

| Related work | Main strategies adopted |
| --- | --- |
| FT-MPI (fault-tolerant MPI) [21], Non-Stop and Fault-Resilient MPI (NR-MPI) [23], Run-Through Stabilization (RTS) [25], User Level Failure Mitigation (ULFM) [9], Consensus Protocol [26–28], Adaptive MPI (AMPI) [37] | Primitives for dealing with fault tolerance at the application level |
| Fenix [31, 32] | Checkpoint-restart at the application level |
| Dealing with process faults using ABFT [30] | Algorithm-Based Fault Tolerance (ABFT) |
| Ferreira et al. [34], P2P-MPI [35], Fiala et al. [36], Silent error [36] | State-machine replication |
| Gioiosa et al. [5], Aguilar et al. [40], TAUoverSupermon [42] | Monitoring system for performance |
| DGRP | Monitoring system that recommends a group of processes to run an application |