Open Access

Reliable management of checkpointing and application data in opportunistic grids

Journal of the Brazilian Computer Society 2010, 16:16

https://doi.org/10.1007/s13173-010-0016-0

Received: 22 January 2010

Accepted: 17 June 2010

Published: 28 July 2010

Abstract

Opportunistic computational grids use idle processor cycles from shared machines to execute long-running parallel applications. Besides computational power, these applications may also consume and generate large amounts of data, requiring an efficient data storage and management infrastructure. In this article, we present an integrated middleware infrastructure that uses not only idle processor cycles but also the unused disk space of shared machines. The middleware stores application data on the shared machines in a distributed, redundant, and fault-tolerant way. A checkpointing-based mechanism monitors the execution of parallel applications, saves periodic checkpoints on the shared machines, and, in case of node failures, supports application migration across heterogeneous grid nodes. We evaluate the feasibility of our middleware using experiments and simulations. Our evaluation shows that the proposed middleware substantially improves the reliability of grid data management while imposing a low performance overhead.

Keywords

Grid computing, Distributed data storage, Opportunistic grid, Grid middleware

Notes