Item Details

Process State Capture and Recovery in High-Performance Heterogeneous Distributed Computing Systems

Ferrari, Adam John
Format
Thesis/Dissertation; Online
Author
Ferrari, Adam John
Advisor
Grimshaw, Andrew
Abstract
Process Introspection is a fundamentally new solution to the process state capture and recovery problem suitable for use in high-performance heterogeneous distributed systems. A process state capture and recovery mechanism for such an environment has the primary requirement that it must be platform-independent: process checkpoints produced on a computer system of one architecture or operating system platform must be recoverable on a computer system of a different architecture or operating system platform. The central feature of the Process Introspection approach is automatic transformation of program code to incorporate state capture and recovery functionality. This program modification is performed at a platform-independent intermediate level of code representation, and preserves the original program semantics. The attractive properties of this approach include portability, ease of use, and flexibility with respect to basic performance trade-offs and application-specific requirements. Our solution is novel in its true platform and run-time system independence—no system support or non-portable code is required by our core mechanisms. Experimental results obtained using a prototype implementation of the Process Introspection system indicate this mechanism can be applied to computationally demanding scientific applications automatically, resulting in very low run-time overhead (typically below 10%) and efficient state capture and recovery service.
Published
University of Virginia, Department of Computer Science, PhD, 1998
Published Date
1998-01-30
Degree
PhD
Collection
Libra ETD Repository
In Copyright

Availability

Read Online