The Interaction of Failure and Performance in a Migratory File Service
Date
2003
Author
Bent, John
Thain, Douglas
Arpaci-Dusseau, Andrea
Arpaci-Dusseau, Remzi
Livny, Miron
Publisher
University of Wisconsin-Madison Department of Computer Sciences
Abstract
We present the design, implementation, and evaluation of a Migratory File Service (MFS), a system designed to exploit semantic knowledge of workloads and user expectations to improve performance and handle failures effectively in wide-area batch scheduling systems. We discuss Hawk, a prototype MFS system which has two novel components: migratory proxies, which cache data at remote clusters, and a workflow manager, which manages the workflow of the system. Hawk integrates aggressive caching and I/O filtering to reduce wide-area traffic, proactively replicates data to avoid regeneration due to failure, and performs fine-grained rollback and recovery to minimize the effort required to recover from failure. Through a case study of data-intensive applications, we demonstrate the benefits of Hawk over traditional approaches, delivering a two to three orders of magnitude increase in performance for jobs that are deployed across a wide-area batch scheduling environment.
Permanent Link
http://digital.library.wisc.edu/1793/60348
Citation
TR1475