• Login
    View Item 
    •   MINDS@UW Home
    • MINDS@UW Madison
    • College of Letters and Science, University of Wisconsin–Madison
    • Department of Computer Sciences, UW-Madison
    • CS Technical Reports
    • View Item
    •   MINDS@UW Home
    • MINDS@UW Madison
    • College of Letters and Science, University of Wisconsin–Madison
    • Department of Computer Sciences, UW-Madison
    • CS Technical Reports
    • View Item
    JavaScript is disabled for your browser. Some features of this site may not work without it.

    A Scalable Failure Recovery Model for Tree-based Overlay Networks

    Thumbnail
    File(s)
    TR1626.pdf (220.0Kb)
    Date
    2008
    Author
    Arnold, Dorian
    Miller, Barton P.
    Publisher
    University of Wisconsin-Madison Department of Computer Sciences
    Metadata
    Show full item record
    Abstract
    We present a scalable failure recovery model for data aggregations in large scale tree-based overlay networks (TBONs). A TBON is a network of hierarchically organized processes that exploits the logarithmic scaling properties of trees to provide scalable data multicast, gather, and in-network aggregation. TBONs are commonly used in debugging and performance tools, system monitoring, information management systems, stream processing, and mobile ad hoc networks. Our recovery model leverages inherent information redundancies in TBON computations. This redundant information is gathered from non-failed processes to compensate for computation and communication state lost due to failures. This state compensation strategy is attractive because: (1) it avoids the time and resource overheads of previous reliability approaches, which rely on explicit replication; (2) recovery is rapid and only involves a small subset of the network; and (3) it applies to many useful, complex computations. In this paper, we formalize the TBON model and its fundamental properties to prove that our state compensation model properly preserves computational semantics across TBON process failures. These properties lead to an efficient implementation of state compensation, which we use to empirically validate and evaluate recovery performance. We show that state compensation can recover from failures in extremely large TBONs in milliseconds rendering practically no application service interruption.
    Permanent Link
    http://digital.library.wisc.edu/1793/60616
    Type
    Technical Report
    Citation
    TR1626
    Part of
    • CS Technical Reports

    Contact Us | Send Feedback
     

     

    Browse

    All of MINDS@UWCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

    My Account

    Login

    Contact Us | Send Feedback