    Adapting to Intermittent Faults in Multicore Systems

    File(s)
    TR1605.pdf (667.9Kb)
    Date
    2007
    Author
    Wells, Philip M.
    Chakraborty, Koushik
    Sohi, Gurindar S.
    Publisher
    University of Wisconsin-Madison Department of Computer Sciences
    Abstract
    Future multicore processors will become more susceptible to a variety of hardware failures. In particular, intermittent faults, caused in part by manufacturing process variation or in-progress wear-out, can cause bursts of frequent faults that last from several cycles to several seconds or more. Cost-effective reliability to tolerate intermittent faults will likely require, or be greatly simplified by, the ability to temporarily suspend execution on a core during periods of frequent intermittent faults. We investigate three existing techniques for adapting to the dynamically changing resource availability caused by such core suspension, and demonstrate their different system-level implications. We show that system software reconfiguration has very high overhead for short intermittent faults, that temporarily pausing the execution of a faulty core can lead to cascading livelock, and that using spare cores has high fault-free cost. To remedy these and other drawbacks of current techniques, we propose using a thin hardware/firmware layer to manage an overcommitted system -- one where the OS is configured to use more virtual processors than the number of currently available physical cores. We show that this proposed technique can gracefully degrade performance during intermittent faults of various durations with low overhead, without involving system software, and without requiring spare cores.
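The overcommitted configuration described in the abstract can be illustrated with a toy model. The sketch below is not the report's hardware/firmware mechanism; it is a minimal Python simulation, assuming a simple round-robin policy and a hypothetical fault model `is_suspended(core, t)`. All function and parameter names are invented for illustration. It compares rotating virtual processors over whichever physical cores are usable against a baseline where each virtual processor is pinned to one core and stalls while that core is paused. The baseline only shows the stall itself, not the cascading livelock the abstract mentions.

```python
from collections import deque


def simulate_overcommit(num_vcpus, num_cores, is_suspended, num_slices):
    """Rotate virtual CPUs over the physical cores usable in each time slice.

    is_suspended(core, t) is a hypothetical fault model: it returns True when
    the given core is suspended during slice t. Returns the number of slices
    each virtual CPU received.
    """
    progress = [0] * num_vcpus
    run_queue = deque(range(num_vcpus))
    for t in range(num_slices):
        usable = [c for c in range(num_cores) if not is_suspended(c, t)]
        # Each usable core runs one virtual CPU taken from the head of the queue.
        count = min(len(usable), len(run_queue))
        scheduled = [run_queue.popleft() for _ in range(count)]
        for vcpu in scheduled:
            progress[vcpu] += 1
        # Virtual CPUs that just ran go to the back, so waiting ones run next.
        run_queue.extend(scheduled)
    return progress


def simulate_pause(num_vcpus, is_suspended, num_slices):
    """Baseline: virtual CPU i is pinned to core i and stalls while it is paused."""
    return [
        sum(1 for t in range(num_slices) if not is_suspended(i, t))
        for i in range(num_vcpus)
    ]


def core3_faulty(core, t):
    """Assumed fault pattern: core 3 is suspended for the whole run."""
    return core == 3


if __name__ == "__main__":
    print(simulate_overcommit(4, 4, core3_faulty, 8))  # [6, 6, 6, 6]
    print(simulate_pause(4, core3_faulty, 8))          # [8, 8, 8, 0]
```

In this toy run, the overcommitted configuration spreads the lost capacity evenly across all four virtual processors, while the pinned baseline leaves one virtual processor with no progress at all for the duration of the fault.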
    Permanent Link
    http://digital.library.wisc.edu/1793/60576
    Type
    Technical Report
    Citation
    TR1605
    Part of
    • CS Technical Reports
