
    Serializing Instructions in System-Intensive Workloads: Amdahl's Law Strikes Again

    File(s)
    TR1606.pdf (283.1Kb)
    Date
    2007
    Author
    Wells, Philip
    Sohi, Gurindar
    Publisher
    University of Wisconsin-Madison Department of Computer Sciences
    Abstract
    To maintain a reasonable level of complexity, processor implementations contain Serializing Instructions (SIs): instructions, such as those that write control registers, that cannot be executed out-of-order (OoO). Maintaining sequential semantics may force SIs to serialize the pipeline and execute as the only instruction in the window. We examine the frequency of SIs in three ISAs, SPARC V9, X86-64, and PowerPC, for several system-intensive workloads. Across ISAs, we observe 2–8 SIs per thousand instructions for most workloads. As explained by Amdahl's Law, such frequent SIs, which create serial regions within the instruction-level parallel execution of a single thread, can have a significant impact on performance. For the SPARC ISA (after removing TLB and register window effects), we observe a 4–17% performance difference between a modest out-of-order processor and a hypothetical processor which idealizes serializing instructions. We examine the consumption of values produced by several SIs, and observe that most values are consumed, but that the values are Effectively Useless (EU), i.e., they do not actually change the execution of the consuming instructions. To improve the performance of such SIs, we propose EU prediction, which can allow younger instructions to proceed, possibly reading a stale value, and yet still correctly execute. This simple technique improves the performance of five of our seven workloads by 8–12%.
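The Amdahl's Law argument in the abstract can be illustrated with a rough back-of-the-envelope sketch. This is not the report's methodology or simulator: the function names, the fixed per-SI stall penalty, and the baseline IPC below are all hypothetical values chosen only to show how a few serializing instructions per thousand can translate into a noticeable slowdown.

```python
def cycles_per_kilo_instr(sis_per_kilo: float,
                          stall_cycles_per_si: float,
                          baseline_ipc: float) -> float:
    """Cycles needed for 1000 instructions under a simple additive model.

    Assumption (not from the report): every serializing instruction adds a
    fixed number of stall cycles on top of the ideal out-of-order execution.
    """
    ideal_cycles = 1000.0 / baseline_ipc
    serial_cycles = sis_per_kilo * stall_cycles_per_si
    return ideal_cycles + serial_cycles


def idealized_speedup(sis_per_kilo: float,
                      stall_cycles_per_si: float,
                      baseline_ipc: float) -> float:
    """Speedup obtained if serializing instructions cost nothing extra."""
    with_sis = cycles_per_kilo_instr(sis_per_kilo, stall_cycles_per_si, baseline_ipc)
    without_sis = cycles_per_kilo_instr(0.0, stall_cycles_per_si, baseline_ipc)
    return with_sis / without_sis


if __name__ == "__main__":
    # Hypothetical machine: IPC of 2.0 and a 25-cycle stall per SI.
    for sis in (2, 4, 8):
        gain = idealized_speedup(sis, stall_cycles_per_si=25.0, baseline_ipc=2.0)
        print(f"{sis} SIs per 1000 instructions: idealized speedup {gain:.2f}x")
```

Under these assumed parameters, the serial cycles grow linearly with SI frequency while the parallel portion stays fixed, which is the Amdahl's Law effect the abstract describes. Real stall costs depend on the microarchitecture and window size, so the printed figures are illustrative only.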
    Permanent Link
    http://digital.library.wisc.edu/1793/60578
    Type
    Technical Report
    Citation
    TR1606
    Part of
    • CS Technical Reports
