    •   MINDS@UW Home
    • MINDS@UW Madison
    • College of Letters and Science, University of Wisconsin–Madison
    • Department of Computer Sciences, UW-Madison
    • CS Technical Reports

    Reducing GPU Address Translation Overhead with Virtual Caching

    File(s)
    TR1842 (1.304Mb)
    Date
    2016-12-05
    Author
    Yoon, Hongil
    Lowe-Power, Jason
    Sohi, Gurindar S.
    Abstract
    Heterogeneous computing on tightly-integrated CPU-GPU systems is ubiquitous, and to increase programmability, many of these systems support virtual address accesses from GPU hardware. However, there is no free lunch. Supporting virtual memory entails address translation on every memory access, which greatly impacts performance (about 77% performance degradation on average). To mitigate this overhead, we propose a software-transparent, practical GPU virtual cache hierarchy. We show that a virtual cache hierarchy is an effective GPU address translation bandwidth filter. We make several empirical observations advocating for GPU virtual caches: (1) Mirroring a CPU-style memory management unit in GPUs is not effective, because GPU workloads show very high Translation Lookaside Buffer (TLB) miss ratios and high miss bandwidth. (2) Many requests that miss in the TLBs find corresponding valid data in the GPU cache hierarchy. (3) The GPU’s accelerator nature simplifies implementing a deep virtual cache hierarchy (i.e., fewer virtual address synonyms and homonyms). We evaluate both L1-only virtual cache designs and an entire virtual cache hierarchy (private L1s and a shared L2 cache). We find that virtual caching on GPUs considerably improves performance. Our experimental evaluation shows that the proposed entire GPU virtual cache design significantly reduces the overhead of virtual address translation, providing an average speedup of 1.77x over a baseline physically cached system, while L1-only virtual cache designs show more modest benefits (1.35x speedup). Using the whole GPU virtual cache hierarchy thus yields additional performance gains.
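
    The core idea of the abstract — that a virtually-tagged cache acts as a translation bandwidth filter — can be illustrated with a toy model. This is a hedged sketch, not the paper's simulator: the cache geometry, line size, and access trace below are illustrative assumptions. In a physically cached system every access needs a TLB lookup before the cache can be probed; in a virtually-tagged (VIVT) cache, only misses require translation, so cache hits never touch the TLB.

    ```python
    LINE = 64  # cache line size in bytes (illustrative assumption)

    class VirtualCache:
        """Direct-mapped, virtually indexed, virtually tagged (VIVT) cache.
        Lookups are keyed by virtual line address, so no translation is
        needed on a hit."""
        def __init__(self, num_lines=256):
            self.num_lines = num_lines
            self.tags = [None] * num_lines  # stored virtual line tags

        def access(self, vaddr):
            """Return True on hit; on miss, fill the line and return False."""
            line = vaddr // LINE
            idx = line % self.num_lines
            if self.tags[idx] == line:
                return True
            self.tags[idx] = line  # fill on miss
            return False

    def count_tlb_accesses(trace, cache):
        """Physical caching translates every access up front; virtual
        caching translates only on a cache miss."""
        physical = len(trace)  # one TLB lookup per access
        virtual = sum(0 if cache.access(v) else 1 for v in trace)
        return physical, virtual

    # A trace with locality: 100 passes over a small two-page working set.
    trace = [base + off
             for _ in range(100)
             for base in (0x1000, 0x2000)
             for off in (0, 8, 16)]
    phys, virt = count_tlb_accesses(trace, VirtualCache())
    print(phys, virt)  # 600 accesses, but only 2 need translation
    ```

    Under this toy trace, the physical design performs 600 TLB lookups while the virtual cache performs only 2 (one per compulsory miss), which is the filtering effect the paper attributes to GPU virtual caches; real GPU workloads are of course far less regular than this.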
    Subject
    Virtual Caching
    TLBs
    Virtually indexed virtually tagged caches
    Synonyms
    GPU
    Address Translation
    GPU Virtual Cache Hierarchy
    Permanent Link
    http://digital.library.wisc.edu/1793/75577
    Type
    Technical Report
    Citation
    TR1842
    Part of
    • CS Technical Reports

    Related items

    Showing items related by title, author, creator and subject.

    • Pedagogical Approaches in the Virtual Beginning Orchestra Classroom: Best Digital Resources for the Beginning Virtual Orchestra Classroom 

      Scheidegger, Emily Lynn (College of Fine Arts and Communication, University of Wisconsin - Stevens Point, 2022-08)
      This study aims to answer the research question: what are the best digital tools for teaching beginning string players in a fully virtual format? During the COVID-19 pandemic, educators across the world found themselves ...
    • Pedagogical Approaches in the Virtual Beginning Orchestra Classroom: Best Digital Resources for the Beginning Virtual Orchestra Classroom 

      Scheidegger, Emily (Active & Integrative Music Education (AIME) journal, 2022)
    • A comparison of the physiological and psychological responses to exercise on a virtual reality recumbent cycle versus a non-virtual reality recumbent cycle 

      Maldari, Monica Marisa (1997-08)

    Contact Us | Send Feedback