dc.contributor.author: Yoon, Hongil
dc.contributor.author: Lowe-Power, Jason
dc.contributor.author: Sohi, Gurindar S.
dc.date.accessioned: 2016-12-05T15:39:34Z
dc.date.available: 2016-12-05T15:39:34Z
dc.date.issued: 2016-12-05T15:39:34Z
dc.identifier.citation: TR1842
dc.identifier.uri: http://digital.library.wisc.edu/1793/75577
dc.description.abstract: Heterogeneous computing on tightly-integrated CPU-GPU systems is ubiquitous, and to increase programmability, many of these systems support virtual address accesses from GPU hardware. However, there is no free lunch. Supporting virtual memory entails an address translation on every memory access, which greatly impacts performance (about 77% performance degradation on average). To mitigate this overhead, we propose a software-transparent, practical GPU virtual cache hierarchy. We show that a virtual cache hierarchy is an effective GPU address translation bandwidth filter. We make several empirical observations advocating for GPU virtual caches: (1) Mirroring a CPU-style memory management unit in GPUs is not effective, because GPU workloads show a very high Translation Lookaside Buffer (TLB) miss ratio and high miss bandwidth. (2) Many requests that miss in the TLBs find corresponding valid data in the GPU cache hierarchy. (3) The GPU’s accelerator nature simplifies implementing a deep virtual cache hierarchy (i.e., there are fewer virtual address synonyms and homonyms). We evaluate both L1-only virtual cache designs and an entire virtual cache hierarchy (private L1s and a shared L2 cache). We find that virtual caching on GPUs considerably improves performance. Our experimental evaluation shows that the proposed entire GPU virtual cache design significantly reduces the overheads of virtual address translation, providing an average speedup of 1.77x over a baseline physically cached system. L1-only virtual cache designs show modest performance benefits (1.35x speedup). By using a whole GPU virtual cache hierarchy, we can obtain additional performance benefits. [en]
dc.subject: Virtual Caching [en]
dc.subject: TLBs [en]
dc.subject: Virtually indexed virtually tagged caches [en]
dc.subject: Synonyms [en]
dc.subject: GPU [en]
dc.subject: Address Translation [en]
dc.subject: GPU Virtual Cache Hierarchy [en]
dc.title: Reducing GPU Address Translation Overhead with Virtual Caching [en]
dc.type: Technical Report [en]


This item appears in the following Collection(s)

  • CS Technical Reports
    Technical Reports Archive for the Department of Computer Sciences at the University of Wisconsin-Madison
