Reducing GPU Address Translation Overhead with Virtual Caching
dc.contributor.author | Yoon, Hongil | |
dc.contributor.author | Lowe-Power, Jason | |
dc.contributor.author | Sohi, Gurindar S. | |
dc.date.accessioned | 2016-12-07T15:39:34Z | |
dc.date.available | 2016-12-05T15:39:34Z | |
dc.date.issued | 2016-12-05T15:39:34Z | |
dc.identifier.citation | TR1842 | |
dc.identifier.uri | http://digital.library.wisc.edu/1793/75577 | |
dc.description.abstract | Heterogeneous computing on tightly-integrated CPU-GPU systems is ubiquitous, and to increase programmability, many of these systems support virtual address accesses from GPU hardware. However, there is no free lunch. Supporting virtual memory entails address translation on every memory access, which greatly impacts performance (about 77% performance degradation on average). To mitigate this overhead, we propose a software-transparent, practical GPU virtual cache hierarchy. We show that a virtual cache hierarchy is an effective GPU address translation bandwidth filter. We make several empirical observations advocating for GPU virtual caches: (1) Mirroring a CPU-style memory management unit in GPUs is not effective, because GPU workloads show very high Translation Lookaside Buffer (TLB) miss ratios and high miss bandwidth. (2) Many requests that miss in the TLBs find corresponding valid data in the GPU cache hierarchy. (3) The GPU’s accelerator nature simplifies implementing a deep virtual cache hierarchy (i.e., fewer virtual address synonyms and homonyms). We evaluate both L1-only virtual cache designs and an entire virtual cache hierarchy (private L1 caches and a shared L2 cache). We find that virtual caching on GPUs considerably improves performance. Our experimental evaluation shows that the proposed entire GPU virtual cache hierarchy significantly reduces the overheads of virtual address translation, providing an average speedup of 1.77x over a baseline physically cached system. L1-only virtual cache designs show more modest performance benefits (1.35x speedup); extending virtual caching to the whole GPU cache hierarchy yields additional benefits. | en
dc.subject | Virtual Caching | en |
dc.subject | TLBs | en |
dc.subject | Virtually indexed virtually tagged caches | en |
dc.subject | Synonyms | en |
dc.subject | GPU | en |
dc.subject | Address Translation | en |
dc.subject | GPU Virtual Cache Hierarchy | en |
dc.title | Reducing GPU Address Translation Overhead with Virtual Caching | en |
dc.type | Technical Report | en |
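The abstract's central mechanism is using the virtual cache hierarchy as an address translation bandwidth filter: with physically addressed caches, every GPU memory access must be translated before the cache lookup, whereas with virtually indexed, virtually tagged caches, translation is deferred until a request misses out of the virtual caches. The short Python sketch below is not taken from the report; its cache geometry and synthetic trace are hypothetical, and it only counts translations under the two organizations to illustrate the filtering effect.

    # Illustrative sketch only (not from the report): count address translations
    # for the same access stream under physically vs. virtually addressed caches.
    # The cache geometry and the synthetic trace below are hypothetical.

    LINE = 128  # cache line size in bytes (assumed)

    def count_translations(addrs, capacity_lines, translate_every_access):
        resident = set()   # virtual line addresses currently cached
        fifo = []          # simple FIFO replacement order
        translations = 0
        for va in addrs:
            line = va // LINE
            if translate_every_access:
                translations += 1           # physical caches: TLB lookup on every access
            if line not in resident:
                if not translate_every_access:
                    translations += 1       # virtual caches: translate only on a miss
                resident.add(line)
                fifo.append(line)
                if len(fifo) > capacity_lines:
                    resident.discard(fifo.pop(0))
        return translations

    # Synthetic, reuse-heavy trace standing in for a GPU kernel's loads.
    trace = [(i * 64) % (256 * 1024) for i in range(200_000)]

    print("physical caches:", count_translations(trace, 4096, True))   # one per access
    print("virtual caches :", count_translations(trace, 4096, False))  # one per cache miss

On this reuse-heavy synthetic trace, the virtual-cache configuration performs roughly one translation per cache miss rather than one per access, which is the filtering behavior the report quantifies with its reported 1.77x average speedup.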
This item appears in the following Collection(s)
CS Technical Reports
Technical Reports Archive for the Department of Computer Sciences at the University of Wisconsin-Madison