Programming Heterogeneous Computers and Improving Inter-Node Communication Across Xeon Phis

Date
2016-05-20
Author
Feilbach, Chris
Sperling, Adam
Sifakis, Eftychios
Hill, Mark D.
Abstract
Scientific computing workloads are well suited to parallel accelerators such as GPGPUs and the Intel Xeon Phi. These accelerators can outperform traditional CPUs thanks to their parallel architectures and higher memory bandwidth. However, their relatively small memory capacity limits the maximum workload size. To solve this problem, data can be split across multiple accelerators, exploiting both their combined memory capacity and their combined compute capability. Combining multiple accelerators into a heterogeneous system introduces a new bottleneck: communication bandwidth between accelerators over the PCIe interconnect is much lower than each accelerator's internal memory bandwidth. This project examines this inter-node bandwidth bottleneck on the Intel Xeon Phi in the context of scientific applications. We show the limitations of traditional MPI programming paradigms and leverage Intel's Xeon Phi-specific SCIF communication API to achieve higher inter-node bandwidth. Small messages still incur communication overhead penalties. Messages larger than 512 KB, however, saturate the PCIe bus and reach close to 90% of the theoretical maximum bandwidth. This project also attempts to address the complexity of programming systems of multiple accelerators. We introduce a software wrapper interface over SCIF that coalesces groups of small messages into larger ones. This interface eases programming and, through coalescing, delivers greater interconnect bandwidth.
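The idea of coalescing small messages into larger transfers can be illustrated with a minimal C sketch. It is not the report's implementation, and it rests on several assumptions:

- The names `coalescer`, `coalescer_init`, `coalescer_push`, `coalescer_flush` and `send_fn` are hypothetical.
- The transport is an abstract callback. A real wrapper might implement it with a SCIF send call, but no SCIF function is invoked here.
- Each message is framed with a 4-byte length header so that the receiver can split a batch back into individual messages.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical transport callback (an assumption, not a SCIF signature).
 * A real wrapper might forward the batch over a SCIF endpoint; it is left
 * abstract so this sketch compiles and runs without libscif.
 * Returns 0 on success, nonzero on failure. */
typedef int (*send_fn)(const void *buf, size_t len, void *ctx);

typedef struct {
    unsigned char *buf; /* caller-owned staging buffer */
    size_t cap;         /* flush threshold in bytes, e.g. 512 KB */
    size_t used;        /* bytes currently staged */
    send_fn send;       /* transport used to ship a full batch */
    void *ctx;          /* opaque state passed through to send */
} coalescer;

void coalescer_init(coalescer *c, unsigned char *buf, size_t cap,
                    send_fn send, void *ctx) {
    c->buf = buf;
    c->cap = cap;
    c->used = 0;
    c->send = send;
    c->ctx = ctx;
}

/* Ship everything staged so far as one large transfer. */
int coalescer_flush(coalescer *c) {
    if (c->used == 0)
        return 0;
    int rc = c->send(c->buf, c->used, c->ctx);
    if (rc == 0)
        c->used = 0; /* on failure, keep the data staged so the caller can retry */
    return rc;
}

/* Stage one small message behind a 4-byte length header (host byte order).
 * If the new frame would overflow the buffer, the current batch is flushed
 * first. Returns -1 if the framed message can never fit; such large
 * messages should bypass the coalescer and be sent directly. */
int coalescer_push(coalescer *c, const void *msg, uint32_t len) {
    size_t frame = sizeof(uint32_t) + (size_t)len;
    if (frame > c->cap)
        return -1;
    if (c->used + frame > c->cap) {
        int rc = coalescer_flush(c);
        if (rc != 0)
            return rc;
    }
    memcpy(c->buf + c->used, &len, sizeof len);
    memcpy(c->buf + c->used + sizeof len, msg, len);
    c->used += frame;
    return 0;
}
```

Staging amortizes the fixed cost of each transfer. Consider a simple cost model in which one transfer of S bytes takes t0 + S/B, where t0 is a fixed startup cost and B is the link bandwidth. Under that model, n separate small messages pay t0 n times, while one coalesced batch pays it only once. Setting `cap` near 512 KB would target the message sizes at which the abstract reports near-peak PCIe bandwidth. The caller must still call `coalescer_flush` at synchronization points so that partially filled batches are delivered.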
Subject
Bandwidth
Accelerators
Scientific Workloads
PCIe
SCIF
Xeon Phi
Permanent Link
http://digital.library.wisc.edu/1793/74898
Citation
TR1834