On-Chip Storage Optimization for GPUs

Monday, 20th January 2014, 10:00 am (PDCC Meeting Room)
Speaker: Dr. Yun (Eric) Liang (Peking University, China)
Title: On-Chip Storage Optimization for GPUs

Graphics Processing Units (GPUs) have become ubiquitous for general-purpose applications due to their tremendous computing power. This talk focuses on on-chip storage optimization for caches and registers on GPUs. Early GPUs employed only scratchpad memory as on-chip memory. Though scratchpad memory benefits many applications, it is not ideal for general-purpose applications with irregular memory accesses. Hence, GPU vendors have introduced caches alongside scratchpad memory in recent generations of GPUs. The caches on GPUs are highly configurable: the programmer or the compiler can explicitly control whether each global load instruction accesses the cache or bypasses it. This configurability opens up opportunities for optimizing cache performance. In this talk, I will present an efficient compiler framework for cache bypassing on GPUs. The compiler framework can selectively filter cache accesses through bypassing and improve the overall performance. Then, I will present a joint optimization of thread structure and register allocation for GPUs. GPUs have large on-chip register files. Register allocation per thread affects both thread-level parallelism (TLP) and instruction-level parallelism (ILP). The proposed optimization improves the overall performance by balancing TLP and ILP.
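
As a rough illustration of why register allocation per thread affects TLP (this is not the optimization presented in the talk), the sketch below estimates how many warps can stay resident on one streaming multiprocessor when registers are the only limiting resource. The function name resident_warps and the hardware limits used as defaults (65536 registers and 64 warps per multiprocessor, 32 threads per warp) are assumptions chosen for illustration; real devices differ and also apply allocation granularity and shared-memory limits that are ignored here.

```python
def resident_warps(regs_per_thread, threads_per_block,
                   regs_per_sm=65536, max_warps_per_sm=64, warp_size=32):
    """Estimate resident warps per multiprocessor, limited by registers only.

    All hardware limits are illustrative assumptions, not values from the talk.
    """
    # Warps needed by one thread block (ceiling division).
    warps_per_block = -(-threads_per_block // warp_size)
    # Registers consumed by one block, allocated for whole warps.
    regs_per_block = regs_per_thread * warp_size * warps_per_block
    # Whole blocks that fit under each limit.
    blocks_by_regs = regs_per_sm // regs_per_block
    blocks_by_warps = max_warps_per_sm // warps_per_block
    return min(blocks_by_regs, blocks_by_warps) * warps_per_block


if __name__ == "__main__":
    for regs in (16, 32, 64, 128):
        warps = resident_warps(regs, threads_per_block=256)
        print(f"{regs:3d} registers/thread -> {warps:2d} resident warps")
```

Under these assumed limits, with 256 threads per block, raising the allocation from 32 to 64 registers per thread halves the resident warps from 64 to 32. More registers per thread can reduce spilling and expose more ILP within each thread, but fewer warps remain available to hide memory latency, which is the tension a TLP/ILP balancing scheme has to resolve.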


Yun (Eric) Liang is currently an assistant professor in the School of EECS at Peking University, China. Before joining Peking University, he was a Research Scientist at the Advanced Digital Sciences Center, University of Illinois at Urbana-Champaign. He received the B.S. degree from Tongji University, Shanghai, and the Ph.D. degree in computer science from the National University of Singapore. He has published about 30 research papers in top conferences and journals on embedded systems, computer architecture, real-time systems, and hardware. His work has received the Best Paper Award of FCCM 2011 and Best Paper Award nominations from DAC 2012, FPT 2011, and CODES+ISSS 2008.