High Performance Databases on Heterogeneous Processors

Query co-processing on graphics processors (GPUs) has become an effective means of improving the performance of main memory databases. However, the relatively low bandwidth and high latency of the PCI-e bus are often the bottleneck for co-processing. Recently, coupled CPU-GPU architectures, such as AMD APUs that integrate the CPU and the GPU into a single chip, have received a lot of attention. These architectures open up new opportunities for optimizing query co-processing. In this project, we have systematically evaluated those opportunities from two aspects:


  • Fine-grained Co-Processing

    In the first study, we propose a fine-grained co-processing design based on the shared main memory. As a case study, we experimentally revisit hash joins, one of the most important join algorithms for main memory databases, on a coupled CPU-GPU architecture. In particular, we study the fine-grained co-processing mechanisms illustrated in Figure 1 on hash joins with and without partitioning.

    Fine-grained co-processing opens up an interesting design space. We extend existing cost models to automatically guide decisions in this design space. Our experimental results on a recent AMD APU show that (1) the coupled architecture enables fine-grained co-processing and cache reuse, which are inefficient on discrete CPU-GPU architectures; (2) the cost model can automatically guide the design choices and tuning knobs in the design space; (3) fine-grained co-processing achieves up to 53%, 35% and 28% performance improvement over CPU-only, GPU-only and conventional CPU-GPU co-processing, respectively. We believe that the insights and implications from this study are initial yet important for further research on query co-processing on coupled CPU-GPU architectures.


    Figure 1: Fine-grained co-processing algorithms on a series of steps

  • In-cache Query Co-Processing

    Although fine-grained co-processing of database operators can improve the utilization of both processors, there are still some performance pitfalls when deploying query processing workloads on such coupled architectures. On the one hand, the GPU in the coupled architecture is less powerful than those in discrete systems. On the other hand, the tighter coupling of hardware components between the two processors also reduces the bandwidth available to the GPU, resulting in more memory stalls. As our experiments show, such memory stalls can severely hurt the performance of both processors, and the GPU can suffer more from them than the CPU does.

    Based on these observations, we further propose a novel in-cache query co-processing paradigm for main memory databases on coupled CPU-GPU architectures. Figure 2 depicts an overview of our system design. Specifically, we abstract three common functional modules in the in-cache query co-processing paradigm: prefetching (P), decompression (D, optional), and the actual query execution (E). Each compute unit (CU) of either processor can work on any of the P/D/E modules. The workload scheduler assigns these functional modules to available CUs. The APU-aware cost model captures information about both the hardware and the input data to produce an optimal workload scheduling configuration. The experimental results on TPC-H have demonstrated a significant performance improvement over the state-of-the-art GPU co-processing paradigm.


    Figure 2: System design of in-cache query co-processing
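
The scheduling idea behind the in-cache paradigm can be sketched in a few lines of Python. This is only an illustration, not the project's scheduler or its APU-aware cost model: it assumes each CU contributes a fixed, additive processing rate to whichever module it runs, treats decompression as mandatory, and simply enumerates every division of CPU and GPU compute units among P, D and E to find the one whose slowest module is fastest. The function names and all rate values are hypothetical.

```python
from itertools import product

STAGES = ("P", "D", "E")  # prefetching, decompression, execution


def splits(n_units):
    """Yield every way to divide n_units compute units among the three stages."""
    for p in range(n_units + 1):
        for d in range(n_units - p + 1):
            yield (p, d, n_units - p - d)


def best_schedule(n_cpu, n_gpu, cpu_rate, gpu_rate):
    """Return (throughput, cpu_split, gpu_split) that maximizes the slowest stage.

    cpu_rate / gpu_rate map a stage name to the (assumed) rate that a single
    CU of that processor achieves on the stage.
    """
    best = (0.0, None, None)
    for cpu_split, gpu_split in product(splits(n_cpu), splits(n_gpu)):
        stage_rates = [
            c * cpu_rate[s] + g * gpu_rate[s]
            for s, c, g in zip(STAGES, cpu_split, gpu_split)
        ]
        throughput = min(stage_rates)  # the pipeline runs at its slowest stage
        if throughput > best[0]:
            best = (throughput, cpu_split, gpu_split)
    return best


# Hypothetical per-CU rates (arbitrary units), not measured values.
cpu_rate = {"P": 4.0, "D": 2.0, "E": 1.0}
gpu_rate = {"P": 1.0, "D": 1.5, "E": 2.0}

throughput, cpu_split, gpu_split = best_schedule(4, 8, cpu_rate, gpu_rate)
print("CPU CUs on P/D/E:", cpu_split)
print("GPU CUs on P/D/E:", gpu_split)
print("pipeline throughput:", throughput)
```

Exhaustive enumeration is feasible here only because the number of CUs is small; a real cost model would also need to account for cache capacity and memory stalls, which this sketch ignores.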
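
For the fine-grained co-processing study described earlier, the role of a cost model in choosing a workload split can be illustrated with a minimal sketch. It assumes that, for each step of the hash join, the CPU and the GPU process disjoint shares of the input concurrently at a constant per-tuple cost, so the step finishes when the slower processor finishes. The function `best_ratio` and the per-tuple costs below are hypothetical and far simpler than the cost models extended in the paper.

```python
def best_ratio(n_tuples, cpu_cost, gpu_cost, steps=100):
    """Pick the CPU share r in [0, 1] that minimizes max(CPU time, GPU time).

    cpu_cost / gpu_cost are assumed per-tuple costs for one join step.
    The search is a simple grid over r with the given number of steps.
    """
    best_r, best_time = 0.0, float("inf")
    for i in range(steps + 1):
        r = i / steps
        cpu_time = r * n_tuples * cpu_cost
        gpu_time = (1 - r) * n_tuples * gpu_cost
        elapsed = max(cpu_time, gpu_time)  # both processors run concurrently
        if elapsed < best_time:
            best_r, best_time = r, elapsed
    return best_r, best_time


# Hypothetical per-tuple costs (CPU, GPU) in nanoseconds for two join phases.
step_costs = {"build": (4.0, 6.0), "probe": (5.0, 2.5)}

for step, (cpu_cost, gpu_cost) in step_costs.items():
    r, elapsed = best_ratio(1_000_000, cpu_cost, gpu_cost)
    print(f"{step}: CPU share {r:.2f}, estimated time {elapsed / 1e6:.2f} ms")
```

Choosing a separate ratio per step, rather than one ratio for the whole join, is what makes the split fine-grained: a step that favors the GPU can hand it most of the work even if the next step favors the CPU.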

Publications

  • Jiong He*, Mian Lu, Bingsheng He. Revisiting Co-Processing for Hash Joins on the Coupled CPU-GPU Architecture. Proceedings of the VLDB Endowment, Volume 6 Issue 10, August 2013, pages 1-12. [pdf]
  • Shuhao Zhang*, Jiong He*, Bingsheng He, Mian Lu. OmniDB: Towards Portable and Efficient Query Processing on Parallel CPU/GPU Architectures. Proceedings of the VLDB Endowment, Volume 6 Issue 10, August 2013, pages 1-4 (demo). [pdf]
  • Jiong He*, Shuhao Zhang*, Bingsheng He. In-Cache Query Co-Processing on Coupled CPU-GPU Architectures. Proceedings of the VLDB Endowment, Volume 8 Issue 4, September 2015, pages 329-340. [pdf]
  • Jiong He*, Bingsheng He, Mian Lu, Shuhao Zhang*. In-Memory Data Analytics on Coupled CPU-GPU Architectures. IEEE Micro Special Issue on Heterogeneous Computing (in submission). [pdf]

Software

OmniDB

Author

  • Jiong He

    Current PhD student of Xtra
