Not All Joules are Equal: Towards Energy-Efficient and Green-Aware Data Processing Frameworks
Interest has been growing in integrating renewable energy into data centers, attracting many research efforts to develop green-aware algorithms and systems. Existing green-aware systems have mainly addressed the intermittent and variable nature of renewable energy by delaying workload execution. However, little attention has been paid to the efficiency of each joule consumed by data center workloads. In fact, not all joules are equal: the amount of work that a joule can accomplish varies significantly in data centers. Ignoring this fact leads to significant energy waste (25% of the total energy consumption of Hadoop YARN on a Facebook production trace, according to our study). In this paper, we investigate how to leverage such joule differentiation to maximize the benefits of renewable energy in the data center. Specifically, we consider data processing frameworks in which each job has a predefined deadline, and propose GreenMR, an energy-efficient and green-aware MapReduce framework. We develop job/task scheduling algorithms that focus on the factors behind joule differentiation in the data center, including the energy efficiency of MapReduce workloads, the renewable energy supply, and battery usage. We further develop a simple yet effective performance-energy consumption model to guide our scheduling decisions. We have implemented GreenMR on top of Hadoop YARN. Experiments demonstrate the accuracy of our models and the effectiveness of our energy-efficient and green-aware optimizations over both Hadoop YARN and a state-of-the-art green-aware Hadoop YARN implementation.
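To make the delay-based strategy attributed to existing green-aware systems concrete, here is a minimal sketch. It is not GreenMR's algorithm. It assumes a hypothetical hourly forecast of renewable power, a job that draws a constant `power_kw` and can be split into independent one-hour chunks, and a deadline expressed as a slot index; the function name `schedule_by_delay` and all parameters are illustrative. Because the grid energy drawn in a slot, max(0, P - r), never increases as the renewable supply r grows, greedily picking the slots with the most renewable supply before the deadline minimizes grid energy under these assumptions. Note that it treats every joule as equally productive, which is exactly the simplification the abstract argues against.

```python
from typing import List, Tuple


def schedule_by_delay(renewable_kw: List[float], slots_needed: int,
                      deadline: int, power_kw: float) -> Tuple[List[int], float]:
    """Delay a deadline-constrained job into the greenest slots (hypothetical sketch).

    Assumes 1-hour slots and a job divisible into independent one-slot chunks.
    Returns the chosen slot indices (sorted) and the grid energy drawn in kWh.
    """
    if slots_needed > deadline:
        raise ValueError("deadline too tight for the job")
    # Only slots 0..deadline-1 finish before the deadline.
    candidates = range(deadline)
    # Rank by renewable supply (highest first); break ties by earlier slot.
    ranked = sorted(candidates, key=lambda s: (-renewable_kw[s], s))
    chosen = sorted(ranked[:slots_needed])
    # Any demand not covered by renewable supply is drawn from the grid.
    grid_kwh = sum(max(0.0, power_kw - renewable_kw[s]) for s in chosen)
    return chosen, grid_kwh


if __name__ == "__main__":
    forecast = [0.0, 1.0, 4.0, 3.0, 0.5, 2.0]  # illustrative kW values, not measured data
    print(schedule_by_delay(forecast, slots_needed=2, deadline=5, power_kw=3.0))
```

With this illustrative forecast, running immediately in slots 0 and 1 would draw 5 kWh from the grid, whereas delaying to slots 2 and 3 draws none. A scheduler aware of joule differentiation would additionally weigh how much useful work each joule buys in each slot.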