Hadoop presents MapReduce as an analytics engine, and under the hood it uses a distributed storage layer referred to as the Hadoop Distributed File System (HDFS). As an open-source implementation of MapReduce, Hadoop is so far one of the most successful realizations of large-scale data-intensive cloud computing platforms. It has been recognized that when and where to start the reduce tasks are key problems in enhancing MapReduce performance. For time scheduling in MapReduce, the existing work may result in a block of reduce tasks; especially when the map tasks' output is large, the performance of a MapReduce task scheduling algorithm is seriously affected. Through analysis of the current MapReduce scheduling mechanism, this work illustrates the reasons for system slot resource wasting, which results in reduce tasks waiting around. It then proposes a self-adaptive reduce task scheduling policy (the SARS algorithm) for reduce tasks' start times on the Hadoop platform. SARS decides the start time of each reduce task dynamically according to the job context, including the task completion time and the size of the map output.
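As a rough illustration of this idea (not the exact SARS formulation), the sketch below shows a scheduler hook that delays launching a reduce task until the estimated time to copy the map output already produced is comparable to the estimated remaining map time, so the reduce slot does not sit idle. The class and method names (ReduceStartPolicy, shouldStartReduce) and the parameters are hypothetical, chosen only to make the decision rule concrete.

```java
/**
 * A minimal sketch of a self-adaptive reduce start-time decision,
 * assuming the scheduler can observe map progress and estimate how long
 * copying the map output produced so far would take. All names and
 * estimates here are illustrative, not the SARS implementation.
 */
public class ReduceStartPolicy {

    /**
     * Decide whether a reduce task of this job should be launched now.
     *
     * @param finishedMaps      number of map tasks already completed
     * @param totalMaps         total number of map tasks of the job
     * @param avgMapTimeMs      average running time of a finished map task
     * @param producedOutputMB  size of map output generated so far
     * @param copyRateMBPerSec  measured shuffle/copy bandwidth of a reducer
     */
    public boolean shouldStartReduce(int finishedMaps, int totalMaps,
                                     double avgMapTimeMs,
                                     double producedOutputMB,
                                     double copyRateMBPerSec) {
        if (finishedMaps >= totalMaps) {
            return true; // all map output is ready, start immediately
        }
        // Estimate how long the remaining map tasks will take.
        int remainingMaps = totalMaps - finishedMaps;
        double remainingMapTimeMs = remainingMaps * avgMapTimeMs;

        // Estimate how long copying the already-produced output would take.
        double copyTimeMs = (producedOutputMB / copyRateMBPerSec) * 1000.0;

        // Start the reducer only when the copy work is large enough to
        // overlap the tail of the map phase; otherwise leave the reduce
        // slot free for other jobs instead of letting the reducer idle.
        return copyTimeMs >= remainingMapTimeMs;
    }

    public static void main(String[] args) {
        ReduceStartPolicy policy = new ReduceStartPolicy();
        // Example: 60 of 100 maps done, 20 s per map, 8 GB produced, 40 MB/s copy rate.
        boolean start = policy.shouldStartReduce(60, 100, 20_000, 8_192, 40);
        System.out.println("Start reduce now? " + start);
    }
}
```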

Hadoop allows the user to configure a job, submit it, control its execution, and query its state. Every job consists of independent tasks, and each task needs a system slot to run. Fig. 2 shows the time delay and slot resource waste problem in reduce task scheduling. From Fig. 2(a) we can see that Job1 and Job2 are the currently running jobs, and at the initial time each job is allocated two map slots to run its tasks. Since the execution time of each task differs, as shown in Fig. 2(a), Job2 finishes its map tasks at time t2. Because reduce tasks begin once any map task finishes, during the interval from t1 to t2 two reduce tasks, one from Job1 and one from Job2, are running. As indicated in Fig. 2(b), at time t3, when all the reduce tasks of Job2 have finished, two new reduce tasks from Job1 are started, so all the reduce slots are now taken up by Job1. As shown in Fig. 2(c), at time t4, when Job3 starts, two idle map slots can be assigned to it, and its reduce tasks should then start. However, all the reduce slots are already occupied by Job1, and the reduce tasks from Job3 have to wait for a slot to be released.

The root cause of this problem is that the reduce tasks of Job3 must wait for all the reduce tasks of Job1 to complete, because Job1 occupies all the reduce slots and Hadoop does not support preemption by default. In the early algorithm design, a reduce task can be scheduled once any map task has finished. One benefit is that the reduce tasks can copy the output of the map tasks as soon as possible. But those reduce tasks must still wait until all map tasks are finished, and the pending tasks keep occupying slot resources, so other jobs that have finished their map tasks cannot start their reduce tasks. In short, this results in long waits for reduce tasks and greatly increases the delay of Hadoop jobs. In practice, a shared cluster often runs jobs from multiple users at the same time. If a similar situation arises among different users, and the reduce slot resources are occupied for a long time, jobs submitted by other users cannot make progress until the slots are released. Such inefficiency extends the average response time of a Hadoop system, lowers the resource utilization rate, and reduces the throughput of a Hadoop cluster.
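To make the cost of this blocking concrete, the toy calculation below follows the Fig. 2 scenario with a pool of two reduce slots and no preemption: Job1's reducers are admitted at the first map completion and hold both slots until Job1's whole map and reduce work is done, so Job3's reducers queue for the difference. All time points and durations are illustrative values, not measurements from a real cluster.

```java
/**
 * A toy calculation of the waiting time caused by early reduce scheduling,
 * following the Fig. 2 scenario. Time points and durations are arbitrary
 * units chosen only to make the arithmetic concrete.
 */
public class SlotBlockingDemo {
    public static void main(String[] args) {
        double t1 = 0;               // Job1's first map finishes; its reducers are launched
        double t4 = 30;              // Job3 arrives and its map tasks finish quickly
        double job1MapsDoneAt = 100; // Job1's last map task finishes
        double job1ReducePhase = 20; // work Job1's reducers can only do after all its maps finish

        // Default policy: Job1's reducers occupy both reduce slots from t1 onward,
        // but they cannot complete until Job1's map phase ends, so the slots are
        // only released at job1MapsDoneAt + job1ReducePhase.
        double slotsFreedAt = job1MapsDoneAt + job1ReducePhase;

        // Job3's reducers are ready at t4 but must wait for a free slot.
        double job3ReduceWait = slotsFreedAt - t4;
        System.out.println("Job3 reduce tasks wait " + job3ReduceWait + " time units");

        // Had Job1's reducers been started only near the end of its map phase,
        // both slots would have been free for Job3 during (t4, job1MapsDoneAt).
    }
}
```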