
Self-Adaptive Reduce Task Scheduling

By analyzing the MapReduce scheduling mechanism, this project identifies why system slot resources are wasted while reduce tasks sit idle, and it proposes a method that determines the start times of reduce tasks dynamically according to each job's context, including task completion times and the size of the map output. The method is intended to decrease reduce completion time and average system response time on Hadoop platforms.
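As a rough illustration of the idea, the sketch below picks a reduce start time from two pieces of job context: estimated map completion times and total map output size. It is not the project's actual algorithm. The function name `reduce_start_time`, the `shuffle_bytes_per_sec` parameter, and the simple "finish copying just as the last map finishes" model are all assumptions made for this example.

```python
def reduce_start_time(map_finish_estimates, map_output_bytes, shuffle_bytes_per_sec):
    """Estimate when to launch a reduce task (seconds since job start).

    Hypothetical model: the reduce task needs map_output_bytes / shuffle_bytes_per_sec
    seconds to copy its input, so it is launched just early enough for the copy
    to end when the last map task ends.
    """
    first_map_finish = min(map_finish_estimates)
    last_map_finish = max(map_finish_estimates)
    shuffle_seconds = map_output_bytes / shuffle_bytes_per_sec
    # Starting before the first map finishes would only hold a slot idle,
    # because no map output exists to copy yet.
    return max(first_map_finish, last_map_finish - shuffle_seconds)


# Maps are expected to finish at 60 s, 120 s and 90 s; 2 GB of output at 50 MB/s
# takes 40 s to copy, so the reduce task is launched at 80 s.
start = reduce_start_time([60, 120, 90], 2_000_000_000, 50_000_000)
```

Under this model a reduce task launched at job start would occupy its slot for 80 seconds without useful work, which is the kind of waste the paragraph above describes.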

Reduce Placement

Current Hadoop schedulers often do not take data locality into account. As a result, data may be shuffled across the network unnecessarily, degrading performance. This project examines several optimization algorithms for the reduce placement problem. We make the Hadoop reduce task scheduler aware of the network locations and sizes of partitions in order to reduce network traffic and improve Hadoop performance.
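A minimal sketch of locality-aware reduce placement follows. It uses a simple greedy heuristic chosen for illustration, not one of the project's algorithms. The names `place_reducers` and `network_bytes`, the input layout (bytes of each partition held on each node), and the per-node slot limit are all assumptions.

```python
def place_reducers(partition_sizes, nodes, slots_per_node=1):
    """Greedily assign each partition's reduce task to a node.

    partition_sizes: {partition_id: {node: bytes of that partition stored on node}}
    nodes: list of node names with free reduce slots (hypothetical input)
    """
    free = {n: slots_per_node for n in nodes}
    placement = {}
    # Place the largest partitions first, since they cost the most to move.
    order = sorted(partition_sizes, key=lambda p: (-sum(partition_sizes[p].values()), p))
    for p in order:
        available = [n for n in nodes if free[n] > 0]
        if not available:
            raise ValueError("not enough reduce slots for all partitions")
        local = partition_sizes[p]
        # Pick the free node that already holds the most bytes of this partition.
        best = max(available, key=lambda n: local.get(n, 0))
        placement[p] = best
        free[best] -= 1
    return placement


def network_bytes(partition_sizes, placement):
    """Bytes that must cross the network: everything not already on the chosen node."""
    return sum(
        size
        for p, by_node in partition_sizes.items()
        for node, size in by_node.items()
        if node != placement[p]
    )


sizes = {0: {"a": 100, "b": 10}, 1: {"a": 80, "b": 70}}
plan = place_reducers(sizes, ["a", "b"], slots_per_node=2)
traffic = network_bytes(sizes, plan)
```

A greedy choice like this is not guaranteed to be optimal when slots are scarce: an early partition can take a node that a later one needed more. That limitation is one reason to compare several optimization algorithms.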

Named Entity Recognition in Biomedical Big Data Mining

A parallel biomedical data processing model built on the MapReduce framework is presented as an application of the proposed methods. Since the United States launched the Human Genome Project (HGP), biomedical big data has held a distinctive place in academic research. A widely used conditional random fields (CRF) model and an efficient Hadoop-based method, Bio-NER, are introduced to extract information and knowledge from biomedical big data.