Dr SK Moulali
Keywords: Map-Reduce, Divisible Load Theory, Fault Tolerance, Straggler, Checkpoint.
Abstract
Map-Reduce is a programming model for processing data-intensive applications. More than ten thousand distinct Map-Reduce programs have been implemented internally at Google over the past four years, and an average of one hundred thousand Map-Reduce jobs are executed on Google’s clusters every day, processing more than twenty petabytes of data per day in total. Divisible Load Theory (DLT) has been applied to the existing Map-Reduce system to increase its efficiency. In this work, a new fault-tolerant Map-Reduce system is developed that is applicable to the static type of scheduling used in DLT. This paper proposes a new algorithm, Two Level Fault Tolerant Partitioning (TFTP), which identifies the faulty processor and re-executes its data by scheduling that data to the straggler processors. The main goal of the algorithm is to complete the job as close as possible to the estimated completion time. Combining checkpointing with TFTP reduces redundant work, because a failed job is re-executed from a saved state rather than from the beginning. Checkpointing the data does incur some overhead. By trading off the fault probability against the checkpointing interval, the proposed method improves the efficiency of the system.
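The trade-off between fault probability and checkpointing interval can be illustrated with a toy cost model. The sketch below is not the TFTP algorithm and is not taken from this paper. It assumes, purely for illustration, that a job is split into equal segments with a checkpoint after every segment except the last, that at most one fault occurs per job, that the fault is equally likely at any point of the computation, and that no fault occurs while a checkpoint is being written. The function names and parameters (expected_completion_time, best_segment_count, checkpoint_cost, fault_prob) are hypothetical.

```python
def expected_completion_time(work, n_segments, checkpoint_cost, fault_prob):
    """Expected job time under a toy single-fault model (illustrative assumption).

    work            -- fault-free computation time of the job
    n_segments      -- number of equal segments; a checkpoint follows each but the last
    checkpoint_cost -- time spent writing one checkpoint
    fault_prob      -- probability that one fault occurs during the job
    """
    if n_segments < 1:
        raise ValueError("n_segments must be at least 1")
    # Overhead grows with the number of checkpoints taken.
    checkpoint_overhead = (n_segments - 1) * checkpoint_cost
    # A fault at a uniformly random point loses, on average, half a segment.
    expected_rework = fault_prob * work / (2 * n_segments)
    return work + checkpoint_overhead + expected_rework


def best_segment_count(work, checkpoint_cost, fault_prob, max_segments=50):
    """Return the segment count in 1..max_segments with the lowest expected time."""
    return min(
        range(1, max_segments + 1),
        key=lambda n: expected_completion_time(work, n, checkpoint_cost, fault_prob),
    )


if __name__ == "__main__":
    work, cost, p = 100.0, 1.0, 0.5
    for n in (1, 2, 5, 10):
        print(n, expected_completion_time(work, n, cost, p))
    print("best segment count:", best_segment_count(work, cost, p))
```

Under these assumptions, taking no checkpoints (one segment) leaves the whole job exposed to re-execution, while checkpointing too often lets the overhead dominate. The minimum lies in between, which is the balance the abstract refers to.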


