Computer Science, asked by ronaldoinam2604, 1 year ago

Complexity of execution of map function over reduce function

Answers

Answered by AyushAnand1111
Map-Reduce is commonly used to refer both to a programming model for Bulk Synchronous Parallel processing [7] and to a computational infrastructure for implementing this programming model. From the infrastructure point of view, a Map-Reduce job has three phases: Map, Shuffle, and Reduce.
While many good descriptions of Map-Reduce exist [3, 5], we would still like to present one here, since one of the phases (shuffle) is typically given less attention, and this phase is crucial both to our complexity measures and to the distinction we draw with the PRAM model.
Map: In this phase, a User Defined Function (UDF), also called Map, is executed on each record in a given file. The file is typically striped across many computers, and many processes (called Mappers) work on the file in parallel. The output of each call to Map is a list of ⟨KEY, VALUE⟩ pairs.

Shuffle: In this phase, the ⟨KEY, VALUE⟩ pairs produced by the Mappers are routed to destination computers so that all pairs with the same KEY arrive at the same computer, where they are grouped into Reduce Records (a Reduce Record is a KEY together with all the VALUES emitted for it).

Reduce: In this phase, a UDF, also called Reduce, is applied to each Reduce Record, often by many parallel
processes. Each process is called a Reducer. For each invocation of Reduce, one or more records may
get written into a local output file.

The Reduce phase starts only after all the Mappers have finished; hence, this model is an example of Bulk Synchronous Parallel (BSP) processing. The shuffle phase is typically implemented by writing all the data that arrives at a destination computer to disk, and the task of separating out that data into different Reduce Records on each destination computer is also done from disk. We assume that the total amount of work done in the shuffle phase is proportional only to the size of the data being shuffled, both overall and for any one destination computer.
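
To make the three phases concrete, here is a minimal in-memory sketch of a word-count job in Python. It is only an illustration of the model described above, not code from the cited paper or from any real Map-Reduce framework: the names word_count_map, word_count_reduce, and run_job are made up for this example, and everything runs on one machine instead of across parallel Mappers and Reducers.

# Minimal single-machine sketch of the Map, Shuffle, and Reduce phases.
from collections import defaultdict
from typing import Iterable, Iterator, Tuple

def word_count_map(record: str) -> Iterator[Tuple[str, int]]:
    # Map UDF: called once per input record, emits (KEY, VALUE) pairs.
    for word in record.split():
        yield (word, 1)

def word_count_reduce(key: str, values: Iterable[int]) -> Tuple[str, int]:
    # Reduce UDF: called once per Reduce Record (a key plus all its values).
    return (key, sum(values))

def run_job(records: Iterable[str]):
    # Map phase: work is proportional to the size of the input.
    mapped = []
    for record in records:
        mapped.extend(word_count_map(record))

    # Shuffle phase: group pairs by key into Reduce Records. A single pass
    # over the mapped output, so the work is proportional to the amount of
    # data being shuffled, as assumed in the text above.
    reduce_records = defaultdict(list)
    for key, value in mapped:
        reduce_records[key].append(value)

    # Reduce phase: starts only after all mapping and shuffling is done,
    # which is the bulk-synchronous barrier mentioned above.
    return [word_count_reduce(k, vs) for k, vs in reduce_records.items()]

if __name__ == "__main__":
    print(run_job(["the quick brown fox", "the lazy dog", "the fox"]))
    # [('the', 3), ('quick', 1), ('brown', 1), ('fox', 2), ('lazy', 1), ('dog', 1)]

In this sketch the Map phase does work proportional to the input size, the shuffle does work proportional to the mapped output, and the Reduce phase does work proportional to the number of Reduce Records and the values they contain, which is the kind of comparison the question about map versus reduce complexity is asking about.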