MapReduce Data Types and Formats
Map : (K1,V1) -> list(K2,V2)
Reduce : (K2,list(V2)) -> list(K3,V3)
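To make the type flow concrete, here is a minimal word count sketch in plain Python. It is not the Hadoop API: the names map_fn, reduce_fn and run_job are made up for illustration, and the grouping step only imitates the shuffle in memory. It assumes the usual formulation in which the key type Reduce receives is the Map output key type K2.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def map_fn(offset: int, line: str) -> List[Tuple[str, int]]:
    """Map: (K1=int offset, V1=str line) -> list of (K2=str word, V2=int 1)."""
    return [(word, 1) for word in line.split()]


def reduce_fn(word: str, counts: List[int]) -> List[Tuple[str, int]]:
    """Reduce: (K2=str word, list of V2=int) -> list of (K3=str, V3=int)."""
    return [(word, sum(counts))]


def run_job(records: List[Tuple[int, str]]) -> List[Tuple[str, int]]:
    """Hypothetical in-memory driver: map, group by K2, then reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in map_fn(k1, v1):
            grouped[k2].append(v2)  # collect every V2 under its K2
    output: List[Tuple[str, int]] = []
    for k2 in sorted(grouped):  # keys reach reduce in sorted order
        output.extend(reduce_fn(k2, grouped[k2]))
    return output


# Input records are (byte offset, line) pairs, as with line-oriented text input.
records = [(0, "to be or"), (9, "not to be")]
result = run_job(records)
print(result)
```

Here K1 is an integer offset and V1 a line of text, K2 is a word and V2 a count of 1, and the final pairs happen to reuse the word as K3 with the summed count as V3.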
* The Map input key and value types (K1,V1) are, in general, different from
the Map output types (K2,V2).
* The Reduce input key type must match the Map output key type K2, and the
Reduce input value is the list of all V2 values emitted for that key (so it
differs in format from the Map input pair K1 and V1). The Reduce output is a
list of key-value pairs (K3,V3), whose types may again differ from the
Reduce input types.
* MapReduce can process many different data formats,
ranging from text files to databases.
* An "input split" is a chunk of the input that is processed by a single Map task.
* Each Map task processes a single split. Each split is divided into
records, and the map function processes each record in the
form of a key-value pair.
* Splits and records are logical units: nothing requires them to be
tied to files. A split may map to a full file, a part of a file, or a
collection of different files.
* In a database context, a split corresponds to a range of rows
from a table, and a record corresponds to a row in that range.
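The relationship between splits and records can be sketched in plain Python for a line-oriented text input. This is a simplified illustration, not Hadoop's actual reader: compute_splits and read_records are hypothetical helpers, the split size of 8 bytes is arbitrary, and the rule assumed here is that a line belongs to the split in which its first byte falls, so a reader may run past the end of its split to finish its last line.

```python
from typing import Iterator, List, Tuple


def compute_splits(total_len: int, split_size: int) -> List[Tuple[int, int]]:
    """Return (start, length) pairs that cover the byte range [0, total_len)."""
    return [
        (start, min(split_size, total_len - start))
        for start in range(0, total_len, split_size)
    ]


def read_records(data: bytes, start: int, length: int) -> Iterator[Tuple[int, str]]:
    """Yield (byte offset, line) for every line that begins inside the split."""
    end = start + length
    pos = start
    # A split that starts mid-line skips the partial line, because that
    # line began in (and is read by) the previous split.
    if start > 0 and data[start - 1:start] != b"\n":
        newline = data.find(b"\n", start)
        pos = len(data) if newline == -1 else newline + 1
    while pos < end and pos < len(data):
        newline = data.find(b"\n", pos)
        line_end = len(data) if newline == -1 else newline
        # The last line of a split may extend beyond `end`; read it whole.
        yield pos, data[pos:line_end].decode("utf-8")
        pos = line_end + 1


data = b"alpha\nbeta\ngamma\ndelta\n"
splits = compute_splits(len(data), 8)
records_per_split = [list(read_records(data, s, n)) for s, n in splits]
for split, recs in zip(splits, records_per_split):
    print(split, recs)
```

Each (offset, line) pair is what a map function would receive as its (K1,V1) record, and every line is read by exactly one split even though the split boundaries cut through the middle of lines.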