Thursday, April 8, 2021

MapReduce Data Types and Formats (A Short Explanatory Article)


 



 * In MapReduce's model of data processing, the inputs and outputs of the map and reduce functions are key-value pairs.


 * The map and reduce functions in Hadoop MapReduce have the following general form:

 

Map : (K1,V1) -> list(K2,V2)

Reduce : (K2,list(V2)) -> list(K3,V3)

 

* In general, the map input key and value types (K1,V1) are different from the map output types (K2,V2).

 

* The reduce input, however, must have the same types as the map output: a key K2 together with the list of all V2 values emitted for that key. The reduce output is a list of key-value pairs (K3,V3), whose types may differ again from those of the reduce input.
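
To make the type flow concrete, here is a minimal in-memory sketch in plain Python. It is not Hadoop code. The names `map_fn`, `reduce_fn` and `run_job` are hypothetical, and word count is an assumed example. Here K1 is a line offset (int), V1 is the line text (str), K2 is a word (str) and V2 is a count (int). In this particular example K3 and V3 happen to have the same types as K2 and V2, which is allowed but not required.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def map_fn(offset: int, line: str) -> List[Tuple[str, int]]:
    """Map: (K1, V1) -> list(K2, V2). The offset key is ignored here."""
    return [(word, 1) for word in line.split()]


def reduce_fn(word: str, counts: List[int]) -> List[Tuple[str, int]]:
    """Reduce: (K2, list(V2)) -> list(K3, V3)."""
    return [(word, sum(counts))]


def run_job(records: List[Tuple[int, str]]) -> List[Tuple[str, int]]:
    """Hypothetical single-process driver: map, group by key, then reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in map_fn(k1, v1):
            grouped[k2].append(v2)  # collect every V2 emitted for the same K2

    output: List[Tuple[str, int]] = []
    for k2 in sorted(grouped):  # reducers see keys in sorted order
        output.extend(reduce_fn(k2, grouped[k2]))
    return output


if __name__ == "__main__":
    # Each input record is (byte offset of the line, line text).
    print(run_job([(0, "the quick fox"), (14, "the lazy dog")]))
```

The grouping step in `run_job` stands in for the shuffle: it is what turns the map output pairs (K2,V2) into the (K2,list(V2)) input that the reduce function expects.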

 

* MapReduce can process many different data formats, ranging from flat text files to databases.


* An "input split" is a chunk of the input that is processed by a single map task.

 

* Each map task processes a single split. Each split is divided into records, and the map function processes each record, a key-value pair, in turn.

 

* Splits and records are logical concepts: nothing ties them to files, although a split commonly maps to a whole file, part of a file, or a collection of files.

 

* In a database context, a split might correspond to a range of rows from a table, and a record to a single row in that range.
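
The relationship between splits, records and map tasks can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not how Hadoop computes splits: the table is an in-memory list, a split is a hypothetical `(start, length)` range of row indices, and `make_splits`, `read_records` and `map_task` are made-up names.

```python
from typing import Iterator, List, Tuple

Row = Tuple[str, int]    # assumed row shape: (customer, amount)
Split = Tuple[int, int]  # logical split: (first row index, number of rows)


def make_splits(total_rows: int, rows_per_split: int) -> List[Split]:
    """Cut the row range [0, total_rows) into consecutive logical splits."""
    return [
        (start, min(rows_per_split, total_rows - start))
        for start in range(0, total_rows, rows_per_split)
    ]


def read_records(table: List[Row], split: Split) -> Iterator[Tuple[int, Row]]:
    """Turn one split into records: key = row index, value = the row."""
    start, length = split
    for row_id in range(start, start + length):
        yield row_id, table[row_id]


def map_task(table: List[Row], split: Split) -> List[Tuple[str, int]]:
    """One map task handles exactly one split, one record at a time."""
    return [(name, amount) for _, (name, amount) in read_records(table, split)]


table = [("ann", 3), ("bob", 5), ("ann", 2), ("cid", 7), ("bob", 1)]

if __name__ == "__main__":
    for split in make_splits(len(table), rows_per_split=2):
        print(split, map_task(table, split))
```

Note that a split here only describes which rows belong together; it holds no data itself. That is the sense in which splits are logical rather than physical.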

 
