Data Science and AI Quest: MapReduce Data Types and Formats (A Short Explanatory Article)

Thursday, April 8, 2021

MapReduce Data Types and Formats

* In MapReduce's model of data processing, the inputs and outputs of the map and reduce functions are key-value pairs.

* The map and reduce functions in Hadoop MapReduce have the following general form:

Map : (K1,V1) -> list(K2,V2)

Reduce : (K2,list(V2)) -> list(K3,V3)

* In general, the map input key and value types (K1, V1) are different from the map output types (K2, V2).

* The reduce input, however, must have the same types as the map output: each key K2 arrives together with the list of V2 values emitted for it. The reduce output types (K3, V3) may again be different from the reduce input types.
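The type flow described above can be illustrated with a small word count written in plain Python. This is only a sketch of the idea, not Hadoop code: the names `map_fn`, `reduce_fn` and `run_job` are hypothetical, and the in-memory grouping step stands in for the shuffle that a real framework performs between map and reduce. Here K1 is a line offset, V1 is the line text, K2 and K3 are words, and V2 and V3 are counts.

```python
from collections import defaultdict
from typing import Iterable, Iterator, List, Tuple


def map_fn(key: int, value: str) -> Iterator[Tuple[str, int]]:
    # (K1, V1) = (line offset, line text) -> list of (K2, V2) = (word, 1)
    for word in value.split():
        yield word.lower(), 1


def reduce_fn(key: str, values: Iterable[int]) -> Iterator[Tuple[str, int]]:
    # (K2, list(V2)) = (word, [1, 1, ...]) -> list of (K3, V3) = (word, total)
    yield key, sum(values)


def run_job(records: Iterable[Tuple[int, str]]) -> List[Tuple[str, int]]:
    # Group every V2 under its K2; this stands in for the shuffle phase.
    groups = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in map_fn(k1, v1):
            groups[k2].append(v2)
    # Call reduce once per distinct K2, in sorted key order.
    output = []
    for k2 in sorted(groups):
        output.extend(reduce_fn(k2, groups[k2]))
    return output


# Two input records, keyed by the byte offset of each line.
records = [(0, "the quick brown fox"), (20, "the lazy dog")]
result = run_job(records)
print(result)
```

Note that the reduce input types (word, list of counts) match the map output types, while the map input types (offset, line) are unrelated to them.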

* MapReduce can process many different data formats, ranging from flat text files to databases.

* An "input split" is a chunk of the input that is processed by a single map task.

* Each map task processes one split. Each split is divided into records, and the map function processes each record as a key-value pair.

* Splits and records are logical units, so nothing ties them to files. A split may correspond to a whole file, part of a file, or a collection of files.

* In a database context, a split corresponds to a range of rows from a table, and a record corresponds to a row in that range.
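The relationship between splits and records can be sketched in Python for the text-file case. This is a simplified illustration, loosely modeled on how a line-oriented text input treats records (key = byte offset of the line, value = the line itself), and it is not Hadoop's actual implementation. The helper names `make_splits` and `read_records` and the 10-byte split size are assumptions made for the example. Because splits are logical byte ranges, a line may straddle a split boundary. In this sketch a reader that does not start at offset 0 skips its first line, and every reader may read one line past the end of its split, so each line is processed exactly once.

```python
from typing import Iterator, List, Tuple


def make_splits(data: bytes, split_size: int) -> List[Tuple[int, int]]:
    # Describe the input as logical (start, length) byte ranges.
    return [
        (start, min(split_size, len(data) - start))
        for start in range(0, len(data), split_size)
    ]


def read_records(data: bytes, start: int, length: int) -> Iterator[Tuple[int, str]]:
    # Yield (byte offset, line) key-value pairs for one split.
    end = start + length
    pos = start
    if start != 0:
        # Skip the first (possibly partial) line; the previous split's
        # reader is responsible for it.
        newline = data.find(b"\n", pos)
        pos = len(data) if newline == -1 else newline + 1
    # Read every line that begins at or before the end of the split.
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        line_end = len(data) if newline == -1 else newline
        yield pos, data[pos:line_end].decode("utf-8")
        pos = line_end + 1


data = b"alpha\nbeta\ngamma\ndelta\n"
splits = make_splits(data, split_size=10)
per_split = [list(read_records(data, start, length)) for start, length in splits]
for split, recs in zip(splits, per_split):
    print(split, recs)
```

Each inner list is what one map task would receive. The last split yields no records here, because the only line it overlaps was already read by the previous split.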
