Data Science and AI Quest: Writing MapReduce Programs - a descriptive guide, with an example, to writing a sample MapReduce program with reference to its architecture

Thursday, April 8, 2021

Writing MapReduce Programs

* As per standard books, one should start a MapReduce program by writing pseudocode for the Map and Reduce functions.

* Pseudocode is not the full, working code. It is a blueprint that lays out the logic in plain steps, which is later translated into the actual code that runs.

* The program code for both the Map and Reduce functions can be written in Java or in other programming languages.

* In Hadoop's Java API, the Map function is represented by the generic Mapper class, which can operate on both structured and unstructured data.

* The Mapper class has four type parameters: input key, input value, output key and output value.

* General use of the Mapper class in Java is outside the scope of this article; I shall try to cover it in a separate blog post with an appropriate example.

* The Mapper class declares a map() method, which receives an input key and an input value and produces output key and value pairs. You override this method with your own logic.

* For more complex problems, it is advisable to use a higher-level tool than raw MapReduce, such as Pig, Hive or Spark.

(Pig, Hive and Spark will be covered in detail in separate articles later.)

* A Mapper function commonly performs input format parsing, projection (selecting the relevant fields) and filtering (selecting only the records that are needed).

* The Reducer function typically combines (for example, sums or averages) the values that the mappers emitted for each key, and this yields the final output.

* The diagram below breaks down the operations that happen within the MapReduce program flow.

* The following steps outline the logic for counting the occurrences of each unique word in a text file.

1) The document is split into several segments, and the Map step is run on each segment. The output is a set of (key, value) pairs. In this case, the key is a word in the document and the value is a count of 1.

2) The Big Data system gathers the (key, value) pairs output by all the mappers and sorts them by key. The sorted list is then split into a few segments, so that all pairs with the same key go to the same reducer.

3) The Reducer receives each word together with the list of counts collected for it, adds those counts up, and produces the combined list of word counts for the whole document.
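The three steps above can be sketched in plain Java, without any Big Data framework. This is only a minimal single-machine illustration, not how a real cluster executes the job: the class name WordCountFlow and the methods map, shuffle, reduce and wordCount are hypothetical names chosen for this example, and the sample sentences are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-machine sketch of the word count flow (not a framework API).
class WordCountFlow {

    // Step 1 (map): turn one segment of text into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String segment) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : segment.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Step 2 (shuffle and sort): group all values by key, in sorted key order.
    static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Step 3 (reduce): add up the counts collected for one word.
    static int reduce(String word, List<Integer> counts) {
        int total = 0;
        for (int count : counts) {
            total += count;
        }
        return total;
    }

    // Runs the three steps in order over a list of text segments.
    static TreeMap<String, Integer> wordCount(List<String> segments) {
        List<Map.Entry<String, Integer>> allPairs = new ArrayList<>();
        for (String segment : segments) {
            allPairs.addAll(map(segment));
        }
        TreeMap<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(allPairs).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints {brown=1, dog=1, fox=2, lazy=1, quick=1, the=3}
        System.out.println(wordCount(List.of("the quick brown fox", "the lazy dog", "The fox")));
    }
}
```

In a real system each call to map would run on a different machine and the shuffle would move data over the network; here everything happens in one loop so that the order of the steps is easy to follow.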

==================================

Pseudocode for WordCount (Java-style)

==================================

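map(String key, String value):
    // key: document name; value: document contents
    for each word w in value:
        EmitIntermediate(w, "1");

reduce(String key, Iterator values):
    // key: a word; values: the list of counts emitted for that word
    int result = 0;
    for each v in values:
        result += ParseInt(v);
    Emit(AsString(result));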

==================================
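Word count needs only the simplest possible map and reduce functions. As a sketch of the other roles described earlier (parsing, projection and filtering in the mapper, averaging in the reducer) and of the four type parameters, here is a small plain-Java example. Everything in it is an assumption made for illustration: SimpleMapper and SimpleReducer are hypothetical stand-ins and not the actual framework classes, and the "city,year,temperature" record format and sample values are invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

// Hypothetical sketch: average a numeric field per key, map/reduce style.
class AverageByKey {

    // Stand-in for a generic mapper with four type parameters:
    // input key, input value, output key, output value.
    interface SimpleMapper<KI, VI, KO, VO> {
        void map(KI key, VI value, BiConsumer<KO, VO> emit);
    }

    // Stand-in for a reducer: all values for one key go in, one result comes out.
    interface SimpleReducer<K, V, R> {
        R reduce(K key, List<V> values);
    }

    // Parses "city,year,temperature", keeps only city and temperature (projection)
    // and drops malformed records (filtering).
    static final SimpleMapper<Long, String, String, Double> PARSE_AND_FILTER =
        (lineNumber, line, emit) -> {
            String[] fields = line.split(",");
            if (fields.length != 3) {
                return; // wrong number of fields: filter the record out
            }
            try {
                emit.accept(fields[0].trim(), Double.parseDouble(fields[2].trim()));
            } catch (NumberFormatException e) {
                // temperature is not a number: filter the record out
            }
        };

    // Averages all temperatures collected for one city.
    static final SimpleReducer<String, Double, Double> AVERAGE =
        (city, temps) -> temps.stream().mapToDouble(Double::doubleValue).average().orElse(Double.NaN);

    // Maps every line, groups the emitted values by key, then reduces each group.
    static Map<String, Double> run(List<String> lines) {
        Map<String, List<Double>> grouped = new TreeMap<>();
        long lineNumber = 0;
        for (String line : lines) {
            PARSE_AND_FILTER.map(lineNumber++, line,
                (city, temp) -> grouped.computeIfAbsent(city, k -> new ArrayList<>()).add(temp));
        }
        Map<String, Double> averages = new TreeMap<>();
        grouped.forEach((city, temps) -> averages.put(city, AVERAGE.reduce(city, temps)));
        return averages;
    }

    public static void main(String[] args) {
        // Prints {Delhi=40.0, Pune=31.0}
        System.out.println(run(List.of(
            "Pune,2020,30.0", "Pune,2021,32.0", "Delhi,2020,40.0", "Delhi,2021,n/a", "bad line")));
    }
}
```

The input key (a line number) is ignored by this mapper, which is common: the useful information usually sits in the input value, and the mapper decides what the output key should be.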
