Thursday, April 8, 2021

Writing MapReduce Programs - a descriptive guide, with an example of writing a sample MapReduce program and a reference to its architecture

* As per standard books, one should start a MapReduce program by writing pseudocode for the Map and Reduce functions.

* Pseudocode is not the full, actual code; it is a blueprint of the code that will eventually be written as the working, standardised implementation.

* The program code for both the Map and Reduce functions can be written in Java or in other programming languages.

* In Java, a Map function is represented by the generic Mapper class (which can operate over both structured and unstructured data).

* The Map function has four formal type parameters: the input key, input value, output key and output value types.

* A general treatment of the Map function inherited from the Mapper class in Java is outside the scope of this article; I shall try to cover the usage of the Map function in a separate blog post with an appropriate example.

* The Mapper class declares a map() method which receives an input key and an input value and produces an output key and an output value; subclasses override this method with their own logic, as in the sketch below.
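
As an illustration (not part of the original article), a minimal Hadoop Mapper subclass showing the four type parameters might look like the sketch below; the class name MyMapper and the trivial method body are assumptions made purely for demonstration.

==================================

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// The four type parameters of Mapper are:
// <input key, input value, output key, output value>
public class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // key   = byte offset of the current line within the input split
        // value = the text of that line
        // Trivial placeholder body: emit the whole line with a count of 1.
        context.write(new Text(value.toString()), new IntWritable(1));
    }
}

==================================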

* For more complex problems involving the Map() function, it is often advisable to use a higher-level abstraction built on top of MapReduce, such as Pig, Hive or Spark.

(Detailed coverage of Pig, Hive and Spark will follow in separate articles.)

* A Mapper function commonly performs input format parsing, projection (selection of the relevant fields) and filtering (selection of only the records that are needed), as illustrated in the sketch below.
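
For instance (as a hedged sketch, not code from the original post), a mapper that parses comma-separated sales records, projects just the product and amount fields, and filters out small transactions could look roughly like this; the field layout and the 10.0 threshold are assumptions for illustration.

==================================

import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SalesFilterMapper
        extends Mapper<LongWritable, Text, Text, DoubleWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Input format parsing: split a CSV line such as "2021-04-08,productA,19.99"
        String[] fields = value.toString().split(",");
        if (fields.length < 3) {
            return;                                  // skip malformed records
        }
        String product = fields[1];                  // projection: keep only the needed fields
        double amount = Double.parseDouble(fields[2]);
        if (amount >= 10.0) {                        // filtering: keep only the required records
            context.write(new Text(product), new DoubleWritable(amount));
        }
    }
}

==================================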

* The Reducer function typically combines the values gathered for each key (for example by adding or averaging them) once the Map phase has done its work, which finally yields the output; a matching reducer sketch follows below.
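
Continuing the illustrative sales example above (again a sketch, not code from the original post), a matching reducer that averages the amounts seen for each product might be written as follows.

==================================

import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class AverageReducer
        extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {

    @Override
    protected void reduce(Text key, Iterable<DoubleWritable> values, Context context)
            throws IOException, InterruptedException {
        // All values for a given key arrive together; combine them into one output value.
        double sum = 0.0;
        long count = 0;
        for (DoubleWritable v : values) {
            sum += v.get();
            count++;
        }
        if (count > 0) {
            context.write(key, new DoubleWritable(sum / count));   // average per key
        }
    }
}

==================================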

* The diagram below breaks down the operations that happen within the MapReduce program flow.

 


* The following is the step-by-step logic for performing a word count of all unique words in a text file.

1) The document under consideration is split into several segments, and the Map step is run on each segment of the data. The output is a set of (key, value) pairs; in this case, the key is a word in the document and the value is a count of 1.

2) The Big Data system gathers the (key, value) pairs output by all the mappers and sorts them by key. The sorted list is then split into a few segments, each of which is handed to a reducer.

3) The task of each Reducer is to take its segment of the sorted list and produce a combined list of word counts from it.

 

==================================

Pseudocode for WordCount

==================================

map(String key, String value):
    // key: document name; value: document contents
    for each word w in value:
        EmitIntermediate(w, "1")

reduce(String key, Iterator values):
    // key: a word; values: a list of counts
    int result = 0
    for each v in values:
        result += ParseInt(v)
    Emit(AsString(result))

==================================
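
Based on the pseudocode above, a runnable Java implementation written against the standard Hadoop MapReduce API could look roughly like the sketch below; it follows the well-known Hadoop WordCount example, and the class names are only illustrative.

==================================

Java Code for WordCount (Hadoop MapReduce API)

==================================

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: tokenise each line and emit (word, 1) for every word.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);           // EmitIntermediate(w, "1")
            }
        }
    }

    // Reducer: sum the counts received for each word and emit the total.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();                   // result += ParseInt(v)
            }
            result.set(sum);
            context.write(key, result);             // Emit(AsString(result))
        }
    }

    // Driver: configure and submit the job; args[0] is the input path, args[1] the output path.
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

==================================

A typical invocation (the jar name and paths shown are illustrative) would be: hadoop jar wordcount.jar WordCount /input/books /output/wordcount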

 
