Writing MapReduce Programs
* As standard textbooks recommend, one should start a MapReduce program by writing pseudocode for the Map and Reduce functions.
* Pseudocode is not the full, working code; it is a blueprint that outlines the logic the actual code will implement.
* The program code for both the Map and Reduce functions can then be written in Java or another programming language.
* In Java, the Map function is represented by a generic Mapper class, which can operate on both structured and unstructured data objects.
* The Map function has four main parameters: input key, input value, output key and output value.
* The Mapper class declares an abstract map() method, which receives an input key and an input value and produces an output key and an output value.
* The Reduce function typically combines the values produced by the Map step (for example, by adding or averaging them) to yield the final output.
* The following is the step-by-step logic for counting the occurrences of each unique word in a text file.
1) The document is split into several segments. The Map step is run on each segment of the data. Its output is a set of (key, value) pairs. In this case, the key is a word in the document and the value is a count of 1.
2) The Big Data system gathers the (key, value) pairs output by all the mappers and sorts them by key. The sorted list is then split into a few segments.
3) The Reducer takes each segment of the sorted list and sums the values for each key, producing a combined list of word counts.
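
The three steps above can be sketched in plain Java as an in-memory simulation. This is only an illustration under stated assumptions: the class WordCountPipeline and its method names are hypothetical, words are lower-cased and split on non-word characters, and everything runs in a single process, whereas a real MapReduce system distributes the segments across many machines.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process illustration; not a framework API.
class WordCountPipeline {

    // Step 1 (map): emit a (word, 1) pair for every word in one segment.
    static List<Map.Entry<String, Integer>> map(String segment) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : segment.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Step 2 (gather and sort): collect the pairs from all mappers and
    // group their values by key; the TreeMap keeps the keys sorted.
    static TreeMap<String, List<Integer>> shuffle(
            List<List<Map.Entry<String, Integer>>> mapperOutputs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (List<Map.Entry<String, Integer>> output : mapperOutputs) {
            for (Map.Entry<String, Integer> pair : output) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        return grouped;
    }

    // Step 3 (reduce): sum the counts collected for one word.
    static int reduce(String word, List<Integer> counts) {
        int result = 0;
        for (int count : counts) {
            result += count;
        }
        return result;
    }

    // Runs the three steps over a list of document segments.
    static Map<String, Integer> wordCount(List<String> segments) {
        List<List<Map.Entry<String, Integer>>> mapperOutputs = new ArrayList<>();
        for (String segment : segments) {
            mapperOutputs.add(map(segment));
        }
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(mapperOutputs).entrySet()) {
            totals.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> segments =
                Arrays.asList("the quick brown fox", "the lazy dog", "the fox");
        // prints {brown=1, dog=1, fox=2, lazy=1, quick=1, the=3}
        System.out.println(wordCount(segments));
    }
}
```

Grouping with a sorted map stands in for the sort-by-key step here; it was chosen for brevity and is not how a distributed system would do it.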
==================================
Pseudocode for WordCount
==================================
map(String key, String value):
    // key: document name
    // value: document contents
    for each word w in value:
        EmitIntermediate(w, "1");

reduce(String key, Iterator values):
    // key: a word
    // values: a list of counts
    int result = 0;
    for each v in values:
        result += ParseInt(v);
    Emit(AsString(result));
==================================
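
One possible runnable Java rendering of the map and reduce functions above is sketched below. It uses a generic mapper with the four type parameters described earlier (input key, input value, output key, output value). The Emitter interface and the Mapper and Reducer base classes are hypothetical stand-ins written for this example; they are not the classes of any particular framework, which would also handle input splitting, sorting and distribution. Counts are kept as integers rather than strings to avoid the parsing step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration; not a framework API.
class WordCountFunctions {

    // Receives the (key, value) pairs emitted by map or reduce.
    interface Emitter<K, V> {
        void emit(K key, V value);
    }

    // Four type parameters: input key, input value, output key, output value.
    static abstract class Mapper<KI, VI, KO, VO> {
        abstract void map(KI key, VI value, Emitter<KO, VO> out);
    }

    static abstract class Reducer<KI, VI, KO, VO> {
        abstract void reduce(KI key, Iterable<VI> values, Emitter<KO, VO> out);
    }

    // key: document name, value: document contents; emits (word, 1).
    static class WordCountMapper extends Mapper<String, String, String, Integer> {
        @Override
        void map(String key, String value, Emitter<String, Integer> out) {
            for (String w : value.split("\\s+")) {
                if (!w.isEmpty()) {
                    out.emit(w, 1);
                }
            }
        }
    }

    // key: a word, values: its list of counts; emits (word, total).
    static class WordCountReducer extends Reducer<String, Integer, String, Integer> {
        @Override
        void reduce(String key, Iterable<Integer> values, Emitter<String, Integer> out) {
            int result = 0;
            for (int v : values) {
                result += v;
            }
            out.emit(key, result);
        }
    }

    public static void main(String[] args) {
        List<String> emitted = new ArrayList<>();
        Emitter<String, Integer> collector = (k, v) -> emitted.add(k + "=" + v);
        new WordCountMapper().map("doc1", "to be or not to be", collector);
        new WordCountReducer().reduce("be", Arrays.asList(1, 1), collector);
        // prints [to=1, be=1, or=1, not=1, to=1, be=1, be=2]
        System.out.println(emitted);
    }
}
```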