Sample MapReduce Application – WordCount
* Suppose one wants to identify the unique words in a piece of text, along with the frequency of occurrence of each word.
* Suppose the text in a data file "file.txt" can be split into five segments of roughly equal length, differing only minimally. The segments can then be represented as follows:
Segment 01 - "I stay at WonderVille in the city of Gods"
Segment 02 - "I am going to a picnic near our house"
Segment 03 - "Many of our friends are coming"
Segment 04 - "You are welcome to join us"
Segment 05 - "We will have fun"
* Each of these segments can be processed in parallel, and the partial results for the individual segments can then be aggregated to give the result for the whole text.
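The idea of processing segments independently and aggregating afterwards can be sketched in Python. This is only an illustration, not how a MapReduce framework is implemented: the names `SEGMENTS`, `count_segment` and `count_all` are hypothetical, a thread pool stands in for the separate processors, and words are split on whitespace and treated as case-sensitive.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# The five text segments listed above.
SEGMENTS = [
    "I stay at WonderVille in the city of Gods",
    "I am going to a picnic near our house",
    "Many of our friends are coming",
    "You are welcome to join us",
    "We will have fun",
]


def count_segment(segment):
    """Count the words of one segment, independently of the others."""
    return Counter(segment.split())


def count_all(segments):
    """Count each segment in its own worker, then aggregate the partial counts."""
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        partial_counts = list(pool.map(count_segment, segments))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)  # adds the counts of words already seen
    return total


totals = count_all(SEGMENTS)
print(totals["our"])  # "our" appears in Segment 02 and Segment 03
```

Because no segment depends on another, the per-segment counts can be computed in any order; only the final aggregation needs all of them.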
* From this it can be ascertained that there is one map task per segment of data (five for the segments listed above), and that each Map process takes its input in <key,value> pair format.
* In each <key,value> pair, the first column is the Key, which in this case is the entire sentence. The second column holds the Value, which in this application is the frequency of the words found during counting. Here, each Map process in the application is executed by a different processor.
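A map step for one segment might look like the following minimal sketch. It assumes the common WordCount convention, not spelled out in the text, that the map emits a pair (word, 1) for every occurrence and leaves the adding-up to later stages; `map_segment` is a hypothetical name.

```python
def map_segment(sentence):
    """Map step for one segment: emit a (word, 1) pair per word occurrence."""
    return [(word, 1) for word in sentence.split()]


pairs = map_segment("I am going to a picnic near our house")
for word, count in pairs:
    print(word, count)
```

Emitting 1 per occurrence keeps the map step free of any shared state, which is what allows each segment to run on a different processor.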
* The map tasks produce intermediate files in <key2,value2> pair format, one intermediate file per map task.
* The sort process inherent in MapReduce sorts each of the intermediate files by key and produces sorted key-value pairs.
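As a rough illustration of the sort step, the intermediate pairs of one segment can be ordered by key with Python's built-in `sorted`. The pair list below is an assumed map output for Segment 02, and the ordering is Python's default case-sensitive string order, which places uppercase letters before lowercase ones; an actual framework may compare keys differently.

```python
# Assumed intermediate output of the map task for Segment 02.
intermediate = [
    ("I", 1), ("am", 1), ("going", 1), ("to", 1), ("a", 1),
    ("picnic", 1), ("near", 1), ("our", 1), ("house", 1),
]

# Sort the pairs by key (the word), leaving the values untouched.
sorted_pairs = sorted(intermediate, key=lambda pair: pair[0])

for word, count in sorted_pairs:
    print(word, count)
```

Sorting matters because it places equal keys next to each other, so a later stage can combine them in a single pass.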
* The "Reduce" function reads the sorted intermediate files and combines the results into one result.
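The reduce step can be sketched as a merge of already-sorted inputs followed by a sum per key. In this sketch `reduce_counts` is a hypothetical name, in-memory lists stand in for the intermediate files, and only Segments 03 and 04 are used to keep the example short.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def reduce_counts(sorted_files):
    """Merge sorted (word, count) lists and sum the counts for each word."""
    # heapq.merge relies on every input already being sorted by key.
    merged = heapq.merge(*sorted_files, key=itemgetter(0))
    return {
        word: sum(count for _, count in group)
        for word, group in groupby(merged, key=itemgetter(0))
    }


# Stand-ins for the sorted intermediate files of Segments 03 and 04.
sorted_files = [
    sorted((word, 1) for word in "Many of our friends are coming".split()),
    sorted((word, 1) for word in "You are welcome to join us".split()),
]

result = reduce_counts(sorted_files)
print(result["are"])  # "are" occurs once in each of the two segments
```

Since the inputs arrive sorted, equal words from different files end up adjacent after the merge, and `groupby` can total them without holding all pairs in a lookup table first.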