Data Science and AI Quest: Sample MapReduce Application – WordCount ( analysis and interpretation with an example )

Wednesday, April 7, 2021


Sample MapReduce Application – WordCount

* Suppose one wants to identify the unique words in a piece of text, together with how often each word occurs in that text.

* Suppose the text in a data file "file.txt" can be split into 5 segments of roughly the same length. The segments can then be represented as follows:

Segment 01 - "I stay at WonderVille in the city of Gods"

Segment 02 - "I am going to a picnic near our house"

Segment 03 - "Many of our friends are coming"

Segment 04 - "You are welcome to join us"

Segment 05 - "We will have fun"

* Each of these segments can be processed in parallel, and the per-segment results can then be aggregated to give the result for the whole text.

* From this it follows that there are 5 map tasks, one for each segment of data, and each Map process takes its input in <key, value> pair format.

* In each <key, value> input pair, the first column is the key, which in this case is the entire sentence.

The second column holds the value, which in this application is the frequency of the words found during the counting process. Each Map process in the application is executed by a different processor.

* There are five intermediate files in <key2, value2> pair format, one per map task. In each of them, key2 is a word from the segment and value2 is its count.
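The map step described above can be sketched in Python. This is a minimal single-process illustration, not how a MapReduce framework is implemented: the function name `map_segment`, the list `SEGMENTS`, and the choice to lowercase words before counting are assumptions made for this example.

```python
import re

# The five text segments from "file.txt", as listed above.
SEGMENTS = [
    "I stay at WonderVille in the city of Gods",
    "I am going to a picnic near our house",
    "Many of our friends are coming",
    "You are welcome to join us",
    "We will have fun",
]

def map_segment(segment):
    """Emit one (word, 1) pair for every word in a segment.

    Assumption: words are lowercased so that "I" and "i" count as the same key.
    """
    words = re.findall(r"[a-z']+", segment.lower())
    return [(word, 1) for word in words]

# One intermediate "file" (here just a list of pairs) per map task.
intermediate = [map_segment(segment) for segment in SEGMENTS]

for number, pairs in enumerate(intermediate, start=1):
    print(f"Segment {number:02d}: {pairs}")
```

In a real cluster each call to the map function would run on a different processor; here the calls run one after another only to keep the sketch short.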

* The sort step built into MapReduce sorts each of the intermediate files by key and produces sorted key-value pairs.
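As a small illustration of the sort step, the pairs emitted for one segment can be ordered by key. The list below is written out by hand for Segment 03, with words lowercased as an assumption of this example; Python's built-in `sorted` stands in for the framework's own sort.

```python
# Intermediate (word, 1) pairs for Segment 03, in the order the words appear.
segment_03_pairs = [
    ("many", 1), ("of", 1), ("our", 1),
    ("friends", 1), ("are", 1), ("coming", 1),
]

# Tuples compare element by element, so sorting orders the pairs by word.
sorted_pairs = sorted(segment_03_pairs)

print(sorted_pairs)
```

After sorting, equal keys sit next to each other, which is what lets the later reduce step combine them in a single pass.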

* The "Reduce" function reads the sorted intermediate files and combines them into a single result.
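The whole flow, map, sort and reduce, can be sketched end to end in a few lines of Python. This is a single-process approximation under stated assumptions: the helper names `map_segment` and `reduce_sorted` are hypothetical, words are lowercased before counting, and `heapq.merge` stands in for the way a reducer reads several sorted intermediate files.

```python
import heapq
import re
from itertools import groupby
from operator import itemgetter

# The five text segments from "file.txt".
SEGMENTS = [
    "I stay at WonderVille in the city of Gods",
    "I am going to a picnic near our house",
    "Many of our friends are coming",
    "You are welcome to join us",
    "We will have fun",
]

def map_segment(segment):
    """Map: emit a (word, 1) pair for each word (lowercasing is an assumption)."""
    return [(word, 1) for word in re.findall(r"[a-z']+", segment.lower())]

def reduce_sorted(sorted_files):
    """Reduce: merge already-sorted pair lists and sum the counts per word."""
    merged = heapq.merge(*sorted_files)  # keeps the combined stream sorted
    return {
        word: sum(count for _, count in group)
        for word, group in groupby(merged, key=itemgetter(0))
    }

# Map each segment, sort each intermediate list, then reduce them together.
sorted_files = [sorted(map_segment(segment)) for segment in SEGMENTS]
word_counts = reduce_sorted(sorted_files)

for word, count in word_counts.items():
    print(word, count)
```

Words that occur in more than one segment, such as "our" in Segments 02 and 03, end up with a combined count, while every other word keeps a count of 1.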
