Monday, April 12, 2021

MapReduce Jobs Execution

* A MapReduce job is specified by a Map program and a Reduce program, together with the data sets on which the job operates
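
To make that concrete, here is a minimal sketch of such a job using the classic WordCount example with Hadoop's org.apache.hadoop.mapreduce API; the class name and the input/output paths are placeholders.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // The Map program: emits (word, 1) for every word in its input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // The Reduce program: sums the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class); // the Map program
    job.setReducerClass(IntSumReducer.class);  // the Reduce program
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // The associated data sets: input and output paths (placeholders).
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```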

 

* Another master program, called the Job Tracker, resides and runs continuously on the NameNode machine; it tracks the progress of every MapReduce job from submission through completion
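
The progress that gets tracked is also visible to client code through the Job handle. A minimal sketch, assuming job is a configured org.apache.hadoop.mapreduce.Job like the one built in the WordCount example above, inside a method declared throws Exception:

```java
// Submit the job without blocking, then poll the tracked progress.
job.submit();
while (!job.isComplete()) {
  System.out.printf("map %.0f%%  reduce %.0f%%%n",
      job.mapProgress() * 100, job.reduceProgress() * 100);
  Thread.sleep(5000); // check again in five seconds
}
System.out.println(job.isSuccessful() ? "job succeeded" : "job failed");
```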

 

* Hadoop moves the Map and Reduce computation logic out to the DataNodes that host a fragment of the data, rather than pulling the data to the computation

 

* Communication between the nodes is accomplished using YARN, Hadoop's native resource manager
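
As a rough illustration of that layer, YARN exposes the cluster's view of running applications through a client API. A sketch, assuming a reachable cluster whose yarn-site.xml is on the classpath:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class ListApplications {
  public static void main(String[] args) throws Exception {
    YarnClient yarn = YarnClient.createYarnClient();
    yarn.init(new Configuration()); // reads yarn-site.xml from the classpath
    yarn.start();
    // Ask the ResourceManager for every application it is tracking.
    for (ApplicationReport app : yarn.getApplications()) {
      System.out.printf("%s  %s  %s%n",
          app.getApplicationId(), app.getName(), app.getYarnApplicationState());
    }
    yarn.stop();
  }
}
```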

 

* The master machine (NameNode) knows exactly what data is stored on each of the worker machines (DataNodes)

 

* The master machine schedules Map/Reduce tasks to Task Trackers with full awareness of the data location: because the Task Trackers sit within this hierarchical monitoring architecture, the Job Tracker knows which data fragments reside on which node.

This is what lets the Job Tracker map each task onto the job queue of an appropriate Task Tracker.
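
The locality information the scheduler relies on can be inspected from client code as well: each input split reports the hosts that store its block. A small sketch (the input path argument is a placeholder):

```java
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class PrintSplitLocations {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "split locations");
    FileInputFormat.addInputPath(job, new Path(args[0])); // placeholder input path
    // Compute the job's input splits and print the hosts that store each one;
    // this is the information the scheduler uses for data-local placement.
    List<InputSplit> splits = new TextInputFormat().getSplits(job);
    for (InputSplit split : splits) {
      System.out.printf("split of %d bytes on hosts: %s%n",
          split.getLength(), String.join(", ", split.getLocations()));
    }
  }
}
```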

 

For example, if node A contains data (x, y, z) and node B contains data (a, b, c), the Job Tracker schedules node B to perform Map or Reduce tasks on (a, b, c) and node A to perform Map or Reduce tasks on (x, y, z). Shipping the computation to the data in this way reduces data traffic and keeps the network from choking.
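
A toy sketch of that placement preference; the class and method here are hypothetical and exist only to illustrate the rule, not to mirror Hadoop's actual scheduler code:

```java
import java.util.List;
import java.util.Set;

final class LocalityAwarePlacement {
  // Hypothetical helper: prefer a node that already hosts the task's input
  // fragment (node B for (a,b,c), node A for (x,y,z) in the example above);
  // fall back to any available node only when no data-local node is free.
  static String chooseNode(Set<String> hostsWithData, List<String> availableNodes) {
    for (String node : availableNodes) {
      if (hostsWithData.contains(node)) {
        return node; // data-local: the input never crosses the network
      }
    }
    return availableNodes.get(0); // remote assignment as a last resort
  }
}
```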

 

* Each DataNode participating in MapReduce jobs runs a daemon called the Task Tracker, which executes the Map and Reduce tasks assigned to it by the Job Tracker

 
