Monday, April 12, 2021

MapReduce Jobs Execution

* A MapReduce job is specified by a Map program and a Reduce program, together with the data sets on which the job operates
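
To make that concrete, here is a minimal sketch of such a job using the classic WordCount example with Hadoop's org.apache.hadoop.mapreduce API; the class name and the input/output paths are placeholders.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // The Map program: emits (word, 1) for every word in its input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // The Reduce program: sums the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class); // the Map program
    job.setReducerClass(IntSumReducer.class);  // the Reduce program
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // The associated data sets: input and output paths (placeholders).
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```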

 

* Another master program, called the Job Tracker, resides and runs continuously on the NameNode machine; it tracks the progress of every MapReduce job from submission through completion
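
The progress that gets tracked is also visible to client code through the Job handle. A minimal sketch, assuming job is a configured org.apache.hadoop.mapreduce.Job like the one built in the WordCount example above, inside a method declared throws Exception:

```java
// Submit the job without blocking, then poll the tracked progress.
job.submit();
while (!job.isComplete()) {
  System.out.printf("map %.0f%%  reduce %.0f%%%n",
      job.mapProgress() * 100, job.reduceProgress() * 100);
  Thread.sleep(5000); // check again in five seconds
}
System.out.println(job.isSuccessful() ? "job succeeded" : "job failed");
```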

 

* Hadoop moves the Map and Reduce computation logic out to the DataNodes that host a fragment of the data, rather than pulling the data to the computation

 

* Communication between the nodes is accomplished using YARN, Hadoop's native resource manager
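
As a rough illustration of that layer, YARN exposes the cluster's view of running applications through a client API. A sketch, assuming a reachable cluster whose yarn-site.xml is on the classpath:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class ListApplications {
  public static void main(String[] args) throws Exception {
    YarnClient yarn = YarnClient.createYarnClient();
    yarn.init(new Configuration()); // reads yarn-site.xml from the classpath
    yarn.start();
    // Ask the ResourceManager for every application it is tracking.
    for (ApplicationReport app : yarn.getApplications()) {
      System.out.printf("%s  %s  %s%n",
          app.getApplicationId(), app.getName(), app.getYarnApplicationState());
    }
    yarn.stop();
  }
}
```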

 

* The master machine (NameNode) knows exactly what data is stored on each of the worker machines (DataNodes)

 

* The master machine schedules Map/Reduce tasks to Task Trackers with full awareness of the data location: because the Task Trackers sit within this hierarchical monitoring architecture, the Job Tracker knows which data fragments reside on which node.

This is what lets the Job Tracker map each task onto the job queue of an appropriate Task Tracker.
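
The locality information the scheduler relies on can be inspected from client code as well: each input split reports the hosts that store its block. A small sketch (the input path argument is a placeholder):

```java
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class PrintSplitLocations {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "split locations");
    FileInputFormat.addInputPath(job, new Path(args[0])); // placeholder input path
    // Compute the job's input splits and print the hosts that store each one;
    // this is the information the scheduler uses for data-local placement.
    List<InputSplit> splits = new TextInputFormat().getSplits(job);
    for (InputSplit split : splits) {
      System.out.printf("split of %d bytes on hosts: %s%n",
          split.getLength(), String.join(", ", split.getLocations()));
    }
  }
}
```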

 

For example, if node A contains data (x, y, z) and node B contains data (a, b, c), the Job Tracker schedules node B to perform Map or Reduce tasks on (a, b, c) and node A to perform Map or Reduce tasks on (x, y, z). Shipping the computation to the data in this way reduces data traffic and keeps the network from choking.
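
A toy sketch of that placement preference; the class and method here are hypothetical and exist only to illustrate the rule, not to mirror Hadoop's actual scheduler code:

```java
import java.util.List;
import java.util.Set;

final class LocalityAwarePlacement {
  // Hypothetical helper: prefer a node that already hosts the task's input
  // fragment (node B for (a,b,c), node A for (x,y,z) in the example above);
  // fall back to any available node only when no data-local node is free.
  static String chooseNode(Set<String> hostsWithData, List<String> availableNodes) {
    for (String node : availableNodes) {
      if (hostsWithData.contains(node)) {
        return node; // data-local: the input never crosses the network
      }
    }
    return availableNodes.get(0); // remote assignment as a last resort
  }
}
```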

 

* Each DataNode participating in MapReduce jobs runs a daemon called the Task Tracker, which executes the Map and Reduce tasks assigned to it by the Job Tracker

 
