MapReduce Job Execution
* A MapReduce job is specified by a Map program and a Reduce program, together with the input data set the job will process (a minimal driver sketch is given below, after this list)
* A master daemon called the JobTracker runs continuously on the master node, alongside the NameNode, and tracks the progress of every MapReduce job from submission to completion (a client-side view of this tracking is sketched below)
* Hadoop moves the Map and Reduce computation logic to the DataNodes that host fragments of the data, rather than moving the data to the computation
* Communication and resource negotiation between the nodes is handled by YARN, Hadoop's native resource manager (introduced in Hadoop 2.x as the successor to the JobTracker/TaskTracker model)
* The master machine (NameNode) is fully aware of the data stored on each of the worker machines (DataNodes)
* The JobTracker schedules Map and Reduce tasks to TaskTrackers with full awareness of the data location. Because the TaskTrackers in this hierarchical monitoring architecture report which data blocks reside locally on their nodes, the JobTracker can map each task to the queue of a TaskTracker that already holds the required data.
For example, if node A holds data (x, y, z) and node B holds data (a, b, c), the JobTracker schedules node B to perform Map or Reduce tasks on (a, b, c) and node A to perform Map or Reduce tasks on (x, y, z). This reduces data traffic over the network and prevents it from choking (the split-location sketch below shows how this locality information is exposed to the scheduler).
* Each DataNode taking part in MapReduce jobs runs a worker daemon called the TaskTracker, which executes the tasks assigned to it and reports their progress back to the JobTracker
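
To make the job-specification bullet concrete, here is a minimal sketch of a word-count job written against the standard org.apache.hadoop.mapreduce Java API. The class names (WordCountJob, WordMapper, WordReducer) and the command-line input/output paths are illustrative placeholders, not anything mandated by Hadoop.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCountJob {

        // Map program: emits (word, 1) for every word in its input fragment.
        public static class WordMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                for (String token : value.toString().split("\\s+")) {
                    if (!token.isEmpty()) {
                        word.set(token);
                        context.write(word, ONE);
                    }
                }
            }
        }

        // Reduce program: sums the counts emitted for each word.
        public static class WordReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values,
                    Context context) throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCountJob.class);
            job.setMapperClass(WordMapper.class);
            job.setReducerClass(WordReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            // The data set associated with the job: placeholder paths.
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            // Submits the job and blocks, printing progress as it is reported.
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }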
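
Where the post says the JobTracker tracks a job from beginning to completion, a client can observe the same progress by polling the Job handle instead of blocking on waitForCompletion. A sketch, assuming the Job has already been configured as in the driver above:

    import org.apache.hadoop.mapreduce.Job;

    public class ProgressWatcher {
        // Submits the job asynchronously, then polls its status the way the
        // tracker's web UI does, printing map and reduce completion
        // percentages until the job finishes.
        public static boolean watch(Job job) throws Exception {
            job.submit();
            while (!job.isComplete()) {
                System.out.printf("map %3.0f%%  reduce %3.0f%%%n",
                        job.mapProgress() * 100, job.reduceProgress() * 100);
                Thread.sleep(5000);
            }
            return job.isSuccessful();
        }
    }

Calling ProgressWatcher.watch(job) in place of job.waitForCompletion(true) in the driver above gives the same result with explicit polling.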
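
Finally, the data-locality scheduling described in the node A / node B example rests on the fact that every input split advertises the hosts that hold its data. A small sketch, again assuming the Hadoop 2.x Java API, that prints this information for a given input path (the path argument is a placeholder):

    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    public class SplitLocations {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "split inspector");
            FileInputFormat.addInputPath(job, new Path(args[0]));
            // Ask the input format for the same splits the scheduler sees.
            List<InputSplit> splits = new TextInputFormat().getSplits(job);
            for (InputSplit split : splits) {
                // getLocations() names the DataNodes hosting this fragment;
                // the scheduler prefers to run the Map task on one of them.
                System.out.println(split + " -> "
                        + String.join(", ", split.getLocations()));
            }
        }
    }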