Managing MapReduce
For an overview of computation frameworks, insight into their usage and restrictions, and examples of common tasks they perform, see Managing YARN (MRv2) and MapReduce (MRv1).
Configuring the MapReduce Scheduler
Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)
The MapReduce service is configured by default to use the Fair Scheduler. You can change the scheduler type to FIFO or Capacity Scheduler, and you can modify the Fair Scheduler and Capacity Scheduler configuration. For further information on schedulers, see YARN (MRv2) and MapReduce (MRv1) Schedulers.
Configuring the Task Scheduler Type
- Go to the MapReduce service.
- Click the Configuration tab.
- Select .
- Select .
- In the Task Scheduler property, select a scheduler.
To apply this configuration property to other role groups as needed, edit the value for the appropriate role group. See Modifying Configuration Properties Using Cloudera Manager.
- Click Save Changes to commit the changes.
- Restart the JobTracker to apply the new configuration:
- Click the Instances tab.
- Click the JobTracker role.
- Select .
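For readers who want to see what the Task Scheduler choice roughly corresponds to in the underlying Hadoop configuration, the following is a minimal sketch. It assumes the MRv1 property name mapred.jobtracker.taskScheduler and the scheduler class names listed in the code; neither appears in this page, so verify them against your Hadoop distribution. Cloudera Manager generates the real configuration for you, so this is illustrative only.

```python
import xml.etree.ElementTree as ET

# Assumption: MRv1 scheduler class names; confirm against your distribution.
SCHEDULER_CLASSES = {
    "FIFO": "org.apache.hadoop.mapred.JobQueueTaskScheduler",
    "Fair": "org.apache.hadoop.mapred.FairScheduler",
    "Capacity": "org.apache.hadoop.mapred.CapacityTaskScheduler",
}


def scheduler_property_xml(scheduler: str) -> str:
    """Render a <property> element that selects the given scheduler type."""
    prop = ET.Element("property")
    # Assumption: the MRv1 property that names the JobTracker's scheduler class.
    ET.SubElement(prop, "name").text = "mapred.jobtracker.taskScheduler"
    ET.SubElement(prop, "value").text = SCHEDULER_CLASSES[scheduler]
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    print(scheduler_property_xml("Capacity"))
```

An unknown scheduler name raises KeyError, which keeps a typo from silently producing an invalid property.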
Modifying the Scheduler Configuration
- Go to the MapReduce service.
- Click the Configuration tab.
- Select .
- Select .
- Modify the configuration properties.
To apply this configuration property to other role groups as needed, edit the value for the appropriate role group. See Modifying Configuration Properties Using Cloudera Manager.
- Click Save Changes to commit the changes.
- Restart the JobTracker to apply the new configuration:
- Click the Instances tab.
- Click the JobTracker role.
- Select .
Configuring the MapReduce Service to Save Job History
Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)
Normally, job history is saved on the host on which the JobTracker is running. You can configure the JobTracker to write information about every completed job to a specified HDFS location. By default, the information is retained for 7 days.
Enabling MapReduce Job History to Be Saved to HDFS
- Create a folder in HDFS to contain the history information. When creating the folder, set the owner and group to mapred:hadoop and the permissions to 775.
- Go to the MapReduce service.
- Click the Configuration tab.
- Select .
- Select .
- Set the Completed Job History Location property to the location that you created in step 1.
To apply this configuration property to other role groups as needed, edit the value for the appropriate role group. See Modifying Configuration Properties Using Cloudera Manager.
- Click Save Changes.
- Restart the MapReduce service.
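The folder creation in the first step of this procedure can be sketched as a short script. This is a minimal sketch that only builds and prints the commands; it assumes the hdfs dfs command-line client is available (on some installations the equivalent is hadoop fs), and the path /user/history/done is a hypothetical placeholder, not a location named in this page.

```python
import shlex
from typing import List


def history_dir_commands(path: str) -> List[List[str]]:
    """Build the HDFS shell commands that create and secure the history folder."""
    return [
        ["hdfs", "dfs", "-mkdir", "-p", path],             # create the folder
        ["hdfs", "dfs", "-chown", "mapred:hadoop", path],  # owner and group
        ["hdfs", "dfs", "-chmod", "775", path],            # permission setting
    ]


if __name__ == "__main__":
    # Hypothetical location; substitute the folder you intend to use.
    for argv in history_dir_commands("/user/history/done"):
        print(shlex.join(argv))
```

To execute the commands instead of printing them, each list could be passed to subprocess.run(argv, check=True) by a user with permission to change ownership in HDFS.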
Setting the Job History Retention Duration
- Select the JobTracker Default Group category.
- Set the Job History Files Maximum Age property (mapreduce.jobhistory.max-age-ms) to the length of time (in milliseconds, seconds, minutes, or hours) that you want job history files to be kept.
- Restart the MapReduce service.
- Select the JobTracker Default Group category.
- Set the Job History Files Cleaner Interval property (mapreduce.jobhistory.cleaner.interval) to the interval (in milliseconds, seconds, minutes, or hours) at which you want old job history files to be cleaned up.
- Restart the MapReduce service.
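Because both properties accept a length of time, it can help to check the millisecond equivalent of a value before entering it. The following is a small sketch of that arithmetic; the helper name to_millis is hypothetical.

```python
from datetime import timedelta


def to_millis(**duration) -> int:
    """Convert a duration given as timedelta keyword arguments to milliseconds."""
    # Dividing one timedelta by another with // yields an exact integer.
    return timedelta(**duration) // timedelta(milliseconds=1)


if __name__ == "__main__":
    print(to_millis(days=7))    # the default retention: 604800000
    print(to_millis(hours=12))  # 43200000
```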
Configuring Client Overrides
A configuration property qualified with (Client Override) is a server-side setting that takes precedence over any value a client tries to set for that property. It performs the same role as its unqualified counterpart, but is applied to the service with the setting <final>true</final>, so clients cannot change it.
For example, if you set the Map task heap property to 1 GB in the job configuration code, but the service's heap property qualified with (Client Override) is set to 500 MB, then 500 MB is applied.
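The precedence rule in this example can be modeled in a few lines. This is a simplified sketch of how a final server-side value wins over a client value, not Hadoop's actual implementation; the property name example.map.task.heap and the ServerProperty type are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ServerProperty:
    value: str
    final: bool = False  # models <final>true</final> on the server side


def effective_config(server: Dict[str, ServerProperty],
                     client: Dict[str, str]) -> Dict[str, str]:
    """Merge client settings over server settings, skipping final properties."""
    resolved = {name: prop.value for name, prop in server.items()}
    final_names = {name for name, prop in server.items() if prop.final}
    for name, value in client.items():
        if name not in final_names:  # client values cannot replace final ones
            resolved[name] = value
    return resolved


if __name__ == "__main__":
    # Hypothetical property name standing in for the Map task heap setting.
    server = {"example.map.task.heap": ServerProperty("500m", final=True)}
    client = {"example.map.task.heap": "1g"}
    print(effective_config(server, client))  # {'example.map.task.heap': '500m'}
```

With final=False on the server property, the same call would return the client's 1g value, which is the behavior of the unqualified counterpart.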