MapReduce Health Tests
MapReduce Failover Controllers Health
This is a MapReduce service-level health test that checks that all the Failover Controllers associated with this service are healthy and running. The test returns "Bad" health if any of the Failover Controllers that the service depends on is unhealthy or not running. Check the Failover Controllers' logs for more details. This test can be enabled or disabled using the Failover Controllers Healthy service-wide monitoring setting.
Short Name: Failover Controllers Health
Property Name | Description | Template Name | Default Value | Unit |
---|---|---|---|---|
Failover Controllers Healthy | Enables the health check that verifies that the failover controllers associated with this service are healthy and running. | failover_controllers_healthy_enabled | true | no unit |
MapReduce JobTracker Health
This is a MapReduce service-level health test that checks for an active, healthy JobTracker. The test returns "Bad" health if the service is running and an active JobTracker cannot be found. If an active JobTracker is found, the test checks the health of that JobTracker as well as the health of any standby JobTracker that is configured. A "Good" health result is returned only if both the active and standby JobTrackers are healthy. A failure of this health test may indicate stopped or unhealthy JobTracker roles, or it may indicate a problem with communication between the Cloudera Manager Service Monitor and the JobTrackers. When this test fails, check the status of the MapReduce service's JobTracker roles and look in the Cloudera Manager Service Monitor's log files for more information. This test can be enabled or disabled using the JobTracker Role Health Test MapReduce service-wide monitoring setting. The check for a healthy standby JobTracker can be enabled or disabled with the Standby JobTracker Health Test setting. In addition, the Active JobTracker Detection Window can be used to adjust the amount of time that the Cloudera Manager Service Monitor has to detect the active JobTracker before this health test fails, and the JobTracker Activation Startup Tolerance can be used to adjust the amount of time around JobTracker startup that the test allows for a JobTracker to be made active.
Short Name: JobTracker Health
Property Name | Description | Template Name | Default Value | Unit |
---|---|---|---|---|
Active JobTracker Detection Window | The tolerance window that will be used in MapReduce service tests that depend on detection of the active JobTracker. | mapreduce_active_jobtracker_detecton_window | 3 | MINUTES |
JobTracker Activation Startup Tolerance | The amount of time after JobTracker(s) start that the lack of an active JobTracker will be tolerated. This is intended to allow either the auto-failover daemon to make a JobTracker active, or a specifically issued failover command to take effect. This is an advanced option that does not often need to be changed. | mapreduce_jobtracker_activation_startup_tolerance | 180 | SECONDS |
JobTracker Role Health Test | When computing the overall MapReduce cluster health, consider the JobTracker's health. | mapreduce_jobtracker_health_enabled | true | no unit |
Standby JobTracker Health Test | When computing the overall cluster health, consider the health of the standby JobTracker. | mapreduce_standby_jobtrackers_health_enabled | true | no unit |
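
The decision rule described for this test can be illustrated with a minimal Python sketch. It is not Cloudera Manager code: the function name `jobtracker_health` and its parameters are hypothetical, the two timing settings (Active JobTracker Detection Window and JobTracker Activation Startup Tolerance) are deliberately ignored, and the function returns `None` wherever the description above does not say which result is reported.

```python
from typing import Optional, Sequence


def jobtracker_health(
    service_running: bool,
    active_health: Optional[str],
    standby_healths: Sequence[str],
    check_standby: bool = True,
) -> Optional[str]:
    """Hypothetical sketch of the JobTracker health rule described above.

    active_health is None when no active JobTracker was found.
    Returns "Bad", "Good", or None when the description does not
    specify the outcome. Timing windows are not modeled.
    """
    if not service_running:
        # The description only covers a running service.
        return None
    if active_health is None:
        # Service is running but no active JobTracker was found.
        return "Bad"
    # The standby check can be switched off (Standby JobTracker Health Test).
    considered = [active_health]
    if check_standby:
        considered.extend(standby_healths)
    if all(h == "Good" for h in considered):
        return "Good"
    # The description does not state the result for an unhealthy JobTracker.
    return None
```

For example, `jobtracker_health(True, None, [])` yields `"Bad"`, while `jobtracker_health(True, "Good", ["Good"])` yields `"Good"`.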
MapReduce TaskTracker Health
This is a MapReduce service-level health test that checks that enough of the TaskTrackers in the cluster are healthy. The test returns "Concerning" health if the number of healthy TaskTrackers falls below a warning threshold, expressed as a percentage of the total number of TaskTrackers. The test returns "Bad" health if the combined number of healthy and "Concerning" TaskTrackers falls below a critical threshold, expressed as a percentage of the total number of TaskTrackers. For example, if this test is configured with a warning threshold of 95% and a critical threshold of 90% for a cluster of 100 TaskTrackers, this test would return "Good" health if 95 or more TaskTrackers have good health. This test would return "Concerning" health if fewer than 95 TaskTrackers have good health but at least 90 TaskTrackers have either "Good" or "Concerning" health. If more than 10 TaskTrackers have bad health, this test would return "Bad" health. A failure of this health test indicates unhealthy TaskTrackers. Check the status of the individual TaskTrackers for more information. This test can be configured using the Healthy TaskTracker Monitoring Thresholds MapReduce service-wide monitoring setting.
Short Name: TaskTracker Health
Property Name | Description | Template Name | Default Value | Unit |
---|---|---|---|---|
Healthy TaskTracker Monitoring Thresholds | The health test thresholds of the overall TaskTracker health. The check returns "Concerning" health if the percentage of "Healthy" TaskTrackers falls below the warning threshold. The check returns "Bad" health if the total percentage of "Healthy" and "Concerning" TaskTrackers falls below the critical threshold. | mapreduce_tasktrackers_healthy_thresholds | critical:90.0, warning:95.0 | PERCENT |
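
The threshold arithmetic for this test can be sketched in a few lines of Python. This is an illustration only, not Cloudera Manager's implementation: the function name `tasktracker_health` and its parameters are hypothetical, and the sketch assumes that the critical check takes precedence over the warning check when both thresholds are crossed.

```python
def tasktracker_health(
    good: int,
    concerning: int,
    total: int,
    warning: float = 95.0,
    critical: float = 90.0,
) -> str:
    """Hypothetical sketch of the TaskTracker threshold rule.

    good / concerning: counts of TaskTrackers in each health state.
    total: total number of TaskTrackers in the cluster.
    warning / critical: thresholds in percent (defaults match the table).
    """
    if total <= 0:
        raise ValueError("total must be a positive number of TaskTrackers")
    good_pct = 100.0 * good / total
    good_or_concerning_pct = 100.0 * (good + concerning) / total
    # Assumption: the critical threshold is evaluated first.
    if good_or_concerning_pct < critical:
        return "Bad"
    if good_pct < warning:
        return "Concerning"
    return "Good"


# The 100-TaskTracker example from the description above:
print(tasktracker_health(good=95, concerning=0, total=100))
print(tasktracker_health(good=92, concerning=3, total=100))
print(tasktracker_health(good=85, concerning=4, total=100))
```

With the default thresholds, the three calls cover the cases in the description: 95 healthy TaskTrackers, fewer than 95 healthy but at least 90 healthy or "Concerning", and more than 10 with bad health.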