Known Issues in YARN and YARN Queue Manager

Learn about the known issues in YARN and YARN Queue Manager, the impact or changes to the functionality, and the workarounds in Cloudera Runtime 7.1.9 SP1 CHF 6.

Known issues identified in Cloudera Runtime 7.1.9 SP1 CHF 6

There are no new known issues identified in this release.

Known issues identified before Cloudera Runtime 7.1.9 SP1 CHF 6

COMPX-14820: Delete Queue and its Children throws "Queue capacity was reduced to zero, but failed to delete queue."
When you perform the "Delete Queue and its Children" operation on a queue that has one or more siblings, the operation fails because of YARN constraints. If the queue on which you perform the operation is a leaf node, the operation succeeds.
Workaround: None.
COMPX-4644: Queue capacity rounding problem when configuration is initially set via YARN

When the Capacity Scheduler configuration is set through the YARN/Cloudera Manager configuration, capacity values can have multiple decimal places. This causes rounding and floating-point precision discrepancies in the UI when it validates that all sibling capacities add up to 100%. The values shown in the UI appear to add up to 100, but the validation still displays an error and does not allow you to save the capacities. In the backend, the total is calculated as, for example, 99.9999999991.

Workaround:
  • Create queues within the UI, or
  • Ensure that capacities configured through the Capacity Scheduler safety valve do not have more than one decimal place.
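The effect can be reproduced outside the UI. The following Python sketch is illustrative only: the capacity values and the helper names (has_at_most_one_decimal, siblings_sum_to_100) are hypothetical, and it does not claim to reflect how Queue Manager validates capacities internally. It shows that a binary floating-point sum of multi-decimal capacities is not guaranteed to equal exactly 100, and how capacity values could be checked for at most one decimal place before they are entered in the safety valve.

```python
from decimal import Decimal
from typing import Iterable


def has_at_most_one_decimal(capacity: str) -> bool:
    """Return True if the capacity string has no more than one decimal place."""
    # The exponent of a finite Decimal is minus the number of decimal places.
    return Decimal(capacity).as_tuple().exponent >= -1


def siblings_sum_to_100(capacities: Iterable[str]) -> bool:
    """Sum sibling capacities exactly, as decimals, and compare with 100."""
    return sum(Decimal(c) for c in capacities) == Decimal("100")


# Hypothetical sibling capacities, written as strings as in a config file.
raw = ["33.333", "33.333", "33.334"]

# Binary floating point cannot represent most decimal fractions exactly,
# so this sum is not guaranteed to be exactly 100.0.
print(sum(float(c) for c in raw))

# Exact decimal arithmetic and a one-decimal-place check.
print(siblings_sum_to_100(raw))
print([c for c in raw if not has_at_most_one_decimal(c)])
```

Values flagged by the last line are the ones to round to one decimal place, while keeping the sibling total at 100.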
20202 Database migration after enabling opt-in migration
When migrating from an H2 database to a PostgreSQL database in YARN Queue Manager after installation or upgrade, you might encounter an issue, but only if you followed this specific sequence:
  • New install or upgrade to CDP 7.1.9, forcing migration from H2 to PostgreSQL database.
  • Upgrade to CDP 7.1.9 CHF2, moving back to H2 database.
  • Upgrade to CDP 7.1.9 SP1 with valid PostgreSQL connection details in Queue Manager configurations.
To avoid any issues during the upgrade to version CDP 7.1.9 SP1, ensure that PostgreSQL connection details are removed from the YARN database configuration if you prefer to continue using the H2 database.
CDPD-56559: MapReduce jobs can intermittently fail during a rolling upgrade
During a rolling upgrade between CDP versions 7.1.8 and 7.1.9, MapReduce jobs might fail with the message RuntimeException: native snappy library not available. Although the native Snappy compression library is not loaded, a checkmark is displayed to indicate that the Snappy compression library is loading for NodeManagers that are pending upgrade. This causes the MapReduce jobs associated with those NodeManagers to fail. After the upgrade, the jobs work as expected. This issue affects only rolling upgrades from a version earlier than CDP 7.1.9 to a higher version.
Workaround: None.
COMPX-13177: QueueManager webapp requests fail with 'HTTP ERROR 400 java.net.ConnectException: Unsupported ciphersuite TLS_EDH_RSA_WITH_3DES_EDE_CBC_SHA'
Products:
  • Cloudera Manager for CDP Private Cloud Base
  • Cloudera Manager for CDP Public Cloud
Context:
  • CentOS 7.8 and Red Hat 7.8 operating systems, when FIPS support is enabled.
Problem:
  • When attempting to display the YARN Queue Manager interface, Cloudera Manager displays an error: "HTTP ERROR 400 java.net.ConnectException: Unsupported ciphersuite TLS_EDH_RSA_WITH_3DES_EDE_CBC_SHA".
Workaround:
  1. Edit the file /etc/default/cloudera-scm-server.
  2. Around line 28, modify the line that starts with
     #export CMF_OVERRIDE_TLS_CIPHERS=....
  3. Remove the comment mark #.
  4. Remove all ciphers with "3DES" in the name.
  5. Save the file.
  6. Restart the Cloudera Manager Server service.
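Step 4 above amounts to a simple filter over the cipher list. The Python sketch below illustrates it under stated assumptions: the separator is assumed to be a colon (check the actual value in your file and pass a different separator if needed), the cipher names in the example are hypothetical sample values, and drop_3des_ciphers is an illustrative helper, not a Cloudera tool. It only transforms a string and does not edit the file.

```python
def drop_3des_ciphers(cipher_list: str, separator: str = ":") -> str:
    """Return the cipher list without any entry whose name contains '3DES'."""
    kept = [
        cipher
        for cipher in cipher_list.split(separator)
        if cipher and "3DES" not in cipher.upper()
    ]
    return separator.join(kept)


# Hypothetical value of CMF_OVERRIDE_TLS_CIPHERS, for illustration only.
example = (
    "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256:"
    "TLS_EDH_RSA_WITH_3DES_EDE_CBC_SHA:"
    "TLS_RSA_WITH_AES_256_CBC_SHA"
)

print(drop_3des_ciphers(example))
```

The resulting string is what would follow export CMF_OVERRIDE_TLS_CIPHERS= once the comment mark is removed.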
COMPX-12021: Queue Manager configurations on Scheduler Configuration page are not working
When you set the following properties in the YARN Queue Manager UI, they are written to the capacity-scheduler.xml file, where they have no effect on YARN. These properties must be set in the yarn-site.xml file, which YARN Queue Manager does not do.
  • "Maximum Application Priority" – "yarn.cluster.max-application-priority"
  • "Enable Monitoring Policies" – "yarn.resourcemanager.scheduler.monitor.enable"
  • "Monitoring Policies" – "yarn.resourcemanager.scheduler.monitor.policies"
  • "Preemption: Observe Only" – "yarn.resourcemanager.monitor.capacity.preemption.observe_only"
  • "Preemption: Monitoring Interval (ms)" – "yarn.resourcemanager.monitor.capacity.preemption.monitoring_interval"
  • "Preemption: Maximum Wait Before Kill (ms)" – "yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill"
  • "Preemption: Total Resources Per Round" – "yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round"
  • "Preemption: Over Capacity Tolerance" – "yarn.resourcemanager.monitor.capacity.preemption.max_ignored_over_capacity"
  • "Preemption: Maximum Termination Factor" – "yarn.resourcemanager.monitor.capacity.preemption.natural_termination_factor"
  • "Enable Intra Queue Preemption" – "yarn.resourcemanager.monitor.capacity.preemption.intra-queue-preemption.enabled"
Workaround:
  1. In Cloudera Manager, select the YARN service.
  2. Click the Configuration tab.
  3. Search for yarn-site.xml.
  4. Under YARN Service Advanced Configuration Snippet (Safety Valve) for yarn-site.xml, add the corresponding parameter and value you need.
  5. Click Save Changes.
  6. Restart the YARN services.
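For step 4, the safety valve accepts properties in the standard Hadoop XML form. The following Python sketch generates such a snippet; the property names come from the list above, while the values (10 and true) are hypothetical examples and safety_valve_snippet is an illustrative helper, not part of Cloudera Manager.

```python
from xml.sax.saxutils import escape


def safety_valve_snippet(properties: dict) -> str:
    """Render name/value pairs as Hadoop-style <property> elements."""
    blocks = []
    for name, value in properties.items():
        blocks.append(
            "<property>\n"
            f"  <name>{escape(name)}</name>\n"
            f"  <value>{escape(str(value))}</value>\n"
            "</property>"
        )
    return "\n".join(blocks)


# Property names are taken from the list above; the values are examples only.
print(
    safety_valve_snippet(
        {
            "yarn.cluster.max-application-priority": 10,
            "yarn.resourcemanager.scheduler.monitor.enable": "true",
        }
    )
)
```

The printed text can be pasted into the XML view of the safety valve field before saving the changes.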
Third-party applications do not launch if MapReduce framework path is not included in the client configuration
The MapReduce application framework is loaded from HDFS instead of being present on the NodeManagers. By default, the mapreduce.application.framework.path property is set to the appropriate value, but third-party applications that use their own configurations do not launch.
Set the mapreduce.application.framework.path property to the appropriate configuration for third-party applications.
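To see whether a third-party application's own configuration carries the property, its client configuration file can be inspected. The Python sketch below is one possible way to do that, using only the standard library; the sample XML and the get_property helper are hypothetical, and the correct property value must be taken from your cluster's configuration.

```python
import xml.etree.ElementTree as ET

FRAMEWORK_PATH = "mapreduce.application.framework.path"


def get_property(config_xml: str, name: str):
    """Return the value of a Hadoop-style property, or None if it is absent."""
    root = ET.fromstring(config_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            return prop.findtext("value")
    return None


# Hypothetical client configuration shipped with a third-party application.
client_config = """<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>"""

if get_property(client_config, FRAMEWORK_PATH) is None:
    print(f"{FRAMEWORK_PATH} is missing from the client configuration")
```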
JobHistory URL mismatch after server relocation
After moving the JobHistory Server to a new host, the URLs listed for the JobHistory Server on the ResourceManager web UI still point to the old JobHistory Server. This affects existing jobs only. New jobs started after the move are not affected.
For existing jobs that have the incorrect JobHistory Server URL, there is no option other than to allow the jobs to roll off the history over time. For new jobs, make sure that all clients have the updated mapred-site.xml file that references the correct JobHistory Server.
YARN cannot start if Kerberos principal name is changed
If the Kerberos principal name is changed in Cloudera Manager after launch, YARN does not start. In such cases, the keytabs can be correctly generated but YARN cannot access ZooKeeper with the new Kerberos principal name and old ACLs.
There are two possible workarounds:
  • Delete the znode and restart the YARN service.
  • Use the reset ZK ACLs command. This also sets the znodes below /rmstore/ZKRMStateRoot to world:anyone:cdrwa, which is less secure.
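To make the security implication concrete, the ACL string can be decoded. The Python sketch below uses the standard ZooKeeper permission letters (c, d, r, w, a); describe_acl is an illustrative helper, not a ZooKeeper or Cloudera API.

```python
# Standard ZooKeeper permission letters and their meanings.
PERMISSIONS = {
    "c": "create",
    "d": "delete",
    "r": "read",
    "w": "write",
    "a": "admin",
}


def describe_acl(acl: str) -> dict:
    """Split a scheme:id:perms ACL string and expand the permission letters."""
    scheme, rest = acl.split(":", 1)
    identity, perms = rest.rsplit(":", 1)
    return {
        "scheme": scheme,
        "id": identity,
        "permissions": [PERMISSIONS[letter] for letter in perms],
    }


print(describe_acl("world:anyone:cdrwa"))
```

With the world scheme and the anyone ID, every client that can reach ZooKeeper holds all five permissions on those znodes.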
Queue Manager does not open when using a custom user with a default Kerberos principal
If a custom user is used with the default Kerberos principal, the Queue Manager web UI displays an HTTP ERROR 400 error.
Ensure that the Queue Manager process_username property matches the YARN process_username property.