Behavior changes in Cloudera Data Warehouse on premises 1.5.5 SP1

Summary: Changes to Unified Analytics availability in Cloudera Data Warehouse on premise

Before this release: When creating a new Impala Virtual Warehouse, you could select the Enable Unified Analytics option.

After this release: The Enable Unified Analytics option is no longer available when creating a new Impala Virtual Warehouse. However, you can continue to create and manage Impala Unified Analytics virtual warehouses using the Cloudera CLI.

Summary: Enhanced log router resource allocation in Cloudera Data Warehouse on premise

Before this release: The resource quota of the log router component was fixed and static, failing to scale automatically with the cluster size. As clusters grew with additional nodes, the allocated quota became insufficient for the log router, which operates as a DaemonSet across all nodes. This resource deficit often caused the log router pods to remain in a Pending state and required manual quota adjustments to address the issue.

After this release: The resource quota calculation for the log router component is now improved. Cloudera Data Warehouse on premises now dynamically calculates and requests the appropriate quota by multiplying the resource request of a single log router pod by the current number of nodes it operates on. This ensures the log router consistently receives sufficient resources, eliminating the need for manual adjustments as the cluster scales.

Summary: Automatic synchronization of group name conversion between Cloudera Base on premises Ranger and Cloudera Data Warehouse

Before this release: In Active Directory environments, user and group names are often written in mixed case, for example, JohnDoe or AdminGroup, and handled in a case-insensitive manner by Windows. However, Cloudera Base on premises operates in a Linux environment, in which names are case-sensitive. To address this, some customers configure Cloudera Base on premises to disable case sensitivity in System Security Service Daemon (sssd) and modify Ranger Usersync settings to convert user and group names to lowercase, ensuring compatibility with Ranger policies.

Configuration properties to set the Ranger Usersync settings

While this configuration worked correctly in Cloudera Base on premises, authorization issues could arise in Cloudera Data Warehouse components like Hive and Impala. Cloudera Data Warehouse did not automatically convert group names to lowercase, causing mismatches with Ranger policies that define group names in lowercase. This resulted in authorization problems, such as users being unable to access databases, tables, or columns in Hue or remote client shells, such as impala-shell or jdbc, even though access worked correctly in Cloudera Base on premises Hue or remote client shells.

To resolve this issue, a manual workaround was required to enable group name conversion to lowercase in Cloudera Data Warehouse. This involved adding specific Hadoop core-site configuration entries to the hadoop-core-site-default-warehouse configuration file. For Hive Virtual Warehouse, changes were applied to HiveServer2 and for Impala Virtual Warehouse, changes were applied to Impala Catalogd, Impala Coordinator, Impala Executor, and Impala StateStored.

Property Name	Value
`hadoop.security.group.mapping`	`org.apache.hadoop.security.RuleBasedLdapGroupsMapping`
`hadoop.security.group.mapping.ldap.conversion.rule`	`to_lower`

After this release: The Cloudera Data Services on premises 1.5.5 SP1 release introduces automatic synchronization of group name conversion settings between Cloudera Base on premises and Cloudera Data Warehouse. This ensures consistent handling of group names across environments.

Behavior remains unchanged if the manual Cloudera Data Warehouse workaround was previously applied, or if the Cloudera Base on premises Ranger Usersync was set to the default of none, meaning no case conversion.

Action required: If mixed-case group name policies were created specifically for Cloudera Data Warehouse as an undocumented solution, these policies must be reviewed and manually updated to match the case conversion settings of being either lowercase or uppercase in Cloudera Base on premises Ranger.

Summary: Support load-based routing in `impala-proxy`

Before this release: impala-proxy used a random selection policy to choose a coordinator to which to forward new OpenSession requests.

After this release: impala-proxy now uses load-based routing to distribute OpenSession requests across multiple active coordinators.

Summary: Cleanup subdirectories in truncate/insert overwrite if recursing listing is enabled

Before this release: Impala did not consistently delete files located in subdirectories of external tables during TRUNCATE and INSERT OVERWRITE operations, even when recursive listing was enabled. This led to leftover data in subdirectories after these operations, resulting in data corruption.

After this release:After this change, directories are also deleted in addition to (non-hidden) data files, with the exception of hidden and ignored directories. Now, setting DELETE_STATS_IN_TRUNCATE=false is no longer supported by default when truncating non-transactional tables; attempting this will result in an exception. If the old behavior is absolutely required, you can set the --truncate_external_tables_with_hms flag to false, but be aware that this will also reintroduce the bug that was fixed by this change.

Apache Impala: IMPALA-14189 IMPALA-14224