Behavior changes

This release of the Cloudera Data Warehouse service on Cloudera on cloud has the following behavior changes:

Summary: Using 'skopeo' to copy Cloudera images to custom ECR repository

Due to security reasons, Cloudera images are sensitive and require their hash value to be retained when moving images between repositories. You have to ensure that the images are copied by retaining the image manifests and hash value (SHA). If the image SHA is different in the custom ECR repository as compared to the Cloudera hosted repository, you might encounter issues in activating the Cloudera Data Warehouse clusters.

Before this release: You can copy images in any preferred way and you do not notice any issues while activating Cloudera Data Warehouse environments.

After this release: You must use third-party tools, such as 'skopeo' to copy images between repositories by preserving the image metadata. For more information, see Copying images to custom ECR repository.

Summary: Ability to select a compute instance type during environment activation

Before this release: You could select a compute instance type only when using the CDP CLI to activate an environment in Cloudera Data Warehouse. The option to select the instance type through the UI was removed.

After this release: Starting with this release, you can no longer select a compute instance type when you use the CDP CLI to activate an environment in Cloudera Data Warehouse.

Summary: View historical data in the Impala Autoscaling dashboard

Before this release: In the Impala Autoscaling Dashboard, you could view autoscaler metrics data for the past one hour and use a time window slider to zoom into a specific time period within the recent one hour window.

After this release: The Impala Autoscaling Dashboard is enhanced to enable you to view historical data along with live data. You can now choose the Historic Data option and specify the start and end timestamps for which you want to view autoscaler metrics data. Note that this feature is currently available only for AWS environments.

Summary: Specify a file size threshold limit for Iceberg data compaction

Before this release: The Impala OPTIMIZE TABLE <table_name> statement rewrites all files in the table, regardless of size or type, even when there are no small or delete files.

After this release: The Impala OPTIMIZE TABLE <table_name> is enhanced to include a FILE_SIZE_THRESHOLD_MB option that enables you to specify the maximum size of files (in MB) that should be considered for compaction.

Summary: Removal of "docker" image registry type

Before this release: While activating an environment, you can choose the following image repositories — ecr, acr, or docker in the Registry Type option.

After this release: The "docker" custom image registry type is no longer supported in Cloudera Data Warehouse and the option to choose the "docker" registry type during environment activation is removed. You can either choose a custom ACR or ECR image repository.

Summary: Consistent protocol version values in workload management tables

Before this release: The sys.impala_query_log table stored a full protocol name such as "HIVE_CLI_SERVICE_PROTOCOL_V6" in the hiveserver2_protocol_version column. In contrast, the sys.impala_query_live table and query profiles used a shorter value such as "V6".

After this release: The hiveserver2_protocol_version column in the sys.impala_query_log table now uses the same short string (for example, "V6") as the sys.impala_query_live table and query profiles.

Restore previous behavior:

This SQL statement replicates the behavior before this release where sys.impala_query_log stored a value of "HIVE_CLI_SERVICE_PROTOCOL_V6":

SELECT CASE hiveserver2_protocol_version WHEN 'V6' THEN 'HIVE_CLI_SERVICE_PROTOCOL_V6'
ELSE hiveserver2_protocol_version END as hiveserver2_protocol_version FROM sys.impala_query_log

Summary: Disabling join disjunctive predicate pushdown

Before this release: With hive.optimize.join.disjunctive.transitive.predicates.pushdown enabled by default, queries with disjunctive predicates could cause HiveServer2 to crash or run out of memory during compilation.

After this release: The hive.optimize.join.disjunctive.transitive.predicates.pushdown setting is now disabled by default, enhancing HiveServer2 stability and preventing crashes and out-of-memory errors. In some rare cases, queries with joins and unions become slightly less efficient but the difference should not be noticeable by the end-users.

Apache Jira: HIVE-28310

Summary: Hive CBO fallback strategy configuration

Before this release: The hive.cbo.fallback.strategy property was set to CONSERVATIVE by default. In case of an error during the cost-based optimizer phase, Hive would fallback to the legacy optimizer, potentially reducing optimization efficiency and masking serious or unrecoverable errors.

After this release: The default value for hive.cbo.fallback.strategy is now set to NEVER. Hive no longer falls back to the legacy optimizer and cost-based optimizer errors are fatal. Hidden compilation errors will now show up immediately and additional actions are required to compile and execute the query successfully.

Apache Jira: HIVE-27831