Behavior changes
This release of the Cloudera Data Warehouse service on Cloudera on cloud has the following behavior changes:
Summary: Using 'skopeo' to copy Cloudera images to a custom ECR repository
For security reasons, Cloudera images must retain their hash value when they are moved between repositories. Ensure that you copy the images in a way that preserves the image manifests and hash value (SHA). If the image SHA in the custom ECR repository differs from the SHA in the Cloudera-hosted repository, you might encounter issues when activating Cloudera Data Warehouse clusters.
Before this release: You could copy images using any method and did not encounter issues while activating Cloudera Data Warehouse environments.
After this release: You must use a third-party tool, such as 'skopeo', that preserves the image metadata when copying images between repositories. For more information, see Copying images to custom ECR repository.
Summary: Ability to select a compute instance type during environment activation
Before this release: You could select a compute instance type only when using the CDP CLI to activate an environment in Cloudera Data Warehouse. The option to select the instance type through the UI was removed.
After this release: You can no longer select a compute instance type when you use the CDP CLI to activate an environment in Cloudera Data Warehouse.
Summary: View historical data in the Impala Autoscaling dashboard
Before this release: In the Impala Autoscaling Dashboard, you could view autoscaler metrics data for the past hour only and use a time window slider to zoom in on a specific period within that one-hour window.
After this release: The Impala Autoscaling Dashboard now lets you view historical data along with live data. You can choose the Historic Data option and specify the start and end timestamps for which you want to view autoscaler metrics data. Note that this feature is currently available only for AWS environments.
Summary: Specify a file size threshold limit for Iceberg data compaction
Before this release: The Impala OPTIMIZE TABLE <table_name> statement rewrote all files in the table, regardless of size or type, even when there were no small files or delete files.
After this release: The Impala OPTIMIZE TABLE <table_name> statement includes a FILE_SIZE_THRESHOLD_MB option that lets you specify the maximum size of files (in MB) that should be considered for compaction.
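As a rough illustration of the new option, the statement below limits compaction to files up to a size threshold. The table name sales_db.orders_ice and the 100 MB value are hypothetical, and the parenthesized option form is an assumption based on the Apache Impala documentation; confirm the exact syntax against the Impala reference for your release.

```sql
-- Hypothetical Iceberg table and threshold value.
-- Assumed syntax: only files up to 100 MB are considered for compaction.
OPTIMIZE TABLE sales_db.orders_ice (FILE_SIZE_THRESHOLD_MB=100);
```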
Summary: Removal of "docker" image registry type
Before this release: While activating an environment, you could choose one of the following image repositories in the Registry Type option: ecr, acr, or docker.
After this release: The "docker" custom image registry type is no longer supported in Cloudera Data Warehouse, and the option to choose the "docker" registry type during environment activation is removed. You can choose either a custom ACR or a custom ECR image repository.
Summary: Consistent protocol version values in workload management tables
Before this release: The sys.impala_query_log table stored a full protocol name such as "HIVE_CLI_SERVICE_PROTOCOL_V6" in the hiveserver2_protocol_version column. In contrast, the sys.impala_query_live table and query profiles used a shorter value such as "V6".
After this release: The hiveserver2_protocol_version column in the sys.impala_query_log table now uses the same short string (for example, "V6") as the sys.impala_query_live table and query profiles.
Restore previous behavior:
This SQL statement replicates the behavior before this release, where sys.impala_query_log stored a value of "HIVE_CLI_SERVICE_PROTOCOL_V6":
SELECT CASE hiveserver2_protocol_version
         WHEN 'V6' THEN 'HIVE_CLI_SERVICE_PROTOCOL_V6'
         ELSE hiveserver2_protocol_version
       END AS hiveserver2_protocol_version
FROM sys.impala_query_log;
Summary: Disabling join disjunctive predicate pushdown
Before this release: With hive.optimize.join.disjunctive.transitive.predicates.pushdown enabled by default, queries with disjunctive predicates could cause HiveServer2 to crash or run out of memory during compilation.
After this release: The hive.optimize.join.disjunctive.transitive.predicates.pushdown setting is now disabled by default, which improves HiveServer2 stability and prevents these crashes and out-of-memory errors. In rare cases, queries with joins and unions might become slightly less efficient, but the difference should not be noticeable to end users.
Apache Jira: HIVE-28310
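If a specific workload relied on the earlier optimization, one possible approach is to re-enable the property for a single session, as sketched below. This assumes that your Hive Virtual Warehouse permits overriding the property at session level, which this note does not confirm. Re-enabling it also brings back the compilation risk described above.

```sql
-- Assumption: the property can be overridden per session in your deployment.
-- Restores the pre-release default for the current session only.
SET hive.optimize.join.disjunctive.transitive.predicates.pushdown=true;
```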
Summary: Hive CBO fallback strategy configuration
Before this release: The hive.cbo.fallback.strategy property was set to CONSERVATIVE by default. If an error occurred during the cost-based optimizer phase, Hive would fall back to the legacy optimizer, potentially reducing optimization efficiency and masking serious or unrecoverable errors.
After this release: The default value for hive.cbo.fallback.strategy is now NEVER. Hive no longer falls back to the legacy optimizer, and cost-based optimizer errors are fatal. Previously hidden compilation errors now surface immediately, and you must take additional action to compile and execute the affected query successfully.
Apache Jira: HIVE-27831
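One possible action, not prescribed by this note, is to restore the earlier fallback behavior for a session while you investigate a failing query. The sketch below assumes that the property can be overridden at session level in your deployment. Note that it only hides the underlying cost-based optimizer error again.

```sql
-- Assumption: session-level override is permitted in your deployment.
-- Restores the pre-release default so Hive can fall back to the legacy optimizer.
SET hive.cbo.fallback.strategy=CONSERVATIVE;
```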