What's new in Cloudera Data Warehouse on premises 1.5.5 SP1
Review the new features introduced in this cumulative hotfix release of Cloudera Data Warehouse on premises 1.5.5 SP1.
Cloudera Data Warehouse on premises
- Deactivating environments
- The Cloudera Data Warehouse user interface now includes a Force Delete option for environments that are stuck in a deleting state. Administrators can use this option to remove such environments directly from the UI, bypassing the standard deletion process. For more information about removing environments, refer to Deactivation environment.
- Auto-scaling enhancements
- In this release, the auto-scaling behavior for Hive and Impala Virtual Warehouses is enhanced to ensure more reliable and efficient resource allocation. The system now proactively reserves the requested quota, that is, the resources needed for the maximum number of executor groups configured. This guarantees that during a scale-out event, new executor groups are created, and YuniKorn schedules them based on the resources available at that moment. For more information, refer to Quota management in Cloudera Data Warehouse on premises.
- Rebuild-restore method and its limitations
- The rebuild-restore method, which uses the cluster backup-restore functionality, has some limitations. To successfully restore a selected namespace, you must first delete Cloudera Data Visualizations, Impala, and Hive Virtual Warehouses. This requirement does not apply to Database Catalogs. For more information, refer to Using DRS with Cloudera Data Warehouse.
- Manual resource pool assignment at upgrade
- In this release, you can manually assign resource pools to existing Virtual Warehouses and Cloudera Data Visualizations after upgrading to 1.5.5 SP1. This functionality applies to entities that were not yet enabled for quota management. For more information, refer to Adding resource pools after Cloudera Data Warehouse 1.5.5 SP1 upgrade.
- Optimized node allocation for query executor pods
- In this release, you can enhance performance and resource utilization by dedicating nodes with local storage exclusively for Cloudera Data Warehouse query executor pods. This feature is disabled by default. When enabled, Hive and Impala executor and coordinator pods are scheduled exclusively on these specifically labeled worker nodes, ensuring that other data services do not utilize them.
- Cloudera Data Visualization upgrade feasibility and benefits
- Cloudera Data Visualization has been upgraded to version 8.x. The new version offers enhanced security by addressing numerous CVEs, and greater reliability due to the migration to Chainguard. It also introduces advanced capabilities, such as AI Visual features. For more details, refer to the Cloudera Data Visualization guide.
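The quota reservation described under "Auto-scaling enhancements" above can be illustrated with simple arithmetic. The following is a minimal sketch, assuming that the reserved quota scales linearly with the maximum number of executor groups. The class name, function name, and all sizes are hypothetical and chosen for illustration only; the actual calculation in Cloudera Data Warehouse may include coordinators and other components that are not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutorGroup:
    """Hypothetical sizing of one executor group (illustrative values only)."""
    executors: int
    cpu_per_executor: int
    memory_gib_per_executor: int


def reserved_quota(group: ExecutorGroup, max_groups: int) -> dict:
    """Return the quota to reserve up front for the maximum number of groups.

    Assumption: every executor group has the same size, so the reservation
    is the per-group footprint multiplied by the configured maximum.
    """
    if max_groups < 1:
        raise ValueError("max_groups must be at least 1")
    cpu = group.executors * group.cpu_per_executor * max_groups
    memory = group.executors * group.memory_gib_per_executor * max_groups
    return {"cpu_cores": cpu, "memory_gib": memory}


if __name__ == "__main__":
    # Two executors per group, 4 cores and 16 GiB each, up to 3 groups.
    group = ExecutorGroup(executors=2, cpu_per_executor=4, memory_gib_per_executor=16)
    print(reserved_quota(group, max_groups=3))
```

Reserving for the maximum up front means that the quota check happens once, when the Virtual Warehouse is sized, rather than at each scale-out event; placement of the new pods still depends on the resources available at that moment.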
What's new in Hive on Cloudera Data Warehouse on premises
- Common table expression detection and rewrites using cost-based optimizer
- Hive's existing shared work optimizer detects and optimizes common table expressions heuristically, but it lacks cost-based analysis and has limited customization. This release introduces new APIs and configuration options to support common table expression optimizations at the cost-based optimizer level. The feature is experimental and disabled by default.
Apache Jira: HIVE-28259
- Upgraded Avro to version 1.11.3
What's new in Impala on Cloudera Data Warehouse on premises
- OpenTelemetry integration for Impala
- Impala now has OpenTelemetry (OTel) support to help you monitor query performance and troubleshoot issues. This new feature, available in Cloudera Data Warehouse on premises 1.5.5 SP1, collects query telemetry data and exports it as OpenTelemetry traces to a central OpenTelemetry-compatible collector.
To enable this, you must upgrade your existing Impala Virtual Warehouses after the Cloudera Data Warehouse version upgrade. The integration is designed to have a minimal impact on performance because it uses data already being collected and handles the export in a separate process. For more information, see OpenTelemetry support for Impala.
Apache Jira: IMPALA-13234
- Enable global admission controller
- A new single admission controller service has been added to Impala to improve performance in multi-coordinator setups. It is now a separate service, which prevents its failure from affecting coordinators and executors. This feature is enabled by default for new Impala Virtual Warehouses running in High Availability (HA) Active-Active mode. If needed, you can disable it through the Cloudera web interface, but this action is permanent. For more information, see Impala admissiond and Configuring admission control.
What's new in Trino on Cloudera Data Warehouse on premises
- Introducing support for Trino Virtual Warehouses [Technical Preview]
- Cloudera Data Warehouse now supports the creation and management of Trino Virtual Warehouses. For information about creating a Trino Virtual Warehouse, see Adding a new Virtual Warehouse.
Trino is a distributed SQL query engine designed to efficiently query large datasets across one or more heterogeneous data sources. This integration enables users to leverage Trino's powerful capabilities directly within Cloudera Data Warehouse.
With this integration, you can configure and deploy Trino connectors to seamlessly connect to diverse remote data sources, access data, expose metadata, and manage data transfer to and from the remote sources. For more information, see Trino Federation Connectors.
Authorization for Trino is provided by Apache Ranger, by default through the cm_trinoauthorization service. You can create or update Ranger policies for specific resources and assign permissions to Trino users, groups, or roles. When a user submits a query to Trino, the system checks the defined policies to ensure that the user has the necessary permissions to run the query. For more information, see Ranger authorization for Trino Virtual Warehouses.
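The policy check described above can be pictured with a small conceptual model. The following is a minimal sketch, assuming allow-only policies with wildcard resource names and a deny-by-default outcome. It is not the Apache Ranger API and does not reproduce Ranger's actual evaluation logic; the Policy class, the is_allowed function, and the resource names are hypothetical.

```python
import fnmatch
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical allow policy: who may do what on which resource."""
    resource: str                      # pattern such as "hive.sales.*"
    permissions: frozenset             # for example {"select", "insert"}
    users: frozenset = frozenset()
    groups: frozenset = frozenset()


def is_allowed(policies, user, user_groups, resource, permission):
    """Return True if any policy grants the permission; deny otherwise."""
    for policy in policies:
        if not fnmatch.fnmatchcase(resource, policy.resource):
            continue  # policy does not cover this resource
        if permission not in policy.permissions:
            continue  # policy does not grant this permission
        if user in policy.users or policy.groups & set(user_groups):
            return True
    return False  # no matching policy: access is denied by default


if __name__ == "__main__":
    policies = [
        Policy(
            resource="hive.sales.*",
            permissions=frozenset({"select"}),
            groups=frozenset({"analysts"}),
        ),
    ]
    # A member of "analysts" may read, but not write, tables under hive.sales.
    print(is_allowed(policies, "alice", ["analysts"], "hive.sales.orders", "select"))
    print(is_allowed(policies, "alice", ["analysts"], "hive.sales.orders", "insert"))
```

In this model a query is rejected unless at least one policy matches the resource, the permission, and the user or one of the user's groups.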
What's new in Hue on Cloudera Data Warehouse on premises
- General availability of deploying a shared Hue service
- Cloudera Data Warehouse now supports the deployment of a shared Hue service, enabling cost-efficient management by ensuring that only the necessary Virtual Warehouses remain active. Organizations can enhance team isolation by running multiple shared Hue instances, providing flexibility and control. The shared Hue service remains available as long as the environment is active.
- New Hue storage browser (Technical Preview)
- The Hue Storage Browser is a web-based interface designed to provide seamless interaction with multiple file systems. With enhanced usability and functionality, the File Browser improves data management, offering a streamlined experience.
- Enhanced file extension controls for Hue file upload
- Earlier, Hue permitted uploading all file types to the configured filesystems, including unsupported extensions, which posed a security risk.
