Iceberg-related known issues in Cloudera Data Warehouse on premises

This topic describes the Iceberg-related known issues in Cloudera Data Warehouse on premises.

Known issues identified in 1.5.5

No new known issues identified in 1.5.5.

Known issues identified in 1.5.4 SP1

Hive compaction of Iceberg tables results in a failure
When Cloudera Data Warehouse and Cloudera Base on premises are deployed in the same environment and use the same Hive Metastore (HMS) instance, the Cloudera Base on premises compaction workers can inadvertently pick up Iceberg compaction tasks. Since Iceberg compaction is not yet supported in the latest Cloudera Base on premises version, the compaction tasks will fail when they are processed by the Cloudera compaction workers.

If both Cloudera Data Warehouse and Cloudera Base on premises share the same HMS instance and you need to run both Hive ACID and Iceberg compaction jobs, it is recommended that you use the Cloudera Data Warehouse environment for these jobs. If you need to run only Hive ACID compaction tasks, you can use either the Cloudera Data Warehouse or the Cloudera Base on premises environment.

If you want to run the compaction jobs without changing the environment, it is recommended that you use Cloudera Data Warehouse. To avoid interference from Cloudera Base on premises, change the value of the hive.compactor.worker.threads Hive Server (HS2) property to '0' on the base cluster. This ensures that Cloudera Base on premises does not process the compaction jobs:
  1. In Cloudera Manager, click Clusters > Hive > Configuration to navigate to the configuration page for HMS.
  2. Search for hive.compactor.worker.threads and modify the value to '0'.
  3. Save the changes and restart the Hive service.
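For reference, the change made in the steps above corresponds to the following hive-site.xml fragment (shown only to illustrate the effective setting; make the change through Cloudera Manager as described, rather than editing the file directly):

```xml
<!-- Setting the worker thread count to 0 prevents this cluster's Hive
     service from picking up any compaction jobs. -->
<property>
  <name>hive.compactor.worker.threads</name>
  <value>0</value>
</property>
```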
DWX-19489: Concurrent Hive-Iceberg UPDATE/INSERT query fails
Concurrent UPDATE/INSERT queries on Hive Virtual Warehouses might fail intermittently with the following error:
return code 40000 from org.apache.hadoop.hive.ql.exec.MoveTask. Error committing job
As a workaround, run the failed queries again.
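Because the failure is intermittent, re-running the statement usually succeeds. The helper below is a minimal retry sketch, not part of the product: the `run_query` callable, the attempt count, and the delay are all assumptions, and in practice you would pass in your own JDBC or PyHive executor and catch the driver's specific error type.

```python
import time

def retry_on_commit_error(run_query, sql, attempts=3, delay_seconds=5):
    """Re-run a statement that fails intermittently with the
    'Error committing job' (return code 40000) MoveTask error.

    run_query: a caller-supplied callable that executes one SQL statement
    (hypothetical here; e.g. a PyHive or JDBC cursor wrapper).
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return run_query(sql)
        except Exception as err:  # ideally the driver's specific error class
            last_error = err
            if "Error committing job" not in str(err):
                raise  # unrelated failure: do not retry
            if attempt < attempts:
                time.sleep(delay_seconds)
    raise last_error
```

For example, `retry_on_commit_error(execute, "UPDATE t SET c = 1 WHERE id = 5")` re-runs the UPDATE up to three times before giving up.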

Known issues identified in 1.5.4

No new known issues identified in 1.5.4.

Known issues identified in 1.5.2

CDPD-59413: Unable to view Iceberg table metadata in Atlas
You may see the following exception in the Atlas application logs when you create an Iceberg table from the Cloudera Data Warehouse data service associated with a Cloudera Base on premises 7.1.8 or 7.1.7 SP2 cluster: Type ENTITY with name iceberg_table does not exist. This happens because the Atlas server on Cloudera Base on premises 7.1.8 and 7.1.7 SP2 does not contain the functionality required to support Iceberg tables. This does not affect creating, querying, or modifying Iceberg tables using Cloudera Data Warehouse, nor does it affect creating policies in Ranger.

On Cloudera Base on premises 7.1.9, Iceberg table entities are not created in Atlas. You can ignore the following error appearing in the Atlas application logs:
ERROR - [NotificationHookConsumer thread-1:] ~ graph rollback due to exception (GraphTransactionInterceptor:200) org.apache.atlas.exception.AtlasBaseException: invalid relationshipDef: hive_table_storagedesc: end type 1: hive_storagedesc, end type 2: iceberg_table

If you are on Cloudera Base on premises 7.1.7 SP2 or 7.1.8, then you can manually upload the Iceberg model file 1130-iceberg_table_model.json into the /opt/cloudera/parcels/CDH/lib/atlas/models/1000-Hadoop directory as follows:
  1. SSH into the Atlas server host as an Administrator.
  2. Change directory to the following:
    cd /opt/cloudera/parcels/CDH/lib/atlas/models/1000-Hadoop
  3. Create a file called 1130-iceberg_table_model.json with the following content:
    {
      "enumDefs": [],
      "structDefs": [],
      "classificationDefs": [],
      "entityDefs": [
        {
          "name": "iceberg_table",
          "superTypes": [
            "hive_table"
          ],
          "serviceType": "hive",
          "typeVersion": "1.0",
          "attributeDefs": [
            {
              "name": "partitionSpec",
              "typeName": "array<string>",
              "cardinality": "SET",
              "isIndexable": false,
              "isOptional": true,
              "isUnique": false
            }
          ]
        },
        {
          "name": "iceberg_column",
          "superTypes": [
            "hive_column"
          ],
          "serviceType": "hive",
          "typeVersion": "1.0"
        }
      ],
      "relationshipDefs": [
        {
          "name": "iceberg_table_columns",
          "serviceType": "hive",
          "typeVersion": "1.0",
          "relationshipCategory": "COMPOSITION",
          "relationshipLabel": "__iceberg_table.columns",
          "endDef1": {
            "type": "iceberg_table",
            "name": "columns",
            "isContainer": true,
            "cardinality": "SET",
            "isLegacyAttribute": true
          },
          "endDef2": {
            "type": "iceberg_column",
            "name": "table",
            "isContainer": false,
            "cardinality": "SINGLE",
            "isLegacyAttribute": true
          },
          "propagateTags": "NONE"
        }
      ]
    }
  4. Save the file and exit.
  5. Restart the Atlas service using Cloudera Manager.
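Before restarting Atlas in step 5, it can help to confirm that the file from step 3 is well-formed JSON and contains the expected type definitions, since a malformed model file can prevent the types from loading. The check below is a minimal sketch, not a Cloudera tool; the path matches the directory from step 2, and the validation logic is an assumption.

```python
import json

MODEL_PATH = "/opt/cloudera/parcels/CDH/lib/atlas/models/1000-Hadoop/1130-iceberg_table_model.json"

def check_iceberg_model(path=MODEL_PATH):
    """Parse the model file and verify it defines the Iceberg entity and
    relationship types described in step 3."""
    with open(path) as f:
        model = json.load(f)  # raises ValueError if the JSON is malformed
    entity_names = {e["name"] for e in model.get("entityDefs", [])}
    relationship_names = {r["name"] for r in model.get("relationshipDefs", [])}
    assert {"iceberg_table", "iceberg_column"} <= entity_names, "entityDefs incomplete"
    assert "iceberg_table_columns" in relationship_names, "relationshipDefs incomplete"
    return model
```

Running `check_iceberg_model()` on the Atlas host raises an error if the file was saved incorrectly and returns the parsed model otherwise.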

Technical Service Bulletins

TSB 2024-745: Impala returns incorrect results for Iceberg V2 tables when optimized operator is being used in Cloudera Data Warehouse
Cloudera Data Warehouse customers using Apache Impala (Impala) to read Apache Iceberg (Iceberg) V2 tables can encounter an issue of Impala returning incorrect results when the optimized V2 operator is used. The optimized V2 operator is enabled by default in the affected versions below. The issue only affects Iceberg V2 tables that have position delete files.
Knowledge article

For the latest update on this issue see the corresponding Knowledge Article: TSB 2024-745: Impala returns incorrect results for Iceberg V2 tables when optimized operator is being used in Cloudera Data Warehouse.