Known issues in Cloudera Data Warehouse on premises 1.5.5 SP1

Review the known issues and limitations that you might run into while using the Cloudera Data Warehouse service in Cloudera Private Cloud Data Services.

Known issues identified in 1.5.5 SP1

DWX-21588: CLI timeout issues for the create-backup command
Due to network latency or cluster slowness, the create-backup command might exceed the default 60-second CLI timeout. This causes the CLI to retry the command, which might fail because of an existing Hue or Cloudera Data Visualization backup job on the cluster with the same name.
To prevent the timeout issue, set the --cli-read-timeout option to 0 in your command, which disables the CLI timeout.

However, as the API has a hard limit of 200 seconds, the command will still return a timeout error in response if the hard limit is exceeded. Despite the error, the backup will run and complete successfully in the background. You can get the Backup CRN from the cdp dw list-backups command response.
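The workaround can be sketched as follows; the backup name and any options other than --cli-read-timeout are hypothetical examples and may differ in your CDP CLI version:

```shell
# Disable the CLI read timeout (0 = no timeout) so the CLI does not
# retry create-backup while a backup job is still running.
# The --backup-name value is a hypothetical example.
cdp dw create-backup \
  --backup-name example-hue-backup \
  --cli-read-timeout 0

# Even if the API hits its 200-second hard limit and returns a timeout
# error, the backup completes in the background; get its CRN here:
cdp dw list-backups
```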

DWX-20754: Invalid column reference in lateral view queries
The virtual column BLOCK__OFFSET__INSIDE__FILE fails to be correctly referenced in queries that use lateral views, resulting in the following error:

FAILED: SemanticException Line 0:-1 Invalid column reference 'BLOCK__OFFSET__INSIDE__FILE'
To resolve this issue, do one of the following:
  1. Set the configuration property hive.cbo.fallback.strategy to CONSERVATIVE for the specific query that contains the lateral view.
  2. If you keep the NEVER fallback strategy, specify the column names explicitly in the SELECT statement instead of using SELECT * in the subquery that involves the lateral view.
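The two workarounds can be sketched as follows; the table, array column, and alias names are hypothetical examples:

```shell
# Workaround 1: use the CONSERVATIVE fallback strategy for this query.
beeline -e "
SET hive.cbo.fallback.strategy=CONSERVATIVE;
SELECT BLOCK__OFFSET__INSIDE__FILE, item
FROM example_table LATERAL VIEW explode(items) lv AS item;
"

# Workaround 2: keep the NEVER fallback strategy, but name the columns
# explicitly in the subquery instead of using SELECT *.
beeline -e "
SET hive.cbo.fallback.strategy=NEVER;
SELECT sub.off, sub.item
FROM (
  SELECT BLOCK__OFFSET__INSIDE__FILE AS off, item
  FROM example_table LATERAL VIEW explode(items) lv AS item
) sub;
"
```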
DWX-20491: Impala queries fail with "EOFException: End of file reached before reading fully"
Impala queries fail with an EOFException when reading a file stored in an S3A location after the file has been removed. If the file was removed using SQL commands such as DROP PARTITION, there may be a significant lag in Hive Metastore event processing before Impala's metadata catches up. If the file was removed by a non-SQL operation, run REFRESH <table> or INVALIDATE METADATA <table> on the table to resolve the issue.
DWX-20490: Impala queries fail with “Caught exception The read operation timed out, type=<class 'socket.timeout'> in ExecuteStatement”
Queries run from impala-shell fail with a socket timeout error in the ExecuteStatement call that submits the query to the coordinator. The error occurs when query execution takes longer to start, typically when query planning is slow due to frequent metadata changes.
Increase the socket timeout on the client side by setting --client_connect_timeout_ms to a higher value; for example, add --client_connect_timeout_ms=600000 to the impala-shell command line.
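For example, assuming a hypothetical coordinator host name and query:

```shell
# Raise the client-side socket timeout to 600000 ms (10 minutes).
# The host name and query are hypothetical examples.
impala-shell --client_connect_timeout_ms=600000 \
  -i coordinator.example.com \
  -q "SELECT count(*) FROM example_table"
```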
CDPD-76644: information_schema.table_privileges metadata is unsupported
Querying the information_schema.table_privileges access control metadata for Ranger is unsupported, and a TrinoException is displayed indicating that the connector does not support table privileges.
None.
CDPD-76643/CDPD-76645: SET AUTHORIZATION SQL statement does not modify Ranger permissions
The following SQL statements do not dynamically modify the Ranger permissions:
CREATE SCHEMA test_createschema_authorization_user AUTHORIZATION user;
ALTER SCHEMA test_schema_authorization_user SET AUTHORIZATION user;
As an Administrator, you can grant the required permissions from the Ranger Admin UI.
CDPD-68246: Roles related operations are not authorized by Ranger Trino plugin
When you have a row filter policy for the same resource and the same user in both the cm_trino and cm_hive (Hadoop SQL) repositories, and the row filtering conditions differ, querying the table as that user returns an empty response in the trino-cli.
Do not create row filter policies for the same resource and the same user in different repositories.
DWX-19626: Number of rows returned by Trino does not match with the Hive query results
If you run the exact same query, involving integer division, on both the Hive and Trino engines, the results returned by Trino do not match the results returned by Hive. This is due to Trino's default behavior when dividing two integers: Trino does not cast the result into a FLOAT data type.
To perform a floating-point division on two integers, cast one of the integers to a DOUBLE. For more information, see the Trino documentation.
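As a sketch, assuming a hypothetical Trino server URL:

```shell
# 5 / 2 performs integer division in Trino and returns 2;
# casting one operand to DOUBLE returns 2.5.
trino --server https://trino.example.com:443 \
  --execute "SELECT 5 / 2 AS int_div, CAST(5 AS DOUBLE) / 2 AS dbl_div"
```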