General known issues with Cloudera Data Engineering
Learn about the general known issues with the Cloudera Data Engineering service on cloud, the impact or changes to the functionality, and the workaround.
- DEX-17581: Cloudera Data Engineering 1.24.1 is not getting deployed in the East US region
- Only applicable to Azure. Cloudera Data Engineering service creation fails during the database server provisioning step. This happens because the Azure API that Cloudera Data Engineering uses to retrieve the supported database instance types for the specified region (for example, eastus) returns an empty response, so database server provisioning cannot proceed. The following error message appears in the Cloudera Data Engineering service logs:
unable to get MySQL flexible server DB instance type for cluster, Error: no instance types available for MySQL flexible server DB service tier: GeneralPurpose having vCores 2
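The failure mode can be sketched in Python as follows. The `pick_mysql_instance_type` helper and the simplified SKU shape (`name`, `tier`, `vCores`) are hypothetical and only illustrate why an empty regional response stops provisioning; this is not the service's actual code.

```python
def pick_mysql_instance_type(skus, tier="GeneralPurpose", vcores=2):
    """Return the first SKU name matching the requested tier and vCores.

    `skus` is a hypothetical, simplified list of dicts standing in for the
    regional response returned by the Azure API.
    """
    matches = [
        sku["name"]
        for sku in skus
        if sku.get("tier") == tier and sku.get("vCores") == vcores
    ]
    if not matches:
        # An empty regional response always ends here, so provisioning cannot proceed.
        raise LookupError(
            "no instance types available for MySQL flexible server DB "
            f"service tier: {tier} having vCores {vcores}"
        )
    return matches[0]


# An empty response for the region (for example, eastus) reproduces the failure.
try:
    pick_mysql_instance_type([])
except LookupError as err:
    print(err)
```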
- DEX-17565: Links to download cdeconnect and pyspark tars for Spark Connect are giving HTTP 404 error
- Links to download the cdeconnect and pyspark tar files for Spark Connect return an HTTP 404 error.
- DEX-17519: Sessions are not killed according to the TTL configured in mow-int on Azure and AWS
- Sessions are not killed according to the TTL configured in mow-int on Azure and AWS. The timeout calculation in the isTimeout method of the Livy code is wrong. The method takes a calculated timeout in milliseconds and converts it into nanoseconds, but the caller already passes the calculated timeout in nanoseconds. Because isTimeout converts the calculatedTimeout value a second time, the resulting threshold is far higher than intended. Therefore, (toTime - fromTime) never exceeds the calculated timeout, and sessions are not killed when the timeout is reached.
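The double conversion can be modeled with a minimal Python sketch. The function names and the 60-second TTL are hypothetical; the sketch reproduces only the arithmetic described above, not the Livy isTimeout method itself.

```python
NANOS_PER_MILLI = 1_000_000


def is_timeout_buggy(from_time_ns, to_time_ns, calculated_timeout_ns):
    # Mirrors the reported bug: the caller already passes nanoseconds,
    # but the value is treated as milliseconds and converted again.
    threshold = calculated_timeout_ns * NANOS_PER_MILLI
    return (to_time_ns - from_time_ns) > threshold


def is_timeout_fixed(from_time_ns, to_time_ns, calculated_timeout_ns):
    # Convert only once: the argument is already in nanoseconds.
    return (to_time_ns - from_time_ns) > calculated_timeout_ns


ttl_ns = 60 * 1_000_000_000       # hypothetical 60-second TTL, in nanoseconds
elapsed_ns = 120 * 1_000_000_000  # session idle for 120 seconds
print(is_timeout_buggy(0, elapsed_ns, ttl_ns))  # False: session is never killed
print(is_timeout_fixed(0, elapsed_ns, ttl_ns))  # True: session is killed
```

With the extra conversion, the 60-second threshold becomes 60 million seconds, which is why the comparison never succeeds in practice.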
- DEX-17507: Restoring scheduled jobs fails due to the time format
- Restoring Spark jobs that have a schedule configuration fails if the start date or end date uses a time format other than RFC3339Nano. This issue affects only jobs created through non-UI options, such as the API or CLI.
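One way to avoid the problem, assuming that UTC timestamps in the layout of Go's time.RFC3339Nano constant are sufficient, is to normalize schedule dates before creating jobs through the API or CLI. The `to_rfc3339nano` helper below is hypothetical, and Python datetime values carry only microsecond precision.

```python
from datetime import datetime, timezone


def to_rfc3339nano(dt: datetime) -> str:
    """Format a timezone-aware datetime in UTC using the RFC3339Nano layout."""
    if dt.tzinfo is None:
        raise ValueError("naive datetime: attach an explicit timezone first")
    utc = dt.astimezone(timezone.utc)
    text = utc.strftime("%Y-%m-%dT%H:%M:%S")
    if utc.microsecond:
        # The RFC3339Nano layout drops trailing zeros from fractional seconds.
        text += "." + f"{utc.microsecond:06d}".rstrip("0")
    return text + "Z"


start = datetime.fromisoformat("2024-05-01T12:00:00+02:00")
print(to_rfc3339nano(start))  # 2024-05-01T10:00:00Z
```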
- DEX-17500: [CDP CLI] Spark OsName "chainguard" not triggering an error in Cloudera Data Engineering version 1.23.1 Virtual Cluster
- Cloudera Data Engineering version 1.23.1 allows the creation of a Virtual Cluster with the securityhardened option without displaying any error message. Technically, the Virtual Cluster uses UBI [redhat] underneath, which is correct, but this can be confusing because the Virtual Cluster property states securityhardened.
- DEX-17458: Cloudera Data Engineering session creation is failing with java.util.concurrent.ExecutionException: javax.security.sasl.SaslException
- Creating Cloudera Data Engineering sessions in a Spark 3.3.0 Virtual Cluster fails. The following error is listed in the driver logs:
Exception in thread "main" java.util.concurrent.ExecutionException: javax.security.sasl.SaslException: Client closed before SASL negotiation finished
- DEX-16747: Cloudera Data Engineering 1.23.1-b114 - Driver container stderr and stdout logs are missing for some Spark jobs
- For some job runs, the driver stderr and stdout logs are intermittently missing.
- DEX-15884: Resource file upload intermittently does not pick up the modified file
- When you attempt to update a file by uploading a new version with
the exact same filename, the operation appears to succeed, but the content of the file is
not updated. The system continues to serve the previous version of the file. This issue
has been observed to occur intermittently under the following conditions:
- Uploading a file to overwrite an existing file with the same name.
- Deleting the original file first and then uploading a new file with the same name.
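Because the problem appears when a new version reuses an existing filename, one possible mitigation (an assumption, not a documented workaround) is to give each revision a distinct name derived from its content. The `versioned_name` helper below is hypothetical.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(filename: str, content: bytes, digest_len: int = 12) -> str:
    """Append a short SHA-256 digest of the content to the file stem."""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    path = PurePosixPath(filename)
    return f"{path.stem}-{digest}{path.suffix}"


old = versioned_name("etl_job.py", b"print('v1')")
new = versioned_name("etl_job.py", b"print('v2')")
print(old != new)  # True: each revision is uploaded under its own name
```

Any job that references the resource file must then be updated to point at the new name.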
- DEX-15714: Proxy settings are not propagating to Cloudera Data Engineering sessions
- Proxy settings from a configured CDP proxy (configmap: cdp-proxy-config) are not propagated to Cloudera Data Engineering sessions. For Cloudera Data Engineering jobs, proxy settings are propagated as standard JAVA_OPTS through spark.driver.extraJavaOptions and spark.executor.extraJavaOptions. For more information, see the Cloudera public proxy documentation.
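The property format used for jobs can be sketched in Python as follows. The proxy host, port, and non-proxy list are hypothetical placeholders rather than values read from cdp-proxy-config, and whether a session honors these properties depends on your Cloudera Data Engineering version, so treat this only as an illustration of the standard JVM proxy properties.

```python
def proxy_java_options(host, port, non_proxy_hosts=()):
    """Build standard JVM proxy system properties as one options string."""
    opts = [
        f"-Dhttp.proxyHost={host}",
        f"-Dhttp.proxyPort={port}",
        f"-Dhttps.proxyHost={host}",
        f"-Dhttps.proxyPort={port}",
    ]
    if non_proxy_hosts:
        # The JVM reads http.nonProxyHosts for both HTTP and HTTPS connections.
        opts.append("-Dhttp.nonProxyHosts=" + "|".join(non_proxy_hosts))
    return " ".join(opts)


def spark_proxy_conf(host, port, non_proxy_hosts=()):
    """Apply the same options to the driver and the executors."""
    java_opts = proxy_java_options(host, port, non_proxy_hosts)
    return {
        "spark.driver.extraJavaOptions": java_opts,
        "spark.executor.extraJavaOptions": java_opts,
    }


conf = spark_proxy_conf("proxy.example.com", 3128, ["localhost", "*.internal"])
for key, value in conf.items():
    print(f"{key}={value}")
```

If these values pass through a shell, quote them, because `|` is a shell metacharacter.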
- DEX-15461: Writing Spark DataFrame to Hive using HWC fails with java.util.NoSuchElementException: None.get
- This is a known issue when writing data in ORC format. The issue has been fixed internally, but more testing is needed. The fix will be part of a future Hive Warehouse Connector and Cloudera Data Engineering certification.
- DEX-14725: virtualenv cannot access PyPI mirror
- When a Python virtual environment is created, virtualenv needs to access the internet to seed packages such as pip, setuptools, and wheel. If public internet access is blocked (for example, in a private network), certain packages fail to build, for example requests-kerberos.
- DEX-14385: Backup fails if there is a Git repository resource
- In Cloudera Data Engineering 1.20.3, if a service has a Git repository resource, the cluster backup fails.
- DEX-12616: Node Count shows zero in /metric request
- Cloudera Data Engineering 1.20.3 introduced compatibility with Kubernetes version 1.27. With this update, kube_state_metrics no longer provides label and annotation metrics by default. Earlier, Cloudera Data Engineering used label information, which was exposed automatically, to calculate the Node Count for both Core and All-Purpose nodes. Because of the changes in kube_state_metrics, this information is no longer available by default. As a result, the Node Count shows zero in /metrics, in charts, and in the user interface.
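The effect on the Node Count can be illustrated with a small Python sketch. In kube-state-metrics v2, node labels appear on the kube_node_labels metric only when they are allowlisted (for example, through the --metric-labels-allowlist flag); the label_role label name, the helper, and the node data below are hypothetical.

```python
from collections import Counter


def count_nodes_by_label(series_labels, label_name):
    """Count nodes per value of one label across kube_node_labels series."""
    counts = Counter()
    for labels in series_labels:
        value = labels.get(label_name)
        if value is not None:
            counts[value] += 1
    return counts


# Hypothetical series: the node label is exported only when allowlisted.
with_allowlist = [
    {"node": "node-a", "label_role": "core"},
    {"node": "node-b", "label_role": "all-purpose"},
    {"node": "node-c", "label_role": "core"},
]
without_allowlist = [{"node": "node-a"}, {"node": "node-b"}, {"node": "node-c"}]

print(count_nodes_by_label(with_allowlist, "label_role")["core"])     # 2
print(count_nodes_by_label(without_allowlist, "label_role")["core"])  # 0
```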
- DEX-11340: Kill all the alive sessions in prepare-for-upgrade phase of stop-gap solution for upgrade
- If Spark sessions are running during the Cloudera Data Engineering upgrade, they are not automatically killed, leaving them in an unknown state during and after the upgrade.
- DEX-14084: No error response for Airflow Python virtual environment at Virtual Cluster level for view only access user
- If a user with a view only role on a Virtual Cluster (VC) tries to create an Airflow Python virtual environment on that VC, the access is blocked with a 403 error. However, the 403 no-access error is not displayed in the UI.
- DEX-11639: "CPU" and "Memory" Should Match Tier 1 and Tier 2 Virtual Clusters AutoScale
- CPU and Memory options in the service or cluster edit page display the values for Core (tier 1) and All-Purpose (tier 2) together. However, they must be separate values for Core and All-Purpose.
- DEX-12482: [Intermittent] Diagnostic Bundle generation taking several hours to generate
- Diagnostic bundles can intermittently take several hours to generate because of the low EBS throughput and IOPS of the base node.
- DEX-14253: Cloudera Data Engineering Spark Jobs are getting stuck due to the unavailability of the spot instances
- The unavailability of AWS spot instances may cause Cloudera Data Engineering Spark jobs to get stuck.
- DEX-14192: Some Spark 3.5.1 jobs have slightly higher memory requirements
- Some jobs running on Spark 3.5.1 have slightly higher memory requirements, which can result in the driver pods being killed with a Kubernetes OOMKilled status.
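A simplified Python sketch of how driver pod memory is sized, assuming Spark's default overhead rule of max(384 MiB, factor × spark.driver.memory) with a factor of 0.10 for JVM jobs (non-JVM jobs on Kubernetes default to a higher factor). It ignores other contributions such as PySpark worker memory and explicit spark.driver.memoryOverhead settings, and the helper name is hypothetical. Raising spark.driver.memory or spark.driver.memoryOverhead increases the pod memory and leaves more headroom before Kubernetes reports OOMKilled.

```python
MIN_OVERHEAD_MIB = 384  # Spark's minimum memory overhead, in MiB


def driver_pod_memory_mib(driver_memory_mib, overhead_factor=0.10):
    """Approximate driver pod memory: JVM heap plus non-heap overhead."""
    overhead = max(MIN_OVERHEAD_MIB, int(driver_memory_mib * overhead_factor))
    return driver_memory_mib + overhead


print(driver_pod_memory_mib(1024))  # 1408: the 384 MiB floor applies
print(driver_pod_memory_mib(4096))  # 4505: 10% overhead (409 MiB)
```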
- DEX-14173: VC Creation is failing with "Helm error: 'timed out waiting for the condition', no events found for chart"
- On busy Kubernetes clusters, installing a Virtual Cluster or Cloudera Data Engineering may fail with the following error message:
Helm error: 'timed out waiting for the condition', no events found for chart
- DEX-13957: Cloudera Data Engineering metrics and graphs show no data
- Cloudera Data Engineering versions 1.20.3 and 1.21 use Kubernetes version 1.27. In Kubernetes version 1.27, kube_state_metrics does not provide label and annotation metrics by default. For this reason, the node count for core and all-purpose nodes shows zero in the Cloudera Data Engineering UI and in charts.
- DEX-11498: Spark job failing with error: "Exception in thread "main" org.apache.hadoop.fs.s3a.AWSBadRequestException:"
- When users in the Milan and Jakarta AWS regions use the Hadoop s3a client to access Amazon S3 storage (that is, use s3a://bucket-name/key to access a file), an error may occur. This is a known issue in Hadoop.
- DEX-10147: Grafana issue for virtual clusters with the same name
- In Cloudera Data Engineering 1.19, when two different Cloudera Data Engineering services with the same name exist in the same environment, clicking the Grafana charts for the second service displays the metrics for the Virtual Cluster in the first service.
- DEX-9112: VC deployment frequently fails when deployed through the CDP CLI
- In Cloudera Data Engineering 1.19, deploying a Virtual Cluster through the CDP CLI frequently fails because the pods fail to start. Creating a Virtual Cluster through the UI succeeds.
- DEX-9879: Infinite while loops not working in Cloudera Data Engineering Sessions
- If an infinite while loop is submitted as a statement, the session is stuck indefinitely: new statements cannot be sent, and the session stays in a busy state. Sample input:
while True:
    print("hello")
- DEX-9898: CDE CLI input reads break after interacting with a Session
- After interacting with a Session through the sessions interact command, input to the CDE CLI on the terminal breaks. In the following example, ^M is displayed instead of the command proceeding:
> cde session interact --name sparkid-test-6
WARN: Plaintext or insecure TLS connection requested, take care before continuing. Continue? yes/no [no]: yes^M
- DEX-9881: Multi-line command error for Spark-Scala Session types in the CDE CLI
- In Cloudera Data Engineering 1.19, multi-line input into a Scala session on the CDE CLI does not work as expected in some cases: the CLI throws an error before reading the complete input. Sample input:
scala> type |
- DEX-9756: Unable to run large raw Scala jobs
- Scala code with more than 2000 lines could result in an error.
- DEX-8679: Job fails with permission denied on a RAZ environment
- When a job that accesses files runs longer than the delegation token renewal time on a RAZ-enabled Cloudera environment, the job fails with the following error:
Failed to acquire a SAS token for get-status on /.../words.txt due to org.apache.hadoop.security.AccessControlException: Permission denied.
- DEX-3706: The Cloudera Data Engineering home page is not displaying for some users
- The Cloudera Data Engineering home page does not display Virtual Clusters or the Quick Action bar if the user is part of hundreds of user groups or subgroups.
- DEX-8283: False Positive Status is appearing for the Raw Scala Syntax issue
- Raw Scala jobs that fail due to syntax errors are reported as succeeded by Cloudera Data Engineering, as in the following example:
spark.range(3)..show()
- DEX-8281: Raw Scala Scripts fail due to the use of the case class
- Implicit conversions that involve implicit Encoders for case classes, which are usually supported by importing spark.implicits._, do not work in Raw Scala jobs in Cloudera Data Engineering. These include conversions of Scala objects such as RDDs, Datasets, DataFrames, and Columns. For example, the following operations fail on Cloudera Data Engineering:
import org.apache.spark.sql.Encoders
import spark.implicits._

case class Case(foo: String, bar: String)

// 1: an attempt to obtain the schema via the implicit encoder for the case class fails
val encoderSchema = Encoders.product[Case].schema
encoderSchema.printTreeString()

// 2: an attempt to convert RDD[Case] to a DataFrame fails
val caseDF = sc
  .parallelize(1 to 3)
  .map(i => Case(f"$i", "bar"))
  .toDF

// 3: an attempt to convert a DataFrame to Dataset[Case] fails
val caseDS = spark
  .read
  .json(List("""{"foo":"1","bar":"2"}""").toDS)
  .as[Case]
- DEX-7051: EnvironmentPrivilegedUser role cannot be used with Cloudera Data Engineering
- The EnvironmentPrivilegedUser role currently cannot be used to access Cloudera Data Engineering. A user who has this role cannot interact with Cloudera Data Engineering because an "access denied" error occurs.
- Strict DAG declaration in Airflow 2.2.5
- Cloudera Data Engineering 1.16 introduces Airflow 2.2.5, which is stricter about DAG declaration than the previously supported Airflow version in Cloudera Data Engineering. In Airflow 2.2.5, the DAG timezone should be a pendulum.tz.Timezone, not datetime.timezone.utc
.
- COMPX-6949: Stuck jobs prevent cluster scale down
- Because of hanging jobs, the cluster cannot scale down even when there are no ongoing activities. This may happen when an unexpected node removal causes some pods to be stuck in the Pending state. These pending pods prevent the cluster from scaling down.
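For the Strict DAG declaration in Airflow 2.2.5 entry, a quick pre-deployment check can flag start dates that use datetime.timezone.utc instead of a pendulum timezone. The `has_pendulum_timezone` helper is hypothetical and uses only the standard library: it inspects the module name of the tzinfo class, so it is a heuristic sketch, not an Airflow API.

```python
from datetime import datetime, timezone


def has_pendulum_timezone(dt):
    """Return True if the datetime's tzinfo class comes from the pendulum package."""
    tz = dt.tzinfo
    return tz is not None and type(tz).__module__.startswith("pendulum")


# Not the form Airflow 2.2.5 expects: a standard-library UTC timezone.
stdlib_start = datetime(2022, 1, 1, tzinfo=timezone.utc)
print(has_pendulum_timezone(stdlib_start))  # False

# In the DAG file itself, build the start date with pendulum instead, for example:
#   import pendulum
#   start_date = pendulum.datetime(2022, 1, 1, tz="UTC")
```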