Fixed Issues in Apache Spark

This section lists the issues in Apache Spark that are fixed in the Cloudera Runtime 7.3.1 release, its service packs, and its cumulative hotfixes.

Cloudera Runtime 7.3.1.600 SP3 CHF1

CDPD-87548: Missing spark3.network.crypto.enabled configuration property in Spark
7.3.1.600 SP3 CHF1
This issue is fixed by adding the spark3.network.crypto.enabled configuration property to enhance security.
CDPD-88247: Backport SPARK-51821
7.3.1.600 SP3 CHF1
Previously, Spark could deadlock when interrupting a thread. This issue is now fixed by adding the awaitInterruptThread flag and updating the interrupt logic so that interrupt() is called without holding the uninterruptibleLock monitor.
Apache Jira: SPARK-51821
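The fix follows a common concurrency pattern: decide what to do while holding the lock, then make the potentially blocking call only after releasing it. The following Python sketch is an analogy under stated assumptions, not Spark's UninterruptibleThread code. Python threads cannot be interrupted like JVM threads, so a threading.Event stands in for the interrupt status, and the class name, its methods, and the in-flight counter (which plays the role of the awaitInterruptThread flag) are all hypothetical.

```python
import threading


class InterruptibleWorker:
    """Hypothetical Python analogy of the interrupt pattern described above.
    Not Spark code: an Event stands in for the JVM thread interrupt status."""

    def __init__(self):
        # Plays the role of the uninterruptibleLock monitor.
        self._cond = threading.Condition()
        self._uninterruptible = False
        self._pending_interrupt = False  # deferred until the section ends
        self._in_flight = 0              # deliveries currently running outside the lock
        self.interrupted = threading.Event()

    def _deliver(self):
        # Stand-in for Thread.interrupt(). On the JVM this call can take other
        # locks, so this sketch only ever invokes it WITHOUT holding self._cond.
        self.interrupted.set()

    def _deliver_outside_lock(self):
        # The caller has incremented _in_flight under the lock and released it.
        try:
            self._deliver()
        finally:
            with self._cond:
                self._in_flight -= 1
                self._cond.notify_all()

    def interrupt(self):
        with self._cond:
            if self._uninterruptible:
                # Only record the request; it is delivered when the section ends.
                self._pending_interrupt = True
                return
            self._in_flight += 1
        self._deliver_outside_lock()

    def run_uninterruptibly(self, fn):
        with self._cond:
            # Do not enter the section while an interrupt is being delivered.
            while self._in_flight > 0:
                self._cond.wait()
            self._uninterruptible = True
        try:
            return fn()
        finally:
            with self._cond:
                self._uninterruptible = False
                deliver = self._pending_interrupt
                self._pending_interrupt = False
                if deliver:
                    self._in_flight += 1
            if deliver:
                self._deliver_outside_lock()
```

In this sketch, an interrupt that arrives during run_uninterruptibly is only recorded, and it is delivered after the section ends; no delivery call ever runs while the lock is held, which is the property that removes the lock-ordering deadlock.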
CDPD-89111: Race condition in CREATE FUNCTION IF NOT EXISTS when executed concurrently in Spark 3
7.3.1.600 SP3 CHF1
Previously, catalog write operations were not atomic, so when multiple threads tried to create the same function or table at the same time, for example with CREATE FUNCTION IF NOT EXISTS, some of them failed during parallel execution. This issue is now fixed by making the registration of UDFs and the creation of tables thread-safe.
Apache Jira: SPARK-52988
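The race comes from a check-then-create sequence in which the existence check and the write are separate steps: two threads can both see that the function is missing, both try to create it, and the slower one fails. The following Python sketch shows the general technique of making IF NOT EXISTS atomic by performing the check and the insert under a single lock. It uses a hypothetical in-memory catalog, not Spark's catalog API, and the function and class names in it are illustrative only.

```python
import threading


class FunctionAlreadyExistsError(Exception):
    """Raised when a function is created twice without IF NOT EXISTS."""


class FunctionCatalog:
    """Hypothetical in-memory catalog used only to illustrate the technique;
    it is not Spark's SessionCatalog or ExternalCatalog."""

    def __init__(self):
        self._functions = {}
        self._lock = threading.Lock()

    def create_function(self, name, class_name, if_not_exists=False):
        """Register a function and return True if this call created it.

        The existence check and the insert happen under one lock, so two
        concurrent IF NOT EXISTS calls cannot both pass the check and then
        collide on the insert."""
        with self._lock:
            if name in self._functions:
                if if_not_exists:
                    return False  # another caller won; IF NOT EXISTS is a no-op
                raise FunctionAlreadyExistsError(name)
            self._functions[name] = class_name
            return True
```

With this design, exactly one of several concurrent IF NOT EXISTS callers creates the function and the others return without error, while a plain create of an existing function still fails as expected.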
CDPD-93448: Dropping Iceberg table using complex type with timestamp fails
7.3.1.600 SP3 CHF1
This issue is fixed by updating the code so that table metadata translation uses the Iceberg catalog interface instead of directly accessing Hive metadata. As a result, complex types that contain timestamps are handled correctly.

Cloudera Runtime 7.3.1.500 SP3

There are no new fixed issues in this release.

Cloudera Runtime 7.3.1.400 SP2

CDPD-75091: Backport SPARK-47217 and related changes
7.3.1.400 SP2
Backports upstream Apache Spark improvements to enable reading Parquet files with mixed or widened types without precision loss or failures.
Apache Jira: SPARK-47217

Cloudera Runtime 7.3.1.300 SP1 CHF 1

CDPD-79763: Fix clobbering of files across epochs in Spark Structured Streaming with Iceberg
7.3.1.300 SP1 CHF1
Backporting an upstream fix for a bug in Structured Streaming that resulted in files in Iceberg tables being clobbered across epochs.

Cloudera Runtime 7.3.1.200 SP1

CDPD-79251: Spark - Timestamp read/write performance degradation
7.3.1.200 SP1
Fixing an issue where conversion between Spark's internal timestamp representation and Hive's Timestamp representation was slower on Spark 3 than on Spark 2.
CDPD-76849: Backport SPARK-40876 and related changes
7.3.1.200 SP1
Backporting SPARK-41096, SPARK-46092, SPARK-45604, SPARK-46466, SPARK-40876, and SPARK-48603.
Apache Jira: SPARK-41096, SPARK-46092, SPARK-45604, SPARK-46466, SPARK-40876, SPARK-48603
CDPD-70233: Rebase CDP 7.3.x Spark3 on Apache Spark 3.5.4
7.3.1.200 SP1
Upgrading Spark from 3.4.1 to 3.5.4. For more information, refer to Migrating Spark applications.

Cloudera Runtime 7.3.1.100 CHF 1

CDPD-76229: Optimize the processing speed of BinaryArithmetic#dataType when processing multi-column data
7.3.1.100 CHF1

Restoring the performance of some queries in Spark 3.4.1 to match that of other Spark versions (3.3.x and 3.5.x).

Optimized the processing speed of BinaryArithmetic#dataType when processing multi-column data.

Apache Jira: SPARK-45071
CDPD-75926: Backport SPARK-44653
7.3.1.100 CHF1
Backported SPARK-44653 to fix cache breaking with non-trivial DataFrame unions.
Apache Jira: SPARK-44653
CDPD-75755: [ENCODER_NOT_FOUND] Not found an encoder of the type T to Spark SQL internal representation when using Parameterized Bean
7.3.1.100 CHF1
Fixed an upstream regression that caused an encoder exception for a parameterized bean class.
Apache Jira: SPARK-46679
CDPD-75622: Backport upstream fixes for handling nested beans and generic type beans while creating Spark encoders.
7.3.1.100 CHF1

Backporting upstream fixes from Spark 3.4 to fix the following issues:

  • Starting from Spark 3.4.x, Encoders.bean raised an exception when the passed class contained a field whose type is a nested bean with type arguments.
  • Starting from Spark 3.4.x, Encoders.bean raised an exception when called with a bean that has read-only properties.
  • Encoders.bean did not support beans whose superclass has generic type arguments.
Apache Jira: SPARK-44634, SPARK-45081, SPARK-44910
CDPD-75353: CHAR and VARCHAR handling in Spark 3 is incompatible with Spark 2
7.3.1.100 CHF1

Adding a new configuration property, spark.cloudera.legacy.charVarcharLegacyPadding, which is set to false by default in Spark 3. When it is set to true together with spark.sql.legacy.charVarcharAsString=true, Spark 3 handles CHAR and VARCHAR in a way that is compatible with Spark 2 behavior.

For more information, refer to Migrating Spark applications.

CDPD-75286: Spark History UI - StreamConstraintsException: String length exceeds the maximum length
7.3.1.100 CHF1
Fixing an issue with Jackson that caused a StreamConstraintsException in the Spark History UI. JSON strings of unlimited length are now allowed in Spark event logs.
CDPD-59617: Spark - Upgrade Okio to 1.17.6 due to CVE-2023-3635
7.3.1.100 CHF1
Updating Okio from version 1.15.0 to 1.17.6 to address the security vulnerability CVE-2023-3635.
CDPD-74730: Backport SPARK-46239: Hide the Jetty server's version
7.3.1.100 CHF1
The Jetty server's version is now hidden.
Apache Jira: SPARK-46239
CDPD-73233: Encoder not found of the type T to Spark SQL internal representation
7.3.1.100 CHF1
Fixing an upstream regression that raised an encoder exception (org.apache.spark.SparkUnsupportedOperationException: [ENCODER_NOT_FOUND]) for generic types.
Apache Jira: SPARK-49789

Cloudera Runtime 7.3.1

CDPD-74697: Spark Iceberg vectorized Parquet read of decimal column is incorrect
7.3.1
CDPD-72774: Use common versions of commons-dbcp2 and commons-pool2
7.3.1
CDPD-70114: Redirect spark-submit, spark-shell etc. scripts to their Spark 3 counterparts
7.3.1
CDPD-58844: Spark - Upgrade Janino to 3.1.10 due to CVE-2023-33546
7.3.1
CDPD-48171: Spark3 - Upgrade snakeyaml due to CVE-2022-1471
7.3.1