Fixed Issues in Apache Spark

This section lists the issues in Apache Spark that are fixed in the Cloudera Runtime 7.3.1 release and in its service packs and cumulative hotfixes.

Cloudera Runtime 7.3.1.600 SP3 CHF1

CDPD-87548: Missing spark3.network.crypto.enabled configuration property in Spark: 7.3.1.600 SP3 CHF1; This issue is fixed by adding the spark3.network.crypto.enabled configuration property, which enhances security.
CDPD-88247: Backport SPARK-51821: 7.3.1.600 SP3 CHF1; Previously, a deadlock could occur in Spark. This issue is now fixed by adding the awaitInterruptThread flag and updating the interrupt logic so that interrupt() is called without holding the uninterruptibleLock monitor.; Apache Jira: SPARK-51821
CDPD-89111: Race condition in CREATE FUNCTION IF NOT EXISTS when executed concurrently in Spark 3: 7.3.1.600 SP3 CHF1; Previously, the lack of atomicity in catalog write operations caused multiple threads to attempt creating the same function or table simultaneously. This resulted in failures during parallel execution. This issue is now fixed by ensuring thread-safe registration of UDFs and creation of tables.; Apache Jira: SPARK-52988
CDPD-93448: Dropping Iceberg table using complex type with timestamp fails: 7.3.1.600 SP3 CHF1; This issue is fixed by updating the code to ensure that table metadata translation correctly uses the Iceberg catalog interface rather than directly accessing Hive metadata. This results in proper handling of complex timestamp types.
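
The race described in CDPD-89111 is a check-then-act problem: two threads both see that the function does not exist, and both try to create it. The following is a minimal Python sketch of the general remedy, making the existence check and the write a single atomic step. FunctionRegistry, create_if_not_exists and run_concurrent_creates are hypothetical names used for illustration only; this is not Spark's catalog code.

```python
import threading


class FunctionRegistry:
    """Toy stand-in for a catalog (hypothetical, not Spark's implementation)."""

    def __init__(self):
        self._functions = {}
        self._lock = threading.Lock()

    def create_if_not_exists(self, name, impl):
        """Return True if this call created the function, False if it already existed."""
        # The check and the write happen under one lock, so no other thread
        # can register the same name between the two steps.
        with self._lock:
            if name in self._functions:
                return False
            self._functions[name] = impl
            return True


def run_concurrent_creates(registry, name, workers=8):
    """Have several threads attempt the same creation at the same moment."""
    results = []
    results_lock = threading.Lock()
    barrier = threading.Barrier(workers)  # release all workers together

    def worker():
        barrier.wait()
        created = registry.create_if_not_exists(name, lambda x: x)
        with results_lock:
            results.append(created)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With this structure, exactly one of the concurrent callers creates the function and the rest observe that it already exists, instead of failing.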

Cloudera Runtime 7.3.1.500 SP3

There are no new fixed issues in this release.

Cloudera Runtime 7.3.1.400 SP2

CDPD-75091: Backport SPARK-47217 and related changes: 7.3.1.400 SP2; Backports upstream Apache Spark improvements to enable reading Parquet files with mixed or widened types without precision loss or failures.; Apache Jira: SPARK-47217

Cloudera Runtime 7.3.1.300 SP1 CHF1

CDPD-79763: Fix clobbering of files across epochs in Spark Structured Streaming with Iceberg: 7.3.1.300 SP1 CHF1; Backported an upstream fix for a Structured Streaming bug that caused files in Iceberg tables to be clobbered across epochs.

Cloudera Runtime 7.3.1.200 SP1

CDPD-79251: Spark - Timestamp read/write performance degradation: 7.3.1.200 SP1; Fixed an issue where conversions between Spark's internal timestamp representation and Hive's Timestamp representation were slower on Spark 3 than on Spark 2.
CDPD-76849: Backport SPARK-40876 and related changes: 7.3.1.200 SP1; Backporting SPARK-41096, SPARK-46092, SPARK-45604, SPARK-46466, SPARK-40876, and SPARK-48603; Apache Jira: SPARK-41096, SPARK-46092, SPARK-45604, SPARK-46466, SPARK-40876, SPARK-48603
CDPD-70233: Rebase CDP 7.3.x Spark3 on Apache Spark 3.5.4: 7.3.1.200 SP1; Upgrading Spark from 3.4.1 to 3.5.4. For more information, refer to Migrating Spark applications.
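
For context on CDPD-79251: Spark stores a timestamp internally as a count of microseconds since the Unix epoch. The sketch below shows the kind of arithmetic a conversion between that form and a seconds-plus-nanoseconds form involves. Treating the Hive side as a (seconds, nanoseconds) pair is an assumption made for illustration, and the function names are hypothetical; this is not the code changed by the fix.

```python
from typing import Tuple

MICROS_PER_SECOND = 1_000_000
NANOS_PER_MICRO = 1_000


def micros_to_seconds_nanos(micros: int) -> Tuple[int, int]:
    """Split epoch microseconds into whole seconds and a nanosecond remainder."""
    # divmod floors toward negative infinity, so the remainder is always
    # non-negative, including for timestamps before 1970.
    seconds, rem_micros = divmod(micros, MICROS_PER_SECOND)
    return seconds, rem_micros * NANOS_PER_MICRO


def seconds_nanos_to_micros(seconds: int, nanos: int) -> int:
    """Combine seconds and nanoseconds back into epoch microseconds."""
    # Sub-microsecond precision is truncated, because the microsecond
    # representation cannot hold it.
    return seconds * MICROS_PER_SECOND + nanos // NANOS_PER_MICRO
```

This conversion runs once per timestamp value read or written, so even a small per-value overhead becomes noticeable on large tables.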

Cloudera Runtime 7.3.1.100 CHF1

CDPD-76229: Optimize the processing speed of BinaryArithmetic#dataType when processing multi-column data

7.3.1.100 CHF1

Restoring performance of some queries in Spark 3.4.1 to match other versions (3.3.x, 3.5.x) of Spark.

Optimized the processing speed of BinaryArithmetic#dataType when processing multi-column data.

Apache Jira: SPARK-45071

CDPD-75926: Backport SPARK-44653

7.3.1.100 CHF1

Backported SPARK-44653 to fix cache breaking with non-trivial DataFrame unions.

Apache Jira: SPARK-44653

CDPD-75755: [ENCODER_NOT_FOUND] Not found an encoder of the type T to Spark SQL internal representation when using Parameterized Bean

7.3.1.100 CHF1

Fixed an upstream regression that caused an encoder exception for parameterized classes.

Apache Jira: SPARK-46679

CDPD-75622: Backport upstream fixes for handling nested beans and generic type beans while creating Spark encoders.

7.3.1.100 CHF1

Backporting upstream fixes from Spark 3.4 to fix the following issues:

Starting from Spark 3.4.x, Encoders.bean raised an exception when the passed class contained a field whose type is a nested bean with type arguments.
Starting from Spark 3.4.x, Encoders.bean raised an exception when called with a bean that has read-only properties.
The bean encoder did not support beans whose superclass has generic type arguments.

Apache Jira: SPARK-44634, SPARK-45081, SPARK-44910

CDPD-75353: CHAR and VARCHAR handling in Spark 3 is incompatible with Spark 2

7.3.1.100 CHF1

Adding a new configuration property, spark.cloudera.legacy.charVarcharLegacyPadding (set to false by default in Spark 3). When it is set to true together with spark.sql.legacy.charVarcharAsString=true, CHAR and VARCHAR handling is compatible with Spark 2 behavior.

For more information, refer to Migrating Spark applications.
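
As an illustration of the CDPD-75353 settings, the following Python sketch assembles a spark-submit command line that passes both properties through the standard --conf option. The application name my_job.py and the helper spark_submit_command are hypothetical; where the properties are actually set (for example in the application or in the cluster configuration) depends on the deployment.

```python
import shlex

# Both properties must be true to get the Spark 2-compatible behavior
# described above.
LEGACY_CHAR_VARCHAR_CONF = {
    "spark.cloudera.legacy.charVarcharLegacyPadding": "true",
    "spark.sql.legacy.charVarcharAsString": "true",
}


def spark_submit_command(app, conf):
    """Build a spark-submit argument list with one --conf pair per property."""
    args = ["spark-submit"]
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


# my_job.py is a placeholder application name.
command = spark_submit_command("my_job.py", LEGACY_CHAR_VARCHAR_CONF)
print(shlex.join(command))
```

Building the arguments as a list rather than a single string avoids shell-quoting problems if the command is later passed to subprocess.run.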

CDPD-75286: Spark History UI - StreamConstraintsException: String length exceeds the maximum length

7.3.1.100 CHF1

Fixing an issue with Jackson to allow unlimited JSON string length in Spark event logs.

CDPD-59617: Spark - Upgrade Okio to 1.17.6 due to CVE-2023-3635

7.3.1.100 CHF1

Updating Okio from version 1.15.0 to 1.17.6 to address the security vulnerability CVE-2023-3635.

CDPD-74730: Backport SPARK-46239: Hide the Jetty server's version

7.3.1.100 CHF1

The Jetty server's version is now hidden.

Apache Jira: SPARK-46239

CDPD-73233: Encoder not found of the type T to Spark SQL internal representation

7.3.1.100 CHF1

Fixing an upstream regression of encoder exception (org.apache.spark.SparkUnsupportedOperationException: [ENCODER_NOT_FOUND]) for generic types.

Apache Jira: SPARK-49789

Cloudera Runtime 7.3.1

CDPD-74697 - Spark Iceberg vectorized Parquet read of decimal column is incorrect: 7.3.1
CDPD-72774 - Use common versions of commons-dbcp2 and commons-pool2: 7.3.1
CDPD-70114 - Redirect spark-submit, spark-shell etc. scripts to their Spark 3 counterparts: 7.3.1
CDPD-58844 - Spark - Upgrade Janino to 3.1.10 due to CVE-2023-33546: 7.3.1
CDPD-48171 - Spark3 - Upgrade snakeyaml due to CVE-2022-1471: 7.3.1