Fixed issues and resolved maintenance items for Ozone are addressed in Cloudera Runtime 7.3.2, its service packs, and cumulative hotfixes.
Cloudera Runtime 7.3.2
Cloudera Runtime 7.3.2 resolves Ozone issues and incorporates fixes
from the service packs and cumulative hotfixes from 7.3.1.100 through 7.3.1.706. For
a comprehensive record of all fixes in Cloudera Runtime 7.3.1.x,
see Fixed Issues.
- CDPD-80567: Snapshot garbage collection
fails
to reclaim storage
- 7.3.2
- Previously,
multiple
issues
prevented the snapshot garbage collection system from
identifying
and removing deleted
data.
This issue is now resolved. Improvements to the efficiency and reliability of the snapshot garbage collection process ensure that storage is reclaimed in a timely manner, resulting in better overall performance.
- Apache JIRA:
HDDS-12558
- CDPD-84361:
KeyDeletingService
fails
when the key size
exceeds
Ratis
buffer
- 7.3.2
- Previously, when the
KeyDeletingService was fetching keys to be deleted
based on
keyLimitPerTask,
the deletion operation failed if the key size exceeded the Ratis buffer
limit (default 32 MB). This issue
is
now fixed.
The
key
deletion
operations
no
longer
depend on
the
Ratis buffer size.
- Apache JIRA:
HDDS-13213
- CDPD-80739: Ozone Recon - Containers page
displays
incorrect
labels
for
unhealthy
containers
- 7.3.2
- Previously, the Ozone Recon UI
incorrectly
displayed
the Number of Keys label instead of the
Number of
Blocks
label for
containers
in various unhealthy
states.
This issue
is
now fixed.
The
labels
now
display
the
correct
information.
- Apache JIRA:
HDDS-12588
- CDPD-84620:
Ozone Recon
returns
500 error
ServiceNotReadyException on
/keys/open
during NSSummary tree rebuild
- 7.3.2
- Previously, Ozone
Recon
returned
an HTTP 500 error with a
ServiceNotReadyException when the
/keys/open API was called while the
NSSummary
tree was being rebuilt or was temporarily inconsistent. This issue is
now
fixed.
- Apache JIRA:
HDDS-13763
- CDPD-87883: The
processed_keys_metrics table
fails
to
update
when converting deleted keys
- 7.3.2
- Previously,
the
processed_keys_metrics table failed to record details
when the Ozone tiering workflow attempted to
convert
deleted
keys.
This occurred because deleted keys lacked required fields, such as replication type or replication factor.
This issue is
now
fixed,
and the processed_keys_metrics table
updates
correctly.
- CDPD-69122: Ozone Manager database checkpoint
generation failure
- 7.3.2
- Previously, Ozone Manager database checkpoint generation failed with an InterruptedException (Unable to process metadata snapshot request) during parallel snapshot operations or cluster restarts. This issue is now fixed.
- Apache JIRA:
HDDS-10739
- CDPD-92017:
Lower
Ozone versions cannot process
ozone.om.group.rights
default
value
- 7.3.2
- Previously,
lower versions of Ozone could not process the
ozone.om.group.rights configuration
when
it was set to
READ,
LIST.
This issue is now fixed by setting the default
value
to ALL.
- CDPD-75981: Default native ACLs are limited to the user and the user's primary group
- 7.3.2
- Previously, the default native ACLs for an object, such as a volume, bucket, or file, were limited to the object owner and the owner's primary group. If Ranger was enabled,
these
ACLs did not take effect, but
were
saved
to
KeyInfo regardless. This issue is
now
fixed.
- Apache JIRA:
HDDS-11656
- CDPD-87831: SCM
over-schedules
replications
to
full
DataNodes
- 7.3.2
- Previously, Storage Container Manager (SCM)
scheduled replication commands to fix under-replication or
mis-replication
for container moves, decommissioning, and other operations for both Ratis
and EC containers. SCM checked whether a target DataNode had space equal to
twice the container size value before selecting it as the target node for
container replication. However, SCM did not account for the pending
operation size of the scheduled tasks. Consequently, SCM could over-schedule
replications to a target DataNode that did not have enough space. This issue
is now fixed.
- Apache JIRA:
HDDS-13437
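The space check described above can be sketched as follows. This is a minimal illustration, not SCM's actual code: the `DataNode` class and the `has_enough_space` and `schedule_replication` functions are hypothetical names. The sketch keeps the "twice the container size" rule from the description and additionally subtracts the size of replications already scheduled to the target.

```python
from dataclasses import dataclass

GB = 1024 ** 3


@dataclass
class DataNode:
    """Hypothetical view of a replication target (not an Ozone class)."""
    name: str
    free_bytes: int
    pending_bytes: int = 0  # size of replications scheduled but not yet finished


def has_enough_space(node: DataNode, container_size: int) -> bool:
    """Require twice the container size after subtracting pending operations."""
    effective_free = node.free_bytes - node.pending_bytes
    return effective_free >= 2 * container_size


def schedule_replication(node: DataNode, container_size: int) -> bool:
    """Reserve space on the target if it passes the check; otherwise skip it."""
    if not has_enough_space(node, container_size):
        return False
    node.pending_bytes += container_size
    return True
```

In this sketch, a node that passes the check once can fail it on the next call, because the first reservation already counts against its free space.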
- CDPD-80178: Missing space availability check on all DataNodes during container creation in a pipeline
- 7.3.2
- Previously, if the leader node in the pipeline
did not have the capacity to create a new container, it might have returned
a container creation failure. If the follower node did not have the capacity
to create a new container, it might have failed and repeatedly attempted to
find another follower node. This behavior could cause excessive disk space
consumption by parallel write blocks through a state machine, resulting in
slower write performance and delayed failure responses. This issue is now
fixed by checking whether a DataNode has enough space for a new container
before allocating one. This improves write performance and reduces container creation failures when DataNodes have less than 5 GB of disk space remaining.
- Apache JIRA:
HDDS-12468
- CDPD-87749: No logs are available about on-demand scan
triggering
- 7.3.2
- Previously,
no
logs or debug
information
existed to
explain
why
on-demand scans were
triggered
on the containers. This issue is
now
fixed,
and logs are available specifying the reason for on-demand container
scans.
- Apache JIRA:
HDDS-13423
- CDPD-85250: The OzoneTokenIdentifier does not
serialize or deserialize correctly
- 7.3.2
- Previously, a
null omServiceId was
deserialized
as an empty string,
which
caused
delegation token cleanup issues in RocksDB. This issue is now fixed.
- Apache JIRA:
HDDS-13264
- CDPD-82295: AWS S3 DeleteObject failures for FSO
bucket keys containing special characters
- 7.3.2
- Previously, AWS S3 DeleteObject could fail for
File
System Optimized
(FSO)
bucket keys containing special characters. This issue
is
now
fixed
by removing name validation during deletion.
- Apache JIRA:
HDDS-12911
- CDPD-74686: DirectoryDeletion task ignored by
Ratis
- 7.3.2
- Previously, directory deletion tasks
were
ignored
by
Ratis,
leading to
repeated
deletion
retries
instead of
actual
deletion.
This issue is now resolved.
- Apache JIRA:
HDDS-11491
- CDPD-74685: Directory deletion fails with millions of directories
- 7.3.2
- Previously, background directory deletion cleanup failed when attempting to delete millions of empty directories because their combined metadata size exceeded the allowed Ratis request size.
This issue is now resolved.
- Apache JIRA:
HDDS-11492
- CDPD-87270: Secret key premature expiration and
invalidation
- 7.3.2
- Previously, secret keys could expire before the end of a delegation token lifetime, causing premature authentication failures. This issue is now fixed. The secret key expiry duration (hdds.secret.key.expiry.duration) is adjusted to 9 days. This ensures that tokens remain valid for their full configured duration, which keeps authentication stable.
- Apache JIRA:
HDDS-13343
- CDPD-76523: ozone debug ldb --with-keys option defaults to false instead of true
- 7.3.2
- Previously, the
ozone debug ldb
--with-keys option defaulted to
false
when specified without a value and did not print the keys. This issue
is
now fixed.
The
option
defaults to true when specified without a value
and
includes keys in the output by default.
- Apache JIRA:
HDDS-11782
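The corrected option behavior can be illustrated with a small sketch. The Ozone CLI is not written in Python, so this only models the semantics with the standard argparse module; the `build_parser` and `parse_bool` names and the `ldb-sketch` program name are hypothetical, and treating an absent option as true is an assumption based on the "by default" wording above.

```python
import argparse


def parse_bool(text: str) -> bool:
    """Interpret 'true'/'false' (case-insensitive) as a boolean."""
    lowered = text.lower()
    if lowered not in ("true", "false"):
        raise argparse.ArgumentTypeError(f"expected true or false, got {text!r}")
    return lowered == "true"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="ldb-sketch")
    parser.add_argument(
        "--with-keys",
        nargs="?",       # the value is optional
        const=True,      # used when the option is given without a value
        default=True,    # used when the option is absent (assumption in this sketch)
        type=parse_bool,
    )
    return parser
```

With this parser, `--with-keys` alone yields true, and `--with-keys false` still turns key output off explicitly.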
- CDPD-84609: The --output-dir option is unavailable for the replicas verify command
- 7.3.2
- Previously, the ozone debug replicas verify command did not support the --output-dir option. This issue is now fixed. The --output-dir option is now available as an optional parameter for the replicas verify command.
- Apache JIRA:
HDDS-13248
- CDPD-76520: DataNode aborts if
hdds.datanode.wait.on.all.followers = true
- 7.3.2
- Previously, the DataNode aborted if the
hdds.datanode.wait.on.all.followers configuration
was set to
true. This issue is
now
fixed.
- Apache JIRA:
HDDS-11785
- CDPD-76501: DataNode Ratis is taking snapshots
frequently
- 7.3.2
- Previously, DataNode Ratis was taking snapshots every 5 to 8 seconds, causing overhead. This issue
is
now fixed.
The
hdds.ratis.snapshot.threshold and
hdds.container.ratis.statemachine.max.pending.apply-transactions
configuration limits are increased to
100k to avoid
taking
frequent DataNode Ratis
snapshots.
- Apache JIRA:
HDDS-11773
- CDPD-75112: HBase RegionServer crashes due to
inconsistency caused by Ozone client failover handling
- 7.3.2
- Previously, the HBase RegionServer
crashed
due to
inconsistencies
caused by Ozone client failover handling. This issue is now fixed by making the Ozone Manager client retry idempotent, which prevents the client from crashing when it encounters inconsistent results.
- Apache JIRA:
HDDS-11558
- CDPD-77938: Local Refresh button for current selected
path is missing in the new Ozone Recon UI
- 7.3.2
- Previously, refreshing the Recon UI
page
reset the current path selection and
returned
to the root directory, causing loss of context and requiring manual
navigation. This issue is
now fixed.
A Path Reload button is introduced on the Namespace page of the new Recon UI.
- Apache JIRA:
HDDS-12085
- CDPD-77728: Calendar disappears while setting a custom date range on the Heatmap page in the new Recon UI
- 7.3.2
- Previously, setting the custom date range in
the
Heatmap
page of the new Recon
UI
caused
the calendar widget to close unexpectedly.
Specifically, clicking the back arrow to navigate to a previous month in the date picker caused the entire calendar and the drop-down menu to disappear, preventing date selection. This issue is now fixed, and the calendar remains visible until a date is selected and confirmed, allowing users to set custom date ranges as intended.
- Apache JIRA:
HDDS-12044
- CDPD-77356: Recon UI displays identical values for Quota Allowed and Quota In Bytes
- 7.3.2
- Previously, in the Ozone Recon UI, the
Quota Allowed and Quota In
Bytes fields incorrectly displayed the same value. This
duplication prevented
you
from accurately distinguishing between the allocated quota and the actual
consumed
disk
space.
This issue is
now
fixed,
and the Recon UI displays the values correctly.
- Apache JIRA:
HDDS-11987
- CDPD-74437: Multiple IOzoneAuthorizer instances
might
be created during Ratis snapshot installation failures
- 7.3.2
- Previously, if a failure occurred during the
installation of a Ratis snapshot after the metadata manager was stopped,
multiple instances of the Ozone authorizer could be created and retained in
memory. This led to excessive heap usage and, in some cases, crashes due to
long garbage collection pauses, especially in environments with
Ranger
and
Ozone
integration. The issue is
now
fixed, and the old authorizer instances are properly
cleaned up, preventing heap exhaustion.
- Apache JIRA:
HDDS-11472
- CDPD-92003: Container Size Count Task page is empty in the new Recon UI
- 7.3.2
- Previously, in the Ozone Recon UI, the Container Size Count Task page appeared empty when accessed through the new user interface. This issue
is
now
fixed.
- Apache JIRA:
HDDS-13821
- CDPD-88628: Ozone Recon
Overview
page does not load until all APIs are loaded
- 7.3.2
- Previously, the Recon
Overview
page waited for all API calls to complete before displaying any results,
causing delays and poor responsiveness. This issue
is
now
fixed,
and each card on the
Overview
page now loads independently as soon as its corresponding API call resolves.
This change improves overall page responsiveness and ensures that API errors
only affect the relevant cards, rather than preventing the entire page from
loading.
- Apache JIRA:
HDDS-13542
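The independent loading pattern described above can be sketched as follows. The Recon UI itself is a browser application, so this Python asyncio version only models the idea; `load_card`, `load_overview`, and the two sample fetchers are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, Dict

Fetcher = Callable[[], Awaitable[str]]


async def load_card(name: str, fetch: Fetcher, rendered: Dict[str, str]) -> None:
    """Render one card as soon as its own request settles."""
    try:
        rendered[name] = await fetch()
    except Exception as exc:  # an error stays local to this card
        rendered[name] = f"error: {exc}"


async def load_overview(fetchers: Dict[str, Fetcher]) -> Dict[str, str]:
    """Start every request concurrently; no card waits on another card's result."""
    rendered: Dict[str, str] = {}
    await asyncio.gather(*(load_card(n, f, rendered) for n, f in fetchers.items()))
    return rendered


async def ok() -> str:
    """Sample fetcher that succeeds."""
    return "42 containers"


async def broken() -> str:
    """Sample fetcher that fails, standing in for an API error."""
    raise RuntimeError("API unavailable")
```

Calling `asyncio.run(load_overview({"containers": ok, "pipelines": broken}))` fills the first card normally and records the error only on the second.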
- CDPD-88541:
Namespace
Usage page becomes blank when Recon DB is missing
- 7.3.2
- Previously, the
Namespace
Usage page could appear blank if the Recon DB was missing
during a fresh installation. This issue
is
now
fixed.
- Apache JIRA:
HDDS-13528
- CDPD-88383: Accessing the new Ozone Recon UI through
Knox breaks the UI
- 7.3.2
- Previously, accessing the new Ozone Recon UI
through a reverse proxy such as Knox caused the UI to break. This issue
is
now
fixed.
- Apache JIRA:
HDDS-13512
- CDPD-56281: Ozone Manager database updates
are
blocked while Recon
is
reprocessing all Recon tasks
- 7.3.2
- Previously, when Recon was reprocessing all
Recon tasks, Ozone Manager database updates were blocked, which could cause
repeated full snapshots and impact performance. This issue is
now
fixed by allowing Ozone Manager database updates to proceed concurrently
with Recon task processing, preventing unnecessary full snapshots and
improving system efficiency.
- Apache JIRA:
HDDS-8633
- CDPD-77805: Improper error handling in the
NSSummaryTask
- 7.3.2
- Previously, improper error handling in the
NSSummaryTask
could lead to data inconsistencies in Ozone Recon. This issue is now fixed, and Ozone Recon now handles these errors robustly.
- Apache JIRA:
HDDS-12062
- CDPD-80826: Ozone Recon
fails
during the bootstrapping process
- 7.3.2
- Previously, Ozone Recon
did not
properly handle failures
that
occurred during the bootstrapping
process.
This issue is
now fixed.
If
an
Ozone Manager (OM) task fails during bootstrapping,
Recon
now correctly
handles
and reprocesses
the
task
to ensure
a
successful
start.
Additionally, if
Recon
receives a partial or corrupted OM database
tarball,
it
cleans
up the corrupted file and
restarts
the fetch process from
scratch
to
maintain
data consistency and integrity.
- Apache JIRA:
HDDS-12615
- CDPD-76226: The Recon ListKeys
API
returns
an
inappropriate
HTTP response
- 7.3.2
- Previously, the Recon
ListKeys
API did not return an appropriate HTTP response when an
NSSummary
rebuild was in progress.
This
issue is
now fixed.
The
API now returns
the
503 (Service Unavailable)
HTTP
status code to indicate that the service is temporarily
unavailable due to the ongoing
NSSummary
rebuild. This allows clients to properly handle the "too busy" or "try again later" scenario.
- Apache JIRA:
HDDS-11708
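A client can treat the 503 status as a signal to retry later. The following is a minimal sketch of that handling, not part of any Ozone client: `call_with_retry` is a hypothetical helper, and the `request` callable stands in for whatever HTTP call the client actually makes.

```python
import time
from typing import Callable, Tuple

SERVICE_UNAVAILABLE = 503


def call_with_retry(request: Callable[[], Tuple[int, str]],
                    max_attempts: int = 5,
                    delay_seconds: float = 1.0) -> Tuple[int, str]:
    """Retry while the server answers 503; return any other response at once."""
    status, body = request()
    for _ in range(max_attempts - 1):
        if status != SERVICE_UNAVAILABLE:
            break
        time.sleep(delay_seconds)  # give the rebuild time to finish
        status, body = request()
    return status, body
```

If every attempt returns 503, the last response is handed back so that the caller can decide how to report it.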
- CDPD-76248: The default volume choosing policy is not
updated correctly in the ozone-default.xml
- 7.3.2
- Previously,
the
ozone-default.xml file
incorrectly
listed
the
RoundRobinVolumeChoosingPolicy as the default volume
choosing
policy. This
policy did not consider available
volume space
during
container
creation
or
replication,
which could result in
block
allocation
failures (though retried) or the creation of small containers. This issue
is
now fixed.
The
default volume choosing policy
is
changed
to CapacityVolumeChoosingPolicy in the
ozone-default.xml
file. This ensures that available capacity is now taken
into account during container allocation, improving reliability and resource
utilization.
- Apache JIRA:
HDDS-11735
- CDPD-73809: Multithreading
issues
in the
ContainerBalancerTask
- 7.3.2
- Previously, concurrent access to shared
data structures in the
getCurrentIterationsStatistic method
could cause unpredictable errors. This issue
is
now fixed.
Inside
the getCurrentIterationsStatistic
method,
the system now ensures thread safety by synchronizing
access to the iterationsStatistic list and using
ConcurrentHashMap for concurrent access to maps from
findTargetStrategy and
findSourceStrategy.
- Apache JIRA:
HDDS-11386
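The kind of synchronization described above can be sketched in Python. This is an illustration of the general technique only: `BalancerStats` and its methods are hypothetical, and because Python has no ConcurrentHashMap, a single lock guards all shared state in this sketch.

```python
import threading
from typing import Dict, List


class BalancerStats:
    """Hypothetical stand-in for the balancer's shared statistics."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._iterations: List[Dict[str, int]] = []

    def record_iteration(self, moved_containers: int) -> None:
        with self._lock:  # writers and readers share one lock
            self._iterations.append({"moved": moved_containers})

    def current_statistics(self) -> List[Dict[str, int]]:
        with self._lock:
            # Return copies so callers never iterate the list while it changes.
            return [dict(entry) for entry in self._iterations]
```

Returning copies from the reader is a design choice in this sketch: it keeps the lock held only briefly and lets callers work on a stable snapshot.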
- CDPD-88723: The FSORepairTool fails to distinguish
Unreachable and Unreferenced
objects
- 7.3.2
- Previously, the FSORepairTool logic to
distinguish between
Unreachable and
Unreferenced objects was incorrect. This issue
is now fixed. Unreachable objects are not marked for repair because background cleanup processes eventually handle them, while objects that are neither reachable nor unreachable are classified as unreferenced and marked for repair.
- Apache JIRA:
HDDS-13549
- CDPD-87575: The ozone admin container
create command runs forever without kinit
- 7.3.2
- Previously, the ozone admin container
create
command
ran
indefinitely on secure Ozone clusters with multiple
SCM
nodes if authentication failed,
for
example, when kinit was not
performed.
This
issue was specifically observed in SCM HA cluster configurations.
This issue is now fixed, and the retry logic
is
updated to fail fast on authentication exceptions, providing immediate
feedback to
you
instead of hanging.
- Apache JIRA:
HDDS-13405
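The fail-fast retry behavior can be sketched as follows. This is a simplified model, not the Ozone client's failover code: `AuthenticationError`, `call_with_failover`, and the node names are hypothetical. Connection problems move on to the next node, while an authentication failure ends the loop immediately.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class AuthenticationError(Exception):
    """Hypothetical marker for failures that retrying cannot fix."""


def call_with_failover(nodes: Sequence[str],
                       call: Callable[[str], T],
                       max_rounds: int = 3) -> T:
    """Try each node in turn; stop immediately on an authentication failure."""
    last_error: Exception = RuntimeError("no nodes configured")
    for _ in range(max_rounds):
        for node in nodes:
            try:
                return call(node)
            except AuthenticationError:
                raise  # fail fast: another node will reject the same credentials
            except ConnectionError as exc:
                last_error = exc  # transient: move on to the next node
    raise last_error
```

Bounding the number of rounds also keeps the transient path from looping forever when no node is reachable.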
- CDPD-90362: Container Balancer stop command fails with
an
error
- 7.3.2
- Previously, the stopBalancer
command for the Ozone Container
Balancer
failed
with an error if the balancer was already stopped, instead of returning a
successful response. This issue is now fixed. The
stopBalancer operation is now idempotent and will
return success if the balancer is already stopped.
- Additionally, a race
condition
during an SCM leadership
change
caused
the balancer
to
restart
unintentionally
due to the persisted state not being updated.
This
issue is also
now resolved.
The
system
correctly
persists
the
stopped
state of
the
balancer,
preventing unintended restarts during leadership transitions.
- Apache JIRA:
HDDS-13694
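Both parts of this fix can be modeled in a few lines. The sketch below is an assumption-laden illustration rather than SCM code: the `Balancer` class, its fields, and `on_leader_change` are hypothetical names. It shows a stop operation that succeeds regardless of the current state and that records the stopped state before returning, so a later leadership change does not restart the balancer.

```python
class Balancer:
    """Hypothetical model of a service whose stop operation is idempotent."""

    def __init__(self) -> None:
        self.running = False
        self.persisted_should_run = False  # state a new leader would read

    def start(self) -> None:
        self.running = True
        self.persisted_should_run = True

    def stop(self) -> bool:
        """Return True whether or not the balancer was running."""
        self.running = False
        self.persisted_should_run = False  # persist the stop before returning
        return True

    def on_leader_change(self) -> None:
        """A new leader resumes the balancer only if the persisted state says so."""
        self.running = self.persisted_should_run
```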
- CDPD-89400: DataNode pipeline closes frequently
- 7.3.2
- Previously,
the DataNode (DN) Ratis
repeatedly
triggered
Close Pipeline actions when it
identified issues with a
pipeline,
such
as a slow follower, prolonged leader election, or disk
failures,
even if a close action was already pending in the DN command queue. This
could result in excessive close actions being queued on every heartbeat,
leading to inefficiency and potential command queue bloat. The issue is now fixed.
A
check is
introduced to ensure that a Close
Pipeline action for a specific pipeline is not added to the
command queue if one is already
pending,
preventing redundant triggers and optimizing the signaling mechanism.
- Apache JIRA:
HDDS-13618
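The duplicate check described above can be sketched as follows. This is a minimal model under stated assumptions, not the DataNode implementation: `CommandQueue`, `add_close_pipeline`, and `next_action` are hypothetical names, and pipelines are identified by plain strings.

```python
from collections import deque
from typing import Deque, Set


class CommandQueue:
    """Hypothetical queue that holds at most one pending close per pipeline."""

    def __init__(self) -> None:
        self._queue: Deque[str] = deque()
        self._pending: Set[str] = set()

    def add_close_pipeline(self, pipeline_id: str) -> bool:
        """Queue a close action unless one is already pending for this pipeline."""
        if pipeline_id in self._pending:
            return False
        self._pending.add(pipeline_id)
        self._queue.append(pipeline_id)
        return True

    def next_action(self) -> str:
        """Hand out the oldest action and allow a later close to be queued again."""
        pipeline_id = self._queue.popleft()
        self._pending.discard(pipeline_id)
        return pipeline_id

    def __len__(self) -> int:
        return len(self._queue)
```

With this structure, repeated triggers for the same pipeline on successive heartbeats leave the queue length unchanged until the pending action is consumed.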
- CDPD-80991: Non-administrative users could attempt to
perform
OM
decommission
- 7.3.2
- Previously, non-administrative users could
attempt to perform OM decommission, which could lead to unauthorized or
unintended changes. This issue
is
now fixed.
Only
users with administrative privileges are authorized to perform OM
decommission actions, enhancing the security and integrity of cluster
management.
- Apache JIRA:
HDDS-12646