Iceberg replication policies
Iceberg replication policies replicate Iceberg V1 and V2 tables created using Spark (read-only with Impala) between Cloudera Base on premises clusters running version 7.1.9 or higher, using Cloudera Manager 7.11.3 or higher. In Cloudera Base on premises 7.3.1 and higher versions, Replication Manager can also replicate V1 and V2 Iceberg tables created using Hive.
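The version minimums above can be expressed as a small check. The following Python sketch is illustrative only: the function names are hypothetical and not part of any Cloudera API, and it assumes versions are plain dotted numbers such as `7.1.9`. Run it against both the source and the target cluster, because the minimums apply to each.

```python
def parse_version(text):
    """Turn '7.11.3' into (7, 11, 3) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split("."))


def iceberg_replication_support(runtime_version, cm_version):
    """Return the table engines whose Iceberg tables could be replicated.

    Hypothetical helper that encodes only the minimums stated above:
    Cloudera Base on premises 7.1.9 and Cloudera Manager 7.11.3 for
    Spark-created tables, and 7.3.1 for Hive-created tables.
    """
    runtime = parse_version(runtime_version)
    cm = parse_version(cm_version)

    # Below either minimum, Iceberg replication policies are not available.
    if runtime < (7, 1, 9) or cm < (7, 11, 3):
        return set()

    engines = {"spark"}
    # Hive-created Iceberg tables need the higher runtime version.
    if runtime >= (7, 3, 1):
        engines.add("hive")
    return engines
```

Comparing tuples of integers avoids the string-comparison trap in which `"7.7.1"` would sort after `"7.11.3"`.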
Apache Iceberg is a cloud-native, high-performance open table format for organizing petabyte-scale analytic datasets on a file system or object store. Iceberg supports ACID-compliant tables, including row-level deletes and updates, and can define large analytic data tables using open format files.
Iceberg replication policies provide the following functionalities:
- Replicating metadata and catalog from the source cluster Hive Metastore (HMS) to the target cluster HMS.
- Replicating data files in the HDFS storage system from the source cluster to the target cluster. The Iceberg replication policies can replicate only between HDFS storage systems.
- Replicating data at table level.
- Replicating all the snapshots from the source cluster, which allows you to run time travel queries on the target cluster.
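Because all snapshots are replicated, a time travel query on the target cluster can refer to a snapshot that was created on the source. The following Python sketch builds such queries as strings, assuming a Spark version that supports the `VERSION AS OF` and `TIMESTAMP AS OF` clauses for Iceberg tables (Spark 3.3 or later). The helper names, table name, and snapshot ID are hypothetical placeholders.

```python
def time_travel_by_snapshot(table, snapshot_id):
    """Build a Spark SQL query that reads the table as of a given snapshot ID."""
    # int() rejects anything that is not a numeric snapshot ID.
    return f"SELECT * FROM {table} VERSION AS OF {int(snapshot_id)}"


def time_travel_by_timestamp(table, timestamp):
    """Build a Spark SQL query that reads the table as it was at a timestamp."""
    return f"SELECT * FROM {table} TIMESTAMP AS OF '{timestamp}'"


# Placeholder values for illustration only.
query = time_travel_by_snapshot("sales_db.orders", 10963874102873)
```

The resulting string can be passed to a Spark session, for example with `spark.sql(query)`, on the target cluster.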
You can use Iceberg replication policies to:
- replicate Iceberg tables between on-premises clusters to archive data or run analytics,
- implement passive disaster recovery with planned failover, and perform incremental replication at regular intervals between two similar systems, for example from one HDFS system to another.
