Managing Profilers
The Cloudera Data Catalog profiler engine runs data profiling operations on data located in multiple data lakes. These profilers create metadata annotations that summarize the content and shape characteristics of the data assets.
Profiler Name in VM-based environments | Profiler Name in Compute Cluster enabled environments | Description |
---|---|---|
Cluster Sensitivity Profiler | Data Compliance | A sensitive data profiler for PII, PCI, HIPAA, and similar data categories. |
Ranger Audit Profiler | Activity Profiler | A Ranger audit log summarizer. |
Hive Column Profiler | Statistics Collector | Provides summary statistics like Maximum, Minimum, Mean, Unique, and Null values at the Hive column level. |
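
Scripts or runbooks that refer to profilers by name may need to translate between the two naming schemes. The following is a minimal sketch of such a lookup; the name pairs are copied from the table above, while the dictionary and function names are hypothetical and are not part of any Cloudera API.

```python
# Name pairs copied from the table above:
# VM-based environment name -> Compute Cluster enabled environment name.
VM_TO_COMPUTE_CLUSTER = {
    "Cluster Sensitivity Profiler": "Data Compliance",
    "Ranger Audit Profiler": "Activity Profiler",
    "Hive Column Profiler": "Statistics Collector",
}

# Reverse lookup, built once from the mapping above.
COMPUTE_CLUSTER_TO_VM = {v: k for k, v in VM_TO_COMPUTE_CLUSTER.items()}


def equivalent_profiler_name(name: str) -> str:
    """Return the name the same profiler has in the other environment type.

    Hypothetical helper for illustration only.
    """
    if name in VM_TO_COMPUTE_CLUSTER:
        return VM_TO_COMPUTE_CLUSTER[name]
    if name in COMPUTE_CLUSTER_TO_VM:
        return COMPUTE_CLUSTER_TO_VM[name]
    raise KeyError(f"unknown profiler name: {name!r}")
```

For example, `equivalent_profiler_name("Hive Column Profiler")` returns `"Statistics Collector"`, and the reverse lookup works the same way.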
Limitations
- In VM-based environments, profilers do not support Iceberg tables. However, Iceberg tables are discoverable. In Compute Cluster enabled environments, Iceberg tables can be profiled.
- In Compute Cluster enabled environments, profilers support only tables stored on AWS S3.
- Supported file formats:
  - VM-based environments:
    - CSV
  - Compute Cluster enabled environments:
    - Statistics Collector profilers and Data Compliance profilers:
      - CSV
      - Parquet
      - Iceberg tables
      - ORC
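
The limitations above can be expressed as a small pre-flight check that runs before a table is queued for profiling. This is only a sketch: the support matrix mirrors the list above as written and should be checked against the documentation for your release, and every name in it (`TableInfo`, `profiling_blockers`, the environment labels) is hypothetical rather than part of the product.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical labels for the two environment types; not Cloudera identifiers.
VM_BASED = "vm-based"
COMPUTE_CLUSTER = "compute-cluster"

# Formats listed above for each environment type, normalized to lowercase.
SUPPORTED_FORMATS = {
    VM_BASED: {"csv"},
    COMPUTE_CLUSTER: {"csv", "parquet", "iceberg", "orc"},
}


@dataclass(frozen=True)
class TableInfo:
    name: str
    file_format: str  # for example "csv", "parquet", "iceberg", "orc"
    storage: str      # for example "s3"


def profiling_blockers(table: TableInfo, environment: str) -> list[str]:
    """Return the documented reasons a table cannot be profiled.

    An empty list means none of the limitations listed above applies.
    """
    if environment not in SUPPORTED_FORMATS:
        raise ValueError(f"unknown environment type: {environment!r}")

    reasons = []
    fmt = table.file_format.lower()

    if environment == VM_BASED and fmt == "iceberg":
        reasons.append(
            "Iceberg tables are discoverable but not profiled in VM-based environments"
        )
    elif fmt not in SUPPORTED_FORMATS[environment]:
        reasons.append(f"file format {fmt!r} is not listed as supported")

    # The S3-only restriction is stated for Compute Cluster enabled environments.
    if environment == COMPUTE_CLUSTER and table.storage.lower() != "s3":
        reasons.append(
            "only tables stored on AWS S3 are supported in Compute Cluster enabled environments"
        )
    return reasons
```

Returning a list of reasons rather than a single boolean lets a caller report every limitation that applies to a table at once, for instance both an unsupported format and non-S3 storage.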