Profiler tag rules in VM-based environments

You can use preconfigured tag rules or create new rules based on regular expressions and values in your data to be profiled by the Cluster Sensitivity Profiler. When a tag rule is matching your data, the selected Apache Atlas classification (also known as a Cloudera Data Catalog tag) is applied.

Tag rule types

Tag Rules are categorized based on their type into the following groups:

System Deployed: These are built-in rules that cannot be edited. You can only enable or disable them for your data.
Custom Deployed: Tag rules that you create, edit and deploy on clusters after validation will appear under this category. Click the icon in the Action column to enable your custom tag rules. You can also edit these tag rules.
Custom Draft: You can create new tag rules and save them for later validation and deployment on clusters.

After creating your rule, you have to validate them. Only then you can click Enable.

Tag rule inputs

Tag Rules can be applied based on the following inputs:


Input type	VM based environments	Compute Cluster enabled environments
Column name value	Manually entered regex pattern	Manually entered regex pattern Uploaded regex pattern
Column value	Manually entered regex pattern	Manually entered regex pattern Uploaded regex pattern CSV files with data which will be matched against column values for your tables in your data lake.
Table name		Manually entered regex pattern Uploaded regex pattern

Match thresholds and weightage

The System Deployed rules have a preset match threshold: A matching column name means a 15% confidence value. This is increased by 85% by a matching column value.

Tag rule testing

After creating your tag rule, you have to test it:

By VM-based environments validate them with manually entered test data and, then deploy them from the Custom Draft status.