Hive Column Profiler configuration
In addition to the generic configuration, the Hive Column Profiler has additional parameters that you can optionally edit.
- Go to Profilers and select your data lake.
- Go to Profilers > Configs.
- Select Hive Column Profiler.
  The Detail page is displayed.
- Use the toggle button to enable or disable the profiler.
- Select a schedule to run the profiler. The schedule is implemented as a Quartz cron expression.
  For more information, see Understanding the Cron Expression generator.
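Quartz cron expressions differ from classic Unix cron: they begin with a seconds field and accept an optional seventh field for the year. The Python sketch below splits an expression into named fields and applies one simplified Quartz rule (exactly one of day-of-month and day-of-week must be `?`). It is only an aid for reading schedules, not how the profiler itself validates them, and the function name `describe_quartz_cron` is hypothetical.

```python
# Field names of a Quartz cron expression, in order. Quartz expressions have
# 6 required fields (starting with seconds) plus an optional 7th field (year).
QUARTZ_FIELDS = [
    "seconds", "minutes", "hours",
    "day_of_month", "month", "day_of_week", "year",
]


def describe_quartz_cron(expression: str) -> dict:
    """Split a Quartz cron expression into named fields (simplified checks only)."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}")
    # Quartz requires '?' in exactly one of day-of-month / day-of-week.
    day_of_month, day_of_week = parts[3], parts[5]
    if (day_of_month == "?") == (day_of_week == "?"):
        raise ValueError("exactly one of day-of-month and day-of-week must be '?'")
    # zip stops at the shorter list, so 'year' is present only if it was given.
    return dict(zip(QUARTZ_FIELDS, parts))
```

For example, `0 0 2 * * ?` fires every day at 02:00:00, and `describe_quartz_cron("0 0 2 * * ?")` labels its fields without a `year` entry.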
- Select Last Run Check and set a period if needed.
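This section does not spell out what Last Run Check does. A common reading is that the profiler skips assets that were already profiled within the configured period; the Python sketch below encodes that assumption only. The function `should_profile` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_profile(last_profiled: Optional[datetime], now: datetime,
                   check_period: Optional[timedelta]) -> bool:
    """Hypothetical Last Run Check: skip assets profiled within check_period."""
    if check_period is None:   # Last Run Check not selected: always profile
        return True
    if last_profiled is None:  # asset has never been profiled
        return True
    # Profile again only once the full period has elapsed since the last run.
    return now - last_profiled >= check_period
```

With a 7-day period, an asset last profiled 2 days ago is skipped, while one last profiled 9 days ago is profiled again.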
- Set the sample settings:
  - Select the Sample Data Size.
    - From the drop-down list, select the type of sample data size.
    - Enter the value based on the previously selected type.
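The available sample-size types are not listed here. Assuming they include a percentage of the table and a fixed number of rows, the Python sketch below shows how a (type, value) pair might translate into the number of rows read from a table. The type labels `"percentage"` and `"rows"` and the function name are hypothetical.

```python
import math


def sample_row_count(total_rows: int, size_type: str, value: float) -> int:
    """Hypothetical: turn a (type, value) sample setting into a row count."""
    if total_rows < 0:
        raise ValueError("total_rows must be non-negative")
    if size_type == "percentage":
        if not 0 < value <= 100:
            raise ValueError("percentage must be in (0, 100]")
        # Round up so a small non-zero percentage still samples at least one row.
        return math.ceil(total_rows * value / 100)
    if size_type == "rows":
        if value <= 0:
            raise ValueError("row count must be positive")
        # Never sample more rows than the table contains.
        return min(total_rows, int(value))
    raise ValueError(f"unknown sample size type: {size_type!r}")
```

For instance, a 10 percent sample of a 1,000,000-row table yields 100,000 rows, and a 1,000-row sample of a 500-row table is capped at 500.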
- Continue with the resource settings.
  - In Advanced Options, set the following:
    - Number of Executors - Enter the number of executors to launch for running this profiler.
    - Executor Cores - Enter the number of cores to be used for each executor.
    - Executor Memory - Enter the amount of memory in GB to be used per executor process.
    - Driver Cores - Enter the number of cores to be used for the driver process.
    - Driver Memory - Enter the amount of memory to be used for the driver process.
  - Click Save to apply the configuration changes to the selected profiler.
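The executor and driver terminology suggests the profiler runs as an Apache Spark application, although this section does not say so. Under that assumption, the Advanced Options correspond to standard Spark properties (`spark.executor.instances`, `spark.executor.cores`, `spark.executor.memory`, `spark.driver.cores`, `spark.driver.memory`). The Python sketch below shows that mapping and the total resources a run would request; the class and method names are hypothetical, and it assumes Driver Memory is also entered in GB.

```python
from dataclasses import dataclass


@dataclass
class ProfilerResources:
    num_executors: int
    executor_cores: int
    executor_memory_gb: int
    driver_cores: int
    driver_memory_gb: int  # assumption: driver memory is also entered in GB

    def to_spark_conf(self) -> dict:
        """Map the Advanced Options fields to standard Spark properties."""
        return {
            "spark.executor.instances": str(self.num_executors),
            "spark.executor.cores": str(self.executor_cores),
            "spark.executor.memory": f"{self.executor_memory_gb}g",
            "spark.driver.cores": str(self.driver_cores),
            "spark.driver.memory": f"{self.driver_memory_gb}g",
        }

    def total_cores(self) -> int:
        """Cores requested across all executors plus the driver."""
        return self.num_executors * self.executor_cores + self.driver_cores

    def total_memory_gb(self) -> int:
        """Configured memory only; Spark may add per-container overhead on top."""
        return self.num_executors * self.executor_memory_gb + self.driver_memory_gb
```

For example, 4 executors with 2 cores and 4 GB each, plus a driver with 1 core and 2 GB, request 9 cores and 18 GB before any overhead. Checking these totals against the cluster's free capacity before saving helps avoid a profiler run that waits indefinitely for resources.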
- Add Asset Filter Rules as needed to customize which assets the profiler includes in or excludes from profiling.
- Set your Deny List and Allow List.
  The profiler skips assets that meet any criterion in the Deny List and includes assets that meet any criterion in the Allow List.
  - Select the Deny List or Allow List tab.
  - Click Add New to define a new rule.
  - Select the key from the drop-down list and the relevant operator. You can select from the following:
      Key                 Operators
      Database name       equals, starts with, ends with
      Name (of asset)     equals, contains, starts with, ends with
      Owner (of asset)
      Creation date       greater than, less than
  - Enter the value corresponding to the key. For example, with the Name key and the starts with operator, enter the prefix that asset names must begin with.
  - Click Add Rule. A new rule is enabled by default; you can toggle its state to enable or disable it as needed.
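The Python sketch below models how Deny List and Allow List rules could be evaluated against an asset. It rests on assumptions this section does not confirm: a Deny List match takes precedence over an Allow List match, an asset is profiled only if it matches at least one enabled Allow List rule, and disabled rules are ignored. The asset dictionary keys (such as `"database_name"`), the `Rule` class, and `is_profiled` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Operators offered in the rule editor, expressed as Python predicates.
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "equals": lambda actual, expected: actual == expected,
    "contains": lambda actual, expected: expected in actual,
    "starts with": lambda actual, expected: actual.startswith(expected),
    "ends with": lambda actual, expected: actual.endswith(expected),
    "greater than": lambda actual, expected: actual > expected,
    "less than": lambda actual, expected: actual < expected,
}


@dataclass
class Rule:
    key: str        # hypothetical asset attribute, e.g. "database_name"
    operator: str   # one of the OPERATORS keys
    value: Any
    enabled: bool = True  # new rules are enabled by default

    def matches(self, asset: Dict[str, Any]) -> bool:
        # Disabled rules never match (and their key is never looked up).
        return self.enabled and OPERATORS[self.operator](asset[self.key], self.value)


def is_profiled(asset: Dict[str, Any], deny: List[Rule], allow: List[Rule]) -> bool:
    """Assumed semantics: any Deny match excludes; otherwise require an Allow match."""
    if any(rule.matches(asset) for rule in deny):
        return False
    return any(rule.matches(asset) for rule in allow)
```

With a Deny List rule "Name starts with `tmp_`" and an Allow List rule "Database name starts with `sales_`", the table `orders` in database `sales_prod` is profiled, while `tmp_orders` in the same database is skipped.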