Data Compliance profiler configuration

You can configure the scheduling and the available resources for your profiler.

  1. Go to Profilers and select your data lake.
  2. Go to Profilers > Data Compliance > Profiler Details > Configuration > All Configurations.
  3. Select a schedule to run the profiler using either a UNIX Cron Expression or the Basic scheduler.
  4. Select Last Run Check and set a period in Day Range if needed.
  5. Continue with resource settings:
    1. Set the Maximum number of executors

      Indicates the number of processes that are used by the distributed computing framework. The recommended value is at least 10 executors.

    2. Set the Maximum cores per executor

      Indicates the maximum number of cores that can be allocated to an executor.

    3. Set the Executor memory limit in GB
  6. Click Save to apply the configuration changes to the selected profiler.
  7. Add Asset Filtering Rules as needed to customize the selection of assets to be profiled.
    1. Set your Deny List and Allow List.
      The profiler skips assets that meet any criterion in the Deny List and includes assets that meet any criterion in the Allow List.
      1. Select the Deny List or Allow List tab.
      2. Click Add New Rule to define new rules.
      3. Select the key from the drop-down list and the relevant operator. You can select from the following:
        Key                 Operator
        Database name       equals, starts with, ends with
        Name (of asset)     equals, contains, starts with, ends with
        Owner (of asset)
        Creation date       greater than, less than
      4. Enter the value corresponding to the key. For example, for a name-based key, enter the string that the operator should match.
      5. Click Add Rule. A new rule is enabled by default; you can toggle it to disable or re-enable it as needed.
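
The filtering behavior described above can be illustrated with a minimal Python sketch. This is not the profiler's implementation: the `Rule` class, the `should_profile` function, and the asset field names (`database`, `name`, `owner`, `created`) are hypothetical. The sketch also makes two assumptions that the text does not state: a Deny List match takes precedence over an Allow List match, and an Allow List with no enabled rules lets every non-denied asset through.

```python
from dataclasses import dataclass
from datetime import date

# Operators named in the key/operator table; the comparison semantics
# shown here are an assumption for illustration.
OPERATORS = {
    "equals": lambda actual, expected: actual == expected,
    "contains": lambda actual, expected: expected in actual,
    "starts with": lambda actual, expected: actual.startswith(expected),
    "ends with": lambda actual, expected: actual.endswith(expected),
    "greater than": lambda actual, expected: actual > expected,
    "less than": lambda actual, expected: actual < expected,
}


@dataclass
class Rule:
    """Hypothetical model of one filtering rule (key, operator, value)."""
    key: str
    operator: str
    value: object
    enabled: bool = True  # rules are enabled by default when added

    def matches(self, asset: dict) -> bool:
        # A disabled rule never matches.
        return self.enabled and OPERATORS[self.operator](asset[self.key], self.value)


def should_profile(asset: dict, deny_list: list, allow_list: list) -> bool:
    # Assumption: any Deny List match wins over the Allow List.
    if any(rule.matches(asset) for rule in deny_list):
        return False
    enabled_allow = [rule for rule in allow_list if rule.enabled]
    # Assumption: with no enabled Allow List rules, nothing else is excluded.
    if not enabled_allow:
        return True
    return any(rule.matches(asset) for rule in enabled_allow)


deny_list = [Rule("database", "starts with", "tmp_")]
allow_list = [
    Rule("name", "contains", "customer"),
    Rule("created", "greater than", date(2023, 1, 1)),
]

assets = [
    {"database": "sales", "name": "customer_orders", "owner": "etl", "created": date(2022, 5, 1)},
    {"database": "tmp_stage", "name": "customer_tmp", "owner": "etl", "created": date(2024, 2, 1)},
    {"database": "sales", "name": "inventory", "owner": "ops", "created": date(2021, 1, 1)},
]

for asset in assets:
    print(asset["database"], asset["name"], should_profile(asset, deny_list, allow_list))
```

In this sketch, rules within a list are combined with "any", mirroring the statement that an asset is skipped or included when it meets any criterion in the respective list. Disabling a rule removes it from evaluation without deleting it, which corresponds to the toggle described in the last step.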