Hive and Impala Lineage Configuration
Cloudera Manager Required Role: Configurator (or Cluster Administrator, or Full Administrator)
Navigator Metadata Server does not extract lineage data from Hive and Impala queries directly, as it does for other services running in the cluster (such as Pig). Instead, these two services write query data to log files collected in a specific directory on the cluster node. The Cloudera Manager Agent process running on that node monitors the directory and routinely sends the log files to the Navigator Metadata Server, where the query data is coalesced with other metadata collected by the system.
Lineage collection from Hive and Impala log files is enabled by default. Each service has its own Enable Lineage Collection property and some related configuration properties, which you can disable or reconfigure as detailed below.
Modifying Lineage Collection Settings for Hive
Property | Default | Description |
---|---|---|
Enable Lineage Collection | Enabled for Hive, Service-Wide | Enable collection of lineage from the service's roles. |
Hive Lineage Log Directory (lineage_event_log_dir) | /var/log/hive/lineage | Directory in which Hive lineage log files are written. |
Hive Maximum Lineage Log File Size (max_lineage_log_file_size) | 100 MiB | Maximum size (MiB, GiB) of Hive lineage log file before a new file is created. |
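The size-based limit in the table can be illustrated with a minimal sketch. This is not Hive's own rollover code: the helper names `parse_size` and `needs_new_file` are hypothetical, and treating the limit as "roll over once the file reaches the limit" is an assumption made for illustration.

```python
# Hypothetical helpers illustrating a size-based rollover check.
# They are not part of Hive or Cloudera Manager.
MIB = 1024 ** 2
GIB = 1024 ** 3


def parse_size(text):
    """Convert a string such as '100 MiB' or '1 GiB' to a byte count."""
    value, unit = text.split()
    factor = {"MiB": MIB, "GiB": GIB}[unit]
    return int(value) * factor


def needs_new_file(current_size_bytes, limit="100 MiB"):
    """Return True once the current log file has reached the limit.

    Assumption: rollover happens at or above the limit; the exact
    comparison Hive uses is not stated in this document.
    """
    return current_size_bytes >= parse_size(limit)
```

With the default of 100 MiB, a file is considered full at 104,857,600 bytes under this assumption.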
To disable Hive lineage collection:
- Log in to Cloudera Manager Admin Console.
- Select .
- Click the Configuration tab.
- Type lineage in the Search box.
- Clear the Enable Lineage Collection checkbox to disable lineage collection.
- Click Save Changes.
- Restart the Hive service.
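The same change could in principle be scripted against the Cloudera Manager REST API instead of the Admin Console. The sketch below only builds the request and does not send it. It assumes the service configuration endpoint `PUT /api/<version>/clusters/<cluster>/services/<service>/config` with an `items` list in the body. The host, API version, cluster name, service name, and especially the property name are placeholders: the API name of the Enable Lineage Collection property is not given in this document and must be looked up for your release.

```python
import json
import urllib.parse
import urllib.request


def build_config_update(base_url, api_version, cluster, service, prop_name, value):
    """Build (but do not send) a PUT request that sets one service property.

    Assumption: the endpoint layout and body shape follow the Cloudera
    Manager service configuration API; verify against your API version.
    """
    url = "{}/api/{}/clusters/{}/services/{}/config".format(
        base_url,
        api_version,
        urllib.parse.quote(cluster),
        urllib.parse.quote(service),
    )
    body = json.dumps({"items": [{"name": prop_name, "value": value}]})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# All argument values below are placeholders, not values from this document.
request = build_config_update(
    "http://cm-host.example.com:7180",
    "v19",
    "Cluster 1",
    "hive",
    "HYPOTHETICAL_LINEAGE_PROPERTY",
    "false",
)
```

Sending the request would additionally require authentication, and the service still needs a restart afterwards, as in the steps above.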
Modifying Lineage Collection Settings for Impala
Property | Default | Description |
---|---|---|
Enable Impala Lineage Generation (enable_lineage_log) | Enabled for the Impala daemon default group | When enabled, the Impala daemon process creates a log file containing lineage data and stores it in the directory specified by the Impala Daemon Lineage Log Directory property. |
Enable Lineage Collection | Enabled for Impala, Service-Wide | Enable collection of lineage from the service's roles. |
Impala Daemon Lineage Log Directory (lineage_event_log_dir) | /var/log/impalad/lineage | Directory in which Impala daemon lineage log files are written. When the Enable Impala Lineage Generation property is enabled, Impala generates its lineage logs in this directory. |
Impala Daemon Maximum Lineage Log File Size (max_lineage_log_file_size) | 5000 | Maximum number of Impala daemon lineage log file entries (queries) written to file before a new file is created. |
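Unlike the Hive limit, which is a file size, the Impala limit counts entries (queries). A minimal sketch of count-based rollover follows; `split_into_files` is a hypothetical helper for illustration, not Impala's implementation.

```python
def split_into_files(entries, max_entries=5000):
    """Group lineage entries into files of at most max_entries each.

    Hypothetical illustration of count-based rollover: a new file is
    started once the current one holds max_entries entries.
    """
    return [
        entries[start:start + max_entries]
        for start in range(0, len(entries), max_entries)
    ]


# Example: 12,000 queries with the default limit of 5000 entries per file.
files = split_into_files(list(range(12000)))
file_sizes = [len(f) for f in files]
```

Under this sketch, 12,000 queries end up in three files holding 5000, 5000, and 2000 entries.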
The Enable Lineage Collection property determines whether the Cloudera Manager Agent collects the lineage logs. The Enable Impala Lineage Generation property determines whether the Impala Daemon role writes to the lineage log. Turning off lineage for Impala therefore involves both properties.
To disable lineage collection for Impala queries:
- Log in to Cloudera Manager Admin Console.
- Select .
- Click the Configuration tab.
- Type lineage in the Search box.
- Clear the Enable Lineage Collection checkbox to disable lineage collection.
- Clear the Enable Impala Lineage Generation checkbox to stop the Impala daemon from writing lineage logs.
- Click Save Changes.
- Restart the Impala service.
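The interaction of the two Impala properties can be summarized in a small sketch. The function name and the wording of the returned descriptions are illustrative only; the behavior follows the property descriptions in the table above.

```python
def impala_lineage_outcome(generation_enabled, collection_enabled):
    """Describe what happens to Impala lineage data for a flag combination.

    generation_enabled: Enable Impala Lineage Generation (daemon writes logs)
    collection_enabled: Enable Lineage Collection (agent collects logs)
    """
    if not generation_enabled:
        # The Impala daemon writes no lineage log, so there is nothing
        # new for the Cloudera Manager Agent to pick up.
        return "no lineage log written"
    if not collection_enabled:
        return "lineage log written, not collected by the agent"
    return "lineage log written and collected by the agent"


# Enumerate all four combinations of the two properties.
outcomes = {
    (gen, coll): impala_lineage_outcome(gen, coll)
    for gen in (True, False)
    for coll in (True, False)
}
```

Only the combination with both properties enabled results in Impala lineage reaching Navigator Metadata Server.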
Configuring Hive on Spark and Impala Daemon Lineage Logs
When you change a lineage log directory, move the existing lineage logs to the new directory as follows:
- Stop the affected service.
- Copy the lineage log files and (for Impala only) the impalad_lineage_wal file from the old log directory to the new log directory. Do this on the HiveServer2 host and on every host where an Impala Daemon role is running.
- Start the service.
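The copy step above can be sketched as a small script to run on each affected host while the service is stopped. The function name is hypothetical, and the sketch assumes that every regular file in the old directory (including impalad_lineage_wal for Impala) should be copied. It does not adjust file ownership, which may need to be set separately so the service user can write to the new directory.

```python
import shutil
from pathlib import Path


def copy_lineage_logs(old_dir, new_dir):
    """Copy every regular file from old_dir to new_dir.

    Returns the sorted list of copied file names. Subdirectories are
    skipped. shutil.copy2 preserves timestamps and permission bits,
    but not file ownership.
    """
    old_dir, new_dir = Path(old_dir), Path(new_dir)
    new_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for item in sorted(old_dir.iterdir()):
        if item.is_file():
            shutil.copy2(item, new_dir / item.name)
            copied.append(item.name)
    return copied


# Example call (the new path is a placeholder, not a documented value):
# copy_lineage_logs("/var/log/impalad/lineage", "/data/logs/impalad/lineage")
```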
To edit lineage log properties:
- Go to the service.
- Click the Configuration tab.
- Type lineage in the Search box.
- Edit the lineage log properties.
- Click Save Changes.
- Restart the service.