Configuring Extraction for Altus Clusters on AWS

Follow the steps below to configure Cloudera Navigator to extract metadata and lineage from single-user transient clusters deployed to Amazon Web Services using Cloudera Altus. The Cloudera Navigator extraction process for clusters launched by Cloudera Altus works as follows:
  • Any HDFS paths in a job, query, or data entity are extracted as proxy entities for the path, similar to how Hive entities are extracted. In other words, HDFS is not bulk extracted from an Altus cluster.
  • Hive Metastore (HMS) entities are also not bulk extracted. Cloudera Navigator extracts Hive entities used in queries that generate lineage, such as databases, tables, and so on.

Continue reading:

  • Requirements
  • Obtaining AWS Credentials for the Amazon S3 Bucket
  • Cloudera Altus Configuration
  • Cloudera Navigator Configuration

Requirements

Cloudera Navigator collects metadata and lineage entities from transient clusters deployed to AWS by Cloudera Altus users. The metadata and lineage data is not collected directly from the transient clusters but rather from an Amazon S3 bucket that serves as the storage mechanism for the Telemetry Publisher running in the cluster (see How it Works: Background to the Setup Tasks for details).

As detailed in the steps below, successful integration of Cloudera Navigator and Cloudera Altus clusters depends on correctly:
  • Identifying an Amazon S3 bucket for storing metadata.
  • Configuring access permissions to that S3 bucket from both Cloudera Altus and Cloudera Navigator. Transient clusters instantiated by Altus users must have read and write permissions to the Amazon S3 bucket used by Telemetry Publisher, while the on-premises centralized Cloudera Navigator instance must have read permissions on the same bucket (see the policy sketch after this list).
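
The two permission levels can be expressed as IAM policy documents. The following is a minimal sketch only, not the exact policies used by Cloudera; the bucket name, user name, and policy name are hypothetical placeholders, and how you attach the policies (to an instance profile role on the Altus side, to a user or role on the Navigator side) depends on how your AWS account is organized.

```python
import json
import boto3

TELEMETRY_BUCKET = "your-telemetry-bucket"  # hypothetical; use the bucket from your Altus environment

# Read/write statements for the role that Altus transient clusters
# (and the Telemetry Publisher data they generate) use when writing to the bucket.
altus_read_write_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["s3:ListBucket"],
         "Resource": f"arn:aws:s3:::{TELEMETRY_BUCKET}"},
        {"Effect": "Allow", "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
         "Resource": f"arn:aws:s3:::{TELEMETRY_BUCKET}/*"},
    ],
}

# Read-only statements for the on-premises Cloudera Navigator instance,
# which only needs to read what has been written to the bucket.
navigator_read_only_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["s3:ListBucket"],
         "Resource": f"arn:aws:s3:::{TELEMETRY_BUCKET}"},
        {"Effect": "Allow", "Action": ["s3:GetObject"],
         "Resource": f"arn:aws:s3:::{TELEMETRY_BUCKET}/*"},
    ],
}

# Example: attach the read-only policy to the IAM user whose credentials
# Cloudera Navigator will use. The user and policy names are hypothetical.
iam = boto3.client("iam")
iam.put_user_policy(
    UserName="navigator-s3-reader",
    PolicyName="navigator-telemetry-read",
    PolicyDocument=json.dumps(navigator_read_only_policy),
)
```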



In addition, the integration requires the following:
  • An Altus user account with cross-account privileges to the AWS account, including privileges to launch EC2 clusters and use other AWS resources, such as the Amazon S3 buckets identified in the Altus environment (data input, data output, logs, and the S3 bucket for Telemetry Publisher).
  • Read and write privileges to the Amazon S3 bucket configured in the Altus environment, assigned to the Altus user.
  • The AWS access key ID and AWS secret key for the AWS account associated with the Amazon S3 bucket.

The steps below assume that you have:
  • An Amazon Web Services account.
  • A Cloudera Altus account.
  • A Cloudera Altus user account that can run jobs on transient clusters deployed to AWS.
  • Access to the on-premises or persistent Cloudera Manager cluster running Cloudera Navigator. The Cloudera Manager user role of Full Administrator and the ability to log in to the Cloudera Manager Admin Console are required.
  • AWS Credentials for the AWS account hosting the Amazon S3 bucket that serves as the storage mechanism for metadata and lineage data from clusters on AWS launched by Cloudera Altus.

Obtaining AWS Credentials for the Amazon S3 Bucket

AWS Credentials are available for download whenever you create an IAM user account through the AWS Management Console. If you are configuring an existing Amazon S3 bucket and do not have the AWS Credentials for it, you can generate new AWS Credentials from the AWS account using either the AWS Management Console or the AWS CLI.

Important: The AWS credentials must have read and write access to the Amazon S3 bucket.

Generating new AWS Credentials deactivates any previously issued credentials and makes the newly generated credentials Active for the AWS account. Keep that in mind if you obtain new AWS Credentials to use for the Cloudera Navigator-Cloudera Altus integration.

Note: If you already have the AWS Credentials that were obtained when the account was created, do not generate a new set unless you want to change the credentials.

These steps assume you have an AWS account and that an Amazon S3 bucket exists on that account that you want to use as the storage location for metadata and lineage.

  1. Log in to the AWS Management Console using the account associated with the Amazon S3 bucket.
  2. Navigate to the Security credentials section of the Users page in IAM for this account.
  3. Click the Create access key button to generate new AWS Credentials. Copy the credentials (the Access Key ID and Secret Key) from the user interface, or download the credentials.csv file for later use.

You can also create new credentials using the AWS CLI rather than the AWS Management Console. See the Amazon documentation for details.
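If you prefer to script credential creation, the same operation is available through the AWS SDK. A minimal sketch using boto3, assuming an existing IAM user named altus-telemetry (a hypothetical name used here only for illustration):

    import boto3

    # Hypothetical IAM user; use the IAM user that will own access to the S3 bucket.
    iam = boto3.client("iam")
    response = iam.create_access_key(UserName="altus-telemetry")

    access_key = response["AccessKey"]
    # Record both values securely; the secret access key is returned only once.
    print("Access Key ID: ", access_key["AccessKeyId"])
    print("Secret Key:    ", access_key["SecretAccessKey"])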

Cloudera Altus Configuration

Cloudera Altus instantiates single-user transient clusters focused on data engineering workloads that use compute services such as Hive or MapReduce2. In the typical deployment scenario, scripts invoke the Cloudera Altus CLI to instantiate the cluster on Amazon Web Services according to the details specified in the Altus environment. An Altus environment specifies all resources needed by the cluster, including the AWS account that will be used to instantiate it. The Cloudera Altus user account is configured to provide cross-account access to the AWS account that has permissions to launch AWS Elastic Compute Cloud (EC2) instances and use other AWS resources, including Amazon S3 buckets.

Although the Altus Environment can be created using Quickstart, Cloudera recommends using the Environment Wizard or the Altus CLI instead. The wizard provides better control over configuring resources, including letting you specify the Amazon S3 bucket that clusters will use to store metadata and lineage information for collection by Cloudera Navigator. Specifically, the Instance Profile Role page of the Configuration Wizard lets you enable integration with Cloudera Navigator and specify the Amazon S3 bucket that will hold collected metadata and lineage information.

On the Instance Profile Role page of the Configuration Wizard, complete the following steps:
  1. Click the Enable checkbox for Cloudera Navigator Integration.
  2. In the Cloudera Navigator S3 Data Bucket field, enter the path to the Amazon S3 bucket, including the final /, which identifies the target as an S3 bucket. For example:
    s3a://cluster-lab.example.com/cust-input/
To provide the correct access to the S3 bucket, you must also create the appropriate policy in the AWS Management Console and apply the policy to the Amazon S3 bucket.
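The exact policy depends on how your AWS account, Altus environment, and instance profile roles are laid out, so the following is only a sketch of the general shape: it grants read and write access on the bucket to a placeholder role assumed by the Altus-launched clusters, and read-only access to a placeholder IAM user whose keys are given to Cloudera Navigator. All ARNs and names here are hypothetical; the same policy document can equally be pasted into the bucket's permissions in the AWS Management Console.

    import json
    import boto3

    BUCKET = "cluster-lab.example.com"   # example bucket from the Altus environment

    # Placeholder principals -- replace with the instance profile role used by the
    # Altus clusters and the IAM user whose credentials Cloudera Navigator will use.
    CLUSTER_ROLE_ARN = "arn:aws:iam::111111111111:role/altus-instance-profile"
    NAVIGATOR_USER_ARN = "arn:aws:iam::111111111111:user/navigator-extractor"

    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {   # Altus clusters: list the bucket, read and write objects.
                "Effect": "Allow",
                "Principal": {"AWS": CLUSTER_ROLE_ARN},
                "Action": ["s3:ListBucket", "s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": ["arn:aws:s3:::" + BUCKET, "arn:aws:s3:::" + BUCKET + "/*"],
            },
            {   # Cloudera Navigator: read-only access for metadata and lineage extraction.
                "Effect": "Allow",
                "Principal": {"AWS": NAVIGATOR_USER_ARN},
                "Action": ["s3:ListBucket", "s3:GetObject"],
                "Resource": ["arn:aws:s3:::" + BUCKET, "arn:aws:s3:::" + BUCKET + "/*"],
            },
        ],
    }

    # Assumes AWS administrator credentials are available in the environment.
    boto3.client("s3").put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))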

For more information about using Cloudera Altus, see the Cloudera Altus documentation.

Cloudera Navigator Configuration

Cloudera Navigator runs in the context of the Cloudera Manager Server. Its two role instances, the Navigator Audit Server and the Navigator Metadata Server, run on the Cloudera Management Service. The Navigator Metadata Server role instance is the component that extracts metadata and lineage from the Amazon S3 bucket, using the AWS Credentials configured for connectivity in the steps below:
  • Follow the steps in Adding AWS Credentials and Configuring Connectivity to add new or regenerated AWS Credentials to the Cloudera Manager Server and then configure connectivity.
  • Follow the steps in Configuring Connectivity for AWS Credentials to configure connectivity for AWS Credentials that are already available to be used for the Amazon S3 bucket but have not yet been configured for connectivity.
Important: Cloudera Navigator extracts metadata and lineage for clusters deployed using Altus from one Amazon S3 bucket only. In addition, for any given Amazon S3 bucket collecting metadata and lineage from Altus clusters, configure only one Cloudera Navigator instance to extract from that Amazon S3 bucket. Using multiple Cloudera Navigator instances to extract from the same Amazon S3 bucket is not supported and has unpredictable results.

Adding AWS Credentials and Configuring Connectivity

Cloudera Manager Required Role: Full Administrator

The AWS Credentials must be added to the Cloudera Manager Server for use by Cloudera Navigator. These credentials must be from the AWS account hosting the Amazon S3 bucket that is configured in the Altus environment.
Note: The AWS account associated with these credentials must have cross-account access permissions from the Altus user account that will launch clusters on AWS and run jobs. These credentials must also have read and write permissions on the S3 bucket because the clusters launched must be able to write metadata and lineage information to the Amazon S3 bucket as jobs run.
  1. Log in to the Cloudera Manager Admin Console.
  2. Select Administration > AWS Credentials.
  3. Click the Add Access Key Credentials button on the AWS Credentials page.
    1. Enter a meaningful name for the AWS Credential, such as the type of jobs the associated clusters will run (for example, etl-processing). This name is for your own information and is not checked against any Cloudera Altus or AWS attributes.
    2. Enter the AWS Access Key ID and the AWS Secret Key.

  4. Click Add to save the credentials. The S3Guard option page displays, reflecting the credential name (for example, Edit S3Guard: etl-processing). Disregard this option.
  5. Click Save. The Connect to Amazon Web Services page displays, showing the options available for this specific AWS credential.

  6. Click Enable Metadata and Lineage Extraction from Cloudera Altus. The Metadata and Lineage Extraction Configuration setting page displays a field for specifying the Amazon S3 bucket name.
  7. Enter the name of the Amazon S3 bucket that is configured in the Altus environment.
  8. Click OK. The AWS Credentials page re-displays, and the newly added AWS Credential is listed with any other AWS Credentials held by the Cloudera Manager Server.
  9. Restart the Cloudera Management Service.
After the Cloudera Management Service restarts, Cloudera Navigator uses the AWS credentials to authenticate to the AWS account and extract metadata and lineage stored in the specified S3 bucket.
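Adding the access key credential (steps 3 and 4) can also be scripted against the Cloudera Manager API's external accounts resource. The sketch below is an assumption-heavy illustration, not a documented recipe: verify the API version, the AWS_ACCESS_KEY_AUTH type name, and the aws_access_key/aws_secret_key property names against the supported-types listing returned by your Cloudera Manager release, and note that the connectivity settings in steps 5 through 8 (and the restart in step 9) still need to be completed afterward.

    import requests

    CM = "https://cm.example.com:7183"   # Cloudera Manager Server (example URL)
    API = CM + "/api/v18"                # verify the API version for your release
    AUTH = ("admin", "admin")            # an account with the Full Administrator role

    # Assumed type and property names -- confirm with, for example:
    #   GET <API>/externalAccounts/supportedTypes/AWS
    account = {
        "name": "etl-processing",
        "displayName": "etl-processing",
        "typeName": "AWS_ACCESS_KEY_AUTH",
        "accountConfigs": {
            "items": [
                {"name": "aws_access_key", "value": "AKIA...EXAMPLE"},
                {"name": "aws_secret_key", "value": "example-secret-key"},
            ]
        },
    }

    resp = requests.post(API + "/externalAccounts/create", json=account, auth=AUTH, verify=False)
    resp.raise_for_status()
    print(resp.json())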

Configuring Connectivity for AWS Credentials

If AWS Credentials that provide access to the Amazon S3 bucket have already been added to Cloudera Manager but have not yet been configured for connectivity, configure them as follows:
  1. Log in to the Cloudera Manager Admin Console.
  2. Select Administration > AWS Credentials.
  3. Find the available AWS Credentials that provide access to the Amazon S3 bucket used to collect metadata and lineage from transient clusters.

  4. Click the Actions drop-down menu and select Edit Connectivity. The Connect to Amazon Web Services page displays the three sections of possible configurations.
  5. In the Cloudera Navigator section, click the Enable Metadata and Lineage Extraction from Cloudera Altus link. The Metadata and Lineage Extraction Configuration page displays.
  6. Enter the name of the Amazon S3 bucket in the S3 Bucket Name field.
  7. Click OK.
  8. Restart the Cloudera Management Service.

This completes the setup process. After the restart, metadata and lineage for transient clusters deployed using Cloudera Altus should be available in the Cloudera Navigator console.

Note: If metadata and lineage do not display in the Cloudera Navigator console after completing the configuration and restarting the system, see Troubleshooting to identify possible issues.
Technical metadata specific to clusters deployed using Altus includes the following property names and values:
  • Cluster (Source Type)
  • Cluster-name (Cluster Group)
  • Transient (Deployment Type)
  • Cluster Template, Cluster Instance (Classname)
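These properties can also be queried through the Navigator Metadata Server REST API. The sketch below makes several assumptions: the API version, the Navigator Metadata Server port (7187 by default), and the exact search field name for the Deployment Type property (written here as deploymentType) should all be checked against the Search Syntax and Properties reference for your release.

    import requests

    NAVIGATOR = "http://navigator.example.com:7187"   # Navigator Metadata Server (example host)
    API = NAVIGATOR + "/api/v12"                      # verify the API version for your release
    AUTH = ("admin", "admin")

    # Assumed search field name for the "Deployment Type" technical metadata property.
    params = {"query": "deploymentType:Transient", "limit": 10, "offset": 0}

    resp = requests.get(API + "/entities", params=params, auth=AUTH)
    resp.raise_for_status()
    for entity in resp.json():
        print(entity.get("originalName"), entity.get("type"))
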
See Search Syntax and Properties and Cloudera Navigator Metadata for more information.
