Best Practices for Using Apache Hive in CDH

Hive data warehouse software enables reading, writing, and managing large datasets in distributed storage. Queries written in the Hive query language (HiveQL), which is very similar to SQL, are converted into a series of jobs that execute on a Hadoop cluster through MapReduce or Apache Spark.
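As a rough illustration of what "converted into a series of jobs" means, here is a conceptual model of how a simple aggregation such as `SELECT page, COUNT(*) FROM t GROUP BY page` maps onto the map, shuffle, and reduce stages of one MapReduce job. This is plain Python, not Hive's query planner. The table `t`, its columns, and all function names are hypothetical. Real Hive plans can also add optimizations such as map-side partial aggregation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical rows of a table t(page STRING, user STRING).
Row = Tuple[str, str]


def map_phase(rows: Iterable[Row]) -> List[Tuple[str, int]]:
    # Each mapper emits (group key, 1) for every row it reads.
    return [(page, 1) for page, _user in rows]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # The framework collects all values that share a key so one reducer sees them together.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    # Each reducer sums the values for its key, producing COUNT(*) per group.
    return {key: sum(values) for key, values in groups.items()}


def group_by_count(rows: Iterable[Row]) -> Dict[str, int]:
    # Conceptual equivalent of: SELECT page, COUNT(*) FROM t GROUP BY page;
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    sample = [("home", "alice"), ("docs", "bob"), ("home", "carol")]
    print(group_by_count(sample))  # {'home': 2, 'docs': 1}
```

More complex HiveQL statements (joins, multiple aggregations, ORDER BY) typically compile into several such stages chained together, which is why the text speaks of a series of jobs.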

Users can run batch-processing workloads with Hive and, on the same platform, analyze the same data with interactive SQL or machine-learning tools such as Apache Impala or Apache Spark.

As part of CDH, Hive also benefits from:

Unified resource management provided by YARN
Simplified deployment and administration provided by Cloudera Manager
Shared security and governance for meeting compliance requirements, provided by Apache Sentry and Cloudera Navigator

Continue reading:

Installation and Upgrade
Configuring
Using & Managing
Tuning
Data Replication
Security
Troubleshooting
