Using Azure Data Lake Store with HBase
CDH 5.12 and higher support using Azure Data Lake Store (ADLS) as a storage layer for HBase.
There are two scenarios in which ADLS can be used with HBase:
- ADLS-only: In this scenario, both HFiles, which contain user data, and write-ahead logs (WALs) are written to ADLS. This configuration is not recommended for use cases that demand high performance.
- ADLS + HDFS: In this scenario, HFiles are written to ADLS, but WALs are written to HDFS. This configuration provides higher throughput and lower latency for writes than does the ADLS-only configuration.
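What separates the two scenarios is where `hbase.rootdir` (HFiles) and `hbase.wal.dir` (WALs) point. The helper below is a minimal sketch that classifies a configuration by the URI scheme of each directory. `classify_scenario` is a hypothetical name used only for illustration. The sketch assumes the usual HBase behavior: when `hbase.wal.dir` is not set, WALs are written under `hbase.rootdir`.

```python
from typing import Optional
from urllib.parse import urlparse


def classify_scenario(rootdir: str, wal_dir: Optional[str] = None) -> str:
    """Hypothetical helper: report which ADLS scenario two HBase directory URIs describe.

    HFiles live under rootdir. WALs live under wal_dir when it is set;
    otherwise (assumed default HBase behavior) they also live under rootdir.
    """
    hfile_scheme = urlparse(rootdir).scheme
    # Fall back to the rootdir scheme when no separate WAL directory is configured.
    wal_scheme = urlparse(wal_dir).scheme if wal_dir else hfile_scheme
    if hfile_scheme != "adl":
        return "not ADLS-backed"
    if wal_scheme == "adl":
        return "ADLS-only"
    if wal_scheme == "hdfs":
        return "ADLS + HDFS"
    return "unsupported WAL location"


if __name__ == "__main__":
    print(classify_scenario("adl://myaccount.azuredatalakestore.net/hbase",
                            "hdfs://namenode.example.com:8020/hbase-wal"))
```

The account name, NameNode host, and directory names in the usage line are placeholders.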
Configuring HBase to Use ADLS as a Storage Layer
- Set up credentials to enable communication between HBase and ADLS. See Configuring ADLS Connectivity and use one of the configuration methods listed there that HBase supports.
- In the Cloudera Manager Admin Console, select the HBase service, click the Configuration tab, and locate the HBase Service Advanced Configuration Snippet (Safety Valve) for hbase-site.xml.
- Depending on which scenario you plan to use, add the following values for the Name and Value fields:
  - ADLS-only:
    - Name: hbase.rootdir
      Value: adl://<adls_account_name>.azuredatalakestore.net/<hbase_directory>
  - ADLS + HDFS:
    - Name: hbase.rootdir
      Value: adl://<adls_account_name>.azuredatalakestore.net/<hbase_directory>
    - Name: hbase.wal.dir
      Value: hdfs://<name_node>:8020/<hbase_wal_directory>
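You may prefer to paste XML into the safety valve's XML view rather than filling in the Name and Value fields. The following is a minimal sketch, assuming that case, which builds the hbase-site.xml pairs for either scenario and renders them as `<property>` elements. The function names and parameters are hypothetical. Port 8020 is hard-coded only because it matches the example value for `hbase.wal.dir`; your NameNode may listen on a different port.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def hbase_site_properties(adls_account: str, hbase_dir: str,
                          name_node: Optional[str] = None,
                          wal_dir: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical helper: hbase-site.xml name/value pairs for either scenario.

    Omit name_node and wal_dir for ADLS-only; pass both for ADLS + HDFS.
    """
    if (name_node is None) != (wal_dir is None):
        raise ValueError("name_node and wal_dir must be given together")
    props = {
        # HFiles always go to ADLS in both scenarios.
        "hbase.rootdir": f"adl://{adls_account}.azuredatalakestore.net/{hbase_dir}",
    }
    if name_node is not None:
        # ADLS + HDFS: send WALs to HDFS (port 8020 taken from the example value).
        props["hbase.wal.dir"] = f"hdfs://{name_node}:8020/{wal_dir}"
    return props


def to_safety_valve_xml(props: Dict[str, str]) -> str:
    """Render pairs as <property> elements, one per line, without a <configuration> wrapper."""
    parts = []
    for name, value in props.items():
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
        parts.append(ET.tostring(prop, encoding="unicode"))
    return "\n".join(parts)


if __name__ == "__main__":
    print(to_safety_valve_xml(
        hbase_site_properties("myaccount", "hbase", "namenode.example.com", "hbase-wal")))
```

Raising an error when only one of `name_node` and `wal_dir` is supplied keeps the helper from silently producing an ADLS-only configuration when you intended ADLS + HDFS.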
- Still on the Configuration page for the HBase service, locate the HBase Service Advanced Configuration Snippet (Safety Valve) for core-site.xml and add the following Name and Value pairs for both scenarios (ADLS-only and ADLS + HDFS):
  - Name: fs.defaultFS
    Value: adl://<adls_account_name>.azuredatalakestore.net/
  - Name: adl.debug.override.localuserasfileowner
    Value: true
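The core-site.xml values use the same `<adls_account_name>` placeholder as `hbase.rootdir`. The sketch below builds the two core-site.xml pairs and checks that `fs.defaultFS` and `hbase.rootdir` name the same ADLS account. Both function names are hypothetical. Treating a mismatch as a misconfiguration is an assumption drawn from the shared placeholder, not a documented requirement.

```python
from typing import Dict
from urllib.parse import urlparse


def core_site_properties(adls_account: str) -> Dict[str, str]:
    """Hypothetical helper: core-site.xml name/value pairs (same for both scenarios)."""
    return {
        "fs.defaultFS": f"adl://{adls_account}.azuredatalakestore.net/",
        # Safety-valve values are strings, so the boolean is written as "true".
        "adl.debug.override.localuserasfileowner": "true",
    }


def same_adls_account(default_fs: str, rootdir: str) -> bool:
    """True if both URIs use the adl scheme and the same account host."""
    a, b = urlparse(default_fs), urlparse(rootdir)
    return a.scheme == b.scheme == "adl" and a.netloc == b.netloc


if __name__ == "__main__":
    core = core_site_properties("myaccount")
    print(same_adls_account(core["fs.defaultFS"],
                            "adl://myaccount.azuredatalakestore.net/hbase"))
```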