Using HBase Command-Line Utilities
Besides the HBase Shell, HBase includes several other command-line utilities, which are available in the hbase/bin/ directory of each HBase host. This topic provides basic usage instructions for the most commonly used utilities.
PerformanceEvaluation
The PerformanceEvaluation utility allows you to run several preconfigured tests on your cluster and reports its performance. To run the PerformanceEvaluation tool in CDH 5.1 and higher, use the bin/hbase pe command. In CDH 5.0 and lower, use the command bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation.
For usage instructions, run the command with no arguments. The following output shows the usage instructions for the PerformanceEvaluation tool in CDH
5.7. Options and commands available depend on the CDH version.
$ hbase pe Usage: java org.apache.hadoop.hbase.PerformanceEvaluation \ <OPTIONS> [-D<property=value>]* <command> <nclients> Options: nomapred Run multiple clients using threads (rather than use mapreduce) rows Rows each client runs. Default: One million size Total size in GiB. Mutually exclusive with --rows. Default: 1.0. sampleRate Execute test on a sample of total rows. Only supported by randomRead. Default: 1.0 traceRate Enable HTrace spans. Initiate tracing every N rows. Default: 0 table Alternate table name. Default: 'TestTable' multiGet If >0, when doing RandomRead, perform multiple gets instead of single gets. Default: 0 compress Compression type to use (GZ, LZO, ...). Default: 'NONE' flushCommits Used to determine if the test should flush the table. Default: false writeToWAL Set writeToWAL on puts. Default: True autoFlush Set autoFlush on htable. Default: False oneCon all the threads share the same connection. Default: False presplit Create presplit table. Recommended for accurate perf analysis (see guide). Default: disabled inmemory Tries to keep the HFiles of the CF inmemory as far as possible. Not guaranteed that reads are always served from memory. Default: false usetags Writes tags along with KVs. Use with HFile V3. Default: false numoftags Specify the no of tags that would be needed. This works only if usetags is true. filterAll Helps to filter out all the rows on the server side there by not returning anything back to the client. Helps to check the server side performance. Uses FilterAllFilter internally. latency Set to report operation latencies. Default: False bloomFilter Bloom filter type, one of [NONE, ROW, ROWCOL] valueSize Pass value size to use: Default: 1024 valueRandom Set if we should vary value size between 0 and 'valueSize'; set on read for stats on size: Default: Not set. valueZipf Set if we should vary value size between 0 and 'valueSize' in zipf form: Default: Not set. period Report every 'period' rows: Default: opts.perClientRunRows / 10 multiGet Batch gets together into groups of N. Only supported by randomRead. Default: disabled addColumns Adds columns to scans/gets explicitly. Default: true replicas Enable region replica testing. Defaults: 1. splitPolicy Specify a custom RegionSplitPolicy for the table. randomSleep Do a random sleep before each get between 0 and entered value. Defaults: 0 columns Columns to write per row. Default: 1 caching Scan caching to use. Default: 30 Note: -D properties will be applied to the conf used. For example: -Dmapreduce.output.fileoutputformat.compress=true -Dmapreduce.task.timeout=60000 Command: append Append on each row; clients overlap on keyspace so some concurrent operations checkAndDelete CheckAndDelete on each row; clients overlap on keyspace so some concurrent operations checkAndMutate CheckAndMutate on each row; clients overlap on keyspace so some concurrent operations checkAndPut CheckAndPut on each row; clients overlap on keyspace so some concurrent operations filterScan Run scan test using a filter to find a specific row based on it's value (make sure to use --rows=20) increment Increment on each row; clients overlap on keyspace so some concurrent operations randomRead Run random read test randomSeekScan Run random seek and scan 100 test randomWrite Run random write test scan Run scan test (read every row) scanRange10 Run random seek scan with both start and stop row (max 10 rows) scanRange100 Run random seek scan with both start and stop row (max 100 rows) scanRange1000 Run random seek scan with both start and stop row (max 1000 rows) scanRange10000 Run random seek scan with both start and stop row (max 10000 rows) sequentialRead Run sequential read test sequentialWrite Run sequential write test Args: nclients Integer. Required. Total number of clients (and HRegionServers) running: 1 <= value <= 500 Examples: To run a single client doing the default 1M sequentialWrites: $ bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1 To run 10 clients doing increments over ten rows: $ bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation --rows=10 --nomapred increment 10
LoadTestTool
The LoadTestTool utility load-tests your cluster by performing writes, updates, or reads on it. To run the LoadTestTool in
CDH 5.1 and higher, use the bin/hbase ltt command . In CDH 5.0 and lower, use the command bin/hbase
org.apache.hadoop.hbase.util.LoadTestTool. To print general usage information, use the -h option. Options and commands available depend on the CDH version.
$ bin/hbase ltt -h Options: -batchupdate Whether to use batch as opposed to separate updates for every column in a row -bloom <arg> Bloom filter type, one of [NONE, ROW, ROWCOL] -compression <arg> Compression type, one of [LZO, GZ, NONE, SNAPPY, LZ4] -data_block_encoding <arg> Encoding algorithm (e.g. prefix compression) to use for data blocks in the test column family, one of [NONE, PREFIX, DIFF, FAST_DIFF, PREFIX_TREE]. -deferredlogflush Enable deferred log flush. -encryption <arg> Enables transparent encryption on the test table, one of [AES] -families <arg> The name of the column families to use separated by comma -generator <arg> The class which generates load for the tool. Any args for this class can be passed as colon separated after class name -h,--help Show usage -in_memory Tries to keep the HFiles of the CF inmemory as far as possible. Not guaranteed that reads are always served from inmemory -init_only Initialize the test table only, don't do any loading -key_window <arg> The 'key window' to maintain between reads and writes for concurrent write/read workload. The default is 0. -max_read_errors <arg> The maximum number of read errors to tolerate before terminating all reader threads. The default is 10. -mob_threshold <arg> Desired cell size to exceed in bytes that will use the MOB write path -multiget_batchsize <arg> Whether to use multi-gets as opposed to separate gets for every column in a row -multiput Whether to use multi-puts as opposed to separate puts for every column in a row -num_keys <arg> The number of keys to read/write -num_regions_per_server <arg> Desired number of regions per region server. Defaults to 5. -num_tables <arg> A positive integer number. When a number n is speicfied, load test tool will load n table parallely. -tn parameter value becomes table name prefix. Each table name is in format <tn>_1...<tn>_n -read <arg> <verify_percent>[:<#threads=20>] -reader <arg> The class for executing the read requests -region_replica_id <arg> Region replica id to do the reads from -region_replication <arg> Desired number of replicas per region -regions_per_server <arg> A positive integer number. When a number n is specified, load test tool will create the test table with n regions per server -skip_init Skip the initialization; assume test table already exists -start_key <arg> The first key to read/write (a 0-based index). The default value is 0. -tn <arg> The name of the table to read or write -update <arg> <update_percent>[:<#threads=20>][:<#whether to ignore nonce collisions=0>] -updater <arg> The class for executing the update requests -write <arg> <avg_cols_per_key>:<avg_data_size>[:<#threads=20>] -writer <arg> The class for executing the write requests -zk <arg> ZK quorum as comma-separated host names without port numbers -zk_root <arg> name of parent znode in zookeeper
wal
The wal utility prints information about the contents of a specified WAL file. To get a list of all WAL files, use the HDFS command hadoop fs -ls -R /hbase/WALs. To run the wal utility, use the bin/hbase wal command. Run it without options to get
usage information.
hbase wal usage: WAL <filename...> [-h] [-j] [-p] [-r <arg>] [-s <arg>] [-w <arg>] -h,--help Output help message -j,--json Output JSON -p,--printvals Print values -r,--region <arg> Region to filter by. Pass encoded region name; e.g. '9192caead6a5a20acb4454ffbc79fa14' -s,--sequence <arg> Sequence to filter by. Pass sequence number. -w,--row <arg> Row to filter by. Pass row name.
hfile
The hfile utility prints diagnostic information about a specified hfile, such as block headers or statistics. To get a
list of all hfiles, use the HDFS command hadoop fs -ls -R /hbase/data. To run the hfile utility, use the
bin/hbase hfile command. Run it without options to get usage information.
$ hbase hfile usage: HFile [-a] [-b] [-e] [-f <arg> | -r <arg>] [-h] [-i] [-k] [-m] [-p] [-s] [-v] [-w <arg>] -a,--checkfamily Enable family check -b,--printblocks Print block index meta data -e,--printkey Print keys -f,--file <arg> File to scan. Pass full-path; e.g. hdfs://a:9000/hbase/hbase:meta/12/34 -h,--printblockheaders Print block headers for each block. -i,--checkMobIntegrity Print all cells whose mob files are missing -k,--checkrow Enable row order check; looks for out-of-order keys -m,--printmeta Print meta data of file -p,--printkv Print key/value pairs -r,--region <arg> Region to scan. Pass region name; e.g. 'hbase:meta,,1' -s,--stats Print statistics -v,--verbose Verbose output; emits file and meta data delimiters -w,--seekToRow <arg> Seek to this row and print all the kvs for this row only
hbck
The hbck utility checks and optionally repairs errors in HFiles.
To run hbck, use the bin/hbase hbck command. Run it with the -h option to get more usage information.
$ bin/hbase hbck -h Usage: fsck [opts] {only tables} where [opts] are: -help Display help options (this) -details Display full report of all regions. -timelag <timeInSeconds> Process only regions that have not experienced any metadata updates in the last <timeInSeconds> seconds. -sleepBeforeRerun <timeInSeconds> Sleep this many seconds before checking if the fix worked if run with -fix -summary Print only summary of the tables and status. -metaonly Only check the state of the hbase:meta table. -sidelineDir <hdfs://> HDFS path to backup existing meta. -boundaries Verify that regions boundaries are the same between META and store files. -exclusive Abort if another hbck is exclusive or fixing. -disableBalancer Disable the load balancer. Metadata Repair options: (expert features, use with caution!) -fix Try to fix region assignments. This is for backwards compatiblity -fixAssignments Try to fix region assignments. Replaces the old -fix -fixMeta Try to fix meta problems. This assumes HDFS region info is good. -noHdfsChecking Don't load/check region info from HDFS. Assumes hbase:meta region info is good. Won't check/fix any HDFS issue, e.g. hole, orphan, or overlap -fixHdfsHoles Try to fix region holes in hdfs. -fixHdfsOrphans Try to fix region dirs with no .regioninfo file in hdfs -fixTableOrphans Try to fix table dirs with no .tableinfo file in hdfs (online mode only) -fixHdfsOverlaps Try to fix region overlaps in hdfs. -fixVersionFile Try to fix missing hbase.version file in hdfs. -maxMerge <n> When fixing region overlaps, allow at most <n> regions to merge. (n=5 by default) -sidelineBigOverlaps When fixing region overlaps, allow to sideline big overlaps -maxOverlapsToSideline <n> When fixing region overlaps, allow at most <n> regions to sideline per group. (n=2 by default) -fixSplitParents Try to force offline split parents to be online. -ignorePreCheckPermission ignore filesystem permission pre-check -fixReferenceFiles Try to offline lingering reference store files -fixEmptyMetaCells Try to fix hbase:meta entries not referencing any region (empty REGIONINFO_QUALIFIER rows) Datafile Repair options: (expert features, use with caution!) -checkCorruptHFiles Check all Hfiles by opening them to make sure they are valid -sidelineCorruptHFiles Quarantine corrupted HFiles. implies -checkCorruptHFiles Metadata Repair shortcuts -repair Shortcut for -fixAssignments -fixMeta -fixHdfsHoles -fixHdfsOrphans -fixHdfsOverlaps -fixVersionFile -sidelineBigOverlaps -fixReferenceFiles -fixTableLocks -fixOrphanedTableZnodes -repairHoles Shortcut for -fixAssignments -fixMeta -fixHdfsHoles Table lock options -fixTableLocks Deletes table locks held for a long time (hbase.table.lock.expire.ms, 10min by default) Table Znode options -fixOrphanedTableZnodes Set table state in ZNode to disabled if table does not exists Replication options -fixReplication Deletes replication queues for removed peers
clean
After you have finished using a test or proof-of-concept cluster, the hbase clean utility can remove all HBase-related data from ZooKeeper and HDFS.
To run the hbase clean utility, use the bin/hbase clean command. Run it with no options for usage information.
$ bin/hbase clean Usage: hbase clean (--cleanZk|--cleanHdfs|--cleanAll) Options: --cleanZk cleans hbase related data from zookeeper. --cleanHdfs cleans hbase related data from hdfs. --cleanAll cleans hbase related data from both zookeeper and hdfs.