Troubleshooting HBase

The Cloudera HBase packages have been configured to place logs in /var/log/hbase. Cloudera recommends tailing the .log files in this directory when you start HBase to check for any error messages or failures.
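As a complement to tailing the logs, a small helper can list the error lines already written. The sketch below assumes that problems are logged at the ERROR or FATAL level; `scan_hbase_logs` is a hypothetical name, not a command shipped with the HBase packages.

```shell
# scan_hbase_logs is a hypothetical helper, not part of the HBase packages.
# It prints every ERROR or FATAL line found in the .log files of a directory,
# prefixed with the file name and line number.
scan_hbase_logs() {
  # Default to the log directory used by the Cloudera packages.
  log_dir="${1:-/var/log/hbase}"
  # -H prints the file name even for a single file; -n prints the line number.
  grep -H -n -E 'ERROR|FATAL' "$log_dir"/*.log
}
```

Run `scan_hbase_logs` with no argument to check /var/log/hbase, or pass another directory as the first argument. To follow new messages while HBase starts, `tail -F /var/log/hbase/*.log` is usually enough.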

Table Creation Fails after Installing LZO

If you install LZO after starting the RegionServer, you cannot create a table with LZO compression until you restart the RegionServer.

Why this happens

When the RegionServer starts, it runs CompressionTest and caches the results. When you try to create a table with a given form of compression, the RegionServer refers to those cached results. Because LZO was installed after the RegionServer started, the cached results predate LZO and cause the create to fail.

What to do

Restart the RegionServer. Now table creation with LZO will succeed.
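Before restarting, it can help to confirm that LZO is actually usable on the RegionServer host. The sketch below wraps the CompressionTest utility named above; `check_lzo` and the default test path are assumptions chosen for illustration, and the path can be any location the invoking user can write to.

```shell
# check_lzo is a hypothetical wrapper around HBase's CompressionTest utility.
# The default path is an assumption; pass any writable file or HDFS URI instead.
check_lzo() {
  test_path="${1:-file:///tmp/hbase-lzo-test}"
  # CompressionTest takes a path to write a test file to and a codec name.
  hbase org.apache.hadoop.hbase.util.CompressionTest "$test_path" lzo
}
```

If the check passes, restart the RegionServer. On a package-based installation this is typically `sudo service hbase-regionserver restart`, although the exact service name depends on how HBase was installed.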

Thrift Server Crashes after Receiving Invalid Data

The Thrift server may crash if it receives a large amount of invalid data, due to a buffer overrun.

Why this happens

The Thrift server allocates memory to check the validity of data it receives. If it receives a large amount of invalid data, it may need to allocate more memory than is available. This is due to a limitation in the Thrift library itself.

What to do

To prevent the possibility of crashes due to buffer overruns, use the framed and compact transport protocols. These protocols are disabled by default, because they may require changes to your client code. The two options to add to your hbase-site.xml are hbase.regionserver.thrift.framed and hbase.regionserver.thrift.compact. Set each of these to true, as in the XML below. You can also specify the maximum frame size, using the hbase.regionserver.thrift.framed.max_frame_size_in_mb option.

<property> 
  <name>hbase.regionserver.thrift.framed</name> 
  <value>true</value> 
</property> 
<property> 
  <name>hbase.regionserver.thrift.framed.max_frame_size_in_mb</name> 
  <value>2</value> 
</property> 
<property> 
  <name>hbase.regionserver.thrift.compact</name> 
  <value>true</value> 
</property>

HBase Is Using More Disk Space Than Expected

HBase StoreFiles (also called HFiles) store HBase row data on disk. HBase also stores other information on disk, such as write-ahead logs (WALs), snapshots, and data that would otherwise be deleted but is needed to restore from a stored snapshot.

HBase Disk Usage
Location	Purpose	Troubleshooting Notes
`/hbase/.snapshots`	Contains one subdirectory per snapshot.	To list snapshots, use the HBase Shell command `list_snapshots`. To remove a snapshot, use `delete_snapshot`.
`/hbase/.archive`	Contains data that would otherwise have been deleted (either because it was explicitly deleted or because it expired due to TTL or version limits on the table) but that is required to restore from an existing snapshot.	To free up space taken up by excessive archives, delete the snapshots that refer to them. Snapshots never expire, so the data they refer to is kept until the snapshot is removed. Do not remove anything from `/hbase/.archive` manually, or you will corrupt your snapshots.
`/hbase/.logs`	Contains HBase WAL files that are required to recover regions in the event of a RegionServer failure.	WALs are removed when their contents are verified to have been written to StoreFiles. Do not remove them manually. If the size of any subdirectory of `/hbase/.logs/` is growing, examine the HBase server logs to find out why WALs are not being processed correctly.
`/hbase/logs/.oldWALs`	Contains HBase WAL files that have already been written to disk. An HBase maintenance thread removes them periodically based on a TTL.	To tune the length of time a WAL stays in `.oldWALs` before it is removed, configure the `hbase.master.logcleaner.ttl` property, which defaults to 60000 milliseconds, or 1 minute.
`/hbase/.logs/.corrupt`	Contains corrupted HBase WAL files.	Do not remove corrupt WALs manually. If the size of any subdirectory of `/hbase/.logs/` is growing, examine the HBase server logs to find out why WALs are not being processed correctly.
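To see which of the locations above is consuming the most space, the output of `hdfs dfs -du` can be sorted by size. The sketch below assumes that HBase's root directory is `/hbase`, as in the table, and that the first column of the `-du` output is the size in bytes; `largest_hbase_dirs` is a hypothetical helper name.

```shell
# largest_hbase_dirs is a hypothetical helper. It reads `hdfs dfs -du` output
# on standard input (size in bytes in the first column) and prints the N
# largest entries, biggest first. N defaults to 5.
largest_hbase_dirs() {
  count="${1:-5}"
  # Sort numerically on the first column in descending order, then keep the top N.
  sort -k1,1nr | head -n "$count"
}
```

For example, `hdfs dfs -du /hbase | largest_hbase_dirs 3` prints the three largest entries under `/hbase`. Omit the `-h` option of `-du` here, because human-readable sizes do not sort numerically.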
