Breakpad Minidumps for Impala (CDH 5.8 or higher only)

The Breakpad project is an open-source framework for crash reporting. In CDH 5.8 / Impala 2.6 and higher, Impala can use Breakpad to record stack information and register values when any of the Impala-related daemons crashes because of an error such as a SIGSEGV signal or an unhandled exception. The dump files are much smaller than traditional core dump files. The dump mechanism itself uses very little memory, which improves reliability if the crash occurs while the system is low on memory.

Enabling or Disabling Minidump Generation

By default, a minidump file is generated when an Impala-related daemon crashes.

To turn off generation of the minidump files, use one of the following options:
  • Set the --enable_minidumps configuration setting to false. Restart the corresponding services or daemons.
  • Set the --minidump_path configuration setting to an empty string. Restart the corresponding services or daemons.
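For a cluster not managed by Cloudera Manager, where each daemon reads its startup flags from a flagfile, the first option above can be sketched as a small shell function. The function name disable_minidumps and the example flagfile path are hypothetical, and the sketch assumes the flag is not already set elsewhere in the file. In a Cloudera Manager cluster, change the setting through Cloudera Manager instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the setting that turns off minidump generation
# to a daemon's flagfile (one flag per line). Restart the daemon afterwards,
# because the flag is only read at startup.
disable_minidumps() {
  local flagfile="$1"
  printf '%s\n' '--enable_minidumps=false' >> "$flagfile"
  # Alternative with the same effect: an empty minidump path.
  # printf '%s\n' '--minidump_path=' >> "$flagfile"
}

# Example (illustrative path only):
# disable_minidumps /etc/impala/conf/impalad_flags
```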

In CDH 5.9 / Impala 2.7 and higher, you can send a SIGUSR1 signal to any Impala-related daemon to make it write a Breakpad minidump. This lets you produce a minidump for advanced troubleshooting without triggering a crash.
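A minimal sketch of sending that signal from a shell, assuming pgrep is installed and that you run it as root or as the user that owns the daemon (typically impala). The function name request_minidump is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: ask every running process with the given exact name
# (default: impalad) to write a Breakpad minidump by sending it SIGUSR1.
request_minidump() {
  local name="${1:-impalad}"
  local pids
  # pgrep -x matches the exact process name and exits non-zero if none match.
  if ! pids=$(pgrep -x "$name"); then
    echo "no running process named $name" >&2
    return 1
  fi
  local pid
  for pid in $pids; do
    kill -USR1 "$pid" && echo "sent SIGUSR1 to $name (pid $pid)"
  done
}

# Example: request_minidump catalogd
```

Send SIGUSR1 only to daemons from Impala 2.7 or higher: for a process that does not handle SIGUSR1, the default action of the signal is to terminate the process.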

Specifying the Location for Minidump Files

By default, all minidump files are written to the following location on the host where a crash occurs:
  • Clusters managed by Cloudera Manager: /var/log/impala-minidumps/daemon_name

  • Clusters not managed by Cloudera Manager: impala_log_dir/daemon_name/minidumps/daemon_name

The minidump files for impalad, catalogd, and statestored are each written to a separate directory.

To specify a different location, set the minidump_path configuration setting of one or more Impala-related daemons, and restart the corresponding services or daemons.

If you specify a relative path for this setting, the value is interpreted relative to the default minidump_path directory.
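For a cluster not managed by Cloudera Manager, where each daemon reads its startup flags from a flagfile, changing the location can be sketched as follows. The function name set_minidump_path and the /data/impala/minidumps path are hypothetical; the function only records the setting, so create the directory with suitable ownership yourself and then restart the daemon.

```shell
#!/usr/bin/env bash
# Hypothetical helper: record a new minidump location in a daemon's flagfile.
# An absolute path is used as-is; a relative path is interpreted relative to
# the default minidump_path directory, so the function warns about it.
set_minidump_path() {
  local flagfile="$1" dir="$2"
  case "$dir" in
    "") echo "note: an empty minidump_path turns off minidump generation" >&2 ;;
    /*) ;;
    *)  echo "warning: '$dir' is relative to the default minidump_path directory" >&2 ;;
  esac
  printf '%s\n' "--minidump_path=$dir" >> "$flagfile"
}

# Example (illustrative paths), followed by a restart of the daemon:
# set_minidump_path /etc/impala/conf/impalad_flags /data/impala/minidumps
```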

Controlling the Number of Minidump Files

As with any files used for logging or troubleshooting, consider limiting the number of minidump files, or removing unneeded ones, depending on the amount of free storage space on the hosts in the cluster.

Because the minidump files are only used for problem resolution, you can remove any such files that are not needed to debug current issues.

To control how many minidump files Impala keeps at any one time, set the max_minidumps configuration setting of one or more Impala-related daemons, and restart the corresponding services or daemons. The default for this setting is 9. A zero or negative value is interpreted as "unlimited".
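Impala applies max_minidumps itself, for example with --max_minidumps=5 in a daemon's flags. If you prefer to clear out old files by hand, the following sketch keeps only the N most recently modified .dmp files in one minidump directory. The function name prune_minidumps is hypothetical, and the sketch assumes the file names contain no whitespace, which holds for the <id>.dmp names shown in the demonstration later in this section.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only the $2 newest *.dmp files in directory $1
# (default 9, matching the max_minidumps default) and delete the older ones.
# As with max_minidumps, zero or a negative count means "unlimited".
prune_minidumps() {
  local dir="$1" keep="${2:-9}"
  if [ "$keep" -le 0 ]; then
    return 0
  fi
  # ls -t lists newest first; skip the first $keep entries and delete the rest.
  # Parsing ls output is acceptable here only because the names have no spaces.
  ls -1t -- "$dir"/*.dmp 2>/dev/null \
    | tail -n +"$((keep + 1))" \
    | xargs -r rm -f --
}

# Example: prune_minidumps /var/log/impala-minidumps/impalad 5
```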

Detecting Crash Events

When a crash generates a minidump file, you can see the event in the Impala log files or in the Cloudera Manager charts for Impala. Because each restart begins a new log file, the "crashed" message is always at or near the bottom of the log file. (There might be another, later message if core dumps are also enabled.)
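To find these events from a shell, you can search the log files for the "Wrote minidump to" line that appears after the crash in the demonstration later in this section. A sketch, assuming the /var/log/impalad log directory from that demonstration and the same .INFO / .ERROR file naming; the function name list_minidumps_from_logs is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path of every minidump reported in the
# .INFO and .ERROR logs under a log directory, one path per line.
list_minidumps_from_logs() {
  local logdir="${1:-/var/log/impalad}"
  # The same "Wrote minidump to ..." line is written to both the .INFO and
  # the .ERROR log, so sort -u removes the duplicates.
  grep -ho 'Wrote minidump to .*' "$logdir"/*.INFO.* "$logdir"/*.ERROR.* 2>/dev/null \
    | sed 's/^Wrote minidump to //' \
    | sort -u
}

# Example: list_minidumps_from_logs /var/log/impalad
```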

Using the Minidump Files for Problem Resolution

Typically, you provide minidump files to Cloudera Support as part of problem resolution, in the same way that you might provide a core dump. The Send Diagnostic Data choice under the Support menu in Cloudera Manager guides you through the process of selecting a time period and volume of diagnostic data, then collects the data from all hosts and transmits the relevant information for you.


You might get additional instructions from Cloudera Support about collecting minidumps to better isolate a specific problem. Because the information in the minidump files is limited to stack traces and register contents, the possibility of including sensitive information is much lower than with core dump files. If any sensitive information is included in the minidump, Cloudera Support preserves the confidentiality of that information.
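If Cloudera Support asks you to collect minidumps yourself rather than through Send Diagnostic Data, one way is to pack them into a single compressed archive. This is a sketch only: the function name bundle_minidumps and the example output path are hypothetical, the default root is the Cloudera Manager location listed earlier, and any specific instructions from Cloudera Support take precedence.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pack every file under the minidump root into one
# gzip-compressed tar archive. Write the archive outside $root.
bundle_minidumps() {
  local out="$1" root="${2:-/var/log/impala-minidumps}"
  # -C stores paths relative to $root (./impalad/..., ./catalogd/...), which
  # keeps the per-daemon directory layout without absolute paths.
  tar -czf "$out" -C "$root" .
}

# Example: bundle_minidumps /tmp/impala-minidumps.tar.gz
```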

Demonstration of Breakpad Feature

The following example uses the command kill -11 to simulate a SIGSEGV crash for an impalad process on a single DataNode, then examines the relevant log files and minidump file.

First, as root on a worker node, we kill the impalad process with a SIGSEGV signal. The original process ID was 23114. (Cloudera Manager restarts the process with a new process ID, as shown by the second ps command.)

# ps ax | grep impalad
23114 ?        Sl     0:18 /opt/cloudera/parcels/<parcel_version>/lib/impala/sbin-retail/impalad --flagfile=/var/run/cloudera-scm-agent/process/114-impala-IMPALAD/impala-conf/impalad_flags
31259 pts/0    S+     0:00 grep impalad
#
# kill -11 23114
#
# ps ax | grep impalad
31374 ?        Rl     0:04 /opt/cloudera/parcels/<parcel_version>/lib/impala/sbin-retail/impalad --flagfile=/var/run/cloudera-scm-agent/process/114-impala-IMPALAD/impala-conf/impalad_flags
31475 pts/0    S+     0:00 grep impalad

We locate the log directory underneath /var/log. There are .INFO, .WARNING, and .ERROR log files for process ID 23114. The minidump message is written to the .INFO file and the .ERROR file, but not to the .WARNING file. In this case, a large core file was also produced.

# cd /var/log/impalad
# ls -la | grep 23114
-rw-------   1 impala impala 3539079168 Jun 23 15:20 core.23114
-rw-r--r--   1 impala impala      99057 Jun 23 15:20 hs_err_pid23114.log
-rw-r--r--   1 impala impala        351 Jun 23 15:20 impalad.worker_node_123.impala.log.ERROR.20160623-140343.23114
-rw-r--r--   1 impala impala      29101 Jun 23 15:20 impalad.worker_node_123.impala.log.INFO.20160623-140343.23114
-rw-r--r--   1 impala impala        228 Jun 23 14:03 impalad.worker_node_123.impala.log.WARNING.20160623-140343.23114

The .INFO log includes the location of the minidump file, followed by a report of a core dump. With the Breakpad minidump feature enabled, we might now disable core dumps or keep fewer of them around.

# cat impalad.worker_node_123.impala.log.INFO.20160623-140343.23114
...
Wrote minidump to /var/log/impala-minidumps/impalad/0980da2d-a905-01e1-25ff883a-04ee027a.dmp
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00000030c0e0b68a, pid=23114, tid=139869541455968
#
# JRE version: Java(TM) SE Runtime Environment (7.0_67-b01) (build 1.7.0_67-b01)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.65-b04 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libpthread.so.0+0xb68a]  pthread_cond_wait+0xca
#
# Core dump written. Default location: /var/log/impalad/core or core.23114
#
# An error report file with more information is saved as:
# /var/log/impalad/hs_err_pid23114.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.sun.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
...

# cat impalad.worker_node_123.impala.log.ERROR.20160623-140343.23114

Log file created at: 2016/06/23 14:03:43
Running on machine: worker_node_123
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E0623 14:03:43.911002 23114 logging.cc:118] stderr will be logged to this file.
Wrote minidump to /var/log/impala-minidumps/impalad/0980da2d-a905-01e1-25ff883a-04ee027a.dmp

The resulting minidump file (about 2.4 MB) is much smaller than the corresponding core file (3539079168 bytes, roughly 3.5 GB), making it much easier to supply diagnostic information to Cloudera Support. Cloudera Manager automates the transmission of the minidump files.

# pwd
/var/log/impalad
# cd ../impala-minidumps/impalad
# ls
0980da2d-a905-01e1-25ff883a-04ee027a.dmp
# du -kh *
2.4M  0980da2d-a905-01e1-25ff883a-04ee027a.dmp
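Because impalad, catalogd, and statestored each write minidumps to their own directory, the ls and du steps above can be scripted as a per-daemon summary. A sketch, assuming the Cloudera Manager layout /var/log/impala-minidumps/daemon_name; the function name summarize_minidumps is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report how many minidumps each Impala daemon has on
# this host and how much space its minidump directory uses.
summarize_minidumps() {
  local root="${1:-/var/log/impala-minidumps}"
  local daemon dir count
  for daemon in impalad catalogd statestored; do
    dir="$root/$daemon"
    [ -d "$dir" ] || continue   # skip daemons with no minidump directory here
    # $(( )) strips any padding that some wc implementations add.
    count=$(( $(find "$dir" -maxdepth 1 -type f -name '*.dmp' | wc -l) ))
    echo "$daemon: $count minidump file(s), $(du -sh "$dir" | cut -f1) on disk"
  done
}

# Example: summarize_minidumps
```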