Configuring Kerberos for Flume Sinks
Writing as a single user for all HDFS sinks in a given Flume agent
The Hadoop services require a three-part principal of the form username/fully.qualified.domain.name@YOUR-REALM.COM. Cloudera recommends using flume as the first component and the fully qualified domain name of the host machine as the second. Assuming that Kerberos and security-enabled Hadoop have been properly configured on the Hadoop cluster itself, add the following properties to the Flume agent's flume.conf configuration file, which is typically located at /etc/flume-ng/conf/flume.conf:
agentName.sinks.sinkName.hdfs.kerberosPrincipal = flume/fully.qualified.domain.name@YOUR-REALM.COM
agentName.sinks.sinkName.hdfs.kerberosKeytab = /etc/flume-ng/conf/flume.keytab
where:
- agentName is the name of the Flume agent being configured, which in this release defaults to the value "agent".
- sinkName is the name of the HDFS sink being configured. The sink's type must be HDFS.
These properties can also be set using the substitution strings $KERBEROS_PRINCIPAL and $KERBEROS_KEYTAB, respectively.
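The property naming scheme can be illustrated with a short Python sketch that reads flume.conf-style "key = value" lines and looks up the Kerberos settings for one sink. The helpers parse_properties and sink_kerberos_settings are hypothetical and not part of Flume. The parser is deliberately simplified (it ignores line continuations and the ":" separator that Java properties files also allow), and comparing the sink type case-insensitively is an assumption of this sketch.

```python
def parse_properties(text: str) -> dict:
    """Parse simplified flume.conf text into a {key: value} dict.

    Hypothetical helper: handles only 'key = value' lines and '#' comments.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def sink_kerberos_settings(props: dict, agent: str, sink: str) -> tuple:
    """Return (principal, keytab) for <agent>.sinks.<sink>, which must be HDFS."""
    prefix = f"{agent}.sinks.{sink}."
    sink_type = props.get(prefix + "type", "")
    # The Kerberos properties are only meaningful on an HDFS sink.
    if sink_type.lower() != "hdfs":
        raise ValueError(f"sink {sink!r} has type {sink_type!r}, expected HDFS")
    return (props[prefix + "hdfs.kerberosPrincipal"],
            props[prefix + "hdfs.kerberosKeytab"])


example = """
agent.sinks.sinkName.type = HDFS
agent.sinks.sinkName.hdfs.kerberosPrincipal = flume/fully.qualified.domain.name@YOUR-REALM.COM
agent.sinks.sinkName.hdfs.kerberosKeytab = /etc/flume-ng/conf/flume.keytab
"""
print(sink_kerberos_settings(parse_properties(example), "agent", "sinkName"))
```

The lookup fails loudly for a non-HDFS sink rather than silently returning settings that Flume would not use.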
In the principal flume/fully.qualified.domain.name@YOUR-REALM.COM, flume is the first component, fully.qualified.domain.name is the second, and YOUR-REALM.COM is the name of the Kerberos realm that your Hadoop cluster is in. The /etc/flume-ng/conf/flume.keytab file contains the keys that this principal needs to authenticate with other services.
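To make the three-part structure concrete, the following Python sketch splits a service principal into its components, using the standard Kerberos terms primary, instance, and realm for the first component, the second component, and the realm. parse_service_principal is a hypothetical helper for illustration only, and it assumes the name contains no escaped separator characters.

```python
from typing import NamedTuple


class Principal(NamedTuple):
    primary: str   # first component, e.g. "flume"
    instance: str  # second component, e.g. the host's FQDN
    realm: str     # Kerberos realm, e.g. "YOUR-REALM.COM"


def parse_service_principal(name: str) -> Principal:
    """Split 'primary/instance@REALM' into its three parts."""
    head, at, realm = name.rpartition("@")
    primary, slash, instance = head.partition("/")
    # All three parts are required; escaped separators are not handled.
    if not (at and realm and slash and primary and instance) or "/" in instance:
        raise ValueError(f"expected primary/instance@REALM, got {name!r}")
    return Principal(primary, instance, realm)


print(parse_service_principal("flume/fully.qualified.domain.name@YOUR-REALM.COM"))
# prints Principal(primary='flume', instance='fully.qualified.domain.name', realm='YOUR-REALM.COM')
```

A two-part name such as flume@YOUR-REALM.COM is rejected, because the Hadoop services expect the three-part form described above.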
Flume and Hadoop also provide a simple keyword, _HOST, that expands to the fully qualified domain name of the host machine where the service is running, so you can use one flume.conf file with the same hdfs.kerberosPrincipal value on all of your agent host machines.
agentName.sinks.sinkName.hdfs.kerberosPrincipal = flume/_HOST@YOUR-REALM.COM
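A minimal Python sketch of the _HOST substitution follows, assuming the rule that only the component between / and @ is replaced. expand_host is a hypothetical helper, not Flume's or Hadoop's implementation; it uses socket.getfqdn() as the default source of the host name, and lowercasing that name is an assumption of this sketch.

```python
import socket
from typing import Optional


def expand_host(principal: str, fqdn: Optional[str] = None) -> str:
    """Replace a _HOST instance component with a fully qualified domain name."""
    head, at, realm = principal.rpartition("@")
    if not at:
        return principal  # no realm separator: leave the name untouched
    primary, slash, instance = head.partition("/")
    # Only the component between '/' and '@' is substituted.
    if slash and instance == "_HOST":
        host = (fqdn or socket.getfqdn()).lower()  # lowercasing: an assumption
        return f"{primary}/{host}@{realm}"
    return principal


print(expand_host("flume/_HOST@YOUR-REALM.COM", "Fully.Qualified.Domain.Name"))
# prints flume/fully.qualified.domain.name@YOUR-REALM.COM
```

Because each agent host resolves its own name, the same flume.conf line yields a different, host-specific principal on every machine.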
Writing as different users across multiple HDFS sinks in a single Flume agent
Flume supports secure impersonation of Hadoop users (similar to "sudo" in UNIX), so that different HDFS sinks in one agent can write as different users. This is implemented in a way similar to how Oozie implements secure user impersonation.
The following steps to set up secure impersonation from Flume to HDFS assume that your cluster is configured with Kerberos. Impersonation also works on clusters that are not secured with Kerberos; in that case, omit the Kerberos-specific steps.
- Configure Hadoop to allow impersonation. Add the following configuration properties to your core-site.xml.
<property>
  <name>hadoop.proxyuser.flume.groups</name>
  <value>group1,group2</value>
  <description>Allow the flume user to impersonate any members of group1 and group2</description>
</property>
<property>
  <name>hadoop.proxyuser.flume.hosts</name>
  <value>host1,host2</value>
  <description>Allow the flume user to connect only from host1 and host2 to impersonate a user</description>
</property>
You can use the wildcard character * to enable impersonation of any user from any host. For more information, see Secure Impersonation.
- Set up a Kerberos keytab for the Kerberos principal and the host from which Flume connects to HDFS. The principal's user must match the proxy user configured in the preceding step (flume in this example). For instructions, see Configuring Hadoop Security in CDH 5.
- Configure the HDFS sink with the following configuration options:
- hdfs.kerberosPrincipal - the fully qualified principal. Note: _HOST is replaced by the hostname of the local machine, but only when it appears between the / and @ characters.
- hdfs.kerberosKeytab - the location on the local machine of the keytab containing the user and host keys for this principal
- hdfs.proxyUser - the proxy user to impersonate
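The requirement that the Flume principal match the Hadoop proxy-user configuration can be checked with a short Python sketch that reads core-site.xml (with its usual configuration root element) and looks for the hadoop.proxyuser.<user>.groups and .hosts entries. It assumes the default auth_to_local mapping, under which the Hadoop user name is the principal's first component; proxyuser_settings is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def proxyuser_settings(core_site_xml: str, principal: str) -> dict:
    """Return the hadoop.proxyuser.<user>.* settings that apply to a principal."""
    # Assumes the default auth_to_local mapping: user name = first component.
    user = principal.split("/", 1)[0].split("@", 1)[0]
    prefix = f"hadoop.proxyuser.{user}."
    found = {}
    for prop in ET.fromstring(core_site_xml).iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name.startswith(prefix):
            found[name[len(prefix):]] = (prop.findtext("value") or "").strip()
    # Both entries from the preceding step are expected for the Flume user.
    missing = {"groups", "hosts"} - set(found)
    if missing:
        raise ValueError(f"core-site.xml has no {sorted(prefix + m for m in missing)}")
    return found
```

Run against the core-site.xml entries shown earlier and the principal flume/_HOST@YOUR-REALM.COM, this returns {"groups": "group1,group2", "hosts": "host1,host2"}; a principal whose first component is not flume raises an error.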
Example snippet (the majority of the HDFS sink configuration options have been omitted):
agent.sinks.sink-1.type = HDFS
agent.sinks.sink-1.hdfs.kerberosPrincipal = flume/_HOST@YOUR-REALM.COM
agent.sinks.sink-1.hdfs.kerberosKeytab = /etc/flume-ng/conf/flume.keytab
agent.sinks.sink-1.hdfs.proxyUser = weblogs
agent.sinks.sink-2.type = HDFS
agent.sinks.sink-2.hdfs.kerberosPrincipal = flume/_HOST@YOUR-REALM.COM
agent.sinks.sink-2.hdfs.kerberosKeytab = /etc/flume-ng/conf/flume.keytab
agent.sinks.sink-2.hdfs.proxyUser = applogs
In this example, the flume Kerberos principal impersonates the user weblogs in sink-1 and the user applogs in sink-2. This is allowed only if the Kerberos KDC authenticates the specified principal (flume in this case) and the NameNode authorizes that principal to impersonate the specified proxy user.
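The NameNode-side authorization decision can be approximated by the following simplified Python sketch. It covers only the groups and hosts properties from the core-site.xml step, with exact host-name matching; Hadoop's real check supports additional forms (for example, an explicit list of allowed users and IP address ranges), and Kerberos authentication by the KDC is outside its scope. may_impersonate is a hypothetical helper.

```python
def _split_list(value: str) -> set:
    """Turn a comma-separated property value into a set of trimmed entries."""
    return {item.strip() for item in value.split(",") if item.strip()}


def may_impersonate(conf: dict, real_user: str, proxy_user_groups: set,
                    client_host: str) -> bool:
    """Simplified check: may real_user impersonate a member of
    proxy_user_groups when connecting from client_host?"""
    prefix = f"hadoop.proxyuser.{real_user}."
    groups = _split_list(conf.get(prefix + "groups", ""))
    hosts = _split_list(conf.get(prefix + "hosts", ""))
    # "*" acts as a wildcard; otherwise the proxied user must belong to an
    # allowed group and the client host must be listed exactly.
    group_ok = "*" in groups or bool(groups & set(proxy_user_groups))
    host_ok = "*" in hosts or client_host in hosts
    return group_ok and host_ok


core_site = {
    "hadoop.proxyuser.flume.groups": "group1,group2",
    "hadoop.proxyuser.flume.hosts": "host1,host2",
}
print(may_impersonate(core_site, "flume", {"group1"}, "host1"))  # True
print(may_impersonate(core_site, "flume", {"group1"}, "host3"))  # False
```

Both conditions must hold: a proxied user in an allowed group is still refused if the Flume agent connects from an unlisted host.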
Limitations
Flume does not support using multiple Kerberos principals or keytabs in the same agent, so creating files on HDFS as different users requires impersonation. To write as various other users, configure a single principal in Hadoop that is allowed to impersonate all of those user accounts.
In addition, all HDFS sinks in the same agent must use the same keytab path. Attempting to configure multiple principals or keytabs in the same agent produces the following Flume error message:
Cannot use multiple kerberos principals in the same agent. Must restart agent to use new principal or keytab.
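Rather than discovering this error from the agent's log, you could check a configuration before deploying it. The Python sketch below groups HDFS sink properties by agent and rejects any agent whose HDFS sinks use more than one principal/keytab pair. check_single_identity is a hypothetical helper, not part of Flume; it takes an already-parsed {key: value} dict and, as a simplifying assumption, ignores sinks that set neither Kerberos property.

```python
import re
from collections import defaultdict

# Matches keys such as "agent.sinks.sink-1.hdfs.kerberosKeytab".
_SINK_KEY = re.compile(r"^(?P<agent>[^.]+)\.sinks\.(?P<sink>[^.]+)\.(?P<prop>.+)$")


def check_single_identity(props: dict) -> None:
    """Raise ValueError if an agent's HDFS sinks use several principals/keytabs."""
    sinks = defaultdict(dict)  # (agent, sink) -> {property: value}
    for key, value in props.items():
        match = _SINK_KEY.match(key)
        if match:
            sinks[(match["agent"], match["sink"])][match["prop"]] = value

    identities = defaultdict(set)  # agent -> {(principal, keytab), ...}
    for (agent, _sink), sink_props in sinks.items():
        if sink_props.get("type", "").lower() != "hdfs":
            continue  # only HDFS sinks carry these Kerberos settings
        pair = (sink_props.get("hdfs.kerberosPrincipal"),
                sink_props.get("hdfs.kerberosKeytab"))
        if pair != (None, None):
            identities[agent].add(pair)

    for agent, pairs in identities.items():
        if len(pairs) > 1:
            raise ValueError(
                f"agent {agent!r} uses {len(pairs)} principal/keytab pairs "
                "across its HDFS sinks; use one pair plus hdfs.proxyUser")
```

Applied to the sink-1/sink-2 example above, the check passes, because both sinks share one principal and keytab and differ only in hdfs.proxyUser. Changing either sink's keytab path makes it raise.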