Integrating Hadoop Security with Active Directory

Considerations when using an Active Directory KDC

Performance:

As your cluster grows, so will the volume of Authentication Service (AS) and Ticket Granting Service (TGS) interaction between the services on each cluster server. Consider evaluating the volume of this interaction against the Active Directory domain controllers you have configured for the cluster before rolling this feature out to a production environment. If cluster performance suffers, over time it might become necessary to dedicate a set of AD domain controllers to larger deployments. Cloudera recommends you use a dedicated AD instance for every 100 nodes in your cluster. However, note that this recommendation may not apply to high-volume clusters, or cases where the AD host is also being used for LDAP lookups.

Network Proximity:

By default, Kerberos uses UDP for client/server communication. Often, AD services are in a different network than project application services such as Hadoop. If the domain controllers supporting a cluster for Kerberos are not in the same subnet, or they're separated by a firewall, consider using the udp_preference_limit = 1 setting in the [libdefaults] section of the krb5.conf used by cluster services. Cloudera strongly recommends against using AD domain controller (KDC) servers that are separated from the cluster by a WAN connection, as latency in this service will significantly impact cluster performance.

Process:

Troubleshooting the cluster's operations, especially for Kerberos-enabled services, will need to include AD administration resources. Evaluate your organizational processes for engaging the AD administration team, and how to escalate in case a cluster outage occurs due to issues with Kerberos authentication against AD services. In some situations it might be necessary to enable Kerberos event logging to address desktop and KDC issues within windows environments.

Also note that if you decommission any Cloudera Manager roles or nodes, the related AD accounts will need to be deleted manually. This is required because Cloudera Manager will not delete existing entries in Active Directory.

If direct integration with AD is not currently possible, use the following instructions to configure a local MIT KDC to trust your AD server:
  1. Run an MIT Kerberos KDC and realm local to the cluster and create all service principals in this realm.
  2. Set up one-way cross-realm trust from this realm to the Active Directory realm. Using this method, there is no need to create service principals in Active Directory, but Active Directory principals (users) can be authenticated to Hadoop. See Configuring a Local MIT Kerberos Realm to Trust Active Directory.

Configuring a Local MIT Kerberos Realm to Trust Active Directory

On the Active Directory Server

  1. Add the local realm trust to Active Directory with this command:
    netdom trust YOUR-LOCAL-REALM.COMPANY.COM /Domain:AD-REALM.COMPANY.COM /add /realm /passwordt:<TrustPassword>
  2. Set the proper encryption type with this command:

    On Windows 2003 RC2:

    ktpass /MITRealmName YOUR-LOCAL-REALM.COMPANY.COM /TrustEncryp <enc_type>

    On Windows 2008:

    ksetup /SetEncTypeAttr YOUR-LOCAL-REALM.COMPANY.COM <enc_type>

    The <enc_type> parameter specifies AES, DES, or RC4 encryption. Refer to the documentation for your version of Windows Active Directory to find the <enc_type> parameter string to use.

  3. Get and verify the list of encryption types set with this command:

    On Windows 2008:

    ksetup /GetEncTypeAttr YOUR-LOCAL-REALM.COMPANY.COM

On the MIT KDC Server

Type the following command in the kadmin.local or kadmin shell to add the cross-realm krbtgt principal. Use the same password you used in the netdom command on the Active Directory Server.

kadmin:  addprinc -e "<enc_type_list>" krbtgt/YOUR-LOCAL-REALM.COMPANY.COM@AD-REALM.COMPANY.COM

where the <enc_type_list> parameter specifies the types of encryption this cross-realm krbtgt principal will support: either AES, DES, or RC4 encryption. You can specify multiple encryption types using the parameter in the command above, what's important is that at least one of the encryption types corresponds to the encryption type found in the tickets granted by the KDC in the remote realm. For example:

kadmin:  addprinc -e "rc4-hmac:normal des3-hmac-sha1:normal" krbtgt/YOUR-LOCAL-REALM.COMPANY.COM@AD-REALM.COMPANY.COM

On All of the Cluster Hosts

  1. Verify that both Kerberos realms are configured on all of the cluster hosts. Note that the default realm and the domain realm should remain set as the MIT Kerberos realm which is local to the cluster.
    [realms]
      AD-REALM.CORP.FOO.COM = {
        kdc = ad.corp.foo.com:88
        admin_server = ad.corp.foo.com:749
        default_domain = foo.com
      }
      CLUSTER-REALM.CORP.FOO.COM = {
        kdc = cluster01.corp.foo.com:88
        admin_server = cluster01.corp.foo.com:749
        default_domain = foo.com
      }
  2. To properly translate principal names from the Active Directory realm into local names within Hadoop, you must configure the hadoop.security.auth_to_local setting in the core-site.xml file on all of the cluster machines. The following example translates all principal names with the realm AD-REALM.CORP.FOO.COM into the first component of the principal name only. It also preserves the standard translation for the default realm (the cluster realm).
    <property>
      <name>hadoop.security.auth_to_local</name>
      <value>
        RULE:[1:$1@$0](^.*@AD-REALM\.CORP\.FOO\.COM$)s/^(.*)@AD-REALM\.CORP\.FOO\.COM$/$1/g
        RULE:[2:$1@$0](^.*@AD-REALM\.CORP\.FOO\.COM$)s/^(.*)@AD-REALM\.CORP\.FOO\.COM$/$1/g
        DEFAULT
      </value>
    </property>

For more information about name mapping rules, see Configuring the Mapping from Kerberos Principals to Short Names.