Configuring MRv1 Security

If you are using YARN, skip this section and see Configuring YARN Security.

If you are using MRv1, complete the following steps to configure, start, and test secure MRv1:

  1. Step 1: Configure Secure MRv1
  2. Step 2: Start up the JobTracker
  3. Step 3: Start up a TaskTracker
  4. Step 4: Try Running a Map/Reduce Job

Step 1: Configure Secure MRv1

Keep the following important information in mind when configuring secure MapReduce:

  • The properties for JobTracker and TaskTracker must specify the mapred principal, as well as the path to the mapred keytab file.
  • The Kerberos principals for the JobTracker and TaskTracker are configured in the mapred-site.xml file. The same mapred-site.xml file, containing both principals, must be installed on every host in the cluster; configuring the JobTracker principal only on the JobTracker host is not sufficient. Kerberos authentication is bi-directional: for example, the TaskTracker must know the principal name of the JobTracker to register with it securely.
  • Do not use ${user.name} in the value of the mapred.local.dir or hadoop.log.dir properties in mapred-site.xml. Doing so can prevent tasks from launching on a secure cluster.
  • Make sure that each user who will be running MRv1 jobs exists on all cluster hosts (that is, on every host that hosts any MRv1 daemon).
  • Make sure the value specified for mapred.local.dir is identical in mapred-site.xml and taskcontroller.cfg. If the values differ, an error is returned.
  • Make sure the value specified in taskcontroller.cfg for hadoop.log.dir matches the value the Hadoop daemons are using. The default is /var/log/hadoop-0.20-mapreduce, and it can be changed in mapred-site.xml. If the values differ, an error is returned.
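
The ${user.name} rule and the two matching-value rules above can be checked mechanically before any daemon is started. The following is a minimal Python sketch, not a tool shipped with Hadoop. It assumes that mapred-site.xml uses the usual <configuration>/<property> layout and that taskcontroller.cfg is a plain key=value file. The function names and the /data/... paths are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

# Properties that the points above say must not use ${user.name}
# and must agree between the two files.
CHECKED_KEYS = ("mapred.local.dir", "hadoop.log.dir")


def load_site_properties(xml_text):
    """Return {name: value} for every <property> in a *-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def load_cfg(cfg_text):
    """Parse key=value lines (the assumed format of taskcontroller.cfg)."""
    cfg = {}
    for line in cfg_text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            cfg[key.strip()] = value.strip()
    return cfg


def find_problems(site, cfg):
    """List rule violations; an empty list means none were found."""
    problems = []
    for key in CHECKED_KEYS:
        if "${user.name}" in site.get(key, ""):
            problems.append(f"{key} uses ${{user.name}} in mapred-site.xml")
        # Compare only when both files set the key. If mapred-site.xml omits
        # it, the daemon default applies, which this sketch cannot see.
        if key in site and key in cfg and site[key] != cfg[key]:
            problems.append(
                f"{key} differs between mapred-site.xml and taskcontroller.cfg"
            )
    return problems


# Hypothetical inputs: the site file breaks the ${user.name} rule,
# and its value also disagrees with taskcontroller.cfg.
SITE = """<configuration>
  <property>
    <name>mapred.local.dir</name>
    <value>/data/${user.name}/mapred</value>
  </property>
</configuration>"""
CFG = "mapred.local.dir=/data/mapred\nhadoop.log.dir=/var/log/hadoop-0.20-mapreduce\n"

for problem in find_problems(load_site_properties(SITE), load_cfg(CFG)):
    print(problem)
```

In practice the two strings would be read from the actual files on each host. The sketch does not cover the requirement that job users exist on every host.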

To configure secure MapReduce:

  1. Add the following properties to the mapred-site.xml file on every machine in the cluster:
    <!-- JobTracker security configs -->
    <property>
      <name>mapreduce.jobtracker.kerberos.principal</name>
      <value>mapred/_HOST@YOUR-REALM.COM</value>
    </property>
    <property>
      <name>mapreduce.jobtracker.keytab.file</name>
      <value>/etc/hadoop/conf/mapred.keytab</value> <!-- path to the MapReduce keytab -->
    </property>
    
    <!-- TaskTracker security configs -->
    <property>
      <name>mapreduce.tasktracker.kerberos.principal</name>
      <value>mapred/_HOST@YOUR-REALM.COM</value>
    </property>
    <property>
      <name>mapreduce.tasktracker.keytab.file</name>
      <value>/etc/hadoop/conf/mapred.keytab</value> <!-- path to the MapReduce keytab -->
    </property>
    
    <!-- TaskController settings -->
    <property>
      <name>mapred.task.tracker.task-controller</name>
      <value>org.apache.hadoop.mapred.LinuxTaskController</value>
    </property>
    <property>
      <name>mapreduce.tasktracker.group</name>
      <value>mapred</value>
    </property>
  2. Create a file called taskcontroller.cfg that contains the following information:
    hadoop.log.dir=<Path to Hadoop log directory. Should be same value used to start the TaskTracker. This is required to set proper permissions on the log files so that they can be written to by the user's tasks and read by the TaskTracker for serving on the web UI.>
    mapreduce.tasktracker.group=mapred
    banned.users=mapred,hdfs,bin
    min.user.id=1000 
  3. The path to the taskcontroller.cfg file is determined relative to the location of the task-controller binary. Specifically, the path is <path of task-controller binary>/../../conf/taskcontroller.cfg. If you installed the CDH 5 package, this path will always correspond to /etc/hadoop/conf/taskcontroller.cfg.
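
The relative-path rule in the last item can be illustrated with a short Python sketch. It assumes that "<path of task-controller binary>" means the directory that contains the binary, and the install location passed in below is a hypothetical example rather than a documented path. The function name expected_cfg_path is also invented for illustration.

```python
import posixpath


def expected_cfg_path(task_controller_binary):
    """Lexically resolve <binary dir>/../../conf/taskcontroller.cfg."""
    binary_dir = posixpath.dirname(task_controller_binary)
    return posixpath.normpath(
        posixpath.join(binary_dir, "..", "..", "conf", "taskcontroller.cfg")
    )


# Hypothetical binary location, used only to show how the rule is applied.
print(expected_cfg_path(
    "/usr/lib/hadoop-0.20-mapreduce/sbin/Linux-amd64-64/task-controller"
))
```

The resolution here is purely textual, so the printed path sits next to the binary's install tree. If the conf directory on a host is a symbolic link, os.path.realpath applied to the result on that host would show the final location, which for a package install should be /etc/hadoop/conf/taskcontroller.cfg as stated above.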

Step 2: Start up the JobTracker

You are now ready to start the JobTracker.

If you use the /etc/init.d/hadoop-0.20-mapreduce-jobtracker init script, start it with the service command:

$ sudo service hadoop-0.20-mapreduce-jobtracker start

You can verify that the JobTracker is working properly by opening http://machine:50030/ in a web browser, where machine is the hostname of the machine running the JobTracker.
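
The same check can be scripted, for example from a monitoring host. This is a minimal Python sketch that only tests whether the web UI answers a plain HTTP request on port 50030; it says nothing about whether Kerberos is configured correctly. The function names are hypothetical, and the sketch assumes the UI is reachable without HTTP authentication. If the web UI has been configured to require authentication, the function returns False even though the JobTracker is running.

```python
import urllib.request


def jobtracker_ui_url(host, port=50030):
    """Build the JobTracker web UI URL for the given host."""
    return f"http://{host}:{port}/"


def jobtracker_ui_is_up(host, port=50030, timeout=5.0):
    """Return True if the web UI answers an HTTP GET with a 2xx status."""
    try:
        with urllib.request.urlopen(
            jobtracker_ui_url(host, port), timeout=timeout
        ) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError and HTTPError are OSError subclasses, so refused
        # connections, DNS failures, timeouts, and 4xx/5xx answers end up here.
        return False
```

For example, jobtracker_ui_is_up("machine") performs the check against the host name used in the URL above.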

Step 3: Start up a TaskTracker

You are now ready to start a TaskTracker.

If you use the /etc/init.d/hadoop-0.20-mapreduce-tasktracker init script, start it with the service command:

$ sudo service hadoop-0.20-mapreduce-tasktracker start

Step 4: Try Running a Map/Reduce Job

You should now be able to run MapReduce jobs. To confirm, launch a sleep or pi job from the provided Hadoop examples (/usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar). You need valid Kerberos credentials to do so.
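
As an illustration, the pi job can be launched from a small Python wrapper that first checks for a Kerberos ticket. This is a sketch under stated assumptions: the hadoop and klist commands are on the PATH, the user has already run kinit, and the argument values (10 maps, 10000 samples per map) are arbitrary choices for a quick test. The function names are hypothetical.

```python
import subprocess

# Path to the examples jar named in the text above.
EXAMPLES_JAR = "/usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar"


def has_kerberos_ticket():
    """True if `klist -s` reports a readable, unexpired credentials cache."""
    try:
        return subprocess.run(["klist", "-s"]).returncode == 0
    except FileNotFoundError:
        # The Kerberos client tools are not installed on this machine.
        return False


def pi_job_command(maps=10, samples_per_map=10000):
    """Build the argument list for the pi example: pi <nMaps> <nSamples>."""
    return ["hadoop", "jar", EXAMPLES_JAR, "pi", str(maps), str(samples_per_map)]


def run_pi_job():
    """Run the pi example, refusing to start without a Kerberos ticket."""
    if not has_kerberos_ticket():
        raise SystemExit("No valid Kerberos ticket found; run kinit first.")
    # check=True raises CalledProcessError if the job exits with a non-zero status.
    subprocess.run(pi_job_command(), check=True)
```

Calling run_pi_job() as a user who exists on all cluster hosts and holds a ticket submits the job. A successful run confirms that the JobTracker and at least one TaskTracker accepted the authenticated user.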