Step 2: Verify User Accounts and Groups in CDH 5 Due to Security
- If you are using MRv1, see Step 2a (MRv1 only): Verify User Accounts and Groups in MRv1 for configuration information.
- If you are using YARN, see Step 2b (YARN only): Verify User Accounts and Groups in YARN for configuration information.
Step 2a (MRv1 only): Verify User Accounts and Groups in MRv1
During CDH 5 package installation of MRv1, the following Unix user accounts are automatically created to support security:
| This User | Runs These Hadoop Programs |
|---|---|
| hdfs | HDFS: NameNode, DataNodes, Secondary NameNode (or Standby NameNode if you are using HA) |
| mapred | MRv1: JobTracker and TaskTrackers |
The hdfs user also acts as the HDFS superuser.
The hadoop user no longer exists in CDH 5. If you currently use the hadoop user to run applications as an HDFS superuser, use the new hdfs user instead, or create a separate Unix account for your application, such as myhadoopapp.
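For example, a dedicated application account can be created as follows (a sketch; the account name myhadoopapp and the options shown are placeholders to adapt, and the command requires root):

```shell
# Example only: create a dedicated system account for an application that
# previously ran as the removed "hadoop" user.
$ sudo useradd --system --shell /sbin/nologin myhadoopapp
```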
MRv1: Directory Ownership in the Local File System
Because the HDFS and MapReduce services run as different users, you must configure the correct ownership of the following directories on the local filesystem of each host:

| File System | Directory | Owner | Permissions |
|---|---|---|---|
| Local | dfs.namenode.name.dir (dfs.name.dir is deprecated but will also work) | hdfs:hdfs | drwx------ |
| Local | dfs.datanode.data.dir (dfs.data.dir is deprecated but will also work) | hdfs:hdfs | drwx------ |
| Local | mapred.local.dir | mapred:mapred | drwxr-xr-x |
See also Setting Up MapReduce v1 (MRv1) Using the Command Line.
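The modes in the table correspond to octal 700 (drwx------) and 755 (drwxr-xr-x). As a minimal local sketch, using throwaway placeholder paths rather than your configured dfs.namenode.name.dir or mapred.local.dir, and with the chown steps shown only as comments because they require root and the hdfs/mapred accounts:

```shell
# Sketch only: placeholder paths stand in for the configured directories.
# A real deployment would also run (as root):
#   chown -R hdfs:hdfs     <dfs.namenode.name.dir> <dfs.datanode.data.dir>
#   chown -R mapred:mapred <mapred.local.dir>
base=$(mktemp -d)
mkdir -p "$base/dfs/nn" "$base/dfs/dn" "$base/mapred/local"
chmod 700 "$base/dfs/nn" "$base/dfs/dn"     # drwx------
chmod 755 "$base/mapred/local"              # drwxr-xr-x
stat -c '%a %A' "$base/dfs/nn" "$base/mapred/local"
# prints:
# 700 drwx------
# 755 drwxr-xr-x
```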
You must also configure the following permissions for the HDFS and MapReduce log directories (the default locations are /var/log/hadoop-hdfs and /var/log/hadoop-0.20-mapreduce), and for the $MAPRED_LOG_DIR/userlogs/ directory:

| File System | Directory | Owner | Permissions |
|---|---|---|---|
| Local | HDFS_LOG_DIR | hdfs:hdfs | drwxrwxr-x |
| Local | MAPRED_LOG_DIR | mapred:mapred | drwxrwxr-x |
| Local | userlogs directory in MAPRED_LOG_DIR | mapred:anygroup | permissions are set automatically at daemon start time |
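The drwxrwxr-x mode on the log directories is octal 775: the owning group can write logs, while other users can only read and traverse. A quick local sketch with a placeholder path (real log directories also need chown hdfs:hdfs or chown mapred:mapred, run as root):

```shell
# Sketch only: a placeholder for HDFS_LOG_DIR / MAPRED_LOG_DIR.
logdir=$(mktemp -d)/hadoop-logs
mkdir -p "$logdir"
chmod 775 "$logdir"
stat -c '%A' "$logdir"
# prints: drwxrwxr-x
```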
MRv1: Directory Ownership on HDFS
You must also configure the following directories on HDFS:

| File System | Directory | Owner | Permissions |
|---|---|---|---|
| HDFS | mapreduce.jobtracker.system.dir (mapred.system.dir is deprecated but will also work) | mapred:hadoop | drwx------ |
| HDFS | / (root directory) | hdfs:hadoop | drwxr-xr-x |
MRv1: Changing the Directory Ownership on HDFS
- If Hadoop security is enabled, obtain Kerberos credentials for the hdfs user by running the following command before changing the directory ownership on HDFS:

```shell
$ sudo -u hdfs kinit -k -t hdfs.keytab hdfs/fully.qualified.domain.name@YOUR-REALM.COM
```

If kinit does not obtain credentials initially, run kinit -R afterward to renew them. (For more information, see Error Messages and Various Failures.) To change the directory ownership on HDFS, run the following commands. Replace the example /mapred/system directory in the commands below with the HDFS directory specified by the mapreduce.jobtracker.system.dir (or mapred.system.dir) property in the conf/mapred-site.xml file:

```shell
$ sudo -u hdfs hadoop fs -chown mapred:hadoop /mapred/system
$ sudo -u hdfs hadoop fs -chown hdfs:hadoop /
$ sudo -u hdfs hadoop fs -chmod -R 700 /mapred/system
$ sudo -u hdfs hadoop fs -chmod 755 /
```
- In addition (whether or not Hadoop security is enabled), create the /tmp directory. For instructions on creating /tmp and setting its permissions, see these instructions.
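As a sketch of what those instructions typically involve (confirm the exact steps in the linked instructions; the sticky bit in 1777 lets all users write to /tmp while only owners can delete their own files):

```shell
$ sudo -u hdfs hadoop fs -mkdir /tmp
$ sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
```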
Step 2b (YARN only): Verify User Accounts and Groups in YARN
During CDH 5 package installation of MapReduce 2.0 (YARN), the following Unix user accounts are automatically created to support security:
| This User | Runs These Hadoop Programs |
|---|---|
| hdfs | HDFS: NameNode, DataNodes, Standby NameNode (if you are using HA) |
| yarn | YARN: ResourceManager, NodeManager |
| mapred | YARN: MapReduce JobHistory Server |
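To check on a given host that package installation actually created these accounts, a loop like the following can be used (a sketch; on hosts without the CDH packages each account will report MISSING):

```shell
# Check for the service accounts the CDH packages are expected to create.
status=""
for u in hdfs yarn mapred; do
  if id "$u" >/dev/null 2>&1; then s=present; else s=MISSING; fi
  status="$status $u=$s"
  echo "$u: $s"
done
```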
YARN: Directory Ownership in the Local Filesystem
Because the HDFS and YARN services run as different users, you must configure the correct ownership of the following directories on the local filesystem of each host:

| File System | Directory | Owner | Permissions (see Footnote 1) |
|---|---|---|---|
| Local | dfs.namenode.name.dir (dfs.name.dir is deprecated but will also work) | hdfs:hdfs | drwx------ |
| Local | dfs.datanode.data.dir (dfs.data.dir is deprecated but will also work) | hdfs:hdfs | drwx------ |
| Local | yarn.nodemanager.local-dirs | yarn:yarn | drwxr-xr-x |
| Local | yarn.nodemanager.log-dirs | yarn:yarn | drwxr-xr-x |
| Local | container-executor | root:yarn | --Sr-s--- |
| Local | conf/container-executor.cfg | root:yarn | r-------- |
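The container-executor entries are security-critical: the binary is setuid/setgid so that containers launched by the NodeManager run as the submitting user, and the restrictive modes keep it usable only by root and the yarn group. A quick sanity check might look like the following (the paths are assumptions; they vary by package layout):

```shell
$ stat -c '%A %U:%G' /usr/lib/hadoop-yarn/bin/container-executor
$ stat -c '%A %U:%G' /etc/hadoop/conf/container-executor.cfg
# Expect the setuid (S) and setgid (s) bits with owner root and group yarn
# on the binary, and a root-owned, read-only cfg file.
```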
You must also configure the following permissions for the HDFS, YARN, and MapReduce log directories (the default locations are /var/log/hadoop-hdfs, /var/log/hadoop-yarn, and /var/log/hadoop-mapreduce):

| File System | Directory | Owner | Permissions |
|---|---|---|---|
| Local | HDFS_LOG_DIR | hdfs:hdfs | drwxrwxr-x |
| Local | $YARN_LOG_DIR | yarn:yarn | drwxrwxr-x |
| Local | MAPRED_LOG_DIR | mapred:mapred | drwxrwxr-x |
YARN: Directory Ownership on HDFS
You must also configure the following directories on HDFS:

| File System | Directory | Owner | Permissions |
|---|---|---|---|
| HDFS | / (root directory) | hdfs:hadoop | drwxr-xr-x |
| HDFS | yarn.nodemanager.remote-app-log-dir | yarn:hadoop | drwxrwxrwt |
| HDFS | mapreduce.jobhistory.intermediate-done-dir | mapred:hadoop | drwxrwxrwt |
| HDFS | mapreduce.jobhistory.done-dir | mapred:hadoop | drwxr-x--- |
YARN: Changing the Directory Ownership on HDFS
If Hadoop security is enabled, obtain Kerberos credentials for the hdfs user by running the following command:

```shell
$ sudo -u hdfs kinit -k -t hdfs.keytab hdfs/fully.qualified.domain.name@YOUR-REALM.COM
```

If kinit does not obtain credentials initially, run kinit -R afterward to renew them. See Error Messages and Various Failures. To change the directory ownership on HDFS, run the following commands, replacing each bracketed property name with the HDFS path that property specifies:

```shell
$ sudo -u hdfs hadoop fs -chown hdfs:hadoop /
$ sudo -u hdfs hadoop fs -chmod 755 /
$ sudo -u hdfs hadoop fs -chown yarn:hadoop [yarn.nodemanager.remote-app-log-dir]
$ sudo -u hdfs hadoop fs -chmod 1777 [yarn.nodemanager.remote-app-log-dir]
$ sudo -u hdfs hadoop fs -chown mapred:hadoop [mapreduce.jobhistory.intermediate-done-dir]
$ sudo -u hdfs hadoop fs -chmod 1777 [mapreduce.jobhistory.intermediate-done-dir]
$ sudo -u hdfs hadoop fs -chown mapred:hadoop [mapreduce.jobhistory.done-dir]
$ sudo -u hdfs hadoop fs -chmod 750 [mapreduce.jobhistory.done-dir]
```
- In addition (whether or not Hadoop security is enabled) create the /tmp directory. For instructions on creating /tmp and setting its permissions, see Step 7: If Necessary, Create the HDFS /tmp Directory.
- In addition (whether or not Hadoop security is enabled), change the permissions on the /user/history directory. See Step 8: Create the history Directory and Set Permissions.
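As a sketch of what the /user/history step typically involves (confirm the exact commands in Step 8 before running them):

```shell
$ sudo -u hdfs hadoop fs -mkdir -p /user/history
$ sudo -u hdfs hadoop fs -chmod -R 1777 /user/history
$ sudo -u hdfs hadoop fs -chown mapred:hadoop /user/history
```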