How to Configure a MapReduce Job to Access S3 with an HDFS Credstore
This topic describes how to configure your MapReduce jobs to read from and write to Amazon S3 using a custom password for an HDFS Credstore.
- Copy the contents of the /etc/hadoop/conf directory to a local working directory on the host where you will submit the MapReduce job. Use the --dereference option when copying so that symlinks are resolved correctly. For example:
cp -r --dereference /etc/hadoop/conf ~/my_custom_config_directory
If you see the following message, you can ignore it:
cp: cannot open `/etc/hadoop/conf/container-executor.cfg' for reading: Permission denied
- Change the permissions of the directory so that only you have access:
chmod -R go-rwx ~/my_custom_config_directory/
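To confirm that the copy succeeded and that access is restricted to your user, a quick listing such as the following is enough (the directory name matches the example above):
# Directory should be owned by you with no group or other permissions
ls -ld ~/my_custom_config_directory
# core-site.xml and mapred-site.xml should be among the copied files
ls ~/my_custom_config_directory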
- Add the following to the copy of the core-site.xml file in the working directory:
<property>
  <name>hadoop.security.credential.provider.path</name>
  <value>jceks://hdfs/user/username/awscreds.jceks</value>
</property>
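The jceks://hdfs URI above is resolved inside HDFS, so the directory it references must exist and be writable by you. As a quick sanity check, assuming the same /user/username path used in the property, you can list your HDFS home directory before creating the Credstore:
# The directory that will hold awscreds.jceks should already exist in HDFS
hdfs dfs -ls /user/username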
- Specify a custom Credstore password by running the following command on the client host:
export HADOOP_CREDSTORE_PASSWORD=your_custom_keystore_password
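Because an inline export like the one above leaves the password in your shell history, you may prefer to read it from a file that only you can access. The file name ~/.credstore_pass is only an example:
# Restrict the password file to your user, then load it into the environment
chmod 600 ~/.credstore_pass
export HADOOP_CREDSTORE_PASSWORD="$(cat ~/.credstore_pass)"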
- In the working directory, edit the mapred-site.xml file:
- Add the following properties:
<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_CREDSTORE_PASSWORD=your_custom_keystore_password</value>
</property>
<property>
  <name>mapred.child.env</name>
  <value>HADOOP_CREDSTORE_PASSWORD=your_custom_keystore_password</value>
</property>
- Add yarn.app.mapreduce.am.env and mapred.child.env to the comma-separated list of values of the mapreduce.job.redacted-properties property so that the password is redacted from the exposed job configuration. For example (the new values are the last two entries):
<property>
  <name>mapreduce.job.redacted-properties</name>
  <value>fs.s3a.access.key,fs.s3a.secret.key,yarn.app.mapreduce.am.env,mapred.child.env</value>
</property>
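If you prefer not to edit mapred-site.xml, the same two properties can typically be passed per job as -D generic options instead. The sketch below reuses the teragen example from the end of this topic and assumes a package-based installation path:
# Per-job alternative to the mapred-site.xml edits above (sketch, not a required form)
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar teragen \
  -Dyarn.app.mapreduce.am.env=HADOOP_CREDSTORE_PASSWORD=your_custom_keystore_password \
  -Dmapred.child.env=HADOOP_CREDSTORE_PASSWORD=your_custom_keystore_password \
  100 s3a://S3_Bucket/teragen_test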
- Set the HADOOP_CONF_DIR environment variable to point to your working directory:
export HADOOP_CONF_DIR=~/path_to_working_directory
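To confirm the client will use the edited files rather than /etc/hadoop/conf, you can list the directory the variable points to:
# The edited core-site.xml and mapred-site.xml should appear here
ls "$HADOOP_CONF_DIR"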
- Create the Credstore by running the following commands:
hadoop credential create fs.s3a.access.key
hadoop credential create fs.s3a.secret.key
You will be prompted to enter the access key and secret key.
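The commands above write to the provider configured in your copy of core-site.xml. If you want to name the target keystore explicitly, the hadoop credential command also accepts a -provider option; a sketch using the same path as the earlier property:
# Create the credentials in an explicitly specified Credstore
hadoop credential create fs.s3a.access.key -provider jceks://hdfs/user/username/awscreds.jceks
hadoop credential create fs.s3a.secret.key -provider jceks://hdfs/user/username/awscreds.jceks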
- List the credentials to make sure they were created correctly by running the following command:
hadoop credential list
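Because the Credstore uses the hdfs scheme, the keystore is stored as an ordinary HDFS file, so you can also confirm it exists directly (replace username as appropriate):
# The .jceks file backing the Credstore should now be present
hdfs dfs -ls /user/username/awscreds.jceks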
- Submit your job. For example:
- ls
hdfs dfs -ls s3a://S3_Bucket/
- distcp
hadoop distcp hdfs_path s3a://S3_Bucket/S3_path
- teragen (package-based installations)
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar teragen 100 s3a://S3_Bucket/teragen_test
- teragen (parcel-based installations)
hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar teragen 100 s3a://S3_Bucket/teragen_test
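After a job finishes, you can verify the output through the same s3a paths, for example using the teragen destination from above:
# Confirm that the job output was written to the S3 bucket
hdfs dfs -ls s3a://S3_Bucket/teragen_test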