Configuring Oozie to Enable MapReduce Jobs To Read/Write from Microsoft Azure (ADLS)
MapReduce jobs controlled by Oozie as part of a workflow can read from and write to Azure Data Lake Storage (ADLS). The steps below show you how to enable this capability. Before you
begin, you will need the following information from your Microsoft Azure account:
- The client ID.
- The client secret.
- The refresh URL. To get this value, go to the Endpoints region in the Azure portal and copy the OAUTH 2.0 TOKEN ENDPOINT. This is the value you need for the refresh_URL below.
In the steps below, replace path/to/file with the HDFS directory where the .jceks file is located, and replace the client ID, client secret, and refresh URL with the values from your Microsoft Azure account.
- Create the credential store (.jceks) and add your Azure Client ID, Client Secret, and refresh URL to the store as follows:
  hadoop credential create dfs.adls.oauth2.client.id -provider jceks://hdfs/user/USER_NAME/adlskeyfile.jceks -value client ID
  hadoop credential create dfs.adls.oauth2.credential -provider jceks://hdfs/user/USER_NAME/adlskeyfile.jceks -value client secret
  hadoop credential create dfs.adls.oauth2.refresh.url -provider jceks://hdfs/user/USER_NAME/adlskeyfile.jceks -value refresh URL
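Optionally, you can confirm that all three entries were written by listing the aliases stored in the provider. This is only a sanity check and is not part of the required steps; the exact output format can vary with your Hadoop version:

  hadoop credential list -provider jceks://hdfs/user/USER_NAME/adlskeyfile.jceks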
- Set hadoop.security.credential.provider.path to the path of the .jceks file in the <configuration> section of the MapReduce action in Oozie's workflow.xml file, so that the MapReduce framework can load the Azure credentials that grant access to ADLS. For example:
  <action name="ADLSjob">
      <map-reduce>
          <job-tracker>${jobtracker}</job-tracker>
          <name-node>${namenode}</name-node>
          <configuration>
              <property>
                  <name>hadoop.security.credential.provider.path</name>
                  <value>jceks://hdfs/path/to/file.jceks</value>
              </property>
              ....
              ....
          </configuration>
      </map-reduce>
      ....
  </action>
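With the credential provider configured, the same <configuration> section typically also points the job at ADLS locations. The properties below are an illustrative sketch rather than part of the documented steps: ACCOUNT_NAME and the input/output paths are placeholders you would adapt to your own job, and mapred.input.dir / mapred.output.dir are the standard properties used by the classic MapReduce API in an Oozie map-reduce action. ADLS Gen1 paths use the adl:// scheme.

  <property>
      <name>mapred.input.dir</name>
      <value>adl://ACCOUNT_NAME.azuredatalakestore.net/path/to/input</value>
  </property>
  <property>
      <name>mapred.output.dir</name>
      <value>adl://ACCOUNT_NAME.azuredatalakestore.net/path/to/output</value>
  </property>

The ${jobtracker} and ${namenode} variables are resolved from the workflow's job.properties file, as in any other Oozie MapReduce action.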