How to Configure Encryption for Amazon S3
Amazon offers several server-side encryption mechanisms for use with Amazon S3 storage. Cloudera clusters support server-side encryption for Amazon S3 data using either SSE-S3 (CDH 5.10 and later) or SSE-KMS (CDH 5.11 and later). With SSE-S3, keys are completely under the control of Amazon. With SSE-KMS, you have more control over the encryption keys, and can upload your own key material to use for encrypting Amazon S3 data. With either mechanism, encryption is applied transparently to the Amazon S3 bucket objects after you configure your cluster to use it.
Amazon S3 also supports TLS/SSL ('wire' or data-in-transit) encryption by default. Configuring data-at-rest encryption for Amazon S3 for use with your cluster involves some configuration both on Amazon S3 and on the cluster, using the Cloudera Manager Admin Console, as detailed in the sections below.
Requirements
Using Amazon S3 assumes that you have an Amazon Web Services account and the appropriate privileges on the AWS Management Console to set up and configure Amazon S3 buckets.
In addition, to configure Amazon S3 storage for use with a Cloudera cluster, you must have privileges as the User Administrator or Full Administrator on the Cloudera Manager Admin Console. See How to Configure AWS Credentials for details.
Amazon S3 and TLS/SSL Encryption
Amazon S3 uses TLS/SSL by default. The default configuration file on Cloudera clusters (release 5.9 and later) sets the boolean property fs.s3a.connection.ssl.enabled to true, which activates TLS/SSL. This means that if the cluster has been configured to use TLS/SSL, connections from the cluster to Amazon S3 automatically use TLS wire encryption. To confirm the value of the fs.s3a.connection.ssl.enabled property, run hadoop org.apache.hadoop.conf.Configuration and look for the property in the output.
If the cluster is not configured to use TLS, the connection to Amazon S3 silently reverts to an unencrypted connection.
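The check described above can be scripted. The following is a minimal sketch, assuming the output of hadoop org.apache.hadoop.conf.Configuration has been captured as text in the usual Hadoop configuration/property XML layout. The function name s3a_ssl_enabled and the SAMPLE text are illustrative and are not part of any Cloudera tooling.

```python
import xml.etree.ElementTree as ET

PROPERTY_NAME = "fs.s3a.connection.ssl.enabled"


def s3a_ssl_enabled(config_xml, default=True):
    """Return the boolean value of fs.s3a.connection.ssl.enabled.

    config_xml is Hadoop configuration XML as text. The default of True is
    an assumption that mirrors the shipped default described in this section.
    """
    root = ET.fromstring(config_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == PROPERTY_NAME:
            return prop.findtext("value", "").strip().lower() == "true"
    # Property not listed: fall back to the assumed default.
    return default


# Hypothetical sample standing in for captured command output.
SAMPLE = """<configuration>
  <property>
    <name>fs.s3a.connection.ssl.enabled</name>
    <value>true</value>
  </property>
</configuration>"""

print(s3a_ssl_enabled(SAMPLE))
```

The sketch only reads the property. It does not verify that the cluster itself is configured for TLS, which this section also requires.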
Amazon S3 and Data at Rest Encryption
Cloudera clusters support the following server-side encryption mechanisms for data at rest on Amazon S3:
- Server-side Encryption with AWS KMS-Managed Keys (SSE-KMS), which requires using AWS Key Management Service (AWS KMS) in conjunction with Amazon S3. You can have Amazon generate and manage the keys in AWS KMS for you, or you can provide your own key material, but you must configure AWS KMS and create a key before you can use it with your cluster. See Prerequisites for Using SSE-KMS for details.
- Server-side Encryption with S3-Managed Encryption Keys (SSE-S3), which is the simplest to set up because it uses keys that Amazon provides and manages, and has no requirements beyond setting a single property. See Configuring the Cluster to Use Server-Side Encryption on Amazon S3 for details.
Enabling the cluster to use Amazon S3 server-side encryption involves using the Cloudera Manager Admin Console to configure the Advanced Configuration Snippet (Safety Valve) as detailed in Configuring the Cluster to Use Server-Side Encryption on Amazon S3, below.
The steps assume that your cluster has been set up and that you have set up AWS credentials.
Prerequisites for Using SSE-KMS
To use SSE-KMS with your Amazon S3 bucket, you must log in to the AWS Management Console using the account you set up in step 1 of Getting Started with Amazon Web Services. For example, the account lab-iam has an IAM user named etl-workload that has been granted permissions on the Amazon S3 storage bucket to be configured using SSE-KMS.
- Select My Security Credentials from the menu.
- Click Encryption keys (at the bottom left of the AWS Management Console page that displays at step 1, above).
- Click the Create key button to start the 5-step key-creation wizard, which leads you through entry pages for giving the key an alias and description, adding tags, defining administrator permissions to the key, and defining usage permissions. The last page of the wizard shows the policy that will be applied to the key before the key is created.
Configuring the Cluster to Use Server-Side Encryption on Amazon S3
Follow the steps below to enable server-side encryption on Amazon S3. To use SSE-KMS encryption, you will need your KMS key ID at step 7. Using SSE-S3 has no prerequisites: Amazon generates and manages the keys transparently.
To configure the cluster to encrypt data stored on Amazon S3:
- Log in to the Cloudera Manager Admin Console.
- Select .
- Click the Configuration tab.
- Select .
- Select .
- Locate the Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml property.
- In the text field, define the properties and values appropriate for the type of encryption you want to use.
- To use SSE-S3 encryption:
  <property>
    <name>fs.s3a.server-side-encryption-algorithm</name>
    <value>AES256</value>
  </property>
- To use SSE-KMS encryption:
  <property>
    <name>fs.s3a.server-side-encryption-algorithm</name>
    <value>SSE-KMS</value>
  </property>
  <property>
    <name>fs.s3a.server-side-encryption-key</name>
    <value>your_kms_key_id</value>
  </property>
- Click Save Changes.
- Restart the HDFS service.
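If you maintain the safety valve text in scripts rather than typing it by hand, the two snippets from step 7 can be generated. The following is a minimal sketch: the property names and values come from the steps above, while the helper names (sse_safety_valve, _property) are hypothetical and not part of Cloudera Manager.

```python
from xml.sax.saxutils import escape

ALGORITHM_PROPERTY = "fs.s3a.server-side-encryption-algorithm"
KEY_PROPERTY = "fs.s3a.server-side-encryption-key"


def _property(name, value):
    """Render one Hadoop <property> element as text."""
    return (
        "<property>\n"
        f"  <name>{escape(name)}</name>\n"
        f"  <value>{escape(value)}</value>\n"
        "</property>"
    )


def sse_safety_valve(mode, kms_key_id=None):
    """Build the core-site.xml safety valve text for SSE-S3 or SSE-KMS."""
    if mode == "SSE-S3":
        return _property(ALGORITHM_PROPERTY, "AES256")
    if mode == "SSE-KMS":
        if not kms_key_id:
            raise ValueError("SSE-KMS requires a KMS key ID")
        return "\n".join([
            _property(ALGORITHM_PROPERTY, "SSE-KMS"),
            _property(KEY_PROPERTY, kms_key_id),
        ])
    raise ValueError(f"unsupported mode: {mode!r}")


# alias/awsCreatedMasterKey is the example alias used in this document.
print(sse_safety_valve("SSE-KMS", "alias/awsCreatedMasterKey"))
```

The generated text still has to be pasted into the Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml and saved as described in the steps above.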
For the value of your_kms_key_id (step 7b., above), you can use any of Amazon's four different key ID formats. Here are some examples:
Format | Example |
---|---|
Key ARN | arn:aws:kms:us-west-1:141229114088:key/c914b724-f191-41df-934a-6147f6235983 |
Alias ARN | arn:aws:kms:us-west-1:141229114088:alias/awsCreatedMasterKey |
Globally Unique Key ID | c914b724-f191-41df-934a-6147f6235983 |
Alias Name | alias/awsCreatedMasterKey |
Changing Encryption Modes or Keys
Cloudera clusters can be configured to use only one type of server-side encryption for Amazon S3 data at a time.
- Changing encryption mechanisms or keys on Amazon S3 has no effect on existing encrypted or unencrypted data.
- Data stored on Amazon S3 without encryption remains unencrypted even after you configure encryption for Amazon S3.
- Any existing encrypted data continues using the original mechanism and key to decrypt data (on reads) and re-encrypt data (on writes).
- After changing encryption mode or key, new objects stored on Amazon S3 from the cluster use the new mode and key.
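Because existing objects keep their original mode and key, it can help to check how a given object is actually encrypted. The following is a minimal sketch, assuming you already have object metadata shaped like the response of the boto3 S3 head_object call, which reports ServerSideEncryption and, for SSE-KMS, SSEKMSKeyId. No AWS call is made here, and the function name describe_encryption and the sample dictionaries are illustrative.

```python
def describe_encryption(head_response):
    """Summarize the server-side encryption recorded for one S3 object.

    head_response is a dict shaped like a boto3 head_object response.
    """
    mode = head_response.get("ServerSideEncryption")
    if mode == "AES256":
        return "SSE-S3"
    if mode == "aws:kms":
        key = head_response.get("SSEKMSKeyId", "unknown key")
        return f"SSE-KMS [{key}]"
    if mode is None:
        # No SSE-S3 or SSE-KMS metadata was reported for this object.
        return "Unencrypted or non-SSE key"
    return f"Other ({mode})"


# Hypothetical metadata: one object written before a key change, one after.
old_object = {"ServerSideEncryption": "aws:kms", "SSEKMSKeyId": "key1"}
new_object = {"ServerSideEncryption": "aws:kms", "SSEKMSKeyId": "key2"}

for obj in (old_object, new_object):
    print(describe_encryption(obj))
```

In this sample the older object still reports key1 while the newer one reports key2, which matches the behavior described in the list above.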
Effect of Changing Encryption Mode or Key
The following table shows what happens to data on Amazon S3 after you change the encryption-key configuration on the cluster. Each row is the state in which the data starts (far-left column, "Data starts as...", reading down). Columns 2 ("Unencrypted") through 5 ("Non-SSE Key") show the mode and key that existing data (Existing) and new data (New) use after the change:
Data starts as...↓ / Data results after modifying encryption mode or keys...→ | Unencrypted | SSE-S3 | SSE-KMS | Non-SSE Key |
---|---|---|---|---|
Unencrypted | Existing | New | New | ~ |
SSE-S3 encrypted | ~ | Existing | New | ~ |
SSE-KMS [key1] | ~ | New | Existing [key1]; New [key2] | ~ |
Non-SSE key | ~ | ~ | ~ | Existing |
Migrating Encrypted Data to New Encryption Mode or Key
- Create an Amazon S3 bucket as temporary storage for the unencrypted files.
- Decrypt the data on the Amazon S3 bucket using the mechanism and key used for encryption (legacy encryption mode or key), moving the unencrypted data to the temporary bucket created in step 1.
- Configure the Amazon S3 bucket to use the new encryption mechanism and key of your choice (SSE-S3 or SSE-KMS).
- Move the unencrypted data from the temporary bucket back to the Amazon S3 bucket that is now configured using the new mechanism and key.
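The copy operations behind these steps can be planned before anything is run. The following is a minimal sketch that only builds request dictionaries using the parameter names of the boto3 S3 copy_object call (Bucket, Key, CopySource, ServerSideEncryption, SSEKMSKeyId). It makes no AWS calls and omits the deletions that turn a copy into a move. Note one deliberate difference from step 3: the sketch sets the new encryption on each copy-back request instead of relying on bucket configuration. The function name plan_migration and the bucket names are hypothetical.

```python
def plan_migration(bucket, temp_bucket, keys, new_mode, kms_key_id=None):
    """Return copy requests in order: stage out to the temporary bucket,
    then copy back under the new encryption settings."""
    if new_mode == "SSE-S3":
        sse_args = {"ServerSideEncryption": "AES256"}
    elif new_mode == "SSE-KMS":
        if not kms_key_id:
            raise ValueError("SSE-KMS requires a KMS key ID")
        sse_args = {"ServerSideEncryption": "aws:kms", "SSEKMSKeyId": kms_key_id}
    else:
        raise ValueError(f"unsupported mode: {new_mode!r}")

    # Step 2: copy each object out to temporary storage.
    stage_out = [
        {"Bucket": temp_bucket, "Key": key,
         "CopySource": {"Bucket": bucket, "Key": key}}
        for key in keys
    ]
    # Step 4: copy each object back, requesting the new mode and key.
    copy_back = [
        {"Bucket": bucket, "Key": key,
         "CopySource": {"Bucket": temp_bucket, "Key": key}, **sse_args}
        for key in keys
    ]
    return stage_out + copy_back


# Hypothetical bucket and object names.
for request in plan_migration("data-bucket", "temp-bucket",
                              ["logs/a.csv"], "SSE-S3"):
    print(request)
```

Whether the staged copies are actually stored unencrypted depends on how the temporary bucket is configured, so verify that separately before relying on this plan.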
Deleting an Encryption Key
If you change encryption modes or keys on Amazon S3, do not delete the old key: existing data that was encrypted with it still needs that key for reads and writes. To replace the old key and mode with a completely new mode or key, you must manually migrate the data.
When you delete an encryption key, Amazon puts the key in a Pending Deletion state (shown in the key's Status column) for at least 7 days, allowing you to reinstate the key if you change your mind or realize an error.
The pending time frame is configurable, from 7 up to 30 days. See AWS Key Management Service Documentation for complete details.
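The 7 to 30 day window can be checked before a deletion is requested. The following is a minimal sketch that builds the arguments for the boto3 KMS schedule_key_deletion call (KeyId and PendingWindowInDays) without contacting AWS. The function name deletion_request is hypothetical, and the 30-day default is an assumption chosen to leave the longest recovery window.

```python
MIN_PENDING_DAYS = 7
MAX_PENDING_DAYS = 30


def deletion_request(key_id, pending_days=MAX_PENDING_DAYS):
    """Build schedule_key_deletion arguments, enforcing the 7-30 day window."""
    if not MIN_PENDING_DAYS <= pending_days <= MAX_PENDING_DAYS:
        raise ValueError(
            f"pending_days must be between {MIN_PENDING_DAYS} "
            f"and {MAX_PENDING_DAYS}, got {pending_days}"
        )
    return {"KeyId": key_id, "PendingWindowInDays": pending_days}


# The key ID below is the example value used earlier in this document.
print(deletion_request("c914b724-f191-41df-934a-6147f6235983", pending_days=7))
```

A longer window gives more time to notice that data still depends on the key before it becomes unrecoverable.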