Cloudera Manager Snapshot Policies
Minimum Required Role: BDR Administrator (also provided by Full Administrator)
Cloudera Manager enables the creation of snapshot policies that define the directories or tables to be snapshotted, the intervals at which snapshots should be taken, and the number of snapshots that should be kept for each snapshot interval. For example, you can create a policy that takes both daily and weekly snapshots, and specify that seven daily snapshots and five weekly snapshots should be maintained.
Managing Snapshot Policies
To create a snapshot policy:
- Select in the top navigation bar. Existing snapshot policies are shown in a table. See Snapshot Policies Page.
- To create a new policy, click Create Snapshot Policy.
- From the drop-down list, select the service (HDFS or HBase) and cluster for which you want to create a policy.
- Provide a name for the policy and, optionally, a description.
- Specify the directories or tables to include in the snapshot.
- For an HDFS service, select the paths of the directories to include in the snapshot. The drop-down list allows you to select only directories that are enabled for snapshotting. If no directories are enabled for snapshotting, a warning displays. Use the add control to add a path and the remove control to remove a path.
- For an HBase service, list the tables to include in your snapshot. You can use a Java regular expression to specify a set of tables. For example, finance.* matches all tables with names starting with finance. You can also create a snapshot for all tables in a given namespace, using the {namespace}:.* syntax.
- Specify the snapshot Schedule. You can schedule snapshots hourly, daily, weekly, monthly, or yearly, or any combination of those. Depending on the frequency you select, you can specify the time of day to take the snapshot, the day of the week, day of the month, or month of the year, and the number of snapshots to keep at each interval. Each time unit in the schedule information is shared with the time units of larger granularity. That is, the minute value is shared by all the selected schedules, the hour value by all the schedules for which hour is applicable, and so on. For example, if you specify that hourly snapshots are taken at the half hour and daily snapshots are taken at hour 20, the daily snapshot occurs at 20:30.
To select an interval, check its box. Fields display where you can edit the time and the number of snapshots to keep.
- Specify whether Alerts should be generated for various state changes in the snapshot workflow. You can alert on failure, on start, on success, or when the snapshot workflow is aborted.
- Click Save Policy.
The new policy appears on the Snapshot Policies page. See Snapshot Policies Page.
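Two of the steps above can be illustrated with a short Python sketch: the table patterns for HBase policies and the shared time units in the schedule. This is not Cloudera Manager code. The table names and both helper functions are hypothetical, and the sketch assumes that a pattern must match the whole table name (the two example patterns use syntax that Python's `re` module and Java regular expressions interpret the same way).

```python
import re
from datetime import time

# Hypothetical table names, used only for illustration.
TABLES = ["finance_2023", "finance", "hr:payroll", "hr:reviews", "sales"]


def matching_tables(pattern, tables):
    """Return the tables whose full name matches the pattern.

    Assumption: the pattern is applied to the entire table name.
    """
    compiled = re.compile(pattern)
    return [t for t in tables if compiled.fullmatch(t)]


def effective_times(minute, daily_hour):
    """Model the shared-time-unit rule for an hourly plus daily schedule.

    The minute value is shared by every selected schedule, so the daily
    snapshot runs at daily_hour:minute rather than at daily_hour:00.
    """
    hourly = [time(h, minute) for h in range(24)]
    daily = time(daily_hour, minute)
    return hourly, daily


if __name__ == "__main__":
    print(matching_tables("finance.*", TABLES))  # tables starting with "finance"
    print(matching_tables("hr:.*", TABLES))      # all tables in namespace "hr"
    _, daily = effective_times(minute=30, daily_hour=20)
    print(daily.strftime("%H:%M"))
```

With hourly snapshots at the half hour and daily snapshots at hour 20, the daily time computed by the sketch is 20:30, which agrees with the example in the schedule step.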
To edit or delete a snapshot policy:
- Select in the top navigation bar. Existing snapshot policies are shown in a table. See Snapshot Policies Page.
- Click the Actions menu shown next to a policy and select Edit or Delete.
Snapshot Policies Page
The policies you add are shown in a table on the Snapshot Policies screen. The table displays the following columns:
Column | Description |
---|---|
Policy Name | The name of the policy. |
Cluster | The cluster that hosts the service (HDFS or HBase). |
Service | The service from which the snapshot is taken. |
Objects | HDFS Snapshots: The directories included in the snapshot. HBase Snapshots: The tables included in the snapshot. |
Last Run | The date and time the snapshot last ran. Click the link to view the Snapshots History page. Also displays the status icon for the last run. |
Snapshot Schedule | The type of schedule defined for the snapshot: Hourly, Daily, Weekly, Monthly, or Yearly. |
Actions | A drop-down menu with options for the policy, such as Edit and Delete. |
Snapshots History
The Snapshots History page displays information about snapshot jobs that have been run or attempted. The page displays a table of snapshot jobs with the following columns:
Column | Description |
---|---|
Start Time | Time when the snapshot job started execution. Click to display details about the snapshot. Click the View link to open the Managed scheduled snapshots Command page, which displays details and messages about each step in the execution of the command. |
Outcome | Displays whether the snapshot succeeded or failed. |
Paths / Tables Processed | HDFS Snapshots: The number of paths processed for the snapshot. HBase Snapshots: The number of tables processed for the snapshot. |
Paths / Tables Unprocessed | HDFS Snapshots: The number of paths unprocessed for the snapshot. HBase Snapshots: The number of tables unprocessed for the snapshot. |
Snapshots Created | Number of snapshots created. |
Snapshots Deleted | Number of snapshots deleted. |
Errors During Creation | Displays a list of errors that occurred when creating the snapshot. Each error shows the related path and the error message. |
Errors During Deletion | Displays a list of errors that occurred when deleting the snapshot. Each error shows the related path and the error message. |
See Managing HDFS Snapshots and Managing HBase Snapshots for more information about managing snapshots.
Orphaned Snapshots
When a snapshot policy includes a limit on the number of snapshots to keep, Cloudera Manager checks the total number of stored snapshots each time a new snapshot is added, and automatically deletes the oldest existing snapshot if necessary. When a snapshot policy is edited or deleted, files, directories, or tables that were removed from the policy may leave "orphaned" snapshots behind that are not deleted automatically because they are no longer associated with a current snapshot policy. Cloudera Manager never selects these snapshots for automatic deletion because selection for deletion only occurs when the policy creates a new snapshot containing those files, directories, or tables.
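The retention behavior described above can be modeled with a small Python sketch. It is a simplified illustration, not how Cloudera Manager is implemented: the `add_snapshot` helper, the paths, and the snapshot names are all hypothetical. The only rule it encodes is the one stated above, that the limit is checked when a new snapshot is added and the oldest snapshots are deleted first.

```python
def add_snapshot(store, path, name, limit):
    """Record a new snapshot for a path, then enforce the retention limit.

    store maps a path to its snapshot names, oldest first.
    Returns the names that were deleted to stay within the limit.
    """
    store.setdefault(path, []).append(name)
    deleted = []
    while len(store[path]) > limit:
        deleted.append(store[path].pop(0))  # oldest snapshot goes first
    return deleted


def simulate():
    """Run a policy for three days, drop one path from it, run three more."""
    store = {}
    policy_paths = ["/data/a", "/data/b"]  # hypothetical directories
    for day in range(1, 4):
        for path in policy_paths:
            add_snapshot(store, path, f"snap-{day}", limit=2)

    # The policy is edited: /data/b is no longer included.
    policy_paths = ["/data/a"]
    for day in range(4, 7):
        for path in policy_paths:
            add_snapshot(store, path, f"snap-{day}", limit=2)
    return store


if __name__ == "__main__":
    for path, names in simulate().items():
        print(path, names)
```

In this model the snapshots for `/data/b` are never revisited after the edit, because pruning only runs when a new snapshot is added for that path. They remain in the store as orphaned snapshots.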
You can delete snapshots manually through Cloudera Manager or by creating a command-line script that uses the HDFS or HBase snapshot commands. Orphaned snapshots can be hard to locate for manual deletion. Snapshot policies automatically receive the prefix cm-auto followed by a globally unique identifier (GUID). You can locate all snapshots for a specific policy by searching for the prefix cm-auto-guid that is unique to that policy.
To avoid orphaned snapshots, delete snapshots before editing or deleting the associated snapshot policy, or record the identifying name for the snapshots you want to delete. The policy prefix is displayed in the summary of the policy in the policy list and appears in the delete dialog box. Recording the snapshot names, including the associated policy prefix, is necessary because the prefix associated with a policy cannot be determined after the policy has been deleted, and snapshot names do not contain recognizable references to snapshot policies.
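As a starting point for a cleanup script, the following Python sketch filters a list of snapshot names by a recorded policy prefix and builds the matching `hdfs dfs -deleteSnapshot <path> <snapshotName>` commands without running them. The prefix value, the snapshot names, the directory, and the helper function are all hypothetical. The sketch assumes you have already collected the snapshot names for the directory, for example by listing its `.snapshot` subdirectory.

```python
def hdfs_delete_commands(directory, snapshot_names, prefix):
    """Build one delete command per snapshot whose name starts with prefix.

    The commands are returned as argument lists and are not executed, so
    they can be reviewed before anything is deleted.
    """
    return [
        ["hdfs", "dfs", "-deleteSnapshot", directory, name]
        for name in snapshot_names
        if name.startswith(prefix)
    ]


# Hypothetical values: a recorded policy prefix and the snapshot names
# found for one snapshottable directory.
PREFIX = "cm-auto-00000000-0000-0000-0000-000000000000"
NAMES = [
    PREFIX + "-example-1",
    PREFIX + "-example-2",
    "manual-backup",
]

if __name__ == "__main__":
    for cmd in hdfs_delete_commands("/data/b", NAMES, PREFIX):
        print(" ".join(cmd))
```

Printing the commands first keeps the deletion step explicit. Each reviewed argument list could then be passed to `subprocess.run` on a host where the `hdfs` client is configured.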