Managing the Spark History Server
The Spark History Server displays information about the history of completed Spark applications. For further information, see Monitoring Spark Applications.
For instructions on configuring the Spark History Server to use Kerberos, see Spark Authentication.
Adding the Spark History Server Using Cloudera Manager
By default, the Spark (Standalone) service does not include a History Server. For applications to store their history, set spark.eventLog.enabled to true on Spark clients before starting the application.
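As a minimal sketch of the client-side setting, the property can also be passed for a single application through the --conf option of spark-submit. The class name and jar below are hypothetical placeholders, and the script only assembles and prints the command rather than running it.

```shell
# Hypothetical application details; substitute your own class and jar.
APP_CLASS="com.example.MyApp"
APP_JAR="my-app.jar"

# Enable event logging for this application only, so that its history
# is recorded for the History Server to display.
SUBMIT_CMD="spark-submit --class ${APP_CLASS} --conf spark.eventLog.enabled=true ${APP_JAR}"

# Dry run: print the command instead of executing it.
echo "${SUBMIT_CMD}"
```

Setting the property in spark-defaults.conf instead applies it to every application launched from that client.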
Minimum Required Role: Cluster Administrator (also provided by Full Administrator)
- Go to the Spark service.
- Click the Instances tab.
- Click the Add Role Instances button.
- Select a host in the column under History Server, and then click OK.
- Click Continue.
- Check the checkbox next to the History Server role.
- Select Actions for Selected > Start, and then click Start to confirm.
- Click Close when the action completes.
Configuring and Running the Spark History Server Using the Command Line
- Create the /user/spark/applicationHistory/ directory in HDFS and set ownership and permissions as follows:
$ sudo -u hdfs hadoop fs -mkdir /user/spark
$ sudo -u hdfs hadoop fs -mkdir /user/spark/applicationHistory
$ sudo -u hdfs hadoop fs -chown -R spark:spark /user/spark
$ sudo -u hdfs hadoop fs -chmod 1777 /user/spark/applicationHistory
- On hosts from which you will launch Spark jobs, do the following:
- Create /etc/spark/conf/spark-defaults.conf:
cp /etc/spark/conf/spark-defaults.conf.template /etc/spark/conf/spark-defaults.conf
- Add the following to /etc/spark/conf/spark-defaults.conf:
spark.eventLog.dir=hdfs://namenode_host:namenode_port/user/spark/applicationHistory
spark.eventLog.enabled=true
or
spark.eventLog.dir=hdfs://name_service_id/user/spark/applicationHistory
spark.eventLog.enabled=true
- On one host, start the History Server:
$ sudo service spark-history-server start
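The client-side configuration steps above can be sketched as a small script. This is an illustration under stated assumptions: it writes to a temporary directory instead of /etc/spark/conf so that it can be tried safely, the set_property helper is a hypothetical name introduced here, and namenode_host:namenode_port remains a placeholder to replace with your own value.

```shell
# Dry run: write to a temporary directory rather than /etc/spark/conf.
CONF_DIR="$(mktemp -d)"
CONF_FILE="${CONF_DIR}/spark-defaults.conf"
touch "${CONF_FILE}"

# Placeholder from the steps above; replace with your NameNode address
# or name service ID.
EVENT_LOG_DIR="hdfs://namenode_host:namenode_port/user/spark/applicationHistory"

# Hypothetical helper: append key=value only if the key is not already
# present, so the script can be rerun without duplicating lines.
set_property() {
    key="$1"
    value="$2"
    if ! grep -q "^${key}=" "${CONF_FILE}"; then
        echo "${key}=${value}" >> "${CONF_FILE}"
    fi
}

set_property spark.eventLog.dir "${EVENT_LOG_DIR}"
set_property spark.eventLog.enabled true
# Calling the helper again with an existing key leaves the file unchanged.
set_property spark.eventLog.enabled true

# Show the resulting file contents.
cat "${CONF_FILE}"
```

Pointing CONF_FILE at /etc/spark/conf/spark-defaults.conf would apply the same two settings to a real client host.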
To link the YARN ResourceManager directly to the Spark History Server, set the spark.yarn.historyServer.address property in /etc/spark/conf/spark-defaults.conf:
spark.yarn.historyServer.address=http://spark_history_server:history_port
By default, history_port is 18088. The spark.eventLog.dir and spark.eventLog.enabled settings added earlier are what cause Spark applications to write their history to the directory that the History Server reads.
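A minimal sketch of composing this property line, assuming the placeholder host name spark_history_server from the text and falling back to the default port 18088 when the hypothetical HISTORY_PORT variable is unset:

```shell
# Placeholder host from the text; HISTORY_HOST and HISTORY_PORT are
# hypothetical variable names used only for this illustration.
HISTORY_HOST="spark_history_server"
HISTORY_PORT="${HISTORY_PORT:-18088}"

PROPERTY="spark.yarn.historyServer.address=http://${HISTORY_HOST}:${HISTORY_PORT}"

# Print the line to add to /etc/spark/conf/spark-defaults.conf.
echo "${PROPERTY}"
```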