Managing Spark Standalone Using the Command Line

This section describes how to configure and start Spark Standalone services.

For information on installing Spark using the command line, see Setting Up Apache Spark Using the Command Line. For information on configuring and starting the Spark History Server, see Configuring and Running the Spark History Server Using the Command Line.

For information on Spark applications, see Spark Application Overview.

Configuring Spark Standalone

Before running Spark Standalone, do the following on every host in the cluster:

  • Edit /etc/spark/conf/spark-env.sh and, in the last line shown below, replace `hostname` with the name of the host where the Spark Master will run:
    ###
    ### === IMPORTANT ===
    ### Change the following to specify the Master host
    ###
    export STANDALONE_SPARK_MASTER_HOST=`hostname`
    
  • Optionally, edit other configuration options; a sample spark-env.sh excerpt follows this list:
    • SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT and SPARK_WORKER_PORT / SPARK_WORKER_WEBUI_PORT, to use non-default ports
    • SPARK_WORKER_CORES, to set the number of cores the worker can use on the host
    • SPARK_WORKER_MEMORY, to set how much memory the worker can use (for example, 1000m or 2g)
    • SPARK_WORKER_INSTANCES, to set the number of worker processes per host
    • SPARK_WORKER_DIR, to set the working directory of worker processes
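
For reference, a minimal spark-env.sh excerpt with these settings might look like the following; the host name, port, and resource values are illustrative placeholders, not required values:
    export STANDALONE_SPARK_MASTER_HOST=master.example.com   # placeholder Master host name
    export SPARK_MASTER_WEBUI_PORT=18080                      # Master web UI port
    export SPARK_WORKER_CORES=4                                # cores each worker may use on the host
    export SPARK_WORKER_MEMORY=8g                              # memory each worker may use
    export SPARK_WORKER_INSTANCES=1                            # worker processes per host
    export SPARK_WORKER_DIR=/var/run/spark/work                # placeholder working directory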

Starting and Stopping Spark Standalone Clusters

To start Spark Standalone clusters:
  1. On one host in the cluster, start the Spark Master:
    $ sudo service spark-master start

    You can access the Spark Master UI at spark_master:18080.

  2. On all the other hosts, start the workers:
    $ sudo service spark-worker start
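
    If the cluster has more than a few hosts, you can start the workers from a single host instead of logging in to each one. A minimal sketch, assuming passwordless SSH to the worker hosts and a hypothetical workers.txt file listing one worker host name per line:
    $ while read host; do ssh "$host" sudo service spark-worker start; done < workers.txt

    After the workers start, each one should appear under Workers in the Spark Master UI.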

To stop Spark, use the following commands on the appropriate hosts:
    $ sudo service spark-worker stop
    $ sudo service spark-master stop

Service logs are stored in /var/log/spark.
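
For example, to inspect the Master log after a failed start, list the log directory and tail the relevant file; the exact log file names depend on how the service was installed, so the name below is a placeholder:
    $ ls /var/log/spark
    $ sudo tail -n 50 /var/log/spark/<spark-master-log-file>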