Building and Running a Crunch Application with Spark
Developing and Running a Spark WordCount Application provides a tutorial on writing, compiling, and running a Spark application. Using the tutorial as a starting point, do the following to build and run a Crunch application with Spark:
- Along with the other dependencies shown in the tutorial, add the appropriate versions of the crunch-core and crunch-spark dependencies to the Maven project:
<dependency>
  <groupId>org.apache.crunch</groupId>
  <artifactId>crunch-core</artifactId>
  <version>${crunch.version}</version>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>org.apache.crunch</groupId>
  <artifactId>crunch-spark</artifactId>
  <version>${crunch.version}</version>
  <scope>provided</scope>
</dependency>
- Use SparkPipeline where you would have used MRPipeline to declare your Crunch pipeline. SparkPipeline takes either a String containing the connection string for the Spark master (local for local mode, yarn for YARN) or a JavaSparkContext instance, together with a name for the application.
- As you would for any Spark application, use spark-submit to start the pipeline, passing it your Crunch application's app-jar-with-dependencies.jar file.
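The steps above can be sketched as a minimal Crunch word count that uses SparkPipeline in place of MRPipeline. This is not the source of the Crunch demo; the class name WordCount, the helper splitWords, the application name "crunch-wordcount", and the optional third argument for the Spark master are illustrative assumptions. It needs crunch-core, crunch-spark, and Spark on the classpath. Adding `package com.example;` at the top would make it match the `--class` value used with spark-submit.

```java
import org.apache.crunch.DoFn;
import org.apache.crunch.Emitter;
import org.apache.crunch.PCollection;
import org.apache.crunch.PTable;
import org.apache.crunch.Pipeline;
import org.apache.crunch.PipelineResult;
import org.apache.crunch.impl.spark.SparkPipeline;
import org.apache.crunch.types.writable.Writables;

public class WordCount {

    // Hypothetical helper: lowercases a line and splits it on whitespace.
    // Kept as a plain static method so it can be checked without starting Spark.
    static String[] splitWords(String line) {
        String normalized = line.trim().toLowerCase();
        return normalized.isEmpty() ? new String[0] : normalized.split("\\s+");
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("Usage: WordCount <input> <output> [sparkMaster]");
            System.exit(1);
        }
        // Spark master connection string: "local" for local mode, "yarn" for YARN.
        // Reading it from an optional third argument is an assumption for this sketch.
        String master = args.length > 2 ? args[2] : "local";

        // SparkPipeline replaces MRPipeline; the second argument is the application name.
        Pipeline pipeline = new SparkPipeline(master, "crunch-wordcount");

        PCollection<String> lines = pipeline.readTextFile(args[0]);

        // Emit one record per word found in each input line.
        PCollection<String> words = lines.parallelDo(new DoFn<String, String>() {
            @Override
            public void process(String line, Emitter<String> emitter) {
                for (String word : splitWords(line)) {
                    emitter.emit(word);
                }
            }
        }, Writables.strings());

        // count() groups identical words into a table of word -> number of occurrences.
        PTable<String, Long> counts = words.count();
        pipeline.writeTextFile(counts, args[1]);

        // done() runs the planned jobs and then cleans up intermediate data.
        PipelineResult result = pipeline.done();
        System.exit(result.succeeded() ? 0 : 1);
    }
}
```

Apart from the SparkPipeline constructor, the pipeline code (readTextFile, parallelDo, count, writeTextFile, done) is the same as it would be with MRPipeline, which is what makes switching execution engines a one-line change in this sketch.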
For an example, see the Crunch demo. After building the example, run it with the following command:
spark-submit --class com.example.WordCount crunch-demo-1.0-SNAPSHOT-jar-with-dependencies.jar \
  hdfs://namenode_host:8020/user/hdfs/input hdfs://namenode_host:8020/user/hdfs/output