Using Spark MLlib
CDH 5.5 supports MLlib, Spark's machine learning library. For information on MLlib, see the Machine Learning Library (MLlib) Guide.
Running a Spark MLlib Example
To try Spark MLlib using one of the Spark example applications, do the following:
- Download MovieLens sample data and copy it to HDFS:
$ wget --no-check-certificate \
    https://raw.githubusercontent.com/apache/spark/branch-1.6/data/mllib/sample_movielens_data.txt
$ hdfs dfs -copyFromLocal sample_movielens_data.txt /user/hdfs
- Run the Spark MLlib MovieLens example application, which calculates recommendations based on movie reviews:
$ spark-submit --master local --class org.apache.spark.examples.mllib.MovieLensALS \
    SPARK_HOME/lib/spark-examples.jar \
    --rank 5 --numIterations 5 --lambda 1.0 --kryo sample_movielens_data.txt
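As an optional sanity check on the file downloaded in the first step (for example, to catch an HTML error page saved by wget), the following sketch counts lines that do not look like ratings. It assumes each line has the form user::movie::rating with numeric fields, which is how the file appears in the Spark repository but is not stated in this guide. The function name count_bad_rating_lines is hypothetical.

```shell
# Hypothetical helper: print how many lines in the file given as $1 do not
# match the ASSUMED user::movie::rating layout (integer IDs, numeric rating).
count_bad_rating_lines() {
  awk -F'::' '
    NF != 3 ||
    $1 !~ /^[0-9]+$/ ||
    $2 !~ /^[0-9]+$/ ||
    $3 !~ /^[0-9]+(\.[0-9]+)?$/ { bad++ }
    END { print bad + 0 }
  ' "$1"
}
```

Running count_bad_rating_lines sample_movielens_data.txt and getting 0 would mean every line matched the assumed layout. Any other count suggests inspecting the file before copying it to HDFS.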
Enabling Native Acceleration For MLlib
MLlib algorithms are compute-intensive and benefit from hardware acceleration. To enable native acceleration for MLlib, perform the following tasks.
Install Required Software
- Install the appropriate libgfortran 4.6+ package for your operating system. No compatible version is available for RHEL 5 or 6.
OS            Package Name  Package Version
RHEL 7.1      libgfortran   4.8.x
SLES 11 SP3   libgfortran3  4.7.2
Ubuntu 12.04  libgfortran3  4.6.3
Ubuntu 14.04  libgfortran3  4.8.4
Debian 7.1    libgfortran3  4.7.2
- Install the GPL Extras parcel or package.
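The 4.6+ requirement for libgfortran can be checked with a small comparison helper. This is a minimal sketch, not part of CDH: version_at_least is a hypothetical name, and it compares dot-separated fields numerically, so any non-numeric packaging suffix in a field is ignored.

```shell
# Hypothetical helper: succeed (exit 0) if version $1 is at least version $2.
# Fields are split on "." and compared as numbers from left to right;
# awk takes the leading numeric part of a field such as "4-2ubuntu1".
version_at_least() {
  awk -v have="$1" -v min="$2" 'BEGIN {
    n = split(have, h, ".")
    m = split(min, w, ".")
    len = (n > m) ? n : m
    for (i = 1; i <= len; i++) {
      a = h[i] + 0          # missing fields count as 0
      b = w[i] + 0
      if (a > b) exit 0
      if (a < b) exit 1
    }
    exit 0                  # all fields equal
  }'
}
```

One possible use, assuming an RPM-based system where the package is named libgfortran: version_at_least "$(rpm -q --queryformat '%{VERSION}' libgfortran)" 4.6 && echo "libgfortran OK". On Debian or Ubuntu, the installed version could instead be read with dpkg-query -W -f='${Version}' libgfortran3.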
Verify Native Acceleration
You can verify that native acceleration is working by examining logs after running an application. To verify native acceleration with an MLlib example application:
- Complete the steps in Running a Spark MLlib Example.
- Check the logs. If native libraries are not loaded successfully, you see the following four warnings before the final line, where the RMSE is printed:
15/07/12 12:33:01 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
15/07/12 12:33:01 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
15/07/12 12:33:01 WARN LAPACK: Failed to load implementation from: com.github.fommil.netlib.NativeSystemLAPACK
15/07/12 12:33:01 WARN LAPACK: Failed to load implementation from: com.github.fommil.netlib.NativeRefLAPACK
Test RMSE = 1.5378651281107205.
You see this on a system with no libgfortran. The same warnings appear after installing libgfortran on RHEL 6, because it installs version 4.4, not 4.6+.
After installing libgfortran 4.8 on RHEL 7, you should see something like this:
15/07/12 13:32:20 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
15/07/12 13:32:20 WARN LAPACK: Failed to load implementation from: com.github.fommil.netlib.NativeSystemLAPACK
Test RMSE = 1.5329939324808561.
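The log check described above can be scripted. The following is a minimal sketch that assumes the console output of the run (including stderr) was saved to a file. The function name native_status and its three output labels are hypothetical; they only distinguish the four-warning case, the two-warning case, and a log with no load warnings.

```shell
# Hypothetical helper: classify a saved Spark log (path in $1) by which
# netlib-java implementations reported "Failed to load implementation".
native_status() {
  if [ ! -r "$1" ]; then
    echo "cannot read log file: $1" >&2
    return 2
  fi
  prefix='Failed to load implementation from: com.github.fommil.netlib.'
  # grep -c prints the number of matching lines; "|| true" absorbs the
  # non-zero exit status grep returns when there are no matches.
  ref_failures=$(grep -cF "${prefix}NativeRef" "$1" || true)
  sys_failures=$(grep -cF "${prefix}NativeSystem" "$1" || true)
  if [ "$ref_failures" -gt 0 ]; then
    echo "not-loaded"                    # the four-warning case
  elif [ "$sys_failures" -gt 0 ]; then
    echo "loaded-with-system-warnings"   # the two-warning case
  else
    echo "no-warnings"
  fi
}
```

For example, the output of the spark-submit command could be captured by appending 2>&1 | tee movielens.log to it, after which native_status movielens.log prints one of the three labels.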