Using PySpark

Apache Spark provides APIs in non-JVM languages such as Python. Many data scientists use Python because it has a rich variety of numerical libraries with a statistical, machine-learning, or optimization focus.

Continue reading:
Running Spark Python Applications
Spark and IPython and Jupyter Notebooks
Running Spark Applications on YARN

Running Spark Python Applications