Ubuntu: How do I get pyspark on Ubuntu?


I can get Spark on it through the Software Center, but how do I get pyspark?


PySpark is the Python binding for Spark, which is itself written in Scala.

As long as you have Java 6+ and Python 2.6+, you can download pre-built binaries for Spark from the download page; they include the pyspark launcher in their bin directory. Make sure that the java and python programs are on your PATH or that the JAVA_HOME environment variable is set. Follow these steps to get started:
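
Before downloading anything, it can help to confirm that the prerequisites are visible to your shell. The snippet below is only a minimal sketch: `have_cmd` is a helper name made up for illustration, and it assumes the interpreter is installed under the name `python` (on some systems it may only be available as `python3`).

```shell
# have_cmd is a hypothetical helper: succeeds if its argument is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

for tool in java python; do
    if have_cmd "$tool"; then
        echo "$tool: $(command -v "$tool")"
    else
        echo "$tool: not found on PATH"
    fi
done

# JAVA_HOME can stand in for having java on PATH.
echo "JAVA_HOME: ${JAVA_HOME:-not set}"
```

If a tool is reported as missing, install it or fix your PATH before continuing.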

  1. Extract the archive and move the extracted directory to a working location (use sudo for the commands that write to /srv if your user cannot write there):

    tar -xzf spark-1.4.0-bin-hadoop2.6.tgz

    mv spark-1.4.0-bin-hadoop2.6 /srv/spark-1.4.0

  2. Symlink the versioned directory to a generic spark path:

    ln -s /srv/spark-1.4.0 /srv/spark

  3. Edit ~/.bash_profile (or ~/.bashrc, which Ubuntu's terminal windows read by default) in your favorite text editor to set the SPARK_HOME environment variable and add Spark to your PATH:

    export SPARK_HOME=/srv/spark

    export PATH=$SPARK_HOME/bin:$PATH
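
A side note on the symlink from step 2: because SPARK_HOME points at the generic path rather than a versioned one, a later upgrade should only require repointing the link. The following is a sketch under assumptions, run in a throwaway temporary directory instead of /srv; the version directory names are hypothetical stand-ins.

```shell
# Throwaway sandbox so nothing under /srv is touched; all paths are hypothetical.
sandbox=$(mktemp -d)
mkdir "$sandbox/spark-1.4.0" "$sandbox/spark-1.4.1"

# Initial install: the generic name points at the first version.
ln -s "$sandbox/spark-1.4.0" "$sandbox/spark"

# Upgrade: repoint the link (-f replaces it, -n keeps ln from
# following the old link and creating the new one inside its target).
ln -sfn "$sandbox/spark-1.4.1" "$sandbox/spark"

# SPARK_HOME would keep its value; only the link target changed.
current=$(readlink "$sandbox/spark")
echo "spark -> $current"

rm -rf "$sandbox"
```

With this layout, the exports in step 3 never need to change between versions.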

Reload the file you edited (for example, with source ~/.bash_profile), and you should be able to start PySpark by running the pyspark command in the terminal.
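
If the command is not found, a quick check of the environment usually narrows the problem down. This is a minimal sketch, not an official diagnostic: `spark_home_ok` is a hypothetical helper, and it assumes the launcher lives at bin/pyspark under SPARK_HOME, as in the pre-built download.

```shell
# spark_home_ok is a hypothetical helper: succeeds if DIR/bin/pyspark is executable.
spark_home_ok() {
    [ -n "$1" ] && [ -x "$1/bin/pyspark" ]
}

if spark_home_ok "${SPARK_HOME:-}"; then
    echo "SPARK_HOME looks good: $SPARK_HOME"
else
    echo "SPARK_HOME is unset or has no executable bin/pyspark"
fi

# Show which pyspark the shell would run, if any.
command -v pyspark || echo "pyspark: not found on PATH"
```

A SPARK_HOME that looks good combined with a missing pyspark usually means the PATH line was not picked up by the current shell.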

