Ubuntu: How do I get pyspark on Ubuntu?



Question:

I can get Spark on it through the Software Center, but how do I get pyspark?


Solution 1:

pyspark is the Python binding to Spark, which itself is written in Scala.

As long as you have Java 6+ and Python 2.6+, you can download pre-built Spark binaries from the download page. Make sure the java and python executables are on your PATH, or that the JAVA_HOME environment variable is set; a quick way to verify this is sketched below. Then follow these steps to get started:
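
The short script below is one way to check those prerequisites from Python before you continue. It is only an illustrative sketch; the file name check_env.py and the printed messages are made up for this answer, not part of Spark:

    # check_env.py - illustrative check for the prerequisites above
    import os
    import subprocess
    import sys

    # Spark 1.4 needs Python 2.6+ and Java 6+
    print("Python %d.%d" % sys.version_info[:2])
    print("JAVA_HOME = %s" % os.environ.get("JAVA_HOME", "(not set)"))

    try:
        # 'java -version' prints its output to stderr
        subprocess.check_call(["java", "-version"])
    except OSError:
        print("java was not found on your PATH")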

  1. Extract the tarball and move the unpacked directory to a working directory:

    tar -xzf spark-1.4.0-bin-hadoop2.6.tgz

    mv spark-1.4.0-bin-hadoop2.6 /srv/spark-1.4.0

  2. Symlink the versioned directory to a plain spark directory, so a future upgrade only needs the symlink changed:

    ln -s /srv/spark-1.4.0 /srv/spark

  3. Edit ~/.bashrc (Ubuntu sources this file for interactive shells; use ~/.bash_profile if you prefer a login-shell file) with your favorite text editor, set the SPARK_HOME environment variable, and add Spark's bin directory to your PATH:

    export SPARK_HOME=/srv/spark

    export PATH=$SPARK_HOME/bin:$PATH

Reload the file (for example, source ~/.bashrc) or open a new terminal, and you should be able to start the interactive shell by running the command pyspark.
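
Inside the pyspark shell, a SparkContext is already bound to the name sc. If you want to go one step further and run a standalone job, here is a minimal sketch, assuming the Spark 1.x API (SparkContext) and local mode; the file name smoke_test.py is just an example:

    # smoke_test.py - minimal PySpark job; run with:
    #   $SPARK_HOME/bin/spark-submit smoke_test.py
    from pyspark import SparkContext

    # "local" runs everything in this process, so no cluster is needed
    sc = SparkContext("local", "smoke-test")

    # distribute the numbers 0..99 and sum them
    total = sc.parallelize(range(100)).sum()
    print("sum of 0..99 = %d" % total)  # expect 4950

    sc.stop()

Launching it with spark-submit rather than plain python takes care of putting Spark's Python libraries on PYTHONPATH for you.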

Some references:

https://spark.apache.org/docs/0.9.0/python-programming-guide.html

https://districtdatalabs.silvrback.com/getting-started-with-spark-in-python

