I have come across many tutorials from 2019 on how to install Spark on macOS, like this one. However, after a recent change in how Java is distributed through Homebrew, their commands no longer work.

Step 1 (Optional): Install Homebrew

If you don’t have Homebrew, here’s the command:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Step 2: Install Java 8

Spark requires Java 8, and this is where I had to browse GitHub to find this alternative command:

brew install --cask homebrew/cask-versions/adoptopenjdk8
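To confirm which Java version actually ended up on your path, here is a minimal sketch in Python. The helper names `parse_java_major` and `installed_java_major` are my own and purely illustrative. It assumes `java -version` prints a line such as `openjdk version "1.8.0_292"`, which is the usual format but may differ between vendors.

```python
import re
import subprocess


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    # Java 8 and earlier report themselves as 1.x (for example 1.8.0_292).
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def installed_java_major():
    """Run `java -version` and return the major version, or None if unavailable."""
    try:
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    except FileNotFoundError:
        return None
    # `java -version` writes its report to stderr rather than stdout.
    return parse_java_major(result.stderr + result.stdout)
```

Calling `installed_java_major()` should return 8 if the install above is the Java that your shell picks up.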

Step 3: Install Scala

As you probably know, Apache Spark is written in Scala, so Scala needs to be installed as well:

brew install scala

Step 4: Install Spark

We’re almost there. Let’s now install Spark:

brew install apache-spark

Step 5: Install PySpark

You might want to write your Spark code in Python, and PySpark is what you need for that:

pip install pyspark
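If you have several Python installations (system Python, Anaconda, and so on), it is worth checking that the package landed in the interpreter you will actually use. A small sketch using only the standard library, where `module_available` is a hypothetical helper name of mine:

```python
import importlib.util
import sys


def module_available(name):
    """Return True if a top-level module can be found by this interpreter."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Shows which interpreter is running and whether it can see pyspark.
    print("Interpreter:", sys.executable)
    print("pyspark importable:", module_available("pyspark"))
```

If this prints False, pip probably installed into a different Python than the one running the script.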

Step 6: Modify your bashrc

Whether you use ~/.bashrc or ~/.zshrc, add the following lines to your profile. Adapt them to match your Python path (found with which python3), the folder in which Java has been installed, and the Spark version that Homebrew installed:

export JAVA_HOME=/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home
export JRE_HOME=/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/jre/
export SPARK_HOME=/usr/local/Cellar/apache-spark/3.0.1/libexec
export PATH=/usr/local/Cellar/apache-spark/3.0.1/bin:$PATH
export PYSPARK_PYTHON=/Users/maelfabien/opt/anaconda3/bin/python

Finally, source the profile using:

source ~/.zshrc

And you are all set!
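Since the paths in the profile depend on your machine, a quick sanity check can save some debugging later. The sketch below only verifies that the variables are set and point to something that exists. `check_spark_env` and `REQUIRED_VARS` are hypothetical names I chose for illustration.

```python
import os

# Variables exported in the profile above (JRE_HOME is left out on purpose).
REQUIRED_VARS = ("JAVA_HOME", "SPARK_HOME", "PYSPARK_PYTHON")


def check_spark_env(env=None):
    """Return {variable: problem} for every variable that is unset or points nowhere."""
    env = os.environ if env is None else env
    problems = {}
    for name in REQUIRED_VARS:
        value = env.get(name)
        if not value:
            problems[name] = "not set"
        elif not os.path.exists(value):
            problems[name] = f"path does not exist: {value}"
    return problems


if __name__ == "__main__":
    issues = check_spark_env()
    if issues:
        for name, problem in issues.items():
            print(f"{name}: {problem}")
    else:
        print("All Spark-related variables point to existing paths.")
```

Run it from a new terminal so that it sees the freshly sourced profile. An empty result means every variable resolves to an existing path, not that Spark itself works.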

Step 7: Launch a Jupyter Notebook

Now, in your Jupyter notebook, you should be able to execute the following commands:

import pyspark
from pyspark import SparkContext
sc = SparkContext()
n = sc.parallelize([4,10,9,7])
n.take(3)

You can then open the Spark UI at the following link: http://localhost:4040/.
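If the page does not load, you can check from the notebook whether anything is listening on that port. This is a rough sketch with a hypothetical helper `port_is_open`. It only tests that a TCP connection succeeds, not that the listener is really the Spark UI.

```python
import socket


def port_is_open(host="localhost", port=4040, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolved hosts.
        return False


if __name__ == "__main__":
    print("Something is listening on port 4040:", port_is_open())
```

The UI is only served while a SparkContext is alive, so a False result after the context has stopped is expected.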

