I have come across plenty of tutorials from 2019 on how to install Spark on macOS. However, due to a recent change in the way Java is distributed through Homebrew, those commands no longer work.
Step 1 (Optional): Install Homebrew
If you don’t have Homebrew, here’s the command:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
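If you are not sure whether Homebrew is already there, a quick check is:
brew --version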
Step 2: Install Java 8
Spark requires Java 8, and this is where I had to browse GitHub to find this alternative command:
brew cask install homebrew/cask-versions/adoptopenjdk8
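To double-check that the JDK ended up where expected, the following should report a 1.8.x version and print the install path (the exact folder can vary slightly depending on the AdoptOpenJDK build):
java -version
/usr/libexec/java_home -v 1.8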
Step 3: Install Scala
As you probably know, Apache Spark is written in Scala, so Scala is required to run it.
brew install scala
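As a quick sanity check, in case several Scala versions are lying around:
scala -version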
Step 4: Install Spark
We’re almost there. Let’s now install Spark:
brew install apache-spark
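You can confirm the installation and see exactly which Spark version Homebrew gave you (it may differ from the 3.0.1 used in Step 6) with:
brew info apache-spark
spark-shell --version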
Step 5: Install PySpark
If you want to write your Spark code in Python, PySpark is what you need:
pip install pyspark
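To make sure the package is importable from the same Python 3 you will later use in Jupyter (adapt python3 to your own interpreter if needed):
python3 -c "import pyspark; print(pyspark.__version__)"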
Step 6: Modify your bashrc
Whether you have a bashrc or a zshrc, modify your profile with the following lines. Adapt them to match your Python path (found with which python3) and the folder in which Java has been installed:
export JAVA_HOME=/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home
export JRE_HOME=/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/jre/
export SPARK_HOME=/usr/local/Cellar/apache-spark/3.0.1/libexec
export PATH=/usr/local/Cellar/apache-spark/3.0.1/bin:$PATH
export PYSPARK_PYTHON=/Users/maelfabien/opt/anaconda3/bin/python
Finally, source the profile (use ~/.bashrc instead if that is your shell) with:
source ~/.zshrc
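If you want to verify that the new variables are picked up, the echoed paths should match what you exported above:
echo $JAVA_HOME
echo $SPARK_HOME
spark-submit --version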
And you are all set!
Step 7: Launch a Jupyter Notebook
Now, in your Jupyter notebook, you should be able to execute the following commands:
import pyspark
from pyspark import SparkContext

# Start a local Spark context
sc = SparkContext()

# Distribute a small list and retrieve the first three elements
n = sc.parallelize([4, 10, 9, 7])
n.take(3)
And observe the Spark UI at the following link: http://localhost:4040/.
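If you prefer the DataFrame API, here is a minimal sketch (it assumes the notebook above is still running, and the appName is arbitrary; only one context can be active at a time, hence the sc.stop()):
from pyspark.sql import SparkSession

# Stop the plain SparkContext started above, since only one can be active
sc.stop()

# SparkSession is the entry point for the DataFrame API
spark = SparkSession.builder.appName("demo").getOrCreate()

# Build a small DataFrame and display it
df = spark.createDataFrame([(4,), (10,), (9,), (7,)], ["value"])
df.show()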