Install and set up Apache Spark locally on Windows

Step 1: Install Java

Java is an essential prerequisite for installing and running Apache Spark, as Spark runs on the Java Virtual Machine (JVM).

Apache Spark runs on Java 8/11/17. You can download version 8 from https://www.java.com/en/download/manual.jsp. Alternatively, for OpenJDK, visit the [OpenJDK download page](https://openjdk.java.net/install/index.html) to find the appropriate distribution.

Note: make sure to install Java in a path whose folder names contain no spaces; otherwise you will get errors such as "The system cannot find the path specified" or "\Common was unexpected at this time", or other strange failures.

For example, you can change the default installation destination as in the snapshot below.


Set up the JAVA_HOME environment variable

Search for 'Environment Variables' in Windows search and the window below will appear.

Click on Environment Variables -> User variables -> New -> create variable name JAVA_HOME with variable value <full path of your Java installation>

Add JAVA_HOME to the Path variable

Go to the Path variable under User variables -> New -> add %JAVA_HOME%\bin
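If you prefer the command line, here is a minimal sketch using the built-in setx command; the installation path below is an assumption, so substitute your actual Java folder:

```
REM Persist JAVA_HOME for future CMD sessions (example install path assumed).
setx JAVA_HOME "C:\Java\jdk1.8.0_441"
REM Add %JAVA_HOME%\bin to Path via the GUI as described above; setx rewrites
REM the entire user Path and truncates it at 1024 characters, so the GUI is safer.
```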

At this point you have downloaded and set up Java.

You can verify that Java is installed and the environment variables are properly set up by opening CMD and running the java -version command. The output should look like the one below:

java version "1.8.0_441"
Java(TM) SE Runtime Environment (build 1.8.0_441-b07)
Java HotSpot(TM) 64-Bit Server VM (build 25.441-b07, mixed mode)
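You can also confirm that the JAVA_HOME variable itself is set, and see which java.exe the Path resolves to:

```
REM Print the JAVA_HOME value and locate java.exe on the Path.
echo %JAVA_HOME%
where java
```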

Step 2: Download Apache Spark

Visit the official Apache Spark website at spark.apache.org/downloads.html.

Download the latest version of Apache Spark. I am downloading version 3.5.5 by clicking the link in step 3 on that page.

Note the Apache Hadoop major version number shown in step 2 on the page, because we need it later. For me it is Hadoop 3.

Step 3: Extract the .tgz file

You can extract the .tgz file anywhere, but again make sure there are no spaces in your folder names. I am extracting it to C:\spark.
Below is the snapshot after extracting.

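If you prefer the command line, recent Windows versions ship a bsdtar-based tar.exe that can handle .tgz archives. A minimal sketch, assuming the archive is still in your Downloads folder under its default name:

```
REM Create the target folder and extract the Spark archive into it.
REM Archive name and location assumed from the 3.5.5 / Hadoop 3 download above.
mkdir C:\spark
tar -xvzf %USERPROFILE%\Downloads\spark-3.5.5-bin-hadoop3.tgz -C C:\spark
```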

Step 4: Download winutils.exe

Next, you need to download the Windows binaries for Hadoop. I am downloading them from https://github.com/steveloughran/winutils.
Make sure to match the Hadoop major version again; for me that is 3, so I will download the winutils.exe file from the hadoop-3.0.0/bin folder. You only need the winutils.exe file for basic use; ignore the other files.
I am copying it into a C:\spark\hadoop\bin folder that I have created, so the full path of the file is C:\spark\hadoop\bin\winutils.exe.
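From CMD, the folder setup and a quick sanity check might look like the sketch below; the Downloads location is an assumption, and running winutils.exe with no arguments should simply print its usage text:

```
REM Create the Hadoop bin folder and copy the downloaded binary into it.
mkdir C:\spark\hadoop\bin
copy %USERPROFILE%\Downloads\winutils.exe C:\spark\hadoop\bin\
REM Called with no arguments, winutils.exe should print its usage text.
C:\spark\hadoop\bin\winutils.exe
```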

Step 5: Set up Spark and Hadoop environment variables

Similar to the JAVA_HOME environment variable, we now need to set up the SPARK_HOME and HADOOP_HOME variables under User variables.
Search for 'Environment Variables' in Windows search.

Click on Environment Variables -> User variables -> New -> create variable name HADOOP_HOME with variable value <full path of your Hadoop folder, the one containing bin; for me, C:\spark\hadoop>

User variables -> New -> create variable name SPARK_HOME with variable value <full path of your Spark folder, the one containing bin; for example, C:\spark\spark-3.5.5-bin-hadoop3>

Finally, set up the Path variable as below, adding %SPARK_HOME%\bin and %HADOOP_HOME%\bin entries under User variables -> Path -> New.
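As with JAVA_HOME, you can sketch this from CMD with setx; the SPARK_HOME folder name below is an assumption based on the spark-3.5.5-bin-hadoop3 archive extracted in step 3:

```
REM Persist both variables for future CMD sessions.
setx HADOOP_HOME "C:\spark\hadoop"
setx SPARK_HOME "C:\spark\spark-3.5.5-bin-hadoop3"
REM Add %SPARK_HOME%\bin and %HADOOP_HOME%\bin to Path via the GUI as above;
REM setx rewrites the whole user Path and truncates it at 1024 characters.
REM Open a new CMD window afterwards so the new values take effect.
```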

Step 6: Test the Apache Spark setup

Open CMD and type pyspark or spark-shell; for PySpark you should see results like the following.


C:\Users\MaxImtiaz>PySpark
Python 3.13.2 (tags/v3.13.2:4f8bb39, Feb  4 2025, 15:23:48) [MSC v.1942 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Unable to get Charset 'cp65001' for property 'sun.stderr.encoding', using default windows-1252 and continuing.
Unable to get Charset 'cp65001' for property 'sun.stderr.encoding', using default windows-1252 and continuing.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
25/03/18 20:52:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.5.5
      /_/

Using Python version 3.13.2 (tags/v3.13.2:4f8bb39, Feb  4 2025 15:23:48)
Spark context Web UI available at http://host.docker.internal:4040
Spark context available as 'sc' (master = local[*], app id = local-1742327548139).
SparkSession available as 'spark'.
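As a further smoke test, you can submit the bundled SparkPi example from CMD. The examples jar name below is an assumption based on the standard spark-3.5.5-bin-hadoop3 layout (Scala 2.12 build); check your examples\jars folder for the exact file name:

```
REM Run the SparkPi example shipped with Spark; near the end of the output
REM it should print a line like "Pi is roughly 3.14...".
spark-submit --class org.apache.spark.examples.SparkPi ^
  %SPARK_HOME%\examples\jars\spark-examples_2.12-3.5.5.jar 10
```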

In this blog, we installed and set up Apache Spark locally on Windows, from Java and winutils.exe through to a working PySpark shell.