How to Install Hadoop on macOS M1 (Apple Silicon) Using Homebrew

In this article, we will walk you through the step-by-step process of installing Hadoop on a macOS M1 (Apple Silicon) machine using Homebrew. We’ll also cover configuration, common troubleshooting tips, and how to verify your Hadoop installation.
What is Hadoop?
Hadoop is an open-source framework used for distributed storage and processing of large datasets across many computers. It is widely used in big data analytics and data engineering. Hadoop’s ability to process vast amounts of data in parallel makes it a powerful tool for businesses and organizations.
For this tutorial, we will set up Hadoop on your local Mac, which allows you to experiment and learn without the need for a distributed setup.
Prerequisites
Before you start, make sure you have the following:
- macOS M1 (Apple Silicon) machine.
- Homebrew installed on your Mac (Homebrew is a package manager).
- Java 8 or Java 11 installed on your machine (Hadoop has compatibility issues with Java versions above 11).
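The prerequisites above can be checked from Terminal before you begin. This is a small sketch; it only reports what is installed and changes nothing:

```shell
# Pre-flight check: report whether Homebrew and Java are already installed.
if command -v brew >/dev/null 2>&1; then
  echo "Homebrew: $(brew --version | head -n 1)"
else
  echo "Homebrew: not installed"
fi
if command -v java >/dev/null 2>&1; then
  # java -version writes to stderr, so redirect it to capture the line
  java -version 2>&1 | head -n 1
else
  echo "Java: not installed"
fi
```

If either line reports "not installed", the corresponding step below covers it.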
Step-by-Step Installation of Hadoop on macOS M1
1. Install Homebrew (if not already installed)
Homebrew is a package manager for macOS that simplifies the installation process for many software packages, including Hadoop.
- Open Terminal on your macOS.
Paste the following command into Terminal to install Homebrew:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
After installation, verify Homebrew is installed by typing:
brew --version
2. Install Java (OpenJDK)
Hadoop requires Java to run. Since Java 8 and Java 11 are the most compatible with Hadoop, you can install OpenJDK 11.
Install OpenJDK 11 using Homebrew:
brew install openjdk@11
After installation, link Java so the system can find it:
sudo ln -sfn /opt/homebrew/opt/openjdk@11/libexec/openjdk.jdk /Library/Java/JavaVirtualMachines/openjdk-11.jdk
Verify the Java installation:
java -version
3. Install Hadoop Using Homebrew
Homebrew simplifies installing Hadoop by handling the necessary dependencies for you.
Install Hadoop via Homebrew:
brew install hadoop
After installation, verify Hadoop is installed by typing:
hadoop version
4. Configure Hadoop
Once Hadoop is installed, the next step is configuring it to run on your machine. The configuration files are located in the etc/hadoop directory of your Hadoop installation. First, navigate to the Hadoop configuration directory (the version number in the path may differ on your system):
cd /opt/homebrew/Cellar/hadoop/3.3.6/libexec/etc/hadoop
Then edit the files below.
Edit hadoop-env.sh
Open the hadoop-env.sh file:
nano hadoop-env.sh
Set the JAVA_HOME environment variable to the path where Java is installed:
export JAVA_HOME=/opt/homebrew/opt/openjdk@11
Edit core-site.xml
Open core-site.xml with your preferred text editor:
nano core-site.xml
Add the following configuration:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
Edit hdfs-site.xml
Open hdfs-site.xml:
nano hdfs-site.xml
Add the following configuration:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
Edit mapred-site.xml
Open mapred-site.xml:
nano mapred-site.xml
Add the following configuration:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
Edit yarn-site.xml
Open yarn-site.xml:
nano yarn-site.xml
Add the following configuration:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
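Before moving on, it is worth confirming that the JAVA_HOME path you just set actually contains a Java binary. A minimal sketch, assuming the Homebrew OpenJDK 11 path used above:

```shell
# Sanity-check the JAVA_HOME path set in hadoop-env.sh.
JAVA_HOME_CANDIDATE=/opt/homebrew/opt/openjdk@11
if [ -x "$JAVA_HOME_CANDIDATE/bin/java" ]; then
  echo "JAVA_HOME looks valid: $JAVA_HOME_CANDIDATE"
else
  echo "No java binary under $JAVA_HOME_CANDIDATE"
  echo "Try: /usr/libexec/java_home -v 11"
fi
```

If the check fails, `/usr/libexec/java_home -v 11` prints the actual JDK 11 location on your Mac, which you can use instead.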
5. Format the Hadoop Namenode
Before starting Hadoop, you need to format the NameNode (this is a one-time operation). Run the following command:
hdfs namenode -format
(The older hadoop namenode -format form still works, but it is deprecated in favor of the hdfs command.)
6. Start Hadoop
Once the configuration is complete, you can start Hadoop using the following command:
cd /opt/homebrew/Cellar/hadoop/3.3.6/libexec/sbin
./start-all.sh
Note: start-all.sh is deprecated in recent Hadoop releases; running ./start-dfs.sh followed by ./start-yarn.sh is the recommended equivalent.
This will start all the essential Hadoop services:
- NameNode
- DataNode
- ResourceManager
- NodeManager
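Once the services are up, a quick smoke test confirms HDFS accepts commands. This is a sketch, assuming Hadoop's bin directory is on your PATH; the file and directory names are arbitrary examples:

```shell
# Write a local file, copy it into HDFS, and read it back.
echo "hello hadoop" > /tmp/hadoop-smoke.txt
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -mkdir -p /tmp/smoke
  hdfs dfs -put -f /tmp/hadoop-smoke.txt /tmp/smoke/
  hdfs dfs -cat /tmp/smoke/hadoop-smoke.txt
else
  echo "hdfs command not found; check that Hadoop's bin directory is on PATH"
fi
```

Seeing the file's contents echoed back means the NameNode and DataNode are both doing their jobs.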
Verifying the Installation
- Access the Hadoop Web UIs: You can check the web interfaces to ensure that everything is working correctly:
- NameNode UI: http://localhost:9870
- ResourceManager UI: http://localhost:8088
- Check the running Hadoop processes: Run the jps command to verify the Hadoop daemons are running:
jps
You should see processes like:
- NameNode
- DataNode
- ResourceManager
- NodeManager
- SecondaryNameNode
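You can also probe the web UIs from the command line. A sketch, assuming the default ports from the configuration above:

```shell
# Probe the NameNode and ResourceManager web UIs and record the results.
: > /tmp/hadoop-ui-status.txt
for url in http://localhost:9870 http://localhost:8088; do
  if curl -fsS --max-time 5 "$url" >/dev/null 2>&1; then
    echo "$url: up" >> /tmp/hadoop-ui-status.txt
  else
    echo "$url: not responding" >> /tmp/hadoop-ui-status.txt
  fi
done
cat /tmp/hadoop-ui-status.txt
```

If either URL reports "not responding", check the daemon list from jps before digging into logs.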
Troubleshooting Common Issues
1. SSH Errors (Connection Refused)
If you encounter errors such as:
ssh: connect to host localhost port 22: Connection refused
Follow these steps:
- Enable Remote Login (SSH): Open System Settings (or System Preferences on older macOS) > Sharing and turn on Remote Login. You can also enable it from the command line:
sudo systemsetup -setremotelogin on
- Generate SSH keys: If you don’t have passwordless SSH keys for localhost, generate and authorize them:
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
- Restart Hadoop: After enabling SSH, run the start-all.sh script again:
./start-all.sh
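After generating keys, you can verify the passwordless setup without restarting Hadoop. A small sketch that only inspects the key files:

```shell
# Check that the public key has been added to authorized_keys.
if [ -f "$HOME/.ssh/id_rsa.pub" ] && \
   grep -qxFf "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh/authorized_keys" 2>/dev/null; then
  SSH_KEY_STATUS="authorized"
else
  SSH_KEY_STATUS="missing or not authorized"
fi
echo "SSH key status: $SSH_KEY_STATUS"
```

A stricter end-to-end check is ssh -o BatchMode=yes localhost exit, which should return without prompting for a password.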
2. Multiple Root Elements in XML Files
If you encounter an error like:
Illegal to have multiple roots (start tag in epilog?)
This means one of your XML files contains more than one <configuration> root element. Here’s how to fix it:
- Fix the XML files: Ensure each configuration file (e.g., core-site.xml, hdfs-site.xml) has exactly one <configuration> root element.
- Remove extra tags: Delete any redundant <configuration></configuration> pairs from the XML files.
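To catch this kind of mistake before restarting Hadoop, you can parse each config file up front. A sketch that leans on python3's XML parser, which is available once the Xcode Command Line Tools are installed (Homebrew already requires them); adjust CONF_DIR to match your Hadoop version:

```shell
# Report whether each Hadoop config file is well-formed XML (a single root element).
CONF_DIR=/opt/homebrew/Cellar/hadoop/3.3.6/libexec/etc/hadoop
for f in core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml; do
  if python3 -c "import sys, xml.etree.ElementTree as ET; ET.parse(sys.argv[1])" "$CONF_DIR/$f" 2>/dev/null; then
    echo "$f: well-formed"
  else
    echo "$f: PARSE ERROR (or file not found)"
  fi
done
```

A file with two <configuration> roots fails this parse, so any "PARSE ERROR" line points you straight at the file to fix.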
Conclusion
By following this detailed guide, you should now have Hadoop installed and configured on your macOS M1 machine. You can use it to experiment with big data processing and learn how Hadoop works on a local machine.
With Hadoop running on your local machine, you’re all set to start experimenting with distributed processing, working with HDFS (Hadoop Distributed File System), and exploring MapReduce or YARN frameworks.