How to Install Hadoop on macOS M1 (Apple Silicon) Using Homebrew

In this article, we will walk you through the step-by-step process of installing Hadoop on a macOS M1 (Apple Silicon) machine using Homebrew. We’ll also cover configuration, common troubleshooting tips, and how to verify your Hadoop installation.


What is Hadoop?

Hadoop is an open-source framework used for distributed storage and processing of large datasets across many computers. It is widely used in big data analytics and data engineering. Hadoop’s ability to process vast amounts of data in parallel makes it a powerful tool for businesses and organizations.

For this tutorial, we will set up Hadoop on your local Mac, which allows you to experiment and learn without the need for a distributed setup.


Prerequisites

Before you start, make sure you have the following:

  • macOS M1 (Apple Silicon) machine.
  • Homebrew installed on your Mac (Homebrew is a package manager).
  • Java 8 or Java 11 installed on your machine (Hadoop has compatibility issues with Java versions above 11).

Step-by-Step Installation of Hadoop on macOS M1

1. Install Homebrew (if not already installed)

Homebrew is a package manager for macOS that simplifies the installation process for many software packages, including Hadoop.

  1. Open Terminal on your macOS.
  2. Paste the following command to install Homebrew:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

After installation, verify Homebrew is installed by typing:

brew --version
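If the brew command is not found afterwards, the installer's "Next steps" output asks you to add Homebrew to your PATH; on Apple Silicon that typically looks like this (assuming the default zsh shell):

echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zprofile
eval "$(/opt/homebrew/bin/brew shellenv)"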

2. Install Java (OpenJDK)

Hadoop requires Java to run. Since Java 8 and Java 11 are the most compatible with Hadoop, we will install OpenJDK 11.

Install OpenJDK 11 using Homebrew:

brew install openjdk@11

After installation, link Java so the system can find it:

sudo ln -sfn /opt/homebrew/opt/openjdk@11/libexec/openjdk.jdk /Library/Java/JavaVirtualMachines/openjdk-11.jdk

Verify the Java installation:

java -version
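Optionally, if you also want this JDK first on your PATH for other tools, you can add it to your shell profile (this assumes the default zsh shell and the standard Apple Silicon Homebrew prefix):

echo 'export PATH="/opt/homebrew/opt/openjdk@11/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc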

3. Install Hadoop Using Homebrew

Homebrew simplifies installing Hadoop by handling the necessary dependencies for you.

Install Hadoop via Homebrew:

brew install hadoop

After installation, verify Hadoop is installed by typing:

hadoop version
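It is also worth noting where Homebrew placed Hadoop, since the configuration step below refers to that location; brew --prefix prints the install prefix:

brew --prefix hadoop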

4. Configure Hadoop

Once Hadoop is installed, the next step is configuring it to run on your machine. The configuration files live in the libexec/etc/hadoop directory of the Homebrew installation. First, navigate to that directory (adjust the version number in the path if your installed release differs):

cd /opt/homebrew/Cellar/hadoop/3.3.6/libexec/etc/hadoop

Then edit the following files:

Edit hadoop-env.sh: Open the hadoop-env.sh file:

nano hadoop-env.sh

Set the JAVA_HOME environment variable to the path where Java is installed:

export JAVA_HOME=/opt/homebrew/opt/openjdk@11
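If you are unsure where Homebrew placed the JDK, macOS's built-in java_home helper can print the path (assuming OpenJDK 11 from step 2 is the only Java 11 installed):

/usr/libexec/java_home -v 11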

Edit yarn-site.xml: Open yarn-site.xml:

nano yarn-site.xml

Add the following configuration:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

Edit mapred-site.xml: Open mapred-site.xml:

nano mapred-site.xml

Add the following configuration:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

Edit hdfs-site.xml: Open hdfs-site.xml:

nano hdfs-site.xml

Add the following configuration:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

Edit core-site.xml: Open core-site.xml with your preferred text editor:

nano core-site.xml

Add the following configuration:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

5. Format the Hadoop NameNode

Before starting Hadoop, you need to format the NameNode (this is a one-time operation). On Hadoop 3.x the hdfs form of the command is preferred (hadoop namenode -format still works but prints a deprecation warning):

hdfs namenode -format

6. Start Hadoop

Once the configuration is complete, you can start Hadoop from the sbin directory of your installation (again, adjust the version number in the path if yours differs):

cd /opt/homebrew/Cellar/hadoop/3.3.6/libexec/sbin
./start-all.sh

This will start all the essential Hadoop services:

  • NameNode
  • DataNode
  • ResourceManager
  • NodeManager
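
Note that start-all.sh is deprecated in recent Hadoop releases; if you prefer, you can start the HDFS and YARN daemons separately from the same sbin directory:

./start-dfs.sh
./start-yarn.sh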

Verifying the Installation

  1. Check the running Hadoop processes: Run jps to verify the Hadoop daemons are running:

jps

You should see processes like:

  • NameNode
  • DataNode
  • ResourceManager
  • NodeManager
  • SecondaryNameNode
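
  2. Access the Hadoop Web UIs: You can also check the web interfaces to confirm everything is working. With Hadoop 3.x, the default addresses are:

  • NameNode UI: http://localhost:9870
  • ResourceManager UI: http://localhost:8088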

Troubleshooting Common Issues

1. SSH Errors (Connection Refused)

If you encounter errors such as:

ssh: connect to host localhost port 22: Connection refused

Follow these steps:

  1. Enable Remote Login (SSH): Open System Preferences > Sharing and turn on Remote Login, or enable it from the command line:

sudo systemsetup -setremotelogin on

  2. Generate SSH keys: If you don’t have SSH keys, generate them and authorize passwordless login to localhost:

ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
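
Before restarting Hadoop, you can confirm that passwordless SSH to localhost now works (it should log you in without prompting for a password):

ssh localhost
exit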

  3. Restart Hadoop: After enabling SSH, run the start-all.sh script again:

./start-all.sh

If Terminal still reports permission-related errors, you may also need to grant Terminal Full Disk Access under System Preferences > Security & Privacy > Privacy and restart Terminal.

2. Multiple Root Elements in XML Files

If you encounter an error like:

Illegal to have multiple roots (start tag in epilog?)

This means you have multiple <configuration> tags in one of your XML files. Here’s how to fix it:

  1. Fix XML Files: Ensure each configuration file (e.g., core-site.xml, hdfs-site.xml) has only one <configuration> root element.
  2. Remove Extra Tags: Remove any redundant <configuration></configuration> tags from the XML files.
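
You can also quickly check that each file parses as well-formed XML with xmllint, which ships with macOS (run these from the Hadoop configuration directory):

xmllint --noout core-site.xml
xmllint --noout hdfs-site.xml
xmllint --noout mapred-site.xml
xmllint --noout yarn-site.xml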

Conclusion

By following this detailed guide, you should now have Hadoop installed and configured on your macOS M1 machine. You can use it to experiment with big data processing and learn how Hadoop works on a local machine.

With Hadoop running on your local machine, you’re all set to start experimenting with distributed processing, working with HDFS (Hadoop Distributed File System), and exploring MapReduce or YARN frameworks.