How to Install Hadoop on Windows and Linux

Practical - 2
To Install Hadoop on Windows & Linux.






Fig.  1 – Extracted Hadoop 2.6.0 folder

Configuration:
Step –1 Windows Path Configuration
Set the HADOOP_HOME path in the Windows environment variables.

Right-click on My Computer > Properties > Advanced system settings > Advanced tab > Environment Variables > click New.


Fig.  2 – Creating New User Variable
Set the hadoop bin directory path:

Find the Path variable under System variables > click Edit > at the end insert a ‘;’ (semicolon) and paste the path up to the hadoop bin directory.
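For example, if Hadoop was extracted to C:\hadoop-2.6.0 (this folder is an assumption; use your actual extraction path), the entry appended to the Path variable would be:

;C:\hadoop-2.6.0\bin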


Step –2 Hadoop Configuration
Edit hadoop-2.6.0/etc/hadoop/core-site.xml, paste the following lines and save it.
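The exact contents are shown in Fig. 3; as a reference sketch, a minimal single-node core-site.xml for Hadoop 2.x sets only the default filesystem (port 9000 is the conventional single-node choice and an assumption here, not a requirement):

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>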

Fig.  3 – Edited core-site.xml

Edit hadoop-2.6.0/etc/hadoop/mapred-site.xml, paste the following lines and save it.
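As a reference rather than a copy of the figure, a typical single-node mapred-site.xml for Hadoop 2.x tells MapReduce to run on YARN:

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>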

Fig.  4 – Edited mapred-site.xml

Edit hadoop-2.6.0/etc/hadoop/hdfs-site.xml, paste the following lines and save it.
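A typical single-node hdfs-site.xml for Hadoop 2.x sets the replication factor to 1 and points the NameNode and DataNode at local storage directories (the two data paths below are assumptions; pick any writable folders):

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/hadoop-2.6.0/data/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/hadoop-2.6.0/data/datanode</value>
</property>
</configuration>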
Fig.  5 – Edited hdfs-site.xml

Edit hadoop-2.6.0/etc/hadoop/yarn-site.xml, paste the following lines and save it.
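For reference, a minimal yarn-site.xml for a Hadoop 2.x single-node setup enables the MapReduce shuffle service (these are the standard values, assumed to match the figure):

<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>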

Fig.  6 – Edited yarn-site.xml

Edit hadoop-2.6.0/etc/hadoop/hadoop-env.cmd.
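The edit sets JAVA_HOME to the installed JDK, for example (the JDK path below is an assumption; use your own, and note that the path must not contain spaces, so avoid C:\Program Files):

set JAVA_HOME=C:\java\jdk1.8.0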

Fig.  7 – Edited hadoop-env.cmd

Step –3 Start Everything
Open cmd and type ‘hdfs namenode -format’ – after execution you will see the following logs.


Open cmd, point to the sbin directory, and type ‘start-all.cmd’; a combined command sketch follows the list below.
It will start the following processes:
·         Namenode
·         Datanode
·         Yarn resourcemanager
·         Yarn nodemanager
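A minimal sketch of the whole start-up sequence, assuming Hadoop was extracted to C:\hadoop-2.6.0:

cd C:\hadoop-2.6.0\bin
hdfs namenode -format
cd C:\hadoop-2.6.0\sbin
start-all.cmd

Each daemon typically opens in its own command window, matching the four figures below.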


Fig.  8 – Namenode

Fig.  9 – Datanode


Fig.  10 – Yarn resourcemanager


Fig.  11 – Yarn nodemanager


Step –4 Namenode GUI, Resourcemanager GUI
Resourcemanager GUI address:  http://localhost:8088

Fig.  12 – Resourcemanager GUI

Namenode GUI address:  http://localhost:50070

Fig.  13 - Namenode GUI



Installation on Linux:


1.      Installing Java 6 JDK

Update the package list first:
user@ubuntu:~$ sudo apt-get update

Install the Sun Java 6 JDK and verify the installation:
user@ubuntu:~$ sudo apt-get install sun-java6-jdk
user@ubuntu:~$ java -version


2.      Adding a dedicated Hadoop system user

We will use a dedicated Hadoop user account for running Hadoop.
user@ubuntu:~$ sudo addgroup hadoop_group
user@ubuntu:~$ sudo adduser --ingroup hadoop_group hduser1



This will add the user hduser1 and the group hadoop_group to the local machine. Add hduser1 to the sudo group:
user@ubuntu:~$ sudo adduser hduser1 sudo


3.      Configuring SSH

We have to generate an SSH key for the hduser1 user.
user@ubuntu:~$ su - hduser1
hduser1@ubuntu:~$ ssh-keygen -t rsa -P ""



The second command creates an RSA key pair with an empty password.
Next, enable SSH access to your local machine with the newly created key, which is done by the following command:
hduser1@ubuntu:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys


The final step is to test the SSH setup by connecting to the local machine as the hduser1 user. This step is also needed to save your local machine’s host key fingerprint to the hduser1 user’s known_hosts file.



hduser1@ubuntu:~$ ssh localhost


4.      Main Installation

user@ubuntu:~$ su - hduser1



• Now, download and extract Hadoop 1.2.0 to /usr/local/hadoop (note that this Linux walkthrough uses the Hadoop 1.x line, so its configuration files and daemons differ from the 2.6.0 Windows section above)
• Set up environment variables for Hadoop



Add the following entries to the .bashrc file:
# Set Hadoop-related environment variables
export HADOOP_HOME=/usr/local/hadoop
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
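Reload the shell configuration so the new variables take effect in the current session:

hduser1@ubuntu:~$ source ~/.bashrc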


5.      Configuration


hadoop-env.sh
In the file conf/hadoop-env.sh, change the line
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
to the uncommented path of the installed JDK:
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64 (for 64-bit)
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-i386 (for 32-bit)


conf/*-site.xml

Now we create the directory and set the required ownerships and permissions:
hduser1@ubuntu:~$ sudo mkdir -p /app/hadoop/tmp
hduser1@ubuntu:~$ sudo chown hduser1:hadoop_group /app/hadoop/tmp
hduser1@ubuntu:~$ sudo chmod 750 /app/hadoop/tmp


The last line (chmod 750) restricts access to the /app/hadoop/tmp directory: the owner gets full permissions and the group gets read and execute permissions.

• Error: If you forget to set the required ownerships and permissions, you will see a java.io.IOException when you try to format the name node.
Paste the following snippets between the <configuration> ... </configuration> tags in each file.


• In file conf/core-site.xml

<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>



• In file conf/mapred-site.xml
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>



• In file conf/hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>


6.      Formatting the HDFS filesystem via the NameNode

To format the filesystem (which simply initializes the directory specified by the dfs.name.dir variable), run the command:
hduser1@ubuntu:~$ /usr/local/hadoop/bin/hadoop namenode -format
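On success the output ends with a message that the storage directory has been formatted; with the configuration above it looks something like this (the exact path depends on hadoop.tmp.dir):

... INFO common.Storage: Storage directory /app/hadoop/tmp/dfs/name has been successfully formatted.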


7.      Starting your single-node cluster

Before starting the cluster, we need to give the required permissions to the directory with the following command:
hduser1@ubuntu:~$ sudo chmod -R 777 /usr/local/hadoop
Then run:
hduser1@ubuntu:~$ /usr/local/hadoop/bin/start-all.sh


This will start up a NameNode, a DataNode, a SecondaryNameNode, a JobTracker and a TaskTracker on the machine.

Verify the running processes with jps:
hduser1@ubuntu:/usr/local/hadoop$ jps
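The jps output should list the Hadoop 1.x daemons started above, something like the following (the process IDs here are illustrative and will differ on your machine):

1788 NameNode
1938 DataNode
2085 SecondaryNameNode
2149 JobTracker
2287 TaskTracker
2349 Jps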
