How to Install Hadoop on Windows and Linux

Practical - 2
To Install Hadoop on Windows & Linux.






Fig.  1 – Extracted Hadoop 2.6.0 folder

Configuration:
Step –1 Windows Path Configuration
Set the HADOOP_HOME path in the Windows environment variables.

Right-click on My Computer > Properties > Advanced system settings > Advanced tab > Environment Variables > click New.


Fig.  2 – Creating New User Variable
Set the hadoop bin directory path:

Find the Path variable under System variables > click Edit > at the end insert a ‘;’ (semicolon) and paste the path up to the hadoop bin directory.
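For example, if Hadoop was extracted to C:\hadoop-2.6.0 (this folder is an assumption; use your actual extraction path), the entry appended to the Path variable would be:

;C:\hadoop-2.6.0\bin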


Step –2 Hadoop Configuration
Edit hadoop-2.6.0/etc/hadoop/core-site.xml, paste the following lines and save it.
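The exact contents are shown in Fig. 3; as a reference sketch, a minimal single-node core-site.xml for Hadoop 2.x sets only the default filesystem (port 9000 is the conventional single-node choice and an assumption here, not a requirement):

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>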

Fig.  3 – Edited core-site.xml

Edit hadoop-2.6.0/etc/hadoop/mapred-site.xml, paste the following lines and save it.
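As a reference rather than a copy of the figure, a typical single-node mapred-site.xml for Hadoop 2.x tells MapReduce to run on YARN:

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>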

Fig.  4 – Edited mapred-site.xml

Edit hadoop-2.6.0/etc/hadoop/hdfs-site.xml, paste the following lines and save it.
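A typical single-node hdfs-site.xml for Hadoop 2.x sets the replication factor to 1 and points the NameNode and DataNode at local storage directories (the two data paths below are assumptions; pick any writable folders):

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/hadoop-2.6.0/data/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/hadoop-2.6.0/data/datanode</value>
</property>
</configuration>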
Fig.  5 – Edited hdfs-site.xml

Edit hadoop-2.6.0/etc/hadoop/yarn-site.xml, paste the following lines and save it.
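For reference, a minimal yarn-site.xml for a Hadoop 2.x single-node setup enables the MapReduce shuffle service (these are the standard values, assumed to match the figure):

<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>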

Fig.  6 – Edited yarn-site.xml

Edit hadoop-2.6.0/etc/hadoop/hadoop-env.cmd.
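The edit sets JAVA_HOME to the installed JDK, for example (the JDK path below is an assumption; use your own, and note that the path must not contain spaces, so avoid C:\Program Files):

set JAVA_HOME=C:\java\jdk1.8.0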

Fig.  7 – Edited hadoop-env.cmd

Step –3 Start Everything
Open cmd and type ‘hdfs namenode -format’ – after execution you will see the following logs.


Open cmd, point to the sbin directory, and type ‘start-all.cmd’; a combined command sketch follows the list below.
It will start the following processes:
·         Namenode
·         Datanode
·         Yarn resourcemanager
·         Yarn nodemanager
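A minimal sketch of the whole start-up sequence, assuming Hadoop was extracted to C:\hadoop-2.6.0:

cd C:\hadoop-2.6.0\bin
hdfs namenode -format
cd C:\hadoop-2.6.0\sbin
start-all.cmd

Each daemon typically opens in its own command window, matching the four figures below.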


Fig.  8 – Namenode

Fig.  9 – Datanode


Fig.  10 – Yarn resourcemanager


Fig.  11 – Yarn nodemanager


Step –4 Namenode GUI, Resourcemanager GUI
Resourcemanager GUI address:  http://localhost:8088

Fig.  12 – Resourcemanager GUI

Namenode GUI address:  http://localhost:50070

Fig.  13 - Namenode GUI



Installation on Linux:


1.      Installing Java 6 JDK

Update the package list first:
user@ubuntu:~$ sudo apt-get update

Install the Sun Java 6 JDK and verify the installation:
user@ubuntu:~$ sudo apt-get install sun-java6-jdk
user@ubuntu:~$ java -version


2.      Adding a dedicated Hadoop system user

We will use a dedicated Hadoop user account for running Hadoop.
user@ubuntu:~$ sudo addgroup hadoop_group
user@ubuntu:~$ sudo adduser --ingroup hadoop_group hduser1



This will add the user hduser1 and the group hadoop_group to the local machine. Add hduser1 to the sudo group:
user@ubuntu:~$ sudo adduser hduser1 sudo


3.      Configuring SSH

We have to generate an SSH key for the hduser1 user.
user@ubuntu:~$ su - hduser1
hduser1@ubuntu:~$ ssh-keygen -t rsa -P ""



The second command creates an RSA key pair with an empty password.
Next, enable SSH access to your local machine with the newly created key, which is done by the following command:
hduser1@ubuntu:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys


The final step is to test the SSH setup by connecting to the local machine as the hduser1 user. This step is also needed to save your local machine’s host key fingerprint to the hduser1 user’s known_hosts file.



hduser1@ubuntu:~$ ssh localhost


4.      Main Installation

user@ubuntu:~$ su - hduser1



• Now, download and extract Hadoop 1.2.0 to /usr/local/hadoop (note that this Linux walkthrough uses the Hadoop 1.x line, so its configuration files and daemons differ from the 2.6.0 Windows section above)
• Set up environment variables for Hadoop



Add the following entries to the .bashrc file:
# Set Hadoop-related environment variables
export HADOOP_HOME=/usr/local/hadoop
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
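Reload the shell configuration so the new variables take effect in the current session:

hduser1@ubuntu:~$ source ~/.bashrc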


5.      Configuration


hadoop-env.sh
In the file conf/hadoop-env.sh, change the line
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
to the uncommented path of the installed JDK:
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64 (for 64-bit)
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-i386 (for 32-bit)


conf/*-site.xml

Now we create the directory and set the required ownerships and permissions:
hduser1@ubuntu:~$ sudo mkdir -p /app/hadoop/tmp
hduser1@ubuntu:~$ sudo chown hduser1:hadoop_group /app/hadoop/tmp
hduser1@ubuntu:~$ sudo chmod 750 /app/hadoop/tmp


The last line (chmod 750) restricts access to the /app/hadoop/tmp directory: the owner gets full permissions and the group gets read and execute permissions.

• Error: If you forget to set the required ownerships and permissions, you will see a java.io.IOException when you try to format the name node.
Paste the following snippets between the <configuration> ... </configuration> tags in each file.


• In file conf/core-site.xml

<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>



• In file conf/mapred-site.xml
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>



• In file conf/hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>


6.      Formatting the HDFS filesystem via the NameNode

To format the filesystem (which simply initializes the directory specified by the dfs.name.dir variable), run the command:
hduser1@ubuntu:~$ /usr/local/hadoop/bin/hadoop namenode -format
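On success the output ends with a message that the storage directory has been formatted; with the configuration above it looks something like this (the exact path depends on hadoop.tmp.dir):

... INFO common.Storage: Storage directory /app/hadoop/tmp/dfs/name has been successfully formatted.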


7.      Starting your single-node cluster

Before starting the cluster, we need to give the required permissions to the directory with the following command:
hduser1@ubuntu:~$ sudo chmod -R 777 /usr/local/hadoop
Then run:
hduser1@ubuntu:~$ /usr/local/hadoop/bin/start-all.sh


This will start up a NameNode, a DataNode, a SecondaryNameNode, a JobTracker and a TaskTracker on the machine.

Verify the running processes with jps:
hduser1@ubuntu:/usr/local/hadoop$ jps
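The jps output should list the Hadoop 1.x daemons started above, something like the following (the process IDs here are illustrative and will differ on your machine):

1788 NameNode
1938 DataNode
2085 SecondaryNameNode
2149 JobTracker
2287 TaskTracker
2349 Jps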
