How to Install Hadoop on Windows and Linux
Practical - 2
To Install Hadoop on Windows & Linux.
Fig. 1 – Extracted Hadoop 2.6.0 folder
Configuration:
Step 1 – Windows Path Configuration
Set the HADOOP_HOME path in the environment variables for Windows:
Right-click on My Computer > Properties > Advanced system settings > Advanced tab > Environment Variables > click on New.
Fig. 2 – Creating New User Variable
Set the Hadoop bin directory path:
Find the Path variable under System variables > click on Edit > insert a ‘;’ (semicolon) at the end and paste the path up to the Hadoop bin directory.
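The same variables can also be set from the command line instead of the GUI. The install location C:\hadoop-2.6.0 below is an assumption; substitute the folder you actually extracted Hadoop into. Note that setx only affects newly opened Command Prompt windows, and it truncates values longer than 1024 characters, so the GUI route is safer for a long PATH.

```cmd
:: Assumed install location -- adjust to where you extracted Hadoop
setx HADOOP_HOME "C:\hadoop-2.6.0"
:: Append the Hadoop bin directory to PATH (takes effect in new cmd windows)
setx PATH "%PATH%;C:\hadoop-2.6.0\bin"
```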
Step 2 – Hadoop Configuration
Edit hadoop-2.6.0/etc/hadoop/core-site.xml, paste the following lines and save it.
Fig. 3 – Edited core-site.xml
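The edit shown in Fig. 3 typically amounts to setting the default filesystem between the <configuration> tags. The sketch below is a common minimal core-site.xml for a single-node Hadoop 2.6.0 setup; port 9000 is a conventional choice, not a requirement.

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```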
Edit hadoop-2.6.0/etc/hadoop/mapred-site.xml, paste the following lines and save it.
Fig. 4 – Edited mapred-site.xml
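For Hadoop 2.x, the mapred-site.xml edit in Fig. 4 usually just tells MapReduce to run on YARN. A minimal sketch:

```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```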
Edit hadoop-2.6.0/etc/hadoop/hdfs-site.xml, paste the following lines and save it.
Fig. 5 – Edited hdfs-site.xml
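A typical hdfs-site.xml for a single-node Windows setup sets the replication factor to 1 and points the namenode and datanode at local storage directories. The data paths below are assumptions; adjust them to your install folder and create the directories beforehand.

```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///C:/hadoop-2.6.0/data/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///C:/hadoop-2.6.0/data/datanode</value>
  </property>
</configuration>
```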
Edit hadoop-2.6.0/etc/hadoop/yarn-site.xml, paste the following lines and save it.
Fig. 6 - Edited yarn-site.xml
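The yarn-site.xml edit normally enables the MapReduce shuffle service on the nodemanager. A minimal sketch:

```xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
```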
Edit hadoop-2.6.0/etc/hadoop/hadoop-env.cmd.
Fig. 7 – Edited hadoop-env.cmd
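The usual change in hadoop-env.cmd is pointing JAVA_HOME at the local JDK. The path below is an assumption; use your own JDK folder, preferably one whose path contains no spaces.

```cmd
:: Assumed JDK location -- adjust to your installed JDK
set JAVA_HOME=C:\Java\jdk1.7.0_51
```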
Step 3 – Start Everything
Open cmd and type ‘hdfs namenode -format’ – after execution you will see the following logs.
Open cmd, navigate to the sbin directory, and type ‘start-all.cmd’.
It will start the following processes:
· Namenode
· Datanode
· Yarn resourcemanager
· Yarn nodemanager
Fig. 8 – Namenode
Fig. 9 – Datanode
Fig. 10 – Yarn resourcemanager
Fig. 11 – Yarn nodemanager
Step 4 – Namenode GUI, Resourcemanager GUI
Resourcemanager GUI address: http://localhost:8088
Fig. 12 – Resourcemanager GUI
Namenode GUI address: http://localhost:50070
Fig. 13 - Namenode GUI
Installation on Linux:
1. Installing Java 6 JDK
user@ubuntu:~$ sudo apt-get update
Then install the Sun Java 6 JDK:
user@ubuntu:~$ sudo apt-get install sun-java6-jdk
Verify the Java installation:
user@ubuntu:~$ java -version
2. Adding a dedicated Hadoop system user
We will use a dedicated Hadoop user account for running Hadoop.
user@ubuntu:~$ sudo addgroup hadoop_group
user@ubuntu:~$ sudo adduser --ingroup hadoop_group hduser1
This will add the user hduser1 and the group hadoop_group to the local machine. Add hduser1 to the sudo group
user@ubuntu:~$ sudo adduser hduser1 sudo
3. Configuring SSH
We have to generate an SSH key for the hduser1 user.
user@ubuntu:~$ su - hduser1
hduser1@ubuntu:~$ ssh-keygen -t rsa -P ""
The second line will create an RSA key pair with an empty passphrase.
You have to enable SSH access to your local machine with this newly created key, which is done by the following command:
hduser1@ubuntu:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
The final step is to test the SSH setup by connecting to the local machine as the hduser1 user. This step is also needed to save your local machine’s host key fingerprint to the hduser1 user’s known_hosts file.
hduser1@ubuntu:~$ ssh localhost
4. Main Installation
user@ubuntu:~$ su - hduser1
• Now, download and extract Hadoop 1.2.0
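One way to carry out this step is sketched below; the archive name, download location, and target directory /usr/local/hadoop (matching the HADOOP_HOME set in .bashrc) are assumptions to adjust to your setup.

```shell
# Assumes hadoop-1.2.0.tar.gz has already been downloaded to the home directory
cd /usr/local
sudo tar xzf ~/hadoop-1.2.0.tar.gz
sudo mv hadoop-1.2.0 hadoop
# Give the dedicated Hadoop user ownership of the installation
sudo chown -R hduser1:hadoop_group hadoop
```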
• Setup Environment Variables for Hadoop
Add the following entries to .bashrc file
# Set Hadoop-related environment variables
export HADOOP_HOME=/usr/local/hadoop
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
5. Configuration
hadoop-env.sh
In the file conf/hadoop-env.sh, change the line
#export JAVA_HOME=/usr/lib/j2sdk1.5-sun
to the following (uncommented):
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64 (for 64 bit)
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-i386 (for 32 bit)
conf/*-site.xml
Now we create the directory and set the required ownerships and permissions:
hduser@ubuntu:~$ sudo mkdir -p /app/hadoop/tmp
hduser@ubuntu:~$ sudo chown hduser1:hadoop_group /app/hadoop/tmp
hduser@ubuntu:~$ sudo chmod 750 /app/hadoop/tmp
The last line grants the owner full access and the group read and execute access to the /app/hadoop/tmp directory.
• Error: If you forget to set the required ownerships and permissions, you will see a java.io.IOException when you try to format the name node.
Paste the following snippets between the <configuration> and </configuration> tags of each file.
• In file conf/core-site.xml
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
• In file conf/mapred-site.xml
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
• In file conf/hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
6. Formatting the HDFS filesystem via the NameNode
To format the filesystem (which simply initializes the directory specified by the dfs.name.dir variable), run the command:
hduser1@ubuntu:~$ /usr/local/hadoop/bin/hadoop namenode -format
7. Starting your single-node cluster
Before starting the cluster, we need to give the required permissions to the directory with the following command:
hduser1@ubuntu:~$ sudo chmod -R 777 /usr/local/hadoop
Run the command:
hduser1@ubuntu:~$ /usr/local/hadoop/bin/start-all.sh
This will start up a Namenode, a Datanode, a Jobtracker and a Tasktracker on the machine. Verify the running Java processes with jps:
hduser1@ubuntu:/usr/local/hadoop$ jps