Sunday, May 13, 2012

Hadoop install on Ubuntu Linux

Some friends have asked me how to set up Hadoop on Ubuntu Linux. I strongly recommend Michael G. Noll's tutorial online: running-hadoop-on-ubuntu-linux-single-node-cluster
It's a very clearly written step-by-step tutorial and is especially good for new users.

For his multi-node cluster setup, I do have some additional explanations of the network setup that is omitted in his tutorial. So if you're done with his single-node tutorial but ran into problems with the cluster setup, you may try examining the steps here:

1. Setup multiple single nodes
You can simply follow Michael's tutorial linked above.

2. Cluster Networking setup
2.1 Update the hostname if needed:
# to display the current hostname:
hostname
# to change the hostname (edit both files so they stay consistent):
sudo vi /etc/hostname
sudo vi /etc/hosts
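As a concrete sketch, here is one way to rename a node (the name "master" is just an example; run these as root or with sudo):

```shell
NEW_NAME=master
# persist the new name
echo "$NEW_NAME" | sudo tee /etc/hostname
# keep /etc/hosts consistent by replacing the old name
sudo sed -i "s/$(hostname)/$NEW_NAME/g" /etc/hosts
# apply immediately, no reboot needed
sudo hostname "$NEW_NAME"
```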


2.2  networking setup
2.2.1 Set up a static IP address in /etc/network/interfaces on each node
Example contents in the interfaces file:
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
address 192.168.0.1
netmask 255.255.255.0
gateway 192.168.0.254

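After editing the interfaces file, you can apply and verify the static address like this (the address shown is the example value from the file above; on Ubuntu of this era the init script restart still works):

```shell
# restart networking so the new static address takes effect
sudo /etc/init.d/networking restart
# confirm eth0 got the expected address
ifconfig eth0 | grep "inet addr"
```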

2.2.2 Update /etc/hosts on all machines. Example:
192.168.0.1    master
192.168.0.2    slave
192.168.0.3    slave2

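A quick way to confirm the /etc/hosts entries took effect on every node (the host names here are the example ones from above):

```shell
# each name should resolve to the IP listed in /etc/hosts
for host in master slave slave2; do
    getent hosts "$host" || echo "WARNING: $host does not resolve"
done
```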
2.3 SSH access control
2.3.1 Delete all files in $HOME/.ssh and regenerate SSH keys as in the single-node setup
2.3.2 Add the master's public key to each slave's authorized_keys file, to enable password-less SSH login:
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@slave
2.3.3 On each slave machine, do the same and add its key to the master's authorized_keys file:
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@master
2.3.4 Verify that password-less SSH works in every direction: master to each slave, each slave to the master, and each machine to itself.
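One way to sketch that check (hduser and the host names follow the tutorial's example setup; BatchMode makes ssh fail instead of prompting for a password):

```shell
# run this on each node; every ssh should print the remote hostname
for host in master slave; do
    ssh -o BatchMode=yes hduser@"$host" hostname \
        || echo "password-less SSH to $host is NOT working"
done
```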

3. Hadoop setup
You need to modify the namespaceID of each datanode to match that of the namenode on the master.
The namespaceIDs of the namenode and datanode are defined in:
${hadoop.tmp.dir}/dfs/name/current/VERSION
${hadoop.tmp.dir}/dfs/data/current/VERSION

Remember that ${hadoop.tmp.dir} is set in conf/core-site.xml

If you don't do this, you may see this error in logs/hadoop-hduser-datanode-master.log (or: -slave.log): java.io.IOException: Incompatible namespaceIDs
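A sketch of syncing the IDs, assuming hadoop.tmp.dir=/app/hadoop/tmp as in Michael's tutorial (stop Hadoop before touching the VERSION files):

```shell
TMPDIR=/app/hadoop/tmp
# read the namenode's namespaceID on the master
ID=$(grep namespaceID "$TMPDIR/dfs/name/current/VERSION" | cut -d= -f2)
# on each datanode, rewrite the stored namespaceID to match
sed -i "s/^namespaceID=.*/namespaceID=$ID/" "$TMPDIR/dfs/data/current/VERSION"
```

Alternatively, you can delete the datanode's data directory and let it re-register, but that destroys any HDFS blocks stored there.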

After all this is done, you can then follow the multi-node cluster setup tutorial in Michael's link.
