Cloud
Manual Cluster setup
We are going to install Hadoop 2.0 (YARN + MR2) on virtual machines (VMs) of the StratusLab cloud. We will use two VMs, a master and a slave, and the master will also act as a slave.
Launch VMs
$ stratus-run-instance --cpu=4 --ram=16384 --swap=2048 --vm-name=master --volatile-disk=20 ID
$ stratus-run-instance --cpu=4 --ram=16384 --swap=2048 --vm-name=slave --volatile-disk=20 ID
(ID is the identifier of the machine image to instantiate.)
Connect to and configure each VM
Mount external disk
$ mkdir /mnt/data_hadoop
$ mount /dev/vdc /mnt/data_hadoop/
Use automatic configuration
$ source custom.sh
Check that iptables is not running
$ service iptables status
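Copy the SSH public key of each VM into the authorized_keys file of the other VM: print the key on one VM, then paste it on the other (and repeat in the opposite direction).
VM1$ cat /home/hadoop/.ssh/id_rsa.pub
VM2$ nano /home/hadoop/.ssh/authorized_keys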
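Pasting by hand works, but it is easy to add the same key twice or to leave permissions that sshd refuses. A minimal sketch of the same step as a bash function, assuming the public key file has already been copied to the other VM (for example with scp); `add_authorized_key` is a hypothetical helper, not part of Hadoop or OpenSSH:

```shell
# add_authorized_key KEY_FILE [AUTH_FILE]
# Hypothetical helper: appends each key line of KEY_FILE to AUTH_FILE
# (default: ~/.ssh/authorized_keys) unless that exact line is already present.
add_authorized_key() {
    local key_file=$1
    local auth_file=${2:-$HOME/.ssh/authorized_keys}
    local key

    # Create the .ssh directory and the authorized_keys file if missing.
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"

    # Exact, fixed-string, whole-line match, so re-running is harmless.
    while IFS= read -r key || [ -n "$key" ]; do
        [ -z "$key" ] && continue
        grep -qxF -- "$key" "$auth_file" || printf '%s\n' "$key" >> "$auth_file"
    done < "$key_file"

    # With StrictModes (the sshd default), keys are ignored if the file or
    # its directory is writable by group or others.
    chmod 700 "$(dirname "$auth_file")"
    chmod 600 "$auth_file"
}
```

For example, after copying the master's key to the slave as /tmp/master_id_rsa.pub (an illustrative path), run `add_authorized_key /tmp/master_id_rsa.pub` as the hadoop user on the slave.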
Configure the hosts file
$ more /etc/hosts
134.158.75.XX onevm-XX.lal.in2p3.fr master
134.158.75.XX onevm-XX.lal.in2p3.fr slave
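Both VMs need the same two entries. A minimal sketch that appends an entry only when its alias is not already mapped; `add_host_entry` is a hypothetical helper, and the target file is a parameter so it can be tried on a copy before editing /etc/hosts:

```shell
# add_host_entry IP FQDN ALIAS [HOSTS_FILE]
# Hypothetical helper: appends "IP FQDN ALIAS" to HOSTS_FILE (default /etc/hosts)
# unless ALIAS already appears as a host name on a non-comment line.
add_host_entry() {
    local ip=$1 fqdn=$2 alias=$3 hosts_file=${4:-/etc/hosts}

    # Fields 2..NF of a hosts line are names; exit 0 if ALIAS is among them.
    if awk -v a="$alias" '
        $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == a) found = 1 }
        END { exit !found }' "$hosts_file"; then
        echo "$alias already present in $hosts_file" >&2
        return 0
    fi
    printf '%s %s %s\n' "$ip" "$fqdn" "$alias" >> "$hosts_file"
}
```

Run it as root on each VM, once per alias, with the real addresses and host names in place of the 134.158.75.XX / onevm-XX placeholders.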
Configure Hadoop files
- Parameter files (in $HADOOP_INSTALL/etc/hadoop/, on every node):
core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml
- Only on the master node:
$ cat $HADOOP_INSTALL/etc/hadoop/masters
master
$ cat $HADOOP_INSTALL/etc/hadoop/slaves
master
slave
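The files are listed without their contents. As an illustration only, here is a minimal sketch that writes a small set of properties. The property names are standard Hadoop 2 keys, but the values are assumptions: port 9000 for HDFS, a replication factor of 2 (one copy per VM), and `hadoop.tmp.dir` chosen so that the default NameNode directory matches the format output below. `yarn.resourcemanager.hostname` and the `mapreduce_shuffle` service name apply from Hadoop 2.2 onward (early 2.0.x alpha releases used `mapreduce.shuffle`), so check them against the installed release. The helper names are hypothetical.

```shell
# Hypothetical helpers; every property value below is an illustrative assumption.

# xml_conf NAME=VALUE...: print a <configuration> document (values are not escaped).
xml_conf() {
    local pair
    printf '<?xml version="1.0"?>\n<configuration>\n'
    for pair in "$@"; do
        printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' \
            "${pair%%=*}" "${pair#*=}"
    done
    printf '</configuration>\n'
}

# write_site_files CONF_DIR: the four *-site.xml files, needed on every node.
# Existing files in CONF_DIR are overwritten.
write_site_files() {
    local conf=$1
    mkdir -p "$conf"
    xml_conf "fs.defaultFS=hdfs://master:9000" \
             "hadoop.tmp.dir=/mnt/data_hadoop/hadoop/tmp" > "$conf/core-site.xml"
    # Two DataNodes, so at most two replicas per block.
    xml_conf "dfs.replication=2" > "$conf/hdfs-site.xml"
    # Submit MapReduce jobs to YARN instead of the local runner.
    xml_conf "mapreduce.framework.name=yarn" > "$conf/mapred-site.xml"
    xml_conf "yarn.resourcemanager.hostname=master" \
             "yarn.nodemanager.aux-services=mapreduce_shuffle" > "$conf/yarn-site.xml"
}

# write_node_lists CONF_DIR: masters/slaves, only needed on the master.
# The master is listed in slaves because it also runs worker daemons.
write_node_lists() {
    local conf=$1
    mkdir -p "$conf"
    printf '%s\n' master > "$conf/masters"
    printf '%s\n' master slave > "$conf/slaves"
}
```

Try it on a scratch directory first, then point it at $HADOOP_INSTALL/etc/hadoop once the values fit the cluster.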
Start Hadoop services
Format HDFS from the master
$ hdfs namenode -format
...
Storage directory /mnt/data_hadoop/hadoop/tmp/dfs/name has been successfully formatted
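Formatting an already formatted NameNode erases the HDFS namespace and assigns a new clusterID, after which the existing DataNodes refuse to register. A minimal guard, assuming the name directory shown in the output above; `format_namenode_once` and the `FORMAT_CMD` override are hypothetical:

```shell
# format_namenode_once [NAME_DIR]
# Hypothetical helper: formats only if NAME_DIR holds no current/VERSION file,
# which a formatted NameNode directory contains.
format_namenode_once() {
    local name_dir=${1:-/mnt/data_hadoop/hadoop/tmp/dfs/name}
    if [ -f "$name_dir/current/VERSION" ]; then
        echo "NameNode already formatted in $name_dir; skipping" >&2
        return 0
    fi
    # FORMAT_CMD (hypothetical) allows a dry run; left unquoted on purpose
    # so the default expands to the three words of the real command.
    ${FORMAT_CMD:-hdfs namenode -format}
}
```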
Start Hadoop daemons
- Only on the master node:
$ $HADOOP_INSTALL/sbin/hadoop-daemon.sh start namenode
$ $HADOOP_INSTALL/sbin/hadoop-daemon.sh start datanode
$ $HADOOP_INSTALL/sbin/yarn-daemon.sh start resourcemanager
$ $HADOOP_INSTALL/sbin/yarn-daemon.sh start nodemanager
$ $HADOOP_INSTALL/sbin/yarn-daemon.sh start proxyserver
$ $HADOOP_INSTALL/sbin/mr-jobhistory-daemon.sh start historyserver
- Only on the slave node:
$ $HADOOP_INSTALL/sbin/hadoop-daemon.sh start datanode
$ $HADOOP_INSTALL/sbin/yarn-daemon.sh start nodemanager
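The per-role daemon lists can be kept in one place. A minimal sketch, assuming `HADOOP_INSTALL` is set as in the commands above; `start_hadoop_role` is a hypothetical helper, and `DRY_RUN=1` (also hypothetical) only prints the commands:

```shell
# start_hadoop_role master|slave
# Hypothetical helper: starts, in order, the daemons listed above for one role.
start_hadoop_role() {
    if [ -z "${HADOOP_INSTALL:-}" ]; then
        echo "HADOOP_INSTALL is not set" >&2
        return 1
    fi
    local sbin="$HADOOP_INSTALL/sbin" entry script daemon
    local -a daemons
    # Each entry is "script:daemon".
    case "$1" in
        master) daemons=(hadoop-daemon.sh:namenode hadoop-daemon.sh:datanode
                         yarn-daemon.sh:resourcemanager yarn-daemon.sh:nodemanager
                         yarn-daemon.sh:proxyserver mr-jobhistory-daemon.sh:historyserver) ;;
        slave)  daemons=(hadoop-daemon.sh:datanode yarn-daemon.sh:nodemanager) ;;
        *)      echo "usage: start_hadoop_role master|slave" >&2; return 1 ;;
    esac
    for entry in "${daemons[@]}"; do
        script=${entry%%:*} daemon=${entry#*:}
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "$sbin/$script start $daemon"
        else
            "$sbin/$script" start "$daemon" || return 1
        fi
    done
}
```

Usage: `start_hadoop_role master` on the master, `start_hadoop_role slave` on the slave.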
Check
Master$ jps
NameNode
DataNode
ResourceManager
NodeManager
JobHistoryServer
Slave$ jps
DataNode
NodeManager
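To check a node without reading the list by eye, a minimal sketch that compares `jps` output (one "PID ClassName" pair per line) with the expected daemon names above; `check_jps` is a hypothetical helper that reads the `jps` output on standard input:

```shell
# check_jps EXPECTED_NAME...
# Hypothetical helper: reports each expected daemon missing from the jps
# output on stdin; returns 1 if any is missing, 0 otherwise.
check_jps() {
    local running name missing=0
    # Keep only the class-name column.
    running=$(awk '{ print $2 }')
    for name in "$@"; do
        if ! printf '%s\n' "$running" | grep -qxF -- "$name"; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: `jps | check_jps NameNode DataNode ResourceManager NodeManager JobHistoryServer` on the master, `jps | check_jps DataNode NodeManager` on the slave.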
Web interfaces
http://master:50070 => NameNode
http://master:8088 => ResourceManager (cluster)
http://master:50090 => Secondary NameNode (only if one is running; the steps above do not start it)
http://master:50075 => DataNode