Configuring a Hadoop Cluster Using an Ansible Playbook

Monil Goyal
Mar 20, 2021

In this article, I demonstrate how to create an Ansible playbook that launches a Hadoop cluster.

I used virtual machines on my local system for this demonstration; you can also use cloud instances.

Steps to follow:

  1. Update the inventory on the controller node
  2. Create the playbook
  3. Run the playbook on the controller node

Step 1: Update Inventory on Controller Node

Controller Node

→ IP of NameNode is 192.168.43.70

Namenode

→ IP of DataNode is 192.168.43.189

Datanode

Inventory file
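
The inventory itself is not reproduced in the text, so here is a minimal sketch of what it could look like, written in Ansible's YAML inventory format. The group names `Namenode` and `datanode` and the two IP addresses come from this article; `ansible_user: root` is an assumption, and the original inventory may well have been an INI file with additional SSH settings.

```yaml
# Hypothetical inventory sketch (YAML format); adjust to your environment.
all:
  vars:
    ansible_user: root        # assumption: the playbook copies files into /root
  children:
    Namenode:                 # matched by "hosts: Namenode" in the playbook
      hosts:
        192.168.43.70:
    datanode:                 # matched by "hosts: datanode" in the playbook
      hosts:
        192.168.43.189:
```

Group names are case-sensitive, so they must match the `hosts:` lines of the playbook exactly.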

Step 2: Create Playbook

→ Ansible code to install the Java and Hadoop software

# installing required software
- hosts: Namenode,datanode
  tasks:
    - name: 'copy hadoop software'
      copy:
        src: /root/hadoop-1.2.1-1.x86_64.rpm
        dest: /root/
      notify: "install hadoop software"
    - name: "copy java jdk software"
      copy:
        src: /root/jdk-8u171-linux-x64.rpm
        dest: /root/
      notify: "install jdk software"
  handlers:
    - name: "install hadoop software"
      shell: "rpm -ivh /root/hadoop-1.2.1-1.x86_64.rpm --force"
    - name: "install jdk software"
      shell: "rpm -ivh /root/jdk-8u171-linux-x64.rpm --force"

→ Ansible code to configure the NameNode

# configuring Namenode
- hosts: Namenode
  vars_prompt:
    - name: Namenode_dir
      prompt: Namenode directory?
      private: no
  tasks:
    - name: 'creating directory'
      file:
        state: directory
        path: "{{Namenode_dir}}"
      notify: "format namenode directory"
    - name: 'configure core-site.xml'
      blockinfile:
        path: "/etc/hadoop/core-site.xml"
        insertafter: "<configuration>"
        # write the block markers as XML comments instead of the default "#" lines
        marker: "<!-- {mark} ANSIBLE MANAGED BLOCK -->"
        block: |
          <property>
          <name>fs.default.name</name>
          <value>hdfs://0.0.0.0:9001</value>
          </property>
    - name: 'configure hdfs-site.xml'
      blockinfile:
        path: "/etc/hadoop/hdfs-site.xml"
        insertafter: "<configuration>"
        marker: "<!-- {mark} ANSIBLE MANAGED BLOCK -->"
        # Hadoop 1.x reads the NameNode storage directory from dfs.name.dir
        block: |
          <property>
          <name>dfs.name.dir</name>
          <value>{{Namenode_dir}}</value>
          </property>
    # run the format handler only after hdfs-site.xml points at the new directory
    - name: "running handlers"
      meta: flush_handlers
    - name: "start namenode"
      shell: "hadoop-daemon.sh start namenode"
  handlers:
    - name: "format namenode directory"
      shell: "echo Y| hadoop namenode -format"

→ Ansible code to configure datanode

# Configuring datanode
- hosts: datanode
  vars_prompt:
    - name: Datanode_dir
      prompt: Datanode directory?
      private: no
  tasks:
    - name: 'creating directory'
      file:
        state: directory
        path: "{{Datanode_dir}}"
    - name: 'configure core-site.xml'
      blockinfile:
        path: "/etc/hadoop/core-site.xml"
        insertafter: "<configuration>"
        # write the block markers as XML comments instead of the default "#" lines
        marker: "<!-- {mark} ANSIBLE MANAGED BLOCK -->"
        block: |
          <property>
          <name>fs.default.name</name>
          <value>hdfs://{{groups['Namenode'][0]}}:9001</value>
          </property>
    - name: 'configure hdfs-site.xml'
      blockinfile:
        path: "/etc/hadoop/hdfs-site.xml"
        insertafter: "<configuration>"
        marker: "<!-- {mark} ANSIBLE MANAGED BLOCK -->"
        # Hadoop 1.x reads the DataNode storage directory from dfs.data.dir
        block: |
          <property>
          <name>dfs.data.dir</name>
          <value>{{Datanode_dir}}</value>
          </property>
    - name: "start datanode"
      shell: "hadoop-daemon.sh start datanode"
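
The DataNode play locates the NameNode through Ansible's `groups` variable: `groups['Namenode'][0]` is the first host name listed under the `Namenode` group in the inventory. The small play below is only a sketch for inspecting that value before it is written into core-site.xml; it is not part of the original playbook, and the task name is made up for illustration.

```yaml
# Hypothetical helper play: prints the address the DataNode will use.
- hosts: datanode
  gather_facts: no            # no facts are needed just to print a variable
  tasks:
    - name: "show the NameNode address used in core-site.xml"
      debug:
        msg: "hdfs://{{ groups['Namenode'][0] }}:9001"
```

If the inventory lists the NameNode by its IP address, the message contains that IP; if it lists a host name instead, the DataNode must be able to resolve that name.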

Step 3: Run Playbook on Controller Node

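→ At the "Namenode directory?" prompt, enter the path of the directory to create on the NameNode; it holds the metadata for the storage shared by the DataNode.

→ At the "Datanode directory?" prompt, enter the path of the directory to create on the DataNode; its storage is shared with the master (NameNode).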

→ Check whether the NameNode has started.

→ Check whether the DataNode has started.

→ Check the cluster information.
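
These checks can also be run from the controller node as an extra play. The following is a sketch, not part of the original playbook: it assumes that `jps` (shipped with the JDK) and the `hadoop` command are on the PATH of the managed nodes, and the task and variable names are hypothetical.

```yaml
# Hypothetical verification plays; not part of the original playbook.
- hosts: Namenode,datanode
  tasks:
    - name: "list running Java processes"
      command: jps
      register: jps_result          # hypothetical variable name
      changed_when: false           # read-only check, never reports a change
    - name: "show jps output"
      debug:
        var: jps_result.stdout_lines

- hosts: Namenode
  tasks:
    - name: "get cluster report"
      command: hadoop dfsadmin -report
      register: report_result       # hypothetical variable name
      changed_when: false
    - name: "show cluster report"
      debug:
        var: report_result.stdout_lines
```

A running node should show a `NameNode` or `DataNode` entry in the `jps` output, and the report should list the DataNode together with the capacity it contributes.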

Thanks for reading…

Keep Learning, Keep Sharing!!!
