Saturday, July 26, 2014

Big Data - Apache Hadoop Multi Node

In the last blog we installed Hadoop in a single-node environment. In this blog we will set up a multi-node environment, where the daemons are spread across different nodes.

Cluster Configuration:

2-10 nodes (small)
- NameNode, JobTracker and Secondary NameNode on the same machine
- DataNode and TaskTracker on all other machines

10-40 nodes (medium/single rack)
- NameNode and JobTracker on the same machine
- Secondary NameNode on a dedicated machine
- DataNode and TaskTracker on all other machines

100+ nodes (large/multi rack)
- NameNode, JobTracker and Secondary NameNode each on a dedicated machine
- Rack awareness
- Network and HDFS optimization
- MapReduce optimization
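
As a rough illustration of the small and medium layouts above, here is a minimal Python sketch that assigns daemons to a list of hostnames. The function name assign_roles, the small_cluster_max threshold of 10, and the choice of the first host as master are all assumptions made for this example. They are not part of Hadoop.

```python
def assign_roles(hosts, small_cluster_max=10):
    """Map each hostname to the Hadoop 1.x daemons it would run.

    Hypothetical helper: the first host is treated as the master, and
    clusters larger than small_cluster_max get a dedicated machine for
    the Secondary NameNode.
    """
    if len(hosts) < 2:
        raise ValueError("a multi-node cluster needs at least two hosts")

    roles = {host: [] for host in hosts}
    master = hosts[0]
    roles[master] += ["NameNode", "JobTracker"]

    if len(hosts) <= small_cluster_max:
        # Small cluster: Secondary NameNode shares the master machine.
        roles[master].append("SecondaryNameNode")
        workers = hosts[1:]
    else:
        # Medium cluster: Secondary NameNode gets its own machine.
        roles[hosts[1]].append("SecondaryNameNode")
        workers = hosts[2:]

    # Every remaining machine stores data and runs tasks.
    for worker in workers:
        roles[worker] += ["DataNode", "TaskTracker"]
    return roles
```

Calling assign_roles(["master", "slave1", "slave2"]) puts all three master daemons on "master" and a DataNode plus TaskTracker on each slave. The large multi-rack case is left out because it also depends on rack awareness and tuning.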

Let's see the process of bringing up a multi-node cluster. Once you have a bunch of machines with an OS and Hadoop on them, you need to:

- Get the SSH key from the NameNode and distribute it to all our slaves/DataNodes.

- Then configure the NameNode. On the NameNode we have the masters file and the slaves file. In the masters file we configure our Secondary NameNode. In the slaves file we list all our DataNodes.

NOTE: If you have the NameNode and JobTracker on different machines, make sure that the slaves files on both machines are synchronized.
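
The first two steps and the note can be sketched in Python as follows. This is only an illustration under assumptions: the helper names, the "hadoop" user, the ~/.ssh/id_rsa.pub key path and the conf directory locations are hypothetical. The sketch builds ssh-copy-id commands without running them, writes the masters and slaves files with one hostname per line, and compares two slaves files.

```python
from pathlib import Path


def ssh_copy_commands(datanodes, user="hadoop", pubkey="~/.ssh/id_rsa.pub"):
    """Build (but do not run) one ssh-copy-id command per DataNode.

    The user name and key path are assumptions for this example.
    """
    return [["ssh-copy-id", "-i", pubkey, f"{user}@{host}"] for host in datanodes]


def write_host_files(conf_dir, secondary_namenode, datanodes):
    """Write the masters and slaves files, one hostname per line."""
    conf = Path(conf_dir)
    conf.mkdir(parents=True, exist_ok=True)
    # masters lists the Secondary NameNode host.
    (conf / "masters").write_text(secondary_namenode + "\n")
    # slaves lists every DataNode/TaskTracker host.
    (conf / "slaves").write_text("".join(host + "\n" for host in datanodes))


def slaves_in_sync(conf_dir_a, conf_dir_b):
    """Return True if both conf directories list the same slaves in the same order."""
    hosts_a = (Path(conf_dir_a) / "slaves").read_text().split()
    hosts_b = (Path(conf_dir_b) / "slaves").read_text().split()
    return hosts_a == hosts_b
```

Each command returned by ssh_copy_commands could be passed to subprocess.run on the NameNode once you are happy with it. slaves_in_sync is meant for copies of the conf directories that have been fetched to one place, for example from the NameNode and the JobTracker.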

- The third step is to configure our DataNodes and TaskTrackers. This is done by editing their site XML files, specifically the core-site.xml file and then the mapred-site.xml file.
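
As a hedged sketch of the third step, the Python below renders minimal core-site.xml and mapred-site.xml contents. It assumes a Hadoop 1.x setup, where fs.default.name points at the NameNode and mapred.job.tracker points at the JobTracker. The helper names and the ports 9000 and 9001 are assumptions for this example; use whatever your single-node setup used.

```python
import xml.etree.ElementTree as ET


def site_xml(properties):
    """Render a Hadoop *-site.xml document from a name -> value mapping."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")


def core_site(namenode_host, port=9000):
    """core-site.xml: tells every node where the NameNode is (port is assumed)."""
    return site_xml({"fs.default.name": f"hdfs://{namenode_host}:{port}"})


def mapred_site(jobtracker_host, port=9001):
    """mapred-site.xml: tells every TaskTracker where the JobTracker is (port is assumed)."""
    return site_xml({"mapred.job.tracker": f"{jobtracker_host}:{port}"})
```

The strings returned by core_site("master") and mapred_site("master") can be written into the conf directory on each DataNode, where "master" is a placeholder hostname. Generating the files from one place is one way to keep them identical on every node.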

Once it's done, then

- file

We just need to follow our process and our multi-node cluster is set.

The commands are: Hadoop Commands

Please refer to the single-node cluster installation post to help you with the installation of a multi-node cluster. I am not going to reinvent the wheel.
