Saturday, July 26, 2014

Big Data: Apache Hadoop Multi Node

In the last blog we installed Hadoop in a single node environment. In this blog we will set up a multi node environment, where the daemons are spread across different nodes.

Cluster Configuration:

2-10 nodes (small)
- Name node, job tracker and secondary name node on the same machine
- Data node and task tracker on all other machines

10-40 nodes (medium/single rack)
- Name node and job tracker on the same machine
- Secondary name node on a dedicated machine
- Data node and task tracker on all other machines

100+ nodes (large/multi rack)
- Name node, job tracker and secondary name node on dedicated machines
- Rack awareness
- Network and HDFS optimization
- MapReduce optimization
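
To make the three tiers concrete, here is a rough sketch in Python that picks a master layout from the number of machines. The function name `suggest_layout` is made up for illustration. The sketch also makes some assumptions that the tiers above do not state: a cluster of exactly 10 nodes is treated as small, sizes between 41 and 99 are treated as large, and in the large tier each master daemon gets its own machine.

```python
def suggest_layout(node_count):
    """Suggest how the master daemons are grouped onto machines.

    Hypothetical helper: the thresholds follow the tiers listed above,
    with the gaps filled by assumptions noted in the comments.
    """
    if node_count < 2:
        raise ValueError("a multi node cluster needs at least 2 machines")

    if node_count <= 10:
        # Small: all three master daemons share one machine.
        # Assumption: exactly 10 nodes counts as small.
        tier = "small"
        masters = [["NameNode", "JobTracker", "SecondaryNameNode"]]
    elif node_count <= 40:
        # Medium: the secondary name node moves to its own machine.
        tier = "medium"
        masters = [["NameNode", "JobTracker"], ["SecondaryNameNode"]]
    else:
        # Large. Assumption: 41-99 nodes are treated as large too,
        # and every master daemon gets a dedicated machine.
        tier = "large"
        masters = [["NameNode"], ["JobTracker"], ["SecondaryNameNode"]]

    return {
        "tier": tier,
        "master_machines": masters,
        # Every remaining machine runs a DataNode and a TaskTracker.
        "worker_count": node_count - len(masters),
    }


layout = suggest_layout(20)
print(layout["tier"], layout["master_machines"], layout["worker_count"])
```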

Let's see the process of bringing up a multi node cluster. Once you have a bunch of machines with an OS and Hadoop installed on them, you need to:

- Get the SSH key from the name node and distribute it to all our slaves/data nodes.

- Then configure the name node. On the name node we have the masters file and the slaves file. In the masters file we configure our secondary name node. In the slaves file we list all our data nodes.

NOTE: If you have the name node and the job tracker on different machines, make sure that the slaves files on both machines are synchronized.

- The third step is to configure our data nodes and task trackers. This is done by editing their site XML files, specifically core-site.xml and then mapred-site.xml.
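
As a rough illustration of the third step, the sketch below builds the text of the two site files in Python. It assumes a Hadoop 1.x style setup, where `fs.default.name` in core-site.xml points at the name node and `mapred.job.tracker` in mapred-site.xml points at the job tracker. The hostname `master` and the ports 9000 and 9001 are placeholders, not values from this post, and `site_xml` is a made-up helper.

```python
from xml.sax.saxutils import escape


def site_xml(properties):
    """Render a dict of properties as a Hadoop *-site.xml document."""
    lines = ['<?xml version="1.0"?>', "<configuration>"]
    for name, value in properties.items():
        lines.append("  <property>")
        lines.append("    <name>%s</name>" % escape(name))
        lines.append("    <value>%s</value>" % escape(value))
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines) + "\n"


# Assumption: the name node and job tracker both run on a host called
# "master"; the ports are placeholders, use whatever your cluster uses.
NAME_NODE = "master"

core_site = site_xml({"fs.default.name": "hdfs://%s:9000" % NAME_NODE})
mapred_site = site_xml({"mapred.job.tracker": "%s:9001" % NAME_NODE})

print(core_site)
print(mapred_site)
```

The same two files would go on every data node, so that each data node and task tracker knows where its master is.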

Once it's done, then

- file

we just need to follow our process and our multi node cluster is set.
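
For the first two steps (distributing the key and filling in the masters and slaves files), a small Python sketch could look like the one below. It only builds the `ssh-copy-id` commands and the file contents, it does not run anything. The user name `hduser` and the hostnames are hypothetical, and it assumes a key pair has already been generated on the name node.

```python
# Hypothetical user and hostnames, replace them with your own.
HADOOP_USER = "hduser"
SECONDARY_NAME_NODE = "master"
DATA_NODES = ["slave1", "slave2", "slave3"]


def key_copy_commands(user, hosts):
    """Build one ssh-copy-id command per slave (built only, not executed)."""
    return [["ssh-copy-id", "%s@%s" % (user, host)] for host in hosts]


def hosts_file(hosts):
    """The masters and slaves files hold one hostname per line."""
    return "".join(host + "\n" for host in hosts)


commands = key_copy_commands(HADOOP_USER, DATA_NODES)
masters_text = hosts_file([SECONDARY_NAME_NODE])  # secondary name node
slaves_text = hosts_file(DATA_NODES)              # all data nodes

for cmd in commands:
    print(" ".join(cmd))
print("masters file:")
print(masters_text, end="")
print("slaves file:")
print(slaves_text, end="")
```

Running the printed commands from the name node is what lets it log in to each slave without a password later on.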

The commands are: Hadoop Commands

Please refer to the single node cluster installation post to help you with the installation of the multi node cluster. I am not going to reinvent the wheel.

