Saturday, August 16, 2014

Big Data: Introduction to Zookeeper

So far we have covered the following topics in Big Data:

·         Big Data- The Rise and the Future

·         Big data: Technology Stack

·         Big Data: Hadoop Distributed Filesystem (HDFS)

·         Big Data: Map Reduce

·         Big Data- Installing Hadoop ( Single Node)

·         Big Data- Apache Hadoop Multi Node

·         Big Data: Troubleshooting, Administering and optimizing Hadoop

·         Big Data: Managing HDFS

·         Big Data: Map Reduce Development

·         Big Data: Introduction to Pig


In this post we will discuss ZooKeeper.

In the Hadoop world, the common theme is distribution. What if you want to build your own distributed application?
You then have to worry about centralized configuration, synchronization, and serialization.

ZooKeeper is a distributed coordination service for distributed applications: a centralized repository that the parts of an application use to share configuration and coordinate with each other.

ZooKeeper Overview

What is zookeeper?
-Distributed coordination service for distributed applications
-Used for synchronization, serialization and coordination
-Handles the 'nitty-gritty' of distributed application development
-Apps use these services to coordinate distributed processing



Distributed Challenges:
-Coordination is error-prone
-Race conditions, deadlocks, partial failures, inconsistencies

ZooKeeper Goals
-Serialization
-Reliability
-Simple API
-Atomicity
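A classic race condition is the lost update: two clients read the same value, and the second write silently overwrites the first. ZooKeeper's `setData` guards against this by taking the version the client expects the znode to have; the write is rejected if the znode changed in the meantime, and passing -1 skips the check. The sketch below models that idea with a hypothetical in-memory `VersionedValue` class (not the ZooKeeper client API) and a retrying read-modify-write loop.

```python
from typing import Tuple


class VersionMismatch(Exception):
    """Raised when a write carries a stale expected version (hypothetical name)."""


class VersionedValue:
    """Hypothetical single-value store with a version counter, loosely modelled on one znode."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.version = 0

    def get(self) -> Tuple[bytes, int]:
        return self.data, self.version

    def set(self, data: bytes, expected_version: int) -> int:
        # -1 accepts any version, following ZooKeeper's setData convention.
        if expected_version not in (-1, self.version):
            raise VersionMismatch(f"expected {expected_version}, found {self.version}")
        self.data = data
        self.version += 1
        return self.version


def increment(store: VersionedValue) -> None:
    """Read-modify-write loop: re-read and retry whenever another writer got there first."""
    while True:
        data, version = store.get()
        try:
            store.set(str(int(data) + 1).encode(), version)
            return
        except VersionMismatch:
            continue  # someone else updated the value; start over with fresh data


# Two clients read the same version, then both try to write.
counter = VersionedValue(b"0")
_, seen_by_a = counter.get()
_, seen_by_b = counter.get()
counter.set(b"1", seen_by_a)        # client A wins; version is now 1
try:
    counter.set(b"1", seen_by_b)    # client B still holds version 0
    b_rejected = False
except VersionMismatch:
    b_rejected = True
increment(counter)                  # a writer that retries still succeeds
```

Because every write either applies completely or is rejected, clients never see half-applied updates, and the loser of a race learns about it instead of clobbering the winner.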


Typical Uses
-Configuration
-Message queue
-Notification/Synchronization
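Configuration and notification usually go together: a client reads a setting and asks to be told when it changes. Classic ZooKeeper watches are one-time triggers, so after a notification the client must read again and re-register its watch. The sketch below models that pattern in memory; `ConfigStore`, `ConfigClient`, and the `/app/db_url` path are hypothetical and the real client API differs.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional

Watcher = Callable[[str], None]


class ConfigStore:
    """Hypothetical in-memory stand-in for a ZooKeeper-style configuration service."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._watches: Dict[str, List[Watcher]] = defaultdict(list)

    def get(self, path: str, watcher: Optional[Watcher] = None) -> Optional[str]:
        # A read can leave a watch behind, similar in spirit to getData(path, watcher).
        if watcher is not None:
            self._watches[path].append(watcher)
        return self._data.get(path)

    def set(self, path: str, value: str) -> None:
        self._data[path] = value
        # Fire and clear: each registered watch triggers at most once.
        for watcher in self._watches.pop(path, []):
            watcher(path)


class ConfigClient:
    """Keeps a local copy of one setting and re-registers its watch after every change."""

    def __init__(self, store: ConfigStore, path: str) -> None:
        self.store = store
        self.history: List[Optional[str]] = []
        self._reload(path)

    def _reload(self, path: str) -> None:
        # The previous watch is used up, so read again *and* watch again.
        self.history.append(self.store.get(path, watcher=self._reload))


store = ConfigStore()
store.set("/app/db_url", "db1:5432")
client = ConfigClient(store, "/app/db_url")
store.set("/app/db_url", "db2:5432")
store.set("/app/db_url", "db3:5432")
```

Re-registering inside the callback is what keeps the client current; if `_reload` read the value without passing a watcher, the client would see only the first change.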

Now let's talk about the ZooKeeper architecture.


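If we look closely at the diagram above, the basic building block is the znode:
Znodes
-Containers for data and for other znodes (children)
-Store a stat structure (version numbers, timestamps) plus user data (up to 1 MB)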
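Unlike a file system, where a path is either a file or a directory, every znode can hold data and children at the same time. The sketch below models that hierarchy in memory. `ZTree`, `ZNode`, and `Stat` are hypothetical names, the `Stat` fields are a small subset chosen for illustration, and `MAX_DATA_BYTES` assumes the roughly 1 MB per-znode limit mentioned above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Assumed cap of about 1 MB, mirroring ZooKeeper's default per-znode data limit.
MAX_DATA_BYTES = 1024 * 1024


@dataclass
class Stat:
    version: int = 0        # incremented on every data change
    cversion: int = 0       # incremented on every change to the children
    data_length: int = 0
    num_children: int = 0


@dataclass
class ZNode:
    data: bytes = b""
    stat: Stat = field(default_factory=Stat)
    children: Dict[str, "ZNode"] = field(default_factory=dict)


class ZTree:
    """Hypothetical in-memory znode tree: every node holds data *and* children."""

    def __init__(self) -> None:
        self.root = ZNode()

    @staticmethod
    def _split(path: str) -> List[str]:
        if not path.startswith("/"):
            raise ValueError("znode paths must be absolute")
        return [part for part in path.split("/") if part]

    def _lookup(self, parts: List[str]) -> ZNode:
        node = self.root
        for name in parts:
            node = node.children[name]  # KeyError if any component is missing
        return node

    def create(self, path: str, data: bytes = b"") -> None:
        if len(data) > MAX_DATA_BYTES:
            raise ValueError("data exceeds the znode size limit")
        parts = self._split(path)
        if not parts:
            raise ValueError("the root znode already exists")
        parent = self._lookup(parts[:-1])  # the parent must already exist
        if parts[-1] in parent.children:
            raise FileExistsError(path)
        parent.children[parts[-1]] = ZNode(data, Stat(data_length=len(data)))
        parent.stat.cversion += 1
        parent.stat.num_children = len(parent.children)

    def get(self, path: str) -> Tuple[bytes, Stat]:
        node = self._lookup(self._split(path))
        return node.data, node.stat


tree = ZTree()
tree.create("/app")                         # a node with no data but with children
tree.create("/app/config", b"timeout=30")   # a child that carries data
data, stat = tree.get("/app/config")
```

Keeping per-znode data small is deliberate: znodes are meant for coordination metadata such as configuration, status, and locks, not bulk storage.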

Znode types
-Persistent
-Ephemeral
-Sequential
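A persistent znode stays until it is explicitly deleted; an ephemeral znode is removed automatically when the session of the client that created it ends; a sequential znode gets a 10-digit, zero-padded, increasing counter appended to its name. Combining ephemeral and sequential gives the well-known lock pattern: the lowest sequence number holds the lock, and if that client dies its node vanishes and the next one takes over. The sketch below models one parent's children in memory. `ParentZNode`, integer session ids, and a counter starting at 0 are assumptions for illustration, not the ZooKeeper API.

```python
import itertools
from typing import Dict, Optional


class ParentZNode:
    """Hypothetical model of the children of one parent znode, covering the three create modes."""

    def __init__(self) -> None:
        # child name -> id of the owning session, or None for a persistent node
        self.nodes: Dict[str, Optional[int]] = {}
        self._seq = itertools.count()  # per-parent counter; starts at 0 in this model

    def create(self, name: str, session: Optional[int] = None,
               ephemeral: bool = False, sequential: bool = False) -> str:
        if ephemeral and session is None:
            raise ValueError("an ephemeral node needs an owning session")
        if sequential:
            # Append a 10-digit, zero-padded, monotonically increasing suffix.
            name = f"{name}{next(self._seq):010d}"
        if name in self.nodes:
            raise FileExistsError(name)
        self.nodes[name] = session if ephemeral else None
        return name

    def close_session(self, session: int) -> None:
        # Ephemeral nodes owned by this session disappear; persistent nodes
        # have owner None, so they never match an integer session id.
        for name in [n for n, owner in self.nodes.items() if owner == session]:
            del self.nodes[name]


app = ParentZNode()
app.create("config")  # persistent: stays until explicitly deleted
first = app.create("lock-", session=1, ephemeral=True, sequential=True)
second = app.create("lock-", session=2, ephemeral=True, sequential=True)
holder_before = min(n for n in app.nodes if n.startswith("lock-"))
app.close_session(1)  # client 1 crashes or disconnects
holder_after = min(n for n in app.nodes if n.startswith("lock-"))
```

Zero-padding matters here: it makes plain string comparison (`min`) agree with numeric order of the sequence numbers.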

