Thursday, August 14, 2014

Big Data: Introduction to HBase

So far we have covered the following topics in this Big Data series:

·         Big Data- The Rise and the Future

·         Big data: Technology Stack

·         Big Data: Hadoop Distributed Filesystem (HDFS)

·         Big Data: Map Reduce

·         Big Data- Installing Hadoop ( Single Node)

·         Big Data- Apache Hadoop Multi Node

·         Big Data: Troubleshooting, Administering and optimizing Hadoop

·         Big Data: Managing HDFS

·         Big Data: Map Reduce Development

·         Big Data: Introduction to Pig

In this post we will discuss HBase.

What is HBase? It is a distributed, column-oriented database built on top of Hadoop/HDFS. Those coming from a relational database background will know that databases can be row-oriented or column-oriented. HBase is a column-oriented database (more precisely, it groups data by column family).

Row-oriented                               vs   Column-oriented
-OLTP                                           -OLAP
-Single-row reads and writes                    -Aggregation over many rows and columns
-Smaller numbers of rows and columns            -High compression rates due to few distinct values
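
To make the row vs column comparison concrete, here is a small sketch that lays out the same records both ways. It is a general illustration of the two storage layouts, not HBase's on-disk format; the sales records and the helper names (get_row, total_amount, run_length_encode) are hypothetical.

```python
# Hypothetical sample data: (id, country, amount) sales records.
rows = [
    (1, "US", 10),
    (2, "US", 25),
    (3, "DE", 5),
    (4, "US", 40),
    (5, "DE", 20),
]

# Row-oriented layout: each record is kept together (suits OLTP,
# where a query reads or writes one whole row at a time).
row_store = list(rows)

# Column-oriented layout: each column is kept together (suits OLAP,
# where a query aggregates a few columns across many rows).
column_store = {
    "id": [r[0] for r in rows],
    "country": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}


def get_row(store, row_id):
    """OLTP-style point lookup in the row store: returns one whole record."""
    for record in store:
        if record[0] == row_id:
            return record
    return None


def total_amount(columns):
    """OLAP-style aggregate: touches only the 'amount' column."""
    return sum(columns["amount"])


def run_length_encode(values):
    """Run-length encoding: a column with few distinct values compresses well."""
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1] = (v, encoded[-1][1] + 1)
        else:
            encoded.append((v, 1))
    return encoded
```

For example, run_length_encode(sorted(column_store["country"])) collapses five values into two (value, count) pairs, which is the intuition behind the "high compression rates" line above.
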
At a higher level, HBase vs RDBMS:
HBase                                           vs   RDBMS
-Schema-less (only column families are fixed)        -Fixed schema
-Wide tables                                         -Thin tables
-Denormalised                                        -Normalised
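
One way to picture "schema-less, wide and denormalised" is HBase's data model as a nested map: row key, then column family:qualifier, then timestamp, then value. The class below is a toy in-memory sketch of that idea, assuming a hypothetical SparseTable with put/get/columns methods; it is not the HBase client API.

```python
from collections import defaultdict


class SparseTable:
    """Toy HBase-style table: row key -> "family:qualifier" -> {timestamp: value}.

    Only the column families are declared up front; qualifiers can differ
    from row to row, so rows can be wide and sparse.
    """

    def __init__(self, families):
        self.families = set(families)
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, qualifier, value, ts):
        # Reject families that were not declared when the table was created.
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][f"{family}:{qualifier}"][ts] = value

    def get(self, row_key, family, qualifier):
        """Return the newest version of a cell, or None if the cell is absent."""
        versions = self.rows.get(row_key, {}).get(f"{family}:{qualifier}")
        if not versions:
            return None
        return versions[max(versions)]

    def columns(self, row_key):
        """List the columns that actually exist in one row."""
        return sorted(self.rows.get(row_key, {}))
```

Usage: after t = SparseTable(["info", "orders"]), a user's orders can be stored as extra columns in the user's own row (for example t.put("user1", "orders", "2014-08-01", "book", ts=1)) instead of in a separate table joined at query time, which is what "denormalised" means here.
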

What about the differences between HBase and HDFS?
HBase - Low-latency access to single rows out of billions of records
HDFS - High-latency batch processing; no random writes (files cannot be updated in place), and not built for low-latency random reads
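
As a rough in-memory analogy for that difference: HBase keeps rows sorted by row key, so a single row can be found quickly, while a batch job over plain files reads everything sequentially. The sketch below is only an analogy under that assumption (the dataset and the point_get/batch_scan names are hypothetical); it does not model how HBase or HDFS work internally.

```python
import bisect

# Hypothetical dataset: row keys kept in sorted order (as HBase keeps rows
# sorted by row key), with a parallel list holding each row's value.
keys = [f"row{i:07d}" for i in range(100_000)]
values = [i * 2 for i in range(100_000)]


def point_get(row_key):
    """Random read of one row: O(log n) binary search over the sorted keys."""
    i = bisect.bisect_left(keys, row_key)
    if i < len(keys) and keys[i] == row_key:
        return values[i]
    return None


def batch_scan(predicate):
    """Batch-style processing: visit every record sequentially."""
    return [v for k, v in zip(keys, values) if predicate(k)]
```
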


Here we have another master/slave architecture: a master server and region servers, where the region servers are the slaves. The master server plays a role comparable to the NameNode in HDFS. It manages and monitors HBase cluster operations, assigns regions to region servers, and handles load balancing and region splits.
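
A region is a contiguous, sorted range of row keys. The sketch below shows the idea of locating the region for a row key, spreading regions over region servers, and splitting a region in two. Everything here is hypothetical (split points, server names, and the round-robin policy); in real HBase, clients locate regions through the hbase:meta catalog table, and the master's balancer is considerably more involved.

```python
import bisect
from itertools import cycle

# Hypothetical start keys: region i covers [start_keys[i], start_keys[i + 1]),
# and the first region starts at the empty key "".
start_keys = ["", "g", "n", "t"]
region_servers = ["rs1", "rs2", "rs3"]


def locate_region(keys, row_key):
    """Return the index of the region whose key range contains row_key."""
    return bisect.bisect_right(keys, row_key) - 1


def assign_regions(num_regions, servers):
    """Toy master: spread regions over region servers round-robin."""
    return {region: server for region, server in zip(range(num_regions), cycle(servers))}


def split_region(keys, region, split_key):
    """Split one region in two by adding split_key as a new start key."""
    end = keys[region + 1] if region + 1 < len(keys) else None
    if not (keys[region] < split_key and (end is None or split_key < end)):
        raise ValueError("split key must fall strictly inside the region's range")
    return keys[: region + 1] + [split_key] + keys[region + 1:]
```

Usage: locate_region(start_keys, "apple") gives region 0, and split_region(start_keys, 0, "c") returns ["", "c", "g", "n", "t"], after which rows from "c" up to "g" belong to the new second region.
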

ZooKeeper also plays a major role in HBase, which we will discuss later. If we look inside a region server in more detail:

At the very top level we see the table, and a table is divided into regions. Underneath each region is a store (one per column family). Each store contains a MemStore and store files (HFiles). The MemStore is where incoming data lands first; it is later flushed to a store file.
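
The MemStore-then-flush flow can be sketched as follows. This is a toy model under stated assumptions: the Store class and its flush_threshold (a cell count) are hypothetical, whereas real HBase flushes based on MemStore size in bytes, writes HFiles to HDFS, and also records each edit in a write-ahead log, which this sketch leaves out.

```python
class Store:
    """Toy store: a mutable MemStore plus immutable, sorted store files."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold  # hypothetical cell-count limit
        self.memstore = {}                      # in memory: row key -> value
        self.store_files = []                   # each flush adds one sorted tuple of (key, value)

    def put(self, key, value):
        # New data always lands in the MemStore first.
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the MemStore out as a new sorted, immutable store file, then clear it."""
        if self.memstore:
            self.store_files.append(tuple(sorted(self.memstore.items())))
            self.memstore = {}

    def get(self, key):
        """Check the MemStore first, then store files from newest to oldest."""
        if key in self.memstore:
            return self.memstore[key]
        for store_file in reversed(self.store_files):
            for k, v in store_file:
                if k == key:
                    return v
        return None
```

Usage: with s = Store(flush_threshold=2), putting "a" and "b" triggers a flush into the first store file; a later put of "a" sits in the MemStore, so s.get("a") returns the newer value even though an older one is still on "disk".
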

So this, at a high level, describes the HBase architecture.
