So far we have covered several topics in Big Data.
In this blog we will discuss HBase.
What is HBase? It is:
-A distributed, column-oriented database on top of Hadoop/HDFS: those coming from relational databases will know that there are row-oriented databases and column-oriented databases. HBase is a column-oriented database.
Row oriented vs Column oriented
Row oriented:
-Suited to single-row access
-Smaller number of columns and rows
Column oriented:
-Aggregation over many rows and columns
-High compression rates due to few distinct values
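To make the row vs column difference concrete, here is a small Python sketch. It is only a toy model of the two layouts, not how HBase stores bytes on disk. The records, the field names and the `run_length_encode` helper are all made up for illustration.

```python
# Toy data: three records with the same fields (values are invented).
records = [
    {"id": 1, "city": "Pune", "amount": 10},
    {"id": 2, "city": "Pune", "amount": 20},
    {"id": 3, "city": "Delhi", "amount": 30},
]

# Row-oriented layout: each record is kept together,
# so fetching one complete row is a single lookup.
row_store = {r["id"]: r for r in records}

# Column-oriented layout: each column is kept together,
# so an aggregate only has to read the one column it needs.
column_store = {
    "id": [r["id"] for r in records],
    "city": [r["city"] for r in records],
    "amount": [r["amount"] for r in records],
}


def run_length_encode(values):
    """Collapse consecutive repeats into [value, count] pairs."""
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1][1] += 1
        else:
            encoded.append([v, 1])
    return encoded


single_row = row_store[2]                       # row-style access
total_amount = sum(column_store["amount"])      # column-style aggregation
compressed_city = run_length_encode(column_store["city"])
```

Because a column holds values of one kind, repeats sit next to each other, and even this naive encoding shrinks the "city" column. That is the idea behind the high compression rates mentioned above.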
Let's talk at a slightly higher level: HBase vs RDBMS
HBase - RDBMS
-Schema-less - Fixed schema
-Wide - Thin
-Denormalised - Normalised
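The "schema-less" and "wide" points can be pictured with plain Python dictionaries. This is a sketch under assumptions, not the HBase client API. The table contents, the "info" column family and the `put` and `get` helpers are hypothetical names chosen for the example.

```python
# Hypothetical table: row key -> {"family:qualifier": value}.
# Each row carries only the columns it actually has.
hbase_like_table = {
    "user1": {"info:name": "Asha", "info:email": "asha@example.com"},
    "user2": {"info:name": "Ravi", "info:phone": "555-0100", "info:city": "Pune"},
}


def put(table, row_key, column, value):
    """Add or overwrite one cell; a new column needs no schema change."""
    table.setdefault(row_key, {})[column] = value


def get(table, row_key, column):
    """Return the cell value, or None when the row has no such column."""
    return table.get(row_key, {}).get(column)


# A brand-new column appears on one row only; no ALTER TABLE step is involved.
put(hbase_like_table, "user1", "info:twitter", "@asha")
```

In an RDBMS every row of the table would need a `twitter` column (mostly NULL) after a schema change. Here the other rows are simply unaffected.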
How about the differences between HBase and HDFS?
HBase: low-latency access to single rows from billions of records
HDFS: high-latency batch processing, with no concept of random reads/writes
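The two access patterns can be contrasted with a rough sketch. It assumes rows are kept sorted by row key, so one row can be found without reading everything, while a batch job reads the whole data set. The keys, the values and both function names are invented for the example, and nothing here talks to a real cluster.

```python
import bisect

# Invented data set: 10,000 rows, sorted by row key.
row_keys = [f"row{n:05d}" for n in range(10000)]
values = [n * 2 for n in range(10000)]


def random_read(key):
    """HBase-style access: binary search straight to one row."""
    i = bisect.bisect_left(row_keys, key)
    if i < len(row_keys) and row_keys[i] == key:
        return values[i]
    return None


def batch_scan_sum():
    """HDFS-style batch processing: read every record from start to end."""
    return sum(values)
```

`random_read("row00042")` touches only a handful of keys, while `batch_scan_sum()` has to walk all 10,000 values. That is the low latency vs high latency trade-off in miniature.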
We have another master-slave architecture here. There is a master server and there are region servers; the region servers are the slaves. The master server is comparable to the NameNode in HDFS. It manages and monitors HBase cluster operations, assigns regions to region servers, and takes care of load balancing and splitting.
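As a rough picture of what "assigning regions to region servers" with load balancing could look like, here is a minimal sketch. It uses a simple least-loaded policy, which is an assumption made for illustration; the real HBase balancer is more sophisticated. The `assign_regions` function and the region and server names are hypothetical.

```python
def assign_regions(regions, servers):
    """Give each region to the server that currently holds the fewest regions.

    Ties go to the server listed first.
    """
    assignment = {server: [] for server in servers}
    for region in regions:
        target = min(servers, key=lambda s: len(assignment[s]))
        assignment[target].append(region)
    return assignment


regions = ["region-a", "region-b", "region-c", "region-d", "region-e"]
servers = ["rs1", "rs2"]
assignment = assign_regions(regions, servers)
```

With five regions and two servers, no server ends up with more than one region above the other, which is the basic goal of balancing.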
ZooKeeper also plays a major role in HBase, which we will discuss later. If you look inside a region server in detail, at the very top level we see the table, and under the table we see regions. Underneath each region is a store. In the store there are a memstore and store files. The memstore is where incoming data lands first; it is later flushed to a store file.
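The memstore-to-store-file flow can be sketched in a few lines of Python. This is a toy model, not HBase code. The `Store` class, its methods and the `flush_threshold` parameter are hypothetical, and the sketch flushes after a fixed number of cells, whereas real HBase flushes based on memory size.

```python
class Store:
    """Toy store: writes land in a memstore and are flushed to store files."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold  # hypothetical: flush after N cells
        self.memstore = {}      # in-memory buffer for incoming writes
        self.store_files = []   # each flush appends one sorted list of (key, value)

    def put(self, key, value):
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the memstore out as a new sorted store file and empty it."""
        if self.memstore:
            self.store_files.append(sorted(self.memstore.items()))
            self.memstore = {}

    def get(self, key):
        """Check the memstore first, then store files from newest to oldest."""
        if key in self.memstore:
            return self.memstore[key]
        for store_file in reversed(self.store_files):
            for k, v in store_file:
                if k == key:
                    return v
        return None


store = Store(flush_threshold=3)
for key, value in [("r3", "c"), ("r1", "a"), ("r2", "b"), ("r4", "d")]:
    store.put(key, value)
```

After these four puts, the first three cells sit in one sorted store file and only "r4" is still in the memstore. Reads check the memstore first because it holds the most recent data.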
So this describes the HBase architecture.