
Big Data: Introduction to HBase

So far we have covered the following topics in Big Data:

·         Big Data- The Rise and the Future

·         Big data: Technology Stack

·         Big Data: Hadoop Distributed Filesystem (HDFS)

·         Big Data: Map Reduce

·         Big Data- Installing Hadoop ( Single Node)

·         Big Data- Apache Hadoop Multi Node

·         Big Data: Troubleshooting, Administering and optimizing Hadoop

·         Big Data: Managing HDFS

·         Big Data: Map Reduce Development

·         Big Data: Introduction to Pig

In this post we will discuss HBase.

What is HBase?
- A distributed, column-oriented database on top of Hadoop/HDFS: Readers coming from a relational database background will know that there are row-oriented and column-oriented databases. HBase is a column-oriented database.

Row-oriented                            vs    Column-oriented
- OLTP                                        - OLAP
- Works on a single row                       - Aggregation over many rows and columns
- Smaller number of columns and rows          - High compression rates due to few distinct values
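
To make the row vs column comparison concrete, here is a minimal Python sketch. It is not HBase code: the records, field names, and the run_length_encode helper are all made up for illustration. It only shows why a single-record lookup suits a row layout, while an aggregate and compression suit a column layout.

```python
# Toy data: three records with the same three fields (hypothetical values).
rows = [
    {"id": 1, "city": "Pune", "amount": 120},
    {"id": 2, "city": "Pune", "amount": 80},
    {"id": 3, "city": "Delhi", "amount": 200},
]

# Row-oriented layout: each record is kept together,
# so fetching one whole record is a single lookup (OLTP style).
row_store = {r["id"]: r for r in rows}

# Column-oriented layout: each column is kept together,
# so an aggregate reads only the column it needs (OLAP style).
column_store = {
    "id": [r["id"] for r in rows],
    "city": [r["city"] for r in rows],
    "amount": [r["amount"] for r in rows],
}


def run_length_encode(values):
    """Collapse runs of repeated values into [value, count] pairs."""
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1][1] += 1
        else:
            encoded.append([v, 1])
    return encoded


single_record = row_store[2]                      # one row, all its fields
total_amount = sum(column_store["amount"])        # one column, all rows
compressed_city = run_length_encode(column_store["city"])
```

The city column has few distinct values, so its runs collapse well, which is the "high compression" point in the comparison above.
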
At a high level, here is how HBase compares with an RDBMS:

HBase                   vs    RDBMS
- Schema-less                 - Schema
- Wide                        - Thin
- Denormalised                - Normalised
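
The "schema-less" and "wide" points can be pictured with a small Python sketch. This is a toy in-memory model, not the HBase client API: the put and get helpers, the row keys, and the column names are assumptions for illustration. One detail worth knowing is that in real HBase the column families (the part before the colon) are declared when the table is created, while the qualifiers after the colon can differ from row to row.

```python
from collections import defaultdict

# table[row_key][column] = value; columns are "family:qualifier" strings.
table = defaultdict(dict)


def put(row_key, column, value):
    """Store one cell. No column has to be declared in advance."""
    table[row_key][column] = value


def get(row_key):
    """Return a copy of all cells stored for a row (empty if unknown)."""
    return dict(table.get(row_key, {}))


put("user1", "info:name", "Asha")
put("user1", "info:email", "asha@example.com")
put("user2", "info:name", "Ravi")
put("user2", "orders:2014-01-05", "book")  # a column user1 does not have
```

Each row carries only the columns it actually has, so a table can grow very wide without every row paying for every column.
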

How about the differences between HBase and HDFS?
HBase - Low-latency access to single rows from billions of records
HDFS - High-latency batch processing, with no concept of random reads/writes


Here we have another master-slave architecture: a master server and region servers, where the region servers are the slaves. The master server is comparable to the NameNode in HDFS. It manages and monitors HBase cluster operations, assigns regions to region servers, and handles load balancing and splitting.
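
The assignment and splitting duties described above can be sketched in a few lines of Python. This is only a toy model under simple assumptions: the assign_regions and split_region functions, the region names, and the "fewest regions wins" rule are hypothetical, and HBase's real balancer is more involved.

```python
def assign_regions(regions, servers):
    """Assign each region to the server currently holding the fewest regions."""
    assignment = {server: [] for server in servers}
    for region in regions:
        # On a tie, min() keeps the first server in the list.
        least_loaded = min(servers, key=lambda s: len(assignment[s]))
        assignment[least_loaded].append(region)
    return assignment


def split_region(start_key, end_key, split_key):
    """Split one key range [start, end) into two adjacent ranges."""
    if not (start_key < split_key < end_key):
        raise ValueError("split key must fall inside the region")
    return (start_key, split_key), (split_key, end_key)


regions = ["region-a", "region-b", "region-c", "region-d", "region-e"]
servers = ["rs1", "rs2"]
assignment = assign_regions(regions, servers)

# A region covering row keys "a" up to "m" is split at "g".
left, right = split_region("a", "m", "g")
```

The idea to take away is that a region is just a contiguous range of row keys, so balancing means moving ranges between servers and splitting means cutting one range in two.
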

ZooKeeper also plays a major role in HBase, which we will discuss later. If we look inside a region server in detail, at the very top level we see the table, and under the table we see regions. Underneath each region is a store. In the store there are a memstore and store files. The memstore is where incoming data lands first; it is later flushed to a store file.
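
The memstore-to-store-file flow can be sketched as a toy Python class. This is not how HBase is implemented: the Store class and its flush_threshold (counted in entries here) are assumptions for illustration, whereas real HBase flushes based on memstore size and writes its store files to HDFS.

```python
class Store:
    """Toy store: an in-memory buffer plus a list of immutable flushed files."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold  # hypothetical limit, in entries
        self.memstore = {}     # recent writes, held in memory
        self.store_files = []  # each file is a sorted list of (key, value)

    def put(self, key, value):
        """Writes always land in the memstore first."""
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the memstore out as one sorted store file, then empty it."""
        if self.memstore:
            self.store_files.append(sorted(self.memstore.items()))
            self.memstore = {}

    def get(self, key):
        """Check the memstore first, then store files from newest to oldest."""
        if key in self.memstore:
            return self.memstore[key]
        for store_file in reversed(self.store_files):
            for k, v in store_file:
                if k == key:
                    return v
        return None


store = Store(flush_threshold=3)
for key, value in [("row3", "c"), ("row1", "a"), ("row2", "b"), ("row4", "d")]:
    store.put(key, value)
```

After these four writes, the first three rows sit sorted in one store file and the fourth is still waiting in the memstore, yet a read finds either one.
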

So this describes the HBase architecture.

