Sunday, August 31, 2014

Cloud Computing and Big Data: Threat to IT Jobs and Over Hyped?

There are two disruptive technologies which have gained significant traction in the world economy: Cloud Computing and Big Data.

Let's discuss some opportunities, threats and realistic scenarios for these technologies.

Cloud Computing:

The world is seeing a significant transformation in business with the arrival of the cloud. In this blog, our focus is on the public cloud.

More and more enterprises, large and small, are moving to public clouds hosted by Amazon, Microsoft and HP. Software as a Service (SaaS) is the hottest service available. Software vendors of all sizes have started hosting their software on the public cloud and offering it as a service. This frees organizations from the headache of purchasing hardware and software.
Pay as you go makes a lot of sense to an organization. Transitioning to a service in the cloud is also faster than standing it up in your own data centers.

Since multiple vendors are hosting their software on the cloud and offering it as a service, they are also taking ownership of the availability and management of that service. This affects IT jobs in operations. As more and more servers are built in the cloud, fewer jobs remain for operations teams, so many jobs will be lost. It also reduces costs for the enterprise. So the managed services model of the IT companies is dying. Don't be surprised if the managed services model is completely dead in 5 years.

Now there is an interesting issue here. The challenge for an enterprise will be managing multiple vendors. There may come a point when there are so many software vendors to manage that the enterprise starts bringing its workloads back from the cloud to its own data center. It's a possibility. Circle of life. Let's wait and watch.

Big Data

Let me start with the statement that Big Data technologies are awesome and their potential for the enterprise is enormous.
But when we talk about the business opportunity in big data solutions from an IT perspective, I think it's over hyped.

Let me explain why.

It would be an awesome opportunity if you had to stand up multi-node clusters for big data. But this is not true anymore, thanks to Amazon and Microsoft Azure. I don't need to set up an expensive cluster to get a big data solution for my enterprise. I can just go to Amazon EMR or Microsoft Azure HDInsight, spin up an instance and get my big data solution for a few hundred dollars.

So for IT companies, Big Data won't make much money either.

Thursday, August 28, 2014

Amazon Web Services: AWS Elasticity: Understanding and Using Bootstrapping

In this blog we will discuss AWS elasticity: understanding and using bootstrapping.

-The process of automatically setting up your instances
   -Install Windows Updates
   -Create Custom DNS names
   -Register with an ELB
   -Mount additional drives
   -Start specific services
   -Copy files from S3
   -Install latest version of the software
   -Modify firewall configuration

Bootstrapping tools and tricks
-cloud-init is installed on Linux AMIs by default
-EC2Config (a Windows service) is installed on Windows AMIs
   -Windows activation
   -Hostname changes
   ...and more
-Scripting (bash, PowerShell) or config management (Chef, Puppet)
-Dynamic information

Understanding User data
-Limited to 16 KB in size
-Can be set through the AWS Management Console or the AWS command line tools
-Linux: embed scripts starting with #! (cloud-init)
-Windows: embed scripts within <script> tags (EC2Config)
-Scripts can also interact with AWS services using the EC2 command line tools
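As a rough illustration of the user data points above, the Python sketch below assembles a Linux user data script and checks it against the 16 KB limit. The function name `build_user_data`, the bucket name and the example commands are assumptions made for this example; only the `#!` first line and the size limit come from the notes above.

```python
USER_DATA_LIMIT_BYTES = 16 * 1024  # user data is limited to 16 KB


def build_user_data(commands):
    """Join shell commands into a script meant to run at first boot."""
    # User data that starts with "#!" is treated as a script by cloud-init.
    script = "#!/bin/bash\n" + "\n".join(commands) + "\n"
    size = len(script.encode("utf-8"))
    if size > USER_DATA_LIMIT_BYTES:
        raise ValueError(f"user data is {size} bytes, limit is {USER_DATA_LIMIT_BYTES}")
    return script


# Hypothetical bootstrap steps: update packages, copy files from S3, start a service.
example = build_user_data([
    "yum -y update",
    "aws s3 cp s3://example-bucket/app.tar.gz /opt/app.tar.gz",
    "service httpd start",
])
print(example)
```

The resulting string is what you would paste into the user data field in the console or pass to the command line tools when launching the instance.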

Bootstrapping Best practices and Principles

-Creating a good bootstrapping AMI is great...but there's more
-As dynamic as possible: discovery or registration of instances
-As dynamic as possible: graceful departures
-Example: MySQL replica failure (died, poor performance...)
   -Failure detection (ping, slow slave status...)
   -Smooth addition (duplicate source DB instance ID, use an API, manual)
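The failure detection step in the MySQL replica example can be sketched in a few lines of Python. This is only a sketch of the decision logic: the `ReplicaStatus` type, the `should_replace` function and the 300 second lag threshold are assumptions for illustration, and gathering the real ping and replication status is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReplicaStatus:
    reachable: bool                # did the replica answer a ping?
    seconds_behind: Optional[int]  # replication lag; None if replication stopped


def should_replace(status: ReplicaStatus, max_lag_seconds: int = 300) -> bool:
    """Return True when the replica should leave the pool and be replaced."""
    if not status.reachable:
        return True  # died
    if status.seconds_behind is None:
        return True  # replication is not running
    return status.seconds_behind > max_lag_seconds  # poor performance


print(should_replace(ReplicaStatus(True, 12)))     # False
print(should_replace(ReplicaStatus(True, 900)))    # True
print(should_replace(ReplicaStatus(False, None)))  # True
```

When the check returns True, the graceful departure is to take the replica out of rotation first and only then bring up its replacement.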

Tuesday, August 26, 2014

Amazon Web Services: AWS Elasticity: Principles of Elasticity

Traditional/AWS Architecture
-Traditional architecture is based on peak load

-AWS architecture provisions resources on demand / as needed
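To make the difference concrete, here is a small Python sketch that compares instance-hours under the two approaches. The hourly load figures and the capacity of 100 requests per instance are made-up numbers for illustration only.

```python
import math


def instance_hours_peak(hourly_load, capacity_per_instance):
    """Traditional: size for the peak hour and keep that many servers all day."""
    peak = max(hourly_load)
    return math.ceil(peak / capacity_per_instance) * len(hourly_load)


def instance_hours_on_demand(hourly_load, capacity_per_instance):
    """On demand: each hour, run only what that hour needs (at least one)."""
    return sum(max(1, math.ceil(load / capacity_per_instance)) for load in hourly_load)


# Hypothetical requests per hour over six hours.
load = [50, 80, 400, 900, 300, 60]
print(instance_hours_peak(load, 100))       # 54
print(instance_hours_on_demand(load, 100))  # 19
```

With these numbers the peak-based design pays for 54 instance-hours while the on-demand design pays for 19.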

Elasticity Principle #1: Automate
-Manual intervention = failure to meet demand
-Elastic scaling = automation
-Preconfigured AMI
-Bootstrapping principles
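A minimal Python sketch of what "elastic scaling = automation" can look like as a rule: a function decides the instance count from a metric, with no human in the loop. The function name `desired_capacity` and the CPU thresholds are assumptions for this example, not AWS defaults.

```python
def desired_capacity(current, avg_cpu_percent, minimum=1, maximum=10,
                     scale_out_above=70.0, scale_in_below=30.0):
    """Return the instance count to run next, based on average CPU."""
    if avg_cpu_percent > scale_out_above:
        target = current + 1   # busy: add an instance
    elif avg_cpu_percent < scale_in_below:
        target = current - 1   # idle: remove an instance
    else:
        target = current       # within the comfortable band
    # Never go below the minimum or above the maximum.
    return max(minimum, min(maximum, target))


print(desired_capacity(2, 85.0))  # 3
print(desired_capacity(2, 10.0))  # 1
print(desired_capacity(4, 50.0))  # 4
```

New instances added by a rule like this come from the preconfigured AMI and finish their setup through bootstrapping.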

Elasticity Principle #2: Loose Coupling
-Tightly coupled applications: apps where one component depends on another specific, single unit (creates failure or poor performance)
-Loosely coupled design uses AWS services or a horizontally scaled design. You will need to recode your application.

Elasticity Principle #3: Staying stateless
-Avoid stateful applications that store state information on a single instance
-Instead, store state information in a speedy, redundant, shared location (e.g. DynamoDB)
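The idea can be sketched in Python with a plain in-memory class standing in for the shared store. `SharedStore` and `WebInstance` are hypothetical names for this example; in a real deployment the store would be a service such as DynamoDB rather than a dict.

```python
class SharedStore:
    """Stand-in for a shared, redundant store (a plain dict in this sketch)."""

    def __init__(self):
        self._items = {}

    def get(self, key, default=None):
        return self._items.get(key, default)

    def put(self, key, value):
        self._items[key] = value


class WebInstance:
    """A stateless instance: it keeps no session data of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def add_to_cart(self, session_id, item):
        cart = self.store.get(session_id, [])
        self.store.put(session_id, cart + [item])

    def view_cart(self, session_id):
        return self.store.get(session_id, [])


store = SharedStore()
a, b = WebInstance("a", store), WebInstance("b", store)
a.add_to_cart("session-1", "book")  # first request lands on instance a
b.add_to_cart("session-1", "pen")   # next request lands on instance b
print(b.view_cart("session-1"))     # ['book', 'pen']
```

Because neither instance holds the cart itself, either one can be terminated or replaced without losing the session.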

Elasticity Principle #4: Horizontal scaling
-Large instances will eventually hit a limit (CPU, memory etc.)
-A horizontally scaled design can keep growing by adding more instances

Monday, August 25, 2014

Amazon Web Services: Global Best Practices

I will be writing a series of blogs on the Amazon cloud. This blog is about global best practices for the public cloud, as created by Amazon.

Why should you use the AWS cloud?
-Service based vs Resource based
-Instant Delivery
-Non-Commit architecture
-Predictable, pay as you go cost model
-Proven, expert design

Best Practice #1: Redundant Design
-No Single point of failure
-Many AWS services assume redundancy

Best Practice #2: Loose coupling

-Loose coupling design principle: each application component is independent
-Use integration with AWS services instead of “hard ties”
-Elastic Load Balancer (ELB)
-Simple Notification Service (SNS)
-Simple Queue Service (SQS)
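As a small illustration of replacing a "hard tie" with a queue, the Python sketch below lets a web tier hand work to a worker tier without either side calling the other directly. The standard library `queue.Queue` is only a local stand-in for a hosted queue such as SQS, and the function names are assumptions for this example.

```python
import queue

orders = queue.Queue()  # local stand-in for a hosted message queue


def web_tier_submit(order_id):
    """The web tier only knows about the queue, not about any worker."""
    orders.put({"order_id": order_id})


def worker_drain():
    """A worker pulls whatever is waiting; any worker instance could do this."""
    processed = []
    while True:
        try:
            message = orders.get_nowait()
        except queue.Empty:
            break
        processed.append(message["order_id"])
    return processed


for order_id in (101, 102, 103):
    web_tier_submit(order_id)
print(worker_drain())  # [101, 102, 103]
```

If a worker dies, the messages simply wait in the queue for the next worker, which is the independence the loose coupling principle asks for.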

Best Practice #3: Elasticity
-Grow or shrink architecture on demand
-Instances are “drones” reporting for duty: tasked during provision
-Bootstrapping/dynamic configuration
-Health monitoring: put no faith in any single system

Best Practice #4: Think Security
-AWS is secure…until they invited us! Shared Responsibility
-Use Iron wall security principles rather than fish net principles
-Encrypt everything you can
-Use MFA (multi factor authentication) when possible

Best Practice #5: Things work better in parallel
-New paradigm...IT resource limitations removed
-Farming analogy: one plow for five hours, or five plows for one hour?
-Use parallel processing (with elastic principles) to accomplish goals faster
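The plow analogy maps directly onto a worker pool. The Python sketch below runs five independent tasks through five workers using the standard library. The `plow` function is a placeholder for real work, and the example only shows the structure: it makes no claim about measured speed.

```python
from concurrent.futures import ThreadPoolExecutor


def plow(field):
    """Placeholder for one independent unit of work."""
    return f"{field} plowed"


fields = ["field-1", "field-2", "field-3", "field-4", "field-5"]

# One plow: the fields are handled one after another.
sequential = [plow(field) for field in fields]

# Five plows: each field is handed to its own worker.
with ThreadPoolExecutor(max_workers=5) as pool:
    parallel = list(pool.map(plow, fields))

print(parallel == sequential)  # True
```

The results are identical either way. What changes in the cloud is that adding the extra "plows" for an hour is no longer limited by the hardware you own.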

Best Practice #6: Use Multiple storage types
-Multiple AWS offerings for multiple scenarios:
   -S3: object storage
   -CloudFront: edge caching (CDN)
   -EBS: block storage
   -RDS: relational database
   -DynamoDB: NoSQL database

Friday, August 22, 2014

Big Data: Microsoft HDInsight

So far we have covered the following topics in Big Data

In this blog we will discuss Microsoft HDInsight.

HDInsight Overview

-Microsoft Hadoop distribution
-Local and cloud Hadoop support
-Local: HDInsight Server on HDP (Hortonworks Data Platform)
-Cloud: HDInsight on Azure service

HDInsight advantages:
-Full integration with MS BI stack
-SharePoint (Power View)
-Office (Excel)

Let's go to Windows Azure.

Here you can create a free trial account. I already have an account.

HDInsight is not fully available in Windows Azure. To enable it, you need to go to PREVIEW FEATURES and enable it.

That's it. All you need to do now is go to the dashboard.

Thursday, August 21, 2014


So far we have covered the following topics in Big Data

·         Big Data- The Rise and the Future

·         Big data: Technology Stack

·         Big Data: Hadoop Distributed Filesystem (HDFS)

·         Big Data: Map Reduce

·         Big Data- Installing Hadoop ( Single Node)

·         Big Data- Apache Hadoop Multi Node

·         Big Data: Troubleshooting, Administering and optimizing Hadoop

·         Big Data: Managing HDFS

·         Big Data: Map Reduce Development

·         Big Data: Introduction to Pig

In this blog we will discuss Amazon Elastic MapReduce (EMR).


What is EMR?
-Web service on top of AWS that uses EC2 for processing and S3 for storage
-Data is pulled from S3, processed by an auto-configured EC2 cluster, and the results are pushed back to S3
-Crunch your data in the cloud without the hassle of managing your own cluster/infrastructure!
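The pull, process and push flow described above can be mimicked locally in a few lines of Python. This is only a toy: plain dicts stand in for the S3 input and output locations, the word count stands in for whatever job the cluster would run, and the key names are made up for the example. It does not call EMR or S3.

```python
from collections import Counter


def run_job(input_objects):
    """Pull every input object, count words, and return the output objects."""
    counts = Counter()
    for text in input_objects.values():      # "pull" each object from storage
        counts.update(text.lower().split())  # process: a simple word count
    return {"output/part-00000": dict(counts)}  # "push" the result back


input_bucket = {
    "input/a.txt": "big data big cluster",
    "input/b.txt": "data in the cloud",
}
output_bucket = run_job(input_bucket)
print(output_bucket["output/part-00000"]["big"])   # 2
print(output_bucket["output/part-00000"]["data"])  # 2
```

With EMR the same three stages happen, but the cluster in the middle is created and configured for you.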

What is an EMR Job Flow?
-Data processing wizard
-Supports Hive, MapReduce, HBase and Pig

The only thing we need to do is configure an EMR Job Flow. Once it's configured, the rest is very easy. Even configuring the EMR Job Flow is very easy in Amazon.

That's it.

Saturday, August 16, 2014

Big Data: Introduction to Zookeeper

In this blog we will discuss ZooKeeper.

In the world of Hadoop, the theme is distributed computing. What if you want to build your own distributed application?
You have to worry about centralized configuration, synchronization and serialization.

ZooKeeper is a distributed coordination service for distributed applications: a centralized repository.


What is ZooKeeper?
-Distributed coordination service for distributed applications
-Used for synchronization, serialization and coordination
-Handles the 'nitty-gritty' side of distributed app dev
-Apps use these services to coordinate distributed processing

Distributed Challenges:
-Coordination is error prone
-Race conditions, deadlocks, partial failures, inconsistencies

ZooKeeper Goals
-Simple API

Typical Uses
-Configuration
-Message queue

Now let's talk about ZooKeeper architecture.

If we look closely at the architecture, we see:
Znodes
-Container for data and other nodes
-Stores stats and user data (up to 1 MB)

Znode types
-Persistent
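To picture a znode as "a container for data and other nodes", here is a toy in-memory model in Python. It is not the ZooKeeper client API: the `ZNode` class, its method names and the single `version` counter (a stand-in for the stats a znode keeps) are assumptions for illustration. Only the 1 MB data limit comes from the notes above.

```python
MAX_DATA_BYTES = 1024 * 1024  # znode user data is limited to 1 MB


class ZNode:
    """Toy znode: holds a small payload, a version counter and child nodes."""

    def __init__(self, name, data=b""):
        self.name = name
        self.children = {}
        self.version = 0
        self.set_data(data, bump_version=False)

    def set_data(self, data, bump_version=True):
        if len(data) > MAX_DATA_BYTES:
            raise ValueError("znode data is limited to 1 MB")
        self.data = data
        if bump_version:
            self.version += 1  # every update is reflected in the node's stats

    def add_child(self, name, data=b""):
        child = ZNode(name, data)
        self.children[name] = child
        return child


root = ZNode("/")
config = root.add_child("config", b"db=primary")
config.set_data(b"db=replica")
print(sorted(root.children))  # ['config']
print(config.version)         # 1
```

The small size limit is a reminder that znodes are meant for coordination data such as configuration, not for bulk storage.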

Thursday, August 14, 2014

Big Data: Introduction to HBase

In this blog we will discuss HBase.

What is HBase? It is
-A distributed column-oriented database on top of Hadoop/HDFS. Those coming from relational databases will know that there are row-oriented and column-oriented databases. HBase is a column-oriented database.

Row                                    vs    Column
-OLTP                                        -OLAP
-Single row                                  -Aggregation over many rows and columns
-Smaller number of columns and rows          -High compression rates due to few distinct values

Let's talk at a slightly higher level:

HBase                                  vs    RDBMS
-Schema less                                 -Schema
-Wide                                        -Thin
-Denormalised                                -Normalised
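The row versus column comparison above can be seen with a tiny Python example that holds the same records in both layouts. The records are made up for illustration, and this shows the general idea of the two layouts rather than how HBase stores data on disk.

```python
# Row-oriented: each record is kept together.
rows = [
    {"id": 1, "region": "east", "amount": 120},
    {"id": 2, "region": "west", "amount": 80},
    {"id": 3, "region": "east", "amount": 200},
]

# Column-oriented: each column is kept together.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# OLTP style: fetch one whole record, which suits the row layout.
print(rows[1])  # {'id': 2, 'region': 'west', 'amount': 80}

# OLAP style: aggregate one column across all records, which suits the column layout.
print(sum(columns["amount"]))  # 400

# Few distinct values in a column is what makes it compress well.
print(len(set(columns["region"])))  # 2
```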

How about the differences between HBase and HDFS?
HBase: low latency access to single rows from billions of records
HDFS: high latency batch processing. No concept of random reads/writes


We have another master-slave architecture here, with a master server and region servers. The region servers are the slaves. The master server is comparable to the NameNode in HDFS. It manages and monitors HBase cluster operations, assigns regions to region servers, and handles load balancing and splitting.

ZooKeeper also plays a major role in HBase, which we will discuss later. Let's look at a region server in detail.

At the very top level we see the table, and under the table we see regions. Underneath the region is the store. In the store there are the memstore and store files. The memstore is where incoming data lands before it is later flushed to a store file.

So this describes the HBase architecture.
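The memstore and store file relationship can be sketched with a toy Python class. This is an illustration of the write path only: the `Store` class and its methods are hypothetical, and flushing after a fixed number of entries is a simplification chosen for the example.

```python
class Store:
    """Toy store: writes land in the memstore and are flushed to store files."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold
        self.memstore = {}     # in-memory buffer for incoming writes
        self.store_files = []  # each flush adds one immutable, sorted file

    def put(self, row_key, value):
        self.memstore[row_key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if self.memstore:
            self.store_files.append(dict(sorted(self.memstore.items())))
            self.memstore = {}

    def get(self, row_key):
        # Newest data first: check the memstore, then the most recent files.
        if row_key in self.memstore:
            return self.memstore[row_key]
        for store_file in reversed(self.store_files):
            if row_key in store_file:
                return store_file[row_key]
        return None


store = Store(flush_threshold=2)
store.put("row1", "a")
store.put("row2", "b")  # threshold reached, memstore is flushed
store.put("row3", "c")  # stays in the memstore for now
print(len(store.store_files))  # 1
print(store.memstore)          # {'row3': 'c'}
print(store.get("row1"))       # a
```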

Wednesday, August 13, 2014

5 Technologies that every Entrepreneur Needs - markITwrite

MY TOP 50 Blogs

I have written about 50 blogs. When you write so many blogs, articles get lost. Below are links to all my blogs, in one place.

Cloud Computing

  1. vCloud Director Installation and configuration
  2. Changing Face of Managed services: A Threat to Likes of HP, IBM
  3. HP CSA Implementation
  4. Are HP and Intel About to Revolutionize Virtual Desktops? (HPQ, INTC)
  5. HP CSA Vs VMware VCAC
  6. 10 best practices for cloud design
  7. vCloud Suite Architecture-1
  8. vCloud Suite Architecture-2
  9. Application Suitability For Cloud
  10. Openstack- Its importance in Cloud. The HP Helion Boost

Big Data:

·         Big data: Technology Stack

·         Big Data: Hadoop Distributed Filesystem (HDFS)

·         Big Data: Map Reduce

·         Big Data- Installing Hadoop ( Single Node)

·         Big Data- Apache Hadoop Multi Node

·         Big Data: Troubleshooting, Administering and optimizing Hadoop

·         Big Data: Managing HDFS

·         Big Data: Map Reduce Development

·         Big Data: Introduction to Pig

Datacenter, Transformation and Migration

·         Data Center Migration

·         Application Migration Strategy

·         Green Datacenter

Enterprise architecture- Solution Architecture

·         TOGAF: Practical implementation-1

·         TOGAF: Practical Implementation-2: Requirement Gathering- Business Architecture

·         TOGAF: Practical Implementation-3: Requirement Gathering-Information Systems Architecture, Technology architecture and Opportunities=> Solution





Architecting a Citrix Virtualization solution
 There are 2 parts to it:
  1)    Assessment
  2)    Design

Assessment is further divided into
·          User Community
·         Operating System Delivery
·         Application Delivery
·         Server Virtualization
·         Infrastructure
·         Security and personalization
·          Operation and Support
·         Conceptual Architecture

Design is further divided into
·         Application Delivery design
·         Desktop Delivery design
·         Access Design
