Embrace NoSQL as a relational Guy – Column Family Store

It’s been a long time since I wrote a post. It has been a really busy month, but it’s difficult to consciously stay away from writing. This is the final post in this series about NoSQL technologies.

My intent for this series is to cover a breadth of technologies and help readers understand the bigger picture. It feels like a revolution, with technology growing at a massive scale. Everyone should have at least a basic understanding of technologies like Cloud Computing, NoSQL, Big Data and Machine Learning.

Okay, let’s talk about the Column Family store, also known as the columnar store. The best way to explain it is with the example of Columnstore Indexes, available starting with SQL Server 2012. In traditional SQL Server tables, the data is stored in the form of rows:


Referring to the above picture: in RDBMS systems the data is stored on the page in units of complete rows. Even if we select a single column from the table, the entire row has to be brought into memory.

For business intelligence reports, we generally rely on aggregations like sum, avg, max and count. Imagine aggregating a single column on a table with 1 billion rows: the query has to scan the entire table. On the other hand, if the data is stored in the form of individual columns, the aggregation only reads the column it needs, which saves a huge number of I/Os. Moreover, columnar databases offer high compression ratios, because the values within a single column tend to be similar; in favourable cases 1 TB of data can shrink to a few GBs.
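To make the difference concrete, here is a minimal sketch in Python. It is only a toy model, not how SQL Server or any columnar database is implemented: the table, the column names and the "values read" counter are all made up for illustration.

```python
# Toy comparison of row-oriented and column-oriented layouts.
# The table, column names and values are hypothetical.

rows = [
    {"site": "a.com", "country": "IN", "hits": 120},
    {"site": "b.com", "country": "US", "hits": 300},
    {"site": "c.com", "country": "IN", "hits": 180},
]


def sum_hits_row_store(table):
    """Row layout: every row is one unit, so all of its fields are read."""
    values_read = 0
    total = 0
    for row in table:
        values_read += len(row)  # the whole row is brought into memory
        total += row["hits"]
    return total, values_read


# Column layout: each column is stored separately as its own list.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def sum_hits_column_store(cols):
    """Column layout: only the column being aggregated is read."""
    hits = cols["hits"]
    return sum(hits), len(hits)


row_total, row_values_read = sum_hits_row_store(rows)
col_total, col_values_read = sum_hits_column_store(columns)
```

Both functions return the same total, but the row layout touches every field of every row while the column layout touches only the `hits` values. On a wide table with a billion rows, that gap is where the I/O saving comes from.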

This kind of database system is specifically designed for aggregations and BI reporting. For example, if you want to find the average hits for each website on the web from a table that is terabytes in size, a row-based scan could take days. With a columnar database, the benefits are:

1. The data is picked based on the columns selected in the query instead of the entire row. These aggregations mostly go for scans, and scanning TBs of data takes very long. With a columnar database, the I/O reduces drastically.

2. Using columnar databases, e.g. HBase, we can leverage distributed processing to fetch the result faster. As we know, these NoSQL technologies scale out really well and can leverage the power of distributed queries across multiple machines. Data which takes days to process can be processed within minutes or seconds.

Major known players in this space are HBase, Cassandra and Google’s Bigtable. Amazon’s DynamoDB is often mentioned alongside them, although it is primarily a key value and document store.
References: – https://www.youtube.com/watch?v=C3ilG2-tIn0

Embrace NoSQL as a relational Guy – Key Value Pair

There are two major types of key value pair DBs: 1. Persisted 2. Non-persisted (cache based). This is a very popular type of NoSQL database, e.g.

Persisted –> Azure tables, Amazon Dynamo, CouchBase etc.

Non-Persisted –> Redis Cache, Memcached etc. (the main purpose is caching for websites)

The data in these databases is stored in the form of a key and a value:

The data is accessed based on the key, and the value can be JSON, XML, an image or anything else which fits in blob storage, i.e. the value is stored in the form of a blob. Like other NoSQL DBs, they are not schema bound. For an ecommerce website, if we want to store information about a customer’s shopping, we can use the customer id as the key and all the shopping information as the value. Since all the required data is stored in a single unit in the form of a value, the store can be scaled really efficiently.
Another use case for a key value store is storing session information, e.g. for a game where millions of users are active online, their profile information can be stored as key value pairs. These databases can handle massive scale easily and can also provide redundancy to avoid loss of data. Moreover, applications which just store information in the form of images can leverage a key value pair database easily.
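The customer example above can be sketched in a few lines of Python. This is only an in-memory stand-in, not the API of Azure tables, Redis or any other product: the `KeyValueStore` class, the key format `customer:42` and the cart fields are all assumptions made for illustration.

```python
import json


class KeyValueStore:
    """In-memory stand-in for a key value store; values are opaque blobs."""

    def __init__(self):
        self._data = {}

    def put(self, key, value_bytes):
        # The store does not look inside the value, it just keeps the bytes.
        self._data[key] = value_bytes

    def get(self, key):
        # Returns None when the key is not present.
        return self._data.get(key)


store = KeyValueStore()

# Hypothetical shopping information for one customer, serialized as JSON.
cart = {"items": [{"sku": "book-101", "qty": 2}], "total": 598}
store.put("customer:42", json.dumps(cart).encode("utf-8"))

# Everything about the customer comes back with a single lookup by key.
blob = store.get("customer:42")
restored = json.loads(blob.decode("utf-8"))
missing = store.get("customer:99")
```

Because each lookup needs only the key, the keys can be spread across many machines without any joins between them, which is what makes this model easy to scale out.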

As an RDBMS guy, it’s a little difficult to relate to these databases, but just try to understand the context for now. We will discover more about these databases as we go along. In summary, here is another factor which influences our decision to choose a NoSQL DB (Tables refers to Azure tables, a key value pair DB):


References – http://davidchappellopinari.blogspot.in/


Embrace NoSQL technologies as a relational Guy – Graph DB

It’s a fact that NoSQL technologies are growing at a rapid pace. I even heard someone say that NoSQL is old now and NewSQL is the new trend. NewSQL aims to give the performance of NoSQL while following the ACID principles of RDBMS systems. Anyway, let’s focus on Graph DB for now.

Graph databases specialize in dealing with relationship data. They are good at finding patterns and relationships between events, employees of an organization or operations. They can help make an application more interactive by suggesting more options based on previous patterns of browsing or shopping.

Have you noticed:

1. Facebook offers suggested friends.

2. LinkedIn offers suggested connections, or connections from the same organization.

3. Flipkart/Amazon offer “people also viewed” options (real time recommendations), which help you purchase more products.

4. In master data management, a knowledge base article can be suggested based on the support case for faster resolution.

5. Dependency analysis before shutting down an IT operation, i.e. finding the users who may be impacted if a router is shut down for maintenance. It can help to send advance notification to those employees.

Graph DBs are being used for all of the above scenarios. Just check the picture below to see what relationships look like:


Neo4j is one of the best-known Graph DB companies today. The query language used with Neo4j is Cypher. It’s used widely by major tech, healthcare and manufacturing companies.
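The suggested friends scenario from the list above can be sketched in plain Python as a "friends of friends" lookup. A real graph database stores the nodes and relationships natively and would express this as a query (Cypher, in the case of Neo4j); the dictionary, the names and the `suggest_friends` function here are only hypothetical stand-ins.

```python
from collections import Counter

# Hypothetical friendship graph: each person maps to a set of direct friends.
friends = {
    "asha": {"bob", "chen"},
    "bob": {"asha", "dev", "chen"},
    "chen": {"asha", "bob", "dev", "eli"},
    "dev": {"bob", "chen"},
    "eli": {"chen"},
}


def suggest_friends(graph, person):
    """Rank people two hops away by how many mutual friends they share."""
    direct = graph[person]
    mutual_counts = Counter()
    for friend in direct:
        for candidate in graph[friend]:
            # Skip the person themselves and anyone already connected.
            if candidate != person and candidate not in direct:
                mutual_counts[candidate] += 1
    # Most mutual friends first, then by name so the order is stable.
    return sorted(mutual_counts.items(), key=lambda kv: (-kv[1], kv[0]))


suggestions = suggest_friends(friends, "asha")
```

For "asha", the sketch ranks "dev" (two mutual friends) ahead of "eli" (one mutual friend). Doing the same with relational tables would need self joins on a friendship table, which is exactly the kind of query graph databases are built to avoid.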

Please check this video for more details about relationships and properties. It’s part of a series of videos which you can look up to understand more about this subject.


Embrace NoSQL technologies as a relational Guy – Document Store

This is one of the most famous NoSQL technologies today. In a Document Store we don’t store Word/PowerPoint/Excel documents, but mainly JSON documents.

I am sure you have heard about XML (Extensible Markup Language) documents. XML is commonly used as an intermediate data format when various devices/applications want to communicate, e.g. if a .NET based application wants to communicate with a Java based application or even a C based device, they can send XML documents to share data with each other. JSON (JavaScript Object Notation) is based on a similar concept, but it’s more compact and easier to use, hence it is widely accepted as a standard for communication.
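As a small illustration, the same record can be written in both formats using only the Python standard library. The record, its field names and the `customer` element name are made up for this sketch.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record to exchange between two applications.
record = {"id": "42", "name": "Asha", "city": "Pune"}

# JSON: the dictionary maps directly onto the text format.
json_text = json.dumps(record)

# XML: every field becomes a child element with opening and closing tags.
root = ET.Element("customer")
for field, value in record.items():
    ET.SubElement(root, field).text = value
xml_text = ET.tostring(root, encoding="unicode")

# Both formats can be parsed back into the same dictionary.
parsed_from_json = json.loads(json_text)
parsed_from_xml = {child.tag: child.text for child in ET.fromstring(xml_text)}
```

Both round trips give back the original record, but the JSON text is shorter because it does not repeat each field name in a closing tag.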




The reason JSON has become so popular with websites like Flipkart and Amazon, gaming companies and cross device communication is its ease of use and high performance in applications of large scale.

Let’s take the example of an application which stores data from a sensing device. The device hits the application thousands of times within seconds, and the data is transmitted in the form of JSON. If the data is stored in an RDBMS, then with every hit the JSON has to be converted to plain data and inserted into a table. If this is done thousands of times per second, there will definitely be an overhead, and the scalability of the system becomes an issue. How about saving the data in a Document Store in the form of JSON itself, and then querying it only when results are required?
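Here is a minimal sketch of that idea in Python. It is an in-memory toy, not the API of DocumentDB, MongoDB or any other product: the `DocumentStore` class, the device names and the payload fields are assumptions made for illustration.

```python
import json


class DocumentStore:
    """Tiny in-memory stand-in for a document store collection."""

    def __init__(self):
        self._docs = []

    def insert(self, raw_json):
        # The JSON payload is kept as a document; no table mapping is needed.
        self._docs.append(json.loads(raw_json))

    def find(self, predicate):
        # Return every stored document for which the predicate is true.
        return [doc for doc in self._docs if predicate(doc)]


readings = DocumentStore()

# Payloads as a sensing device might send them; the field names are made up.
readings.insert('{"device": "well-7", "pressure": 101.5}')
readings.insert('{"device": "well-7", "pressure": 140.2, "unit": "psi"}')
readings.insert('{"device": "well-9", "pressure": 99.0}')

# Query only when results are required.
high_pressure = readings.find(lambda doc: doc["pressure"] > 120)
```

Note that the second payload carries an extra `unit` field and is stored without any schema change, which is the schema free behaviour described in this post.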

Another example is shopping websites, blogging websites or even an online book library, where we don’t want to be schema bound and there is no need to maintain relationships between data. Here Document Store DBs play a vital role. The release cycle of the application becomes shorter due to the schema free architecture and the lesser need to perform database impact analysis.

Just for information, you may have heard about applications based on polyglot persistence, e.g. shopping websites, online libraries or even video libraries. These kinds of services use multiple database systems, e.g. for product displays they use JSON (Document Store), for transactions they may use an RDBMS, for relationship data they may use a graph DB etc. We have lots of flexibility these days to leverage various DB systems for different needs.

DocumentDB, MongoDB, CouchDB and RavenDB are key players in the Document Store market. DocumentDB and MongoLab (MongoDB on Azure) are two managed services available on the Azure PaaS platform. MongoDB, CouchDB and RavenDB can also be installed on bare metal machines.


For a free Demo of DocumentDB, check this URL – http://www.documentdb.com/sql/demo#


We will discuss DocumentDB in detail in future posts. First, I will try to finish the introduction and use cases of all the NoSQL database types.


Embrace NoSQL technologies as a relational guy! Intro.

I have been writing in the form of series quite a bit. Recently, I wrote a series of posts on SQL Azure DB which really helped readers understand the subject. I also meant to write a similar series for SQL on Azure VMs, but somehow I couldn’t finish it. Hopefully, I will finish that in the near future.

For now, let’s talk about what NoSQL is all about. There has been a lot of discussion and publicity around this subject over the past few years, and it’s gaining a lot of popularity because of its efficiency. If you are in a world where SQL/Oracle/DB2 etc. are the only resorts for data storage, then you really need to upgrade. It doesn’t mean that RDBMS systems like SQL/Oracle/DB2 are going out of trend; it just means that for different needs we now need to pick different database systems. People no longer rely just on RDBMS systems.

I have been reading some really amazing blog posts written by David Campbell. For now, I’m going to diverge a little from the subject. Using the technique mentioned by David, I will explain the bigger picture. The best way to understand where the data world is going is to understand the data categories mentioned below:

1. Operational Data
2. Analytical Data
3. Streaming Data


Operational Data – Operational data is the data used by applications to maintain their state, e.g. payment data or customer information, as we have on OLTP or non-transactional systems. However, people slowly realized the real value of historical data, which could be used to understand trends for higher customer satisfaction or for building business strategies. That’s how data warehousing technologies started gaining traction.

Analytical Data – This is read only data, analyzed using data warehouse or Big Data systems to understand business trends and history. Due to the huge volume of data, Big Data is gaining traction, as analyzing PBs of data needs extreme hardware capacity. However, for smaller systems, traditional OLAP systems work perfectly fine.

Streaming Data – In this modern world, people want analytics in real time, e.g. from fitness trackers on people’s wrists, toll payment devices in cars or sensors on oil wells. One way is to store the entire data and then do the analysis, but sometimes a delay in processing is not affordable, e.g. if an oil company wants to raise an alert when the pressure in a well is increasing, or if you want to know how many cars passed through a specific city in the last 30 minutes. There has to be a provision to read the live stream and make sense of it. This is the world of IoT.
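The "cars in the last 30 minutes" question can be sketched as a sliding window over the live stream. This is a simplified Python model, not the API of any stream processing product: the `SlidingWindowCounter` class and the timestamps are hypothetical, and events are assumed to arrive in time order.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events seen within the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def record(self, timestamp):
        # Assumption: events arrive in increasing time order.
        self._timestamps.append(timestamp)

    def count(self, now):
        # Drop events that have fallen out of the window, oldest first.
        cutoff = now - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps)


cars = SlidingWindowCounter(window_seconds=30 * 60)
for seen_at in (0, 600, 1500, 2000):  # seconds since the stream started
    cars.record(seen_at)

count_at_2000 = cars.count(now=2000)
count_at_3500 = cars.count(now=3500)
```

The counter keeps only the events that are still inside the window, so the answer is available immediately instead of after storing and scanning the whole history.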

Understanding the above terms is really important for everyone working on the data platform. There is a plethora of companies working towards making the data platform a really happening world. Have you heard about the 3 Vs of data: Volume, Variety and Velocity? In today’s world it’s difficult to manage these 3 Vs in an RDBMS. When it comes to RDBMS, we talk about structured data in the form of tables/columns. Anything and everything you want to store in an RDBMS has to be stored in the form of tables/columns. How about data like flat files, JSON or telemetry data, where there is no fixed structure? As I shared in the beginning, for different types of data we need different database systems.

In a nutshell, there are five major categories of database technologies:

1. Relational databases (RDBMS)
2. Document Store
3. Column family Store
4. Key Value Store
5. Graph Databases

We will discuss each of the NoSQL categories in detail in upcoming posts.


Disclaimer: The views expressed on this website/blog are mine alone and do not reflect the views of my company. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.