Build Smart Solutions using Big Data Stack on Microsoft Azure Platform – Azure Data Lake Store

Let’s start with the advanced storage options available on Microsoft Azure. There are two options for storage: 1. Blob Storage 2. Azure Data Lake Store (ADLS). ADLS is optimized for analytics workloads, so for Big Data and advanced analytics it should be the first choice. When we talk about Big Data, one must also understand the concepts of HDFS (Hadoop Distributed File System) and MapReduce. For more information, please check – Video

Before we get into Azure Data Lake Store, it’s important to understand that Azure Data Lake is a fully managed Big Data service from Microsoft. It consists of three major components:

1. HDInsight (Big Data Cluster as a Service), which offers 5 types of clusters
We have the option to create any of these 5 cluster types as needed.

2. Azure Data Lake Store (Hyper Scale Storage optimized for analytics)
3. Azure Data Lake Analytics (Big Data Queries as a Service)

ADLS is HDFS for Big Data analytics on Azure. Its major benefits are:
1. No limit on file size – a single file can be petabytes in size
2. Optimized for analytics workloads
3. Integration with all major Big Data players
4. Fully managed and supported by Microsoft
5. Can store data in any file format
6. Enterprise ready, with features like access control and encryption at rest
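
To make the "HDFS on Azure" idea a little more concrete, here is a minimal sketch of how a file in the store can be addressed, both with the adl:// scheme used by Hadoop-style tools and with a WebHDFS-style REST URL. The account name "mydatalake", the sample paths and the two helper functions are hypothetical and only for illustration. The sketch only builds the addresses; a real request would also need an Azure Active Directory token, which is not shown here.

```python
from urllib.parse import quote, urlencode

# Assumption: Data Lake Store accounts are reachable at
# <account>.azuredatalakestore.net. "mydatalake" below is a made-up name.
ADLS_HOST_SUFFIX = "azuredatalakestore.net"


def adl_uri(account: str, path: str) -> str:
    """Return the adl:// URI for a file or folder in the given account."""
    return f"adl://{account}.{ADLS_HOST_SUFFIX}/{quote(path.lstrip('/'))}"


def webhdfs_url(account: str, path: str, op: str, **params: str) -> str:
    """Return a WebHDFS-style REST URL for an operation such as LISTSTATUS."""
    query = urlencode({"op": op, **params})
    return (
        f"https://{account}.{ADLS_HOST_SUFFIX}"
        f"/webhdfs/v1/{quote(path.lstrip('/'))}?{query}"
    )


if __name__ == "__main__":
    # Hypothetical account and paths, used only to show the URL shapes.
    print(adl_uri("mydatalake", "/raw/sales/2016.csv"))
    print(webhdfs_url("mydatalake", "/raw/sales", "LISTSTATUS"))
```

Since the addressing looks like HDFS, tools that already understand Hadoop-style paths can generally be pointed at the store without changing how they think about files and folders.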

It’s really simple to create an Azure Data Lake Store Account:

Step 1: Search for the Azure Data Lake service on the portal

Step 2: Enter the service name and the resource group name, and choose the appropriate location. Currently, the service is in preview, so only a limited set of data center locations is available.


Step 3: Use Data Viewer to upload and download the data when the size of the data is small.


For larger data sets, you can upload data to or download it from the ADL Store using tools such as ADL Copy or an Azure Data Factory copy data pipeline. As shown in the picture above, you can easily monitor the number of requests and the data ingress/egress rate from the portal itself. In the next blog post, we will talk about leveraging the ADL Store with ADL Analytics and Azure Data Factory.



Build Smart Solutions using Big Data Stack on Microsoft Azure Platform – Cortana Analytics Suite (Intro.)

One stream that is in huge demand today is Big Data. With cloud platforms, it has gained even more power and visibility. Earlier, Big Data was being done in silos, but ever since cloud computing picked up momentum, it has gained even more traction in the world of technology. Everything today is getting integrated on the cloud.

As a relational guy, it’s been a great journey to learn this technology. It’s been really easy to grasp things like HDInsight, Azure Data Lake Analytics, or even Stream Analytics, for that matter. All these technologies are based on a SQL-like query language, .NET, PowerShell, etc. On top of that, they integrate really well with open source technologies (Linux / NoSQL) and other languages like Java.

All the posts I have written so far have been from the point of view of a relational guy, i.e. how you, as a DBA or database developer, can pursue these technologies. This time, I am going to write a series on the Big Data stack on Microsoft Azure, aka Cortana Analytics Suite.

I am going to write a post on everything shown in the diagram below. Moreover, we will deep dive into a few technologies like Machine Learning and Azure Data Lake.