Build Smart Solutions using Big Data Stack on Microsoft Azure Platform – Azure Data Lake Store

Let’s start with the advanced storage options available on Microsoft Azure. There are two options for storage: 1. Blob Storage 2. Azure Data Lake Store (ADLS). ADLS is optimized for analytics workloads, so it should be the first choice for Big Data and advanced analytics. Moreover, anyone working with Big Data should understand the concepts of HDFS (Hadoop Distributed File System) and MapReduce. For more information, please check this video.

Before we get into Azure Data Lake Store, it’s important to understand that Azure Data Lake is a fully managed Big Data service from Microsoft. It consists of three major components:

1. HDInsight (Big Data Cluster as a Service), which offers 5 types of clusters
image
You can create any of these 5 cluster types, depending on your needs.

2. Azure Data Lake Store (Hyper Scale Storage optimized for analytics)
3. Azure Data Lake Analytics (Big Data Queries as a Service)

ADLS plays the role of HDFS for Big Data analytics on Azure. Its major benefits are:
1. No limits on file size: a single file can be petabytes in size
2. Optimized for analytics workloads
3. Integration with all major Big Data players
4. Fully managed and supported by Microsoft
5. Can store data in any file format
6. Enterprise ready, with features like access control and encryption at rest
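Because ADLS presents an HDFS-compatible file system, Hadoop-based tools typically address it through an `adl://` URI rather than an `hdfs://` one. Here is a minimal sketch of how such a URI is put together. The helper name `adl_uri` and the account name `mystore` are made up for illustration; the host suffix `azuredatalakestore.net` is the one used by Azure Data Lake Store accounts.

```python
def adl_uri(account_name: str, path: str = "/") -> str:
    """Build an adl:// URI for a path inside a Data Lake Store account.

    `account_name` is the name chosen when the account was created
    ("mystore" below is a hypothetical example).
    """
    # Normalise the path so it always starts with exactly one slash.
    clean_path = "/" + path.lstrip("/")
    return f"adl://{account_name}.azuredatalakestore.net{clean_path}"


print(adl_uri("mystore", "logs/2016/app.log"))
# -> adl://mystore.azuredatalakestore.net/logs/2016/app.log
```

A URI of this shape is what you would hand to a Hive, Spark or other HDInsight job that reads from the store.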

It’s really simple to create an Azure Data Lake Store Account:

Step 1: Search for the Azure Data Lake service on the Portal
image

Step 2: Enter the service name and resource group name, and choose the appropriate location. The service is currently in preview, so only a limited number of data center locations are available.

image
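If you script account creation instead of typing the name into the portal, it can help to check the name before submitting it. The sketch below assumes the naming rule as I understand it (3 to 24 characters, lowercase letters and digits only); treat that rule as an assumption and confirm it against the validation message the portal shows. The function name is hypothetical.

```python
import re

# Assumed rule: 3-24 characters, lowercase letters and digits only.
# Verify against the portal's own validation before relying on it.
_ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")


def is_valid_account_name(name: str) -> bool:
    """Return True if `name` matches the assumed account naming rule."""
    return _ACCOUNT_NAME.fullmatch(name) is not None


for candidate in ("mystore01", "My-Store", "ab"):
    print(candidate, is_valid_account_name(candidate))
```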

Step 3: Use Data Explorer to upload and download the data, if the size of the data is small.

image

However, you can also move data into and out of ADL Store with tools such as AdlCopy or an Azure Data Factory copy pipeline. As shown in the picture above, you can easily monitor the number of requests and the data ingress/egress rate from the portal itself. In the next blog post, we will talk about leveraging ADL Store for ADL Analytics and Azure Data Factory.
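Small uploads can also be scripted rather than done by hand in the portal. The following is a rough sketch, not the only way to do it: it copies a local file in chunks into any file-system object that exposes `open(path, "wb")`, which is the shape of the `AzureDLFileSystem` client in the `azure-datalake-store` Python package. The function name, the chunk size and the paths in the comments are my own choices for illustration.

```python
def upload_file(adl, local_path, remote_path, chunk_size=4 * 1024 * 1024):
    """Copy a local file to `remote_path` on `adl`, chunk by chunk.

    `adl` is any object with an `open(path, mode)` method that returns a
    writable, context-managed file. Returns the number of bytes written.
    """
    total = 0
    with open(local_path, "rb") as src, adl.open(remote_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # end of the local file
                break
            dst.write(chunk)
            total += len(chunk)
    return total


# Hypothetical wiring with the azure-datalake-store package (not run here;
# the store name, credentials and paths are placeholders):
#   from azure.datalake.store import core, lib
#   token = lib.auth(tenant_id="...", client_id="...", client_secret="...")
#   adl = core.AzureDLFileSystem(token, store_name="mystore")
#   upload_file(adl, "sales.csv", "/raw/sales.csv")
```

For large or recurring transfers, the dedicated tools mentioned above are a better fit than a hand-rolled loop like this one.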

 

HTH!
