Let’s start with the advanced storage options available on Microsoft Azure. There are two: 1. Blob Storage and 2. Azure Data Lake Store (ADLS). ADLS is optimized for analytics workloads, so when it comes to Big Data and advanced analytics, ADLS should be the first choice. When we talk about Big Data, it is also important to understand the concepts behind HDFS (Hadoop Distributed File System) and MapReduce. For more information, please check the video.
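To make the MapReduce idea concrete, here is a minimal in-memory sketch of the classic word-count job in plain Python. It only illustrates the three phases (map, shuffle, reduce); it is not Hadoop's API, and a real cluster would run each phase in parallel across the blocks of a distributed file. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Map: emit a (word, 1) pair for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group all emitted values by key, as the framework does
    # between the map and reduce phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Chain the three phases over every input line.
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data on azure", "azure data lake"]))
    # {'big': 1, 'data': 2, 'on': 1, 'azure': 2, 'lake': 1}
```

Because each map call only sees one record and each reduce call only sees one key, both phases can be spread across many machines; only the shuffle needs to move data between them.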
Before we get into Azure Data Lake Store, it’s important to understand that Azure Data Lake is a fully managed Big Data service from Microsoft. It consists of three major components:
1. Azure HDInsight (managed Hadoop and Spark clusters)
2. Azure Data Lake Store (hyperscale storage optimized for analytics)
3. Azure Data Lake Analytics (Big Data queries as a service)
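ADLS is essentially HDFS for Big Data analytics on Azure: it exposes a WebHDFS-compatible REST API, so Hadoop ecosystem tools can work with it much as they would with HDFS. Its major benefits are: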
1. No limits on file size: a single file can be petabytes in size
2. Optimized for analytics workloads
3. Integration with all the major Big Data platforms and tools
4. Fully managed and supported by Microsoft
5. Can store data in any file format
6. Enterprise-ready, with features such as access control and encryption at rest
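Because ADLS speaks a WebHDFS-style REST API, HDFS operation names such as `LISTSTATUS`, `MKDIRS` and `OPEN` carry over. The sketch below only builds request URLs; it assumes the Gen1 endpoint form `https://<account>.azuredatalakestore.net/webhdfs/v1/<path>`, and the account name `myadls` and the helper `adls_webhdfs_url` are hypothetical. Actually sending a request also requires an Azure Active Directory OAuth 2.0 bearer token.

```python
from urllib.parse import quote, urlencode


def adls_webhdfs_url(account: str, path: str, op: str, **params: str) -> str:
    """Build a WebHDFS-style REST URL for an ADLS (Gen1) account.

    Assumes the endpoint form https://<account>.azuredatalakestore.net/webhdfs/v1/<path>.
    """
    # Drop the leading slash so the path sits directly under /webhdfs/v1/;
    # quote() percent-encodes unsafe characters but keeps "/" separators.
    clean_path = quote(path.lstrip("/"))
    # The operation goes first in the query string, followed by any extra parameters.
    query = urlencode({"op": op, **params})
    return f"https://{account}.azuredatalakestore.net/webhdfs/v1/{clean_path}?{query}"


if __name__ == "__main__":
    # Hypothetical account and folder, for illustration only.
    print(adls_webhdfs_url("myadls", "/raw/2016", "LISTSTATUS"))
```

Keeping URL construction in one small function makes it easy to reuse the same account and path handling for listing, reading, or creating directories.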
It’s really simple to create an Azure Data Lake Store account:
Step 1: Search for the Azure Data Lake Store service in the Azure portal.
Step 2: Enter the service name and resource group name, and choose the appropriate location. Because the service is currently in preview, only a limited number of data center locations are available.
Step 3: If the data is small, use the Data Viewer to upload and download it.
For larger data sets, you can upload data to and download it from ADL Store with tools such as AdlCopy or an Azure Data Factory copy pipeline. As shown in the picture above, you can also monitor the number of requests and the data ingress/egress rate directly from the portal. In the next blog post, we will talk about using ADL Store with ADL Analytics and Azure Data Factory.
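Besides the portal and tools like AdlCopy, data can also be pushed over the REST API. The following is a minimal sketch, modeled on the WebHDFS `CREATE` operation, that prepares (but does not send) an HTTP PUT uploading a small file. The account name, path, token value and the helper `build_upload_request` are hypothetical; please verify the exact parameters against the ADLS REST documentation before relying on them.

```python
import urllib.request
from urllib.parse import quote, urlencode


def build_upload_request(account: str, remote_path: str, data: bytes, token: str) -> urllib.request.Request:
    """Prepare (but do not send) a WebHDFS-style CREATE request for a small file."""
    # Assumed Gen1 endpoint form: https://<account>.azuredatalakestore.net/webhdfs/v1/<path>
    url = (
        f"https://{account}.azuredatalakestore.net/webhdfs/v1/"
        f"{quote(remote_path.lstrip('/'))}?"
        + urlencode({"op": "CREATE"})
    )
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={
            # Azure AD OAuth 2.0 access token obtained separately (assumption).
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/octet-stream",
        },
    )


if __name__ == "__main__":
    req = build_upload_request("myadls", "/raw/hello.txt", b"hello", "<access-token>")
    print(req.get_method(), req.full_url)
    # To actually upload, you would call urllib.request.urlopen(req) with a valid token.
```

A single PUT like this is only sensible for small files; for bulk transfers, AdlCopy or an Azure Data Factory copy pipeline handles chunking, retries and parallelism for you.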