Build Smart Solutions using Big Data Stack on Microsoft Azure Platform – Azure Data Factory (Part 1)

In the previous posts, I wrote about Azure Data Lake Store (ADLS) and Azure Data Lake Analytics (ADLA). As an analogy, think of ADLS as the SAN (storage) and ADLA as the compute (a server with RAM and CPUs). Together they make a great server machine. Now, what is Azure Data Factory?

Azure Data Factory (ADF) is a framework for data transformation on Azure. Just as SSIS is the transformation service for on-premises SQL Server, ADF is the transformation service primarily for the Azure data platform services. Let’s take the same perfmon analysis example: we need to process the perfmon logs of 500,000 machines on a daily basis.

1. The data has to be ingested into the system – Azure Data Lake Store
2. The data has to be cleaned – Azure Data Lake Analytics / Hive Queries / Pig Queries
3. The data has to be transformed for reporting/aggregation – Azure Data Lake Analytics / Hive Queries / Pig Queries / Machine Learning model
4. The data has to be inserted into a destination for reporting – Azure SQL DW or SQL Azure DB
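
To make the ordering concrete, here is a minimal Python sketch of the four steps as a chain where each stage consumes the output of the previous one. This is not ADF code and it calls no Azure API: the `Stage` class, the function names, and the sample rows are all hypothetical stand-ins used only to illustrate the flow. Scheduling at regular intervals is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Rows = List[Dict]


@dataclass
class Stage:
    """Hypothetical description of one step in the chain (not an ADF type)."""
    name: str
    service: str                    # the Azure service that would do the work
    run: Callable[[Rows], Rows]     # local stand-in for that work


def ingest(_: Rows) -> Rows:
    # Stand-in for landing raw perfmon logs in Azure Data Lake Store.
    # The sample rows below are made up for illustration.
    return [
        {"machine": "m1", "cpu": "42.0"},
        {"machine": "m2", "cpu": ""},       # bad row: missing counter value
        {"machine": "m1", "cpu": "58.0"},
    ]


def clean(rows: Rows) -> Rows:
    # Stand-in for a cleaning job: drop rows without a value, parse the rest.
    return [
        {"machine": r["machine"], "cpu": float(r["cpu"])}
        for r in rows
        if r["cpu"]
    ]


def transform(rows: Rows) -> Rows:
    # Stand-in for an aggregation job: average CPU per machine.
    values: Dict[str, List[float]] = {}
    for r in rows:
        values.setdefault(r["machine"], []).append(r["cpu"])
    return [
        {"machine": m, "avg_cpu": sum(v) / len(v)}
        for m, v in sorted(values.items())
    ]


def load(rows: Rows) -> Rows:
    # Stand-in for inserting the aggregates into a reporting database.
    return rows


PIPELINE = [
    Stage("ingest", "Azure Data Lake Store", ingest),
    Stage("clean", "Azure Data Lake Analytics", clean),
    Stage("transform", "Azure Data Lake Analytics", transform),
    Stage("load", "Azure SQL DW", load),
]


def run_pipeline(stages: List[Stage]) -> Rows:
    """Run the stages strictly in order, feeding each one the previous output."""
    data: Rows = []
    for stage in stages:
        data = stage.run(data)
    return data


if __name__ == "__main__":
    print(run_pipeline(PIPELINE))
```

The point of the sketch is only that the stages depend on each other and have to run in a fixed order; keeping that order and repeating the run on a schedule is the job an orchestration service takes over.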

How do we run all these steps in sequence and at regular intervals? ADF is the solution for all of that. To use ADF, we first need to create an ADF account:



Once you click Create, you will see this screen:




This is the dashboard we will use to create transformations with ADF. In the next post, I will write about how to create a pipeline for transformation.