We have discussed the following components so far:
Azure Data Factory –> a service that orchestrates data movement and transformation for the Big Data stack on Azure
Azure Data Lake Store –> a storage service that can hold files of any type and format on Azure
Azure Data Lake Analytics –> a compute service that processes the data on Azure
Now, let’s talk about the next managed service, known as Big Data Cluster as a Service: HDInsight. With ADLA, we just write the query, select the degree of parallelism, and execute it – without worrying about what’s happening underneath. With an HDInsight cluster, you get virtual machines you can RDP into, where you can write your Hive or Pig queries and manage the cluster’s resources yourself.
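To make the contrast concrete, here is a minimal sketch of how a Hive query can be submitted to an HDInsight cluster over its WebHCat (Templeton) REST endpoint, rather than through an RDP session. The cluster name, query, and status directory below are placeholders I’ve made up for illustration; the helper only builds the request, it does not send it.

```python
import urllib.parse

# Assumed placeholder values -- replace with your own cluster and query.
CLUSTER = "mycluster"
HIVE_QUERY = "SELECT COUNT(*) FROM hivesampletable;"

def build_webhcat_request(cluster, query, statusdir="/tmp/hivejob"):
    """Build the URL and form-encoded payload for submitting a Hive
    query through the cluster's WebHCat (Templeton) REST endpoint."""
    url = f"https://{cluster}.azurehdinsight.net/templeton/v1/hive"
    payload = {
        "execute": query,        # the HiveQL statement to run
        "statusdir": statusdir,  # storage folder for job status/output
    }
    return url, urllib.parse.urlencode(payload)

url, body = build_webhcat_request(CLUSTER, HIVE_QUERY)
print(url)
print(body)
```

In practice you would POST this body to the URL with the cluster's HTTP credentials (basic auth); the point here is simply that with HDInsight you interact with the cluster directly, whereas ADLA hides all of this behind a single job submission.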
Let’s see how to create this cluster:
Just type HDInsight in the search textbox and the option to create an HDInsight cluster will appear.
Select Hadoop from the cluster-type drop-down and choose the appropriate options. One of the main screens is for selecting the number of worker nodes: based on your compute requirement, you can choose how many worker nodes to provision. Once the rest of the inputs are completed, click Create, and it will take roughly 20 minutes to set up your cluster. You can check the progress via the options below in the portal:
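The same cluster can also be provisioned from the command line. This is a sketch using the Azure CLI, assuming it is installed and logged in; every name, password, and the storage account below are placeholders, not values from this walkthrough.

```shell
# Sketch: create a Hadoop-type HDInsight cluster with 4 worker nodes.
# All resource names and credentials are placeholders -- substitute your own.
az hdinsight create \
    --name my-hadoop-cluster \
    --resource-group my-rg \
    --type Hadoop \
    --workernode-count 4 \
    --http-user admin \
    --http-password '<cluster-login-password>' \
    --ssh-user sshuser \
    --ssh-password '<ssh-password>' \
    --storage-account mystorageaccount
```

Provisioning state can then be checked with `az hdinsight show --name my-hadoop-cluster --resource-group my-rg --query properties.provisioningState`, which mirrors watching the progress notifications in the portal.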