Build Smart Solutions using Big Data Stack on Microsoft Azure Platform – Build HDInsight Cluster

We have discussed the following components so far:

  • Azure Data Factory -> It’s a data movement and transformation service for the Big Data Stack on Azure
  • Azure Data Lake Store -> It’s a store for files of any type and any format on Azure
  • Azure Data Lake Analytics -> It’s a compute service that processes the data on Azure

Now, let’s talk about the next managed service, HDInsight, which is known as Big Data Cluster as a Service. With ADLA, we just write the query, select the parallelism and execute the query, without worrying about what’s happening underneath. With an HDInsight cluster, you get virtual machines that you can connect to over RDP, where you can write your Hive or Pig queries and manage the cluster within the resources you selected.

Let’s see how to create this cluster:

image

Type HDInsight in the search box, and the option to create an HDInsight cluster will appear.

image

Select Hadoop from the drop-down and choose the appropriate options. One of the main screens is for selecting the number of worker nodes:

image

Choose the number of worker nodes based on your compute requirement. Once the rest of the inputs are completed, click Create; it will take roughly 20 minutes to set up your cluster. You can check the progress from the portal using the options below:

image
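
The worker-node choice above comes down to simple arithmetic. Here is a minimal Python sketch of it, assuming you already know roughly how many cores your job needs. The function name and the numbers used with it are hypothetical and do not come from the portal or from any Azure API.

```python
import math


def worker_nodes_needed(required_cores: int, cores_per_node: int,
                        minimum_nodes: int = 1) -> int:
    """Estimate how many worker nodes cover a given core requirement.

    All inputs are assumptions supplied by you; check the node sizes
    actually offered in the portal before relying on the result.
    """
    if required_cores < 0 or cores_per_node <= 0:
        raise ValueError("cores must be non-negative and nodes must have cores")
    # Round up so the cluster is never smaller than the requirement.
    return max(minimum_nodes, math.ceil(required_cores / cores_per_node))
```

For example, `worker_nodes_needed(30, 8)` rounds 3.75 up to 4 nodes. Treat the result as a starting point; the right count also depends on memory, data volume and cost.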

 

HTH!

Configure Point-to-Site connectivity for Windows Azure VMs

For DBAs, networking is generally a gray area. I have been trying my hand at a bit of networking while learning Azure, and I wanted to share it so that it can help you when required. Before we get into the point-to-site connectivity configuration, it’s important to understand its usage: it’s generally used to access Azure resources from your on-premises machine.

Point-to-site connectivity is meant for small operations, such as troubleshooting and monitoring, because the gateway bandwidth is limited to 80 Mbps. It’s something like our office VPN, which we use to connect to the office infrastructure from anywhere. When you want to connect your Azure infrastructure to your local datacenter, you need site-to-site connectivity, e.g. if machines need to be connected to Azure resources all the time and the download and upload volume is really high. For that solution, you need a VPN device that supports that bandwidth, or you can optionally use the Windows Server 2012 RRAS feature.
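
To get a feel for what the 80 Mbps limit means in practice, here is a small Python sketch that estimates a best-case transfer time. It assumes decimal units (1 GB = 8,000 megabits) and that the full link speed is available, which real connections rarely achieve. The function name is just for illustration.

```python
def transfer_seconds(size_gb: float, link_mbps: float = 80.0) -> float:
    """Best-case time in seconds to move size_gb gigabytes over the link.

    Assumes decimal units (1 GB = 1000 MB = 8000 megabits) and no
    protocol overhead, so actual transfers will take longer.
    """
    if link_mbps <= 0:
        raise ValueError("link speed must be positive")
    megabits = size_gb * 8 * 1000
    return megabits / link_mbps


if __name__ == "__main__":
    for size in (1, 50, 500):
        hours = transfer_seconds(size) / 3600
        print(f"{size} GB: about {hours:.1f} hours at 80 Mbps")
```

Under these assumptions, a 1 GB file takes about 100 seconds and 500 GB takes almost 14 hours. That is why point-to-site suits troubleshooting rather than bulk data movement.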

Let’s discuss how to access Azure virtual machine resources from an on-premises machine using point-to-site connectivity:

1. Create a network:

image

2. Enter the IP address of the DNS server and select the point-to-site connectivity option:

image

3. Click on Add gateway subnet. The gateway subnet hosts the VPN gateway, and the VPN client on the on-premises machine gets its IP address from the point-to-site address space:

image
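
When picking the address ranges in this step, the range handed out to VPN clients should not overlap the virtual network address space or your on-premises ranges. Here is a minimal Python sketch of that check using the standard `ipaddress` module. All the CIDR ranges and names shown are made-up examples, not values from the screenshots.

```python
import ipaddress


def client_pool_conflicts(client_pool: str, other_ranges: dict) -> list:
    """Return the names of ranges that overlap the VPN client address pool.

    client_pool and the values of other_ranges are CIDR strings of the
    same IP version, e.g. "172.16.0.0/24". The names are labels only.
    """
    pool = ipaddress.ip_network(client_pool)
    conflicts = []
    for name, cidr in other_ranges.items():
        if pool.overlaps(ipaddress.ip_network(cidr)):
            conflicts.append(name)
    return conflicts


if __name__ == "__main__":
    # Hypothetical ranges, for illustration only.
    ranges = {"virtual network": "10.0.0.0/16", "on-premises LAN": "192.168.1.0/24"}
    print(client_pool_conflicts("172.16.0.0/24", ranges))  # no overlap expected
    print(client_pool_conflicts("10.0.1.0/24", ranges))    # overlaps the virtual network
```

An empty list means the chosen client pool is clear of the other ranges.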

4. Once the Network is created, click on the network name and the dashboard will look like this:

image

5. Create the gateway for VPN connectivity by clicking on Create Gateway:

image

Once that is done, the color will change 🙂:

image

6. Now, it’s time to create the certificates for VPN connectivity. If you click on the certificate tab shown in the picture above, it will ask you for:

image

Let’s create certificates to upload here:

Create a self-signed root certificate, which will be uploaded to the site:

makecert -sky exchange -r -n "CN=dbcouncil" -pe -a sha1 -len 2048 -ss My c:\work\dbcouncil1.cer

image

image
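
Before uploading, it can help to confirm that the exported .cer file is the certificate you expect. Here is a minimal Python sketch that computes the SHA-1 thumbprint of the file, which you can compare with the thumbprint shown in the Windows certificate manager. It assumes the file is either DER-encoded or Base64 (PEM) text. The path in the usage note is the one from the command above.

```python
import hashlib
import ssl

PEM_HEADER = b"-----BEGIN CERTIFICATE-----"


def to_der(raw: bytes) -> bytes:
    """Return DER bytes, converting from PEM text when a PEM header is present."""
    stripped = raw.strip()
    if stripped.startswith(PEM_HEADER):
        return ssl.PEM_cert_to_DER_cert(stripped.decode("ascii"))
    return raw


def thumbprint(raw: bytes) -> str:
    """SHA-1 thumbprint (uppercase hex) of a certificate's DER encoding."""
    return hashlib.sha1(to_der(raw)).hexdigest().upper()


def thumbprint_of_file(path: str) -> str:
    """Read a .cer file from disk and return its thumbprint."""
    with open(path, "rb") as handle:
        return thumbprint(handle.read())
```

For example, `thumbprint_of_file(r"c:\work\dbcouncil1.cer")` returns a 40-character hex string. The sketch only hashes the file; it does not validate the certificate.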

7. Create the client certificate. It is kept on the client machine that needs VPN connectivity:

makecert.exe -n "CN=dbcouncil" -pe -sky exchange -m 96 -ss My -in "dbcouncil" -is my -a sha1

image
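
If you need to repeat steps 6 and 7 for several machines, the two makecert calls can be assembled in a script. Here is a minimal Python sketch that builds the same argument lists as the commands above, using only the flags already shown. The helper names are hypothetical, and actually running the commands assumes makecert is on the PATH of a Windows machine.

```python
import subprocess


def root_cert_command(common_name: str, cer_path: str) -> list:
    """Arguments for the self-signed root certificate (step 6)."""
    return ["makecert", "-sky", "exchange", "-r", "-n", f"CN={common_name}",
            "-pe", "-a", "sha1", "-len", "2048", "-ss", "My", cer_path]


def client_cert_command(client_name: str, issuer_name: str) -> list:
    """Arguments for the client certificate signed by the root (step 7)."""
    return ["makecert.exe", "-n", f"CN={client_name}", "-pe", "-sky", "exchange",
            "-m", "96", "-ss", "My", "-in", issuer_name, "-is", "my", "-a", "sha1"]


def run(command: list) -> None:
    """Run one command; raises CalledProcessError if makecert fails."""
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so this is safe anywhere.
    print(" ".join(root_cert_command("dbcouncil", r"c:\work\dbcouncil1.cer")))
    print(" ".join(client_cert_command("dbcouncil", "dbcouncil")))
```

Passing the arguments as a list avoids the quoting problems that come from pasting commands with curly quotes.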

8. Once the certificate is uploaded, download the VPN client:

image

9. Once you install the client, you can find it under connections on your system like this:

image

10. Once you click on it, you will get a connect option – just connect to it:

image

11. Once the connection is established, it looks like this:

image
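
Once the VPN shows as connected, a quick way to confirm that an Azure VM is actually reachable is to try opening a TCP connection to it. Here is a minimal Python sketch of that check. The IP address and port in the usage note are assumptions for illustration; substitute the private IP of your own VM.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False
```

For example, `can_reach("10.0.0.4", 3389)` would test the RDP port on a VM at the hypothetical address 10.0.0.4. A False result can also mean a firewall on the VM is blocking the port, not only that the VPN is down.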

HTH!