SQL Server 2017 on Windows containers – Part 1

I have had various opportunities to work with the datacenter folks who maintain SQL Server environments along with other applications. One of the major challenges they face is patching and maintaining these servers. With the advent of virtualization and cloud technology, virtual machines have become one of the preferred ways to install enterprise applications. Teams maintain the servers and regularly apply patches and install antivirus and other relevant software. System Center makes these tasks much easier; even so, these activities still take 20-30% of their working hours.

There are multiple challenges faced by the datacenter staff:

1. Install the entire OS from scratch in each VM and then install SQL Server on top of it

2. Install antivirus and other software on each VM

3. Patch each of the VMs, and then monitor and manage them

Containers address these pain points.* A container is a lightweight layer of virtualization in which you don't need to install a separate virtual machine for each environment. Instead, you run a container image directly on the host OS, and the container shares the kernel of the host machine. This saves a lot of effort and resources in large environments. However, the pros and cons of both approaches are debatable in terms of security and isolation.

*Note – You can also run container images in Hyper-V isolation mode on top of the host OS. This fits well for multi-tenant systems where the OS kernel needs to be separate.

Before we get into SQL Server installation, it's really important to understand how containers work. As the picture below shows, to run a virtual machine:


1. The operating system, device drivers, antivirus, etc. need to be installed separately, and only then is the actual application, such as SQL Server, installed

2. To start SQL Server, we first need to boot the VM and then start the application, which is a long process that consumes a lot of time and resources

3. A lot of memory and CPU cycles are spent booting the OS and keeping it up and running

With containers, however, it's much simpler:

1. Just install the container on top of the host OS; there is no need to install a VM and OS separately

2. The image boots very quickly because there is no need to load a whole set of OS binaries. On Windows, container base images are available for the Nano Server and Server Core editions

3. It saves a lot of CPU cycles and memory, and it's easier to maintain

Let's try to understand what containers are and how SQL Server 2017 can be installed on top of them.

Common jargon used when talking about containers:

  • Docker Engine
  • Containers
  • Image
  • Docker Hub

Docker Engine – Docker Inc. is the company that started the Docker project in 2013. What a hypervisor is to a virtual machine, Docker Engine is to a container. It's an open source technology, open to community contribution, and it's the core technology for managing container images and for starting and stopping containers. Everything else, like clustering, security, or orchestration, is built around it. Docker Engine needs to be installed to manage containers on both Windows and Linux.
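As a quick sanity check, once Docker Engine is installed you can use the Docker CLI to confirm the engine is running and to look at images and containers. This is just a sketch; the exact output depends on your installation:

```shell
# Verify the client and engine (daemon) versions
docker version

# List the images available locally on the host
docker images

# List containers (-a includes stopped ones, not just running)
docker ps -a
```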

Containers – Containers are logically like virtual machines: a soft virtualization layer running on top of the host OS and sharing its kernel. There is also an option to run them in Hyper-V isolation mode, which is preferred for applications that have to be isolated.

Images – An image is technically the template you use to spin up containers. An image has its base OS, the application you want to use on top of it, and a sandbox that captures the changes. Once a container is stopped, it can be converted to an image by applying the changes captured in the sandbox.
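The container-to-image flow described above can be sketched with the Docker CLI. The container name and the custom image tag below are illustrative:

```shell
# Start a container from a base image (the name "demo" is illustrative)
docker run -d --name demo microsoft/windowsservercore

# ... make changes inside the container (install software, copy files) ...

# Stop the container, then capture its sandbox changes as a new image
docker stop demo
docker commit demo mycustomimage:v1

# The new image now appears locally and can be used to spin up more containers
docker images
```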


Docker Hub – It's the repository where you can push images of your own or download publicly available ones.
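Putting it together, a SQL Server 2017 Windows container can be pulled from Docker Hub and started with a couple of commands. This is a sketch based on Microsoft's published image at the time of writing; the sa password is a placeholder you should replace:

```shell
# Pull the SQL Server 2017 Developer edition image for Windows containers
docker pull microsoft/mssql-server-windows-developer

# Run a container: accept the EULA, set the sa password, and map port 1433
docker run -d -p 1433:1433 -e ACCEPT_EULA=Y -e sa_password=Str0ngP@ssw0rd! microsoft/mssql-server-windows-developer

# Check that the container is up
docker ps
```

You can then connect to the containerized instance with SSMS or sqlcmd on the host's mapped port 1433.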

In the next posts, let's learn how to configure containers for Windows and Linux on a Windows 10 host.



Summary of the learning in the year 2016-2017

It's been really long since I wrote a post. I wanted to write this one to talk about how the last year has been and how I have been trying to evolve in data platform technologies. Last year, I got an opportunity to peep outside the RDBMS world, and I found a whole new dimension of data. I remember attending a session in 2015 that amazed me with the power of data. It was about data platform technologies, where I saw this:


This is the list of different data platform companies working to create products in various categories like relational, NoSQL, caching, etc. It's a long list!

I had heard a lot about NoSQL and Big Data, but I had never had a chance to get into them. Last year, I decided to learn what it's all about. I started learning NoSQL and even wrote a series of posts on NoSQL concepts from a DBA perspective. I went in depth into document stores like MongoDB and DocumentDB and organized a community event on NoSQL. My intent has always been to share what's new in the market and help the community evolve with the latest trends.

For almost eleven years of my career, I have immersed myself in technical learning, and I wanted to learn a bit about architecture and planning. I wanted to learn:

1. What exactly the senior leadership team does
2. How business planning is done
3. How business and technical solutions are designed to solve a business problem
4. How architectures are created

In summary, how the entire business functions and the various types of profiles involved in solving a business problem. I got an opportunity to do the TOGAF certification and grabbed it wholeheartedly. It was a tough challenge to align myself with the business processes to clear this exam, but it was quite interesting to learn about the various types of architectures created to solve a business problem. It was a fabulous experience to learn TOGAF and clear the certification on the first attempt. I wrote a small blog post on this as well.

Later, I moved to Big Data, where I thoroughly learnt the entire Big Data stack on the Microsoft Azure platform, i.e. the Cortana Analytics Suite. This completely changed my point of view on data, and I was blown away by the power of Big Data.


One great advantage of this era of cloud computing is that every technology and product is getting integrated in one place. For the past couple of years, I have been working on SQL Server on Azure and on-premises, and it was the perfect time to extend my learning to Big Data on Azure. I learnt about the Cortana Analytics Suite and did some customer projects as well. Moreover, we organized two events for the SQL Server community in the North India region and wrote some posts to share the understanding.

Another interesting subject is Machine Learning. Oh boy, it's great to understand the power of machine learning and the kind of impact it can bring to various business problems. It's very difficult to master these technologies, but understanding where they fit in a solution is really interesting. Fortunately, I got opportunities with customers for POCs and education deliveries, which helped me delve deeper into these technologies. To enhance my knowledge of big data solutions further, I started preparing for exam 70-475, Designing and Implementing Big Data Analytics Solutions. It covers:

1. Stream Analytics
2. Azure Data Factory
3. Machine Learning
4. HDInsight
5. Azure Data Lake Store
6. Storm/Spark
7. R

The exam was really tough and required lots of hands-on work during preparation. However, I cleared it on the first attempt and picked up lots of learning points along the way. I wanted to write this post to share my non-RDBMS learning in the recent past. It may help data professionals find a new S-curve and learn something new.

I am going to move to a new position in Microsoft; however, the passion for writing and community work will still be on. Soon, there is a plan to have a community meet on SQL on Linux and the new features of SQL Server 2017.

Stay Tuned for more updates!

Build Smart Solutions using Big Data Stack on Microsoft Azure Platform – Build HDInsight Cluster

We have discussed the following components so far:

  • Azure Data Factory –> a transformation service for the Big Data stack on Azure
  • Azure Data Lake Store –> storage for files of any type and format on Azure
  • Azure Data Lake Analytics –> the compute that processes the data on Azure

Now, let's talk about the next managed service, known as Big Data Cluster as a Service. With ADLA, we just write the query, select the parallelism, and execute it, without worrying about what's happening underneath. With an HDInsight cluster, you get virtual machines you can RDP into, where you can write your Hive or Pig queries and manage the cluster's resources.

Let’s see how to create this cluster:



Just type HDInsight in the search textbox and the option to create an HDInsight cluster will appear.


Select the option Hadoop from the drop-down and choose the appropriate options. One of the main screens is:


selecting the number of worker nodes based on the compute requirement. Once the rest of the inputs are completed, click Create, and it'll take roughly 20 minutes to set up your cluster. You can check the progress using the options below in the portal:



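The same cluster can also be created from the command line with the cross-platform Azure CLI instead of the portal. The resource names, location, and password below are placeholders; treat this as a sketch of the `az hdinsight create` command rather than a complete recipe:

```shell
# Create a resource group and a storage account for the cluster (names are illustrative)
az group create --name bigdata-rg --location eastus
az storage account create --name hdidemostore --resource-group bigdata-rg --location eastus

# Create a Hadoop-type HDInsight cluster with 4 worker nodes
az hdinsight create \
  --name hdi-demo-cluster \
  --resource-group bigdata-rg \
  --type hadoop \
  --workernode-count 4 \
  --http-user admin \
  --http-password 'Str0ngP@ssw0rd!' \
  --storage-account hdidemostore
```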

Build Smart Solutions using Big Data Stack on Microsoft Azure Platform – Azure Data Factory (Part 2)

In the previous post, I wrote about creating an ADF account to do the transformations. Now, let's learn how to create pipelines with ADF. I will explain the functionality of ADF in this post and share a link where you can find a demo for creating a pipeline.

There are five major components of ADF:

1. Linked Services
2. Data Sets
3. Activities
4. Pipelines
5. Data Gateways


Linked Services – Linked services are used to link resources to ADF. For example, if we need to process a perfmon file, we first need to create a linked service for the storage it will pick the file from, another linked service for the compute that will process the file, and so on.

Storage – As shown below, there are various options to pick the files from. We need to select the source and the destination of the data. For example, if I need to pick up data from Azure Storage where the files are kept, I will pick Azure Storage as the data store. Similarly, if the transformed data has to be stored in SQL Server or DocumentDB, I will pick the respective option from the same list:


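Behind the portal experience, a linked service is just a small JSON definition. As an illustration, an ADF (v1-style) linked service for an Azure Storage account might look like the following; the name and the connection string values are placeholders:

```json
{
  "name": "StorageLinkedService",
  "properties": {
    "type": "AzureStorage",
    "typeProperties": {
      "connectionString": "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>"
    }
  }
}
```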
Compute – Once the source and destination data stores are selected, we need to select the compute:


In this example, I want to process the file using ADLA U-SQL queries, so I will choose Azure Data Lake Analytics. However, if I had to process the file using Hive or Pig queries, I would have chosen an HDInsight or on-demand HDInsight cluster.

Data Sets – It's the same concept as Reporting Services reports or SSIS packages: we first choose the data source, which may be anything like a SQL Server database or an MS Access DB. Similarly, here, once the source and destination are selected, we need to choose the data set, which can be a file/folder or a table/collection.

In our current scenario, the files exist on Azure Blob Storage and we have already created a linked service for the Blob storage. Therefore, the data set will also be for Azure Blob Storage:


In this data set, you mention the actual file name or folder you want to pick.
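For example, a blob data set pointing at a folder of perfmon files might look like the following ADF v1-style JSON. The folder path, format, availability window, and the linked service name it references are all illustrative:

```json
{
  "name": "PerfmonInputDataset",
  "properties": {
    "type": "AzureBlob",
    "linkedServiceName": "StorageLinkedService",
    "typeProperties": {
      "folderPath": "perfmon/input/",
      "format": { "type": "TextFormat" }
    },
    "availability": { "frequency": "Day", "interval": 1 },
    "external": true
  }
}
```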

Activities – Activities define the actions to be performed on the data. The relationship between all the components is as follows:


Pipeline – Once the source and destination of the file are decided, the file to work on is selected, and the compute is chosen, it's time to create a pipeline with the set of activities for the transformation:

Data Gateway – If you want to move data back and forth between on-premises and Azure, you use the Data Gateway.


Link to demo which you can try yourself – https://azure.microsoft.com/en-us/documentation/articles/data-factory-usql-activity/

References – https://azure.microsoft.com/en-us/documentation/articles/data-factory-introduction/

Build Smart Solutions using Big Data Stack on Microsoft Azure Platform – Azure Data Factory (Part 1)

In the previous posts, I wrote about Azure Data Lake Store (ADLS) and Azure Data Lake Analytics (ADLA). To understand them better, let's say ADLS works as a SAN and ADLA works as compute (a server with RAM and CPUs). Together, they make a great server machine. Now, what is Azure Data Factory?

Azure Data Factory (ADF) is a framework used for data transformation on Azure. Just as we have the SSIS service for on-premises SQL Server, ADF is a transformation service, primarily for Azure data platform services. Let's take the same perfmon analysis example: we need to process the perfmon logs of 500,000 machines on a daily basis.

1. The data has to be ingested into the system – Azure Data Lake Store
2. The data has to be cleaned – Azure Data Lake Analytics / Hive queries / Pig queries
3. The data has to be transformed for reporting/aggregation – Azure Data Lake Analytics / Hive queries / Pig queries / Machine Learning model
4. The data has to be inserted into a destination for reporting – Azure SQL DW or SQL Azure DB

How do we run all these steps in sequence and at regular intervals? ADF is the solution. To use ADF, we first need to create an ADF account:



Once you click Create, you will see this screen:




This is the dashboard we will use to create transformations with ADF. In the next post, I will write about how to create a pipeline for transformation.