I was working on an issue where the restore of a backup was running fast on a low configuration machine but slower on the huge one.Slow configuration machine was a developement server however the other one was a production server with huge configuration.
Normally when we start troublehsooting on such scenarios we check the no. of Vitual log files(VLF) by running Dbcc loginfo command. But this one was making us contemplate that if restore on the development server is fast then there shouldn’t be the issue of huge no. of vlfs.
I am losing track a little bit here to explain you why no. of vlfs is an issue. when we have huge number of vlfs then while redo phase of the restore it takes long time.Even when the database starts up then also it takes long time to bring the database online due the similar reasons. For the later one there is a bug also which is mentioned in the article: http://support.microsoft.com/kb/2455009#
Anyways, lets move forward and find out what happened. I am going to describe all relevant steps which I took while troubleshooting. It may add more information in the blog and yet more knowledge for you.
As mentioned earlier – first point was checking for the no. of VLFs which came to be alright.
1. Then enabled the trace flags : – dbcc traceon (3213,3605,3004, -1)
3213 : – prints the configuration parameter
3605 : – sends the output of the trace flags to the error logs
3004 : – Prints the progress messages while backup and restore and the output was something like :
On the problem server:
02/28/2011 16:27:08,spid80,Unknown,FileHandleCache: 0 files opened. CacheSize: 10
02/28/2011 16:27:08,spid80,Unknown,Restore: LogZero complete
02/28/2011 16:27:08,spid80,Unknown,Restore: Waiting for log zero on Test –>if you see major time is being spent while zeroing the file.
02/28/2011 16:19:45,spid80,Unknown,Zeroing completed on E:DatabaseLogsDevelopmentTEST_1.ldf
02/28/2011 16:1:26,spid80,Unknown,Restore: Transferring data to Test 02/28/2011 16:19:26,spid80,Unknown,Restore: Restoring backup set 02/28/2011 16:19:26,spid80,Unknown,Zeroing E:DatabaseLogsDevelopmentTEST_1.ldf from page 1 to 16496 (0x2000 to 0x80e0000) 02/28/2011 16:19:26,spid80,Unknown,Restore: Containers are ready 02/28/2011 16:19:26,spid80,Unknown,Restore: PreparingContainers
On the development server:
02/28/2011 16:39:17,spid55,Unknown,Restore: LogZero complete –> here it completed within 2 minutes but on problem server it took 7 minutes.
02/28/2011 16:39:17,spid55,Unknown,Restore: Waiting for log zero on Test
02/28/2011 16:37:51,spid55,Unknown,Zeroing completed on C:Program FilesMicrosoft SQL ServerMSSQL.1MSSQLTest_1.ldf
02/28/2011 16:37:50,spid55,Unknown,Restore: Transferring data to Test
02/28/2011 16:37:50,spid55,Unknown,Restore: Restoring backup set
02/28/2011 16:37:50,spid55,Unknown,Zeroing C:Program FilesMicrosoft SQL ServerMSSQL.1MSSQLTest_1.ldf from page 1 to 16496 (0x2000 to 0x80e0000)
2. Based on the output above we got to know that it’s the zeroing of file which is causing the trouble. point to be noted : – it’s an ldf file.
3. Enabling instant file initialization clicked my mind. After reading the article : – http://msdn.microsoft.com/en-us/library/ms175935.aspx I came to know that for ldf we can’t enable the instant file intialization.
4. We suspected that the issue may be with the disks. When the disk tests were run, we came to know that the issue was with the disk throughput and worked on the disk to resolve the issue.