Unable to take tail log backup?

While going through the database forums on MSDN site, I came across a very good question about tail log backups.

As per the article : – http://msdn.microsoft.com/en-us/library/ms179314.aspx, we can take tail log backups even if the database is in offline mode.  Even when my database is corrupt and not able to come online.

The question was why the tail log backup is failing when the database is offline (Essence of the discussion in the forum)

Even I tried myself and found it very interesting. Yes, it was failing for me with the error:
Msg 942, Level 14, State 3, Line 1
Database ‘xxxx’ cannot be opened because it is offline.
Msg 3013, Level 16, State 1, Line 1
BACKUP LOG is terminating abnormally.

Did little bit of research and thought of simulating the database crash and check if it makes any difference. Here is what I did:

1. Created a database Test
2. Took a full database backup to make sure we are able to take the t-log backup
3. Set the database to offline mode
4. Rename the mdf file of the database
5. Bring the database online which fails and the database goes to offline mode.
6. ran a query select * from sys.databases command and checked the state_desc column for 2 databases 1. was which was set to offline and 2. was the one after running the above mentioned steps.
7. Took the tail log backup for the second database and it succeeded.

Now we got to know whatever mentioned in the article http://msdn.microsoft.com/en-us/library/ms179314.aspxis correct.

Now, you may be thinking why the tail backup succeeded for the second database.

The answer is unless the database is marked as recovery_pending the tail log backup can’t be done.  In the first case the database was marked as offline because there was no crash but was a normal offline.  But in the second scenario, It was a crash and database wasn’t recovered due to missing mdf file.

Another reason of why restore of SQL DB took long time to complete

Issue description
I was working on an issue where the restore of a backup was running fast on a low configuration machine but slower on the huge one.Slow configuration machine was a developement server however the other one was a production server with huge configuration.

Normally when we start troublehsooting on such scenarios we check the no. of Vitual log files(VLF) by running Dbcc loginfo command. But this one was making us contemplate that if restore on the development server is fast then there shouldn’t be the issue of huge no. of vlfs.

I am losing track a little bit here to explain you why no. of vlfs is an issue. when we have huge number of vlfs then while redo phase of the restore it takes long time.Even when the database starts up then also it takes long time to bring the database online due the similar reasons. For the later one there is a bug also which is mentioned in the article: http://support.microsoft.com/kb/2455009#

Anyways, lets move forward and find out what happened. I am going to describe all relevant steps which I took while troubleshooting. It may add more information in the blog and yet more knowledge for you.

As mentioned earlier – first point was checking for the no. of VLFs which came to be alright.
1. Then enabled the trace flags : – dbcc traceon (3213,3605,3004, -1)
3213 : – prints the configuration parameter
3605 : – sends the output of the trace flags to the error logs
3004 : – Prints the progress messages while backup and restore and the output was something like :

On the problem server:
02/28/2011 16:27:08,spid80,Unknown,FileHandleCache: 0 files opened. CacheSize: 10
02/28/2011 16:27:08,spid80,Unknown,Restore: LogZero complete

02/28/2011 16:27:08,spid80,Unknown,Restore: Waiting for log zero on Test –>if you see major time is being spent while zeroing the file.

02/28/2011 16:19:45,spid80,Unknown,Zeroing completed on E:DatabaseLogsDevelopmentTEST_1.ldf
02/28/2011 16:1:26,spid80,Unknown,Restore: Transferring data to Test 02/28/2011 16:19:26,spid80,Unknown,Restore: Restoring backup set 02/28/2011 16:19:26,spid80,Unknown,Zeroing E:DatabaseLogsDevelopmentTEST_1.ldf from page 1 to 16496 (0x2000 to 0x80e0000) 02/28/2011 16:19:26,spid80,Unknown,Restore: Containers are ready 02/28/2011 16:19:26,spid80,Unknown,Restore: PreparingContainers

On the development server:
02/28/2011 16:39:17,spid55,Unknown,Restore: LogZero complete –> here it completed within 2 minutes but on problem server it took 7 minutes.

02/28/2011 16:39:17,spid55,Unknown,Restore: Waiting for log zero on Test
02/28/2011 16:37:51,spid55,Unknown,Zeroing completed on C:Program FilesMicrosoft SQL ServerMSSQL.1MSSQLTest_1.ldf
02/28/2011 16:37:50,spid55,Unknown,Restore: Transferring data to Test
02/28/2011 16:37:50,spid55,Unknown,Restore: Restoring backup set
02/28/2011 16:37:50,spid55,Unknown,Zeroing C:Program FilesMicrosoft SQL ServerMSSQL.1MSSQLTest_1.ldf from page 1 to 16496 (0x2000 to 0x80e0000)

2. Based on the output above we got to know that it’s the zeroing of file which is causing the trouble. point to be noted : – it’s an ldf file.
3. Enabling instant file initialization clicked my mind. After reading the article : – http://msdn.microsoft.com/en-us/library/ms175935.aspx I came to know that for ldf we can’t enable the instant file intialization.
4. We suspected that the issue may be with the disks. When the disk tests were run, we came to know that the issue was with the disk throughput and worked on the disk to resolve the issue.