The interesting case where DBCC CHECKFILEGROUP started to require a DB (x) Lock for some DBs after storage migration (SQL2014)

No comments

I was quite puzzled with a situation that happened recently and i find it interesting to share, well it could help someone in the same or similar situation that have bypassed this issue by using DB Snapshots (run integrity checks againts those same snapshots, among other approaches)…

Basically, what started to happen after migration, was that some databases couldn’t have their regular integrity check run due to timeouts (errors such as 5030 / 7926 ):

I built my hypothesis from repro and came to find out, that it was likely that the instance of SQL server was trying to run some validations (the validations running against a volume where the files from the defunct filegroup were;validations for a volume that no longer exists), that imho, I don’t believe it should be done, because there are no Filegroups with Datafiles Online on those same volumes.

I’ve attached a debugger to the sql server instance(dev) (albeit limited with only the public symbols for troubleshooting):

Got the thread ID @ Windows (session_id>task>worker>thread)

Converted this value to hex, leveraged !runaway to find it in windbg

It translated to thread 114, I have done a “couple” of <Step IntoS>, to see if I could find saw anything odd, just the fact that it is trying to lock the database

The observed behavior above came from the the repro bellow, that I believe is similar to what was happening:

I’ve “mimicked” the migration of database file of a database that has defunct files (well, because the problem started when the files were migrated from one location to another due to a storage migration):

Repro Steps:

-- 
ALTER DATABASE ProblemRepro SET OFFLINE WITH ROLLBACK IMMEDIATE
DROP DATABASE ProblemRepro


---
-- Leverage subst to be able to create a virtual drive
-- this is where we will point to defunct files
-- Note: This is a repro, in production this wheren't virtual drives
---
xp_cmdshell 'subst x: C:\TMP\ProblemRepro'
--
-- Create database
-- 3 Filegroups
-- Primary
-- FG_OK
-- FG_NOK = FG that will be defunct


CREATE DATABASE [ProblemRepro]
CONTAINMENT = NONE
ON  PRIMARY 
( NAME = N'ProblemRepro', FILENAME = N'c:\tmp\ProblemRepro\ProblemRepro.mdf' , SIZE = 5120KB , FILEGROWTH = 1024KB ), 
 FILEGROUP [ProblemRepro_FG_NOK] 
       -- used subst to mount a folder on a drive
( NAME = N'ProblemRepro_DF_NOK', FILENAME = N'c:\tmp\ProblemRepro\ProblemRepro_DF_NOK.ndf' , SIZE = 5120KB , FILEGROWTH = 1024KB ), 
 FILEGROUP [ProblemRepro_FG_OK] 
( NAME = N'ProblemRepro_DF_OK', FILENAME = N'c:\tmp\ProblemRepro\ProblemRepro_DF_OK.ndf' , SIZE = 5120KB , FILEGROWTH = 1024KB )
LOG ON 
( NAME = N'ProblemRepro_log', FILENAME = N'c:\tmp\ProblemRepro\ProblemRepro_log.ldf' , SIZE = 2048KB , FILEGROWTH = 10%)
GO
USE [ProblemRepro]
GO
IF NOT EXISTS (SELECT name FROM sys.filegroups WHERE is_default=1 AND name = N'PRIMARY') ALTER DATABASE [ProblemRepro] MODIFY FILEGROUP [PRIMARY] DEFAULT
GO

-- create tables

CREATE TABLE ok (campo INT ) ON ProblemRepro_FG_OK

INSERT INTO ok SELECT 11 
GO 10

CREATE TABLE nok (campo INT ) ON ProblemRepro_FG_NOK

INSERT INTO nok SELECT 11 
GO 10


ALTER DATABASE ProblemRepro SET OFFLINE
-- alter file to a drive that will not exist
-- this will mimic the defunct
ALTER DATABASE ProblemRepro
MODIFY FILE (NAME='ProblemRepro_DF_NOK',FILENAME='X:\ProblemRepro_DF_NOK.ndf')

ALTER DATABASE ProblemRepro SET ONLINE

-- Now backup, time to run backups

BACKUP DATABASE ProblemRepro
FILEGROUP = N'Primary' 
TO DISK = N'c:\tmp\P1.bak' 
WITH NOFORMAT, NOINIT, COMPRESSION, STATS = 25
GO
BACKUP DATABASE ProblemRepro
FILEGROUP = N'ProblemRepro_FG_OK' 
TO DISK = N'c:\tmp\OK1.bak' 
WITH NOFORMAT, NOINIT, COMPRESSION, STATS = 25
GO
BACKUP DATABASE ProblemRepro
FILEGROUP = N'ProblemRepro_FG_NOK' 
TO DISK = N'c:\tmp\NOK1.bak' 
WITH NOFORMAT, NOINIT, COMPRESSION, STATS = 25
GO
BACKUP Log ProblemRepro
TO DISK = N'c:\tmp\log1.trn' 
WITH NOFORMAT, NOINIT, COMPRESSION, STATS = 25
GO
--
-- Now, Time to Drop database
--
ALTER DATABASE ProblemRepro SET OFFLINE with  ROLLBACK IMMEDIATE
DROP DATABASE ProblemRepro


-- Start the restores, leave one filegroup with Defunct files
-- Delete files from directories !!!
RESTORE DATABASE ProblemRepro FILEGROUP='PRIMARY' FROM DISK = N'c:\tmp\P1.bak'    
  WITH PARTIAL, NORECOVERY


  RESTORE DATABASE ProblemRepro FILEGROUP='ProblemRepro_FG_OK' FROM DISK = N'c:\tmp\ok1.bak'    
  WITH  NORECOVERY


  --Skip this filegroup, to mimic the problem
  --RESTORE DATABASE ProblemRepro FILEGROUP='ProblemRepro_FG_NOK' FROM DISK = N'Nok.bak'    
  --WITH  NORECOVERY


  RESTORE LOG ProblemRepro FROM DISK = N'c:\tmp\log1.trn'    
  WITH  RECOVERY


  -- remove old "fake" mountpoint"
  xp_cmdshell 'subst x: /D'


  -- Simulate the fact there is no filegroup 
  ALTER DATABASE ProblemRepro REMOVE FILEGROUP  ProblemRepro_FG_NOK


  -- Run the first dbcc checkfilegroup
  -- it should run in less than one second
  USE  ProblemRepro
  GO
  DBCC CHECKFILEGROUP (ProblemRepro_FG_OK)


  -- Now, run again, but open another session on the database, having more than 1 session this database to generate the blocking
  DBCC CHECKFILEGROUP (ProblemRepro_FG_OK)

Happy troubleshooting!

-PC.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s