Monday, 8 February 2010

Keeping your database files tidy

Behind every database there are database files, traditionally they are denoted by one of 3 file extensions - mdf is the main data file, ldf is the main log file and ndf is used for any additional log files. one mdf and one ldf are a minimum requirement for a SQL Server database.

These files are locked by SQL Server when it is running, thus ensuring that the only updates go through the database engine and are therefore managed correctly. It is possible to separate SQL Server and the database files in a number of ways, one of which is to detach the database using sp_detach_db. This removes the database from the SQL Server and releases the database files to be accessed by any other process that can reach them. It may be that you were moving the database from one server to another or what ever circumstance but it is possible to leave the database files on your server and forget they are there.

This means a number of things:
1 - the files are taking up HDD space and potentially being backed up by your nightly processes (you are backing up regularly arent you?)
2 - the files can be accessed by any editor that can get to the folder location where they are stored. Notepad can open an mdf file. Dont believe me?

open up your favourite T-SQL editor and run this:
CREATE DATABASE DETACHTEST
GO
USE DETACHTEST
GO
Create table TestTable
(Myname varchar(30))
go

insert into testtable
select 'Fatherjack'
go

select * from TestTable
go

use master
go

sp_detach_db 'DETACHTEST'


That will have created a database, created a table in it, inserted data and then detached the database.

Now open up windows explorer and go to you default SQL data folder, in there you will find a file called DETACHTEST.mdf, open that in notepad or any other text editor and search for the string "fatherjack". There it is, in plain text. I agree its not readily usable and a database of 00's of tables and 000,000's of rows will make data extraction difficult without a SQL Server but it is wholly possible and therefore wholly insecure to have these files lying around on your server.

To find any of these files that may have got left behind we need a process to locate and remove them. My preferred method to do this involves a small script to be run in LogParser and then a nifty join to some live SQL data and a look at the date the files were last used.

Here is the LogParser script to get the file name, location, last access time and last write time from any folder up to 10 levels down from the root on the C:\ and D:\ drives where the file extension is either mdf or ldf:
logparser "SELECT path, name, lastaccesstime, lastwritetime into datafiles from '\\London\d$\*.mdf', '\\London\c$\*.mdf','\\London\c$\*.ldf','\\London\d$\*.ldf'" -i:fs -recurse:10 -o:sql -database:AdminDB -server:SQLLondon -createtable:on -e:10

(Where London is the windows server and SQLondon is the SQL Server instance installed.)

Here is the SQL to work with the data that LogParser retrieves and give us a hit list of files to track down and deal with:
USE [CentralDB]
GO
SELECT DISTINCT
COALESCE(mf.name, 'Not attached to a database. ***') AS [Database Name],
[d].[Path],
[d].[Name],
[d].[LastAccessTime],
[d].[LastWriteTime]
FROM [dbo].[datafiles] AS d
LEFT JOIN [BOA\PARIS].[master].[sys].[master_files] AS mf ON REPLACE(REPLACE([d].[Path], '\\boa\d$\', 'd:\'), '\\boa\c$\', 'c:\') = [mf].physical_name
UNION
SELECT '-----',
'--- Items above here not accessed in the last day --',
'-----',
DATEADD(d, -1, GETDATE()), -- 24 hours ago
''
ORDER BY [LastAccessTime],
[Database Name]
The resulting data will be a list of database files that have not been accessed recently and whether they are attached to a database on the server. You may wish to adjust the value I have used in the lower part of the query to suit your environment - 24 hours may not be appropriate to you.

No comments:

AddThis

Bookmark and Share
 
Google Analytics Alternative