
Sunday, 18 April 2010

SQLBits VI

Date: Friday 16th April 2010
Location: Church House Westminster
Event: SQLBits VI

Having been to one SQLBits previously, I was very keen to register and attend this event. This time I was accompanied by my wife (she's also a DBA/developer, so it wasn't a case of having to fund a shopping trip too!) and, travelling from Devon, we stayed over in London the previous night. The accommodation recommended by SQLBits, The Grange Rochester, was excellent; I had hoped it was a spelling mistake and we were at The Dorchester, but sadly that wasn't the case!

We had a relaxed start to the day and walked for little more than 10 minutes to get to the venue. We easily found our way to the registration area, got our goody bags (more on them later) and then went on to the exhibition hall. We had plenty of time to grab a coffee and sausage sandwich, meet some people from the last SQLBits and check out a couple of sponsors' stands before the first session of the day.

Our session plan for the day was:
Monitoring and Tuning Parallel Query Execution - R Meyyappan - link
Inside the SQL Server Query Optimizer - C Cunningham - link
Performance and Scalability through normalisation and set theory - T Rogerson - link
Lunch (inc sponsored session)
Denormalisation - Having your cake and eating it - M Whitehorn and Y Ahmed - link
Optimising Tempdb and temporary object use - C Bolton - link
High Performance Functions - S Sabin - link


Monitoring and Tuning Parallel Query Execution - R Meyyappan - link
Ramesh was on fine form again and we really enjoyed his presentation, picking up some neat tips and techniques for handling situations where the query optimiser recommends (or uses) parallelism in a non-optimal way. Sadly the session was a bit rushed, as the first few minutes were taken up by announcements about the day ahead and changes to sessions after flights were cancelled when the Icelandic volcanic ash cloud closed British airspace. The DVD handed out at the end of the session should fill in the gaps though.

Inside the SQL Server Query Optimizer - C Cunningham - link
Conor was a real treat to listen to, with his accounts of how the Query Optimiser was built to work and how the team managed to account for so many different ways of executing a query and then pick the best. Well, not actually the very best, but it picks a pretty good one pretty much every time - it's a case of getting as close as possible in a reasonable amount of time so that actually executing the query isn't held up too long. Oh, he also mentioned that he worked on the dynamic management views framework, so make sure you take a minute to thank Conor when you next use one.

Performance and Scalability through normalisation and set theory - T Rogerson - link
Tony started out showing us what he had in mind for his talk - it filled the screen with about 30 different headings. We covered a good number of them, based around normalising data and how it can benefit both the integrity of the data stored and the performance and scalability of queries against it.

Lunch (inc sponsored session - Improving database development with source control)
We attended the Red Gate sponsored lunch session, where Stephanie Herr and David Simner took us through the way that all of Red Gate's SQL Toolbelt applications work together to ease the work of a DBA and provide continuous integration. We had a preview of the new SQL Source Control application, which maintains database versioning details while you work in SSMS. All the scripting is done in the background and it supports SVN or TFS source control. A really neat product, and they are looking for EAP volunteers - get a copy of SQL Source Control here and help them develop the application the way you want it to work.

Denormalisation - Having your cake and eating it - M Whitehorn and Y Ahmed - link
We attended this session hoping to get some tips and ideas on a good policy for denormalising data and how to go about it in a structured and well-ordered way. Sadly it was a session that dealt almost wholly in theory and, apart from hearing an interesting discussion about what qualifies as 1NF, 2NF, 3NF and so forth, we didn't get as much as we had hoped from it.

Optimising Tempdb and temporary object use - C Bolton - link
Christian Bolton gave us a peek into the mind of a SQL Server Master with his talk about optimising your server with various tempdb settings, when to take action like this and when it is not advisable. We had details about latches, what trace flag 1118 used to do and why it's no longer needed, the sizing of tempdb, how many files to have and so on.

High Performance Functions - S Sabin - link
Simon's session on using functions was the last one of the day and it started with creating an SSMS keyboard shortcut (Tools|Options|Environment|Keyboard and adding sp_helptext in the Query Shortcuts grid). Next time you highlight an object name in your script and press Ctrl + 3 - hey presto, the results window opens with the output of sp_helptext for that object. A nice time saver. We then heard about how scalar functions suck and how it's better to replace them with table-valued functions wherever possible, with the usual selection of clear examples from Simon (a rough sketch of the idea is below).
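As an illustration only (these are not Simon's examples - the function, table and column names here are invented): a scalar UDF is evaluated row by row, whereas the same logic in an inline table-valued function used with CROSS APPLY can be expanded into the main query plan.

-- Hypothetical scalar function, evaluated once per row
CREATE FUNCTION dbo.GetNetPrice ( @Price MONEY, @VatRate DECIMAL(5, 2) )
RETURNS MONEY
AS
BEGIN
RETURN @Price / ( 1 + @VatRate / 100 ) ;
END
GO

-- The same logic as an inline table-valued function
CREATE FUNCTION dbo.GetNetPriceTVF ( @Price MONEY, @VatRate DECIMAL(5, 2) )
RETURNS TABLE
AS
RETURN ( SELECT @Price / ( 1 + @VatRate / 100 ) AS NetPrice ) ;
GO

-- Usage: CROSS APPLY lets the optimiser fold the calculation into the plan
SELECT o.OrderID ,
np.NetPrice
FROM dbo.Orders AS o
CROSS APPLY dbo.GetNetPriceTVF(o.Price, o.VatRate) AS np ;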

When we got back to the main conference lobby the exhibitors' stands had been dismantled and there was beer and pizza to keep us going while we played air hockey, giant Jenga, Xbox and giant Connect 4, or just chatted. A short prize-giving was held and then people started to drift off home.

Another great SQLBits, I would like to thank all of the organisers, speakers and sponsors for their not inconsiderable contributions to make this event happen and run as smoothly as it did.



Please be sure to read the code disclaimer page before you use any code in this blog.

Monday, 8 March 2010

The varying cost of free stuff

That doesn't sound right does it? If something is free it doesn't cost anything and if you have lots of those things they all cost the same - right?

Well, not entirely. Let me explain in a little more detail. I am learning PowerShell, as you may well know if you have seen any of my other recent blog items, and so am looking around the internet to find code samples and try out different methods of working with PS scripts. I have previously mentioned PowerShellAnalyser, so far the best PS script editor I have found, and the fact that it's free.

In contrast, I also found a site that supports SQL DBAs with various applications/tools to make their job easier by automating all the important tasks. They have a link to a set of PowerShell scripts that they have accumulated, presumably verified and declared worthy of distribution to the SQL community. The route to them from their own (paid) PS editor details page is via a "Free Scripts" link, where you get another option to get their paid editor and click on another link to "Download Now". You then reach a personal details page where 12 pieces of data are required (only 1 is optional) and there are 2 checkboxes to discourage/encourage follow-up sales calls/emails. Once this data has been verified - yes, there is intense validation of the data formats you input - you get to a page that says
"Thank you! You will soon receive a confirmation email with the download link to the XX Company name product(s) you selected.".
I went through this process a while ago and thought 'great, when I get a chance I'll download the scripts and see what they do'. When I got the chance I opened the email, clicked the link and the browser opened a page that said
"Downloads.Your download link has expired. Please email support to request reactivation of your link. For the fastest response, please ensure that you send from the email address you used when registering for the download. Re-activation requests will be handled in 1 business day or less."
Gah, this is starting to hurt now; I'm building a dislike for a product I haven't even seen - I had to submit a stupid amount of detail to get nothing a week ago, and now that I have come to take a proper look at the product it's been taken from me. I email support. Support reply: "just download the product again". OK, building a proper dislike of the whole company now. In all honesty, I can't be bothered. My time is not worth a lot in the grand scheme of things, but it's worth more than dancing through their hoops. Whatever paid products they have, they are unlikely to see me on their future customers list. If I get any emails from them following this then the domain name is going on the spam list. Just as my curiosity was piqued, I compared the URL from my first download attempt with the one from my trial today. There have been just under 2000 people getting downloads in this fashion since my first attempt, and 330 in the last 48 hours. How do I know? The link is simply their domain with a querystring parameter called CustomerID that has an incremental integer value. I wonder what their conversion rate from enquiries to sales is? Perhaps their paid-up users are simply more patient people than I am.

By contrast, the PowerShellAnalyser download is actually making its way to my PC with two clicks from the main page.

I understand there are different business models, but Shell Tools have endeared themselves to me by making their free stuff freely available. They have no idea who I am or what I have done on their site, but I would go to them if I had a need that one of their future products could fix, simply because of the help they gave me in using PSA for free without any strings. I don't feel stalked or tracked, or like I have somehow given away a load of information for no good reason. There are other companies that have paid-for products but also make some tools available for free with little or no data capture attempts. They are the sort of companies that make me a friend, and I want to support their commitment to giving something to the community.

It's possible either product would have been useless or brilliant. PSA is brilliant AND free; the other tool would have had to be brilliant in order to balance out what I have already given that company.

While I am at it, another tool that I use regularly that is free but does a great job is Paint.net - an image manipulation tool that punches way above its weight. This product has a PayPal link in its menus and on the site, as it is a project that isn't associated with any business endeavour. I wholeheartedly recommend Paint.net too.

Please be sure to read the code disclaimer page before you use any code in this blog.

Thursday, 4 March 2010

Getting to SQL Server from PowerShell script interface

Right, I'm progressing with PowerShell a little at a time. I still don't have any books, but whenever I need to reference SQL Server information (not data - I am only getting info on server settings, not accessing customer database data) I try the PS route first.

In order to connect to a SQL Server from the command line you need to add two PS snap-ins. To do that you issue the commands add-PSSnapin sqlservercmdletsnapin100 and add-PSSnapin sqlserverprovidersnapin100.

Today I am looking at collation on my server, so I first need to 'map a drive' using new-PSDrive -name ~Drivename~ -psprovider SQLServer -root sqlserver:\SQL\~servername~\~instancename~, then change my current location to that server using set-location ~Drivename~:\databases, and then simply use get-ChildItem | Select-Object name, collation to get a list of database names on that SQL instance and their collation settings.
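Putting those steps together, a minimal sketch (the drive, server and instance names are placeholders to replace with your own):

# Load the SQL Server 2008 snap-ins (cmdlets and provider)
add-PSSnapin sqlservercmdletsnapin100
add-PSSnapin sqlserverprovidersnapin100

# Map a PS drive to the instance - 'SQLDrive', 'MyServer' and 'MyInstance' are placeholders
new-PSDrive -name SQLDrive -psprovider SqlServer -root sqlserver:\SQL\MyServer\MyInstance

# Move to the databases node and list each database with its collation
set-location SQLDrive:\databases
get-ChildItem | Select-Object name, collation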

As always, if you want to see the extra switches or other information about a PS command then simply type get-help ~pscommand~ for the command you are interested in, e.g. get-help new-PSDrive will return the help for the new-PSDrive command.

Please be sure to read the code disclaimer page before you use any code in this blog.

Wednesday, 4 November 2009

A relevant choice of Index options?

I have recently had reason to review the indexing of a table as a query was taking a lot longer to run than I would have expected.

My first move was to check out the estimated execution plan for the SQL that the view was based on. It turned out that a large chunk of the effort in the query was an index scan on a table that had 3 indexes. The indexes' details were: a primary key (unique, clustered), a single-column index (based on one of the most common client references in our system) and a third index on some key columns that describe our clients. The primary key column was an INT IDENTITY column - basically just a unique ID in the table for absolute reference to a single row if needed. It plays no part in our business.

Have you spotted what's wrong here?

Exactly - having the ID column as the CLUSTERED index means the table is sorted on disk in the order of that column. This is like having a library sort its books by the height of the respective authors - useless when you need to find anything; you would have to run up and down the shelves to find what you need.

I changed the PRIMARY KEY to be non-clustered and then changed the single-column index to be the CLUSTERED index for the table, and the performance of the query went from 7s down to 1s (YMMV). WIN!!
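In script form the swap looks roughly like this - the table, column and index names below are made up purely for illustration, not the real objects from our system:

-- Drop the existing clustered primary key (hypothetical names)
ALTER TABLE dbo.Clients DROP CONSTRAINT PK_Clients ;

-- Recreate the primary key as non-clustered
ALTER TABLE dbo.Clients ADD CONSTRAINT PK_Clients PRIMARY KEY NONCLUSTERED ( ID ) ;

-- Cluster the table on the column the queries actually filter and sort on
CREATE CLUSTERED INDEX IX_Clients_ClientRef ON dbo.Clients ( ClientRef ) ;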

This got me thinking about whether this scenario might be repeated in other tables or other databases on our servers, so I have written the query below to track down indexes that are clustered primary keys and return the table/column name and the data type of that column. Simply reviewing the index name (columns called ID are suspicious!) and whether it's an INT IDENTITY column means I can check out anything that looks awry and improve our systems' performance.

SELECT
OBJECT_NAME([i].[object_id]) AS [Table] ,
[i].[name] AS [Index name] ,
[c].[name] AS [Column name] ,
[t].[name] AS [Column type] ,
[c].[is_identity]
FROM
[sys].[indexes] AS i
INNER JOIN [sys].[index_columns] AS ic
ON ic.[object_id] = i.[object_id]
AND ic.[index_id] = i.[index_id]
INNER JOIN [sys].[columns] AS c
ON c.[object_id] = ic.[object_id]
AND c.[column_id] = ic.[column_id]
INNER JOIN [sys].[types] AS t
ON [t].[user_type_id] = [c].[user_type_id] -- join on user_type_id to avoid duplicate rows for alias types
WHERE
[i].[is_primary_key] = 1
AND [i].[type] = 1 -- 1 = clustered

Personally, where the column type is an INT and is_identity is true, I am off to look into making similar changes across our servers...

Monday, 5 October 2009

How to group time data by time periods

This solution builds on using a 'numbers' or 'tally' table as championed by Jeff Moden here http://www.sqlservercentral.com/articles/T-SQL/62867/. Read that article for the full description of how to use set based solutions for efficient SQL queries.

Here I am considering a large log table with time-based entries and how to aggregate the data into various time bands. For example, if you want to see how many visitors your website had in 10-minute bands through the day, or how many application errors you get in the server log per hour, and so on...

-- First things first we will create the Numbers table
------------------------------------------------------
-- Things to note:
-- We are using a temp table in this instance as it is a small one but it is often the case that DBAs add a numbers table to a production database or have one in their metadata database that they can use.
-- Select a sensible number of rows to insert - relevant to the task in hand. If you are dealing with data that you want to analyse over a year and it's date-based then 1000 rows will be more than ample; it's excessive to use 1,000,000! Especially in this instance where it's a temp table - remember how temp tables work and how this might affect your system.

--DROP TABLE #numbers
IF OBJECT_ID('tempdb..#Numbers') IS NOT NULL
BEGIN
RAISERROR('There is already a table called #Numbers, please use that or select a different name.',0,0,1) WITH NOWAIT
END
ELSE
BEGIN
CREATE TABLE #Numbers
(
Num INT NOT NULL ,
PRIMARY KEY CLUSTERED ( Num )
)
RAISERROR('Numbers table created.',0,0,1) WITH NOWAIT
-- Insert values up to a relevant limit
-------------------------------------------------
INSERT INTO #numbers
SELECT TOP ( 1500 )
ROW_NUMBER() OVER ( ORDER BY c1.column_id )
FROM
master.sys.all_columns c1
CROSS JOIN master.sys.all_columns c2

RAISERROR('Numbers table filled.',0,0,1) WITH NOWAIT
END



-- Right the numbers table is ready for us, lets set some local values
DECLARE @Start DATETIME -- when we want to start the analysis
DECLARE @interval INT -- the grouping period we want to use


SET @interval = 60
-- where 10 = 10 minutes; 30 = half-hourly; 60 = hourly; 240 = 4-hourly; 720 = half-daily; 1440 = daily
SET @Start = '05-Oct-2009 00:00' ;
-- any date you want as the start date

-- Using the numbers table we can create a CTE that has the start and end times over which we want to analyse the data
WITH times
AS ( SELECT n.Num,
DATEADD(n, num - @interval, @Start) AS [LowerTime],
DATEADD(n, num, @Start) AS [UpperTime]
FROM [#Numbers] AS n
WHERE [n].[Num] % @interval = 0 -- use modulo to get required intervals

)
-- simply join our source data to the CTE using the date column in the join
SELECT [times].[LowerTime] AS [Interval Start],
COUNT(*) AS [Sessions]
FROM [dbo].[Log] AS dl
INNER JOIN [times] ON [StartTime] BETWEEN [times].[LowerTime]
AND [times].[UpperTime]
GROUP BY [LowerTime]
ORDER BY [LowerTime] desc


Your results will look something like this:

Interval Start              Sessions
-----------------------     --------
2009-10-05 11:00:00.000     320
2009-10-05 10:00:00.000     2892
2009-10-05 09:00:00.000     2490
2009-10-05 08:00:00.000     2264
2009-10-05 07:00:00.000     1249
2009-10-05 06:00:00.000     106
2009-10-05 03:00:00.000     1
2009-10-05 02:00:00.000     7


Simply adjusting the value for @interval will group your day in smaller or larger periods.

It is simple from here to adjust the script to calculate days or months rather than minutes, and it could all be added into a stored procedure to provide a way for users to query their own data via some interface that allows them to supply a parameter for the interval they need - this might be Reporting Services, Excel or a bespoke interface. A rough sketch of the stored procedure idea follows.
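This is a minimal sketch only - it assumes a permanent dbo.Numbers table (as mentioned in the comments above) and reuses the dbo.Log / StartTime names from the example, so adjust to suit your own schema:

CREATE PROCEDURE dbo.GetSessionsByInterval
    @Start DATETIME ,
    @Interval INT -- minutes per band: 10, 30, 60, 240, 720, 1440...
AS
BEGIN
    SET NOCOUNT ON ;

    WITH times
    AS ( SELECT DATEADD(n, Num - @Interval, @Start) AS [LowerTime] ,
                DATEADD(n, Num, @Start) AS [UpperTime]
         FROM dbo.Numbers AS n
         WHERE n.Num % @Interval = 0
       )
    SELECT [times].[LowerTime] AS [Interval Start] ,
           COUNT(*) AS [Sessions]
    FROM [dbo].[Log] AS dl
    INNER JOIN [times] ON dl.[StartTime] BETWEEN [times].[LowerTime]
                                         AND     [times].[UpperTime]
    GROUP BY [times].[LowerTime]
    ORDER BY [times].[LowerTime] DESC ;
END
GO

-- e.g. half-hourly bands for 5th October 2009:
-- EXEC dbo.GetSessionsByInterval @Start = '05-Oct-2009 00:00', @Interval = 30 ;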

Wednesday, 6 May 2009

How to find if and where your table is replicated

Another in the sporadic series of 'I need to do something but I can't find how on the internet'.

I have a table that I think might be involved in replication but I don't know which publication it's in...

We know the table name - 'MyTable'.
We have replication running on our server and it's replicating our database, or part of it - 'MyDatabase'.
We need to know if the table is replicated and, if so, which publication will need to be reviewed if we make changes to the table.

How?

USE MyDatabase
GO

/* Lots of info about our table but not much that's relevant to our current requirements */
SELECT
*
FROM
sysobjects
WHERE
NAME = 'MyTable'

-- mmmm, getting there
/* To quote BOL - "Contains one row for each merge article defined in the local database. This table is stored in the publication database."

interesting column is [pubid]
*/
SELECT
*
FROM
dbo.sysmergearticles AS s
WHERE
NAME = 'MyTable'

-- really close now
/*
the sysmergepublications table - Contains one row for each merge publication defined in the database. This table is stored in the publication and subscription databases.

so this would be where we get the publication details
*/
SELECT
*
FROM
dbo.sysmergepublications AS s
WHERE
s.pubid = '2876BBD8-3D4E-4ED8-88F3-581A659E8144'

-- DONE IT.
/*
Combine the two tables above and we get the information we need
*/
SELECT
s.[name] AS [Publication name]
FROM
dbo.sysmergepublications AS s
INNER JOIN dbo.sysmergearticles AS s2 ON s.pubid = s2.pubid
WHERE
s2.NAME = 'MyTable'

So I now know which publication my table belongs to, and therefore which publication will need to be reviewed before I make any changes to the table.
