Monday 5 October 2009

How to group time data by time periods

This solution builds on using a 'numbers' or 'tally' table as championed by Jeff Moden here http://www.sqlservercentral.com/articles/T-SQL/62867/. Read that article for the full description of how to use set based solutions for efficient SQL queries.

Here I am considering a large log table that has entries that are time based and how to aggregate data based on various time bandings. For example if you want to see how many visitors your website had in 10 minute bands through the day or how many application errors you get in the server log per hour and so on...

-- First things first we will create the Numbers table
------------------------------------------------------
-- Things to note:
-- We are using a temp table in this instance as it is a small one but it is often the case that DBAs add a numbers table to a production database or have one in their metadata database that they can use.
-- Select a sensible number of rows to insert - relevant to the task in hand. If you are dealing with data that you want to analyse over a year and its date based then 1000 rows will be more than ample, its excessive to use 1,000,000! Especially in this instance where its a temp table, remember how temp tables work and how this might affect your system.

--DROP TABLE #numbers
IF OBJECT_ID('tempdb..#Numbers') > 0
BEGIN
RAISERROR('There is already a table called Numbers, please use that or select a different name.',0,0,1) WITH NOWAIT
END
ELSE
BEGIN
CREATE TABLE #Numbers
(
Num INT NOT NULL ,
PRIMARY KEY CLUSTERED ( Num )
)
RAISERROR('Numbers table created.',0,0,1) WITH NOWAIT
-- Insert values to a given (sic Relevant) value
-------------------------------------------------
INSERT INTO #numbers
SELECT TOP ( 1500 )
ROW_NUMBER() OVER ( ORDER BY c1.column_id )
FROM
master.sys.all_columns c1
CROSS JOIN master.sys.all_columns c2

RAISERROR('Numbers table filled.',0,0,1) WITH NOWAIT
END



-- Right the numbers table is ready for us, lets set some local values
DECLARE @Start DATETIME -- when we want to start anlysis
DECLARE @interval INT -- the grouping period we want to use


SET @interval = 60
--where 10=10 minutes;30=half hourly;60=hourly;240=4hourly;720=halfdaily;1440 = daily
SET @Start = '05-Oct-2009 00:00' ;
-- any date you want as the start date

-- Using the numbers table we can create a CTE that has the start and end times over which we want to analyse the data
WITH times
AS ( SELECT n.Num,
DATEADD(n, num - @interval, @Start) AS [LowerTime],
DATEADD(n, num, @Start) AS [UpperTime]
FROM [#Numbers] AS n
WHERE [n].[Num] % @interval = 0 -- use modulo to get required intervals

)
-- simply join our source data to the CTE using the date column in the join
SELECT [times].[LowerTime] AS [Interval Start],
COUNT(*) AS [Sessions]
FROM [dbo].[Log] AS dl
INNER JOIN [times] ON [StartTime] BETWEEN [times].[LowerTime]
AND [times].[UpperTime]
GROUP BY [LowerTime]
ORDER BY [LowerTime] desc


Your results will look something like this:









IntervalStartSessions
2009-10-0511:00:00.000320
2009-10-0510:00:00.0002892
2009-10-0509:00:00.0002490
2009-10-0508:00:00.0002264
2009-10-0507:00:00.0001249
2009-10-0506:00:00.000106
2009-10-0503:00:00.0001
2009-10-0502:00:00.0007


Simply adjusting the value for @interval will group your day in smaller or larger periods.

It is simple from here to adjust the script to calculate days or months rather than minutes and it could all be added into a stored procedure to provide a way for users to query their own data via some interface that allows them to supply a parameter for the interval they need - this may be Reporting Services, Excel or a bespoke interface.

No comments:

AddThis

Bookmark and Share
 
Google Analytics Alternative