Purging data & Partitioning for Paupers
Several months ago at work we started having some terrible problems with some jobs that purge old data from our system. These jobs were put into place before my time, and while fine at the time, were now causing us some big problems. Purging data would take hours and cause horrendous blocking while they were going on.
The original solution consisted of a simple trigger and an archive table something like below;
CREATE TABLE dbo.SomeTable
(
	id INTEGER IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
	created DATETIME NOT NULL,
	url VARCHAR(100) NOT NULL,
	html NVARCHAR(MAX) NOT NULL
);
CREATE TRIGGER trg_InsRecord
   ON dbo.SomeTable
   AFTER INSERT
AS
BEGIN
	-- SET NOCOUNT ON added to prevent extra result sets from
	-- interfering with SELECT statements.
	SET NOCOUNT ON;
    INSERT INTO dbo.SomeArchiveTable
    (
		id,
		created,
		url,
		html
	)
	SELECT id,
		   created,
		   url,
		   html
	FROM inserted;
END
GO
CREATE TABLE dbo.SomeArchiveTable
(
	id INTEGER NOT NULL PRIMARY KEY CLUSTERED,
	created DATETIME NOT NULL,
	url VARCHAR(100) NOT NULL,
	html NVARCHAR(MAX) NOT NULL
);
Probably a setup many DBAs have seen before. In the dbo.SomeTable we keep just a few hours of data while in dbo.SomeArchiveTable seven days of data was kept. The SQL Agent job to purge data ran nightly and was just a simple delete statement;
DELETE FROM dbo.SomeArchiveTable WHERE created <= DATEADD(dd, -14, GETDATE());
This was fine while the table was small (yes created was indexed) but over time, as the business grew, the amount of data increased from a few gigabytes to hundreds of gigabytes. The job was taking anywhere between 2 and 5 hours, locking the entire database, and causing client timeouts. My SQL Server probably felt like this;
A temporary solution was to delete the data in small batches to prevent lock escalation.
DECLARE @rc INTEGER = -1;
WHILE(@rc != 0)
BEGIN
	-- Delete 4999 rows to prevent lock escalation
	-- http://msdn.microsoft.com/en-us/library/ms184286.aspx
	DELETE TOP (4999)
	FROM dbo.SomeArchiveTable
	WHERE created <= DATEADD(dd, -7, GETDATE())
	SET @rc = @@ROWCOUNT;
END
This solved the horrendous blocking issue but our SQL Server still wasn’t happy with several hours of heaving Disk I/O each night.
I’ve read a lot of great blogs about SQL Server Partitioning and thought it could be of use here. Essentially these blogs demonstrate how to use partitioning to make dealing with large amounts of data easier.
Partitioning is an Enterprise only feature and we run standard in production. Do we upgrade to Enterprise and pay for a Microsoft executives bonus?
With a little TSQL trickery we can stick with standard edition but get partition-like benefits.
Say we split our archive table into seven new tables, one for each day of the week, and change our trigger to insert an appropriate table according to the day.
CREATE TABLE [dbo].[SomeArchiveTable_Sunday](
	[id] [int] NOT NULL,
	[created] [datetime] NOT NULL,
	[url] [varchar](100) NOT NULL,
	[html] [nvarchar](max) NOT NULL,
PRIMARY KEY CLUSTERED
(
	[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[SomeArchiveTable_Monday](
	[id] [int] NOT NULL,
	[created] [datetime] NOT NULL,
	[url] [varchar](100) NOT NULL,
	[html] [nvarchar](max) NOT NULL,
PRIMARY KEY CLUSTERED
(
	[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[SomeArchiveTable_Tuesday](
	[id] [int] NOT NULL,
	[created] [datetime] NOT NULL,
	[url] [varchar](100) NOT NULL,
	[html] [nvarchar](max) NOT NULL,
PRIMARY KEY CLUSTERED
(
	[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
… and so on for each day of the week.
Then we need to alter our trigger to insert data into the correct table. In this case it is determined by the current day returned via the DATEPART function.
ALTER TRIGGER [dbo].[trg_InsRecord]
   ON [dbo].[SomeTable]
   AFTER INSERT
AS
BEGIN
	-- SET NOCOUNT ON added to prevent extra result sets from
	-- interfering with SELECT statements.
	SET NOCOUNT ON;
	-- What day is it?
	DECLARE @day TINYINT = DATEPART(dw, GETDATE());
	IF(@day = 1)
	BEGIN
		INSERT INTO dbo.SomeArchiveTable_Sunday
		(
			id,
			created,
			url,
			html
		)
		SELECT id,
			   created,
			   url,
			   html
		FROM inserted;
	END
	ELSE IF (@day = 2)
	BEGIN
		INSERT INTO dbo.SomeArchiveTable_Monday
		(
			id,
			created,
			url,
			html
		)
		SELECT id,
			   created,
			   url,
			   html
		FROM inserted;
	END
	ELSE IF (@day = 3)
	BEGIN
		INSERT INTO dbo.SomeArchiveTable_Tuesday
		(
			id,
			created,
			url,
			html
		)
		SELECT id,
			   created,
			   url,
			   html
		FROM inserted;
	END
	ELSE IF (@day = 4)
	BEGIN
		INSERT INTO dbo.SomeArchiveTable_Wednesday
		(
			id,
			created,
			url,
			html
		)
		SELECT id,
			   created,
			   url,
			   html
		FROM inserted;
	END
	ELSE IF (@day = 5)
	BEGIN
		INSERT INTO dbo.SomeArchiveTable_Thursday
		(
			id,
			created,
			url,
			html
		)
		SELECT id,
			   created,
			   url,
			   html
		FROM inserted;
	END
	ELSE IF (@day = 6)
	BEGIN
		INSERT INTO dbo.SomeArchiveTable_Friday
		(
			id,
			created,
			url,
			html
		)
		SELECT id,
			   created,
			   url,
			   html
		FROM inserted;
	END
	ELSE IF (@day = 7)
	BEGIN
		INSERT INTO dbo.SomeArchiveTable_Saturday
		(
			id,
			created,
			url,
			html
		)
		SELECT id,
			   created,
			   url,
			   html
		FROM inserted;
	END
END
So how do we purge data? Simple, just run something like this a little before midnight each day…
-- What day is it?
	DECLARE @day TINYINT = DATEPART(dw, GETDATE());
	IF(@day = 1) -- Sunday - truncate Mondays table
	BEGIN
		TRUNCATE TABLE dbo.SomeArchiveTable_Monday;
	END
	ELSE IF (@day = 2) -- Monday -- truncate Tuesdays table
	BEGIN
		TRUNCATE TABLE dbo.SomeArchiveTable_Tuesday;
	END
	ELSE IF (@day = 3)
	BEGIN
		TRUNCATE TABLE dbo.SomeArchiveTable_Wednesday;
	END -- and so on for each day of the week...
Truncating a table takes milliseconds so our SQL Server is now much happier. Disk I/O is massively reduced. There’s also a couple of additional benefits from this approach; each table is smaller so re-indexing is much quicker, and usually a table being re-indexed is not being inserted into so no blocking occurs.



