Saturday, October 08, 2011

Generate test (random) data for a table in SQL Server

The best product I have seen so far for generating random test records within a SQL Server database is Red Gate's Data Generator software. Its feature list is really impressive. Please be advised it is not a free software but it has a 14 days trial version to try it out.

If you are looking for a simple script to generate records for a standalone table may be try out the below scripts.

To Generate Random Integer Value:

--This will generate an Integer value lesser than 100000 
SELECT CAST(RAND(CHECKSUM(NEWID())) * 100000 AS INT)

To Generate Random date:

--This will create DOB whose age would be between 21 and 60 (21 + 39)
SELECT GETDATE() - ((21 * 365) + RAND()* (39 * 365))

To Generate Random Money value:

--This will generate Money value (rounded to 2 decimals) between 25000 to 30000 
SELECT Round(CAST(25000 + RAND(CHECKSUM(NEWID())) * 5000 AS Money),2)

Assume in a company the salary range is between 25000 to 100000. So we can generate random salary between that range like shown below:

/*
It's kind of self explanatory. 
Min value is 25000 which we need to place it before RAND.
Then MaxValue - MinValue should be used to multiply with the output of RAND(CHECKSUM(NEWID()))
*/
SELECT Round(CAST(25000 + RAND(CHECKSUM(NEWID())) * 75000 AS Money),2)

To Generate Random Strings (Only Alphabets or Alpha numeric or Passwords)

For this let's write an user defined function to return us a random string. It would take two parameters which are explained below:

1. First parameter: To decide whether we need a string based out of only Alphabets (OR) Alpha numeric (OR) Alpha numeric + Special characters (Example: For password).

2. Second parameter: To mention the string length which we need. If we think there might be a need to pass a number greater than 500 then we might need to change the size of @randomString and the RETURNS VARCHAR appropriately.

In order to generate Random strings we need to make use of RAND(). But RAND() cannot be used within an User defined function as of now. So as a work around we would create a View which would return us an RAND() so that we can make use of it within the UDF! But in SQL Server 2008 or above machine we can overcome this limitation by using CRYPT_GEN_RANDOM.

--This view is just a work around to make use of RAND function within a UDF

CREATE VIEW [dbo].[vw_Rand]
AS 
SELECT RAND() AS rnd 
GO

--UDF for Random string Generation

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- =============================================
-- Author: Vadivel Mohanakrishnan
-- URL: http://vadivel.blogspot.com
-- Create date: Oct 8th, 2011
-- Description: Function to return random string. 
-- =============================================
CREATE FUNCTION udf_StringGenerator
(
-- A = only Alphabets,
-- AN = alpha numeric,
-- P = AN + special characters
@Type VARCHAR(2),
@MaxLength INT
)
--If @MaxLength > 500 then change the size here as well
RETURNS VARCHAR(500)
AS
BEGIN


        --If @MaxLength > 500 then change the size here as well
DECLARE @randomString VARCHAR(500)

       DECLARE @counter SMALLINT
DECLARE @Length INT
DECLARE @strPattern VARCHAR(150)
DECLARE @isType VARCHAR(2)
DECLARE @rand REAL

SET @isType = UPPER(@Type)

SELECT
  @strPattern = CASE
WHEN @isType = 'A' THEN 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
WHEN @isType = 'P' THEN 'ABCDEFGHIJKLMNOPQRST UVWXYZabcdefghijklmnopqrstuvwxyz0123456789 -=+&$'
WHEN @isType = 'AN' THEN 'ABCDEFGHIJKLMNOPQRST UVWXYZabcdefghijklmnopqrstuvwxyz0123456789'
  END

SET @Length = LEN(@strPattern)
SET @randomString = ''
SET @counter = 1

WHILE @counter <= @MaxLength
BEGIN
SET @rand = (SELECT rnd FROM [dbo].[vw_Rand]) 
SET @randomString = @randomString + SUBSTRING(@strPattern, CONVERT(TINYINT, ((@Length - 1) * @rand + 1)), 1)
SET @counter = @counter + 1
END

RETURN @randomString
END
GO

It's time to test all the logic which we have discussed till now. Let's create a table with different data types and using the logic(s) explained above pump in random records into it.

CREATE TABLE tblEmployee 
(
EmpID INT,
EmpName VARCHAR(25),
DOB DATETIME,
Salary MONEY,
Password VARCHAR(8)
)
GO

--Usage: Lets populate some random 10000 records into this table
INSERT INTO tblEmployee 
SELECT CAST(RAND(CHECKSUM(NEWID())) * 10000 AS INT),
dbo.udf_StringGenerator('A', 25),
GETDATE() - ((21 * 365) + RAND()* (39 * 365)),
Round(CAST(25000 + RAND(CHECKSUM(NEWID())) * 1000000 AS Money),2),
dbo.udf_StringGenerator('P', 8)
Go 10000

--Check
SELECT * FROM tblEmployee 
GO

--Clean Up
DROP TABLE tblEmployee 
GO

Though I feel this should help us get started but if there is a real need we can extend this further as well. For example, we can write the logic to generate Integer values within a given range etc., 

3 comments:

Anonymous said...

Thanks Vadi. It's really helpful for people who works only with test data like us :-) - Vijai

Anonymous said...

Hi Vadi,

In keeping with recent wisdom requiring loop elimination, I've made a minor change to your random String-Generator UDF that uses a numbers table rather than a loop. Haven't done any performance testing, so i don't know if it's actually an improvement!
Here's the code:
-- =============================================
-- Author: Vadivel Mohanakrishnan
-- URL: http://vadivel.blogspot.com
-- Create date: Oct 8th, 2011
-- Description: Function to return random string.
-- =============================================
ALTER FUNCTION [dbo].[udf_StringGenerator]
(
-- A = only Alphabets,
-- AN = alpha numeric,
-- P = AN + special characters
@Type VARCHAR(2),
@MaxLength INT
)
RETURNS VARCHAR(500)
AS
BEGIN
DECLARE @randomString VARCHAR(500);
DECLARE @counter SMALLINT;
DECLARE @Length INT;
DECLARE @strPattern VARCHAR(150);
DECLARE @isType VARCHAR(2);
DECLARE @rand REAL;

SET @isType = UPPER(@Type);

SELECT
@strPattern = CASE
WHEN @isType = 'A' THEN 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
WHEN @isType = 'P' THEN 'ABCDEFGHIJKLMNOPQRST UVWXYZabcdefghijklmnopqrstuvwxyz0123456789 -=+&$'
WHEN @isType = 'AN' THEN 'ABCDEFGHIJKLMNOPQRST UVWXYZabcdefghijklmnopqrstuvwxyz0123456789'
WHEN @isType = 'N' THEN '0123456789'
END;

SET @Length = LEN(@strPattern);
SET @randomString = '';

WITH Numbers(n)
AS
(
SELECT
b.N + c.N * 5 + a.N * 25 + d.N * 125 as n
FROM
(SELECT 0 AS N UNION ALL SELECT 1 AS N UNION ALL SELECT 2 AS N UNION ALL SELECT 3 AS N UNION ALL SELECT 4 AS N) a
CROSS JOIN
(SELECT 0 AS N UNION ALL SELECT 1 AS N UNION ALL SELECT 2 AS N UNION ALL SELECT 3 AS N UNION ALL SELECT 4 AS N) b
CROSS JOIN
(SELECT 0 AS N UNION ALL SELECT 1 AS N UNION ALL SELECT 2 AS N UNION ALL SELECT 3 AS N UNION ALL SELECT 4 AS N) c
CROSS JOIN
(SELECT 0 AS N UNION ALL SELECT 1 AS N UNION ALL SELECT 2 AS N UNION ALL SELECT 3 AS N UNION ALL SELECT 4 AS N) d
)
SELECT
@randomString = @randomString + SUBSTRING(@strPattern, (ABS(CONVERT(BIGINT, CONVERT(VARBINARY(8),
(SELECT [Guid] from vw_NewID) -- This line (with view to bypass non-deterministic limitation) for SQL Server 2005
-- NEWID() --for post SQL Server 2005
))) % @Length) + 1, 1)
FROM
Numbers
WHERE
n < @MaxLength;

RETURN @randomString;
END

-- TSQL to create view if using SQL Server 2005
/*
CREATE VIEW [dbo].[vw_NewID]
AS
SELECT NEWID() as [Guid];

*/

Anonymous said...

Nice one to read and it works