Saturday, July 14, 2012

Using template explorer to create your own code bank

Most of the Developers / DBAs I have worked with maintain their utility scripts in the file system. Whenever a script is needed they browse to that folder via SQL Server Management Studio (SSMS) and open it.


I have personally found this method a little tedious and not very productive. I always prefer one of these two methods instead:

  • Creating a Utility database on the server and keeping all the required scripts in it
  • Organizing our utility scripts with the help of the SSMS Template Explorer

The Utility DB method is self-explanatory, so in this post we will concentrate on using Template Explorer to organize our scripts. Let's get started.

To open Template Explorer from SSMS use either of these options:

Option 1: Click View >> Template Explorer
Option 2: Press Control + ALT + T



Let's see how to utilize Template Explorer to organize our utility scripts and how it helps improve our productivity.

Creating Custom Folder

Step 1: Right click on "SQL Server Templates" and choose "New" >> "Folder"



Step 2: I am naming it as "Vadivel - TSQL Code Bank"



Organizing Utility Scripts within the folder:

Step 1: Right click on the new folder we created and choose "New" >> "Template"


Step 2: Let's create a template and name it "Identify_Remove_DuplicateRecords"


Step 3: Right click on the new template created and choose "Edit"


Step 4: It would open a new blank query window. Paste/Write the required TSQL script and Save it.

Step 5: Repeat Steps 1 to 4 until all your frequently used utility scripts are added to this folder.
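For instance, the body of the "Identify_Remove_DuplicateRecords" template could hold a script along these lines. This is only a minimal sketch; the table name tblEmployee and its FirstName/DOB columns are placeholders you would replace when you drag the template into a query window.

--Identify duplicate records (tblEmployee and its columns are placeholders)
SELECT FirstName, DOB, COUNT(*) AS DuplicateCount
FROM dbo.tblEmployee
GROUP BY FirstName, DOB
HAVING COUNT(*) > 1;

--Remove the duplicates, keeping one row from each duplicate set
;WITH cteDuplicates AS
(
    SELECT ROW_NUMBER() OVER (PARTITION BY FirstName, DOB ORDER BY FirstName) AS RowNum
    FROM dbo.tblEmployee
)
DELETE FROM cteDuplicates WHERE RowNum > 1;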

How to use it?

Now that we have added all our utility scripts to the code bank folder, using them when required becomes very easy.

Step 1: Open a new query window
Step 2: Click the utility script you need in Template Explorer and drag it into the query window. That's it :)

I would like to add that this method is in no way a replacement for maintaining the scripts in source control. Keeping a copy in Template Explorer, as described above, simply helps improve our productivity.

If you are wondering where these scripts are stored, read on.

I used SQL Server 2005 to create the folder/templates and they are available at \Microsoft SQL Server\90\Tools\Shell\Templates\Sql


Tuesday, May 01, 2012

BIGINT - Upper limit - Overflow - SQL Server

The BIGINT upper limit is 2^63-1 (9,223,372,036,854,775,807). For the complete reference check out this MSDN article.

Recently I was asked: when we use the INT data type and it reaches its limit, what do we do? The following is the error message we would see when it reaches its upper limit.


Arithmetic overflow error converting IDENTITY to data type int.
Arithmetic overflow occurred.

Though there are multiple solutions, one option for us is to change the data type to BIGINT.

The person who asked me wasn't satisfied with this answer. He was worried: is this a permanent solution? Won't BIGINT also overflow / reach its limit sooner or later?

Obviously BIGINT would also reach its limit eventually, but that would take an enormous number of years even at millions of inserts per second. Actually I wouldn't worry about it at all, for the reasons explained below.

Let's take a few examples and see how many years it would take for BIGINT to reach its upper limit in a table:

(A) Considering only positive numbers, Max limit of BIGINT = 9,223,372,036,854,775,807
(B) Number of Seconds in a year = 31,536,000

Assume there are 50,000 records inserted per second into the table. Then the number of years it would take to reach the BIGINT max limit is:

9,223,372,036,854,775,807 / 31,536,000 / 50,000 = 5,849,424 years

Similarly,
If we inserted 1 lakh (100,000) records per second into the table then it would take 2,924,712 yrs
If we inserted 1 million (1000000) records per second into the table then it would take 292,471 yrs
If we inserted 10 million (10000000) records per second into the table then it would take 29,247 yrs
If we inserted 100 million records per second into the table then it would take 2,925 yrs
If we inserted 1000 million records per second into the table then it would take 292 yrs

By now it should be clear that it would take an extremely long time to reach the max limit of BIGINT. Maybe the end of the world would come earlier than that :) At least I have not seen (or) heard of any application which has exceeded BIGINT's max limit as of now. Please feel free to let me know if you have seen any.

By the way, if you are wondering how we calculated the number of seconds in a year: it is just 365 days * 24 hours * 60 minutes * 60 seconds = 31,536,000 seconds.
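If you want to reproduce these numbers yourself, here is a small sketch of the same arithmetic in T-SQL. The 50,000 inserts-per-second figure is just the assumed load from the first example; change it to match the other rows above.

DECLARE @BigIntMax BIGINT
DECLARE @SecondsPerYear BIGINT
DECLARE @InsertsPerSecond BIGINT

SET @BigIntMax = 9223372036854775807
SET @SecondsPerYear = 365 * 24 * 60 * 60   -- 31,536,000
SET @InsertsPerSecond = 50000              -- assumed load

--Integer division; returns 5,849,424 years for 50,000 inserts per second
SELECT @BigIntMax / @SecondsPerYear / @InsertsPerSecond AS YearsToExhaustBigInt;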

Monday, April 30, 2012

Capture Deprecated SQL Server code with SQL Profiler

While migrating your application from one version of SQL Server to another, have you ever wondered how to identify the features deprecated in the new version? Manually going through hundreds of scripts is tedious and time-consuming. I am a lazy coder myself and would be interested only in an automated solution :)

Up to SQL Server 2005 we had only one way to identify this: SQL Server Profiler. From SQL Server 2008 onwards we can make use of Extended Events as well.

In this post let's see how to make use of SQL Server Profiler to identify deprecated SQL Server code.

Step 1: Open a query window against the SQL Server instance and find out the session ID by executing the following script. This would come in handy in SQL Profiler to filter out only the events coming from this session.

SELECT @@SPID



Step 2: Open your SQL Server Profiler.


Click on "Event Selections" tab and choose "Deprecation" event.


Deprecation Announcement: Occurs when you use a feature that will be removed from a future version of SQL Server, but NOT from the next major release of SQL Server.

Deprecation Final Support: Occurs when you use a feature that will be removed from the next major release of SQL Server.


Next click on "Column Filters" and set the SPID filter to 61 (or whatever number you got in Step 1).


Step 3: Now let's go back to the window where we ran SELECT @@SPID and run some sample scripts which contain deprecated commands.

/*
Some random scripts to demonstrate how deprecated code can be captured using SQL Server Profiler
*/

SET NOCOUNT ON;

--Here TEXT, IMAGE are deprecated
CREATE TABLE tblDeprecationLocator
(
Sno INT,
Remarks TEXT,
Profilepic IMAGE
)
GO

--This is deprecated. Better alternative is to use ALTER INDEX
DBCC DBREINDEX('tblDeprecationLocator')
GO

--Using hints without WITH keyword is deprecated. It should be WITH (NOLOCK)
SELECT * FROM tblDeprecationLocator (NOLOCK)
GO

USE Master
GO
--here Truncate_only option is deprecated
BACKUP TRAN TestBed WITH TRUNCATE_ONLY
GO

Now when we execute these scripts SQL Profiler captures all the deprecated code and provides the appropriate messages, as shown below.

1. The TEXT, NTEXT, and IMAGE data types will be removed in a future version of SQL Server. Avoid using them in new development work, and plan to modify applications that currently use them. Use the varchar(max), nvarchar(max), and varbinary(max) data types instead.


2. DBCC DBREINDEX will be removed in a future version of SQL Server. Avoid using this feature in new development work, and plan to modify applications that currently use it. Use ALTER INDEX instead.


3. Specifying table hints without using a WITH keyword is a deprecated feature and will be removed in a future version.


4. BACKUP LOG WITH TRUNCATE_ONLY or WITH NO_LOG is deprecated. The simple recovery model should be used to automatically truncate the transaction log.


In the next post we would see how to make use of Extended Events in SQL Server 2008.

Conclusion: 

For the greatest longevity of your applications, avoid using features that raise the Deprecation Announcement (or) Deprecation Final Support event classes.

Monday, April 09, 2012

Clustered Index on a UniqueIdentifier column is costly

Generally, having a clustered index on a UniqueIdentifier column is going to be costly. Even if we add only 5000 or 10000 records to the table, the fragmentation level will easily be around 95+ %, which means we have to REORGANIZE or REBUILD that index very frequently. So I would always suggest double-checking and being very sure that we really need a GUID column and not an INT or BIGINT column.

Fragmentation Level would be too heavy

Let's create a sample to see how data gets fragmented for different data types.


-----------------------------------------------------
--Demo Table Creation Scripts
-----------------------------------------------------

--Table with random UniqueIdentifier (NEWID) as Clustered Index
CREATE TABLE dbo.TblUID
(
Sno Uniqueidentifier NOT NULL DEFAULT NEWID(),
FirstName VARCHAR(100) NOT NULL,
DOB DATETIME NOT NULL,
CONSTRAINT pk_tblUid PRIMARY KEY CLUSTERED(sno asc)
);

--Table with sequential UniqueIdentifier (NEWSEQUENTIALID) as Clustered Index
CREATE TABLE dbo.TblSEQUID
(
Sno Uniqueidentifier NOT NULL DEFAULT NEWSEQUENTIALID(),
FirstName VARCHAR(100) NOT NULL,
DOB DATETIME NOT NULL,
CONSTRAINT pk_tblSEQUid PRIMARY KEY CLUSTERED(sno asc)
);

--Table with INT column as Clustered Index
CREATE TABLE dbo.TblInt
(
Sno INT IDENTITY(1,1) NOT NULL,
FirstName VARCHAR(100) NOT NULL,
DOB DATETIME NOT NULL,
CONSTRAINT pk_tblInt PRIMARY KEY CLUSTERED(sno asc)
);

--Table with BIGINT column as Clustered Index
CREATE TABLE dbo.TblBigInt
(
Sno BIGINT IDENTITY(1,1) NOT NULL,
FirstName VARCHAR(100) NOT NULL,
DOB DATETIME NOT NULL,
CONSTRAINT pk_tblBigInt PRIMARY KEY CLUSTERED(sno asc)
);


-----------------------------------------------------
--Create NON Clustered Index on DOB Column
-----------------------------------------------------
CREATE NONCLUSTERED INDEX nx_tbluid_dob on dbo.tbluid(DOB);
CREATE NONCLUSTERED INDEX nx_tblsequid_dob on dbo.tblSEQUid(DOB);
CREATE NONCLUSTERED INDEX nx_tblInt_dob on dbo.tblInt(DOB);
CREATE NONCLUSTERED INDEX nx_tblBigInt_dob on dbo.tblBigInt(DOB);


/*
Insert dummy records into each of the tables.
Change the number after GO to capture data for more records.
*/

SET NOCOUNT ON;

INSERT INTO dbo.TblUID (FirstName, DOB)
SELECT REPLICATE('x',100),Getdate();
GO 10000

INSERT INTO dbo.TblSEQUID (FirstName, DOB)
SELECT REPLICATE('x',100),Getdate();
GO 10000

INSERT INTO dbo.TblInt(FirstName, DOB)
SELECT REPLICATE('x',100),Getdate();
GO 10000

INSERT INTO dbo.TblBigInt(FirstName, DOB)
SELECT REPLICATE('x',100),Getdate();
GO 10000


---------------------------------------------------
--Check the Space Used and Fragmentation Level
---------------------------------------------------
EXEC sp_spaceused tblUID, True;
SELECT index_type_desc, avg_fragmentation_in_percent, page_count, record_count
FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID(N'tblUID'), NULL, NULL, NULL)
OPTION (MAXDOP 1);

EXEC sp_spaceused tblSEQUID, True;
SELECT index_type_desc, avg_fragmentation_in_percent, page_count, record_count
FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID(N'tblSEQUID'), NULL, NULL, NULL)          
OPTION (MAXDOP 1);
  
EXEC sp_spaceused tblINT, True;
SELECT index_type_desc, avg_fragmentation_in_percent, page_count, record_count
FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID(N'tblINT'), NULL, NULL, NULL)
OPTION (MAXDOP 1);

EXEC sp_spaceused tblBIGINT, True;
SELECT index_type_desc, avg_fragmentation_in_percent, page_count, record_count
FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID(N'tblBIGINT'), NULL, NULL, NULL)
OPTION (MAXDOP 1);
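Once you see the fragmentation numbers, the follow-up maintenance could look something like the sketch below (using the constraint-backed index names from the scripts above). The usual rule of thumb is to REORGANIZE for roughly 5% to 30% fragmentation and REBUILD above that, but treat those thresholds as guidance rather than a hard rule.

--Heavy fragmentation (roughly above 30%): REBUILD the index
ALTER INDEX pk_tblUid ON dbo.TblUID REBUILD;

--Lighter fragmentation (roughly 5% to 30%): REORGANIZE is usually enough
ALTER INDEX pk_tblSEQUid ON dbo.TblSEQUID REORGANIZE;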

Inspired by this blog post of Kimberly: http://www.sqlskills.com/BLOGS/KIMBERLY/post/GUIDs-as-PRIMARY-KEYs-andor-the-clustering-key.aspx



Sunday, March 25, 2012

Create CLUSTERED Index first then NON CLUSTERED indexes

We might have heard that we should always create our CLUSTERED index first and then the NONCLUSTERED indexes.

Why is that? What would happen if NONCLUSTERED indexes are created first and then we create the CLUSTERED index?

If you create the NONCLUSTERED indexes first and then the CLUSTERED index, internally ALL NONCLUSTERED indexes on that table get recreated. On a big table this can make creating the CLUSTERED index itself take forever.

Example:

In the sample shown in the blog post titled "Query tuning using SET STATISTICS IO and SET STATISTICS TIME" we had created only a couple of NONCLUSTERED indexes. Now, let us assume we need to create a CLUSTERED index on that table's ProductID column.

First enable SET STATISTICS PROFILE ON so that we can see the profile information of the scripts we are going to execute. Then execute the below script:

--Script to create CLUSTERED index on ProductID column
CREATE CLUSTERED INDEX [ix_productId] ON [dbo].[tblTest]
(
[ProductID] ASC
)
GO

During this process both of these NONCLUSTERED indexes would have been recreated:
  1. [nc_ix_manufacturing_dt]
  2. [nc_ix_productName]
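To avoid that double work on a large table, one option (a sketch using the table and index names from the earlier post, not a prescription) is to drop the nonclustered indexes, build the clustered index, and then recreate the nonclustered ones:

--Drop the existing nonclustered indexes first
DROP INDEX [nc_ix_productName] ON [dbo].[tblTest];
DROP INDEX [nc_ix_manufacturing_dt] ON [dbo].[tblTest];
GO

--Create the clustered index
CREATE CLUSTERED INDEX [ix_productId] ON [dbo].[tblTest] ([ProductID] ASC);
GO

--Recreate the nonclustered indexes (now built on top of the clustered key)
CREATE NONCLUSTERED INDEX [nc_ix_productName] ON [dbo].[tblTest] ([ProductName] ASC);
CREATE NONCLUSTERED INDEX [nc_ix_manufacturing_dt] ON [dbo].[tblTest] ([Manufacturing_Date] ASC);
GO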


Friday, March 23, 2012

Declaring VARCHAR without length

Do you find anything wrong with this script?

CREATE PROCEDURE uspProcedureName
      @param1 VARCHAR
AS
....
....

If you aren't sure, maybe you should read this post completely, without fail :)

All this while I was thinking this was a well-known issue, until last week I saw a stored procedure very similar to the one shown above. Whoever created that stored procedure hadn't bothered to specify the length. Before jumping into the explanation of why we should ALWAYS SPECIFY THE LENGTH, let us do a small exercise to understand this better.

Guess the results:

Try to answer what would be the output before checking the result.

--Declaring a variable without specifying the length
DECLARE @strSentence VARCHAR
SET @strSentence = 'Rajinikanth is always the NO 1 hero of India'
SELECT @strSentence

Expected Output: Rajinikanth is always the NO 1 hero of India
Actual Output: R

--While CASTing / CONVERTing
--The given string has 36 characters.
SELECT CONVERT(VARCHAR, '123456789123456789123456789123456789');
SELECT CAST('123456789123456789123456789123456789' AS VARCHAR);

Expected Output: 123456789123456789123456789123456789
Actual Output: 123456789123456789123456789123

As you can see, for variables the default length of VARCHAR is 1. When used within CONVERT/CAST, the default length of VARCHAR is 30.

So if we don't specify the length ourselves, these are the default values SQL Server uses - which means the data we expected to be stored in the database gets silently truncated without our knowledge. This is one of those tricky issues which, if we aren't aware of it, can mean spending a lot of time debugging our scripts at a later stage.

Now, to answer the opening question about what is wrong with that script, let us create a sample table and a similar stored procedure and see what it does.

--Sample table
CREATE TABLE dbo.tblDemo
(
ID INT,
FirstName VARCHAR(30)
)
GO

--Stored Procedure to Insert data into the sample table
CREATE PROCEDURE dbo.uspInsertData
@ID INT,
@FName VARCHAR -- See there is no length specified here
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.tblDemo VALUES (@ID, @FName)
END
GO

--Test
EXEC dbo.uspInsertData 10, 'Vadivel'
GO
SELECT ID, FirstName FROM dbo.tblDemo
GO

The output would be just 'V'. Once again the data got truncated silently. So ALWAYS SPECIFY THE LENGTH OF THE STRING WITHOUT FAIL.
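For comparison, here is a sketch of the same procedure corrected by specifying the length (30, matching the FirstName column above). With this change the full value is stored.

--Fix: specify the length on the parameter
ALTER PROCEDURE dbo.uspInsertData
@ID INT,
@FName VARCHAR(30)
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.tblDemo VALUES (@ID, @FName)
END
GO

EXEC dbo.uspInsertData 20, 'Vadivel'
GO
SELECT ID, FirstName FROM dbo.tblDemo --The new row now shows 'Vadivel' in full
GO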


--Clean Up
DROP PROC dbo.uspInsertData;
DROP TABLE dbo.tblDemo;

Wednesday, March 21, 2012

Query tuning using SET STATISTICS IO and SET STATISTICS TIME

Often I find people aren't taking advantage of SET STATISTICS IO and SET STATISTICS TIME while trying to tune their queries.

The bottom line is we want our queries to run as fast as possible. One of the challenges we face is that not all the environments we work on are similar. The configuration, load, etc. differ between our Development box, Staging box, Production box and so on. So how can we measure whether the changes we make really improve performance and will work well in other environments as well?

Let's try to understand a few basics before seeing some code in action. To execute any query SQL Server uses several server resources. One is the amount of CPU it needs to run the query; this number stays almost the same between executions (there might be minimal variations in milliseconds). Another resource it needs is IO. SQL Server first checks the memory / data cache (logical reads) for the required data; if the data is not available there it goes to disk (physical reads) to fetch it. Queries that consume more CPU and IO resources perform slower, so our goal should be to (re)write queries which use less CPU and IO.

The theory sounds good, but how do we measure whether the changes we make to our queries are increasing or decreasing CPU/IO resource usage? This is where the SET STATISTICS IO and SET STATISTICS TIME commands come in handy.

Things to look for in SET STATISTICS IO while tuning the queries: 

1. Scan Count - The number of times the underlying table in the query was accessed. If your query has no JOINs you can ignore this count. If your query does have JOINs then this is something to watch during the course of your query tuning. Needless to say, if this count goes down during your tuning it's a good sign.

2. Logical Reads - This is the best piece of information to rely on. This count says how many PAGES had to be read from the DATA CACHE to provide the desired result, and it doesn't change between multiple executions, so we can rely on it while tuning the query. The fewer the logical reads, the better the performance.

Here I would like to elaborate a bit on logical reads. For whatever query we run, SQL Server checks the DATA CACHE first for the required data. If the required data pages are not in the DATA CACHE it goes to the physical disk (Physical Reads) to fetch those DATA PAGES and moves them into the DATA CACHE. In other words, for SQL Server to execute a given query it needs all the data to be available in the DATA CACHE first.

Apart from these two, the rest of the items in the SET STATISTICS IO output are not that useful while tuning queries.

Things to look for in SET STATISTICS TIME while tuning the queries:

1. CPU Time - This tells us how much CPU time was used to execute the query. This information is reliable; we may see slight variations in this number each time we run the query, on the same box or on different machines, but they are very minimal.

Always look for the "SQL Server Execution Times:" section which occurs immediately after the (x rows affected) line in the output.

There is another output item, "Elapsed Time", which tells us the total amount of time taken from start to end. Elapsed time is not useful for query tuning as this number can vary drastically based on server load.

How to enable these commands?

By default, both SET STATISTICS IO and SET STATISTICS TIME are disabled. To enable them, use either one of these options.

Option 1. Tools > Options > Query Execution > SQL Server > Advanced > tick the check-boxes beside these commands.

Edit: We can also do this - Query > Query Options > Advanced.

Option 2. We can make use of the SQL command to do it.

SET STATISTICS IO ON
GO
SET STATISTICS TIME ON
GO

..... Your query comes here .....

SET STATISTICS IO OFF
GO
SET STATISTICS TIME OFF
GO

Examples


Though this post is not about explaining how to tune queries, I thought I would share some basic examples to showcase these commands in action.

--Setting up a Sample table for demonstration purpose
CREATE TABLE tblTest
(
   ProductID INT IDENTITY,
   ProductName VARCHAR(50) NOT NULL,
   Manufacturing_Date DATETIME NOT NULL
)
GO

Let us make use of our Random records generator UDF to populate some test data into this table.
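That UDF comes from an earlier post. If you don't have it handy, here is a rough sketch of what such a function could look like; the signature is guessed from the call below ('A' is assumed to mean an alphabetic string of the requested length), and the view is a common workaround since NEWID() cannot be called directly inside a function.

--NEWID() cannot be used directly inside a UDF, so wrap it in a view
CREATE VIEW dbo.vwRandomGuid AS SELECT NEWID() AS GuidValue;
GO

CREATE FUNCTION dbo.udf_StringGenerator (@Mode CHAR(1), @Length INT)
RETURNS VARCHAR(100)
AS
BEGIN
    --@Mode is an assumption: 'A' here simply means an alphabetic string
    DECLARE @Result VARCHAR(100);
    SET @Result = '';
    WHILE LEN(@Result) < @Length
    BEGIN
        --Append one random uppercase letter per iteration
        SELECT @Result = @Result + CHAR(65 + ABS(CHECKSUM(GuidValue)) % 26)
        FROM dbo.vwRandomGuid;
    END
    RETURN @Result;
END
GO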

--Populating test data
INSERT INTO tblTest
SELECT
   dbo.udf_StringGenerator('A', 50),
   GETDATE() + (RAND() * (39 * 365))
GO 100000

Important: Before you start tuning your query for performance, always run the below two commands.

**Please be advised not to use these commands in the PRODUCTION environment directly. Use these in Development or Test Environment ONLY**

DBCC DROPCLEANBUFFERS -- Clear SQL Server Data Cache
DBCC FREEPROCCACHE -- Clears SQL Server Procedure Cache

Example 1: Let's write a query to list total count of all products whose names start with 'ab'.

SET STATISTICS IO ON
GO
SET STATISTICS TIME ON
GO

--I have 146 records in my table matching this condition.
--It would be different for you as we have generated random records.
SELECT COUNT(*) FROM tblTest WHERE ProductName LIKE 'ab%';

/*
(1 row(s) affected)
Table 'tblTest'. Scan count 1, logical reads 971, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.


SQL Server Execution Times:
   CPU time = 47 ms,  elapsed time = 48 ms.
*/

There is no index on the ProductName column, so let's create a non-clustered index on that column and then check the query again.

CREATE NONCLUSTERED INDEX [nc_ix_productName] ON [dbo].[tblTest]
(
[ProductName] ASC
)
GO

--Lets check again!
SELECT COUNT(*) FROM tblTest WHERE ProductName LIKE 'ab%';

/*
(1 row(s) affected)
Table 'tblTest'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.


SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 1 ms.
*/

So by creating an appropriate index we were able to bring down the CPU time and logical reads.


Example 2: Let's write a query to list total count of all products whose manufacturing year is 2012.

--I have 1923 records in my table matching this condition. 
--It would be different for you as we have generated random records.
SELECT COUNT(*) FROM tblTest WHERE YEAR(Manufacturing_Date) = 2012

/*
(1 row(s) affected)
Table 'tblTest'. Scan count 1, logical reads 971, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.


SQL Server Execution Times:
   CPU time = 31 ms,  elapsed time = 33 ms.
*/

There is no index on the Manufacturing_Date column either, so let's create a non-clustered index on that column and then check the query again.

CREATE NONCLUSTERED INDEX [nc_ix_manufacturing_dt] ON [dbo].[tblTest]
(
   [Manufacturing_Date] ASC
)
GO

--Let's check it again!
SELECT COUNT(*) FROM tblTest WHERE YEAR(Manufacturing_Date) = 2012

/*
(1 row(s) affected)
Table 'tblTest'. Scan count 1, logical reads 237, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.


SQL Server Execution Times:
   CPU time = 32 ms,  elapsed time = 37 ms.
*/

There is only a slight improvement - and wait a minute, the existing query is NOT SARGable (check out "What does Avoid NON SARGable means?"). So let's rewrite the query and make it SARGable.

--Let's rewrite the query to make it SARGable.
SELECT COUNT(*) FROM tblTest
WHERE Manufacturing_Date >= '20120101' AND Manufacturing_Date < '20130101';

/*
(1 row(s) affected)
Table 'tblTest'. Scan count 1, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.


SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 1 ms.
*/


SET STATISTICS IO OFF
GO
SET STATISTICS TIME OFF
GO

So while performance tuning queries, always make use of the SET STATISTICS IO and SET STATISTICS TIME commands. In their output watch the logical reads, CPU time and/or scan count to know whether your query tuning steps are really improving or degrading performance.

Tuesday, March 20, 2012

Find the last day of the month

Prior to SQL Server 2012 we could make use of the DATEADD function to find the last day of the month for a provided date.

DECLARE @dtTempDate DATETIME
SELECT @dtTempDate = GETDATE()

In SQL Server 2005 or 2008 these are a couple of ways by which we can get the desired result:

--Option1
SELECT DATEADD(DAY, -1, DATEADD (MONTH, MONTH (@dtTempDate) , DATEADD (YEAR,  YEAR (@dtTempDate) - 1900, 0 )))

--Option2
SELECT DATEADD( DAY, -1, DATEADD(MONTH, 1, DATEADD(DAY, 1 - DAY(@dtTempDate), @dtTempDate)))

--Option3
SELECT DATEADD(DAY, -1, DATEADD(MONTH, DATEDIFF(MONTH, 0, @dtTempDate) +1, 0))

Now in SQL Server 2012 there is a new Date and Time function named EOMONTH.
EOMONTH ( start_date [, month_to_add ] )

Solution using EOMONTH which would work only in SQL Server 2012 and above:

SELECT EOMONTH (@dtTempDate) AS [Current Month Last Date]

--here the second parameter tells how many MONTHS to add to the given input
SELECT EOMONTH (@dtTempDate,1) AS [Next Month Last Date]
SELECT EOMONTH (@dtTempDate, -1) AS [Previous Month Last Date]



Beware: EOMONTH might produce wrong results when used in date range queries. Let me try to explain with an example.

CREATE TABLE tblSales
(
SalesDate DATETIME,
Orders INT
)
GO

INSERT INTO tblSales VALUES ('2012-03-19 00:00:00.000',10)
INSERT INTO tblSales VALUES ('2012-03-20 07:13:54.420',1)
INSERT INTO tblSales VALUES ('2012-03-31 00:00:00.000',100)
INSERT INTO tblSales VALUES ('2012-03-31 00:00:01.001',200)
INSERT INTO tblSales VALUES ('2012-03-31 07:12:29.123',500)
GO


Assume the current date is 19th March 2012. We need to list all records from 19th March to the end of that month, i.e., 19th March 2012 TO 31st March 2012.

In prior versions of SQL Server: using BETWEEN would show inaccurate results as it only goes up to midnight of 31st March 2012. The below query would list only the first 3 records and would ignore the orders of 200 and 500, which is obviously wrong.

SELECT SalesDate, Orders FROM tblSales
WHERE SalesDate BETWEEN '20120319' AND '20120331';


So to get the proper result we need to use something like SalesDate >= 19th March AND SalesDate < 1st April.


SELECT SalesDate, Orders FROM tblSales
WHERE SalesDate >= '20120319' AND SalesDate < '20120401';

Using EOMONTH in SQL Server 2012:

--This translates exactly to the BETWEEN query we wrote above for prior versions
--So what does that mean? It's going to display wrong results!!

SELECT SalesDate, Orders FROM tblSales
WHERE SalesDate BETWEEN '20120319' AND EOMONTH('20120319');


--Add a day to the output of EOMONTH to fetch the accurate result
SELECT SalesDate, Orders FROM tblSales 
WHERE SalesDate >= '20120319' AND SalesDate < DATEADD(DAY, 1, EOMONTH('20120319'))



Monday, March 19, 2012

Download and Install SQL Server 2012 Product Documentation

When you install SQL Server 2012, by default it installs only the Help Viewer and its related components. It doesn't install any help documentation locally, and the user has to go online to check any reference. That doesn't work for me as I have installed SQL Server 2012 on a laptop which has no internet connection at all.

Download SQL Server 2012 Product Documentation

So I was searching for 'downloadable version of SQL Server 2012 product documentation' and I found it here - http://www.microsoft.com/download/en/details.aspx?id=347

I downloaded SQLServer2012Documentation.exe, which was approximately 204 MB, and ran it. It created a folder named "SQLServer2012Documentation" and extracted the following:
  • A folder named "Packages" containing 25 CAB (cabinet) files
  • Four HTML documents (book1, book2, book3, product1)
  • One MSHA file (HelpContentSetup.msha)
So overall 30 items are extracted.

Installing the SQL Server 2012 Product Documentation

1. Go to Start > Program Files > SQL Server 2012 > Documentation & Community > Manage Help Settings
2. Click on the link "Choose online or local help", then choose "I want to use local help" and click the OK button.
3. Click 'Install content from disk' and browse to the directory where you unpacked the downloaded EXE.
4. Select the HelpContentSetup.msha file (MSHA = Microsoft Help Asset File) and click Open. 
5. Click Next.
6. There will be 3 options listed: Books Online, Developer Reference and Installation, and beside each of them there will be an 'Add' link.
7. Just click the 'Add' link next to the documentation you want to install and click the 'Update' button.
8. Repeat step 7 until you have installed all the documentation you want.

 


Adventure Works database for SQL Server 2012

At last, over the weekend I was able to download and install SQL Server 2012 Standard Edition on my laptop. Next, to get started, I was looking for a sample database for SQL Server 2012. Using the search string "adventureworks 2012 download" I found the download link below.

  • Adventure Works DB for SQL Server 2012 - http://msftdbprodsamples.codeplex.com/releases/view/55330
  • It downloads the DATA file named AdventureWorks2012_Data.mdf; the size when I downloaded it was 189 MB.
  • Copy the DATA file to the location on your hard drive where you want it stored, and then use that path in the "FILENAME" attribute within the CREATE DATABASE syntax.

CREATE DATABASE AdventureWorks2012
ON
(
FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL11\MSSQL\DATA\AdventureWorks2012_Data.mdf'
)
FOR ATTACH_REBUILD_LOG;



Reference: CREATE DATABASE syntax for SQL Server 2012

Saturday, February 04, 2012

Foreign key doesn't create an Index automatically

There are still people who believe that SQL Server automatically creates an index on a Foreign Key. I think since a Primary Key by default creates a Clustered Index, people assume that Foreign Keys would also create an index automatically.


This is a myth; SQL Server does NOT automatically create an index on foreign key columns.

That said, one of the best practices for index tuning is to index all the columns which are part of a foreign key relationship.

Check out the MSDN documentation under the sub-heading "Indexing FOREIGN KEY Constraints". The first line says "Creating an index on a foreign key is often useful ....". Microsoft wouldn't say this if FKs were automatically indexed.
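If you want to verify this yourself, here is a quick sketch (the table and column names are made up for the demo): create a foreign key, then look at sys.indexes - you will find no index on the child column until you create one explicitly.

CREATE TABLE dbo.tblParent (ParentID INT NOT NULL PRIMARY KEY);
CREATE TABLE dbo.tblChild
(
ChildID INT NOT NULL PRIMARY KEY,
ParentID INT NOT NULL CONSTRAINT fk_child_parent REFERENCES dbo.tblParent(ParentID)
);
GO

--Only the two primary key indexes show up; nothing was created for the FK column
SELECT OBJECT_NAME(object_id) AS TableName, name AS IndexName, type_desc
FROM sys.indexes
WHERE object_id IN (OBJECT_ID('dbo.tblParent'), OBJECT_ID('dbo.tblChild'))
AND name IS NOT NULL;
GO

--We have to create the index on the FK column ourselves
CREATE NONCLUSTERED INDEX nx_tblChild_ParentID ON dbo.tblChild(ParentID);
GO

--Clean up
DROP TABLE dbo.tblChild;
DROP TABLE dbo.tblParent;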

Friday, February 03, 2012

Avoid using SCOPE_IDENTITY and @@IDENTITY

Avoid using SCOPE_IDENTITY() and @@IDENTITY functions if your system is using Parallel Plans.


Extract from the above link:
Posted by Microsoft on 3/18/2008 at 1:10 PM
Dave, thanks to your very detailed and dilligent report I was able to find the problem. Yes, it's a bug - whenever a parallel query plan is generated @@IDENTITY and SCOPE_IDENTITY() are not being updated consistenly and can't be relied upon. The few workarounds I can offer you for now:
  1. Use MAX_DOP=1 as you are already using. This may hurt performance of the SELECT part of your query.
  2. Read the value from SELECT part into a set of variables (or single tabel variable) and then insert into the target table with MAX_DOP=1. Since the INSERT plan will not be parallel you will get the right semantic, yet your SELECT will be parallel to achieve performance there if you really need it.
  3. Use OUTPUT clause of INSERT to get the value you were looking for, as in the example I give further below. In fact I highly recomend using OUTPUT instead of @@IDENTITY in all cases. It's just the best way there is to read identity and timestamp.
  4. Changing autostas is NOT a good workaround. It may hide the problem for a while but a prallel plan will get produced eventually.
  5. Force serial plans for entire server via sp_configure 'max degree of parallelism' option.

Check out this KB article for more information. As of SQL Server 2008 R2 this issue is not yet resolved, and I am hopeful that it will be fixed in SQL Server 2012.
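As workaround 3 above suggests, the OUTPUT clause is the safest way to read back identity values. A minimal sketch (the table and column names here are just for illustration):

CREATE TABLE dbo.tblOrders
(
OrderID INT IDENTITY(1,1),
OrderValue INT
);
GO

--Capture the generated identity values via OUTPUT instead of @@IDENTITY / SCOPE_IDENTITY()
DECLARE @NewIds TABLE (OrderID INT);

INSERT INTO dbo.tblOrders (OrderValue)
OUTPUT inserted.OrderID INTO @NewIds
VALUES (100), (200);

SELECT OrderID FROM @NewIds;
GO

--Clean up
DROP TABLE dbo.tblOrders;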

Wednesday, February 01, 2012

Arithmetic overflow error converting IDENTITY to data type int

If we have an IDENTITY column and an insert tries to exceed the maximum value of the underlying data type, it throws this error.

To know the range for TinyInt, SmallInt, INT, BIGINT check out this MSDN link

Let's reproduce the error for a TINYINT column:

CREATE TABLE dbo.tblIdentityTest
(
Sno TINYINT IDENTITY(250,1) --Max limit of TinyInt is 255
,Firstname VARCHAR(20)
)
GO

--These records would get inserted
INSERT INTO dbo.tblIdentityTest VALUES ('250')
INSERT INTO dbo.tblIdentityTest VALUES ('251')
INSERT INTO dbo.tblIdentityTest VALUES ('252')
INSERT INTO dbo.tblIdentityTest VALUES ('253')
INSERT INTO dbo.tblIdentityTest VALUES ('254')
INSERT INTO dbo.tblIdentityTest VALUES ('255')
GO


SELECT * FROM dbo.tblIdentityTest
GO

--As TINYINT has already reached its max limit, any new insertion would fail
INSERT INTO dbo.tblIdentityTest VALUES ('This would fail')
GO

Msg 8115, Level 16, State 1, Line 1
Arithmetic overflow error converting IDENTITY to data type tinyint.
Arithmetic overflow occurred.

--Cleanup
DROP TABLE dbo.tblIdentityTest
GO

Let's reproduce the error for an INTEGER column:

CREATE TABLE dbo.tblIdentityTest2
(
Sno INT IDENTITY(2147483645,1) --Max limit of INT is 2,147,483,647
,Firstname VARCHAR(20)
)
GO

--These records would get inserted
INSERT INTO dbo.tblIdentityTest2 VALUES ('Alpha')
INSERT INTO dbo.tblIdentityTest2 VALUES ('Beta')
INSERT INTO dbo.tblIdentityTest2 VALUES ('Gamma')
GO


SELECT * FROM dbo.tblIdentityTest2
GO


--As INT has already reached its max limit, any new insertion would fail
INSERT INTO dbo.tblIdentityTest2 VALUES ('This would fail')
GO

Msg 8115, Level 16, State 1, Line 1
Arithmetic overflow error converting IDENTITY to data type int.
Arithmetic overflow occurred.

--Note: we are not dropping dbo.tblIdentityTest2 yet, since we will reuse it below to demonstrate the fix

In both of the above cases, for demonstration purposes, we seeded the value close to the maximum limit of the data type. Even though the table has very few rows, SQL Server still throws this arithmetic overflow error. This shows that SQL Server doesn't automatically try to fill the existing gaps in the identity sequence!

Normally we have to choose an appropriate data type based on the requirements. For the above issue, one of the easiest ways to solve it is to change the data type to BIGINT.


ALTER TABLE dbo.tblIdentityTest2 ALTER COLUMN Sno BIGINT
GO
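After the change, a quick way to confirm the table accepts new rows again and to see where the identity currently stands (a small sketch using the same table; IDENT_CURRENT reports the last identity value generated for it):

--This insert now succeeds because Sno is BIGINT
INSERT INTO dbo.tblIdentityTest2 VALUES ('Delta')
GO

--Check the last identity value generated for this table
SELECT IDENT_CURRENT('dbo.tblIdentityTest2') AS CurrentIdentityValue
GO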

This way the table can now store up to a maximum of 2^63-1 (9,223,372,036,854,775,807) records. Actually, if we need to, we can start the sequence from the minimum value of BIGINT (-9,223,372,036,854,775,808).

CREATE TABLE dbo.tblIdentityTest3
(
Sno BIGINT IDENTITY(-9223372036854775808,1)
,Firstname VARCHAR(20)
)
GO

So this table's identity can now run from -9,223,372,036,854,775,808 all the way up to 9,223,372,036,854,775,807. It can store a huge number of rows before it runs out of numbers.

An IDENTITY column can also be of data type NUMERIC or DECIMAL. The only restriction is that its scale must be 0. As the decimal data type cannot have more than 38 digits of precision, the range it can cover is -99999999999999999999999999999999999999 to 99999999999999999999999999999999999999.

CREATE TABLE dbo.tblIdentityTest4 --a new name, since tblIdentityTest3 already exists above
(
Sno DECIMAL(38,0) IDENTITY(-99999999999999999999999999999999999999,1)
,Firstname VARCHAR(20)
)
GO


Related posts: $IDENTITY in SQL Server