SQL Server performance for NOT EXISTS vs NOT IN

Though "NOT EXISTS" and "NOT IN" sound similar, there is quite a lot of difference between them. To start with, check out the blog post by Mladen.

Continuing from what Mladen has already written, I thought I would show the difference it makes to the execution plan and to the IO / time statistics when we use NOT EXISTS or NOT IN in our queries. Let's look at a few of the differences between them.

Case 1: Let's use them on columns which are declared as NOT NULL

SET NOCOUNT ON
GO


CREATE TABLE PackageInformation
(
Sno INT IDENTITY(1,1) PRIMARY KEY,
PackageID INT NOT NULL,
PackageName VARCHAR(20)
)
GO

-- I am using the random records generator (dbo.udf_StringGenerator) which I wrote a few days back to populate data into this table.
-- "GO 10000" runs the batch 10000 times, generating 10000 records
INSERT INTO PackageInformation  (PackageID, PackageName)
SELECT CAST(RAND(CHECKSUM(NEWID())) * 10000 AS INT),
dbo.udf_StringGenerator('A', 20)
GO 10000


CREATE TABLE ChildTable
(
RID INT IDENTITY(1,1) PRIMARY KEY,
PackageID INT NOT NULL
)
GO

-- Let's take roughly 40% (approx. 4000 records) from the PackageInformation table to populate this table
INSERT INTO ChildTable (PackageID)
  SELECT PackageID
  FROM dbo.PackageInformation
  TABLESAMPLE (40 PERCENT); -- This would work only on SQL Server version 2005 or above
GO
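The populate step above depends on dbo.udf_StringGenerator, whose definition is not shown in this post. For anyone who wants to reproduce the test without it, here is a rough set-based sketch that could be used in place of the "GO 10000" insert. The NEWID()-based string and the cross join on sys.all_objects are my assumptions for illustration, not the original generator, so the names it produces are hex strings rather than whatever the UDF returns.

```sql
-- Assumption: stands in for the INSERT ... GO 10000 batch; do not run both.
-- sys.all_objects cross-joined with itself is only used as a cheap row source.
INSERT INTO PackageInformation (PackageID, PackageName)
SELECT TOP (10000)
       ABS(CHECKSUM(NEWID()) % 10000),                              -- random int in 0..9999
       LEFT(REPLACE(CONVERT(VARCHAR(36), NEWID()), '-', ''), 20)    -- 20 random hex characters
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;
GO
```

The modulo is applied before ABS so that the one CHECKSUM value that cannot be negated safely never reaches ABS.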

Let's write a query to list the package details from the PackageInformation table that are not present in ChildTable.

--Clear out the plan cache (DON'T TRY THIS IN A PRODUCTION ENVIRONMENT)
DBCC FREEPROCCACHE
GO

SET STATISTICS IO ON
SET STATISTICS TIME  ON
GO

--Press Control + M to display the Actual Execution Plan of the queries
--Query1: Using NOT IN
SELECT PackageID, PackageName 
FROM dbo.PackageInformation 
WHERE PackageID NOT IN (SELECT PackageID FROM ChildTable)
GO

--Query2: Using NOT EXISTS
SELECT PackageID, PackageName 
FROM dbo.PackageInformation
WHERE NOT EXISTS
 (
   SELECT PackageID FROM ChildTable 
   WHERE ChildTable.PackageID = PackageInformation.PackageID
)
GO

SET STATISTICS IO OFF
SET STATISTICS TIME  OFF
GO

Result:

i) Query using NOT IN : Returned 4690 records
ii) Query using NOT EXISTS : Returned 4690 records

iii) Let's see the Actual execution plan for both the queries. 

iv) Let's also check on the logical reads and CPU time taken for these queries.

So when the column is declared as NOT NULL, both NOT IN and NOT EXISTS seem to perform the same way.

Case 2: Let's change the PackageID column to allow NULL

ALTER TABLE dbo.PackageInformation
ALTER COLUMN PackageID INT NULL
GO

ALTER TABLE dbo.ChildTable
ALTER COLUMN PackageID INT NULL
GO

--Let's insert 100 rows with a NULL PackageID into the PackageInformation table
INSERT INTO PackageInformation (PackageID, PackageName)
SELECT NULL,
dbo.udf_StringGenerator('A', 5)
GO 100


Let's run the same queries which we used in Case 1 to list the package details from the PackageInformation table that are not present in ChildTable.

--Clear out the plan cache (DON'T TRY THIS IN A PRODUCTION ENVIRONMENT)
DBCC FREEPROCCACHE
GO

SET STATISTICS IO ON
SET STATISTICS TIME  ON
GO

--Query1: Using NOT IN
SELECT PackageID, PackageName 
FROM dbo.PackageInformation 
WHERE PackageID NOT IN (SELECT PackageID FROM ChildTable)
GO

--Query2: Using NOT EXISTS
SELECT PackageID, PackageName 
FROM dbo.PackageInformation
WHERE NOT EXISTS
 (
   SELECT PackageID FROM ChildTable 
   WHERE ChildTable.PackageID = PackageInformation.PackageID
)
GO

SET STATISTICS IO OFF
SET STATISTICS TIME  OFF
GO


Result:

i) Query using NOT IN : Returned 4690 records (It hasn't returned the 100 new NULL records which we added!!)
ii) Query using NOT EXISTS : Returned 4790 records

iii) Let's see the Actual execution plan for both the queries. 

iv) Let's also check on the logical reads and CPU time taken for these queries.

So when the column is declared as NULL, NOT IN seems to generate a pretty complicated execution plan and does far more logical reads than NOT EXISTS. So the winner here is NOT EXISTS.
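The 100 missing rows in the NOT IN result come down to three-valued logic: when the value on the left side is NULL, every comparison against the list is UNKNOWN, and a WHERE clause only keeps rows where the predicate is TRUE. A tiny standalone check (the literal values are just for illustration, and the default ANSI_NULLS ON setting is assumed) shows this:

```sql
-- NULL NOT IN (1, 2) is evaluated as NULL <> 1 AND NULL <> 2,
-- which is UNKNOWN, so the CASE falls through to the ELSE branch.
SELECT CASE
         WHEN CAST(NULL AS INT) NOT IN (1, 2) THEN 'TRUE'
         ELSE 'not TRUE (UNKNOWN)'
       END AS NotInResult;
-- Returns: not TRUE (UNKNOWN)
```

NOT EXISTS, on the other hand, only asks whether the correlated subquery found a row. For a NULL PackageID it finds none, so those rows are returned.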

Case 3: Adding NULL values into ChildTable

INSERT INTO ChildTable (PackageID)
SELECT NULL

Result:

i) Query using NOT IN : Returned 0 records!
ii) Query using NOT EXISTS : Returned 4790 records

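So if the subquery returns even one NULL, the NOT IN operator does not return any rows, which is almost never what we intend. This is because x NOT IN (a, NULL) is evaluated as x <> a AND x <> NULL, and a comparison with NULL is never TRUE. So again the winner is NOT EXISTS.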
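Here is a small self-contained way to reproduce the Case 3 behaviour without the big tables. The table variables @Parent and @Child and their values are made up for this sketch and are not part of the script above.

```sql
-- Hypothetical mini tables standing in for PackageInformation and ChildTable
DECLARE @Parent TABLE (PackageID INT NULL);
DECLARE @Child  TABLE (PackageID INT NULL);

INSERT INTO @Parent (PackageID)
SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3;

INSERT INTO @Child (PackageID)
SELECT 1 UNION ALL SELECT NULL;

-- NOT IN: the NULL in @Child makes the predicate UNKNOWN or FALSE
-- for every parent row, so this returns no rows.
SELECT p.PackageID
FROM @Parent AS p
WHERE p.PackageID NOT IN (SELECT c.PackageID FROM @Child AS c);

-- NOT EXISTS: the NULL child row never matches c.PackageID = p.PackageID,
-- so it is ignored and rows 2 and 3 are returned.
SELECT p.PackageID
FROM @Parent AS p
WHERE NOT EXISTS (SELECT 1 FROM @Child AS c WHERE c.PackageID = p.PackageID);
```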

I think it would be safe to say that we should use NOT EXISTS instead of NOT IN, as it works the way we expect in all the scenarios we saw in this post.
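If for some reason a query has to stay with NOT IN, one common workaround is to filter the NULLs out of the subquery. This is only a sketch against the same tables and I have not measured its plan or logical reads here.

```sql
-- Sketch only: excluding NULLs from the subquery keeps a single NULL
-- in ChildTable from turning the whole NOT IN predicate into UNKNOWN.
SELECT PackageID, PackageName
FROM dbo.PackageInformation
WHERE PackageID NOT IN
 (
   SELECT PackageID
   FROM dbo.ChildTable
   WHERE PackageID IS NOT NULL
 );
GO
```

Note that rows whose own PackageID is NULL are still left out by this query, so on the Case 3 data I would expect it to return the 4690 records of Case 2 rather than the 4790 which NOT EXISTS returns. The two forms are still not interchangeable.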

--Cleanup
DROP TABLE ChildTable
GO
DROP TABLE PackageInformation
GO
