My Query Optimization Story: Comparing the T-SQL OR VS UNION operators in a multi-table JOIN

December 2, 2013

My Query Optimization Story: Comparing the T-SQL OR VS UNION operators in a multi-table JOIN

I was recently tasked with troubleshooting a reporting query that was taking an unexpectedly long time.

In a nutshell, the sales report was aggregating data between two tables that were joined with a common key. The query WHERE clause was filtering for a condition which matched in the first

table, OR a condition which matched in the second table. This was causing the query optimizer to do a full table scan against the millions of rows in each table

I’ve anonymized the table schema but the results are still consistent. In my test case, built using SQL Server 2012, I’m creating two tables and populating one with random US State abbreviations, and the second table with sample sales numbers.

Step One: Create the test environment

/* Create Sample Database*/
USE master
GO

-- Drop the database if it already exists
IF  EXISTS ( SELECT name FROM sys.databases WHERE name = N'TestDB')
DROP DATABASE TestDB
GO
CREATE DATABASE TestDB
GO


/* Create States Table */
USE TestDB
GO

IF OBJECT_ID('dbo.TableA', 'U') IS NOT NULL
  DROP TABLE dbo.TableA
GO

CREATE TABLE dbo.TableA
(
    Counter int NOT NULL PRIMARY KEY CLUSTERED IDENTITY (1,1), 
    StateID char(2) NULL
)
GO

/* CREATE Amount Table */
IF OBJECT_ID('dbo.TableB', 'U') IS NOT NULL
  DROP TABLE dbo.TableA
GO

CREATE TABLE dbo.TableB
(
    Counter int NOT NULL PRIMARY KEY CLUSTERED IDENTITY (1,1), 
    SaleAmount decimal (18,2) NULL
)
GO



/*  Populate the two tables with sample data.  One with random state names and the second with sales  */


/*  INSERT SAMPLE STATE DATA */
GO
INSERT    INTO dbo.TableA
          (StateID)
          VALUES ( CASE  SCOPE_IDENTITY() % 4 WHEN 0 THEN 'WA' WHEN 1 THEN 'OR' WHEN 2 THEN 'ID' WHEN 3 THEN 'CA' END) 
GO 100000


/*  INSERT SAMPLE MONEY DATA */
GO
INSERT    INTO dbo.TableB
          (SaleAmount)
          VALUES (CASE SCOPE_IDENTITY() % 3 WHEN 0 THEN 25 WHEN 1 THEN 50 WHEN 2 THEN 75 WHEN 3 THEN 100 END)
GO 100000




/*    CREATE A NONCLUSTERED INDEX ON EACH TABLE   */
IF EXISTS (SELECT name FROM sysindexes WHERE name = 'ix_TableA_StateID_include') DROP INDEX [dbo].[TableA].[ix_TableA_StateID_include]
GO
CREATE    NONCLUSTERED INDEX ix_TableA_StateID_include ON [dbo].[TableA] (StateID) INCLUDE ([Counter]);
GO
IF EXISTS (SELECT name FROM sysindexes WHERE name = 'ix_TableB_SaleAmount_include') DROP INDEX [dbo].[TableB].[ix_TableB_SaleAmount_include]
go
CREATE    NONCLUSTERED INDEX ix_TableB_SaleAmount_include ON [dbo].[TableB] (SaleAmount) INCLUDE ([Counter]);
GO

Step Two: Run the two queries. Run this in a separate SQL Server Management Studio query window, with the actual query plan displayed.

Use     TestDB;
GO

SET STATISTICS IO ON;

/*    QUERY WITH OR */
SELECT  A.Counter
FROM    dbo.TableA A 
JOIN    dbo.TableB B ON (A.Counter = B.Counter)
WHERE   A.StateID = 'WA' OR B.SaleAmount = 75;
GO


/*    QUERY WITH UNION */
SELECT  A.Counter
FROM    dbo.TableA A 
WHERE   A.StateID = 'WA' 
UNION   
SELECT  Counter
FROM    dbo.TableB B
WHERE   B.SaleAmount = 75;
GO

The IO stats for the two queries show a significant reduction in page reads when the data is extracted as two queries using the UNION operator

(1 row(s) affected)
Table ‘TableA’. Scan count 1, logical reads 188, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘TableB’. Scan count 1, logical reads 274, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

(2 row(s) affected)
Table ‘TableB’. Scan count 1, logical reads 82, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘TableA’. Scan count 1, logical reads 41, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

Graphical Execution Plan

The graphical execution plan shows that the query containing the WHERE/OR construct uses expensive table scans on both tables to extract the data. This method accounts for 78% of the total cost of the two queries.

In the second example, the optimizer is able to perform much more efficient nonclustered index seeks on the two tables since the query is broken into two SELEcT statements with a UNION clause. This method accounts for 22% of the total cost between the two queries.

In Summary:

A query that filters whether a condition exists in Table A OR exists in Table B will often run more efficiently with two SELECT statements joined with a UNION operator than a single statement with using the WHERE/OR construct.

comments powered by Disqus