Friday, February 4, 2011

Common Table Expressions(CTE)

Common Table Expressions(CTE):

Introduction:
CTE is a new feature provided by Microsoft SQL Server 2005.

In real world, we often need to query hierarchical data from the database. For example, to get a list of hierarchical list of all the employees, list of product categories etc. CTE fulfills this requirement and let us query the database recursively.

A CTE is a "temporary result set" that exists only within the scope of a single SQL statement.
It can be used instead of temp table or table variables in the stored procedures in the circumstances.

Background:

Most of the developers while writing the stored procedures they create the temp tables or table variables. They need some table to store the temporary results in order to manipulate the data in the other tables based on this temp result.

The temp variables will be stored on the tempdb and it needs to be deleted in the tempdb database.

The table variable is best when compare with the temp tables. Because the table variable initially will be there in the memory for the certain limit of size and if the size increase then it will be moved to the temp database. However the scope of the table variable is only up to that program. When compare with table variable the CTE is best. 
It just store the result set like normal view.

CTE  also provides one medium to deal with recursive queries. Moreover, I can say it is mainly for recursive queries. It can be used in replacement of derived tables (sub queries), temp tables, table variables, inline user-defined functions etc.

The Common Table Expression (CTE) is called as a temporary named RESULT SET, within the scope of an executing statement that can be used within a SELECT, INSERT, UPDATE, or DELETE or even in a CREATE VIEW or MERGE statement.
CTE is the best as it act as a normal view only.

The CTE is just store the result as temp result set. It can be access like normal table or view. This is only up to that scope. 

SQL Server supports two types of CTEs—recursive and nonrecursive.
SQL Server supports two types of Common Table Expressions (CTEs):
·         Recursive CTEs:  Here the query executes itself, like we use recursive functions (functions calling itself) in C, java or C# .net programming languages.  This is very useful in case of parent-child hierarchy relationship
·         Non- Recursive CTEs: These are the other complex queries that can be simplified with the help of CTEs like using cursors, getting sequence numbers, filtrations, etc.


Recursive CTEs:
 The Recursive CTEs must have two types of query definition members:
·         Recursive Member: If the query definition references the CTE itself, then it is called Recursive member. This is defined at the last i.e. after defining all the anchor members.
·         Anchor Member: All other CTE query definitions without reference of CTE are Anchor Members. All anchor member query definitions must be put before the first recursive member definition.

The following are the restrictions to Recursive CTEs:
·         In Recursive CTEs, it’s not possible to have multiple query definitions (i.e. multiple CTEs [, separated]) within a single “WITH” statement.
·         All anchor members supports UNION ALL, UNION, INTERSECT, or EXCEPT for combining the anchor definition. But only UNION ALL is supported for combining the last anchor member and the first recursive member and even for multiple recursive members.
·         The total number of columns and the data types of those columns must match for the anchor member and the recursive member. (Note: data type should be same for recursive member and the corresponding anchor member)
·         CTE expression_name can be referenced only once in the FROM clause of a recursive member.



Recursive Common Table Expression Examples: These are very useful in case of hierarchical data, because the CTE Query continues to execute till it reaches its lower granular level.

with MyCTE(x)
as
(
select x = convert(varchar(8000),'hello')
union all
select x + 'a' from MyCTE where len(x) < 100
)
select x from MyCTE
order by x


The below example is based on the “AdventureWorksDW2008R2” sample database.

WITH EmpDetails (EmployeeKey, FirstName, LastName, ParentEmployeeKey, PFirstName, PLastName,  EmployeeRank)
  AS
  (
    SELECT
                  EmployeeKey
                 ,FirstName
                 ,LastName
                 ,ParentEmployeeKey
                 ,CAST('NULL' AS VARCHAR(50)) AS PFirstName
                 ,CAST('NULL' AS VARCHAR(50)) AS PLastName
                 ,1 AS EmployeeRank
    FROM dbo.DimEmployee
                 WHERE ParentEmployeeKey IS NULL
    UNION ALL
    SELECT
                  e.EmployeeKey
                 ,e.FirstName
                 ,e.LastName
                 ,e.ParentEmployeeKey
                 ,CAST(CTE_Emp.FirstName AS VARCHAR(50)) AS PFirstName
                 ,CAST(CTE_Emp.LastName AS VARCHAR(50)) AS PLastName
                 ,CTE_Emp.EmployeeRank + 1 AS EmployeeRank
    FROM dbo.DimEmployee e
      INNER JOIN EmpDetails CTE_Emp
        ON e.ParentEmployeeKey = CTE_Emp.EmployeeKey
  )

SELECT * FROM EmpDetails ORDER BY EmployeeRank
OUTPUT:

Let’s dissect the above query to have a better picture of its working mechanism.

Basically any recursive CTE consists of three elements:
1.     First invocation of the CTE: In first invocation, all the members of the CTE will be executed i.e. both the anchor members and the recursive members. And as we have CTE referenced in the recursive members, it will automatically invoke the CTE for further iterations and thus called as recursive mode or recursive invocation. In this case Anchor member’s result set will be considered as the referenced data for the CTE. As in the above figure, in the first invocation, r0 is the value for the Anchor query and hence become the data for the CTE when referenced from the Recursive query.
2.     Recursive invocation of the CTE:  In all recursive invocation of the CTE, only recursive members will be executed and the anchor members will remain idle. And the data for the CTE referenced will be the result set built from the last call of the CTE. Elaborated briefly below.
3.     Recursive Termination Condition: Well practically, we don’t have any control on the termination of the Recursive CTEs. I mean it is implicit; it will stop itself when no rows are returned from the previous invocation. Hence, the control condition can be applied indirectly i.e. the query’s result set can be limited with the help of filter conditions.
Let’s look around the execution process of the above recursive CTE.
First Round: As in the above picture, r0 is the result set of the Anchor query. And when coming to Recursive query, r0 become the CTE value when referenced and joined with the Employee table E.
Anchor Resultset:  r[Top level employee with id 112]

Recursive Resultset (r1): E inner join r[based on joining condition E.ParentEmployeeKey = r0.EmployeeKey] i.e. all 2nd level employees

Second Round: From second round, the Anchor query will remain idle. Only the recursive queries will be executed. But the referenced CTE value will be the value of the last invocation only (r1and not the union of [(r0) and (r1)]
Anchor Resultset: No result set (query remain idle)
Recursive Resultset (r2): E inner join r[based on joining condition E.ParentEmployeeKey = r1.EmployeeKey] i.e. all 3rd level employees

and so on till (rn), till it reaches all the levels i.e. when there will be no records from the previous invocation of the CTE.
After execution of all the iterations, the final result set will be accumulated by merging all partial result sets by UNION ALL i.e. [(r0) + (r1) + (r2) + … + (rn)]

If the conditions applied are not valid inside CTE query definition i.e. if the CTE is queried incorrectly, it may enter to an infinite loop.  For example, in the above example, instead of “ON e.ParentEmployeeKey = CTE_Emp.EmployeeKey” if we apply “ON e.EmployeeKey= CTE_Emp.EmployeeKey”; it will enter an infinite loop.  May be the system will crash out of memory (kidding :) ). Once it extends beyond the number of recursion permitted (by default it is 100), it will show an error like the below:
Msg 530, Level 16, State 1, Line 1
The statement terminated. The maximum recursion 100 has been exhausted before statement completion.
In order to avoid this type of error we can use MAXRECURSION hint in the OPTION clause of the outer query where the CTE is referenced. We can limit the number of recursion levels allowed for a particular statement by using the MAXRECURSION hint and a value between 0and 32,767. Like for example, in the above problem, if we want to see the result set till the employee Rank 3 (employee level), we can use the MAXRECURSION hint to ‘2’ (two sub level + 1 root level) as below:

WITH EmpDetails (EmployeeKey, FirstName, LastName, ParentEmployeeKey, PFirstName, PLastName,  EmployeeRank)
  AS
  (
    SELECT
                 EmployeeKey
                 ,FirstName
                 ,LastName
                 ,ParentEmployeeKey
                 ,CAST('NULL' AS VARCHAR(50)) AS PFirstName
                 ,CAST('NULL' AS VARCHAR(50)) AS PLastName
                 ,1 AS EmployeeRank
    FROM dbo.DimEmployee
                 WHERE ParentEmployeeKey IS NULL
    UNION ALL
    SELECT
                 e.EmployeeKey
                 ,e.FirstName
                 ,e.LastName
                 ,e.ParentEmployeeKey
                 ,CAST(CTE_Emp.FirstName AS VARCHAR(50)) AS PFirstName
                 ,CAST(CTE_Emp.LastName AS VARCHAR(50)) AS PLastName
                 ,CTE_Emp.EmployeeRank + 1 AS EmployeeRank
    FROM dbo.DimEmployee e
      INNER JOIN EmpDetails CTE_Emp
        ON e.ParentEmployeeKey = CTE_Emp.EmployeeKey
   )

SELECT * FROM EmpDetails -- order by EmployeeRank
OPTION (MAXRECURSION 2)
But in this case we cannot use order by clause with hints.
Some more examples for recursive CTEs:

WITH echo_CTE(st)
AS
(
SELECT st = CAST('Hi ' AS VARCHAR(50))
UNION all
SELECT CAST(st + 'Arun! ' AS VARCHAR(50)) FROM echo_CTE
)
SELECT st FROM echo_CTE
OPTION (MAXRECURSION 3)

I mentioned CAST operator in the above example, because recursive CTEs are data type sensitive as well as data size sensitive. So, be careful, while selecting the columns for the CTE.
Here the above statement is controlled by “OPTION (MAXRECURSION 3)” (to run only three iteration) but it can be also controlled by using filter condition as below (I modified the above query to give more clarity to the solution).

WITH echo_CTE(st, lencount)
AS
(
SELECT st = CAST('Hi ' AS VARCHAR(50)), LEN('Hi ')
UNION all
SELECT
         CAST(st + 'Arun! ' AS VARCHAR(50))
         , LEN(st) FROM echo_CTE
WHERE lencount < 50
)
SELECT st, lencount FROM echo_CTE

Note in the above output, the Lencount column is always calculating for the value of st from the previous iteration.
Now let’s try one complex operation with CTE, i.e. to split comma separated values (CSV)into table set.

DECLARE @stringst VARCHAR(8000) = 'Sign UP, today, and, start up ,with, new, ideas and, discussions, in, SQL Lion, Forums'
SET @stringst = @stringst + ','; --';' is needed before starting CTE defination

WITH SplitString(startid, endid) AS
(
         SELECT 0 AS startid, CHARINDEX(',',@stringst,0) AS endid
         UNION ALL
         SELECT endid + 1 AS startid, CHARINDEX(',',@stringst, endid + 1) AS endid FROM SplitString
         WHERE endid < LEN(@stringst)
)

SELECT SUBSTRING(@stringst,startid,endid - startid) AS SplitedValue, *, (endid - startid) AS wordSize FROM SplitString

Non recursive:
In its non-recursive form a CTE can serve as a substitute for derived tables or a view.  It provides a convenient tool for creating queries by building tables as and when they are required within nested SELECT statements. This simplifies query building by allowing them to be developed in separate logical blocks.
CTE 1: Simple CTE

WITH
 ProductCTE
AS
(
  SELECT ProductID AS [ID],ProductName AS [Name],CategoryID AS [CID],UnitPrice AS [Price]
  
FROM Products
)
SELECT * FROM ProductCTE

Here all the product details like ID, name, category ID and Unit Price will be retrieved and stored as temporary result set in the ProductCTE.

This result set can be retrieved like table or view.
CTE2:Simple CTE with alias 

WITH
 ProductCTE(ID,Name,Category,Price)
AS
(
  SELECT ProductID,ProductName,CategoryID,UnitPrice
  
FROM Products
)
SELECT * FROM ProductCTE

WITH   CTE1(CL11,CL22) AS (SELECT 1 AS CL1, 3 AS CL2)
      ,CTE2(CL3,cl5,cl6)AS         (SELECT 1,5,7)

SELECT * FROM CTE1 INNER join CTE2 ON CL11 = CL3


Here there are four fieds retrieves from the Products and the alias name have given in the arqument to the CTE result set name.

It also accepts like the following as it is in the normal select query.
Non-Recursive Common Table Expressions: These are CTEs with only Anchor Members andno Recursive members. Even CTEs in non-recursive mode are more powerful. Let me show one example.
To delete redundant records from a table without using any identity column or any other third table.

WITH CTE AS
(
SELECT
ROW_NUMBER() OVER(partition BY col1, col2, col3 ORDER BY col1) AS RecID,
* FROM dbo.ABC
)

DELETE FROM CTE WHERE RecID > 1
And the coolest part of this is that it will not hamper the structure of the table i.e. all constraints and defaults will remain same as it is.


Examples for Recursion:
I have the following table in a SQL Server 2008 database:
Id  Name       ParentFolder

--  ----       ------------

1   Europe     NULL

2   Asia       NULL

3   Germany    1

4   UK         1

5   China      2

6   India      2

7   Scotland   4


I would like to create a view that results in something like this:
Id  Name       FullName

--  ----       --------

1   Europe     Europe

2   Asia       Asia

3   Germany    Europe/Germany

4   UK         Europe/UK

5   China      Asia/China

6   India      Asia/India

7   Scotland   Europe/UK/Scotland

ANSWER:
DECLARE @tbl TABLE (

         Id INT

        ,[Name] VARCHAR(20)

        ,ParentId INT

        )



    INSERT INTO @tbl( Id, Name, ParentId )

    VALUES

     (1, 'Europe', NULL)

    ,(2, 'Asia',   NULL)

    ,(3, 'Germany', 1)

    ,(4, 'UK',      1)

    ,(5, 'China',   2)

    ,(6, 'India',   2)

    ,(7, 'Scotland', 4)

    ,(8, 'Edinburgh', 7)

    ,(9, 'Leith', 8)



    ;

WITH  abcd

        AS (

              -- anchor member

            SELECT  id, [Name], ParentID,

                    CAST(([Name]) AS VARCHAR(1000)) AS "Path"

            FROM    @tbl

            WHERE   ParentId IS NULL

            UNION ALL

              --recursive member

            SELECT  t.id, t.[Name], t.ParentID,

                    CAST((a.path + '/' + t.Name) AS VARCHAR(1000)) AS "Path"

            FROM    @tbl AS t

                    JOIN abcd AS a

                      ON t.ParentId = a.id

           )

SELECT * FROM abcd
 
 
Hierarchical relations can be expressed as a recursive single tables' join. For example, consider table:
create table objects
(
id int not null primary key,
name varchar,
parent_id int
);
And insert:
  insert into objects(id,name,parent_id) values(1,'Cars',null);
 insert into objects(id,name,parent_id) values(2,'Porsche',1);
 insert into objects(id,name,parent_id) values(3,'911',2);
 insert into objects(id,name,parent_id) values(4,'Boxster',2);
 insert into objects(id,name,parent_id) values(5,'Cayman',2);
 insert into objects(id,name,parent_id) values(6,'Cayenne',2);
 insert into objects(id,name,parent_id) values(7,'Panamera',2);
 insert into objects(id,name,parent_id) values(8,'Aston Martin',1);
 insert into objects(id,name,parent_id) values(9,'DB7',8);
 insert into objects(id,name,parent_id) values(10,'DB9',8);
insert into objects(id,name,parent_id) values(11,'Vantage',8);
 insert into objects(id,name,parent_id) values(12,'One',8);

Here we have created hierarchy using parent_id field that refers to the parent row: "Cars" is the root (no parent), 2 brands (Porsche and Aston Martin) belongs to "Cars" and each brand has a list of models. Now, define the recursive CTE SQL to read all porsche models:
WITH Cars(id,parent_id,name,path,level) AS 
( 
   --initialization. read porsche root
   SELECT id,parent_id,name,name,0 as level
   FROM objects
   WHERE id = 2 -- porsche root
   UNION ALL 
   --recursive execution 
   SELECT o.id,o.parent_id,o.name,concat(c.path,'/', o.name), c.level+1
   FROM cars c
      INNER JOIN objects o ON c.id = o.parent_id 
) 
select * from cars;
The result:
id    Parent_id  Name        Path                     Level
----------------------------------------------------------------------------------------
2                1  Porsche     Porsche                      0
3                2  911           Porsche/911               1
4                2  Boxster     Porsche/Boxster          1
5                2  Cayman    Porsche/Cayman         1
6                2  Cayenne   Porsche/Cayenne         1
7                2  Panamera Porsche/Panamera       1
 

1 comment:



  1. Through this post, I know that your good knowledge in playing with all the pieces was very helpful. I notify that this is the first place where I find issues I've been searching for. You have a clever yet attractive way of writing on Msbi online training
    Msbi online training Hyderabad
    Msbi online training India
    Msbi online course

    ReplyDelete