Virtual Indexing Part 9: The Indexed Hierarchy

Turbo-charge your SQL hierarchy with a numbers table.

Introduction

Did you know you can add Indexes to your parent-child hierarchy in SQL Server? As tricky as this sounds – it’s actually quite easy with a physical tally (numbers) table and an enumerated sort path (AKA “materialized path”.) The technique in this article will make it possible to easily traverse a DB parent-child hierarchy table with blazingly fast performance.

In Part 3 of this series you were introduced to the Indexed Function. In Part 8 we created an indexed splitter — an indexed view that splits a delimited string once, then stores the “items” in a clustered index for fast retrieval. Building on the indexed split concept, lets create an Indexed Hierarchy, a parent-child hierarchy stored in you SQL database. All upstream and downstream relationships in your hierarchy can be easily traversed without recursion, self-joins, XML or intermediate aggregation tables which need to be maintained. To pull this off we’ll create an indexed view that leverages a tally table and an enumerated sort path.

What I love most about this approach is how it’s not limited to SQL Server, it can be applied to any RDBMS or database system that supports indexed views or similar (which is pretty much everything).

Setup and Sample Data

First for the set-up DDL and sample data. The code below builds (1) a tally (numbers) table, (2) dbo.nodes, a table to store a parent-child relationship — nodeId and parentID, (3) an enumerated path, and (4) sample values for dbo.nodes.

--==== 1.1. Tally table code
IF 1=2 -- remove this if you need a tally table
BEGIN
  IF OBJECT_ID('dbo.tally') IS NOT NULL DROP TABLE dbo.tally;
  CREATE TABLE dbo.tally (N INT NOT NULL);
  
  --==== Add Primary Key and Clustered Index
  ALTER TABLE dbo.tally 
    ADD CONSTRAINT pk_cl_tally PRIMARY KEY CLUSTERED(N) 
      WITH FILLFACTOR=100;
  
  --==== Add a Unique Index; the optimizer picks this, it requires less IO
  ALTER TABLE dbo.tally 
    ADD CONSTRAINT uq_tally UNIQUE NONCLUSTERED(N);
  
  --==== Populate with at least 1 Million Rows
  INSERT dbo.tally(N) SELECT t.N FROM dbo.fnTally(1,1000000) AS t;
END
GO
--==== 1.2. Test Data DDL to build the hierarchy
CREATE TABLE dbo.nodes
(
  ParentId INT NULL,
  NodeId   INT NOT NULL,
  NodePath VARCHAR(200) NOT NULL DEFAULT(''),
  CONSTRAINT pk_nodes__nodeid PRIMARY KEY CLUSTERED(nodeId),
  CONSTRAINT fk_nodes__parentId FOREIGN KEY (parentid) REFERENCES dbo.nodes(nodeId),
);
;--==== 1.3. Add Hierarchy values
INSERT dbo.nodes (ParentId,NodeId)
VALUES (NULL,1),(1,2),(1,3),(1,4),(1,5),(2,6),(2,7),(2,8),(2,9),(2,10),(3,20),(3,25),
  (3,30),(4,100),(4,101),(101,1000),(101,1012),(101,1014),(1014,2000);

Next, a recursive inline table valued function (iTVF) for building our enumerated path, and for maintenance.

;--==== 1.4. Create Function to build the node path; allows for individual updates
CREATE OR ALTER FUNCTION dbo.fnBuildNodePath
(
  @nodeId INT
)
RETURNS TABLE WITH SCHEMABINDING AS RETURN
WITH BuildNodePath AS 
(
  SELECT n.ParentId, n.NodeId, N=1
  FROM   dbo.Nodes AS n 
  WHERE  n.NodeId = @nodeId
  UNION ALL
  SELECT v.ParentId, v.NodeId, a.N-1
  FROM   dbo.Nodes     AS v
  JOIN   BuildNodePath AS a
    ON a.ParentId = v.NodeId
)
SELECT 
  NodeId   = @NodeId, 
  NodePath = STUFF((
               SELECT   CONCAT('.',p.NodeId)
               FROM     BuildNodePath AS p
               ORDER BY p.N
               FOR XML PATH('')),1,1,'');

dbo.fnBuildNodePath allows us to return the enumerated path for each NodeId. Note the query below and results.

--==== 1.5. Build the enumerated sort paths
SELECT      n.NodeId, b.NodePath
FROM        dbo.nodes                     AS n
CROSS APPLY dbo.fnBuildNodePath(n.NodeId) AS b;
Results
Figure 1: dbo.fnBuildNodePath query results

We can use dbo.fnBuildNodePath to add the enumerated path to dbo.nodes, which is required for our indexed hierarchy. dbo.fnBuildNodePath can also be used to modify records as needed.

The Indexed Hierarchy

Before we create our indexed view lets review the underlying logic. Using dbo.fnTally we could split the enumerated path and return each node and downstream members it’s node tree.

Indexed Hierarchy Logic
;--==== 2. Index parsing logic using fnTally
SELECT
  n.NodeId,
  n.ParentId,
  n.NodePath,
  CurrentNode = Nd.C,
  IsLeafNode  = IIF(n.nodePath=p.Nd,1,0), -- leaf or descendant
  NodeDepth   = LEN(Nd.C) - LEN(REPLACE(Nd.C,'.',''))+1
FROM        dbo.nodes AS n
CROSS APPLY (VALUES(LEN(n.NodePath)+1))             AS np(Ln)
CROSS APPLY dbo.fnTally(1,np.Ln)                    AS t
CROSS APPLY (VALUES(SUBSTRING(n.NodePath,1,t.N-1))) AS p(Nd)
CROSS APPLY (VALUES(ISNULL(p.Nd,0)))                AS Nd(C)
CROSS APPLY (VALUES(IIF(n.nodePath=p.Nd,1,0)))      AS l(LNd)
WHERE       (SUBSTRING(n.NodePath,t.N,1)='.' OR t.N=np.Ln)
--ORDER BY Nd.C, -IIF(n.nodePath=p.Nd,1,0); -- for presentation
The denormalized result set

This query splits returns each node to return a row for each of it’s descendants.

Figure 2: Denormalized subsets of all node trees in dbo.nodes

CurrentNode represents the top each node tree; all records where IsLeafNode = 0 are downstream members of CurrentNode‘s node tree. This design makes it easy to group, filter and analyze hierarchal while enjoying big performance gains. Reviewing the execution plan, note the absence of a sort operator, thanks to the presence of a suitably sorted stream (AKA virtual index).

Figure 3: Indexed hierarchy logic using fnTally

This will not be the final execution plan but it is the plan that you get for each row you modify in dbo.nodeId. The tally table is never modified so data modifications are very quick.

Indexed Hierarchy DDL

With SQL Server indexed views we can’t reference table valued functions nor leverage APPLY. The circumvent these limitations we’ll replace fnTally with a physical tally table, dbo.tally (DDL at the beginning of this article).

;--==== 3. Use the tally table logic to create an indexed view
CREATE VIEW dbo.vwNodes WITH SCHEMABINDING AS
SELECT
  n.ParentId,
  n.NodeId,
  n.NodePath,
  CurrentNode = ISNULL(SUBSTRING(n.NodePath, 1, t.N-1), 0),
  IsLeafNode  = CASE n.nodePath WHEN SUBSTRING(n.NodePath, 1, t.N-1) THEN 1 ELSE 0 END, -- leaf or ancestor
  NodeDepth   = LEN(ISNULL(SUBSTRING(n.NodePath, 1, t.N-1), 0)) -
                LEN(REPLACE(ISNULL(SUBSTRING(n.NodePath, 1, t.N-1), 0),'.',''))+1
FROM        dbo.nodes AS n
CROSS JOIN  dbo.tally AS t
WHERE       t.N BETWEEN 1 AND LEN(n.NodePath)+1 -- A node level Edge N-Gram "tokenization"
  AND       (SUBSTRING(n.NodePath,t.N,1)='.' OR t.N=LEN(n.NodePath)+1);

For all intent and purposes this logic, as is, will produce the same execution plan as the earlier fnTally logic and will perform relatively the same. Let’s super charge our view with some indexes.

--==== 4.1. Unique Clustered index to make it an indexed view
CREATE UNIQUE CLUSTERED INDEX uq_cl_vwNodes__currentNode
  ON dbo.vwNodes(currentNode, NodeId);

--==== 4.2. Nonclustered Indexes
CREATE NONCLUSTERED INDEX nc_vwNodes__1
  ON dbo.vwNodes(NodePath, NodeId, isLeafNode);
CREATE NONCLUSTERED INDEX nc_vwNodes__2
  ON dbo.vwNodes(NodeId, isLeafNode, nodeDepth) INCLUDE (NodePath,ParentId);
CREATE NONCLUSTERED INDEX nc_vwNodes__3
  ON dbo.vwNodes(IsLeafNode, nodeDepth);
CREATE NONCLUSTERED INDEX nc_vwNodes__4
  ON dbo.vwNodes(IsLeafNode, nodeId) INCLUDE (parentId,NodePath,NodeDepth);
CREATE NONCLUSTERED INDEX nc_vwNodes__5
  ON dbo.vwNodes(nodeDepth) INCLUDE(IsLeafNode);

Now we have clustered index (by way of an indexed view) as well as some nonclustered indexes which can be used when traversing our hierarchy.

Basic Retrieval

With our indexed view in place we can return the denormalized hierarchy structure from figure 2 (above) except with our hierarchy is stored nice and neatly in a single clustered index.

--==== 5. Basic SELECT against our indexed view
SELECT   h.*
FROM     dbo.vwNodes AS h WITH (NOEXPAND)
ORDER BY h.currentNode;

I included that NOEXPAND hint, this forces the optimizer to the clustered index or one it’s nonclustered indexes created on our indexed view afterwards. I generally don’t suggest query hints but I’ve had nothing but success with this technique. That said – make sure to do your on testing.

Execution Plan
Figure 4: Basic indexed hierarchy execution plan

It doesn’t get cleaner or more efficient than that.

Basic Hierarchy Analysis

Our indexed hierarchy can be used to join to other tables to perform aggregations and other analysis lets start with few basic aggregations against our indexed view. We’ll keep it simple for this article and run some basic aggregations against our indexed hierarchy without joining to anything else. Note the simplicity of these queries compared to the complexity of what is being returned. These queries below will return:

  1. A count of all descendants for each node with children
  2. A count of members by level
--==== 6.1. Count Descendants for each node with children
SELECT
  HLevel      = ISNULL(h.NodeDepth,0),
  CurrentNode = ISNULL(currentNode,'Total'),
  Descendents = COUNT(*)
FROM     dbo.vwNodes AS h WITH (NOEXPAND)
WHERE    h.isLeafNode = 0
GROUP BY h.NodeDepth, h.currentNode;

--==== 6.2. Count members by level 
SELECT
  HLevel   = ISNULL(n.nodeDepth,'Total'),
  Members  = COUNT(*)
FROM     dbo.vwNodes AS n WITH (NOEXPAND)
GROUP BY n.nodeDepth WITH ROLLUP;
Results
Figure 5: Left to right: results for Queries #1, #2 and #3

Again, virtually readless, no complex queries or intermediate data structures to maintain. Only a GROUP BY clause and you’re off to the races.

Execution Plans
Figure 6: Execution plan when leveraging an Indexed Hierarchy for traversal.

Conclusion

A physical tally table doesn’t just allow you to replace loops in your functions and stored procedures, nor should indexed views be limited to pre-aggregations and such. The topic of tally tables and indexed views is one that deserves much more attention.

Again, to keep things simple I restricted my examples to analysis of our hierarchy without joining to anything else. We’ll expand on this concept further in the next installment of this series. Thanks for reading!

Author: Alan Burstein

SQL Performance Ninja with 20+ years of writing high-performing, elegant SQL code. Expert at developing super fast NLP algorithms

Leave a Reply