Virtual Indexing Part 9: The Indexed Hierarchy

Turbo-charge your SQL hierarchy with a numbers table.

Introduction

Did you know you can add Indexes to your parent-child hierarchy in SQL Server? As tricky as this sounds – it’s actually quite easy with a physical tally (numbers) table and an enumerated sort path (AKA “materialized path”.) The technique in this article will make it possible to easily traverse a DB parent-child hierarchy table with blazingly fast performance.

In Part 3 of this series you were introduced to the Indexed Function. In Part 8 we created an indexed splitter — an indexed view that splits a delimited string once, then stores the “items” in a clustered index for fast retrieval. Building on the indexed split concept, lets create an Indexed Hierarchy, a parent-child hierarchy stored in you SQL database. All upstream and downstream relationships in your hierarchy can be easily traversed without recursion, self-joins, XML or intermediate aggregation tables which need to be maintained. To pull this off we’ll create an indexed view that leverages a tally table and an enumerated sort path.

What I love most about this approach is how it’s not limited to SQL Server, it can be applied to any RDBMS or database system that supports indexed views or similar (which is pretty much everything).

Setup and Sample Data

First for the set-up DDL and sample data. The code below builds (1) a tally (numbers) table, (2) dbo.nodes, a table to store a parent-child relationship — nodeId and parentID, (3) an enumerated path, and (4) sample values for dbo.nodes.

--==== 1.1. Tally table code
IF 1=2 -- remove this if you need a tally table
BEGIN
  IF OBJECT_ID('dbo.tally') IS NOT NULL DROP TABLE dbo.tally;
  CREATE TABLE dbo.tally (N INT NOT NULL);
  
  --==== Add Primary Key and Clustered Index
  ALTER TABLE dbo.tally 
    ADD CONSTRAINT pk_cl_tally PRIMARY KEY CLUSTERED(N) 
      WITH FILLFACTOR=100;
  
  --==== Add a Unique Index; the optimizer picks this, it requires less IO
  ALTER TABLE dbo.tally 
    ADD CONSTRAINT uq_tally UNIQUE NONCLUSTERED(N);
  
  --==== Populate with at least 1 Million Rows
  INSERT dbo.tally(N) SELECT t.N FROM dbo.fnTally(1,1000000) AS t;
END
GO
--==== 1.2. Test Data DDL to build the hierarchy
CREATE TABLE dbo.nodes
(
  ParentId INT NULL,
  NodeId   INT NOT NULL,
  NodePath VARCHAR(200) NOT NULL DEFAULT(''),
  CONSTRAINT pk_nodes__nodeid PRIMARY KEY CLUSTERED(nodeId),
  CONSTRAINT fk_nodes__parentId FOREIGN KEY (parentid) REFERENCES dbo.nodes(nodeId),
);
;--==== 1.3. Add Hierarchy values
INSERT dbo.nodes (ParentId,NodeId)
VALUES (NULL,1),(1,2),(1,3),(1,4),(1,5),(2,6),(2,7),(2,8),(2,9),(2,10),(3,20),(3,25),
  (3,30),(4,100),(4,101),(101,1000),(101,1012),(101,1014),(1014,2000);

Next, a recursive inline table valued function (iTVF) for building our enumerated path, and for maintenance.

;--==== 1.4. Create Function to build the node path; allows for individual updates
CREATE OR ALTER FUNCTION dbo.fnBuildNodePath
(
  @nodeId INT
)
RETURNS TABLE WITH SCHEMABINDING AS RETURN
WITH BuildNodePath AS 
(
  SELECT n.ParentId, n.NodeId, N=1
  FROM   dbo.Nodes AS n 
  WHERE  n.NodeId = @nodeId
  UNION ALL
  SELECT v.ParentId, v.NodeId, a.N-1
  FROM   dbo.Nodes     AS v
  JOIN   BuildNodePath AS a
    ON a.ParentId = v.NodeId
)
SELECT 
  NodeId   = @NodeId, 
  NodePath = STUFF((
               SELECT   CONCAT('.',p.NodeId)
               FROM     BuildNodePath AS p
               ORDER BY p.N
               FOR XML PATH('')),1,1,'');

dbo.fnBuildNodePath allows us to return the enumerated path for each NodeId. Note the query below and results.

--==== 1.5. Build the enumerated sort paths
SELECT      n.NodeId, b.NodePath
FROM        dbo.nodes                     AS n
CROSS APPLY dbo.fnBuildNodePath(n.NodeId) AS b;
Results
Figure 1: dbo.fnBuildNodePath query results

We can use dbo.fnBuildNodePath to add the enumerated path to dbo.nodes, which is required for our indexed hierarchy. dbo.fnBuildNodePath can also be used to modify records as needed.

The Indexed Hierarchy

Before we create our indexed view lets review the underlying logic. Using dbo.fnTally we could split the enumerated path and return each node and downstream members it’s node tree.

Indexed Hierarchy Logic
;--==== 2. Index parsing logic using fnTally
SELECT
  n.NodeId,
  n.ParentId,
  n.NodePath,
  CurrentNode = Nd.C,
  IsLeafNode  = IIF(n.nodePath=p.Nd,1,0), -- leaf or descendant
  NodeDepth   = LEN(Nd.C) - LEN(REPLACE(Nd.C,'.',''))+1
FROM        dbo.nodes AS n
CROSS APPLY (VALUES(LEN(n.NodePath)+1))             AS np(Ln)
CROSS APPLY dbo.fnTally(1,np.Ln)                    AS t
CROSS APPLY (VALUES(SUBSTRING(n.NodePath,1,t.N-1))) AS p(Nd)
CROSS APPLY (VALUES(ISNULL(p.Nd,0)))                AS Nd(C)
CROSS APPLY (VALUES(IIF(n.nodePath=p.Nd,1,0)))      AS l(LNd)
WHERE       (SUBSTRING(n.NodePath,t.N,1)='.' OR t.N=np.Ln)
--ORDER BY Nd.C, -IIF(n.nodePath=p.Nd,1,0); -- for presentation
The denormalized result set

This query splits returns each node to return a row for each of it’s descendants.

Figure 2: Denormalized subsets of all node trees in dbo.nodes

CurrentNode represents the top each node tree; all records where IsLeafNode = 0 are downstream members of CurrentNode‘s node tree. This design makes it easy to group, filter and analyze hierarchal while enjoying big performance gains. Reviewing the execution plan, note the absence of a sort operator, thanks to the presence of a suitably sorted stream (AKA virtual index).

Figure 3: Indexed hierarchy logic using fnTally

This will not be the final execution plan but it is the plan that you get for each row you modify in dbo.nodeId. The tally table is never modified so data modifications are very quick.

Indexed Hierarchy DDL

With SQL Server indexed views we can’t reference table valued functions nor leverage APPLY. The circumvent these limitations we’ll replace fnTally with a physical tally table, dbo.tally (DDL at the beginning of this article).

;--==== 3. Use the tally table logic to create an indexed view
CREATE VIEW dbo.vwNodes WITH SCHEMABINDING AS
SELECT
  n.ParentId,
  n.NodeId,
  n.NodePath,
  CurrentNode = ISNULL(SUBSTRING(n.NodePath, 1, t.N-1), 0),
  IsLeafNode  = CASE n.nodePath WHEN SUBSTRING(n.NodePath, 1, t.N-1) THEN 1 ELSE 0 END, -- leaf or ancestor
  NodeDepth   = LEN(ISNULL(SUBSTRING(n.NodePath, 1, t.N-1), 0)) -
                LEN(REPLACE(ISNULL(SUBSTRING(n.NodePath, 1, t.N-1), 0),'.',''))+1
FROM        dbo.nodes AS n
CROSS JOIN  dbo.tally AS t
WHERE       t.N BETWEEN 1 AND LEN(n.NodePath)+1 -- A node level Edge N-Gram "tokenization"
  AND       (SUBSTRING(n.NodePath,t.N,1)='.' OR t.N=LEN(n.NodePath)+1);

For all intent and purposes this logic, as is, will produce the same execution plan as the earlier fnTally logic and will perform relatively the same. Let’s super charge our view with some indexes.

--==== 4.1. Unique Clustered index to make it an indexed view
CREATE UNIQUE CLUSTERED INDEX uq_cl_vwNodes__currentNode
  ON dbo.vwNodes(currentNode, NodeId);

--==== 4.2. Nonclustered Indexes
CREATE NONCLUSTERED INDEX nc_vwNodes__1
  ON dbo.vwNodes(NodePath, NodeId, isLeafNode);
CREATE NONCLUSTERED INDEX nc_vwNodes__2
  ON dbo.vwNodes(NodeId, isLeafNode, nodeDepth) INCLUDE (NodePath,ParentId);
CREATE NONCLUSTERED INDEX nc_vwNodes__3
  ON dbo.vwNodes(IsLeafNode, nodeDepth);
CREATE NONCLUSTERED INDEX nc_vwNodes__4
  ON dbo.vwNodes(IsLeafNode, nodeId) INCLUDE (parentId,NodePath,NodeDepth);
CREATE NONCLUSTERED INDEX nc_vwNodes__5
  ON dbo.vwNodes(nodeDepth) INCLUDE(IsLeafNode);

Now we have clustered index (by way of an indexed view) as well as some nonclustered indexes which can be used when traversing our hierarchy.

Basic Retrieval

With our indexed view in place we can return the denormalized hierarchy structure from figure 2 (above) except with our hierarchy is stored nice and neatly in a single clustered index.

--==== 5. Basic SELECT against our indexed view
SELECT   h.*
FROM     dbo.vwNodes AS h WITH (NOEXPAND)
ORDER BY h.currentNode;

I included that NOEXPAND hint, this forces the optimizer to the clustered index or one it’s nonclustered indexes created on our indexed view afterwards. I generally don’t suggest query hints but I’ve had nothing but success with this technique. That said – make sure to do your on testing.

Execution Plan
Figure 4: Basic indexed hierarchy execution plan

It doesn’t get cleaner or more efficient than that.

Basic Hierarchy Analysis

Our indexed hierarchy can be used to join to other tables to perform aggregations and other analysis lets start with few basic aggregations against our indexed view. We’ll keep it simple for this article and run some basic aggregations against our indexed hierarchy without joining to anything else. Note the simplicity of these queries compared to the complexity of what is being returned. These queries below will return:

  1. A count of all descendants for each node with children
  2. A count of members by level
--==== 6.1. Count Descendants for each node with children
SELECT
  HLevel      = ISNULL(h.NodeDepth,0),
  CurrentNode = ISNULL(currentNode,'Total'),
  Descendents = COUNT(*)
FROM     dbo.vwNodes AS h WITH (NOEXPAND)
WHERE    h.isLeafNode = 0
GROUP BY h.NodeDepth, h.currentNode;

--==== 6.2. Count members by level 
SELECT
  HLevel   = ISNULL(n.nodeDepth,'Total'),
  Members  = COUNT(*)
FROM     dbo.vwNodes AS n WITH (NOEXPAND)
GROUP BY n.nodeDepth WITH ROLLUP;
Results
Figure 5: Left to right: results for Queries #1, #2 and #3

Again, virtually readless, no complex queries or intermediate data structures to maintain. Only a GROUP BY clause and you’re off to the races.

Execution Plans
Figure 6: Execution plan when leveraging an Indexed Hierarchy for traversal.

Conclusion

A physical tally table doesn’t just allow you to replace loops in your functions and stored procedures, nor should indexed views be limited to pre-aggregations and such. The topic of tally tables and indexed views is one that deserves much more attention.

Again, to keep things simple I restricted my examples to analysis of our hierarchy without joining to anything else. We’ll expand on this concept further in the next installment of this series. Thanks for reading!

Virtual Indexing Part 8: The Indexed Splitter Function

The fastest way to split a string is to have already done so.

Introduction

Happy 2021! Indexed views and tally tables – a virtually unexplored topic, one I will write about at length this year. This will be a quick read and there is likely to be at least a couple typos; time is short so today is about quality, not quantity. In Part 3: The Indexed Function I introduced you to I said “the fastest way to perform a calculation is to have already done so;” in this installment I will demonstrate that the fastest way to split a string is also to have already done so.

When I introduced the Indexed function I used prime numbers as my example. 7 is always a prime number, Factorial(5) always equals 120. Creating an Indexed prime numbers, factorial or any function where a function always returns the same output for a given argument. What if we wanted a more complex indexed function, such as one that splits a string?

I know about database normalization but I am not responsible for designing most of the databases I work with. If you’ve been in this racket for a couple fortnights you have run across that vender DB written by great application developers with terrible back-end databases. It’s not uncommon to see fields that contain a delimited list – ‘1,4,13,99’ instead of a single atomic value. You can write an ETL package that leverages a good string splitter (such as DelimitedSplit8K or DelimitedSplit8k_Lead) to regularly split and copy the values into a correctly normalized table. In this article I will show you a much faster, maintenance-free alternative which leverages a Numbers Table and an indexed view.

Introducing The “Indexed Splitter”

For this you will need a tally table with at least as many rows as the longest string you intend to split. 8,000 is enough unless you need to split string longer than 8K. My tally table (dbo.tally) has 1,000,000 rows. First the sample data.

Sample data

CREATE TABLE dbo.Test (ID INT, Column1 VARCHAR(20));
INSERT INTO  dbo.Test (ID,Column1) 
  VALUES (1,'ORM;PS;SUP'),(2,'ABC;XYZ;999;123');

Next we’re going to borrow Jeff Moden’s DelimitedSplit8K logic to create a set-based splitter that leverages permanent tally table (e.g. an regular ol properly indexed numbers table named dbo.tally. There’s a small performance hit compared to delimitedSplit8K but the upside makes it more than worth it, as you will see in a moment. Here’s our splitter as a view

The Splitter as a View

CREATE OR ALTER VIEW dbo.TestSplit WITH SCHEMABINDING AS 
SELECT 
  Id   = t.ID, 
  item = 
    SUBSTRING
    (
      t.Column1,
      tt.N+SIGN(tt.N-1),
      ISNULL(NULLIF((CHARINDEX(';',t.Column1,tt.N+1)),0),LEN(t.Column1)+1)-(tt.N)-SIGN(tt.N-1)
    ),
  ItemIndex = tt.N+1
FROM       dbo.Test  AS t
CROSS JOIN dbo.tally AS tt
WHERE      tt.N <= LEN(t.Column1)
AND        (tt.N = 1 OR SUBSTRING(t.column1,tt.N,1) = ';');
GO

The values are split and returned with their associated ID. This logic is going to be applied to an index view which means we can’t include parameters; the delimiter and source of the string need to be embedded in our logic. Above we are splitting the contents of dbo.test.Column1 using a semicolon as the delimiter.

SELECT f.Id, F.Item FROM dbo.TestSplit AS f;

Below is exactly what we expected.

Function Output

Result Set

Next the execution plan.

Splitter View Execution Plan

That’s a good plan: set-based, linear and fast. There are faster splitters out there but none that work without the APPLY table operator. APPLY is not allowed in Indexed Views. Subqueries either. My splitter logic doesn’t use APPLY , subqueries or any other logic that prevents me from adding an index to my view so let’s see if we can add a unique clustered index and turn this view into an indexed Split where the split happens ahead of time and only once (unless the value is modified or deleted).

CREATE UNIQUE CLUSTERED INDEX uq_cl__testSplit ON dbo.TestSplit(Id,Item)

No error. Let’s run the same SELECT query from earlier and check the execution plan.

The Indexed Split Execution plan

It doesn’t get cleaner or faster than that. This is a powerful example of the what you can accomplish when using a physical numbers table to build an Indexed view. This barely scratches the surface as I hope to demonstrate in the near future.

Make my index a double

In 2020 “make it a double” became:

2o2o got me like ^^^

I digress.

Now that your indexed function is in place you can expand it’s power by adding more indexes! I’ll have my clustered index with a nonclustered chaser. Here we’ll create an index for item.

CREATE NONCLUSTERED INDEX nc__testSplit__item ON dbo.TestSplit(Item);

Now a query that filters on item.

SELECT f.* 
FROM   dbo.TestSplit AS f WITH (NOEXPAND)
WHERE  f.item <> '999,123';

To my chagrin, the optimizer often ignores nonclustered indexes on indexed views which is the case here. This is why I’m using NOEXPAND. Nonetheless, the execution plan is even better than before: an indexed seek against a narrower and lighter nonclustered index.

Execution plan leveraging a second, nonclustered index

The Indexed Splitter Function

Let’s add our RowNumber column and wrap the guy in an iTVF for a Nasty fast index function that splits a string ahead of time and maintains itself as needed! We’ll include the option to exclude one or more values using an @exclude parameter.

The Function

CREATE FUNCTION dbo.itvf_TestSplit
(
  @exclude VARCHAR(20)
)
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT f.ID, f.Item, f.ItemIndex
FROM   dbo.TestSplit AS f WITH (NOEXPAND)
WHERE  f.Item <> @exclude;
GO

Let’s give’r a spin and check the execution plan.

Execution Plan

This image has an empty alt attribute; its file name is image-7.png

Same as before but now we are accepting parameters which means we now have an indexed function that can split strings ahead of time. Not even STRING_SPLIT or a CLR can compete here.

What about an “Item Number”?

Sometimes you need the item’s ordinal in the string; for this we need an item number (labeled as “ItemNumber” in my function logic. To return an item number (without a sort in the execution plan) some assembly will be required. DelimitedSplit8K and DelimitedSplit8K_LEAD both use ROW_NUMBER() to produce the ItemNumber column. ROW_NUMBER is not allowed in indexed views which is why I included the “ItemIndex” . It represents the item’s position in the string, something I normally don’t need. Here it’s perfect however – it allows me to add an ItemNumber while still avoiding a sort in my final plan. Not a big deal with strings 8K or less, but it is for longer strings (something an indexed function can handle at no extra cost.) First the new index:

EATE NONCLUSTERED INDEX nc_poc__testSplit__ItemIndex ON dbo.TestSplit(Id, ItemIndex);

With that index in place let’s run a query which includes the item number.

SELECT 
  Id= split.Id,
  ItemNumber = ROW_NUMBER() OVER (PARTITION BY split.ID ORDER BY split.ItemIndex),
  split.Item
FROM dbo.itvf_TestSplit('999,123') AS split;

Output

Result set that includes ItemNumber

Execution plan

easy peasy lemon-squeezy

Still a fantastic execution plan the same low cost as before with more complexity. Again, the calculations happened already, this function is leveraging one or more indexes to retrieve, not compute, the required value.

Conclusion

The Indexed Split is another game changer yet barely scratches the surface of the power of the Virtual Index and the Indexed Function. When you add a physical numbers tally to you Indexed views they can do more than pre-join or pre-aggregate values. Here we introduced the world’s first “indexed split” and we’re still just getting started! Tomorrow I’m going to introduce the Indexed Hierarchy. Thanks for reading!