Virtual Indexing Part 9: The Indexed Hierarchy

Turbo-charge your SQL hierarchy with a numbers table.

Introduction

Did you know you can add indexes to your parent-child hierarchy in SQL Server? As tricky as this sounds, it's actually quite easy with a physical tally (numbers) table and an enumerated sort path (AKA "materialized path"). The technique in this article makes it possible to traverse a parent-child hierarchy table with blazingly fast performance.

In Part 3 of this series you were introduced to the Indexed Function. In Part 8 we created an indexed splitter — an indexed view that splits a delimited string once, then stores the "items" in a clustered index for fast retrieval. Building on the indexed-split concept, let's create an Indexed Hierarchy: a parent-child hierarchy stored in your SQL database. All upstream and downstream relationships in your hierarchy can be easily traversed without recursion, self-joins, XML or intermediate aggregation tables that need to be maintained. To pull this off we'll create an indexed view that leverages a tally table and an enumerated sort path.

What I love most about this approach is that it's not limited to SQL Server; it can be applied to any RDBMS or database system that supports indexed (materialized) views or something similar, which covers most major platforms.

Setup and Sample Data

First for the set-up DDL and sample data. The code below builds (1) a tally (numbers) table, (2) dbo.nodes, a table to store a parent-child relationship — nodeId and parentID, (3) an enumerated path, and (4) sample values for dbo.nodes.

--==== 1.1. Tally table code
IF 1=2 -- remove this if you need a tally table
BEGIN
  IF OBJECT_ID('dbo.tally') IS NOT NULL DROP TABLE dbo.tally;
  CREATE TABLE dbo.tally (N INT NOT NULL);
  
  --==== Add Primary Key and Clustered Index
  ALTER TABLE dbo.tally 
    ADD CONSTRAINT pk_cl_tally PRIMARY KEY CLUSTERED(N) 
      WITH FILLFACTOR=100;
  
  --==== Add a Unique Index; the optimizer picks this, it requires less IO
  ALTER TABLE dbo.tally 
    ADD CONSTRAINT uq_tally UNIQUE NONCLUSTERED(N);
  
  --==== Populate with at least 1 Million Rows
  INSERT dbo.tally(N) SELECT t.N FROM dbo.fnTally(1,1000000) AS t;
END
GO
--==== 1.2. Test Data DDL to build the hierarchy
CREATE TABLE dbo.nodes
(
  ParentId INT NULL,
  NodeId   INT NOT NULL,
  NodePath VARCHAR(200) NOT NULL DEFAULT(''),
  CONSTRAINT pk_nodes__nodeid PRIMARY KEY CLUSTERED(nodeId),
  CONSTRAINT fk_nodes__parentId FOREIGN KEY (parentid) REFERENCES dbo.nodes(nodeId)
);
;--==== 1.3. Add Hierarchy values
INSERT dbo.nodes (ParentId,NodeId)
VALUES (NULL,1),(1,2),(1,3),(1,4),(1,5),(2,6),(2,7),(2,8),(2,9),(2,10),(3,20),(3,25),
  (3,30),(4,100),(4,101),(101,1000),(101,1012),(101,1014),(1014,2000);

Next, an inline table-valued function (iTVF) that uses a recursive CTE to build our enumerated path; we'll also use it for maintenance.

;--==== 1.4. Create Function to build the node path; allows for individual updates
CREATE OR ALTER FUNCTION dbo.fnBuildNodePath
(
  @nodeId INT
)
RETURNS TABLE WITH SCHEMABINDING AS RETURN
WITH BuildNodePath AS 
(
  SELECT n.ParentId, n.NodeId, N=1
  FROM   dbo.Nodes AS n 
  WHERE  n.NodeId = @nodeId
  UNION ALL
  SELECT v.ParentId, v.NodeId, a.N-1
  FROM   dbo.Nodes     AS v
  JOIN   BuildNodePath AS a
    ON a.ParentId = v.NodeId
)
SELECT 
  NodeId   = @NodeId, 
  NodePath = STUFF((
               SELECT   CONCAT('.',p.NodeId)
               FROM     BuildNodePath AS p
               ORDER BY p.N
               FOR XML PATH('')),1,1,'');

dbo.fnBuildNodePath allows us to return the enumerated path for each NodeId. Note the query below and results.

--==== 1.5. Build the enumerated sort paths
SELECT      n.NodeId, b.NodePath
FROM        dbo.nodes                     AS n
CROSS APPLY dbo.fnBuildNodePath(n.NodeId) AS b;
Results
Figure 1: dbo.fnBuildNodePath query results

We can use dbo.fnBuildNodePath to add the enumerated path to dbo.nodes, which is required for our indexed hierarchy; it can also be used to rebuild the path for individual records as needed, as shown below.
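For example, one way to persist the paths for every row (a minimal sketch using UPDATE...FROM with CROSS APPLY; the original setup script may differ):

--==== 1.6. Persist the enumerated path for every node (minimal sketch)
UPDATE n
SET    n.NodePath = b.NodePath
FROM        dbo.nodes                     AS n
CROSS APPLY dbo.fnBuildNodePath(n.NodeId) AS b;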

The Indexed Hierarchy

Before we create our indexed view let's review the underlying logic. Using dbo.fnTally we can split the enumerated path and return each node along with the downstream members of its node tree.

Indexed Hierarchy Logic
;--==== 2. Index parsing logic using fnTally
SELECT
  n.NodeId,
  n.ParentId,
  n.NodePath,
  CurrentNode = Nd.C,
  IsLeafNode  = IIF(n.nodePath=p.Nd,1,0), -- leaf or descendant
  NodeDepth   = LEN(Nd.C) - LEN(REPLACE(Nd.C,'.',''))+1
FROM        dbo.nodes AS n
CROSS APPLY (VALUES(LEN(n.NodePath)+1))             AS np(Ln)
CROSS APPLY dbo.fnTally(1,np.Ln)                    AS t
CROSS APPLY (VALUES(SUBSTRING(n.NodePath,1,t.N-1))) AS p(Nd)
CROSS APPLY (VALUES(ISNULL(p.Nd,0)))                AS Nd(C)
CROSS APPLY (VALUES(IIF(n.nodePath=p.Nd,1,0)))      AS l(LNd)
WHERE       (SUBSTRING(n.NodePath,t.N,1)='.' OR t.N=np.Ln)
--ORDER BY Nd.C, -IIF(n.nodePath=p.Nd,1,0); -- for presentation
The denormalized result set

This query splits each node's path to return a row for each node tree the node is a member of; grouped by CurrentNode, that yields every node together with all of its descendants.

Figure 2: Denormalized subsets of all node trees in dbo.nodes

CurrentNode represents the top of each node tree; all records where IsLeafNode = 0 are downstream members of CurrentNode's node tree. This design makes it easy to group, filter and analyze hierarchical data while enjoying big performance gains. Reviewing the execution plan, note the absence of a sort operator, thanks to the presence of a suitably sorted stream (AKA virtual index).

Figure 3: Indexed hierarchy logic using fnTally

This will not be the final execution plan, but it is the plan you get for each row you modify in dbo.nodes. The tally table is never modified, so data modifications are very quick.
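For example, here is a minimal sketch of a typical modification (the new NodeId 3000 is a hypothetical value): insert a new leaf node, then use dbo.fnBuildNodePath to set its path.

--==== Minimal sketch: add a new leaf node (hypothetical NodeId 3000) and build its path
INSERT dbo.nodes (ParentId, NodeId) VALUES (2000, 3000);

UPDATE n
SET    n.NodePath = b.NodePath
FROM        dbo.nodes                     AS n
CROSS APPLY dbo.fnBuildNodePath(n.NodeId) AS b
WHERE       n.NodeId = 3000;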

Indexed Hierarchy DDL

With SQL Server indexed views we can't reference table-valued functions nor leverage APPLY. To circumvent these limitations we'll replace fnTally with a physical tally table, dbo.tally (DDL at the beginning of this article).

;--==== 3. Use the tally table logic to create an indexed view
CREATE VIEW dbo.vwNodes WITH SCHEMABINDING AS
SELECT
  n.ParentId,
  n.NodeId,
  n.NodePath,
  CurrentNode = ISNULL(SUBSTRING(n.NodePath, 1, t.N-1), 0),
  IsLeafNode  = CASE n.nodePath WHEN SUBSTRING(n.NodePath, 1, t.N-1) THEN 1 ELSE 0 END, -- leaf or ancestor
  NodeDepth   = LEN(ISNULL(SUBSTRING(n.NodePath, 1, t.N-1), 0)) -
                LEN(REPLACE(ISNULL(SUBSTRING(n.NodePath, 1, t.N-1), 0),'.',''))+1
FROM        dbo.nodes AS n
CROSS JOIN  dbo.tally AS t
WHERE       t.N BETWEEN 1 AND LEN(n.NodePath)+1 -- A node level Edge N-Gram "tokenization"
  AND       (SUBSTRING(n.NodePath,t.N,1)='.' OR t.N=LEN(n.NodePath)+1);

For all intents and purposes this logic, as is, will produce the same execution plan as the earlier fnTally logic and will perform about the same. Let's supercharge our view with some indexes.

--==== 4.1. Unique Clustered index to make it an indexed view
CREATE UNIQUE CLUSTERED INDEX uq_cl_vwNodes__currentNode
  ON dbo.vwNodes(currentNode, NodeId);

--==== 4.2. Nonclustered Indexes
CREATE NONCLUSTERED INDEX nc_vwNodes__1
  ON dbo.vwNodes(NodePath, NodeId, isLeafNode);
CREATE NONCLUSTERED INDEX nc_vwNodes__2
  ON dbo.vwNodes(NodeId, isLeafNode, nodeDepth) INCLUDE (NodePath,ParentId);
CREATE NONCLUSTERED INDEX nc_vwNodes__3
  ON dbo.vwNodes(IsLeafNode, nodeDepth);
CREATE NONCLUSTERED INDEX nc_vwNodes__4
  ON dbo.vwNodes(IsLeafNode, nodeId) INCLUDE (parentId,NodePath,NodeDepth);
CREATE NONCLUSTERED INDEX nc_vwNodes__5
  ON dbo.vwNodes(nodeDepth) INCLUDE(IsLeafNode);

Now we have a clustered index (by way of an indexed view) as well as some nonclustered indexes which can be used when traversing our hierarchy.

Basic Retrieval

With our indexed view in place we can return the denormalized hierarchy structure from Figure 2 (above), except now our hierarchy is stored nice and neatly in a single clustered index.

--==== 5. Basic SELECT against our indexed view
SELECT   h.*
FROM     dbo.vwNodes AS h WITH (NOEXPAND)
ORDER BY h.currentNode;

I included the NOEXPAND hint; this forces the optimizer to use the indexed view's clustered index or one of the nonclustered indexes created on it afterwards. I generally don't suggest query hints but I've had nothing but success with this technique. That said – make sure to do your own testing.

Execution Plan
Figure 4: Basic indexed hierarchy execution plan

It doesn’t get cleaner or more efficient than that.
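To give you a feel for traversal, here is a minimal sketch (assuming the NodePath values from the setup section have been persisted to dbo.nodes) that returns every downstream member of a given node without recursion or self-joins:

--==== Minimal sketch: all descendants of a given node
DECLARE @NodeId INT = 2;

SELECT   h.NodeId, h.ParentId, h.NodePath, h.NodeDepth
FROM     dbo.nodes   AS n                    -- look up the node's own path...
JOIN     dbo.vwNodes AS h WITH (NOEXPAND)    -- ...then seek into its node tree
  ON     h.CurrentNode = n.NodePath
WHERE    n.NodeId     = @NodeId
  AND    h.IsLeafNode = 0                    -- exclude the node itself
ORDER BY h.NodeDepth, h.NodeId;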

Basic Hierarchy Analysis

Our indexed hierarchy can be joined to other tables to perform aggregations and other analysis. We'll keep it simple for this article and run some basic aggregations against our indexed hierarchy without joining to anything else. Note the simplicity of these queries compared to the complexity of what is being returned. The queries below will return:

  1. A count of all descendants for each node with children
  2. A count of members by level
--==== 6.1. Count Descendants for each node with children
SELECT
  HLevel      = ISNULL(h.NodeDepth,0),
  CurrentNode = ISNULL(currentNode,'Total'),
  Descendents = COUNT(*)
FROM     dbo.vwNodes AS h WITH (NOEXPAND)
WHERE    h.isLeafNode = 0
GROUP BY h.NodeDepth, h.currentNode;

--==== 6.2. Count members by level 
SELECT
  HLevel   = ISNULL(CONVERT(VARCHAR(12), n.nodeDepth),'Total'),
  Members  = COUNT(*)
FROM     dbo.vwNodes AS n WITH (NOEXPAND)
GROUP BY n.nodeDepth WITH ROLLUP;
Results
Figure 5: Left to right: results for queries #1 and #2

Again, virtually no reads, no complex queries or intermediate data structures to maintain. Only a GROUP BY clause and you're off to the races.

Execution Plans
Figure 6: Execution plan when leveraging an Indexed Hierarchy for traversal.

Conclusion

A physical tally table doesn’t just allow you to replace loops in your functions and stored procedures, nor should indexed views be limited to pre-aggregations and such. The topic of tally tables and indexed views is one that deserves much more attention.

Again, to keep things simple I restricted my examples to analysis of our hierarchy without joining to anything else. We’ll expand on this concept further in the next installment of this series. Thanks for reading!

Virtual Indexing Part 8: The Indexed Splitter Function

The fastest way to split a string is to have already done so.

Introduction

Happy 2021! Indexed views and tally tables – a virtually unexplored topic, one I will write about at length this year. This will be a quick read and there are likely to be at least a couple of typos; time is short so today is about quantity, not quality. In Part 3: The Indexed Function I said, "the fastest way to perform a calculation is to have already done so;" in this installment I will demonstrate that the fastest way to split a string is also to have already done so.

When I introduced the Indexed Function I used prime numbers as my example. 7 is always a prime number, Factorial(5) always equals 120. An indexed prime-number function, factorial function, or any function that always returns the same output for a given argument is easy to create. What if we wanted a more complex indexed function, such as one that splits a string?

I know about database normalization but I am not responsible for designing most of the databases I work with. If you've been in this racket for a couple of fortnights you have run across that vendor DB written by great application developers with terrible back-end databases. It's not uncommon to see fields that contain a delimited list – '1,4,13,99' instead of a single atomic value. You can write an ETL package that leverages a good string splitter (such as DelimitedSplit8K or DelimitedSplit8K_LEAD) to regularly split and copy the values into a correctly normalized table. In this article I will show you a much faster, maintenance-free alternative which leverages a numbers table and an indexed view.

Introducing The “Indexed Splitter”

For this you will need a tally table with at least as many rows as the longest string you intend to split. 8,000 is enough unless you need to split strings longer than 8K. My tally table (dbo.tally) has 1,000,000 rows. First the sample data.

Sample data

CREATE TABLE dbo.Test (ID INT, Column1 VARCHAR(20));
INSERT INTO  dbo.Test (ID,Column1) 
  VALUES (1,'ORM;PS;SUP'),(2,'ABC;XYZ;999;123');

Next we're going to borrow Jeff Moden's DelimitedSplit8K logic to create a set-based splitter that leverages a permanent tally table (a regular ol' properly indexed numbers table named dbo.tally). There's a small performance hit compared to DelimitedSplit8K but the upside makes it more than worth it, as you will see in a moment. Here's our splitter as a view.

The Splitter as a View

CREATE OR ALTER VIEW dbo.TestSplit WITH SCHEMABINDING AS 
SELECT 
  Id   = t.ID, 
  item = 
    SUBSTRING
    (
      t.Column1,
      tt.N+SIGN(tt.N-1),
      ISNULL(NULLIF((CHARINDEX(';',t.Column1,tt.N+1)),0),LEN(t.Column1)+1)-(tt.N)-SIGN(tt.N-1)
    ),
  ItemIndex = tt.N+1
FROM       dbo.Test  AS t
CROSS JOIN dbo.tally AS tt
WHERE      tt.N <= LEN(t.Column1)
AND        (tt.N = 1 OR SUBSTRING(t.column1,tt.N,1) = ';');
GO

The values are split and returned with their associated ID. This logic is going to be applied to an indexed view, which means we can't include parameters; the delimiter and source of the string need to be embedded in our logic. Above we are splitting the contents of dbo.Test.Column1 using a semicolon as the delimiter.

SELECT f.Id, F.Item FROM dbo.TestSplit AS f;

Below is exactly what we expected.

Function Output

Result Set

Next the execution plan.

Splitter View Execution Plan

That's a good plan: set-based, linear and fast. There are faster splitters out there but none that work without the APPLY table operator. APPLY is not allowed in indexed views; neither are subqueries. My splitter logic doesn't use APPLY, subqueries or any other construct that prevents me from adding an index to my view, so let's see if we can add a unique clustered index and turn this view into an indexed split, where the split happens ahead of time and only once (unless the value is modified or deleted).

CREATE UNIQUE CLUSTERED INDEX uq_cl__testSplit ON dbo.TestSplit(Id,Item)

No error. Let’s run the same SELECT query from earlier and check the execution plan.

The Indexed Split Execution plan

It doesn't get cleaner or faster than that. This is a powerful example of what you can accomplish when using a physical numbers table to build an indexed view. This barely scratches the surface as I hope to demonstrate in the near future.

Make my index a double

In 2020 “make it a double” became:

2o2o got me like ^^^

I digress.

Now that your indexed function is in place you can expand its power by adding more indexes! I'll have my clustered index with a nonclustered chaser. Here we'll create an index on Item.

CREATE NONCLUSTERED INDEX nc__testSplit__item ON dbo.TestSplit(Item);

Now a query that filters on item.

SELECT f.* 
FROM   dbo.TestSplit AS f WITH (NOEXPAND)
WHERE  f.item <> '999,123';

To my chagrin, the optimizer often ignores nonclustered indexes on indexed views, which is the case here; this is why I'm using NOEXPAND. Nonetheless, the execution plan is even better than before: an index seek against a narrower and lighter nonclustered index.

Execution plan leveraging a second, nonclustered index

The Indexed Splitter Function

Let's wrap the view in an iTVF for a nasty fast indexed function that splits a string ahead of time and maintains itself as needed! We'll include the option to exclude a value using an @exclude parameter (we'll tackle the item number a little later).

The Function

CREATE FUNCTION dbo.itvf_TestSplit
(
  @exclude VARCHAR(20)
)
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT f.ID, f.Item, f.ItemIndex
FROM   dbo.TestSplit AS f WITH (NOEXPAND)
WHERE  f.Item <> @exclude;
GO

Let’s give’r a spin and check the execution plan.
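The spin is presumably just a straight SELECT from the new function (the excluded value here is arbitrary):

SELECT f.ID, f.Item, f.ItemIndex
FROM   dbo.itvf_TestSplit('999,123') AS f;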

Execution Plan


Same plan as before, but now we are accepting a parameter, which means we have an indexed function that splits strings ahead of time. Not even STRING_SPLIT or a CLR splitter can compete here.

What about an “Item Number”?

Sometimes you need the item's ordinal position in the string; for this we need an item number (labeled "ItemNumber" in the query below). To return an item number without a sort in the execution plan, some assembly is required. DelimitedSplit8K and DelimitedSplit8K_LEAD both use ROW_NUMBER() to produce the ItemNumber column, but ROW_NUMBER is not allowed in indexed views, which is why I included "ItemIndex". It represents the item's character position in the string, something I normally don't need. Here it's perfect however – it allows me to add an ItemNumber while still avoiding a sort in my final plan. Not a big deal with strings 8K or less, but it is for longer strings (something an indexed function can handle at no extra cost). First the new index:

CREATE NONCLUSTERED INDEX nc_poc__testSplit__ItemIndex ON dbo.TestSplit(Id, ItemIndex);

With that index in place let’s run a query which includes the item number.

SELECT 
  Id= split.Id,
  ItemNumber = ROW_NUMBER() OVER (PARTITION BY split.ID ORDER BY split.ItemIndex),
  split.Item
FROM dbo.itvf_TestSplit('999,123') AS split;

Output

Result set that includes ItemNumber

Execution plan

easy peasy lemon-squeezy

Still a fantastic execution plan: the same low cost as before despite the added complexity. Again, the calculations already happened; this function is leveraging one or more indexes to retrieve, not compute, the required values.

Conclusion

The indexed split is another game changer yet barely scratches the surface of the power of the virtual index and the indexed function. When you add a physical tally table to your indexed views they can do more than pre-join or pre-aggregate values. Here we introduced the world's first "indexed split" and we're still just getting started! Tomorrow I'm going to introduce the Indexed Hierarchy. Thanks for reading!

Virtual Indexing Part 7: Finite Opposite Numbers

Virtual Indexing continues… How do we avoid sorting in either direction (ascending and descending)?

Introduction

This article is special for a few reasons, the first being that it's about an idea I came up with in second grade and have built on since. This is arguably my favorite topic with respect to virtual indexing.

My “Opposite Numbers”

First for a few subtraction problems solved the way we did it in the 80’s:

Figure 1.1. Subtraction Series #1

For me, these problems were very easy; each only requires one or two simple calculations. For the first problem (13-1) just subtract 1 from 3. For the next we subtract 4 from 5, then 1 from 2 to get 11. I had subtraction down pat when subtracting smaller numbers from larger ones. This next series would be more of a struggle:

Figure 1.2. Subtraction Series #2

The first calculation for each problem involves subtracting a one-digit number from a two-digit number. For 37-19 the first step is to subtract 9 from 17; for #2, the first step is to subtract 6 from 13; for the third it's 5 from 14. For whatever reason I struggled with this, so I came up with opposite numbers – not the ones you know, but my own. Though I would not have described it this way then, the first step was to assign a relation between each number 1 through 9 (N) and its "opposite"; it was a little auxiliary table in my brain. The opposite of 1 is 9 (O(1)=9), the opposite of 2 is 8 (O(2)=8), 3 is 7, 4 is 6, and 5 is 5. Note the table below:

Figure 2. Set of numbers (N) and its "opposite"

Looking back at the problems in Figure 1.2, it was easier for me to first subtract 7 from 9 to get 2; the opposite of 2 is 8. Then, each time I invoke my mental "opposite number" function I subtract 1 from the top number. This was easier for me than remembering that 17-9=8. For the second problem (13-6) I calculate 6-3, then take the opposite, which is 7. This can be expressed as O(6 - 3) = 7.

This made it possible for me to do subtraction much faster and is a mental technique I use to this day. 

Their Opposite Numbers

In 5th grade Pre Algebra we learned about Opposite Numbers (also known as additive inverse numbers.) Here the opposite of 5 is -5, the opposite of -20 is 20. Though okay with the concept, I was upset that someone took “Opposite Numbers” from me. Especially for something so trivial. In my head there were two types of opposite numbers: Mine and Theirs. 

Their system of “opposite numbers” was important for understanding equations such as: (-(-9))-(5-(+5))-(-1). Then later for equations like: (-(-@A))-(@B-(+@B))-(-@C).  In the world of relational algebra, expressed using T-SQL we have: 

DECLARE @A INT = 9, @B INT = 5, @C INT = 1;
SELECT (-(-@A))-(@B-(+@B))-(-@C); 

Countable and Uncountable Infinity

This article could survive without this section but it's relevant enough, and such a cool topic, that I decided to slip it in. Did you know that there are different kinds of infinity? For the purposes of this article let's look at countable infinity. A set of integers is countably infinite when its elements can be put in one-to-one correspondence with the set of natural numbers; in other words, one can count off all elements in the set in such a way that, even though the counting may take forever, any particular element is reached after a finite number of steps.

For example, this series is countably infinite because you can always add another number:
  1,2,3,4,5,6,7

This sequence is as well:
  0, -3, -6, -9, -12

Uncountable infinity looks more like:
  0.1, 0.2, 0.3… 0.9, 0.11, 0.12…. 0.99999991, 0.99999992

Uncountable infinity demonstrates that there is an infinite amount of numbers between 0 and 1. I only included this so you can better understand what Countable Infinity is. Moving on…

Introducing Finite Opposite Numbers

"My opposite numbers" and "their opposite numbers" work in my head but, to pass the idea I'm about to cover on to others, I had to come up with better nomenclature. What do my opposite numbers have in common with their opposite numbers? Each number in the set has an "opposite": a value which is an equal distance away from an arbitrary center point, AKA the median. I say arbitrary because: who made 0 the official median value? Any number can represent a median, right? 0 works for traditional opposite numbers because the set is infinite; without upper or lower bounds the numbers are countably infinite in two directions beginning with 0.

Figure 3.1. Infinite Sequence traveling away from 0

This infinite set of numbers with zero as the median is what I have coined Infinite Opposite Numbers. For a finite set with both an upper and a lower bound, I came up with Finite Opposite Numbers. Let's take the numbers 1 through 11. Here the median is 6; O(5) = 7, O(4) = 8, O(3) = 9, etc.

Figure X. Infinite Sequence traveling away from 6

Assigning upper and lower bounds moves the median value.

Figure 3.2. How the median shifts

In the example above we have an odd number of rows; what happens when the row count is even? Consider the difference between 1-7 and 1-6:

Figure 3.3. Infinite Sequence traveling away from 0

With the first example, 4 is the median; what is the median for the numbers 1-6? "3.5" is not a valid integer. For even row counts we'll consider the two numbers closest to the middle (3.5 in this example): the median would be the tuple containing the numbers 3 & 4. To handle both odd and even sets I came up with this simple formula to calculate the opposite number: O(N) = L + H - N, where L and H are the lower and upper bounds.
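A quick sanity check of the formula (my arithmetic, using the sets above): for the set 1 through 11, O(4) = 1 + 11 - 4 = 8 and O(3) = 1 + 11 - 3 = 9, matching the opposites listed earlier; for the even set 1 through 6, O(3) = 1 + 6 - 3 = 4 and O(4) = 3, so the two middle numbers, 3 and 4, are each other's opposites.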

The Finite Opposites Function

It may seem unnecessary to create a function for this calculation, but I use it so that others know what I'm doing, especially considering that this is a new concept. As a T-SQL function we have:

Figure 4: Finite Opposite Number Logic

CREATE FUNCTION [dbo].[O] (@L INT,@H INT,@N INT)
RETURNS TABLE WITH SCHEMABINDING AS RETURN SELECT Op = @L+@H-@N;

Here’s the entire function with comments.

Basic Use

In each of these examples we build a set of numbers from L to H (using fnTally or GetNums), and the function returns the finite opposite for each N in the set.

Figure 5. Basic usage examples

DECLARE @L INT = 1, @H INT = 5;

--==== 1. Run dbo.fnTally
SELECT 
  N  = f.N,
  OP = N.Op
FROM        dbo.fnTally(@L,@H) AS f
CROSS APPLY dbo.o(@L,@H,f.N) AS N;

--==== 2. Set new values for @L and @H
SELECT @L = -2, @H = 2;

--==== Run dbo.GetNums
SELECT 
  N  = f.N,
  OP = N.Op
FROM        dbo.GetNums(-2,2) AS f
CROSS APPLY dbo.o(@L,@H,f.N) AS N;

Figure 6. Basic example results

Understanding the median tuple (1 or 2 members)

Below is an example of how the median value(s) are assigned.

Figure 7.1. Calculating the median

--==== 1. Even row count
DECLARE @L INT = 1, @H INT = 6;

SELECT 
  N  = f.N,  -- N
  OP = N.Op, -- O(N)
  DM = ABS(f.N-N.OP)/2 -- Distance from the median
FROM        dbo.fnTally(@L,@H) AS f
CROSS APPLY dbo.o(@L,@H,f.N) AS N;
GO
--==== 2. Odd row count
DECLARE @L INT = 1, @H INT = 7;

SELECT 
  N  = f.N,  -- N
  OP = N.Op, -- O(N)
  DM = ABS(f.N-N.OP)/2 -- Distance from the median
FROM        dbo.fnTally(@L,@H) AS f
CROSS APPLY dbo.o(@L,@H,f.N) AS N;

Figure 7.2. Odd/even row counts

Quick Review

We have now defined two types of "Opposite Numbers":

  1. Infinite opposite numbers (no upper or lower boundaries, 0 is the median)
  2. Finite opposite numbers (both the upper and lower boundaries are defined; the distance from the median is calculated as ABS(N - O(N))/2)

Examples

There are a couple of examples of finite opposites in use here; below are a couple of basic examples.

Example #1: A Home-Grown REVERSE Function

Let's start with a fun string problem and create our own REVERSE function. We begin by tokenizing the string into unigrams. I'll use the old FOR XML PATH trick as well as STRING_AGG for those on 2017+.

Figure 8. Home-grown REVERSE examples

--==== 1. Basic REVERSE Function
DECLARE 
  @String     VARCHAR(8000) = 'ABC-123-XYZ',
  @pattern    VARCHAR(100)  = '%[A-Z]%';

--== 1.1. Using STRING_AGG (SQL 2017+)
SELECT
  Original = @String,
  Reversed = STRING_AGG(SUBSTRING(@string,N.Op,1),'') WITHIN GROUP(ORDER BY f.N)
FROM        dbo.fnTally(1,LEN(@String)) AS f
CROSS APPLY dbo.o(1,LEN(@String),f.N)   AS N;

--== 1.2. Using FOR XML PATH (2005 to 2014)
SELECT
  Original = @String,  
  Reversed = f.String
FROM
(
  SELECT      SUBSTRING(@string,N.Op,1)
  FROM        dbo.fnTally(1,LEN(@String)) AS f
  CROSS APPLY dbo.o(1,LEN(@String),f.N)   AS N
  ORDER BY f.N
  FOR XML PATH('')
) AS f(String);

Each of these works as expected. The performance can't match the built-in REVERSE, but this is an example of how to return data in either direction while leveraging the virtual index to avoid a sort. Now let's say you need a function to return the first N characters that match a pattern, with the option to scan the string in a user-defined direction (forward or backward).

Figure 9. Custom REVERSE Function Examples

--==== 2. Get first/Last @matches number of characters that match @pattern
DECLARE 
  @String     VARCHAR(8000) = 'ABC-123-XYZ',
  @pattern    VARCHAR(100)  = '%[A-Z]%',
  @matches    BIGINT        = 4,
  @descending BIT           = 0; 

SELECT STRING_AGG(f.chars,'')
FROM 
(
  SELECT TOP(@matches) chars = a.Txt
  FROM        dbo.fnTally(1,LEN(@String)) AS f
  CROSS APPLY dbo.o(1,LEN(@String),f.N)   AS N
  CROSS APPLY (VALUES(IIF(@descending=0,
             SUBSTRING(@string,f.N,1),
             SUBSTRING(@string,N.Op,1)))) AS a(Txt) 
  WHERE       a.Txt LIKE @pattern 
) AS f;

Example #2: Sortless Parallel Apply 

I'll wrap up with this as it's an excellent example of finite opposites in use. In Itzik Ben-Gan's Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions, Ben-Gan demonstrates the "Parallel APPLY" trick, a way to improve the parallel plan produced by the optimizer. The code below is the setup code from the book (I misplaced my copy so I can't cite the page, sorry).

Figure 10. Parallel APPLY setup code (from the book above)

BEGIN
  IF OBJECT_ID('dbo.Transactions','U') IS NOT NULL DROP TABLE dbo.Transactions;
  IF OBJECT_ID('dbo.Accounts','U')     IS NOT NULL DROP TABLE dbo.Accounts;
  CREATE TABLE dbo.Accounts
  (
    actid   INT         NOT NULL,
    actname VARCHAR(50) NOT NULL,
    CONSTRAINT PK_Accounts PRIMARY KEY(actid)
  );
  CREATE TABLE dbo.Transactions
  (
    actid  INT   NOT NULL,
    tranid INT   NOT NULL,
    val    MONEY NOT NULL,
    CONSTRAINT PK_Transactions          PRIMARY KEY(actid, tranid),
    CONSTRAINT FK_Transactions_Accounts FOREIGN KEY(actid) REFERENCES dbo.Accounts(actid)
  );
  
  INSERT INTO dbo.Accounts(actid, actname) 
    VALUES(1,'account 1'),(2,'account 2'),(3,'account 3');
  INSERT INTO dbo.Transactions(actid, tranid, val) VALUES
    (1, 3, 5.00),(1, 4, 2.00),(1, 5, 1.00),(1, 6, 3.00),(1, 7,-4.00),(1, 8,-1.00),
    (1, 9,-2.00),(1,10,-3.00),(2, 1, 2.00),(2, 2, 1.00),(2, 3, 5.00),(2, 4, 1.00),
  	(2, 5,-5.00),(2, 6, 4.00),(2, 7, 2.00),(2, 8,-4.00),(2, 9,-5.00),(2,10, 4.00),
    (3, 1,-3.00),(3, 2, 3.00),(3, 3,-2.00),(3, 4, 1.00),(3, 5, 4.00),(3, 6,-1.00),
    (3, 7, 5.00),(3, 8, 3.00),(3, 9, 5.00),(3,10,-3.00),(1, 1, 4.00),(1, 2,-2.00);
  
  CREATE INDEX idx_actid_val_i_tranid
  ON dbo.Transactions(actid /* P */, val /* O */)
  INCLUDE(tranid /* C */);
END;

Let's compare the original Parallel APPLY technique with one that leverages my finite opposites function (dbo.O). For this to work I need the total number of rows per partition as my upper bound. Since I don't have the row count ahead of time, I use COUNT(*) as a window aggregate partitioned by actid.

Figure 11. Parallel APPLY with/without Finite Opposites

PRINT CHAR(10)+('Parallel APPLY')+CHAR(10)+REPLICATE('-',90);
SELECT      
  c.actid,
  fn_pre.tranid,
  fn_pre.val,
  fn_pre.rownumasc,
  fn_pre.rownumdesc
FROM  dbo.Accounts AS c
CROSS APPLY 
(
  SELECT t.tranid, t.val,
    rownumasc  = ROW_NUMBER() OVER(ORDER BY t.val),
    rownumdesc = ROW_NUMBER() OVER(ORDER BY t.val DESC)
  FROM  dbo.Transactions AS t
  WHERE t.actid = c.actid
) AS fn_pre
OPTION (QUERYTRACEON 8649);
GO

--== 1.6.4. Parallel APPLY + Finite Opposites
PRINT CHAR(10)+('Parallel APPLY+ Finite Opposites')+CHAR(10)+REPLICATE('-',90);
SELECT
  actid      = c.actid,
  tranid     = fn_pre.tranid,
  val        = fn_pre.val,
  rownumasc  = fn_pre.rownumasc,
  rownumdesc = n.Op
FROM dbo.Accounts AS c
CROSS APPLY
(
  SELECT t.tranid, t.val, 
    rownumasc = ROW_NUMBER() OVER(ORDER BY t.val),
    rowz      = COUNT(*) OVER (PARTITION BY t.actid ORDER BY (SELECT NULL))
  FROM   dbo.Transactions AS t
  WHERE  T.actid = C.actid
) AS fn_pre
CROSS APPLY dbo.o(1, fn_pre.rowz, fn_pre.rownumasc) AS N -- O(1, rowcount, rownumasc) = rowcount - rownumasc + 1
OPTION (QUERYTRACEON 8649);

Figure 12. Execution Plans

Note the second plan does not require a sort.

I will update this article with some performance test results later this week; I noticed a small test error as I was about to publish this. That said, the sortless version is faster. I promise.

Conclusion

Ordering rows in both ascending and descending order without a sort. Happy 2021!

Virtual Indexing Part 6: Combinatorial Index Optimization

In Part 5 we learned how to return a distinct set of numbers from a one Billion row table in under 1/10th of a second. We can do better!

Disclaimer

I’ve never made it to Part 6 of a technical series. For me, a huge obstacle to getting quality content out in a timely manner is the proof-reading and final editing process. Like a few of my friends in this field, I have many new techniques and programming concepts that I’m dying to share which have been decaying in MS Word Docs and .sql files for years. I started my NTally article in 2014 for example.

In September, while quarantined and after listening to too much David Goggins, I decided to step up my writing (and research) game and simply began publishing the first draft of my work the moment I’m done authoring it… What I’m trying to say is, I’ve decided to err on the side of delivering content. This means there will be typos, now and in the future. Sorry. I feel better getting it done, however, typos and all, than taking my good ideas to the grave. If you are a writer/blogger or are aspiring to do so, I suggest following my lead and just get it done – It feels so much bettr! 😉

Introduction

This series began with the introduction of the virtual index: the suitably sorted stream of numbers returned by T-SQL ROW_NUMBER which acts very much like a physical index with respect to sorting, grouping and filtering. Unlike a physical index, created using DDL, the virtual index is invoked with a SELECT statement against fnTally and goes away when your query is finished. No disk space to consume, no fragmentation to address, nothing to check into your CI/CD system – just a magical, in-memory index (suitably sorted stream) which comes and goes as requested.

In Part 5 we reviewed how to return a distinct set of numbers from a one billion row table in under 1/10th of a second using what I have coined a multidimensional virtual index – that is, combining a virtual index with a real one to speed things up. More technically, we're giving the optimizer a wider range of options than it would have with only one index in place.

So, if adding one virtual index to your query can speed things up, can we speed it up even further by adding more? I mean, they're basically free and can be pulled out of thin air – why not? That's what we're going to review today.

Dimensionality

Seasoned SQL developers, as well as anyone who has written MDX expressions, are familiar with the concept of dimensionality. Using SSAS for example, we can write an MDX expression to return a matrix with two or more dimensions as shown below. The Month dimension is on the x-axis, AKA the columns(0) axis in MDX; the Vehicle dimension is on the y-axis, AKA rows(1). For brevity I included two Month members and three Vehicle members. The numbers can represent anything (e.g. sales, inventory, etc.). An MDX expression can easily return something that looks like this:

Figure 1. Two-dimensional matrix with Vehicle on the x-axis, month(s) on the y-axis.

Two-dimension matrix with a dimension for months and a second dimension for vehicles.

Now let's say we wanted a deeper level of granularity for each vehicle color. We could add a third dimension for color to our expression to return something like this:

Figure 2. Three-dimensional matrix; a 3rd dimension added for color

Here we've added a third dimension for "color" to transform it into a three-dimensional matrix.

What does this have to do with indexing or performance you ask? Let’s apply this same concept, leveraging RangeAB, to perform what I’ve coined, Combinatorial Index Optimization. Think of it as combinatorial optimization for virtual indexes and indexed functions.

Combinatorial Index Optimization

Below is our query from Part 5 of this series, where we reduced the time it took to return a distinct set of numbers from a one-billion-row set from 111 seconds using traditional methods to only 0.07 seconds using a two-dimensional index.

  SELECT fnFilter.N1 --fnFilter.RN-(b.Mn-1)
  FROM  (SELECT MIN(t.N), MAX(t.N) FROM #t AS t) AS b(Mn,Mx)
  CROSS APPLY
  (
    SELECT      r.RN, r.N1
    FROM        core.rangeAB(b.Mn,b.Mx,1,1) AS r
    CROSS APPLY (SELECT TOP (1) 1 FROM #t AS t WHERE t.N = r.N1) AS z(x)
  ) AS fnFilter
  GROUP BY fnFilter.N1, b.Mn
  ORDER BY fnFilter.N1;

As you may recall, we reduced the runtime by using the tally table for the left side of the one-to-many relationship and only returning numbers that exist in the source table (#t). We’ve reduced the question from, “where is the next distinct value?” to a simple yes/no question: “Is 1 in here? Is 2 in here?” This is an example of a two-dimensional virtual index created by way of combinatorial index optimization.

Figure 3. Two-dimensional Virtual Index

The tally table (RangeAB in this example)

Virtual Indexing while wearing 3-D glasses

As I have said before, the great thing about a virtual index is that it’s free. Armed with a correctly constructed tally table function I can conjure up one any time I wish just like an AD&D Level 7 Magic User.

In the real world, for a table with millions/billions of rows there would almost certainly be a table or other indexed object (e.g. an indexed view) that I could join to for filtering. Either way, using rangeAB and its @Gap parameter we can add another index for an even cooler type of filtering. First let's review our 2-D solution. Note the code and my comments:

Figure 4. How the Two-dimensional Virtual Index works

Next our 3-D solution.

DECLARE @Gap BIGINT = 100, @N BIGINT;

SELECT @N = fnFilter.N1
  FROM        (SELECT MAX(t.N) FROM #t AS t) AS mx(N)
  CROSS APPLY core.rangeAB(1,mx.N,@Gap,1) AS b
  CROSS APPLY
  (
    SELECT r.RN, r.N1
    FROM   core.rangeAB(b.N1,b.N2-1,1,1) AS r
    CROSS APPLY (SELECT TOP (1) 1 FROM #t AS t WHERE t.N = r.N1) AS z(x)
  ) AS fnFilter
  WHERE    EXISTS (SELECT 1 FROM #t AS t WHERE t.N BETWEEN b.N1 AND b.N2-1) -- $$$
  GROUP BY fnFilter.N1
  ORDER BY fnFilter.N1;

This image will help explain the changes.

Figure 5. How the Three-dimensional Virtual Index works

Adding this 3rd dimension will dramatically improve performance in two distinct ways. First, it reduces the amount of work required simply by ignoring entire partitions of our sequence once the EXISTS statement determines it's safe to do so. Just as Figure 3 helped you picture a 2-D virtual index, here's what a 3-D virtual index looks like.

Figure 6. How the Three-dimensional virtual index works

Partitions without members are ignored

Performance Test 1

For brevity I set @Gap to 3 for the example above. Notice how, for 1-3 and 13-15, our second tally table (by order of appearance in this article) pre-filters entire subsets of work. Next, the performance test. For this test I used the same test harness from Part 5 of this series to create one billion rows with numbers ranging from 1 to 10,000 and very few gaps. For the 3-D solution I set @Gap to 100.

Figure 7. Test #1 Results – 1 Billion, Dense

The 3-D index improves with a parallel plan

In this case the 3-D index with a serial plan slowed things down just a smidgen because I purposely made sure there were few gaps. I included this test to demonstrate how, unlike the 2-D index, we get a faster result with a parallel plan, despite the fact that the additional rangeAB call couldn't really filter anything. Adding an additional dimension gives the optimizer better multithreading options. That is the second distinct advantage of the 3-D index over a 2-D one.

Performance Test 2

For this test I filled a table with 50,000 numbers ranging from 1 to 1,000,000. Next I used my test harness to duplicate the rows until I had two billion. This means my tally table must generate 1,000,000 rows for the left side of my query. This is where the 3-D index shines. I set my @Gap parameter to 6400 for this test.

Figure 8. Test #2 Results – 2 Billion, Sparse

3-D virtual index with gap reduction saves the day.

The 2-D solution runs for 12 seconds regardless of how many CPUs you throw at it. The 3-D solution runs for 1.3 seconds serial, 0.396 seconds with a parallel plan. That’s correct, less than 1/2 a second to extract 50K distinct values from a set of 2,000,000,000.

Conclusion

This, again, is what I mean by More Fast, Less Code. In Part 7 we’ll dive into what is arguably my favorite topic: Finite Opposite Numbers. You’ll want to be seated for that one.

How to find strings longer than 8K in SQL Server (Part 1)

Ever try searching a string for a substring larger than 8,000 characters? It’s actually more difficult than you think… without a good N-Grams function, that is.

Before you continue

FIRST, this article leverages my NGrams2B function which can be found here: Nasty Fast N-Grams (Part 1). NGrams2B is the same as the 8K version but handles VARCHAR(MAX) strings. Since publishing this article I have dropped the "2B" from the function name; you will see it referred to simply as dbo.ngrams in the code examples below.

SECOND, for people following my N-Grams series on SQL Server Central (SSC), let not your heart be troubled: I am in the process of wrapping up that series (Parts 2-6). This is a quick "one-off" intended to keep my new blog posts moving along…
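If you just want to follow along without leaving this page, below is a minimal, unoptimized stand-in for dbo.ngrams with the same interface (Position and Token columns) used in the examples that follow. It is not the NGrams2B function linked above, and it assumes dbo.tally (or your own numbers table) has at least as many rows as the longest string you plan to scan.

CREATE OR ALTER FUNCTION dbo.ngrams
(
  @String VARCHAR(MAX),
  @N      BIGINT
)
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT
  Position = t.N,                        -- character position where the token starts
  Token    = SUBSTRING(@String, t.N, @N) -- @N-character token starting at that position
FROM  dbo.tally AS t
WHERE t.N <= LEN(@String) - @N + 1;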

Introduction

Riddle me this: what will the query below return?

DECLARE @SearchText VARCHAR(MAX) = REPLICATE('X',9000),
        @SomeString VARCHAR(MAX) = REPLICATE('X',8001);

SELECT CHARINDEX(@SearchText,@SomeString);

You would expect it to return a zero (0) – @SearchText is a string of 9,000 X's, @SomeString is 8,001 X's. The query, however, returns a 1. That's not right – how can this be? To get to the bottom of this let's append this line to our query:

SELECT LEN(@SearchText), LEN(@SomeString)

This returns 8000 for each. Ahhhhh, our sample strings are being truncated to 8,000 characters: REPLICATE returns a non-MAX type unless its string input is VARCHAR(MAX). I make this mistake periodically when using REPLICATE for creating sample strings. To get around the 8K limitation we can cast the input string as VARCHAR(MAX):

--==== Casting "X" as VARCHAR(MAX) circumvents the truncation issue
DECLARE 
  @SearchText VARCHAR(MAX) =       REPLICATE(CAST('X' AS VARCHAR(MAX)),9000),
  @SomeString VARCHAR(MAX) = '###'+REPLICATE(CAST('X' AS VARCHAR(MAX)),9001);

SELECT LEN(@SearchText), LEN(@SomeString);

This prevents the truncation. Let's run our CHARINDEX query again:

SELECT CHARINDEX(@SearchText,@SomeString);
Results:
Msg 8152, Level 16, State 10, Line 132
String or binary data would be truncated.

Not as easy as you thought, huh? Perhaps PATINDEX can save the day…

SELECT PATINDEX('%'+@SearchText+'%',@SomeString);

Same error, grrrr… What about LIKE? Surely we can solve this using LIKE right?

SELECT CASE WHEN @SomeString LIKE '%'+@SearchText+'%' THEN 1 ELSE 0 END;

Fail – same error. I encountered this problem for the first time earlier this year and attempting to solve it took a toll on my keyboard and self-esteem.

The Problem

To my knowledge, SQL Server doesn't provide a way to search for a VARCHAR longer than 8,000 characters (4,000 for NVARCHAR). Based on my research it's obvious that this is not a common problem, but it will become more common as developers continue to push the limits of what you can do with strings in SQL Server. And if I'm wrong? So what. Solving problems like this is fun and talking about N-Grams never gets old.

dbo.NGrams to the Rescue (as per usual)

dbo.NGrams and dbo.NGrams8K are Super heroes

OK – LIKE, CHARINDEX and PATINDEX are not valid options. To quote Ned Flanders, "As melon scratchers go, this one is a honeydoodle!" Fortunately SQL Server is powerful enough to check two VARCHAR(MAX) strings for equality. This is where a good N-Grams function is a superhero.

First we'll tokenize (split) @SomeString into tokens as long as @SearchText. Here @SearchText is six characters long, so we split @SomeString into 6-grams.

--==== Sample Search Text and String to analyze
DECLARE @SearchText VARCHAR(MAX) = 'ABC123',
        @SomeString VARCHAR(MAX) = 'AAABC123XYZ';

--==== Tokenize @SomeString into @SearchText-sized tokens
SELECT   ng.Position, ng.Token
FROM     dbo.ngrams(@SomeString,LEN(@SearchText)) AS ng -- LEN(@SeachText=6)
ORDER BY ng.Position;
Results:
Position  Token
--------- ---------
1         AAABC1
2         AABC12
3         ABC123
4         BC123X
5         C123XY
6         123XYZ

The value of @SearchText (“ABC123”) is at Position 3: AAABC123XYZ. All that’s left is to add a WHERE filter to compare @SearchText to each token returned by dbo.NGrams.

SELECT   ng.Position, ng.Token
FROM     dbo.ngrams(@SomeString,LEN(@SearchText)) AS ng
WHERE    ng.Token = @SearchText
ORDER BY ng.Position;
Returns:
Position   Token   
---------- ---------------
3          ABC123

This is what we want. Note that, thanks to the virtual index my N-Grams function uses, I can sort by the token's position in the string without a sort in the execution plan. That's going to help us because we need to add a TOP(1) clause to identify the location of the first instance of @SearchText. Remember, we only need to know if the text exists. Adding TOP(1), with an ORDER BY to keep it deterministic, has multiple benefits: not only do we get a predictable result, this also assists the cardinality estimator and enables us to tap into the SQL optimizer's row goals which, when correctly leveraged, allow for obscene performance gains.

Here we'll use the 8K+ strings from earlier as variables, with a few small changes:

  1. Add the TOP(1) clause along with the proper ORDER BY clause
  2. Wrap the logic up in a subquery named f
  3. Retrieve the value MAX(f.ItemIndex) from f
  4. Add an ISNULL around our MAX() to return 0 when there isn't a match
DECLARE @SearchText VARCHAR(MAX) = REPLICATE(CAST('X' AS VARCHAR(MAX)),9000),
        @SomeString VARCHAR(MAX) = '###'+REPLICATE(CAST('X' AS VARCHAR(MAX)),9001);

SELECT ItemIndex = ISNULL(MAX(f.ItemIndex),0)
FROM
(
  -- Returns the first position of @SearchText
  SELECT TOP(1) ng.Position
  FROM     dbo.ngrams(@SomeString,LEN(@SearchText)) AS ng
  WHERE    ng.Token = @SearchText
  ORDER BY ng.Position
) AS f(ItemIndex);

This returns 4, which is correct – @SearchText is a series of 9,000 X's and begins at position 4 in @SomeString. And that's it – easy peasy. Next for performance.

Introducing the 2 Billion+ row test harness

I said two billion, with a B – not a typo. I began the heading with "Introducing" because any test harness exceeding a billion rows deserves a proper introduction. I say rows, not characters, because dbo.NGrams tokenizes (splits) strings into rows. You're going to see many more billion+ row test harnesses as we explore the PerfML Universe.

The VARCHAR(MAX) character limit is 2,147,483,647. I rounded my sample values down to two billion, created a search string 50K characters long then stuffed that value into @SomeString, nested between 100,011 junk characters followed by two billion more for a grand total of 2,000,150,011 characters. We’ll run the query twice: once with a serial plan, another with a parallel plan using make_parallel() by Adam Machanic.

Test Harness

SET STATISTICS TIME ON;
PRINT 'Build the test string'+CHAR(13)+REPLICATE('-',90);

  DECLARE @Match VARCHAR(MAX) = REPLICATE(CAST('X' AS VARCHAR(MAX)),50000),
          @Junk1 VARCHAR(MAX) = REPLICATE(CAST('Z' AS VARCHAR(MAX)),100011),
          @Junk2 VARCHAR(MAX) = REPLICATE(CAST('#' AS VARCHAR(MAX)),2000000000);

  DECLARE @SearchText VARCHAR(MAX) = @Match,
          @SomeString VARCHAR(MAX) = @Junk1+@Match+@Junk2;

PRINT 'Performance Test #1: Serial Execution'+CHAR(13)+REPLICATE('-',90);

  SELECT ItemIndex = ISNULL(MAX(f.ItemIndex),0)
  FROM
  (
    -- Returns the first position of @SearchText
    SELECT TOP(1) ng.Position
    FROM     dbo.ngrams(@SomeString,LEN(@SearchText)) AS ng
    WHERE    ng.Token = @SearchText
    ORDER BY ng.Position
  ) AS f(ItemIndex)
  OPTION (MAXDOP 0);

PRINT 'Performance Test #2: Parallel Execution'+CHAR(13)+REPLICATE('-',90);

  SELECT ItemIndex = ISNULL(MAX(f.ItemIndex),0)
  FROM
  (
    -- Returns the first position of @SearchText
    SELECT TOP(1) ng.Position
    FROM     dbo.ngrams(@SomeString,LEN(@SearchText)) AS ng
    WHERE    ng.Token = @SearchText
    ORDER BY ng.Position
  ) AS f(ItemIndex)
  CROSS JOIN dbo.make_parallel() --  NOOOO >> OPTION (QUERYTRACEON 8649);

SET STATISTICS TIME OFF;

How do you think we’ll do? How many minutes do you think this will run? Drum roll………….

2B Row Test Results

Build the test string
------------------------------------------------------------------------------------------
 SQL Server Execution Times: CPU time = 12453 ms,  elapsed time = 12576 ms.
 SQL Server Execution Times: CPU time = 7032 ms,  elapsed time = 7060 ms.

Performance Test #1: Serial Execution
------------------------------------------------------------------------------------------
 SQL Server Execution Times: CPU time = 1953 ms,  elapsed time = 1950 ms.

Performance Test #2: Parallel Execution
------------------------------------------------------------------------------------------
 SQL Server Execution Times: CPU time = 3568 ms,  elapsed time = 3570 ms.

This ran for about 25 seconds on my laptop. These results are more impressive when we dig deeper. SQL Server spent ~20 seconds building the string, another 1.9 seconds to find what we were looking for with a serial plan, 3.5 seconds for parallel. make_parallel adds a bit of spaghetti to the plan but, aside from that, the plans are identical.

Execution plan for the first (serial) query

If we hover over the second Nested Loops Join operator in the plan we can see the actual number of rows generated by dbo.NGrams:

The matching search text is 100,011 characters deep, which explains the 100,012 operations

100,012 rows (pseudo-iterations) to parse a 2,000,150,011-character string, and in under 2 seconds with only one CPU. That's nasty fast! Can you guess how we got to 100,012 rows? I buried the search string 100,011 characters deep. The deeper the search string is buried, the more rows dbo.NGrams creates – something we'll address in Part 2.

make_parallel is a Super Power

Queries which leverage dbo.NGrams and dbo.NGrams8k generally perform better with parallel execution but not in this case. Here I’d likely force a serial plan to prevent a slower parallel plan. I included a parallel solution to help you understand the magic which is make_parallel. I first discussed the obvious benefits of make_parallel in PerfML Part 2 Self-Tuning with a Tally Table. Now for a not-so-obvious benefit, and it’s big.

In my experience, make_parallel and TraceFlag 8649 basically perform identically ~95% of the time. The only reason I ever use TraceFlag 8649 is during development, as make_parallel makes the execution plans less readable. Every once in a while, not often, TraceFlag 8649 causes a query to run for minutes (even hours) instead of seconds/milliseconds. This is due to Intra-Query Parallel Deadlocking. What makes this type of deadlocking distinctly painful is the pure randomness of it. Sometimes it happens every 10-100 query executions, sometimes it's a few times a month. This is an example of the type of adventures undocumented trace flags will take you on. make_parallel doesn't have this issue. Just another reason make_parallel deserves a cape.

Conclusion

2B-character strings are not common but this should give you a taste of how this technique will hold up against millions or even billions of smaller strings. This is the power of the tally table and dbo.NGrams. Unless someone can point me to a book, article or forum post that details a faster way to find a string longer than 8K in SQL Server, I'm going to aver that this is the fastest known method for finding VARCHARs 8,001+ characters long in SQL Server without an index. But can we make it faster, much faster, unimaginably faster? See you in Part 2 of this series. Thanks for reading!

Virtual Indexing Part 2: Unexpected Pitfalls

Diving deeper into the world of virtual indexing while being reminded that set-based code is not always bulletproof

Introduction                                                 

In Virtual Indexing Part 1 I introduced the idea of the virtual index, the ordered (AKA “suitably  sorted”) stream of numbers returned by ROW_NUMBER which allows you to perform operations like grouping and window aggregations without the help of an index OR a sort in the execution plan. Today I want to draw your attention to some hidden CTE tally table dangers that are easy to detect and resolve but can be catastrophic when missed.

Problem #1: Gaps and the Hidden Row Explosion

Identifying gaps and islands in sequences is a common SQL task. One of the first things I learned to do with a tally table is identify gaps. It’s an easy concept to grasp, easier than identifying islands. In the query below we have a table variable populated with the numbers 1 to 10, but with the numbers 3,5 and 9 missing. Let’s use fnTally to identify the missing numbers.

Figure 1. Using fnTally to find gaps in a sequence

DECLARE @t TABLE (N BIGINT NOT NULL PRIMARY KEY);
INSERT  @t(N) VALUES(1),(2),(4),(6),(7),(8),(10);

DECLARE @Max BIGINT = 10;

SELECT    f.N
FROM      dbo.fnTally(1,@Max) AS f
LEFT JOIN @t AS t ON t.N = f.N
WHERE     t.N IS NULL;

This returns the expected results: [3 5 9]. Now the execution plan:

Figure 1.1. fnTally gaps in a sequence query execution plan

fnTally returns 10 rows, as expected, to create the numbers 1 to 10. But look at the clustered index scan against the table variable – 70 rows. That is tragic: 70 rows scanned to identify 3 missing values! Can you identify the problem? Take a minute to see if you can figure this out…

@t has 7 rows, fnTally is returning 10 rows: 7*10=70. The optimizer knows that the clustered index on @t is unique but is not taking advantage of the fact that the numbers returned by fnTally are also unique and ordered. The optimizer is behaving as if a one-to-many relationship between t.N and f.N is possible. With this plan the execution engine is forced to compare every value in @t to every value returned by fnTally. As sad as this might make you feel at first, let not your heart be troubled. The workarounds are endless; the trick, really, is to understand which one is best and why.

Table Operator Modifications

Let's start with the three solutions below, ordered by preference. Each of these will get the job done with minimal changes to the underlying logic. The first solution is to just add an additional filter: f.N <= @Max; the second is to replace fnTally with a physical tally table; the third is to use a hint.

Figure 2. fnTally gaps in a sequence solutions

DECLARE @t TABLE (N BIGINT NOT NULL PRIMARY KEY);
INSERT  @t(N) VALUES(1),(2),(4),(6),(7),(8),(10);

DECLARE @Max BIGINT = 10;

--==== Solution #1: Add a WHERE filter for fnTally.N
SELECT    f.N
FROM      dbo.fnTally(1,@Max) AS f
LEFT JOIN @t AS t ON t.N = f.N
WHERE     f.N <= @Max -- solves the fake one-to-many issue
  AND     t.N IS NULL;

--==== Solution #2: Use a persisted tally table with a WHERE filter
SELECT    f.N
FROM      dbo.tally AS f
LEFT JOIN @t        AS t 
  ON      t.N = f.N
WHERE     f.N <= @Max -- required because there is no TOP(@Max) clause
  AND     t.N IS NULL;

--==== Solution #3: Add a Query Hint to the left join
SELECT    f.N
FROM      dbo.fnTally(1,@Max) AS f
LEFT MERGE JOIN @t AS t -- use Merge Join algorithm instead of loop join
  ON      t.N = f.N
WHERE     t.N IS NULL;

Each query returns the correct result and in each case only 7 rows total are retrieved from @t instead of 7 rows for each number returned by fnTally.

Figure 3. Execution plan for the table operator solutions

I didn't call it out in the screenshot but with each solution only 7 rows were retrieved instead of 70. The first solution is the easiest – just add an additional WHERE filter. This may seem redundant since it shouldn't be necessary, but it works and it's a simple fix. If it were always this simple I could end the discussion here, but it's not. The second solution is to use a physical tally table (dbo.tally in this example) instead; this solves the problem in this example and has other advantages I'll cover momentarily. The third solution is to force a merge join using a query hint; this works but is my last choice. There have been times when a query hint is the only option, which is why I'm calling it out now.

Anti-Join Solutions

You may have seen the term anti-join in an SSMS execution plan, but it is not a commonly used phrase in the RDBMS world. It is, however, the best way to describe a scenario where you need all items from one table that do not exist in a second table.

Figure 4. Anti-join Venn Diagram

Reviewing the execution plan for the first set of solutions above (Figure 2), you see that each used a Left Outer Join operator to identify the missing rows, followed by a filter to exclude the others. Let's examine two more solutions which leverage the NOT EXISTS logical operator and the EXCEPT set operator.

Figure 5. Anti-join solutions

DECLARE @t TABLE (N BIGINT NOT NULL PRIMARY KEY);
INSERT  @t(N) VALUES(1),(2),(4),(6),(7),(8),(10);

DECLARE @Max BIGINT = 10;

--==== Solution #4: EXCEPT >> Merge join handles the filter
SELECT f.N FROM dbo.fnTally(1,@Max) AS f
EXCEPT
SELECT t.N FROM @t AS t;

--==== Solution #5: NOT EXISTS >> Identical to above but with nested loop join
SELECT f.N
FROM   dbo.fnTally(1,@Max) AS f
WHERE  NOT EXISTS (SELECT t.N FROM @t AS t WHERE t.N = f.N)
AND    f.N <= @Max; -- Optional, forces loop join

Figure 5.1. Anti-join solutions execution plans

Both produce almost identical execution plans and are both efficient. The EXCEPT solution is the best IMO because it’s the cleanest.

Parallelism

Now let’s safely force a parallel execution plan against the left join solution with the WHERE filter (Figure #2, Solution 1) then, again, with our EXCEPT anti-join from Figure #5.

Figure 6. Parallel execution test

DECLARE @t TABLE (N BIGINT NOT NULL PRIMARY KEY);
INSERT  @t(N) VALUES(1),(2),(4),(6),(7),(8),(10);

DECLARE @Max BIGINT = 10;

--==== Good when serial, bad for parallel
SELECT     f.N
FROM       dbo.fnTally(1,@Max) AS f
LEFT JOIN  @t AS t ON t.N = f.N
CROSS JOIN dbo.make_parallel() AS x
WHERE      t.N IS NULL
AND        f.N <= @Max

--==== Winner for serial and parallel
SELECT f.N FROM dbo.fnTally(1,@Max) AS f 
EXCEPT
SELECT t.N FROM @t AS t CROSS JOIN dbo.make_parallel() AS x;

Below is the portion of each plan where the rows are retrieved. Even with the WHERE f.N <= @Max clause in place, which solved the 70-row explosion problem earlier with a serial plan, the row explosion returns with parallel execution. The EXCEPT solution, however, does not have this problem with a serial or parallel plan.

Figure 6.1. Parallel execution performance plan

Problem #2: Gaps, Left Join Aggregation and Parallelism

Keeping with the theme of gaps let’s use the same table variable from earlier but include duplicate and missing values. Using this sample data:

Figure 7. Problem #2 sample data

DECLARE @t TABLE (N BIGINT INDEX IX1 CLUSTERED NOT NULL);
INSERT  @t(N) VALUES(1),(1),(2),(2),(2),(2),(4),(4),(6),(7),(8),(8),(8),(10),(10),(10);

The expected output would be:

Figure 7.1. Problem #2 expected output

N     Total
----- -------
1     2
2     4
3     0
4     2
5     0
6     1
7     1
8     3
9     0
10    3

As before, this query suffers the aforementioned hidden row explosion; instead of retrieving 16 rows from @t, it retrieves 160 rows (16 * @Max).

Figure 8. Problem #2, Solution #1 (fail)

SELECT    f.N, Total = COUNT(t.N)
FROM      dbo.fnTally(1,@Max) AS f
LEFT JOIN @t                  AS t
  ON      t.N = f.N -- could not shake the row explosion
GROUP BY  f.N;

Again, adding f.N <= @Max as a WHERE filter or as a join predicate solves it; t.N <= @Max will also do the trick in the WHERE or join filters (see the sketch below). The problem, again, is that this only works when the optimizer chooses a serial execution plan. This means that, in this case, you must always use OPTION (MAXDOP 1) and be banished to eternal serial processing. C’mon man!
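
A sketch of that serial-only fix, using the sample data from Figure 7:

DECLARE @t TABLE (N BIGINT INDEX IX1 CLUSTERED NOT NULL);
INSERT  @t(N) VALUES(1),(1),(2),(2),(2),(2),(4),(4),(6),(7),(8),(8),(8),(10),(10),(10);

DECLARE @Max BIGINT = 10;

--==== Problem #2, Solution #1 with the WHERE filter; serial plans only (hence MAXDOP 1)
SELECT    f.N, Total = COUNT(t.N)
FROM      dbo.fnTally(1,@Max) AS f
LEFT JOIN @t                  AS t
  ON      t.N = f.N
WHERE     f.N <= @Max
GROUP BY  f.N
OPTION (MAXDOP 1);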

Problem #2 Workarounds

A merge join hint will solve the problem but cannot get a parallel execution plan; in other words, make_parallel will be successful at solving the row explosion issue but not at forcing a parallel plan. The persisted tally table solution, on the other hand, does not experience the row explosion issue and can enjoy both serial and parallel execution.

Figure 9. Problem #2 – Solutions #2 & #3

DECLARE @t TABLE (N BIGINT INDEX IX1 CLUSTERED NOT NULL);
INSERT  @t(N) VALUES(1),(1),(2),(2),(2),(2),(4),(4),(6),(7),(8),(8),(8),(10),(10),(10);

DECLARE @Max BIGINT = 10;

-- Solution #2 - works, but requires a query hint & no parallel execution
SELECT     f.N, Total = COUNT(t.N)
FROM       dbo.fnTally(1,@Max) AS f
LEFT MERGE JOIN @t             AS t
  ON       t.N = f.N
CROSS JOIN dbo.make_parallel() AS x
GROUP BY     f.N;

-- Solution #3 - good serial & parallel
SELECT     f.N, Total = COUNT(t.N)
FROM       dbo.tally AS f
LEFT JOIN  @t AS t
  ON       t.N = f.N
CROSS JOIN dbo.make_parallel() AS x
WHERE      f.N <= @Max
GROUP BY   f.N;

Figure 9.1. Problem #2 – Solutions #2 & #3 execution plans

In each case the row explosion is gone but only the physical tally table solution (dbo.tally) solves the problem with a serial or parallel plan while preserving the optimizer’s option to execute a parallel plan. This is an example of where fnTally simply cannot compete with dbo.tally.

Set-Based != Bulletproof (Conclusion)

If there is a lesson to be learned today it’s that the very existence of a numbers table, or the text “tally” in your code, does not guarantee that it will be fast. Furthermore, just because your set-based solution is fast doesn’t mean it can’t be faster – exponentially faster in many cases, as you will see in future articles. In this article we watched an innocent query to identify gaps in a sequence go terribly wrong. Fortunately, there are many workarounds provided that you can identify the problem when it arises. Hopefully you are now better prepared.

Which is faster again? fnTally or a persisted tally table? It depends. As per usual.

Virtual Indexing Part 1: The Suitably Sorted Stream

Does a virtual auxiliary table of numbers include a virtual index?

Before you continue

This post is intended for new and old data professionals alike. To get the most out of this article you should have a basic understanding of tally tables and Jeff Moden’s tally table function, fnTally. You also need a basic understanding of how to use the APPLY table operator; if not, read Understanding and Using APPLY (Part 1), Part 2 and The Cascading (CROSS) APPLY. Each is a quick read and can change your career.

Intro

Less Code, More Fast – this is my coding philosophy in four words and the tagline for this site. Replacing bloated iterative procedural code with clean, set-based, pure, reusable code makes it easy to deploy code faster and with fewer bugs. Veteran programmers who write high-performing declarative code, such as SQL developers, know that pure, set-based, loop-free code performs exponentially faster than code that uses loops, mutable data structures and procedural constructs such as “goto”. A correctly written tally table function, such as fnTally, can count from 0 to 1,000,000,000 in 90 seconds on my laptop. I wouldn’t even try with a loop. Replacing loops with a tally table (or some other form of lazy sequence) is a low-risk, low-cost, high-reward way to write cleaner, faster code.

The Virtual Index

One small benefit of writing loops to perform tasks in your functions and procedures is that you don’t have to worry about sorting. With a set-based approach that leverages a tally table, if the order of the numbers returned matters then the optimizer will have to sort the numbers for you unless they are presorted, AKA indexed. When using a physical tally table that is properly indexed, sorting is not a concern, but what about a tally table function like fnTally? You can’t index a function, right?

Returning the numbers 1 to 7, in order, without a sort

An important consideration when replacing a loop with a tally table is sorting. The most common way to handle sorting in the SQL world is to presort using an index. fnTally is a function; you can’t add an index to a function, so how do you return the numbers [1 2 3 4 5 6 7] in that order without sorting them? You can’t, unless they are already sorted. fnTally returns its numbers as an ordered set. This ordered stream of numbers is what I refer to as the virtual index. Virtual indexing can be thought of as the art/science of:

  1. Understanding what the virtual index is
  2. Knowing when\how to use the virtual index
  3. Understanding the alternatives and when to use them

If you are a set-based SQL coding warrior then you know that you can use ROW_NUMBER to create a Virtual Auxiliary Table of Numbers, AKA a tally table (a quick sketch follows below). With that virtual table comes a virtual index. To understand and appreciate the difference between a virtual index and a physical index, let’s create a mini-numbers temp table with the numbers 1 to 6. We won’t include an index, which means the numbers are unordered.
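
As a quick aside, here is a minimal sketch of a ROW_NUMBER-based virtual tally table (fnTally is built on the same idea) – the numbers arrive as an ordered set with no physical index behind them:

--==== A minimal virtual tally table built with ROW_NUMBER
WITH E1(x) AS (SELECT 1 FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS t(x)),
     E3(x) AS (SELECT 1 FROM E1 AS a CROSS JOIN E1 AS b CROSS JOIN E1 AS c)
SELECT TOP(100)
  N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM E3;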

Figure 1 – Mini tally table (#nums)

--==== Create a small numbers table
IF OBJECT_ID('tempdb..#nums','U') IS NOT NULL DROP TABLE #nums;
CREATE TABLE #nums (N INT NOT NULL);
INSERT #nums VALUES (1),(2),(3),(4),(5),(6);

This table could be better described as a “tally heap” or a “virtual auxiliary heap of numbers.” Let’s run three queries against #nums without an index present, then review the execution plans.

Heap vs Index Tests

First, let’s test our queries against #nums without an index in place:

Figure 2 – Test #1: No index  (Heap)

SELECT TOP(3) t.N
FROM     #nums AS t
ORDER BY t.N;

SELECT   t.N, AVG(t.N)
FROM     #nums AS t
GROUP BY t.N;

SELECT 
  [SUM] = SUM(t.N)     OVER (ORDER BY t.N),
  [LAG] = LAG(t.N,1,0) OVER (ORDER BY t.N)
FROM   #nums AS t;

Figure 3 – Test #1 execution plans without an index on #nums

Each query required a sort operator. TOP with an ORDER BY, grouping (the GROUP BY), and the window functions (SUM OVER and LAG) are examples of operations which require an ordered set, AKA “suitably sorted stream.” If the stream of input values is unordered, the most common remedy is a sort operator to create the suitably sorted stream when the query runs.

Hover over the properties for the stream aggregate and sequence operators in the second and third plans to see what each operator does and what they need:

Figure 3.1. – Stream Aggregate and Sequence Project Details

Suitable Sorting with a Physical Index

Sorting at runtime should be avoided, especially when dealing with a large volume of rows. The typical sort cost is n log n, which is worse than linear. It’s the exact opposite of a bulk discount – the cost per row increases as you add rows. Let’s presort with a unique clustered index on #nums.

Figure 4 – Index for our numbers table (#nums)

CREATE UNIQUE CLUSTERED INDEX uq_nums ON #nums(N);

Now execute the same queries from Test #1(Figure 2) with the clustered index in place.

Figure 4.1. – Execution plan with index in place on #nums

Now that #nums is presorted there isn’t a need to sort at runtime.

The Suitably Sorted Virtual Index

First, the same queries using fnTally, followed by the execution plans.

Figure 5: Virtual Index Test Query

SELECT TOP(3) t.N FROM dbo.fnTally(1,6) AS t ORDER BY t.N;

SELECT t.N, AVG(t.N) FROM dbo.fnTally(1,6) AS t GROUP BY t.N;

SELECT [SUM] = SUM(t.N)     OVER (ORDER BY t.N),
       [LAG] = LAG(t.N,1,0) OVER (ORDER BY t.N)
FROM     dbo.fnTally(1,6) AS t;

Figure 5.1: Virtual Index Test Execution Plans


That’s the exact same execution plan as in figure 4.1, after the clustered index was added to #nums! The only difference is the nested loop joins on the right which are used to generate the rows. fnTally returns the numbers as an ordered set. In other words, for each query:

  1. Using #nums without an index forces the optimizer to add a sort operator to the execution plan to handle TOP, GROUP BY and a couple of window functions (LAG and SUM() OVER())
  2. Adding an index to #nums presorts the values and eliminates the requirement to sort the values at runtime
  3. fnTally is an inline table valued function (iTVF) and cannot be indexed
  4. In this example fnTally’s N column behaves like the indexed N column on #nums

The CTE tally table returns the numbers as an ordered set and generates an execution plan identical to the one produced when using a persisted tally table with an index. This is what I mean by a virtual index. The virtual index here is the ordered set of numbers that ROW_NUMBER creates. There is a lot more to this topic that I look forward to exploring in Part 2 of this series.

PerfML Part 2: Self-Tuning with a Tally Table

In PerfML Part 1 we covered some new algorithms for creating a self-tuning function. In Part 2 we’ll turbo-charge our algorithm with parallelism and a tally table.

Before You Begin

This is the second article about PerfML, where I introduced some new concepts, as well as some less common ones which will be referenced heavily in this article, such as the dynamic algorithm (completely unrelated to dynamic programming). If you haven’t read Part 1, I suggest you do so before continuing as you will get much more out of this article. I use delimitedsplit8k, fnTally and make_parallel so grab a copy of each if you are playing along at home.

Introduction

In Part 1 of this series you were introduced to the self-tuning function and PerfML, the science of creating fast, deterministic, elegant self-tuning routines such as functions and stored procedures. We reviewed the difference between exact and approximation algorithms and how they can be combined to return a deterministic result. I also introduced a new type of parameter: the gap parameter (G), which can be used to add gaps to a sequence for the purpose of creating an approximation algorithm. We then created a string search routine which leveraged an exact, brute-force algorithm to find a specific sequence in a string. Then we transformed the function to leverage an approximation algorithm using our gap parameter, which allowed us to introduce a margin of error, or range of possible valid values, reducing the cost of the algorithm by a factor of G while preserving the ability to return an exact result when G=1. Lastly, we reviewed how to reduce the search area of our exact algorithm by first using an approximation to identify a much smaller range of possible valid answers.

Today I’ll define and teach you how to use a tuning parameter, the most important component of PerfML. A tuning parameter can be defined as:

A parameter that allows you to dynamically tune a deterministic function’s algorithm while guaranteeing the same deterministic return value.

The gap parameter (introduced in Part 1) is a form of tuning parameter. The tuning parameter is the core component of PerfML; tuning parameters are where your intelligence lives. For any given input there can be only one tuning parameter value which does the least amount of work, and that value will perform the best – both in speed and resource usage (memory, IO). Determining the best value for your tuning parameter is the most important (and daunting) task you will perform while developing self-tuning functions; master this and your algorithms will figure out how to master themselves. Choosing the best tuning parameter value can be done as part of a pre-processing task, at runtime, or a combination of both. This will all make more sense when you see the code examples.

Sequential to Set-Based with fnTally

In Part 1 we created a self-tuning function leveraging a self-referencing recursive scalar UDF. The goal was not to develop the most performant function but rather to understand the algorithms and other new concepts which are vital to PerfML; iterative programming and basic recursion are easier to understand. Functions are a vital component of any major programming language, and avoiding loops and other iterative, RBAR-style logic is vital to writing fast, clean code in most declarative programming languages. For this series I’m using Microsoft SQL Server, which means we’ll be using fnTally by SQL MVP Jeff Moden to replace our loop logic from Part 1. There are many advantages to a tally table over a loop; these are the ones I consider most important:

  1. Speed
  2. Functional Purity
  3. Ability to measure performance
  4. Parallel execution

1.    Speed

First, a lazy sequence (a CTE tally table in this case) will perform faster than a loop, cursor, iterative recursion (using recursion for counting) and any other iterative method used for solving common programming problems. fnTally counts to one billion in 90 seconds on my laptop. Second, the fastest kind of T-SQL function in SQL Server is the inline table valued function (iTVF). With iTVFs, however, data structures are immutable so there is no way to perform your typical loop (do, while, etc.). This leaves you two options in SQL: recursion or a tally table. Recursion is slow; tally tables are fast and virtually I/O free (readless). Because of the immutable data structures, the functional purity of an iTVF and the speed of a quality tally table or tally table function, the SQL optimizer has a full range of options, including the ability to multi-thread execution, that is, leverage parallelism.
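
A minimal sketch of the difference – counting to 10,000 with iterative recursion (a recursive CTE) versus fnTally:

--==== Iterative recursion: one row per iteration, with a worktable behind the scenes
WITH r(N) AS (SELECT 1 UNION ALL SELECT r.N+1 FROM r WHERE r.N < 10000)
SELECT MAX(r.N) FROM r OPTION (MAXRECURSION 0);

--==== Tally table function: set-based and readless
SELECT MAX(t.N) FROM dbo.fnTally(1,10000) AS t;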

2.    Functional Purity

fnTally is an iTVF, which is SQL Server’s version of a pure function. Pure functions are generally side-effect free, don’t experience memory leaks and allow for very clean, concise and reusable code. Note the Advantages of pure functions section of this Microsoft article: Functional programming vs. imperative programming, and the benefits of SQL iTVFs in Paul White’s article about APPLY.

The primary reason to implement functional transformations as pure functions is that pure functions are composable: that is, self-contained and stateless. These characteristics bring a number of benefits, including the following:

  1. Increased readability and maintainability. This is because each function is designed to accomplish a specific task given its arguments. The function doesn’t rely on any external state.
  2. Easier reiterative development. Because the code is easier to refactor, changes to design are often easier to implement. For example, suppose you write a complicated transformation, and then realize that some code is repeated several times in the transformation. If you refactor through a pure method, you can call your pure method at will without worrying about side effects.
  3. Easier testing and debugging. Because pure functions can more easily be tested in isolation, you can write test code that calls the pure function with typical values, valid edge cases, and invalid edge cases.

Starting with fast and clean pure functions for our self-tuning algorithms will keep your functions blazing while being easier to debug.

3.    Measuring Performance

It is very difficult to accurately measure scalar UDF performance (except when leveraging SQL Server 2019 scalar UDF inlining). Key performance metric collection tasks such as measuring I/O, capturing accurate run times and even extracting a usable execution plan are impossible when writing non-inline (impure) T-SQL scalar UDFs. The opposite is true with an inline table valued function (iTVF); fnTally is an iTVF and will be used later in this article to replace our recursive scalar logic from Part 1 (dbo.stringSearchV1). Below is a portion of the execution plan from dbo.stringSearch (truncated for brevity), which leverages fnTally.

The iTVF that generated the plan above uses the same “exact approximation” logic we used in our scalar UDF from the previous article. In this plan, because I’m familiar with my logic, I can see that the approximation algorithm retrieves 32,064 rows from fnTally, which grows to 42,064 during the merge join. The exact algorithm then requires another 18,144 rows from fnTally, which grows to 26,613 during the second merge join. The number of iterations for the scalar UDF is a mystery; all we know for sure, as you will see, is that the fnTally version is profoundly faster.

4.    Parallel Execution

T-SQL scalar UDFs (as well as multi-statement table valued functions) cannot utilize a parallel execution plan unless they can leverage SQL Server 2019 scalar UDF inlining, and even then parallel execution is not possible with inlined scalar UDFs that are self-referencing (in my experience). iTVFs that leverage fnTally, on the other hand, can be processed in parallel and are often substantially faster, as you will see momentarily.

Most of my functions which leverage a tally table or a correctly developed tally table function, such as dbo.fnTally, run much faster when leveraging a parallel execution plan. Sometimes a parallel plan yields no performance benefit at all which, then, is a liability – you have multiple CPUs doing the job of one instead of doing something more productive. Sometimes parallel plans are slower than serial; that’s an even bigger bummer. Sometimes it’s system/instance level settings (e.g. cost threshold for parallelism) and sometimes even your set-based code is bad (set-based != bulletproof). Sometimes the optimizer just gets it wrong. Either way, if you need a parallel execution plan you have two options: Trace Flag 8649 and make_parallel by Adam Machanic. Trace Flag 8649 is not documented and therefore not safe for production. make_parallel is a simple T-SQL function which, in short, freaks the optimizer out and causes it to choose parallel execution if it’s available. make_parallel adds a lot of bloat to the execution plan so: in Dev I use the trace flag, in Production I use make_parallel when I have determined that doing so is the best option.

Function Logic

We are going to build the same function from part 1, dbo.stringsearchV1, using fnTally for our iterative logic. These are the parameters we’ll use:

DECLARE
  @G BIGINT       = 5,   -- gap param
  @C VARCHAR(100) = 'X', -- search character
  @L BIGINT       = 1,   -- min length
  @H BIGINT       = 22,  -- max length
  @S VARCHAR(MAX) = '999999999-XXXXXXXXXXXXXX-999999999'; -- sample search string

fnTally takes two parameters: @ZeroOrOne, which determines whether the first row will be zero or one; we will set this to 0. @MaxN will be the highest integer returned; for @MaxN I came up with this formula: (@H-@L+1)/@G+SIGN((@H-@L+1)%@G). It determines the correct number of rows once the @G-sized gaps are applied to the sequence (a quick sanity check follows below).
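
A quick sanity check of that formula using the parameters above (@L=1, @H=22, @G=5):

--==== Sanity check: how many rows fnTally needs to return for the gapped sequence
DECLARE @L BIGINT = 1, @H BIGINT = 22, @G BIGINT = 5;

SELECT MaxN = (@H-@L+1)/@G + SIGN((@H-@L+1)%@G); -- 5; fnTally(0,5) returns N = 0 through 5 (6 rows)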

Avoiding the descending sort

SQL sorts generally have a complexity of N log N, which is worse than linear. This means that the more rows we sort, the more the cost per row increases. For row counts in the tens of thousands this is not a big deal, but it becomes a problem when we get into the hundreds of thousands and beyond.

The recursive solution from Part 1 did not require a sort to determine the “longest” allowable string. For our fnTally version we will be seeking the longest value, which would otherwise require an ORDER BY N DESC clause. To circumvent this problem we can return N in descending order using Jeff Moden’s formula, @MaxN-N+1, mentioned in the fnTally documentation. This only works, however, when we are seeking an exact result (@G=1) and, therefore, have no gaps in the sequence.
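
As a quick illustration of that formula (when @G=1 and there are no gaps), the numbers can be returned in descending order with no sort operator in the plan:

--==== Descending N without a sort operator (only valid when there are no gaps, i.e. @G=1)
DECLARE @MaxN BIGINT = 7;

SELECT [N DESC] = @MaxN - t.N + 1
FROM   dbo.fnTally(1,@MaxN) AS t; -- 7,6,5,4,3,2,1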

To return the numbers beginning with @H, decrementing by @G until @L we’ll use: IIF(t.N<gap.N, @H-@G*t.N, @L). If the final number is less than @L, @L is returned for the final iteration. So far we have:

SELECT
  [N ASC]  = t.N,                          -- fnTally "N" (0-base row number)
  [N DESC] = IIF(t.N<gap.N, @H-@G*t.N, @L) -- Gap reduced sequence from @H to @L
FROM        (VALUES((@H-@L+1)/@G+SIGN((@H-@L+1)%@G))) AS gap(N) -- number of rows
CROSS APPLY dbo.fnTally(0,gap.N)                      AS t;     -- fnTally call

With this logic, setting @G=2 would return (truncated) [22 20 … 6 4 2 1] for “[N DESC]”; @G=5 returns [22 17 12 7 2 1], @G=6 returns [22 16 10 4 1], and so on. For the L column (the lower-bound value) we’re using [N Desc] from earlier. For H, the upper-bound, we’ll leverage LAG: f.N+ISNULL(LAG(f.N,1) OVER (ORDER BY t.N)-f.N,1)-1.

SELECT
  L = IIF(t.N<gap.N, @H-@G*t.N, @L), -- Gap reduced sequence from @H to @L
  H = f.N+ISNULL(LAG(f.N,1) OVER (ORDER BY t.N)-f.N,1)-1
FROM        (VALUES((@H-@L+1)/@G+SIGN((@H-@L+1)%@G))) AS gap(N) -- how many rows
CROSS APPLY dbo.fnTally(0, gap.N)                     AS t
CROSS APPLY (VALUES(IIF(t.N<gap.N, @H-@G*t.N, @L)))   AS f(N);

Retrieving matched values

Then we add a subquery that searches the input string (@S) for an L-sized series of @C, and exclude rows where there is no match:

CROSS APPLY (VALUES(CHARINDEX(REPLICATE(@C,f2.L),@S)))  AS p(Idx)
WHERE        p.Idx > 0

Finally, we wrap our logic into a subquery and add our TOP (1) clause, ordered by N. Note that I am using the alias “RN” for “Row Number”; I use RN as my sort key, and “N” for my tally table N column. We’ll wrap all this up in an iTVF and create a VARCHAR(8000) and a VARCHAR(MAX) version of our new stringsearch function. The only difference between the two functions is the data type of @S. The final tally table (lazy sequence) logic for our two set-based approximation functions is:

dbo.stringSearch8K

CREATE OR ALTER FUNCTION dbo.stringSearch8K
(
  @C CHAR(1),
  @S VARCHAR(8000),
  @L BIGINT,
  @H BIGINT,
  @G BIGINT
)
/* Created on 20201004 by Alan Burstein. */
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT TOP(1)
  L   = f2.L,  -- Lower Boundary
  H   = f2.H,  -- Upper Boundary
  Idx = p.Idx -- ItemIndex (position of the item match)
FROM
(
  SELECT t.N, f.N, f.N+ISNULL(LAG(f.N,1) OVER (ORDER BY t.N)-f.N,1)-1
  FROM        (VALUES((@H-@L+1)/@G+SIGN((@H-@L+1)%@G))) AS gap(N) -- how many rows
  CROSS APPLY dbo.fnTally(0, gap.N)                     AS t
  CROSS APPLY (VALUES(IIF(t.N<gap.N, @H-@G*t.N, @L)))   AS f(N)
) AS f2(RN,L,H)
CROSS APPLY (VALUES(CHARINDEX(REPLICATE(@C,f2.L),@S))) AS p(Idx)
WHERE        p.Idx > 0
ORDER BY     f2.RN;

dbo.stringSearch

CREATE OR ALTER FUNCTION dbo.stringSearch
(
  @C CHAR(1),
  @S VARCHAR(MAX),
  @L BIGINT,
  @H BIGINT,
  @G BIGINT
)
/* Created on 20201004 by Alan Burstein. */
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT TOP(1)
  L   = f2.L, -- Lower Boundary
  H   = f2.H, -- Upper Boundary
  Idx = p.Idx -- ItemIndex (position of the item match)
FROM
(
  SELECT t.N, f.N, f.N+ISNULL(LAG(f.N,1) OVER (ORDER BY t.N)-f.N,1)-1
  FROM        (VALUES((@H-@L+1)/@G+SIGN((@H-@L+1)%@G))) AS gap(N) -- how many rows
  CROSS APPLY dbo.fnTally(0, gap.N)                     AS t
  CROSS APPLY (VALUES(IIF(t.N<gap.N, @H-@G*t.N, @L)))   AS f(N)
) AS f2(RN,L,H)
CROSS APPLY (VALUES(CHARINDEX(REPLICATE(@C,f2.L),@S))) AS p(Idx)
WHERE        p.Idx > 0
ORDER BY     f2.RN;

The 8K version is slightly faster when dealing with smaller strings with a smaller range of acceptable values. As the strings and range of acceptable values grow, the Max version becomes faster.
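
Example calls for both versions (the search string here is just an arbitrary test value):

--==== Example calls; identical parameters, only the data type of @S differs
SELECT f.L, f.H, f.Idx FROM dbo.stringSearch8K('X','999-XXXXX-999',1,22,5) AS f;
SELECT f.L, f.H, f.Idx FROM dbo.stringSearch  ('X','999-XXXXX-999',1,22,5) AS f;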

The Tuning Procedure

The tuning procedure (see the attached DDL as a text file below) is a T-SQL stored procedure I spun up quickly to test different design patterns using values for @G. This will help us find the optimal parameter for a specific predictable input. The procedure accepts the following parameters:

The procedure accepts all the parameters required for dbo.stringsearch. @I is the number of test iterations the stored procedure should perform. @mode selects one of 10 different design patterns to test. The parameter @gaps is a comma-delimited list of different gap parameters (@G) for our function to accept. @puff and @C build sample strings of NEWIDs with the search pattern stuffed in the middle. Setting @mode=1, @I=5, @gaps = ‘1,2,5,10’ will test a specific design pattern using dbo.stringsearch 5 times for each value in @gaps. There are 4 gap values, so for this example that would be a total of 20 tests.

The proc has examples with a performance test for each design pattern, executed with serial and parallel execution plans. Again, in Development environments I use trace flag 8649 because it’s easier to read the execution plan; in Production I use make_parallel because it performs the same and is safe to use there.

Below is the example output from a test I ran against a large set of rows five times (@I=5) for seven parameters between 5 and 35 (@gaps=‘5,10,15,20,25,30,35’). The Gap column is the value of @G, Total is the total run time, and Min and Max represent the fastest and slowest runs respectively. TAvg is a rolling average of the total time.

Gap   Total  Min    Max    TAvg
----- ------ ------ ------ ------
25    69294  13140  14437  69294
20    69546  12804  14603  69420
30    71460  13096  14950  70100
35    74830  13514  15990  71282
15    75196  13790  16413  72065
10    94631  17383  20097  75826
5     165316 30423  34530  88610

In the above example, with the sample data used, @G=20 had the fastest single run at 12.8 seconds, but it appears that 25 is the optimal value based on the total run time.

In the real world we would replace our sample data table, ##strings, with a sampling of the real data you are developing a self-tuning function for. For example, I once wrote ETL packages that collected social media metrics from search and social media companies. The source data from each company came in as a text file from which we would extract key metrics. The content of each file, and the input strings within, looked very different for each company, but for each company the text was predictable. The optimal use case for a self-tuning string-parsing function here would be to determine the best gap parameter value for each company’s source data. For this we could build a tuning procedure for each company and store the best parameter value for later use.

Design Patterns Tested

We have two versions of our function, one for 8K strings and one for VARCHAR(MAX). The performance is nearly identical for both functions except that the 8K version is faster with shorter strings (e.g. fewer than 1,000 characters). Note these parameters, the queries, my comments and the results.

Test Parameters and sample data

Executing this code will create the sample data table (##strings) leaving it intact for further analysis.

DECLARE
  @rows BIGINT        = 2000,
  @puff BIGINT        = 5,
  @gaps VARCHAR(8000) = '5,10,15,20,25,30,35', -- Gap parameters to test
  @C    VARCHAR(100)  = 'X',
  @L    BIGINT        = 1,
  @H    BIGINT        = 1000,
  @max  BIGINT        = 1100,
  @I    BIGINT        = 1,
  @out  BIGINT        = 1; -- general performance details of test run

  EXEC dbo.usp_stringsearch_test1 @rows, @puff, @gaps, @C, @L, @H, @max, 15, @out, @I, 0;

Please test away and let me know what you find.

Design Patterns

With our sample table in place we can test a few different design patterns; there are three: our exact algorithm, a dynamic algorithm (approximation > exact) and a dynamic algorithm with two approximation reductions (approximation > approximation > exact).

--==== 1. EXACT algorithm
SELECT
  s.stringId, s.string,             -- ##strings attributes
  Boundaries = CONCAT(f.L,'-',f.H), -- The Upper and lower bounds (exact because @G=1))
  ItemIndex  = f.Idx
FROM        ##strings                             AS s
CROSS APPLY dbo.stringSearch(@C,s.String,@L,@H,1) AS f -- @G=1 means EXACT
WHERE       f.L >= @L;

--==== 2. DYNAMIC algorithm: approximation to exact reduction
  SELECT      @X = f.L
  FROM        ##strings                                 AS s
  CROSS APPLY dbo.stringSearch(@C,s.String,@L,@H,@G)    AS f0
  CROSS APPLY dbo.stringSearch(@C,s.String,f0.L,f0.H,1) AS f
  WHERE       f.L >= @L;

--==== 3. DYNAMIC algorithm with two reductions: Approx to Approx to Exact
DECLARE @G2 BIGINT = @G/3+1; -- Simple reduction: split by 1/3rd then add 1 (min=2)

SELECT      @X = f.L
FROM        ##strings                                       AS s
CROSS APPLY dbo.stringSearch(@C, s.String, @L, @H, @G)      AS f0
CROSS APPLY dbo.stringSearch(@C, s.String, f0.L, f0.H, @G2) AS f1
CROSS APPLY dbo.stringSearch(@C, s.String, f1.L,
                              IIF(f1.L+2<=@H,f1.L+2,@H), 1) AS f
WHERE       f.L >= @L;

The first will be the slowest because it’s running in serial mode. The second is much faster because, again, parallel plans turbo-charge the performance when combining fnTally and a pure dynamic algorithm. The third example, the two-reduction solution, is slower for short strings but the fastest (by far) as strings and search patterns get longer. When I already know a query is going to be slow I’ll often set @I low (1 or 2) just to confirm while saving time.

Performance Test

Using the parameters posted above, let’s execute dbo.usp_stringsearch_test1 as shown below.

--==== 8K - Dynamic with 1 reduction - SERIAL
  EXEC dbo.usp_stringsearch_test1 @rows, @puff, @gaps, @C, @L, @H, @max, 5,  @out, @I, 1;

--==== VARCHAR MAX - Dynamic with 1 reduction(#6) & 1 string reduction(#7) - PARALLEL
 SELECT @puff *= 10/*500*/, @I=5, @H=2000, @max=2000;
  EXEC dbo.usp_stringsearch_test1 @rows, @puff, @gaps, @C, @L, @H, @max, 6,  @out, @I, 1;
  EXEC dbo.usp_stringsearch_test1 @rows, @puff, @gaps, @C, @L, @H, @max, 7,  @out, @I, 1;

--==== 8K & VARCHAR MAX - Dynamic with 2X reductions (@G2=@G/2+1) - PARALLEL
 SELECT @gaps = '100,250,500,800,1000,2000', @I=10;
  EXEC dbo.usp_stringsearch_test1 @rows, @puff, @gaps, @C, @L, @H, @max, 8,  @out, @I, 1;
  EXEC dbo.usp_stringsearch_test1 @rows, @puff, @gaps, @C, @L, @H, @max, 10, @out, @I, 1;

You can see from the results below that:

  1. Parallelism makes a huge difference
  2. The two reduction solution is the fastest
  3. The 8K and VARCHAR(MAX) versions perform almost identically
  4. With the one-reduction solutions the tuning parameter can be way too high or way too low; the two-reduction solution seems to perform better with the highest possible value for @G.

Note that, for modes 8 & 10 (the two-reduction test), I chose different (higher) values for the gap:

Conclusion

In Part 1 we learned how to develop a self-tuning function using a dynamic algorithm. Today we created a much faster version of our function leveraging fnTally. Things get nasty fast (in a good way) when we add make_parallel. Another victory for set-based SQL and the tally table. In Part 3 we’ll use these concepts to do something much cooler. Thanks for reading!

PerfML Part 1: The Self-Tuning Function

Is it possible for functions to make themselves faster? PerfML, a new performance-centric approach to writing code that leverages machine learning, makes this possible. No libraries required, just some old-school math.

Introduction

By the end of this article you should understand, and will have sample code for, what I call a self-tuning function – a function which can continuously tune itself for better performance over time. This is done without libraries or third-party code, only algorithms and concepts. What you may appreciate most about this approach to writing self-tuning functions is that you can apply it to any major programming language that supports side-effect free functions; especially functional programming (FP) languages, Python, .NET and pretty much any flavor of SQL. I came up with this concept while working on a version of the traveling salesperson problem for a logistics company. I quickly discovered that these ideas could be easily applied to common, everyday problems. Throughout this series we will apply this concept to a simple string and pattern search algorithm, then to some more complex but useful functions, all the way through the longest common subsequence between three strings, which is NP-Complete (for the general case of an arbitrary number of input sequences[1]).

By the way, welcome to my blog! I created the CodingBlackArts to introduce a whole new approach to writing modular, scalable, nasty fast code, language-agnostic. My background has been with Microsoft SQL Server but I’m experienced with various flavors of SQL and other declarative programming languages such as MDX and XSLT. Writing high-performing set-based SQL code is my second passion, but my first true love was functional programming (FP) and, when possible, I like to dabble with XSLT 3.0, Scala and Clojure. Sadly, I’ve never had an opportunity to focus on FP professionally so I’ve come up with ways to bring my favorite FP concepts into my SQL development: pure functions, higher-order functions, functions as first-class citizens, lazy sequences, modularity and even code as data. Some of this you will see throughout the series.

PerfML is a collection of ideas, concepts and algorithms intended to enable you to: (1) write super-fast, clean and deterministic functions, (2) teach your functions to make themselves faster, and (3) teach your functions how to teach themselves to be faster. The self-tuning function is a core component of PerfML. Perf – as in performance, ML as in machine learning. PerfML relies heavily on some new approximation algorithms, reductions and even a new kind of parameter I developed. This is the first of many articles on this topic. I’ve spent nearly seven years developing this new method of solving old problems and I’m excited to share it with you.

Disclaimers

First, functions are the center of the PerfML universe and at the top of that hierarchy is fnTally, a T-SQL function that produces an ordered lazy sequence of numbers similar to the range functions in Python and Clojure.

This article has the first iteration of code examples, which involve loops and T-SQL scalar user-defined functions (referred to herein as scalar UDFs). This is for learning purposes only; loops and scalar UDFs perform poorly in MS SQL Server. Loops are generally gross as a math concept in my opinion, especially when developing functions. For now it’s important to understand the concepts, especially if you are new to programming; we’ll address performance later in the series.

Second, as with all CodingBlackArts content, this article is a living document and will be updated frequently. Please forgive any spelling/grammar issues, sample code issues and such. My TOP(1) priority is to deliver my code, concepts and algorithms as quickly and frequently as possible rather than getting my grammar perfect. Please comment below or ping me with ideas, thoughts, complaints and suggestions.

Third, don’t be discouraged if you are new to programming or SQL specifically. The subject matter is a little advanced but I will do my best to present it in a way that as many developers and curious onlookers as possible can grasp. When I was new to programming I read a lot of articles that made little or no sense to me – running and playing with code was a great learning tool early on and still is.

The Requirement

Let’s begin with a basic string search algorithm, then turn it into a self-tuning function. The requirement is a function that searches a string (S) for the longest consecutive series of a specific character (C) that is between L and H characters long. Consider these T-SQL parameters and function call:

DECLARE @C CHAR(1)      = 'x',
        @S VARCHAR(MAX) = 'ABC.xxxyz.xxxx!!',
        @L BIGINT       = 3,
        @H BIGINT       = 6;

SELECT dbo.stringSearch_exact(@C,@S,@L,@H);

Per the above parameters, we need dbo.stringSearch_exact to return the length of the longest series of the letter “x” in the string “ABC.xxxyz.xxxx!!” that is between 3 and 6 characters long. The correct result is 4 because the text “xxxx” appears at position 11.

Solution 1: An Exact Algorithm

First for the most direct but least performant solution, leveraging a loop and a basic brute-force search algorithm. The routine will search for a series of the character @C that is @L characters long. If no match is found the routine returns zero (0). If a match does exist the routine begins a search for a series with a length of @H, decrementing @H by 1 until a match is found or until @H=@L.

-- KEY: Note that, with this algorithm: When @H=@L you know @L is the correct answer.
DECLARE @C CHAR(1)      = 'x',
        @S VARCHAR(MAX) = 'zzzabc123xxxxxxxxgggggggfffxxxxxxx',
        @L BIGINT       = 5,
        @H BIGINT       = 20;

PRINT CHAR(13)+'Function logic as a loop'+CHAR(13)+REPLICATE('-',90);
WHILE @L<=@H
BEGIN
  DECLARE @hIdx BIGINT = CHARINDEX(REPLICATE(@C,@H),@S);
  PRINT CONCAT('@H:',@H,'; Idx:',@hIdx,';-',IIF(@hIdx>0,' Hit',' No Hit'));

  IF CHARINDEX(REPLICATE(@C,@L),@S)=0 BEGIN SELECT  0; BREAK END; -- You done
  IF @L=@H                            BEGIN SELECT @H; BREAK END; -- You done
  IF CHARINDEX(REPLICATE(@C,@H),@S)>0 BEGIN SELECT @H; BREAK END; -- You done
  SET @H=@H-1; -- Lets try again (variable mutation bad - how dare me)
END;

There are a couple subtle but slick optimizations worth pointing out. First – if there isn’t a match, only one operation is performed. It would be easy to just start counting down from @H, performing O(H) operations to discover that there’s no match. Next, when @H-1=@L the routine skips the final iteration because it’s done. This means that we are spared one operation in worst-case scenarios. The former optimization is a big deal; the latter is a big deal when dealing with really big strings but not smaller ones. I point these optimizations out because they are both math victories. In programming I like to think of each math victory as points in a video game that help you advance to the next level. Even the small ones add up. I digress.

A PRINT statement is included to see what’s happening under the hood.

Output:

@H:20; Idx:0;  - No Hit
@H:19; Idx:0;  - No Hit
@H:18; Idx:0;  - No Hit
@H:17; Idx:0;  - No Hit
@H:16; Idx:0;  - No Hit
@H:15; Idx:0;  - No Hit
@H:14; Idx:0;  - No Hit
@H:13; Idx:0;  - No Hit
@H:12; Idx:0;  - No Hit
@H:11; Idx:0;  - No Hit
@H:10; Idx:0;  - No Hit
@H:9;  Idx:0;  - No Hit
@H:8;  Idx:10; - Hit

The parameters above require the routine to perform 13 iterations for the correct result. Things quickly get worse when the string gets longer and/or the distance between @L and @H increases. Try the routine above with different parameters and note the best- and worst-case scenarios. Below is our routine wrapped in a scalar UDF:

dbo.stringSearch_exact

CREATE FUNCTION dbo.stringSearch_Exact
(
  @C CHAR(1),      -- Search character
  @S VARCHAR(MAX), -- Input String   
  @L BIGINT,       -- Lower Boundary 
  @H BIGINT        -- Upper Boundary 
)
/*
Created by Alan Burstein 20200916
WARNING - SLOW CODE, INTENDED FOR LEARNING PURPOSES ONLY

Note the RETURNS NULL ON NULL INPUT. This has performance benefits and will 
prevent an infinite loop for when @H is NULL
*/
RETURNS BIGINT WITH SCHEMABINDING, RETURNS NULL ON NULL INPUT AS
BEGIN
  WHILE @L<=@H
  BEGIN  
    IF CHARINDEX(REPLICATE(@C,@L),@S)=0 RETURN 0;
    IF @L=@H                            RETURN @H;
    IF CHARINDEX(REPLICATE(@C,@H),@S)>0 RETURN @H;
    SET @H=@H-1;
  END
  RETURN NULL;
END;

We call dbo.stringSearch_Exact like this:

SELECT dbo.stringSearch_Exact('C', 'ABCCCCDEF', 1, 5); -- RETURNS: 4

One small thing to note is how the value of @H changes with each iteration. This is referred to as variable mutation, a programming anti-pattern, which is why this is the last time you will see it here. Variable mutation will be replaced with recursion in our next solution.

Solution 2: An Approximation Algorithm

What if we didn’t need an exact match? What if we introduced a margin of error, allowing the answer to be off by a factor of N (N being a user-supplied boundary)? Say we are seeking a match between 5 and 10 characters long, but it doesn’t have to be exact.

Enter the approximation algorithm. Approximations are a vital component of PerfML, consider this definition from Wikipedia:

In computer science and operations research, approximation algorithms are efficient algorithms that find approximate solutions to optimization problems (in particular NP-hard problems) with provable [deterministic] guarantees on the distance of the returned solution to the optimal one.[1]

Approximation Algorithms – Wikipedia

The intended class of problems is NP-hard, a class of problems most people outside the data science universe rarely deal with. Approximation algorithms were created for this class of problems but work great on the easier stuff, especially when brute force is involved. This needs to change; too many developers are missing out.

Let’s transform our exact algorithm into an approximation algorithm. For this we’ll use a gap parameter. A gap parameter (we’ll call it @Gap, or @G for short) allows us to decrease the size of a sequence (AKA set) by decrementing by the value of @G instead of 1. So, for example, when @L=1, @H=10, @G=1, the routine counts down from 10,9,8…2,1. That’s the exact algorithm from above (a static decrement of 1 instead of @G). If you set @G=2 the routine counts down – 10,8,6…

DECLARE @L   BIGINT       = 1,
        @H   BIGINT       = 20,
        @G   BIGINT       = 2;

PRINT CONCAT('@H:',@H,' (Start)');
WHILE @L<@H
BEGIN
  SET @H=IIF(@H-@G < @L, @L, @H-@G);
  PRINT CONCAT('@H:',@H);
END

To prevent the final iteration from being negative or less than @L:

SET @H=IIF(@H-@G<@L,@L,@H-@G)

Again, we still have the option of an exact result by setting the gap parameter to 1 (@G=1), or it can be greater. For brevity I’m not adding logic to prevent you from setting @G to less than 1, NULL, or something longer than your string, but keep that in mind for now (a minimal guard sketch follows below).
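
If you do want a guard, a minimal sketch could look like this (hypothetical; the functions in this series do not include it):

--==== Hypothetical guard sketch: clamp @G to a sane range before calling the routine
DECLARE @L BIGINT = 5, @H BIGINT = 20, @G BIGINT = 100; -- sample values

SET @G = CASE WHEN @G IS NULL OR @G < 1 THEN 1          -- never below 1
              WHEN @G > @H-@L+1         THEN @H-@L+1    -- never wider than the search range
              ELSE @G END;

SELECT ClampedG = @G; -- 16 for these sample values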

DECLARE @C CHAR(1)        = 'x',
        @S VARCHAR(MAX) = '123xxxyyyzzz123xxxxxxxxgggggggxxxxxxx',
        @L BIGINT       = 5,
        @H BIGINT       = 20,
        @G BIGINT       = 2;

PRINT CHAR(13)+'Function logic as a loop'+CHAR(13)+REPLICATE('-',90);
DECLARE @I BIGINT = 1; 

WHILE @L<=@H
BEGIN
  DECLARE @hIdx BIGINT = CHARINDEX(REPLICATE(@C,@H),@S);
  PRINT CONCAT('@H:',@H,'; Idx:',@hIdx,';-',IIF(@hIdx>0,' Hit',' No Hit'));

  IF CHARINDEX(REPLICATE(@C,@L),@S)=0 BEGIN SELECT RESULTS=0;  BREAK END;
  IF @L=@H                            BEGIN SELECT RESULTS=@H; BREAK END;
  IF CHARINDEX(REPLICATE(@C,@H),@S)>0 BEGIN SELECT RESULTS=@H; BREAK END;
  SELECT @H=@H-@G, @I=@I+@G;
END

To understand how different values for @G impact the workload consider:


The beauty here is how easy it is to calculate the work/accuracy trade-off: as @G increases, the workload is reduced while your margin of error increases. In the analysis above the cost/accuracy trade-off appears to diminish when @G=3. Here, the “sweet spot” is somewhere between 2 and 3. The sweet spot is, however, a constantly moving target; finding it is both art and science. The important thing is to understand the best/worst-case scenarios and the associated trade-offs.

In the world of Business Intelligence there is a saying, “if you can’t measure it, you can’t manage it.” For example, how could you manage your direct reports if you didn’t know how many you had? Once we go set-based with fnTally we’ll be able to capture real metrics from the execution plan to determine which parameter would have been best for the different kinds of data you are dealing with. Now, when we do a performance test, we aren’t only testing performance, we’re running a simulation! Black Mirror fans should be excited and nervous.

The Approximation Function

Our new logic is going to be in the form of a self-referencing recursive scalar UDF. I chose a recursive scalar UDF because:

  1. They allow for the cleanest, most lightweight, universal and portable syntax possible in T-SQL. Even programmers unfamiliar with SQL should be able to understand this code and apply it to the language of their choosing, provided they understand basic recursion (don’t feel bad if you don’t, just remember that recursion should cause you to think about recursion).

    Note that the logic for dbo.stringSearchV1 is only three lines.
  2. There will be good use of recursion later on leveraging recursive CTEs, which are a little more complicated; the simpler recursive scalar UDF will serve as a good warm-up.
  3. A slow scalar recursive UDF is a great way to demonstrate the power of the algorithm; in the test below we’ll look for matching patterns in strings millions and even billions of characters long; we’ll get our answer in seconds.
  4. There is barely any documentation available on recursive scalar UDFs in SQL and nothing anywhere about tail recursion in T-SQL functions – an important topic.
Approximation Function Version #1: dbo.stringSearchV1

CREATE OR ALTER FUNCTION dbo.stringSearchV1
(
  @C VARCHAR(100), -- Search String  
  @S VARCHAR(MAX), -- Input String   
  @L BIGINT,       -- Lower Boundary 
  @H BIGINT,       -- Upper Boundary 
  @G BIGINT        -- Gap Optimizer
)
/*
Created by Alan Burstein on 20200916
WARNING: SLOW FUNCTION - FOR LEARNING PURPOSES ONLY
*/
RETURNS BIGINT WITH RETURNS NULL ON NULL INPUT AS
BEGIN
  IF CHARINDEX(REPLICATE(@C,@L),@S)=0          RETURN 0;         -- Startup Predicate
  IF CHARINDEX(REPLICATE(@C,@H),@S)>0 OR @H=@L RETURN @H;        -- Final Recursive call
  RETURN dbo.stringSearchV1(@C,@S,@L,IIF(@H-@G<@L,@L,@H-@G),@G); -- Next Recursive call
END;

For each iteration where there is still no match, the function calls itself, reducing @H by @G with this formula:

IIF(@H-@G<@L,@L,@H-@G)

If, on the final iteration, @H-@G is less than @L (and therefore invalid), @L is used instead of @H-@G. E.g when @H=10, @L=1, @G=4: the pattern size evaluated would be 10,6,2, then 1. If @H=20, @L=10, @G=6: the sizes would be 20,14,10.

As mentioned earlier, and especially in the world of declarative programming (which includes T-SQL), performance is best when parameters and variables are immutable. WRONG:

SET @H = IIF(@H-@G < @L, @L, @H-@G)
RETURN dbo.stringSearchV1(@C, @S, @L, @H, @G)

Instead of mutating the value of @H the function recursively calls itself with new parameter values. CORRECT:

RETURN dbo.stringSearchV1(@C, @S, @L, IIF(@H-@G<@L,@L,@H-@G), @G)

Approximation Function Performance Test

Again, this article is to get you familiar with the algorithm; in Part 2 we’ll turbo-charge things with dbo.fnTally, a tally table function.

For our performance test we’re going to:

  1. Set up a sample string 21,470,000+ characters long, a good value for stress testing
  2. Create and populate a table variable (@gaps) with different values to assign to our gap parameter (@G)
  3. Run a performance test with each parameter value for @G
  4. Print the results for review

21,470,000 characters is good for stress testing; it’s ~1% of the VARCHAR(MAX) limit (2,147,483,647). Using the parameters below, dbo.stringSearchV1 takes roughly 2 seconds when @G=10. Since its performance is linear, I know (and have confirmed) that 214,700,000 characters will take ~20 seconds and 2.147 billion characters (the limit) takes roughly 200 seconds (~3.5 minutes). Not bad for a scalar UDF.

-- 1. Set up the sample string and parameters
DECLARE @X VARCHAR(MAX) = '?';
DECLARE @C VARCHAR(100) = 'x',
        @S VARCHAR(MAX) = '555xx'+@X+'xxxxxxx'+REPLICATE(@X,21470000)+'zzz123xxxggffxxx',
        @L BIGINT = 1, @H BIGINT = 20, @G BIGINT, @O BIGINT, @ST DATETIME;

-- 2. Create and populate a temp table with gap parameter(@G) values
DECLARE @gaps TABLE (G BIGINT PRIMARY KEY);
INSERT  @gaps VALUES(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(20),(50),(100),(500),(1000);

-- 3. Start the test
PRINT 'APPROXIMATION TEST'+CHAR(13)+REPLICATE('-',90);
WHILE EXISTS (SELECT * FROM @gaps)
BEGIN
  SELECT TOP(1) @G = g.G FROM @gaps AS g ORDER BY G;
  DELETE FROM @gaps WHERE g=@G;

    SELECT @ST=GETDATE();  -- Set the test start time
    SELECT L = reduce.L FROM (VALUES(dbo.stringSearchV1(@C,@S,@L,@H,@G))) AS reduce(L);
-- 4. Print the results
  PRINT CONCAT(@G,': ',DATEDIFF(MS,@st,GETDATE()));
END;

Approximation Function Test Results

I colored the values green for what I consider the best performance/accuracy trade-offs. The slowest, but correct, answer is 7 at 4.3 seconds for an exact result (@G=1). 2 seconds to determine that the value is between 5 and 9 (@G=5), 1 second to determine that it’s between 4 & 11 (@G=8) and 0.5 seconds to determine that it’s between 1 & 20 (@G=20.) This is another example of how to trade accuracy for performance in a very predictable fashion.

Solution 3: The Dynamic Algorithm

Can we eat our cake and have it too? Can we enjoy the performance benefits of the approximation algorithm without compromising on accuracy? What if we combined our approximation with an exact algorithm?

This is where things get good. Let’s revisit our approximation performance test and make a quick change. Let’s use a simplified string and set @G=5, which will assign the lowest allowable value to @Low. Next we assign the highest possible value to @High, which is @Low+@G-1.

DECLARE @C VARCHAR(100) = 'x',
        @S VARCHAR(MAX) = '999xxxxxxxABC123',
        @L BIGINT = 1, @H BIGINT = 20, @G BIGINT = 5; -- Gap Parameter set to 5

-- 1. Reduce the search area leveraging the approximation algorithm:
DECLARE @Low  BIGINT = dbo.stringSearchV1(@C,@S,@L,@H,@G); -- Approximation algorithm
DECLARE @High BIGINT = @Low+@G-1;

-- 2. Exact Algorithm
SELECT LowerBound = @Low, UpperBound = @High, Exact = dbo.stringSearchV1(@C,@S,@Low,@High,1);

Results:



Using the exact algorithm (@G=1) would require 13 iterations to determine the correct answer is 7. Seeking an approximation, accurate to within 4, we set @G=5. First we perform 4 iterations to determine that the search item is at least 5 but no more than 9. Next we make a second function call with @L=5, @H=9, and set @G=1 for an exact result, which takes 3 iterations. That’s a total of 7 iterations instead of 13! We just reduced the work by almost 1/2 without any loss of accuracy. That’s huge! Data Scientists and Complexity Theory nerds refer to this type of optimization as a combinatorial optimization leveraging a dynamic algorithm. A dynamic algorithm can be defined as:

A general class of algorithms which solve problems by solving smaller versions of the problem, saving the solutions to the small problems and then combining them to solve the larger problem.

The trick is to choose the correct value for @G or, better yet, write a routine that picks the best value for you. Consider the chart below, which shows the worst-case scenario for when @L=1 and @H=20. “@G” is the gap parameter value from 1-10, “Approx” represents the maximum number of approximation iterations, and “Exact” represents the maximum number of exact iterations. “Total” combines Approx+Exact for the worst-case scenario. The worst case will always be when @G=1; things improve as @G gets higher.
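
The chart itself isn’t reproduced here, but a rough sketch that estimates it can be generated with fnTally; the worst-case formulas below (CEILING((@H-@L)/@G)+1 approximation iterations plus @G exact iterations over the reduced range) are my assumptions, not values taken from the chart:

--==== Rough worst-case iteration estimate per gap value (assumed formulas; @L=1, @H=20)
DECLARE @L BIGINT = 1, @H BIGINT = 20;

SELECT   [@G]   = g.N,
         Approx = CEILING((@H-@L)*1.0/g.N) + 1,       -- assumed worst-case approximation iterations
         Exact  = g.N,                                -- assumed worst-case exact iterations over the reduced range
         Total  = CEILING((@H-@L)*1.0/g.N) + 1 + g.N
FROM     dbo.fnTally(1,10) AS g
ORDER BY g.N;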

Notice that 4 and 5 appear to be the sweet spot. This does not mean that when @G=4 or @G=5 the query is guaranteed to be the fastest, only that they are most likely to be the best values. Let’s run our performance test from earlier, but this time we’ll leverage our new dynamic algorithm for an exact result.

Dynamic Algorithm Performance Test

DECLARE @X VARCHAR(MAX) = '?';
DECLARE @C VARCHAR(100) = 'x',
        @S VARCHAR(MAX) = '555xx'+@X+'xxxxxxx'+REPLICATE(@X,21470000)+'zzz123xxxggffxxx',
        @L BIGINT = 1, @H BIGINT = 20, @G BIGINT, @O BIGINT, @ST DATETIME;

PRINT 'APPROXIMATION TEST'+CHAR(13)+REPLICATE('-',90);
DECLARE @gaps TABLE (G BIGINT PRIMARY KEY);

PRINT 'EXACT APPROXIMATION (DYNAMIC)'+CHAR(13)+REPLICATE('-',90);
INSERT  @gaps VALUES(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(15),(20),(30);

WHILE EXISTS (SELECT * FROM @gaps)
BEGIN
  SELECT TOP(1) @G = g.G FROM @gaps AS g ORDER BY G;
  DELETE FROM @gaps WHERE g=@G;

  SELECT @ST=GETDATE();  -- Start the test
    DECLARE @Low  BIGINT = dbo.stringSearchV1(@C,@S,@L,@H,@G); -- Approximation
    DECLARE @High BIGINT = @Low+@G-1;

    SELECT dbo.stringSearchV1(@C,@S,@Low,@High,1); -- @G=1: Exact

  PRINT CONCAT(@G,': ',DATEDIFF(MS,@st,GETDATE())); -- Print the results
END;

Dynamic Algorithm Performance Test Results:

With the final, dynamic algorithm it appears that 4 and 6 were the best for this specific problem. We were able to determine this by running a test against a single string while trying out different values for our gap parameter (@G). In Part 2 I will show you how to automate the selection of the best value for @G so as to transform our function from one we’ve tuned manually into one that tunes itself. See you there!

References

https://www.geeksforgeeks.org/np-completeness-set-1/