Solving NP-Complete Problems Faster Part 1 – Sequence Reduction

Disclaimer

This article is a little more advanced than previous ones but will still be useful to programmers of all skill levels. To get the most from this article I suggest running the attached code locally, then reading what I have to say about it. I rushed to get this out so there may be typos. I’ll let my code do a lot of the talking.

Functions used in this article:

Introduction

In my very first article I mentioned that I was going to leverage the techniques detailed in this blog to present a solution to the Longest Common Subsequence (LCSQ) between three strings, which is an NP-Complete [1] problem. In April 2013 I posted an article titled, A Set-Based Solution to the Longest Common Subsequence Problem. It was my first attempt at solving this problem using a lazy sequence (AKA tally table or numbers table) exclusively, meaning no loops, cursors, temp tables or other procedural/iterative techniques. It became my most popular article even though my solution was wrong, as a reader kindly pointed out. Instead of taking the article down I tried to develop a correct, purely set-based LCSQ function and update the article once I did. I tried and tried to no avail. This was infuriating because, though not easy, the problem seemed simple enough to solve with 10-20 hours of effort, 30 tops. For two years I would periodically take a swing at it as it was such an intriguing problem. This is when/how I learned about NP Complexity, Complexity Theory and the unsolved math problem known simply as P vs NP.

Long story short, if you can come up with an algorithm that solves any NP-Complete problem in deterministic polynomial time, you have proven that P=NP and, in theory, can be awarded $1,000,000. Alternatively, if you can prove that no such algorithm is possible, that P != NP, you will also get paid. Most people who study this problem, roughly 99%, believe P != NP. I am in the tiny minority that believes otherwise, in part, for the reasons discussed here (6:30 minutes in).

25 years of programming has taught me that the tools our math teachers and professors have given us are not adequate to solve complex problems quickly, only quickly enough. The purpose of this entire blog is to introduce new tools that empower developers to solve any programming problem faster, including NP-Hard and NP-Complete ones, and to do so in almost any programming language. The code here is in T-SQL because it's what I know best, but I intend to re-write the code in this series using Python and JavaScript.

What’s ahead

In this installment we’re going to:

  1. Use a Longest Common Subsequence (LCSQ) function for some basic AI as we determine what someone meant to say.
  2. Learn how to speed up any LCSQ function (two or more strings), written in any programming language, without changing any code, leveraging what I have coined: Sequence Reduction.
  3. Introduce a set-based solution to the LCSQ between two strings (LCSQ2) and three strings (LCSQ3)

Unless someone can demonstrate otherwise, I will aver that the LCSQ3 function posted below is the first purely set-based, immutable, loop- and recursion-free solution, leveraging nothing more than a lazy sequence and some new ideas. Better yet, our set-based function is going to give you:

  1. A deterministic result
  2. The length and text of the longest common subsequence
  3. One row per subsequence when there are ties
  4. The locations where the subsequence(s) start and end, and their pattern
  5. A tuning parameter (discussed here) that allows you to transform the function into an approximation or heuristic, or a combination of the three for a much, much faster, deterministic result

Let’s do a quick demo:

DECLARE @S1  VARCHAR(8000) = 'XA.BCD123',
        @S2  VARCHAR(8000) = 'ABC-D-12Z3',
        @S3  VARCHAR(8000) = '12DABCYP123';

SELECT lcsq3.* FROM samd.LCSQ3_AJB_POC(@S1,@S2,@S3,1) AS lcsq3; --==== Longest Common Subsequence – three strings
Results:
LCSQ3 Results

The compressed length, 6 in this case, represents the length of the longest common subsequence between the strings XA.BCD123, ABC-D-12Z3 and 12DABCYP123. S1_SfxIdx is where the subsequence begins, and the compression key represents the pattern of the sequence. Consider this table, which will help you visualize how the compression key works:

The function's name ends with _POC, for Proof of Concept; I wanted to show that it can be done. In Part 2 we'll dive deep into how to make our LCSQ functions even faster leveraging techniques I've covered in earlier articles, including finite opposite numbers, indexed functions, combinatorial index optimization and much more. Throughout this series we'll learn how to make your function faster the more CPUs you throw at it. Better yet, we're going to teach your LCSQ3 function to teach itself to solve problems faster. From here on in, I intend to talk as little as possible and let my portable, declarative code do my talking for me.

Playing along at home

Attached in the intro is all the code used in this series. There are 31 functions total, which you'll become more familiar with as this series progresses. There are ~1,800 lines of code but 90% of it is inline documentation. Please feel free to load it locally and play mad data scientist. Remember, however, to give credit where credit is due.

The code is formatted in a way that you can go to Edit > Outlining > Toggle all Outlining to collapse the code for a high-level view of what’s included.

Using the Longest Common Subsequence for some basic AI

The challenge

Black Mirror is my favorite show of all time. In the White Christmas episode we learned about cookies. The irony here is that we're using N-Grams to analyze the names of characters whose digital consciousness was transferred to "the cloud" using a recaller to extract and upload their memory ngrams. I digress.

We have three names in a table named @cookie, containing the ID and name of three Black Mirror characters. The second table, @unmatched, contains the names of each character spelled incorrectly. We need to match each misspelled name with the best guess for what the user was trying to spell.

Cookie Sample data

DECLARE @cookie TABLE (CookieId INT IDENTITY, CookieName VARCHAR(20));
INSERT  @cookie VALUES('Joe Porter'),('Mr. Matt Trent'),('Greta Cookie');

DECLARE @unmatched TABLE (Unmatched VARCHAR(30));
INSERT  @unmatched VALUES('Mr. Matthew Trent'),('Gretta Cookie'),('Joey Porter');

Get the similarity between all possible combinations

First, a basic cross join (Cartesian product) to get the LCSQ between each possible combination. We are using Phil Factor's LCSQ function to extract the LCSQ between each pair of names.

--==== How to get the similarity between all strings
SELECT      
  UnmatchedName = u.Unmatched, 	
  QualifiedName = c.CookieName,
  LCSq          = dbo.LongestCommonSubsequence(c.CookieName,u.Unmatched)
FROM        @cookie    AS c
CROSS JOIN  @unmatched AS u;
Results:
UnmatchedName       QualifiedName        LCSq
------------------- -------------------- -----------------
Mr. Matthew Trent   Joe Porter           e rt
Mr. Matthew Trent   Mr. Matt Trent       Mr. Matt Trent
Mr. Matthew Trent   Greta Cookie         rete
Gretta Cookie       Joe Porter           e oe
Gretta Cookie       Mr. Matt Trent       rtt e
Gretta Cookie       Greta Cookie         Greta Cookie
Joey Porter         Joe Porter           Joe Porter
Joey Porter         Mr. Matt Trent       rtr
Joey Porter         Greta Cookie         e oe

Next, we can use our Bernie8K function to calculate the similarity between the two strings by dividing the LCSQ length by L2.

--== 1.1.2. How to get the similarity between all strings
SELECT
  UnmatchedName = u.Unmatched, 
  QualifiedName = c.CookieName,
  LCSq          = lcsq.Txt,
  LCSqLength    = Txt.Ln,
  Similarity    = CAST(1.*Txt.Ln/b.L2 AS DECIMAL(3,3))
FROM        @cookie                                           AS c
CROSS JOIN  @unmatched                                        AS u
CROSS APPLY samd.bernie8k(c.CookieName,u.Unmatched)           AS b
CROSS APPLY (VALUES(dbo.LongestCommonSubsequence(b.S1,b.S2))) AS lcsq(Txt)
CROSS APPLY (VALUES(LEN(lcsq.Txt)))                           AS Txt(Ln);
Results:
UnmatchedName        QualifiedName        LCSq               LCSqLength  Similarity
-------------------- -------------------- ------------------ ----------- -----------
Mr. Matthew Trent    Joe Porter           e rt               4           0.235
Mr. Matthew Trent    Mr. Matt Trent       Mr. Matt Trent     14          0.824
Mr. Matthew Trent    Greta Cookie         rete               4           0.235
Gretta Cookie        Joe Porter           e oe               4           0.308
Gretta Cookie        Mr. Matt Trent       rtt e              5           0.357
Gretta Cookie        Greta Cookie         Greta Cookie       12          0.923
Joey Porter          Joe Porter           Joe Porter         10          0.909	
Joey Porter          Mr. Matt Trent        rt                3           0.214
Joey Porter          Greta Cookie         e oe               4           0.333

Now let's get the best suggestion for each misspelled name. For this we'll wrap our logic in a subquery then, using TOP(1) WITH TIES, we can get the best suggestion for each with this ORDER BY clause.

ORDER BY ROW_NUMBER()
OVER (PARTITION BY similarity.Unmatched ORDER BY similarity.Sim DESC);
--== 1.1.3. Get the best suggestion
SELECT TOP(1) WITH TIES similarity.*
FROM
(
  SELECT
    Unmatched  = u.Unmatched, 
    Qualified  = c.CookieName,
    LCSq       = lcsq.Txt,
    LCSqLength = Txt.Ln,
    Sim        = CAST(1.*Txt.Ln/b.L2 AS DECIMAL(3,3))
  FROM        @cookie                                           AS c
  CROSS JOIN  @unmatched                                        AS u
  CROSS APPLY samd.bernie8k(c.CookieName,u.Unmatched)           AS b
  CROSS APPLY (VALUES(dbo.LongestCommonSubsequence(b.S1,b.S2))) AS lcsq(Txt)
  CROSS APPLY (VALUES(LEN(lcsq.Txt)))                           AS Txt(Ln)
) AS Similarity  
ORDER BY ROW_NUMBER() 
 OVER (PARTITION BY similarity.Unmatched ORDER BY similarity.Sim DESC);
Results:
UnmatchedName      QualifiedName    LCSq            LCSqLength  Similarity
------------------ ---------------- --------------- ----------- ----------
Mr. Matthew Trent  Mr. Matt Trent   Mr. Matt Trent  14          0.824
Gretta Cookie      Greta Cookie     Greta Cookie    12          0.923
Joey Porter        Joe Porter       Joe Porter      10          0.909

And there you have it! We just wrote a query that figured out (presumably) what name the user intended to type. Hopefully this demystifies what's going on behind the scenes with stuff like spell-check and auto-suggestion algorithms.

Sequence Reduction

As mentioned earlier, the LCSQ problem is NP-Hard in the general case and LCSQ3 is NP-Complete. We'll use sequence reduction against two strings so that you can calculate similarity faster, then against three strings to fulfill my promise to demonstrate how to "solve problems in NP without any code changes."

First the LCSQ between two strings. The longest common subsequence between the strings: “AWWWDABGAGWAWADADAXAAWA” and “BBAXZZXXXBZBGZGZZZZGGGZ” is ABGG. I colored the first instance of ABGG in red for each string. Leveraging sequence cleaning we can reduce the characters in the set and get the correct answer by comparing AABGAGAAAAXAAA & BBAXXXXBBGGGGG. Leveraging sequence compression, we get the correct answer by comparing AABGAGAAAAXAA & BAXXBBGGGG. Sequence reduction allows us to get the correct answer by comparing ABGAGAXA & BAXBGG.

This chart will help you visualize the savings we enjoy for each technique.

The original pair of strings are each 23 characters long; the pair of strings transformed via sequence reduction are 8 and 6 characters long. To be clear: in this example we can calculate the longest common subsequence between two 23-character strings by shrinking the size of the input, for a dramatic performance gain, while still guaranteeing the correct answer. Furthermore, since we're only using functionally pure, set-based code, our sequence reduction functions run in nasty fast time (0MS).

In the example above we reduce the original input, which totals 46 characters, to 14 characters leveraging sequence reduction. Here, we've reduced the complexity of LCSQ2 from O(2^46) to O(2^14); that's a drop from roughly 70 trillion candidate subsequences to about 16 thousand. That's good news for any exact algorithm, approximation, or heuristic. Now let's review how to clean, compress, and reduce a set of two or more strings.

1. Sequence Cleaning

Sequence cleaning is performed by removing characters that exist in one string but not the other (orphans). Predicate logic dictates that characters that exist in only one string cannot be part of any common subsequence. Applying this concept to our strings, these:

Become:

The W’s, D’s and Z’s have been removed. Simple enough, eh?
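If you want to see just the cleaning step in isolation, here's a minimal sketch that hard-codes the orphan characters for this example; the real work of discovering them is what samd.SequenceGroup8K and samd.SequenceExtract8K do, as shown later in this article.

--==== Minimal cleaning sketch; orphan characters are hard-coded for this example only
DECLARE @S1 VARCHAR(8000) = 'AWWWDABGAGWAWADADAXAAWA', -- orphans: W, D
        @S2 VARCHAR(8000) = 'BBAXZZXXXBZBGZGZZZZGGGZ'; -- orphans: Z

SELECT S1_Cleaned = REPLACE(REPLACE(@S1,'W',''),'D',''), -- AABGAGAAAAXAAA
       S2_Cleaned = REPLACE(@S2,'Z','');                 -- BBAXXXXBBGGGGG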

2. Sequence Compression

Taking this concept to the next level, let's say string1 contains only one "A" and string2 contains three. If "A" is part of the LCSQ, and only one "A" exists in string1, then only one of the three A's in string2 will be matched. It's not possible to determine which "A" will be included in the common subsequence, but there is a way to determine, in some instances, which ones can be removed. Enter Sequence Compression.

Consider the LCSQ between “ABC“ and each of these: “ABAAC”, “ABBBCA”, “ABCCDC”. “ABC” contains only one A, one B and one C. Therefore any consecutive series of A’s, B’s and/or C’s can be compressed to one. Compare the before and after for these three “sequence compressed” strings.

Sequence compression can be applied whenever one of the strings in the set has more of any specific character than the other strings in the set. When comparing ABCB to ABBBBC we can reduce the second string to ABBC: there are only two B's in the first string, so BBBB can become BB. Think of sequence compression as an advanced form of sequence cleaning. Applying this concept to our strings, these:

…are transformed into these:

Did you notice that the letter D was removed (cleaned) from the third string? The letter D is an orphan and is compressed down to zero occurrences. With sequence compression, sequence cleaning occurs organically: you are both cleaning orphan characters and compressing islands of repeated characters (when possible).
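Here's a minimal gaps-and-islands sketch of the idea (mine, not the attached samd.SequenceCompressAll8K_NG function). It compresses ABBBBC against ABCB: each run of a character in @S2 is trimmed down to the number of times that character appears in @S1, and orphans drop out along the way.

--==== Sketch only: compress runs in @S2 down to the per-character counts found in @S1
DECLARE @S1 VARCHAR(8000) = 'ABCB',   -- character counts: A=1, B=2, C=1
        @S2 VARCHAR(8000) = 'ABBBBC'; -- the string we want to compress

WITH
S2Chars AS
( --==== split @S2 into (position, character) pairs
  SELECT t.N, Token = SUBSTRING(@S2,t.N,1)
  FROM  (SELECT TOP (LEN(@S2)) N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
         FROM sys.all_columns) AS t
),
S1Counts AS
( --==== per-character counts in @S1
  SELECT Token = SUBSTRING(@S1,t.N,1), Total = COUNT(*)
  FROM  (SELECT TOP (LEN(@S1)) N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
         FROM sys.all_columns) AS t
  GROUP BY SUBSTRING(@S1,t.N,1)
),
Islands AS
( --==== gaps & islands: identify runs of consecutive identical characters in @S2
  SELECT c.N, c.Token,
         Grp = c.N - ROW_NUMBER() OVER (PARTITION BY c.Token ORDER BY c.N)
  FROM   S2Chars AS c
)
SELECT S2_Compressed = STRING_AGG(k.Token,'') WITHIN GROUP (ORDER BY k.N) -- ABBC
FROM
(
  SELECT i.N, i.Token,
         rn = ROW_NUMBER() OVER (PARTITION BY i.Token, i.Grp ORDER BY i.N)
  FROM   Islands AS i
) AS k
JOIN  S1Counts AS s1
  ON  s1.Token = k.Token  -- orphans (characters missing from @S1) drop out here
WHERE k.rn <= s1.Total;   -- keep at most as many characters per run as @S1 contains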

So, with sequence compression we’re cleaning and compressing strings at the same time. That’s a good thing but it’s also a bad thing. What we really want is to clean then compress our strings! Why? Because cleaning leads to better compression. Enter Sequence Reduction.

3. Clean then Compress = Sequence Reduction

Sequence Reduction is performed by cleaning your string first, then performing sequence compression. Consider these queries:

DECLARE @S1 VARCHAR(8000) = 'AWWWDABGAGWAWADADAXAAWA',
        @S2 VARCHAR(8000) = 'BBAXZZXXXBZBGZGZZZZGGGZ',
        @A  VARCHAR(256)  = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';

;--==== No Reduction
SELECT S1 = @S1, S2 = @S2, L1 = LEN(@S1), L2 = LEN(@S2), 
       LCSS = dbo.LongestCommonSubsequence(@S1,@S2);

;--==== Cleaning Only
SELECT sc.S1, sc.S2, L1 = LEN(sc.S1), L2 = LEN(sc.S2),
       LCSS = dbo.LongestCommonSubsequence(sc.S1,sc.S2)
FROM   samd.SequenceClean8KLight(@S1,@S2) AS sc;-- AABGAGAAAAXAAA BBAXXXXBBGGGGG


;--==== Cleaning AND Compression
SELECT sr.S1_New, sr.S2_New, L1 = LEN(sr.S1_New), L2 = LEN(sr.S2_New), 
       LCSS = dbo.LongestCommonSubsequence(sr.S1_New,sr.S2_New)
FROM   samd.SequenceCompressAll8K_NG(@S1,@S2,@A,0) AS sr; -- AABGAGAAAAXAA BAXXBBGGGG

;--==== Clean THEN Compress (Proper Sequence Reduction: Cleaning THEN Compression) 
SELECT sr.S1_New, sr.S2_New, L1 = LEN(sr.S1_New), L2 = LEN(sr.S2_New), 
       LCSS = dbo.LongestCommonSubsequence(sr.S1_New,sr.S2_New)
FROM samd.SequenceReduce8K_NG(@S1,@S2,@A,0) AS sr --ABGAGAXA BAXBGG

The chart below details which function is responsible for which transformation.

For brevity, in this article I will focus on sequence cleaning; we'll explore sequence compression in Part 2 and sequence reduction in Part 3.

Sequence Cleaning with samd.SequenceClean8KLight

samd.SequenceClean8KLight is simple; it's built from three simple functions:

  1. samd.SequenceGroup8K
  2. samd.SequenceExtract8K
  3. samd.PatReplace8K

samd.SequenceGroup8K, our sequence grouper, identifies characters that exist in one string but not the other. samd.SequenceExtract8K returns a distinct set of unmatched characters as a string. That string is then passed to PatReplace8K and all unmatched characters are removed from both strings.

Sequence Grouping is the process of returning all characters in two or more strings along with a count of each character for each string compared. This is how we can determine why two strings are different. Consider this query:

SELECT sg.* FROM samd.SequenceGroup8K('AABDE23', 'ACC123X') AS sg;
Returns:
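For reference, here are the per-character counts for 'AABDE23' and 'ACC123X' worked out by hand; the output of samd.SequenceGroup8K should reflect these counts, though its exact column names and ordering may differ.

Token  S1  S2
-----  --  --
1      0   1
2      1   1
3      1   1
A      2   1
B      1   0
C      0   2
D      1   0
E      1   0
X      0   1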

The first string (S1) contains one "B" and the second string (S2) doesn't contain any. The letter "C" appears twice in S2 but not at all in S1. Any character that has a 0 count under S1 or S2 can be removed.

Sequence Extraction is the process of extracting a distinct set of common (matched) characters (i.e. characters that appear in both strings) or a distinct set of unmatched characters. The results are returned in the form of a string for consumption by samd.PatExclude8K. Consider these two queries:

DECLARE @S1 VARCHAR(8000) = 'AABDE23',
        @S2 VARCHAR(8000) = 'ACC123X';

SELECT se.* FROM samd.SequenceExtract8K(@S1,@S2,0) AS se; -- Returns: 1BCDEX
SELECT se.* FROM samd.SequenceExtract8K(@S1,@S2,1) AS se; -- Returns: 23A

The first returns the orphans: 1BCDEX; the second query returns the common or matched characters: 23A. The matched characters represent your alphabet; we’ll discuss this later in the series.

Sequence Cleaning, as mentioned earlier, is the process of removing unmatched characters from both strings, a vital step in performing Sequence Reduction. For this article I'll be using samd.SequenceClean8KLight. It is the barebones version of a more advanced sequence cleaning function. Let's clean @S1 and @S2.

DECLARE @S1 VARCHAR(8000) = 'AABDE23',
        @S2 VARCHAR(8000) = 'ACC123X';

SELECT S1_original = @S1,
       S2_original = @S2,
       S1_New      = sc.S1,
       S2_New      = sc.S2
FROM   samd.SequenceClean8KLight(@S1,@S2) AS sc;
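For reference, the orphans in this pair are 1, B, C, D, E and X, so (assuming the aliases above) the output should look like this:

S1_original  S2_original  S1_New  S2_New
-----------  -----------  ------  ------
AABDE23      ACC123X      AA23    A23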

Next, the underlying programming logic behind each function.

Set-Based Grouping, Extraction and Cleaning

Set-based, functionally pure code is vital. Loops, recursion, mutable variables/data structures, and impure functions will ruin everything. NGrams8k has your back.

1. Sequence Grouping via samd.SequenceGroup8K

For this article we'll use samd.SequenceGroup8K. It's nasty fast but not the fastest; we'll introduce a much better sequence grouper later in this series. I started with this version as it's the easiest to understand.

DECLARE @S1 VARCHAR(8000) = 'AABDE23',
        @S2 VARCHAR(8000) = 'ACC123X';

SELECT 
  Token  = ISNULL(S1.Token,S2.Token),
  S1     = f.S1,
  S2     = f.S2,	
  Total  = ABS(f.S1-f.S2) -- Diffs
FROM
( --==== Split @S1 into unigrams
  SELECT   ng.Token, Total = COUNT(*)
  FROM     samd.ngrams8k(@S1,1) AS ng
  GROUP BY ng.Token --==== Group and count @S1 unigrams
) AS S1
FULL JOIN
( --==== Split @S2 into unigrams	
  SELECT   ng.Token, Total = COUNT(*)
  FROM     samd.ngrams8k(@S2,1) AS ng
  GROUP BY ng.Token      --==== Group and count @S2 unigrams
) AS S2 
  ON S1.Token = S2.Token --==== Find matching unigrams
CROSS APPLY (VALUES(ISNULL(S1.Total,0),ISNULL(S2.Total,0))) AS f(S1,S2); --==== Pre-Agg

The logic is simple provided you are comfortable with full outer joins. We get a count of each character with a simple GROUP BY, then join on the character, returning a 0 when there isn't a match. If we wanted to perform this operation on three or more strings, we would simply add another subquery per additional string, and the ISNULL(S1.Token,S2.Token) would be replaced with COALESCE(S1.Token, S2.Token, S3.Token…). We will do this later while performing sequence cleaning on three strings.
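Here's a hedged sketch of what that three-string version might look like; the @S3 value is made up for illustration and the attached functions may structure this differently.

--==== Sketch: extending the sequence group to three strings (illustrative only)
DECLARE @S1 VARCHAR(8000) = 'AABDE23',
        @S2 VARCHAR(8000) = 'ACC123X',
        @S3 VARCHAR(8000) = 'AB123'; -- hypothetical third string

SELECT
  Token = COALESCE(S1.Token, S2.Token, S3.Token),
  S1    = ISNULL(S1.Total,0),
  S2    = ISNULL(S2.Total,0),
  S3    = ISNULL(S3.Total,0) --==== a 0 in any column disqualifies the character
FROM
( --==== Group and count @S1 unigrams
  SELECT   ng.Token, Total = COUNT(*)
  FROM     samd.ngrams8k(@S1,1) AS ng
  GROUP BY ng.Token
) AS S1
FULL JOIN
( --==== Group and count @S2 unigrams
  SELECT   ng.Token, Total = COUNT(*)
  FROM     samd.ngrams8k(@S2,1) AS ng
  GROUP BY ng.Token
) AS S2
  ON S1.Token = S2.Token
FULL JOIN
( --==== Group and count @S3 unigrams
  SELECT   ng.Token, Total = COUNT(*)
  FROM     samd.ngrams8k(@S3,1) AS ng
  GROUP BY ng.Token
) AS S3
  ON S3.Token = COALESCE(S1.Token, S2.Token);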

2. Sequence Extraction Logic

Now the sequence extractor. This guy takes two input strings, performs a sequence grouping, then leverages STRING_AGG to return the matched or unmatched characters as a string.

Logic:
SELECT Pattern = STRING_AGG(sg.Token,'') -- Order doesn't matter; no need for WITHIN GROUP
FROM   samd.SequenceGroup8K(@S1,@S2) AS sg
WHERE  SIGN(sg.S1*sg.S2) = 0; -- 0 for unmatched, 1 for matched

The result set for these queries will help you better understand samd.SequenceExtract8K.

DECLARE @S1 VARCHAR(8000) = 'ABC123AABBCC##',
          @S2 VARCHAR(8000) = 'ABCXXYYYZZ123CC';

SELECT sg.* FROM samd.SequenceGroup8K(@S1,@S2) AS sg;

SELECT se.Pattern FROM samd.SequenceExtract8K(@S1,@S2,0) AS se;
SELECT se.Pattern FROM samd.SequenceExtract8K(@S1,@S2,1) AS se;
Results:

3. Sequence Cleaning Logic

The most straightforward way to accomplish this is to use samd.SequenceExtract8K to extract a distinct string of unmatched characters and pass it to PatReplace8K as a pattern of characters to replace with nothing (i.e. remove).

3.1. PatReplace8K version logic

SELECT
  S1 = S1.NewString,
  S2 = S2.NewString
FROM        samd.SequenceExtract8K(@S1,@S2,0)                    AS se
CROSS APPLY samd.patReplace8K(@S1,STUFF('[]',2,0,se.Pattern),'') AS S1
CROSS APPLY samd.patReplace8K(@S2,STUFF('[]',2,0,se.Pattern),'') AS S2;

3.2. TRANSLATE version logic

Included with the attached code is a T-SQL TRANSLATE version of SequenceReduce8KLight named SequenceReduce8KLight_T2. If you are on SQL Server 2017+ you may want to try it out; it performs a wee bit better than the PatReplace8K version.

SELECT
  S1 = S1.NewString,
  S2 = S2.NewString
FROM        samd.SequenceExtract8K(@S1,@S2,0)                             AS se
CROSS APPLY (VALUES(REPLICATE(CHAR(2),LEN(se.Pattern))))                  AS t(Pat)
CROSS APPLY (VALUES(REPLACE(TRANSLATE(@S1,se.Pattern,t.Pat),CHAR(2),''))) AS S1(NewString)
CROSS APPLY (VALUES(REPLACE(TRANSLATE(@S2,se.Pattern,t.Pat),CHAR(2),''))) AS S2(NewString);

Applying sequence cleaning to our Black Mirror cookie example

Below is the original logic without cleaning; the second example demonstrates how to clean your strings before they are passed to the LCSQ function.

Original Logic

SELECT TOP(1) WITH TIES similarity.*
FROM
(
  SELECT
    Unmatched  = u.Unmatched, 
    Qualified  = c.CookieName,
    LCSq       = lcsq.Txt,
    LCSqLength = Txt.Ln,
    Sim        = CAST(1.*Txt.Ln/b.L2 AS DECIMAL(3,3))
  FROM        @cookie                                           AS c
  CROSS JOIN  @unmatched                                        AS u
  CROSS APPLY samd.bernie8k(c.CookieName,u.Unmatched)           AS b
  CROSS APPLY (VALUES(dbo.LongestCommonSubsequence(b.S1,b.S2))) AS lcsq(Txt)
  CROSS APPLY (VALUES(LEN(lcsq.Txt)))                           AS Txt(Ln)
) AS Similarity  
ORDER BY ROW_NUMBER() 
 OVER (PARTITION BY similarity.Unmatched ORDER BY similarity.Sim DESC);

Sequence cleaning added

SELECT TOP(1) WITH TIES similarity.*
FROM
(
  SELECT
    Unmatched  = u.Unmatched, 
    Qualified  = c.CookieName,
    LCSq       = lcsq.Txt,
    LCSqLength = Txt.Ln,
    Sim        = CAST(1.*Txt.Ln/b.L2 AS DECIMAL(3,3))
  FROM        @cookie                                           AS c
  CROSS JOIN  @unmatched                                        AS u
  CROSS APPLY samd.bernie8k(c.CookieName,u.Unmatched)           AS b
  CROSS APPLY samd.SequenceClean8KLight(b.S1,b.S2)              AS r
  CROSS APPLY (VALUES(dbo.LongestCommonSubsequence(r.S1,r.S2))) AS lcsq(Txt)
  CROSS APPLY (VALUES(LEN(lcsq.Txt)))                           AS Txt(Ln)
) AS Similarity  
ORDER BY ROW_NUMBER() 
 OVER (PARTITION BY similarity.Unmatched ORDER BY similarity.Sim DESC);

Both return the same thing, but the second example gets there by comparing fewer characters. This query demonstrates the savings from sequence cleaning alone:

DECLARE @cookie TABLE (CookieId INT IDENTITY, CookieName VARCHAR(20));
INSERT  @cookie VALUES('Joe Porter'),('Matthew Trent'),('Greta Cookie');

DECLARE @unmatched TABLE (Unmatched VARCHAR(30));
INSERT  @unmatched VALUES('Matthew Trint'),('Gretta Cookie'),('Gretta'),('Joey Porter');

SELECT 
  S1  = c.CookieName,
  S2  = u.Unmatched, 
  New = r.S1,
  New = r.S2,
  OriginalLen = LEN(c.CookieName+u.Unmatched),
  NewLen      = LEN(r.S1+r.S2),
  Savings     = LEN(c.CookieName+u.Unmatched)-LEN(r.S1+r.S2)
FROM        @cookie    AS c
CROSS JOIN  @unmatched AS u
CROSS APPLY samd.SequenceClean8KLight(c.CookieName,u.Unmatched) AS r;
Returns:
S1            S2             New        New           OriginalLen NewLen  Savings
------------- -------------- ---------- ------------- ----------- ------- --------
Joe Porter    Matthew Trint  e rter     tte Trt       23          13      10
Joe Porter    Gretta Cookie  oe orte    rett ooe      23          16      7
Joe Porter    Gretta         erter      rett          16          9       7
Joe Porter    Joey Porter    Joe Porter Joe Porter    21          20      1
Matthew Trent Matthew Trint  Matthew .. Matthew Trnt  26          25      1
Matthew Trent Gretta Cookie  atte Tret  retta e       26          16      10
Matthew Trent Gretta         atteTret   retta         19          13      6
Matthew Trent Joey Porter    tte Tret   e rter        24          14      10
Greta Cookie  Matthew Trint  reta ie    atte Trit     25          16      9
Greta Cookie  Gretta Cookie  Greta Co.. Gretta Cookie 25          25      0
Greta Cookie  Gretta         Gretae     Gretta        18          12      6
Greta Cookie  Joey Porter    ret ooe    oe orter      23          15      8

That’s 75 fewer characters leveraging sequence cleaning. Score!

A Set Based Solution to the Longest Common Subsequence (Three Strings)

I'll exclude my two-string LCSQ function for now, but it's included with the attached code. We'll dive deeper into the inner workings of both LCSQ functions in Part 2; samd.LCSQ3_AJB_POC is doing a quick cameo here for the purposes of demonstrating the power of sequence cleaning against three strings (and that you can solve basically any problem without a loop).

What's worth calling out now is how both LCSQ functions accept a gap parameter (@gap). Yes, we're using a brute-force algorithm, but we're doing so without the need to evaluate every subsequence. Both functions leverage the design pattern detailed in PerfML Part 2: Self-Tuning with a Tally Table. This design pattern allows you to solve LCSQ2 in 2^(N/G+G-1) time and LCSQ3 in 3^(N/G+G-1) time; something we'll also dig deeper into in Part 2. All you need to know for now is that our LCSQ3 is set-based, relatively fast (for all of its functionality), runs faster with a parallel execution plan, and is deterministic. The function name ends with "_POC" as it's my proof-of-concept. The focus here is more about learning than tuning.

The Function

samd.LCSQ3_AJB_POC accepts three input strings and calculates the LCSQ between the three. samd.Bernie8K_3S is used to identify the shortest of the three strings as L1. L1 represents the longest possible subsequence between the three strings. samd.SubSequenceExtractALL looks like it was named by Yoda but works as advertised. It takes three parameters: an input string (@string), a gap parameter (@gap) and a maximum subsequence length (@SL). When @gap=1, the function extracts every subsequence of size 1 through @SL. When @gap=2 the function returns every subsequence of size 1, 3, 5, etc. through @SL. We use L1 as the maximum subsequence length (@SL) when splitting our strings since there aren't going to be any common subsequences longer than the shortest string in the set.

All three strings are passed to samd.SubSequenceExtractALL as separate calls and only subsequences that exist in all three strings are returned. Then we sort by the length of the matched subsequences in descending order, returning the longest match. Leveraging TOP(1) WITH TIES sorted by the length of the subsequence, we are able to return ties if they exist. The @gap parameter allows you to use the function for approximations. When @gap = 1 and the function returns a six-character subsequence, you know that the LCSQ is six characters long. When @gap = 2 the function is 2X as fast but can be off by one; in this case, for example, if the function returns a six-character string you know that the LCSQ is 6-7 characters long. When @gap = 3, the function is 3X faster but can be off by as much as two; if it says the LCSQ is six characters long, you know that it might be 6, 7 or 8. A classic case of trading performance for accuracy.

CREATE OR ALTER FUNCTION samd.LCSQ3_AJB_POC
(
  @S1  VARCHAR(20),
  @S2  VARCHAR(20),
  @S3  VARCHAR(20),
  @gap BIGINT
)
RETURNS TABLE AS RETURN
SELECT DISTINCT TOP(1) WITH TIES
  CompressedLength    = s1.CompressedLength,
  S1_SfxIdx           = s1.SuffixIndex,
  S1_Compression_key  = s1.CompressionKey,
  S1_Uncompressed_Len = s1.UncompressedLength,
  S1_Subsequence      = s1.Subsequence,
  S2_SfxIdx           = s2.SuffixIndex,
  S2_Compression_key  = s2.CompressionKey,
  S2_Uncompressed_Len = s2.UncompressedLength,
  S2_Subsequence      = s2.Subsequence,
  S3_SfxIdx           = s3.SuffixIndex,
  S3_Compression_key  = s3.CompressionKey,
  S3_Uncompressed_Len = s3.UncompressedLength,
  S3_Subsequence      = s3.Subsequence
FROM        samd.Bernie8K_3S(@S1,@S2,@S3)              AS b
CROSS APPLY samd.SubSequenceExtractALL(b.S1,@gap,b.L1) AS s1
CROSS APPLY samd.SubSequenceExtractALL(b.S2,@gap,b.L1) AS s2
CROSS APPLY samd.SubSequenceExtractALL(b.S3,@gap,b.L1) AS s3
WHERE       s1.Subsequence      = s2.Subsequence
  AND       s1.Subsequence      = s3.Subsequence
ORDER BY    s1.CompressedLength DESC;
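If you have the attached functions installed, here's a quick, hedged way to see the accuracy-versus-speed trade-off: run the demo strings from the intro at each @gap setting and compare the reported lengths. Remember that for @gap = 2 and @gap = 3 the reported length can undershoot the true LCSQ length by one and two characters respectively.

--==== Sketch: compare the reported LCSQ length at each @gap setting
DECLARE @S1 VARCHAR(8000) = 'XA.BCD123',
        @S2 VARCHAR(8000) = 'ABC-D-12Z3',
        @S3 VARCHAR(8000) = '12DABCYP123';

SELECT      g.gap, lcsq3.CompressedLength, lcsq3.S1_Subsequence
FROM        (VALUES(1),(2),(3)) AS g(gap)
CROSS APPLY samd.LCSQ3_AJB_POC(@S1,@S2,@S3,g.gap) AS lcsq3;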

Again, we’ll keep making this guy faster and faster throughout this series until you’re sick of it. Stay tuned.

A Three String Circus (Performance Test)

Unlike problems in P, with problems in NP, even reducing the input size (N) by one is a big gain. Let’s create a test harness with our example from earlier to demonstrate. Here we’re going to get the IO stats and CPU time for:

  • The cleaning function only (SequenceReduce3s8KLight) which cleans three strings
  • samd.LCSQ3_AJB_POC without any reduction
  • samd.LCSQ3_AJB_POC with sequence cleaning, run with @gap=1 (serial and parallel), @gap=2 and @gap=3.

LCSQ3 Test Harness

DECLARE @S1  VARCHAR(8000) = 'XA.BCD123',
        @S2  VARCHAR(8000) = 'ABC-D-12Z3',
        @S3  VARCHAR(8000) = '12DABCYP123';

SET STATISTICS IO, TIME ON;
PRINT CHAR(10)+'Cleaning Only'+CHAR(10)+REPLICATE('-',90);
 SELECT sr3.* FROM  samd.SequenceReduce3s8KLight(@S1,@S2,@S3) AS sr3;

PRINT CHAR(10)+'LCSQ3 CORE'+CHAR(10)+REPLICATE('-',90);
 SELECT lcsq3.*  FROM samd.LCSQ3_AJB_POC(@S1,@S2,@S3,1) AS lcsq3;

PRINT CHAR(10)+'Reduce Light serial @gap = 1'+CHAR(10)+REPLICATE('-',90);
 SELECT      lcsq3.* 
 FROM        samd.SequenceReduce3s8KLight(@S1,@S2,@S3)  AS sr3
 CROSS APPLY samd.LCSQ3_AJB_POC(sr3.S1,sr3.S2,sr3.S3,1) AS lcsq3
 OPTION (MAXDOP 1); -- default optimizer option

PRINT CHAR(10)+'Reduce Light parallel @gap = 1'+CHAR(10)+REPLICATE('-',90);
 SELECT      lcsq3.* 
 FROM        samd.SequenceReduce3s8KLight(@S1,@S2,@S3)  AS sr3
 CROSS APPLY samd.LCSQ3_AJB_POC(sr3.S1,sr3.S2,sr3.S3,1) AS lcsq3
 CROSS APPLY core.make_parallel();

PRINT CHAR(10)+'Reduce Light parallel @gap = 2'+CHAR(10)+REPLICATE('-',90);
 SELECT      lcsq3.* 
 FROM        samd.SequenceReduce3s8KLight(@S1,@S2,@S3)  AS sr3
 CROSS APPLY samd.LCSQ3_AJB_POC(sr3.S1,sr3.S2,sr3.S3,2) AS lcsq3
 CROSS APPLY core.make_parallel();

PRINT CHAR(10)+'Reduce Light parallel @gap = 3'+CHAR(10)+REPLICATE('-',90);
  SELECT      lcsq3.* 
  FROM        samd.SequenceReduce3s8KLight(@S1,@S2,@S3)  AS sr3
  CROSS APPLY samd.LCSQ3_AJB_POC(sr3.S1,sr3.S2,sr3.S3,3) AS lcsq3
  CROSS APPLY core.make_parallel();
GO
SET STATISTICS IO, TIME OFF;

Results:

Reduction Only
------------------------------------------------------------------------------------------
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
 SQL Server Execution Times: CPU time = 0 ms,  elapsed time = 1 ms.

LCSQ3 CORE
------------------------------------------------------------------------------------------
Table 'Worktable'. Scan count 167, logical reads 534191, physical reads 0, page server reads 0, read-ahead reads 187, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
 SQL Server Execution Times: CPU time = 2079 ms,  elapsed time = 1840 ms.

LCSQ3 + Reduce Light serial @gap = 1
------------------------------------------------------------------------------------------
Table 'Worktable'. Scan count 137, logical reads 210268, physical reads 0, page server reads 0, read-ahead reads 23, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
 SQL Server Execution Times: CPU time = 750 ms,  elapsed time = 769 ms.

Reduce Light parallel @gap = 1
------------------------------------------------------------------------------------------
Table 'Worktable'. Scan count 129, logical reads 111171, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
 SQL Server Execution Times: CPU time = 515 ms,  elapsed time = 524 ms.

Reduce Light parallel @gap = 2
------------------------------------------------------------------------------------------
Table 'Worktable'. Scan count 81, logical reads 61921, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
 SQL Server Execution Times: CPU time = 219 ms,  elapsed time = 210 ms.

Reduce Light parallel @gap = 3
------------------------------------------------------------------------------------------
Table 'Worktable'. Scan count 65, logical reads 45475, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
 SQL Server Execution Times: CPU time = 234 ms,  elapsed time = 234 ms.

Completion time: 2021-08-10T20:54:09.6885549-05:00

Reviewing the results, the key take-aways are:

  • The cleaning process is instantaneous – 0MS
  • The core LCSQ3 version without any reduction is the slowest; even with a parallel plan the runtime is 1.84 seconds.
  • Adding sequence cleaning speeds things up to 0.75 seconds with a serial plan, ~0.5 seconds with a parallel plan
  • Sequence cleaning reduced the reads from 534,191 to 210,268 with a serial execution plan, 111,171 with a parallel plan
  • Setting @gap=2 gets us down to 0.2 seconds with a margin of error of 1
  • Setting @gap=3 actually slows things down a pinch, back up to 0.23 seconds, with a margin of error of 2
  • The @gap parameter allows for informed accuracy vs performance decisions

Conclusion                                                     

I think we've covered a lot of ground. We learned (or were reminded) that:

  1. Pretty much any complex problem can be solved without loops, recursion, mutable objects or any other procedural, iterative programming logic. Mark another one up for the tally table!
  2. You can speed up the Longest Common Subsequence, in any language, without changing any code, simply by reducing the size of the input.
  3. Leveraging pure set-based code while maintaining functional purity, we can make our function even faster when forcing a parallel execution plan.
  4. With RangeAB we can leverage the @gap parameter to turn our LCSQ2 and LCSQ3 functions into approximations.

Lastly, I want to send a quick shout-out to Sean Lange. Sean’s always been one of my unofficial internet SQL mentors and I’ve learned a ton from him over the last decade plus. A couple years ago I posted an answer on StackOverflow where I introduced some of these ideas. Sean’s short response lit a fire under my butt and deserves some of the credit.

I hope to have stimulated a few extra synapses out there. In Part 2 we're going to dive deeper into the world of sequence compression along with how to turbocharge the socks off our LCSQ2 and LCSQ3 functions.

Thanks for reading and happy hunting!

Virtual Indexing Part 8: The Indexed Splitter Function

The fastest way to split a string is to have already done so.

Introduction

Happy 2021! Indexed views and tally tables – a virtually unexplored topic, one I will write about at length this year. This will be a quick read and there are likely to be a couple typos; time is short so today is about quality, not quantity. In Part 3: The Indexed Function I said, "the fastest way to perform a calculation is to have already done so;" in this installment I will demonstrate that the fastest way to split a string is also to have already done so.

When I introduced the indexed function I used prime numbers as my example: 7 is always a prime number, Factorial(5) always equals 120. You can create an indexed prime numbers function, an indexed factorial function, or an indexed version of any function that always returns the same output for a given argument. What if we wanted a more complex indexed function, such as one that splits a string?

I know about database normalization, but I am not responsible for designing most of the databases I work with. If you've been in this racket for a couple fortnights you have run across that vendor DB written by great application developers with terrible back-end databases. It's not uncommon to see fields that contain a delimited list – '1,4,13,99' instead of a single atomic value. You can write an ETL package that leverages a good string splitter (such as DelimitedSplit8K or DelimitedSplit8k_Lead) to regularly split and copy the values into a correctly normalized table. In this article I will show you a much faster, maintenance-free alternative which leverages a numbers table and an indexed view.
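For context, the traditional ETL-style approach looks roughly like this; the table and column names are hypothetical and DelimitedSplit8K is assumed to be installed.

--==== Hypothetical ETL-style normalization pass (table/column names made up for illustration)
INSERT INTO dbo.OrderCodesNormalized (OrderId, Code)
SELECT      o.OrderId, s.Item
FROM        dbo.VendorOrders AS o
CROSS APPLY dbo.DelimitedSplit8K(o.CodeList, ',') AS s;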

Introducing The “Indexed Splitter”

For this you will need a tally table with at least as many rows as the longest string you intend to split. 8,000 is enough unless you need to split strings longer than 8K. My tally table (dbo.tally) has 1,000,000 rows.
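If you don't already have a numbers table, here's a minimal sketch of one way to build one; my dbo.tally may be defined slightly differently, but any unique, clustered 1-to-N integer column will do. After that, the sample data.

--==== Sketch: one of many ways to build a one-million-row numbers table
IF OBJECT_ID('dbo.tally') IS NULL
BEGIN
  SELECT TOP (1000000)
         N = ISNULL(CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS INT), 0)
  INTO   dbo.tally
  FROM   sys.all_columns AS a
  CROSS JOIN sys.all_columns AS b;

  ALTER TABLE dbo.tally ADD CONSTRAINT pk_tally PRIMARY KEY CLUSTERED (N);
END;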

Sample data

CREATE TABLE dbo.Test (ID INT, Column1 VARCHAR(20));
INSERT INTO  dbo.Test (ID,Column1) 
  VALUES (1,'ORM;PS;SUP'),(2,'ABC;XYZ;999;123');

Next we're going to borrow Jeff Moden's DelimitedSplit8K logic to create a set-based splitter that leverages a permanent tally table (a regular ol' properly indexed numbers table named dbo.tally). There's a small performance hit compared to DelimitedSplit8K but the upside makes it more than worth it, as you will see in a moment. Here's our splitter as a view.

The Splitter as a View

CREATE OR ALTER VIEW dbo.TestSplit WITH SCHEMABINDING AS 
SELECT 
  Id   = t.ID, 
  item = 
    SUBSTRING
    (
      t.Column1,
      tt.N+SIGN(tt.N-1),
      ISNULL(NULLIF((CHARINDEX(';',t.Column1,tt.N+1)),0),LEN(t.Column1)+1)-(tt.N)-SIGN(tt.N-1)
    ),
  ItemIndex = tt.N+1
FROM       dbo.Test  AS t
CROSS JOIN dbo.tally AS tt
WHERE      tt.N <= LEN(t.Column1)
AND        (tt.N = 1 OR SUBSTRING(t.column1,tt.N,1) = ';');
GO

The values are split and returned with their associated ID. This logic is going to be applied to an indexed view, which means we can't include parameters; the delimiter and the source of the string need to be embedded in our logic. Above we are splitting the contents of dbo.Test.Column1 using a semicolon as the delimiter.

SELECT f.Id, F.Item FROM dbo.TestSplit AS f;

Below is exactly what we expected.

Function Output

Result Set
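Based on the sample rows and the view logic above, the output should look like this:

Id          Item
----------- ----------
1           ORM
1           PS
1           SUP
2           ABC
2           XYZ
2           999
2           123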

Next the execution plan.

Splitter View Execution Plan

That's a good plan: set-based, linear and fast. There are faster splitters out there but none that work without the APPLY table operator. APPLY is not allowed in indexed views; subqueries either. My splitter logic doesn't use APPLY, subqueries or any other logic that prevents me from adding an index to my view, so let's see if we can add a unique clustered index and turn this view into an indexed split where the split happens ahead of time and only once (unless the underlying value is modified or deleted).

CREATE UNIQUE CLUSTERED INDEX uq_cl__testSplit ON dbo.TestSplit(Id,Item)

No error. Let’s run the same SELECT query from earlier and check the execution plan.

The Indexed Split Execution plan

It doesn't get cleaner or faster than that. This is a powerful example of what you can accomplish when using a physical numbers table to build an indexed view. This barely scratches the surface, as I hope to demonstrate in the near future.

Make my index a double

In 2020 “make it a double” became:

2o2o got me like ^^^

I digress.

Now that your indexed function is in place you can expand its power by adding more indexes! I'll have my clustered index with a nonclustered chaser. Here we'll create an index for Item.

CREATE NONCLUSTERED INDEX nc__testSplit__item ON dbo.TestSplit(Item);

Now a query that filters on item.

SELECT f.* 
FROM   dbo.TestSplit AS f WITH (NOEXPAND)
WHERE  f.item <> '999,123';

To my chagrin, the optimizer often ignores nonclustered indexes on indexed views, which is the case here; this is why I'm using NOEXPAND. Nonetheless, the execution plan is even better than before: an index seek against a narrower and lighter nonclustered index.

Execution plan leveraging a second, nonclustered index

The Indexed Splitter Function

Let's wrap the guy in an iTVF for a nasty fast indexed function that splits a string ahead of time and maintains itself as needed! We'll include the option to exclude a value using an @exclude parameter (we'll add a row number column a little later).

The Function

CREATE FUNCTION dbo.itvf_TestSplit
(
  @exclude VARCHAR(20)
)
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT f.ID, f.Item, f.ItemIndex
FROM   dbo.TestSplit AS f WITH (NOEXPAND)
WHERE  f.Item <> @exclude;
GO

Let’s give’r a spin and check the execution plan.

Execution Plan


Same as before, but now we are accepting a parameter, which means we now have an indexed function that can split strings ahead of time. Not even STRING_SPLIT or a CLR splitter can compete here.

What about an “Item Number”?

Sometimes you need the item's ordinal position in the string; for this we need an item number (labeled "ItemNumber" in the final query below). To return an item number (without a sort in the execution plan) some assembly is required. DelimitedSplit8K and DelimitedSplit8K_LEAD both use ROW_NUMBER() to produce the ItemNumber column. ROW_NUMBER is not allowed in indexed views, which is why I included "ItemIndex". It represents the item's position in the string, something I normally don't need. Here it's perfect, however: it allows me to add an ItemNumber while still avoiding a sort in my final plan. Not a big deal with strings 8K or less, but it is for longer strings (something an indexed function can handle at no extra cost). First the new index:

CREATE NONCLUSTERED INDEX nc_poc__testSplit__ItemIndex ON dbo.TestSplit(Id, ItemIndex);

With that index in place let’s run a query which includes the item number.

SELECT 
  Id= split.Id,
  ItemNumber = ROW_NUMBER() OVER (PARTITION BY split.ID ORDER BY split.ItemIndex),
  split.Item
FROM dbo.itvf_TestSplit('999,123') AS split;

Output

Result set that includes ItemNumber
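Based on the sample data, the result should look like the following; note that nothing is actually excluded here since no single item equals '999,123':

Id          ItemNumber           Item
----------- -------------------- ----------
1           1                    ORM
1           2                    PS
1           3                    SUP
2           1                    ABC
2           2                    XYZ
2           3                    999
2           4                    123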

Execution plan

easy peasy lemon-squeezy

Still a fantastic execution plan: the same low cost as before despite the added complexity. Again, the calculations already happened; this function is leveraging one or more indexes to retrieve, not compute, the required values.

Conclusion

The indexed split is another game changer, yet it barely scratches the surface of the power of the virtual index and the indexed function. When you add a physical numbers (tally) table to your indexed views they can do more than pre-join or pre-aggregate values. Here we introduced the world's first "indexed split" and we're still just getting started! Tomorrow I'm going to introduce the Indexed Hierarchy. Thanks for reading!

Virtual Indexing Part 6: Combinatorial Index Optimization

In Part 5 we learned how to return a distinct set of numbers from a one Billion row table in under 1/10th of a second. We can do better!

Disclaimer

I’ve never made it to Part 6 of a technical series. For me, a huge obstacle to getting quality content out in a timely manner is the proof-reading and final editing process. Like a few of my friends in this field, I have many new techniques and programming concepts that I’m dying to share which have been decaying in MS Word Docs and .sql files for years. I started my NTally article in 2014 for example.

In September, while quarantined and after listening to too much David Goggins, I decided to step up my writing (and research) game and simply began publishing the first draft of my work the moment I’m done authoring it… What I’m trying to say is, I’ve decided to err on the side of delivering content. This means there will be typos, now and in the future. Sorry. I feel better getting it done, however, typos and all, than taking my good ideas to the grave. If you are a writer/blogger or are aspiring to do so, I suggest following my lead and just get it done – It feels so much bettr! 😉

Introduction

This series began with the introduction of the Virtual Index, the suitably sorted stream of numbers returned by T-SQL ROW_NUMBER which acts very similar to a physical index with respect to sorting, grouping and filtering. Unlike a physical index, created using DDL, the virtual index is invoked with a SELECT statement against fnTally and goes away when your query is finished. No disk space to consume, no fragmentation to address, nothing to check into your CI/CD system – just a magical, in-memory index (suitably sorted stream) which comes and goes as requested.

In Part 5 we reviewed how to return a distinct set of numbers from a one-billion-row table in under 1/10th of a second using what I have coined a multidimensional virtual index – that is, combining a virtual index with a real one speeds things up. More technically, we're giving the optimizer a wider range of options than it would have with only one index in place.

So, if adding one virtual index to your query can speed things up, can we speed things up even further by adding more? I mean, they're basically free and can be pulled out of thin air – why not? That's what we're going to review today.

Dimensionality

Seasoned SQL developers, as well as anyone who has written MDX expressions, are familiar with the concept of dimensionality. Using SSAS, for example, we can write an MDX expression to return a matrix with two or more dimensions as shown below. The Month dimension is on the x-axis, AKA the columns(0) axis in MDX; the Vehicle dimension is on the y-axis, AKA rows(1). For brevity I included two Month members and three Vehicle members. The numbers can represent anything (e.g. Sales, Inventory, etc.). We can easily write an MDX expression to return something that looks like this:

Figure 1. Two-dimensional matrix with Vehicle on the x-axis, month(s) on the y-axis.

Two-dimension matrix with a dimension for months and a second dimension for vehicles.

Now let's say we wanted a deeper level of granularity for each vehicle color. We could add a third dimension for color to our expression to return something like this:

Figure 2. Three-dimensional matrix; a 3rd dimension added for color

Here we've added a third dimension for "color" to transform it into a three-dimensional matrix.

What does this have to do with indexing or performance you ask? Let’s apply this same concept, leveraging RangeAB, to perform what I’ve coined, Combinatorial Index Optimization. Think of it as combinatorial optimization for virtual indexes and indexed functions.

Combinatorial Index Optimization

Below is our query from Part 5 of this series, where we reduced the time it took to return a distinct set of numbers from a one-billion-row set from 111 seconds using traditional methods to only 0.07 seconds using a two-dimensional index.

  SELECT fnFilter.N1 --fnFilter.RN-(b.Mn-1)
  FROM  (SELECT MIN(t.N), MAX(t.N) FROM #t AS t) AS b(Mn,Mx)
  CROSS APPLY
  (
    SELECT      r.RN, r.N1
    FROM        core.rangeAB(b.Mn,b.Mx,1,1) AS r
    CROSS APPLY (SELECT TOP (1) 1 FROM #t AS t WHERE t.N = r.N1) AS z(x)
  ) AS fnFilter
  GROUP BY fnFilter.N1, b.Mn
  ORDER BY fnFilter.N1;

As you may recall, we reduced the runtime by using the tally table for the left side of the one-to-many relationship and only returning numbers that exist in the source table (#t). We’ve reduced the question from, “where is the next distinct value?” to a simple yes/no question: “Is 1 in here? Is 2 in here?” This is an example of a two-dimensional virtual index created by way of combinatorial index optimization.

Figure 3. Two-dimensional Virtual Index

The tally table (RangeAB in this example)

Virtual Indexing while wearing 3-D glasses

As I have said before, the great thing about a virtual index is that it's free. Armed with a correctly constructed tally table function I can conjure one up any time I wish, just like an AD&D Level 7 Magic User.

In the real world, for a table with millions/billions of rows there would almost certainly be a table or other indexed object (e.g. an indexed view) that I could join to for filtering. Either way, using RangeAB and its @Gap parameter we can add another index for an even cooler type of filtering. First let's review our 2-D solution. Note the code and my comments:

Figure 4. How the Two-dimensional Virtual Index works

Next our 3-D solution.

DECLARE @Gap BIGINT = 100, @N BIGINT;

SELECT @N = fnFilter.N1
  FROM        (SELECT MAX(t.N) FROM #t AS t) AS mx(N)
  CROSS APPLY core.rangeAB(1,mx.N,@Gap,1) AS b
  CROSS APPLY
  (
    SELECT r.RN, r.N1
    FROM   core.rangeAB(b.N1,b.N2-1,1,1) AS r
    CROSS APPLY (SELECT TOP (1) 1 FROM #t AS t WHERE t.N = r.N1) AS z(x)
  ) AS fnFilter
  WHERE    EXISTS (SELECT 1 FROM #t AS t WHERE t.N BETWEEN b.N1 AND b.N2-1) -- $$$
  GROUP BY fnFilter.N1
  ORDER BY fnFilter.N1;

This image will help explain the changes.

Figure 5. How the Three-dimensional Virtual Index works

Adding this 3rd dimension will dramatically improve performance in two distinct ways. First, it's going to reduce the amount of work required simply by ignoring entire partitions of our sequence once the EXISTS statement determines it's safe to do so. Just as Figure 3 helped you visualize a 2-D virtual index, here's what a 3-D virtual index looks like.

Figure 6. How the Three-dimensional virtual index works

Partitions without members are ignored

Performance Test 1

For brevity I set @gap to 3 for the example above. Notice how, for 1-3 and 13-15, our second tally table (by order of appearance in this article) is pre-filtering entire subsets of work. Next, the performance test. For this test I used the same test harness from Part 5 of this series to create one billion rows with numbers ranging from 1 to 10,000 with very few gaps. For the 3-D solution I set @Gap to 100.

Figure 7. Test #1 Results – 1 Billion, Dense

The 3-D index improves with a parallel plan

In this case the 3-D index with a serial plan slowed things down just a smidgen because I purposely made sure there were few gaps. I included this test to demonstrate how, unlike the 2-D index, we get a faster result set with a parallel plan, despite the fact that the additional RangeAB call couldn't really filter anything. Adding an additional dimension gives the optimizer better multithreading options. That is the second distinct advantage of the 3-D index over a 2-D one.

Performance Test 2

For this test I filled a table with 50,000 numbers ranging from 1 to 1,000,000. Next I used my test harness to duplicate the rows until I had two billion. This means my tally table must generate 1,000,000 rows for the left side of my query. This is where the 3-D index shines. I set my @Gap parameter to 6400 for this test.

Figure 8. Test #2 Results – 2 Billion, Sparse

3-D virtual index with gap reduction saves the day.

The 2-D solution runs for 12 seconds regardless of how many CPUs you throw at it. The 3-D solution runs for 1.3 seconds serial, 0.396 seconds with a parallel plan. That’s correct, less than 1/2 a second to extract 50K distinct values from a set of 2,000,000,000.

Conclusion

This, again, is what I mean by More Fast, Less Code. In Part 7 we’ll dive into what is arguably my favorite topic: Finite Opposite Numbers. You’ll want to be seated for that one.

Virtual Indexing Part 5: The Multidimensional Virtual Index

In this installment of the Virtual Indexing series we’ll examine how to combine indexes to return a distinct set of numbers faster than using GROUP BY or DISTINCT

Introduction

As you may be aware, the challenge is on! The primary objective of this article is to further illustrate the power of RangeAB. In Part 1 of this series I introduced the Virtual Index, the suitably sorted stream returned by SQL window functions which can be used for sorting, grouping and the like. In Part 3 we introduced the Indexed Function as well as my RangeAB functions. In this installment we're going to combine a physical index with a virtual one to create what I call a multidimensional virtual index, and use it to return a distinct set of numbers from a very large set. Based on my testing, this technique completely decimates DISTINCT, GROUP BY, ROW_NUMBER or any other method for returning a distinct set.

The Problem

A common annoyance for me when developing ETL packages, is when I need to extract millions of non-distinct numbers into a table then use that table as a lookup downstream. The first solution to this problem is to perform a SELECT DISTINCT against that table each time I need unique values. A better solution is to create a new table to store a distinct set of values and then reference that table downstream. Either way, a DISTINCT against millions of rows can be slow and IO intensive, even with an index in place.

Sample Data

WARNING – the code below, as-is, creates one billion rows in ~45 minutes. I need to use that many rows because of how efficient this technique is. If you are playing along at home be sure to adjust @ii as needed; each iteration adds 10,000,000 rows. @ii is set to 100 for my billion-row test (100 × 10,000,000 = one billion). With the clustered index in place the sample data takes up 200MB per 100 million rows, a total of 20GB for one billion.

If you’re playing along at home you’ll need a copy of my random numbers function. The routine works like this: first it creates 15,000 random numbers between 1 and 25,000. This creates gaps in the sequence which will matter later in this series. The while loop then cross joins #t to itself and re-inserts 10,000,000 duplicates for each iteration (@ii). The random numbers function is attached here:

Figure 1. The Setup Routine

--==== WARNING – AS-IS, this routine will create 1,000,000,000 rows in 45 minutes...
--==== On my PC this runs about 4 minutes per 10 iterations
--==== With that in mind – set @ii accordingly!
IF OBJECT_ID('tempdb..#t') IS NOT NULL DROP TABLE #t;
CREATE TABLE #t (N INT NOT NULL INDEX nc_n CLUSTERED(N));
GO
INSERT #t SELECT r.RND FROM dbo.randomNumbers(15000,1,25000) AS r;
GO
--==== 1 Billion in 45 minutes  at 10M 100X batches (100M in 4 minutes)
DECLARE 
  @st DATETIME = GETDATE(),
  @i  INT      = 1,
  @ii INT      = 100;

WHILE @i<=@ii
BEGIN
  PRINT  @i;
  INSERT #t SELECT TOP (10000000)  t.N FROM #t AS t, #t AS t2 ORDER BY (SELECT NULL);
  SET @i+=1
END;

Returning unique values from a large set – Old School

Below is the typical way we would retrieve a unique set from the sample data stored in #t. Let’s run a quick performance test, first with a serial execution plan, the second with a parallel plan.

Figure 2.1. Returning a DISTINCT set using GROUP BY

PRINT CHAR(10)+'GROUP BY - serial'+CHAR(10)+REPLICATE('-',90);
GO
DECLARE @st DATETIME = GETDATE(), @N BIGINT;
  SELECT   @N=t.N
  FROM     #t AS t
  GROUP BY t.N
  OPTION (MAXDOP 1);
PRINT DATEDIFF(MS,@st,GETDATE());
GO
PRINT CHAR(10)+'GROUP BY - parallel'+CHAR(10)+REPLICATE('-',90);
GO
DECLARE @st DATETIME = GETDATE(), @N BIGINT;
  SELECT   @N=t.N
  FROM     #t AS t
  GROUP BY t.N
  OPTION (QUERYTRACEON 8649);
PRINT DATEDIFF(MS,@st,GETDATE());
GO

Note the straightforward execution plan. All billion-plus rows are retrieved from #t, then a hash match is used to filter out duplicates. With a serial plan the query runs for about 2 minutes; with a parallel plan it runs for 12 seconds.

Figure 2.2. Old School Execution plan

Returning a distinct set using GROUP BY

Returning a distinct set from a large set of numbers – New School

Now we’re going to use RangeAB to return a unique set; note my comments.

Figure 3.1. Using RangeAB to return a distinct set

  SELECT N = fnFilter.N1
  FROM  (SELECT MAX(t.N) FROM #t AS t) AS b(Mx) -- (1) get the highest N from #t: MAX(N)
  CROSS APPLY
  (                         -- (2) Call RangeAB to retrieve the numbers 1 through MAX(N)
    SELECT       r.RN, r.N1 -- (3) Only return numbers from rangeAB that exist in #t
    FROM         core.rangeAB(1,b.Mx,1,1) AS r  -- Note that b.Mx is the MAX(#t.N) 
    CROSS APPLY (SELECT TOP (1) 1 FROM #t AS t WHERE t.N = r.N1) AS z(x) -- gets row goals
  ) AS fnFilter -- Inline Function for filtering
  GROUP BY fnFilter.N1      -- (4) Use GROUP BY to eliminate the 1-many between r.N1 & t.N
  ORDER BY fnFilter.N1;     -- Not required but no penalty

The result set comes back almost instantly. It doesn’t matter if I force a serial or parallel plan, nor whether I leverage batch mode processing, and it’s true for both RangeAB and RangeAB2. Running with STATISTICS TIME, IO on, we see:

Figure 3.2. STATISTICS TIME, IO results

SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.
Table 'Worktable'. Scan count 0, logical reads 0...
Table '#t_00000001E3BF'. Scan count 25000, logical reads 108801, physical reads 0...

 SQL Server Execution Times: CPU time = 78 ms,  elapsed time = 174 ms.

That’s one heck of an improvement. Next let’s review what’s happening.

How it Works

Instead of dragging out all billion+ rows we are retrieving the numbers 1 to MAX(#t.N) from our tally table – approximately 25,000 rows. The execution plan reveals how the magic happens:

Figure 4.1. New School (using RangeAB) execution plan details

Note that roughly 27,000 rows are pulled from our tally table – I don’t know what the extra 2,000 rows are about, but they don’t impact performance here.

For each of the 25,000 possible numbers returned by the range function, the query probes #t once to check whether the number exists and returns it only when there is a match. This is where the “Scan count 25000” in Figure 3.2 comes from. 25,000 scans may seem excessive, but each is a one-row index seek, which is nothing compared to a scan against a billion rows. In this example there are 11,335 unique numbers – the exact number of rows returned from #t.
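To make the probing pattern explicit, here is a conceptually equivalent formulation using EXISTS. It’s a sketch for illustration only; the APPLY + TOP (1) version above is what I benchmarked, and the optimizer may shape this plan differently.

--==== Conceptual equivalent (sketch): one index probe into #t per candidate number
SELECT      r.N1
FROM       (SELECT MAX(t.N) FROM #t AS t) AS b(Mx)
CROSS APPLY core.rangeAB(1,b.Mx,1,1) AS r
WHERE       EXISTS (SELECT 1 FROM #t AS t WHERE t.N = r.N1); -- a seek, not a scan of #t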

The two-dimensional matrix below helps visualize what is happening: instead of scanning four rows from #t to determine that 3 is one of the unique values, we visit each possible value only once via the virtual index on RangeAB.

Figure 4.2. 2-Dimensional Matrix

Multidimensional Index Performance Test

For the test data I am leveraging the setup routine from Figure 1. Note that I run my alternative queries five times each but the old school GROUP BY versions only once; they are that much slower.

Figure 5.1. New School Test Harness

DECLARE @rowz INT = (SELECT COUNT(*) FROM #t);
PRINT CONCAT('Multidimensional Virtual Index Test: ',@rowz,' rows, N=1-25K'+
  CHAR(10),REPLICATE('-',90));
GO
PRINT CHAR(10)+'GROUP BY - serial'+CHAR(10)+REPLICATE('-',90);
GO
DECLARE @st DATETIME = GETDATE(), @N BIGINT;
  SELECT   @N=t.N
  FROM     #t AS t
  GROUP BY t.N
  OPTION (MAXDOP 1);
PRINT DATEDIFF(MS,@st,GETDATE());
GO
PRINT CHAR(10)+'GROUP BY - parallel'+CHAR(10)+REPLICATE('-',90);
GO
DECLARE @st DATETIME = GETDATE(), @N BIGINT;
  SELECT   @N=t.N
  FROM     #t AS t
  GROUP BY t.N
  OPTION (QUERYTRACEON 8649);
PRINT DATEDIFF(MS,@st,GETDATE());
GO
--==== MAGIC WITH CLUSTERED INDEX, NO INDEX = DEATH ON HUGE GAPS
PRINT CHAR(10)+'APPLY + TOP 1 - Serial'+CHAR(10)+REPLICATE('-',90);
GO
DECLARE @st DATETIME = GETDATE(), @N BIGINT;
  SELECT @N = fnFilter.N1
  FROM  (SELECT MAX(t.N) FROM #t AS t) AS b(Mx) -- (1) get the highest N from #t: MAX(N)
  CROSS APPLY               -- (2) Call RangeAB to retrieve the numbers 1 through MAX(N)
  (
    SELECT       r.RN, r.N1 -- (3) Only return numbers from rangeAB that exist in #t
    FROM         core.rangeAB(1,b.Mx,1,1) AS r
    CROSS APPLY (SELECT TOP (1) 1 FROM #t AS t WHERE t.N = r.N1) AS z(x)
  ) AS fnFilter
  GROUP BY fnFilter.N1      -- (4) Use GROUP BY to eliminate the 1-many between r.N1 & t.N
  ORDER BY fnFilter.N1      -- Not required but no penalty
  OPTION (MAXDOP 1);
PRINT DATEDIFF(MS,@st,GETDATE());
GO 5
PRINT CHAR(10)+'APPLY + TOP 1 - Parallel'+CHAR(10)+REPLICATE('-',90);
GO
DECLARE @st DATETIME = GETDATE(), @N BIGINT;
  SELECT @N = fnFilter.N1
  FROM  (SELECT MAX(t.N) FROM #t AS t) AS b(Mx)
  CROSS APPLY
  (
    SELECT      r.RN, r.N1
    FROM        core.rangeAB(1,b.Mx,1,1) AS r  -- b.Mx is the MAX(#t.N)
    CROSS APPLY (SELECT TOP (1) 1 FROM #t AS t WHERE t.N = r.N1) AS z(x)
  ) AS fnFilter
  GROUP BY fnFilter.N1
  ORDER BY fnFilter.N1 -- No penalty
  OPTION (QUERYTRACEON 8649);
PRINT DATEDIFF(MS,@st,GETDATE());
GO 5

Figure 5.2. New School test results

Multidimensional Virtual Index, 1000015000 rows, N=1-25K
------------------------------------------------------------------------------------------

GROUP BY - serial
------------------------------------------------------------------------------------------
111353

GROUP BY - parallel
------------------------------------------------------------------------------------------
12147

APPLY + TOP 1 - Serial
------------------------------------------------------------------------------------------
Beginning execution loop
70
66
70
63
64
Batch execution completed 5 times.

APPLY + TOP 1 - Parallel
------------------------------------------------------------------------------------------
Beginning execution loop
84
97
100
77
100
Batch execution completed 5 times.

Completion time: 2020-12-12T12:21:49.8432474-06:00

The old school GROUP BY solution runs for 113 seconds with a serial plan and speeds up by nearly a factor of ten with a parallel plan. The rangeAB version runs for 100ms or less with either a serial or a parallel execution plan. It’s interesting to note that parallel execution does not improve the time for the RangeAB solution. In Part 6 of this series we’ll add an additional dimension to speed things up even more, especially with a parallel execution plan. See you there, thanks for reading!

Virtual Indexing Part 4: The NTally Table

A set-based, more performant alternative to SQL Server’s NTILE function coming right up.

Introduction

Few authors have impacted my career like Itzik Ben-Gan. If you are trying to get a grasp on how to develop high-performing, set-based SQL then a good Ben-Gan book will be a vital part of your SQL toolbox. Today Mr. Ben-Gan posted a challenge to create the fastest number generator. I offered up rangeAB, which leverages Jeff Moden’s fnTally function, the fastest “number generator” in the business in my experience.

When I first learned how to use a numbers table I began trying to solve problems I’d never seen other people solve without a loop or other iterative method. I thought it was fun and doing so helped me grow my skills. One thing I never saw anyone do is use a numbers table to develop a set-based alternative to NTILE; I even heard people say it isn’t possible. Today we’re going to come up with two functions, each of which blows the doors off NTILE with respect to speed and IO. The first uses a virtual auxiliary table of numbers (AKA “tally table”), the second a physical numbers table.

Quick NTILE overview

Before I dive into the algorithm let’s quickly review what NTILE does. NTILE is a window ranking function used to divide a set of rows as evenly as possible.

In High-Performance T-SQL Using Window Functions Ben-Gan explains it like this:

The NTILE function allows you to arrange the rows within the window partition in roughly equally sized tiles, based on the input number of tiles and specified window ordering. For example, suppose that you want to arrange the rows from the OrderValues view in 10 equally sized tiles based on val ordering. There are 830 rows in the view; hence, with 10 requested tiles, the tile size is 83 (that’s 830 divided by 10). So the first 83 rows (the first tenth) based on val ordering will be assigned with tile number 1, the next 83 with tile number 2, and so on.

High-Performance T-SQL Using Window Functions, Itzik Ben Gan

Let’s use fnTally to generate 13 rows and use NTILE to divide them into 4 tile groups.

Figure 1: NTILE test

DECLARE  @tiles BIGINT = 4,
         @rows  BIGINT = 13;

--==== NTILE CLASSIC
SELECT f.N, Tile = NTILE(@tiles) OVER (ORDER BY f.N)
FROM   dbo.fnTally(1,@rows) AS f;

Figure 1.2: NTILE test results

N     Tile
----- -------
1     1
2     1
3     1
4     1
5     2
6     2
7     2
8     3
9     3
10    3
11    4
12    4
13    4

For each tile number from 1 to @tiles, the number of members is calculated as @rows/@tiles, plus 1 when the tile number is less than or equal to @rows%@tiles. In the example above we get 3 members for each of the tile numbers 1, 2, 3 and 4 (13/4 = 3), then the number 1 gets an additional member because 1 <= 13%4. If we add a row then tile 2 also gets an additional member (14%4 = 2); if we set @rows to 15 then tiles 1, 2 and 3 each have 4 members (15%4 = 3); and if we set @rows to 16 we get 4 members for each of the tiles 1, 2, 3 and 4 (16/4 = 4, 16%4 = 0).
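You can sanity-check that arithmetic with a few lines of T-SQL; the VALUES list of tile numbers below is just for illustration.

--==== Illustration of the tile-size math for @rows = 13, @tiles = 4
DECLARE @tiles BIGINT = 4, @rows BIGINT = 13;

SELECT TileNumber = t.N,
       Members    = @rows/@tiles + IIF(t.N <= @rows%@tiles, 1, 0)
FROM  (VALUES(1),(2),(3),(4)) AS t(N);
--==== Tile 1 gets 4 members; tiles 2, 3 and 4 get 3 each (4+3+3+3 = 13)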

This is how NTILE works, and this is how our alternative must also work.

The NTally Table – What it is and how it replaces NTILE

This heading is a nod to The “Numbers” or “Tally” Table: What it is and how it replaces a loop. You can think of an NTally table as a tally table with tiles or a tiled tally table. I developed my first “NTally” in 2014 using a physical numbers table (the DDL for my numbers table can be found here.) An example of my first NTally table is posted here: Odd (to me) Behavior with NTILE() and NULL Values in Source Data. I have since improved on the physical numbers table version and developed one using a virtual table. Let’s examine the logic.

Figure 2.1: NTally Logic

DECLARE  @tiles BIGINT = 4,
         @rows  BIGINT = 13,
         @desc  BIT    = 0;

--==== NTILE CLASSIC
SELECT RN = f.N, Tile = NTILE(@tiles) OVER (ORDER BY f.N)
FROM   dbo.fnTally(1,@rows) AS f;

--==== ALTERNATIVE #1: PHYSICAL Auxiliary table of Numbers (dbo.NTally)
SELECT RN = ROW_NUMBER() OVER (ORDER BY t.N), Tile = t.N
FROM   dbo.tally AS t
CROSS APPLY
(
  SELECT TOP(@rows/@tiles+IIF(t.N<=@rows%@tiles,1,0)) $
  FROM       dbo.tally AS t1
  CROSS JOIN dbo.tally AS t2
) AS x(x)
WHERE t.N<=@tiles;

--==== ALTERNATIVE #2: VIRTUAL Auxiliary table of Numbers (dbo.NTallyRangeAB)
SELECT      RN     = ROW_NUMBER() OVER (ORDER BY t.RN),
            Tile   = t.RN
FROM        core.rangeAB(1,@tiles,1,1)                                  AS t
CROSS APPLY (VALUES(IIF(@desc=0,t.RN,t.OP)))                            AS d(D)
CROSS APPLY core.rangeAB(1,@rows/@tiles+IIF(d.D<=@rows%@tiles,1,0),1,1) AS x;

Each query returns the same thing except that I renamed “N” to “RN”.

Figure 2.2: NTally Results

RN    Tile
----- ------
1     1
2     1
3     1
4     1
5     2
6     2
7     2
8     3
9     3
10    3
11    4
12    4
13    4

The two functions used below are dbo.NTally, built on a physical numbers table, and dbo.NTallyRangeAB, built on rangeAB.
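If you don’t have the attached code, here is one guess at how core.NTallyRangeAB might be assembled: it simply wraps the ALTERNATIVE #2 logic from Figure 2.1 and exposes the RN, Tile and TileOP columns used below. Treat it as a sketch under that assumption, not a copy of the attached function; dbo.NTally, the physical-table version, wraps the ALTERNATIVE #1 logic in the same way.

--==== Hedged sketch only: one possible wrapper around the Figure 2.1 ALTERNATIVE #2 logic;
--==== the attached core.NTallyRangeAB may differ in detail.
CREATE OR ALTER FUNCTION core.NTallyRangeAB (@tiles BIGINT, @rows BIGINT, @desc BIT)
RETURNS TABLE AS
RETURN
SELECT      RN     = ROW_NUMBER() OVER (ORDER BY t.RN),
            Tile   = t.RN,
            TileOP = t.OP -- descending ("flipped") tile numbers, discussed later
FROM        core.rangeAB(1,@tiles,1,1)                                  AS t
CROSS APPLY (VALUES(IIF(@desc=0,t.RN,t.OP)))                            AS d(D)
CROSS APPLY core.rangeAB(1,@rows/@tiles+IIF(d.D<=@rows%@tiles,1,0),1,1) AS x;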
Simple Demo

The NTally table is not ready to replace NTILE out of the box; a small amount of assembly is required, and that is what the RN column is for. Below is an example of how to “tile” a table using NTILE, followed by examples which leverage dbo.NTally and dbo.NTallyRangeAB (which, in turn, leverages dbo.rangeAB).

The NTally solutions are easy to assemble: instead of “NTILE(@tiles) OVER (ORDER BY s.SomeID)” we’ll use ROW_NUMBER inside a CTE with the same ORDER BY. With the row number in place (RN) we can use it to join to our NTally table and return a “tiled” result set without NTILE.

Note that the function needs the number of rows ahead of time; a simple COUNT of the rows, passed to the function as the @rows parameter, does the trick. This adds a small penalty but we still see huge time and IO gains nonetheless.

Figure 3.1: NTally Functional Demo

--==== 1. Sample Data
IF OBJECT_ID('tempdb..#SomeTable') IS NOT NULL DROP TABLE #SomeTable;
CREATE TABLE #SomeTable
(
  SomeId BIGINT PRIMARY KEY
);
INSERT #SomeTable(SomeId) SELECT f.N FROM dbo.GetNums(1,14) AS f;

--==== 2. Parameters and examples
DECLARE  @tiles BIGINT = 5,
         @rows  BIGINT = 14,
         @desc  BIT    = 0;

PRINT CHAR(13)+'NTILE CLASSIC'+CHAR(13)+REPLICATE('-',90);
  SELECT s.SomeId, Tile = NTILE(@tiles) OVER (ORDER BY s.SomeID)
  FROM   #SomeTable AS s;

PRINT CHAR(13)+'NTally'+CHAR(13)+REPLICATE('-',90);
  WITH anchor AS
  (
   SELECT s.SomeID, ROW_NUMBER() OVER (ORDER BY s.SomeID) AS rn
   FROM   #SomeTable AS s
  )
  SELECT      a.SomeID, nt.Tile
  FROM        anchor AS a
  CROSS JOIN  dbo.NTally(@tiles, (SELECT COUNT(*) FROM #SomeTable)) AS nt
  WHERE       a.rn = nt.rn

PRINT CHAR(13)+'NTallyRangeAB'+CHAR(13)+REPLICATE('-',90);
  WITH anchor AS
  (
   SELECT s.SomeID, ROW_NUMBER() OVER (ORDER BY s.SomeID) AS rn
   FROM   #SomeTable AS s
  )
  SELECT      a.SomeID, nt.Tile
  FROM        anchor AS a
  CROSS APPLY core.NTallyRangeAB(@tiles, (SELECT COUNT(*) FROM #SomeTable),@desc) AS nt
  WHERE       a.rn = nt.rn;

Each query returns the correct result set:

Figure 3.2: Results

SomeID  Tile
------- -------
1       1
2       1
3       1
4       2
5       2
6       2
7       3
8       3
9       3
10      4
11      4
12      4
13      5
14      5

If you have kept up with this series you can think of dbo.NTally as an indexed function and dbo.NTallyRangeAB as a function that leverages the virtual index. Now for performance.

Test #1: 1,000,000 Rows – Time and IO

I’m running SQL 2019, which allows for batch mode on rowstore. I’m going to run the NTILE query twice: first as-is, then with batch mode disabled. I can’t compete with batch mode but we’ll get close. We’ll compare the results with the NTally functions. Unlike the NTILE examples here, the NTally versions get faster with a parallel plan; I excluded the parallel NTILE test because it was too awful.

Note, too, that the optimizer tries to use a Hash Match join algorithm for joining the row numbers. A merge join is the right join for this task; in the examples below I’ll let the optimizer pick the join, then force a merge join so you can see the performance difference.

Figure 4: 1M Row Test

IF OBJECT_ID('tempdb..#SomeTable') IS NOT NULL DROP TABLE #SomeTable;
CREATE TABLE #SomeTable
(
  SomeId BIGINT PRIMARY KEY
);
INSERT #SomeTable(SomeId) SELECT f.N FROM dbo.GetNums(1,1000000) AS f;

SET STATISTICS TIME, IO ON;
  DECLARE @Tiles BIGINT = 3, @NT BIGINT;

  PRINT CHAR(13)+'NTILE CLASSIC with Batch Mode'+CHAR(13)+REPLICATE('-',90);
  SELECT @NT = NTILE(@Tiles) OVER (ORDER BY s.SomeID)
  FROM   #SomeTable AS s;

  PRINT CHAR(13)+'NTILE CLASSIC No Batch Mode'+CHAR(13)+REPLICATE('-',90);
  SELECT @NT = NTILE(@Tiles) OVER (ORDER BY s.SomeID)
  FROM   #SomeTable AS s
  OPTION(USE HINT('DISALLOW_BATCH_MODE')); -- Crawls when Parallel

  PRINT CHAR(13)+'dbo.NTALLY SERIAL, Hash Join'+CHAR(13)+REPLICATE('-',90);
  WITH anchor AS
  (
   SELECT s.SomeID, ROW_NUMBER() OVER (ORDER BY s.SomeID) AS rn
   FROM   #SomeTable AS s
  )
  SELECT      @NT = nt.Tile -- a.SomeID, nt.* --AS TileGroup
  FROM        anchor AS a
  CROSS APPLY dbo.NTally(@Tiles, (SELECT COUNT(*) FROM #SomeTable)) AS nt
  WHERE       a.rn = nt.rn
  OPTION (MAXDOP 1);

  PRINT CHAR(13)+'dbo.NTally SERIAL, Merge Join'+CHAR(13)+REPLICATE('-',90);
  WITH anchor AS
  (
   SELECT s.SomeID, ROW_NUMBER() OVER (ORDER BY s.SomeID) AS rn
   FROM   #SomeTable AS s
  )
  SELECT      @NT = nt.Tile -- a.SomeID, nt.* --AS TileGroup
  FROM        anchor AS a
  INNER 
  MERGE JOIN dbo.NTally(@Tiles, (SELECT COUNT(*) FROM #SomeTable)) AS nt
    ON       a.rn = nt.rn
  OPTION (MAXDOP 1);

  PRINT CHAR(13)+'dbo.NTally PARALLEL, Merge Join'+CHAR(13)+REPLICATE('-',90);
  WITH anchor AS
  (
   SELECT s.SomeID, ROW_NUMBER() OVER (ORDER BY s.SomeID) AS rn
   FROM   #SomeTable AS s
  )
  SELECT      @NT = nt.Tile -- a.SomeID, nt.* --AS TileGroup
  FROM        anchor AS a
  INNER 
  MERGE JOIN dbo.NTally(@Tiles, (SELECT COUNT(*) FROM #SomeTable)) AS nt
    ON       a.rn = nt.rn
  CROSS JOIN dbo.make_parallel() AS x;

  PRINT CHAR(13)+'dbo.NTallyRangeAB SERIAL, Hash Join'+CHAR(13)+REPLICATE('-',90);
  WITH anchor AS
  (
   SELECT s.SomeID, ROW_NUMBER() OVER (ORDER BY s.SomeID) AS rn
   FROM   #SomeTable AS s
  )
  SELECT      @NT = nt.Tile -- a.SomeID, nt.* --AS TileGroup
  FROM        anchor AS a
  CROSS APPLY core.NTallyRangeAB(@Tiles, (SELECT COUNT(*) FROM #SomeTable),0) AS nt
  WHERE       a.rn = nt.rn
  OPTION (MAXDOP 1);

  PRINT CHAR(13)+'dbo.NTallyRangeAB SERIAL, Merge Join'+CHAR(13)+
    REPLICATE('-',90);
  WITH anchor AS
  (
   SELECT s.SomeID, ROW_NUMBER() OVER (ORDER BY s.SomeID) AS rn
   FROM   #SomeTable AS s
  )
  SELECT      @NT = nt.Tile -- a.SomeID, nt.* --AS TileGroup
  FROM        anchor AS a
  INNER 
  MERGE JOIN core.NTallyRangeAB(@Tiles, (SELECT COUNT(*) FROM #SomeTable),0) AS nt
    ON       a.rn = nt.rn
  OPTION (MAXDOP 1);

  PRINT CHAR(13)+'dbo.NTallyRangeAB Parallel, Merge Join'+CHAR(13)+
    REPLICATE('-',90);
  WITH anchor AS
  (
   SELECT s.SomeID, ROW_NUMBER() OVER (ORDER BY s.SomeID) AS rn
   FROM   #SomeTable AS s
  )
  SELECT      @NT = nt.Tile -- a.SomeID, nt.* --AS TileGroup
  FROM        anchor AS a
  INNER 
  MERGE JOIN core.NTallyRangeAB(@Tiles, (SELECT COUNT(*) FROM #SomeTable),0) AS nt
    ON       a.rn = nt.rn
  CROSS JOIN dbo.make_parallel() AS x;
SET STATISTICS TIME, IO OFF;
GO

Figure 4.2 Results:

NTILE CLASSIC with Batch Mode
------------------------------------------------------------------------------------------
Table '#SomeTable_00000001CF14'. Scan count 1, logical reads 2108...
Table 'Worktable'. Scan count 1, logical reads 46274, physical reads 0...

 SQL Server Execution Times: CPU time = 281 ms,  elapsed time = 284 ms.

NTILE CLASSIC No Batch Mode
------------------------------------------------------------------------------------------
Table 'Worktable'. Scan count 3, logical reads 2867958, physical reads 0... 
Table '#SomeTable_00000001CF14'. Scan count 1, logical reads 2108...

 SQL Server Execution Times: CPU time = 2375 ms,  elapsed time = 2378 ms.

dbo.NTALLY SERIAL, Hash Join
------------------------------------------------------------------------------------------
Table '#SomeTable_00000001CF14'. Scan count 7, logical reads 14750
Table 'tally'. Scan count 3, logical reads 1756, physical reads 0...

 SQL Server Execution Times: CPU time = 984 ms,  elapsed time = 997 ms.

dbo.NTally SERIAL, Merge Join
------------------------------------------------------------------------------------------
Table '#SomeTable_00000001CF14'. Scan count 7, logical reads 14750, physical reads 0...
Table 'Worktable'. Scan count 0...
Table 'tally'. Scan count 3, logical reads 599, physical reads 0...

 SQL Server Execution Times: CPU time = 922 ms,  elapsed time = 917 ms.

dbo.NTally PARALLEL, Merge Join
------------------------------------------------------------------------------------------
Table 'Worktable'. Scan count 0...
Table 'Worktable'. Scan count 0...
Table 'tally'. Scan count 9, logical reads 601, physical reads 0...
Table '#SomeTable_00000001CF14'. Scan count 7, logical reads 14750, physical reads 0

 SQL Server Execution Times: CPU time = 2408 ms,  elapsed time = 790 ms.

dbo.NTallyRangeAB SERIAL, Hash Join
------------------------------------------------------------------------------------------
Table '#SomeTable_00000001CF14'. Scan count 7, logical reads 14750, physical reads 0

 SQL Server Execution Times: CPU time = 906 ms,  elapsed time = 946 ms.

dbo.NTallyRangeAB SERIAL, Merge Join
------------------------------------------------------------------------------------------
Table '#SomeTable_00000001CF14'. Scan count 7, logical reads 14750, physical reads 0...
Table 'Worktable'. Scan count 0...

 SQL Server Execution Times: CPU time = 969 ms,  elapsed time = 977 ms.

dbo.NTallyRangeAB Parallel, Merge Join
------------------------------------------------------------------------------------------
Table 'Worktable'. Scan count 0, logical reads 0...
Table '#SomeTable_00000001CF14'. Scan count 7, logical reads 14750, physical reads 0...

 SQL Server Execution Times: CPU time = 1639 ms,  elapsed time = 535 ms.

Completion time: 2020-12-10T20:02:28.6874120-06:00

NTILE with batch mode runs for 284ms with a serial plan. Without batch mode, however, NTILE just can’t compete with dbo.NTally or dbo.NTallyRangeAB. First, check those reads – 2,867,958 – and it runs for nearly 2.4 seconds. Each NTally solution runs for under 1 second, with dbo.NTallyRangeAB the winner at roughly half a second and 14,750 reads – a 4X speed increase and a 99.5% I/O reduction.

No Looking Back

I snuck a bonus feature into dbo.NTallyRangeAB. You may have noticed a third column named TileOP: this is just the tile group in descending order with the distribution flipped, i.e. the “overflow” tiles are assigned to the highest tile numbers first. For this we simply return TileOP instead of the Tile column. Let’s use our 14-row test from Figure 3.2.

Figure 5.1 NTallyRangeAB TileOP Demo:

DECLARE  @tiles BIGINT = 5,
         @rows  BIGINT = 14,
         @desc  BIT    = 0;

WITH anchor AS
(
 SELECT s.SomeID, ROW_NUMBER() OVER (ORDER BY s.SomeID) AS rn
 FROM   #SomeTable AS s
)
SELECT      a.SomeID, nt.Tile, nt.TileOP
FROM        anchor AS a
CROSS APPLY core.NTallyRangeAB(@tiles, (SELECT COUNT(*) FROM #SomeTable),@desc) AS nt
WHERE       a.rn = nt.rn
ORDER BY    a.SomeId

I returned the Tile and TileOP columns so you can see the difference.

Figure 5.2. Results:

SomeID   Tile   TileOP
-------- ------ --------
1        1      5
2        1      5
3        1      5
4        2      4
5        2      4
6        2      4
7        3      3
8        3      3
9        3      3
10       4      2
11       4      2
12       4      2
13       5      1
14       5      1

Some interesting lessons here. First, NTILE teamed with batch mode is nasty fast and tough to beat. If batch mode is not an option then NTally and NTallyRangeAB are the only game in town as far as I’m concerned.

This concludes Part 4 of this series. Forgive any typos – this is the first/last draft. Cheers!

Virtual Indexing Part 2: Unexpected Pitfalls

Diving deeper into the world of virtual indexing while being reminded that set-based code is not always bulletproof

Introduction                                                 

In Virtual Indexing Part 1 I introduced the idea of the virtual index: the ordered (AKA “suitably sorted”) stream of numbers returned by ROW_NUMBER which allows you to perform operations like grouping and window aggregations without the help of an index or a sort in the execution plan. Today I want to draw your attention to some hidden CTE tally table dangers that are easy to detect and resolve but can be catastrophic when missed.

Problem #1: Gaps and the Hidden Row Explosion

Identifying gaps and islands in sequences is a common SQL task. One of the first things I learned to do with a tally table is identify gaps. It’s an easy concept to grasp, easier than identifying islands. In the query below we have a table variable populated with the numbers 1 to 10, but with the numbers 3,5 and 9 missing. Let’s use fnTally to identify the missing numbers.

Figure 1. Using fnTally to find gaps in a sequence

DECLARE @t TABLE (N BIGINT NOT NULL PRIMARY KEY);
INSERT  @t(N) VALUES(1),(2),(4),(6),(7),(8),(10);

DECLARE @Max BIGINT = 10;

SELECT    f.N
FROM      dbo.fnTally(1,@Max) AS f
LEFT JOIN @t AS t ON t.N = f.N
WHERE     t.N IS NULL;

This returns the expected results: [3 5 9]. Now the execution plan:

Figure 1.1. fnTally gaps in a sequence query execution plan

fnTally returns 10 rows, as expected, to create the numbers 1 to 10. But look at the clustered index scan against the table variable – 70 rows. That is tragic: 70 rows scanned to identify 3 missing values! Can you identify the problem? Take a minute to see if you can figure this out…

@t has 7 rows and fnTally is returning 10 rows: 7*10=70. The optimizer knows that the clustered index on @t is unique but is not taking advantage of the fact that the numbers returned by fnTally are also unique and ordered. It is behaving as if a one-to-many relationship between t.N and f.N were possible, so the execution engine is forced to compare every value in @t to every value returned by fnTally. As sad as this might make you feel at first, let not your heart be troubled. The workarounds are endless; the trick, really, is to understand which one is best and why.

Table Operator Modifications

Let’s start with the three solutions below, listed in the order I would recommend them. Each will get the job done with minimal changes to the underlying logic. The first is to simply add an additional filter: f.N <= @Max; the second is to replace fnTally with a physical tally table; the third is to use a join hint.

Figure 2. fnTally gaps in a sequence solutions

DECLARE @t TABLE (N BIGINT NOT NULL PRIMARY KEY);
INSERT  @t(N) VALUES(1),(2),(4),(6),(7),(8),(10);

DECLARE @Max BIGINT = 10;

--==== Solution #1: Add a WHERE filter for fnTally.N
SELECT    f.N
FROM      dbo.fnTally(1,@Max) AS f
LEFT JOIN @t AS t ON t.N = f.N
WHERE     f.N <= @Max -- solves the fake one-to-many issue
  AND     t.N IS NULL;

--==== Solution #2: Use a persisted tally table with a WHERE filter
SELECT    f.N
FROM      dbo.tally AS f
LEFT JOIN @t        AS t 
  ON      t.N = f.N
WHERE     f.N <= @Max -- required because there is no TOP(@Max) clause
  AND     t.N IS NULL;

--==== Solution #3: Add a Query Hint to the left join
SELECT    f.N
FROM      dbo.fnTally(1,@Max) AS f
LEFT MERGE JOIN @t AS t -- use Merge Join algorithm instead of loop join
  ON      t.N = f.N
WHERE     t.N IS NULL;

Each query returns the correct result and in each case only 7 rows total are retrieved from @t instead of 7 rows for each number returned by fnTally.

Figure 3. Execution plan for the table operator solutions

I didn’t call it out in the screenshot, but with each solution only 7 rows were retrieved instead of 70. The first solution is the easiest: just add an additional WHERE filter. This may seem redundant since it isn’t logically necessary, but it works and it’s a simple fix. If it were always this simple I could end the discussion here, but it’s not. The second solution is to use a physical tally table (dbo.tally in this example) instead; this solves the problem here and has other advantages I’ll cover momentarily. The third solution is to force a merge join with a query hint; this works but is my last choice. There have been times where a query hint is the only option, which is why I’m calling it out now.

Anti-Join Solutions

You may have seen the term anti-join in an SSMS execution plan, but it is not a commonly used phrase in the RDBMS world. It is, however, the best way to describe a scenario where you need all items from one table that do not exist in a second table.

Figure 4. Anti-join Venn Diagram

Reviewing the execution plan for the first set of solutions above (Figure 2), you can see that each used a Left Outer Join operator to identify the missing rows, followed by a filter to exclude the others. Let’s examine two more solutions which leverage the NOT EXISTS logical operator and the EXCEPT set operator.

Figure 5. Anti-join solutions

DECLARE @t TABLE (N BIGINT NOT NULL PRIMARY KEY);
INSERT  @t(N) VALUES(1),(2),(4),(6),(7),(8),(10);

DECLARE @Max BIGINT = 10;

--==== Solution #4: EXCEPT >> Merge join handles the filter
SELECT f.N FROM dbo.fnTally(1,@Max) AS f
EXCEPT
SELECT t.N FROM @t AS t;

--==== Solution #5: NOT EXISTS >> Identical to above but with nested loop join
SELECT f.N
FROM   dbo.fnTally(1,@Max) AS f
WHERE  NOT EXISTS (SELECT t.N FROM @t AS t WHERE t.N = f.N)
AND    f.N <= @Max; -- Optional, forces loop join

Figure 5.1. Anti-join solutions execution plans

Both produce almost identical execution plans and are both efficient. The EXCEPT solution is the best IMO because it’s the cleanest.

Parallelism

Now let’s safely force a parallel execution plan against the left join solution with the WHERE filter (Figure #2, Solution 1) then, again, with our EXCEPT anti-join from Figure #5.

Figure 6. Parallel execution test

DECLARE @t TABLE (N BIGINT NOT NULL PRIMARY KEY);
INSERT  @t(N) VALUES(1),(2),(4),(6),(7),(8),(10);

DECLARE @Max BIGINT = 10;

--==== Good when serial, bad for parallel
SELECT     f.N
FROM       dbo.fnTally(1,@Max) AS f
LEFT JOIN  @t AS t ON t.N = f.N
CROSS JOIN dbo.make_parallel() AS x
WHERE      t.N IS NULL
AND        f.N <= @Max

--==== Winner for serial and parallel
SELECT f.N FROM dbo.fnTally(1,@Max) AS f 
EXCEPT
SELECT t.N FROM @t AS t CROSS JOIN dbo.make_parallel() AS x;

Below is the portion of each plan where the rows are retrieved. Even with the WHERE f.N <= @Max clause in place, which solved the 70-row explosion problem earlier with a serial plan, the row explosion returns with parallel execution. The EXCEPT solution, however, does not have this problem with a serial or parallel plan.

Figure 6.1. Parallel execution performance plan

Problem #2: Gaps, Left Join Aggregation and Parallelism

Keeping with the theme of gaps let’s use the same table variable from earlier but include duplicate and missing values. Using this sample data:

Figure 7. Problem #2 sample data

DECLARE @t TABLE (N BIGINT INDEX IX1 CLUSTERED NOT NULL);
INSERT  @t(N) VALUES(1),(1),(2),(2),(2),(2),(4),(4),(6),(7),(8),(8),(8),(10),(10),(10);

The expected output would be:

Figure 7.1. Problem #2 expected output

N     Total
----- -------
1     2
2     4
3     0
4     2
5     0
6     1
7     1
8     3
9     0
10    3

Like earlier, the query below suffers the aforementioned hidden row explosion; instead of retrieving 16 rows from @t, it retrieves 160 rows (16*@Max).

Figure 8. Problem #2, Solution #1 (fail)

DECLARE @Max BIGINT = 10;

SELECT    f.N, Total = COUNT(t.N)
FROM      dbo.fnTally(1,@Max) AS f
LEFT JOIN @t                  AS t
  ON      t.N = f.N -- could not shake the row explosion
GROUP BY  f.N;

Again, an f.N <= @Max WHERE filter or join condition does the trick, and t.N <= @Max works just as well in either spot. The problem, again, is that this only works when the optimizer chooses a serial execution plan. That means that, in this case, you must always use OPTION (MAXDOP 1) and be banished to eternal serial processing. C’mon man!
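Here is a minimal sketch of that WHERE-filter fix, assuming the Figure 7 sample data, @Max = 10 and the same fnTally from earlier; as noted, it only behaves when the plan stays serial.

--==== Hedged sketch of the WHERE-filter fix described above (serial plans only)
DECLARE @t TABLE (N BIGINT INDEX IX1 CLUSTERED NOT NULL);
INSERT  @t(N) VALUES(1),(1),(2),(2),(2),(2),(4),(4),(6),(7),(8),(8),(8),(10),(10),(10);

DECLARE @Max BIGINT = 10;

SELECT    f.N, Total = COUNT(t.N)
FROM      dbo.fnTally(1,@Max) AS f
LEFT JOIN @t AS t
  ON      t.N = f.N
WHERE     f.N <= @Max      -- removes the row explosion
GROUP BY  f.N
OPTION (MAXDOP 1);         -- but only while the plan stays serial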

Problem #2 Workarounds

A merge join hint will solve the problem but cannot get a parallel execution plan; in other words, the hint takes care of the row explosion, but even make_parallel will not succeed at forcing a parallel plan. The persisted tally table solution, on the other hand, does not experience the row explosion issue and enjoys both serial and parallel execution.

Figure 9. Problem #2 – Solutions #2 & #3

DECLARE @t TABLE (N BIGINT INDEX IX1 CLUSTERED NOT NULL);
INSERT  @t(N) VALUES(1),(1),(2),(2),(2),(2),(4),(4),(6),(7),(8),(8),(8),(10),(10),(10);

DECLARE @Max BIGINT = 10;

-- Solution #2 - works, but requires a query hint & cannot go parallel
SELECT     f.N, Total = COUNT(t.N)
FROM       dbo.fnTally(1,@Max) AS f
LEFT MERGE JOIN @t             AS t
  ON       t.N = f.N
CROSS JOIN dbo.make_parallel() AS x
GROUP BY     f.N;

-- Solution #3 - good serial & parallel
SELECT     f.N, Total = COUNT(t.N)
FROM       dbo.tally AS f
LEFT JOIN  @t AS t
  ON       t.N = f.N
CROSS JOIN dbo.make_parallel() AS x
WHERE      f.N <= @Max
GROUP BY   f.N;

Figure 9.1. Problem #2 – Solutions #2 & #3 execution plans

In each case the row explosion is gone, but only the physical tally table solution (dbo.tally) solves the problem while preserving the optimizer’s option to choose a parallel plan. This is an example of where fnTally simply cannot compete with dbo.tally.

Set-Based != Bulletproof (Conclusion)

If there is a lesson to be learned today it’s that the mere existence of a numbers table, or the text “tally” in your code, does not guarantee that it will be fast. Furthermore, just because your set-based solution is fast doesn’t mean it can’t be faster – exponentially faster in many cases, as you will see in future articles. In this article we watched an innocent query to identify gaps in a sequence go terribly wrong. Fortunately, there are many workarounds, provided that you can identify the problem when it arises. Hopefully you are now better prepared.

Which is faster again, fnTally or a persisted tally table? It depends. As per usual.