## Virtual Indexing Part 8: The Indexed Splitter Function

The fastest way to split a string is to have already done so.

## Introduction

Happy 2021! Indexed views and tally tables – a virtually unexplored topic, one I will write about at length this year. This will be a quick read and there is likely to be at least a couple typos; time is short so today is about quality, not quantity. In Part 3: The Indexed Function I introduced you to I said “the fastest way to perform a calculation is to have already done so;” in this installment I will demonstrate that the fastest way to split a string is also to have already done so.

When I introduced the Indexed function I used prime numbers as my example. 7 is always a prime number, Factorial(5) always equals 120. Creating an Indexed prime numbers, factorial or any function where a function always returns the same output for a given argument. What if we wanted a more complex indexed function, such as one that splits a string?

I know about database normalization but I am not responsible for designing most of the databases I work with. If you’ve been in this racket for a couple fortnights you have run across that vender DB written by great application developers with terrible back-end databases. It’s not uncommon to see fields that contain a delimited list – ‘1,4,13,99’ instead of a single atomic value. You can write an ETL package that leverages a good string splitter (such as DelimitedSplit8K or DelimitedSplit8k_Lead) to regularly split and copy the values into a correctly normalized table. In this article I will show you a much faster, maintenance-free alternative which leverages a Numbers Table and an indexed view.

## Introducing The “Indexed Splitter”

For this you will need a tally table with at least as many rows as the longest string you intend to split. 8,000 is enough unless you need to split string longer than 8K. My tally table (dbo.tally) has 1,000,000 rows. First the sample data.

Sample data

``````CREATE TABLE dbo.Test (ID INT, Column1 VARCHAR(20));
INSERT INTO  dbo.Test (ID,Column1)
VALUES (1,'ORM;PS;SUP'),(2,'ABC;XYZ;999;123');``````

Next we’re going to borrow Jeff Moden’s DelimitedSplit8K logic to create a set-based splitter that leverages permanent tally table (e.g. an regular ol properly indexed numbers table named dbo.tally. There’s a small performance hit compared to delimitedSplit8K but the upside makes it more than worth it, as you will see in a moment. Here’s our splitter as a view

The Splitter as a View

``````CREATE OR ALTER VIEW dbo.TestSplit WITH SCHEMABINDING AS
SELECT
Id   = t.ID,
item =
SUBSTRING
(
t.Column1,
tt.N+SIGN(tt.N-1),
ISNULL(NULLIF((CHARINDEX(';',t.Column1,tt.N+1)),0),LEN(t.Column1)+1)-(tt.N)-SIGN(tt.N-1)
),
ItemIndex = tt.N+1
FROM       dbo.Test  AS t
CROSS JOIN dbo.tally AS tt
WHERE      tt.N <= LEN(t.Column1)
AND        (tt.N = 1 OR SUBSTRING(t.column1,tt.N,1) = ';');
GO``````

The values are split and returned with their associated ID. This logic is going to be applied to an index view which means we can’t include parameters; the delimiter and source of the string need to be embedded in our logic. Above we are splitting the contents of dbo.test.Column1 using a semicolon as the delimiter.

``SELECT f.Id, F.Item FROM dbo.TestSplit AS f;``

Below is exactly what we expected.

Function Output

Next the execution plan.

That’s a good plan: set-based, linear and fast. There are faster splitters out there but none that work without the APPLY table operator. APPLY is not allowed in Indexed Views. Subqueries either. My splitter logic doesn’t use APPLY , subqueries or any other logic that prevents me from adding an index to my view so let’s see if we can add a unique clustered index and turn this view into an indexed Split where the split happens ahead of time and only once (unless the value is modified or deleted).

``CREATE UNIQUE CLUSTERED INDEX uq_cl__testSplit ON dbo.TestSplit(Id,Item)``

No error. Let’s run the same SELECT query from earlier and check the execution plan.

It doesn’t get cleaner or faster than that. This is a powerful example of the what you can accomplish when using a physical numbers table to build an Indexed view. This barely scratches the surface as I hope to demonstrate in the near future.

## Make my index a double

In 2020 “make it a double” became:

I digress.

Now that your indexed function is in place you can expand it’s power by adding more indexes! I’ll have my clustered index with a nonclustered chaser. Here we’ll create an index for item.

``CREATE NONCLUSTERED INDEX nc__testSplit__item ON dbo.TestSplit(Item);``

Now a query that filters on item.

``````SELECT f.*
FROM   dbo.TestSplit AS f WITH (NOEXPAND)
WHERE  f.item <> '999,123';``````

To my chagrin, the optimizer often ignores nonclustered indexes on indexed views which is the case here. This is why I’m using NOEXPAND. Nonetheless, the execution plan is even better than before: an indexed seek against a narrower and lighter nonclustered index.

## The Indexed Splitter Function

Let’s add our RowNumber column and wrap the guy in an iTVF for a Nasty fast index function that splits a string ahead of time and maintains itself as needed! We’ll include the option to exclude one or more values using an @exclude parameter.

The Function

``````CREATE FUNCTION dbo.itvf_TestSplit
(
@exclude VARCHAR(20)
)
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT f.ID, f.Item, f.ItemIndex
FROM   dbo.TestSplit AS f WITH (NOEXPAND)
WHERE  f.Item <> @exclude;
GO``````

Let’s give’r a spin and check the execution plan.

Execution Plan

Same as before but now we are accepting parameters which means we now have an indexed function that can split strings ahead of time. Not even STRING_SPLIT or a CLR can compete here.

## What about an “Item Number”?

Sometimes you need the item’s ordinal in the string; for this we need an item number (labeled as “ItemNumber” in my function logic. To return an item number (without a sort in the execution plan) some assembly will be required. DelimitedSplit8K and DelimitedSplit8K_LEAD both use ROW_NUMBER() to produce the ItemNumber column. ROW_NUMBER is not allowed in indexed views which is why I included the “ItemIndex” . It represents the item’s position in the string, something I normally don’t need. Here it’s perfect however – it allows me to add an ItemNumber while still avoiding a sort in my final plan. Not a big deal with strings 8K or less, but it is for longer strings (something an indexed function can handle at no extra cost.) First the new index:

``EATE NONCLUSTERED INDEX nc_poc__testSplit__ItemIndex ON dbo.TestSplit(Id, ItemIndex);``

With that index in place let’s run a query which includes the item number.

``````SELECT
Id= split.Id,
ItemNumber = ROW_NUMBER() OVER (PARTITION BY split.ID ORDER BY split.ItemIndex),
split.Item
FROM dbo.itvf_TestSplit('999,123') AS split;``````

Output

Execution plan

Still a fantastic execution plan the same low cost as before with more complexity. Again, the calculations happened already, this function is leveraging one or more indexes to retrieve, not compute, the required value.

## Conclusion

The Indexed Split is another game changer yet barely scratches the surface of the power of the Virtual Index and the Indexed Function. When you add a physical numbers tally to you Indexed views they can do more than pre-join or pre-aggregate values. Here we introduced the world’s first “indexed split” and we’re still just getting started! Tomorrow I’m going to introduce the Indexed Hierarchy. Thanks for reading!

## Virtual Indexing Part 6: Combinatorial Index Optimization

In Part 5 we learned how to return a distinct set of numbers from a one Billion row table in under 1/10th of a second. We can do better!

## Disclaimer

I’ve never made it to Part 6 of a technical series. For me, a huge obstacle to getting quality content out in a timely manner is the proof-reading and final editing process. Like a few of my friends in this field, I have many new techniques and programming concepts that I’m dying to share which have been decaying in MS Word Docs and .sql files for years. I started my NTally article in 2014 for example.

In September, while quarantined and after listening to too much David Goggins, I decided to step up my writing (and research) game and simply began publishing the first draft of my work the moment I’m done authoring it… What I’m trying to say is, I’ve decided to err on the side of delivering content. This means there will be typos, now and in the future. Sorry. I feel better getting it done, however, typos and all, than taking my good ideas to the grave. If you are a writer/blogger or are aspiring to do so, I suggest following my lead and just get it done – It feels so much bettr! 😉

## Introduction

This series began with the introduction of the Virtual Index, the suitably sorted stream of numbers returned by T-SQL ROW_NUMBER which act very similar to an physical index with respect to sorting, grouping and filtering. Unlike a physical index, created using DDL, the virtual index is invoked with a SELECT statement against fnTally and goes away when you’re query is finished. No disk space to consume, fragmentation to address, nothing to check into your CI/CD system – just a magical, in-memory index (suitably sorted stream) which comes and goes as requested.

In Part 5 we reviewed how to return a distinct set of numbers from a one Billion row table in under 1/10th of a second using what I have coined, a multidimensional virtual index – that is, combining a virtual index with a real one speeds things up. More technically, we’re giving the optimizer a wider range of options than it would have with only one index in place.

So, if adding one virtual index to your query can speed things up, can we speed them up even further by adding more? I mean there basically free and can be pulled out of thin air – why not? That’s what we’re going to review today.

## Dimensionality

Seasoned SQL developers as well as anyone who has written MDX expressions is familiar with the concept of dimensionality. Using SSAS for example, we can write and MDX expression to return a matrix with two or more dimensions as shown below. The Month dimension is on the x-axis, AKA the columns(0) axis in MDX; the Vehicle dimension is on y-axis, AKA rows(1). For brevity I included two Month members and three Vehicle members. The numbers can represent anything (e.g. Sales, Inventory, etc.) Using MDX we can easily write an MDX expression to return something that looks like this:

Figure 1. Two-dimensional matrix with Vehicle on the x-axis, month(s) on the y-axis. Two-dimension matrix with a dimension for months and a second dimension for vehicles.

Now let’s say we wanted an deeper level of granularity for each vehicle color. We could add a third dimension for color to our expression to return something like this:

Figure 2. Three-dimensional matrix; a 3rd dimension added for color Here we’re added a third dimension for “color” to transform it into a three-dimension matrix.

What does this have to do with indexing or performance you ask? Let’s apply this same concept, leveraging RangeAB, to perform what I’ve coined, Combinatorial Index Optimization. Think of it as combinatorial optimization for virtual indexes and indexed functions.

## Combinatorial Index Optimization

Below is our query from Part 5 of this series where we reduced the the time it took to return a distinct set of numbers from a one billion set from 111 seconds using traditional methods, but only 0.07 seconds using a two-dimensional index.

``````  SELECT fnFilter.N1 --fnFilter.RN-(b.Mn-1)
FROM  (SELECT MIN(t.N), MAX(t.N) FROM #t AS t) AS b(Mn,Mx)
CROSS APPLY
(
SELECT      r.RN, r.N1
FROM        core.rangeAB(b.Mn,b.Mx,1,1) AS r
CROSS APPLY (SELECT TOP (1) 1 FROM #t AS t WHERE t.N = r.N1) AS z(x)
) AS fnFilter
GROUP BY fnFilter.N1, b.Mn
ORDER BY fnFilter.N1;``````

As you may recall, we reduced the runtime by using the tally table for the left side of the one-to-many relationship and only returning numbers that exist in the source table (#t). We’ve reduced the question from, “where is the next distinct value?” to a simple yes/no question: “Is 1 in here? Is 2 in here?” This is an example of a two-dimensional virtual index created by way of combinatorial index optimization.

Figure 3. Two-dimensional Virtual Index

## Virtual Indexing while wearing 3-D glasses

As I have said before, the great thing about a virtual index is that it’s free. Armed with a correctly constructed tally table function I can conjure up one any time I wish just like an AD&D Level 7 Magic User.

In the real-world, for a table with millions/billions of rows there would almost certainly be a table or other indexed object (e.g. an indexed view) that I could use to join to for filtering. Either way, using rangeAB, and it’s @Gap parameter we can add another index for an even cooler type of filtering. First let’s review our 2-D solution. Note the code and my comments:

Figure 4. How the Two-dimensional Virtual Index works

Next our 3-D solution.

``````DECLARE @Gap BIGINT = 100;

SELECT @N = fnFilter.N1
FROM        (SELECT MAX(t.N) FROM #t AS t) AS mx(N)
CROSS APPLY core.rangeAB(1,mx.N,@Gap,1) AS b
CROSS APPLY
(
SELECT r.RN, r.N1
FROM   core.rangeAB(b.N1,b.N2-1,1,1) AS r
CROSS APPLY (SELECT TOP (1) 1 FROM #t AS t WHERE t.N = r.N1) AS z(x)
) AS fnFilter
WHERE    EXISTS (SELECT 1 FROM #t AS t WHERE t.N BETWEEN b.N1 AND b.N2-1) -- \$\$\$
GROUP BY fnFilter.N1
ORDER BY fnFilter.N1;``````

This image will help explain the changes.

Figure 5. How the Three-dimensional Virtual Index works

Adding this 3rd dimension will dramatically improve performance in two distinct ways. First it’s going to reduce the amount of work required simply by ignoring entire partitions of our sequence once the EXISTS statement determines it’s safe to do so. As Figure 2 should help you understand a 2-D virtual index, here’s what a 3-D virtual index looks like.

Figure 6. How the Three-dimensional virtual index works

## Performance Test 1

For brevity I set @gap to 3 for the example above. Notice how, for 1-3 and 13-15, our second tally table (by order of appearance in this article) is pre-filtering entire subsets of work. Next the performance test. For this test I used the same test harness from Part 5 of this series to create one billion rows with numbers ranging from 1 to 10,000 with very few gaps. For the 3-D solution I set @g to 100.

Figure 7. Test #1 Results – 1 Billion, Dense

In this case the 3-D index, with a serial plan slowed things down just a smidgen because I purposely made sure there were few gaps. I included this test to demonstrate how, unlike the 2-D index, we get a faster result set with a parallel plan. This despite the fact that the additional rangeAB call couldn’t really filter anything. Adding an additional dimension allows the optimizer better multithreading options. That is the second distinct advantage of the 3-D index over a 2-D one.

## Performance Test 2

For this test I filled a table with the 50,000 numbers ranging from 1 to 1,000,000. Next I used my test harness to duplicate the rows until I had two Billion. The means my tally table must generate 1,000,000 rows for the left side of my query. This is where the 3-D index shines. I set my @gap parameter to 6400 for this test.

Figure 8. Test #2 Results – 2 Billion, Sparse 3-D virtual index with gap reduction saves the day.

The 2-D solution runs for 12 seconds regardless of how many CPUs you throw at it. The 3-D solution runs for 1.3 seconds serial, 0.396 seconds with a parallel plan. That’s correct, less than 1/2 a second to extract 50K distinct values from a set of 2,000,000,000.

## Conclusion

This, again, is what I mean by More Fast, Less Code. In Part 7 we’ll dive into what is arguably my favorite topic: Finite Opposite Numbers. You’ll want to be seated for that one.

## Virtual Indexing Part 3: The Indexed Function

“The fastest way to perform a calculation is to have already done so.”

“The fastest way to perform a calculation is to have already done so.”

## Introduction

Here’s an interesting problem: given a sequence of numbers between M and N, how many are prime numbers? How would you solve this and how efficient will your solution be if M=0 and N=10,000,000? On my PC I’m able to get the job done in about 3 seconds.

## Introducing the Indexed Function

One day while trying to figure out some recursive query I stumble across a SQL Server Central post about calculating a Fibonacci sequence where someone asked, “why not just store the values in a table?” It seems obscenely obvious to me now but at the time, but it blew me away that I never thought of it. It still blows my mind how so few developers take advantage of this. Just populate a table (on-disk or in-memory) with the values you need, add indices as needed, then build an inline table valued function (iTVF) to retrieve the value with an index seek. This allows your function to run in O(1) time for each value passed to it. Indexed values are stored as a  suitably sorted stream which makes it possible to retrieve the value(s) without a scan. A web search for “create Fibonacci function“,  “function for prime numbers” or something similar will generally return various algorithms (mostly recursive) but scarce mentions of pre-calculating problems where the values never change. Doesn’t Fibonacci(7) always 13? Factorial(5) = 120? When is 7 not a prime number? I suspect that, the fastest way to perform a calculation is to have already done so, then save the result in an index for fast retrieval.

In Paul White’s article, Understanding and using APPLY (Part 1) he refers to an inline table valued function (iTVF) as “a view that accepts parameters”. With that in mind, what would be the best way to describe an indexed view (or table) that accepts parameters?  I’m thinking indexed function.

## Nasty Fast Primes

Primes is a great example since calculating them can be costly, especially when using a solution which “counts up” iteratively or recursively. For this exercise we’re going to build a prime number function that can determine if an integer between 1 and 10,000,000 is prime and do so in O(1) time and space. I know that 99.999% of you don’t need a prime number function so remember that this is for learning and to prove the power of this concept; this technique will work against any immutable set of values.You don’t have to play along or can test this with a smaller number of primes. I’m including the DDL used to create the function so you can keep me honest about the performance tests later in the article.

### The Setup Procedure

This proc indexes and populates a prime numbers table called idx.prime. There are two parameters: @Counter and @maxrows. @counter is the number that the proc starts at and should be left at 14. You can use @counter to set up batch inserts. If you execute usp_primes and accept the default values, the function will support 1-10M with a bit column indicating if the number is prime. You don’t have to do 10M rows but it’s ~300MB, uncompressed, if you do. For me it this runs 20 minutes without batching the insert; I did it all in one shot. Note that I keep my indexed functions in an “idx” schema, no harm in using dbo. Once the procedure is done you can create our prime numbers function as an indexed function. As mentioned in Part 2 of my PerfML series, functional purity is key so only an iTVF will do. If you’re playing along you will need a one million row tally table, you can get the DDL here and the stored procedure to create an indexed prime number function is attached below.

Note that it creates a schema name idx; this is where my indexed functions live but it’s not necessary.

### The Indexed Function (idx.fnPrimes)

Now we have a sorted set of numbers from 1 to 10M. Our function takes @N as an input and performs a one-row index seek.

Figure 1: idx.fnPrimes

``````CREATE FUNCTION idx.fnPrimes
(
@N BIGINT
)
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT p.N, p.PRIME FROM idx.prime AS p WHERE p.N = @N;
GO``````

To check if a number is prime:

Figure 2: Basic idx.fnPrimes demo

``SELECT p.N, p.PRIME FROM idx.fnPrimes(11) AS p;``

To return the numbers 1 to 7 along with their prime value we use fnTally to generate the stream of numbers and map them to fnPrimes.

``````SELECT      p.N, p.PRIME
FROM        dbo.fnTally(1,7)  AS f
CROSS APPLY idx.fnPrimes(f.N) AS p;``````

This returns: [(1 1),(2 1),(3 1),(4 0),(5 1),(6 0),(7,1)]; now a quick performance test.

## Performance Test #1: Calculate Primes 1 to 1 Million

The purpose of this test is to prove out the concept of the indexed function. This query will take the values 1 through 10,000,000 and determine if they are prime or not. I am forcing a serial plan since the serial plan actually performs best. @P variable captures the value so as to avoid SSMS spitting out 10 million rows and ruining the performance metrics.

Figure 3.1: idx.fnPrimes Performance Test

``````PRINT CHAR(13)+'fnTally serial'+CHAR(13)+REPLICATE('-',90);
GO
DECLARE @ST DATETIME = GETDATE(), @P BIT;
SELECT      @P = p.PRIME
FROM        dbo.fnTally(1,10000000) AS f
CROSS APPLY idx.fnPrimes(f.N)       AS p
OPTION (MAXDOP 1);
PRINT DATEDIFF(MS,@ST,GETDATE());
GO 3``````

Figure 3.2: idx.fnPrimes Performance Test Results

``````fnTally serial
------------------------------------------------------------------------------------------
Beginning execution loop
2660
2727
2750``````

Under 3 seconds for idx.fnPrimes to accept the numbers 1 to 10,000,000 and determine if which ones are prime.  This with a serial execution plan!

## Unexpected Virtual Index Pitfalls (continued…)

One motivation for giving a name to what I call, the “virtual index,” is to make the following type of problem (and solution) easier to explain. Part 2 of this series you were introduced to some unexpected virtual indexing pitfalls which can be tricky to identify. There’s another issue to watch out for people using SQL 2019 which includes batch mode over rowstore, or users or who leverage Itzik Ben-Gan’s batch mode processing trick to enjoy this functionality on SQL Server 2016+.  I use SQL 2019 and the optimizer leverages a window aggregate function that invokes batch mode over rowstore which is the culprit here. Below are two techniques to retrieve the numbers 1 to 7 and identify which are prime.

Figure 4: Hidden CTE Tally table row retrieval issue

``````--==== 1. Leveraging fnTally
SELECT      p.N, p.PRIME
FROM        dbo.fnTally(1,7)  AS f
CROSS APPLY idx.fnPrimes(f.N) AS p;

--====  2. Static Values
SELECT      p.N, p.PRIME
FROM        (VALUES(1),(2),(3),(4),(5),(6),(7)) AS f(N)
CROSS APPLY idx.fnPrimes(f.N)                  AS p; ``````

Each function returns the expected result set: [(1 1),(2 1),(3 1),(4 0),(5 1),(6 0),(7,1)]; The execution plan for the fnTally solution.

Figure 4.1: Execution Plan Details

fnTally retrieved 2700 rows to generate the numbers 1-7! The second query didn’t have this issue. Luckily there are more than a few ways to solve this.

### Solution 1: Disable batch mode over rowstore

Let’s temporarily disable batch mode over rowstore and try again.

Figure 5.1: Solution #1: Disable batch mode over rowstore query

``````SELECT      p.N, p.PRIME
FROM        dbo.fnTally(1,7)    AS f
CROSS APPLY idx.fnPrimes(f.N)   AS p;
OPTION(USE HINT('DISALLOW_BATCH_MODE'));``````

Figure 5.2: Solution #1: Execution plan

7 rows only, much better!

### Solution 2: make_parallel

In my experience Adam Machanic’s make_parallel always forces plans that don’t include batch mode over rowstore. I don’t know if it’s an option with make_parallel in play.

Figure 6.1: Solution #2: make_parallel

``````SELECT      p.N, p.PRIME
FROM        dbo.fnTally(1,7)    AS f
CROSS APPLY idx.fnPrimes(f.N)   AS p
CROSS JOIN  dbo.make_parallel() AS parallel_execution;``````

Below is the portion of the plan where the ROW_NUMBER sequence is created; again, only 7 rows

Figure 6.2: Solution #2:  Execution plan

### Solution 3: Use a physical tally table

The conventional wisdom seems to be that the CTE tally table is faster than a physical table pretty much all the time, a reasonable assumption since the CTE tally table is readless but physical tally tables do require IO. Here is an example of where the physical tally table (correctly indexed) has an edge.

Figure 7.1: Solution #3: Physical Tally Table

Note that both a TOP + ORDER BY, WHERE clause filter, or a both together will do just fine.

``````SELECT TOP(7) p.N, p.PRIME
FROM        dbo.tally         AS f
CROSS APPLY idx.fnPrimes(f.N) AS p
ORDER BY    f.N;``````

Again, only 7 numbers retrieved.

Figure 7.2: Solution #3: Execution Plan

This is one of many subtle differences between a persisted tally table and a CTE version. Let’s explore deeper.

## RangeAB and RangeAB2

We will conclude by running two performance tests against one million rows. The one million limit is because that’s what dbo.tally is capped at. For this test let two tally table functions: the first can be considered an “add-on” to dbo.fnTally named dbo.rangeAB. The second takes identical parameters and returns identical results, the only difference is that it leverages physical tally table (dbo.tally) instead of dbo.fnTally.

Both functions can be used to help enforce best practices with as little code as possible. Even better: Both functions return the exact result set given the same parameters. This gives you the ability to test your set-based code against both a physical and virtual tally table index simply adding or removing the number “2” from the end of “RangeAB.”

The functions and usage examples are located here. For the purposes of this article I will show you the execution plans for a basic usage scenario so we can compare execution plans. Both examples use one of the RangeAB functions to return a series of weeks. We’ll run these with “show actual execution plan” turned on.

Figure 8.1: Basic Usage Example

``````DECLARE
@startdate DATE   = '20200831',
@weeks     BIGINT = 5,
@row1      BIGINT = 1;

BEGIN
--==== RangeAB (fnTally version)
SELECT
WeekNumber     = f.RN,
WeeksLeft      = f.OP,
DaysFromMedian = ABS(f.DM),
FROM     dbo.rangeAB(@row1, @weeks*7-(1-@row1), 7, @row1) AS f
ORDER BY f.RN;

--==== RangeAB2 (dbo.tally version)
SELECT
WeekNumber = f.RN,
WeeksLeft  = f.OP,
DaysFromMedian = ABS(f.DM),
FROM dbo.rangeAB2(@row1, @weeks*7-(1-@row1), 7, @row1) AS f
ORDER BY f.RN;
END;``````

Figure 8.2: Results (for each)

``````    WeekNumber  WeeksLeft  DaysFromMedian  WeekStart  WeekEnd
----------- ---------- --------------- ---------- ----------
1           5          14              2020-08-31 2020-09-06
2           4          7               2020-09-07 2020-09-13
3           3          0               2020-09-14 2020-09-20
4           2          7               2020-09-21 2020-09-27
5           1          14              2020-09-28 2020-10-04
``````

### Execution plans

Ignore the 40/60% estimates, they aren’t accurate, both perform almost identically. The win is that we can use each to quickly determine which is better – dbo.tally or dbo.fnTally without any logic changes. fnTally uses the Itzik Ben-Gan style ROW_NUMBER over cross joins, the other gets its rows from dbo.tally.

Figure 8.3: Execution Plans

## Prime Numbers Test #1: RangeAB for a running total of Primes

Now back to our indexed function, idx.fnPrimes. This first example is a simple running total of prime numbers from 1 to 50. Ignore the @gap parameter for now.

``````DECLARE @gap BIGINT = 1,
@low BIGINT = 0,
@hgh BIGINT = 50;

--==== Solution #1: Primes running total over an Indexed Function + Virtual Index (RangeAB)
SELECT
N           = ((rnd.RN-2)*@gap+@low)+1,
TotalPrimes = SUM(p.PRIME+0)
FROM        dbo.rangeAB(@low,@hgh,@gap,1) AS rnd
CROSS APPLY dbo.rangeAB(0,rnd.N1,1,0)     AS r
CROSS APPLY idx.fnPrimes(r.RN)            AS p
GROUP BY    rnd.RN
ORDER BY    rnd.RN;

--==== Solution #2: Primes running total over two indexed functions
SELECT
N           = ((rnd.RN-2)*@gap+@low)+1,
TotalPrimes = SUM(p.PRIME+0)
FROM        dbo.rangeAB2(@low,@hgh,@gap,1) AS rnd
CROSS APPLY dbo.rangeAB2(0,rnd.N1,1,0)     AS r
CROSS APPLY idx.fnPrimes(r.RN)             AS p
GROUP BY    rnd.RN
ORDER BY    rnd.RN; ``````

Figure 10.1: Primes Test #1 Results (truncated for brevity)

``````N       TotalPrimes
------- -----------
1       1
2       2
3       3
4       3
5       4
6       4
7       5
8       5
...
46      15
47      16
48      16
49      16
50      16``````

## Prime Numbers Test #2: Performance

Here we’ll do two tests, first with a serial plan, then with a parallel plan. What we need to do is create @gap-sized groups and collect a between @low and @hgh along with a count of prime numbers in that group. We’ll compare the performance RangeAB vs RangeAB2. Again, the logic is identical except that the latter leverages dbo.tally. Note the parameters.

First to test serial execution using MAXDOP 1.

Figure 11.1 Serial Test

``````SET STATISTICS IO, TIME ON;

DECLARE
@gap BIGINT = 100000,
@low BIGINT = 0,
@hgh BIGINT = 1000000;

PRINT CHAR(10)+'fnTally version'+CHAR(10)+REPLICATE('-',90);
SELECT
L           = ((rnd.RN-2)*@gap+@low)+1,
H           = (rnd.RN-1)*@gap+@low,
TotalPrimes = SUM(p.PRIME+0)
FROM        dbo.rangeAB(@low,@hgh,@gap,1) AS rnd
CROSS APPLY dbo.rangeAB(0,rnd.N1,1,0)     AS r
CROSS APPLY idx.fnPrimes(r.RN)            AS p
GROUP BY   rnd.RN
ORDER BY   rnd.RN
OPTION (MAXDOP 1);

PRINT CHAR(10)+'Tally Function V2 - Indexed'+CHAR(10)+REPLICATE('-',90);
SELECT
L           = ((rnd.RN-2)*@gap+@low)+1,
H           = (rnd.RN-1)*@gap+@low,
TotalPrimes = SUM(p.PRIME+0)
FROM        dbo.rangeAB2(@low,@hgh,@gap,1) AS rnd
CROSS APPLY dbo.rangeAB2(0,rnd.N1,1,0)     AS r
CROSS APPLY idx.fnPrimes(r.RN)             AS p
GROUP BY   rnd.RN
ORDER BY   rnd.RN
OPTION (MAXDOP 1);
SET STATISTICS IO, TIME OFF;``````

Figure 11.2: Primes Test #2 Result Set

``````L           H            TotalPrimes
----------- ------------ -----------
1           100000       19187
100001      200000       38367
200001      300000       57548
300001      400000       76728
400001      500000       95910
500001      600000       115092
600001      700000       134272
700001      800000       153453
800001      900000       172634
900001      1000000      191814``````

Figure 11.3: Primes Test #2 Performance (Serial: Time & IO)

``````fnTally version
------------------------------------------------------------------------------------------
Table 'prime'. Scan count 0, logical reads 16843792, physical reads 0...
Table 'Worktable'. Scan count 0, logical reads 15101551, physical reads 0...

SQL Server Execution Times: CPU time = 23312 ms,  elapsed time = 23452 ms.

Tally Function V2 - Indexed
------------------------------------------------------------------------------------------
Table 'prime'. Scan count 0, logical reads 16843792, physical reads 0...
Table 'tally'. Scan count 2, logical reads 9555, physical reads 0...

SQL Server Execution Times: CPU time = 9110 ms,  elapsed time = 9163 ms.``````

This a case where dbo.tally finds redemption. 9 Seconds vs 23 – not bad. Next for a parallel execution plan leveraging dbo.make_parallel.

Figure 11.4: Parallel Test

``````SET STATISTICS IO, TIME ON;

DECLARE
@gap BIGINT = 100000,
@low BIGINT = 0,
@hgh BIGINT = 1000000;

PRINT CHAR(10)+'fnTally version'+CHAR(10)+REPLICATE('-',90);
SELECT
L           = ((rnd.RN-2)*@gap+@low)+1,
H           = (rnd.RN-1)*@gap+@low,
TotalPrimes = SUM(p.PRIME+0)
FROM        dbo.rangeAB(@low,@hgh,@gap,1) AS rnd
CROSS APPLY dbo.rangeAB(0,rnd.N1,1,0)     AS r
CROSS APPLY idx.fnPrimes(r.RN)            AS p
CROSS JOIN  dbo.make_parallel()           AS x
GROUP BY   rnd.RN
ORDER BY   rnd.RN;

PRINT CHAR(10)+'Tally Function V2 - Indexed'+CHAR(10)+REPLICATE('-',90);
SELECT
L           = ((rnd.RN-2)*@gap+@low)+1,
H           = (rnd.RN-1)*@gap+@low,
TotalPrimes = SUM(p.PRIME+0)
FROM        dbo.rangeAB2(@low,@hgh,@gap,1) AS rnd
CROSS APPLY dbo.rangeAB2(0,rnd.N1,1,0)     AS r
CROSS APPLY idx.fnPrimes(r.RN)             AS p
CROSS JOIN  dbo.make_parallel()            AS x
GROUP BY   rnd.RN
ORDER BY   rnd.RN;
SET STATISTICS IO, TIME OFF;``````

Figure 11.5: Primes Test #2 Performance (Parallel: Time & IO)

``````fnTally version (RangeAB)
------------------------------------------------------------------------------------------
Table 'prime'. Scan count 0, logical reads 16843855, physical reads 0...
Table 'Worktable'. Scan count 0, logical reads 15101558, physical reads 0...

SQL Server Execution Times: CPU time = 35877 ms,  elapsed time = 7816 ms.

dbo.tally version (RangeAB2)
------------------------------------------------------------------------------------------
Table 'tally'. Scan count 9, logical reads 9604, physical reads 0...
Table 'prime'. Scan count 0, logical reads 16843855, physical reads 0...

SQL Server Execution Times: CPU time = 16687 ms,  elapsed time = 3638 ms.``````

The great news is that, in both cases, parallel execution speeds things up for both. The fist version (virtual index) increasing to 7.8 seconds, the second (the indexed function) completes in 3.6 seconds. The reads against the primes table (via the idx.primes indexed function) are excessive. That said, to calculate the numbers of primes, split into 100K chunks up to 1,000,000… that should take longer than 3.6 seconds.

## Conclusion

We have a new way to take complicated functions and speed them up to O(1) time leveraging a new concept I coined as the Indexed Function.; a powerful tool for you set-based programming toolbox. We have a way to quickly compare a CTE-Tally to a real one. We’ve learned that a physical tally table, in some cases, can blow the doors off a virtual one.

The Virtual Index and Indexed Functions are indispensable tools in my toolbox. Thanks for reading.

## How to find strings longer than 8K in SQL Server (Part 1)

Ever try searching a string for a substring larger than 8,000 characters? It’s actually more difficult than you think… without a good N-Grams function, that is.

## Before you continue

FIRST, this article leverages my NGrams2B function which can be found here: Nasty Fast N-Grams (Part 1). NGrams2B is the same as the 8K version but handles VARCHAR(MAX) strings. Since publishing this article I have since dropped the “2B” from the function name; you will see it referred to simply as dbo.ngrams in the code examples below.

SECOND, For people following my N-Grams series on SQL Server Central (SSC), let not your heart be troubled: I am in the process of wrapping up my N-Grams Series (Parts 2-6). This is a quick “one-off” intended to keep my new blog posts moving along…

## Introduction

Riddle me this: what will the query below return?

``````DECLARE @SearchText VARCHAR(MAX) = REPLICATE('X',9000),
@SomeString VARCHAR(MAX) = REPLICATE('X',8001);

SELECT CHARINDEX(@SearchText,@SomeString);``````

You would expect it to return a Zero(0) – @SearchText is a string of 9000 X’s, @SomeString is 8001 X’s. The query, however, returns a 1. That’s not right, how can this be? To get to the bottom of this let’s append this line to our query:

``SELECT LEN(@SearchText), LEN(@SomeString)``

This returns: 8000 for each. Ahhhhh, our sample strings are being implicitly converted from VARCHAR(MAX) to VARCHAR(8000). I make this mistake periodically when using REPLICATE for creating sample strings. To get around the 8K limitation we can case the input string as VARCHAR(MAX):

``````--==== Casting "X" as VARCHAR(MAX) circumvents the truncation issue
DECLARE
@SearchText VARCHAR(MAX) =       REPLICATE(CAST('X' AS VARCHAR(MAX)),9000),
SomeString  VARCHAR(MAX) = '###'+REPLICATE(CAST('X' AS VARCHAR(MAX)),9001);

SELECT LEN(@SearchText), LEN(@SomeString);``````

This prevents the truncation. Let’s run add our CHARINDEX query again:

``SELECT CHARINDEX(@SearchText,@SomeString);``
###### Results:
``````Msg 8152, Level 16, State 10, Line 132
String or binary data would be truncated.``````

Not as easy as you thought huh?. Perhaps PATINDEX can save the day…

``SELECT PATINDEX('%'+@SearchText+'%',@SomeString);``

Same error, grrrr… What about LIKE? Surely we can solve this using LIKE right?

``SELECT CASE WHEN @SearchText LIKE '%'+@SomeString+'%' THEN 1 ELSE 0 END;``

Fail – same error. I encountered this problem for the first time earlier this year and attempting to solve it took a toll on my keyboard and self-esteem.

## The Problem

To my knowledge, SQL Server doesn’t provide a way to search for a VACHAR longer than 8,000 or 4,000 for NVARCHAR. Based on my research it’s obvious that this is not a common problem. It will be more common as Developers continue to push the limits of what you can do with strings in SQL Server. And if I’m wrong? So what. Solving problems like this is fun and talking about N-Grams never gets old.

## dbo.NGrams to the Rescue (as per usual)

Ok-LIKE, CHARINDEX and PATINDEX are not valid options. To quote Ned Flanders, “As melon scratchers go, this one is a honeydoodle!” Fortunately SQL Server is powerful enough to check two VARCHAR(MAX) strings for equality. This is where a good N-Grams function is a super hero.

First we’ll tokenize (split) the @SomeString into tokens as long as @SearchText. It’s six characters long so we can split @SomeString into 6-grams.

``````--==== Sample Search Text and String to analyze
DECLARE @SearchText VARCHAR(MAX) = 'ABC123',
@SomeString VARCHAR(MAX) = 'AAABC123XYZ';

--==== Tokenize @SomeString into @SearchText-sized tokens
SELECT   ng.Position, ng.Token
FROM     dbo.ngrams(@SomeString,LEN(@SearchText)) AS ng -- LEN(@SeachText=6)
ORDER BY ng.Position;``````
###### Results:
``````Position  Token
--------- ---------
1         AAABC1
2         AABC12
3         ABC123
4         BC123X
5         C123XY
6         123XYZ``````

The value of @SearchText (“ABC123”) is at Position 3: AAABC123XYZ. All that’s left is to add a WHERE filter to compare @SearchText to each token returned by dbo.NGrams.

``````SELECT   ng.Position, ng.Token
FROM     dbo.ngrams(@SomeString,LEN(@SearchText)) AS ng
WHERE    ng.Token = @SearchText
ORDER BY ng.Position;``````
###### Returns:
``````Position   Token
---------- ---------------
3          ABC123``````

This is what we want. Note that, thanks to the virtual index that my N-Grams function uses, I can sort by the token’s position in the string without a sort in the execution plan. That’s going to help us because we need to add a TOP(1) clause to identify the location of the first instance of @SearchText . Remember, we only need to know if the text exists. Adding TOP(1), with an ORDER BY to keep it deterministic, has multiple benefits: not only do we get a predictable result, this also assists the cardinality estimator and enables us to tap into the SQL optimizer’s Row Goals which, when correctly leveraged, allow for obscene performance gains.

Here we’re going to use strings (8K+) as variables from earlier but with a couple small changes then:

1. 1. Add the TOP(1) clause along with the proper ORDER BY clause
2. 2. Wrap the logic up in a subquery named f
3. 3. Retrieve the value MAX(f.ItemIndex) from f
4. 4. Add an ISNULL our MAX() to return 0 when there isn’t a match
``````DECLARE @SearchText VARCHAR(MAX) = REPLICATE(CAST('X' AS VARCHAR(MAX)),9000),
@SomeString VARCHAR(MAX) = '###'+REPLICATE(CAST('X' AS VARCHAR(MAX)),9001);

SELECT ItemIndex = ISNULL(MAX(f.ItemIndex),0)
FROM
(
-- Returns the first position of @SearchText
SELECT TOP(1) ng.Position
FROM     dbo.ngrams(@SomeString,LEN(@SearchText)) AS ng
WHERE    ng.Token = @SearchText
ORDER BY ng.Position
) AS f(ItemIndex);``````

This returns: 4 which is correct – @SearchText is a series of 9000 X’s and exists in Position 4 in @SomeString. And that’s it – easy peasy. Next for performance.

## Introducing the 2 Billion+ row test harness

I said two Billion, with a B – not a typo. I began the heading with “Introducing” because any test harness exceeding a billion rows deserves a proper introduction. I say, rows, not characters, because dbo.NGrams tokenizes (splits) strings into rows. You’re going to see many more billion+ row test harnesses as we explore the PerfML Universe.

The VARCHAR(MAX) character limit is 2,147,483,647. I rounded my sample values down to two billion, created a search string 50K characters long then stuffed that value into @SomeString, nested between 100,011 junk characters followed by two billion more for a grand total of 2,000,150,011 characters. We’ll run the query twice: once with a serial plan, another with a parallel plan using make_parallel() by Adam Machanic.

### Test Harness

``````SET STATISTICS TIME ON;
PRINT 'Build the test string'+CHAR(13)+REPLICATE('-',90);

DECLARE @Match VARCHAR(MAX) = REPLICATE(CAST('X' AS VARCHAR(MAX)),50000),
@Junk1 VARCHAR(MAX) = REPLICATE(CAST('Z' AS VARCHAR(MAX)),100011),
@Junk2 VARCHAR(MAX) = REPLICATE(CAST('#' AS VARCHAR(MAX)),2000000000);

DECLARE @SearchText VARCHAR(MAX) = @Match,
@SomeString VARCHAR(MAX) = @Junk1+@Match+@Junk2;

PRINT 'Performance Test #1: Serial Execution'+CHAR(13)+REPLICATE('-',90);

SELECT ItemIndex = ISNULL(MAX(f.ItemIndex),0)
FROM
(
-- Returns the first position of @SearchText
SELECT TOP(1) ng.Position
FROM     dbo.ngrams(@SomeString,LEN(@SearchText)) AS ng
WHERE    ng.Token = @SearchText
ORDER BY ng.Position
) AS f(ItemIndex)
OPTION (MAXDOP 0);

PRINT 'Performance Test #2: Parallel Execution'+CHAR(13)+REPLICATE('-',90);

SELECT ItemIndex = ISNULL(MAX(f.ItemIndex),0)
FROM
(
-- Returns the first position of @SearchText
SELECT TOP(1) ng.Position
FROM     dbo.ngrams(@SomeString,LEN(@SearchText)) AS ng
WHERE    ng.Token = @SearchText
ORDER BY ng.Position
) AS f(ItemIndex)
CROSS JOIN dbo.make_parallel() --  NOOOO >> OPTION (QUERYTRACEON 8649);

SET STATISTICS TIME OFF;``````

How do you think we’ll do? How many minutes do you think this will run? Drum roll………….

### 2B Row Test Results

``````Build the test string
------------------------------------------------------------------------------------------
SQL Server Execution Times: CPU time = 12453 ms,  elapsed time = 12576 ms.
SQL Server Execution Times: CPU time = 7032 ms,  elapsed time = 7060 ms.

Performance Test #1: Serial Execution
------------------------------------------------------------------------------------------
SQL Server Execution Times: CPU time = 1953 ms,  elapsed time = 1950 ms.

Performance Test #2: Parallel Execution
------------------------------------------------------------------------------------------
SQL Server Execution Times: CPU time = 3568 ms,  elapsed time = 3570 ms.``````

This ran for about 25 seconds on my laptop. These results are more impressive when we dig deeper. SQL Server spent ~20 seconds building the string, another 1.9 seconds to find what we were looking for with a serial plan, 3.5 seconds for parallel. make_parallel adds a bit of spaghetti to the plan but, aside from that, the plans are identical.

Execution plan for the first (serial) query

If we hover over the second Nested Loops Join operator in the plan we can see the actual number of rows generated by dbo.NGrams: The matching search text is 10,011 characters deep which explains the 100,012 operations

100,012 rows (pseudo iterations) to parse a 2,000,150,011 character string; this in under 2 seconds with only one CPU. That’s Nasty Fast! Can you guess how we got to 100,012 rows? I buried the search string 100,011 characters deep. The deeper the search string is, the number of rows dbo.NGrams creates increases, something we’ll address in Part 2.

### make_parallel is a Super Power

Queries which leverage dbo.NGrams and dbo.NGrams8k generally perform better with parallel execution but not in this case. Here I’d likely force a serial plan to prevent a slower parallel plan. I included a parallel solution to help you understand the magic which is make_parallel. I first discussed the obvious benefits of make_parallel in PerfML Part 2 Self-Tuning with a Tally Table. Now for a not-so-obvious benefit, and it’s big.

In my experience, make_parallel and TraceFlag 8649 basically perform identically ~95% of the time. The only reason I ever use TraceFlag 8649 is during development as make_parallel makes the execution plans less readable. Every once in a while, not often, TraceFlag 8649 causes a query to run for minutes (even hours) instead of seconds/milliseconds. This is due to Intra-Query Parallel Deadlocking. What makes this type of deadlocking distinctly painful is the pure randomness of it. Sometimes it happens every 10-100 query executions, sometimes it’s a few times a month. This an example of the type of adventures undocumented trace flags will take you on. make_parallel doesn’t have this issue. Just another reason make_parallel deserves a cape.

## Conclusion

2B strings are not common but this should give you a taste for how this technique will hold up against millions or even billions of smaller strings. This is the power of the tally table and dbo.NGrams. Unless someone can point me to a book, article or forum post that details a faster way to find a string longer that 8K in SQL Server I’m going to aver that this is the fastest known method for finding VARCHARs 8001 characters long in SQL Server without an Index. But can we make it faster, much faster, unimaginably faster! See you at Part 2 of this series. Thanks for reading!

## PerfML Part 2: Self-Tuning with a Tally Table

In PerfML Part 1 we covered a some new algorithms for creating a self-tuning function. In Part 2 we’ll turbo charge our algorithm with parallelism and a tally table.

## Before You Begin

This is the second article about PerfML, where I introduced some new concepts, as well as some less common ones which will be referenced heavily in this article, such as the dynamic algorithm (completely unrelated to dynamic programming). If you haven’t read Part 1, I suggest you do so before continuing as you will get much more out of this article. I use delimitedsplit8k, fnTally and make_parallel so grab a copy of each if you are playing along at home.

## Introduction

In Part 1 of this series you were introduced to the self-tuning function and PerfML, the science of creating fast, deterministic, elegant self-tuning routines such as functions and stored procedures. We reviewed the difference between exact and approximation algorithms and how they can be combined to return a deterministic result. I also introduced a new type of parameter: the gap parameter (G)which can be used to add gaps to a sequence the purpose of creating an approximation algorithm. We then created a string search routine which leveraged an exact, brute force algorithm to find a specific sequence in a string. Then we transformed the function to leverage an approximation algorithm using our gap parameter which allowed us to introduce a margin of error, or range of possible valid values, while reducing the cost of the algorithm by a factor of G while preserving the ability to return an exact result when G=1. Lastly, we reviewed how to reduce the search area of our exact algorithm by first using an approximation to identify a much smaller range of possible valid answers.

Today I’ll define and teach you how to use a tuning parameter, the most important component of PerfML. A tuning parameter can be defined as:

A parameter that allows you to dynamically tune a deterministic function’s algorithm while guaranteeing the same deterministic return value.

The gap parameter (introduced in Part 1), is a form of tuning parameter. The tuning parameter is the core component of PerfML, tuning parameters are where your intelligence lives. For any given input there can be only one tuning parameter value which does the least amount of work, that value will be perform the best – both in speed and resource usage (memory, IO). Determining the best value for your tuning parameter is the most important (and daunting) task you will perform while developing self-tuning functions; master this and your algorithms will figure out how to mater themselves. Choosing the best tuning parameter value can be done as part of a pre-processing task, at runtime or a combination of both. This will all make more sense when you see the code examples.

## Sequential to Set-Based with fnTally

In part 1 we created a self-tuning function leveraging a self-referencing recursive scalar UDF. The goal was not to develop the most performant function but rather to understand the algorithms and other new concepts which are vital to PerfML; iterative programming and basic recursion are easier to understand. Functions are a vital component of any major programming language and avoiding loops, and other iterative, RBAR-style logic is vital to writing fast clean code in most declarative programming languages. For this series I’m using Microsoft SQL server which means we’ll be using fnTally by SQL MVP Jeff Moden to replace our loop logic fromPart 1. There are many advantages to a tally table of a loop, these are the ones I consider most important:

1. Speed
2. Functional Purity
3. Ability to measure performance
4. Parallel execution
##### 1.    Speed

First, a lazy sequence (CTE tally table in this case) will perform faster than a loop, cursor, iterative recursion (using recursion for counting) and any other iterative methods used for solving common programming problems. fnTally counts to one billion in 90 seconds on my laptop. Second, the fastest kind of T-SQL function in SQL Server is the Inline Table Valued Function (iTVF). With iTVFs, however, data structures are immutable so there is no way to perform your typical loop (do, while, etc.) This leaves you two options in SQL: recursion  or a tally table. Recursion is slow, tally tables are fast and virtually I/O free (readless). Because of the immutable data structures, the functional purity of an iTVF and  speed of a quality tally table or tally table function, the SQL optimizer a full range of options including the ability to multi-thread execution, that is, leverage parallelism.

##### 2.    Functional Purity

fnTally is an iTVF, which is SQL Server’s version of a pure function . Pure functions are generally side-effect free, don’t experience memory leaks and allow for very clean, concise and reusable code. Note the Advantages of pure functions section of this Microsoft article: Functional programming vs. imperative programming and the benefits on SQL iTVFs in Paul White’s article about APPLY.

The primary reason to implement functional transformations as pure functions is that pure functions are composable: that is, self-contained and stateless. These characteristics bring a number of benefits, including the following:

1. Increased readability and maintainability. This is because each function is designed to accomplish a specific task given its arguments. The function doesn’t rely on any external state.
2. Easier reiterative development. Because the code is easier to refactor, changes to design are often easier to implement. For example, suppose you write a complicated transformation, and then realize that some code is repeated several times in the transformation. If you refactor through a pure method, you can call your pure method at will without worrying about side effects.
3. Easier testing and debugging. Because pure functions can more easily be tested in isolation, you can write test code that calls the pure function with typical values, valid edge cases, and invalid edge cases.

Starting with fast and clean pure functions for our self-tuning algorithms will keep your functions blazing while being easier to debug.

##### 3.    Measuring Performance

It is very difficult to accurately measure scalar UDF performance (except when leveraging SQL Server 2019 scalar UDF inlining.) Key performance metric collection tasks such as measuring I/O, accurate run times and even extracting a usable execution plan are impossible when writing non-inline (impure) T-SQL scalar UDFs. The opposite is true with an inline table valued function (iTVF); fnTally is an iTVF and will be used later in this article for the logic used to replace our recursive scalar logic from Part 1 (dbo.stringSearchV1) .Below is a portion  of the execution plan from dbo.stringSearch (truncated for brevity) which leverages fnTally.

Our iTVF that generated the plan above uses the same “exact approximation” logic we used in our scalar UDF from the previous article. In this plan, because I’m familiar with my logic I can see that approximation algorithm retrieves 32,064 rows from my fnTally, which grows to 42,064 during the merge join. The exact algorithm then requires another 18,144 rows from fnTally which grows to 26,613 during the second merge join. The number of iterations for the scalar UDF is a mystery; all we know for sure as you will see, is that the fnTally version is profoundly faster.

##### 4.    Parallel Execution

T-SQL scalar UDFs (as well as multi-statement table values functions) cannot utilize a parallel execution plan unless they can leverage SQL Server 2019 scalar UDF inlining, and still parallel execution is not possible with inlined scalar UDFs that are self-referencing (in my experience). iTVFs that leverage fnTally, on the other hand, can be processed asynchronously and are often substantially faster as you will see momentarily.

Most of my functions which leverage a tally table or correctly developed tally table function, such as dbo.fnTally, run much faster when leveraging a parallel execution plan. Sometimes a parallel plan yields no performance benefit at all which, then, is a liability – you have multiple CPUs doing the job of one instead doing something more productive. Sometimes parallel plans are slower than serial, that’s an even bigger bummer. Sometimes it’s system/instance level settings (e.g. cost threshold for parallelism) and sometimes even your set-based code is bad (set-based != bullet proof.) Sometimes the optimizer just gets it wrong. Either way, if you need a parallel execution plan you have two options, Trace Flag 8649 and make_parallel by Adam Machanic. Trace Flag 8649 is not documented and therefore not safe for production. make_parallel is a simple T-SQL function which, in short, freaks the optimizer out and causes it to choose parallel execution if it’s available. make_parallel adds a lot of bloat to the execution plan so: in Dev I use the trace flag, in Production I use make_parallel when I have determined that doing so is the best option.

## Function Logic

We are going to build the same function from part 1, dbo.stringsearchV1, using fnTally for our iterative logic. These are the parameters we’ll use:

``````DECLARE
@G BIGINT       = 5,   -- gap param
@C VARCHAR(100) = 'X', -- search character
@L BIGINT       = 1,   -- min length
@H BIGINT       = 22,  -- max length
@S VARCHAR(MAX) = '999999999-XXXXXXXXXXXXXX-999999999'; -- sample search string``````

fnTally takes two parameters: @ZeroOrOne, which determines is the first row will be zero or one; we will set this to 0. @MaxN will be the highest integer returned, for @MaxN I came up with this formula: (@H-@L+1)/@G+SIGN((@H-@L+1)%@G); it will determine the correct number of rows with the @G-sized gaps in the.

##### Avoiding the descending sort

SQL sorts generally have a complexity of N log N, which is worse than linear. This means that the more rows we sort, the cost to sort each row increases. For row counts in the 10’s of thousands, this is not a big deal but it becomes a problem when we get into the 100’s of thousands and beyond.

The recursive solution from Part 1 did not require a sort to determine the “longest” allowable string. For our fnTally version we will be seeking the longest value which does requires us to use an ORDER BY N DESC clause. To circumvent this problem we can return N is descending order using Jeff Moden’s formula: @MaxN-N+1, mentioned in the fnTally documentation. This only works, however, when we are seeking an exact result (@G=1) and, therefore, no gaps in the sequence.

To return the numbers beginning with @H, decrementing by @G until @L we’ll use: IIF(t.N<gap.N, @H-@G*t.N, @L). If the final number is less than @L, @L is returned for the final iteration. So far we have:

``````SELECT
[N ASC]  = t.N,                          -- fnTally "N" (0-base row number)
[N DESC] = IIF(t.N<gap.N, @H-@G*t.N, @L) -- Gap reduced sequence from @H to @L
FROM        (VALUES((@H-@L+1)/@G+SIGN((@H-@L+1)%@G))) AS gap(N) -- number of rows
CROSS APPLY dbo.fnTally(0,gap.N)                      AS t;     -- fnTally call``````

With this logic, setting @G=2 would return (truncated) [22 20 … 6 4 2 1] for “[N DESC]”; @G=5 returns [22 17 12 7 2 1], @G=6 returns [22 16 10 4 1], and so on. For the L column (the lower-bound value) we’re using [N Desc] from earlier. For H, the upper-bound, we’ll leverage LAG: f.N+ISNULL(LAG(f.N,1) OVER (ORDER BY t.N)-f.N,1)-1.

``````SELECT
L = IIF(t.N<gap.N, @H-@G*t.N, @L), -- Gap reduced sequence from @H to @L
H = f.N+ISNULL(LAG(f.N,1) OVER (ORDER BY t.N)-f.N,1)-1
FROM        (VALUES((@H-@L+1)/@G+SIGN((@H-@L+1)%@G))) AS gap(N) -- how many rows
CROSS APPLY dbo.fnTally(0, gap.N)                     AS t
CROSS APPLY (VALUES(IIF(t.N<gap.N, @H-@G*t.N, @L)))   AS f(N);``````
##### Retrieving matched values

Then add a subquery that searches the string (@S) for an L-sized series of @C in the input string (@S), and exclude rows where there is no match:

``````CROSS APPLY (VALUES(CHARINDEX(REPLICATE(@C,f2.L),@S)))  AS p(Idx)
WHERE        p.Idx > 0``````

Finally, we wrap our logic into a subquery, add our TOP (1) clause, ordered by N. Note that I am using the aliased of “RN” for “Row Number”; I use RN as my sort-key, and “N” for my tally table N column. We’ll wrap all this up in an iTVF and create a VARCHAR(8000) and VARCHAR(MAX) version of our new stringsearch function. The only difference between the two functions is the data type for @S. The final tally table (lazy sequence) logic for the our two set-based approximation function are:

### dbo.stringSearch8K

``````CREATE OR ALTER FUNCTION dbo.stringSearch8K
(
@C CHAR(1),
@S VARCHAR(8000),
@L BIGINT,
@H BIGINT,
@G BIGINT
)
/* Created on 20201004 by Alan Burstein. */
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT TOP(1)
L   = f2.L,  -- Lower Boundary
H   = f2.H,  -- Upper Boundary
Idx = p.Idx -- ItemIndex (position of the item match)
FROM
(
SELECT t.N, f.N, f.N+ISNULL(LAG(f.N,1) OVER (ORDER BY t.N)-f.N,1)-1
FROM        (VALUES((@H-@L+1)/@G+SIGN((@H-@L+1)%@G))) AS gap(N) -- how many rows
CROSS APPLY dbo.fnTally(0, gap.N)                     AS t
CROSS APPLY (VALUES(IIF(t.N<gap.N, @H-@G*t.N, @L)))   AS f(N)
) AS f2(RN,L,H)
CROSS APPLY (VALUES(CHARINDEX(REPLICATE(@C,f2.L),@S))) AS p(Idx)
WHERE        p.Idx > 0
ORDER BY     f2.RN;``````

### dbo.stringSearch

``````CREATE OR ALTER FUNCTION dbo.stringSearch
(
@C CHAR(1),
@S VARCHAR(MAX),
@L BIGINT,
@H BIGINT,
@G BIGINT
)
/* Created on 20201004 by Alan Burstein. */
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT TOP(1)
L   = f2.L, -- Lower Boundary
H   = f2.H, -- Upper Boundary
Idx = p.Idx -- ItemIndex (position of the item match)
FROM
(
SELECT t.N, f.N, f.N+ISNULL(LAG(f.N,1) OVER (ORDER BY t.N)-f.N,1)-1
FROM        (VALUES((@H-@L+1)/@G+SIGN((@H-@L+1)%@G))) AS gap(N) -- how many rows
CROSS APPLY dbo.fnTally(0, gap.N)                     AS t
CROSS APPLY (VALUES(IIF(t.N<gap.N, @H-@G*t.N, @L)))   AS f(N)
) AS f2(RN,L,H)
CROSS APPLY (VALUES(CHARINDEX(REPLICATE(@C,f2.L),@S))) AS p(Idx)
WHERE        p.Idx > 0
ORDER BY     f2.RN;``````

The 8K version is slightly faster when dealing with smaller strings with a smaller range of acceptable values. As the strings and range of acceptable values grow, the Max version becomes faster.

## The Tuning Procedure

The tuning procedure (see the attached DDL as a text file below) is a T-SQL stored procedure I spun up quickly to test different design patterns using values for @G. This will help us find the optimal parameter for a specific predictable input. The procedure accepts the following parameters:

The procedure accepts all the parameters required for dbo.stringsearch. @I is for the number of test iterations the stored procedure should perform. @mode is for selecting on of 10 different design patterns to test. The parameter, @gaps is a comma-delimited list of different gap parameters (@G) for our function to accept. @puff and @C build sample strings of NEWIDs with the search pattern stuffed in the middle. Setting @mode=1, @I=5, @gaps = ‘1,2,5,10’ will test a specific design pattern using dbo.stringsearch 5 times for each value in @gaps. There are 4 gap values so for this example that would be a total of 20 tests.

The proc has examples with a performance test for each design pattern; executed with serial and parallel execution plans. Again, in Development environments I use the trace_flag 8649 because it’s easier to read the execution plan. I use make_parallel in Production because it performs the same and is safe to use in Production.

Below is the example output from a test I ran against a large set of rows five times (@I=5) for seven parameters between 5 and 35 (@gaps=’5,10,15,20,25,30,35′). The Gap column is the value of @G, Total is the total run time, min and max represent the slowest and fastest runs respectively. TAvg is a rolling average of the total time.

``````Gap   Total  Min    Max    TAvg
----- ------ ------ ------ ------
25    69294  13140  14437  69294
20    69546  12804  14603  69420
30	  71460  13096  14950  70100
35	  74830  13514  15990  71282
15    75196  13790  16413  72065
10    94631  17383  20097  75826
5     165316 30423  34530  88610``````

In this above example, with the example data used, @G=20 had the fastest run at 12.8 seconds, but it appears that 25 is the optimal value based on the total run time.

In the real world we would replace our sample data table, ##strings, with a sampling of real data you are developing a self-tuning functions for. For example, once I used to write ETL packages that collected social media metrics from search and social media companies. The source data from each company came in as a text file where we would extract key metrics. The content of each file, and the input strings within, was looked very different for each company but, for each the text was predictable. The optimal use case for self-tuning function for string parsing here would be to determine the best gap parameter value for each companies source data. For this we could build a tuning procedure for each company and store the best parameter value for later use.

## Design Patterns Tested

We have two versions of our function, one for 8K strings and one for VARCHAR(MAX). The performance is nearly identical for both functions except that the 8K version is faster with shorter strings (e.g. less than a 1000 characters). Note these parameters, the queries, my comments and the results.

###### Test Parameters and sample data

Executing this code will create the sample data table (##strings) leaving it intact for further analysis.

``````DECLARE
@rows BIGINT        = 2000,
@puff BIGINT        = 5,
@gaps VARCHAR(8000) = '5,10,15,20,25,30,35', -- Gap parameters to test
@C    VARCHAR(100)  = 'X',
@L    BIGINT        = 1,
@H    BIGINT        = 1000,
@max  BIGINT        = 1100,
@I    BIGINT        = 1,
@out  BIGINT        = 1; -- general performance details of test run

EXEC dbo.usp_stringsearch_test1 @rows, @puff, @G, @C, @L, @H, @max, 15, @out, @I, 0;``````

Please test away and let me know what you find.

###### Design Patterns

With our sample table in place we can test a few different design patterns; there are three: our exact algorithm, a dynamic algorithm (approximation > exact) and a dynamic algorithm with two approximation reductions (approximation > approximation > exact).

``````--==== 1. EXACT algorithm
SELECT
s.stringId, s.string,             -- ##strings attributes
Boundaries = CONCAT(f.L,'-',f.H), -- The Upper and lower bounds (exact because @G=1))
ItemIndex  = f.Idx
FROM        ##strings                             AS s
CROSS APPLY dbo.stringSearch(@C,s.String,@L,@H,1) AS f -- @G=1 means EXACT
WHERE       f.L >= @L;

--==== 2. DYNAMIC algorithm: approximation to exact reduction
SELECT      @X = f.L
FROM        ##strings                                 AS s
CROSS APPLY dbo.stringSearch(@C,s.String,@L,@H,@G)    AS f0
CROSS APPLY dbo.stringSearch(@C,s.String,f0.L,f0.H,1) AS f
WHERE       f.L >= @L
END;

--==== 3. DYNAMIC algorithm with two reductions: Approx to Approx to Exact
DECLARE @G2 BIGINT = @G/3+1; -- Simple reduction: split by 1/3rd then add 1 (min=2)

SELECT      @X = f.L
FROM        ##strings                                       AS s
CROSS APPLY dbo.stringSearch(@C, s.String, @L, @H, @G)      AS f0
CROSS APPLY dbo.stringSearch(@C, s.String, f0.L, f0.H, @G2) AS f1
CROSS APPLY dbo.stringSearch(@C, s.String, f1.L,
IIF(f1.L+2<=@H,f1.L+2,@H), 1) AS f
WHERE       f.L >= @L;``````

The first will be the slowest because it’s running in serial mode. The second is much faster because, again, parallel plans turbo charge the performance when combining fnTally and a pure dynamic alogrithm. The Third example, the two-reduction solution is slower for short strings but the fastest (by far) as strings and search patterns get longer. When I already know a query is going to be slow I’ll often set @I low (1 or 2) just to confirm while saving time.

## Performance Test

Using the parameters posted above let’s execute dbo.usp_stringsearch_test1 to as shown below.

``````--==== 8K - Dynamic with 1 reduction - SERIAL
EXEC dbo.usp_stringsearch_test1 @rows, @puff, @gaps, @C, @L, @H, @max, 5,  @out, @I, 1;

--==== VARCHAR MAX - Dynamic with 1 reduction(#6) & 1 string reduction(#7) - PARALLEL
SELECT @puff *= 10/*500*/, @I=5, @H=2000, @max=2000;
EXEC dbo.usp_stringsearch_test1 @rows, @puff, @gaps, @C, @L, @H, @max, 6,  @out, @I, 1;
EXEC dbo.usp_stringsearch_test1 @rows, @puff, @gaps, @C, @L, @H, @max, 7,  @out, @I, 1;

--==== 8K & VARCHAR MAX - Dynamic with 2X reductions (@G2=@G/2+1) - PARARLLEL
SELECT @gaps = '100,250,500,800,1000,2000', @I=10;
EXEC dbo.usp_stringsearch_test1 @rows, @puff, @gaps, @C, @L, @H, @max, 8,  @out, @I, 1;
EXEC dbo.usp_stringsearch_test1 @rows, @puff, @gaps, @C, @L, @H, @max, 10, @out, @I, 1;``````

You can see from the results below that:

1. Parallelism makes a huge difference
2. The two reduction solution is the fastest
3. The 8K and VARCHAR(MAX) versions perform almost identically
4. With the one-reduction solutions the tuning parameter can be way too high or way too low, the two-reduction solution seems to perform better with the highest possible value for @G.

Note that, for modes 8 & 10 (the two-reduction test), I chose different (higher) values for the gap:

## Conclusion

In Part 1 we learned how to develop a self-tuning function using a dynamic algorithm. Today we created a much faster version of our function leveraging fnTally. Things get even nasty faster when we add make_parallel. Another victory for set-based SQL and the tally table. In part 3 we’ll use these concepts to do something much cooler. Thanks for reading!