Diving deeper into the world of virtual indexing while being reminded that set-based code is not always bulletproof
Introduction
In Virtual Indexing Part 1 I introduced the idea of the virtual index, the ordered (AKA “suitably sorted”) stream of numbers returned by ROW_NUMBER which allows you to perform operations like grouping and window aggregations without the help of an index OR a sort in the execution plan. Today I want to draw your attention to some hidden CTE tally table dangers that are easy to detect and resolve but can be catastrophic when missed.
Problem #1: Gaps and the Hidden Row Explosion
Identifying gaps and islands in sequences is a common SQL task. One of the first things I learned to do with a tally table is identify gaps; it's an easy concept to grasp, easier than identifying islands. In the query below we have a table variable populated with the numbers 1 to 10, but with the numbers 3, 5 and 9 missing. Let's use fnTally to identify the missing numbers.
Figure 1. Using fnTally to find gaps in a sequence
DECLARE @t TABLE (N BIGINT NOT NULL PRIMARY KEY);
INSERT @t(N) VALUES(1),(2),(4),(6),(7),(8),(10);
DECLARE @Max BIGINT = 10;
SELECT f.N
FROM dbo.fnTally(1,@Max) AS f
LEFT JOIN @t AS t ON t.N = f.N
WHERE t.N IS NULL;
This returns the expected results: [3 5 9]. Now the execution plan:
Figure 1.1. fnTally gaps in a sequence query execution plan
fnTally returns 10 rows, as expected, to create the numbers 1 to 10. But look at the clustered index scan against the table variable – 70 rows. That is tragic: 70 rows scanned to identify 3 missing values! Can you identify the problem? Take a minute to see if you can figure this out…
@t has 7 rows and fnTally is returning 10 rows: 7*10=70. The optimizer knows that the clustered index on @t is unique but is not taking advantage of the fact that the numbers returned by fnTally are also unique and ordered. The optimizer is behaving as if a one-to-many relationship between t.N and f.N is possible. With this plan the execution engine is forced to compare every value in @t to every value returned by fnTally. As sad as this might make you feel at first, let not your heart be troubled. The workarounds are endless; the trick, really, is to understand which one is best and why.
Table Operator Modifications
Let's start with the three solutions below, ordered by how strongly I recommend them. Each will get the job done with minimal changes to the underlying logic. The first solution is to simply add an additional filter: f.N <= @Max; the second is to replace fnTally with a physical tally table; the third is to use a join hint.
Figure 2. fnTally gaps in a sequence solutions
DECLARE @t TABLE (N BIGINT NOT NULL PRIMARY KEY);
INSERT @t(N) VALUES(1),(2),(4),(6),(7),(8),(10);
DECLARE @Max BIGINT = 10;
--==== Solution #1: Add a WHERE filter for fnTally.N
SELECT f.N
FROM dbo.fnTally(1,@Max) AS f
LEFT JOIN @t AS t ON t.N = f.N
WHERE f.N <= @Max -- solves the fake one-to-many issue
AND t.N IS NULL;
--==== Solution #2: Use a persisted tally table with a WHERE filter
SELECT f.N
FROM dbo.tally AS f
LEFT JOIN @t AS t
ON t.N = f.N
WHERE f.N <= @Max -- required because there is no TOP(@Max) clause
AND t.N IS NULL;
--==== Solution #3: Add a join hint to the left join
SELECT f.N
FROM dbo.fnTally(1,@Max) AS f
LEFT MERGE JOIN @t AS t -- use Merge Join algorithm instead of loop join
ON t.N = f.N
WHERE t.N IS NULL;
Each query returns the correct result and in each case only 7 rows total are retrieved from @t instead of 7 rows for each number returned by fnTally.
Figure 3. Execution plan for the table operator solutions
I didn't call it out in the screenshot, but with each solution only 7 rows were retrieved instead of 70. The first solution is the easiest – just add an additional WHERE filter. The filter may seem redundant since the join predicate already limits f.N, but it works and it's a simple fix. If it were always this simple I could end the discussion here, but it's not. The second solution is to use a physical tally table (dbo.tally in this example) instead; this solves the problem here and has other advantages I'll cover momentarily. The third solution is to force a merge join using a join hint; this works but is my last choice. There have been times when a hint was the only option, which is why I'm calling it out now.
Anti-Join Solutions
Although you may have seen the term anti-join in an SSMS execution plan, it is not a commonly used phrase in the RDBMS world. It is, however, the best way to describe a scenario where you need all items from one table that do not exist in a second table.
Figure 4. Anti-join Venn Diagram
Reviewing the execution plans for the first set of solutions above (Figure 2) you see that each used a Left Outer Join operator to identify the missing rows, followed by a filter to exclude the others. Let's examine two more solutions, which leverage the NOT EXISTS logical operator and the EXCEPT set operator.
Figure 5. Anti-join solutions
DECLARE @t TABLE (N BIGINT NOT NULL PRIMARY KEY);
INSERT @t(N) VALUES(1),(2),(4),(6),(7),(8),(10);
DECLARE @Max BIGINT = 10;
--==== Solution #4: EXCEPT >> Merge join handles the filter
SELECT f.N FROM dbo.fnTally(1,@Max) AS f
EXCEPT
SELECT t.N FROM @t AS t;
--==== Solution #5: NOT EXISTS >> Identical to above but with nested loop join
SELECT f.N
FROM dbo.fnTally(1,@Max) AS f
WHERE NOT EXISTS (SELECT t.N FROM @t AS t WHERE t.N = f.N)
AND f.N <= @Max; -- Optional, forces loop join
Figure 5.1. Anti-join solutions execution plans
Both produce almost identical execution plans and are both efficient. The EXCEPT solution is the best IMO because it’s the cleanest.
Parallelism
Now let’s safely force a parallel execution plan against the left join solution with the WHERE filter (Figure #2, Solution 1) then, again, with our EXCEPT anti-join from Figure #5.
Figure 6. Parallel execution test
DECLARE @t TABLE (N BIGINT NOT NULL PRIMARY KEY);
INSERT @t(N) VALUES(1),(2),(4),(6),(7),(8),(10);
DECLARE @Max BIGINT = 10;
--==== Good when serial, bad for parallel
SELECT f.N
FROM dbo.fnTally(1,@Max) AS f
LEFT JOIN @t AS t ON t.N = f.N
CROSS JOIN dbo.make_parallel() AS x
WHERE t.N IS NULL
AND f.N <= @Max
--==== Winner for serial and parallel
SELECT f.N FROM dbo.fnTally(1,@Max) AS f
EXCEPT
SELECT t.N FROM @t AS t CROSS JOIN dbo.make_parallel() AS x;
Below is the portion of each plan where the rows are retrieved. Even with the WHERE f.N <= @Max clause in place, which solved the 70-row explosion problem earlier with a serial plan, the row explosion returns with parallel execution. The EXCEPT solution, however, does not have this problem with a serial or parallel plan.
Figure 6.1. Parallel execution performance plan
Problem #2: Gaps, Left Join Aggregation and Parallelism
Keeping with the theme of gaps let’s use the same table variable from earlier but include duplicate and missing values. Using this sample data:
Figure 7. Problem #2 sample data
DECLARE @t TABLE (N BIGINT INDEX IX1 CLUSTERED NOT NULL);
INSERT @t(N) VALUES(1),(1),(2),(2),(2),(2),(4),(4),(6),(7),(8),(8),(8),(10),(10),(10);
DECLARE @Max BIGINT = 10;
Like earlier, this query suffers the aforementioned hidden row explosion; instead of retrieving 16 rows from @t, it retrieves 160 rows (16 × @Max).
Figure 8. Problem #2, Solution #1 (fail)
SELECT f.N, Total = COUNT(t.N)
FROM dbo.fnTally(1,@Max) AS f
LEFT JOIN @t AS t
ON t.N = f.N -- could not shake the row explosion
GROUP BY f.N;
Again, an f.N <= @Max predicate in the WHERE clause or in the join will do the trick; t.N <= @Max also works in either location. The problem, again, is that this only helps when the optimizer chooses a serial execution plan, which means that, in this case, you must always use OPTION (MAXDOP 1) and be banished to eternal serial processing. C'mon man!
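For reference, here is a minimal sketch of that serial-only workaround (a variation not shown in the figures), assuming the Figure 7 sample data and @Max are still in scope:
--==== Serial-only workaround (sketch): redundant WHERE filter plus a MAXDOP hint
SELECT f.N, Total = COUNT(t.N)
FROM dbo.fnTally(1,@Max) AS f
LEFT JOIN @t AS t
ON t.N = f.N
WHERE f.N <= @Max -- removes the row explosion...
GROUP BY f.N
OPTION (MAXDOP 1); -- ...but only while the plan stays serial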
Problem #2 Workarounds
A merge join hint will solve the problem but cannot get a parallel execution plan; in other words, with the hint in place the row explosion is gone, but make_parallel will not succeed at forcing a parallel plan. The persisted tally table solution, on the other hand, does not experience the row explosion and can enjoy both serial and parallel execution.
Figure 9. Problem #2 – Solutions #2 & #3
DECLARE @t TABLE (N BIGINT INDEX IX1 CLUSTERED NOT NULL);
INSERT @t(N) VALUES(1),(1),(2),(2),(2),(2),(4),(4),(6),(7),(8),(8),(8),(10),(10),(10);
DECLARE @Max BIGINT = 10;
-- Solution #2 - works, but requires a join hint & no parallel execution
SELECT f.N, Total = COUNT(t.N)
FROM dbo.fnTally(1,@Max) AS f
LEFT MERGE JOIN @t AS t
ON t.N = f.N
CROSS JOIN dbo.make_parallel() AS x
GROUP BY f.N;
-- Solution #3 - good serial & parallel
SELECT f.N, Total = COUNT(t.N)
FROM dbo.tally AS f
LEFT JOIN @t AS t
ON t.N = f.N
CROSS JOIN dbo.make_parallel() AS x
WHERE f.N <= @Max
GROUP BY f.N;
In each case the row explosion is gone, but only the physical tally table solution (dbo.tally) avoids it while preserving the optimizer's option to choose a parallel plan. This is an example of where fnTally simply cannot compete with dbo.tally.
Set-Based != Bulletproof (Conclusion)
If there is a lesson to be learned today it's that the mere existence of a numbers table, or the text "tally" in your code, does not guarantee that it will be fast. Furthermore, just because your set-based solution is fast doesn't mean it can't be faster – exponentially faster in many cases, as you will see in future articles. In this article we watched an innocent query to identify gaps in a sequence go terribly wrong. Fortunately, there are many workarounds, provided that you identify the problem when it arises. Hopefully you are now better prepared.
Which is faster again, fnTally or a persisted tally table? It depends. As per usual.
Less Code, More Fast – this is my coding philosophy in four words and the tagline for this site. Replacing bloated iterative procedural code with clean, set-based, pure, reusable code makes it possible to deploy faster and with fewer bugs. Veteran programmers who write high-performing declarative code, such as SQL developers, know that pure, set-based, loop-free code performs exponentially faster than code that uses loops, mutable data structures and procedural constructs such as "goto". A correctly written tally table function, such as fnTally, can count from 0 to 1,000,000,000 in 90 seconds on my laptop; I wouldn't even try that with a loop. Replacing loops with a tally table (or some other form of lazy sequence) is a low-risk, low-cost, high-reward way to write cleaner, faster code.
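To make that concrete, here is a minimal sketch (the table name #loopNums is just for the demo) contrasting a WHILE loop with a single set-based fnTally call for generating the numbers 1 to 1,000,000:
--==== Iterative approach (sketch): one INSERT per iteration - slow, and worse as the count grows
IF OBJECT_ID('tempdb..#loopNums','U') IS NOT NULL DROP TABLE #loopNums;
CREATE TABLE #loopNums (N INT NOT NULL);
DECLARE @i INT = 1;
WHILE @i <= 1000000
BEGIN
  INSERT #loopNums (N) VALUES (@i);
  SET @i += 1;
END;
--==== Set-based approach (sketch): one statement, no loop, no mutable state
SELECT f.N
FROM dbo.fnTally(1,1000000) AS f;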
The Virtual Index
One small benefit of writing loops to perform tasks in your functions and procedures is that you don't have to worry about sorting. With a set-based approach that leverages a tally table, if the order of the numbers returned matters then the optimizer will have to sort the numbers for you unless they are presorted, AKA indexed. When using a physical tally table that is properly indexed, sorting is not a concern, but what about with a tally table function like fnTally? You can't index a function, right?
Returning the numbers 1 to 7, in order, without a sort
An important consideration when replacing a loop with a tally table is sorting. The most common way to handle sorting in the SQL world is to presort using an index. fnTally is a function, and you can't add an index to a function, so how do you return the numbers [ 1 2 3 4 5 6 7 ] in that order without sorting them? You can't, unless they are already sorted – and fnTally returns its numbers as an ordered set. This ordered stream of numbers is what I refer to as the virtual index. Virtual indexing can be thought of as the art/science of:
Understanding what the virtual index is
Knowing when\how to use the virtual index
Understanding the alternatives and when to use them
If you are a set-based SQL coding warrior then you know that you can use ROW_NUMBER to create a Virtual Auxiliary Table of Numbers, AKA a tally table. With that virtual table comes a virtual index. To understand and appreciate the difference between a virtual index and a physical index, let's create a mini numbers temp table with the numbers 1 to 6. We won't include an index, which means that the numbers are unordered.
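For reference, here is a minimal sketch of that ROW_NUMBER pattern – not the actual fnTally source, just the general shape of a CTE tally table:
--==== Minimal CTE tally table sketch: ROW_NUMBER over a stacked cross join
WITH E1(x) AS (SELECT 1 FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS v(x)), -- 10 rows
     E2(x) AS (SELECT 1 FROM E1 AS a CROSS JOIN E1 AS b),                              -- 100 rows
     E4(x) AS (SELECT 1 FROM E2 AS a CROSS JOIN E2 AS b)                               -- 10,000 rows
SELECT TOP (6) N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM E4;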
Figure 1 – Mini tally table (#nums)
--==== Create a small Number's table
IF OBJECT_ID('tempdb..#nums','U') IS NOT NULL DROP TABLE #nums;
CREATE TABLE #nums (N INT NOT NULL);
INSERT #nums VALUES (1),(2),(3),(4),(5),(6);
This table could be better described as a "tally heap" or a "virtual auxiliary heap of numbers." Let's run three queries against #nums without an index present, then review the execution plans.
Heap vs Index Tests
First, let's test our queries against #nums without an index in place:
Figure 2 – Test #1: No index (Heap)
SELECT TOP(3) t.N
FROM #nums AS t
ORDER BY t.N;
SELECT t.N, AVG(t.N)
FROM #nums AS t
GROUP BY t.N;
SELECT
[SUM] = SUM(t.N) OVER (ORDER BY t.N),
[LAG] = LAG(t.N,1,0) OVER (ORDER BY t.N)
FROM #nums AS t;
Figure 3 – Test #1 execution plans without an index on #nums
Each query required a sort operator. TOP with an ORDER BY, grouping (the GROUP BY), and the window functions (SUM OVER and LAG) are examples of operations which require an ordered set, AKA a "suitably sorted stream." If the stream of input values is unordered, the most common remedy is a sort operator to create the suitably sorted stream when the query runs.
Hover over the properties for the stream aggregate and sequence operators in the second and third plans to see what each operator does and what they need:
Figure 3.1. – Stream Aggregate and Sequence Project Details
Suitable Sorting with a Physical Index
Sorting at runtime should be avoided, especially when dealing with a large volume of rows. The typical sort cost is N log N, which is worse than linear – the exact opposite of a bulk discount, because the cost per row increases as you add rows. (Using log base 2 as a rough model: sorting 1,000 rows costs roughly 10,000 comparisons, while sorting 1,000,000 rows costs roughly 20,000,000 – a thousand times more rows, about two thousand times more work.) Let's presort with a unique clustered index on #nums.
Figure 4 – Index for our numbers table (#nums)
CREATE UNIQUE CLUSTERED INDEX uq_nums ON #nums(N);
Now execute the same queries from Test #1(Figure 2) with the clustered index in place.
Figure 4.1. – Execution plan with index in place on #nums
Now that #nums is presorted there isn’t a need to sort at runtime.
The Suitably Sorted Virtual Index
Now for the same queries using fnTally, along with the execution plans.
Figure 5: Virtual Index Test Query
SELECT TOP(3) t.N FROM dbo.fnTally(1,6) AS t ORDER BY t.N;
SELECT t.N, AVG(t.N) FROM dbo.fnTally(1,6) AS t GROUP BY t.N;
SELECT [SUM] = SUM(t.N) OVER (ORDER BY t.N),
[LAG] = LAG(t.N,1,0) OVER (ORDER BY t.N)
FROM dbo.fnTally(1,6) AS t;
Figure 5.1: Virtual Index Test Execution Plans
That's the exact same execution plan as in Figure 4.1, after the clustered index was added to #nums! The only difference is the nested loop joins on the right, which are used to generate the rows. fnTally returns the numbers as an ordered set. In other words, for each query:
Using #nums without an index forces the optimizer to add a sort operator to the execution plan to handle TOP, GROUP BY and a couple of window functions (LAG and SUM() OVER())
Adding an index to #nums presorts the values and eliminates the requirement to sort the values at runtime
fnTally is an inline table-valued function (iTVF) and cannot be indexed
In this example fnTally’s N column behaves like the indexed N column on #nums
The CTE tally table returns the numbers as an ordered set and generates an execution plan identical to the one produced with an indexed, persisted tally table. This is what I mean by a virtual index. The virtual index here is the ordered set of numbers that ROW_NUMBER creates. There is a lot more to this topic that I look forward to exploring in Part 2 of this series.
In PerfML Part 1 we covered some new algorithms for creating a self-tuning function. In Part 2 we'll turbocharge our algorithm with parallelism and a tally table.
Before You Begin
This is the second article about PerfML. In Part 1 I introduced some new concepts, as well as some less common ones which will be referenced heavily in this article, such as the dynamic algorithm (completely unrelated to dynamic programming). If you haven't read Part 1, I suggest you do so before continuing as you will get much more out of this article. I use delimitedsplit8k, fnTally and make_parallel, so grab a copy of each if you are playing along at home.
Introduction
In Part 1 of this series you were introduced to the self-tuning function and PerfML, the science of creating fast, deterministic, elegant self-tuning routines such as functions and stored procedures. We reviewed the difference between exact and approximation algorithms and how they can be combined to return a deterministic result. I also introduced a new type of parameter: the gap parameter (G), which can be used to add gaps to a sequence for the purpose of creating an approximation algorithm. We then created a string search routine which leveraged an exact, brute-force algorithm to find a specific sequence in a string. Then we transformed the function to leverage an approximation algorithm using our gap parameter, which allowed us to introduce a margin of error, or range of possible valid values, reducing the cost of the algorithm by a factor of G while preserving the ability to return an exact result when G=1. Lastly, we reviewed how to reduce the search area of our exact algorithm by first using an approximation to identify a much smaller range of possible valid answers.
Today I’ll define and teach you how to use a tuning parameter, the most important component of PerfML. A tuning parameter can be defined as:
A parameter that allows you to dynamically tune a deterministic function’s algorithm while guaranteeing the same deterministic return value.
The gap parameter (introduced in Part 1) is a form of tuning parameter. The tuning parameter is the core component of PerfML; tuning parameters are where your intelligence lives. For any given input there can be only one tuning parameter value which does the least amount of work, and that value will perform the best – both in speed and resource usage (memory, I/O). Determining the best value for your tuning parameter is the most important (and daunting) task you will perform while developing self-tuning functions; master this and your algorithms will figure out how to master themselves. Choosing the best tuning parameter value can be done as part of a pre-processing task, at runtime, or a combination of both. This will all make more sense when you see the code examples.
Sequential to Set-Based with fnTally
In Part 1 we created a self-tuning function leveraging a self-referencing recursive scalar UDF. The goal was not to develop the most performant function but rather to understand the algorithms and other new concepts which are vital to PerfML; iterative programming and basic recursion are simply easier to understand. Functions are a vital component of any major programming language, and avoiding loops and other iterative, RBAR-style logic is vital to writing fast, clean code in most declarative programming languages. For this series I'm using Microsoft SQL Server, which means we'll be using fnTally by SQL MVP Jeff Moden to replace our loop logic from Part 1. There are many advantages to a tally table over a loop; these are the ones I consider most important:
Speed
Functional Purity
Ability to measure performance
Parallel execution
1. Speed
First, a lazy sequence (a CTE tally table in this case) will perform faster than a loop, cursor, iterative recursion (using recursion for counting) or any other iterative method used for solving common programming problems. fnTally counts to one billion in 90 seconds on my laptop. Second, the fastest kind of T-SQL function in SQL Server is the inline table-valued function (iTVF). With iTVFs, however, data structures are immutable so there is no way to perform your typical loop (do, while, etc.). This leaves you two options in SQL: recursion or a tally table. Recursion is slow; tally tables are fast and virtually I/O free (readless). Because of the immutable data structures, the functional purity of an iTVF and the speed of a quality tally table or tally table function, the SQL optimizer has a full range of options at its disposal, including the ability to multi-thread execution, that is, leverage parallelism.
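If you want to reproduce that timing on your own machine, a minimal sketch (run times will vary) that forces fnTally to materialize one billion numbers looks like this:
--==== Rough speed test sketch: force one billion rows to be generated and consumed
SELECT Rows = COUNT_BIG(*), MaxN = MAX(f.N)
FROM dbo.fnTally(1,1000000000) AS f;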
2. Functional Purity
fnTally is an iTVF, which is SQL Server's version of a pure function. Pure functions are generally side-effect free, don't experience memory leaks and allow for very clean, concise and reusable code. Note the "Advantages of pure functions" section of this Microsoft article: Functional programming vs. imperative programming, and the benefits of SQL iTVFs in Paul White's article about APPLY.
The primary reason to implement functional transformations as pure functions is that pure functions are composable: that is, self-contained and stateless. These characteristics bring a number of benefits, including the following:
Increased readability and maintainability. This is because each function is designed to accomplish a specific task given its arguments. The function doesn’t rely on any external state.
Easier reiterative development. Because the code is easier to refactor, changes to design are often easier to implement. For example, suppose you write a complicated transformation, and then realize that some code is repeated several times in the transformation. If you refactor through a pure method, you can call your pure method at will without worrying about side effects.
Easier testing and debugging. Because pure functions can more easily be tested in isolation, you can write test code that calls the pure function with typical values, valid edge cases, and invalid edge cases.
Starting with fast and clean pure functions for our self-tuning algorithms will keep your functions blazing while being easier to debug.
3. Measuring Performance
It is very difficult to accurately measure scalar UDF performance (except when leveraging SQL Server 2019 scalar UDF inlining). Key performance metric collection tasks such as measuring I/O, capturing accurate run times and even extracting a usable execution plan are impossible when writing non-inline (impure) T-SQL scalar UDFs. The opposite is true with an inline table-valued function (iTVF); fnTally is an iTVF and will be used later in this article for the logic that replaces our recursive scalar logic from Part 1 (dbo.stringSearchV1). Below is a portion of the execution plan from dbo.stringSearch (truncated for brevity), which leverages fnTally.
The iTVF that generated the plan above uses the same "exact approximation" logic we used in our scalar UDF from the previous article. In this plan, because I'm familiar with my logic, I can see that the approximation algorithm retrieves 32,064 rows from fnTally, which grows to 42,064 during the merge join. The exact algorithm then requires another 18,144 rows from fnTally, which grows to 26,613 during the second merge join. The number of iterations for the scalar UDF is a mystery; all we know for sure, as you will see, is that the fnTally version is profoundly faster.
4. Parallel Execution
T-SQL scalar UDFs (as well as multi-statement table-valued functions) cannot utilize a parallel execution plan unless they can leverage SQL Server 2019 scalar UDF inlining, and even then parallel execution is not possible with inlined scalar UDFs that are self-referencing (in my experience). iTVFs that leverage fnTally, on the other hand, can be processed asynchronously and are often substantially faster, as you will see momentarily.
Most of my functions which leverage a tally table or a correctly developed tally table function, such as dbo.fnTally, run much faster with a parallel execution plan. Sometimes a parallel plan yields no performance benefit at all, which is then a liability – you have multiple CPUs doing the job of one instead of doing something more productive. Sometimes parallel plans are slower than serial; that's an even bigger bummer. Sometimes it's system/instance-level settings (e.g. cost threshold for parallelism) and sometimes even your set-based code is bad (set-based != bulletproof). Sometimes the optimizer just gets it wrong. Either way, if you need a parallel execution plan you have two options: Trace Flag 8649 and make_parallel by Adam Machanic. Trace Flag 8649 is not documented and therefore not safe for production. make_parallel is a simple T-SQL function which, in short, freaks the optimizer out and causes it to choose parallel execution if it's available. make_parallel adds a lot of bloat to the execution plan, so: in Dev I use the trace flag, in Production I use make_parallel when I have determined that doing so is the best option.
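Here is a sketch of those two options. The trace flag is undocumented and (applied per query as below) typically requires sysadmin permissions, so treat it as a Dev-only convenience; the make_parallel usage matches the pattern used throughout this article.
--==== Dev only (sketch): apply the undocumented trace flag per query
SELECT f.N
FROM dbo.fnTally(1,1000000) AS f
OPTION (QUERYTRACEON 8649);
--==== Production (sketch): Adam Machanic's make_parallel
SELECT f.N
FROM dbo.fnTally(1,1000000) AS f
CROSS JOIN dbo.make_parallel() AS x;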
Function Logic
We are going to build the same function from part 1, dbo.stringsearchV1, using fnTally for our iterative logic. These are the parameters we’ll use:
DECLARE
@G BIGINT = 5, -- gap param
@C VARCHAR(100) = 'X', -- search character
@L BIGINT = 1, -- min length
@H BIGINT = 22, -- max length
@S VARCHAR(MAX) = '999999999-XXXXXXXXXXXXXX-999999999'; -- sample search string
fnTally takes two parameters: @ZeroOrOne, which determines whether the first number returned will be zero or one (we will set this to 0), and @MaxN, the highest integer returned. For @MaxN I came up with this formula: (@H-@L+1)/@G+SIGN((@H-@L+1)%@G); it determines the correct number of rows for the @G-sized gaps in the sequence.
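As a quick sanity check of that formula, with the sample parameters above still in scope (@L=1, @H=22, @G=5):
--==== Expect 5: fnTally(0,5) will then return six values of N (0 through 5)
SELECT gapN = (@H-@L+1)/@G + SIGN((@H-@L+1)%@G);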
Avoiding the descending sort
SQL sorts generally have a complexity of N log N, which is worse than linear. This means that the more rows we sort, the more the cost to sort each row increases. For row counts in the tens of thousands this is not a big deal, but it becomes a problem when we get into the hundreds of thousands and beyond.
The recursive solution from Part 1 did not require a sort to determine the "longest" allowable string. Our fnTally version also seeks the longest value first, which would normally require an ORDER BY N DESC clause. To circumvent the sort we can return N in descending order using Jeff Moden's formula, @MaxN-N+1, mentioned in the fnTally documentation. This only works, however, when we are seeking an exact result (@G=1) and there are, therefore, no gaps in the sequence.
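Here is a minimal sketch of that descending trick for the exact case; @MaxN is just a local variable for the demo:
--==== Descending sequence without a sort operator (exact case, @G=1)
DECLARE @MaxN BIGINT = 7;
SELECT [N DESC] = @MaxN - t.N + 1 -- returns 7,6,5,4,3,2,1 with no sort in the plan
FROM dbo.fnTally(1,@MaxN) AS t;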
To return the numbers beginning with @H, decrementing by @G until we reach @L, we'll use: IIF(t.N<gap.N, @H-@G*t.N, @L). If the final number would otherwise be less than @L, @L is returned for the final iteration. So far we have:
SELECT
[N ASC] = t.N, -- fnTally "N" (0-base row number)
[N DESC] = IIF(t.N<gap.N, @H-@G*t.N, @L) -- Gap reduced sequence from @H to @L
FROM (VALUES((@H-@L+1)/@G+SIGN((@H-@L+1)%@G))) AS gap(N) -- number of rows
CROSS APPLY dbo.fnTally(0,gap.N) AS t; -- fnTally call
With this logic, setting @G=2 would return (truncated) [22 20 … 6 4 2 1] for [N DESC]; @G=5 returns [22 17 12 7 2 1], @G=6 returns [22 16 10 4 1], and so on. For the L column (the lower-bound value) we're using [N DESC] from above. For H, the upper bound, we'll leverage LAG: f.N+ISNULL(LAG(f.N,1) OVER (ORDER BY t.N)-f.N,1)-1.
SELECT
L = IIF(t.N<gap.N, @H-@G*t.N, @L), -- Gap reduced sequence from @H to @L
H = f.N+ISNULL(LAG(f.N,1) OVER (ORDER BY t.N)-f.N,1)-1
FROM (VALUES((@H-@L+1)/@G+SIGN((@H-@L+1)%@G))) AS gap(N) -- how many rows
CROSS APPLY dbo.fnTally(0, gap.N) AS t
CROSS APPLY (VALUES(IIF(t.N<gap.N, @H-@G*t.N, @L))) AS f(N);
Retrieving matched values
Then add a subquery that searches the string (@S) for an L-sized series of @C in the input string (@S), and exclude rows where there is no match:
CROSS APPLY (VALUES(CHARINDEX(REPLICATE(@C,f2.L),@S))) AS p(Idx)
WHERE p.Idx > 0
Finally, we wrap our logic into a subquery and add our TOP (1) clause, ordered by N. Note that I use the alias "RN" (for "row number") as my sort key and "N" for my tally table N column. We'll wrap all this up in an iTVF and create a VARCHAR(8000) and a VARCHAR(MAX) version of our new stringsearch function; the only difference between the two functions is the data type of @S. The final tally table (lazy sequence) logic for our two set-based approximation functions is:
dbo.stringSearch8K
CREATE OR ALTER FUNCTION dbo.stringSearch8K
(
@C CHAR(1),
@S VARCHAR(8000),
@L BIGINT,
@H BIGINT,
@G BIGINT
)
/* Created on 20201004 by Alan Burstein. */
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT TOP(1)
L = f2.L, -- Lower Boundary
H = f2.H, -- Upper Boundary
Idx = p.Idx -- ItemIndex (position of the item match)
FROM
(
SELECT t.N, f.N, f.N+ISNULL(LAG(f.N,1) OVER (ORDER BY t.N)-f.N,1)-1
FROM (VALUES((@H-@L+1)/@G+SIGN((@H-@L+1)%@G))) AS gap(N) -- how many rows
CROSS APPLY dbo.fnTally(0, gap.N) AS t
CROSS APPLY (VALUES(IIF(t.N<gap.N, @H-@G*t.N, @L))) AS f(N)
) AS f2(RN,L,H)
CROSS APPLY (VALUES(CHARINDEX(REPLICATE(@C,f2.L),@S))) AS p(Idx)
WHERE p.Idx > 0
ORDER BY f2.RN;
dbo.stringSearch
CREATE OR ALTER FUNCTION dbo.stringSearch
(
@C CHAR(1),
@S VARCHAR(MAX),
@L BIGINT,
@H BIGINT,
@G BIGINT
)
/* Created on 20201004 by Alan Burstein. */
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT TOP(1)
L = f2.L, -- Lower Boundary
H = f2.H, -- Upper Boundary
Idx = p.Idx -- ItemIndex (position of the item match)
FROM
(
SELECT t.N, f.N, f.N+ISNULL(LAG(f.N,1) OVER (ORDER BY t.N)-f.N,1)-1
FROM (VALUES((@H-@L+1)/@G+SIGN((@H-@L+1)%@G))) AS gap(N) -- how many rows
CROSS APPLY dbo.fnTally(0, gap.N) AS t
CROSS APPLY (VALUES(IIF(t.N<gap.N, @H-@G*t.N, @L))) AS f(N)
) AS f2(RN,L,H)
CROSS APPLY (VALUES(CHARINDEX(REPLICATE(@C,f2.L),@S))) AS p(Idx)
WHERE p.Idx > 0
ORDER BY f2.RN;
The 8K version is slightly faster when dealing with smaller strings with a smaller range of acceptable values. As the strings and range of acceptable values grow, the Max version becomes faster.
The Tuning Procedure
The tuning procedure (see the attached DDL as a text file below) is a T-SQL stored procedure I spun up quickly to test different design patterns using values for @G. It will help us find the optimal parameter for a specific, predictable input.
The procedure accepts all the parameters required for dbo.stringsearch. @I is the number of test iterations the stored procedure should perform. @mode selects one of 10 different design patterns to test. The parameter @gaps is a comma-delimited list of gap parameter (@G) values for our function to accept. @puff and @C build sample strings of NEWIDs with the search pattern stuffed in the middle. Setting @mode=1, @I=5, @gaps = '1,2,5,10' will test a specific design pattern using dbo.stringsearch 5 times for each value in @gaps; there are 4 gap values, so in this example that would be a total of 20 tests.
The proc has examples with a performance test for each design pattern, executed with serial and parallel execution plans. Again, in development environments I use Trace Flag 8649 because it keeps the execution plan easier to read; I use make_parallel in Production because it performs the same and is safe to use there.
Below is example output from a test I ran against a large set of rows five times (@I=5) for seven parameters between 5 and 35 (@gaps='5,10,15,20,25,30,35'). The Gap column is the value of @G, Total is the total run time, and Min and Max represent the fastest and slowest runs respectively. TAvg is a rolling average of the total time.
In the above example, with the sample data used, @G=20 had the fastest single run at 12.8 seconds, but it appears that 25 is the optimal value based on the total run time.
In the real world we would replace our sample data table, ##strings, with a sampling of the real data you are developing a self-tuning function for. For example, I once wrote ETL packages that collected social media metrics from search and social media companies. The source data from each company came in as a text file from which we would extract key metrics. The content of each file, and the input strings within, looked very different for each company, but for each company the text was predictable. The optimal use of a self-tuning string-parsing function here would be to determine the best gap parameter value for each company's source data. For this we could build a tuning procedure for each company and store the best parameter value for later use.
Design Patterns Tested
We have two versions of our function, one for 8K strings and one for VARCHAR(MAX). The performance is nearly identical for both functions except that the 8K version is faster with shorter strings (e.g. fewer than 1,000 characters). Note these parameters, the queries, my comments and the results.
Test Parameters and sample data
Executing this code will create the sample data table (##strings) and leave it intact for further analysis.
DECLARE
@rows BIGINT = 2000,
@puff BIGINT = 5,
@gaps VARCHAR(8000) = '5,10,15,20,25,30,35', -- Gap parameters to test
@C VARCHAR(100) = 'X',
@L BIGINT = 1,
@H BIGINT = 1000,
@max BIGINT = 1100,
@I BIGINT = 1,
@out BIGINT = 1; -- general performance details of test run
EXEC dbo.usp_stringsearch_test1 @rows, @puff, @gaps, @C, @L, @H, @max, 15, @out, @I, 0;
Please test away and let me know what you find.
Design Patterns
With our sample table in place we can test a few different design patterns; there are three: our exact algorithm, a dynamic algorithm (approximation > exact) and a dynamic algorithm with two approximation reductions (approximation > approximation > exact).
--==== 1. EXACT algorithm
SELECT
s.stringId, s.string, -- ##strings attributes
Boundaries = CONCAT(f.L,'-',f.H), -- The Upper and lower bounds (exact because @G=1))
ItemIndex = f.Idx
FROM ##strings AS s
CROSS APPLY dbo.stringSearch(@C,s.String,@L,@H,1) AS f -- @G=1 means EXACT
WHERE f.L >= @L;
--==== 2. DYNAMIC algorithm: approximation to exact reduction
SELECT @X = f.L
FROM ##strings AS s
CROSS APPLY dbo.stringSearch(@C,s.String,@L,@H,@G) AS f0
CROSS APPLY dbo.stringSearch(@C,s.String,f0.L,f0.H,1) AS f
WHERE f.L >= @L;
--==== 3. DYNAMIC algorithm with two reductions: Approx to Approx to Exact
DECLARE @G2 BIGINT = @G/3+1; -- Simple reduction: split by 1/3rd then add 1 (min=2)
SELECT @X = f.L
FROM ##strings AS s
CROSS APPLY dbo.stringSearch(@C, s.String, @L, @H, @G) AS f0
CROSS APPLY dbo.stringSearch(@C, s.String, f0.L, f0.H, @G2) AS f1
CROSS APPLY dbo.stringSearch(@C, s.String, f1.L,
IIF(f1.L+2<=@H,f1.L+2,@H), 1) AS f
WHERE f.L >= @L;
The first will be the slowest because it runs in serial mode. The second is much faster because, again, parallel plans turbocharge the performance when combining fnTally and a pure dynamic algorithm. The third example, the two-reduction solution, is slower for short strings but the fastest (by far) as strings and search patterns get longer. When I already know a query is going to be slow I'll often set @I low (1 or 2) just to confirm while saving time.
Performance Test
Using the parameters posted above, let's execute dbo.usp_stringsearch_test1 as shown below.
The 8K and VARCHAR(MAX) versions perform almost identically
With the one-reduction solution the tuning parameter can be way too high or way too low, whereas the two-reduction solution seems to perform better with the highest possible value for @G.
Note that, for modes 8 & 10 (the two-reduction test), I chose different (higher) values for the gap:
Conclusion
In Part 1 we learned how to develop a self-tuning function using a dynamic algorithm. Today we created a much faster version of our function leveraging fnTally. Things get even faster when we add make_parallel. Another victory for set-based SQL and the tally table. In Part 3 we'll use these concepts to do something much cooler. Thanks for reading!