In Part 5 we learned how to return a distinct set of numbers from a one-billion-row table in under 1/10th of a second. We can do better!
Disclaimer
I’ve never made it to Part 6 of a technical series. For me, a huge obstacle to getting quality content out in a timely manner is the proof-reading and final editing process. Like a few of my friends in this field, I have many new techniques and programming concepts that I’m dying to share which have been decaying in MS Word Docs and .sql files for years. I started my NTally article in 2014 for example.
In September, while quarantined and after listening to too much David Goggins, I decided to step up my writing (and research) game and simply began publishing the first draft of my work the moment I’m done authoring it… What I’m trying to say is, I’ve decided to err on the side of delivering content. This means there will be typos, now and in the future. Sorry. I feel better getting it done, typos and all, than taking my good ideas to the grave. If you are a writer/blogger or are aspiring to be one, I suggest following my lead and just getting it done – it feels so much better! 😉
Introduction
This series began with the introduction of the Virtual Index, the suitably sorted stream of numbers returned by T-SQL ROW_NUMBER which acts very much like a physical index with respect to sorting, grouping and filtering. Unlike a physical index, created using DDL, the virtual index is invoked with a SELECT statement against fnTally and goes away when your query is finished. No disk space to consume, no fragmentation to address, nothing to check into your CI/CD system – just a magical, in-memory index (suitably sorted stream) which comes and goes as requested.
In Part 5 we reviewed how to return a distinct set of numbers from a one-billion-row table in under 1/10th of a second using what I have coined a multidimensional virtual index – that is, combining a virtual index with a real one to speed things up. More technically, we’re giving the optimizer a wider range of options than it would have with only one index in place.
So, if adding one virtual index to your query can speed things up, can we speed things up even further by adding more? I mean, they’re basically free and can be pulled out of thin air – why not? That’s what we’re going to review today.
Dimensionality
Seasoned SQL developers, as well as anyone who has written MDX expressions, are familiar with the concept of dimensionality. Using SSAS, for example, we can write an MDX expression to return a matrix with two or more dimensions, as shown below. The Month dimension is on the x-axis, AKA the columns(0) axis in MDX; the Vehicle dimension is on the y-axis, AKA rows(1). For brevity I included two Month members and three Vehicle members. The numbers can represent anything (e.g. Sales, Inventory, etc.). The result looks like this:
Figure 1. Two-dimensional matrix with Month on the x-axis, Vehicle on the y-axis.
Two-dimensional matrix with a dimension for months and a second dimension for vehicles.
Now let’s say we wanted a deeper level of granularity for each vehicle color. We could add a third dimension for color to our expression to return something like this:
Figure 2. Three-dimensional matrix; a 3rd dimension added for color
Here we’ve added a third dimension for “color” to transform it into a three-dimensional matrix.
What does this have to do with indexing or performance, you ask? Let’s apply this same concept, leveraging RangeAB, to perform what I’ve coined Combinatorial Index Optimization. Think of it as combinatorial optimization for virtual indexes and indexed functions.
Combinatorial Index Optimization
Below is our query from Part 5 of this series, where we reduced the time it took to return a distinct set of numbers from a one-billion-row set from 111 seconds using traditional methods to 0.07 seconds using a two-dimensional index.
SELECT fnFilter.N1 --fnFilter.RN-(b.Mn-1)
FROM (SELECT MIN(t.N), MAX(t.N) FROM #t AS t) AS b(Mn,Mx)
CROSS APPLY
(
SELECT r.RN, r.N1
FROM core.rangeAB(b.Mn,b.Mx,1,1) AS r
CROSS APPLY (SELECT TOP (1) 1 FROM #t AS t WHERE t.N = r.N1) AS z(x)
) AS fnFilter
GROUP BY fnFilter.N1, b.Mn
ORDER BY fnFilter.N1;
As you may recall, we reduced the runtime by using the tally table for the left side of the one-to-many relationship and only returning numbers that exist in the source table (#t). We’ve reduced the question from “where is the next distinct value?” to a simple yes/no question: “Is 1 in here? Is 2 in here?” This is an example of a two-dimensional virtual index created by way of combinatorial index optimization.
Figure 3. Two-dimensional Virtual Index
The tally table (RangeAB in this example)
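Distilled to its essence, the probe looks like this – a minimal sketch, assuming the same #t and core.rangeAB used in the query above:
--==== the yes/no probe: "Is 1 in here? Is 2 in here?" ...
SELECT r.N1
FROM core.rangeAB(1,10,1,1) AS r
WHERE EXISTS (SELECT 1 FROM #t AS t WHERE t.N = r.N1);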
Virtual Indexing while wearing 3-D glasses
As I have said before, the great thing about a virtual index is that it’s free. Armed with a correctly constructed tally table function, I can conjure one up any time I wish, just like an AD&D Level 7 Magic User.
In the real world, for a table with millions or billions of rows, there would almost certainly be a table or other indexed object (e.g. an indexed view) that I could join to for filtering. Either way, using rangeAB and its @Gap parameter, we can add another index for an even cooler type of filtering. First let’s review our 2-D solution. Note the code and my comments:
Figure 4. How the Two-dimensional Virtual Index works
Next our 3-D solution.
DECLARE @Gap BIGINT = 100, @N BIGINT;
SELECT @N = fnFilter.N1
FROM (SELECT MAX(t.N) FROM #t AS t) AS mx(N)
CROSS APPLY core.rangeAB(1,mx.N,@Gap,1) AS b
CROSS APPLY
(
SELECT r.RN, r.N1
FROM core.rangeAB(b.N1,b.N2-1,1,1) AS r
CROSS APPLY (SELECT TOP (1) 1 FROM #t AS t WHERE t.N = r.N1) AS z(x)
) AS fnFilter
WHERE EXISTS (SELECT 1 FROM #t AS t WHERE t.N BETWEEN b.N1 AND b.N2-1) -- $$$
GROUP BY fnFilter.N1
ORDER BY fnFilter.N1;
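Before the image, it helps to see what the new outer call contributes. Here’s a quick sketch of its output with @Gap = 100; the N1/N2 semantics are inferred from the BETWEEN filter above:
--==== each row bounds one @Gap-sized partition: [N1, N2)
SELECT b.RN, b.N1, b.N2
FROM core.rangeAB(1,300,100,1) AS b;
--==== RN  N1    N2
--==== 1   1     101
--==== 2   101   201
--==== 3   201   301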
This image will help explain the changes.
Figure 5. How the Three-dimensional Virtual Index works
Adding this 3rd dimension will dramatically improve performance in two distinct ways. First, it’s going to reduce the amount of work required simply by ignoring entire partitions of our sequence once the EXISTS statement determines it’s safe to do so. Just as Figure 2 helped you understand a 2-D virtual index, here’s what a 3-D virtual index looks like.
Figure 6. How the Three-dimensional virtual index works
Partitions without members are ignored
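You can verify the pruning yourself with a quick probe – a sketch, assuming the same #t and rangeAB semantics as above:
--==== which @Gap-sized partitions survive the EXISTS filter? (@Gap = 3, as in the figure)
SELECT b.N1, b.N2,
       HasMembers = CASE WHEN EXISTS (SELECT 1 FROM #t AS t
                                      WHERE t.N BETWEEN b.N1 AND b.N2-1)
                         THEN 1 ELSE 0 END
FROM core.rangeAB(1,15,3,1) AS b;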
Performance Test 1
For brevity I set @Gap to 3 for the example above. Notice how, for 1-3 and 13-15, our second tally table (by order of appearance in this article) is pre-filtering entire subsets of work. Next, the performance test. For this test I used the same test harness from Part 5 of this series to create one billion rows with numbers ranging from 1 to 10,000 with very few gaps. For the 3-D solution I set @Gap to 100.
Figure 7. Test #1 Results – 1 Billion, Dense
The 3-D index improves with a parallel plan
In this case the 3-D index with a serial plan slowed things down just a smidgen because I purposely made sure there were few gaps. I included this test to demonstrate how, unlike the 2-D index, we get a faster result set with a parallel plan – this despite the fact that the additional rangeAB call couldn’t really filter anything. Adding an additional dimension gives the optimizer better multithreading options. That is the second distinct advantage of the 3-D index over a 2-D one.
Performance Test 2
For this test I filled a table with 50,000 numbers ranging from 1 to 1,000,000. Next I used my test harness to duplicate the rows until I had two billion. This means my tally table must generate 1,000,000 rows for the left side of my query. This is where the 3-D index shines. I set my @Gap parameter to 6400 for this test.
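The exact fill method isn’t shown here, so treat this as a hedged sketch of one way to seed the 50,000 starting values (it assumes dbo.fnTally, and being NEWID-based your values will differ):
--==== a sketch only – seed ~50K distinct values between 1 and 1,000,000
INSERT #t (N)
SELECT TOP (50000) d.N
FROM (SELECT DISTINCT N = 1 + ABS(CHECKSUM(NEWID())) % 1000000
      FROM dbo.fnTally(1,100000) AS f) AS d;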
Figure 8. Test #2 Results – 2 Billion, Sparse
3-D virtual index with gap reduction saves the day.
The 2-D solution runs for 12 seconds regardless of how many CPUs you throw at it. The 3-D solution runs for 1.3 seconds serial, 0.396 seconds with a parallel plan. That’s correct: less than half a second to extract 50K distinct values from a set of 2,000,000,000.
Conclusion
This, again, is what I mean by More Fast, Less Code. In Part 7 we’ll dive into what is arguably my favorite topic: Finite Opposite Numbers. You’ll want to be seated for that one.
In this installment of the Virtual Indexing series we’ll examine how to combine indexes to return a distinct set of numbers faster than using GROUP BY or DISTINCT.
Introduction
As you may be aware, the challenge is on! The primary objective of this article is to further illustrate the power of rangeAB. In Part 1 of this series I introduced the Virtual Index, the suitably sorted stream returned by SQL window functions which can be used for sorting, grouping and the like. In Part 3 we introduced the Indexed Function as well as my RangeAB functions. In this installment we’re going to combine a physical index with a virtual one to create what I call a multidimensional virtual index, and use it to return a distinct set of numbers from a very large set. Based on my testing, this technique completely decimates DISTINCT, GROUP BY, ROW_NUMBER or any other method for returning a distinct set.
The Problem
A common annoyance for me when developing ETL packages is needing to extract millions of non-distinct numbers into a table, then use that table as a lookup downstream. The first solution to this problem is to perform a SELECT DISTINCT against that table each time I need unique values. A better solution is to create a new table to store a distinct set of values and then reference that table downstream. Either way, a DISTINCT against millions of rows can be slow and IO intensive, even with an index in place.
Sample Data
WARNING – the code below, as-is, creates one billion rows in ~45 minutes. I need that many rows because of how efficient this technique is. If you are playing along at home, be sure to adjust @ii as needed; each iteration returns 10,000,000 rows. @ii is set to 100 for my billion-row test (100*10,000,000 = one billion). With the clustered index in place the sample data takes up 200MB per 100 million rows, a total of 20GB for one billion.
If you’re playing along at home you’ll need a copy of my random numbers function. The routine works like this: first it creates 15,000 random numbers between 1 and 25,000. This creates gaps in the sequence, which will matter later in this series. The while loop then cross joins #t to itself and re-inserts 10,000,000 duplicates per iteration (@ii). The random numbers function is attached here:
--==== WARNING – AS-IS, this routine will create 1,000,000,000 rows in 45 minutes...
--==== On my PC this runs about 4 minutes per 10 iterations
--==== With that in mind – set @ii accordingly!
IF OBJECT_ID('tempdb..#t') IS NOT NULL DROP TABLE #t;
CREATE TABLE #t (N INT NOT NULL INDEX nc_n CLUSTERED(N));
GO
INSERT #t SELECT r.RND FROM dbo.randomNumbers(15000,1,25000) AS r;
GO
--==== 1 Billion in 45 minutes at 10M 100X batches (100M in 4 minutes)
DECLARE
@st DATETIME = GETDATE(),
@i INT = 1,
@ii INT = 100;
WHILE @i<=@ii
BEGIN
PRINT @i;
INSERT #t SELECT TOP (10000000) t.N FROM #t AS t, #t AS t2 ORDER BY (SELECT NULL);
SET @i+=1
END;
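If you don’t have a copy of dbo.randomNumbers, here’s a hedged stand-in for the first INSERT above (it assumes dbo.fnTally; because it’s NEWID-based, your values will differ run to run):
--==== a sketch only – replaces the dbo.randomNumbers call, not the attached function
INSERT #t (N)
SELECT 1 + ABS(CHECKSUM(NEWID())) % 25000 -- random integers 1 to 25,000
FROM dbo.fnTally(1,15000) AS f;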
Returning unique values from a large set – Old School
Below is the typical way we would retrieve a unique set from the sample data stored in #t. Let’s run a quick performance test, first with a serial execution plan, the second with a parallel plan.
Figure 2.1. Returning a DISTINCT set using GROUP BY
PRINT CHAR(10)+'GROUP BY - serial'+CHAR(10)+REPLICATE('-',90);
GO
DECLARE @st DATETIME = GETDATE(), @N BIGINT;
SELECT @N=t.N
FROM #t AS t
GROUP BY t.N
OPTION (MAXDOP 1);
PRINT DATEDIFF(MS,@st,GETDATE());
GO
PRINT CHAR(10)+'GROUP BY - parallel'+CHAR(10)+REPLICATE('-',90);
GO
DECLARE @st DATETIME = GETDATE(), @N BIGINT;
SELECT @N=t.N
FROM #t AS t
GROUP BY t.N
OPTION (QUERYTRACEON 8649);
PRINT DATEDIFF(MS,@st,GETDATE());
GO
Note the straightforward execution plan. All billion+ rows are retrieved from #t, then a hash match is used to filter out duplicates. With a serial plan the query runs for 2 minutes; with a parallel plan it runs for 12 seconds.
Figure 2.2. Old School Execution plan
Returning a distinct set using GROUP BY
Returning a distinct set from a large set of numbers – New School
Now we’re going to use RangeAB to return a unique set, note my comments.
Figure 3.1. Using RangeAB to return a distinct set
SELECT N = fnFilter.N1
FROM (SELECT MAX(t.N) FROM #t AS t) AS b(Mx) -- (1) get the highest N from #t: MAX(N)
CROSS APPLY
( -- (2) Call RangeAB to retrieve the numbers 1 through MAX(N)
SELECT r.RN, r.N1 -- (3) Only return numbers from rangeAB that exist in #t
FROM core.rangeAB(1,b.Mx,1,1) AS r -- Note that b.Mx is the MAX(#t.N)
CROSS APPLY (SELECT TOP (1) 1 FROM #t AS t WHERE t.N = r.N1) AS z(x) -- gets row goals
) AS fnFilter -- Inline Function for filtering
GROUP BY fnFilter.N1 -- (4) Use GROUP BY to eliminate the 1-many between r.N1 & t.N
ORDER BY fnFilter.N1; -- Not required but no penalty
The result set comes back instantly. It doesn’t matter if I force a serial or parallel plan, and it’s true regardless of whether I leverage batch mode processing. This holds for both RangeAB and RangeAB2. If I run this with statistics time and IO on, we see:
Figure 3.2. STATISTICS TIME, IO results
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
Table 'Worktable'. Scan count 0, logical reads 0...
Table '#t_00000001E3BF'. Scan count 25000, logical reads 108801, physical reads 0...
SQL Server Execution Times: CPU time = 78 ms, elapsed time = 174 ms.
That’s one heck of an improvement. Next let’s review what’s happening.
How it Works
Instead of dragging out all billion+ rows we are retrieving the numbers 1 to MAX(#t.N) from our tally table – approximately 25,000 rows. The execution plan reveals how the magic happens:
Figure 4.1. New School (using RangeAB) execution plan details
Note that 27,000 rows are pulled from our tally table – I don’t know what the extra 2,000 rows are about, but it doesn’t impact performance here.
The query scans #t once for each unique number (N1) returned by our numbers table function, but only returns the number if there is a match. For each of the 25,000 possible numbers returned from the range function, a scan is performed against #t for the number’s existence. This is where the “Scan count 25000” from Figure 3.2 comes from. 25,000 scans may seem excessive, but each one is a one-row index seek, which is nothing compared to a scan against a billion rows. In this example there are 11,335 unique numbers – the exact number of rows returned from #t.
This two-dimensional matrix will help visualize what is happening: instead of scanning four rows from #t to determine that 3 is one of the unique values, we visit each possible value only once via the virtual index on RangeAB.
Figure 4.2. 2-Dimensional Matrix
Multidimensional Index Performance Test
For the test data I am leveraging the setup routine from Figure 1. Note that I run my alternative queries five times each, but the old school GROUP BY versions only run once. This is because they are that much slower.
Figure 5.1. New School Test Harness
DECLARE @rowz INT = (SELECT COUNT(*) FROM #t);
PRINT CONCAT('Multidimensional Virtual Index Test: ',@rowz,' rows, N=1-25K'+
CHAR(10),REPLICATE('-',90));
GO
PRINT CHAR(10)+'GROUP BY - serial'+CHAR(10)+REPLICATE('-',90);
GO
DECLARE @st DATETIME = GETDATE(), @N BIGINT;
SELECT @N=t.N
FROM #t AS t
GROUP BY t.N
OPTION (MAXDOP 1);
PRINT DATEDIFF(MS,@st,GETDATE());
GO
PRINT CHAR(10)+'GROUP BY - parallel'+CHAR(10)+REPLICATE('-',90);
GO
DECLARE @st DATETIME = GETDATE(), @N BIGINT;
SELECT @N=t.N
FROM #t AS t
GROUP BY t.N
OPTION (QUERYTRACEON 8649);
PRINT DATEDIFF(MS,@st,GETDATE());
GO
--==== MAGIC WITH CLUSTERED INDEX, NO INDEX = DEATH ON HUGE GAPS
PRINT CHAR(10)+'APPLY + TOP 1 - Serial'+CHAR(10)+REPLICATE('-',90);
GO
DECLARE @st DATETIME = GETDATE(), @N BIGINT;
SELECT @N = fnFilter.N1
FROM (SELECT MAX(t.N) FROM #t AS t) AS b(Mx) -- (1) get the highest N from #t: MAX(N)
CROSS APPLY -- (2) Call RangeAB to retrieve the numbers 1 through MAX(N)
(
SELECT r.RN, r.N1 -- (3) Only return numbers from rangeAB that exist in #t
FROM core.rangeAB(1,b.Mx,1,1) AS r
CROSS APPLY (SELECT TOP (1) 1 FROM #t AS t WHERE t.N = r.N1) AS z(x)
) AS fnFilter
GROUP BY fnFilter.N1 -- (4) Use GROUP BY to eliminate the 1-many between r.N1 & t.N
ORDER BY fnFilter.N1 -- Not required but no penalty
OPTION (MAXDOP 1);
PRINT DATEDIFF(MS,@st,GETDATE());
GO 5
PRINT CHAR(10)+'APPLY + TOP 1 - Parallel'+CHAR(10)+REPLICATE('-',90);
GO
DECLARE @st DATETIME = GETDATE(), @N BIGINT;
SELECT @N = fnFilter.N1
FROM (SELECT MAX(t.N) FROM #t AS t) AS b(Mx)
CROSS APPLY
(
SELECT r.RN, r.N1
FROM core.rangeAB(1,b.Mx,1,1) AS r
CROSS APPLY (SELECT TOP (1) 1 FROM #t AS t WHERE t.N = r.N1) AS z(x)
) AS fnFilter
GROUP BY fnFilter.N1
ORDER BY fnFilter.N1 -- No penalty
OPTION (QUERYTRACEON 8649);
PRINT DATEDIFF(MS,@st,GETDATE());
GO 5
Figure 5.2. New School test results
Multidimensional Virtual Index, 1000015000 rows, N=1-25K
------------------------------------------------------------------------------------------
GROUP BY - serial
------------------------------------------------------------------------------------------
111353
GROUP BY - parallel
------------------------------------------------------------------------------------------
12147
APPLY + TOP 1 - Serial
------------------------------------------------------------------------------------------
Beginning execution loop
70
66
70
63
64
Batch execution completed 5 times.
APPLY + TOP 1 - Parallel
------------------------------------------------------------------------------------------
Beginning execution loop
84
97
100
77
100
Batch execution completed 5 times.
Completion time: 2020-12-12T12:21:49.8432474-06:00
The old school GROUP BY solution runs for 113 seconds with a serial plan and speeds up nearly tenfold with a parallel plan. The rangeAB version runs for 100ms or less with either a serial or parallel execution plan. It’s interesting to note that parallel execution does not improve the time for the RangeAB solution. In Part 6 of this series we’ll add an additional dimension to speed things up even more, especially with a parallel execution plan. See you there, thanks for reading!
A set-based, more performant alternative to SQL Server’s NTILE function coming right up.
Introduction
Few authors have impacted my career like Itzik Ben-Gan. If you are trying to get a grasp on how to develop high-performing, set-based SQL, then a good Ben-Gan book will be a vital part of your SQL toolbox. Today Mr. Ben-Gan posted a challenge to create the fastest number generator. I offered up rangeAB, which leverages Jeff Moden’s fnTally function – the fastest “number generator” in the business, based on my experience.
When I first learned how to use a numbers table, I began trying to solve problems I’d never seen other people solve without a loop or other iterative method. I thought it was fun, and doing so helped me grow my skills. One thing I never saw anyone do is use a numbers table to develop a set-based alternative to NTILE. I even heard people say that it’s not possible. Today we’re going to come up with two functions, each of which blows the doors off NTILE with respect to speed and IO. The first uses a physical auxiliary table of numbers (AKA “Tally Table”); the second uses a virtual one.
Quick NTILE overview
Before I dive into the algorithm let’s quickly review what NTILE does. NTILE is a window ranking function used to divide a set of rows as evenly as possible.
In High-Performance T-SQL Using Window Functions Ben-Gan explains it like this:
The NTILE function allows you to arrange the rows within the window partition in roughly equally sized tiles, based on the input number of tiles and specified window ordering. For example, suppose that you want to arrange the rows from the OrderValues view in 10 equally sized tiles based on val ordering. There are 830 rows in the view; hence, with 10 requested tiles, the tile size is 83 (that’s 830 divided by 10). So the first 83 rows (the first tenth) based on val ordering will be assigned with tile number 1, the next 83 with tile number 2, and so on.
To calculate the size of each tile: for each tile number from 1 to @tiles, the member count is @rows/@tiles (integer division), plus 1 for each tile number that is less than or equal to @rows%@tiles. In the example below we get 3 members each for tiles 1, 2, 3 and 4 (13/4 = 3). Then tile 1 gets an additional member because 1 <= (13%4). If we add a row, tile 2 will also get an additional member (14%4 = 2); if we set @rows to 15, tiles 1, 2 and 3 will have 4 members (15%4 = 3). If we set @rows to 16 we get 4 rows for each of tiles 1, 2, 3 and 4 (16/4 = 4, 16%4 = 0).
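Here’s that arithmetic as a quick, self-contained sketch (the VALUES list simply stands in for the tile numbers):
DECLARE @rows BIGINT = 13, @tiles BIGINT = 4;
SELECT Tile    = t.N,
       Members = @rows/@tiles + IIF(t.N <= @rows%@tiles, 1, 0)
FROM (VALUES(1),(2),(3),(4)) AS t(N);
--==== returns (1,4),(2,3),(3,3),(4,3) – 13 rows split as evenly as possible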
This is how NTILE works, and this is how our alternative must also work.
The NTally Table – What it is and how it replaces NTILE
DECLARE @tiles BIGINT = 4,
@rows BIGINT = 13,
@desc BIT = 0;
--==== NTILE CLASSIC
SELECT RN = f.N, Tile = NTILE(@tiles) OVER (ORDER BY f.N)
FROM dbo.fnTally(1,@rows) AS f;
--==== ALTERNATIVE #1: PHYSICAL Auxiliary table of Numbers (dbo.NTally)
SELECT RN = ROW_NUMBER() OVER (ORDER BY t.N), Tile = t.N
FROM dbo.tally AS t
CROSS APPLY
(
SELECT TOP(@rows/@tiles+IIF(t.N<=@rows%@tiles,1,0)) $
FROM dbo.tally AS t1
CROSS JOIN dbo.tally AS t2
) AS x(x)
WHERE t.N<=@tiles;
--==== ALTERNATIVE #2: VIRTUAL Auxiliary table of Numbers (core.NTallyRangeAB)
SELECT RN = ROW_NUMBER() OVER (ORDER BY t.RN),
Tile = t.RN
FROM core.rangeAB(1,@tiles,1,1) AS t
CROSS APPLY (VALUES(IIF(@desc=0,t.RN,t.OP))) AS d(D)
CROSS APPLY core.rangeAB(1,@rows/@tiles+IIF(d.D<=@rows%@tiles,1,0),1,1) AS x;
Each query returns the same thing, except that I renamed “N” to “RN”.
The NTally table is not ready to replace NTILE out of the box; a small amount of assembly is required, and this is what the RN column is for. Below is an example of how to “tile” a table using NTILE, followed by examples which leverage dbo.NTally and dbo.NTallyRangeAB (which in turn leverages dbo.rangeAB).
The NTally solutions are easy to assemble: instead of “NTILE(@tiles) OVER (ORDER BY s.SomeID)” we’ll use ROW_NUMBER inside a CTE with the same ORDER BY. With the row number in place (RN) we can use it to join to our NTally table and return a “tiled” result set without NTILE.
Note that the function needs the number of rows ahead of time; a simple COUNT of the rows, passed to the function as the @rows parameter, does the trick. This adds a small penalty, but we’ll still see huge time/IO gains nonetheless.
Figure 3.1: NTally Functional Demo
--==== 1. Sample Data
IF OBJECT_ID('tempdb..#SomeTable') IS NOT NULL DROP TABLE #SomeTable;
CREATE TABLE #SomeTable
(
SomeId BIGINT PRIMARY KEY
);
INSERT #SomeTable(SomeId) SELECT f.N FROM dbo.GetNums(1,14) AS f;
--==== 2. Parameters and examples
DECLARE @tiles BIGINT = 5,
@rows BIGINT = 14,
@desc BIT = 0;
PRINT CHAR(13)+'NTILE CLASSIC'+CHAR(13)+REPLICATE('-',90);
SELECT s.SomeId, Tile = NTILE(@tiles) OVER (ORDER BY s.SomeID)
FROM #SomeTable AS s;
PRINT CHAR(13)+'NTally'+CHAR(13)+REPLICATE('-',90);
WITH anchor AS
(
SELECT s.SomeID, ROW_NUMBER() OVER (ORDER BY s.SomeID) AS rn
FROM #SomeTable AS s
)
SELECT a.SomeID, nt.Tile
FROM anchor AS a
CROSS JOIN dbo.NTally(@tiles, (SELECT COUNT(*) FROM #SomeTable)) AS nt
WHERE a.rn = nt.rn
PRINT CHAR(13)+'NTallyRangeAB'+CHAR(13)+REPLICATE('-',90);
WITH anchor AS
(
SELECT s.SomeID, ROW_NUMBER() OVER (ORDER BY s.SomeID) AS rn
FROM #SomeTable AS s
)
SELECT a.SomeID, nt.Tile
FROM anchor AS a
CROSS APPLY core.NTallyRangeAB(@tiles, (SELECT COUNT(*) FROM #SomeTable),@desc) AS nt
WHERE a.rn = nt.rn;
If you have kept up with this series you can think of dbo.NTally as an indexed function and dbo.NTallyRangeAB as a function that leverages the virtual index. Now for performance.
Test #1: 1,000,000 Rows – Time and IO
I’m running SQL 2019, which allows for batch mode on rowstore. I’m going to run the NTILE query twice: first as-is, then with batch mode disabled. I can’t compete with batch mode, but we’ll get close. We’ll compare the results with the NTally functions. Unlike the NTILE examples here, the NTally versions get faster when they get a parallel plan; I excluded the parallel NTILE test because it was too awful.
Note, too, that the optimizer tries to use a hash match join algorithm for joining the row numbers. A merge join is the right join for this task; in the examples below I’ll let the optimizer pick the join, then force a merge join so you can see the performance difference.
Figure 4: 1M Row Test
IF OBJECT_ID('tempdb..#SomeTable') IS NOT NULL DROP TABLE #SomeTable;
CREATE TABLE #SomeTable
(
SomeId BIGINT PRIMARY KEY
);
INSERT #SomeTable(SomeId) SELECT f.N FROM dbo.GetNums(1,1000000) AS f;
SET STATISTICS TIME, IO ON;
DECLARE @Tiles BIGINT = 3, @NT BIGINT;
PRINT CHAR(13)+'NTILE CLASSIC with Batch Mode'+CHAR(13)+REPLICATE('-',90);
SELECT @NT = NTILE(@Tiles) OVER (ORDER BY s.SomeID)
FROM #SomeTable AS s;
PRINT CHAR(13)+'NTILE CLASSIC No Batch Mode'+CHAR(13)+REPLICATE('-',90);
SELECT @NT = NTILE(@Tiles) OVER (ORDER BY s.SomeID)
FROM #SomeTable AS s
OPTION(USE HINT('DISALLOW_BATCH_MODE')); -- Crawls when Parallel
PRINT CHAR(13)+'dbo.NTALLY SERIAL, Hash Join'+CHAR(13)+REPLICATE('-',90);
WITH anchor AS
(
SELECT s.SomeID, ROW_NUMBER() OVER (ORDER BY s.SomeID) AS rn
FROM #SomeTable AS s
)
SELECT @NT = nt.Tile -- a.SomeID, nt.* --AS TileGroup
FROM anchor AS a
CROSS APPLY dbo.NTally(3, (SELECT COUNT(*) FROM #SomeTable)) AS nt
WHERE a.rn = nt.rn
OPTION (MAXDOP 1);
PRINT CHAR(13)+'dbo.NTally SERIAL, Merge Join'+CHAR(13)+REPLICATE('-',90);
WITH anchor AS
(
SELECT s.SomeID, ROW_NUMBER() OVER (ORDER BY s.SomeID) AS rn
FROM #SomeTable AS s
)
SELECT @NT = nt.Tile -- a.SomeID, nt.* --AS TileGroup
FROM anchor AS a
INNER
MERGE JOIN dbo.NTally(@Tiles, (SELECT COUNT(*) FROM #SomeTable)) AS nt
ON a.rn = nt.rn
OPTION (MAXDOP 1);
PRINT CHAR(13)+'dbo.NTally PARALLEL, Merge Join'+CHAR(13)+REPLICATE('-',90);
WITH anchor AS
(
SELECT s.SomeID, ROW_NUMBER() OVER (ORDER BY s.SomeID) AS rn
FROM #SomeTable AS s
)
SELECT @NT = nt.Tile -- a.SomeID, nt.* --AS TileGroup
FROM anchor AS a
INNER
MERGE JOIN dbo.NTally(@Tiles, (SELECT COUNT(*) FROM #SomeTable)) AS nt
ON a.rn = nt.rn
CROSS JOIN dbo.make_parallel() AS x;
PRINT CHAR(13)+'dbo.NTallyRangeAB SERIAL, Hash Join'+CHAR(13)+REPLICATE('-',90);
WITH anchor AS
(
SELECT s.SomeID, ROW_NUMBER() OVER (ORDER BY s.SomeID) AS rn
FROM #SomeTable AS s
)
SELECT @NT = nt.Tile -- a.SomeID, nt.* --AS TileGroup
FROM anchor AS a
CROSS APPLY core.NTallyRangeAB(@Tiles, (SELECT COUNT(*) FROM #SomeTable),0) AS nt
WHERE a.rn = nt.rn
OPTION (MAXDOP 1);
PRINT CHAR(13)+'dbo.NTallyRangeAB SERIAL, Merge Join'+CHAR(13)+
REPLICATE('-',90);
WITH anchor AS
(
SELECT s.SomeID, ROW_NUMBER() OVER (ORDER BY s.SomeID) AS rn
FROM #SomeTable AS s
)
SELECT @NT = nt.Tile -- a.SomeID, nt.* --AS TileGroup
FROM anchor AS a
INNER
MERGE JOIN core.NTallyRangeAB(@Tiles, (SELECT COUNT(*) FROM #SomeTable),0) AS nt
ON a.rn = nt.rn
OPTION (MAXDOP 1);
PRINT CHAR(13)+'dbo.NTallyRangeAB Parallel, Merge Join'+CHAR(13)+
REPLICATE('-',90);
WITH anchor AS
(
SELECT s.SomeID, ROW_NUMBER() OVER (ORDER BY s.SomeID) AS rn
FROM #SomeTable AS s
)
SELECT @NT = nt.Tile -- a.SomeID, nt.* --AS TileGroup
FROM anchor AS a
INNER
MERGE JOIN core.NTallyRangeAB(@Tiles, (SELECT COUNT(*) FROM #SomeTable),0) AS nt
ON a.rn = nt.rn
CROSS JOIN dbo.make_parallel() AS x;
SET STATISTICS TIME, IO OFF;
GO
Figure 4.2 Results:
NTILE CLASSIC with Batch Mode
------------------------------------------------------------------------------------------
Table '#SomeTable_00000001CF14'. Scan count 1, logical reads 2108...
Table 'Worktable'. Scan count 1, logical reads 46274, physical reads 0...
SQL Server Execution Times: CPU time = 281 ms, elapsed time = 284 ms.
NTILE CLASSIC No Batch Mode
------------------------------------------------------------------------------------------
Table 'Worktable'. Scan count 3, logical reads 2867958, physical reads 0...
Table '#SomeTable_00000001CF14'. Scan count 1, logical reads 2108...
SQL Server Execution Times: CPU time = 2375 ms, elapsed time = 2378 ms.
dbo.NTALLY SERIAL, Hash Join
------------------------------------------------------------------------------------------
Table '#SomeTable_00000001CF14'. Scan count 7, logical reads 14750
Table 'tally'. Scan count 3, logical reads 1756, physical reads 0...
SQL Server Execution Times: CPU time = 984 ms, elapsed time = 997 ms.
dbo.NTally SERIAL, Merge Join
------------------------------------------------------------------------------------------
Table '#SomeTable_00000001CF14'. Scan count 7, logical reads 14750, physical reads 0...
Table 'Worktable'. Scan count 0...
Table 'tally'. Scan count 3, logical reads 599, physical reads 0...
SQL Server Execution Times: CPU time = 922 ms, elapsed time = 917 ms.
dbo.NTally PARALLEL, Merge Join
------------------------------------------------------------------------------------------
Table 'Worktable'. Scan count 0...
Table 'Worktable'. Scan count 0...
Table 'tally'. Scan count 9, logical reads 601, physical reads 0...
Table '#SomeTable_00000001CF14'. Scan count 7, logical reads 14750, physical reads 0
SQL Server Execution Times: CPU time = 2408 ms, elapsed time = 790 ms.
dbo.NTallyRangeAB SERIAL, Hash Join
------------------------------------------------------------------------------------------
Table '#SomeTable_00000001CF14'. Scan count 7, logical reads 14750, physical reads 0
SQL Server Execution Times: CPU time = 906 ms, elapsed time = 946 ms.
dbo.NTallyRangeAB SERIAL, Merge Join
------------------------------------------------------------------------------------------
Table '#SomeTable_00000001CF14'. Scan count 7, logical reads 14750, physical reads 0...
Table 'Worktable'. Scan count 0...
SQL Server Execution Times: CPU time = 969 ms, elapsed time = 977 ms.
dbo.NTallyRangeAB Parallel, Merge Join
------------------------------------------------------------------------------------------
Table 'Worktable'. Scan count 0, logical reads 0...
Table '#SomeTable_00000001CF14'. Scan count 7, logical reads 14750, physical reads 0...
SQL Server Execution Times: CPU time = 1639 ms, elapsed time = 535 ms.
Completion time: 2020-12-10T20:02:28.6874120-06:00
NTILE with batch mode runs for 284ms with a serial plan. Without batch mode, however, NTILE just can’t compete with dbo.NTally or dbo.NTallyRangeAB. First, check those reads – 2,867,958 – and it runs for over 2 seconds. Each NTally solution runs for under 1 second, with the best performance coming from NTallyRangeAB: the winner at half a second and 14,750 reads – a 4X speed increase and a 99.5% I/O reduction.
No Looking Back
I snuck a bonus feature into dbo.NTallyRangeAB. You may have noticed that I added a third column named TileOP; this is just the tile group in descending order with the distribution flipped, e.g. the “overflow” tiles are assigned to the highest tile numbers first. For this we simply return TileOP instead of the Tile column. Let’s use our 14-row test from Figure 3.1.
Figure 5.1 NTallyRangeAB TileOP Demo:
DECLARE @tiles BIGINT = 5,
@rows BIGINT = 14,
@desc BIT = 0;
WITH anchor AS
(
SELECT s.SomeID, ROW_NUMBER() OVER (ORDER BY s.SomeID) AS rn
FROM #SomeTable AS s
)
SELECT a.SomeID, nt.Tile, nt.TileOP
FROM anchor AS a
CROSS APPLY core.NTallyRangeAB(@tiles, (SELECT COUNT(*) FROM #SomeTable),@desc) AS nt
WHERE a.rn = nt.rn
ORDER BY a.SomeId
I returned the Tile and TileOP columns so you can see the difference.
Some interesting lessons here. First, NTILE teamed with batch mode is nasty fast and tough to beat. If batch mode is not an option, then NTally and NTallyRangeAB are the only game in town as far as I’m concerned.
This concludes Part 4 of this series. Forgive any typos – this is the first/last draft. Cheers!
“The fastest way to perform a calculation is to have already done so.”
Introduction
Here’s an interesting problem: given a sequence of numbers between M and N, how many are prime numbers? How would you solve this, and how efficient will your solution be if M=0 and N=10,000,000? On my PC I’m able to get the job done in about 3 seconds.
Introducing the Indexed Function
One day, while trying to figure out some recursive query, I stumbled across a SQL Server Central post about calculating a Fibonacci sequence where someone asked, “why not just store the values in a table?” It seems obscenely obvious to me now, but at the time it blew me away that I’d never thought of it. It still blows my mind how few developers take advantage of this. Just populate a table (on-disk or in-memory) with the values you need, add indexes as needed, then build an inline table valued function (iTVF) to retrieve the value with an index seek. This allows your function to run in O(1) time for each value passed to it. Indexed values are stored as a suitably sorted stream, which makes it possible to retrieve the value(s) without a scan. A web search for “create Fibonacci function“, “function for prime numbers” or something similar will generally return various algorithms (mostly recursive) but scarce mention of pre-calculating problems where the values never change. Doesn’t Fibonacci(7) always equal 13? Factorial(5) = 120? When is 7 not a prime number? I suspect that the fastest way to perform a calculation is to have already done so, then save the result in an index for fast retrieval.
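As a minimal sketch of the idea – hypothetical names and a handful of hand-keyed values, not the prime setup we’ll build below:
--==== assumes the idx schema exists (created below for the prime setup)
CREATE TABLE idx.fib (N INT PRIMARY KEY, F BIGINT NOT NULL); -- values pre-calculated once
INSERT idx.fib (N,F) VALUES (1,1),(2,1),(3,2),(4,3),(5,5),(6,8),(7,13),(8,21);
GO
CREATE FUNCTION idx.fnFib (@N INT)
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT f.F FROM idx.fib AS f WHERE f.N = @N; -- O(1): a one-row index seek
GO
SELECT f.F FROM idx.fnFib(7) AS f; -- returns 13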
In Paul White’s article, Understanding and using APPLY (Part 1) he refers to an inline table valued function (iTVF) as “a view that accepts parameters”. With that in mind, what would be the best way to describe an indexed view (or table) that accepts parameters? I’m thinking indexed function.
Nasty Fast Primes
Primes is a great example since calculating them can be costly, especially when using a solution which “counts up” iteratively or recursively. For this exercise we’re going to build a prime number function that can determine if an integer between 1 and 10,000,000 is prime, and do so in O(1) time and space. I know that 99.999% of you don’t need a prime number function, so remember that this is for learning and to prove the power of this concept; this technique will work against any immutable set of values. You don’t have to play along, or you can test this with a smaller number of primes. I’m including the DDL used to create the function so you can keep me honest about the performance tests later in the article.
The Setup Procedure
This proc indexes and populates a prime numbers table called idx.prime. There are two parameters: @Counter and @maxrows. @Counter is the number the proc starts at and should be left at 14. You can use @Counter to set up batch inserts. If you execute usp_primes and accept the default values, the function will support 1-10M with a bit column indicating whether the number is prime. You don’t have to do 10M rows, but it’s ~300MB, uncompressed, if you do. For me this runs about 20 minutes without batching the insert; I did it all in one shot. Note that I keep my indexed functions in an “idx” schema; there’s no harm in using dbo. Once the procedure is done you can create our prime numbers function as an indexed function. As mentioned in Part 2 of my PerfML series, functional purity is key, so only an iTVF will do. If you’re playing along you will need a one-million-row tally table; you can get the DDL here, and the stored procedure to create an indexed prime number function is attached below.
Note that it creates a schema named idx; this is where my indexed functions live, but it’s not necessary.
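The attached procedure does the real work; purely as a hedged illustration of the set-based idea (not the attached usp_primes), a primality flag for a small range could be computed like this, assuming dbo.tally and following the demo output below, which flags 1 as prime:
--==== a sketch only – set-based primality flags for 1 to 100,000
SELECT n.N,
       PRIME = CASE WHEN n.N <= 3 THEN 1 -- 1, 2 and 3 flagged prime per the demo output
                    WHEN EXISTS (SELECT 1 FROM dbo.tally AS d
                                 WHERE d.N > 1 AND d.N*d.N <= n.N AND n.N % d.N = 0)
                    THEN 0 ELSE 1 END
FROM dbo.tally AS n
WHERE n.N <= 100000;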
The Indexed Function (idx.fnPrimes)
Now we have a sorted set of numbers from 1 to 10M. Our function takes @N as an input and performs a one-row index seek.
Figure 1: idx.fnPrimes
CREATE FUNCTION idx.fnPrimes
(
@N BIGINT
)
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT p.N, p.PRIME FROM idx.prime AS p WHERE p.N = @N;
GO
To check if a number is prime:
Figure 2: Basic idx.fnPrimes demo
SELECT p.N, p.PRIME FROM idx.fnPrimes(11) AS p;
To return the numbers 1 to 7 along with their prime value we use fnTally to generate the stream of numbers and map them to fnPrimes.
SELECT p.N, p.PRIME
FROM dbo.fnTally(1,7) AS f
CROSS APPLY idx.fnPrimes(f.N) AS p;
This returns: [(1 1),(2 1),(3 1),(4 0),(5 1),(6 0),(7 1)]; now a quick performance test.
Performance Test #1: Calculate Primes 1 to 10 Million
The purpose of this test is to prove out the concept of the indexed function. This query will take the values 1 through 10,000,000 and determine which are prime. I am forcing a serial plan since the serial plan actually performs best. The @P variable captures the value so as to avoid SSMS spitting out 10 million rows and ruining the performance metrics.
Figure 3.1: idx.fnPrimes Performance Test
PRINT CHAR(13)+'fnTally serial'+CHAR(13)+REPLICATE('-',90);
GO
DECLARE @ST DATETIME = GETDATE(), @P BIT;
SELECT @P = p.PRIME
FROM dbo.fnTally(1,10000000) AS f
CROSS APPLY idx.fnPrimes(f.N) AS p
OPTION (MAXDOP 1);
PRINT DATEDIFF(MS,@ST,GETDATE());
GO 3
Figure 3.2: idx.fnPrimes Performance Test Results
fnTally serial
------------------------------------------------------------------------------------------
Beginning execution loop
2660
2727
2750
Under 3 seconds for idx.fnPrimes to accept the numbers 1 to 10,000,000 and determine which ones are prime. And this with a serial execution plan!
Unexpected Virtual Index Pitfalls (continued…)
One motivation for giving a name to what I call the “virtual index” is to make the following type of problem (and solution) easier to explain. In Part 2 of this series you were introduced to some unexpected virtual indexing pitfalls which can be tricky to identify. There’s another issue to watch out for if you’re using SQL 2019, which includes batch mode over rowstore, or if you leverage Itzik Ben-Gan’s batch mode processing trick to enjoy this functionality on SQL Server 2016+. I use SQL 2019, and the optimizer leverages a window aggregate function that invokes batch mode over rowstore, which is the culprit here. Below are two techniques to retrieve the numbers 1 to 7 and identify which are prime.
--==== 1. Leveraging fnTally
SELECT p.N, p.PRIME
FROM dbo.fnTally(1,7) AS f
CROSS APPLY idx.fnPrimes(f.N) AS p;
--==== 2. Static Values
SELECT p.N, p.PRIME
FROM (VALUES(1),(2),(3),(4),(5),(6),(7)) AS f(N)
CROSS APPLY idx.fnPrimes(f.N) AS p;
Each function returns the expected result set: [(1 1),(2 1),(3 1),(4 0),(5 1),(6 0),(7,1)]; The execution plan for the fnTally solution.
Figure 4.1: Execution Plan Details
fnTally retrieved 2700 rows to generate the numbers 1-7! The second query didn’t have this issue. Luckily there are more than a few ways to solve this.
Solution 1: Disable batch mode over rowstore
Let’s temporarily disable batch mode over rowstore and try again.
Figure 5.1: Solution #1: Disable batch mode over rowstore query
SELECT p.N, p.PRIME
FROM dbo.fnTally(1,7) AS f
CROSS APPLY idx.fnPrimes(f.N) AS p
OPTION(USE HINT('DISALLOW_BATCH_MODE'));
Figure 5.2: Solution #1: Execution plan
7 rows only, much better!
Solution 2: make_parallel
In my experience, Adam Machanic’s make_parallel always forces plans that don’t include batch mode over rowstore. I don’t know whether batch mode is even an option with make_parallel in play.
Figure 6.1: Solution #2: make_parallel
SELECT p.N, p.PRIME
FROM dbo.fnTally(1,7) AS f
CROSS APPLY idx.fnPrimes(f.N) AS p
CROSS JOIN dbo.make_parallel() AS parallel_execution;
Below is the portion of the plan where the ROW_NUMBER sequence is created; again, only 7 rows.
Figure 6.2: Solution #2: Execution plan
Solution 3: Use a physical tally table
The conventional wisdom seems to be that the CTE tally table is faster than a physical table pretty much all the time – a reasonable assumption, since the CTE tally table is readless while a physical tally table does require IO. Here is an example where the physical tally table (correctly indexed) has an edge.
Figure 7.1: Solution #3: Physical Tally Table
Note that a TOP + ORDER BY, a WHERE clause filter, or both together will do just fine.
SELECT TOP(7) p.N, p.PRIME
FROM dbo.tally AS f
CROSS APPLY idx.fnPrimes(f.N) AS p
ORDER BY f.N;
Again, only 7 numbers retrieved.
Figure 7.2: Solution #3: Execution Plan
This is one of many subtle differences between a persisted tally table and a CTE version. Let’s explore deeper.
RangeAB and RangeAB2
We will conclude by running two performance tests against one million rows. The one-million limit is because that’s what dbo.tally is capped at. For this test we’ll use two tally table functions: the first, dbo.rangeAB, can be considered an “add-on” to dbo.fnTally. The second takes identical parameters and returns identical results; the only difference is that it leverages a physical tally table (dbo.tally) instead of dbo.fnTally.
Both functions can be used to help enforce best practices with as little code as possible. Even better: both functions return the exact same result set given the same parameters. This gives you the ability to test your set-based code against both a physical and a virtual tally table index by simply adding or removing the number “2” from the end of “RangeAB”.
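In practice the swap looks like this (both calls return the same rows):
SELECT f.N1 FROM dbo.rangeAB (1,10,1,1) AS f; -- virtual index (fnTally-based)
SELECT f.N1 FROM dbo.rangeAB2(1,10,1,1) AS f; -- physical index (dbo.tally-based)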
The functions and usage examples are located here. For the purposes of this article I will show you the execution plans for a basic usage scenario so we can compare execution plans. Both examples use one of the RangeAB functions to return a series of weeks. We’ll run these with “show actual execution plan” turned on.
Figure 8.1: Basic Usage Example
DECLARE
@startdate DATE = '20200831',
@weeks BIGINT = 5,
@row1 BIGINT = 1;
BEGIN
--==== RangeAB (fnTally version)
SELECT
WeekNumber = f.RN,
WeeksLeft = f.OP,
DaysFromMedian = ABS(f.DM),
WeekStart = DATEADD(DAY,f.N1-1,@startdate),
WeekEnd = DATEADD(DAY,f.N2-2,@startdate)
FROM dbo.rangeAB(@row1, @weeks*7-(1-@row1), 7, @row1) AS f
ORDER BY f.RN;
--==== RangeAB2 (dbo.tally version)
SELECT
WeekNumber = f.RN,
WeeksLeft = f.OP,
DaysFromMedian = ABS(f.DM),
WeekStart = DATEADD(DAY,f.N1-1,@startdate),
WeekEnd = DATEADD(DAY,f.N2-2,@startdate)
FROM dbo.rangeAB2(@row1, @weeks*7-(1-@row1), 7, @row1) AS f
ORDER BY f.RN;
END;
Ignore the 40/60% estimates; they aren’t accurate – both perform almost identically. The win is that we can quickly determine which is better, dbo.tally or dbo.fnTally, without any logic changes. fnTally uses the Itzik Ben-Gan style ROW_NUMBER over cross joins; the other gets its rows from dbo.tally.
Figure 8.3: Execution Plans
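For reference, here’s the Itzik Ben-Gan style pattern that fnTally is built on, reduced to a minimal sketch:
--==== ROW_NUMBER over stacked cross joins: zero reads, rows conjured from constants
WITH L1(x) AS (SELECT 1 FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),
                                    (1),(1),(1),(1),(1),(1),(1),(1)) AS v(x)), -- 16 rows
     L3(x) AS (SELECT 1 FROM L1 AS a CROSS JOIN L1 AS b CROSS JOIN L1 AS c)    -- 16^3 = 4,096
SELECT TOP (100) N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM L3;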
Prime Numbers Test #1: RangeAB for a running total of Primes
Now back to our indexed function, idx.fnPrimes. This first example is a simple running total of prime numbers from 1 to 50. Ignore the @gap parameter for now.
DECLARE @gap BIGINT = 1,
@low BIGINT = 0,
@hgh BIGINT = 50;
--==== Solution #1: Primes running total over an Indexed Function + Virtual Index (RangeAB)
SELECT
N = ((rnd.RN-2)*@gap+@low)+1,
TotalPrimes = SUM(p.PRIME+0)
FROM dbo.rangeAB(@low,@hgh,@gap,1) AS rnd
CROSS APPLY dbo.rangeAB(0,rnd.N1,1,0) AS r
CROSS APPLY idx.fnPrimes(r.RN) AS p
GROUP BY rnd.RN
ORDER BY rnd.RN;
--==== Solution #2: Primes running total over two indexed functions
SELECT
N = ((rnd.RN-2)*@gap+@low)+1,
TotalPrimes = SUM(p.PRIME+0)
FROM dbo.rangeAB2(@low,@hgh,@gap,1) AS rnd
CROSS APPLY dbo.rangeAB2(0,rnd.N1,1,0) AS r
CROSS APPLY idx.fnPrimes(r.RN) AS p
GROUP BY rnd.RN
ORDER BY rnd.RN;
Figure 10.1: Primes Test #1 Results (truncated for brevity)
Here we’ll do two tests, first with a serial plan, then with a parallel plan. What we need to do is create @gap-sized groups between @low and @hgh, along with a count of prime numbers in each group. We’ll compare the performance of RangeAB vs RangeAB2. Again, the logic is identical except that the latter leverages dbo.tally. Note the parameters.
First we test serial execution using MAXDOP 1.
Figure 11.1 Serial Test
SET STATISTICS IO, TIME ON;
DECLARE
@gap BIGINT = 100000,
@low BIGINT = 0,
@hgh BIGINT = 1000000;
PRINT CHAR(10)+'fnTally version'+CHAR(10)+REPLICATE('-',90);
SELECT
L = ((rnd.RN-2)*@gap+@low)+1,
H = (rnd.RN-1)*@gap+@low,
TotalPrimes = SUM(p.PRIME+0)
FROM dbo.rangeAB(@low,@hgh,@gap,1) AS rnd
CROSS APPLY dbo.rangeAB(0,rnd.N1,1,0) AS r
CROSS APPLY idx.fnPrimes(r.RN) AS p
GROUP BY rnd.RN
ORDER BY rnd.RN
OPTION (MAXDOP 1);
PRINT CHAR(10)+'Tally Function V2 - Indexed'+CHAR(10)+REPLICATE('-',90);
SELECT
L = ((rnd.RN-2)*@gap+@low)+1,
H = (rnd.RN-1)*@gap+@low,
TotalPrimes = SUM(p.PRIME+0)
FROM dbo.rangeAB2(@low,@hgh,@gap,1) AS rnd
CROSS APPLY dbo.rangeAB2(0,rnd.N1,1,0) AS r
CROSS APPLY idx.fnPrimes(r.RN) AS p
GROUP BY rnd.RN
ORDER BY rnd.RN
OPTION (MAXDOP 1);
SET STATISTICS IO, TIME OFF;
Figure 11.3: Primes Test #2 Performance (Serial: Time & IO)
fnTally version
------------------------------------------------------------------------------------------
Table 'prime'. Scan count 0, logical reads 16843792, physical reads 0...
Table 'Worktable'. Scan count 0, logical reads 15101551, physical reads 0...
SQL Server Execution Times: CPU time = 23312 ms, elapsed time = 23452 ms.
Tally Function V2 - Indexed
------------------------------------------------------------------------------------------
Table 'prime'. Scan count 0, logical reads 16843792, physical reads 0...
Table 'tally'. Scan count 2, logical reads 9555, physical reads 0...
SQL Server Execution Times: CPU time = 9110 ms, elapsed time = 9163 ms.
This is a case where dbo.tally finds redemption: 9 seconds vs 23 – not bad. Next, a parallel execution plan leveraging dbo.make_parallel.
Figure 11.4: Parallel Test
SET STATISTICS IO, TIME ON;
DECLARE
@gap BIGINT = 100000,
@low BIGINT = 0,
@hgh BIGINT = 1000000;
PRINT CHAR(10)+'fnTally version'+CHAR(10)+REPLICATE('-',90);
SELECT
L = ((rnd.RN-2)*@gap+@low)+1,
H = (rnd.RN-1)*@gap+@low,
TotalPrimes = SUM(p.PRIME+0)
FROM dbo.rangeAB(@low,@hgh,@gap,1) AS rnd
CROSS APPLY dbo.rangeAB(0,rnd.N1,1,0) AS r
CROSS APPLY idx.fnPrimes(r.RN) AS p
CROSS JOIN dbo.make_parallel() AS x
GROUP BY rnd.RN
ORDER BY rnd.RN;
PRINT CHAR(10)+'Tally Function V2 - Indexed'+CHAR(10)+REPLICATE('-',90);
SELECT
L = ((rnd.RN-2)*@gap+@low)+1,
H = (rnd.RN-1)*@gap+@low,
TotalPrimes = SUM(p.PRIME+0)
FROM dbo.rangeAB2(@low,@hgh,@gap,1) AS rnd
CROSS APPLY dbo.rangeAB2(0,rnd.N1,1,0) AS r
CROSS APPLY idx.fnPrimes(r.RN) AS p
CROSS JOIN dbo.make_parallel() AS x
GROUP BY rnd.RN
ORDER BY rnd.RN;
SET STATISTICS IO, TIME OFF;
Figure 11.5: Primes Test #2 Performance (Parallel: Time & IO)
fnTally version (RangeAB)
------------------------------------------------------------------------------------------
Table 'prime'. Scan count 0, logical reads 16843855, physical reads 0...
Table 'Worktable'. Scan count 0, logical reads 15101558, physical reads 0...
SQL Server Execution Times: CPU time = 35877 ms, elapsed time = 7816 ms.
dbo.tally version (RangeAB2)
------------------------------------------------------------------------------------------
Table 'tally'. Scan count 9, logical reads 9604, physical reads 0...
Table 'prime'. Scan count 0, logical reads 16843855, physical reads 0...
SQL Server Execution Times: CPU time = 16687 ms, elapsed time = 3638 ms.
The great news is that parallel execution speeds things up in both cases. The first version (virtual index) improves to 7.8 seconds; the second (the indexed function) completes in 3.6 seconds. The reads against the prime table (via the idx.fnPrimes indexed function) are excessive. That said, to count the primes in 100K-sized chunks up to 1,000,000… you’d expect that to take longer than 3.6 seconds.
Conclusion
We have a new way to take complicated functions and speed them up to O(1) time, leveraging a new concept I’ve coined the Indexed Function: a powerful tool for your set-based programming toolbox. We have a way to quickly compare a CTE tally table to a real one. And we’ve learned that a physical tally table, in some cases, can blow the doors off a virtual one.
The Virtual Index and Indexed Functions are indispensable tools in my toolbox. Thanks for reading.