Virtual Indexing Part 3: The Indexed Function

“The fastest way to perform a calculation is to have already done so.”

“The fastest way to perform a calculation is to have already done so.”

Introduction

Here’s an interesting problem: given a sequence of numbers between M and N, how many are prime numbers? How would you solve this and how efficient will your solution be if M=0 and N=10,000,000? On my PC I’m able to get the job done in about 3 seconds.

Introducing the Indexed Function

One day while trying to figure out some recursive query I stumble across a SQL Server Central post about calculating a Fibonacci sequence where someone asked, “why not just store the values in a table?” It seems obscenely obvious to me now but at the time, but it blew me away that I never thought of it. It still blows my mind how so few developers take advantage of this. Just populate a table (on-disk or in-memory) with the values you need, add indices as needed, then build an inline table valued function (iTVF) to retrieve the value with an index seek. This allows your function to run in O(1) time for each value passed to it. Indexed values are stored as a  suitably sorted stream which makes it possible to retrieve the value(s) without a scan. A web search for “create Fibonacci function“,  “function for prime numbers” or something similar will generally return various algorithms (mostly recursive) but scarce mentions of pre-calculating problems where the values never change. Doesn’t Fibonacci(7) always 13? Factorial(5) = 120? When is 7 not a prime number? I suspect that, the fastest way to perform a calculation is to have already done so, then save the result in an index for fast retrieval.

In Paul White’s article, Understanding and using APPLY (Part 1) he refers to an inline table valued function (iTVF) as “a view that accepts parameters”. With that in mind, what would be the best way to describe an indexed view (or table) that accepts parameters?  I’m thinking indexed function.

Nasty Fast Primes

Primes is a great example since calculating them can be costly, especially when using a solution which “counts up” iteratively or recursively. For this exercise we’re going to build a prime number function that can determine if an integer between 1 and 10,000,000 is prime and do so in O(1) time and space. I know that 99.999% of you don’t need a prime number function so remember that this is for learning and to prove the power of this concept; this technique will work against any immutable set of values.You don’t have to play along or can test this with a smaller number of primes. I’m including the DDL used to create the function so you can keep me honest about the performance tests later in the article.

The Setup Procedure

This proc indexes and populates a prime numbers table called idx.prime. There are two parameters: @Counter and @maxrows. @counter is the number that the proc starts at and should be left at 14. You can use @counter to set up batch inserts. If you execute usp_primes and accept the default values, the function will support 1-10M with a bit column indicating if the number is prime. You don’t have to do 10M rows but it’s ~300MB, uncompressed, if you do. For me it this runs 20 minutes without batching the insert; I did it all in one shot. Note that I keep my indexed functions in an “idx” schema, no harm in using dbo. Once the procedure is done you can create our prime numbers function as an indexed function. As mentioned in Part 2 of my PerfML series, functional purity is key so only an iTVF will do. If you’re playing along you will need a one million row tally table, you can get the DDL here and the stored procedure to create an indexed prime number function is attached below.

Note that it creates a schema name idx; this is where my indexed functions live but it’s not necessary.

The Indexed Function (idx.fnPrimes)

Now we have a sorted set of numbers from 1 to 10M. Our function takes @N as an input and performs a one-row index seek.

Figure 1: idx.fnPrimes

CREATE FUNCTION idx.fnPrimes
(
  @N BIGINT
)
RETURNS TABLE WITH SCHEMABINDING AS RETURN
  SELECT p.N, p.PRIME FROM idx.prime AS p WHERE p.N = @N;
GO

To check if a number is prime:

Figure 2: Basic idx.fnPrimes demo

SELECT p.N, p.PRIME FROM idx.fnPrimes(11) AS p;

To return the numbers 1 to 7 along with their prime value we use fnTally to generate the stream of numbers and map them to fnPrimes.

SELECT      p.N, p.PRIME
FROM        dbo.fnTally(1,7)  AS f
CROSS APPLY idx.fnPrimes(f.N) AS p;

This returns: [(1 1),(2 1),(3 1),(4 0),(5 1),(6 0),(7,1)]; now a quick performance test.

Performance Test #1: Calculate Primes 1 to 1 Million

The purpose of this test is to prove out the concept of the indexed function. This query will take the values 1 through 10,000,000 and determine if they are prime or not. I am forcing a serial plan since the serial plan actually performs best. @P variable captures the value so as to avoid SSMS spitting out 10 million rows and ruining the performance metrics.

Figure 3.1: idx.fnPrimes Performance Test

PRINT CHAR(13)+'fnTally serial'+CHAR(13)+REPLICATE('-',90);
GO
DECLARE @ST DATETIME = GETDATE(), @P BIT;
  SELECT      @P = p.PRIME
  FROM        dbo.fnTally(1,10000000) AS f
  CROSS APPLY idx.fnPrimes(f.N)       AS p
  OPTION (MAXDOP 1);
PRINT DATEDIFF(MS,@ST,GETDATE());
GO 3

Figure 3.2: idx.fnPrimes Performance Test Results

fnTally serial
------------------------------------------------------------------------------------------
Beginning execution loop
2660
2727
2750

Under 3 seconds for idx.fnPrimes to accept the numbers 1 to 10,000,000 and determine if which ones are prime.  This with a serial execution plan!

Unexpected Virtual Index Pitfalls (continued…)

One motivation for giving a name to what I call, the “virtual index,” is to make the following type of problem (and solution) easier to explain. Part 2 of this series you were introduced to some unexpected virtual indexing pitfalls which can be tricky to identify. There’s another issue to watch out for people using SQL 2019 which includes batch mode over rowstore, or users or who leverage Itzik Ben-Gan’s batch mode processing trick to enjoy this functionality on SQL Server 2016+.  I use SQL 2019 and the optimizer leverages a window aggregate function that invokes batch mode over rowstore which is the culprit here. Below are two techniques to retrieve the numbers 1 to 7 and identify which are prime.

Figure 4: Hidden CTE Tally table row retrieval issue

--==== 1. Leveraging fnTally
SELECT      p.N, p.PRIME
FROM        dbo.fnTally(1,7)  AS f
CROSS APPLY idx.fnPrimes(f.N) AS p; 

--====  2. Static Values
SELECT      p.N, p.PRIME
FROM        (VALUES(1),(2),(3),(4),(5),(6),(7)) AS f(N)
CROSS APPLY idx.fnPrimes(f.N)                  AS p; 

Each function returns the expected result set: [(1 1),(2 1),(3 1),(4 0),(5 1),(6 0),(7,1)]; The execution plan for the fnTally solution.

Figure 4.1: Execution Plan Details

fnTally retrieved 2700 rows to generate the numbers 1-7! The second query didn’t have this issue. Luckily there are more than a few ways to solve this.

Solution 1: Disable batch mode over rowstore

Let’s temporarily disable batch mode over rowstore and try again.

Figure 5.1: Solution #1: Disable batch mode over rowstore query

SELECT      p.N, p.PRIME
FROM        dbo.fnTally(1,7)    AS f
CROSS APPLY idx.fnPrimes(f.N)   AS p;
OPTION(USE HINT('DISALLOW_BATCH_MODE'));

Figure 5.2: Solution #1: Execution plan

7 rows only, much better!

Solution 2: make_parallel

In my experience Adam Machanic’s make_parallel always forces plans that don’t include batch mode over rowstore. I don’t know if it’s an option with make_parallel in play.

Figure 6.1: Solution #2: make_parallel

SELECT      p.N, p.PRIME
FROM        dbo.fnTally(1,7)    AS f
CROSS APPLY idx.fnPrimes(f.N)   AS p
CROSS JOIN  dbo.make_parallel() AS parallel_execution;

Below is the portion of the plan where the ROW_NUMBER sequence is created; again, only 7 rows

Figure 6.2: Solution #2:  Execution plan

Solution 3: Use a physical tally table

The conventional wisdom seems to be that the CTE tally table is faster than a physical table pretty much all the time, a reasonable assumption since the CTE tally table is readless but physical tally tables do require IO. Here is an example of where the physical tally table (correctly indexed) has an edge.

Figure 7.1: Solution #3: Physical Tally Table

Note that both a TOP + ORDER BY, WHERE clause filter, or a both together will do just fine.

SELECT TOP(7) p.N, p.PRIME
FROM        dbo.tally         AS f
CROSS APPLY idx.fnPrimes(f.N) AS p
ORDER BY    f.N;

Again, only 7 numbers retrieved.

Figure 7.2: Solution #3: Execution Plan

This is one of many subtle differences between a persisted tally table and a CTE version. Let’s explore deeper.

RangeAB and RangeAB2

We will conclude by running two performance tests against one million rows. The one million limit is because that’s what dbo.tally is capped at. For this test let two tally table functions: the first can be considered an “add-on” to dbo.fnTally named dbo.rangeAB. The second takes identical parameters and returns identical results, the only difference is that it leverages physical tally table (dbo.tally) instead of dbo.fnTally.

Both functions can be used to help enforce best practices with as little code as possible. Even better: Both functions return the exact result set given the same parameters. This gives you the ability to test your set-based code against both a physical and virtual tally table index simply adding or removing the number “2” from the end of “RangeAB.”

The functions and usage examples are located here. For the purposes of this article I will show you the execution plans for a basic usage scenario so we can compare execution plans. Both examples use one of the RangeAB functions to return a series of weeks. We’ll run these with “show actual execution plan” turned on.

Figure 8.1: Basic Usage Example

DECLARE
  @startdate DATE   = '20200831',
  @weeks     BIGINT = 5,
  @row1      BIGINT = 1;

BEGIN
--==== RangeAB (fnTally version)
  SELECT
    WeekNumber     = f.RN,
    WeeksLeft      = f.OP,
    DaysFromMedian = ABS(f.DM),
    WeekStart      = DATEADD(DAY,f.N1-1,@startdate),
    WeekEnd        = DATEADD(DAY,f.N2-2,@startdate)
  FROM     dbo.rangeAB(@row1, @weeks*7-(1-@row1), 7, @row1) AS f
  ORDER BY f.RN;

--==== RangeAB2 (dbo.tally version)
  SELECT
    WeekNumber = f.RN,
    WeeksLeft  = f.OP,
    DaysFromMedian = ABS(f.DM),
    WeekStart  = DATEADD(DAY,f.N1-1,@startdate),
    WeekEnd    = DATEADD(DAY,f.N2-2,@startdate)
  FROM dbo.rangeAB2(@row1, @weeks*7-(1-@row1), 7, @row1) AS f
  ORDER BY f.RN;
END;

Figure 8.2: Results (for each)

    WeekNumber  WeeksLeft  DaysFromMedian  WeekStart  WeekEnd
    ----------- ---------- --------------- ---------- ----------
    1           5          14              2020-08-31 2020-09-06
    2           4          7               2020-09-07 2020-09-13
    3           3          0               2020-09-14 2020-09-20
    4           2          7               2020-09-21 2020-09-27
    5           1          14              2020-09-28 2020-10-04

Execution plans

Ignore the 40/60% estimates, they aren’t accurate, both perform almost identically. The win is that we can use each to quickly determine which is better – dbo.tally or dbo.fnTally without any logic changes. fnTally uses the Itzik Ben-Gan style ROW_NUMBER over cross joins, the other gets its rows from dbo.tally.

Figure 8.3: Execution Plans

Prime Numbers Test #1: RangeAB for a running total of Primes

Now back to our indexed function, idx.fnPrimes. This first example is a simple running total of prime numbers from 1 to 50. Ignore the @gap parameter for now.

DECLARE @gap BIGINT = 1, 
        @low BIGINT = 0, 
        @hgh BIGINT = 50; 

--==== Solution #1: Primes running total over an Indexed Function + Virtual Index (RangeAB) 
SELECT
  N           = ((rnd.RN-2)*@gap+@low)+1, 
  TotalPrimes = SUM(p.PRIME+0) 
FROM        dbo.rangeAB(@low,@hgh,@gap,1) AS rnd
CROSS APPLY dbo.rangeAB(0,rnd.N1,1,0)     AS r
CROSS APPLY idx.fnPrimes(r.RN)            AS p
GROUP BY    rnd.RN
ORDER BY    rnd.RN; 

--==== Solution #2: Primes running total over two indexed functions
SELECT
  N           = ((rnd.RN-2)*@gap+@low)+1, 
  TotalPrimes = SUM(p.PRIME+0) 
FROM        dbo.rangeAB2(@low,@hgh,@gap,1) AS rnd
CROSS APPLY dbo.rangeAB2(0,rnd.N1,1,0)     AS r
CROSS APPLY idx.fnPrimes(r.RN)             AS p
GROUP BY    rnd.RN
ORDER BY    rnd.RN; 

Figure 10.1: Primes Test #1 Results (truncated for brevity)

N       TotalPrimes
------- -----------
1       1
2       2
3       3
4       3
5       4
6       4
7       5
8       5
...
46      15
47      16
48      16
49      16
50      16

Prime Numbers Test #2: Performance

Here we’ll do two tests, first with a serial plan, then with a parallel plan. What we need to do is create @gap-sized groups and collect a between @low and @hgh along with a count of prime numbers in that group. We’ll compare the performance RangeAB vs RangeAB2. Again, the logic is identical except that the latter leverages dbo.tally. Note the parameters.

First to test serial execution using MAXDOP 1.

Figure 11.1 Serial Test

SET STATISTICS IO, TIME ON;

DECLARE
  @gap BIGINT = 100000,
  @low BIGINT = 0,
  @hgh BIGINT = 1000000;

PRINT CHAR(10)+'fnTally version'+CHAR(10)+REPLICATE('-',90);
  SELECT
  L           = ((rnd.RN-2)*@gap+@low)+1,
  H           = (rnd.RN-1)*@gap+@low,
  TotalPrimes = SUM(p.PRIME+0)
FROM        dbo.rangeAB(@low,@hgh,@gap,1) AS rnd
CROSS APPLY dbo.rangeAB(0,rnd.N1,1,0)     AS r
CROSS APPLY idx.fnPrimes(r.RN)            AS p
GROUP BY   rnd.RN
ORDER BY   rnd.RN
OPTION (MAXDOP 1);

PRINT CHAR(10)+'Tally Function V2 - Indexed'+CHAR(10)+REPLICATE('-',90);
 SELECT
   L           = ((rnd.RN-2)*@gap+@low)+1,
   H           = (rnd.RN-1)*@gap+@low,
   TotalPrimes = SUM(p.PRIME+0)
 FROM        dbo.rangeAB2(@low,@hgh,@gap,1) AS rnd
 CROSS APPLY dbo.rangeAB2(0,rnd.N1,1,0)     AS r
 CROSS APPLY idx.fnPrimes(r.RN)             AS p
 GROUP BY   rnd.RN
 ORDER BY   rnd.RN
 OPTION (MAXDOP 1);
SET STATISTICS IO, TIME OFF;

Figure 11.2: Primes Test #2 Result Set

L           H            TotalPrimes
----------- ------------ -----------
1           100000       19187
100001      200000       38367
200001      300000       57548
300001      400000       76728
400001      500000       95910
500001      600000       115092
600001      700000       134272
700001      800000       153453
800001      900000       172634
900001      1000000      191814

Figure 11.3: Primes Test #2 Performance (Serial: Time & IO)

fnTally version
------------------------------------------------------------------------------------------
Table 'prime'. Scan count 0, logical reads 16843792, physical reads 0...
Table 'Worktable'. Scan count 0, logical reads 15101551, physical reads 0...

 SQL Server Execution Times: CPU time = 23312 ms,  elapsed time = 23452 ms.

Tally Function V2 - Indexed
------------------------------------------------------------------------------------------
Table 'prime'. Scan count 0, logical reads 16843792, physical reads 0...
Table 'tally'. Scan count 2, logical reads 9555, physical reads 0...

 SQL Server Execution Times: CPU time = 9110 ms,  elapsed time = 9163 ms.

This a case where dbo.tally finds redemption. 9 Seconds vs 23 – not bad. Next for a parallel execution plan leveraging dbo.make_parallel.

Figure 11.4: Parallel Test

SET STATISTICS IO, TIME ON;

DECLARE
  @gap BIGINT = 100000,
  @low BIGINT = 0,
  @hgh BIGINT = 1000000;

PRINT CHAR(10)+'fnTally version'+CHAR(10)+REPLICATE('-',90);
  SELECT
  L           = ((rnd.RN-2)*@gap+@low)+1,
  H           = (rnd.RN-1)*@gap+@low,
  TotalPrimes = SUM(p.PRIME+0)
FROM        dbo.rangeAB(@low,@hgh,@gap,1) AS rnd
CROSS APPLY dbo.rangeAB(0,rnd.N1,1,0)     AS r
CROSS APPLY idx.fnPrimes(r.RN)            AS p
CROSS JOIN  dbo.make_parallel()           AS x
GROUP BY   rnd.RN
ORDER BY   rnd.RN;

PRINT CHAR(10)+'Tally Function V2 - Indexed'+CHAR(10)+REPLICATE('-',90);
SELECT
  L           = ((rnd.RN-2)*@gap+@low)+1,
  H           = (rnd.RN-1)*@gap+@low,
  TotalPrimes = SUM(p.PRIME+0)
FROM        dbo.rangeAB2(@low,@hgh,@gap,1) AS rnd
CROSS APPLY dbo.rangeAB2(0,rnd.N1,1,0)     AS r
CROSS APPLY idx.fnPrimes(r.RN)             AS p
CROSS JOIN  dbo.make_parallel()            AS x
GROUP BY   rnd.RN
ORDER BY   rnd.RN;
SET STATISTICS IO, TIME OFF;

Figure 11.5: Primes Test #2 Performance (Parallel: Time & IO)

fnTally version (RangeAB)
------------------------------------------------------------------------------------------
Table 'prime'. Scan count 0, logical reads 16843855, physical reads 0...
Table 'Worktable'. Scan count 0, logical reads 15101558, physical reads 0...

 SQL Server Execution Times: CPU time = 35877 ms,  elapsed time = 7816 ms.

dbo.tally version (RangeAB2)
------------------------------------------------------------------------------------------
Table 'tally'. Scan count 9, logical reads 9604, physical reads 0...
Table 'prime'. Scan count 0, logical reads 16843855, physical reads 0...

 SQL Server Execution Times: CPU time = 16687 ms,  elapsed time = 3638 ms.

The great news is that, in both cases, parallel execution speeds things up for both. The fist version (virtual index) increasing to 7.8 seconds, the second (the indexed function) completes in 3.6 seconds. The reads against the primes table (via the idx.primes indexed function) are excessive. That said, to calculate the numbers of primes, split into 100K chunks up to 1,000,000… that should take longer than 3.6 seconds.

Conclusion

We have a new way to take complicated functions and speed them up to O(1) time leveraging a new concept I coined as the Indexed Function.; a powerful tool for you set-based programming toolbox. We have a way to quickly compare a CTE-Tally to a real one. We’ve learned that a physical tally table, in some cases, can blow the doors off a virtual one.

The Virtual Index and Indexed Functions are indispensable tools in my toolbox. Thanks for reading.

Author: Alan Burstein

SQL Performance Ninja with 20+ years of writing high-performing, elegant SQL code. Expert at developing super fast NLP algorithms

2 thoughts on “Virtual Indexing Part 3: The Indexed Function”

Leave a Reply