IF OBJECT_ID('dbo.NTally') IS NOT NULL DROP FUNCTION dbo.NTally; GO CREATE FUNCTION dbo.NTally ( @tiles BIGINT, @rows BIGINT ) /***************************************************************************************** [Purpose]: Returns a tally table with "tile groups" and can be used as a (virtually) read-less and much faster alternative to the T-SQL NTILE function introduced in SQL Server 2005. See the usage examples below for more details on how to use this function [Author]: AJB [Compatibility]: SQL Server 2005+ [Syntax]: --===== Autonomous SELECT r.RN, r.Tile FROM dbo.NTally(@tiles,@rows) AS r --===== Against a table using APPLY WITH anchor AS ( SELECT rn = ROW_NUMBER() OVER (ORDER BY t.SomeValue), t.SomeValue FROM SomeTable AS t ) SELECT t.SomeValue, Tile = nt.tile FROM anchor AS t CROSS APPLY dbo.NTally(@tiles, (SELECT COUNT(*) FROM anchor)) AS nt WHERE t.rn = nt.rn; [Parameters]: @tiles = BIGINT; requested number of tile groups (same as the parameter passed to NTILE); @rows = BIGINT; the number of rows to be "tiled" (have group number assigned to it); [Returns]: Inline Table Valued Function returns: RN = BIGINT; a row number beginning with 1 and ending with @rows; Tile = BIGINT; a "tile number" or group number the same; [Dependencies]: A numbers table named dbo.tally with a UNIQUE NONCLUSTERED index on N; [Developer Notes]: 1. An inline derived tally table using a CTE or subquery WILL NOT WORK. NTally requires a correctly indexed tally table named dbo.tally; if you have or choose to use a permanent tally table with a different name or in a different schema make sure to change the DDL for this function accordingly. The recomended number of rows is 1,000,000; below is the recomended DDL for dbo.tally. Note the "Beginning" and "End" of tally code.To learn more about tally tables see: http://www.sqlservercentral.com/articles/T-SQL/62867/; --===== TALLY TABLE CODE IF OBJECT_ID('dbo.tally') IS NOT NULL DROP TABLE dbo.tally; CREATE TABLE dbo.tally (N int NOT NULL); --; Insert the numbers 1 through 1,000,0000 into dbo.tally INSERT dbo.tally SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM sys.all_columns a CROSS JOIN sys.all_columns b; --; Create required primary key, clustered index and unique nonclustered indexes ALTER TABLE dbo.tally ADD CONSTRAINT pk_tally PRIMARY KEY CLUSTERED(N) WITH FILLFACTOR=100; ALTER TABLE dbo.tally ADD CONSTRAINT uq_tally UNIQUE NONCLUSTERED(N); 2. Given the number of rows in your persisted tally table (R), dbo.NTallyRangeAB can generate no more than R tile groups per partition. Given a persisted tally table with 1,000,000 rows - the maximum number of tiles is 1,000,000. In this scenario, setting @tile > 1,000,000 will cause the function to "quietly fail", meanin that it will return the wrong results without warning. The same consideration is true for the number of rows available per tile group per partition. Fortunately, I cross joined dbo.tally to itself to generate one trillion (R*R) rows per partition. A million row tally table will requires ~20MB of uncompressed disk space when created, along with the two indexes, using the DDL above; 3. For best results a P.O.C. index should exists on the table that you are "tiling". For more information about P.O.C. indexes see: http://sqlmag.com/sql-server-2012/sql-server-2012-how-write-t-sql-window-functions-part-3 4. NTally is deterministic when order does not matter or when the ORDER BY clause is OVER a DISTINCT set; otherwise the results should be considered non-deterministic. for more about deterministic and nondeterministic functions see: https://msdn.microsoft.com/en-us/library/ms178091.aspx [Examples]: --===== 1. Demonstrating how the function mimics NTILE --; To better understand NTally, run the DML with different values assigned to @rows and --; @tiles. Note how the tile column and NTILE produces the same results. DECLARE @rows bigint = 8, @tiles bigint = 3; SELECT r.RN, r.tile, NTILE(@tiles) OVER (ORDER BY rn) as [NTILE] FROM dbo.NTally(@tiles, @rows) AS r; --===== 2. Using NTally as a faster alternative to NTILE (with no PARTITION BY clause) --; Run the code below from --;START to --;END --; Note how you get the same result but how, the more rows you add, the more efficient --; the NTALLY solution is, with respect to reads, when compared to NTILE: --; e.g. NTILE against 100K rows = 200K+ reads, only 560+ reads for the the NTally method --;START SET NOCOUNT ON; DECLARE @rows bigint = 200000, @tiles bigint = 5; DECLARE @SomeTable TABLE (SomeValue int primary key); INSERT @SomeTable SELECT TOP(@rows) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))*5 FROM sys.all_columns a CROSS JOIN sys.all_columns b; --; 2.1. How to divide @some table into 3 tile groups using NTILE SET STATISTICS IO ON; PRINT 'NTILE version:'+CHAR(10)+REPLICATE('-',60); SELECT s.SomeValue, NTILE(@tiles) OVER (ORDER BY s.SomeValue) AS TileGroup FROM @SomeTable AS s; --; 2.2. How to divide @SomeTable into 3 tile groups using NTally PRINT CHAR(10)+'NTally version:'; WITH anchor AS ( SELECT s.SomeValue, ROW_NUMBER() OVER (ORDER BY s.SomeValue) AS rn FROM @SomeTable AS s ) SELECT a.SomeValue, nt.tile AS TileGroup FROM anchor AS a CROSS APPLY dbo.NTally(@tiles, (SELECT COUNT(*) FROM @SomeTable)) AS nt WHERE a.rn = nt.rn; SET STATISTICS IO OFF; --;END --===== 3. Using NTally an alternative to NTILE with a PARTITION BY clause --; 3.1. Create sample table with 10 rows and 3 partitions IF OBJECT_ID('tempdb..#SomeTable') IS NOT NULL DROP TABLE #SomeTable; CREATE TABLE #SomeTable ( PartitionKey int NOT NULL, SomeValue int NOT NULL, CONSTRAINT pk_SomeTable PRIMARY KEY(PartitionKey,SomeValue) ); INSERT #SomeTable SELECT TOP (12) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))/5+1, ROW_NUMBER() OVER (ORDER BY (SELECT NULL))*5 FROM sys.all_columns; --; 3.2. Using NTILE and PARTITION BY SELECT s.PartitionKey, s.SomeValue, NTILE(3) OVER (PARTITION BY s.PartitionKey ORDER BY s.SomeValue) AS TileGroup FROM #SomeTable s; --; 3.3. Using the NTally function WITH anchor AS -- Use ROW_NUMBER for your partitioning and sorting ( SELECT RN = ROW_NUMBER() OVER (PARTITION BY v.PartitionKey ORDER BY v.SomeValue), PartitionKey, SomeValue FROM #SomeTable AS v ), parts AS -- collect the number of rows per partition ( SELECT PartitionKey, mxrn = MAX(a.RN) FROM anchor AS a GROUP BY PartitionKey ) SELECT a.PartitionKey, a.SomeValue, nt.tile AS TileGroup FROM parts AS p CROSS APPLY dbo.NTally(3,mxrn) AS nt CROSS APPLY anchor AS a WHERE p.PartitionKey = a.PartitionKey AND a.rn = nt.rn; DROP TABLE #SomeTable; --===== 4. Use Cases (1) Assigning Group Identifiers (AKA "bucket numbers") for parallel processing systems such as Hadoop, SSIS and PDW; NTILE divides the buckets as evenly aspossible. (2) NGroup (all) (3) NPage Family (some) ----------------------------------------------------------------------------------------- [Revision History]: Rev 00 - 20140501 - Initial Creation - AJB Rev 01 - 20160324 - Final touches and optimization including comments - AJB Rev 02 - 20190114 - Re-design/Simplified; All datateypes are bigint - AJB ****************************************************************************************/ RETURNS TABLE WITH SCHEMABINDING AS RETURN SELECT RN = ROW_NUMBER() OVER (ORDER BY t.N), Tile = t.N FROM dbo.tally AS t CROSS APPLY ( SELECT TOP(@rows/@tiles+IIF(t.N<=@rows%@tiles,1,0)) $ FROM dbo.tally AS t1 CROSS JOIN dbo.tally AS t2 ) AS x(x) -- Dummy Rows WHERE t.N<=@tiles; GO