A common use case in business intelligence queries is to expand a range of dates from the source system into a full set of rows, with one row per date. The result is easier to query as it can be linked directly to a date dimension. How can we implement this solution?
There's already a tip explaining how you can implement this: How to Expand a Range of Dates into Rows using a SQL Server Numbers Table. In that article, a solution with a "numbers table" (sometimes referred to as a "tally table") is discussed to expand the rows. In this tip, we'll introduce you to an alternative solution using the GENERATE_SERIES function. This function is presented as part of the SQL Server 2022 preview.
At the time of writing this tip, SQL Server 2022 is still in preview (currently, CTP 2.1 has been released). This means functionality or features of SQL Server might change, disappear, or be added in the final release.
The output of the function is a result set with one single column named value, which contains all the numeric values of the interval defined by the three parameters.
As you can see in the last example, the stop value isn't necessarily included in the returned interval. If we want to use the function to generate dates, we can do this with DATEADD:
For more information, you can check out the tip, My Favorite T-SQL Enhancements in SQL Server 2022, by Aaron Bertrand. His tip mentions a performance issue because the function is not order-preserving, but this has been fixed in CTP 2.1. When you add an ORDER BY on value, there's no sort operator in the plan:
Keep in mind the syntax has also changed a bit. Explicitly adding START and STOP before the parameter values will result in a syntax error:
Now we know how to use the new GENERATE_SERIES function, we can use it to solve our use case. Let's use the same sample data as in the previous tip:
To expand the data range, we're generating an interval of 1,000 numbers and turning those into dates, as we've seen in the previous paragraph. Finally, we join this against our sample data. The T-SQL statement becomes:
It's clear to see the code has become much shorter. The execution plan is also clearer:
As a reference, here's the execution plan of the query using common table expressions to generate a numbers table on the fly:
This doesn't speak directly about performance, mind you, but using GENERATE_SERIES will make plans that use a numbers table easier to read.
Probably most important, we get our desired result set:
We can now join this data set with a date dimension, for example, using the ExplodedDate column. This would allow for more straightforward time-intelligence queries. Let's illustrate with an example. Suppose we want to find all holiday periods that include a specific week. If your data dimension has a column for weeks, you can quickly filter on that column after we've joined the dimension against our expanded result set.
The tip Creating a date dimension or calendar table in SQL Server explains how you can quickly create a date dimension or calendar table. Once we have it, we can join it against the query we used earlier:
Here we're looking for holiday periods with at least one day of ISO week #23.
A DISTINCT clause is used to get only one single row: the holiday period we wished to find.
The new function GENERATE_SERIES will simplify any query that needs an interval of numbers. The function is easy to use and simplifies execution plans that use such a numbers table. There were some performance issues in the initial version, but these seem to be solved in the CTP 2.1 release.