Date Scaffolding: A Method for Adding Missing Dates
Date scaffolding is used to infill dates when a date might be missing from your data, for example you might be looking at staff turnover and you know their start date, and if they are a leaver their end date, but you will also want to know the dates in-between. Or you have a list of transactions and you want to plot the transactions by date, but what about the days when there are no transactions?
There are two main methods in Alteryx (and potentially other ways as well as there are many routes to solving a problem in Alteryx) which I will walk through. The first is the way I’ve tended to always do it and the second is a new method I recently discovered, which also has additional benefits.
For the examples the data is structured in the following format:
Using generate rows
In this first example I use a cross table tool to get one record for my data, with two fields, one for the Start date and a second for the End date. Once my data is shaped I use the Generate Rows tool. This enables me to, using a DateTimeAdd formula, create new records for each date between my start and end date. The set up for this is shown in the configuration pane:
Within the config pane, we set up:
- That we want to create a new field called ‘Dates’ and that the field type is date
- The initialisation expression, which is the Start Date, i.e. we want to start from this date
- The condition is that the new ‘Dates’ we create are less than or equal to (<=) the End date
- We loop through creating a new record until the condition in (3) is met by adding 1 day to the ‘Dates’. In here we can specify ‘seconds’ (n.b. the field type would need to be DateTime) through to ‘years’.
Then once we run the workflow we will have a record for each day between our start and end dates.
The second method uses a macro built in to Alteryx called TS Fill*.
* Despite the tool showing that it is built using R code, shown by the R icon in the bottom left, there is actually no R code in the tool and it mostly uses the Generate Rows method as outlined above, and uses interface tools to increase the functionality of the tool!
With this tool you don’t need to cross tab your data to get a field for the start and end date, instead it ‘fills in’ the missing dates. Therefore using the example data it will treat the first record as the start date and the second record as the end date.
In the config pane you can see that you now have interface tools to select how you want the TS Fill tool to behave.
To set the tool up:
- Select the Date / DateTime field that you want to create the scaffolding for
- What you interval is. (N.B. As mentioned above in the * note, this actually uses an interface tool to configure the behaviour of the Generate Rows tool, so if you select ‘Years’ it will update the formula in step 4 of the Generate Rows tool to be DateTimeAdd([Start],1,‘years’)
- Set the increment, so you can deal with creating say fortnightly data by setting the interval to weeks and the increment to 2.
Running the workflow will create each day between your start and end date.
Additional functionality of TS Fill
So apart from the TS Fill tool providing a user interface to configure, what lies behind it being the Generate Rows tool, which some may find it being a little easier to use if you find the DateTimeAdd function a bit fiddly to use. The TS Fill tool is useful in ‘filling in’ any missing dates. So if you find that your data extract is say from a sales system, you may only have records in your data for days where sales occured. So if you want to have a complete set of dates you will need to use the TS Fill tool to add in any missing days, you can then add in those missing date either with null values or depending on your needs to additional analysis to impute (say using average) what those values are.
So if like me you previously did scaffolding using generate rows, hopefully you can now add another tool to your workflows which is a bit more flexible.