Some lessons from Alteryxing the Advent of Code
It is December, which means it is time for Advent of Code. This is an annual programming challenge created by Eric Wastl. Each day during December a small set of programming puzzles is released, aimed at a variety of skill sets and skill levels, and you can solve them in any programming language you like. As this is an Alteryx blog, I am going to write about how these can be solved using just Alteryx. The word for this, coined by James Dunkerley, is BaseA. We also spoke to James last year as part of our podcast series.
This is my fourth year of participating in the challenge, and while it is only day 6, I felt now was the right time to write about some common lessons and tricks that you may find useful as you progress through Advent of Code, and that can also be applied to how you build workflows.
Read the question and read it again
A lesson you were probably told many a time when preparing for exams at school, and many years later it still holds true. I have lost so much time on previous solves where there was a simple step in the instructions which I either missed on first read or misinterpreted. So read it again, then move on to planning out your workflow, and probably read it a third time to check your logic.
Plan your workflow
Many Alteryx users will have different approaches to this, but laying out the high-level logic of your workflow in comment boxes is a great way to plan each step. It will also save you time as you build the workflow, because you will not need to keep referring back to the question.
Work with the example
Each question comes with an example dataset, which is much smaller than your input file, and you are also given the answer for it. So I always suggest building your workflow on the example and testing it against that answer. This has the following benefits:
i) You can check that your logic is producing the expected answer before running with your full dataset. If you run only with the full dataset and get the answer wrong, you trigger a 1 minute timeout before you can submit a solution again, and each successive incorrect answer increases the time before you can submit again.
ii) If you are not quite there with your logic the example workflow will be easier to debug. This will save you time, especially as the challenges tend to get much harder as the days progress.
However, and this caught me out this year, make sure you check that the workflow you built on the example can scale to the actual input. On one challenge I used a Text To Columns tool set to split to 8 columns, which worked fine for the example, but the actual input required 12 columns to be created!
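For readers who think in code, the same trap can be shown with a minimal Python sketch. The two rows below are hypothetical stand-ins for an example line and a real input line, and the helper names are my own. This is Python rather than Alteryx, so the exact overflow behaviour of the Text To Columns tool may differ, but the idea is the same: a fixed column count that fits the example can quietly mangle the real data.

```python
# Hypothetical rows: the example has 8 values per line, the real input has 12.
example_row = "1,2,3,4,5,6,7,8"
actual_row = "1,2,3,4,5,6,7,8,9,10,11,12"


def split_fixed(row: str, columns: int) -> list[str]:
    """Split into at most `columns` pieces; any overflow stays in the last piece."""
    return row.split(",", columns - 1)


def split_all(row: str) -> list[str]:
    """Split on every delimiter, so the width comes from the data itself."""
    return row.split(",")


print(split_fixed(example_row, 8)[-1])  # '8'
print(split_fixed(actual_row, 8)[-1])   # '8,9,10,11,12'
print(len(split_all(actual_row)))       # 12
```

A quick check on the last column (or on the number of delimiters per row) before trusting the result is usually enough to catch this.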
Check your data types!
This has tricked me a couple of times already this year. When building workflows in Alteryx you can rely on some automatic sorting of your data in tools such as the Summarize or Sample tool, which will sort on the group by columns. However, your data may be stored as string values. For example, when using Text To Columns you may generate Field 1, Field 2, … Field 10, Field 11, Field 12. When this is transposed you get a name column where the data is stored as a string, so when you group by this field the sort order is Field 1, Field 10, Field 11, Field 12, Field 2… rather than the Field 1, Field 2, Field 3… you might expect. This difference is easy to overlook, but trust me, it can create very different results.
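The same string versus number ordering shows up in most languages, so here is a minimal Python sketch of it. The field names are generated to mimic the ones above, and `field_number` is a hypothetical helper of my own, not anything from Alteryx.

```python
import re

# Field names as a split followed by a transpose might produce them.
names = [f"Field {i}" for i in range(1, 13)]

# Plain string sort: compared character by character, so "Field 10" < "Field 2".
string_order = sorted(names)


def field_number(name: str) -> int:
    """Pull the trailing integer out of a name such as 'Field 10'."""
    match = re.search(r"(\d+)$", name)
    if match is None:
        raise ValueError(f"no trailing number in {name!r}")
    return int(match.group(1))


# Sorting on the extracted integer gives the order most people expect.
numeric_order = sorted(names, key=field_number)

print(string_order[:5])   # ['Field 1', 'Field 10', 'Field 11', 'Field 12', 'Field 2']
print(numeric_order[:5])  # ['Field 1', 'Field 2', 'Field 3', 'Field 4', 'Field 5']
```

In a workflow the equivalent fix is to extract the number into a numeric field and group or sort on that instead of the string name.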
Can you simplify the problem?
Day 6 of this year’s Advent of Code is a great example of needing to simplify the problem. The way the question was presented made it look like you should build a solution where the length of the string increases each day.
This works fine until you get to part 2, where exponential growth over 255 iterations would leave you with a really, really, really long string, which slows down the solve. And this meme on Reddit sums it up well!
In fact, the solve only needs 9 records (one for each possible timer value, 0 to 8 days, with a count of the number of fish in that state) to pass through the iterative macro, and it solves on my laptop in 0.8 seconds!
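To make the nine-record idea concrete, here is a minimal sketch in Python rather than Alteryx. It assumes the Day 6 rules as I understand them: every fish's timer ticks down by one each day, and a fish at 0 resets to 6 and spawns a new fish with a timer of 8. The function name `count_fish` is my own, and `[3, 4, 3, 1, 2]` is the small example input from the puzzle text.

```python
from collections import Counter


def count_fish(timers: list[int], days: int) -> int:
    """Total number of fish after `days`, tracking only a count per timer value."""
    tally = Counter(timers)
    # counts[t] = number of fish whose timer currently reads t (0 to 8).
    counts = [tally.get(t, 0) for t in range(9)]
    for _ in range(days):
        spawning = counts[0]
        # Every fish ticks down one slot; the newborns enter at timer 8.
        counts = counts[1:] + [spawning]
        # The parents that just spawned restart at timer 6.
        counts[6] += spawning
    return sum(counts)


example = [3, 4, 3, 1, 2]
print(count_fish(example, 80))
print(count_fish(example, 256))
```

The loop body touches nine numbers no matter how many fish there are, which is why the runtime stays flat while the population explodes.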
In other cases I’ve seen in previous years, a repeating pattern is established after some number of iterations. Once you have the pattern you can roll it forward to the iteration you need for the solution, without actually computing 1,000,000+ iterations.
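One way to sketch the "find the pattern, then roll it forward" idea in Python is below. It assumes the state at each iteration can be stored and compared (it must be hashable) and that the next state depends only on the current one. `state_after` and `toy_step` are hypothetical names for illustration, not taken from any particular puzzle.

```python
from typing import Callable, Hashable, TypeVar

S = TypeVar("S", bound=Hashable)


def state_after(start: S, step: Callable[[S], S], target: int) -> S:
    """State after `target` applications of `step`, skipping ahead once a cycle appears."""
    seen: dict[S, int] = {}  # state -> iteration at which it was first seen
    history: list[S] = []    # history[i] = state after i steps
    state = start
    i = 0
    while i < target:
        if state in seen:
            # The state at iteration i equals the one at cycle_start, so it repeats.
            cycle_start = seen[state]
            cycle_len = i - cycle_start
            return history[cycle_start + (target - cycle_start) % cycle_len]
        seen[state] = i
        history.append(state)
        state = step(state)
        i += 1
    return state


def toy_step(x: int) -> int:
    """A made-up update rule with only 20 possible states, so it must cycle quickly."""
    return (x * 7 + 3) % 20


print(state_after(5, toy_step, 1_000_000))
```

Only the iterations up to the first repeat are actually computed; the remainder is a single modulo calculation.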
Do you need an iterative macro?
I’ve previously blogged (and yes, I do need to move this over to AlterTricks) about a similar case where I originally thought the solution would need an iterative macro, when in fact there was a method with a much faster runtime. Sometimes there is a solution that doesn’t need an iterative macro at all, and Alteryx is super fast at solving row-based calculations. This came in useful on Day 4, where you needed to find which bingo card is the first or last to win. Some users approached this by iterating through the balls being drawn and, as part of each iteration, checking whether a card has won (completed a row or column). However, if you translate each number on your bingo card to the position at which that number is drawn, you can find the max draw position for each row and column (using a Summarize tool). The lowest of those values on a card is the draw on which that card wins, so you can then just sort your data to find the card with the lowest or highest winning position.
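The draw-position trick can be sketched in a few lines of Python. The draw order and the three 3x3 cards below are made up for illustration (the real puzzle uses larger cards), and `winning_turn` is a hypothetical helper name.

```python
def winning_turn(card: list[list[int]], draws: list[int]) -> int:
    """Index of the draw on which `card` first completes a row or column."""
    position = {number: i for i, number in enumerate(draws)}
    # Replace each number with the turn on which it is drawn.
    # Numbers that are never drawn get len(draws), i.e. "after the last draw".
    turns = [[position.get(n, len(draws)) for n in row] for row in card]
    lines = turns + [list(col) for col in zip(*turns)]
    # A line completes when its last number is drawn (max);
    # the card wins when its earliest line completes (min).
    return min(max(line) for line in lines)


# Made-up draw order and cards, purely for illustration.
draws = [7, 4, 9, 5, 11, 17, 23, 2, 0, 14]
cards = {
    "a": [[7, 4, 9], [5, 11, 17], [23, 2, 0]],
    "b": [[14, 7, 5], [4, 23, 11], [9, 2, 17]],
    "c": [[0, 14, 2], [17, 23, 7], [11, 9, 5]],
}

first = min(cards, key=lambda name: winning_turn(cards[name], draws))
last = max(cards, key=lambda name: winning_turn(cards[name], draws))
print(first, last)  # a b
```

No loop over the draws is needed: each card is scored independently with a max per line and a min per card, which is the same shape as a group by followed by a sort.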
Compare your solution with the solutions submitted by others
Remember, while you might be competing with fellow Alteryx users to top the leaderboard or be the first to solve a particular day, this is also a great learning resource. So check out the solutions submitted by others and compare them with your own. Did the other user approach it in a different way to you? Did they use fewer tools for a step, or leverage a configuration option you didn’t know about? These are all examples of knowledge you can gain by spending a bit of time at the end of solving the challenge.
I have also been using Reddit to check out the solutions non-Alteryx users have been sharing, as there could also be something you could learn from their approach that could make you a better Alteryx developer.