2 minutes
Dynamic Date Ranges
Early in my career, when I first started mentoring junior data analysts, I used to throw in this story: “Every year in January, we sneak in to all the customer analytics systems and install an update that improves runtimes for all the queries by about thirty percent.”
Even the most gullible analysts didn’t buy it, but they would ask what I was talking about.
We always included dynamic date ranges in all our queries, typically limiting them to two years of data plus the current year. Every time a new year started, the last year would no longer match the filter criteria, and the number of rows returned would shrink. And since the customer reports were only going back two years, there was no reason to keep the older data.
As an example in Snowflake:
where invoice_dt >= dateadd('year', -2, current_date())
We were supporting hundreds of customers at the time, so we had to take performance seriously. Forgetting to include a dynamic date range was considered one of the deadliest sins of query writing. It virtually guaranteed we’d get a support ticket asking why the product was slow.
Now that cloud implementations are more common, this well-learned trick is becoming increasingly forgotten, relying instead on purchasing additional computing power. But it does make me wonder how much money is spent regularly on processing data that doesn’t even make it into a single report. This is one trick we should keep in our tool belts.
244 Words
2023-10-17 21:46