Forecasting
The Magical Trinary
Sometimes things don’t look like they’re supposed to look. Just walk into your child’s bedroom. At first glance (and second glance and maybe all the time), there is chaos with clothing, books and random toys scattered across the floor. But hidden in that chaos is a secret sense of order. Ask for a specific item and like a magician, presto, the child produces it in minutes.
Okay, maybe this is wishful thinking. But in model building, I sometimes find this child’s magic.
The Problem: Simplified Chaos
The picture below, created in our MetrixND software, is simplified chaos. My beautiful model (in blue) is set against the actual values (in red). Unlike every other year (you must trust me on this assumption), this March and April show an unexpected sawtooth pattern.
Simplified Chaos
The Binary Solution
Glancing at the chaos, my first instinct is to call March and April “outliers”, add a couple binaries, and move on with my life. Using two variables (March2021 and April2021 binaries), I remove the outliers and finish. The binaries are defined below.
March2021 = (Year = 2021) * (Month = 3)
April2021 = (Year = 2021) * (Month = 4)
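For readers who want to see the same definitions outside MetrixND, here is a minimal Python sketch. It is not MetrixND syntax; the helper names `monthly_index` and `binary` and the six-month window are assumptions made for illustration.

```python
def monthly_index(start_year, start_month, n_months):
    """Return (year, month) pairs for n_months consecutive months."""
    periods = []
    year, month = start_year, start_month
    for _ in range(n_months):
        periods.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return periods


def binary(periods, year, month):
    """1 in the period matching (year, month), 0 everywhere else."""
    return [int(y == year and m == month) for (y, m) in periods]


# Hypothetical window: January through June 2021
periods = monthly_index(2021, 1, 6)
march2021 = binary(periods, 2021, 3)
april2021 = binary(periods, 2021, 4)

print(march2021)  # [0, 0, 1, 0, 0, 0]
print(april2021)  # [0, 0, 0, 1, 0, 0]
```

Each binary is nonzero in exactly one month, so each one gives the model a dedicated coefficient for that single observation.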
You can see the result below.
Binary Solution
The Trinary Solution
But what if there is order hidden in the chaos? Could there be a reason that March is lower than expected only to be offset with April higher than expected?
In this case, we learn that several customers did not get billed in March and received double bills in April. In other words, a group of customers' March consumption data was moved to April, creating a dip followed by a spike. Knowing this reason, we can model the data movement with a trinary variable.
A trinary variable is like a binary except that it spans two data points instead of one: it takes the value -1 in the period that lost the data, +1 in the period that received it, and 0 everywhere else. Our trinary variable is created below:
MarchApril2021Trinary = (-1) * March2021 + (1) * April2021
The formula creates the following time series.
MarchApril2021Trinary
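The same series can be sketched in a few lines of Python. This is an illustration rather than MetrixND syntax; the `trinary` helper and the January to June 2021 window are assumptions.

```python
def trinary(periods, minus_period, plus_period):
    """-1 where data was lost, +1 where it landed, 0 everywhere else."""
    values = []
    for period in periods:
        if period == minus_period:
            values.append(-1)
        elif period == plus_period:
            values.append(1)
        else:
            values.append(0)
    return values


# Hypothetical window: January through June 2021 as (year, month) pairs
periods = [(2021, month) for month in range(1, 7)]
march_april_2021 = trinary(periods, (2021, 3), (2021, 4))

print(march_april_2021)  # [0, 0, -1, 1, 0, 0]
```

The values sum to zero, which is the point: whatever the estimated coefficient turns out to be, the variable subtracts that amount from March and adds the same amount to April.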
Using the trinary in the model creates the following result.
Trinary Solution
The Case for Trinaries
In the simplified chaos example, the trinary and binary solutions produce virtually identical results, with nearly identical R² and MAPE values. While the binary solution literally zeros out the errors for the two data points, the trinary solution captures the data movement but still leaves a little error.
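To make that difference concrete, here is a small sketch using ordinary least squares in NumPy. The data is entirely synthetic: the weather driver, the sales relationship, the noise, and the shifted volume of 15 are all made-up assumptions, and this is not how MetrixND estimates models internally.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the sketch is repeatable
n = 36                           # synthetic months: Jan 2019 through Dec 2021
weather = 10 + 5 * np.sin(2 * np.pi * np.arange(n) / 12)  # made-up weather driver
sales = 100 + 2 * weather + rng.normal(0, 1, n)           # made-up billed sales

mar, apr = 26, 27                # positions of March and April 2021
shift = 15.0                     # made-up volume billed one month late
sales[mar] -= shift
sales[apr] += shift

march2021 = np.zeros(n)
march2021[mar] = 1
april2021 = np.zeros(n)
april2021[apr] = 1
trinary = -march2021 + april2021


def fit(columns):
    """Least-squares fit with an intercept; returns coefficients and residuals."""
    X = np.column_stack([np.ones(n)] + columns)
    beta, *_ = np.linalg.lstsq(X, sales, rcond=None)
    return beta, sales - X @ beta


beta_bin, resid_bin = fit([weather, march2021, april2021])
beta_tri, resid_tri = fit([weather, trinary])

print("binary residuals (Mar, Apr): ", resid_bin[[mar, apr]])
print("trinary residuals (Mar, Apr):", resid_tri[[mar, apr]])
print("estimated shifted volume:    ", beta_tri[-1])
```

In the binary fit the March and April residuals are zero up to floating-point error, because each point has its own coefficient. In the trinary fit a single coefficient estimates the shifted volume, so a small residual remains at both points.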
So why use a trinary? Here are a couple thoughts.
- Degrees of Freedom. The trinary solution uses one variable where the binary solution uses two, which leaves one more degree of freedom. In situations where the problem occurs multiple times, trinary variables can provide a similar answer with half as many variables.
- Preserve Impacts. A binary removes the entire impact of the data point. A trinary moves a portion of the impact from one period to another, preserving the impact of the remaining variables. In other words, the model still captures the weather response at the data points covered by the trinary. In situations where the data points occur with large weather impacts (e.g., January or August), keeping the weather impacts with a trinary may be more useful than removing the weather impact with a binary.
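Both points can be checked with a short sketch. In ordinary least squares, giving an observation its own binary is equivalent to dropping that observation from the estimation, so the weather coefficient in the binary model matches a model fit without March and April. The data below is synthetic and every number in it is a made-up assumption.

```python
import numpy as np

rng = np.random.default_rng(1)   # fixed seed so the sketch is repeatable
n = 36                           # synthetic months: Jan 2019 through Dec 2021
weather = 10 + 5 * np.sin(2 * np.pi * np.arange(n) / 12)  # made-up weather driver
sales = 100 + 2 * weather + rng.normal(0, 1, n)           # made-up billed sales

mar, apr = 26, 27                # positions of March and April 2021
sales[mar] -= 15.0               # made-up volume missing from March
sales[apr] += 15.0               # the same volume double-billed in April


def ols(X, y):
    """Least-squares coefficients for design matrix X and target y."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


ones = np.ones(n)
march2021 = (np.arange(n) == mar).astype(float)
april2021 = (np.arange(n) == apr).astype(float)

X_bin = np.column_stack([ones, weather, march2021, april2021])
X_tri = np.column_stack([ones, weather, -march2021 + april2021])

# Same model with the March and April rows removed entirely
keep = np.ones(n, dtype=bool)
keep[[mar, apr]] = False
X_drop = np.column_stack([ones, weather])[keep]

b_bin = ols(X_bin, sales)
b_tri = ols(X_tri, sales)
b_drop = ols(X_drop, sales[keep])

print("weather slope, binaries:    ", b_bin[1])
print("weather slope, rows dropped:", b_drop[1])
print("weather slope, trinary:     ", b_tri[1])
print("residual degrees of freedom:", n - X_bin.shape[1], "vs", n - X_tri.shape[1])
```

The first two slopes agree, which shows that the binaries take March and April out of the weather estimate. The trinary slope generally differs because those two months still inform it, and the trinary model keeps one extra degree of freedom.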
Here’s a real-world example. In the left picture, chaos reigns with a sawtooth pattern extending from January through June 2021 (although you can argue that it mildly extends to October). In the right picture, I use trinary variables to capture the sawtooth pattern. Using the trinary solution, I improve the overall fit, preserve the weather response, and do not remove any 2021 data points from the data set.
And, I think that’s magic.
If you would like to learn more about MetrixND, please go to the forecasting section of the Itron website or send us an email at forecasting@itron.com. And be sure to subscribe to our blog for more tips and tricks and to be notified when we post our other interesting blogs.