I Get by With a Little Help from My Friends, and Google
It almost goes without saying that most utilities have seen a noticeable deviation in their electricity sales in 2020 because of the pandemic. But the question remains: how long will the deviation persist, and what does the path ahead look like? Load forecasters the world over are peering into their crystal balls to try to figure this out.
A big part of the challenge is finding a driver that forecasters can use in their models to capture the variation in sales caused by behavioral changes under the various COVID-19 mitigation policies. And if you do find one, with any luck you can forecast it without too much heartburn. Conveniently, Google has been publishing daily, anonymized COVID-19 Community Mobility data that show how location activity has deviated from a baseline, by country, state and county, based on users who have turned on the Location History setting on their mobile devices. A few of my colleagues and I have used this data in our forecast models, and the results have been quite favorable.
The full dataset is available free of charge in CSV format on Google’s mobility data website. Just scroll down to “Community Mobility Reports” and click the “Global CSV” option to download the file. The file is quite large (about 240 MB and counting), with too many records to load fully into Excel, so I recommend opening it in a text editor such as Notepad or Notepad++ and copying and pasting the relevant rows into a spreadsheet. You’ll have to do a little wrangling to get the data into a usable form, but surprisingly not much.
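If you’d rather not copy and paste out of a 240 MB file, a short script can stream through it and keep only the rows you need. The sketch below is one way to do that with Python’s standard `csv` module. The column names (`country_region`, `sub_region_1`, `sub_region_2`, `metro_area` and the `..._percent_change_from_baseline` fields) are my reading of the file’s header and should be checked against your download; the function names are hypothetical.

```python
import csv
from typing import Iterable, Iterator

# Columns kept in the extract. Assumption: these names match the header of
# Google's Global CSV; verify them against the file you download.
KEEP = [
    "date",
    "retail_and_recreation_percent_change_from_baseline",
    "workplaces_percent_change_from_baseline",
    "residential_percent_change_from_baseline",
]


def state_rows(lines: Iterable[str], country: str, state: str) -> Iterator[dict]:
    """Yield state-level rows one at a time, so the full file is never held in memory."""
    for row in csv.DictReader(lines):
        if (
            row["country_region"] == country
            and row["sub_region_1"] == state
            and row["sub_region_2"] == ""          # blank county -> state-level row
            and row.get("metro_area", "") == ""    # skip metro-area rows if that column exists
        ):
            yield {col: row[col] for col in KEEP}


def extract(src_path: str, dst_path: str, country: str, state: str) -> int:
    """Write the selected rows to a small CSV that Excel opens easily; return the row count."""
    count = 0
    with open(src_path, newline="", encoding="utf-8") as src, open(
        dst_path, "w", newline="", encoding="utf-8"
    ) as dst:
        writer = csv.DictWriter(dst, fieldnames=KEEP)
        writer.writeheader()
        for row in state_rows(src, country, state):
            writer.writerow(row)
            count += 1
    return count
```

For example, `extract("Global_Mobility_Report.csv", "california.csv", "United States", "California")` would write California’s state-level series to a file small enough for a spreadsheet (both file names here are placeholders for wherever you saved things).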
This data represents the percentage change in visits to (or time spent in) six categories of places relative to a “baseline day,” defined as the median value for that day of the week during the five-week period from Jan. 3 to Feb. 6, 2020. The categories are retail and recreation, grocery and pharmacy, parks, transit stations, workplaces, and residential. To give you a visual, here is what the data for the state of California looks like:
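To make the baseline definition concrete, here is a minimal sketch of the calculation as described: take the median for each day of the week over Jan. 3 to Feb. 6, 2020, then express each day as a percent change from the median for its weekday. Google publishes only the resulting percentages, so the daily counts fed into this sketch are hypothetical, and the function names are my own.

```python
import datetime as dt
from statistics import median

# Baseline window as described above: Jan. 3 to Feb. 6, 2020 (five full weeks).
BASELINE_START = dt.date(2020, 1, 3)
BASELINE_END = dt.date(2020, 2, 6)


def weekday_baselines(counts: dict[dt.date, float]) -> dict[int, float]:
    """Median of the baseline-window values for each weekday (0 = Monday ... 6 = Sunday)."""
    by_weekday: dict[int, list[float]] = {}
    for day, value in counts.items():
        if BASELINE_START <= day <= BASELINE_END:
            by_weekday.setdefault(day.weekday(), []).append(value)
    return {wd: median(values) for wd, values in by_weekday.items()}


def percent_change(
    counts: dict[dt.date, float], baselines: dict[int, float]
) -> dict[dt.date, float]:
    """Percent change of each day's value from the baseline for its day of the week."""
    return {
        day: 100.0 * (value - baselines[day.weekday()]) / baselines[day.weekday()]
        for day, value in counts.items()
    }
```

With a Monday baseline of 100 visits and a Saturday baseline of 50, a Monday with 60 visits shows up as -40%, while a Saturday with the same 60 visits shows up as +20%. Because each weekday is measured against its own baseline, weekdays and weekends can look quite different in the published series.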
From this visual, we can see a positive percentage change in the residential category and a negative one in workplaces, as people spend more time at home and little to no time at their place of work. Retail and transit are also down, as people shop less and ride public transportation less. Grocery and pharmacy is down as well, but not as much as the other categories, since people still need to buy food and medicine. These percentage deviations appear to have stabilized since June, which makes forecasting this data a little less intimidating.
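Because the deviations have leveled off, one simple way to project them is to carry forward the recent average for each day of the week. The sketch below does exactly that and nothing more. The four-week window and the function names are assumptions for illustration, and it does not handle holidays, which you would want to remove from the history and override in the projection.

```python
import datetime as dt
from statistics import mean


def day_of_week_profile(history: dict[dt.date, float], weeks: int = 4) -> dict[int, float]:
    """Average value for each weekday over the last `weeks` weeks of history."""
    last = max(history)
    start = last - dt.timedelta(days=7 * weeks - 1)
    groups: dict[int, list[float]] = {}
    for day, value in history.items():
        if start <= day <= last:
            groups.setdefault(day.weekday(), []).append(value)
    return {wd: mean(values) for wd, values in groups.items()}


def project(
    history: dict[dt.date, float], horizon_days: int, weeks: int = 4
) -> dict[dt.date, float]:
    """Carry the recent day-of-week profile forward `horizon_days` past the last observation."""
    profile = day_of_week_profile(history, weeks)
    last = max(history)
    forecast = {}
    for k in range(1, horizon_days + 1):
        day = last + dt.timedelta(days=k)
        forecast[day] = profile[day.weekday()]
    return forecast
```

Scenario forecasts (say, a gradual return toward zero) could be layered on top by scaling this profile, but a flat carry-forward is a reasonable starting point when the series has been this stable.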
One thing that pops out from the data is a well-defined day-type pattern (i.e., weekday vs. weekend) in the residential and workplace categories. That is, the change is smaller on weekends, because before the pandemic people were already at home, not at work, on those days. The large spikes are holidays, since those days differ significantly from Google’s day-of-week baseline. Retail also shows a day-type pattern, albeit a less well-defined one. For these reasons, I found the retail, workplace and residential categories to be the most useful for predicting loads in this COVID-19 world. And since the data are daily, you can use them directly in a daily model, or aggregate them over billing cycles and incorporate them into a monthly Statistically Adjusted End-use (SAE) model.
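Running a daily series through billing cycles amounts to averaging the daily values over each cycle’s read period and then combining the cycles billed in a given month. Here is a minimal sketch, assuming inclusive start and end read dates and equal cycle weights by default; your billing system’s cycle definitions and weights (for example, customer counts per cycle) may differ, and the function names are hypothetical.

```python
import datetime as dt
from typing import Optional


def cycle_average(daily: dict[dt.date, float], start: dt.date, end: dt.date) -> float:
    """Mean daily value over one billing cycle, start and end dates inclusive.

    Raises KeyError if any day in the cycle is missing from `daily`.
    """
    n_days = (end - start).days + 1
    total = sum(daily[start + dt.timedelta(days=k)] for k in range(n_days))
    return total / n_days


def billing_month_value(
    daily: dict[dt.date, float],
    cycles: list[tuple[dt.date, dt.date]],
    weights: Optional[list[float]] = None,
) -> float:
    """Weighted average of the cycle averages for all cycles billed in one month."""
    values = [cycle_average(daily, start, end) for start, end in cycles]
    if weights is None:
        weights = [1.0] * len(values)  # equal weights unless you supply, e.g., customer counts
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)
```

Note how a cycle that straddles the start of the pandemic picks up only part of the shift: a cycle from March 16 to April 14 blends 16 pre-shift days with 14 shifted ones. That lag is exactly what billing-cycle weighting is meant to capture in a billed-sales model.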
Going with the latter approach, I started with a “business as usual” residential SAE model (i.e., one estimated with data through February 2020, so that COVID-19-period data do not influence the model coefficients). The model shows that residential use per customer has been higher since April than it should have been, given the weather that actually occurred.
But incorporating Google’s mobility data into the model helps close this gap. Moreover, forecasting the percentage changes we expect in the relevant categories gives us a better projection of how the rest of the year might shake out.
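The idea of closing the gap can be illustrated with a much simpler model than an SAE specification. The sketch below uses ordinary least squares on made-up monthly data: a weather-only “business as usual” model fit on pre-COVID months, and a second model fit on all months with a residential mobility term added. Everything here is hypothetical (the data, the coefficients, and the choice of heating and cooling degree days as the only weather drivers), and setting mobility to zero before March 2020 is an assumption, since the series does not exist earlier.

```python
import numpy as np


def fit_ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares coefficients for y ~ X (X already includes an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def design(hdd, cdd, mobility=None):
    """Stack an intercept, weather drivers and (optionally) the mobility driver."""
    cols = [np.ones_like(hdd), hdd, cdd]
    if mobility is not None:
        cols.append(mobility)
    return np.column_stack(cols)


# Hypothetical monthly use per customer: 60 pre-COVID months, then 10 COVID months.
rng = np.random.default_rng(42)
n_pre, n_post = 60, 10
n = n_pre + n_post
month = np.arange(n) % 12
hdd = np.clip(600 * np.cos(2 * np.pi * month / 12), 0, None) + rng.uniform(0, 50, n)
cdd = np.clip(-400 * np.cos(2 * np.pi * month / 12), 0, None) + rng.uniform(0, 50, n)
mobility = np.zeros(n)                         # assumed zero before the pandemic
mobility[n_pre:] = rng.uniform(8, 20, n_post)  # residential % change from baseline
use = 700 + 0.4 * hdd + 0.9 * cdd + 3.0 * mobility + rng.normal(0, 5, n)

# Business as usual: weather only, estimated on pre-COVID months.
bau = fit_ols(design(hdd[:n_pre], cdd[:n_pre]), use[:n_pre])
gap_bau = use[n_pre:] - design(hdd[n_pre:], cdd[n_pre:]) @ bau

# Augmented: mobility added as a driver, estimated on all months.
aug = fit_ols(design(hdd, cdd, mobility), use)
gap_aug = use[n_pre:] - design(hdd[n_pre:], cdd[n_pre:], mobility[n_pre:]) @ aug

print(f"mean COVID-period gap, weather only:  {gap_bau.mean():6.1f}")
print(f"mean COVID-period gap, with mobility: {gap_aug.mean():6.1f}")
```

Because the synthetic use was built with a positive mobility effect, the weather-only model underpredicts the COVID months, and the augmented model largely removes that bias. Feeding a projected mobility path into the last column is then how the forecast picks up the expected behavior for the rest of the year.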
Undoubtedly, this data is not perfect. For example, the baseline days probably aren’t representative of a true baseline (five weeks in January and February say little about, say, summer behavior), and Google acknowledges this too. Using them certainly won’t pull out every wrench this pandemic has thrown into our forecast models. But they just might help tighten things up and yield a more reasonable load forecast.
Google states that the data will be available for as long as public health officials find it useful, but who knows how long that will be. I won’t try to forecast that. But with any luck, it will be just long enough.
Shout out to the folks in the Operational Forecasting Team at AEMO for calling this data to our attention!