Forecasting

When is the R² Not the R²?

June 26, 2019

A recently received support email contains a riddle.

“When I build a regression model in Excel and remove the intercept, I get a different R² than when I do the same thing in MetrixND. Why does this happen?”

To be clear, the estimated regression model for both Excel and MetrixND excludes the intercept term as shown below.

So, why does Excel and MetrixND produce different R² values?

MetrixND Regressions and R² Calculation

In MetrixND, R² is calculated as shown below.

This is the standard definition for R² and is constructed as one minus the ratio of the sum of squared errors (SSE) divided by the total sum of squares (TSS).

The mathematical equivalent can be written as the model sum of squares (MSS) divided by the total sum of squares (TSS) as shown below. This is becomes important at the end (i.e. spoiler alert!).

 

Since I’m paranoid, I double check the equations by building a regression model and manually validating the MetrixND equations in Excel. Check. The manual equations confirm that MetrixND is calculating the R² according to the standard definition.

Excel Regressions and R² Calculation

If MetrixND is correct, then what’s wrong with Excel?

First, I need to run the Excel regression module. If you’ve never done it, you can do it with the following steps.

Step 1: Select Data, Data Analysis and pick the Regression option. This opens the Regression dialogue box.

Step 2: Fill out the dialogue box by identifying the X and Y data ranges and select your options. For my test, I select the check box for “Constant is Zero” to remove the intercept from the regression equation, highlighted in red:

Step 3: Select “OK” and I can see the answer.

Following this process, I compare the Excel and MetrixND R2 values, and indeed, I get different numbers. So what’s wrong?

I wade through the morass of internet posts and find a Microsoft confession: “… if you click the Set Intercept = 0 box, the R-squared value is always incorrect in any version of Excel…”

There it is. Microsoft says they are incorrect.

What is Excel Doing?

After a bit more digging, I am happy to confirm that Excel calculates R² using the following formula only when the intercept is removed (please note that Excel uses the standard definition which matches the MetrixND results when the intercept is included in the regression model).

 

When you consider this calculation, you can see that this version of R² is much like the mathematical equivalent version (MSS/TSS) I showed above. Here it is again for your reference:

 

If y ̅ = 0, then the equations are the same. Thus, when the intercept is removed, Excel changes the formula and re-centers the R² calculation around 0 not y ̅. In Microsoft’s words, this is “incorrect”.

When is the R² Not the R²?

When the R² formula is incorrect. If you find the word “incorrect” overly harsh, perhaps the word “different” is better. After all, it’s just a formula.

Want to learn more? Attend an Itron Forecasting Workshop!

By Mark Quan


Principal Forecast Consultant


Mark Quan is a Principal Forecast Consultant with Itron’s Forecasting Division. Since joining Itron in 1997, Quan has specialized in both short-term and long-term energy forecasting solutions as well as load research projects. Quan has developed and implemented several automated forecasting systems to predict next day system demand, load profiles, and retail consumption for companies throughout the United States and Canada. Short-term forecasting solutions include systems for the Midwest Independent System Operator (MISO) and the California Independent System Operator (CAISO). Long-term forecasting solutions include developing and supporting the long-term forecasts of sales and customers for clients such as Dairyland Power and Omaha Public Power District. These forecasts include end-use information and demand-side management impacts in an econometric framework. Finally, Quan has been involved in implementing Load Research systems such as at Snohomish PUD. Prior to joining Itron, Quan worked in the gas, electric, and corporate functions at Pacific Gas and Electric Company (PG&E), where he was involved in industry restructuring, electric planning, and natural gas planning. Quan received an M.S. in Operations Research from Stanford University and a B.S. in Applied Mathematics from the University of California at Los Angeles.


HTML Example

A paragraph is a self-contained unit of a discourse in writing dealing with a particular point or idea. Paragraphs are usually an expected part of formal writing, used to organize longer prose.