Forecasting
When is the R² Not the R²?
A recently received support email contains a riddle.
“When I build a regression model in Excel and remove the intercept, I get a different R² than when I do the same thing in MetrixND. Why does this happen?”
To be clear, the estimated regression model for both Excel and MetrixND excludes the intercept term as shown below.
So, why does Excel and MetrixND produce different R² values?
MetrixND Regressions and R² Calculation
In MetrixND, R² is calculated as shown below.
This is the standard definition for R² and is constructed as one minus the ratio of the sum of squared errors (SSE) divided by the total sum of squares (TSS).
The mathematical equivalent can be written as the model sum of squares (MSS) divided by the total sum of squares (TSS) as shown below. This is becomes important at the end (i.e. spoiler alert!).
Since I’m paranoid, I double check the equations by building a regression model and manually validating the MetrixND equations in Excel. Check. The manual equations confirm that MetrixND is calculating the R² according to the standard definition.
Excel Regressions and R² Calculation
If MetrixND is correct, then what’s wrong with Excel?
First, I need to run the Excel regression module. If you’ve never done it, you can do it with the following steps.
Step 1: Select Data, Data Analysis and pick the Regression option. This opens the Regression dialogue box.
Step 2: Fill out the dialogue box by identifying the X and Y data ranges and select your options. For my test, I select the check box for “Constant is Zero” to remove the intercept from the regression equation, highlighted in red:
Step 3: Select “OK” and I can see the answer.
Following this process, I compare the Excel and MetrixND R2 values, and indeed, I get different numbers. So what’s wrong?
I wade through the morass of internet posts and find a Microsoft confession: “… if you click the Set Intercept = 0 box, the R-squared value is always incorrect in any version of Excel…”
There it is. Microsoft says they are incorrect.
What is Excel Doing?
After a bit more digging, I am happy to confirm that Excel calculates R² using the following formula only when the intercept is removed (please note that Excel uses the standard definition which matches the MetrixND results when the intercept is included in the regression model).
When you consider this calculation, you can see that this version of R² is much like the mathematical equivalent version (MSS/TSS) I showed above. Here it is again for your reference:
If y ̅ = 0, then the equations are the same. Thus, when the intercept is removed, Excel changes the formula and re-centers the R² calculation around 0 not y ̅. In Microsoft’s words, this is “incorrect”.
When is the R² Not the R²?
When the R² formula is incorrect. If you find the word “incorrect” overly harsh, perhaps the word “different” is better. After all, it’s just a formula.
Want to learn more? Attend an Itron Forecasting Workshop!
Related Articles
HTML Example
A paragraph is a self-contained unit of a discourse in writing dealing with a particular point or idea. Paragraphs are usually an expected part of formal writing, used to organize longer prose.