Forecasting
Dates and Serial Numbers: Might As Well Leap
In Excel and MetrixND, dates are actually serial numbers. This is obvious in Excel when you format a cell containing a date as a number. In the following image from Excel, the first and second columns contain the same dates—the first column is formatted to display the value as a date and the second column is formatted to display the value as a number. To populate this series, I typed 1/1/1900 in the first cell and I added 1 to the prior cell in each subsequent row, dragging the series downward. For the sake of brevity, I have hidden the rows between 1/3/1900 and 2/28/1900.
There are a few observations to make about this image. First, the serial number is indexed to 1/1/1900. That is, Jan. 1, 1900 is indicated as 1. Second, the dates indicate that there was a Feb. 29 in 1900. There is one big problem with this however: 1900 was NOT a leap year.
Let’s review the rule for determining if a year is a leap year in the Gregorian calendar (which is the calendar used most widely around the world). A year is a leap year if the year is divisible by 4, except if it is divisible by 100, but years divisible by 100 are leap years if they are also divisible by 400. Thus, the year 2000 was a leap year because it is divisible by 4, 100 and 400, but the year 1900 was not a leap year because it is divisible by 4 and 100, but not by 400.
As a side note, the following MetrixND transform creates a binary variable that indicates if a year is or is not a leap year, but I digress.
This means that Excel is incorrectly identifying 1900 as a leap year. When I found this out, I was hoping that I had made a great discovery. Much to my chagrin however, Microsoft is aware of this and you can read a detailed explanation of it here. Essentially, the bug was intentionally implemented in Excel for backward compatibility with Lotus 1-2-3—an early spreadsheet application with which Excel competed—where this bug was introduced.
Let’s see how MetrixND addresses this issue. The following MetrixND transform utilizes the keyword DATE, which returns the serial number of the date. The following image displays the end of February and the start of March in 1900.
There are two things to observe. First, March 1 and March 2 are 61 and 62 respectively, which align exactly with the results from Excel. Second, there is (correctly) no Feb. 29 in 1900. This means all the days after Feb. 28, 1900 have the same serial number in Excel and MetrixND.
But, where does that leave us prior to Feb. 28? The following shows the dates including Jan. 1, 1900. The dates prior to Feb. 28, 1900 are different by one between the two applications, but that is of little consequence since we rarely have access to daily data from 1900.
The takeaway is that MetrixND indexes the date’s serial number to Dec. 31, 1899 instead of Jan. 1, 1900, as Excel does. Because MetrixND correctly makes 1900 a non-leap year, the serial numbers are consistent starting on March 1, 1900 and thereafter.
I am hopeful this pedantic piece of knowledge will allow you to sleep better at night.
Have you seen the new Itron website? Be sure to visit the new forecasting page at www.itron.com/forecasting.
Related Articles
HTML Example
A paragraph is a self-contained unit of a discourse in writing dealing with a particular point or idea. Paragraphs are usually an expected part of formal writing, used to organize longer prose.