
In this paper I present a novel polynomial regression method, called Finite Difference Regression, for a uniformly sampled sequence of noisy data points; the method determines the order of the best fitting polynomial and provides estimates of its coefficients. Unlike classical least-squares polynomial regression, where an unknown model order must be inferred from the R^{2} value of the fit, I show how the t-test from statistics can be combined with the method of finite differences to yield a more sensitive and objective measure of the order of the best fitting polynomial. Furthermore, I show how the finite differences used in determining the order can be reemployed to produce excellent estimates of the coefficients of the best fitting polynomial. Not only are these coefficients unbiased and consistent, but the asymptotic properties of the fit also improve with increasing degree of the fitting polynomial.

In this paper I present a novel polynomial regression method for a uniformly sampled sequence of noisy data points that I call the method of Finite Difference Regression. I show how the statistical t-test can be combined with the method of finite differences to provide a more sensitive and objective measure of the order of the best fitting polynomial and to produce unbiased and consistent estimates of its coefficients.

Regression methods, also known as fitting models to data, have had a long history. Starting from Gauss (see [

Polynomial regression methods determine the best polynomial describing a sequence of noisy data samples. In classical polynomial regression, when the order of the underlying model is known, the coefficients of the terms are estimated by minimizing the sum of squares of the residual errors of the fit, as in classical linear regression (see [ ]). When the order is unknown, it is typically judged from the R^{2} value of the fit. More on these two different aspects of polynomial regression is provided below.

1) Determining the order of the best fitting polynomial (Model order unknown)

In classical regression, the goodness of the order of the fitting polynomial is given by the R^{2} value of the best fitting polynomial: the greater the R^{2} value, the better the fit. However, this means that one can go on fitting with ever higher degree polynomials and thereby obtain ever increasing R^{2} values. Given the insensitivity of these R^{2} values (see for example Section 2.4), it is indeed hard to find a stopping criterion. To get around this problem, heuristic methods like AIC, BIC, etc. (see [ ]) are employed.

2) Determining the coefficients of the best fitting polynomial (Model order known/determined)

In classical polynomial regression, the polynomial coefficients of the fit that minimizes the sum of squares of the residual errors are found by a computation equivalent to inverting a square matrix. In contrast, in the method I describe below, I once again employ the finite differences used to find the order, this time to determine the polynomial coefficients iteratively. In Section 2.7, I show that the results of this method are remarkable: not only are the coefficients unbiased and consistent, but the asymptotic properties of the fit also improve with increasing degree of the fitting polynomial.

Before describing the Finite Difference Regression method in Section 2, I briefly recapitulate the general method of finite differences, whose formulation can be traced all the way back to Sir Isaac Newton (see [ ]). Consider the first table below, with sample x values in the 1^{st} row and the corresponding data values y in the 2^{nd} row. A plot of this data is shown in the accompanying figure.

The remaining rows of the table list the successive differences of the row above them; note, in particular, that the 3^{rd} difference row is all zeros.

I assume throughout this paper that the data are sampled at regular intervals. Without loss of generality these intervals can be assumed to be of unit length. Then the following general observations can be made:

1) If the (m + 1)^{st} difference row is all zero, then the data is best described by an m^{th} order polynomial: $a_m x^m + a_{m-1} x^{m-1} + \cdots + a_1 x + a_0$.

2) If the (m + 1)^{st} difference row is all zero, then $m!\,a_m$ is the constant in the m^{th} difference row. From this observation the coefficient of the highest term can be found. For example, in the first table below the 3^{rd} difference row is all zero, so $2!\,a_2$ is the constant in the 2^{nd} difference row. Since this constant is 4, it immediately follows that the highest term is $4x^2/2! = 2x^2$.

3) The lower order terms can be found successively by back-substituting and subtracting the highest term. In this method, the data y is replaced by $y' = y - 2x^2$, and the differencing is repeated on y′, as shown in the second table below.

In the resulting table, the 2^{nd} difference row is all zero. Hence $1!\,a_1$ is the constant in the 1^{st} difference row. Since this constant is 1, the next highest term is $1 \cdot x / 1! = x$.

Again, the final term can be found by replacing y′ with $y'' = y' - x$, as shown in the third table below.

The computation in the third table shows that y″ is identically the constant 3, which is the final term $a_0$.

Hence, by successive applications of the finite difference method in this manner, the best fit polynomial to the data is found to be $2x^2 + x + 3$.
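The noiseless procedure above can be sketched in a few lines of code; this is an illustrative implementation (the function names are mine, not the paper's), assuming exact integer data sampled at x = 1, 2, 3, ⋯:

```python
# Illustrative sketch of the noiseless finite-difference procedure above
# (function names are mine, not the paper's), assuming exact integer data
# sampled at x = 1, 2, 3, ...
import math

def differences(seq):
    """One row of forward differences of seq."""
    return [b - a for a, b in zip(seq, seq[1:])]

def fit_noiseless(y):
    """Return polynomial coefficients [a0, a1, ..., am] by locating the
    first all-zero difference row and back-substituting highest terms."""
    # Order m: the (m + 1)-st difference row is the first all-zero row.
    m, row = 0, list(y)
    while any(v != 0 for v in differences(row)):
        row = differences(row)
        m += 1
    coeffs = [0] * (m + 1)
    resid = list(y)
    for k in range(m, -1, -1):
        r = resid
        for _ in range(k):
            r = differences(r)
        a_k = r[0] // math.factorial(k)   # k-th differences equal k! * a_k
        coeffs[k] = a_k
        # Back-substitute: subtract the recovered term a_k * x^k.
        resid = [v - a_k * (i + 1) ** k for i, v in enumerate(resid)]
    return coeffs

y = [2 * x * x + x + 3 for x in range(1, 11)]   # the data of the tables below
print(fit_noiseless(y))   # → [3, 1, 2], i.e. 2x^2 + x + 3
```

Running this on the example data reproduces the coefficients found by hand above: $a_2 = 2$, $a_1 = 1$, $a_0 = 3$.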

In this section I describe the Finite Difference Regression method, compare the results obtained from this method to those from classical regression, and show unbiasedness and consistency properties of the Finite Difference Regression coefficients.

The intuitive idea behind the method of Finite Difference Regression is simple. Suppose one were to add normally distributed noise with mean 0 independently to each of the y-values, for example those in the first table above. Then none of the difference rows of the resulting noisy data would be

| x | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| y | 6 | 13 | 24 | 39 | 58 | 81 | 108 | 139 | 174 | 213 |
| 1^{st} Difference | 7 | 11 | 15 | 19 | 23 | 27 | 31 | 35 | 39 | |
| 2^{nd} Difference | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | | |
| 3^{rd} Difference | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | | |
| 4^{th} Difference | 0 | 0 | 0 | 0 | 0 | 0 | | | | |

| x | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| y | 6 | 13 | 24 | 39 | 58 | 81 | 108 | 139 | 174 | 213 |
| y′ = y − 2x^{2} | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
| 1^{st} Difference | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | |
| 2^{nd} Difference | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | |

| x | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| y′ | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
| y″ = y′ − x | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| 1^{st} Difference | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |

identically equal to zero. However, since differences between normally distributed random variables are also normally distributed (albeit with higher variance), each of the samples in a difference row (as in the rows of the tables above) remains normally distributed. One can therefore apply a t-test in which the null hypothesis H_{0} tests for zero population mean and the alternative hypothesis H_{a} tests for non-zero population mean. This is the main idea behind the method of Finite Difference Regression: the notion of an all-zero difference row in the non-noisy case is replaced by the notion of a highly probable row of all zeros in the noisy case, with the statistical t-test used to test the zero-population-mean hypothesis.

While it is true that successive differences will produce normally distributed samples, not all the samples so produced are statistically independent of one another. For example, writing $y'_i = y_{i+1} - y_i$ for the first differences, $y'_1$ and $y'_2$ both depend upon $y_2$, so they are not independent. Similarly, $y'_2$ and $y'_3$ are not independent because both depend upon $y_3$. However, $y'_1$ and $y'_3$, having no observation in common, are independent!

This observation can be generalized by the following process. Suppose I had K observations to begin with. If I were to produce successive differences by skipping over every other term, in the manner of a binary tree, then at every stage the number of independent samples is halved, and the degrees of freedom is therefore one less than this number.
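A small sketch of this bookkeeping (illustrative code, not from the paper): differences of disjoint, skip-one pairs halve the number of independent samples at every stage, so starting from n = 64 samples the degrees of freedom for the successive difference columns are 31, 15, 7, ⋯, matching the df row of the results table later in the paper.

```python
# Illustrative sketch (names are mine, not the paper's) of the degrees-of-
# freedom bookkeeping described above: forming differences of disjoint,
# skip-one pairs in the manner of a binary tree halves the number of
# statistically independent samples at every stage, so n = 2^k initial
# samples leave n / 2**m independent values after m stages, and the t-test
# at stage m has df = n / 2**m - 1.

def skip_differences(seq):
    """Differences of non-overlapping pairs: (s1-s0), (s3-s2), ..."""
    return [seq[i + 1] - seq[i] for i in range(0, len(seq) - 1, 2)]

n = 64
samples = list(range(n))   # stand-in values; only the counts matter here
for m in range(1, 4):
    samples = skip_differences(samples)
    df = len(samples) - 1
    print(f"stage {m}: {len(samples)} independent samples, df = {df}")
```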

The 1^{st} two columns of the data table at the end of the paper list the x and y values of the quadratic example of Section 1, now extended to 64 samples. Normally distributed noise samples in the 3^{rd} column are added to the values of y to produce the 4^{th} column. Thus, the 4^{th} column represents noisy values of the data y. A graph of this noisy data is shown in the accompanying figure. The remaining columns list the successive differences, 1^{st} to 5^{th}, computed in the same way as in Section 1.

The last 6 rows of the data table summarize each difference column with the following statistics:

- n = number of samples
- x̄ = sample mean
- s = sample standard deviation
- t = test statistic $= \dfrac{\bar{x} - \mu}{s/\sqrt{n}} = \dfrac{\bar{x}}{s/\sqrt{n}}$ (substituting μ = 0)
- df = degrees of freedom, computed as detailed in Section 2.1
- P-value = 2 × (area to the right of t with the given df) for t > 0; 2 × (area to the left of t with the given df) for t < 0
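As a concrete check of the t statistic defined above (a sketch; standard library only), the Diff 2 summary values reported in the results table at the end of the paper (x̄ = 4.29, s = 63.10, n = 62) give t ≈ 0.54, agreeing with the t row of that table:

```python
# Sketch of the test statistic defined above (standard library only), checked
# against the Diff 2 summary values reported in the results table at the end
# of the paper: sample mean 4.29, sample standard deviation 63.10, n = 62.
import math

def t_statistic(mean, sd, n):
    """One-sample t statistic for the null hypothesis of zero population mean."""
    return mean / (sd / math.sqrt(n))

t = t_statistic(4.29, 63.10, 62)
print(round(t, 2))   # → 0.54, matching the t row of the results table
```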

One of the assumptions of the t-test is that the sample size n is large; typically n ≥ 30 is required. This assumption is violated in some of the columns of the data table, notably from the 3^{rd} difference column (titled Diff 3) onward. The plot in the accompanying figure shows the P-values of the successive difference columns.

The insignificant P-value in the 3^{rd} difference column implies that, with high probability, the order of the best fitting polynomial should be m = 2, i.e. a quadratic polynomial is the best fit to the data in the table.

In classical polynomial regression, R^{2} values instead of P-values represent the goodness of fit. Using Microsoft Excel's built-in program for generating R^{2} values of polynomial fits, I obtained the values in the following table:

| m | 1 (Linear) | 2 (Quadratic) | 3 (Cubic) | 4 | 5 |
|---|---|---|---|---|---|
| R^{2} value | 0.9403 | 0.9999 | 0.9999 | 0.9999 | 0.9999 |

The R^{2} values used in classical regression do not have the sensitivity that the P-values of the t-tests possess. The clear drop that occurs in the P-values (from 0.3996 at Diff 2 to 0.0362 at Diff 3 in the results table) has no counterpart in the R^{2} values above, which remain at 0.9999 for every m ≥ 2. Moreover, whereas classical regression requires R^{2} values to be computed for a fit with every candidate polynomial, Finite Difference Regression is a one-shot test that gives a clear estimate of the order of the best fitting polynomial by testing the significance of the probabilities (P-values). The first successive difference column whose P-value is less than the significance level (a belief threshold for the viable model) determines the best fitting polynomial model.

Unlike classical regression, however, this method requires a large sample size n, preferably one that is a power of 2. Also, because the degrees of freedom are halved at each successive difference and must remain positive, the highest order polynomial that can be fit by this method is of order $\log_2 n$. This is generally acceptable because the order of a best fitting polynomial should, by definition, be a small constant. Other classical model order selection heuristics like AIC or BIC (see [

Once the order of the best fitting polynomial has been found using the t-test in combination with finite differences, I will evaluate the coefficients of the best fitting polynomial using the method of successive finite differences as detailed in Section 1.

In the non-noisy situation described in Section 1 the matter was simple: there was a difference row of all zeros, and the previous difference row was therefore a constant that could be used to determine the coefficient of the highest degree. In the noisy case the issue is complicated, because no row consists of all zeros and the previous difference row is therefore not constant. However, the intuitive idea behind the method of Finite Difference Regression (see the 2^{nd} paragraph of Section 2) led me to take the sample mean x̄ as the stand-in for the previous difference row constant, since it is the unique number that minimizes the sum of squares of the residuals. From the summary rows of the data table, the order was determined to be m = 2 at the 3^{rd} difference column, and therefore x̄ = 4.29 (the sample mean of the 2^{nd} difference column) plays the role of the constant $2!\,a_2$, giving the estimate $\hat{a}_2 = 4.29/2! \approx 2.14$ (the underlying value is $a_2 = 2$).
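A sketch of this coefficient-estimation loop (illustrative function names, not the paper's code): the sample mean of the k^{th} difference column divided by k! estimates $a_k$; the recovered term is subtracted, and the loop repeats for the lower orders. On noiseless data the recovery is exact; on noisy data the estimates are the unbiased, consistent ones analyzed in Section 2.7.

```python
# Sketch of the coefficient-estimation loop described above (function names
# are mine): the sample mean of the k-th difference column divided by k!
# estimates a_k; the recovered term is subtracted and the loop repeats for
# the lower orders. On noiseless data the recovery is exact; on noisy data
# the estimates are the unbiased, consistent ones analyzed in Section 2.7.
import math

def differences(seq):
    return [b - a for a, b in zip(seq, seq[1:])]

def estimate_coefficients(y, m):
    """Estimate [a0, a1, ..., am] for a degree-m polynomial fit."""
    coeffs = [0.0] * (m + 1)
    resid = list(y)
    for k in range(m, -1, -1):
        r = resid
        for _ in range(k):
            r = differences(r)
        a_k = (sum(r) / len(r)) / math.factorial(k)   # column mean / k!
        coeffs[k] = a_k
        resid = [v - a_k * (i + 1) ** k for i, v in enumerate(resid)]
    return coeffs

y = [2 * x * x + x + 3 for x in range(1, 65)]   # noiseless for illustration
est = estimate_coefficients(y, 2)
print(est)   # → [3.0, 1.0, 2.0]
```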

In this section I prove that the coefficients estimated by the Finite Difference Regression method as detailed in Section 2.6 are unbiased and consistent. The properties of unbiasedness and consistency are very important for estimators (see [

Consider the following table that shows the symbolic formulas for the successive differences for the n observations y 1 , y 2 , ⋯ , y n :

For subsequent ease of notation, I introduce the change of variables $z_i \overset{\text{def}}{=} y_{i+1}$ for $0 \le i \le n-1$. With this renaming/re-indexing, the table takes the re-indexed form shown below.

For any m, 0 ≤ m ≤ n − 1 , the m^{th} Diff column consists of expressions in rows indexed from m through n − 1 . Using mathematical induction, the m^{th} Diff column can be shown to be:

If the underlying polynomial is an m^{th} order polynomial $f(x) = a_m x^m + a_{m-1} x^{m-1} + \cdots + a_1 x + a_0$, then in the case of noiseless observations every row of the m^{th} Diff column equals the constant $m!\,a_m$. Specifically, the k^{th} row in the m^{th} Diff column shown below satisfies Equation (1):

| | Diff 0 | Diff 1 | Diff 2 | ⋯ |
|---|---|---|---|---|
| 1^{st} Row | y_{1} | | | |
| 2^{nd} Row | y_{2} | y_{2} − y_{1} | | |
| 3^{rd} Row | y_{3} | y_{3} − y_{2} | y_{3} − 2y_{2} + y_{1} | |
| ⋯ | ⋯ | ⋯ | ⋯ | |
| n^{th} Row | y_{n} | y_{n} − y_{n−1} | y_{n} − 2y_{n−1} + y_{n−2} | ⋯ |

| | Diff 0 | Diff 1 | Diff 2 | ⋯ |
|---|---|---|---|---|
| 0^{th} Row | z_{0} | | | |
| 1^{st} Row | z_{1} | z_{1} − z_{0} | | |
| 2^{nd} Row | z_{2} | z_{2} − z_{1} | z_{2} − 2z_{1} + z_{0} | |
| ⋯ | ⋯ | ⋯ | ⋯ | |
| (n − 1)^{th} Row | z_{n−1} | z_{n−1} − z_{n−2} | z_{n−1} − 2z_{n−2} + z_{n−3} | ⋯ |

| | Diff m |
|---|---|
| m^{th} Row | $\binom{m}{0} z_m - \binom{m}{1} z_{m-1} + \binom{m}{2} z_{m-2} + \cdots + (-1)^m \binom{m}{m} z_0$ |
| (m + 1)^{th} Row | $\binom{m}{0} z_{m+1} - \binom{m}{1} z_m + \binom{m}{2} z_{m-1} + \cdots + (-1)^m \binom{m}{m} z_1$ |
| ⋯ | ⋯ |
| k^{th} Row | $\binom{m}{0} z_k - \binom{m}{1} z_{k-1} + \binom{m}{2} z_{k-2} + \cdots + (-1)^m \binom{m}{m} z_{k-m}$ |
| ⋯ | ⋯ |
| (n − 1)^{th} Row | $\binom{m}{0} z_{n-1} - \binom{m}{1} z_{n-2} + \binom{m}{2} z_{n-3} + \cdots + (-1)^m \binom{m}{m} z_{n-1-m}$ |

$$\begin{aligned} \binom{m}{0} z_k - \binom{m}{1} z_{k-1} + \binom{m}{2} z_{k-2} + \cdots + (-1)^m \binom{m}{m} z_{k-m} &= \binom{m}{0} y_{k+1} - \binom{m}{1} y_k + \binom{m}{2} y_{k-1} + \cdots + (-1)^m \binom{m}{m} y_{k+1-m} \\ &= \binom{m}{0} f(k+1) - \binom{m}{1} f(k) + \binom{m}{2} f(k-1) + \cdots + (-1)^m \binom{m}{m} f(k+1-m) \\ &= m!\,a_m \end{aligned} \quad (1)$$
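The identity in Equation (1) can be spot-checked numerically; the sketch below (illustrative, standard library only) applies the alternating binomial weights to the example quadratic $f(x) = 2x^2 + x + 3$ of Section 1 and confirms that every k yields $m!\,a_m = 2! \cdot 2 = 4$:

```python
# Numerical spot-check of the identity in Equation (1): the alternating
# binomial weights applied to m + 1 consecutive values of an m-th order
# polynomial f give the constant m! * a_m for every k. Here f(x) = 2x^2 + x + 3
# (the example of Section 1), so m = 2 and m! * a_m = 2! * 2 = 4.
import math

def f(x):
    return 2 * x * x + x + 3

m = 2
for k in range(m, 20):
    val = sum((-1) ** i * math.comb(m, i) * f(k + 1 - i) for i in range(m + 1))
    assert val == math.factorial(m) * 2   # equals 4 for every k
print("Equation (1) verified for f")
```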

In case of noisy observations, let each re-indexed observation z now be corrupted by a sequence of additive zero-mean independent identically distributed random variables N 0 , N 1 , ⋯ , N n − 1 , each with variance σ 2 . In other words:

$$E[N_i] = 0 \quad \text{and} \quad E[N_i^2] = \sigma^2 \quad \text{for } 0 \le i < n$$

$$E[N_i N_j] = E[N_i]\,E[N_j] = 0 \quad \text{for } 0 \le i < j < n$$

where E [ ] denotes the expectation operator. (See for example [

This addition of random variables makes the z's random variables too, now denoted $Z_i$, with $Z_i = f(i+1) + N_i$ for $0 \le i < n$. In this case, the k^{th} row in the m^{th} Diff column becomes the random variable:

$$\begin{aligned} R_k &= \binom{m}{0} Z_k - \binom{m}{1} Z_{k-1} + \binom{m}{2} Z_{k-2} + \cdots + (-1)^m \binom{m}{m} Z_{k-m} \\ &= \binom{m}{0} N_k - \binom{m}{1} N_{k-1} + \cdots + (-1)^m \binom{m}{m} N_{k-m} \\ &\quad + \binom{m}{0} f(k+1) - \binom{m}{1} f(k) + \cdots + (-1)^m \binom{m}{m} f(k+1-m) \\ &= \binom{m}{0} N_k - \binom{m}{1} N_{k-1} + \cdots + (-1)^m \binom{m}{m} N_{k-m} + m!\,a_m \end{aligned} \quad (2)$$

where, the last step of Equation (2) follows from the identity in Equation (1).

Let the random variable $\hat{A}_m(n)$ denote the estimate of $a_m$ from the n observations. Then, as described in Section 2.6, since there are $(n-m)$ rows in the m^{th} Diff column, the sample mean of the R's divided by $m!$ provides the required estimate. In other words,

$$\hat{A}_m(n) = \frac{1}{m!\,(n-m)} \sum_{k=m}^{n-1} R_k \quad (3)$$

Now, define the reduced row random variable $Q_k$ for the m^{th} Diff column as:

$$Q_k \;\overset{\text{def}}{=}\; R_k - m!\,a_m \quad \text{for } m \le k \le n-1 \quad (4)$$

Then, substituting back into Equation (3):

$$\begin{aligned} \hat{A}_m(n) &= \frac{1}{m!\,(n-m)} \sum_{k=m}^{n-1} Q_k + \frac{1}{m!\,(n-m)} \sum_{k=m}^{n-1} m!\,a_m \\ &= \frac{1}{m!\,(n-m)} \sum_{k=m}^{n-1} Q_k + \frac{m!\,a_m\,(n-m)}{m!\,(n-m)} \\ &= \frac{1}{m!\,(n-m)} \sum_{k=m}^{n-1} Q_k + a_m \end{aligned}$$

which, upon rearranging, yields:

$$\hat{A}_m(n) - a_m = \frac{1}{m!\,(n-m)} \sum_{k=m}^{n-1} Q_k \quad (5)$$

Note that it is the statistics of the expression $\hat{A}_m(n) - a_m$ on the left-hand side of Equation (5) that establish the unbiasedness and consistency of the estimate $\hat{A}_m(n)$. Also note that, by definition (4) and Equation (2), the reduced row random variable $Q_k$ for the m^{th} Diff column can be written explicitly as:

$$Q_k = \binom{m}{0} N_k - \binom{m}{1} N_{k-1} + \binom{m}{2} N_{k-2} + \cdots + (-1)^m \binom{m}{m} N_{k-m} \quad (6)$$

Taking the expectation operator E [ ] on both sides of equation (6):

$$E[Q_k] = \binom{m}{0} E[N_k] - \binom{m}{1} E[N_{k-1}] + \cdots + (-1)^m \binom{m}{m} E[N_{k-m}] = 0 \quad (7)$$

The last step in equation (7) follows from the fact that the random variables N i are all zero-mean. Then, taking the expectation operator on both sides of equation (5), and using the result from Equation (7):

$$E\big[\hat{A}_m(n) - a_m\big] = \frac{1}{m!\,(n-m)} \sum_{k=m}^{n-1} E[Q_k] = 0 \quad (8)$$

Hence, E [ A ^ m ( n ) − a m ] = 0 for all n. This proves the unbiasedness property of A ^ m ( n ) .

Since the random variable $\hat{A}_m(n) - a_m$ has zero mean by virtue of the unbiasedness property, its variance is simply $E[(\hat{A}_m(n) - a_m)^2]$. I show the consistency of the estimate $\hat{A}_m(n)$ by deriving an upper bound for this quantity and showing that the bound vanishes in the limit $n \to \infty$.

Writing out the reduced row random variable $Q_k$ for each row k of the m^{th} Diff column from Equation (6), one gets the following series of $(n-m)$ equations:

$$\begin{aligned} Q_m &= \binom{m}{0} N_m - \binom{m}{1} N_{m-1} + \binom{m}{2} N_{m-2} + \cdots + (-1)^m \binom{m}{m} N_0 \\ Q_{m+1} &= \binom{m}{0} N_{m+1} - \binom{m}{1} N_m + \binom{m}{2} N_{m-1} + \cdots + (-1)^m \binom{m}{m} N_1 \\ &\;\;\vdots \\ Q_k &= \binom{m}{0} N_k - \binom{m}{1} N_{k-1} + \binom{m}{2} N_{k-2} + \cdots + (-1)^m \binom{m}{m} N_{k-m} \\ &\;\;\vdots \\ Q_{n-1} &= \binom{m}{0} N_{n-1} - \binom{m}{1} N_{n-2} + \binom{m}{2} N_{n-3} + \cdots + (-1)^m \binom{m}{m} N_{n-1-m} \end{aligned}$$

Summing up these ( n − m ) equations column by column:

$$\sum_{k=m}^{n-1} Q_k = \binom{m}{0} S_m^{\,n-1} - \binom{m}{1} S_{m-1}^{\,n-2} + \binom{m}{2} S_{m-2}^{\,n-3} + \cdots + (-1)^m \binom{m}{m} S_0^{\,n-1-m} \quad (9)$$

where, for $(n-1-m) \le i \le (n-1)$ and $0 \le j \le m$, I define:

$$S_j^i \;\overset{\text{def}}{=}\; \sum_{k=j}^{i} N_k \quad (10)$$

Squaring both sides of Equation (10), and then taking expectations:

$$E\big[(S_j^i)^2\big] = E\Big[\Big(\sum_{k=j}^{i} N_k\Big)^2\Big] = E\Big[\sum_{k=j}^{i} N_k^2 + SCT\Big] = \sum_{k=j}^{i} E[N_k^2] + E[SCT] = \sum_{k=j}^{i} \sigma^2 = (i-j+1)\,\sigma^2 \quad (11)$$

In equation (11) I have invoked the zero-mean, independent nature of the N’s to cause the “sum of cross-terms” term SCT to vanish.

Define a new sequence of ( m + 1 ) random variables T 0 , T 1 , ⋯ , T m as follows:

$$T_i \;\overset{\text{def}}{=}\; (-1)^i \binom{m}{i} S_{m-i}^{\,n-1-i} \quad \text{for } 0 \le i \le m \quad (12)$$

By squaring both sides of definition (12), and then taking expectations:

$$E[T_i^2] = E\Big[(-1)^{2i} \binom{m}{i}^2 \big(S_{m-i}^{\,n-1-i}\big)^2\Big] = \binom{m}{i}^2 E\Big[\big(S_{m-i}^{\,n-1-i}\big)^2\Big] = \binom{m}{i}^2 (n-m)\,\sigma^2 \quad (13)$$

where, the final step in Equation (13) follows from Equation (11).

Using definition (12), Equation (9) can be rewritten as:

$$\sum_{k=m}^{n-1} Q_k = T_0 + T_1 + \cdots + T_m \quad (14)$$

Then, since $E[(T_0 + T_1 + \cdots + T_m)^2] \le (m+1)\big(E[T_0^2] + E[T_1^2] + \cdots + E[T_m^2]\big)$ (see Appendix 1), squaring both sides of Equation (14) and then taking expectations gives:

$$\begin{aligned} E\Big[\Big(\sum_{k=m}^{n-1} Q_k\Big)^2\Big] &= E\big[(T_0 + T_1 + \cdots + T_m)^2\big] \\ &\le (m+1)\big(E[T_0^2] + E[T_1^2] + \cdots + E[T_m^2]\big) \\ &= (m+1)(n-m)\,\sigma^2 \Big( \binom{m}{0}^2 + \binom{m}{1}^2 + \cdots + \binom{m}{m}^2 \Big) \quad \text{[from Equation (13)]} \\ &= (m+1)(n-m)\,\sigma^2 \binom{2m}{m} \quad \text{[from a binomial identity; see [8], pg. 64]} \end{aligned} \quad (15)$$

Finally, by squaring both sides of Equation (5) and then taking expectations, I obtain the following from inequality (15):

$$E\big[(\hat{A}_m(n) - a_m)^2\big] = \frac{1}{(m!)^2 (n-m)^2}\, E\Big[\Big(\sum_{k=m}^{n-1} Q_k\Big)^2\Big] \le \frac{(m+1)\,(2m)!\,\sigma^2}{(m!)^4\,(n-m)} \quad (16)$$

where the last step in (16) rewrites the binomial coefficient $\binom{2m}{m}$ in terms of factorials and simplifies.

It can be seen that the factor $(m+1)(2m)!/(m!)^4$ on the right-hand side of inequality (16) is a super-exponentially decreasing function of m beyond a small constant, due to the presence of the large $(m!)^4$ factor in its denominator. This factor is evaluated for a few values of m in the table below; its maximum value, 4.5, occurs at m = 2. Hence, for all m and n:

$$E\big[(\hat{A}_m(n) - a_m)^2\big] \le \frac{4.5\,\sigma^2}{n-m} \quad (17)$$

For the Finite Difference Regression method, $m \le \log_2 n$ (see the sample-size comments earlier in Section 2). Thus the right-hand-side denominator in inequality (17) satisfies $n - m \ge n - \log_2 n$, and therefore,

$$E\big[(\hat{A}_m(n) - a_m)^2\big] \le \frac{4.5\,\sigma^2}{n - \log_2 n} \quad (18)$$

Taking limits on both sides of inequality (18),

$$\lim_{n \to \infty} E\big[(\hat{A}_m(n) - a_m)^2\big] \le \lim_{n \to \infty} \frac{4.5\,\sigma^2}{n - \log_2 n} = 0 \quad (19)$$

| m | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| (m + 1)(2m)!/(m!)^{4} | 1.000 | 4.000 | 4.500 | 2.222 | 0.608 | 0.105 | 0.012 |

| m | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
|---|---|---|---|---|---|---|---|
| (m + 1)(2m)!/(m!)^{4} | 1.1 × 10^{−3} | 7.1 × 10^{−5} | 3.7 × 10^{−6} | 1.5 × 10^{−7} | 5.3 × 10^{−9} | 1.5 × 10^{−10} | 3.8 × 10^{−12} |
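Both the tabulated factor and the binomial identity invoked in inequality (15) are easy to verify numerically; a sketch (standard library only) that reproduces the values in the table above:

```python
# Numerical check (standard library only) of the factor tabulated above and of
# the binomial identity sum_i C(m, i)^2 = C(2m, m) used in inequality (15).
import math

def factor(m):
    return (m + 1) * math.factorial(2 * m) / math.factorial(m) ** 4

for m in range(10):
    assert sum(math.comb(m, i) ** 2 for i in range(m + 1)) == math.comb(2 * m, m)

print([round(factor(m), 3) for m in range(7)])
# → [1.0, 4.0, 4.5, 2.222, 0.608, 0.105, 0.012]
```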

However, as $E\big[(\hat{A}_m(n) - a_m)^2\big] \ge 0$, inequality (19) implies that:

$$\lim_{n \to \infty} E\big[(\hat{A}_m(n) - a_m)^2\big] = 0 \quad (20)$$

Then invoking Chebyshev’s inequality (see [

$$\lim_{n \to \infty} \mathrm{Prob}\big[\,|\hat{A}_m(n) - a_m| > \varepsilon\,\big] \le \lim_{n \to \infty} \frac{E\big[(\hat{A}_m(n) - a_m)^2\big]}{\varepsilon^2} = 0 \quad \text{for any } \varepsilon > 0$$

This proves the consistency property of A ^ m ( n ) .

Inequality (16) and the values in the table above also show that the variance bound itself decreases with increasing degree m; in this sense the asymptotic properties of the fit get better with increasing degrees of the fitting polynomial.

In this paper I have presented a novel polynomial regression method for uniformly sampled data points, called the method of Finite Difference Regression. Unlike classical regression methods, in which an unknown model order is estimated from R^{2} values, I have shown how the t-test can be combined with the method of finite differences to provide a more sensitive and objective measure of the order of the best fitting polynomial. Furthermore, I have carried the method forward, reemploying the finite differences obtained during order determination to provide estimates of the coefficients of the best fitting polynomial. I have shown that not only are these coefficients unbiased and consistent, but also that the asymptotic properties of the fit improve with increasing degree of the fitting polynomial.

At least three other avenues of further research remain to be explored:

1) The polynomial coefficients obtained by this method do not minimize the sum of the squared residuals. Is there an objective function that they do minimize?

2) The classical regression methods work on non-uniformly sampled data sets. Extending this method to non-uniformly sampled data should be possible.

3) Automatically finding a good t-test significance level, i.e. the threshold for the precipitously low P-value that sets the order of the best fitting polynomial (as described in Section 2.4), remains an open problem. For this, heuristics in the spirit of AIC or BIC methods (see [

I thank Prof. Brad Lucier for his interest and for the helpful discussions and comments that made this a better paper. I am also grateful to Dr. Jan Frydendahl for encouraging me to pursue this research activity.

Banerjee, A. (2018) The Method of Finite Difference Regression. Open Journal of Statistics, 8, 49-68. https://doi.org/10.4236/ojs.2018.81005

To show that, for any sequence of $(m+1)$ random variables $T_0, T_1, \cdots, T_m$:

$$E\big[(T_0 + T_1 + \cdots + T_m)^2\big] \le (m+1)\big(E[T_0^2] + E[T_1^2] + \cdots + E[T_m^2]\big)$$

The proof follows from a variance-like consideration. Given the random variables $T_0, T_1, \cdots, T_m$ and any random variable $\bar{T}$, the sum of squares $\sum_{i=0}^{m} (T_i - \bar{T})^2$ is always non-negative, and therefore so is its expected value $E\big[\sum_{i=0}^{m} (T_i - \bar{T})^2\big]$. Choosing $\bar{T} \overset{\text{def}}{=} \sum_{i=0}^{m} T_i / (m+1)$, i.e. the "mean" of $T_0, T_1, \cdots, T_m$, proves the inequality as follows:

$$\begin{aligned} 0 \le E\Big[\sum_{i=0}^{m} (T_i - \bar{T})^2\Big] &= E\Big[\sum_{i=0}^{m} \big(T_i^2 - 2\bar{T}T_i + \bar{T}^2\big)\Big] \\ &= \sum_{i=0}^{m} E[T_i^2] - 2E\Big[\bar{T} \sum_{i=0}^{m} T_i\Big] + E\Big[\bar{T}^2 \sum_{i=0}^{m} 1\Big] \\ &= \sum_{i=0}^{m} E[T_i^2] - 2(m+1)E[\bar{T}^2] + (m+1)E[\bar{T}^2] \quad (\text{from the definition of } \bar{T}) \\ &= \sum_{i=0}^{m} E[T_i^2] - (m+1)E[\bar{T}^2] \\ &= \sum_{i=0}^{m} E[T_i^2] - \frac{1}{m+1}\, E\big[(T_0 + T_1 + \cdots + T_m)^2\big] \end{aligned}$$

Multiplying both sides of the above inequality by ( m + 1 ) and rearranging yields the required result.
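Since the appendix inequality holds pointwise for any real numbers (it is an instance of the Cauchy-Schwarz inequality with the all-ones vector), and hence in expectation, it can be spot-checked numerically; an illustrative sketch:

```python
# Numerical spot-check of the appendix inequality; it holds pointwise for any
# real numbers (Cauchy-Schwarz with the all-ones vector), hence in expectation.
import random

random.seed(0)
for _ in range(1000):
    m = random.randrange(1, 8)
    t = [random.uniform(-10, 10) for _ in range(m + 1)]
    assert sum(t) ** 2 <= (m + 1) * sum(v * v for v in t) + 1e-9
print("inequality holds on all trials")
```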

x | y | Noise | y + Noise | Diff 1 | Diff 2 | Diff 3 | Diff 4 | Diff 5 |
---|---|---|---|---|---|---|---|---|

1 | 6 | −3.46 | 2.54 | |||||

2 | 13 | 11.08 | 24.08 | 21.53 | ||||

3 | 24 | −10.78 | 13.22 | −10.86 | −32.39 | |||

4 | 39 | −12.23 | 26.77 | 13.55 | 24.41 | 56.81 | ||

5 | 58 | −15.22 | 42.78 | 16.01 | 2.46 | −21.95 | −78.76 | |

6 | 81 | −20.56 | 60.44 | 17.65 | 1.64 | −0.82 | 21.13 | 99.89 |

7 | 108 | −18.25 | 89.75 | 29.31 | 11.66 | 10.02 | 10.84 | −10.29 |

8 | 139 | 0.81 | 139.81 | 50.06 | 20.75 | 9.09 | −0.94 | −11.78 |

9 | 174 | 2.70 | 176.70 | 36.88 | −13.18 | −33.92 | −43.01 | −42.07 |

10 | 213 | 65.28 | 278.28 | 101.58 | 64.70 | 77.87 | 111.80 | 154.81 |

11 | 256 | 14.08 | 270.08 | −8.20 | −109.78 | −174.47 | −252.34 | −364.14 |

12 | 303 | −21.65 | 281.35 | 11.27 | 19.47 | 129.25 | 303.72 | 556.06 |

13 | 354 | 2.26 | 356.26 | 74.91 | 63.64 | 44.17 | −85.07 | −388.79 |

14 | 409 | 31.42 | 440.42 | 84.16 | 9.25 | −54.39 | −98.57 | −13.49 |

15 | 468 | −4.20 | 463.80 | 23.37 | −60.79 | −70.04 | −15.65 | 82.92 |

16 | 531 | −11.58 | 519.42 | 55.62 | 32.25 | 93.04 | 163.08 | 178.72 |

17 | 598 | −4.56 | 593.44 | 74.03 | 18.41 | −13.84 | −106.88 | −269.95 |

18 | 669 | −52.78 | 616.22 | 22.77 | −51.26 | −69.66 | −55.82 | 51.06 |

19 | 744 | 40.58 | 784.58 | 168.37 | 145.59 | 196.85 | 266.51 | 322.33 |

20 | 823 | 40.27 | 863.27 | 78.69 | −89.68 | −235.27 | −432.12 | −698.63 |

21 | 906 | −1.21 | 904.79 | 41.52 | −37.17 | 52.51 | 287.78 | 719.90 |

22 | 993 | 41.62 | 1034.62 | 129.83 | 88.32 | 125.49 | 72.98 | −214.80 |

23 | 1084 | −4.49 | 1079.51 | 44.89 | −84.94 | −173.26 | −298.75 | −371.73 |

24 | 1179 | −8.48 | 1170.52 | 91.01 | 46.12 | 131.07 | 304.32 | 603.07 |

25 | 1278 | −11.62 | 1266.38 | 95.86 | 4.84 | −41.28 | −172.35 | −476.67 |

26 | 1381 | −55.64 | 1325.36 | 58.99 | −36.87 | −41.71 | −0.43 | 171.92 |

27 | 1488 | −13.86 | 1474.14 | 148.78 | 89.79 | 126.66 | 168.37 | 168.80 |

28 | 1599 | 72.21 | 1671.21 | 197.07 | 48.30 | −41.49 | −168.15 | −336.52 |

29 | 1714 | 35.99 | 1749.99 | 78.78 | −118.29 | −166.59 | −125.10 | 43.05 |

30 | 1833 | −46.35 | 1786.65 | 36.66 | −42.12 | 76.18 | 242.77 | 367.87 |

31 | 1956 | 4.35 | 1960.35 | 173.70 | 137.04 | 179.15 | 102.97 | −139.80 |

32 | 2083 | −6.83 | 2076.17 | 115.82 | −57.87 | −194.91 | −374.07 | −477.04 |

33 | 2214 | −5.17 | 2208.83 | 132.66 | 16.84 | 74.71 | 269.62 | 643.69 |

34 | 2349 | 10.94 | 2359.94 | 151.10 | 18.44 | 1.61 | −73.10 | −342.73 |

35 | 2488 | 32.31 | 2520.31 | 160.38 | 9.27 | −9.17 | −10.77 | 62.33 |

36 | 2631 | 15.10 | 2646.10 | 125.79 | −34.59 | −43.86 | −34.70 | −23.92 |

37 | 2778 | 10.35 | 2788.35 | 142.24 | 16.45 | 51.04 | 94.91 | 129.60 |

38 | 2929 | −14.74 | 2914.26 | 125.91 | −16.33 | −32.79 | −83.83 | −178.74 |

39 | 3084 | 39.17 | 3123.17 | 208.91 | 83.00 | 99.33 | 132.12 | 215.95 |

40 | 3243 | 36.96 | 3279.96 | 156.80 | −52.12 | −135.12 | −234.45 | −366.57 |

41 | 3406 | −4.73 | 3401.27 | 121.31 | −35.49 | 16.63 | 151.74 | 386.19 |

42 | 3573 | −6.60 | 3566.40 | 165.13 | 43.82 | 79.31 | 62.68 | −89.06 |

43 | 3744 | 65.08 | 3809.08 | 242.68 | 77.55 | 33.74 | −45.57 | −108.25 |

44 | 3919 | −22.02 | 3896.98 | 87.90 | −154.77 | −232.33 | −266.06 | −220.49 |

45 | 4098 | 13.68 | 4111.68 | 214.70 | 126.80 | 281.57 | 513.90 | 779.96 |

46 | 4281 | −6.64 | 4274.36 | 162.67 | −52.03 | −178.82 | −460.39 | −974.29 |

47 | 4468 | −5.14 | 4462.86 | 188.50 | 25.83 | 77.86 | 256.68 | 717.07 |

48 | 4659 | −37.01 | 4621.99 | 159.13 | −29.37 | −55.20 | −133.06 | −389.74 |

49 | 4854 | 6.42 | 4860.42 | 238.43 | 79.30 | 108.67 | 163.88 | 296.94 |

50 | 5053 | −46.83 | 5006.17 | 145.75 | −92.68 | −171.98 | −280.66 | −444.53 |

51 | 5256 | 7.28 | 5263.28 | 257.11 | 111.36 | 204.04 | 376.03 | 656.68 |

52 | 5463 | −8.25 | 5454.75 | 191.47 | −65.64 | −177.00 | −381.04 | −757.07 |

53 | 5674 | 18.11 | 5692.11 | 237.36 | 45.89 | 111.53 | 288.53 | 669.57 |

54 | 5889 | −20.01 | 5868.99 | 176.89 | −60.48 | −106.37 | −217.90 | −506.42 |

55 | 6108 | 5.34 | 6113.34 | 244.35 | 67.46 | 127.94 | 234.30 | 452.20 |

56 | 6331 | 1.67 | 6332.67 | 219.33 | −25.02 | −92.48 | −220.42 | −454.72 |

57 | 6558 | 8.07 | 6566.07 | 233.40 | 14.07 | 39.09 | 131.58 | 351.99 |

58 | 6789 | 7.01 | 6796.01 | 229.94 | −3.46 | −17.54 | −56.63 | −188.21 |

59 | 7024 | −6.75 | 7017.25 | 221.24 | −8.69 | −5.23 | 12.31 | 68.94 |

60 | 7263 | 14.76 | 7277.76 | 260.51 | 39.27 | 47.96 | 53.19 | 40.88 |

61 | 7506 | −18.95 | 7487.05 | 209.29 | −51.22 | −90.48 | −138.44 | −191.63 |

62 | 7753 | −2.33 | 7750.67 | 263.61 | 54.32 | 105.54 | 196.03 | 334.47 |

63 | 8004 | 24.70 | 8028.70 | 278.03 | 14.42 | −39.91 | −145.45 | −341.47 |

64 | 8259 | 57.11 | 8316.11 | 287.41 | 9.38 | −5.04 | 34.87 | 180.32 |

n | 64 | 63 | 62 | 61 | 60 | 59 | ||

x ¯ | 2833.75 | 131.96 | 4.29 | 0.68 | −1.03 | 1.93 | ||

s | 2517.12 | 83.47 | 63.10 | 113.72 | 211.20 | 399.59 | ||

t | 9.01 | 12.55 | 0.54 | 0.05 | −0.04 | 0.04 | ||

df | 63 | 31 | 15 | 7 | 3 | 1 | ||

P-value | 1.0000 | 1.0000 | 0.3996 | 0.0362 | 0.0278 | 0.0236 |