Introduction
In this article I present a nice formula for the regression line through a set of points in a plane.
Given are points (xi, yi), where i = 1, 2, ..., n.
Asked is the line y = ax + b that deviates least from these points.
A common measure for the deviation is the sum of the squared differences:

    f(a,b) = Σ (yi − (a xi + b))²

in case of n points.
For point i the difference is yi − (a xi + b).
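This measure can be sketched in a few lines of Python; the point lists below are made-up example data:

```python
# Sum of squared deviations f(a, b) for a candidate line y = a*x + b.

def f(a, b, xs, ys):
    """Return the sum of squared differences yi - (a*xi + b)."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))

xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]          # these lie exactly on the line y = 2x + 1
print(f(2.0, 1.0, xs, ys))    # → 0.0
print(f(2.0, 0.0, xs, ys))    # → 3.0 (each point misses by 1)
```

A perfect fit gives f(a,b) = 0; any other line gives a positive value.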
Before continuing, first some definitions and rules:
Definition
sum:
    Σ xi = x1 + x2 + .... + xn
average:
    x̄ = (x1 + x2 + .... + xn) / n = (Σ xi) / n
Arithmetic rules
    Σ (xi + yi) = Σ xi + Σ yi
if c is a constant:
    Σ cxi = c Σ xi
and
    Σ c = n c
from the average we conclude:
    Σ xi = n x̄
application of the rules:
    Σ (xi − x̄) = Σ xi − n x̄ = n x̄ − n x̄ = 0
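The rules above are easy to verify numerically; a minimal check on small made-up lists:

```python
# Numerical check of the summation rules on example data.

xs = [1.0, 2.0, 3.0, 4.0]
ys = [5.0, 6.0, 7.0, 8.0]
n = len(xs)
c = 2.5

# Σ(xi + yi) = Σxi + Σyi
assert sum(x + y for x, y in zip(xs, ys)) == sum(xs) + sum(ys)
# Σ c·xi = c·Σxi
assert sum(c * x for x in xs) == c * sum(xs)
# Σ c = n·c
assert sum(c for _ in xs) == n * c
# Σ xi = n·x̄
xbar = sum(xs) / n
assert sum(xs) == n * xbar
# application: Σ(xi − x̄) = 0
assert abs(sum(x - xbar for x in xs)) < 1e-12
```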
The formulas for a and b of the regression line y = ax + b
The function f(a,b), the sum of the squared deviations of points 1..n, is:

    f(a,b) = Σ (yi − (a xi + b))²
f(a,b) is first differentiated to a, then to b.
differentiation to a:

    f 'a(a,b) = 2 Σ (yi − (a xi + b)) · (−xi) = −2 Σ (xi yi − a xi² − b xi)

differentiation to b:

    f 'b(a,b) = 2 Σ (yi − (a xi + b)) · (−1) = −2 Σ (yi − a xi − b)
For the best fit, both derivatives must be zero.
This yields the following system of equations:

    Σ (xi yi − a xi² − b xi) = 0 ...................1)
    Σ (yi − a xi − b) = 0 ....................2)
from ....2) we see

    Σ yi − a Σ xi − b n = 0
    n ȳ − a n x̄ − b n = 0
    ȳ − a x̄ − b = 0
    b = ȳ − a x̄ ................3)
substitute the result for b in ........1):

    Σ (xi yi − a xi² − (ȳ − a x̄) xi) = 0
    Σ (xi yi − a xi² − ȳ xi + a x̄ xi) = 0
    Σ xi yi − a Σ xi² − n x̄ ȳ + a n x̄² = 0
    Σ xi yi − a (Σ xi² − n x̄²) − n x̄ ȳ = 0
    a (Σ xi² − n x̄²) = Σ xi yi − n x̄ ȳ
    a = (Σ xi yi − n x̄ ȳ) / (Σ xi² − n x̄²)
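The formula just derived translates directly into code. A minimal sketch, with made-up data points that lie exactly on the line y = 3x − 2:

```python
# a = (Σxiyi − n·x̄·ȳ) / (Σxi² − n·x̄²),  b = ȳ − a·x̄  (result 3)

def fit_line(xs, ys):
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    a = (sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar) / \
        (sum(x * x for x in xs) - n * xbar * xbar)
    b = ybar - a * xbar
    return a, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [-2.0, 1.0, 4.0, 7.0]   # y = 3x − 2 exactly
a, b = fit_line(xs, ys)
print(a, b)                  # → 3.0 -2.0
```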
Formally, we have found the formulas for a and b.
The above value of a can be substituted in .......3) to find b.
However, with some manipulation the formula may be converted into a more elegant form.
We treat the numerator and the denominator separately.
1. the numerator

    Σ (xi − x̄)(yi − ȳ) = Σ (xi yi − x̄ yi − ȳ xi + x̄ ȳ)
                       = Σ xi yi − x̄ n ȳ − ȳ n x̄ + n x̄ ȳ
                       = Σ xi yi − n x̄ ȳ

2. the denominator

    Σ (xi − x̄)² = Σ (xi² − 2 x̄ xi + x̄²)
                = Σ xi² − 2 x̄ n x̄ + n x̄²
                = Σ xi² − n x̄²

summarizing:

    a = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)²
    b = ȳ − a x̄
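The elegant form gives the same answer as the unsimplified one, which is easy to confirm numerically. A sketch on made-up, slightly noisy data:

```python
# a = Σ(xi−x̄)(yi−ȳ) / Σ(xi−x̄)²,  b = ȳ − a·x̄  — the elegant form.

def fit_line(xs, ys):
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    a = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
        sum((x - xbar) ** 2 for x in xs)
    b = ybar - a * xbar
    return a, b

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 3.9, 6.1, 8.0, 9.9]
a, b = fit_line(xs, ys)

# compare with the unsimplified formula
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
a_raw = (sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar) / \
        (sum(x * x for x in xs) - n * xbar * xbar)
assert abs(a - a_raw) < 1e-12
print(a, b)
```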
Note:
please look [here] for an article about the best polynomial through a set of points.
It is a nice application of linear algebra.