In this article I present a nice formula for the regression line through a set of points in a plane.

Given are points (xi, yi), where i = 1, 2, ..., n.
Asked is the line y = ax + b which has the least deviation from these points.

A common measure for the deviation is the sum of the squares of the differences:

    (y1 − (a·x1 + b))² + (y2 − (a·x2 + b))² + ... + (yn − (a·xn + b))²

in case of n points. For point i we have as difference between point and line:

    yi − (a·xi + b)
Before continuing, first some definitions and rules:

    Σ xi = x1 + x2 + ... + xn
    Σ (xi + yi) = Σ xi + Σ yi

if c is a constant:

    Σ c·xi = c·Σ xi
    Σ c = n·c

From the average x̄ = (Σ xi)/n we conclude:

    Σ xi = n·x̄    (and similarly Σ yi = n·ȳ)

Now we apply these rules to find the formulas for a and b of the regression line y = ax + b.
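As a quick numeric check (a sketch of my own, not part of the article), the Σ rules above can be verified on a small sample:

```python
# Numeric check of the summation rules on a small sample (illustrative only).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 4.0]
n = len(xs)

# Σ (xi + yi) = Σ xi + Σ yi
assert sum(x + y for x, y in zip(xs, ys)) == sum(xs) + sum(ys)

# Σ c·xi = c·Σ xi   (c a constant)
c = 3.0
assert sum(c * x for x in xs) == c * sum(xs)

# Σ c = n·c
assert sum(c for _ in xs) == n * c

# From the average: Σ xi = n·x̄
x_bar = sum(xs) / n
assert sum(xs) == n * x_bar
```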
The function f(a,b), the sum of the squared deviations of points 1..n, is:

    f(a,b) = Σ (yi − (a·xi + b))²

f(a,b) is differentiated first to a, then to b.

Differentiation to a:

    f'a(a,b) = Σ 2·(yi − (a·xi + b))·(−xi) = −2·Σ (xi·yi − a·xi² − b·xi)

Differentiation to b:

    f'b(a,b) = Σ 2·(yi − (a·xi + b))·(−1) = −2·Σ (yi − a·xi − b)

For the best fit, both derivatives must be zero.
This yields the following system of equations:

    1)  Σ xi·yi − a·Σ xi² − b·Σ xi = 0
    2)  Σ yi − a·Σ xi − b·n = 0

Formally, we have now found a and b: two linear equations with two unknowns.
From 2) we see, after dividing by n:

    ȳ − a·x̄ − b = 0

    3)  b = ȳ − a·x̄

Substitute this result for b in 1):

    Σ xi·yi − a·Σ xi² − (ȳ − a·x̄)·Σ xi = 0
    Σ xi·yi − a·Σ xi² − ȳ·Σ xi + a·x̄·Σ xi = 0

and, using Σ xi = n·x̄:

    Σ xi·yi − a·Σ xi² − n·x̄·ȳ + a·n·x̄² = 0
    Σ xi·yi − a·(Σ xi² − n·x̄²) − n·x̄·ȳ = 0
    a·(Σ xi² − n·x̄²) = Σ xi·yi − n·x̄·ȳ

    a = (Σ xi·yi − n·x̄·ȳ) / (Σ xi² − n·x̄²)
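As an illustration (a sketch of my own, not from the article), these formulas for a and b can be computed directly:

```python
def regression_line(xs, ys):
    """Return (a, b) of the least-squares line y = a*x + b, using
    a = (Σ xi·yi − n·x̄·ȳ) / (Σ xi² − n·x̄²)  and  b = ȳ − a·x̄."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    a = (sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar) / \
        (sum(x * x for x in xs) - n * x_bar * x_bar)
    b = y_bar - a * x_bar
    return a, b

# Points lying exactly on y = 2x + 1 should reproduce a = 2, b = 1.
a, b = regression_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
# a == 2.0, b == 1.0
```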
The above value of a can be substituted in 3) to find b.
However, with some manipulation the formula may be converted into a more elegant form.
We separately attack numerator and denominator.

1. the numerator:

    Σ (xi − x̄)·(yi − ȳ) = Σ (xi·yi − x̄·yi − ȳ·xi + x̄·ȳ)
                         = Σ xi·yi − x̄·n·ȳ − ȳ·n·x̄ + n·x̄·ȳ
                         = Σ xi·yi − n·x̄·ȳ

2. the denominator:

    Σ (xi − x̄)² = Σ (xi² − 2·x̄·xi + x̄²)
                 = Σ xi² − 2·x̄·n·x̄ + n·x̄²
                 = Σ xi² − n·x̄²

So the regression line y = ax + b is given by:

    a = Σ (xi − x̄)·(yi − ȳ) / Σ (xi − x̄)²

    b = ȳ − a·x̄
Please look [here] for an article about the best polynomial through a set of points.
It is a nice application of linear algebra.
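To give a flavour of that linear-algebra approach (a sketch under my own assumptions, not the linked article's method), a best-fit parabola can be found by solving the normal equations (AᵀA)·c = Aᵀy, where A is the Vandermonde matrix of the x values:

```python
import numpy as np

# Sketch: best parabola y = c0 + c1*x + c2*x^2 through a set of points,
# via the normal equations (A^T A) c = A^T y.
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ys = np.array([1.0, 2.0, 5.0, 10.0, 17.0])   # these lie exactly on y = 1 + x^2

A = np.vander(xs, 3, increasing=True)        # columns: 1, x, x^2
coeffs = np.linalg.solve(A.T @ A, A.T @ ys)  # c0, c1, c2
# coeffs ≈ [1, 0, 1]
```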