(This topic is also in Section 1.4 in *Finite Mathematics*, *Applied Calculus*, and *Finite Mathematics and Applied Calculus*)


Linear regression is a method of finding the linear equation that comes closest to fitting a collection of data points. For example, here are some data showing the number of households in China with cable TV.

| Year (x) (x = 0 represents 2000) | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| Households with cable (y) (millions) | 68 | 72 | 80 | 83 |

If we plot these data, we get the following graph.

Although no straight line passes exactly through these points, there are many straight lines that pass *close* to them. Here is one of them: the line y = 8x + 62.

**Q** How good an approximation is the line to the data?
**A** Suppose that we used the line rather than the data points to estimate the number of households with cable. Then we would get slightly different values from the original observed values shown above. These values are called **predicted values**.

| Year (x) | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| Observed value of y | 68 | 72 | 80 | 83 |
| Predicted value of y | 62 | 70 | 78 | 86 |

The better our choice of line, the closer the predicted values will be to the observed values. The difference between the observed value and the predicted value is called the **residue**.

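$$\text{Residue} = \text{Observed value} - \text{Predicted value}$$

On the graph, the residues measure the vertical distances between the (observed) data points and the line: a residue is positive when the data point lies above the line and negative when it lies below it.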

| Year (x) | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| Observed value of y | 68 | 72 | 80 | 83 |
| Predicted value of y | 62 | 70 | 78 | 86 |
| Residue | 68 - 62 = 6 | 72 - 70 = 2 | 80 - 78 = 2 | 83 - 86 = -3 |
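The predicted and residue rows can be checked with a few lines of Python. This is a minimal sketch: the helper names `predict` and `residues` are hypothetical, and the line y = 8x + 62 is read off from the predicted values in the table (they start at 62 and rise by 8 each year).

```python
# Data from the table: x = years since 2000, y = households with cable (millions)
xs = [0, 1, 2, 3]
ys = [68, 72, 80, 83]

def predict(m, b, x):
    """Predicted value of y on the line y = mx + b (hypothetical helper)."""
    return m * x + b

def residues(m, b, xs, ys):
    """Residue = observed value - predicted value, for each data point."""
    return [y - predict(m, b, x) for x, y in zip(xs, ys)]

# The line whose predicted values are 62, 70, 78, 86
m, b = 8, 62
print([predict(m, b, x) for x in xs])  # [62, 70, 78, 86]
print(residues(m, b, xs, ys))          # [6, 2, 2, -3]
```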

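Notice that some residues are positive and others negative, so simply adding them up would let positive and negative errors cancel each other. If we instead add up the *squares* of the residues, every term is nonnegative, and we get a measure of how well the line fits, called the **sum-of-squares error**.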

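Residues, Sum-of-Squares Error (SSE)

A residue is the difference between an observed value and the corresponding predicted value:

$$\text{Residue} = \text{Observed value} - \text{Predicted value}$$

The sum-of-squares error (SSE) is the sum of the squares of the residues:

$$\text{SSE} = \Sigma(\text{Observed value} - \text{Predicted value})^2$$

The smaller the SSE, the better the approximating function fits the data.

Example

Referring to the above example, the sum-of-squares error is

$$\text{SSE} = 6^2 + 2^2 + 2^2 + (-3)^2 = 36 + 4 + 4 + 9 = 53.$$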
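As a check on this arithmetic, here is a short Python sketch of the SSE calculation for the line y = 8x + 62. The function name `sse` is a hypothetical choice for illustration.

```python
# Observed data: x = years since 2000, y = households with cable (millions)
xs = [0, 1, 2, 3]
ys = [68, 72, 80, 83]

def sse(m, b, xs, ys):
    """Sum-of-squares error for the line y = mx + b:
    square each residue (observed - predicted) and add them up."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

print(sse(8, 62, xs, ys))  # 53
```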

**Q** OK. So what is the regression line?
**A** The regression line is the straight line that gives the *smallest possible value of SSE*.
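To see this definition at work, the sketch below tries every line y = mx + b with m and b on a grid of step 0.1 (m from 0 to 10, b from 60 to 75, a range chosen arbitrarily for these data) and keeps the one with the smallest SSE. This brute-force search only illustrates "smallest possible SSE"; it is not how the regression line is normally computed. It works in integer tenths so the arithmetic is exact.

```python
xs = [0, 1, 2, 3]
ys = [68, 72, 80, 83]

def sse_tenths(m10, b10):
    """SSE for the line y = (m10/10) x + (b10/10), measured in hundredths.

    Scaling everything by 10 keeps the arithmetic in exact integers.
    """
    return sum((10 * y - (m10 * x + b10)) ** 2 for x, y in zip(xs, ys))

# Search m in [0, 10] and b in [60, 75], both in steps of 0.1
best = min(
    (sse_tenths(m10, b10), m10, b10)
    for m10 in range(0, 101)
    for b10 in range(600, 751)
)
best_sse, best_m10, best_b10 = best
print(f"m = {best_m10 / 10}, b = {best_b10 / 10}, SSE = {best_sse / 100}")
# m = 5.3, b = 67.8, SSE = 4.3
```

For these data the best line on the grid happens to be the exact regression line; with other data, a grid search only approximates it, which is why a formula is preferable.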

**Q** How do we find this line?
**A** There are a variety of ways of finding it: most forms of technology, such as graphing calculators and spreadsheets, have built-in regression routines, and there is also one on this web site. However, it is useful to be able to compute the regression line by hand, and this is what we do next.

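Computing the Regression Line

The regression line has the form

$$y = mx + b,$$

where m and b are computed as follows:

$$m = \frac{n\,\Sigma xy - (\Sigma x)(\Sigma y)}{n\,\Sigma x^2 - (\Sigma x)^2}, \qquad b = \frac{\Sigma y - m\,\Sigma x}{n}.$$

Here, "Σ" means "the sum of." Thus, for example,

$$\Sigma x = \text{sum of the } x\text{-values} = x_1 + x_2 + \cdots + x_n$$

$$\Sigma xy = \text{sum of products} = x_1y_1 + x_2y_2 + \cdots + x_ny_n$$

$$\Sigma x^2 = \text{sum of the squares of the } x\text{-values} = x_1^2 + x_2^2 + \cdots + x_n^2$$

On the other hand,

$$(\Sigma x)^2 = \text{square of } \Sigma x = \text{square of the sum of the } x\text{-values}.$$

Finally, n = number of data points.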
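As a sketch, the function below computes m = (nΣxy - (Σx)(Σy)) / (nΣx² - (Σx)²) and b = (Σy - mΣx)/n directly from lists of x- and y-values. The name `regression_line` and the error check are illustrative assumptions, not part of the textbook's method.

```python
def regression_line(xs, ys):
    """Return (m, b) for the regression line y = mx + b, using
    m = (n*Σxy - Σx*Σy) / (n*Σx² - (Σx)²) and b = (Σy - m*Σx) / n."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        # Happens only when all x-values are equal (a vertical set of points)
        raise ValueError("need at least two different x-values")
    m = (n * sum_xy - sum_x * sum_y) / denominator
    b = (sum_y - m * sum_x) / n
    return m, b

# Cable TV data: x = years since 2000, y = households (millions)
m, b = regression_line([0, 1, 2, 3], [68, 72, 80, 83])
print(f"y = {m:.1f}x + {b:.1f}")  # y = 5.3x + 67.8
```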

The easiest way to compute these sums by hand is with a table that lists x, y, xy, and x² for each data point, with the total of each column in the last row. We show this in the following exercise, where we compute the regression line for the above data.
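Here is a minimal Python sketch of such a table for the cable TV data: one row per data point with columns x, y, xy, and x², followed by the column totals. The column layout and the helper `fmt` are illustrative choices, not the book's exact table.

```python
# Data: x = years since 2000, y = households with cable (millions)
xs = [0, 1, 2, 3]
ys = [68, 72, 80, 83]

# One row of the table per data point: x, y, xy, x^2
rows = [(x, y, x * y, x * x) for x, y in zip(xs, ys)]

# Adding up each column gives the sums used in the formulas for m and b
sum_x, sum_y, sum_xy, sum_x2 = (sum(column) for column in zip(*rows))

widths = (6, 6, 6, 6)

def fmt(values):
    """Right-align each entry in a fixed-width column."""
    return "".join(f"{v:>{w}}" for v, w in zip(values, widths))

print(fmt(("x", "y", "xy", "x^2")))
for row in rows:
    print(fmt(row))
print(fmt((sum_x, sum_y, sum_xy, sum_x2)), "<- column totals")
```

Substituting n = 4, Σx = 6, Σy = 303, Σxy = 481, and Σx² = 14 into the formulas gives m = (4·481 - 6·303)/(4·14 - 6²) = 106/20 = 5.3 and b = (303 - 5.3·6)/4 = 67.8, so the regression line is y = 5.3x + 67.8.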

There is also material in the book about the **correlation coefficient r**, which, like SSE, measures how well a line fits the given data. Unlike SSE, however, r always lies between -1 and 1, regardless of the units or scale of the data, which makes it more useful than SSE for comparing how well different lines fit different sets of data. You will need to read the material on computing r before you can answer all of the exercises in Section 1.5 of the textbook.

Copyright © 2001, 2007 Stefan Waner