Finite Mathematics Topic Summary: Linear and Exponential Regression

We have already seen how to fit a line to a set of two data points: we compute the equation of the line that passes through the two given points. (See the summary of functions for some examples.) In general, however, there are more than two data points, and they rarely all lie on a single line. The problem is to find the line that fits the data as closely as possible.

### Example

We begin by trying to construct a linear demand function. Suppose that your market research shows the following sales statistics for houses of various prices during the past year:

| Price (thousands of dollars) | 160 | 180 | 200 | 220 | 240 | 260 | 280 |
|---|---|---|---|---|---|---|---|
| Sales of new homes this year | 126 | 103 | 82 | 75 | 82 | 40 | 20 |

We would like to use these data to construct a demand function for the real estate market. (Recall that a demand function gives demand y, measured here by annual sales, as a function of unit price, x.) Here is a plot of y versus x.

The data definitely suggest a straight line, more or less, and hence a linear relationship between x and y. Here are several possible "straight line fits."

Q Which line best fits the data?
A We would like the sales predicted by the best-fit line (predicted values) to be as close to the actual sales (observed values) as possible. The differences between the predicted values and the observed values appear as the vertical distances shown in the figure below.

Q Since we want the vertical distances to be as small as possible, why can't we set them all to zero and solve for the slope and intercept of the straight line?
A If this were possible, then there would be a straight line that passes through all the data points. A look at the graph shows that this is not the case.

Q Then why not find the line that minimizes all the vertical distances?
A This is not possible either. The line that minimizes the first two distances is the line that passes through the first two data points, since it makes the distances 0. But this line certainly does not minimize the distance to the third point. In other words, there is a trade-off: making some distances smaller makes others larger.

Q So what do we do?
A Since we cannot minimize all of the distances, we minimize some reasonable combination of them. Now, one reasonable combination of the distances would be their sum, but that turns out to be difficult to work with (because distances are measured in terms of absolute values). Instead, we use the sum of the squares of the distances (no absolute values required). The line that minimizes this sum is called the best fit line, regression line, or least squares line associated with the given data.
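Written out as a formula, the quantity being minimized can be sketched as follows. The notation is an assumption introduced here for illustration: the candidate line is y = mx + b, the data points are (x_i, y_i), and S is a name chosen for the sum.

```latex
\[
S(m, b) \;=\; \sum_{i=1}^{n} \bigl( y_i - (m x_i + b) \bigr)^2
\]
```

Each term is the square of one vertical distance between an observed value and the value predicted by the line, so no absolute values are needed.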

Q How do we obtain the equation of the best fit line?
A Following is the formula for the best fit straight line. To justify it requires some calculus. If you know about partial derivatives, consult the chapter on the calculus of several variables in Calculus Applied to the Real World for a detailed explanation.

Regression (Best Fit) Line

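The best fit line associated with the $n$ points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ has the form

$$y = mx + b$$

where

$$\text{slope} = m = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2}$$

$$\text{intercept} = b = \frac{\sum y - m\left(\sum x\right)}{n}$$

Here, $\sum$ means "the sum of." Thus

$\sum xy$ = sum of products = $x_1y_1 + x_2y_2 + \ldots + x_ny_n$

$\sum x$ = sum of x-values = $x_1 + x_2 + \ldots + x_n$

$\sum y$ = sum of y-values = $y_1 + y_2 + \ldots + y_n$

$\sum x^2$ = sum of squares of x-values = $x_1^2 + x_2^2 + \ldots + x_n^2$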
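As a rough illustration, the formula above translates almost term for term into a short Python function. The function name `regression_line` and the sample data are assumptions made for this sketch; they are not part of the original text.

```python
def regression_line(xs, ys):
    """Return (m, b) for the least squares line y = m*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x-values are equal; the slope is undefined")
    m = (n * sum_xy - sum_x * sum_y) / denom
    b = (sum_y - m * sum_x) / n
    return m, b


# Hypothetical data lying exactly on y = 2x + 1
m, b = regression_line([0, 1, 2, 3], [1, 3, 5, 7])
print(m, b)  # prints: 2.0 1.0
```

The intercept is computed from the unrounded slope, which keeps rounding error out of the result.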

Using the formula above is easy, as the following example shows.

Example 1 Computing a Regression Line by Hand

Find the least squares line associated with the following data:

| x | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| y | 1.5 | 1.6 | 2.1 | 3 |

Solution In order to apply the formula, it is best to organize the data in a table as shown.

To complete the table, compute $xy$ and $x^2$ for each row, and then add up each column.

| $x$ | $y$ | $xy$ | $x^2$ |
|---|---|---|---|
| 1 | 1.5 | 1.5 | 1 |
| 2 | 1.6 | 3.2 | 4 |
| 3 | 2.1 | 6.3 | 9 |
| 4 | 3.0 | 12.0 | 16 |
| $\sum x = 10$ | $\sum y = 8.2$ | $\sum xy = 23$ | $\sum x^2 = 30$ |

Substituting the correct values from the above table into the formula gives

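$$\text{slope} = m = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2} = \frac{4(23) - (10)(8.2)}{4(30) - 10^2} = 0.5$$

$$\text{intercept} = b = \frac{\sum y - m\left(\sum x\right)}{n} = \frac{8.2 - (0.5)(10)}{4} = 0.8$$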

Thus our least squares line is

y = 0.5x + 0.8.

Before we go on... Here is a plot of the data points and the least squares line.

Notice that the line doesn't pass through even one of the original points, and yet it is the straight line that best approximates them.
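The following sketch checks both claims numerically for the data of Example 1. The helper name `sum_squared_errors` and the two alternative lines are hypothetical choices made for illustration; comparing against two nearby lines is only a spot check, not a proof of minimality.

```python
xs = [1, 2, 3, 4]
ys = [1.5, 1.6, 2.1, 3.0]


def sum_squared_errors(m, b):
    """Sum of squared vertical distances from the data to y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


# Residuals (observed minus predicted) for the least squares line y = 0.5x + 0.8
residuals = [y - (0.5 * x + 0.8) for x, y in zip(xs, ys)]

best = sum_squared_errors(0.5, 0.8)
# Hypothetical alternatives: nudge the slope or the intercept slightly
steeper = sum_squared_errors(0.6, 0.8)
higher = sum_squared_errors(0.5, 0.9)

print(residuals)
print(best, steeper, higher)
```

None of the residuals is zero, so the line misses every data point, yet its sum of squared errors is smaller than that of either nearby line.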

Let us now return to the data on demand for real estate with which we began this topic.

Example 2 Demand for Homes

Find a linear demand equation that best fits the following data, and use it to predict annual sales of homes priced at \$140,000.

| x = Price (Thousands of \$) | \$160 | \$180 | \$200 | \$220 | \$240 | \$260 | \$280 |
|---|---|---|---|---|---|---|---|
| y = Sales of New Homes This Year | 126 | 103 | 82 | 75 | 82 | 40 | 20 |

Solution Here is the table we use to organize the calculations.

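| $x$ | $y$ | $xy$ | $x^2$ |
|---|---|---|---|
| 160 | 126 | 20,160 | 25,600 |
| 180 | 103 | 18,540 | 32,400 |
| 200 | 82 | 16,400 | 40,000 |
| 220 | 75 | 16,500 | 48,400 |
| 240 | 82 | 19,680 | 57,600 |
| 260 | 40 | 10,400 | 67,600 |
| 280 | 20 | 5,600 | 78,400 |
| Sums: $\sum x = 1{,}540$ | $\sum y = 528$ | $\sum xy = 107{,}280$ | $\sum x^2 = 350{,}000$ |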

Substituting these values in the formula gives (n = 7)

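$$\text{slope} = m = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2} = \frac{7(107{,}280) - (1{,}540)(528)}{7(350{,}000) - 1{,}540^2} \approx -0.7929$$

$$\text{intercept} = b = \frac{\sum y - m\left(\sum x\right)}{n} = \frac{528 - (-0.7928571429)(1{,}540)}{7} \approx 249.9$$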

Notice that we used the most accurate value, m = -0.7928571429, that we could obtain on our calculator in the formula for b rather than the rounded value -0.7929. This illustrates the following important general guideline:

When calculating, never round intermediate results. Rather, use the most accurate results obtainable, or have your calculator store them for you.

Thus our least squares line is

y = -0.7929x + 249.9.

We can now use this equation to predict the annual sales of homes priced at \$140,000, as we were asked to do. Remembering that x is the price in thousands of dollars, we set x = 140 and solve for y, getting y ≈ 139. Thus our model predicts that approximately 139 homes will have been sold in the range \$140,000-\$159,000.
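The calculation in this example can be reproduced with a few lines of Python. This is a sketch using the data from the table above; the variable names are assumptions, and `b_from_rounded_m` is included only to show what happens when the slope is rounded before the intercept is computed.

```python
prices = [160, 180, 200, 220, 240, 260, 280]   # x, in thousands of dollars
sales = [126, 103, 82, 75, 82, 40, 20]         # y, homes sold

n = len(prices)
sum_x = sum(prices)
sum_y = sum(sales)
sum_xy = sum(x * y for x, y in zip(prices, sales))
sum_x2 = sum(x * x for x in prices)

m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b = (sum_y - m * sum_x) / n          # uses the unrounded slope

# For comparison only: intercept computed from the slope rounded to 4 places
b_from_rounded_m = (sum_y - round(m, 4) * sum_x) / n

predicted = m * 140 + b              # x = 140 means a price of $140,000
print(round(m, 4), round(b, 1), round(predicted))  # prints: -0.7929 249.9 139
print(b, b_from_rounded_m)
```

The two intercepts on the last line differ slightly, which is the effect the rounding guideline is meant to avoid.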

Before we go on... We must remember that these figures were for sales in a range of prices. For instance, it would be extremely unlikely that 139 homes would have been sold at exactly \$140,000. On the other hand, the model does predict that, were we to place 139 homes on the market at \$140,000, we could expect to sell them all.

Here is the original data, together with the least squares line.

Q If the given data points all happen to lie on a straight line, is this the line we get by the best fit method?
A Yes. This has the following implication: you can use linear regression on a graphing calculator to check your calculations of the equation of a straight line passing through two specified points.
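The same check can be sketched in Python instead of on a graphing calculator. The function name `fit_line` and the two points are hypothetical, chosen only for illustration.

```python
def fit_line(points):
    """Least squares line through a list of (x, y) pairs; returns (m, b)."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_xy = sum(x * y for x, y in points)
    sum_x2 = sum(x * x for x, _ in points)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    return m, b


# Hypothetical pair of points chosen for illustration
p1, p2 = (1, 3), (4, 9)

# Line through the two points, computed directly
direct_m = (p2[1] - p1[1]) / (p2[0] - p1[0])
direct_b = p1[1] - direct_m * p1[0]

# The same line obtained from the regression formula
fit_m, fit_b = fit_line([p1, p2])
print(direct_m, direct_b, fit_m, fit_b)  # prints: 2.0 1.0 2.0 1.0
```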

Q If the given points do not lie on a straight line, is there a way we can tell how far off they are from lying on a straight line?
A There is a way of measuring the "goodness of fit" of the least squares line, called the coefficient of correlation. This is a number r between -1 and 1. The closer it is to -1 or 1, the better the fit. For an exact fit, we would have r = -1 (for a negative slope line) or r = 1 (for a positive slope line). For a bad fit, we would have r close to 0. The figure below shows several collections of data points with best-fit lines and corresponding values of r.

The correlation coefficient can be calculated with the following formula. (To justify this formula requires a fair knowledge of statistics, so we shall not attempt to do so here.)

Coefficient of Correlation

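$$\text{Coefficient of Correlation} = r = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{\left[n\left(\sum x^2\right) - \left(\sum x\right)^2\right]^{0.5}\left[n\left(\sum y^2\right) - \left(\sum y\right)^2\right]^{0.5}}$$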
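As a sketch, the formula can be coded directly and applied to the data of Examples 1 and 2. The function name `correlation` and the variable names are assumptions made for this illustration.

```python
from math import sqrt


def correlation(xs, ys):
    """Coefficient of correlation r for paired data, using the sums formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


# Data from Example 1
r_example1 = correlation([1, 2, 3, 4], [1.5, 1.6, 2.1, 3.0])

# Data from Example 2 (price in thousands of dollars, homes sold)
r_example2 = correlation([160, 180, 200, 220, 240, 260, 280],
                         [126, 103, 82, 75, 82, 40, 20])

print(r_example1, r_example2)
```

Both values come out fairly close to 1 in absolute value; the second is negative because the demand line slopes downward.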

### 2. Best Fit Exponential Curve (Regression Exponential Curve)

Q Now we know how to fit a straight line to given data. What about an exponential curve, of the form

$q = Ar^t$?
A The idea is to convert an exponential curve to a linear one, using the logarithm, as follows.

Start with

$$q = Ar^t,$$

and take the logarithm of both sides:

$$\log(q) = \log(Ar^t).$$

The properties of logarithms give

$$\log(q) = \log(A) + \log(r^t)$$

or

$$\log(q) = \log(A) + t\log(r).$$

This expresses log(q) as a linear function of t, with

slope = m = log(r)
Intercept = b = log(A).

Therefore, if we find the best-fit line using log(q) as a function of t, the slope and intercept will be given as above, and so we can obtain r and A by

$r = 10^m$

$A = 10^b$.

To summarize,

Exponential Regression

To obtain a best-fit exponential curve of the form $q = Ar^t$:

1. Take $y = \log(q)$ and $x = t$.
2. Find the regression line $y = mx + b$ for the data $(x, y)$.
3. The desired exponential model is $q = Ar^t$, where $r = 10^m$ and $A = 10^b$.
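The procedure summarized above might be sketched in Python as follows. The function name `exponential_regression` and the sample data are assumptions introduced for illustration; `log10` is used because the text recovers r and A as powers of 10.

```python
from math import log10


def exponential_regression(ts, qs):
    """Return (A, r) for the best-fit curve q = A * r**t (every q must be > 0)."""
    xs = list(ts)                       # x = t
    ys = [log10(q) for q in qs]         # y = log(q)
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Regression line y = m*x + b for the transformed data
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    # Undo the logarithm: r = 10^m and A = 10^b
    return 10 ** b, 10 ** m


# Hypothetical data lying exactly on q = 3 * 2**t
A, r = exponential_regression([0, 1, 2, 3], [3, 6, 12, 24])
print(A, r)
```

For data that lie exactly on an exponential curve, the recovered A and r match the curve up to floating-point rounding.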

Example 3 Sales of Compaq

Revenues from sales of Compaq computers are shown in the following table, where t represents time in years since 1990.* Obtain an exponential regression model for the data.

| t = Year (1990 = 0) | 0 | 2 | 4 | 7 |
|---|---|---|---|---|
| R = Revenue (\$ billion) | 3 | 4 | 11 | 25 |

* Data are rounded. Source: Company Reports/The New York Times, January 27, 1998, p. D1.

Solution Since we need to model log(R) as a linear function of t, we first make a table with x = t and y = log(R), and then calculate the regression line, y = mx + b.

| x (= t) | 0 | 2 | 4 | 7 |
|---|---|---|---|---|
| y (= log(R)) | 0.477121 | 0.60206 | 1.04139 | 1.39794 |

Instead of doing this calculation by hand as we did in the above examples, you can do it automatically using the on-line regression utility. Just enter the x- and y-values in the table, and press the "y = mx+b" button. (Yes, that utility does exponential regression as well, but we would like you to know how it works!)

The linear regression model we obtain is

y = 0.13907x + 0.42765.

Thus, the desired exponential model is

$R = Ar^t$,
where $r = 10^m = 10^{0.13907} \approx 1.3774$, and $A = 10^{0.42765} \approx 2.6770$.

This gives our revenue model as

$R = 2.6770\,(1.3774)^t$.
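As a cross-check, the numbers in this example can be recomputed from the original revenue data and the fitted model compared with the observed values. This is a sketch; the variable names are assumptions made for illustration.

```python
from math import log10

years = [0, 2, 4, 7]          # t, years since 1990
revenue = [3, 4, 11, 25]      # R, in billions of dollars

xs = years
ys = [log10(R) for R in revenue]
n = len(xs)
sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)

m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b = (sum_y - m * sum_x) / n
r = 10 ** m
A = 10 ** b
print(round(m, 5), round(b, 5), round(r, 4), round(A, 4))

# Observed revenue next to the revenue predicted by R = A * r**t
for t, actual in zip(years, revenue):
    print(t, actual, round(A * r ** t, 1))
```

The predicted values do not reproduce the table exactly, which is expected: the curve is a best fit, not an interpolation.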

Before we go on... Go to the on-line regression utility, enter the original data (before you took the logarithms) and press the "y = a(b^x)" button. What do you find?

### 3. Other Forms of Regression

At the on-line regression utility, you can also find best-fit curves of the following forms:

| Model | Type of regression |
|---|---|
| $y = ax^2 + bx + c$ | Quadratic regression |
| $y = ax^3 + bx^2 + cx + d$ | Cubic regression |

On the TI-83, you will find all of these, as well as the following:

| Model | Type of regression |
|---|---|
| $y = ax^4 + bx^3 + cx^2 + dx + e$ | Quartic regression |
| $y = ax^b$ | Power regression |
| $y = a\sin(bx + c) + d$ | Sine regression |
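By analogy with the exponential case, one way to fit a power curve y = ax^b is to take logarithms of both variables, since log(y) = log(a) + b log(x) is linear in log(x). The sketch below applies that idea; the function name `power_regression` and the data are hypothetical, and this is not necessarily the algorithm used by the TI-83 or the on-line utility.

```python
from math import log10


def power_regression(xs, ys):
    """Return (a, b) for y = a * x**b, fitted on log-log data (x and y > 0)."""
    us = [log10(x) for x in xs]         # u = log(x)
    vs = [log10(y) for y in ys]         # v = log(y)
    n = len(us)
    sum_u, sum_v = sum(us), sum(vs)
    sum_uv = sum(u * v for u, v in zip(us, vs))
    sum_u2 = sum(u * u for u in us)
    # Regression line v = slope*u + intercept for the transformed data
    slope = (n * sum_uv - sum_u * sum_v) / (n * sum_u2 - sum_u ** 2)
    intercept = (sum_v - slope * sum_u) / n
    # Undo the logarithm: a = 10^intercept, and the exponent b is the slope
    return 10 ** intercept, slope


# Hypothetical data lying exactly on y = 2 * x**3
a, b = power_regression([1, 2, 3, 4], [2, 16, 54, 128])
print(a, b)
```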

Last Updated: February, 1998