8.4: Measures of Dispersion

This tutorial: Part A: Variance and Standard Deviation of a Set of Scores
Next tutorial: Part B: Variance and Standard Deviation of a Random Variable

Based on Section 8.4 in Finite Mathematics and Finite Mathematics and Applied Calculus

Variance and Standard Deviation of a Set of Scores

Consider the following two sets of scores:

Both these sets have the same mean (50), but the second set is a lot more widely dispersed ("scattered") than the first.
[Figure panels: Set 1 | Set 2]

Q How do we measure the dispersion of a set of scores?
A Here's how to do it graphically: First, measure the distance of each point from the mean, square each distance, and then take the average of all those squared distances. This measurement is called the population variance of the set of scores.

Set 1
Sum of squared distances: $10^2 + 0^2 + 10^2 + 10^2 + 10^2 + 0^2 = 400$
Population variance: $400/6 \approx 66.67$

Set 2
Sum of squared distances: $50^2 + 50^2 + 25^2 + 25^2 + 30^2 + 30^2 = 8050$
Population variance: $8050/6 \approx 1341.67$
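The arithmetic above can be checked with a short Python sketch. The individual scores are not listed, so this sketch works directly from the distances to the mean that appear in the two sums of squares; the list and function names are illustrative.

```python
# Distances from the common mean (50), read off the sums of squared
# distances above; the scores themselves are not listed.
set1_distances = [10, 0, 10, 10, 10, 0]
set2_distances = [50, 50, 25, 25, 30, 30]


def mean_squared_distance(distances):
    """Average of the squared distances, i.e. the population variance."""
    return sum(d ** 2 for d in distances) / len(distances)


for name, distances in [("Set 1", set1_distances), ("Set 2", set2_distances)]:
    total = sum(d ** 2 for d in distances)
    print(f"{name}: sum of squares = {total}, "
          f"population variance = {mean_squared_distance(distances):.2f}")
# Set 1: sum of squares = 400, population variance = 66.67
# Set 2: sum of squares = 8050, population variance = 1341.67
```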

Q Is there a formula to measure this?
A Actually, there are a couple. First, notice that the (signed) distance from a typical score $x$ to the mean $\mu$ is obtained by subtracting the mean from the score: $x - \mu$. Therefore, the squared distance is $(x - \mu)^2$. We then get the average of these squared distances (the population variance) by adding them up and dividing by the number of scores.

Population Variance and Standard Deviation

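The population variance (written as $\sigma^2$) is the average squared distance from the mean:

$$\sigma^2 = \frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + \cdots + (x_n - \mu)^2}{n}$$

where $\mu$ is the population mean and $n$ is the number of scores.

The population standard deviation is the square root of the population variance, and is written as $\sigma$.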
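The definition translates almost word for word into code. This is a minimal Python sketch; the function names are illustrative and not taken from the text.

```python
import math


def population_mean(scores):
    """Sum of the scores divided by the number of scores, n."""
    return sum(scores) / len(scores)


def population_variance(scores):
    """Average squared distance from the population mean (divide by n)."""
    mu = population_mean(scores)
    return sum((x - mu) ** 2 for x in scores) / len(scores)


def population_std(scores):
    """Square root of the population variance."""
    return math.sqrt(population_variance(scores))


print(population_variance([1, -1, 2, 3]))  # 2.1875
```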

Example
The population mean of the scores {1, -1, 2, 3} is

$$\mu = \frac{1 - 1 + 2 + 3}{4} = \frac{5}{4} = 1.25$$

Its population variance and standard deviation are given by:

$$\sigma^2 = \frac{(1 - 1.25)^2 + (-1 - 1.25)^2 + (2 - 1.25)^2 + (3 - 1.25)^2}{4} = \frac{(-0.25)^2 + (-2.25)^2 + 0.75^2 + 1.75^2}{4} = \frac{8.75}{4} = 2.1875 \quad \text{(population variance)}$$

$$\sigma = \sqrt{2.1875} \approx 1.4790 \quad \text{(population standard deviation)}$$
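The hand computation in this example can be reproduced step by step with exact fractions, so every intermediate value matches the worked arithmetic rather than a rounded decimal. This is an illustrative Python sketch using the standard `fractions` module.

```python
from fractions import Fraction
import math

# Scores from the example, stored as exact fractions.
scores = [Fraction(x) for x in (1, -1, 2, 3)]
n = len(scores)

mu = sum(scores) / n                              # 5/4 = 1.25
squared_distances = [(x - mu) ** 2 for x in scores]
sum_of_squares = sum(squared_distances)           # 35/4 = 8.75
variance = sum_of_squares / n                     # 35/16 = 2.1875
sigma = math.sqrt(variance)                       # about 1.4790

print(mu, sum_of_squares, variance, f"{sigma:.4f}")
# 5/4 35/4 35/16 1.4790
```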

Here is one for you to try. Your answers should be accurate to 4 decimal places:

    Population: 0, 2, 4, 4, 10
    $\mu$ = ______    $\sigma^2$ = ______    $\sigma$ = ______


Sample Variance and Standard Deviation

The sample variance is the statistic we use when we have only a sample of the scores instead of all of them (the whole population). The sample variance is written as $s^2$, and is computed in almost the same way as the population variance except that, instead of dividing by $n$, we divide by $n - 1$:

$$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$$

Notice that the sample mean $\bar{x}$ is computed in exactly the same way as the population mean; we just use a different symbol for it.

The sample standard deviation is the square root of the sample variance, and is written as $s$.
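The only change from the population formula is the divisor. A minimal Python sketch (function names are illustrative) makes that explicit, and also guards against a one-score sample, where $n - 1 = 0$ and the formula is undefined.

```python
import math


def sample_mean(scores):
    """Computed exactly like the population mean."""
    return sum(scores) / len(scores)


def sample_variance(scores):
    """Sum of squared distances from the sample mean, divided by n - 1."""
    n = len(scores)
    if n < 2:
        raise ValueError("sample variance needs at least two scores")
    x_bar = sample_mean(scores)
    return sum((x - x_bar) ** 2 for x in scores) / (n - 1)


def sample_std(scores):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(scores))


print(f"{sample_variance([1, -1, 2, 3]):.5f}")  # 2.91667
```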

Example
The mean of the sample {1, -1, 2, 3} is

$$\bar{x} = \frac{1 - 1 + 2 + 3}{4} = \frac{5}{4} = 1.25$$

Its sample variance and standard deviation are given by:

$$s^2 = \frac{(1 - 1.25)^2 + (-1 - 1.25)^2 + (2 - 1.25)^2 + (3 - 1.25)^2}{4 - 1} = \frac{(-0.25)^2 + (-2.25)^2 + 0.75^2 + 1.75^2}{3} = \frac{8.75}{3} \approx 2.91667 \quad \text{(sample variance)}$$

$$s = \sqrt{2.91667} \approx 1.7078 \quad \text{(sample standard deviation)}$$
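Python's standard `statistics` module provides both versions: `pvariance` divides by $n$ and `variance` divides by $n - 1$. This sketch compares them on the example data; because both start from the same sum of squared distances, they differ by exactly the factor $n/(n - 1)$.

```python
import statistics

data = [1, -1, 2, 3]
n = len(data)

sigma_sq = statistics.pvariance(data)   # divides by n
s_sq = statistics.variance(data)        # divides by n - 1

print(f"population variance: {sigma_sq:.5f}")            # 2.18750
print(f"sample variance:     {s_sq:.5f}")                # 2.91667
print(f"sample std dev:      {statistics.stdev(data):.4f}")  # 1.7078

# Same sum of squared distances, different divisors.
assert abs(s_sq - sigma_sq * n / (n - 1)) < 1e-12
```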

Here is one for you to try, this time treating the data as a sample. Your answers should be accurate to 4 decimal places:

    Sample: 1, 1, 2, 3
    $\bar{x}$ = ______    $s^2$ = ______    $s$ = ______

Q Why do we divide by n when computing the variance and standard deviation for a population, but by n-1 when doing it for a sample?
A When we have a sample, we do not have all the data in the population, but we would still like the sample variance to approximate the population variance. We can interpret this to mean that we would like the average of a very large number of calculations of sample variances for different samples to be very close to the population variance. It turns out that the formula for $s^2$ given above is the formula that accomplishes this task. The sample variance $s^2$ as we have defined it is referred to by statisticians as an "unbiased estimator" of the population variance $\sigma^2$; if, instead, we divided by $n$ in the formula for $s^2$, we would, on average, tend to underestimate the population variance. (See the on-line text on Sampling Distributions for more discussion of unbiased estimators.)
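The "average over a very large number of samples" idea can be tried out with a small simulation. This is a sketch under stated assumptions: the four scores {1, -1, 2, 3} are treated as a hypothetical whole population, and samples of size 3 are drawn *with replacement* (independent draws), which is the setting in which dividing by $n - 1$ is exactly unbiased.

```python
import random
import statistics

# Hypothetical population: the four scores from the examples.
population = [1, -1, 2, 3]
true_variance = statistics.pvariance(population)   # 2.1875
sample_size = 3
trials = 100_000

rng = random.Random(0)  # fixed seed so the run is repeatable
divide_by_n_minus_1 = 0.0
divide_by_n = 0.0
for _ in range(trials):
    # Independent draws with replacement from the population.
    sample = rng.choices(population, k=sample_size)
    x_bar = sum(sample) / sample_size
    ss = sum((x - x_bar) ** 2 for x in sample)
    divide_by_n_minus_1 += ss / (sample_size - 1)
    divide_by_n += ss / sample_size

print(f"population variance:            {true_variance:.4f}")
print(f"average of s^2 (divide by n-1): {divide_by_n_minus_1 / trials:.4f}")
print(f"average when dividing by n:     {divide_by_n / trials:.4f}")
```

The second figure should come out close to 2.1875, while the third should sit near $\frac{n-1}{n} \cdot 2.1875 \approx 1.458$, illustrating the underestimate described above.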

A Tabular Method for Calculation of Variance & Standard Deviation

Here is a nice way of organizing the data we used in computing the variance and standard deviation of the sample 1, -1, 2, 3:

| x | x - x̄ | (x - x̄)² |
|---|---|---|
| 1 | 1 - 1.25 = -0.25 | (-0.25)² = 0.0625 |
| -1 | -1 - 1.25 = -2.25 | (-2.25)² = 5.0625 |
| 2 | 2 - 1.25 = 0.75 | 0.75² = 0.5625 |
| 3 | 3 - 1.25 = 1.75 | 1.75² = 3.0625 |
| Sum: 5 | | Sum: 8.75 |
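The tabular method can be mirrored in a few lines of Python that print the same three columns and then use the column sums. The helper `variance_table` is a hypothetical name for illustration; it treats the scores as a sample, so it divides by $n - 1$.

```python
def variance_table(scores):
    """Print the x, x - xbar, (x - xbar)^2 columns; return (xbar, s^2).

    Hypothetical helper mirroring the tabular method (divides by n - 1).
    """
    n = len(scores)
    x_bar = sum(scores) / n
    print(f"{'x':>6} {'x - xbar':>10} {'(x - xbar)^2':>14}")
    total_sq = 0.0
    for x in scores:
        d = x - x_bar
        total_sq += d ** 2
        print(f"{x:>6} {d:>10.4f} {d ** 2:>14.4f}")
    print(f"Sum: {sum(scores)}  Sum of squares: {total_sq:.4f}")
    return x_bar, total_sq / (n - 1)


x_bar, s_sq = variance_table([1, -1, 2, 3])
```

Dividing the first column sum by $n$ gives $\bar{x}$, and dividing the last column sum by $n - 1$ gives $s^2$, exactly as in the hand table.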

Ecneret Keane, the Utarek, Mars, Minister of Health, is concerned about reports of mercury contamination in the Martian striped sandworm (a staple of the Martian diet). The Martian Environmental Protection Agency has determined a safe level of less than 5.8 micrograms of mercury per liter of blood for humans, so Ec Keane decided to conduct tests of blood mercury levels on 6 randomly chosen Utarek (human) citizens.

His measurements (in mcg/liter) are: 5.5, 5.8, 6.0, 6.2, 5.5, 5.8

Complete the following table in order to compute the mean and sample standard deviation of mercury blood levels. Check your calculations after completing each column.

[Photo: Ecneret Keane, Utarek Minister of Health]

| x | x - x̄ | (x - x̄)² |
|---|---|---|
| 5.5 | | |
| 5.8 | | |
| 6.0 | | |
| 6.2 | | |
| 5.5 | | |
| 5.8 | | |
| Sum: | | Sum: |
| x̄ = | | s² = |
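After completing the table by hand, you might check each column with a short Python sketch like this one (variable names are illustrative). It uses the six measurements in the table and divides by $n - 1$, since the citizens are a sample.

```python
import math
import statistics

# Blood mercury levels (mcg/liter) for the six sampled citizens.
levels = [5.5, 5.8, 6.0, 6.2, 5.5, 5.8]

x_bar = statistics.mean(levels)
deviations = [x - x_bar for x in levels]        # middle column
squares = [d ** 2 for d in deviations]          # last column
s_sq = sum(squares) / (len(levels) - 1)         # sample variance: divide by n - 1
s = math.sqrt(s_sq)

for x, d, sq in zip(levels, deviations, squares):
    print(f"{x:4.1f}  {d:+.2f}  {sq:.4f}")
print(f"x̄ = {x_bar:.4f}, s² = {s_sq:.4f}, s = {s:.4f}")
```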

Q OK. We know how to calculate the standard deviation, which is a measure of dispersion. Is there anything more specific that it tells us?
A There are two ways we can use the standard deviation to get specific information about a set of scores. One of these ways, called the empirical rule (see below), gives us a great deal of information, but it applies only to distributions of scores that are both bell-shaped and symmetric.

Q What does it mean for a distribution of scores to be bell-shaped?
A It means that if you group the scores into suitable measurement classes (see the tutorial for Section 8.1) and then graph the frequencies or probabilities, you get a nice bell-shaped symmetric curve:

[Figure panels: Bell-shaped and symmetric | Not symmetric | Not bell-shaped]
Empirical Rule

For a set of data whose frequency distribution is bell-shaped and symmetric, the following is true:

  • Approximately 68% of the scores fall within 1 standard deviation of the mean (within the interval $[\bar{x} - s, \bar{x} + s]$ for samples or $[\mu - \sigma, \mu + \sigma]$ for populations).
  • Approximately 95% of the scores fall within 2 standard deviations of the mean (within the interval $[\bar{x} - 2s, \bar{x} + 2s]$ for samples or $[\mu - 2\sigma, \mu + 2\sigma]$ for populations).
  • Approximately 99.7% of the scores fall within 3 standard deviations of the mean (within the interval $[\bar{x} - 3s, \bar{x} + 3s]$ for samples or $[\mu - 3\sigma, \mu + 3\sigma]$ for populations).
    Almost all of the scores lie within 3 standard deviations of the mean.
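The rule can be packaged as a small lookup plus an interval calculation. This is an illustrative Python sketch (the names are hypothetical); it only covers $k = 1, 2, 3$, the cases the rule states, and it is meaningful only for bell-shaped, symmetric data. The tail function uses symmetry to split the scores outside the interval evenly between the two sides.

```python
# Approximate percentages from the empirical rule
# (bell-shaped, symmetric data only).
EMPIRICAL_PERCENT = {1: 68.0, 2: 95.0, 3: 99.7}


def empirical_interval(mean, sd, k):
    """Interval [mean - k*sd, mean + k*sd] and the approximate
    percentage of scores the empirical rule says it contains."""
    if k not in EMPIRICAL_PERCENT:
        raise ValueError("the empirical rule is stated here for k = 1, 2, 3")
    return (mean - k * sd, mean + k * sd), EMPIRICAL_PERCENT[k]


def empirical_tail_percent(k):
    """Approximate percentage in EACH tail outside k standard
    deviations; by symmetry the outside scores split evenly."""
    return (100.0 - EMPIRICAL_PERCENT[k]) / 2


(low, high), pct = empirical_interval(20, 2, 2)
print(f"about {pct}% of scores lie in [{low}, {high}]")
print(f"about {empirical_tail_percent(2)}% lie above {high}, "
      f"and the same below {low}")
```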

Examples

1. If the mean of a sample with a bell-shaped symmetric distribution is 20 with standard deviation s = 2, then approximately 95% of the scores lie in the interval [20-2(2), 20 + 2(2)] = [16, 24]. In other words, approximately 95% of the scores lie between 16 and 24.

Note that this also means that approximately 5% of the scores lie outside this range: approximately 2.5% are above 24 and approximately 2.5% are below 16 (since the distribution is symmetric).

2. The distribution of blood mercury levels for citizens of Utarek, Mars, is actually bell-shaped and symmetric, with a mean of 3.8 mcg/liter and a standard deviation of 0.3 mcg/liter.

What, approximately, is the probability that a randomly selected Utarek citizen has a blood mercury level below 2.9 mcg/liter?

Q The Empirical Rule tells us how to interpret the standard deviation for bell-shaped symmetric distributions. What about distributions that are not bell-shaped and symmetric?
A In cases where the distribution is not nice, we cannot be nearly so accurate. What we can always say is the following:

Chebyshev's Rule

For an arbitrary set of data (not necessarily bell-shaped or symmetric) the following is true:

  • At least 3/4 of the scores fall within 2 standard deviations of the mean (within the interval $[\bar{x} - 2s, \bar{x} + 2s]$ for samples or $[\mu - 2\sigma, \mu + 2\sigma]$ for populations).
  • At least 8/9 of the scores fall within 3 standard deviations of the mean (within the interval $[\bar{x} - 3s, \bar{x} + 3s]$ for samples or $[\mu - 3\sigma, \mu + 3\sigma]$ for populations).
  • At least 15/16 of the scores fall within 4 standard deviations of the mean (within the interval $[\bar{x} - 4s, \bar{x} + 4s]$ for samples or $[\mu - 4\sigma, \mu + 4\sigma]$ for populations).
    . . .
  • In general, at least $(k^2 - 1)/k^2$ of the scores fall within $k$ standard deviations of the mean (within the interval $[\bar{x} - ks, \bar{x} + ks]$ for samples or $[\mu - k\sigma, \mu + k\sigma]$ for populations).
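The general bound is easy to tabulate. This Python sketch (the function name is hypothetical) returns $(k^2 - 1)/k^2$ as an exact fraction, so the first few values match the 3/4, 8/9, 15/16 listed above. It rejects $k < 1$, because there the bound is negative and says nothing; $k = 1$ gives the trivial bound 0.

```python
from fractions import Fraction


def chebyshev_min_fraction(k):
    """Lower bound (k^2 - 1)/k^2 on the fraction of scores within
    k standard deviations of the mean, valid for any data set.
    Only accepted for k >= 1 (k = 1 gives the trivial bound 0)."""
    if k < 1:
        raise ValueError("the bound is only informative for k >= 1")
    k = Fraction(k)
    return (k * k - 1) / (k * k)


for k in (2, 3, 4):
    bound = chebyshev_min_fraction(k)
    print(f"k = {k}: at least {bound} ({float(bound):.1%}) of the scores")
```

Unlike the empirical rule, the bound does not split evenly into two tails, since no symmetry is assumed.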

Examples

1. If the mean of a sample is 20 with standard deviation s = 2, then at least 3/4, or 75%, of the scores lie in the interval [20 - 2(2), 20 + 2(2)] = [16, 24]. In other words, at least 75% of the scores lie between 16 and 24.

Note that this also means that at most 25% of the scores lie outside this range. We cannot say that at most 12.5% are above 24 and at most 12.5% below 16 unless we know that the distribution is symmetric.

2. The distribution of blood mercury levels for citizens of the Kadmus Urbyne (within Utarek, Mars) is not known to be bell-shaped and symmetric. It has a mean of 3.8 mcg/liter and a standard deviation of 0.3 mcg/liter.

What can we say about the probability that a randomly selected Kadmus citizen has a blood mercury level below 2.9 mcg/liter?

Now try some of the exercises in Section 8.4 of Finite Mathematics and Finite Mathematics and Applied Calculus. However, to be able to do all the exercises, you will need to go on to the next tutorial, which deals with random variables rather than sets of scores.


Last Updated: December, 2003
Copyright © 1999, 2003 Stefan Waner and Steven R. Costenoble