Summary of Chapter 8 in
Finite Mathematics /
Finite Mathematics & Applied Calculus
Topic: Random Variables and Statistics


Random Variable | Probability Distribution | Bernoulli Trials and the Binomial Distribution | Measures of Central Tendency: Mean, Median, and Mode of a Set of Data | Mean, Median, and Mode of a Random Variable | Measures of Dispersion | Variance and Standard Deviation of a Random Variable | Interpreting Standard Deviation | Statistics of a Binomial Distribution | Continuous Random Variable | Uniform Distribution | Normal Random Variable | More on Normal Distributions

 
Random Variable

A random variable X is a rule that assigns a numerical value to each outcome in the sample space of an experiment.

A discrete random variable can take on specific, isolated numerical values, like the outcome of a roll of a die, or the number of dollars in a randomly chosen bank account.

A continuous random variable can take on any values within a continuum or an interval, like the temperature in Central Park, or the height of an athlete in centimeters.

Discrete random variables that can take on only finitely many values (like the outcome of a roll of a die) are called finite random variables.

Example

1. Finite Random Variable

In an experiment to simulate tossing three coins, let X be the number of heads showing after each toss. X is a finite random variable that can assume the four values 0, 1, 2, and 3.


2. Infinite Discrete Random Variable

Roll a die until you get a 6; X = the number of times you roll the die.

The possible values for X are 1, 2, 3, 4, ... (If you are extremely unlucky, it might take you a million rolls before you get a 6!)
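A quick simulation makes the "infinite discrete" idea concrete. The following is a minimal Python sketch (the function name and the choice of 1000 repetitions are ours): X can come out 1, 2, 3, ... with no upper bound, and on average it takes about 6 rolls.

```python
import random

def rolls_until_six(rng):
    """Roll a fair die until a 6 appears; return the number of rolls (the value of X)."""
    count = 0
    while True:
        count += 1
        if rng.randint(1, 6) == 6:
            return count

# Repeat the experiment many times; every value 1, 2, 3, ... is possible,
# and the average number of rolls is about 6 (since P(six) = 1/6).
sample = [rolls_until_six(random.Random(seed)) for seed in range(1000)]
```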

3. Continuous Random Variable

Measure the length of an object; X = its length in cm.

Probability Distribution

The probability P(X = x) is the probability of the event that X = x. Similarly, P(a < X < b) is the probability of the event that X lies between a and b.

These probabilities may be estimated, empirical, or abstract (see Chapter 7 in Finite Mathematics or the Probability Summary for a discussion of these three kinds of probability).

For a finite random variable, the collection of numbers P(X = x) as x varies is called the probability distribution of X, and it is useful to graph the probability distribution as a histogram.


Example

Estimated Probability Distribution

Let X be the number of heads showing after each toss of three coins (see above). An estimated probability distribution (relative frequency distribution) for X is obtained by repeating the experiment many times and recording, for each value x, the fraction of tosses in which X = x.

Empirical Probability Distribution

For the experiment above, the empirical probability distribution is obtained by counting the number of the eight equally likely outcomes that give 0, 1, 2, or 3 heads:

    P(X = 0) = 1/8,  P(X = 1) = 3/8,  P(X = 2) = 3/8,  P(X = 3) = 1/8.
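This count can be checked by brute-force enumeration. A short Python sketch (variable names are ours):

```python
from fractions import Fraction
from itertools import product

# All 2^3 = 8 equally likely outcomes of tossing three fair coins.
outcomes = list(product("HT", repeat=3))

# Empirical probability distribution: P(X = x) = (# outcomes with x heads) / 8.
dist = {x: Fraction(sum(o.count("H") == x for o in outcomes), len(outcomes))
        for x in range(4)}
# dist == {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}
```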

Bernoulli Trials and the Binomial Distribution

A Bernoulli trial is an experiment with two possible outcomes, called success and failure. Each outcome has a specified probability: p for success and q for failure (so that p+q = 1).

If we perform a sequence of n independent Bernoulli trials, then some of them result in success and the rest of them in failure. The probability of exactly x successes in such a sequence is given by

P(exactly x successes in n trials) = C(n,x) p^x q^(n-x).       Note: q = 1 - p


If X is the number of successes in a sequence of n independent Bernoulli trials, with probability p for success and q for failure, then X is said to have a binomial distribution. This distribution is given by the above formula

P(X = x) = C(n,x) p^x q^(n-x)
for x running from 0 to n.


Example

Suppose we toss an unfair coin, with p = P(heads) = 0.8 and q = P(tails) = 0.2, three times. Take X = number of heads. Then the distribution is given by

x            0        1                 2                 3
Formula      (0.2)^3  3(0.8)^1(0.2)^2   3(0.8)^2(0.2)^1   (0.8)^3
Probability  0.008    0.096             0.384             0.512

The probability distribution given earlier for tossing a fair coin three times is also a binomial distribution (with p = q = 0.5).
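The binomial formula translates directly into code. A minimal Python sketch using `math.comb` for C(n, x) (the function name is ours):

```python
from math import comb

def binomial_pmf(n, p, x):
    """P(X = x) = C(n, x) p^x q^(n-x), with q = 1 - p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# The unfair coin: n = 3 tosses, p = P(heads) = 0.8.
probs = [binomial_pmf(3, 0.8, x) for x in range(4)]
# probs ≈ [0.008, 0.096, 0.384, 0.512]
```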


Measures of Central Tendency:
Mean, Median, and Mode of a Set of Data

A collection of specific values, or "scores", x1, x2, . . ., xn of a random variable X is called a sample. If {x1, x2, . . ., xn} is a sample, then the sample mean of the collection is


    x̄ = (x1 + x2 + . . . + xn)/n = (∑xi)/n,

where n is the sample size: the number of scores.

The sample median m is the middle score (in the case of an odd-size sample), or average of the two middle scores (in the case of an even-size sample), when the scores in a sample are arranged in ascending order.

A sample mode is a score that appears most often in the collection. (There may be more than one mode in a sample.)

If the sample x1, x2, . . ., xn we are using consists of all the values of X from an entire population (for instance, the SAT score of every graduating high school student who took the test), we refer to the mean, median, and mode above as the population mean, median, and mode.

We write the population mean as μ instead of x̄.

Example

Consider the following collection of scores:

    11.5, 3, 5.5, 0.5, 3, 10, 2.5, 4

The sum is ∑xi = 40, and n = 8, so that


    x̄ = (∑xi)/n = 40/8 = 5.

To get the sample median, arrange the scores in increasing order, and select the middle scores (two of them, since n is even):

    0.5, 2.5, 3, 3, 4, 5.5, 10, 11.5

The sample median is the average, 3.5, of these middle scores.

Since the score 3 appears most often, the sample mode is 3.
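Python's standard `statistics` module computes all three measures. A sketch using the scores above (variable names are ours):

```python
from statistics import mean, median, mode

scores = [11.5, 3, 5.5, 0.5, 3, 10, 2.5, 4]

m = mean(scores)      # (sum of scores) / n = 40 / 8 = 5
med = median(scores)  # average of the two middle scores, 3 and 4
mo = mode(scores)     # the score that appears most often
```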

Mean, Median, and Mode of a Random Variable

If X is a finite random variable taking on values x1, x2, . . ., xn, the mean or expected value of X, written μ, or E(X), is

μ = E(X) = x1·P(X = x1) + x2·P(X = x2) + . . . + xn·P(X = xn)
         = ∑ xi·P(X = xi).
The median of X is the least number m such that
P(X ≤ m) ≥ 1/2 and P(X ≥ m) ≥ 1/2.
(This definition holds for continuous variables as well.)

A mode of X is a number m such that P(X = m) is largest. This is the most likely value of X or one of the most likely values if X has several values with the same largest probability. For a continuous random variable, a mode is a number m such that the probability density function is highest at x = m.

The expected value, median, and mode of a random variable are the average, median, and mode we expect to get if we have a large number of X-scores. Conversely, if all we know about X is a collection of X-scores, then the average, median and mode of those scores are our best estimates of the expected value, median and mode of X.

Example

Suppose we toss an unfair coin, with p = P(heads) = 0.8 and q = P(tails) = 0.2, three times. Take X = number of heads. Then the distribution (see above) is given by

x       0      1      2      3
P(x)    0.008  0.096  0.384  0.512

The expected value of X is given by

  E(X) = ∑ xi·P(X = xi)
        = 0(.008) + 1(.096) + 2(.384) + 3(.512)
        = 2.4.

The median is 3, since P(X ≤ 3) = 1 ≥ 1/2 and P(X ≥ 3) = 0.512 ≥ 1/2. Further, 3 is the least value of X with this property.

The mode is also 3, since its probability is the greatest.
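All three definitions are easy to check mechanically. A Python sketch over the distribution above (the helper name `cum` is ours):

```python
dist = {0: 0.008, 1: 0.096, 2: 0.384, 3: 0.512}

# Expected value: mu = sum of x * P(X = x)
mu = sum(x * p for x, p in dist.items())

# Mode: a value with the largest probability
mode = max(dist, key=dist.get)

# Median: the least m with P(X <= m) >= 1/2 and P(X >= m) >= 1/2
def cum(pred):
    return sum(p for x, p in dist.items() if pred(x))

median = min(m for m in dist
             if cum(lambda x: x <= m) >= 0.5 and cum(lambda x: x >= m) >= 0.5)
```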

Measures of Dispersion

Sample Variance and Sample Standard Deviation

Given a set of numbers x1, x2, . . . , xn the sample variance is

    s² = ∑(xi - x̄)²/(n - 1)
       = [(x1 - x̄)² + (x2 - x̄)² + ... + (xn - x̄)²]/(n - 1)

The sample standard deviation is the square root, s, of the sample variance.


Population Variance and Population Standard Deviation

The population variance and standard deviation have slightly different formulas from those of the corresponding statistics for samples. Given a set of numbers x1, x2, . . . , xn, the population variance, σ², is

    σ² = ∑(xi - μ)²/n
       = [(x1 - μ)² + (x2 - μ)² + ... + (xn - μ)²]/n

The population standard deviation, σ, is the square root of the population variance.


To read more about the difference between the sample and population variance and standard deviation, go to our on-line text: Sampling Distributions.

Example

Consider the following collection of scores we looked at above.

    11.5, 3, 5.5, 0.5, 3, 10, 2.5, 4

We saw above that the sample mean is 5 (see the example in "Mean, Median, and Mode of a Set of Data" above). The following table shows the squares of the differences from the mean, which we use to compute the sample variance and standard deviation.

xi          11.5    3     5.5    0.5     3     10    2.5    4
xi - x̄      6.5     -2    0.5    -4.5    -2    5     -2.5   -1
(xi - x̄)²   42.25   4     0.25   20.25   4     25    6.25   1

The sum of the entries in the bottom row is ∑(xi - x̄)² = 103. Therefore,

    s² = ∑(xi - x̄)²/(n - 1) = 103/7 ≈ 14.714.
Also,
    s = √14.714 ≈ 3.836.

For the population variance, we divide 103 by n = 8 instead of 7, getting

    σ² = 103/8 = 12.875
    σ = √12.875 ≈ 3.588
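The `statistics` module provides both versions directly; dividing by n - 1 versus n is exactly the difference between its "sample" and "population" functions. A sketch (variable names are ours):

```python
from statistics import variance, stdev, pvariance, pstdev

scores = [11.5, 3, 5.5, 0.5, 3, 10, 2.5, 4]

s2 = variance(scores)       # sample variance: divides by n - 1 = 7
s = stdev(scores)           # sample standard deviation
sigma2 = pvariance(scores)  # population variance: divides by n = 8
sigma = pstdev(scores)      # population standard deviation
```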

Variance and Standard Deviation of a Random Variable

If X is a random variable, its variance is defined to be

σ² = E([X - μ]²).
Its standard deviation is defined to be the square root σ of the variance. An alternate formula for the variance, useful for calculation, is
σ² = E(X²) - μ².

The variance and standard deviation of a random variable are the sample variance and sample standard deviation we expect to get if we have a large number of X-scores. Conversely, if all we know about X is a collection of X-scores, then the sample variance and sample standard deviation of those scores are our best estimates of the variance and standard deviation of X.

Example

Let us look again at the experiment in which we toss an unfair coin, with p = P(heads) = 0.8 and q = P(tails) = 0.2, three times. (X = number of heads.) Here is the distribution with the x2 scores added.

x            0      1      2      3
x²           0      1      4      9
Probability  0.008  0.096  0.384  0.512

We saw above that μ = 2.4. Further,

  E(X²) = ∑ xi²·P(X = xi)
        = 0(.008) + 1(.096) + 4(.384) + 9(.512)
        = 6.24.

Therefore,
  σ² = E(X²) - μ²
     = 6.24 - 2.4² = 0.48,
and

σ = √0.48 ≈ 0.6928.
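The shortcut formula σ² = E(X²) - μ² in code, for the same distribution (variable names are ours):

```python
dist = {0: 0.008, 1: 0.096, 2: 0.384, 3: 0.512}

mu = sum(x * p for x, p in dist.items())       # E(X) = 2.4
ex2 = sum(x**2 * p for x, p in dist.items())   # E(X^2) = 6.24
var = ex2 - mu**2                              # variance = E(X^2) - mu^2
sd = var ** 0.5                                # standard deviation
```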

Interpreting Standard Deviation

Chebyshev's Rule

For any set of data, the following is true.

  • At least 3/4 of the scores fall within 2 standard deviations of the mean (within the interval [x̄-2s, x̄+2s] for samples or [μ-2σ, μ+2σ] for populations).
  • At least 8/9 of the scores fall within 3 standard deviations of the mean (within the interval [x̄-3s, x̄+3s] for samples or [μ-3σ, μ+3σ] for populations).
  • At least 15/16 of the scores fall within 4 standard deviations of the mean (within the interval [x̄-4s, x̄+4s] for samples or [μ-4σ, μ+4σ] for populations).
    ...
  • At least 1 - 1/k² of the scores fall within k standard deviations of the mean (within the interval [x̄-ks, x̄+ks] for samples or [μ-kσ, μ+kσ] for populations).

Empirical Rule

For a set of data whose frequency distribution is bell-shaped and symmetric, the following is true.

  • Approximately 68% of the scores fall within 1 standard deviation of the mean (within the interval [x̄-s, x̄+s] for samples or [μ-σ, μ+σ] for populations).
  • Approximately 95% of the scores fall within 2 standard deviations of the mean (within the interval [x̄-2s, x̄+2s] for samples or [μ-2σ, μ+2σ] for populations).
  • Approximately 99.7% of the scores fall within 3 standard deviations of the mean (within the interval [x̄-3s, x̄+3s] for samples or [μ-3σ, μ+3σ] for populations).
Example

Looking at the binomial distribution immediately above, we have

    E(X) = 2.4;
    σ(X) = √0.48 ≈ 0.69

Chebyshev's Rule now says:

    k = 2:   P(1.02 ≤ X ≤ 3.78) ≥ 0.75
    k = 3:   P(0.33 ≤ X ≤ 4.47) ≥ 0.89
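A quick check that this distribution really does satisfy Chebyshev's bounds, as a Python sketch (the helper name `prob_within` is ours):

```python
dist = {0: 0.008, 1: 0.096, 2: 0.384, 3: 0.512}
mu, sigma = 2.4, 0.48 ** 0.5

def prob_within(k):
    """P(mu - k*sigma <= X <= mu + k*sigma) for this distribution."""
    return sum(p for x, p in dist.items()
               if mu - k * sigma <= x <= mu + k * sigma)

# Chebyshev guarantees at least 1 - 1/k^2; the actual probabilities exceed it.
# k = 2: the interval is about [1.02, 3.78], which contains x = 2 and x = 3.
```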

However, we cannot apply the Empirical Rule to this distribution: its probability distribution (see the table above) is not symmetric.

Example of Empirical Rule

If the mean of a sample with a bell-shaped symmetric distribution is 20 with standard deviation s = 2, then approximately 95% of the scores lie in the interval [16, 24].

Statistics of a Binomial Distribution

If X is the number of successes in a sequence of n independent Bernoulli trials, with probability p of success in each trial and probability q = 1 - p of failure, then

μ = np
and
σ² = npq.
To find the mode of X, take (n+1)p and round down (if necessary) to get an integer. If (n+1)p is already an integer, then both (n+1)p - 1 and (n+1)p are modes.

If n is large and p is not too close to 0 or 1, the median is approximately equal to the mean, np (which will also be the mode in this case).
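The mode rule as code, checked against the distribution itself. A Python sketch (both function names are ours):

```python
from math import comb, floor

def binomial_mode(n, p):
    """Mode via the rule: floor((n+1)p); if (n+1)p is an integer,
    both (n+1)p - 1 and (n+1)p are modes (this returns the smaller)."""
    m = (n + 1) * p
    return int(m) - 1 if m == int(m) else floor(m)

def pmf_argmax(n, p):
    """Mode found directly by maximizing C(n,x) p^x q^(n-x)."""
    return max(range(n + 1),
               key=lambda x: comb(n, x) * p**x * (1 - p)**(n - x))

# Unfair coin: n = 3, p = 0.8 gives (n+1)p = 3.2, so the mode is 3.
```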

Example

Looking at the unfair coin experiment immediately above, with n = 3, p = P(heads) = 0.8 and q = P(tails) = 0.2, we find

μ = np = 3(0.8) = 2.4,
and
σ² = npq = 3(0.8)(0.2) = 0.48,
confirming the results above.

Continuous Random Variable

A continuous random variable X may take on any real value whatsoever. The probabilities P(a ≤ X ≤ b) are specified by means of a probability density curve, a curve lying above the x-axis with the total area between the curve and the x-axis being 1.

The probability P(a ≤ X ≤ b) is given by the area enclosed by the curve, the x-axis, and the lines x = a and x = b.

Examples

For a detailed discussion of several examples (the uniform, exponential, normal, and beta distributions), go to the on-line section on probability density functions.

Uniform Distribution

A finite uniform distribution is one in which all values of X are equally likely. A continuous uniform distribution is one whose probability density function is a horizontal line.

Example

The experiment: Cast a die and record the number uppermost
The random variable: X = the number uppermost

The probability distribution is then
X            1    2    3    4    5    6
Probability  1/6  1/6  1/6  1/6  1/6  1/6

Normal Random Variable

The most important kind of continuous random variable is the normal random variable: one whose probability density curve is the bell-shaped curve

    f(x) = (1/(σ√(2π))) e^(-(x-μ)²/(2σ²)),

where μ is its mean and σ its standard deviation.

Standard Normal Variable
The standard normal variable Z is a normal random variable with mean 0 and standard deviation 1. Probabilities of the form

    P(a ≤ Z ≤ b)
can be calculated with the aid of a Normal Distribution Table (also in the Appendix of Finite Mathematics).

To compute areas under normal curves without having to use a table, try our Normal Distribution Utility.

Example

If Z is the standard normal variable, then

    P(0 ≤ Z ≤ 0.5) ≈ 0.1915.
This is the area under the standard bell curve between z = 0 and z = 0.5.
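Without a table, Φ(z) = P(Z ≤ z) can be computed from the error function in Python's standard library. A sketch (the function name `phi` is ours):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal cumulative distribution, P(Z <= z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

p = phi(0.5) - phi(0)   # P(0 <= Z <= 0.5)
# p ≈ 0.1915
```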

For on-line interactive text on the role of the normal distribution in measurements of sample means, go to our on-line text: Sampling Distributions.

For a calculus-based discussion of this and other distributions (the uniform, exponential, and beta distributions), go to the on-line section on probability density functions.

More on Normal Distributions

Probability of a Normal Distribution Being within k Standard Deviations of its Mean

If X is a normal random variable with mean μ and standard deviation σ, then

P(μ-σ ≤ X ≤ μ+σ) ≈ 0.6826
P(μ-2σ ≤ X ≤ μ+2σ) ≈ 0.9545
P(μ-3σ ≤ X ≤ μ+3σ) ≈ 0.9973

Normal Approximation to a Binomial Distribution
If X is the number of successes in a sequence of n independent Bernoulli trials, with probability p of success in each trial, and if the range of values of X three standard deviations above and below the mean lies entirely within the range 0 to n (the possible values of X), then

P(a ≤ X ≤ b) is approximately equal to P(a-0.5 ≤ Y ≤ b+0.5),
where Y has a normal distribution with the same mean and standard deviation as X, that is, μ = np and σ = (npq)^(1/2) = √(np(1-p)).
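A sketch comparing the exact binomial probability with the normal approximation (including the continuity correction), for an assumed example n = 100, p = 0.5 (all names are ours). Here μ ± 3σ = 50 ± 15 lies inside [0, 100], so the approximation applies:

```python
from math import comb, erf, sqrt

def binom_prob(n, p, a, b):
    """Exact P(a <= X <= b) for binomial X."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(a, b + 1))

def normal_approx(n, p, a, b):
    """Normal approximation with continuity correction: P(a-0.5 <= Y <= b+0.5)."""
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    cdf = lambda y: 0.5 * (1 + erf((y - mu) / (sigma * sqrt(2))))
    return cdf(b + 0.5) - cdf(a - 0.5)

exact = binom_prob(100, 0.5, 45, 55)
approx = normal_approx(100, 0.5, 45, 55)
# The two values agree to about two decimal places.
```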


Last Updated: March, 20046
Copyright © 2000, 2003 Stefan Waner and Steven R. Costenoble
