1. Continuous Random Variables and Histograms

Applied calculus on-line chapter: calculus applied to probability and statistics

Section 1. Continuous Random Variables and Histograms

Suppose that you have purchased stock in Colossal Conglomerate, Inc., and each day you note the closing price of the stock. The result each day is a real number X (the closing price of the stock) in the unbounded interval [0, +\infty). Or, suppose that you time several people running a 50-meter dash. The result for each runner is a real number X, the race time in seconds. In both cases, the value of X is somewhat random. Moreover, X can take on essentially any real value in some interval, rather than, say, just integer values. For this reason we refer to X as a continuous random variable. Here is the official definition:

Continuous Random Variable

A random variable is a function X that assigns to each possible outcome in an experiment a real number. If X may assume any value in some given interval I (the interval may be bounded or unbounded), it is called a continuous random variable. If it can assume only a number of separated values, it is called a discrete random variable.

Examples

Roll a die and take X to be the number on the uppermost face. Then X is a discrete random variable with possible values 1, 2, 3, 4, 5 and 6.
Locate a star in the cosmos and take X to be its distance from the solar system in light years. Then X is a continuous random variable whose values are real numbers in the interval (0, +\infty).
Open the business section of your newspaper and take X to be the closing price of Colossal Conglomerate stock. Then X can take on essentially any positive real value, so we can think of X as a continuous random variable.
Toss a coin and take X to be 1 if the result is heads and 0 if the result is tails. Then X is a discrete random variable with values 0 and 1.
Let X be the temperature of a sick person taken with a mercury thermometer that goes from 70° to 120°. Then X is a continuous random variable with values in the interval [70, 120].

If X is a random variable, we are usually interested in the probability that X takes on a value in a certain range. For instance, if X is the daily closing price of Colossal Conglomerate stock and we find that 60% of the time the price is between $10 and $20, we would say

The probability that X is between $10 and $20 is 0.6.

We write this statement mathematically as follows.

P(10 \leq X \leq 20) = .6

We can use a bar chart, called a probability distribution histogram, to display the probabilities that X lies in selected ranges. This is shown in the following example.

Example 1 College Population by Age

The following table shows the distribution of U.S. residents (16 years old and over) attending college in 1980 according to age:

Age	15-19	20-24	25-29	30-34	≥35
Number in 1980 (millions)	2.7	4.8	1.9	1.2	1.8

Source: 1980 Census of Population, US Department of Commerce/Bureau of the Census.

Draw the probability distribution histogram for X = the age of a randomly chosen college student.

Solution A little terminology: The numbers in the bottom row are called frequencies and the given table is known as a frequency distribution. Summing the frequencies, we see that the total number of students in 1980 was 12.4 million. We can therefore convert all the data in the table to probabilities by dividing by this total.

Age	15-19	20-24	25-29	30-34	≥35
Probability	.22	.39	.15	.10	.15

The probabilities in the above table have been rounded, with the consequence that they add to 1.01 instead of the expected 1. In the category 15-19, we have actually included anyone at least 15 years old and less than 20 years old. For example, someone 19½ years old would be in this range. We would like to write 15-20 instead, but this would be ambiguous, since we would not know where to count someone who was exactly 20 years old. Now the probability that a college student is exactly 20 years old (and not, say, 20 years and 1 second) is essentially 0, so it doesn't matter (see the discussion after Example 2 below). We therefore rewrite the table with these ranges.

Age	15-20	20-25	25-30	30-35	≥35
Probability	.22	.39	.15	.10	.15

The table tells us that, for instance,

P(15 \leq X \leq 20) = .22

and

P(X \geq 35) = .15.
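The conversion from frequencies to probabilities described in the solution can be sketched in a few lines of Python. This is only an illustration: the dictionary name `frequencies` and the range labels are choices made here, not part of the original table.

```python
# Frequencies (millions of students) from the table in Example 1.
frequencies = {
    "15-20": 2.7,
    "20-25": 4.8,
    "25-30": 1.9,
    "30-35": 1.2,
    ">=35": 1.8,
}

total = sum(frequencies.values())  # about 12.4 million

# Relative frequency of each age range, rounded to two decimals as in the text.
probabilities = {age: round(n / total, 2) for age, n in frequencies.items()}

for age, p in probabilities.items():
    print(f"{age:>6}: {p:.2f}")

# Each entry is rounded separately, so the total can differ slightly from 1.
print("sum of rounded probabilities:", round(sum(probabilities.values()), 2))
```

The last line reports the sum of the rounded entries, which is where the discrepancy from 1 mentioned in the solution comes from.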

The probability distribution histogram is the bar graph we get from these data:
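A rough text rendering of such a bar graph can be produced as follows. This is a minimal sketch, not a replacement for a drawn histogram: the bars run horizontally rather than vertically, and the function name `text_histogram` and the `scale` parameter are hypothetical names introduced here.

```python
# Rounded probabilities from the table in Example 1.
probabilities = [
    ("15-20", 0.22),
    ("20-25", 0.39),
    ("25-30", 0.15),
    ("30-35", 0.10),
    (">=35", 0.15),
]

def text_histogram(rows, scale=100):
    """Return one line per range; bar length is proportional to probability."""
    lines = []
    for label, p in rows:
        bar = "#" * round(p * scale)  # e.g. 0.22 becomes 22 characters
        lines.append(f"{label:>6} | {bar} {p:.2f}")
    return lines

for line in text_histogram(probabilities):
    print(line)
```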


Before we go on... Had the grouping into ranges been finer—for instance into divisions of 1 year instead of 5 (and we had the year-by-year data as well), then the histogram would appear smoother, and with lower bars as shown below. Why?

This smoother looking distribution suggests a smooth curve. It is this kind of curve that we shall be studying in the next section.
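To see why finer grouping lowers the bars, consider the following sketch. The year-by-year data are not given in the text, so the code ASSUMES, purely for illustration, that each 5-year probability is shared equally among its five 1-year ranges. Under that assumption the bars would form steps rather than a smooth shape; the point is only that the same total probability spread over more bars makes each bar lower.

```python
# Rounded probabilities for the four bounded 5-year ranges in Example 1.
five_year = {(15, 20): 0.22, (20, 25): 0.39, (25, 30): 0.15, (30, 35): 0.10}

# ASSUMPTION (not real data): ages are spread evenly within each 5-year range.
one_year = {}
for (lo, hi), p in five_year.items():
    for age in range(lo, hi):
        one_year[(age, age + 1)] = p / (hi - lo)

tallest_five = max(five_year.values())
tallest_one = max(one_year.values())
print(f"tallest 5-year bar: {tallest_five:.3f}")
print(f"tallest 1-year bar: {tallest_one:.3f}")

# The total probability is unchanged; it is only shared among more bars.
print(f"total over 1-year bars: {sum(one_year.values()):.2f}")
```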


Example 2 Age of a Rented Car

A survey finds the following probability distribution for the age of a rented car.

Age	0-1	1-2	2-3	3-4	4-5	5-6	6-7
Probability	.20	.28	.20	.15	.10	.05	.02

Plot the associated probability distribution histogram, and use it to evaluate (or estimate) the following:

(a) P(0 \leq X \leq 4)
(b) P(X \geq 4)
(c) P(2 \leq X \leq 3.5)
(d) P(X = 4)

Solution The histogram is shown below.

(a) We can calculate P(0 \leq X \leq 4) from the table by adding the corresponding probabilities:

P(0 \leq X \leq 4) = .20 + .28 + .20 + .15 = .83

This corresponds to the shaded region of the histogram shown in the following figure.

Notice that since each rectangle has width equal to 1 unit and height equal to the associated probability, its area is equal to the probability that X is in the associated range. Thus P(0 \leq X \leq 4) is also equal to the area of the shaded region.

(b) Similarly, P(X \geq 4) is given by the area of the unshaded portion of the above figure, so

P(X \geq 4) = .10 + .05 + .02 = .17

(Notice that P(0 \leq X \leq 4) + P(X \geq 4) = 1. Why?)
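Parts (a) and (b) amount to adding up bars, which can be checked with a short sketch. The list name `probabilities` and the use of list slices are choices made here for illustration.

```python
# Probability distribution for the age of a rented car (Example 2).
# Entry i is the probability that the age lies between i and i + 1 years.
probabilities = [0.20, 0.28, 0.20, 0.15, 0.10, 0.05, 0.02]

p_0_to_4 = sum(probabilities[0:4])  # bars for 0-1, 1-2, 2-3, 3-4
p_4_up = sum(probabilities[4:])     # bars for 4-5, 5-6, 6-7

print(f"P(0 <= X <= 4) = {p_0_to_4:.2f}")
print(f"P(X >= 4)      = {p_4_up:.2f}")
print(f"sum            = {p_0_to_4 + p_4_up:.2f}")
```

The two slices together use every bar exactly once, so their sum is the total of the whole table.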

(c) To calculate P(2 \leq X \leq 3.5), we need to make an educated guess, since neither the table nor the histogram has subdivisions of width 0.5. Referring to the graph, we can approximate the probability by the shaded area shown below:

Thus,

P(2 \leq X \leq 3.5) \approx .20 + \frac{1}{2}(.15) = .275.

(d) To calculate P(X = 4), we would need to calculate P(4 \leq X \leq 4). But this would correspond to a region of the histogram with zero area (see the figure below) so we conclude that P(X = 4) = 0.
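All four parts can be viewed as areas under the histogram. The sketch below makes the "educated guess" in part (c) explicit by ASSUMING that probability is spread evenly within each bar, so that half a bar contributes half its area. The function name `area_between` is hypothetical.

```python
# Example 2 distribution: bar i covers ages i to i + 1 and has the given height.
probabilities = [0.20, 0.28, 0.20, 0.15, 0.10, 0.05, 0.02]

def area_between(a, b, heights=probabilities):
    """Area under the histogram between a and b, for bars of width 1.

    ASSUMPTION: probability is spread evenly within each bar, so a partial
    bar contributes in proportion to the fraction of its width covered.
    """
    area = 0.0
    for i, height in enumerate(heights):
        left, right = i, i + 1
        overlap = max(0.0, min(b, right) - max(a, left))
        area += height * overlap
    return area

print(round(area_between(0, 4), 3))    # part (a)
print(round(area_between(4, 7), 3))    # part (b); the table ends at age 7
print(round(area_between(2, 3.5), 3))  # part (c)
print(round(area_between(4, 4), 3))    # part (d): an interval of zero width
```

For part (d) the interval has zero width, so every overlap is zero and the area is 0, in agreement with the conclusion above.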

Question In the above example P(X = 4) was zero. Is it true that P(X = a) is zero for every number a in the interval associated with X?

Answer As a general rule, yes. If X is a continuous random variable, then X can assume infinitely many values, and so it is reasonable that the probability of its assuming any specific value we choose beforehand is zero.

Caution If you wish to use a histogram to calculate probability as area, make sure that the subdivisions for X have width 1; for instance, 1 \leq X \leq 2,\ 2 \leq X \leq 3, and so on.

The first histogram in Example 1 had bars corresponding to larger ranges for X. The first bar has a width of 5 units and a height of .22, so its area is 5 \times .22, which is 5 times the probability that 15 \leq X \leq 20. If you wish to use such a histogram to obtain probability from area, divide the area of each bar by the width of its interval.

There is another way, used by working statisticians, to calculate probability as area in a histogram: Draw your histograms so that the heights are not necessarily the probabilities but are chosen so that the area of each bar gives the corresponding probability.
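Both approaches can be compared on the four bounded ranges of Example 1. In this sketch the height of each bar in the second approach is taken to be the probability divided by the width; the names `naive_areas` and `heights` are hypothetical, and the open-ended range of 35 and over is left out because it has no finite width.

```python
# The four bounded ranges from Example 1, each 5 years wide.
ranges = [(15, 20, 0.22), (20, 25, 0.39), (25, 30, 0.15), (30, 35, 0.10)]

# Height equal to the probability: each area is 5 times the probability.
naive_areas = [(hi - lo) * p for lo, hi, p in ranges]

# Height equal to probability / width: each area equals the probability.
heights = [p / (hi - lo) for lo, hi, p in ranges]
areas = [h * (hi - lo) for h, (lo, hi, _) in zip(heights, ranges)]

for (lo, hi, p), naive, h, a in zip(ranges, naive_areas, heights, areas):
    print(f"{lo}-{hi}: p={p:.2f}  area with height p={naive:.2f}  "
          f"rescaled height={h:.3f}  rescaled area={a:.2f}")
```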
