Applied calculus on-line chapter: calculus applied to probability and statistics
Section 1. Continuous Random Variables and Histograms
Suppose that you have purchased stock in Colossal Conglomerate, Inc., and each day you note the closing price of the stock. The result each day is a real number X (the closing price of the stock) in the unbounded interval [0, +\infty). Or, suppose that you time several people running a 50-meter dash. The result for each runner is a real number X, the race time in seconds. In both cases, the value of X is somewhat random. Moreover, X can take on essentially any real value in some interval, rather than, say, just integer values. For this reason we refer to X as a continuous random variable. Here is the official definition:
Continuous Random Variable
A random variable is a function X that assigns to each possible outcome in an experiment a real number. If X may assume any value in some given interval I (the interval may be bounded or unbounded), it is called a continuous random variable. If it can assume only a number of separated values, it is called a discrete random variable.
If X is a random variable, we are usually interested in the probability that X takes on a value in a certain range. For instance, if X is the daily closing price of Colossal Conglomerate stock and we find that 60% of the time the price is between $10 and $20, we would say that the probability that X is between 10 and 20 is .60.
We write this statement mathematically as follows:
P(10 \leq X \leq 20) = .60
The following table shows the distribution of U.S. residents (16 years old and over) attending college in 1980 according to age:
|Number in 1980 (millions)|2.7|4.8|1.9|1.2|1.8|
Draw the probability distribution histogram for X = the age of a randomly chosen college student.
Solution A little terminology: The numbers in the bottom row are called frequencies and the given table is known as a frequency distribution. Summing the frequencies, we see that the total number of students in 1980 was 12.4 million. We can therefore convert all the data in the table to probabilities by dividing by this total.
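The conversion from frequencies to probabilities can be sketched in a few lines of Python. This is a minimal illustration, not part of the original solution: it divides each frequency in the table above by their total and rounds each result to two decimal places, which reproduces the rounding effect discussed next.

```python
# Frequencies from the 1980 table, in millions, in the order the table lists them.
frequencies = [2.7, 4.8, 1.9, 1.2, 1.8]

total = sum(frequencies)  # total number of college students, in millions
probabilities = [f / total for f in frequencies]  # relative frequencies
rounded = [round(p, 2) for p in probabilities]    # rounded to two decimals

print(f"total = {total:.1f} million")
print("rounded probabilities:", rounded)
print(f"sum of rounded probabilities = {sum(rounded):.2f}")
```

Running it reports a total of 12.4 million and rounded probabilities .22, .39, .15, .10 and .15, whose sum is 1.01 even though the unrounded probabilities sum to 1.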
The probabilities in the above table have been rounded, with the consequence that they add to 1.01 instead of the expected 1. In the category 15-19, we have actually included anyone at least 15 years old and less than 20 years old. For example, someone 19½ years old would be in this range. We would like to write 15-20 instead, but this would be ambiguous, since we would not know where to count someone who was exactly 20 years old. Now the probability that a college student is exactly 20 years old (and not, say, 20 years and 1 second) is essentially 0, so it doesn't matter (see the discussion after Example 2 below). We therefore rewrite the table with these ranges.
The table tells us that, for instance, P(15 \leq X \leq 20) = .22.
The probability distribution histogram is the bar graph we get from these data:
Before we go on... Had the grouping into ranges been finer (for instance, into divisions of 1 year instead of 5, assuming we also had the year-by-year data), the histogram would appear smoother, with lower bars, as shown below. Why?
This smoother-looking distribution suggests a smooth curve. It is this kind of curve that we shall be studying in the next section.
A survey finds the following probability distribution for the age of a rented car.
Plot the associated probability distribution histogram, and use it to evaluate (or estimate) the following:
(a) We can calculate P(0 \leq X \leq 4) from the table by adding the corresponding probabilities:
This corresponds to the shaded region of the histogram shown in the following figure.
Notice that since each rectangle has width equal to 1 unit and height equal to the associated probability, its area is equal to the probability that X is in the associated range. Thus P(0 \leq X \leq 4) is also equal to the area of the shaded region.
(b) Similarly, P(X \geq 4) is given by the area of the unshaded portion of the above figure, so
(Notice that P(0 \leq X \leq 4) + P(X \geq 4) = 1. Why?)
(c) To calculate P(2 \leq X \leq 3.5), we need to make an educated guess, since neither the table nor the histogram has subdivisions of width 0.5. Referring to the graph, we can approximate the probability by the shaded area shown below:
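One common way to make such an educated guess is to assume that the probability in each bar is spread evenly across the bar's width, so that a bar contributes the fraction of its width lying inside [a, b] times its probability. The Python sketch below applies that assumption. The function name `estimate_probability` is hypothetical, and since the rented-car table is not reproduced here, the example uses the college-age probabilities of Example 1, taking the five age groups to be consecutive 5-year ranges starting at 15 (an assumption: the text states only that the first group is 15-19 and that the divisions are 5 years wide).

```python
# Assumed bar boundaries: five consecutive 5-year age groups starting at 15.
EDGES = [15, 20, 25, 30, 35, 40]
# Rounded probabilities for the college-age histogram of Example 1.
PROBS = [0.22, 0.39, 0.15, 0.10, 0.15]


def estimate_probability(edges, probs, a, b):
    """Estimate P(a <= X <= b) from a probability histogram.

    Bar i covers [edges[i], edges[i + 1]] and has probability probs[i].
    Assumes probability is spread evenly within each bar, so a bar adds
    (width of its overlap with [a, b]) / (bar width) * (bar probability).
    """
    if b < a:
        raise ValueError("need a <= b")
    total = 0.0
    for left, right, p in zip(edges, edges[1:], probs):
        overlap = min(b, right) - max(a, left)  # zero or negative if no overlap
        if overlap > 0:
            total += p * overlap / (right - left)
    return total


print(estimate_probability(EDGES, PROBS, 15, 17.5))   # half of the first bar
print(estimate_probability(EDGES, PROBS, 17.5, 22.5))  # halves of two bars
print(estimate_probability(EDGES, PROBS, 20, 20))      # zero-width interval
```

For instance, the estimate for P(17.5 \leq X \leq 22.5) is half of .22 plus half of .39, or .305, and an interval of zero width such as [20, 20] contributes no area at all, which is the same observation made in part (d).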
(d) To calculate P(X = 4), we would need to calculate P(4 \leq X \leq 4). But this would correspond to a region of the histogram with zero area (see the figure below) so we conclude that P(X = 4) = 0.
Answer As a general rule, yes: if X is a continuous random variable, then P(X = a) = 0 for any specific value a chosen beforehand. Since X can assume infinitely many values, it is reasonable that the probability of its assuming any one particular value is zero.
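One way to see why, using the college-age data of Example 1: assume, as an educated guess like the one in part (c), that probability is spread evenly within each bar. For 0 < h \leq 5, the interval from 20 - h to 20 + h covers a fraction h/5 of the 15-20 bar and a fraction h/5 of the 20-25 bar, so

$$ P(20 - h \leq X \leq 20 + h) \approx \frac{h}{5}(.22) + \frac{h}{5}(.39) = .122h. $$

Since P(X = 20) \leq P(20 - h \leq X \leq 20 + h) for every such h, and .122h can be made as small as we like by shrinking h, the estimate for P(X = 20) is 0.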
If you wish to use a histogram to calculate probability as area, make sure that the subdivisions for X have width 1; for instance, 1 \leq X \leq 2,\ 2 \leq X \leq 3, and so on.
The first histogram in Example 1 had bars corresponding to larger ranges for X. The first bar has a width of 5 units, so its area is 5 \times .22, which is 5 times the probability that 15 \leq X \leq 20. To obtain probabilities from the areas of such a histogram, divide each area by the width of the intervals.
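With the numbers from Example 1, the calculation for the first bar is:

$$ \text{area} = 5 \times .22 = 1.10, \qquad P(15 \leq X \leq 20) = \frac{1.10}{5} = .22. $$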
There is another way, used by working statisticians, to calculate probability as area in a histogram: draw your histograms so that the heights are not necessarily the probabilities but are chosen so that the area of each bar gives the corresponding probability.
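A minimal Python sketch of this convention, using the college-age probabilities of Example 1 and assuming the five age groups are consecutive 5-year ranges starting at 15: each height is the bar's probability divided by its width, so that height times width recovers the probability. The variable names are illustrative only.

```python
# Assumed bar boundaries: five consecutive 5-year age groups starting at 15.
edges = [15, 20, 25, 30, 35, 40]
# Rounded probabilities for the college-age histogram of Example 1.
probs = [0.22, 0.39, 0.15, 0.10, 0.15]

widths = [right - left for left, right in zip(edges, edges[1:])]
# Scale each height so that height * width equals the bar's probability.
heights = [p / w for p, w in zip(probs, widths)]
areas = [h * w for h, w in zip(heights, widths)]

for left, right, h, area in zip(edges, edges[1:], heights, areas):
    print(f"{left}-{right}: height {h:.3f}, area {area:.2f}")
```

With these heights the first bar is .044 high rather than .22, so every bar is five times lower than in the original histogram, yet each bar's area equals its probability. Because each height is computed from its own bar's width, the same approach also works when the bars have unequal widths.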