## Sampling Distributions & The Central Limit Theorem

Miscellaneous on-line topics for Finite Mathematics

### 1. Sampling Distributions

It is often impossible to measure the mean or standard deviation of an entire population unless the population is small, or we do a nationwide census. The population mean and standard deviation are examples of population parameters--descriptive measurements of the entire population. Given the impracticality of measuring population parameters, we instead measure sample statistics--descriptive measurements of a sample. Examples of sample statistics are the sample mean, sample median, and sample standard deviation.

Q OK, so why not use the sample statistic as an estimate of the corresponding population parameter; for instance, why not use the sample mean as an estimate of the population mean?
A This is exactly what we do to estimate population means and medians (with a slight modification in the case of the standard deviation). However, a sample statistic (such as the sample mean) may be "all over the place," so a further question is: how confident can we be in the sample statistic?

Q Give me an example.
A If we cast a fair die and take $X$ to be the uppermost number, we know that the population mean (expected value) is $μ = 3.5,$ and that the population median is also $m = 3.5.$ But if we take a sample of, say, four throws, the mean may be far from $3.5.$ Here are the results of $5$ such samples of $4$ throws (we used a random number generator to obtain these samples):

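| | $X_{1}$ | $X_{2}$ | $X_{3}$ | $X_{4}$ | $\bar{x}$ |
|---|---|---|---|---|---|
| Sample 1 | $6$ | $2$ | $5$ | $6$ | $4.75$ |
| Sample 2 | $2$ | $3$ | $1$ | $6$ | $3$ |
| Sample 3 | $1$ | $1$ | $4$ | $6$ | $3$ |
| Sample 4 | $6$ | $2$ | $2$ | $1$ | $2.75$ |
| Sample 5 | $1$ | $5$ | $1$ | $3$ | $2.5$ |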

Since each sample consists of $4$ throws, we say that the sample size is $n = 4.$ Notice that none of the five samples gave us the correct mean, and that the mean of the first sample is far from the actual mean.

Q The table above is interesting: look at the values of the mean $\bar{x}.$ The average (mean) of these means is $3.2.$ Thus, although the mean of a particular sample may not be a good predictor of the population mean, we get better results if we take the mean of a whole bunch of sample means.
A You have put your finger on one of the most important concepts in inferential statistics; the values of $\bar{x}$ are values of a random variable (take a sample of $4,$ and measure the mean), and its probability distribution is called the sampling distribution of the sample mean. The above table suggests that the expected value of the sampling distribution of the mean is the same as the population mean, and this turns out to be true.
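The claim in this answer can be checked directly for the fair die. The following is a minimal sketch (the names `samples`, `mean`, and `expected_sample_mean` are ours, chosen for illustration): it recomputes the five sample means from the table and then averages $\bar{x}$ over all $6^4$ equally likely samples of four throws, using exact fractions.

```python
from fractions import Fraction
from itertools import product

# The five samples of four throws listed in the table above.
samples = [
    (6, 2, 5, 6),
    (2, 3, 1, 6),
    (1, 1, 4, 6),
    (6, 2, 2, 1),
    (1, 5, 1, 3),
]

def mean(values):
    """Exact arithmetic mean, returned as a Fraction."""
    values = list(values)
    return Fraction(sum(values), len(values))

# Mean of each sample, and the mean of those five means.
sample_means = [mean(s) for s in samples]
mean_of_means = mean(sample_means)

# Expected value of x-bar: average over all 6**4 equally likely samples.
all_samples = product(range(1, 7), repeat=4)
expected_sample_mean = mean(mean(s) for s in all_samples)

print("sample means:", [float(m) for m in sample_means])
print("mean of the five means:", float(mean_of_means))
print("expected value of x-bar:", float(expected_sample_mean))
```

The last value agrees with the population mean $μ = 3.5,$ while the mean of only five sample means is merely close to it.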

Sampling Distribution

The sampling distribution of a statistic $S$ for samples of size $n$ is defined as follows. The experiment consists of choosing a sample of size $n$ from the population and measuring the statistic $S.$ The sampling distribution is the resulting probability distribution.

Quick Example If the statistic $S$ is the sample mean $\bar{x}$ of samples of size $4$ as above, then the sampling distribution is the probability distribution of the sample means $\bar{x}.$ (We will see how to calculate such distributions below.)

Before going on to the first worked example of a sampling distribution, take a look at the following simulation of a weighted die (sample size $n = 8$). Each time you press the "New Sample" button, the imaginary die will be cast $8$ times. See if you can estimate the expected value $μ$ by repeated sampling.
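If you would like to run a similar experiment offline, here is a rough sketch of such a simulation. The weights below are hypothetical (the text does not say how the die is weighted), and the function name `new_sample` is ours; each call plays the role of one press of the "New Sample" button.

```python
import random

# Hypothetical weights: the weighting of the simulated die is not given in the text.
faces = [1, 2, 3, 4, 5, 6]
weights = [0.1, 0.1, 0.1, 0.2, 0.2, 0.3]

def new_sample(rng, n=8):
    """Cast the weighted die n times and return the sample mean."""
    throws = rng.choices(faces, weights=weights, k=n)
    return sum(throws) / n

rng = random.Random(0)  # fixed seed so that the run is repeatable

# Estimate the expected value by averaging many sample means.
sample_means = [new_sample(rng) for _ in range(2000)]
estimate = sum(sample_means) / len(sample_means)

# Expected value for the assumed weights, for comparison.
mu = sum(f * w for f, w in zip(faces, weights))

print(f"estimate from {len(sample_means)} samples: {estimate:.3f}")
print(f"expected value for the assumed weights: {mu:.3f}")
```

With only a handful of samples the estimate can be well off; it settles down as the number of samples grows.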

Example 1 Computing a Sampling Distribution by Hand

An unfair coin has a $75\%$ chance of landing heads-up. Let $X = 1$ if it lands heads-up, and $X = 0$ if it lands tails-up. Find the sampling distribution of the mean $\bar{x}$ for samples of size $3.$

Solution The experiment consists of tossing a coin $3$ times and measuring the sample mean $\bar{x}.$ The following table shows the collection of all possible outcomes (samples) and associated sample mean.

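| Outcome | $HHH$ | $HHT$ | $HTH$ | $HTT$ | $THH$ | $THT$ | $TTH$ | $TTT$ |
|---|---|---|---|---|---|---|---|---|
| Probability | $27/64$ | $9/64$ | $9/64$ | $3/64$ | $9/64$ | $3/64$ | $3/64$ | $1/64$ |
| $\bar{x}$ | $1$ | $2/3$ | $2/3$ | $1/3$ | $2/3$ | $1/3$ | $1/3$ | $0$ |

Each probability is the product of the probabilities of the three individual tosses; for instance, $P(HHT) = (3/4)(3/4)(1/4) = 9/64.$ Each value of $\bar{x}$ is the number of heads divided by $3.$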

As the table shows, the possible values of $\bar{x}$ are $0, 1/3, 2/3,$ and $1.$ The desired sampling distribution is its probability distribution, shown below. To obtain each probability, add the probabilities of all the outcomes with that value of $\bar{x}$; for instance, $P(\bar{x} = 2/3) = 9/64 + 9/64 + 9/64 = 27/64.$

| $\bar{x}$ | $0$ | $1/3$ | $2/3$ | $1$ |
|---|---|---|---|---|
| $P(\bar{x} = \bar{x})$ | $1/64$ | $9/64$ | $27/64$ | $27/64$ |

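Note Since $3\bar{x}$ is the number of heads in three tosses, which has a binomial distribution, the distribution of the sample mean is a rescaled binomial distribution. The Central Limit Theorem will tell us that, for large sample sizes, it must look more and more like a normal distribution.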
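The hand computation in Example 1 can be mirrored in a few lines of code. This is only a sketch of one way to do it (the variable names are ours): it lists all eight outcomes, multiplies the toss probabilities, and adds up the probability attached to each value of $\bar{x}.$

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

p_heads = Fraction(3, 4)                  # the coin lands heads-up 75% of the time
value = {"H": 1, "T": 0}                  # X = 1 for heads, X = 0 for tails
prob = {"H": p_heads, "T": 1 - p_heads}   # probability of each single toss

# Accumulate the probability of each possible sample mean.
sampling_distribution = defaultdict(Fraction)
for outcome in product("HT", repeat=3):
    p = prob[outcome[0]] * prob[outcome[1]] * prob[outcome[2]]
    x_bar = Fraction(sum(value[c] for c in outcome), 3)
    sampling_distribution[x_bar] += p

for x_bar in sorted(sampling_distribution):
    print(x_bar, sampling_distribution[x_bar])
```

Changing `repeat=3` (and the divisor `3`) to another sample size gives the sampling distribution for that size, although the number of outcomes doubles with each extra toss.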

The next example involves sampling from a continuous distribution, and will involve using technology.

Example 2 Using Technology to Sample from a Continuous Distribution

The example with which we began this section involved taking five samples of size $n = 4$ from a finite uniform random variable (the outcome of rolling a die). Here, we will also sample from a uniform random variable, but this time we use the continuous random variable with domain $[0, 1],$ so that the outcomes can be any number between $0$ and $1.$ For instance, a possible sample of size $n = 6$ is

$\{0.136, 0.397, 0.278, 0.029, 0.810, 0.496\},$

which has mean $\bar{x} = 0.358.$ If we allow decimals of arbitrary length, then the number of possible samples of size $n = 6$ is infinite, so we cannot list them all. Instead, we will let you decide how many samples to generate, and compute the resulting (experimental) probability distribution based on these samples. In a sense, this will be an approximation of the actual sampling distribution. (The larger the number of samples you use, the better the approximation.)

Q How do I use the simulation?
A Just follow these instructions:
1. First, select the number of samples you would like to generate. We suggest using a fairly small number at first, such as $20.$ (You can change it later to a larger value, but don't say we didn't warn you: the larger the number of samples, the longer the wait, especially on those slow non-Macintosh machines...)
2. Next, press "Generate Samples," and you will see the $20$ or so samples appear in a new window, together with the mean, $\bar{x},$ of each sample.
3. Finally, press "Graph" to see the resulting probability distribution and graph of the sample means, using the following measurement classes: $0-0.1, 0.1-0.2, 0.2-0.3, ..., 0.9-1.0.$
4. Once you have done all that, you should then answer some questions based on the sampling distribution you have generated!


Now use the distribution (on the graph) to answer the following questions:

Q What is the expected value of $\bar{x}$ based on your experimental data? (Use the midpoint of each measurement class in setting up your calculation -- see the graph you generated for the probabilities.)
A

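Q What is the theoretical expected value of $\bar{x}$?
A The expected value of $\bar{x}$ equals the population mean, which for the uniform distribution on $[0, 1]$ is $0.5.$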

Note The histogram gives a "sample" of the actual sampling distribution; we can't produce the whole sampling distribution in the above manner, since there are, in principle, infinitely many possible samples.
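The experiment in Example 2 can also be sketched in code. The following is an illustration under our own assumptions, not the original simulation: the function names are hypothetical, the samples come from Python's pseudo-random generator, and the measurement classes are the ten classes $0-0.1, ..., 0.9-1.0$ used above.

```python
import random

def generate_sample_means(num_samples, n=6, seed=None):
    """Draw num_samples samples of size n from [0, 1) and return their means."""
    rng = random.Random(seed)
    return [sum(rng.random() for _ in range(n)) / n for _ in range(num_samples)]

def class_probabilities(sample_means, num_classes=10):
    """Relative frequency of the sample means in each measurement class."""
    counts = [0] * num_classes
    for m in sample_means:
        # Clamp so that a mean of exactly 1.0 would fall in the last class.
        index = min(int(m * num_classes), num_classes - 1)
        counts[index] += 1
    return [c / len(sample_means) for c in counts]

def midpoint_expected_value(probabilities):
    """Expected value computed from class midpoints 0.05, 0.15, ..., 0.95."""
    width = 1 / len(probabilities)
    return sum((i + 0.5) * width * p for i, p in enumerate(probabilities))

means = generate_sample_means(2000, seed=1)
probs = class_probabilities(means)
estimate = midpoint_expected_value(probs)

print("experimental distribution:", [round(p, 3) for p in probs])
print(f"experimental expected value of x-bar: {estimate:.3f}")
```

Try a small number of samples such as $20$ first, then a larger one, and compare the experimental expected value with the theoretical value each time.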

### 2. Unbiased Estimates of Population Parameters

Suppose we want to estimate the population mean from a sample of $100.$ We could use the sample mean, or perhaps the sample median, as such an estimate. Such an estimate is called a point estimator. Suppose, for instance, that we want to use the sample median as a point estimator of the population mean. How accurate is it?

First of all, there are going to be lots of different medians corresponding to the different samples of $100.$ If we knew the sampling distribution of the sample median with $n = 100,$ we could compute the expected value (mean) of this sampling distribution. That is, we can compute the expected value of the sample median. If it equals the population mean, we would say that the sample median is an unbiased estimator of the population mean. Otherwise, we say that it is a biased estimator with bias equal to the difference between the expected value of the estimator and the value of the population parameter.
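To see what a biased estimator looks like, here is a small illustration of our own (it is not one of the text's worked examples). It uses the unfair coin of Example 1 with samples of size $3$ rather than $100,$ so that every sample can be listed, and computes the expected value of the sample median exactly.

```python
from fractions import Fraction
from itertools import product
from statistics import median

# Unfair coin of Example 1: X = 1 (heads) with probability 3/4, X = 0 otherwise.
p = {1: Fraction(3, 4), 0: Fraction(1, 4)}
population_mean = sum(x * px for x, px in p.items())

# Expected value of the sample median over all samples of size 3.
expected_median = Fraction(0)
for sample in product(p, repeat=3):
    prob = p[sample[0]] * p[sample[1]] * p[sample[2]]
    expected_median += median(sample) * prob

bias = expected_median - population_mean

print("population mean:", population_mean)
print("expected value of the sample median:", expected_median)
print("bias:", bias)
```

For this coin the bias is not zero, so the sample median is a biased estimator of the population mean here.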

Further, in order to obtain a more accurate estimate of the population parameter, we should use a sample statistic whose standard deviation (the standard deviation of its sampling distribution) is as small as possible. In this way, the statistic of a single sample is more likely to be close to the expected value.

Example 3 Is the Sample Mean an Unbiased Estimator of the Population Mean?

Refer to Example 1: $X$ is the number of heads when we toss an unfair coin (with a $75\%$ chance of heads coming up). That is, $X = 1$ if it's a head and $X = 0$ if it's a tail. Determine whether the sample mean is an unbiased estimator of the population mean.

Solution We need to compare the population mean for $X$ with the expected value of the sampling distribution of the sample means. That is, we must compare two expected values:
$E(X) = μ =$ (expected value of $X$)
$E(\bar{x}) =$ (expected value of $\bar{x}$)

Step 1 Compute the population mean $E(X) = μ.$
This means we must compute the average number of heads that comes up when a coin is tossed (not three times, which is the sample size we used, but once). The expected value of $X$ is given by $μ = ΣxP(X=x) = 0(0.25) + 1(0.75) = 0.75.$

Step 2 Compute the expected value $E(\bar{x})$ of the sampling distribution of the sample mean.
To do this, we need the sampling distribution of the sample mean, and we already calculated that: the sampling distribution of $\bar{x}$ was found to be as shown in the following table.

| $\bar{x}$ | $0$ | $1/3$ | $2/3$ | $1$ |
|---|---|---|---|---|
| $P(\bar{x} = \bar{x})$ | $1/64$ | $9/64$ | $27/64$ | $27/64$ |

We can now compute its expected value $E(\bar{x})$ in the usual way, by multiplying each value of $\bar{x}$ by its probability and adding the results.

| $\bar{x}$ | $0$ | $1/3$ | $2/3$ | $1$ |
|---|---|---|---|---|
| $P(\bar{x} = \bar{x})$ | $1/64$ | $9/64$ | $27/64$ | $27/64$ |
| $\bar{x}P(\bar{x} = \bar{x})$ | $0$ | $3/64$ | $18/64$ | $27/64$ |

Sum $= E(\bar{x}) = 48/64 = 0.75$

Since $E(\bar{x})$ is the same as the population mean $E(X),$ the estimator is unbiased.

Note The following results can be proved.
1. The sample mean is always an unbiased estimator of the population mean, regardless of the distribution or the sample size!
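2. The sample variance (the square of the sample standard deviation; recall that it uses a different formula from the population variance) is always an unbiased estimator of the population variance, again regardless of the distribution or the sample size! That is why we used $n-1$ instead of $n$ in the formula for sample standard deviation; if we used the same formula as for the population standard deviation, the resulting variance would have been a biased estimator. (The sample standard deviation itself, being a square root, is in general a slightly biased estimator of the population standard deviation, but the bias becomes small for large samples.)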
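The role of $n-1$ can be checked exactly in a small case. The sketch below is our own illustration: it reuses the unfair coin of Example 1 with samples of size $3$ and computes the expected value of the sample variance twice, once dividing by $n-1$ and once dividing by $n.$ The function name `expected_sample_variance` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Unfair coin of Example 1: X = 1 with probability 3/4, X = 0 with probability 1/4.
p = {1: Fraction(3, 4), 0: Fraction(1, 4)}
n = 3

mu = sum(x * px for x, px in p.items())
population_variance = sum((x - mu) ** 2 * px for x, px in p.items())

def expected_sample_variance(divisor):
    """Expected value, over all samples of size n, of sum((x - x_bar)^2) / divisor."""
    total = Fraction(0)
    for sample in product(p, repeat=n):
        prob = Fraction(1)
        for x in sample:
            prob *= p[x]
        x_bar = Fraction(sum(sample), n)
        squared_deviations = sum((x - x_bar) ** 2 for x in sample)
        total += prob * squared_deviations / divisor
    return total

with_n_minus_1 = expected_sample_variance(n - 1)
with_n = expected_sample_variance(n)

print("population variance:", population_variance)
print("expected sample variance, dividing by n - 1:", with_n_minus_1)
print("expected sample variance, dividing by n:", with_n)
```

Dividing by $n-1$ reproduces the population variance exactly, while dividing by $n$ gives a value that is too small.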

Following is a summary of the properties of the sampling distribution.

Properties of the Sampling Distribution

1. Mean:
 Mean of the sampling distribution $=$ Population mean: $μ_{\bar{x}} = μ$
2. Standard deviation:
 Standard deviation of the sampling distribution $=$ (Population standard deviation) / (Square root of sample size): $σ_{\bar{x}} = \frac{σ}{\sqrt{n}}$
3. If the population distribution is normal, then so is the sampling distribution of $\bar{x}.$
4. The Central Limit Theorem If the population distribution is arbitrary (not necessarily normal) with mean $μ$ and standard deviation $σ,$ then, for sufficiently large $n,$ the sampling distribution of $\bar{x}$ is approximately normal, with mean

$μ_{\bar{x}} = μ$

and standard deviation

$σ_{\bar{x}} = \frac{σ}{\sqrt{n}}.$
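Properties 1 and 2 can be verified for the sampling distribution found in Example 1 (the unfair coin, $n = 3$). The following sketch is only a check on that one case, with helper names of our own choosing; it computes the mean and variance of both the population distribution and the sampling distribution.

```python
from fractions import Fraction
from math import isclose, sqrt

n = 3

# Population distribution of X and the sampling distribution of x-bar (Example 1).
population = {0: Fraction(1, 4), 1: Fraction(3, 4)}
sampling = {
    Fraction(0): Fraction(1, 64),
    Fraction(1, 3): Fraction(9, 64),
    Fraction(2, 3): Fraction(27, 64),
    Fraction(1): Fraction(27, 64),
}

def mean_and_variance(distribution):
    """Mean and variance of a finite distribution given as {value: probability}."""
    mean = sum(x * p for x, p in distribution.items())
    variance = sum((x - mean) ** 2 * p for x, p in distribution.items())
    return mean, variance

mu, var = mean_and_variance(population)
mu_xbar, var_xbar = mean_and_variance(sampling)
sigma = sqrt(var)
sigma_xbar = sqrt(var_xbar)

print("population mean and mean of x-bar:", mu, mu_xbar)
print("sigma / sqrt(n):", sigma / sqrt(n))
print("standard deviation of x-bar:", sigma_xbar)
```

Both properties hold exactly here even though $n = 3$ is far too small for the normal approximation in property 4.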

Q So the Central Limit Theorem says that the means of large samples are always normally distributed. How large is large?
A This is actually a difficult question to answer: the larger the sample size, the closer the sampling distribution is to being normal. There is no exact point where we can say that the sample size is large enough to warrant an assumption that the sampling distribution is normal. In practice, we tend to use $n = 30$ as a rule-of-thumb cutoff point: for samples of size $n ≥ 30,$ we assume that the sampling distribution is approximately normal; otherwise we do not.
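One rough way to watch a sampling distribution approach normality is to track its skewness, which is $0$ for any normal distribution. Skewness is our choice of measure here, not one the text uses, and a small skewness does not by itself guarantee normality. The sketch below computes it exactly for the sample mean of the unfair coin ($75\%$ heads) at several sample sizes.

```python
from math import comb

def sample_mean_skewness(n, p=0.75):
    """Skewness of the sampling distribution of x-bar for n tosses of the coin."""
    # x-bar = k/n when k of the n tosses are heads; k is binomially distributed.
    values = [k / n for k in range(n + 1)]
    probs = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(v * q for v, q in zip(values, probs))
    variance = sum((v - mean) ** 2 * q for v, q in zip(values, probs))
    third_moment = sum((v - mean) ** 3 * q for v, q in zip(values, probs))
    return third_moment / variance**1.5

skewness = {n: sample_mean_skewness(n) for n in (3, 10, 30, 100)}

for n, s in skewness.items():
    print(f"n = {n:3d}: skewness = {s:.4f}")
```

The skewness shrinks toward $0$ as $n$ grows, but it never reaches $0$ at any finite sample size, which is one reason no exact cutoff exists.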

The following illustration shows how the sample size affects the shape of the sampling distribution.

Q What if the original distribution (of $X$) is normal?
A In that case, the sample mean is normally distributed, no matter how large (or small) the sample size.

Example 4 Using the Central Limit Theorem

A lightbulb manufacturer claims that the lifespan of its lightbulbs has a mean of $54$ months and a standard deviation of $6$ months. Your consumer advocacy group tests $50$ of them. Assuming the manufacturer's claims are true, what is the probability that it finds a mean lifetime of less than $52$ months?

Solution In symbols, we are seeking $P(\bar{x} ≤ 52).$ Now, $\bar{x}$ is approximately normally distributed by the Central Limit Theorem (the sample size $n = 50$ is at least $30$), and has a mean of $μ_{\bar{x}} = μ = 54$ and a standard deviation of $σ_{\bar{x}} = 6/\sqrt{50} ≈ 0.85$ months. To find the required probability, we need to convert to $z$-scores:
$z = \frac{\bar{x} - μ_{\bar{x}}}{σ _{\bar{x}}} = \frac{52 - 54}{0.85} ≈ -2.35$

Thus, we must use the tables to find $P(Z ≤ -2.35).$ If we look at a sketch of this, remembering that the table only gives $P(0 ≤ Z ≤ z),$ we compute

$0.5 - P(0 ≤ Z ≤ 2.35) = 0.5 - 0.4906 = 0.0094.$
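If software is available, the table lookup can be cross-checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module; the variable names are ours. It computes the probability once from the rounded $z$-score $-2.35$ and once without any intermediate rounding.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 54, 6, 50            # claimed mean, claimed standard deviation, sample size

sigma_xbar = sigma / sqrt(n)        # standard deviation of the sample mean
z = (52 - mu) / sigma_xbar          # z-score of a sample mean of 52 months

# Probability using the rounded z-score from the text.
p_table = NormalDist().cdf(-2.35)

# Probability without rounding the standard deviation or the z-score.
p_exact = NormalDist(mu, sigma_xbar).cdf(52)

print(f"sigma_xbar = {sigma_xbar:.4f}, z = {z:.4f}")
print(f"P(Z <= -2.35) = {p_table:.4f}")
print(f"P(x-bar <= 52) without rounding = {p_exact:.4f}")
```

The two probabilities differ slightly in the fourth decimal place because the text rounds $σ_{\bar{x}}$ to $0.85$ and $z$ to $-2.35$ before using the table.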

Thus, the probability of this happening is $0.0094,$ or $0.94\%.$ In other words, we can be $99.06\%$ certain that this won't happen (if the manufacturer's claim is correct!).

Last Updated: February, 1998
Copyright © 1998 Stefan Waner and Steven R. Costenoble