1. Sampling Distributions
It is often impossible to measure the mean or standard deviation of an entire population unless the population is small or we conduct a nationwide census. The population mean and standard deviation are examples of population parameters: descriptive measurements of the entire population. Given the impracticality of measuring population parameters, we instead measure sample statistics: descriptive measurements of a sample. Examples of sample statistics are the sample mean, sample median, and sample standard deviation.
Q OK, so why not use the sample statistic as an estimate of the corresponding population parameter; for instance, why not use the sample mean as an estimate of the population mean?
A This is exactly what we do to estimate population means and medians (with a slight modification in the case of the standard deviation). However, a sample statistic (such as the sample mean) may be "all over the place," so a further question is: how confident can we be in the sample statistic?
Q Give me an example.
A If we cast a fair die and take $X$ to be the uppermost number, we know that the population mean (expected value) is $μ = 3.5,$ and that the population median is also $m = 3.5.$ But if we take a sample of, say, four throws, the mean may be far from $3.5.$ Here are the results of $5$ such samples of $4$ throws (we used a random number generator to obtain these samples):
|  | $X_1$ | $X_2$ | $X_3$ | $X_4$ | $\bar{x}$ |
| --- | --- | --- | --- | --- | --- |
| Sample 1 | $6$ | $2$ | $5$ | $6$ | $4.75$ |
| Sample 2 | $2$ | $3$ | $1$ | $6$ | $3$ |
| Sample 3 | $1$ | $1$ | $4$ | $6$ | $3$ |
| Sample 4 | $6$ | $2$ | $2$ | $1$ | $2.75$ |
| Sample 5 | $1$ | $5$ | $1$ | $3$ | $2.5$ |
Since each sample consists of $4$ throws, we say that the sample size is $n = 4.$ Notice that none of the five samples gave us the correct mean, and that the mean of the first sample is far from the actual mean.
Q The table above is interesting: look at the values of the sample mean $\bar{x}.$ The average (mean) of these means is $3.2.$ Thus, although the mean of a particular sample may not be a good predictor of the population mean, we get better results if we take the mean of a whole bunch of sample means.
A You have put your thumb on one of the most important concepts in inferential statistics: the values of $\bar{x}$ are values of a random variable (take a sample of $4$ throws and measure its mean), and its probability distribution is called the sampling distribution of the sample mean. The above table suggests that the expected value of the sampling distribution of the mean is the same as the population mean, and this turns out to be true.
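A quick way to check this numerically is to simulate the experiment. The sketch below (in Python, using only the standard `random` module; the function name, seed, and number of samples are our own choices, not part of this section) draws many samples of four throws of a fair die, computes each sample mean, and then averages those means.

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

def sample_mean(n=4):
    """Throw a fair die n times and return the mean of the uppermost numbers."""
    throws = [random.randint(1, 6) for _ in range(n)]
    return sum(throws) / n

# A handful of individual sample means, typically scattered around 3.5:
print([sample_mean() for _ in range(5)])

# The average of many sample means settles down near the population mean mu = 3.5:
num_samples = 10_000
means = [sample_mean() for _ in range(num_samples)]
print(sum(means) / num_samples)
```

With $10{,}000$ samples, the final average typically lands within a few hundredths of $3.5,$ which is exactly the pattern the table above hints at.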
Sampling Distribution
The sampling distribution of a statistic $S$ for samples of size $n$ is defined as follows. The experiment consists of choosing a sample of size $n$ from the population and measuring the statistic $S.$ The sampling distribution is the resulting probability distribution.
Quick Example
If the statistic $S$ is the sample mean $\bar{x}$ of samples of size $4$ as above, then the sampling distribution is the probability distribution of the sample means $\bar{x}.$ (We will see how to calculate such distributions below.)
Before going on to the first worked example of a sampling distribution, take a look at the following simulation of a weighted die (sample size $n = 8$). Each time you press the "New Sample" button, the imaginary die will be cast $8$ times. See if you can estimate the expected value $μ$ by repeated sampling.
Example 1 Computing a Sampling Distribution by Hand
An unfair coin has a $75\%$ chance of landing heads-up. Let $X = 1$ if it lands heads-up, and $X = 0$ if it lands tails-up. Find the sampling distribution of the mean $\bar{x}$ for samples of size $3.$
Solution The experiment consists of tossing the coin $3$ times and measuring the sample mean $\bar{x}.$ The following table shows the collection of all possible outcomes (samples) and the associated sample mean, writing H for heads ($X = 1$) and T for tails ($X = 0$):

| Outcome | $X_1$ | $X_2$ | $X_3$ | $\bar{x}$ |
| --- | --- | --- | --- | --- |
| HHH | $1$ | $1$ | $1$ | $1$ |
| HHT | $1$ | $1$ | $0$ | $2/3$ |
| HTH | $1$ | $0$ | $1$ | $2/3$ |
| HTT | $1$ | $0$ | $0$ | $1/3$ |
| THH | $0$ | $1$ | $1$ | $2/3$ |
| THT | $0$ | $1$ | $0$ | $1/3$ |
| TTH | $0$ | $0$ | $1$ | $1/3$ |
| TTT | $0$ | $0$ | $0$ | $0$ |
As the table shows, the possible values of $\bar{x}$ are $0, 1/3, 2/3,$ and $1.$ The desired sampling distribution is its probability distribution, obtained by adding the probabilities of the outcomes that give each value of $\bar{x}$ (for instance, $P(\bar{X} = 1/3) = 3(0.75)(0.25)^2 = 9/64$):

| $\bar{x}$ | $0$ | $1/3$ | $2/3$ | $1$ |
| --- | --- | --- | --- | --- |
| $P(\bar{X} = \bar{x})$ | $1/64$ | $9/64$ | $27/64$ | $27/64$ |
Note The distribution of the sample mean is essentially a binomial distribution: $\bar{x}$ is just the number of heads in the three tosses divided by $3.$ The Central Limit Theorem will tell us that, for large sample sizes, the sampling distribution of $\bar{x}$ must look more and more like a normal distribution.
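You can also verify the distribution by brute-force enumeration. Here is a minimal sketch in Python (the function name `coin_sampling_distribution` and the use of the `fractions` module are our choices): it lists all $2^3$ possible samples, weights each by its probability, and tallies the probability of each value of $\bar{x}.$

```python
from fractions import Fraction
from itertools import product

def coin_sampling_distribution(p_heads=Fraction(3, 4), n=3):
    """Sampling distribution of the sample mean for n tosses of a biased coin,
    where X = 1 for heads and X = 0 for tails."""
    dist = {}
    for outcome in product([0, 1], repeat=n):      # all 2**n possible samples
        prob = Fraction(1)
        for x in outcome:                          # independent tosses
            prob *= p_heads if x == 1 else 1 - p_heads
        xbar = Fraction(sum(outcome), n)           # sample mean of this outcome
        dist[xbar] = dist.get(xbar, Fraction(0)) + prob
    return dist

# Prints the probabilities 1/64, 9/64, 27/64, 27/64 for xbar = 0, 1/3, 2/3, 1
# (as Fraction objects).
print(coin_sampling_distribution())
```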
The next example involves sampling from a continuous distribution, and will involve using technology.
Example 2 Using Technology to Sample from a Continuous Distribution
The example with which we began this section involved taking five samples of size $n = 4$ from a discrete uniform random variable (the outcome of rolling a die). Here, we will also sample from a uniform random variable, but this time a continuous one with domain $[0, 1],$ so that the outcomes can be any number between $0$ and $1.$ For instance, a possible sample of size $n = 6$ is
$\{0.136,\ 0.397,\ 0.278,\ 0.029,\ 0.810,\ 0.496\},$
which has mean $\bar{x} = 0.358.$ If we allow decimals of arbitrary length, then the number of possible samples of size $n = 6$ is infinite, so we cannot list them all. Instead, we will let you decide how many samples to generate, and compute the resulting (experimental) probability distribution based on these samples. In a sense, this will be an approximation of the actual sampling distribution. (The larger the number of samples you use, the better the approximation.)
Q How do I use the simulation?
A Just follow these instructions:
- First, select the number of samples you would like to generate. We suggest using a fairly small number at first, such as $20.$ (You can change it later to a larger value, but don't say we didn't warn you: the larger the number of samples, the longer the wait, especially on those slow non-Macintosh machines...)
- Next, press "Generate Samples," and you will see the $20$ or so samples appear in a new window, together with the mean, $\bar{x},$ of each sample.
- Finally, press "Graph" to see the resulting probability distribution and graph of the sample means, using the following measurement classes: $0-0.1, 0.1-0.2, 0.2-0.3, ..., 0.9-1.0.$
Once you have done all that, use the distribution shown on the graph to answer questions about the sample means you have generated.
Note The histogram gives a "sample" of the actual sampling distribution; we can't produce the whole sampling distribution in the above manner, since there are, in principle, infinitely many possible samples.
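If the interactive simulation is not available, the same experiment can be reproduced with a short script. Here is a sketch in Python using only the standard library (the seed, the function name, and the printout format are our own choices); it generates samples of size $6$ from the uniform distribution on $[0, 1]$ and tallies the sample means into the measurement classes listed above.

```python
import random
from collections import Counter

random.seed(2)  # fixed seed so the run is reproducible

def uniform_sample_mean(n=6):
    """Mean of a sample of size n from the continuous uniform distribution on [0, 1]."""
    return sum(random.random() for _ in range(n)) / n

num_samples = 20  # try 20 first, then a larger number such as 2000
means = [uniform_sample_mean() for _ in range(num_samples)]

# Tally the sample means into the measurement classes 0-0.1, 0.1-0.2, ..., 0.9-1.0.
counts = Counter(min(int(m * 10), 9) for m in means)
for k in range(10):
    relative_frequency = counts.get(k, 0) / num_samples
    print(f"{k / 10:.1f}-{(k + 1) / 10:.1f}: {relative_frequency:.2f}")
```

Increasing `num_samples` gives a better and better approximation of the actual sampling distribution, which is concentrated around $0.5.$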
2. Unbiased Estimates of Population Parameters
Suppose we want to estimate the population mean from a sample of size $100.$ We could use the sample mean, or perhaps the sample median, as such an estimate. A statistic used in this way is called a point estimator of the parameter. Suppose, for instance, that we want to use the sample median as a point estimator of the population mean. How accurate is it?
First of all, there are going to be lots of different medians, corresponding to the different possible samples of size $100.$ If we knew the sampling distribution of the sample median for $n = 100,$ we could compute the expected value (mean) of this sampling distribution; that is, the expected value of the sample median. If it equals the population mean, we say that the sample median is an unbiased estimator of the population mean. Otherwise, we say that it is a biased estimator, with bias equal to the difference between the expected value of the estimator and the value of the population parameter.
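In symbols, writing $θ$ for the population parameter being estimated and $S$ for the statistic used to estimate it (notation introduced here only for this remark), the bias is
$\text{bias}(S) = E(S) - θ,$
and $S$ is an unbiased estimator of $θ$ precisely when $E(S) = θ,$ that is, when the bias is zero.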
Further, to obtain a more accurate estimate of the population parameter, we should use a sample statistic whose standard deviation (the standard deviation of its sampling distribution) is as small as possible. That way, the statistic measured from a single sample is more likely to be close to its expected value, and hence (assuming it is unbiased) to the parameter it estimates.
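To make both points concrete, here is a hypothetical illustration in Python: it estimates the sampling distributions of the sample mean and the sample median for samples of size $100$ drawn from an exponential population with mean $1$ (whose median is $\ln 2,$ about $0.693$). The choice of population, the seed, and the number of simulated samples are ours, not part of the text.

```python
import random
import statistics

random.seed(3)  # fixed seed so the run is reproducible

n, num_samples = 100, 5000
sample_means, sample_medians = [], []

for _ in range(num_samples):
    # One sample of size n from an exponential population with mean 1
    # (its population median is ln 2, about 0.693).
    sample = [random.expovariate(1.0) for _ in range(n)]
    sample_means.append(statistics.mean(sample))
    sample_medians.append(statistics.median(sample))

# Estimated expected value and spread of each sampling distribution:
print("E(sample mean):   ", round(statistics.mean(sample_means), 3))    # close to 1
print("E(sample median): ", round(statistics.mean(sample_medians), 3))  # close to 0.7, not 1
print("SD(sample mean):  ", round(statistics.stdev(sample_means), 3))
print("SD(sample median):", round(statistics.stdev(sample_medians), 3))
```

For this skewed population, the sample mean comes out (approximately) unbiased for the population mean, while the sample median does not; the printed standard deviations estimate the spread of the two sampling distributions.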
Example 3 Is the Sample Mean an Unbiased Estimator of the Population Mean?
Refer to Example 1: we toss an unfair coin with a $75\%$ chance of landing heads-up, and take $X = 1$ if it lands heads-up and $X = 0$ if it lands tails-up. Determine whether the sample mean is an unbiased estimator of the population mean.
Solution
We need to compare the population mean for $X$ with the expected value of the sampling distribution of the sample means. That is, we must compare two expected values:
$E(X) = μ$ (the expected value of $X,$ that is, the population mean)
$E(\bar{x})$ (the expected value of the sample mean $\bar{x},$ that is, of its sampling distribution)
Step 1 Compute the population mean $E(X) = μ.$
This means we must compute the average number of heads that comes up when the coin is tossed once (not three times; that was the sample size we used). The expected value of $X$ is given by
$μ = ΣxP(X=x) = 0(0.25) + 1(0.75) = 0.75.$
Step 2 Compute the expected value $E(\bar{x})$ of the sampling distribution of the sample mean.
To do this, we need the sampling distribution of the sample mean, and we already calculated that: the sampling distribution of $\bar{x}$ was found to be as shown in the following table.
| $\bar{x}$ | $0$ | $1/3$ | $2/3$ | $1$ |
| --- | --- | --- | --- | --- |
| $P(\bar{X} = \bar{x})$ | $1/64$ | $9/64$ | $27/64$ | $27/64$ |
We can now compute its expected value $E(\bar{x})$ in the usual way.
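Carrying out the computation with the probabilities in the table above gives
$E(\bar{x}) = 0\left(\tfrac{1}{64}\right) + \tfrac{1}{3}\left(\tfrac{9}{64}\right) + \tfrac{2}{3}\left(\tfrac{27}{64}\right) + 1\left(\tfrac{27}{64}\right) = \tfrac{3 + 18 + 27}{64} = \tfrac{48}{64} = 0.75.$
Since $E(\bar{x}) = 0.75 = μ,$ the sample mean is an unbiased estimator of the population mean in this example.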