Day By Day
Notes for MATH 201
Fall 2006
Activity: Go over syllabus. Take roll. Overview examples: Gilbert trial, election polls, spam
filters
Goals: Review
course objectives: collect data, summarize information, make
inferences.
Reading: To The Student, pages
xxxi-xxxiv.
Activity: Discussion of variables and
graphs. From a list of numbers,
communicate the important information to the person next to you. (Work in pairs or groups.) For your list of numbers, make a
frequency table and a histogram.
Useful commands for the calculator:
STAT EDIT (use one of the lists to enter data, L1 for example; the other L's can be used too.)
2nd STATPLOT 1 On (Use this screen to designate the plot settings. You can have up to three plots on the screen at once. For now we will only use one at a time.)
ZOOM 9 (This command centers the window around your data.)
In your description to your
neighbor, keep in mind these terms:
symmetry, skew, center, spread, mode, outlier. Also make sure that you try different window settings for
your histogram.
Goals: Begin
graphical summaries (describing data with pictures). Be able to use the calculator to make a
histogram.
Skills:
…
Identify types of
variables. To choose the proper graphical displays, it
is important
to be able to differentiate between Categorical (or Qualitative) and
Quantitative (or Numerical) variables.
…
Be familiar with
types of graphs. To graph categorical variables we use bar graphs or
pie graphs. To graph numerical
variables, we use histograms, stem plots, or QUANTILE (a TI-83 program we will explore on Day 3). In practice, most of our variables will
be numerical but it is still important to choose the right
display.
…
Summarize data into a
frequency table. The easiest way to make a frequency table is to first
use the TI-83 to make a histogram and then to TRACE over the boxes and record the classes and
counts. You can control the size
and number of the classes with Xscl
and Xmin
in the WINDOW menu. The decision as to
how many classes to create is arbitrary; there isn't a "right"
answer, or rather all choices of Xscl
and Xmin are "right"
answers. One popular suggestion is
try the square root of the number of data values. For example, if there are 25 data points, use 5
intervals. If there are 50 data points, try 7
intervals. This is a rough rule;
you should experiment with it. The
TI-83 has a rule for doing this; I do not know what its rule is. You should develop your intuition by
changing the interval width Xscl and
starting point Xmin and see what happens to
the display.
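If you would like to see the same bookkeeping outside the calculator, here is a minimal Python sketch of the idea (the data values, Xmin, and Xscl below are hypothetical):
    # Count data values into classes of width xscl starting at xmin,
    # mimicking TRACE over a TI-83 histogram.
    data = [12, 15, 9, 21, 18, 14, 25, 11, 16, 19]   # hypothetical values
    xmin, xscl = 5, 5                                # class start and width
    counts = {}
    for x in data:
        k = int((x - xmin) // xscl)                  # which class x falls in
        lo = xmin + k * xscl
        counts[(lo, lo + xscl)] = counts.get((lo, lo + xscl), 0) + 1
    for (lo, hi), count in sorted(counts.items()):
        print(f"[{lo}, {hi}): {count}")
Changing xmin or xscl and re-running shows the same sensitivity to class choices described above.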
…
Know how to create
and interpret graphs for categorical variables. The two main
graphs for categorical variables are pie graphs and bar charts. Pie graphs are difficult to make by
hand, but popular on computer programs like Excel. Bar charts are also common on spreadsheets. Data represented by pie graphs and bar
charts usually are expressed as percents of the whole; thus they add to
100%. The ordering of categories
is arbitrary; therefore concepts such as skew and center make no sense.
Reading: Section 1.1. (Skip Time Plots and
Time Series.)
Activity: Use the "Arizona Temps" dataset to practice creating
the histograms, stem plots, and quantile plots for several lists. Compare and interpret the graphs. Identify shape, center, and spread.
QUANTILE is a program I wrote that plots the sorted data in a
list and "stacks" them up.
This is also known as a quantile plot. Basically we are graphing the data value versus its rank, or
percentile, in the dataset. The
syntax is PRGM EXEC QUANTILE ENTER.
Answer these questions:
1) Do any of the lists have
outliers?
2) What information does the stem
plot show that the histogram hides?
3) What information does the
quantile plot show that the stem plot hides?
Goals: Be able
to use the calculator to make (and be able to interpret) a quantile plot, using
the program QUANTILE. Be able to make a stem plot
by hand.
Skills:
…
Use the
TI-83 to create
an appropriate histogram or quantile plot. STAT PLOT and QUANTILE are our two main tools for viewing distributions of
data. Histograms are common
displays, but have flaws; the choice of class width is troubling as it is not
unique. The quantile plot is more
reliable, but less common. For
interpretation purposes, remember that in a histogram tall boxes represent
places with lots of data, while in a quantile plot those same high-density data
places are represented by steepness.
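For the curious, here is a short Python sketch of the idea behind QUANTILE (my own illustration with made-up data, not the program's actual code):
    # Quantile plot: pair each sorted data value with its percentile.
    data = [63, 58, 71, 66, 60, 75, 68, 64]     # hypothetical values
    n = len(data)
    for rank, value in enumerate(sorted(data), start=1):
        pct = (rank - 0.5) / n                  # one common percentile convention
        print(f"value {value}: percentile {pct:.2f}")
    # Plotting (pct, value) pairs gives the quantile plot; steep runs of
    # points mark the high-density regions of the data.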
…
Create a stem plot by
hand. The stem plot is a convenient manual display; it is
most useful for small datasets, but not all datasets make good stem plots. Choosing the "stem" and
"leaves" to make reasonable displays will require some practice. Some notes for proper choice of stems:
if you have many empty rows, you have too many stems. Move one column to the left and try again. If you have too few rows (all the data
is on just one or two stems) you have too few stems. Move to the right one digit and try again. Some datasets will not give good
pictures for any choice of stem, and some benefit from splitting or rounding
(see the example on page 13).
…
Describe shape,
center, and spread.
From each of our graphs, you should be able to make
general statements about the shape, center, and spread of the distribution of
the variable being explored. One
of the main conclusions we want to make about lists of data when we are doing
inference (Chapters 6 to 8) is whether the data is close to symmetric; many
times "close enough" is, well, close enough! We will discuss this in more detail
when we see the Central Limit Theorem in Chapter 5.
Reading: Section 1.2.
Activity: Dance Fever example. Use the "Arizona Temps" dataset to calculate the mean,
the standard deviation, the 5-number summary, and the associated box plot for
any of the variables.
Compare these measures with the corresponding histograms and quantile plots you
did on Day 2. Note the
similarities (where the data values are dense, and where they are sparse) but
especially note the differences.
The box plots and numerical measures cannot describe shape very
well. The histograms are hard to
use to compare two lists. The stem
and leaf is difficult to modify.
Answer these questions:
1) Are high and low temperatures
distributed the same way, other than the obvious fact that highs are higher
than lows?
2) How does a single case affect the
calculator's routines? (What if we
had had an outlier?)
3) What information does the box
plot disguise?
To calculate our summary statistics, we will use 1-Var Stats (to use List 1) or 1-Var Stats L2 for List 2, for example. There are two screens of output; we will be mostly concerned
with the mean x̄, the standard deviation Sx, and the five-number summary on screen two.
Goals: Compare
numerical measures of center.
Summarize data with numerical measures and box plots. Compare these new measures with the
histograms, stem plots, and quantile plots you made on Day
3.
Skills:
…
Understand the effect
of outliers on the mean.
The mean (or average) is unduly influenced by outlying
(unusual) observations. Therefore,
knowing when your distribution is skewed is
helpful.
…
Understand the effect
of outliers on the median. The median is almost completely
unaffected by outliers. For
technical reasons, though, the median is not as common in scientific
applications as the mean.
…
Use the TI-83 to
calculate summary statistics.
Calculating may be as simple as entering numbers into
your calculator and pressing a button.
Or, if you are doing some things by hand, you may have to organize
information the correct way, such as listing the numbers from low to high. On the TI-83, the numerical measures
are calculated using STAT CALC 1-Var
Stats L#. Please
get used to using the statistical features of your calculator to produce the
mean. While I know you can
calculate the mean by simply adding up all the numbers and dividing by the
sample size, you will not be in the habit of using the full features of your
machine, and later on you will be missing out.
…
Compare several lists
of numbers using box plots.
For two lists, the best simple approach is the
back-to-back stem plot. For more
than two lists, I suggest trying box plots, side-by-side, or stacked. At a glance, then, you can assess which
lists have typically larger values or more spread out values,
etc.
…
Understand box
plots. You should know that the box plots for some lists
don't tell the interesting part of those lists. For example, box plots do not describe shape very well; you can only see where the
quartiles are. Alternatively, you
should know that the box plot can
be a very good first quick look.
Reading: Section 1.2.
Activity: Create the following lists:
1) A list of 10 numbers that has
only one number below the mean.
2) A list of 10 numbers that has the
standard deviation greater than the mean.
3) A list of 10 numbers that has a
standard deviation of zero.
For your fourth list start with any 21 numbers. Find a number N
such that 14 of the numbers in your list are within N of the average.
For example, pick a number N
(say 4), calculate the average plus 4, the average minus 4, and count how many
numbers in your list are between those two values. If the count is less than 14, try a larger number
for N (bigger than 4). If the count is more than 14, try a smaller number
for N (smaller than 4).
Finally, compare the standard deviation to the Inter Quartile Range (IQR = Q3 -
Q1).
(You may use any extra time today to discuss Presentation 1 in your groups.)
Goals: Interpret
standard deviation as a measure of spread.
Skills:
…
Understand standard
deviation. At first, standard deviation will seem foreign to you,
but I believe that it will make more sense the more you become familiar with
it. In its simplest terms, the
standard deviation is a non-negative number that measures how "wide" a
dataset is. One common
interpretation is that the range of a dataset is 4 standard deviations. Another interpretation is that the
standard deviation is roughly ¾ times IQR. Eventually we will use the standard deviation in our
calculations for statistical inference; until then, this measure is just
another summary statistic, and getting used to this number is your goal. The normal curve of the next section
will further help us understand standard deviation.
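Here is a minimal Python sketch checking both rules of thumb on one hypothetical dataset:
    # Rules of thumb: range is about 4 sd; sd is about 3/4 of the IQR.
    import statistics
    data = [4, 7, 8, 9, 10, 10, 11, 12, 13, 16]    # hypothetical values
    s = statistics.stdev(data)                     # sample standard deviation
    q = statistics.quantiles(data, n=4)            # [Q1, median, Q3]
    print("sd:", s)
    print("range / 4:", (max(data) - min(data)) / 4)
    print("0.75 * IQR:", 0.75 * (q[2] - q[0]))
Neither rule is exact; they are sanity checks for a standard deviation you have already computed.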
Reading: Section 1.3.
Activity: Introduce the TI-83's normal
calculations.
Homework 1 due.
DISTR
normalcdf( lower, upper ) calculates the
area under a normal curve between lower
and upper. If you specify just 2 values, mean 0
and standard deviation 1 are assumed.
If you want a different mean or standard deviation, add a third and
fourth parameter.
Example: DISTR normalcdf(
-10, 20, 5, 10 ) finds the area between
-10 and +20 on a normal curve with mean 5 and standard deviation 10
while DISTR normalcdf(
-2, 2 ) finds the area on the standard normal curve between -2 and
+2.
DISTR
invNorm( works backwards, but
only gives upper as an answer.
It is also referred to as a percentile. The 90th percentile is the point below which 90 % of the observations fall. The
syntax is DISTR invNorm( .90 ) or DISTR invNorm(
.90, 5, 10 ) ; the first example assumes
the standard normal curve and reports the 90th percentile. The second example uses a mean of 5 and
a standard deviation of 10 and also reports the 90th percentile.
Note that if the desired area is above a certain number, you will have to use subtraction or
symmetry, as DISTR
invNorm( only
reports values below, or to the left.
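If you want a second tool to check the calculator against, here are equivalent calls in Python (scipy is assumed, as one possible choice):
    # Python equivalents of the TI-83's normalcdf( and invNorm( commands.
    from scipy.stats import norm
    # normalcdf( -10, 20, 5, 10 ): area between -10 and 20, mean 5, sd 10
    print(norm.cdf(20, loc=5, scale=10) - norm.cdf(-10, loc=5, scale=10))
    # normalcdf( -2, 2 ): standard normal area between -2 and 2, about 0.954
    print(norm.cdf(2) - norm.cdf(-2))
    # invNorm( .90 ): standard normal 90th percentile, about 1.28
    print(norm.ppf(0.90))
    # invNorm( .90, 5, 10 ): 90th percentile of a normal with mean 5, sd 10
    print(norm.ppf(0.90, loc=5, scale=10))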
Goals: Introduce
normal curve. Use TI-83 in place
of the standard normal table in the text.
Skills:
…
Know what a
z-score is (standardization).
Sometimes, instead of knowing a variable's actual value, we
are only interested
in how far above or below average it is.
This information is contained in the z-score.
Negative values indicate a below average observation, while positive
values are above average. If the
list follows a normal distribution (the familiar "bell-shaped" curve)
then it will be relatively rare to have values below -2 or above +2 (only about
5 % of cases). Even if the list is
not normal, surprisingly the z-score
still tends to have few values beyond ±2, although this is not
guaranteed.
…
Using the TI-83 to
find areas under the normal curve.
When we have a distribution
that can be approximated with the bell-shaped normal curve, we can make
accurate statements about frequencies and percentages by knowing just the mean
and the standard deviation of the data.
Our TI-83 has 2 functions, DISTR normalcdf( and DISTR invNorm(
which allow us to calculate these percentages more easily and more accurately
than the table in the text. We use
DISTR
normalcdf( when we want the percentage as
an answer and we use DISTR invNorm( when we already
know the percentage but not the value that gives that percentage.
Reading: Section 1.3.
Activity: Practice normal calculations.
1) Suppose SAT scores are
distributed normally with mean 800 and standard deviation 100. Estimate the chance that a randomly
chosen score will be above 720.
Estimate the chance that a randomly chosen score will be between 800 and
900. The top 20% of scores are
above what number? (This is called
the 80th percentile.)
2) Find the Inter Quartile Range
(IQR) for the standard normal (mean 0, standard deviation 1). Compare this to the standard deviation
of 1.
3) Women aged 20 to 29 have
normally distributed heights with mean 64 and standard deviation 2.7. Men have mean 69.3 with standard
deviation 2.8. What percent of
women are taller than the average man, and what percentage of men are taller
than the average woman?
4) Pretend we are manufacturing
fruit snacks, and that the average weight in a package is .92 ounces with
standard deviation 0.05. What
should we label the net weight on the package so that only 5 % of packages are
"underweight"?
5) Suppose that your average
commute time to work is 20 minutes, with standard deviation of 2 minutes. What time should you leave home to
arrive to work on time at 8:00?
(You may have to decide a reasonable value for the chance of being
late.)
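As a check on your calculator work, here is a sketch of problem 1 in Python (scipy assumed):
    # Problem 1: SAT scores, mean 800, standard deviation 100.
    from scipy.stats import norm
    print(1 - norm.cdf(720, loc=800, scale=100))    # P(score > 720), about 0.79
    print(norm.cdf(900, loc=800, scale=100)
          - norm.cdf(800, loc=800, scale=100))      # P(800 < score < 900), about 0.34
    print(norm.ppf(0.80, loc=800, scale=100))       # 80th percentile, about 884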
Goals: Master
normal calculations. Realize that
summarizing using the normal curve is the ultimate reduction in complexity, but
only applies to data whose distribution is actually
bell-shaped.
Skills:
…
Memorize 68-95-99.7
rule. While we do rely on our technology to calculate areas
under normal curves, it is convenient to have some of the values committed to
memory. These values can be used
as rough guidelines; if precision is required, you should use the TI-83
instead. I will assume you know
these numbers by heart when we encounter the normal numbers again in chapters 5
through 8.
…
Understand that
summarizing with just the mean and standard deviation is a special case. We
have progressed from pictures like histograms and quantile plots to summary
statistics like medians, means, and standard deviations to finally summarizing
an entire list with just two numbers: the mean and the standard deviation. However, this last step in our
summarization only applies to
lists whose distribution resembles the bell-shaped normal curves. If the data's distribution is skewed,
or has any other shape, this level of summarization is insufficient. Also, it is important to realize that
these calculations are only approximations.
…
Interpret a normal
quantile plot. We often want to know if a list of data can be
approximated with a normal curve.
While we might try histograms and quantile plots to see if
they "look
normal", it is a difficult task, because we have to match the shape to the
very special shape of the normal curve.
One simple alternative graphical method is the normal
quantile plot. This
plot is nearly identical to a quantile plot, but instead of graphing the
percentiles, we graph the z-scores. Our TI-83 does this for us; it is the sixth icon under Type in the STAT PLOT menu. Be cautious though; the graph, as
usual, is unlabeled. However, we
only care if the graph is nearly a straight line or not.
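Sketched in Python (made-up data, scipy assumed), the coordinates of a normal quantile plot are:
    # Normal quantile plot: sorted data versus normal z-scores.
    from scipy.stats import norm
    data = [63, 58, 71, 66, 60, 75, 68, 64]    # hypothetical values
    n = len(data)
    for rank, value in enumerate(sorted(data), start=1):
        z = norm.ppf((rank - 0.5) / n)         # z-score of this percentile
        print(f"({z:+.2f}, {value})")          # nearly linear if data is normal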
Reading: Sections 2.1 and 2.2.
Activity: Using the "Arizona Temps" data, plot "Flagstaff
High" versus "Phoenix High".
Then guess what the correlation coefficient might be without using your calculator. Use the sample diagrams on page 126 to guide you.
Finally, using your calculator, calculate the actual value for the correlation
coefficient and compare it to your guess.
Repeat for the variables "Flagstaff High" and "Flagstaff
Low".
Goals: Display
two variables and measure (and interpret) linear association using the
correlation coefficient.
Skills:
…
Plot data with a
scatter plot. This will be as simple as entering two lists of
numbers into your TI-83 and pressing a few buttons, just as for histograms or
box plots. Or, if you are doing
plots by hand you will have to first choose an appropriate axis scale and then
plot the points. You should also
be able to describe overall patterns in scatter diagrams and suggest tentative
models that summarize the main features of the relationship, if
any.
…
Use the TI-83 to
calculate the correlation coefficient.
We will have to use the
regression function STAT CALC LinReg(ax+b) to
calculate correlation, r. First, you will have to have executed DiagnosticOn. Access
this command through the CATALOG (2nd 0). If you
type ENTER after the STAT CALC
LinReg(ax+b) command, the calculator
assumes your lists are in columns L1 and
L2; otherwise you will type where they are,
for example STAT CALC
LinReg(ax+b) L2, L3.
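Outside the calculator, the same number can be computed in a few lines of Python (numpy assumed; the temperatures below are made up, not the "Arizona Temps" data):
    # Correlation coefficient r for two lists.
    import numpy as np
    flagstaff_high = [42, 45, 50, 55, 61, 70, 73]     # hypothetical values
    phoenix_high = [66, 69, 75, 84, 93, 103, 105]     # hypothetical values
    r = np.corrcoef(flagstaff_high, phoenix_high)[0, 1]
    print(r)    # always between -1 and +1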
…
Interpret the
correlation coefficient.
You should know the range of the correlation
coefficient (-1 to +1) and what a "typical" diagram looks like for
various values of the correlation coefficient. Again, page 126 is your guide. You should recognize some of the things the correlation
coefficient does not measure, such
as the strength of a non-linear
pattern.
Reading: Section 2.2.
Activity: Outlier effects on
correlation. The dataset we will
explore today has 7 data points.
Plot them and calculate the correlation coefficient.
Add an eighth point in three different places and for each new dataset,
recalculate the correlation coefficient.
Summarize the effect of outliers in a paragraph.
(You may use any extra time today to discuss Presentation 1 in your
groups.) Homework 2
due.
Goals:
Understand
the impact of outliers on correlation.
Skills:
…
Interpret the
correlation coefficient. You should recognize how outliers
influence the magnitude of the correlation coefficient. One simple way to observe the
effects of
outliers is to calculate the correlation coefficient with and without the
outlier in the dataset and compare the two values. If the values vary greatly (this is a judgment call) then
you would say the outlier is "influential".
Reading: Section 2.3.
Activity: Using the Olympic data, fit a
regression line to predict the 2004 and 2008 race
results.
Goals: Practice
using regression with the TI-83.
We want the regression equation, the regression line superimposed on the
plot, the correlation coefficient, and we want to be able to use the line to
predict new values.
Skills:
…
Fit a line to
data. This may be as simple as 'eyeballing' a straight line
to a scatter plot. However, to be
more precise, we will use least squares, STAT CALC LinReg(ax+b) on the TI-83, to calculate the
coefficients, and VARS Statistics
EQ RegEQ to type the equation
in the Y= menu.
You should also be able to sketch a line onto a scatter plot (by hand)
by knowing the regression coefficients.
…
Interpret regression
coefficients. Usually, we want to only interpret slope, and slope is
best understood by examining the units involved, such as inches per year or
miles per gallon, etc. Because
slope can be thought of as "rise" over "run", we are
looking for the ratio of the units involved in our two variables. More precisely, the slope tells us the
change in the response variable for a unit change in the explanatory
variable. We don't
typically bother
interpreting the intercept, as zero is often outside of the range of
experimentation.
…
Estimate/predict new
observations using the regression line.
Once we have calculated a
regression equation, we can use it to predict new responses. The easiest way to use the TI-83 for
this is to TRACE on the regression
line. You may need to use up and
down arrows to toggle back and forth from the plot to the line. You may also just use the equation
itself by multiplying the new x-value
by the slope and adding the intercept.
(This is exactly what TRACE is
doing.)
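A minimal Python sketch of what LinReg(ax+b) and TRACE are doing (numpy assumed; the data values are made up):
    # Fit the least squares line y = a*x + b, then predict a new observation.
    import numpy as np
    x = np.array([1988, 1992, 1996, 2000])    # hypothetical Olympic years
    y = np.array([9.92, 9.96, 9.84, 9.87])    # hypothetical race results
    a, b = np.polyfit(x, y, 1)                # slope and intercept
    print("slope:", a, "intercept:", b)
    print("prediction for 2004:", a * 2004 + b)   # the arithmetic TRACE does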
Reading: Section 2.3.
Activity: Revisit outliers dataset, adding
regression lines. Plot the data
again and calculate the regression line.
Add an eighth point in three different places and for each new dataset,
recalculate the regression line.
Summarize the effect of outliers in a paragraph.
Goals: Practice
using regression with the TI-83.
We want the regression equation, the regression line superimposed on the
plot, the correlation coefficient, and we want to be able to use the line to
predict new values.
Skills:
…
Understand
the limitations
and strengths of linear regression.
Quite simply, linear
regression should only be used with scatter plots that are roughly linear in
nature. That seems obvious. However, there is nothing that prevents
us from calculating the numbers
for any data set we can input into our TI-83's. We have to realize what our data looks like
before we calculate the regression; therefore a scatter plot
is essential. In the presence of outliers and
non-linear patterns, we should avoid drawing conclusions from the fitted
regression line.
Reading: Sections 2.4 and 2.5.
Activity: Correlation/Regression
summary. U. S. population
example. Alternate regression
models. Homework 3
due.
1) For correlation, the variables
can be entered in any order; correlation is a fact about a
pair of variables.
For regression, the order the variables are presented matters. If you reverse the order, you get a
different regression line.
2) We must have
numerical variables to calculate correlation. For categorical variables, we would use
contingency tables, but not in this course.
3) High correlation does not
necessarily mean a straight line scatter plot. U. S. population growth is an example.
4) Correlation is not resistant;
the dataset from Days 9 and 11 showed that the placement of a single point in
the scatter plot can greatly influence the value of the correlation and the
coefficients in the regression equation.
(You may use any extra time today to discuss Presentation 1 in your groups.)
Goals: See
scatter plots and correlation in practice. Understand correlation's limitations and
features.
Skills:
…
Recognize the proper
use of correlation, and know how it is abused. Correlation
measures straight line
relationships. Any departures from
that model make the correlation coefficient less reliable as a summary measure.
Just as for the standard deviation and the mean, the correlation coefficient is
affected by outliers. Therefore,
it is extremely important to be aware of data that is unusual. Some two-dimensional outliers are hard
to detect with summary statistics; scatter plots are a must
then.
…
Understand that there
are competing regression models. We have focused our attention
on the linear regression models, but as we see on our TI-83, there
are many other potential models.
If you use an alternate model, be sure to plot the fitted line along
with the scatter plot. Our usual
measure of fit, r², will not
accurately tell the story.
Reading: Chapters 1 and 2.
Activity: Presentation 1. Graphical and Numerical Summaries,
Regression and Correlation
Gather 3 to 5 variables on at least 20 subjects; the source is irrelevant, but
knowing the data will help you explain its meaning to us. Be sure to have at least one numerical
and at least one categorical variable.
Demonstrate that you can summarize data graphically and numerically.
Also, pick one of the 50 states.
Predict the population in the year 2010 using a regression function (not
necessarily linear though).
Describe how you decided upon your model, and explain how good you think
your prediction is.
Reading: Chapters 1 and 2.
Activity: Exam 1. This first exam will cover graphical summaries (pictures),
numerical summaries (summary calculations), normal curve calculations (areas
under the bell curve), scatter plots, correlation, and regression (two-variable
summaries). Some of the questions
will be multiple choice. Others
will require you to show your worked out solution. Section reviews are an excellent source for studying for the
exams. Don't forget to review your
class notes and these on-line notes.
Reading: Section 3.1.
Activity: History of polls.
Up to 1936: Selection Bias
Literary Digest calls the 1936 election for Alf Landon (remember him), in an
electoral college landslide. The
poll was performed by sending postcards to people with telephones, magazine
subscribers, car owners, and a few people on lists of registered voters. The first problem was that the sample
was biased toward Republican-leaning voters, who could afford magazine
subscriptions and telephones during the Great Depression. The second problem was the response rate error: they received only 2.3 million responses, for a 23% response rate. This was compounded by a volunteer error: the respondents were most likely people who wanted change, and Roosevelt was the sitting president. Nevertheless, Franklin
Roosevelt, the Democrat, wins in a landslide. Statistician Jessica Utts (Seeing Through Statistics,
Jessica M. Utts, Duxbury Press, Wadsworth Publishing Co., 1996 pages 65-66),
who examined issues of the Literary Digest from 1936, says "They were very
cocky about George Gallup predicting they would get it wrong. [Gallup helped make his reputation in
polling by correctly calling the race.]
The beauty of something like that is that the winner is eventually
known." [Taken from: http://whyfiles.org/009poll/fiasco.html]
1936 to 1948: Quota Sampling
Aided by erroneous polling, newspapers prematurely call the presidential
election for challenger Thomas Dewey, leading to the famous photograph of an
elatedly re-elected Harry Truman.
The problematic polls use then-popular
"quota-sampling" techniques. In other words, they sought out a
certain number of men, a certain number of women, and similarly for blacks,
whites, and various income levels.
According to statistician Fritz Scheuren, quota sampling in political
polling was abandoned after this debacle in favor of random sampling of the
population. [Taken from: http://whyfiles.org/009poll/fiasco.html]
1948 to present: Random Sampling
I will give you the numbers and we can see how the polls have improved since
1936, and if there are still any biases towards either Republicans or
Democrats.
Goals: Introduce
sampling. Identify biases. Explore why
non-random samples are not trustworthy.
Skills:
…
Understand the issues
of bias. We seek representative samples. The "easy" ways of sampling,
samples of convenience and voluntary response samples, may or may not produce
good samples, and because we don't know the chances of subjects being in such
samples, they are unreliable sampling methods. Even when probability methods are used, biases can spoil the
results. Avoiding bias is our
chief concern in designing surveys.
…
Huge samples are not
necessary. One popular misconception about sampling is that if
the population is large, then we need a proportionately large sample. This is just not so. My favorite counter-example is our
method of tasting soup. To find
out if soup tastes acceptable, we mix it up, then sample from it with a
spoon. It doesn't matter to us
whether it is a small bowl of soup, or a huge vat, we still use only a
spoonful. The situation is the
same for statistical sampling; we use a small "spoon", or
sample. The fundamental
requirement though is that the "soup" (our population) is "well
mixed" (as in a simple random sample - see Day 17).
Reading: Section 3.1.
Activity: Lurking variables exercises. I have a set of problems from
Statistics, by Freedman, Pisani, and Purves, which I think will
help us think about looking for alternative explanations to the proffered
"attractive" conclusion.
Work on each problem for 5 minutes; we will discuss each of them in
detail before the class is over.
Be prepared to defend your explanation.
Goals: Explore
experimentation ideas. Discover
potential lurking variables and ways to control for them.
Skills:
…
Examine situations
and detect lurking variables.
When searching for lurking variables, it is not enough
to suggest variables that might also explain the response variable. Your potential "lurking"
variable must also be associated with the explanatory variable. So, for example, suppose you are trying
to explain height using weight. A
possible lurking variable might be age, because age tends to be associated with
weight and height. On the other hand, a variable
associated with height that is unlikely to be related to weight (and therefore
would not be a lurking variable)
is arm span.
…
Know appropriate ways
to attempt to control for lurking variables. Once a potential
lurking variable has been identified, we can make more appropriate comparisons
by dividing our subjects into smaller, more homogeneous groups and
then making the comparison.
…
Understand that
experimentation, done properly, will allow us to establish cause-and-effect
relationships. Observational studies have lurking variables; we can
try to control for them by various methods, but we cannot eliminate them. If the data is collected appropriately
through good experimentation, however, the effects of lurking variables can be
eliminated. This is done through
randomization, the thinking being that if a sample is large enough, it can't realistically be the case that one group contains all the large values of a lurking variable, for example.
Reading: Section 3.2.
Activity: Creating random samples. We will use three methods of
sampling today:
dice, Table B in our book, and our calculator. To make the problem feasible, we will only use a population
of size 6. (I know this is
unrealistic in practice, but the point today is to see how randomness works,
and hopefully trust that the results extend to larger problems.) Pretend that the items in our
population (perhaps they are people) are labeled 1 through 6. For each of our methods, you will have
to decide in your group what to do with "ties". Keep in mind the goal of simple random
sampling: at each stage, each remaining item has an equal chance to be the next
item selected.
By rolling dice, generate a sample of three people. (Let the number on the die correspond to one of the
items.) Repeat 20 times, giving 20
samples of size 3.
Using Table B, starting at any haphazard location, select three people. (Let
the random digit correspond to one of the items.) Repeat 20 times, giving 20 more samples of size 3.
Using your TI-83, select three people. The TI-83 command MATH PRB
randInt( 2, 4, 5 ) will produce 5 numbers
between 2 and 4, inclusive, for example.
(If you leave off the third number, only one value will be
generated.) Repeat 20 times,
giving 20 more samples of size 3.
Your group should have drawn 60 samples at the end. Keep careful track of which samples you selected; record
your results in order, as 125 or 256, for example. (125 would mean persons 1, 2, and 5 were selected.) We will pool the results of everyone's
work together on the board.
Goals: Gain
practice taking random samples.
Understand what a simple random sample is. Become familiar
with MATH randInt. Accept that the calculator is
random.
Skills:
…
Know the definition
of a Simple Random Sample (SRS). Simple Random Samples can be defined in two ways:
1) An SRS is a sample where, at
each stage, each item has an equal chance to be the next item selected.
2) A scheme where every possible
sample has an equal chance to be the
sample results in an SRS.
…
Select an SRS from a
list of items. The TI-83 command MATH randInt will select numbered items from a list randomly. If a number selected is already in the
list, ignore that number and get a new one. Remember, as long as each remaining item is equally likely to be chosen as the next item,
you have drawn an SRS.
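The same "ignore ties" procedure, sketched in Python:
    # Draw an SRS of 3 items from a population labeled 1..6,
    # discarding repeats, as with MATH randInt on the TI-83.
    import random
    sample = []
    while len(sample) < 3:
        pick = random.randint(1, 6)    # like randInt( 1, 6 )
        if pick not in sample:         # a "tie": ignore it and draw again
            sample.append(pick)
    print(sorted(sample))
    # random.sample(range(1, 7), 3) does the same job in one call.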
…
Understand the real
world uses of SRS's.
In practice, simple random samples are not that
common. It is just too impractical
(or impossible) to have a list of the entire population available. However, the idea of simple random
sampling is essentially the foundation for all the other types of
sampling. In that sense then it is
very common.
Reading: Sections 3.3 and 3.4.
Activity: Alternate Sampling Schemes.
Using a small population, we will explore alternate sampling schemes. For each of the methods below, use
simple random sampling as appropriate to draw a more complicated random
sample. Then, for your sample,
find the average and the standard deviation of the variable of interest. Use your average to guess the total for
the entire population. Keep track
of your results, as we will pool the class results to see the effects of the
different methods.
Our population today is the 50 United States. Our variable of interest is the Land Area.
Simple Random Sampling: Randomly select 10 states.
Systematic Sampling: Using the alphabetically sorted list,
choose a random number between 1 and 5.
Then select every 5th item after that, giving 10 states. In general, you would use a random number between 1 and N/n, and then select every (N/n)th item.
Stratified Sampling: From the large states (Alaska, Texas,
California, Montana, New Mexico, Arizona, Nevada, Colorado, Wyoming, and
Oregon), randomly select 4 states.
From the small states (Rhode Island, Delaware, Connecticut, Hawaii, New
Jersey, Massachusetts, New Hampshire, Vermont, and Maryland), randomly select 1
state. From the remaining 31
states, randomly select 5 states.
To guess the total for the entire U. S., guess the totals for the three strata
separately. That is, for the small
states, multiply your average for them by 9. For the large states, multiply by 10. For the other states,
multiply by 31.
Cluster Sampling: Using the following breakdown of the
states (North, Southwest, Central, Southeast, and Northeast), randomly select
two regions. From each region,
randomly select 5 states.
North: Alaska, Washington, Oregon, Idaho, Montana, Wyoming, N. Dakota, S. Dakota, Nebraska, and Minnesota.
Southwest: Hawaii, California, Nevada, Utah,
Arizona, Colorado, New Mexico, Kansas, Oklahoma, and Texas.
Central: Iowa, Missouri, Arkansas, Wisconsin, Michigan, Illinois,
Indiana, Ohio, Kentucky, and Tennessee.
Southeast: Louisiana, Mississippi, Alabama,
Georgia, Florida, S. Carolina, N. Carolina, Virginia, W. Virginia, and
Maryland.
Northeast: Pennsylvania, Delaware, New Jersey, New
York, Vermont, New Hampshire, Massachusetts, Connecticut, Rhode Island, and
Maine.
For our entire class, which method produced the most reliable results? I will summarize the benefits and drawbacks of each method, regardless of whether we actually saw these effects in our
simulation today.
Homework 4 due.
Goals:
Understand
the differences (good and bad) between various sampling
schemes.
Skills:
…
Systematic
Sampling. In
systematic sampling, we decide on a fraction of the population we would like to sample, and then randomly select a starting point between 1 and N/n. Then from the list of items, we take every (N/n)th item. This scheme works best when
we have a list of items, and it is simple to select items periodically from
that list. For example, we may
have decided to take every 20th name, and there are 20 names per page. It would thus be very
convenient to take the 11th item from each page, for
example.
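A minimal Python sketch of the procedure, assuming N/n comes out to a whole number:
    # Systematic sample: random start between 1 and k = N/n, then every kth item.
    import random
    N, n = 50, 10                    # population size and sample size
    k = N // n                       # sampling interval; here k = 5
    start = random.randint(1, k)     # like randInt( 1, k )
    sample = list(range(start, N + 1, k))
    print(sample)                    # exactly n items, evenly spaced down the list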
…
Stratified
Sampling.
In stratified
random sampling, we draw a simple random sample from several sub-populations,
called strata. The sample size may
differ by stratum; in fact, this leads to the most efficient use of stratified
sampling. This scheme works best
when there is variability between strata.
For example, one stratum may be more homogeneous than another
stratum. Then, just a few items
are needed from the stratum that is homogeneous, since all the items in that
stratum are basically the same.
Alternatively, from a stratum that is quite diverse, you should sample
more heavily. In our example in
class today, the large states are quite different from one another in areas, so
we took 40 % of them. The small states are very similar in area, so we took only 1 (about 11 % of them). The remaining 31 states are in between,
so we sampled 16 % of them.
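The arithmetic of the stratified estimate, sketched in Python with made-up sample averages:
    # Estimate the total: each stratum's sample average times its stratum size.
    large_avg = 250000.0    # hypothetical average area of the 4 sampled large states
    small_avg = 5000.0      # hypothetical area of the 1 sampled small state
    other_avg = 60000.0     # hypothetical average area of the 5 sampled other states
    estimated_total = 10 * large_avg + 9 * small_avg + 31 * other_avg
    print(estimated_total)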
…
Cluster
Sampling. In cluster sampling, we select several groups of items. Within each selected group,
we then repeat by selecting from several smaller groups. This process continues until we have
our ultimate sampled units. This
scheme works best when we do not have a list of all the items in a population,
but we have lists of groups of
items, for example states or counties.
One drawback of this method is that the ultimate clusters are generally
quite similar (all people living on a block in a town are generally more
similar than different), so the effective sample size is lower than it
appears. To compensate, cluster
samples typically have larger sample sizes.
Reading: Section 4.1.
Activity: What is Randomness? Our notions of probability theory are
based on the "long run", but our everyday lives are dominated by
"short runs". Today we
will look at some everyday sequences to see if they exhibit this behavior.
Coin experiment 1: Write down a
sequence of H's and T's representing heads and tails, pretending you are
flipping a coin. Then flip a real
coin 50 times and record these 50 H's and T's. Without knowing which list is which, in most cases I will be
able to identify your real coin.
Baseball players: In sports you
often hear about the "hot hand". We will pick a player, look at his last 20 games, and see if
flipping a coin will produce a simulation that resembles his real
performance. Then we will examine
whether we could pick out the simulation without knowing which was which.
Coin experiment 2: Spin a penny on
a flat surface, instead of tossing it into the air. Record the percentage of heads.
Coin experiment 3: Balance a
nickel on its edge on a flat surface.
Jolt the surface enough so that the nickel falls over, and record the
percentage of heads.
Goals: Observe
some real sequences of random experiments. Develop an intuition about
variability.
Skills:
…
Recognize the feature
of randomness. Random does not mean haphazard, or without
pattern. We cannot predict what
will happen on a single toss of a coin, but we can predict what will happen in 1,000 tosses of a
coin. This is the hallmark of a
random process: uncertainty in a small number of trials, but a predictable
pattern in a large number of trials.
…
Resist the urge to
jump to conclusions with small samples.
Typically our daily activities
do not involve large samples of
observations. Therefore our ideas
of "long run" probability theory are not applicable. You need to develop some intuition
about when to believe an observed experiment, and when to doubt the
results. We will hone this
intuition as we develop our upcoming inference methods. For now, understand that you may be
jumping to conclusions by just believing the simulation's observed value.
Reading: Section 4.2.
Activity: Sample Spaces, Simulation.
Using either complete sampling spaces (theory) or simulation, find (or
estimate) these chances:
1) Roll two dice, one colored, one
white. Find the chance of the
colored die being less than the white die.
2) Roll three dice and find the
chance that the largest of the three dice is a 6. (Ignore ties; that is, the largest value when 6, 6, 4 is
rolled is 6.)
3) Roll three dice and
find the chance
of getting a sum of less than 8.
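A sketch of both approaches in Python: the complete sample space for problem 1, and a simulation estimate for problem 3:
    # Problem 1 by complete sample space; problem 3 by simulation.
    import random
    # Chance the colored die is less than the white die: check all 36 outcomes.
    wins = sum(1 for c in range(1, 7) for w in range(1, 7) if c < w)
    print(wins, "/ 36 =", wins / 36)     # 15/36, about 0.417
    # Chance three dice sum to less than 8, estimated by simulation.
    trials = 10000
    hits = sum(1 for _ in range(trials)
               if sum(random.randint(1, 6) for _ in range(3)) < 8)
    print(hits / trials)     # estimate; the exact answer is 35/216, about 0.162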
Goals: Create
sample spaces. Use simulation to
estimate probabilities.
Skills:
…
List simple sample
spaces. Flipping coins and rolling dice are common events to
us, and listing the possible outcomes lets us explore probability
distributions. We will not delve
deeply into probability rules; rather, we are more interested in the ideas of
probability and I think the best way to accomplish this is by
example.
…
Know the probability
rules and how to use them. We have three rules we use primarily:
the complement rule, the addition rule for disjoint events, and the
multiplication rule for independent events. The complement rule is used when the calculation of the
complement turns out to be simpler than the event itself. For example, the complement of "at
least one" is "none", which is a simpler event to describe. The addition rule for disjoint events
is used when we are asking about the chances of one event or another occurring. If the two events are disjoint (they have no elements in
common) we can find the chance of their union (one event or the other) by
adding their individual probabilities.
The multiplication rule for independent events is used when we have a
question about the intersection of two sets, which can be phrased as a question
about two events occurring simultaneously. Both events occurring at once can be phrased as one event and the other event occurring, so the multiplication rule
is used for "and" statements.
…
Simulation can be
used to estimate probabilities, but only for a very large number of
trials. If the number of repetitions of an experiment is
large, then the resulting observed frequency of success can be used as an
estimate of the true unknown probability of success. However, a "large" enough number of repetitions
may be more than we can reasonably perform. For example, for problem 1 today, a sample of 100 will give
results between 32/100 and 51/100 (.32 to .51) 95% of the time. That may not be good enough for our
purposes. Even with 500, the range
is 187/500 to 230/500 (.374 to .460).
Eventually the answers will converge to a useful percentage; the
question is how soon that will occur.
We will have answers to that question after Section 8.1.
Reading: Section 4.3.
Activity: Continue coins and dice. Introduce Random Variables and
Probability Histograms. We will
finish up the problems from Day 20, and also examine Pascal's triangle, which
is a way of figuring binomial probabilities (chances on fair coins). Also in our tables, we will include
random variables.
Goals:
Understand
that variables may have values that are not equally likely.
Skills:
…
Understand sampling
distributions and how to create simple ones. We have listed
sample spaces of equally likely events, like dice and coins. Events can further
be grouped together and assigned values.
These new groups of events may not be equally likely, but as long as the
rules of probability still hold, we have valid probability distributions. Pascal's Triangle is one such example,
though you should realize that it applies only to fair coins. We will work with "unfair
coins" (proportions) later, in Chapter 8. Historical note: examining these sampling distributions led
to the discovery of the normal curve in the early 1700's. We will copy their work and
"discover" the normal curve for ourselves too using dice.
Reading: Section 4.4.
Activity: Means and Variances Rules. Using the frequency option on
STAT CALC 1-Var
Stats to calculate μx and σx. Simulating data to see the
rules in action.
While one could simply memorize these rules, I think it might be more
instructive to simulate some data and see the rules at work. So, we
are going to reproduce some data very much like the Example 4.27 data on page
303. Then we will
"tinker" with the parameters and see how things change.
To start with, generate some x-values
in L1:
MATH PRB randNorm( μx, σx, 300 ) -> L1.
(Use the values in the problem for μx, σx, μy, σy, and r.)
You might think we can use a similar
command to generate some y-values
in L2.
However, this would ignore the correlation in the two variables. To account for this, we must
"borrow" some results from regression. The next two commands will put "errors"
in L3 and y-values
in L2. Trust
me, it works.
MATH PRB randNorm( 0, σy * √( 1 - r² ), 300 ) -> L3
μy - r * ( σy / σx ) * ( μx - L1 ) + L3 -> L2
Plot L1 vs
L2 to verify that the data does indeed have a correlation
of r.
Calculate the means and standard deviations to see that your simulation
is close to the assumed values: STAT CALC 1-Var Stats L1 and STAT CALC 1-Var Stats L2. (You
can also do STAT CALC LinReg(ax+b) to get
the correlation coefficient.)
Now let's see how the rules work by doing the sum and the difference of
the two "SAT scores": L1 + L2 -> L4 and L1 -
L2 -> L5. Check
to see if these simulations agree with the theoretical results by finding the
means and standard deviations of L4 and L5: STAT CALC 1-Var Stats L4 and STAT CALC 1-Var Stats L5.
Now try this again using a different
value for r. (In
particular, see what happens when r =
0. This is the case for
independence.)
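For anyone who wants to try the same simulation off the calculator, here is a Python version (numpy assumed; the parameter values are placeholders, not the Example 4.27 values):
    # Generate correlated normal pairs by "borrowing" the regression relationship.
    import numpy as np
    mx, sx, my, sy, r = 500, 100, 500, 100, 0.5    # placeholder parameters
    x = np.random.normal(mx, sx, 300)
    e = np.random.normal(0, sy * np.sqrt(1 - r**2), 300)    # the "errors"
    y = my - r * (sy / sx) * (mx - x) + e
    print(np.corrcoef(x, y)[0, 1])         # should come out near r
    print(np.std(x + y), np.std(x - y))    # sum and difference spreads differ when r is not 0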
Goals: Explore
the rules we have for means and variances using one particular simulation of
normal data. Use the TI-83 to
calculate the mean and variance for a discrete
distribution.
Skills:
…
Know how to use the
TI-83 to calculate the mean and variance for a discrete distribution. By
including a variable of weights or frequencies, the TI-83 will
calculate μx and σx
for a discrete distribution. The
syntax is STAT CALC 1-Var Stats L1, L2,
where the x-values are
entered in L1 and the weights (probabilities expressed as counts)
are entered in L2.
…
Understand the rules
for sums and differences and linear combinations of random variables. Through
simulation, you should have an intuitive feel for why the correlation has so
much to do with the variance of a sum or difference. In particular, with higher correlations, the variance of the
sum increases, while the variance of a difference decreases. With no correlation, the variance of
the sum is the same as the variance of the difference. Also you should be able to see quite
easily why the linear combination rules work.
Reading: Section 4.5.
Activity: Constructing
probability trees. Demonstrating
Bayes' formula with the rare disease problem. Homework 5 due.
Consider a card trick where two cards are drawn sequentially off the top of a
shuffled deck. (There are 52 cards
in a deck, 4 suits of 13 ranks.)
We want to calculate the chance of getting hearts on the first draw, on
the second draw, and on both draws.
We will organize our thoughts into a tree diagram, much like water
flowing in a pipe. On each branch,
the label will be the probability of taking that branch; thus at each node, the
exiting probabilities (conditional probabilities) add to one.
On the far right of the tree, we will have the intersection events. Their probability is found by
multiplying.
Calculate the chances of:
1) Drawing a heart on the first
card.
2) Drawing a heart on the second
card.
3) Drawing at least one heart.
4) Drawing two hearts.
5) Drawing a heart on the second
draw given that a heart was drawn first.
6) Drawing a heart on the first draw given that a heart was drawn second.
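The tree arithmetic for these questions, worked in Python with exact fractions:
    # The card-trick tree, computed branch by branch.
    from fractions import Fraction as F
    p_h1 = F(13, 52)                   # heart on the first draw
    p_h2_given_h1 = F(12, 51)          # heart second, given heart first
    p_h2_given_not = F(13, 51)         # heart second, given non-heart first
    p_both = p_h1 * p_h2_given_h1      # multiply along the branch
    p_h2 = p_both + (1 - p_h1) * p_h2_given_not     # add the two branches
    p_at_least_one = p_h1 + p_h2 - p_both
    p_h1_given_h2 = p_both / p_h2      # the reversal: Bayes' formula
    print(p_h1, p_h2, p_at_least_one, p_both, p_h1_given_h2)
Note that p_h2 comes out to 1/4, the same as p_h1, and the reversal gives 4/17.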
Now we will do this work for the rare disease problem.
Goals: Be able
to express probability calculations as tree diagrams. Be able to reverse the events in a probability tree, which
is what Bayes' formula is about.
Skills:
…
Know how to use the
multiplication rule in a probability tree. Each branch of a
probability tree is labeled with the conditional probability for
that branch. To
calculate the joint probability of a series of branches, we multiply the
conditional probabilities together.
Note that at each branching in a tree, the (conditional) probabilities
add to one, and that overall, the joint probabilities add to
one.
…
Recognize conditional
probability in English statements.
Sometimes the key word is
"given". Other times the
conditional phrase has "if".
But sometimes the fact that a statement is conditional is
disguised. For example: "Assuming John buys the insurance,
what is the chance he will come out ahead" is equivalent to "If John
buys insurance, what is the chance he will come out
ahead".
…
Be able to use the
conditional probability formula to reverse the events in a probability
tree. The key here is the symmetry of the events in the
conditional probability formula.
We exchange the roles of A and B, and tie them together with our formula
for Pr(A and B). This reversal is
the essence of Bayes' formula.
…
Know the definition
of independence. Independence is a fact about probability, not about
sets. Contrast this to
"disjoint" which is a property of sets. In
particular, independent events are by definition not disjoint.
Independence is important later as an assumption as it allows us to
multiply individual probabilities together without having to worry about
conditional probability.
Reading: Section 5.1.
Activity: Coin flipping is a good way to
understand randomness, but because most coins have a probability of heads very
close to 50 %, we don't get the true flavor of the binomial distribution. Today we will simulate the flipping of
an unfair coin; that is, a binomial process with probability not
equal to 50 %.
Experiment 1: Our unfair
"coin" will be a die, and we will call getting a 6 a success. Roll your die 10 times and record how
many sixes you got. Repeat this
process 10 times each. Your group
should have 40 to 50 trials of 10 die rolls. Pool your results and enter the data into a list on your
calculator. We want to see the
histogram (be sure to make the box width reasonable) and calculate the summary
statistics, in particular the mean and variance. Also produce a quantile plot. Compare the simulated results with theory.
Experiment 2: Your calculator will
generate binomial random variables for you, but it is not as illuminating as
actually producing the raw data yourself.
Still, we can see the way the probability histogram looks (if we
generate enough cases; this is an application of the law of large
numbers). I suggest 100 at a
minimum. Again be sure to make
your histogram have an appropriate box width. The command is MATH PRB randBin( n,
p, r
), where n is the sample size, p is
the probability of success, and r is the number of times to repeat the experiment.
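A Python equivalent of Experiment 2 (numpy assumed):
    # Like MATH PRB randBin( n, p, r ): r experiments, successes out of n trials.
    import numpy as np
    n, p, r = 10, 1/6, 100    # 10 die rolls per trial, success = rolling a six
    counts = np.random.binomial(n, p, r)
    print(counts.mean(), counts.var())    # compare with n*p and n*p*(1-p)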
Goals: Become
familiar with the binomial distribution and its
applications.
Skills:
…
Know the four
assumptions. The binomial distribution requires four
assumptions: 1) There must be only two outcomes. 2) Trials must be independent of one another. 3) There must be a fixed sample size, chosen ahead of time and
not determined while the experiment is running. 4) The
probability of success is the same from trial to trial. Number 2 is the most difficult to
check, so it is usually simply assumed.
Number 4 rules out finite populations, where the success probability
depends on what is left in the pool.
Number 3 rules out situations where the experiment continues until some
event happens.
…
Know how to calculate
binomial probabilities. Using DISTR binompdf(
n,
p, x
), we can calculate the chance of getting
exactly x
successes in n trials with constant
probability p. Using DISTR binomcdf(
n,
p, x
), we can calculate the chance of getting
at most x
successes in n trials with constant
probability p. The second command adds up repeated
results of the first command, starting at x and working down to zero.
Because binomcdf only
calculates x successes or less, when we want to know
about y successes or more, we must use the complement rule of
probability. To find the chance of
y successes or more, use y
– 1 successes or less. For example, the chance of 70 or more successes is found
this way: Pr(X ≥ 70) = 1 - Pr(X ≤ 69).
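The equivalent calls in Python (scipy assumed), including the complement-rule trick:
    # Python equivalents of DISTR binompdf( and binomcdf(.
    from scipy.stats import binom
    n, p = 100, 0.5
    print(binom.pmf(40, n, p))        # binompdf: exactly 40 successes
    print(binom.cdf(40, n, p))        # binomcdf: 40 or fewer successes
    print(1 - binom.cdf(69, n, p))    # 70 or more, via the complement rule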
…
Mean and
Variance. With
algebra, we can show that the mean of a binomial random variable is
n p, and the variance is n p (1 - p). You should memorize these two facts,
but don't worry about how they are proven.
…
Continuity
correction. We can use the normal curve to approximate binomial
probabilities. But because the
binomial is a discrete distribution and the normal is continuous, we need to
adjust our endpoints by ½ unit.
I recommend drawing a diagram with rectangles to see which way the
½ unit goes. For example,
calculating the chance of getting 40 to 50 heads inclusive on 100 coin flips
(perhaps a bent coin) entails using 39.5 to 50.5.
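A quick Python check of that example against the exact binomial answer (scipy assumed):
    # Normal approximation with continuity correction vs. the exact binomial:
    # chance of 40 to 50 heads, inclusive, in 100 flips of a fair coin.
    from scipy.stats import binom, norm
    n, p = 100, 0.5
    mu, sigma = n * p, (n * p * (1 - p)) ** 0.5     # mean 50, sd 5
    approx = norm.cdf(50.5, mu, sigma) - norm.cdf(39.5, mu, sigma)
    exact = binom.cdf(50, n, p) - binom.cdf(39, n, p)
    print(approx, exact)    # both are about 0.52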
Reading: Section 5.2.
Activity: Central Limit Theorem
exploration.
In addition to coins and dice, MATH PRB rand on your calculator is another good random mechanism
for exploring "sampling distributions". These examples will give you some different views of
sampling distributions. The
important idea is that each time an experiment is performed, a potentially
different result occurs. How these
results vary from sample to sample is what we seek. You are going to produce many samples, and will
therefore see how these values vary.
1) Sums of two items: Each of you in your group will roll two
dice. Record the sum on the
dice. Repeat this 30 times,
generating 30 sums. Make a
histogram and a QUANTILE plot of your 30
sums. Compare to the graphs of the
other members in your group, particularly noting the shape. Sketch the graphs you made and compare
to the theoretical results.
2) Sums of 4 items: Each of you generate 4 random numbers
on your calculator, add them together, average, and record the result; repeat
30 times. The full
command is: seq ( rand +
rand + rand + rand, X, 1, 30 ) / 4 -> L1,
which will generate 30 four-sum average random numbers and store them in L1. Again,
make a graph of the distribution.
(seq is found in the
LIST
OPS menu.)
3) Sums of 12 items: Each of you generate 12
random normal numbers on your calculator using MATH PRB
randNorm( 65, 5, 12). Add them together and record the
result; repeat 30 times. The full
command is: seq ( sum ( randNorm( 65, 5, 12 ) ), X, 1, 30 ) -> L2. Again,
make a graph of the distribution.
(This is problem 5.59 in our text.)
For all the lists you generated, calculate the standard deviation and the
mean. We will find these two
statistics to be immensely important in our upcoming discussions about
inference. It turns out that these
means and standard deviations can be found through formulas instead of having
to actually generate repeated samples.
These means depend only on the mean and standard deviation of the
original population (the dice or rand
or randNorm in this case) and the number of times the dice were
rolled or rand was pressed
(called the sample
size, denoted n).
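The same three experiments can be run in Python (numpy assumed) if you want more than 30 repetitions:
    # Thirty repetitions of each experiment, as in class.
    import numpy as np
    sums_of_two = np.random.randint(1, 7, (30, 2)).sum(axis=1)        # two dice
    avgs_of_four = np.random.random((30, 4)).mean(axis=1)             # rand, four times
    sums_of_twelve = np.random.normal(65, 5, (30, 12)).sum(axis=1)    # randNorm
    for lst in (sums_of_two, avgs_of_four, sums_of_twelve):
        print(lst.mean(), lst.std())    # centers and spreads, to compare with theory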
Goals: Examine
histograms or quantile plots to see that averages are less variable than
individual measurements. Also, the
shape of these curves should get closer to the shape of the normal
curve as n increases.
Skills:
…
Understand
the concept
of sampling variability.
Results vary from sample to sample. This idea is sampling
variability. We are
very much interested in knowing what the likely values of a statistic are, so
we focus our energies on describing the sampling distributions. In today's exercise, you simulated
samples, and calculated the variability of your results. In practice, we only do one sample, but
calculate the variability with a formula.
In practice, we also have the Central Limit Theorem, which lets us use
the normal curve in many situations to calculate probabilities.
Reading: Section 5.2.
Activity: Practice Central Limit Theorem
(CLT) problems. We will have
examples of non-normal data and normal data to contrast the diverse cases where
the CLT applies. Homework 6
due.
1) People staying at a certain
convention hotel have a mean weight of 180 pounds with standard deviation
35. The elevator in the hotel can
hold 20 people. How much weight
will it have to handle in most cases?
Do we need to assume weights of people are normally distributed?
2) Customers at a large grocery
store require on average 3 minutes to check out at the cashier, with standard
deviation 2. Because checkout time
cannot be negative, they are obviously not normally distributed. Can we calculate the chance that 85
customers will be handled in a four hour shift? If so, calculate the chance; if not, what else do you need
to know?
3) Suppose the number of
hurricanes in a season has mean 6 and standard deviation √6. What is the chance that in 30 years
there have been fewer than 160 hurricanes?
4) The number of boys in a 4 child
family can be modeled reasonably well with the binomial distribution. If five such families live on the same
street, what is the chance that the total number of boys is 12 or more?
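A sketch of problem 1 in Python; note that "most cases" requires choosing a cutoff, and the 99 % below is my choice, not the book's:
    # Problem 1: total weight of 20 people, each with mean 180, sd 35.
    from scipy.stats import norm
    n, mu, sigma = 20, 180, 35
    total_mu = n * mu                   # 3600 pounds
    total_sigma = sigma * n ** 0.5      # 35 * sqrt(20), about 156.5
    print(norm.ppf(0.99, total_mu, total_sigma))    # about 3964 pounds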
Goals: Use
normal curve with the CLT.
Skills:
…
Recognize how to use
the CLT to answer probability questions concerning sums and averages. The
CLT says that for large sample sizes, the distribution of the sample average is
approximately normal, even though the original data in a problem may not be
normal.
…
For small samples, we
can only use the normal curve if the actual distribution of the original data
is normally distributed.
It is important to realize when original data is not
normal, because there is a tendency to use the CLT even for small sample sizes,
and this is inappropriate. When
the CLT does apply, though, we are
armed with a valuable tool that allows us to estimate probabilities concerning
averages. A particular example is
when the data is a count that must
be an integer, and there are only a few possible values, such as the number of
kids in a family. Here the normal
curve wouldn't help you calculate chances of a family having 3 kids.
However, we could quite accurately calculate chances concerning the total number of
kids in 100 such families.
Reading: Chapters 3 to 5.
Activity: Catch up day and Review of
Probability.
Reading: Chapters 3 to 5.
Activity: Presentation 2. Sampling and Probability (Chapters 3
and 4).
Sample 20 students from UWO. For
each student, record the number of credits they are taking this semester, what
year they are in school, and whether or not they are graduating this
semester. Try to make your sample as
representative as you can. You
must have a probability sample to get full credit. Discuss the biases your sample has and what you did to avoid
bias.
Your sample has a particular number of men. Using whatever resources you feel appropriate, calculate and
present to us the chance that your sample had the number of males it had. Be sure you can justify your
assumptions.
Reading: Chapters 3 to 5.
Activity: Exam 2. This second exam is on sampling, experiments, and
probability, including sampling distributions. Some of the questions will be multiple choice. Others will require you to show your
worked out solution. Section
reviews are an excellent source for studying for the exams. Don't forget to review your class notes
and these on-line notes.
Activity: Guess m&m's percentage. What fraction of m&m's are blue or
green? Is it 25 %? 33 %? 50 %? We take
samples to find out.
Each of you will sample from my jar of m&m's, and you will all calculate
your own confidence interval. Of
course, not everyone will be correct, and in fact, some of us will have
"lousy" samples. But
that is the point of the confidence coefficient, as we will see when we jointly
interpret our results.
It has been my experience that confidence intervals are easier to understand if
we talk about sample proportions instead of sample averages. Thus I will use techniques from Chapter
8. Each of you will have a
different sample size and a different number of successes. In this case the sample size,
n, is the total number of m&m's you have selected,
and the number of successes, x, is
the total number of blue or green m&m's in your sample. Your guess is simply the
ratio x/n, or
the sample proportion. We call this estimate
p-hat, written p̂.
Use STAT TEST
1-PropZInt with 70 % confidence for your
interval here today.
When you have calculated your confidence interval, record your result on the
board for all to see. We will
jointly inspect these confidence intervals and observe just how many are
"correct" and how many are "incorrect". The percentage of correct
intervals should match our chosen level of confidence. This is in fact what is meant by
confidence.
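For the curious, here is a minimal sketch of the formula 1-PropZInt is using, namely p̂ ± z*·√(p̂(1−p̂)/n), written in Python with SciPy; the counts in the example call are made up for illustration, not anyone's actual m&m sample.

    from math import sqrt
    from scipy.stats import norm

    def one_prop_z_interval(x, n, conf=0.70):
        """Large-sample CI for a proportion, as 1-PropZInt computes it."""
        p_hat = x / n
        z_star = norm.ppf(1 - (1 - conf) / 2)   # critical value for this level
        moe = z_star * sqrt(p_hat * (1 - p_hat) / n)
        return p_hat - moe, p_hat + moe

    # hypothetical sample: 11 blue or green out of 30 m&m's
    print(one_prop_z_interval(11, 30, conf=0.70))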
Goals: Introduce
statistical inference - Guessing the parameter. Construct and interpret a confidence
interval.
Skills:
…
Understand how to
interpret confidence intervals. The calculation of a confidence interval is quite
mechanical. In fact, as we have
seen, our calculators do all the work for us. Our job is then not so much to calculate confidence intervals as it is to be able to
understand when one should be used
and how best to interpret one.
…
Know what the
confidence level measures. We would like to always make correct
statements, but in light of sampling variability we know this is
impossible. As a compromise, we
use methods that work most of the
time. The proportion of times our
methods work is expressed as a confidence coefficient. Thus, a
95 % confidence interval method produces correct statements 95 % of the
time. (By "correct
statement" we mean one where the true unknown value is contained in our
interval, that is, it is between our two numbers.)
Reading: Section 6.1. (Skip Bootstrap. Skip Choosing the Sample Size.)
Activity: Changing confidence levels and
sample sizes.
Today we will explore how changing confidence levels and sample sizes influence
CI's. Complete the following
table, filling in the confidence interval width in the body of the table. Use STAT TEST 1-PropZInt but in
each case make x close
to 50 % of n. (The
calculator will not let you use non-integers for x; round off if needed.)
Confidence Level ============>
 Sample Size |  70 %  |  90 %  |  95 %  |  99 %  | 99.9 %
          10 |        |        |        |        |
          20 |        |        |        |        |
          50 |        |        |        |        |
         100 |        |        |        |        |
        1000 |        |        |        |        |
We will try to make sense of this chart, keeping in mind the meaning of
confidence level, and the desire to have narrow intervals.
Now repeat the above table using STAT TEST ZInterval, with σ = 15 and x̄ = 100.
Confidence Level ============>
 Sample Size |  70 %  |  90 %  |  95 %  |  99 %  | 99.9 %
          10 |        |        |        |        |
          20 |        |        |        |        |
          50 |        |        |        |        |
         100 |        |        |        |        |
        1000 |        |        |        |        |
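If you want to check your table entries afterwards, here is a rough sketch that computes the interval widths (twice the margin of error) directly from the formulas. It assumes x is rounded to 50 % of n for the proportion table; note that the width of the ZInterval does not depend on x̄ at all.

    from math import sqrt
    from scipy.stats import norm

    levels = [0.70, 0.90, 0.95, 0.99, 0.999]
    sizes = [10, 20, 50, 100, 1000]

    for n in sizes:
        p_hat = round(n / 2) / n                 # x chosen close to 50% of n
        prop_row, mean_row = [], []
        for c in levels:
            z_star = norm.ppf(1 - (1 - c) / 2)   # critical value for level c
            prop_row.append(2 * z_star * sqrt(p_hat * (1 - p_hat) / n))
            mean_row.append(2 * z_star * 15 / sqrt(n))   # the sigma = 15 case
        print(n, [round(w, 3) for w in prop_row], [round(w, 2) for w in mean_row])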
Goals: See how
the TI-83 calculates our CI's.
Interpret
the effect of differing confidence coefficients and sample
sizes.
Skills:
…
Understand the
factors that make confidence intervals believable guesses for the
parameter. The two chief factors that make our confidence
intervals believable are the sample size and the confidence coefficient. The key result is larger confidence
makes wider intervals, and larger sample size makes narrower
intervals.
…
Know the details of
the Z Interval. When we know the population standard
deviation, σ, our method for guessing the true value of
the mean, μ, is to use a z confidence interval. This
technique is unrealistic in that you must know the true population standard
deviation. In practice, we will
estimate this value with the sample standard deviation, s, but a different technique is appropriate (See Day
35).
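For reference, the interval the calculator computes has the usual point-estimate-plus-margin-of-error form; in the book's notation (nothing new is assumed here):

    \[ \bar{x} \pm z^{*}\,\frac{\sigma}{\sqrt{n}} \]

where z* is the critical value matching the chosen confidence level.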
Reading: Sections 6.2 and 6.4.
Activity: Argument by contradiction. Scientific method. Type I and Type II error diagram. Courtroom terminology. Homework 7 due.
Some terminology:
Null hypothesis. A statement about a parameter. The null hypothesis is
always an equality or a single claim (like two variables are
independent). We assume the null
hypothesis is true in our following calculations, so it is important that the
null be a specific value or fact that can be assumed.
Alternative hypothesis.
The alternative hypothesis is a statement that we will
believe if the null hypothesis is rejected. The alternative does not have to be the complement of the
null hypothesis. It just has to be
some other statement. It
can be an inequality, and usually is.
One- and Two-Tailed Tests.
A one-tailed test is one where the alternative
hypothesis is in only one direction, like "the mean is
less than 10".
A two-tailed test is one where the alternative hypothesis is
indiscriminate about direction, like "the mean is not
equal to 10".
When a researcher has an agenda in mind, he will usually choose a
one-tailed test. When a researcher
is unsure of the situation, a two-tailed test is appropriate.
Rejection rule. To decide between two competing hypotheses, we create
a rejection rule. It's usually as
simple as "Reject the null hypothesis if the sample mean is greater than
10. Otherwise fail to
reject." We always want to
phrase our answer as "reject the null hypothesis" or "fail to
reject the null hypothesis".
We never want to say "accept the null hypothesis". The reasoning is this: Rejecting the null hypothesis means the
data have contradicted the assumptions we've made (assuming the null hypothesis
was correct); failing to reject the null hypothesis doesn't mean we've proven
the null hypothesis is true, but rather that we haven't seen anything to doubt
the claim yet. It
could be the case that we just haven't taken a large enough
sample yet.
Type I Error. When we reject the null hypothesis when
it is in fact true, we have made a Type I error. We have made a conscious decision to treat this error as a
more important error, so we construct our rejection rule to make this error
rare.
Type II Error. When we fail to reject the null
hypothesis, and in fact the alternative hypothesis is the true one, we have
made a Type II error. Because we
construct our rejection rule to control the Type I error rate, the Type II
error rate is not really under our control; it is more a function of the
particular test we have chosen.
The one aspect we can
control is the sample size.
Generally, larger samples make the chance of making a Type II error
smaller.
Significance level, or size of the test.
The probability of making a
Type I error is the significance level.
We also call it the size of the test, and we use the symbol α to represent it. Because we want the Type I error to be rare, we usually
set α to be a small number, like .05 or .01 or even
smaller. Smaller might seem clearly
better, but the drawback is that the smaller α is,
the larger the Type II error rate becomes.
P-value. There are two definitions for the P-value. Definition 1: The P-value is the smallest α level at which our observed data would lead us to reject the null hypothesis. Definition 2:
The P-value is the chance of seeing data as extreme or more extreme than
the data actually observed. Using
either definition, we calculate the P-value as an area under a tail in a
distribution. Caution: the P-value
calculation will depend on whether we have a one- or a two-tailed test.
Power. The power of a test is the probability of rejecting
the null hypothesis when the alternative hypothesis is true. We are calculating the chance of making
a correct decision. Because the
alternative hypothesis is usually not an equality statement, it is more
appropriate to say that power is a function rather than just a single value.
We will examine these ideas using the z-test. The TI-83 command
is STAT
TEST ZTest. The command gives you a menu of items to input. It assumes your null hypothesis is a
statement about a mean μ. You must tell it the assumed null value, μ0, and the alternative claim, either two-sided or one of the one-sided choices. You also need to tell the calculator
how your information has been stored, either as a list of raw DATA
or as summary STATS. If you choose CALCULATE the machine will simply display the test statistic and
the P-value. If you
choose DRAW, the calculator will graph the P-value
calculation for
you. You should experiment to see
which way you prefer.
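To see the same computation off-calculator, here is a minimal sketch of the one-sample z test in Python (SciPy assumed); the summary statistics in the example call are placeholders, not data from our class.

    from math import sqrt
    from scipy.stats import norm

    def z_test(xbar, mu0, sigma, n, tail="two"):
        """One-sample z test from summary stats, as in the TI-83's ZTest."""
        z = (xbar - mu0) / (sigma / sqrt(n))
        if tail == "two":
            p = 2 * norm.cdf(-abs(z))
        elif tail == "less":
            p = norm.cdf(z)
        else:  # tail == "greater"
            p = 1 - norm.cdf(z)
        return z, p

    # hypothetical: xbar = 21.5, H0: mu = 20, sigma = 5, n = 10, two-tailed
    print(z_test(21.5, 20, 5, 10))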
Goals: Introduce
statistical inference - Hypothesis testing.
Skills:
…
Recognize the two
types of errors we make.
If we decide to reject a null hypothesis, we might be
making a Type I error. If we fail
to reject the null hypothesis, we might be making a Type II error. If it turns out that the null
hypothesis is true, and we reject it because our data looked weird, then we
have made a Type I error.
Statisticians have agreed to control this type of error at a specific
percentage, usually 5%. On the
other hand, if the alternative hypothesis is true, and we
fail to reject the null hypothesis, we have also made a
mistake. This second type of error
is generally not controlled by us;
the sample size is the determining factor here.
…
Understand why one
error is considered a more serious error.
Because we control the
frequency of a Type I error, we feel confident that when we reject the null
hypothesis, we have made the right decision. This is how the scientific method works; researchers usually
set up an experiment so that the conclusion they would like to make is the
alternative hypothesis. Then if
the null hypothesis (usually the opposite of what they are trying to show) is
rejected, there is some confidence in the conclusion. On the other hand, if we fail to reject the null hypothesis, the most useful
conclusion is that we didn't have a large enough sample size to detect a real
difference. We aren't really
saying we are confident the null hypothesis is a true statement; rather we are
saying it could be true. Because we cannot control the frequency
of this error, it is a less confident statement.
Reading: Section 6.2.
Activity: Practice problems on hypothesis
testing. Introduce
z-test as an example.
Goals: Practice
contradiction reasoning, the basis of the scientific
method.
Skills:
…
Become familiar with
"argument by contradiction".
When researchers are trying to
"prove" a treatment is better or that their hypothesized mean is the
right one, they will usually choose to assume the opposite as the null
hypothesis. For election polls,
they assume the candidate has 50% of the vote, and hope to show that is an
incorrect statement. For showing
that a local population differs from, say, a national population, they will
typically assume the national average applies to the local population, again
with the hope of rejecting that assumption. In all cases, we formulate the hypotheses
before collecting data; therefore, you will never see a
sample average in either a null or alternative
hypothesis.
…
Understand why we
reject the null hypothesis for small p-values. The p-value is the
probability of seeing a sample result "worse" than the one we
actually saw. In this sense,
"worse" means even more evidence against the null hypothesis; more
evidence favoring the alternative hypothesis. If this probability is small, it means either we have
observed a rare event, or that we have made an incorrect assumption, namely the
null hypothesis. Statisticians and
practitioners have agreed that 5% is a reasonable cutoff between a result that
contradicts the null hypothesis and a result that could be argued to be in
agreement with the null hypothesis.
Thus, we reject the null hypothesis only when the p-value is a small
enough number.
Reading: Section 6.2.
Activity: Testing Simulation. In this experiment, you will work in
pairs and generate data for your partner to analyze. Your partner will come up with a conclusion (either reject
the null hypothesis or fail to reject the null hypothesis) and you will let
them know if they made the right decision or not. Keep careful track of the success rates.
For each of these simulations, let the null hypothesis mean be 20,
n = 10, and σ = 5. You will let μ change for each replication.
1) Without your
partner knowing, choose either 16, 18, 20, 22, or 24 for μ. Then
use your calculator and generate 10 observations. Use MATH PRB randNorm( M, 5, 10 ) -> L1 where M is the value of
μ you chose for this replication. Clear the screen (so your partner can't
see what you did) and give them the calculator. They will perform a hypothesis test using the .05
significance level and tell you their decision.
2) Repeat step 1 until you have
each done at least 10 hypothesis tests; it is not necessary to have each value
of μ exactly twice, but try to do each one at least
once. Do μ = 20 at least twice each. (We need more cases for 20 because we're using a small
significance level.)
3) Keep track of the results you
got (number of successful decisions and number of unsuccessful decisions) and
report them to me so we can all see the combined results.
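If you would like many more replications than we can generate by hand, this sketch automates the whole experiment in Python (μ0 = 20, σ = 5, n = 10, α = .05 as above) and tallies the rejection rate at each true mean; those rates trace out the power curve mentioned in the goals below.

    from math import sqrt
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng()
    mu0, sigma, n, alpha, reps = 20, 5, 10, 0.05, 10000

    for mu in (16, 18, 20, 22, 24):
        samples = rng.normal(mu, sigma, size=(reps, n))
        z = (samples.mean(axis=1) - mu0) / (sigma / sqrt(n))
        p_values = 2 * norm.cdf(-np.abs(z))      # two-tailed z test
        # at mu = 20 the rate should be near alpha; elsewhere it is the power
        print(mu, "rejection rate:", (p_values < alpha).mean())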
Goals: Interpret
significance level. Observe the
effects of different values of the population mean. Recognize limitations to inference. Realize the potential abuses of
hypothesis tests.
Skills:
…
Interpret
significance level.
Our value for rejecting, usually .05, is the
percentage of the time that we falsely reject a true null hypothesis. It does not measure whether we had a
random sample; it does not measure whether we have bias in our sample. It only measures whether random data could look like the
observed data.
…
Understand how the
chance of rejecting the null hypothesis changes when the population mean is
different than the hypothesized value.
When the population
mean is not the hypothesized value, we expect to reject the null
hypothesis more often. This is
reasonable, because rejecting a false null hypothesis is a correct
decision. Likewise, when the null
hypothesis is in fact true, we hope to seldom decide to reject. If we have generated enough
replications in class, we should see a power curve emerge that tells us how
effective our test is for various values of the population
mean.
…
Know the limitations
to confidence intervals and hypothesis tests. Section 6.3 has
examples of when our inference techniques are inappropriate. The main points to watch for are
non-random samples, misinterpreting what "rejecting the null
hypothesis" means, and misunderstanding what error the margin of error is
measuring. Be sure to read the
examples in Section 6.3 carefully as I will not go over them in detail in
class.
Reading: Section 6.3.
Activity: Gosset Simulation. Homework 8
due.
Take samples of size 5 from a
normal distribution. Use s instead of σ in the
standard 95% confidence z-interval. Repeat 100 times to see if the true
coverage is 95%. (My
program GOSSET accomplishes this.) We will pool our results to see how close we are to
95%. A century ago, Gosset noticed
this phenomenon and guessed what the true distribution should be. A few years later Sir R. A. Fisher
proved that Gosset's guess was correct, and the t distribution was accepted by the statistical
community. Gosset was unable to
publish his results under his own name (to protect trade secrets), so he used
the pseudonym "Student".
You will therefore sometimes see the t distribution referred to as "Student's
t distribution".
While we will use the TI-83 to calculate confidence intervals, it
will be helpful
to know the formulas in addition.
All of the popular confidence intervals are based on adding and
subtracting a margin of error to a
point estimate. This estimate is
almost always an average, although in the case of proportions it is not
immediately clear that it is an average.
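The GOSSET program lives on the calculator, but here is a rough off-calculator sketch of the same experiment in Python (the population mean 65 and standard deviation 5 are arbitrary choices of mine); the point is only that the coverage lands noticeably below 95% when s stands in for σ at n = 5.

    from math import sqrt
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng()
    n, reps, mu, sigma = 5, 10000, 65, 5
    z_star = norm.ppf(0.975)              # 1.96 for 95% confidence

    samples = rng.normal(mu, sigma, size=(reps, n))
    xbar = samples.mean(axis=1)
    s = samples.std(axis=1, ddof=1)       # sample sd used in place of sigma
    moe = z_star * s / sqrt(n)
    covered = (xbar - moe <= mu) & (mu <= xbar + moe)
    print("coverage:", covered.mean())    # noticeably below 0.95 when n = 5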
Goals: Introduce
t-test. Understand how the z-test is inappropriate in most small sample
situations.
Skills:
…
Know why
using the t-test or the t-interval when σ
is unknown is appropriate.
When we use s instead of σ and do not
use the correct t distribution, we
find that our confidence intervals are too narrow, and our hypothesis tests
reject H0 too often.
…
Realize that the
larger the sample size, the less serious the problem. When we have
larger sample sizes, say 15 to 20, we notice that the simulated success rates
are much closer to the theoretical.
Thus the issue of t vs z is a moot point for large samples.
Reading: Section 7.1.
Activity: Matched Pairs vs 2-Sample.
Matched Pairs problems are really one sample datasets disguised as two sample
datasets because two measurements on the same subject are taken. Sometimes "subject" is a
person; other times it is less recognizable, such as a year. The key issue is that two measurements
have been taken that are related to one another. One quick way to tell if you have a two sample problem is
whether the lists are of different lengths. Obviously if the lists are of different lengths, they are
not paired together. Naturally the
tricky situation is when the lists are of the same length, which occurs often
when researchers assign the same number of subjects to each of treatment and
control groups.
Once you realize that a sample is a matched pairs data set and that
the difference in the two measurements is the important fact, the
analysis proceeds just like one sample problems, but you use the list of
differences. In this respect,
there is nothing new about the matched pairs situation.
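A minimal sketch of the "nothing new" point: take the differences, then run an ordinary one-sample t test on them. The pre-test and post-test scores below are made up for illustration.

    import numpy as np
    from scipy.stats import ttest_1samp

    # hypothetical pre- and post-test scores for the same 6 subjects
    pre  = np.array([12, 15, 11, 14, 13, 16])
    post = np.array([14, 16, 13, 15, 15, 17])

    diffs = post - pre                       # the one list that matters
    t_stat, p_value = ttest_1samp(diffs, 0)  # H0: mean difference is 0
    print(t_stat, p_value)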
Goals: Recognize
when matched-pairs applies.
Skills:
…
Detect situations
where the matched pairs t-test
is appropriate. The nature of the matched pairs is that each value of
one of the variables is associated with a value of the other variable. The most common example is a repeated
measurement on a single individual, like a pre-test and a post-test. Other situations are natural pairs,
like a married couple, or twins.
In all cases, the variable we are really interested in is the difference in the two scores or
measurements. This single
difference then makes the matched pairs test a one-variable
t-test.
Reading: Section 7.2.
Activity: Finish 2-sample work.
Goals: Complete
2-sample t-test.
Skills:
…
Know the typical null
hypothesis for 2-sample hypothesis tests.
The typical null hypothesis
for 2-sample problems, both matched and independent samples, is that of
"no difference". For the
matched pairs, we say H0: μ = 0 (where μ is the mean of the differences), and for
the 2 independent samples we say H0: μ1 = μ2.
As usual, the null hypothesis is an equality statement, and the
alternative is the statement the researcher typically wants to end up
concluding. In both 2-sample
procedures, we interpret confidence intervals as ranges for the
difference in means, and hypothesis tests as whether the
observed difference in means is far from zero.
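For contrast with the matched pairs case, here is a sketch of the independent-samples version with made-up data; setting equal_var=False gives the unpooled test, which corresponds to the calculator's Pooled: No choice.

    import numpy as np
    from scipy.stats import ttest_ind

    group1 = np.array([23, 19, 25, 22, 24, 20])  # hypothetical treatment scores
    group2 = np.array([18, 21, 17, 20, 19, 16])  # hypothetical control scores

    # two-sample t test of H0: mu1 = mu2, without pooling the variances
    t_stat, p_value = ttest_ind(group1, group2, equal_var=False)
    print(t_stat, p_value)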
Reading: Section 8.1.
Activity: Proportions:
What are the true batting averages of baseball players? Do we believe results from a few
games? A season? A career? We can use the binomial distribution as a model for getting
hits in baseball, and examine some data to estimate the true hitting ability of
some players. Keep in mind as we
do this the four assumptions of the binomial model, and whether they are truly
justifiable.
For a typical baseball player, we can look at confidence intervals for the true
percentage of hits he gets. Using
our results from linear combinations (Day 13), we can develop the
two sample proportions formulas. On the calculator, the command is STAT
TEST 2-PropZInt.
Technical note: the Plus 4 Method
will give more appropriate confidence intervals. As this method is extraordinarily easy to use (add 2 to the
numerator, and 4 to the denominator), I recommend you always use it when
constructing confidence intervals for proportions. For two sample problems, divide the 2 and 4 evenly between
the two samples; that is, add 1 to each numerator and 2 to each
denominator. Furthermore, the Plus
4 Method seems to work even for very small sample sizes, which is not the
advice generally given by textbooks for the large sample approximation. The Plus 4 Method advises that samples
as small as 10 will have fairly reliable results; the large sample theory
requires 5 to 10 cases in each of
the failure and success groups.
Thus, at least 20 cases are
required, and that is only when p
is close to 50 %.
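Here is a short sketch of the Plus 4 recipe exactly as described above, for both the one-sample and two-sample cases; the counts in the example calls are invented.

    from math import sqrt
    from scipy.stats import norm

    def plus4_one_sample(x, n, conf=0.95):
        """Plus 4 CI for one proportion: add 2 successes and 2 failures."""
        p = (x + 2) / (n + 4)
        z_star = norm.ppf(1 - (1 - conf) / 2)
        moe = z_star * sqrt(p * (1 - p) / (n + 4))
        return p - moe, p + moe

    def plus4_two_sample(x1, n1, x2, n2, conf=0.95):
        """Two-sample version: add 1 success and 2 trials to each sample."""
        p1 = (x1 + 1) / (n1 + 2)
        p2 = (x2 + 1) / (n2 + 2)
        z_star = norm.ppf(1 - (1 - conf) / 2)
        moe = z_star * sqrt(p1 * (1 - p1) / (n1 + 2) + p2 * (1 - p2) / (n2 + 2))
        return (p1 - p2) - moe, (p1 - p2) + moe

    print(plus4_one_sample(11, 30))               # hypothetical single sample
    print(plus4_two_sample(40, 120, 30, 115))     # hypothetical two samples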
Homework 9 due.
Goals: Introduce
proportions.
Skills:
…
Detect situations
where proportions z-test is correct.
We have several conditions
that are necessary for using proportions.
We must have situations where only two outcomes are possible, such as yes/no,
success/failure,
live/die, Rep/Dem, etc. We must
have independence between trials, which is typically simple to justify; each
successive measurement has nothing to do with the previous one. We must have a constant probability of
success from trial to trial. We
call this value p. And finally we must have a fixed number
of trials in mind beforehand; in contrast, some experiments continue
until a certain number of successes has
occurred.
…
Know the conditions
when the normal approximation is appropriate. In order to use the
normal approximation for proportions, we must have a large enough sample
size. The typical rule of thumb is
to make sure there are at least 5 successes and at least 5 failures in the
sample. For example, in a sample
of voters, there must be at least 5 Republicans and at least 5 Democrats, if we
are estimating the proportion or percentage of Democrats in our
population. (Recall the m&m's
example: when you each had fewer than 5 blue or green m&m's, I made you
take more until you had at least 5.)
…
Know the Plus 4
Method. A recent (1998) result from statistical research suggested
that the typical normal theory failed mysteriously in certain unpredictable
situations. Those researchers
found a convenient "fix": pretend there are 4 additional
observations, 2 successes and 2 failures.
By adding these pretend cases to our real cases, the resulting
confidence intervals almost magically capture the true parameter the stated
percentage of the time. Because
this "fix" is so simple, it is the recommended approach in
all confidence
interval problems. Hypothesis testing procedures remain
unchanged.
Reading: Section 8.2.
Activity: 2-Sample
Proportions
Goals: 2-Sample
proportions.
Skills:
…
Detect situations
where the 2-proportion z-test
is correct. Description.
Reading: Chapters 6 to 8.
Activity: Review
statistical inference.
Goals: Conclude
course topics. Know
everything.
Skills:
…
Be able to correctly
choose the technique from among the z-test, the t-test,
the matched pairs t-test,
the 2 sample t-test, and z-tests for proportions. Description.
Reading: Chapters 6 to 8.
Activity: Presentation 3. Statistical Inference (Chapters 6 to
8). Homework 10
due.
Make a claim, a statistical hypothesis, and test it. Gather appropriate data to test your claim. Discuss and justify any assumptions you
made. Explain why your test is the
appropriate technique.
Reading: Chapters 6 to 8.
Activity: Exam 3. This last exam covers the z- and t- tests and intervals in Chapters 6 and 7,
and the z tests and intervals for proportions in Chapter
8. Some of the questions will be
multiple choice. Others will
require you to show your worked out solution. Section reviews are an excellent source for studying for the
exams. Don't forget to review your
class notes and these on-line notes.
Managed by: Chris Edwards
edwards at uwosh dot edu
Last updated December 10, 2006