Day By Day
Notes for MATH 301
Fall 2006
Activity: Go over syllabus. Take roll. Overview examples: Randomness - coin example. Gilbert trial. Election polls. Spam filters.
Creating random samples.
The text is
remiss in telling us how to actually select random samples in practice. Many texts fail in this regard, so to
fill in this blank, we will use three methods of sampling today: dice, a table
of random digits, and our calculator.
To make the problem feasible, we will only use a population of size
6. (I know this is unrealistic in
practice, but the point today is to see how randomness works, and trust that
hopefully the results extend to larger problems.) Pretend that the items in our population (perhaps they are
people) are labeled 1 through 6.
For each of our methods, you will have to decide in your group what to
do with "ties". Keep in
mind the goal of simple random sampling: at each stage, each remaining item has
an equal chance to be the next item selected.
By rolling dice, generate a sample of three people. (Let the number on the die correspond to one of the
items.) Repeat 20 times, giving 20
samples of size 3.
Using the table of random digits, starting at any haphazard location, select
three people. (Let the random
digit correspond to one of the items.)
Repeat 20 times, giving 20 more samples of size 3.
Using your calculator, select three people. The TI-83 command MATH
PRB randInt(2,4,5) will produce 5 numbers between 2 and 4, inclusive,
for example. (If you leave off the
third number, only one value will be generated.) If your calculator has a rand
function only, you can achieve the same result as the TI-83 MATH PRB randInt(2,4) with
int(3*rand)+2. Repeat 20 times, giving 20 more samples of
size 3.
Your group should have drawn 60 samples at the end. Keep careful track of which samples you selected; record
your results in order, as 125 or 256, for example. (125 would mean items 1, 2, and 5 were selected.) We will pool the results of
everyone's work
together on the board.
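If you also have a computer handy, here is a rough Python sketch of the same sampling exercise (my illustration, not a course requirement; the names in it are mine):

  import random

  population = [1, 2, 3, 4, 5, 6]            # our six labeled items
  for _ in range(20):                        # 20 samples of size 3
      # random.sample draws without replacement, so "ties" are handled:
      # each remaining item stays equally likely to be the next one chosen
      sample = sorted(random.sample(population, 3))
      print("".join(str(item) for item in sample))   # e.g. 125 or 256

Notice the without-replacement draw is doing exactly what your tie-breaking rule does with the dice.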
Goals: Review
course objectives: collect data, summarize information, model with probability,
make inferences.
Gain practice taking random samples.
Understand what a simple random sample is. Become familiar
with randInt(. Accept that the calculator's output is
effectively random.
Skills:
…
Know the definition
of a Simple Random Sample (SRS). Simple Random Samples can be defined in two ways:
1) An SRS is a sample where, at
each stage, each item has an equal chance to be the next item selected.
2) A scheme where every possible
sample has an equal chance to be the
sample results in an SRS.
…
Select an SRS from a
list of items. The TI-83 command randInt( will select numbered items from a list randomly. If a number selected is already in the
list, ignore that number and get a new one. Remember, as long as each remaining item is equally likely to be chosen as the next item,
you have drawn an SRS.
…
Understand the real
world uses of SRS.
In practice, simple random samples are not that
common. It is just too impractical
(or impossible) to have a list of the entire population available. However, the idea of simple random
sampling is essentially the foundation for all the other types of
sampling. In that sense then it is very
common.
Reading: Sections 1.1 to
1.6.
Activity: Dance Fever example.
Use the "Arizona Temps" dataset to
calculate means, standard deviations, the 5-number summaries. To calculate our summary statistics
with the TI-83, we will use STAT CALC 1-Var Stats (to use List 1) or STAT CALC 1-Var Stats L2 for List 2, for example. There are two screens of output; we will be mostly concerned
with the mean x̄, the standard deviation Sx, and the five-number summary on screen two.
Answer these questions:
1) Are high and low temperatures
distributed the same way, other than the obvious fact that highs are higher
than lows?
2) How does a single case affect the
calculator's routines? (What if we
had had an outlier?)
3) What information does
the 5-number
summary disguise?
Now, create the following lists:
1) A list of 10 numbers that has
only one number below the mean.
2) A list of 10 numbers that has the
standard deviation greater than the mean.
3) A list of 10 numbers that has a
standard deviation of zero.
For your fourth list start with any 21 numbers. Find a number N
such that 14 of the numbers in your list are within N of the average.
For example, pick a number N
(say 4), calculate the average plus 4, the average minus 4, and count how many
numbers in your list are between those two values. If the count is less than 14, try a larger number
for N (bigger than 4). If the count is more than 14, try a smaller number
for N (smaller than 4).
Finally, compare the standard deviation to the Interquartile Range (IQR = Q3 -
Q1).
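If you would like to check your fourth-list work with a computer, here is a minimal Python sketch (the 21 numbers are made up; substitute your own):

  data = [3, 5, 8, 2, 9, 11, 4, 7, 6, 10, 5, 8, 3, 12, 6, 9, 7, 4, 8, 5, 10]
  n = len(data)
  mean = sum(data) / n
  sd = (sum((x - mean) ** 2 for x in data) / (n - 1)) ** 0.5   # sample sd, like Sx

  def count_within(N):     # how many values fall between mean - N and mean + N
      return sum(1 for x in data if mean - N <= x <= mean + N)

  N = 4.0
  while count_within(N) < 14:    # widen N until 14 of the 21 are inside
      N += 0.1                   # (shrink by hand instead if you start too wide)

  s = sorted(data)
  q1, q3 = s[5], s[15]           # rough quartiles for 21 sorted values
  print(N, sd, q3 - q1)          # compare your N and the sd to the IQR

The point of the comparison is the same as with the calculator: the standard deviation and the IQR are both spread measures, but they are not the same number.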
Goals: Compare
numerical measures of center and spread.
Use technology to summarize data with numerical measures. Interpret standard deviation as a
measure of spread.
Skills:
…
Understand the effect
of outliers on the mean.
The mean (or average) is unduly influenced by outlying
(unusual) observations. Therefore,
knowing when your distribution is skewed is
helpful.
…
Understand the effect
of outliers on the median. The median is almost completely
unaffected by outliers. For
technical reasons, though, the median is not as common in scientific
applications as the mean.
…
Use the TI-83 to
calculate summary statistics.
Calculating may be as simple as entering numbers into
your calculator and pressing some buttons: STAT CALC 1-Var Stats. Or, if
you are doing some things by hand, you may have to organize information the
correct way, such as listing the numbers from low to high. Please get used to using the
statistical features of your calculator to produce the means, standard
deviations, etc. While I know you
can calculate the mean by simply adding up all the numbers and dividing by the
sample size, you will not be in the habit of using the full features of your
machine, and later on you will be "missing the
boat".
…
Understand standard
deviation. At first, standard deviation will seem foreign to you,
but I believe that it will make more sense the more you become familiar with
it. In its simplest terms, the
standard deviation is a non-negative number that measures how "wide" a
dataset is. One common
interpretation is that the range of a dataset is about 4 standard
deviations. Another interpretation
is that the standard deviation is roughly ¾ times the IQR; that is, the
standard deviation is a bit smaller than the IQR. Eventually we will use the standard deviation in our
calculations for statistical inference; until then, this measure is just
another summary statistic, and getting used to this number is your goal. The normal curve of Chapter 6 will
further help us understand standard deviation.
Reading: Sections 1.7 to 1.9 and 8.3
(excluding normal quantile-quantile plots).
Activity: Use the "Arizona Temps" dataset to practice creating
the histograms, stemplots, boxplots, and quantile plots for several lists. Compare and interpret the graphs. Identify shape, center, and spread.
Compare these measures with the corresponding numerical measures you calculated
on Day 2. Notice that the boxplots
and numerical measures cannot describe shape very well. The histograms are hard to use to
compare two lists. The stem and
leaf is difficult to modify.
Useful commands for the TI-83:
STAT
EDIT (use one of the lists to enter data,
L1 for example; the other L's can be used too)
2nd
STATPLOT 1 On (Use this screen to
designate the plot settings. You
can have up to three plots on the screen at once. For now we will only use one at a time.)
ZOOM
9 This command centers the window around
your data.
PRGM
EXEC QUANTILE ENTER This is a program I wrote;
it plots the sorted data and "stacks" them up. It is essentially a quantile plot.
Using the plots now instead of the summary statistics, answer these questions
again:
1) Are high and low temperatures
distributed the same way, other than the obvious fact that highs are higher
than lows?
2) How does a single case affect
the calculator's routines? (What
if we had had an outlier?)
3) What information does the
5-number summary disguise?
Goals: Be able
to use the calculator to make a histogram, boxplot, or a quantile plot. Be able to make a stemplot by
hand.
Skills:
…
Summarize data into a
frequency table. The easiest way to make a frequency table is
to TRACE the boxes in a histogram and record the classes and
counts. You can control
the size and
number of the classes with Xscl
and Xmin
in the WINDOW menu. The decision as to
how many classes to create is arbitrary; there isn't a "right"
answer. One popular suggestion is
try the square root of the number of data values. For example, if there are 25 data points, use 5
intervals. If there are 50 data
points, try 7 intervals. This is a
rough rule; you should experiment with it. The TI-83 has a rule for doing this; I do not know what
its rule is. You should
experiment by changing the interval width and see what happens to the
diagram.
…
Use the TI-83 to
create an appropriate histogram, boxplot, or quantile plot. STAT PLOT is our main tool for
viewing distributions of data.
Histograms are common displays, but have flaws; the choice of
class width
is troubling as it is not unique.
The quantile plot is more reliable, but less common. For interpretation purposes, remember
that in a histogram tall boxes represent places with lots of data, while in a
quantile plot those same high-density data places are
steep.
…
Create a stemplot by
hand. The stemplot is a convenient manual display; it is
most useful for small datasets, but not all datasets make good stemplots. Choosing the "stem" and
"leaves" to make reasonable displays will require some practice. Some notes for proper choice of stems:
if you have many empty rows, you have too many stems. Move one column to the left and try again. If you have too few rows (all the data
is on just one or two stems) you have too few stems. Move to the right one digit and try again. Some datasets will not give good
pictures for any choice of stem, and some benefit from splitting or rounding
(see the example in the text).
…
Describe shape,
center, and spread.
From each of our graphs, you should be able to make
general statements about the shape, center, and spread of the distribution of
the variable being explored.
…
Compare several lists
of numbers using boxplots.
For two lists, the best simple approach is the
back-to-back stemplot. For more
than two lists, I suggest trying boxplots, side-by-side, or stacked. At a glance, then, you can assess which
lists have typically larger values or more spread out values,
etc.
…
Understand
boxplots. You should know that the boxplots for some lists don't
tell the interesting part of those lists.
For example, boxplots do not
describe shape very well (apart from rough symmetry); you can only see where
the quartiles are. Alternatively,
you should know that the boxplot can
be a very good first quick look at a dataset.
Reading: Sections 2.1 to
2.3.
Activity: Sample Spaces. Venn Diagrams. Coins, Dice. Pascal's Triangle.
Using either complete sample spaces (theory) or simulation, find (or
estimate) these chances:
1) Roll two dice, one colored, one
white. Find the chance of the
colored die being less than the white die.
2) Roll three dice and find the
chance that the largest of the three dice is a 6. (Ignore multiple values; that is, the largest value when 6,
6, 4 is rolled is a 6.)
3) Roll three dice and find the
chance of getting a sum of less than 8.
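For those who want to check their dice work by computer, a simulation sketch in Python (my own illustration; the trial count is arbitrary):

  import random

  trials = 100000
  c1 = c2 = c3 = 0
  for _ in range(trials):
      colored, white = random.randint(1, 6), random.randint(1, 6)
      if colored < white:                    # question 1
          c1 += 1
      three = [random.randint(1, 6) for _ in range(3)]
      if max(three) == 6:                    # question 2
          c2 += 1
      if sum(three) < 8:                     # question 3
          c3 += 1
  print(c1 / trials, c2 / trials, c3 / trials)

Theory for comparison: 15/36 for question 1, 1 - (5/6)^3 = 91/216 for question 2, and 35/216 for question 3 (count the qualifying outcomes among the 216).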
Goals: Create
sample spaces. Use Venn diagrams
to organize sample spaces. Use
simulation to estimate probabilities.
Skills:
…
Know the definitions
of Sample Space, Event, Outcome, etc.
The basic language of
probability will be used throughout the course, so it is important for you to
be conversant in it.
…
Be able to use a Venn
diagram. The Venn diagram is a way of partitioning the sample
space into mutually exclusive regions.
It can be useful for simply organizing sets, or sometimes is quite
useful in understanding proofs (as we will see in the inclusion/exclusion
formula on Day 6.)
…
List simple sample
spaces. Flipping coins and rolling dice are common events to
us, and listing the possible outcomes lets us explore probability
distributions. We will not delve
too deeply into probability rules; rather, we are more interested in the ideas
of probability and I think the best way to accomplish this is by
example.
…
Simulation can be
used to estimate probabilities.
If the number of repetitions of an experiment is
large, then the resulting observed frequency of success can be used as an
estimate of the true unknown probability of success. However, a "large" enough number of repetitions
may be more than we can reasonably perform. For example, for problem 1 today, a sample of 100 will give
results between 32/100 and 51/100 (.32 to .51) 95% of the time. That may not be good enough for our
purposes. Even with 500, the range
is 187/500 to 230/500 (.374 to .460). Eventually the answers will converge to a useful percentage;
the question is how soon that will occur.
We will have answers to that question after Chapter
?.
…
Recognize the
usefulness and properties of Pascal's Triangle. Pascal's
Triangle is old (known to the Persians and the Chinese in the 11th
century) yet is still quite useful.
There are just two rules to construct Pascal's Triangle: each row begins and ends with a 1, and
each entry is the sum of the two entries above it to the left and the
right. From such a simple
construction, though, we encounter many relationships: the combination formula,
the triangular numbers, the Fibonacci numbers, the powers of 2, among
others. Our chief interest is in
the combination formula and its relationship to the binomial distribution.
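Since the two construction rules are so simple, they translate directly into a few lines of code. A Python sketch (10 rows is an arbitrary choice):

  rows = [[1]]
  for _ in range(9):
      prev = rows[-1]
      # each row begins and ends with 1; each interior entry is the sum
      # of the two entries above it to the left and the right
      rows.append([1] + [prev[i] + prev[i + 1] for i in range(len(prev) - 1)] + [1])
  for row in rows:
      print(row)

Row n, entry r of the output is the combination nCr, and each row sums to a power of 2, as mentioned above.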
Reading: Section
2.3.
Activity: Presentation 1.
Summaries (Chapters 1 and 8.3)
Gather 3 to 5 variables on at least 20 subjects; the source is irrelevant, but
knowing the data will help you explain its meaning to us. Be sure to have at least one numerical
and at least one categorical variable.
Demonstrate that you can summarize data graphically and numerically.
Combinations vs Permutations.
Goals: Continue
exploring Pascal's Triangle and how it relates to counting (permutations and
combinations).
Skills:
…
Know the Permutation
and Combination formulas.
When counting the number of ways of choosing items or
ordering items, our formulas are nCr and nPr, respectively.
You will need to work enough problems so that you know when to use each
of them. One way to keep them
straight is to think of a Combination as a Committee of people,
and a Permutation as a Photograph of that committee. (There are more permutations than
combinations for a particular choice of n and r.) Also don't forget our trick of listing
the complete sample space, but only for small
problems!
Reading: Sections 2.4 and
2.5.
Activity: Finish Combinations and
Permutations.
Arrange the letters in FREDA.
Arrange the letters in FREED.
Arrange the letters in ERRORS.
Arrange the letters in SETTER.
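The pattern behind all four of these is n! divided by the factorial of each repeated letter's count. A Python sketch that applies that formula (answers included so you can check your hand work):

  from math import factorial
  from collections import Counter

  def arrangements(word):
      total = factorial(len(word))
      for count in Counter(word).values():
          total //= factorial(count)     # divide out indistinguishable repeats
      return total

  for w in ["FREDA", "FREED", "ERRORS", "SETTER"]:
      print(w, arrangements(w))          # 120, 60, 120, 180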
Demonstrate the Inclusion/Exclusion formula with a 3 set Venn diagram.
Use Venn diagrams to "prove":
A = (A ∩ B) ∪ (A ∩ B′)
(A ∪ B)′ = A′ ∩ B′
Basic probability rules:
Probability is a number between 0 and 1, inclusive.
Mutually Exclusive events add when finding the union.
Mutually Exclusive and exhaustive events add to one.
Goals: Know the
rules of probability, including addition, complement, and
inclusion/exclusion. The
multiplication rule will be covered on Day 7.
Skills:
…
Understand the
probability rules.
Being adept at probability begins with knowing
definitions and knowing basic formulas.
For example, you can't prove things about mutually exclusive sets if you
can't recite the definition of mutually exclusive. Memorize at first; later it becomes "learned", not
"memorized".
…
Relate the rules to
sample spaces. Remember that the rules we're discussing are all based
on counting elements in sample spaces.
Sometimes it is helpful to have a few "standard" examples in
mind so conjectures or steps in reasoning can be verified. For example, the inclusion exclusion
principle is shown well with the two-dice problem "what is the chance of
at least one six?". Ignoring
the intersection makes the probability too large.
…
Realize how the Venn
diagram can help verify results. The inclusion/exclusion formula is a good example
where a Venn diagram can help with the proof or development. Other examples are DeMorgan's
Laws. For Bayes'
formula, on Day 7,
the Venn diagram will also be useful.
Reading: Sections 2.6 to
2.8.
Activity: Constructing
probability trees. Demonstrating
Bayes' with the rare disease problem.
Consider a card trick where two cards are drawn sequentially off the top of a
shuffled deck. (There are 52 cards
in a deck, 4 suits of 13 ranks.)
We want to calculate the chance of getting hearts on the first draw, on
the second draw, and on both draws.
We will organize our thoughts into a tree diagram, much like water
flowing in a pipe. On each branch,
the label will be the probability of taking that branch; thus at each node, the
exiting probabilities (conditional probabilities) add to one.
On the far right of the tree, we will have the intersection events. Their probability is found by
multiplying.
Calculate the chances of:
1) Drawing a heart on the first
card.
2) Drawing a heart on the second
card.
3) Drawing at least one heart.
4) Drawing two hearts.
5) Drawing a heart on the second
draw given that a heart was drawn first.
6) Drawing a heart on the first
draw given that a heart was drawn second.
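Before we move on, here is one way to check all six answers by brute force, a Python sketch that enumerates every ordered pair of cards (my illustration, not part of the text):

  deck = ["H"] * 13 + ["N"] * 39     # 13 hearts, 39 non-hearts
  pairs = [(a, b) for i, a in enumerate(deck)
                  for j, b in enumerate(deck) if i != j]   # 52 * 51 ordered draws
  total = len(pairs)
  first = sum(1 for a, b in pairs if a == "H") / total     # 13/52
  second = sum(1 for a, b in pairs if b == "H") / total    # also 13/52
  both = sum(1 for a, b in pairs if a == b == "H") / total # (13/52)(12/51)
  print(first, second, both)
  print(both / first)    # question 5: P(heart 2nd | heart 1st) = 12/51
  print(both / second)   # question 6: P(heart 1st | heart 2nd), the Bayes reversal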
Now we will do this work for the rare disease problem (Problem 2.128).
Goals: Be able
to express probability calculations as tree diagrams. Be able to reverse the events in a probability tree, which
is what Bayes' formula is about.
Skills:
…
Know how to use the
multiplication rule in a probability tree. Each branch of a
probability tree is labeled with the conditional probability for
that branch.
To calculate
the joint probability of a series of branches, we multiply the conditional
probabilities together. Note that
at each branching in a tree, the (conditional) probabilities add to one, and
that overall, the joint probabilities add to one.
…
Recognize conditional
probability in English statements.
Sometimes the key word is
"given". Other times the
conditional phrase has "if".
But sometimes the fact that a statement is conditional is
disguised. For example: "Assuming John buys the insurance, what is the chance
he will come out ahead" is equivalent to "If John buys insurance,
what is the chance he will come out ahead".
…
Be able to use the
conditional probability formula to reverse the events in a probability
tree. The key here is the symmetry of the events in the
conditional probability formula.
We exchange the roles of A and B, and tie them together with our formula
for Pr(A ∩ B).
…
Know the definition
of independence. Independence is a fact about probability, not about
sets. Contrast this to
"disjoint" which is a property of sets. In
particular, independent events are by definition not disjoint.
Independence is important later as an assumption as it allows us to
multiply individual probabilities together without having to worry about
conditional probability.
Reading: Sections 3.1 and
3.2.
Activity: Continue coins and dice. Introduce Random Variables.
We will finish up the problems from Day 4. Also in our tables, we will include random variables.
Answer the following questions:
1) What is the chance of getting a
sum of 8 on two dice?
2) What is the chance of getting a
sum of 10 on two dice?
3) What is the chance of getting a
sum of x on two dice,
where x is between 1 and 13?
4) What is the chance of getting
10 heads on 20 flips of a fair coin?
5) How can you get the TI-83 to
graph a probability histogram?
Derive a pmf and its cdf. Use the
sum on two dice as an example.
Know how to work back and forth from one to the other.
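A Python sketch of the pmf-to-cdf bookkeeping for the two-dice sum (exact fractions, so nothing is hidden by rounding):

  from fractions import Fraction

  pmf = {}
  for a in range(1, 7):
      for b in range(1, 7):
          pmf[a + b] = pmf.get(a + b, 0) + Fraction(1, 36)

  cdf, running = {}, Fraction(0)
  for s in sorted(pmf):         # the cdf accumulates the pmf...
      running += pmf[s]
      cdf[s] = running
  # ...and differencing the cdf recovers the pmf: pmf[s] = cdf[s] - cdf[s - 1]
  print(pmf[8], cdf[8])         # 5/36 and 26/36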
Goals:
Understand
that variables may have values that are not equally likely.
Skills:
…
Understand discrete
random distributions and how to create simple ones. We have listed
sample spaces of equally likely events, like dice and coins. Events can further
be grouped together and assigned values.
These new groups of events may not be equally likely, but as long as the
rules of probability still hold, we have valid probability distributions. Pascal's triangle is one such example,
though you should realize that it applies only to fair coins. We will work with "unfair
coins" (proportions) later, in Chapter 5. Historical note: examining these sampling distributions led
to the discovery of the normal curve in the early 1700's. We will copy their work and
"discover" the normal curve for ourselves too using
dice.
…
Know the definition
of a discrete probability mass function (pmf). If a non-negative
function sums to 1 over some set, then we have a discrete pmf. It is not necessary for the set to be
finite; this means we may need to work with infinite sums. Because each item in the sum is a
probability, it is necessary that
each value is no greater than one.
(Contrast this with the continuous distributions on Day
9.)
…
Know the definition
of a discrete cumulative distribution function (cdf). If a
non-decreasing function begins at 0 from the left and ends at 1 on the right,
and increases only in jumps (it is flat wherever the derivative exists), then we have a discrete
cdf. The key is that discrete
cdf's are stairs: flat spots joined by discrete jumps.
Reading: Section
3.3.
Activity: Presentation 2.
Probability (Chapter 2)
Choose one of the following games and
1) Give us a short history of the
game.
2) Describe how randomness is part
of the game.
3) Using our probability rules,
show us an example using this game.
Games: Risk, Blackjack,
Backgammon, Roulette, Battleship, Poker, Minesweeper, Cribbage
Calculate a pmf and its cdf. Use
the uniform as an example. Know
how to work back and forth from one to the other. Note: calculus
required!
Go over the cdf method of generating random samples. Requires the cdf in a formula that can be inverted.
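A minimal Python sketch of the cdf method (I use the exponential distribution because its cdf, F(x) = 1 - e^(-x), inverts cleanly to x = -ln(1 - u); the uniform works the same way, with an even more trivial inversion):

  import math, random

  def exponential_draw():
      u = random.random()        # a uniform number between 0 and 1, like rand
      return -math.log(1 - u)    # push it through the inverse cdf

  draws = [exponential_draw() for _ in range(1000)]
  print(sum(draws) / len(draws))  # should land near the theoretical mean of 1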
Goals: Introduce
continuous distributions.
Skills:
…
Know the definition
of a probability density function (pdf).
If a non-negative function
integrates to 1 over some interval, then we have a probability density
function. Notice that the function
can certainly be over 1 (contrast to pmf's); the key here is that the
area is one, not the maximum
height.
…
Know the definition
of a continuous cumulative distribution function (cdf). If
a continuous non-decreasing function begins at 0 from the left and ends at 1 on
the right, then we have a continuous cdf.
The key is the continuity.
If a function has only
jumps, it is discrete. If a
function has no jumps, it is
continuous. A function with both
is mixed. An example of a mixed distribution is a
question like "If you are employed, what is your income?" People without a job have no income, so
there is a spike at 0.
…
Realize that these
formulas and functions we are exploring are simply models. What
we are trying to do with these functions is model real world data with simple
curves. We hope to have both
simple equations and close fitting histograms and quantile plots. The best situation is a model that has
parameters that can be "tuned".
By choosing the parameters judiciously we can find curves that
approximate real data.
…
Know how to use the
cdf equation to generate random values from the variable. The
cdf ranges from 0 to 1, and each value represents a percentile. If we produce a uniform
number between 0
and 1, such as rand, then by cross-referencing on the graph, we can
tell which x-value corresponds to that uniform random number. While this can be done visually, it's
not very useful unless the equation can be solved
explicitly.
Reading: Section
3.4.
Activity: Joint
Distributions.
Because Calculus III is not a prerequisite for MATH 301, we will not do double
integrals or partial derivatives.
However, we can do marginal
distributions and explore the discrete problems, as they do not
require multiple
integrals.
To calculate a marginal distribution, we will integrate over the other
variable. So, the marginal
distribution with respect to x is
found by integrating out y. The variable x is treated as a constant.
After we have the marginal, we can use Bayes' formula and find the conditional
distribution. This leads to a
notion of independence just as in probability. This is the key result from this section: If two distributions are independent,
then their joint pdf is the product of their marginals.
We will finish today with an exploration of the discrete cdf. The marginal distributions are found
the same way as the continuous; we just "add" out the other variable,
much like integrating. The cdf,
however, is harder to work with.
In fact, the discrete joint cdf is seldom used, except abstractly in
theorems.
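To make the discrete side concrete, here is a Python sketch with a small made-up joint pmf; it "adds out" each variable to get the marginals and then checks independence:

  joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.15, (1, 1): 0.45}

  marg_x, marg_y = {}, {}
  for (x, y), p in joint.items():          # "add out" the other variable
      marg_x[x] = marg_x.get(x, 0) + p
      marg_y[y] = marg_y.get(y, 0) + p

  # independent exactly when every joint probability equals the product of marginals
  independent = all(abs(joint[(x, y)] - marg_x[x] * marg_y[y]) < 1e-9
                    for (x, y) in joint)
  print(marg_x, marg_y, independent)       # this particular pmf is independent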
Goals: Extend
the notions of pmf, pdf, and cdf to two dimensions.
Skills:
…
Know what the
marginal distributions are and how to derive them from a cdf. The
marginal distributions are the result of "getting rid of" the other
variable. For discrete
distributions, we sum; for continuous distributions, we integrate. One way to visualize this is to imagine
looking sideways at a joint pdf or pmf and "smushing" it together in
one direction, in essence adding it up, to make it only
one-dimensional.
…
Know the independence
definition. Our main conclusion of interest from this section is that
when two distributions are independent, we can get their joint pdf by
multiplying the marginals together. This relationship extends to expected values, so that we can
calculate the joint expected value by splitting the sum or integral into two
one-variable parts and multiplying the results together. If the two variables are dependent,
however, things are much more complicated, and we won't go there in this
course.
…
Understand the
relationship between the cdf and the pdf for continuous
distributions. Warning:
this requires Calc III. To
find the pdf from the cdf, we take partial derivatives. To find the cdf from the pdf, we do a
cumulative double integral. I'm
not expecting you to perform these operations; rather I want you just to know
what operation is required.
…
Understand why the
discrete cdf and discrete pmf are harder to work with in joint distributions
than the continuous cdf and pdf. Due to the stair-step nature of discrete
distributions, the pmf is not as straightforward to calculate. In most cases, there isn't a nice neat
formula for the cdf. The important
thing to know is definitions. The
"jumps" do not represent
the probability at that point.
(Which jump would one use?
There are three to choose from.)
Reading: Sections 4.1 and
4.2.
Activity: Expected Value,
Variance. Using the frequency
option on STAT CALC 1-Var Stats to calculate
μ and σ for a discrete
distribution.
GPA is an EV. Go over an EV
calculation. Then move on to a
function, like x². Finally, use variance as the
function.
The TI-83 will help us calculate EV and Variance for a discrete
distribution. We need to include a
column of weights, integers, which are proportional to the probabilities of the
x-values. Then we use STAT CALC 1-Var
Stats L1, L2 where L1 contains the x-values and L2 contains the weights.
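The same calculation is easy to mimic in Python, using true probabilities instead of integer weights (sketched here with the two-dice sum from Day 8):

  xs = list(range(2, 13))
  ps = [c / 36 for c in [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]]

  ev = sum(x * p for x, p in zip(xs, ps))                # the mean, mu
  var = sum((x - ev) ** 2 * p for x, p in zip(xs, ps))   # E[(X - mu)^2]
  print(ev, var)                                         # 7 and 35/6, about 5.83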
Goals: Define
Expected Value and Variance for distributions.
Skills:
…
Know the definition
of Expected Value.
Expected Value is calculated by weighting each value of
the random variable by its probability and totaling. For discrete variables, this is a sum, and for continuous
variables it is an integral.
…
Realize that the
definition is really the same for both discrete and continuous
distributions. Both definitions involve a summed product. The nature of the summing is the only
difference between them. Once you
realize that integration is simply repeated addition, the similarity of the two
formulas is clear.
…
Know the definition
of Variance. Variance is a particular kind of Expected Value. The Expected Value of a
function of x is simply the average product of that
function of x and
its probabilities. For Variance,
the function of x that we use is
the squared difference from the mean, or Expected
Value.
…
Understand that
Expected Values and Variances are parameters and should have no
x-values.
After summing or integrating,
the variable of summation or integration disappears. This is most easily seen in the Fundamental Theorem of
Calculus where we substitute numbers (the limits of integration) into the
anti-derivative. Explicitly we
note that x equals those limits. Therefore, Expected Value
and Variance are constants.
…
Know how to use the
TI-83 to calculate the mean and variance for a discrete distribution. By
including a variable of weights or frequencies, the TI-83 will
calculate μ and σ for a discrete
distribution. The syntax
is STAT CALC 1-Var
Stats L1, L2, where the
x-values are entered in L1 and the weights (probabilities expressed as integers) are
entered in L2.
Reading: Chapters 1 to
3.
Activity: Exam 1.
This first exam will cover graphical summaries (pictures), numerical summaries
(summary calculations) and probability (including random variables and joint
distributions), but excluding Expected Values and Variances from Day 11
(Chapter 4).
Reading: Sections 4.3 and
5.1 to 5.3.
Day 13
Activity: Simulating data
to see the Linear Combinations formulas in action. Introduce Binomial.
Activity 1
Rules for Linear Combinations. While one could simply memorize
these rules, I think it might be more instructive to simulate some
data and see the rules at work. So, we are going to reproduce some SAT data. Suppose for a recent year, the verbal
SAT had mean 507 with standard deviation 111, math SAT had mean 519 with
standard deviation 115, and the correlation was 0.71. We will "tinker" with these parameters and see how
things change.
To start with, generate some x-values
in L1:
MATH
PRB randNorm( μx, σx, 300
) -> L1.
(Use the values in the problem for μx, σx, μy, σy, and ρ.)
You might think we can use a similar
command to generate some y-values
in L2.
However, this would ignore the correlation in the two variables. To account for this, we must
"borrow" some results from regression. The next two commands will put "errors"
in L3 and y-values
in L2. Trust
me, it works.
MATH
PRB randNorm( 0, σy
* √( 1 - ρ² ), 300 ) -> L3
μy – ρ * σy / σx * ( μx - L1 ) + L3
-> L2
Plot L1 vs
L2 to verify that the data does indeed have a correlation
of ρ.
Calculate the means and standard deviations to see that your simulation
is close to the assumed values: STAT CALC 1-Var Stats L1 and STAT CALC 1-Var Stats L2. (You
can also do STAT CALC LinReg(ax+b) to get
the correlation coefficient.)
Now let's see how the rules work by doing the sum and the difference of
the two "SAT scores": L1 + L2 -> L4 and L1 -
L2 -> L5. Check
to see if these simulations agree with the theoretical results by finding the
means and standard deviations of L4 and L5: STAT CALC 1-Var Stats L4 and STAT CALC 1-Var Stats L5
Now try this again using a different
value for ρ. (In
particular, see what happens when ρ =
0. This is the case for
independence.)
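If you prefer a computer, the whole simulation fits in a few lines of Python with numpy (a sketch of the same regression "borrowing"; 300 cases as above):

  import numpy as np

  mx, sx, my, sy, rho = 507, 111, 519, 115, 0.71
  x = np.random.normal(mx, sx, 300)
  errors = np.random.normal(0, sy * np.sqrt(1 - rho**2), 300)
  y = my + rho * sy / sx * (x - mx) + errors    # y inherits correlation rho with x

  for lst in (x + y, x - y):                    # the sum and the difference
      print(lst.mean(), lst.std(ddof=1))
  # theory: means 507 + 519 and 507 - 519; sd = sqrt(sx^2 + sy^2 ± 2 rho sx sy)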
Activity 2
Coin flipping is a good way to understand randomness, but because most coins
have a probability of heads very close to 50 %, we don't get the true flavor of
the binomial distribution.
Today we
will simulate the flipping of an unfair coin; that is, a binomial process with
probability not equal to 50 %.
Experiment 1: Our unfair
"coin" will be a die, and we will call getting a 6 a success. Roll you die 10 times and record how
many sixes you got. Repeat this
process 10 times each. Your group
should have 40 to 50 trials of 10 die rolls. Pool your results and enter the data into a list on your
calculator. We want to see the
histogram (be sure to make the box width reasonable) and calculate the summary
statistics, in particular the mean and variance. Also produce a quantile plot. Compare the simulated results with theory.
Experiment 2: Your calculator will
generate binomial random variables for you, but it is not as illuminating as
actually producing the raw data yourself.
Still, we can see the way the probability histogram looks (if we
generate enough cases; this is an application of the law of large
numbers). I suggest 100 at a minimum. Again be sure to make your histogram
have an appropriate width. The
command is MATH PRB randBin( n, p, r ), where n is the sample size, p is
the probability of success, and r is the number of times to repeat the experiment.
Activity 3
Using DISTR binompdf( and/or DISTR
binomcdf(, calculate these probabilities:
1) What is the chance of 4 or
fewer successes with n =
10 and p = .4?
2) What is the chance of 3 to 10
successes, inclusive, with n = 20
and p = .7?
3) What is the chance of more than
6 successes with n = 15
and p = .2?
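If you want to verify your calculator answers independently, scipy's binom plays the role of both commands (a sketch, assuming scipy is available; binom.cdf acts like binomcdf(, binom.pmf like binompdf():

  from scipy.stats import binom

  print(binom.cdf(4, 10, 0.4))                            # 1) 4 or fewer
  print(binom.cdf(10, 20, 0.7) - binom.cdf(2, 20, 0.7))   # 2) 3 to 10 inclusive
  print(1 - binom.cdf(6, 15, 0.2))                        # 3) more than 6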
Goals: Finish
Expectation Formulas. Introduce
the important Binomial distribution.
Skills:
…
Know how to combine
Expected Value and Variance for Linear Combinations of Random Variables. We
often are interested in linear functions of two random variables. For instance, we may know the mean and
variance of times for workers traveling to a job site. We may also know the mean and variance
of the length of the job.
What we really want to know is the total length of time the workers
are gone from the main plant.
…
Know the
four assumptions
underlying the binomial model.
We must have:
1) an experiment that has only two
outcomes: success and failure.
2) the trials are independent of
one another.
3) a fixed sample size
n.
4) a constant probability of
success p.
…
Become
familiar with the
TI-83 commands to calculate binomial probabilities. Our TI-83 has two
functions, DISTR binompdf( and DISTR binomcdf(, which allow us to calculate binomial
probabilities. As their names
imply, one is cumulative and the other is marginal. With sufficient memory, one could resort to the actual
formula and use a sequence command to accomplish the same thing. In particular, you can
verify that you are using the commands correctly by
memorizing a small check example.
The command syntax is DISTR binompdf( n, p
), which will give the whole pmf as a list,
or DISTR
binompdf( n,
p, x
), which will give just the probability of x.
For DISTR binomcdf( n, p, x ), we have
the cumulative probability of x or fewer successes.
…
Random values can be
generated using MATH PRB randBin(. The
usual cdf method of generating random values is useless here because we don't
have an explicit invertible formula for the binomial cdf. Thus we rely on other mechanisms. One could generate a series of
Bernoulli trials and add the 0's and 1's together. The easiest way, however, is MATH PRB randBin(
n,
p,
r ), where n is the sample size, p is
the probability of success, and r is the number of values required.
Reading: Sections 5.3 to
5.4.
Activity: Finish
Binomial. Introduce
Hypergeometric.
Our last item of business with the binomial is the mean and variance
calculation. I will go through the
"trick" in class, which is just a few algebra steps, but it might
throw some of you. I suggest
paying attention carefully, and trying to focus on the big picture instead of
the details. Later, knowing
what the purpose of each step is, you can pay attention to the details. This trick will appear in
other distributions,
so it is worth knowing about.
My activity: All hypergeometric problems
can be thought
of as "lottery" problems.
The winning balls are our "successes"; the losing balls are
the "failures". In this
respect, the hypergeometric is like the binomial. However, balls are chosen without replacement, and this is where the two distributions
differ. I will do three types of
examples: quality control, poker,
and a large binomial.
Your activity: Is my program generating hypergeometric data? You will check to see if it
is close to theory.
Warning: this will
be difficult to answer unequivocally, due to the "Law of Small
Numbers". In your groups,
generate a number of observations with the program, using some parameters you
choose, and then decide some devices (graphs, calculations, etc) to compare the
program output with theory. At the
end of the period, we will share notes, so be prepared in your group to
summarize your findings.
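One possible device, sketched in Python with scipy (the parameters here are mine; your group should pick its own, and scipy's hypergeom is a stand-in for my HYPER program):

  import numpy as np
  from scipy.stats import hypergeom

  N, k, n = 50, 10, 12         # population size, successes in population, sample size
  rv = hypergeom(N, k, n)      # note scipy's argument order: M=N, n=k, N=n

  draws = rv.rvs(size=500)     # stand-in for the program's output
  values, counts = np.unique(draws, return_counts=True)
  for v, c in zip(values, counts):
      print(v, c / 500, rv.pmf(v))     # observed frequency vs theoretical pmf
  print(draws.mean(), n * k / N)       # sample mean vs the theoretical mean n k / N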
Goals:
Understand
the Hypergeometric distribution, including generating cases, and the mean and
variance.
Skills:
…
Know the mean and
variance for the binomial.
For each distribution we encounter, we will want to
know the mean and variance. These
become useful later when we examine the Central Limit Theorem. For the binomial, the mean is
n p and the variance is n p ( 1 - p ).
…
Recognize situations
where the hypergeometric distribution is an appropriate model. The
example I think of when imagining the hypergeometric distribution is the
decision to accept or reject a shipment of goods. Another good example is the full house calculation for
poker. In any case, what we are
modeling is a binomial-like situation, with successes and failures. But see the next
item.
…
Know how the binomial
and the hypergeometric are related. The key difference is the
hypergeometric comes from a finite population. If one imagines balls in urns, then the binomial is a
hypergeometric using a huge
urn. (Of course, we're talking
about the limit here.)
…
Know the mean and
variance for the hypergeometric. Because
of the close relationship with the binomial, we can use the corresponding
formulas, with a slight modification.
If we let p = k / N, and if
we use the finite population correction factor ( N - n ) / ( N - 1
), we see the mean and variance
are easy to remember. For the
hypergeometric distribution, the mean is n k / N and the variance is ( N - n ) / ( N - 1 ) n k / N
( 1 - k / N ).
…
Use the program HYPER to generate random values from a
hypergeometric distribution. The program HYPER will generate random data which follows the
hypergeometric distribution. The
program is simple; it creates a cdf and compares it to a random
number MATH
PRB rand until the cdf exceeds MATH PRB rand. That
value is the one reported. In the
program, M is the population
size (N), K is the
number of successes in the population (k), N is the sample size
(n), and R is
the number of random numbers desired.
Reading: Sections 5.5 and 5.6 and 6.1 to
6.3.
Activity: Negative
Binomial and Poisson. Introduce
the TI-83's normal calculations.
I will derive the pmf for the Negative Binomial by a counting
argument. This will be one of
those few times where I think the derivation of a distribution is one you should memorize. If you know how the formula is
developed, you will know the pmf without having to memorize a formula. The essence of this proof is filling in
the slots of a sequence of successes and failures, but requiring the last one
to be a success. It is then just a
matter of using the right binomial coefficient and multiplying by the right
number of p's and ( 1 -
p )'s.
How can we generate random data from this distribution? Will a program similar to HYPER work? Or
does the infinite support worry us?
Can we simply flip coins (with our calculator)? After we struggle with this a bit, I
will share my two programs I wrote.
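One coin-flipping approach, sketched in Python (an illustration of the idea, not a copy of my programs):

  import random
  from math import comb

  p, r = 1 / 6, 3         # chance of success, and how many successes we need

  def trials_until_r():   # flip until the r-th success; report how many flips it took
      count = successes = 0
      while successes < r:
          count += 1
          if random.random() < p:
              successes += 1
      return count

  sims = [trials_until_r() for _ in range(20000)]
  for x in range(r, r + 5):
      theory = comb(x - 1, r - 1) * p**r * (1 - p) ** (x - r)  # last slot is a success
      print(x, theory, sims.count(x) / len(sims))

The theory line is exactly the counting-argument pmf derived above, so the simulated frequencies should track it.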
The Poisson distribution is our last specific discrete model. It is based on the Taylor series
expansion of e^x. Basically, any series you can invent
that adds to a constant can be made into a pmf.
To begin the continuous distributions, we will look at the famous bell-shaped
curve, the normal curve. We have
encountered this before, on Day 13, and we see the shape begin to appear if we
graph the values from Pascal's Triangle.
Because the pdf for the normal is an equation for which there is no
corresponding anti-derivative (in a closed form), we must resort to numerical
integration to calculate areas under the curve. Every statistics textbook has a normal
table for this purpose. You are welcome to use the text's normal table; however, I
think you will find it much, much easier to use the built-in functions in the
TI-83.
The two commands are DISTR normalcdf( and
DISTR
invNorm(. DISTR normalcdf( is used to find the area under the
curve between two x-values, and DISTR
invNorm( is used to find the percentile. The command DISTR normalcdf(
10, 20, 14, 3 ) will give the area
between 10 and 20 using a mean of 14
and a standard deviation of 3. If you leave off the mean and standard
deviation, 0 and 1 will be assumed.
The command DISTR invNorm( .7, 14, 3) will
give the x-value that
has area .7 to the left of it, using a mean of 14 and a standard deviation of 3. Again,
if you leave off the mean and standard deviation, 0 and 1 are assumed.
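For reference, the scipy equivalents of the two commands (a sketch, if you also work on a computer):

  from scipy.stats import norm

  print(norm.cdf(20, 14, 3) - norm.cdf(10, 14, 3))   # DISTR normalcdf( 10, 20, 14, 3 )
  print(norm.ppf(0.7, 14, 3))                        # DISTR invNorm( .7, 14, 3 )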
Goals:
Understand
the Negative Binomial distribution.
Introduce normal curve. Use
TI-83 in place of the standard normal table in the text.
Skills:
…
Relate the Negative
Binomial to the Binomial and the Hypergeometric. The negative
binomial differs from the binomial because the number of trials isn't known
beforehand; we continue the experiment until r successes
occurs. The negative binomial
differs from the hypergeometric because the probability of a success on any
trial, p, is fixed, as in the
binomial.
…
Using the TI-83 to
find areas under the normal curve.
When we have a distribution
that can be approximated with the bell-shaped normal curve, we can make
accurate statements about frequencies and percentages by knowing just the mean
and the standard deviation of the data.
Our TI-83 has 2 functions, DISTR normalcdf( and DISTR invNorm(
which allow us to calculate these percentages more easily and more accurately
than the table in the text. We use
DISTR
normalcdf( when we want the percentage as
an answer and we use DISTR invNorm( when we already
know the percentage but not the value that gives that percentage.
Reading: Sections 6.3 and
6.4.
Activity: Practice normal calculations.
1) Suppose SAT scores are
distributed normally with mean 800 and standard deviation 100. Estimate the chance that a randomly
chosen score will be above 720.
Estimate
the chance that a randomly chosen score will be between 800 and 900. The top 20% of scores are above what
number? (This is called
the 80th
percentile.)
2) Find the Interquartile Range
(IQR) for the standard normal (mean 0, standard deviation 1). Compare this to the standard deviation
of 1.
3) Women aged 20 to 29 have
normally distributed heights with mean 64 and standard deviation 2.7. Men have mean 69.3 with standard
deviation 2.8. What percent of
women are taller than the average man, and what percentage of men are taller
than the average woman?
4) Pretend we are manufacturing
fruit snacks, and that the average weight in a package is .92 ounces with
standard deviation 0.05. What
should we label the net weight on the package so that only 5 % of packages are
"underweight"?
5) Suppose that your average
commute time to work is 20 minutes, with standard deviation 2 minutes. What time should you leave home to
arrive to work on time at 8:00?
(You may have to decide a reasonable value for the chance of being
late.)
Goals: Master normal calculations. Realize that summarizing using the normal curve is the
ultimate reduction in complexity, but only applies to data whose distribution
is actually bell-shaped.
Skills:
…
Memorize 68-95-99.7
rule. While we do rely on our technology to calculate areas
under normal curves, it is convenient to have some of the values committed to
memory. These values can be used
as rough guidelines; if precision is required, you should use the TI-83
instead. I will assume you know
these numbers by heart.
…
Understand that
summarizing with just the mean and standard deviation is a special case. We
have progressed from pictures like histograms to summary statistics like
medians, means, etc. to finally summarizing an entire list with just the mean
and the standard deviation.
However, this last step in our summarization only applies to lists whose distribution resembles the
bell-shaped normal curves. If the
data's distribution is skewed, or has any other shape, this level of
summarization is incomplete. Also,
it is important to realize that these calculations are only approximations.
Reading: Sections 6.5 to
6.7.
Activity: Normal
Approximation to the Binomial.
Gamma Distribution. Cauchy
Distribution.
Calculate the chance of getting between 40 and 50 heads on 100 tosses of a fair
coin. Use the binomial. Then use the normal approximation.
Now try the chance of exactly 50 heads on 100 tosses.
Now redo the first one using the continuity correction.
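A Python sketch of all three calculations side by side (scipy assumed), so you can see how much the ½ correction buys:

  from math import sqrt
  from scipy.stats import binom, norm

  n, p = 100, 0.5
  mu, sd = n * p, sqrt(n * p * (1 - p))     # 50 and 5

  exact = binom.cdf(50, n, p) - binom.cdf(39, n, p)            # 40 to 50 heads, exactly
  plain = norm.cdf(50, mu, sd) - norm.cdf(40, mu, sd)          # normal, no correction
  corrected = norm.cdf(50.5, mu, sd) - norm.cdf(39.5, mu, sd)  # with the ½ correction
  print(exact, plain, corrected)            # the corrected value lands much closer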
The next pdf we will explore is the flexible Gamma Distribution. Unfortunately, calculating areas
requires numerical integration, and tables are not readily available. Our book has a table called
the incomplete
gamma distribution, but we can use
our TI-83's chi-square
distribution and some theory from MATH 401 to find these same areas. Quite simply, convert the
desired x-value via y = 2 x / β and use 2 α degrees of
freedom, df. Then DISTR χ²cdf( 0, y, df ) on the TI-83 will give the probability that the gamma variable is less
than x. To find right tail areas (above x), we use the complement (subtract from 1). Also, if we know the pdf, we can simply
ask our calculators to do a numerical integration to get the desired
probability.
The last pdf is a counter-example to some theorems. The Cauchy Distribution (cdf shaped like arc tangent) is
most easily explored by the ratio of two standard normal numbers. It has unusual behavior though. While its pdf is symmetric, it has no
mean. This is a curiosity of
calculus results; while it surely has a median, the "center of mass"
integral is actually undefined.
This behavior is apparent after a long string of outcomes. Now and then, extreme outliers
occur. Even worse, if you keep a
cumulative average, it will not seem to approach any fixed value; it wanders
around directionless. This is in
direct contrast to the results of the Central Limit Theorem (Days 18 and 19)
because the Cauchy doesn't satisfy one of the hypotheses. One group will further examine this
distribution in the next presentation (Day 22).
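You can watch the wandering yourself with a short Python sketch (ratio of two standard normals, as above; the sample size is arbitrary):

  import numpy as np

  z1 = np.random.normal(size=100000)
  z2 = np.random.normal(size=100000)
  cauchy = z1 / z2                                     # Cauchy draws
  running = np.cumsum(cauchy) / np.arange(1, 100001)   # cumulative average
  print(running[[99, 999, 9999, 99999]])   # no settling down, run to run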
Goals:
Understand
the details of using the normal curve to approximate binomial
probabilities. Examine Gamma and
Cauchy distributions.
Skills:
…
Know why we add or
subtract ½ before doing the normal approximation to the binomial. We
can use the normal curve to approximate binomial probabilities. But because the binomial is a discrete
distribution and the normal is continuous, we need to adjust our endpoints by
½ unit. I recommend drawing
a diagram with rectangles to see which way the ½ unit goes. For example, calculating the chance of
getting 40 to 50 heads inclusive on 100 coin flips (perhaps a bent coin)
entails using 39.5 to 50.5.
…
Know the shapes of
the gamma distribution. By adjusting the parameters, the gamma
distribution can take on a number of shapes. You should explore the various combinations to see when the
curve begins at 0, when it begins at infinity, and when it begins at some value
in between.
…
Know the properties
of the gamma distribution. You should know the mean and variance
of the gamma distribution. The
mean is α β and the
variance is α β².
…
Know how to generate
gamma probabilities when α is an
integer. With the appropriate transformation ( 2
x / β ) we can use our TI-83's
χ² function to find gamma
probabilities. If α is not an
integer, you will have to interpolate.
Another solution is to use the calculator's definite integral
abilities. This of course requires
you to have memorized the pdf.
Reading: Sections 8.1 to
8.5.
Activity: Central Limit Theorem
exploration.
In addition to coins and dice, rand on your
calculator is another good random mechanism for exploring "sampling
distributions". These
examples will give you some different views of sampling distributions. The important idea is that each time an
experiment is performed, a potentially different result occurs. How these results vary from sample to
sample is what we seek. You are
going to produce many samples, and will therefore see how these values vary.
1) Sums of two items: Each of you in your group will roll two
dice. Record the sum on the
dice. Repeat this 30 times,
generating 30 sums. Make a
histogram or a QUANTILE of your 30 sums. Compare to the graphs of the other
members in your group, particularly noting the shape. Sketch the graph you made and compare to the theoretical distribution.
2) Sums of 4 items: Each of you generate 4 random numbers
on your calculator, add them together, average, and record the result; repeat
30 times. The full
command is: seq ( rand +
rand + rand + rand, X, 1, 30 ) / 4 -> L1,
which will generate 30 four-sum average random numbers and store them
in L1. Again,
make a graph of the distribution.
3) Sums of 12 items: Each of you generate 12
random normal numbers on your calculator using MATH PRB
randNorm( 65, 5, 12). Add them together and record the
result; repeat 30 times. The full
command is: seq ( sum ( randNorm( 65, 5, 12 ) ), X, 1, 30 ) -> L2. Again,
make a graph of the distribution.
(This is problem ? in our text.)
For all the lists you generated, calculate the standard deviation and the
mean. We will find these two
statistics to be immensely important in our upcoming discussions about
inference. It turns out that these
means and standard deviations can be found through formulas instead of having
to actually generate repeated samples.
These means depend only on the mean and standard deviation of the
original population (the dice or rand
or randNorm in this case) and the number of times the dice were
rolled or rand was pressed
(called the sample
size, denoted n).
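Here is part 2 of the activity redone in Python, a sketch you can compare against your calculator runs:

  import numpy as np

  averages = np.random.random((30, 4)).mean(axis=1)   # 30 averages of 4 rand values
  print(averages.mean(), averages.std(ddof=1))
  # theory: mean 1/2; sd of the average is (1/sqrt(12)) / sqrt(4), about 0.144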
Goals: Examine
histograms to see that averages are less variable than individual
measurements. Also, the shape of
these curves should get closer to the shape of the normal curve as
n increases.
Skills:
…
Understand
the concept
of sampling variability.
Results vary from sample to sample. This idea is sampling
variability. We are
very much interested in knowing what the likely values of a statistic are, so
we focus our energies on describing the sampling distributions. In today's exercise, you simulated
samples, and calculated the variability of your results. In practice, we only do one sample, but
calculate the variability with a formula.
In practice, we also have the Central Limit Theorem, which lets us use
the normal curve in many situations to calculate probabilities.
Reading: Chapters 4 to
6.
Activity: Practice Central Limit Theorem
(CLT) problems. We will have
examples of non-normal data and normal data to contrast the diverse cases where
the CLT applies.
1) People staying at a certain
convention hotel have a mean weight of 180 pounds with standard deviation
35. The elevator in the hotel can
hold 20 people. How much weight
will it have to handle in most cases?
Do we need to assume weights of people are normally distributed?
2) Customers at a large grocery
store require on average 3 minutes to check out at the cashier, with standard
deviation 2. Because checkout time
cannot be negative, they are obviously not normally distributed. Can we calculate the chance that 85
customers will be handled in a four hour shift? If so, calculate the chance; if not, what else do you need
to know?
3) Suppose the number of
hurricanes in a season has mean 6 and standard deviation √6. What is the chance that in 30 years
there have been fewer than 160 hurricanes?
4) The number of boys in a 4 child
family can be modeled reasonably well with the binomial distribution. If five such families live on the same
street, what is the chance that the total number of boys is 12 or more?
Goals: Use
normal curve with the CLT.
Skills:
…
Recognize how to use
the CLT to answer probability questions concerning sums and averages. The
CLT says that for large sample sizes, the distribution of the sample average is
approximately normal, even though the original data in a problem may not be
normal.
…
For small samples, we
can only use the normal curve if the actual distribution of the original data
is normally distributed.
It is important to realize when original
data is not normal,
because there is a tendency to use the CLT even for small sample sizes, and
this is inappropriate. When the
CLT does apply, though, we are
armed with a valuable tool that allows us to estimate probabilities concerning
averages. A particular example is
when the data is a count that must
be an integer, and there are only a few possible values, such as the number of
kids in a family. Here the normal
curve wouldn't help you calculate chances of a family having 3 kids.
However, we could quite accurately calculate chances concerning the total number of
kids in 100 such families.
Reading: Sections 9.1 to
9.5.
Activity: Exam 2.
This second exam is on expected value, variance, and particular distributions,
both discrete and continuous. You
should know facts about each distribution we have encountered. You should be able to generate random
data from each distribution. You
should know the strengths and limitations of each of them.
Reading: Sections 8.4 and
8.5.
Activity: Guess m&m's percentage. What fraction of m&m's are blue or
green? Is it 25 %? 33 %? 50 %? We take
samples to find out.
Each of you will sample from my jar of m&m's, and you will all calculate
your own confidence interval. Of
course, not everyone will be correct, and in fact, some of us will have
"lousy" samples. But
that is the point of the confidence coefficient, as we will see when we jointly
interpret our results.
It has been my experience that confidence intervals are easier to understand if
we talk about sample proportions instead of sample averages. Thus I will use techniques from Section
9.10. Each of you will have a
different sample size and a different number of successes. In this case the sample size,
n, is the total number of m&m's you have selected,
and the number of successes, x, is
the total number of blue or green m&m's in your sample. Your guess is simply the
ratio x/n, or
the sample proportion. We call this estimate
p-hat or p̂.
Use STAT TEST
1-PropZInt with 70 % confidence for your
interval here today.
When you have calculated your confidence interval, record your result on the
board for all to see. We will
jointly inspect these confidence intervals and observe just how many are
"correct" and how many are "incorrect". The percentage of correct
intervals should match our chosen level of confidence. This is in fact what is meant by
confidence.
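The same joint experiment can be run in bulk on a computer. A Python sketch (the true proportion 0.4 and the sample size 50 are made-up stand-ins for my jar):

  import numpy as np
  from scipy.stats import norm

  true_p, n, conf = 0.4, 50, 0.70
  z = norm.ppf(1 - (1 - conf) / 2)     # the same z multiplier the calculator uses

  hits = 0
  for _ in range(10000):
      x = np.random.binomial(n, true_p)
      phat = x / n
      half = z * np.sqrt(phat * (1 - phat) / n)
      if phat - half <= true_p <= phat + half:
          hits += 1
  print(hits / 10000)   # should land near 0.70, the meaning of "70 % confidence"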
Now we will explore how changing confidence levels and sample sizes influence
CI's. Complete the following
table, filling in the confidence interval width in the body of the table. Use STAT TEST 1-PropZInt but in
each case make x close
to 50 % of n. (The
calculator will not let you use non-integers for x; round off if needed.)
Confidence Level ============>
Sample Size |  70 %  |  90 %  |  95 %  |  99 %  | 99.9 % |
         10 |        |        |        |        |        |
         20 |        |        |        |        |        |
         50 |        |        |        |        |        |
        100 |        |        |        |        |        |
       1000 |        |        |        |        |        |
We will try to make sense of this chart, keeping in mind the meaning of
confidence level, and the desire to have narrow intervals.
Now repeat the above table using STAT TEST ZInterval, with σ = 15 and x̄ = 100.
Confidence Level ============>
Sample Size |  70 %  |  90 %  |  95 %  |  99 %  | 99.9 % |
         10 |        |        |        |        |        |
         20 |        |        |        |        |        |
         50 |        |        |        |        |        |
        100 |        |        |        |        |        |
       1000 |        |        |        |        |        |
Goals: Introduce
statistical inference - Guessing the parameter. Construct and interpret a confidence interval. See how the
TI-83 calculates our CI's.
Interpret
the effect of differing confidence coefficients and sample
sizes.
Skills:
…
Understand how to
interpret confidence intervals.
The calculation of a confidence interval is quite
mechanical. In fact, as
we have seen,
our calculators do all the work for us.
Our job is then not so much to calculate confidence intervals as it is to be able to
understand when one should be used
and how best to interpret one.
…
Understand the
factors that make confidence intervals believable guesses for the
parameter. The two chief factors that make our confidence
intervals believable are the sample size and the confidence coefficient. The key result is larger confidence
makes wider intervals, and larger sample size makes narrower
intervals.
…
Know the details of
the Z Interval. When we know the population standard
deviation, σ, our method for guessing the true value of
the mean, μ, is to use a z confidence interval.
This technique
is unrealistic in that you must know the true population standard
deviation. In practice, we will
estimate this value with the sample standard deviation, s, but a different technique is appropriate (See Day
22).
Reading: Section
8.7.
Activity: Presentation 3.
Central Limit Theorem (Section 8.3)
Choose from among the following distributions: Exponential, Normal, Cauchy (CAUTION: this distribution
violates the CLT hypotheses), Uniform,
Bernoulli, Problem 4.53's distribution.
Generate a sample of 5 items.
Calculate the mean and record the result. Repeat several hundred times. Finally, graph the quantile plot of the list of means from
your several hundred samples.
Repeat for a sample of 50 items.
Repeat for a sample of 200 items.
Compare all of your results to the theory from the CLT.
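If you want to run this experiment off the calculator, here is a Python sketch of the exponential case; swap in another distribution to repeat the exercise:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.stats import probplot

    rng = np.random.default_rng(2)
    for n in (5, 50, 200):
        means = rng.exponential(scale=1.0, size=(500, n)).mean(axis=1)
        probplot(means, plot=plt)                # normal quantile plot of 500 sample means
        plt.title(f"Sample means, n = {n}")
        plt.show()                               # the plot straightens out as n grows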
Gosset Simulation. Take samples of
size 5 from a normal distribution.
Use s instead of
σ in the standard 95% confidence z-interval.
Repeat 100 times to see if the true coverage is 95%. (My program GOSSET accomplishes this.) We will pool our results to see how close we are to
95%. A century ago, Gosset noticed
this phenomenon and guessed what the true distribution should be. A few years later Sir R. A. Fisher
proved that Gosset's guess was correct, and the t distribution was accepted by the statistical
community. Gosset was unable to
publish his results under his own name (to protect trade secrets), so he used
the pseudonym "Student".
You will therefore sometimes see the t distribution referred to as "Student's
t distribution".
Goals: Introduce
t-test. Understand why the z-test is inappropriate in most small-sample
situations.
Skills:
…
Know why
using the t-test or the t-interval when σ
is unknown is appropriate.
When we use s instead of σ but do not
use the correct t distribution, we
find that our confidence intervals are too narrow, and our hypothesis tests
reject H0 too often.
…
Realize that the
larger the sample size, the less serious the problem. When we have
larger sample sizes, say 15 to 20, we notice that the simulated success rates
are much closer to the theoretical rate.
Thus the issue of t vs. z is moot for large samples.
Reading: Sections 9.8 to
9.11.
Activity: Matched Pairs vs 2-Sample. Proportions.
Matched Pairs problems are really one sample datasets disguised as two sample
datasets because two measurements on the same subject are taken. Sometimes "subject" is a
person; other times it is less recognizable, such as a year. The key issue is that two
measurements have
been taken that are related to one another. One quick way to tell if you have a two sample problem is
whether the lists are of different lengths. Obviously if the lists are of different lengths, they are
not paired together. Naturally the
tricky situation is when the lists are of the same length, which occurs often
when researchers assign the same number of subjects to each of treatment and
control groups.
Once you realize that a sample is a matched pairs data set and that
the difference in the two measurements is the important fact, the
analysis proceeds just like one sample problems, but you use the list of
differences. In this respect,
there is nothing new about the matched pairs situation.
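The equivalence is easy to demonstrate in Python: a matched pairs t-test is literally a one-sample t-test on the list of differences. The scores below are made up for illustration.

    from scipy import stats

    pre  = [72, 65, 80, 77, 60, 85, 74]          # made-up pre-test scores
    post = [75, 70, 82, 80, 66, 84, 79]          # made-up post-test scores
    diff = [b - a for a, b in zip(pre, post)]

    print(stats.ttest_rel(post, pre))            # matched pairs t-test
    print(stats.ttest_1samp(diff, 0.0))          # same statistic, same P-value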
Proportions: What are the true batting averages of baseball players? Do we believe results from a few
games? A season? A career? We can use the binomial distribution as a model for getting
hits in baseball, and examine some data to estimate the true hitting ability of
some players. Keep in mind as we
do this the four assumptions of the binomial model, and whether they are truly
justifiable.
For a typical baseball player, we can look at confidence intervals for the true
percentage of hits he gets. Using
our results from linear combinations (Day 13), we can develop the
two-sample proportions formulas. On the calculator, the commands are STAT TEST
1-PropZInt for one sample and STAT TEST 2-PropZInt for two samples.
Technical note: the Plus 4 Method
will give more appropriate confidence intervals. As this method is extraordinarily easy to use (add
2 to the numerator,
and 4 to the denominator), I recommend you always use it when constructing
confidence intervals for proportions.
For two sample problems, divide the 2 and 4 evenly between the two
samples; that is, add 1 to each numerator and 2 to each denominator. Furthermore, the Plus 4 Method seems to
work even for very small sample sizes, which is not the advice generally given
by textbooks for the large sample approximation. The Plus 4 Method advises that samples as small as 10 will
have fairly reliable results; the large sample theory requires 5 to 10 cases in
each of the failure and success
group. Thus, at least 20 cases are required, and that is only when p is close to 50%.
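As a sketch, the Plus 4 interval is the usual formula run on the padded counts; the helper name below is mine, not a standard command.

    from scipy.stats import norm

    def plus_four_interval(x, n, conf=0.95):
        p_tilde = (x + 2) / (n + 4)              # pretend 2 extra successes, 2 extra failures
        z = norm.ppf(1 - (1 - conf) / 2)
        se = (p_tilde * (1 - p_tilde) / (n + 4)) ** 0.5
        return (p_tilde - z * se, p_tilde + z * se)

    print(plus_four_interval(x=3, n=10))         # behaves sensibly even for tiny samples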
Goals: Recognize
when matched-pairs applies.
Introduce proportions.
Skills:
…
Detect situations
where the matched pairs t-test
is appropriate. The nature of the matched pairs is that each value of
one of the variables is associated with a value of the other variable. The most common example is a repeated
measurement on a single individual, like a pre-test and a post-test. Other situations are natural pairs,
like a married couple, or twins.
In all cases, the variable we are really interested in is the difference in the two scores or
measurements. This single
difference then makes the matched pairs test a one-variable
t-test.
…
Detect situations
where proportions z-test is
correct. We have several conditions that are necessary for
using proportions. We must have
situations where only two outcomes
are possible, such as yes/no, success/failure, live/die, Rep/Dem, etc. We must have independence between
trials, which is typically simple to justify; each successive measurement has
nothing to do with the previous one.
We must have a constant probability of success from trial to trial. We call this value p. And
finally we must have a fixed number of trials in mind beforehand; in contrast,
some experiments continue until a
certain number of successes has occurred.
…
Know the conditions
when the normal approximation is appropriate. In order to use the
normal approximation for proportions, we must have a large enough sample
size. The typical rule of thumb is
to make sure there are at least 5 successes and at least 5 failures in the
sample. For example, in a sample
of voters, there must be at least 5 Republicans and at least 5 Democrats, if we
are estimating the proportion or percentage of Democrats in our
population. (Recall the m&m's
example: when you each had fewer than 5 blue or green m&m's, I made you
take more until you had at least 5.)
…
Know the Plus 4
Method. A 1998 result from statistical research showed
that the usual normal-theory intervals can fail to reach their stated coverage
in unpredictable situations. Those researchers
found a convenient "fix": pretend there are 4 additional
observations,
2 successes and 2 failures. By
adding these pretend cases to our real cases, the resulting confidence
intervals almost magically capture the true parameter the stated percentage of
the time. Because this
"fix" is so simple, it is the recommended approach in all
confidence
interval problems. Hypothesis testing procedures remain
unchanged.
Reading: Sections 10.1 to
10.4.
Activity: Argument by contradiction. Scientific method. Type I and Type II error diagram. Courtroom terminology.
Some terminology:
Null hypothesis. A statement about a parameter. The null hypothesis is
always an equality or a single claim (like two variables are
independent). We assume the null
hypothesis is true in our following calculations, so it is important that the
null be a specific value or fact that can be assumed.
Alternative hypothesis.
The alternative hypothesis is a statement that we will
believe if the null hypothesis is rejected. The alternative does not have to be the complement of the
null hypothesis. It just has to be
some other statement. It
can be an inequality, and usually is.
One- and Two-Tailed Tests.
A one-tailed test is one where the alternative
hypothesis is in only one direction, like "the mean is
less than 10".
A two-tailed test is one where the alternative hypothesis is
indiscriminate about direction, like "the mean is not
equal to 10".
When a researcher has an agenda in mind, he will usually choose a
one-tailed test. When a researcher
is unsure of the situation, a two-tailed test is appropriate.
Rejection rule. To decide between two competing hypotheses, we create
a rejection rule. It's usually as
simple as "Reject the null hypothesis if the sample mean is greater than
10. Otherwise fail to
reject." We always want to
phrase our answer as "reject the null hypothesis" or "fail to
reject the null hypothesis".
We never want to say "accept the null hypothesis". The reasoning is this: rejecting the null hypothesis means the
data have contradicted the assumption we made (that the null hypothesis
is correct); failing to reject the null hypothesis doesn't mean we've proven
the null hypothesis is true, but rather that we haven't yet seen anything to make us doubt
the claim. It
could be the case that we just haven't taken a large enough
sample yet.
Type I Error. When we reject the null hypothesis when
it is in fact true, we have made a Type I error. We have made a conscious decision to treat this error as a
more important error, so we construct our rejection rule to make this error
rare.
Type II Error. When we fail to reject the null
hypothesis, and in fact the alternative hypothesis is the true one, we have
made a Type II error. Because we
construct our rejection rule to control the Type I error rate, the Type II
error rate is not really under our control; it is more a function of the
particular test we have chosen.
The one aspect we can
control is the sample size.
Generally, larger samples make the chance of making a Type II error
smaller.
Significance level, or size of the test.
The probability of making a
Type I error is the significance level.
We also call it the size of the test, and we use the symbol α to represent it. Because we want the Type I error to be rare, we usually will
set α to be a small number, like .05 or .01 or even
smaller. Clearly smaller is
better, but the drawback is that the smaller α is,
the larger the Type II error rate becomes.
P-value. There are two definitions for the P-value. Definition 1: The P-value is the α level at which the observed data
would just barely cause us to reject the null hypothesis. Definition 2:
The P-value is the chance of seeing data as extreme or more extreme than
the data actually observed. Using
either definition, we calculate the P-value as an area under a tail in a
distribution. Caution: the P-value
calculation will depend on whether we have a one- or a two-tailed test.
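A quick numerical illustration of the one- versus two-tailed distinction, using an assumed observed z of 1.7:

    from scipy.stats import norm

    z_obs = 1.7                                  # assumed test statistic, for illustration
    print(1 - norm.cdf(z_obs))                   # one-tailed (upper) P-value, about .045
    print(2 * (1 - norm.cdf(abs(z_obs))))        # two-tailed P-value, about .089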
Power. The power of a test is the probability of rejecting
the null hypothesis when the alternative hypothesis is true. We are calculating the chance of making
a correct decision. Because the
alternative hypothesis is usually not an equality statement, it is more
appropriate to say that power is a function rather than just a single value.
We will examine these ideas using the z-test. The TI-83 command
is STAT
TEST ZTest. The command gives you a menu of items to input. It assumes your null hypothesis is a
statement about a mean μ. You
must tell it the assumed null value, μ0,
and the alternative claim, either two-sided or one of the one-sided choices. You also need to tell the calculator
how your information has been stored, either as a list of raw DATA
or as summary STATS. If you choose CALCULATE, the machine will simply display the test statistic and
the P-value. If you choose DRAW, the calculator will graph the P-value
calculation for you. You should
experiment to see which way you prefer.
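For reference, a Python sketch of the arithmetic ZTest performs in the two-sided case; the data and the null value of 20 are made up:

    from scipy.stats import norm

    mu0, sigma = 20, 5                           # assumed null value and known sigma
    data = [22, 18, 25, 21, 19, 24, 23, 20, 26, 22]   # made-up sample
    n = len(data)
    xbar = sum(data) / n
    z = (xbar - mu0) / (sigma / n ** 0.5)        # test statistic
    print(z, 2 * (1 - norm.cdf(abs(z))))         # statistic and two-sided P-value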
Goals: Introduce
statistical inference - Hypothesis testing.
Skills:
…
Recognize the two
types of errors we make.
If we decide to reject a null hypothesis, we might be
making a Type I error. If we fail
to reject the null hypothesis, we might be making a Type II error. If it turns out that the null
hypothesis is true, and we reject it because our data looked weird, then we
have made a Type I error.
Statisticians have agreed to control this type of error at a specific
percentage, usually 5%. On the
other hand, if the alternative hypothesis is true, and we
fail to reject the null hypothesis, we have also made a
mistake. This second type of error is
generally not controlled by us; the sample size is the determining
factor here.
…
Understand why one
error is considered a more serious error.
Because we control the
frequency of a Type I error, we feel confident that when we reject the null
hypothesis, we have made the right decision. This is how the scientific method works; researchers usually
set up an experiment so that the conclusion they would like to make is the
alternative hypothesis. Then if
the null hypothesis (usually the opposite of what they are trying to show) is
rejected, there is some confidence in the conclusion. On the other hand, if we fail to reject the null hypothesis, the most useful
conclusion is that we didn't have a large enough sample size to detect a real
difference. We aren't really
saying we are confident the null hypothesis is a true statement; rather we are
saying it could be true. Because we cannot control the frequency
of this error, it is a less confident statement.
…
Become familiar with
"argument by contradiction".
When researchers are trying to
"prove" a treatment is better or that their hypothesized mean is the
right one, they will usually choose to assume the opposite as the null
hypothesis. For election polls,
they assume the candidate has 50% of the vote, and hope to show that is an
incorrect statement. For showing
that a local population differs from, say, a national population, they will
typically assume the national average applies to the local population, again
with the hope of rejecting that assumption. In all cases, we formulate the hypotheses
before collecting data; therefore, you will never see a
sample average in either a null or alternative
hypothesis.
…
Understand why we
reject the null hypothesis for small p-values. The p-value is the
probability of seeing a sample result "worse" than the one
we actually
saw. In this sense,
"worse" means even more evidence against the null hypothesis; more
evidence favoring the alternative hypothesis. If this probability is small, it means either we have
observed a rare event, or that we have made an incorrect assumption, namely the
null hypothesis. Statisticians and
practitioners have agreed that 5% is a reasonable cutoff between a result that
contradicts the null hypothesis and a result that could be argued to
be in agreement
with the null hypothesis. Thus, we
reject our claim only when the p-value is a small number.
Reading: Sections 10.5 to
10.7.
Activity: Testing Simulation.
In this experiment, you will work in pairs and generate data for your partner
to analyze. Your partner will come
up with a conclusion (either reject the null hypothesis or fail to reject the
null hypothesis) and you will let them know if they made the right decision or
not. Keep careful track of the
success rates.
For each of these simulations, let the null hypothesis mean be
H0: μ = 20, let n = 10, and let σ = 5. You
will change μ for each repetition.
1) Without your
partner knowing, choose either 16, 18, 20, 22, or 24 for μ. Then
use your calculator to generate 10 observations: MATH PRB randNorm( M, 5, 10 ) -> L1, where M is the value of
μ you chose for this replication. Clear the screen (so your partner can't
see what you did) and give them the calculator. They will perform a hypothesis test using the .05
significance level and tell you their decision.
2) Repeat step 1 until you have
each done at least 10 hypothesis tests; it is not necessary to use each value
of μ exactly twice, but try to do each one at least
once. Do μ = 20 at least twice. (We need more cases for 20 because we're using a small
significance level.)
3) Keep track of the results you
got (number of successful decisions and number of unsuccessful decisions) and
report them to me so we can all see the combined results.
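Class time limits how many repetitions we can pool; a Python sketch of the same simulation can run thousands:

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(4)
    mu0, sigma, n, alpha = 20, 5, 10, 0.05
    zc = norm.ppf(1 - alpha / 2)                 # two-sided cutoff

    for mu in (16, 18, 20, 22, 24):
        samples = rng.normal(mu, sigma, size=(5000, n))
        z = (samples.mean(axis=1) - mu0) / (sigma / n ** 0.5)
        print(mu, np.mean(np.abs(z) > zc))       # rejection rate; near .05 at mu = 20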
In addition to the simulation above, which uses the TI-83's built in routines,
you should also be able to calculate the observed probabilities we saw in our
chart exactly. For
example, what is the probability that the sample average will cause
you to reject the null hypothesis that the mean is 20 when the true mean is
really 24? These calculations are
known as power calculations, and are a vital part of the choice of a test or
the design of an experiment. For
if you only had a very small chance of detecting an important difference, then
the procedure was really a waste of resources; you weren't going to reject
anyway. Typically, these
explorations will become sample size problems.
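Here is the exact version of that calculation for our setup: two-sided test, α = .05, n = 10, σ = 5, true mean 24.

    from scipy.stats import norm

    mu0, mu, sigma, n = 20, 24, 5, 10
    se = sigma / n ** 0.5
    zc = norm.ppf(0.975)                         # two-sided cutoff for alpha = .05
    power = norm.cdf(-zc - (mu - mu0) / se) + 1 - norm.cdf(zc - (mu - mu0) / se)
    print(power)                                 # about 0.72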
One last point today is that these techniques all require random samples, and
typically either normally distributed data or large sample sizes (to invoke the
CLT). In particular be cautious of
situations where the entire population has been measured. In these cases, asking the statistical
question "Is the difference due to chance?" makes no sense. Because the entire population was
measure, there is no chance involved; there is no sampling error due to items
not being chosen for the sample.
This situation occurs often in science. For example, a psychologist may conduct an experiment on a
litter of rats. This litter is of
course not a random sample of all rats; rather it is the rats available to that
researcher at that time. What is
typically assumed then is that this sample of rats is like a random sample. Whether this assumption is true or not cannot be
checked. Most researchers simply
proceed. I think it is important
that you at least acknowledge this "fudging". In practice, though, there will be
little you will do about it.
Goals: Interpret
significance level. Observe the
effects of different values of the population mean. Recognize limitations to inference.
Skills:
…
Interpret
significance level.
Our value for rejecting, usually .05, is the
percentage of the time that we falsely reject a true null hypothesis. It does not measure whether we had a
random sample; it does not measure whether we have bias in our sample. It only measures whether random data could look like the
observed data.
…
Understand how the
chance of rejecting the null hypothesis changes when the population mean is
different than the hypothesized value.
When the population
mean is not the hypothesized value, we expect to reject the null
hypothesis more often. This is
reasonable, because rejecting a false null hypothesis is a correct
decision. Likewise, when the null
hypothesis is in fact true, we hope to seldom decide to reject. If we have generated enough
replications in class, we should see a power curve emerge that tells us how
effective our test is for various values of the population
mean.
…
Know the limitations
to confidence intervals and hypothesis tests. Often users of
statistical inference techniques will blindly use them without checking the
implicit hypotheses. The main
points to watch for are non-random samples, misinterpreting what
"rejecting the null hypothesis" means, and misunderstanding what
error the margin of error is measuring.
Reading: Section 10.8 and 10.11
to 10.12.
Activity: Finish 2-sample work.
Proportions.
Goals: Complete
2-sample t-test.
Skills:
…
Know the typical null
hypothesis for 2-sample hypothesis tests.
The typical null hypothesis
for 2-sample problems, both matched and independent samples, is that of
"no difference". For the
matched pairs, we say H0: μ = 0 (where μ is the mean of the differences), and for
the 2 independent samples we say H0: μ1 = μ2.
As usual, the null hypothesis is an equality statement, and the
alternative is the statement the researcher typically wants to end up
concluding. In both 2-sample
procedures, we interpret confidence intervals as ranges for the
difference in means, and hypothesis tests as whether the
observed difference in means is far from zero.
…
Be able to correctly
choose the technique from among the z-test, the t-test,
the matched pairs t-test,
the 2 sample t-test, and
tests for proportions.
When we do not know the population standard deviation, we must use the t-procedures. When we have two dependent samples, we use the matched pairs t-test. When we are working with binomial counts, we use the proportions tests. See the previous days for more lengthy descriptions.
…
Know the mechanics of
the proportions z-tests.
For proportions tests, we use counts of successes and total sample sizes. These are put into a z-formula because in the binomial, once the proportion is known, so is the variance. The resulting test statistic has an approximate normal distribution, for large sample size.
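A sketch of the one-proportion z statistic; the counts and the null value of 0.5 are assumptions for illustration.

    from scipy.stats import norm

    x, n, p0 = 62, 100, 0.5                      # assumed counts and null proportion
    p_hat = x / n
    z = (p_hat - p0) / (p0 * (1 - p0) / n) ** 0.5    # null variance is known: p0(1 - p0)/n
    print(z, 2 * (1 - norm.cdf(abs(z))))         # statistic and two-sided P-value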
…
Detect situations
where the 2-proportion z-test is correct.
When we have two independent binomial samples, the two-sample proportions test or interval is appropriate.
Reading: Chapters 8 to
10.
Activity: Presentation 4.
Statistical Inference (Chapters 8 to 10)
Make a claim, a statistical hypothesis, and test it. Gather appropriate data to test your claim. Discuss and justify any assumptions you
made. Explain why your test is the
appropriate technique.
Review
Goals: Conclude
course topics. Know everything.
Reading: Chapters 8 to
10.
Activity: Exam 3.
This last exam covers the t-tests
and intervals and the z-tests and
intervals for proportions in Chapters 8 through 10.
Managed by: Chris Edwards
edwards at uwosh dot edu
Last updated November 26, 2006