Day By Day Notes for MATH 301

Fall 2006

 

Day 1

Activity: Go over syllabus.  Take roll.  Overview examples: Randomness - coin example.  Gilbert trial.  Election polls. Spam filters.

Creating random samples.  The text is remiss in not telling us how to actually select random samples in practice.  Many texts fail in this regard, so to fill in this blank, we will use three methods of sampling today: dice, a table of random digits, and our calculator.  To make the problem feasible, we will only use a population of size 6.  (I know this is unrealistic in practice, but the point today is to see how randomness works and to trust that the results extend to larger problems.)  Pretend that the items in our population (perhaps they are people) are labeled 1 through 6.  For each of our methods, you will have to decide in your group what to do with "ties" (an item that has already been selected comes up again).  Keep in mind the goal of simple random sampling: at each stage, each remaining item has an equal chance to be the next item selected.

By rolling dice, generate a sample of three people.  (Let the number on the die correspond to one of the items.)  Repeat 20 times, giving 20 samples of size 3.

Using the table of random digits, starting at any haphazard location, select three people.  (Let the random digit correspond to one of the items.)  Repeat 20 times, giving 20 more samples of size 3.

Using your calculator, select three people.  The TI-83 command MATH randInt(2,4,5), for example, will produce 5 numbers between 2 and 4, inclusive.  (If you leave off the third number, only one value will be generated.)  If your calculator has only a rand function, you can achieve the same result as the TI-83 MATH randInt(2,4) with int(3*rand)+2.  Repeat 20 times, giving 20 more samples of size 3.
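The int(3*rand)+2 trick generalizes: multiply a uniform number on [0, 1) by the count of possible values, truncate, and shift, and each integer in the range gets an equal chance.  A minimal Python sketch, assuming Python's `random.random()` stands in for the calculator's rand; the helper names `rand_int` and `rand_ints` are hypothetical, not TI-83 commands:

```python
import random

def rand_int(low, high, rng=random):
    """Hypothetical stand-in for TI-83 randInt(low, high) built from a uniform [0, 1) generator."""
    count = high - low + 1                   # how many integers are possible
    return int(count * rng.random()) + low   # e.g. int(3*rand)+2 for randInt(2,4)

def rand_ints(low, high, n, rng=random):
    """Hypothetical stand-in for randInt(low, high, n): n independent draws."""
    return [rand_int(low, high, rng) for _ in range(n)]

if __name__ == "__main__":
    random.seed(301)
    print(rand_ints(2, 4, 5))   # five values, each 2, 3, or 4
```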

Your group should have drawn 60 samples at the end.  Keep careful track of which samples you selected; record your results in order, as 125 or 256, for example.  (125 would mean items 1, 2, and 5 were selected.)  We will pool the results of everyone's work together on the board.

Goals:     Review course objectives: collect data, summarize information, model with probability, make inferences.

Gain practice taking random samples.  Understand what a simple random sample is. 
Become familiar with randInt(.  Accept that the calculator is random.

Skills:

                        Know the definition of a Simple Random Sample (SRS).  Simple Random Samples can be defined in two ways:
1)  An SRS is a sample where, at each stage, each remaining item has an equal chance to be the next item selected.
2)  A scheme where every possible sample (of the given size) has an equal chance to be the sample results in an SRS.
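To see that the two definitions agree for our six-item population, we can compute exactly, with fractions, the chance that the one-at-a-time scheme of definition 1 produces each of the C(6,3) = 20 possible samples.  A minimal Python sketch (the function name is ours):

```python
from fractions import Fraction
from itertools import combinations, permutations

def sequential_sample_probs(population, size):
    """Exact chance of each unordered sample under definition 1:
    at each stage every remaining item is equally likely to be picked next."""
    probs = {}
    for order in permutations(population, size):
        p = Fraction(1)
        remaining = len(population)
        for _ in order:
            p *= Fraction(1, remaining)   # equal chance among the items still left
            remaining -= 1
        key = tuple(sorted(order))        # 1-2-5 and 5-2-1 are the same sample
        probs[key] = probs.get(key, 0) + p
    return probs

probs = sequential_sample_probs(range(1, 7), 3)
print(len(probs))                                        # 20 distinct samples
print(set(probs.values()))                               # {Fraction(1, 20)}: all equally likely
print(set(probs) == set(combinations(range(1, 7), 3)))   # True: every possible sample appears
```

Each ordered draw has chance (1/6)(1/5)(1/4) = 1/120, and each sample of three can be drawn in 6 orders, so every sample gets 6/120 = 1/20, which is definition 2.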

                        Select an SRS from a list of items.  The TI-83 command randInt( will select numbered items from a list randomly.  If a number selected is already in the sample, ignore that number and get a new one.  Remember, as long as each remaining item is equally likely to be chosen as the next item, you have drawn an SRS.
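The "ignore repeats" procedure can be sketched in Python, assuming `random.randint` (which includes both endpoints, like randInt() plays the calculator's role; the function name `srs` is hypothetical:

```python
import random

def srs(n_items, size, rng=random):
    """Draw a simple random sample of `size` labels from 1..n_items,
    imitating repeated randInt(1, n_items) and ignoring repeats."""
    if size > n_items:
        raise ValueError("sample size cannot exceed population size")
    chosen = []
    while len(chosen) < size:
        label = rng.randint(1, n_items)   # like randInt(1, n_items); both ends inclusive
        if label not in chosen:           # a "tie": already selected, so draw again
            chosen.append(label)
    return chosen

random.seed(2006)
print(srs(6, 3))   # three distinct labels from 1 to 6, in selection order
```

Throwing away a repeat does not spoil fairness: given that a draw is new, each remaining label is still equally likely, which is exactly definition 1.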

                        Understand the real world uses of SRS.  In practice, simple random samples are not that common.  It is just too impractical (or impossible) to have a list of the entire population available.  However, the idea of simple random sampling is essentially the foundation for all the other types of sampling.  In that sense then it is very common.

Reading: Sections 1.1 to 1.6.

 

Day 2

Activity: Dance Fever example.

Use the "Arizona Temps" dataset to calculate means, standard deviations, and 5-number summaries.  To calculate our summary statistics with the TI-83, we will use STAT CALC 1-Var Stats (to use List 1) or STAT CALC 1-Var Stats L2 for List 2, for example.  There are two screens of output; we will be mostly concerned with the mean x̄, the standard deviation Sx, and the five-number summary on screen two.
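For checking calculator output, here is a rough Python analogue of 1-Var Stats.  It is a sketch, assuming the TI-83's quartile convention (Q1 and Q3 are the medians of the lower and upper halves, leaving out the middle value when n is odd); the data are hypothetical, not the actual "Arizona Temps" values, and at least 2 values are needed:

```python
import statistics

def one_var_stats(data):
    """Rough analogue of TI-83 1-Var Stats: mean, Sx, and the five-number summary.
    Quartiles use the median-of-halves rule (middle value excluded when n is odd)."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]               # values below the median position
    upper = xs[half + (n % 2):]     # values above it; skip the middle one if n is odd
    return {
        "mean": statistics.mean(xs),
        "Sx": statistics.stdev(xs),   # sample standard deviation (divides by n - 1)
        "minX": xs[0],
        "Q1": statistics.median(lower),
        "Med": statistics.median(xs),
        "Q3": statistics.median(upper),
        "maxX": xs[-1],
    }

# Hypothetical highs, not the actual "Arizona Temps" data.
highs = [101, 99, 104, 98, 102, 105, 97, 100, 103]
print(one_var_stats(highs))
```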

Answer these questions:

1)  Are high and low temperatures distributed the same way, other than the obvious fact that highs are higher than lows?
2)  How does a single case affect the calculator's routines?  (What if we had had an outlier?)
3)  What information does the 5-number summary disguise?

Now, create the following lists:

1)  A list of 10 numbers that has only one number below the mean.
2)  A list of 10 numbers that has the standard deviation greater than the mean.
3)  A list of 10 numbers that has a standard deviation of zero.

For your fourth list, start with any 21 numbers.  Find a number N such that 14 of the numbers in your list are within N of the average.  For example, pick a number N (say 4), calculate the average plus 4, the average minus 4, and count how many numbers in your list are between those two values.  If the count is less than 14, try a larger number for N (bigger than 4).  If the count is more than 14, try a smaller number for N (smaller than 4).
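The trial-and-error search for N can be shortcut: sort each number's distance from the average, and the 14th smallest distance is the smallest N that captures at least 14 values.  A minimal Python sketch, using a hypothetical list (the numbers 1 to 21) and a helper name of our own:

```python
import statistics

def smallest_n_covering(data, count):
    """Smallest N such that at least `count` values lie within N of the mean
    (mean - N <= x <= mean + N)."""
    mean = statistics.mean(data)
    distances = sorted(abs(x - mean) for x in data)
    return distances[count - 1]   # the count-th closest value sets N

# Hypothetical list of 21 numbers: 1, 2, ..., 21 (mean 11).
data = list(range(1, 22))
N = smallest_n_covering(data, 14)
inside = sum(1 for x in data if abs(x - statistics.mean(data)) <= N)
print(N, inside, statistics.stdev(data))
```

With tied distances, as in this list, the count can overshoot 14 (here 15 values lie within N = 7).  The standard deviation is printed so you can compare it with N.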

Finally, compare the standard deviation to the Interquartile Range (IQR = Q3 - Q1).

 

Goals:     Compare numerical measures of center and spread.  Use technology to summarize data with numerical measures.  Interpret standard deviation as a measure of spread.

Skills:

                        Understand the effect of outliers on the mean.  The mean (or average) is unduly influenced by outlying (unusual) observations.  Therefore, knowing when your distribution is skewed is helpful.

                        Understand the effect of outliers on the median.  The median is almost completely unaffected by outliers.  For technical reasons, though, the median is not as common in scientific applications as the mean.

                        Use the TI-83 to calculate summary statistics.  Calculating may be as simple as entering numbers into your calculator and pressing some buttons: STAT CALC 1-Var Stats.  Or, if you are doing some things by hand, you may have to organize information the correct way, such as listing the numbers from low to high.  Please get used to using the statistical features of your calculator to produce the means, standard deviations, etc.  While I know you can calculate the mean by simply adding up all the numbers and dividing by the sample size, if you do it that way you will not get in the habit of using the full features of your machine, and later on you will be "missing the boat".

                        Understand standard deviation.  At first, standard deviation will seem foreign to you, but I believe that it will make more sense the more you become familiar with it.  In its simplest terms, the standard deviation is a non-negative number that measures how "wide" a dataset is.  One common rough interpretation is that the range of a dataset is about 4 standard deviations.  Another is that the standard deviation is roughly ¾ times the IQR; that is, the standard deviation is a bit smaller than the IQR.  Eventually we will use the standard deviation in our calculations for statistical inference; until then, this measure is just another summary statistic, and getting used to this number is your goal.  The normal curve of Chapter 6 will further help us understand standard deviation.
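One way to see where a figure near ¾ comes from is the normal curve of Chapter 6: for a normal distribution the IQR is about 1.349 standard deviations, so the standard deviation is about 0.74 of the IQR.  Real datasets will not match this exactly.  A minimal Python sketch using the standard library's `NormalDist`:

```python
from statistics import NormalDist

# For a normal curve, compare the IQR with the standard deviation.
sigma = 1.0
z = NormalDist(mu=0.0, sigma=sigma)
iqr = z.inv_cdf(0.75) - z.inv_cdf(0.25)   # Q3 - Q1 for a normal distribution
print(round(iqr, 3))           # about 1.349 standard deviations
print(round(sigma / iqr, 3))   # SD is roughly 0.74 of the IQR
```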

Reading: Sections 1.7 to 1.9 and 8.3 (excluding normal quantile-quantile plots).

 

Day 3

Activity: Use the "Arizona Temps" dataset to practice creating the histograms, stemplots, boxplots, and quantile plots for several lists.  Compare and interpret the graphs.  Identify shape, center, and spread.

Compare what these graphs show with the numerical measures you calculated on Day 2.  Notice that the boxplots and numerical measures cannot describe shape very well.  The histograms are hard to use to compare two lists.  The stem and leaf plot is difficult to modify.

Useful commands for the TI-83:
STAT EDIT (use one of the lists to enter data, L1 for example; the other L's can be used too)
2nd STATPLOT 1 On (Use this screen to designate the plot settings.  You can have up to three plots on the screen at once.  For now we will only use one at a time.)
ZOOM 9 This command centers the window around your data.
PRGM EXEC QUANTILE ENTER This program I wrote plots the sorted data and "stacks" them up.  It is essentially a quantile plot.

Using the plots now instead of the summary statistics, answer these questions again:

1)  Are high and low temperatures distributed the same way, other than the obvious fact that highs are higher than lows?
2)  How does a single case affect the calculator's routines?  (What if we had had an outlier?)
3)  What information does the 5-number summary disguise?

 

Goals:     Be able to use the calculator to make a histogram, boxplot, or a quantile plot.  Be able to make a stemplot by hand.

Skills:

                        Summarize data into a frequency table.  The easiest way to make a frequency table is to TRACE the boxes in a histogram and record the classes and counts.  You can control the size and number of the classes with Xscl and Xmin in the WINDOW menu.  The decision as to how many classes to create is arbitrary; there isn't a "right" answer.  One popular suggestion is to try the square root of the number of data values.  For example, if there are 25 data points, use 5 intervals.  If there are 50 data points, try 7 intervals.  This is a rough rule; you should experiment with it.  The TI-83 has its own rule for doing this; I do not know what that rule is.  You should experiment by changing the interval width and seeing what happens to the diagram.
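The square-root rule can be sketched in Python.  This is only an illustration of the idea, not the TI-83's own rule: equal-width classes start at the minimum, each class includes its left endpoint, and the last class also includes the maximum.  The data are hypothetical:

```python
import math

def frequency_table(data, n_classes=None):
    """Equal-width classes [left, right); the last class also includes the maximum.
    By default use about sqrt(n) classes."""
    n = len(data)
    if n_classes is None:
        n_classes = max(1, round(math.sqrt(n)))
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_classes or 1      # guard against all-equal data
    counts = [0] * n_classes
    for x in data:
        k = min(int((x - lo) / width), n_classes - 1)   # put the maximum in the last class
        counts[k] += 1
    return [(lo + k * width, lo + (k + 1) * width, c) for k, c in enumerate(counts)]

# Hypothetical data: 25 values, so about sqrt(25) = 5 classes.
data = [62, 65, 67, 70, 71, 71, 73, 74, 75, 76, 78, 79, 80,
        81, 82, 84, 85, 86, 88, 90, 91, 93, 95, 97, 99]
for left, right, count in frequency_table(data):
    print(f"{left:6.1f} - {right:6.1f}: {count}")
```

Passing a different `n_classes` plays the role of changing Xscl: try 4 or 7 classes and watch the counts shift.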

                        Use the TI-83 to create an appropriate histogram, boxplot, or quantile plot.  STAT PLOT is our main tool for viewing distributions of data.  Histograms are common displays, but have flaws; the choice of class width is troubling as it is not unique.  The quantile plot is more reliable, but less common.  For interpretation purposes, remember that in a histogram tall boxes represent places with lots of data, while in a quantile plot those same high-density data places are steep.

                        Create a stemplot by hand.  The stemplot is a convenient manual display; it is most useful for small datasets, but not all datasets make good stemplots.  Choosing the "stem" and "leaves" to make reasonable displays will require some practice.  Some notes for proper choice of stems: if you have many empty rows, you have too many stems.  Move one column to the left and try again.  If you have too few rows (all the data is on just one or two stems) you have too few stems.  Move to the right one digit and try again.  Some datasets will not give good pictures for any choice of stem, and some benefit from splitting or rounding (see the example in the text).
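Here is a minimal Python sketch of a basic stemplot, assuming non-negative integer data; splitting stems and rounding are not handled.  The parameter `leaf_unit` (our own name) controls which digit becomes the leaf, so `leaf_unit=10` corresponds to "moving one column to the left":

```python
from collections import defaultdict

def stemplot(data, leaf_unit=1):
    """Text stemplot for non-negative integers: leaf = last digit (in units of leaf_unit),
    stem = everything to its left.  Empty stems inside the range are shown too."""
    scaled = sorted(x // leaf_unit for x in data)   # truncate digits below the leaf unit
    rows = defaultdict(list)
    for v in scaled:
        rows[v // 10].append(v % 10)
    lines = []
    for stem in range(min(rows), max(rows) + 1):
        leaves = "".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")
    return "\n".join(lines)

print(stemplot([62, 65, 67, 70, 71, 71, 73, 74, 85, 86, 88, 90, 101]))
```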

                        Describe shape, center, and spread.  From each of our graphs, you should be able to make general statements about the shape, center, and spread of the distribution of the variable being explored.

                        Compare several lists of numbers using boxplots.  For two lists, the best simple approach is the back-to-back stemplot.  For more than two lists, I suggest trying boxplots, side-by-side, or stacked.  At a glance, then, you can assess which lists have typically larger values or more spread out values, etc.

                        Understand boxplots.  You should know that the boxplots for some lists don't show the interesting part of those lists.  For example, boxplots do not describe shape very well (apart from rough symmetry); you can only see where the quartiles are.  On the other hand, you should know that the boxplot can be a very good first quick look at a dataset.

Reading: Sections 2.1 to 2.3.

 

Day 4

Activity: Sample Spaces.  Venn Diagrams.  Coins, Dice.  Pascal's Triangle.

Using either complete sample spaces (theory) or simulation, find (or estimate) these chances:

1)  Roll two dice, one colored, one white.  Find the chance of the colored die being less than the white die.

2)  Roll three dice and find the chance that the largest of the three dice is a 6.  (Ignore multiple values; that is, the largest value when 6, 6, 4 is rolled is a 6.)

3)  Roll three dice and find the chance of getting a sum of less than 8.
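For the theory side, we can list each complete sample space by computer and count.  A minimal Python sketch (the helper `chance` is our own); treat the exact fractions as a check on your own work:

```python
from fractions import Fraction
from itertools import product

die = range(1, 7)

def chance(event, n_dice):
    """Exact probability of `event` by listing the complete sample space of n_dice fair dice."""
    outcomes = list(product(die, repeat=n_dice))
    favorable = sum(1 for roll in outcomes if event(roll))
    return Fraction(favorable, len(outcomes))

p1 = chance(lambda r: r[0] < r[1], 2)   # colored die (first) less than white die (second)
p2 = chance(lambda r: max(r) == 6, 3)   # largest of three dice is a 6
p3 = chance(lambda r: sum(r) < 8, 3)    # sum of three dice less than 8
print(p1, p2, p3)                       # 5/12 91/216 35/216
```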

Goals:     Create sample spaces.  Use Venn diagrams to organize sample spaces.  Use simulation to estimate probabilities.

Skills:

                        Know the definitions of Sample Space, Event, Outcome, etc.  The basic language of probability will be used throughout the course, so it is important for you to be conversant in it.

                        Be able to use a Venn diagram.  The Venn diagram is a way of partitioning the sample space into mutually exclusive regions.  It can be useful simply for organizing sets, and sometimes it is quite useful in understanding proofs (as we will see with the inclusion/exclusion formula on Day 6).

                        List simple sample spaces.  Flipping coins and rolling dice are common events to us, and listing the possible outcomes lets us explore probability distributions.  We will not delve too deeply into probability rules; rather, we are more interested in the ideas of probability and I think the best way to accomplish this is by example.

                        Simulation can be used to estimate probabilities.  If the number of repetitions of an experiment is large, then the resulting observed frequency of success can be used as an estimate of the true unknown probability of success.  However, a "large" enough number of repetitions may be more than we can reasonably perform.  For example, for problem 1 today, a sample of 100 will give results between 32/100 and 51/100 (.32 to .51) 95% of the time.  That may not be good enough for our purposes.  Even with 500, the range is 187/500 to 230/500 (.374 to .460).  Eventually the answers will converge to a useful percentage; the question is how soon that will occur.  We will have answers to that question after Chapter ?.
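The ranges quoted above are consistent with a normal approximation, p ± 1.96·sqrt(p(1 - p)/n) with p = 15/36, a formula we have not covered yet; treat it here as an assumption about how such ranges can be computed.  A minimal Python sketch that simulates problem 1 and prints that range beside each estimate (function names are hypothetical):

```python
import math
import random

def simulate_problem_1(reps, rng=random):
    """Estimate Pr(colored die < white die) by rolling two simulated dice reps times."""
    hits = sum(1 for _ in range(reps) if rng.randint(1, 6) < rng.randint(1, 6))
    return hits / reps

def approx_95_range(p, reps):
    """Normal-approximation range an estimate from reps trials falls in about 95% of the time."""
    spread = 1.96 * math.sqrt(p * (1 - p) / reps)
    return p - spread, p + spread

p_true = 15 / 36
random.seed(4)
for reps in (100, 500, 10_000):
    lo, hi = approx_95_range(p_true, reps)
    print(reps, simulate_problem_1(reps), round(lo, 3), round(hi, 3))
```

Because the spread shrinks like 1/sqrt(n), cutting the range in half takes four times as many repetitions.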

                        Recognize the usefulness and properties of Pascal's Triangle.  Pascal's Triangle is old (known to the Persians and the Chinese in the 11th century) yet is still quite useful.  There are just two rules to construct Pascal's Triangle:  each row begins and ends with a 1, and each entry is the sum of the two entries above it to the left and the right.  From such a simple construction, though, we encounter many relationships: the combination formula, the triangular numbers, the Fibonacci numbers, the powers of 2, among others.  Our chief interest is in the combination formula and its relationship to the binomial distribution.
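The two construction rules translate directly into code.  A minimal Python sketch that builds the first few rows and checks two of the relationships mentioned, the combination formula (via `math.comb`) and the powers of 2:

```python
from math import comb

def pascal_rows(n_rows):
    """Build Pascal's Triangle: rows start and end with 1,
    and each inner entry is the sum of the two entries above it."""
    rows = [[1]]
    for _ in range(n_rows - 1):
        prev = rows[-1]
        rows.append([1] + [prev[i] + prev[i + 1] for i in range(len(prev) - 1)] + [1])
    return rows

rows = pascal_rows(7)
for row in rows:
    print(row)
# Entry k of row n (both counted from 0) equals the combination nCr with r = k.
assert all(row[k] == comb(n, k) for n, row in enumerate(rows) for k in range(n + 1))
# Row sums are powers of 2.
assert all(sum(row) == 2 ** n for n, row in enumerate(rows))
```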

Reading: Section 2.3.

 

Day 5

Activity: Presentation 1.

Summaries (Chapters 1 and 8.3)

Gather 3 to 5 variables on at least 20 subjects; the source is irrelevant, but knowing the data will help you explain its meaning to us.  Be sure to have at least one numerical and at least one categorical variable.  Demonstrate that you can summarize data graphically and numerically.

Combinations vs Permutations.

Goals:     Continue exploring Pascal's Triangle and how it relates to counting (permutations and combinations).

Skills:

                        Know the Permutation and Combination formulas.  When counting the number of ways of choosing items or ordering items, our formulas are nCr and nPr, respectively.  You will need to work enough problems so that you know when to use each of them.  One way to keep them straight is to think of a Combination as a Committee of people, and a Permutation as a Photograph of that committee.  (For a particular choice of n and r there are at least as many permutations as combinations, and strictly more whenever r is 2 or more.)  Also don't forget our trick of listing the complete sample space, but only for small problems!
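The committee/photograph picture can be checked with Python's `math.comb` and `math.perm` (Python 3.8 or later), used here in place of the calculator's nCr and nPr.  The choice n = 5, r = 3 is just an illustration:

```python
from math import comb, factorial, perm

n, r = 5, 3
committees = comb(n, r)    # nCr: choose 3 of 5, order ignored
photographs = perm(n, r)   # nPr: choose 3 of 5 and line them up
print(committees, photographs)   # 10 60
# Each committee can be arranged r! ways for the photograph.
assert photographs == committees * factorial(r)
```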

 

Reading: Sections 2.4 and 2.5.

 

Day 6

Activity: Finish Combinations and Permutations.

Arrange the letters in FREDA.  Arrange the letters in FREED.  Arrange the letters in ERRORS.  Arrange the letters in SETTER.
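For words with repeated letters, the count is n! divided by k! for each letter that appears k times.  A minimal Python sketch (the function name is ours) that applies this formula and confirms it by listing every ordering and removing duplicates:

```python
from collections import Counter
from itertools import permutations
from math import factorial, prod

def arrangements(word):
    """Distinct arrangements of the letters: n! divided by k! for each repeated letter."""
    return factorial(len(word)) // prod(factorial(k) for k in Counter(word).values())

for word in ["FREDA", "FREED", "ERRORS", "SETTER"]:
    brute_force = len(set(permutations(word)))   # list every ordering, drop duplicates
    print(word, arrangements(word), brute_force)
```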

Demonstrate the Inclusion/Exclusion formula with a 3 set Venn diagram.

Use Venn diagrams to "prove":
$A = (A \cap B) \cup (A \cap B')$
$(A \cup B)' = A' \cap B'$
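The same identities can be spot-checked on a concrete sample space.  A minimal Python sketch using the two-dice sample space with three hypothetical events of our own choosing; checking one example is not a proof, but it is a good way to test a conjecture before trying to prove it:

```python
from itertools import product

S = set(product(range(1, 7), repeat=2))   # sample space: two dice
A = {w for w in S if w[0] % 2 == 0}       # hypothetical event: first die even
B = {w for w in S if sum(w) >= 8}         # hypothetical event: sum at least 8
C = {w for w in S if w[0] == w[1]}        # hypothetical event: doubles

def comp(E):
    """Complement relative to the sample space S."""
    return S - E

# A = (A ∩ B) ∪ (A ∩ B')
assert A == (A & B) | (A & comp(B))
# De Morgan: (A ∪ B)' = A' ∩ B'
assert comp(A | B) == comp(A) & comp(B)
# Inclusion/exclusion for three sets
lhs = len(A | B | C)
rhs = (len(A) + len(B) + len(C)
       - len(A & B) - len(A & C) - len(B & C)
       + len(A & B & C))
assert lhs == rhs
print(lhs, rhs)
```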

Basic probability rules:

Probability is a number between 0 and 1, inclusive.
Mutually Exclusive events add when finding the union.
Mutually Exclusive and exhaustive events add to one.

Goals:     Know the rules of probability, including addition, complement, and inclusion/exclusion.  The multiplication rule will be covered on Day 7.

Skills:

                        Understand the probability rules.  Being adept at probability begins with knowing definitions and knowing basic formulas.  For example, you can't prove things about mutually exclusive sets if you can't recite the definition of mutually exclusive.  Memorize at first; later it becomes "learned", not "memorized".

                        Relate the rules to sample spaces.  Remember that the rules we're discussing are all based on counting elements in sample spaces.  Sometimes it is helpful to have a few "standard" examples in mind so conjectures or steps in reasoning can be verified.  For example, the inclusion/exclusion principle is shown well with the two-dice problem "what is the chance of at least one six?".  Ignoring the intersection (the double six) makes the probability too large.
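The "at least one six" example, counted out in Python: adding the two single-die events counts the (6, 6) outcome twice, and subtracting the intersection fixes it.  A minimal sketch:

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))
six_first = {w for w in S if w[0] == 6}
six_second = {w for w in S if w[1] == 6}
n = len(S)

correct = Fraction(len(six_first) + len(six_second) - len(six_first & six_second), n)
naive = Fraction(len(six_first) + len(six_second), n)   # forgets the (6, 6) overlap
print(correct, naive)   # 11/36 1/3
```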

                        Realize how the Venn diagram can help verify results.  The inclusion/exclusion formula is a good example where a Venn diagram can help with the proof or development.  Other examples are DeMorgan's Laws.  For Bayes' formula, on Day 7, the Venn diagram will also be useful.

Reading: Sections 2.6 to 2.8.

 

Day 7

Activity: Constructing probability trees.  Demonstrating Bayes' formula with the rare disease problem.

Consider a card trick where two cards are drawn sequentially off the top of a shuffled deck.  (There are 52 cards in a deck, 4 suits of 13 ranks.)  We want to calculate the chance of getting hearts on the first draw, on the second draw, and on both draws.  We will organize our thoughts into a tree diagram, much like water flowing in a pipe.  On each branch, the label will be the probability of taking that branch; thus at each node, the exiting probabilities (conditional probabilities) add to one. 

On the far right of the tree, we will have the intersection events.  Their probability is found by multiplying.

Calculate the chances of:

1)  Drawing a heart on the first card.
2)  Drawing a heart on the second card.
3)  Drawing at least one heart.
4)  Drawing two hearts.
5)  Drawing a heart on the second draw given that a heart was drawn first.
6)  Drawing a heart on the first draw given that a heart was drawn second.
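The card tree can be written out with exact fractions: each leaf's probability is the product along its branch, and the other chances are sums or ratios of leaves.  A minimal Python sketch; the labels H (heart) and N (not a heart) are our own.  The last quantity reverses the tree, giving the chance of a heart on the first draw given a heart on the second:

```python
from fractions import Fraction as F

# Leaves of the tree: (draw 1, draw 2), H = heart, N = not a heart.
tree = {
    ("H", "H"): F(13, 52) * F(12, 51),
    ("H", "N"): F(13, 52) * F(39, 51),
    ("N", "H"): F(39, 52) * F(13, 51),
    ("N", "N"): F(39, 52) * F(38, 51),
}
assert sum(tree.values()) == 1   # joint probabilities add to one

heart_first = tree[("H", "H")] + tree[("H", "N")]
heart_second = tree[("H", "H")] + tree[("N", "H")]
at_least_one = 1 - tree[("N", "N")]
two_hearts = tree[("H", "H")]
second_given_first = two_hearts / heart_first    # reading the tree forward
first_given_second = two_hearts / heart_second   # reversing the tree (Bayes)
print(heart_first, heart_second, at_least_one, two_hearts,
      second_given_first, first_given_second)
```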

Now we will do this work for the rare disease problem (Problem 2.128).

Goals:     Be able to express probability calculations as tree diagrams.  Be able to reverse the events in a probability tree, which is what Bayes' formula is about.

Skills:

                        Know how to use the multiplication rule in a probability tree.  Each branch of a probability tree is labeled with the conditional probability for that branch.  To calculate the joint probability of a series of branches, we multiply the conditional probabilities together.  Note that at each branching in a tree, the (conditional) probabilities add to one, and that overall, the joint probabilities add to one.

                        Recognize conditional probability in English statements.  Sometimes the key word is "given".  Other times the conditional phrase has "if".  But sometimes the fact that a statement is conditional is disguised.  For example:  "Assuming John buys the insurance, what is the chance he will come out ahead" is equivalent to "If John buys insurance, what is the chance he will come out ahead".

                        Be able to use the conditional probability formula to reverse the events in a probability tree.  The key here is the symmetry of the events in the conditional probability formula.  We exchange the roles of A and B, and tie them together with our formula for $\Pr(A \cap B)$.
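For a two-stage tree, the reversal can be packaged as a small function: multiply along the branch to get $\Pr(A \cap B)$, add the branches that end in B to get $\Pr(B)$, and divide.  A minimal Python sketch (the function name is ours), checked here on the two-card heart problem.  For the rare disease problem, you would plug in the figures from Problem 2.128, which are not reproduced here:

```python
from fractions import Fraction as F

def reverse_tree(p_a, p_b_given_a, p_b_given_not_a):
    """Bayes' formula for a two-stage tree: return Pr(A | B) from the forward branches."""
    joint_ab = p_a * p_b_given_a                   # Pr(A ∩ B): multiply along the branch
    p_b = joint_ab + (1 - p_a) * p_b_given_not_a   # total probability of B
    return joint_ab / p_b

# Card example: A = heart on draw 1, B = heart on draw 2.
print(reverse_tree(F(1, 4), F(12, 51), F(13, 51)))   # 4/17
```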

                        Know the definition of independence.  Independence is a fact about probability, not about sets.  Contrast this to "disjoint", which is a property of sets.  In particular, two events with positive probabilities cannot be both independent and disjoint: if A and B are disjoint, then $\Pr(A \cap B) = 0$, which cannot equal $\Pr(A)\Pr(B) > 0$.  Independence is important later as an assumption, as it allows us to multiply individual probabilities together without having to worry about conditional probability.

Reading: Sections 3.1 and 3.2.

 

Day 8

Activity: Continue coins and dice.  Introduce Random Variables.

We will finish up the problems from Day 4.  Also in our tables, we will include random variables.

Answer the following questions:

1)  What is the chance of getting a sum of 8 on two dice?
2)  What is the chance of getting a sum of 10 on two dice?
3)  What is the chance of getting a sum of x on two dice, where x is between 1 and 13?
4)  What is the chance of getting 10 heads on 20 flips of a fair coin?
5)  How can you get the TI-83 to graph a probability histogram?

Derive a pmf and its cdf.  Use the sum on two dice as an example.  Know how to work back and forth from one to the other.
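Going back and forth between pmf and cdf is running totals in one direction and differences in the other.  A minimal Python sketch for the sum on two dice, with exact fractions; the last line answers the 20-flip question using `math.comb`:

```python
from fractions import Fraction
from itertools import accumulate, product
from math import comb

# pmf of the sum on two fair dice, built by counting the sample space
counts = {}
for a, b in product(range(1, 7), repeat=2):
    counts[a + b] = counts.get(a + b, 0) + 1
pmf = {x: Fraction(c, 36) for x, c in sorted(counts.items())}

# cdf from pmf: running totals
cdf = dict(zip(pmf, accumulate(pmf.values())))

# pmf back from cdf: differences of consecutive cdf values
xs = list(cdf)
pmf_again = {x: cdf[x] - (cdf[xs[i - 1]] if i > 0 else 0) for i, x in enumerate(xs)}
assert pmf_again == pmf

print(pmf[8], pmf[10], cdf[12])   # 5/36 1/12 1
print(comb(20, 10) / 2 ** 20)     # 10 heads in 20 fair flips, about 0.176
```

The counts also follow the closed form (6 - |x - 7|)/36 for x from 2 to 12.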

Goals:     Understand that variables may have values that are not equally likely.

Skills:

                        Understand discrete probability distributions and how to create simple ones.  We have listed sample spaces of equally likely events, like dice and coins.  Events can further be grouped together and assigned values.  These new groups of events may not be equally likely, but as long as the rules of probability still hold, we have valid probability distributions.  Pascal's triangle is one such example, though you should realize that it applies only to fair coins.  We will work with "unfair coins" (proportions) later, in Chapter 5.  Historical note: examining these sampling distributions led to the discovery of the normal curve in the early 1700's.  We will copy their work and "discover" the normal curve for ourselves too, using dice.

                        Know the definition of a discrete probability mass function (pmf).  If a non-negative function sums to 1 over some set, then we have a discrete pmf.  It is not necessary for the set to be finite; this means we may need to work with infinite sums.  Because each term in the sum is a probability, each value must be at most one.  (Contrast this with the continuous distributions on Day 9.)

                        Know the definition of a discrete cumulative distribution function (cdf).  If a non-decreasing function begins at 0 on the left, ends at 1 on the right, and increases only by jumps (its derivative is zero wherever it exists), then we have a discrete cdf.  The key is that discrete cdf's are stairs: flat spots with discrete jumps.

Reading: Section 3.3.

 

Day 9

Activity: Presentation 2.

Probability (Chapter 2)

Choose one of the following games and
1)  Give us a short history of the game.
2)  Describe how randomness is part of the game.
3)  Using our probability rules, show us an example using this game.

Games:  Risk, Blackjack, Backgammon, Roulette, Battleship, Poker, Minesweeper, Cribbage


Calculate a pdf and its cdf.  Use the uniform as an example.  Know how to work back and forth from one to the other.  Note:  calculus required!

Go over the cdf method of generating random samples.  Requires the cdf in a formula that can be inverted.
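A minimal Python sketch of the cdf (inverse transform) method, assuming `random.random()` as the source of uniform numbers: if U is uniform on [0, 1), then plugging U into the inverted cdf gives a draw from the target distribution.  It uses the uniform from today's example and, as a second hypothetical example, the density f(x) = 2x on [0, 1], whose cdf F(x) = x² inverts to √u:

```python
import math
import random

def sample_by_inverse_cdf(inverse_cdf, n, rng=random):
    """cdf method: if U is uniform on [0, 1), then inverse_cdf(U) has the target distribution."""
    return [inverse_cdf(rng.random()) for _ in range(n)]

# Uniform on [a, b]: F(x) = (x - a)/(b - a), so F^{-1}(u) = a + (b - a) u.
a, b = 2.0, 5.0
uniform_draws = sample_by_inverse_cdf(lambda u: a + (b - a) * u, 10_000)

# Hypothetical example: pdf f(x) = 2x on [0, 1], so F(x) = x^2 and F^{-1}(u) = sqrt(u).
triangle_draws = sample_by_inverse_cdf(math.sqrt, 10_000)

print(sum(uniform_draws) / len(uniform_draws))     # near (a + b)/2 = 3.5
print(sum(triangle_draws) / len(triangle_draws))   # near 2/3
```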

Goals:     Introduce continuous distributions.

Skills:

                        Know the definition of a probability density function (pdf).  If a non-negative function integrates to 1 over some interval, then we have a probability density function.  Notice that the function can certainly be over 1 (contrast to pmf's); the key here is that the area is one, not the maximum height.
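To see numerically that height and area are different things, here is a minimal Python sketch using a hypothetical density, the uniform on [0, 0.5], whose height is 2.  A midpoint-rule sum (our choice of numerical method) shows that the total area is still 1:

```python
def density(x):
    """Uniform pdf on [0, 0.5]: height 2, which is fine for a density."""
    return 2.0 if 0.0 <= x <= 0.5 else 0.0

def midpoint_integral(f, lo, hi, steps=10_000):
    """Approximate the area under f between lo and hi with the midpoint rule."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width

area = midpoint_integral(density, -1.0, 1.0)
print(max(density(x / 100) for x in range(-100, 101)), round(area, 6))   # 2.0 1.0
```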

                        Know the definition of a continuous cumulative distribution function (cdf).  If a continuous non-decreasing function begins at 0 on the left and ends at 1 on the right, then we have a continuous cdf.  The key is the continuity.  If a function has only jumps, it is discrete.  If a function has no jumps, it is continuous.  A function with both is mixed.  An example of a mixed distribution is the answer to a question like "What is your income?"  People without a job have no income, so there is a spike at 0.

                        Realize that these formulas and functions we are exploring are simply models.  What we ar