Math 301
Introduction to Probability and Statistics
Spring 2013
Section 001 9:10 to 10:10, M W F
Instructor: Dr. Chris Edwards
Phone: 424-1358 or 948-3969
Office: Swart 123
Link to Day By Day Notes or PDF
Classroom: Swart 14
Text: Probability and Statistics, 8th edition, by Devore. Earlier editions of the text should be acceptable.
Required Calculator: TI-83, TI-83 Plus, or TI-84 Plus, by Texas Instruments. Other TI graphics calculators (like the TI-86) do not have the same statistics routines we will be using and may cause you trouble.
Catalog Description: Elementary probability models, discrete and continuous random variables, sampling and sampling distributions, estimation, and hypothesis testing. Prerequisite: Mathematics 172 with a grade of C or better.
Course Objectives: The goal of statistics is to gain understanding from data. This course focuses on critical thinking and active learning. Students will be engaged in statistical problem solving and will develop intuition concerning data analysis, including the use of appropriate technology. Specifically, students will develop
• an awareness of the nature and value of statistics
• a sound, critical approach to interpreting statistics, including possible misuses
• facility with statistical calculations and evaluations, using appropriate technology
• effective written and oral communication skills
Grading: Final grades are based on these 300 points:
| | Topic | Points | Tentative Date | Chapters |
| Exam 1 | Summaries, Probability | 56 pts. | March 4 | 1, 2, 3.1 to 3.3, 4.1 to 4.2 |
| Exam 2 | Distributions | 56 pts. | April 12 | 3, 4, 5 |
| Exam 3 | Inference | 56 pts. | May 10 | 7, 8 |
| Group Presentations | 15 Points Each | 60 pts. | Various | |
| Homework | 8 Points Each | 72 pts. | Mostly Weekly | |
Grades: Grades will be assigned by the following schedule.
| Grade | Points (Percent) | Grade | Points (Percent) | Grade | Points (Percent) |
| A | 270 (90 %) | B- | 231 (77 %) | D+ | 189 (63 %) |
| A- | 261 (87 %) | C+ | 219 (73 %) | D | 180 (60 %) |
| B+ | 249 (83 %) | C | 210 (70 %) | D- | 171 (57 %) |
| B | 240 (80 %) | C- | 201 (67 %) | F | 170 or fewer |
Homework: I will collect three homework problems approximately once a week. The due dates are listed with the homework assignments below. While I will only be grading three problems, I presume that you will be working on many more than just the three I assign. I suggest that you work together in small groups on the homework for this class. What I expect is a well thought-out, complete discussion of the problem. Please don’t just put down a numerical answer; I want to see how you did the problem. (You won’t get full credit for just numerical answers.) The method you use is much more important to me than the final answer.
Important Grading Feature: If your homework percentage is lower than your exam percentage, I will replace your homework percentage with your exam percentage. Therefore, your homework grade cannot be lower than your exam grade.
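The arithmetic behind this grading feature can be sketched in a few lines of Python. This is only an illustration, not an official grade calculator: it assumes that "exam percentage" means your combined points on the three exams out of 168, and the function names are made up for the example. The point values and letter-grade cutoffs come from the tables above.

```python
# Point values taken from the grading table in this syllabus.
EXAM_MAX = 3 * 56       # three exams, 56 points each
HOMEWORK_MAX = 9 * 8    # nine homework sets, 8 points each

# Minimum total points for each letter grade, from the grade schedule.
CUTOFFS = [
    (270, "A"), (261, "A-"), (249, "B+"), (240, "B"), (231, "B-"),
    (219, "C+"), (210, "C"), (201, "C-"), (189, "D+"), (180, "D"),
    (171, "D-"),
]


def course_total(exam_pts, homework_pts, presentation_pts):
    """Total course points after the homework replacement rule.

    Assumption: the exam percentage is the combined exam score out of 168.
    """
    exam_pct = exam_pts / EXAM_MAX
    homework_pct = homework_pts / HOMEWORK_MAX
    # The homework percentage is never allowed to fall below the exam percentage.
    effective_homework_pct = max(homework_pct, exam_pct)
    return exam_pts + effective_homework_pct * HOMEWORK_MAX + presentation_pts


def letter_grade(total_pts):
    """Look up the letter grade for a point total; below every cutoff is an F."""
    for cutoff, letter in CUTOFFS:
        if total_pts >= cutoff:
            return letter
    return "F"


# Hypothetical student: 140 exam points, 50 homework points, 55 presentation points.
total = course_total(140, 50, 55)
print(round(total, 1), letter_grade(total))
```

For this hypothetical student the homework percentage (about 69 %) is below the exam percentage (about 83 %), so the homework is counted as roughly 60 of 72 points and the total comes to about 255 points.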
Presentations: There will be four presentations, each worth 15 points. The descriptions of the presentations are in the Day By Day Notes. I will assign you to your groups for these presentations, as I want to avoid your having the same group members each time. I expect each person in a group to contribute to the work; however, you can allocate the work in any way you like. If a group member is not contributing, see me as soon as possible so I can make a decision about what to do. The topics are:
1 – Data Displays (February 15).
2 – Probability (March 1).
3 – Central Limit Theorem (April 19).
4 – Statistical Hypothesis Testing (May 8).
Office Hours: Office hours are times when I will be in my office to help you. There are many other times when I am in my office. If I am in and not busy, I will be happy to help. My office hours for the Spring 2013 semester are 10:20 to 11:00, Monday, Wednesday, and Friday, or by appointment.
Philosophy: I strongly believe that you, the student, are the only person who can make yourself learn. Therefore, whenever it is appropriate, I expect you to discover the mathematics we will be exploring. I do not feel that lecturing to you will teach you how to do mathematics. I hope to be your guide while we learn some mathematics, but you will need to do the learning. I expect each of you to come to class prepared to digest the day’s material. That means you will benefit most by having read each section of the text and the Day By Day Notes before class.
My idea of education is that one learns by doing. I believe that you must be engaged in the learning process to learn well. Therefore, I view my job as a teacher not as telling you the answers to the problems we will encounter, but rather pointing you in a direction that will allow you to see the solutions yourselves. To accomplish that goal, I will find different interactive activities for us to work on. Your job is to use me, your text, your friends, and any other resources to become adept at the material.
| Monday | Wednesday | Friday |
| January 28 Day 1 | January 30 Day 2 | February 1 Day 3 |
| February 4 Day 4 | February 6 Day 5 | February 8 Day 6 |
| February 11 Day 7 | February 13 Day 8 | February 15 Day 9 |
| February 18 Day 10 | February 20 Day 11 | February 22 Day 12 |
| February 25 Day 13 | February 27 Day 14 | March 1 Day 15 |
| March 4 Day 16 | March 6 Day 17 | March 8 Day 18 |
| March 11 Day 19 | March 13 Day 20 | March 15 Day 21 |
| March 25 Day 22 | March 27 Day 23 | March 29 Day 24 |
| April 1 Day 25 | April 3 Day 26 | April 5 Day 27 |
| April 8 Day 28 | April 10 Day 29 | April 12 Day 30 |
| April 15 Day 31 | April 17 Day 32 | April 19 Day 33 |
| April 22 Day 34 | April 24 Day 35 | April 26 Day 36 |
| April 29 Day 37 | May 1 Day 38 | May 3 Day 39 |
| May 6 Day 40 | May 8 Day 41 | May 10 Day 42 |
Homework Assignments: (subject to change if we discover issues as we go)
Homework 1, due February 8
1) Anxiety disorders and symptoms can often be effectively treated with medications. The accompanying data on a receptor binding measure was read from a graph in a recent scientific paper on the subject. Use various methods from Chapter 1 to describe and summarize the data. In particular, we want to highlight the differences between the two groups of patients.
PTSD: 10 20 25 28 31 35 37 38 38 39 39 42 46
Healthy: 23 39 40 41 43 47 51 58 63 66 67 69 72
2) Consider a sample $x_1, x_2, \ldots, x_n$ and suppose that the values of $\bar{x}$ and $s$ have been calculated and are known. Let $y_i = x_i - \bar{x}$ and $z_i = (x_i - \bar{x})/s$ for all i’s. (The y’s have been “centered” and the z’s have been “standardized”.) Find the means and standard deviations for the two new lists, y and z, in terms of $\bar{x}$ and $s$.
3) Specimens of three different types of rope wire were selected, and the fatigue limit was determined for each specimen. Construct a comparative box plot and a plot with all three quantile plots superimposed. Comment on the information each display contains. Also explain which graphical display you prefer for comparing these data sets. Keep in mind your goal is to highlight the differences between the data sets.
Type 1: 350 350 350 358 370 370 370 371 371 372 372 384 391 391 392
Type 2: 350 354 359 363 365 368 369 371 373 374 376 380 383 388 392
Type 3: 350 361 362 364 364 365 366 371 377 377 377 379 380 380 392
4) The sample data $x_1, x_2, \ldots, x_n$ sometimes represents a time series, where $x_t$ is the observed value of a response variable x at time t. Often the observed series shows a great deal of random variation, which makes it difficult to study longer-term behavior. In such situations, it is desirable to produce a smoothed version of the series. One technique for doing so involves exponential smoothing. A smoothing constant $\alpha$ is chosen ($0 < \alpha < 1$) and then smoothed values $\bar{x}_t$ are calculated by $\bar{x}_1 = x_1$ and $\bar{x}_t = \alpha x_t + (1 - \alpha)\bar{x}_{t-1}$ for $t = 2, 3, \ldots, n$.
a) Consider the following time series of the temperature of effluent at a sewage treatment plant on day t: 47, 54, 53, 50, 46, 46, 47, 50, 51, 50, 46, 52, 50, 50. Plot each $x_t$ against t. Does there appear to be any pattern? Now calculate the $\bar{x}_t$’s using $\alpha = .1$. Repeat for $\alpha = .5$. Which value of $\alpha$ gives a smoother series?
b) Substitute $\bar{x}_{t-1} = \alpha x_{t-1} + (1 - \alpha)\bar{x}_{t-2}$ on the right-hand side of the expression for $\bar{x}_t$, then substitute $\bar{x}_{t-2}$ in terms of $x_{t-2}$ and $\bar{x}_{t-3}$, and so on. On how many of the values $x_t, x_{t-1}, \ldots, x_1$ does $\bar{x}_t$ depend? What happens to the coefficient on $x_{t-k}$ as k increases? If t is large, how sensitive is $\bar{x}_t$ to the initial condition $\bar{x}_1 = x_1$?
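If you would like to check your calculator work on problem 4, the smoothing recursion can be sketched in Python. This sketch assumes the usual form of exponential smoothing, in which the first smoothed value equals the first observation and each later smoothed value is the smoothing constant times the new observation plus one minus the smoothing constant times the previous smoothed value. The function name and the short series are made up for illustration; they are not the homework data.

```python
def exponential_smooth(xs, alpha):
    """Return the exponentially smoothed series for the observations xs.

    Assumed recursion: smoothed[0] = xs[0], and for each later observation
    smoothed[t] = alpha * xs[t] + (1 - alpha) * smoothed[t - 1].
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    if not xs:
        return []
    smoothed = [xs[0]]  # initial condition: start at the first observation
    for x in xs[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical four-day series, not the effluent temperatures from the problem.
series = [10, 14, 9, 12]
print(exponential_smooth(series, 0.5))  # [10, 12.0, 10.5, 11.25]
```

Running the function with two different smoothing constants on the same series and plotting both results against t is one way to compare how smooth each choice looks.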
Homework 2, due February 20
1) A utility company offers a lifeline rate to any household whose electricity usage falls below 240 kWh during a particular month. Let A denote the event that a randomly selected household in a certain community does not exceed the lifeline usage during January, and let B be the analogous event for the month of July (A and B refer to the same household).
Suppose , , and . Compute a) , and b) the probability that the lifeline usage amount is
exceeded in exactly one of the two months. Describe this last event in terms of A and B.
2) The route used by a motorist to get to work has two stoplights. The probability of signal 1 being red is .4 and for signal 2 it is .5. There is a .6 probability that at least one of the two is red. What is the probability that both signals are red? That the first is red, but not the second? That exactly one signal is red?
3) Three molecules of type A, three of type B, three of type C, and three of type D are to be linked together to form a chain molecule. An example of one such chain molecule is ABCDABCDABCD and another is BCDDAAABDBCC.
a) How many such chain molecules are there? [Hint: If the three A’s were distinguishable from one another, such as $A_1$, $A_2$, and $A_3$, how many molecules would there be? How is this number reduced when the subscripts are removed from the A’s?]
b) Suppose a chain molecule of the type described is randomly selected. What is the probability that all three molecules of each type end up next to one another (such as in BBBAAADDDCCC)?
4) Three married couples have purchased theater tickets and are seated in a row consisting of just six seats. If they take their seats in a completely random fashion (random order), what is the probability that Jim and Paula (husband and wife) sit in the two seats on the far left? What is the probability that Jim and Paula end up sitting next to one another? What is the probability that at least one of the wives ends up sitting next to her husband? [Note: you probably won’t be able to use a formula to answer this; rather, a brute force listing of the sample space may be a more fruitful strategy.]
Homework 3, due February 27
1) In a Little League baseball game, suppose the pitcher has a 50 % chance of throwing a strike and a 50 % chance of throwing a ball, and that successive pitches are independent of one another. Knowing this, the opposing team manager has instructed his hitters to not swing at anything. What is the chance that the batter walks on four pitches? What is the chance that the batter walks on the sixth pitch? What is the chance that the batter walks (not necessarily on four pitches)? Note: in baseball, if a batter gets three strikes he is out, and if he gets four balls he walks.
2) A car insurance company classifies each driver as good risk, medium risk, or poor risk. Of their current customers, 30 % are good risks, 50 % are medium risks, and 20 % are poor risks. In any given year, the chance that a driver will have at least one citation is 10 % for good risk drivers, 30 % for medium risk drivers, and 50 % for poor risk drivers. If a randomly selected driver insured by this company has at least one citation during the next year, what is the chance that the driver was a good risk? A medium risk?
3) An insurance company offers its policyholders a number of different premium payment options. For a randomly selected policyholder, let X = the number of months between successive payments. The cdf of X is:
What is the pmf of X? Using just the cdf, compute P(3 ≤ X ≤ 6) and P(4 ≤ X). [Of course you can use your pmf to check your work.]
4) The pmf for X, the number of major defects on a randomly selected appliance in our warehouse, is
Compute E(X), V(X) using the definition, and V(X) using the shortcut.
Homework 4, due March 15
1) The time it takes a read/write head to locate a desired record on a computer disk once positioned on the right track can be reasonably modeled with a uniform distribution. If the disk rotates once every 25 msec, then assume $X \sim \text{Uniform}(0, 25)$. Compute P(10 ≤ X ≤ 20), P(10 ≤ X), the cdf F(x), E(X), and V(X).
2) Use the following pdf and find a) the cdf, b) the mean, and c) the median of the distribution.
.
3) There are two machines available for cutting corks intended for use in wine bottles. The first produces corks with diameters that are normally distributed with mean 3 cm and standard deviation .1 cm. The second machine produces corks with diameters that have a normal distribution with mean 3.04 cm and standard deviation .02 cm. Acceptable corks have diameters between 2.9 cm and 3.1 cm. Which machine is more likely to produce an acceptable cork?
4) Suppose the time it takes for Jed to mow his lawn can be modeled with a gamma distribution using $\alpha = 2$ and $\beta = 0.5$. What is the chance that it takes at most 1 hour for Jed to mow his lawn? At least 2 hours? Between 0.5 and 1.5 hours?
Homework 5, due March 29
1) The data below are precipitation values during March over a 30-year period in Minneapolis-St. Paul.
0.77 1.20 3.00 1.62 2.81 2.48 1.74 0.47 3.09 1.31 1.87 0.96
0.81 1.43 1.51 0.32 1.18 1.89 1.20 3.37 2.10 0.59 1.35 0.90
1.95 2.20 0.52 0.81 4.75 2.05
Construct and interpret a normal probability plot for this data set. The large outliers should make the data look non-normal. Can a transformation make the data more normal looking? Calculate the square root and the cube root of each observation, and construct and interpret normal probability plots. What do you conclude is the best choice: leave the data alone, use the square root, or use the cube root?
2) A particular type of tennis racket comes in a midsize version and an oversize version. Sixty percent of all customers who shop at a certain store want the oversize version. Assume the next ten customers that come to the store are a random sample of all customers. (This assumption is sometimes difficult to justify in practice, but is almost always made to facilitate our calculations. Whether this is good practice or not is a worthy discussion.) What is the chance that at least six of the next ten customers want the oversize racket? What is the chance that the number of customers out of the next ten who want the oversize racket is within one standard deviation of the mean? If the store currently has only seven rackets of each version, what is the chance that all of the next ten customers can get the version they want? [Hint: It might be easier to calculate the last part by examining the complement.]
3) Suppose n in a binomial experiment is known and fixed. Are there any values of p for which the variance is zero? Explain this result in words. For what value of p is the variance maximized? [Hint: Either graph the variance as a function of p or try to maximize it using calculus.]
Homework 6, due April 8
1) A second stage smog alert has been called in a certain area of Los Angeles County in which there are 50 industrial firms. An inspector will visit 10 randomly selected firms to check for violations of regulations. If 15 of the firms are actually violating at least one regulation, what is the pmf of the number of firms visited by the inspector that are in violation of at least one regulation? Find the Expected Value and Variance for your pmf.
2) A couple wants to have exactly two girls and they will have children until they have two girls. What is the chance that they have x boys? What is the chance they have 4 children altogether? How many children would you expect this couple to have? (Find the Expected Value.)
3) Let X have a binomial distribution with n = 25. For p = 0.5, 0.6, and 0.9, calculate the following probabilities both exactly and with the normal approximation to the binomial. a) P(15 ≤ X ≤ 20). b) P(X ≤ 15). c) P(20 ≤ X). Comment on the accuracy of the normal approximation for these parameter choices.
Homework 7, due April 24
1) There are 40 students in a statistics class, and from past experience, the instructor knows that grading exams will take an average of 6 minutes, with a standard deviation of 6 minutes. If grading times are independent of one another, and the instructor begins grading at 5:50 p.m., what is the chance that grading will be done before the 10 p.m. news begins?
2) A student has a class that is supposed to end at 9:00 a.m. and another that is supposed to begin at 9:10 a.m. Suppose the actual ending time of the first class is normally distributed with mean 9:02 and standard deviation 1.5 minutes. Suppose the starting time of the second class is also normally distributed, with mean 9:10 and standard deviation 1 minute. Suppose also that the time it takes to walk between the classes is a normally distributed random variable with mean 6 minutes and standard deviation 1 minute. If we assume independence between all three variables, what is the chance the student makes it to the second class before the lecture begins?
[Hint: Consider the linear combination . Positive values of correspond to making it to class on time.]
3) A 90 % confidence interval for the true average IQ of a group of 100 people is (114.4, 115.6). Deduce the sample mean and population standard deviation used to calculate this interval, and then produce a 99 % interval from the same data.
4) An experimenter would like to construct a 99 % confidence interval with a length of no more than 0.2 ohms, for the average resistance of a segment of copper cable of a certain length. If the experimenter is willing to assume that the true standard deviation is no larger than 0.15 ohms, what sample size would you recommend?
Homework 8, due May 3
1) Fifteen samples of soil were tested for the presence of a compound, yielding these data values: 26.7, 25.8, 24.0, 24.9, 26.4, 25.9, 24.4, 21.7, 24.1, 25.9, 27.3, 26.9, 27.3, 24.8, 23.6. Is it plausible that these data came from a normal curve? Support your answer. Now calculate a 95 % confidence interval for the true average amount of compound present. Comment on any assumptions you had to make.
2) A hot tub manufacturer advertises that with its heating equipment, a temperature of 100°F can be achieved in at most 15 minutes. A random sample of 32 tubs is selected, and the time necessary to achieve 100°F is determined for each tub. The sample average time and sample standard deviation are 17.5 minutes and 2.2 minutes, respectively. Do these data cast doubt on the company’s claim? Calculate a P-value, and comment on any assumptions you had to make.
3) A sample of 50 lenses used in eyeglasses yields a sample mean thickness of 3.05 mm and a population standard deviation of .30 mm. The desired true average thickness of such lenses is 3.20 mm. Does the data strongly suggest that the true average thickness of such lenses is undesirable? Use $\alpha = .05$. Now suppose the experimenter wished the probability of a Type II error to be .05 when $\mu = 3.00$. Was a sample of size 50 unnecessarily large?
4) Suppose that the true average viscosity should be 3000 in a certain process. Do the following measurements support that standard? State and test the appropriate hypotheses.
2781 2900 3013 2856 2888
Homework 9, due May 8
1) A random-number generator is supposed to produce a sequence of 0s and 1s with each value being equally likely to be a 0 or a 1 and with all values being independent. In an examination of the random-number generator, a sequence of 50,000 values is obtained of which 25,264 are 0s.
a) Formulate a set of hypotheses to test whether there is any evidence that the random-number generator is producing 0s and 1s with unequal probabilities, and calculate the corresponding P-value.
b) Compute a 99 % confidence interval for the probability p that a value produced by the random-number generator is a 0.
c) If a two-sided 99 % confidence interval for this probability is required with a total length no larger than 0.005, how many additional values need to be investigated?
2) In a survey of 4,722 American youngsters, 15 % were seriously overweight, as measured by BMI. Calculate and interpret a 99 % confidence interval for the proportion of all American youngsters who are seriously overweight. Discuss whether the Associated Press (who reported this data) actually took or could have taken a random sample of American youngsters.
3) It is known that roughly 2/3 of all human beings have a dominant right foot or eye. Is there also right-sided dominance in kissing behavior? One scientific article reported that in a random sample of 124 kissing couples, both people in 80 of the couples tended to lean more to the right than to the left. If 2/3 of all kissing couples exhibit this right-leaning behavior, what is the probability that the number in a sample of 124 who do so differs from the expected value by at least as much as what was actually observed? (That is, calculate a P-value.) Does the result of the experiment suggest that the 2/3 figure is plausible or implausible? State and test the appropriate hypotheses.
Managed by: Chris Edwards
Last updated January 18, 2013