Day By Day
Notes for PBIS 187 Sports Mathematics
Fall 2006
Activity: Go over the syllabus. Take roll. Overview examples: the NCAA tournament, QB rating, batting averages. What is random?
http://www.sabernomics.com/sabernomics/index.php/2006/05/age-cut-offs-and-month-of-birth-in-baseball
http://www-math.bgsu.edu/~albert/papers/saber.html
http://www.sabr.org
http://sabermetrics.hnrc.tufts.edu
http://www.baseball-reference.com
Goals: Review the course objectives (collect data, summarize information, make inferences, and reason logically).
Activity: Home Run Comparisons.
Pick one of the top home run hitters of all time (get the data from http://www.baseball-reference.com)
and create graphical summaries of their yearly home run totals. Make a histogram, a stem plot, and a
quantile plot.
Useful commands for the calculator:
STAT EDIT (Use one of the lists to enter data, L1 for example; the other lists can be used too.)
2nd STATPLOT 1 On (Use this screen to designate the plot settings. You can have up to three plots on the screen at once. For now we will only use one at a time.)
ZOOM 9 (This command, ZoomStat, resets the window so that all of your data fits on the screen.)
PRGM QUANTILE ENTER (This program plots the sorted data and "stacks" them up, as opposed to a histogram, which places the boxes side by side.)
From your displays, write a short description of the player's home
run history.
To make a histogram: Enter the data into a list on the TI-83. Set up one of the plots in STAT PLOT, choosing the histogram type. Then zoom the window settings (ZOOM 9).
To interpret a histogram: Each "bin" is represented by
a rectangle; the height is proportional to the number of cases in that bin or
interval. Tall boxes mean lots of
data; short boxes (or empty boxes) indicate little (or no) data.
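To see what the calculator is counting, here is a minimal Python sketch that tallies values into bins and draws a text histogram. It assumes bins start at `xmin` and have width `width` (playing the roles of Xmin and Xscl in the WINDOW menu) and that a value landing exactly on a boundary goes into the higher bin; the function names are made up for this example, and the home run totals in the usage lines are hypothetical, not any real player's record.

```python
def bin_counts(data, xmin, width):
    """Return (low, high, count) for each bin [low, low + width)."""
    if width <= 0 or min(data) < xmin:
        raise ValueError("need width > 0 and every value >= xmin")
    nbins = int((max(data) - xmin) // width) + 1
    counts = [0] * nbins
    for x in data:
        # floor division puts a value on a boundary into the higher bin
        counts[int((x - xmin) // width)] += 1
    return [(xmin + i * width, xmin + (i + 1) * width, c)
            for i, c in enumerate(counts)]


def text_histogram(data, xmin, width):
    """One line per bin; the bar of '*' is as long as the bin's count."""
    return [f"{lo:>4}-{hi:<4} {'*' * c}"
            for lo, hi, c in bin_counts(data, xmin, width)]


if __name__ == "__main__":
    hr = [12, 25, 31, 38, 40, 44, 29, 47, 52, 35, 41, 18, 9]  # hypothetical totals
    for line in text_histogram(hr, 0, 10):
        print(line)
```

The longest bar of stars marks the tallest box, the bin holding the most seasons.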
To make a stem plot: Choose a place value, such as tens or hundreds, for the stem. (You may also have to consider ones, tenths, hundredths, etc. The choice of stem will be dictated by how many data points end up on each row: with too many stems, each row has just one or two items; with too few stems, one or two stems hold all the data. Choosing the proper stem requires good judgment.) After choosing a stem, make a column of these stems starting at the lowest value, without skipping any values. Then go through the data set and record each data point on the appropriate row (stem), writing down only the digit to the right of the stem's digit.
For example, if you have chosen the tens place for the stem, the data
value 123 would belong on the stem labeled "12" and you jot down the
number "3" for the leaf.
When you are finished, you may want to sort the items (the leaves) on each row (stem). Note: the stem plot is a visual display, so make sure each digit you write down occupies the same amount of space; it is especially tempting to squeeze together a string of 1's. If you are typing, use Monaco, Courier, or some other fixed-width font.
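The procedure above can be sketched in Python for non-negative whole numbers. The function name `stem_plot` and its `unit` parameter are made up for this example: `unit=10` means the leaf is the ones digit and the stem is everything to its left (so 123 goes on stem 12 with leaf 3, as in the example above), while `unit=100` makes the leaf the tens digit and simply drops the ones digit. Rounding instead of dropping would be an equally reasonable choice.

```python
def stem_plot(data, unit=10):
    """Rows of a stem plot for non-negative integers; unit is a power of 10."""
    if unit < 10 or any(x < 0 or x != int(x) for x in data):
        raise ValueError("expects non-negative integers and unit >= 10")
    leaf_place = unit // 10
    rows = {}
    for x in data:
        stem, rest = divmod(int(x), unit)
        rows.setdefault(stem, []).append(rest // leaf_place)  # one-digit leaf
    width = len(str(max(rows)))  # right-align stems so the bars line up
    lines = []
    # list every stem from lowest to highest without skipping any
    for stem in range(min(rows), max(rows) + 1):
        leaves = "".join(str(d) for d in sorted(rows.get(stem, [])))
        lines.append(f"{stem:>{width}} | {leaves}".rstrip())
    return lines


if __name__ == "__main__":
    for line in stem_plot([123, 127, 141, 120]):
        print(line)
```

Printed in a fixed-width font, each leaf takes the same space, which is exactly the point of the note about fonts.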
To interpret a stem plot: Each row of a stem plot can be interpreted in the same way as a bin in a histogram; long rows (just like tall boxes in a histogram) represent lots of data points. One advantage of a stem plot over a histogram is that every data point appears in the stem plot; in the histogram, all you know is how many data values are in an interval.
To make a quantile plot: A quantile plot is a graph of the rank of each data value (lowest, second lowest, etc.) against the data value itself. We put the ranks on the vertical scale (the left side) and the data values on the horizontal scale (the bottom). All quantile plots start at the lower left and end at the upper right, because the lowest value has the lowest rank and the highest value has the highest rank.
The TI-83 program QUANTILE will graph a quantile
plot for you; all you need to tell the calculator is which list your data is
in.
To interpret a quantile plot: The slope of the graph is the important feature of a quantile plot. Steep sections mark ranges of x-values containing lots of data values (many ranks are used up over a short stretch of the horizontal scale); flat sections are ranges with little or no data.
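Here is a minimal Python sketch of the points a quantile plot uses, plus the slope between neighboring points, to make the "steep means crowded" reading concrete. It is not the QUANTILE program itself (its code is not in these notes); the function names are hypothetical, and in this sketch tied values simply get consecutive ranks.

```python
def quantile_points(data):
    """Pair each sorted value (horizontal) with its rank 1..n (vertical)."""
    return [(x, rank) for rank, x in enumerate(sorted(data), start=1)]


def steepness(points):
    """Slope between neighboring points: ranks gained per unit of x.

    Large slopes mean crowded data; small slopes mean sparse data.
    """
    slopes = []
    for (x0, r0), (x1, r1) in zip(points, points[1:]):
        # tied values sit directly above one another (infinitely steep)
        slopes.append(float("inf") if x1 == x0 else (r1 - r0) / (x1 - x0))
    return slopes


if __name__ == "__main__":
    pts = quantile_points([5, 1, 2, 3, 20])
    print(pts)             # feed these (x, rank) pairs to any plotting tool
    print(steepness(pts))  # the jump from 5 to 20 shows up as a small slope
```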
Goals: Make graphical summaries (describing data with pictures). Be able to use the calculator to make a histogram or a quantile plot. Be able to make a stem plot by hand.
Skills:
…
Identify types of variables. To choose the proper graphical displays, it is important to be able to differentiate between categorical and quantitative (or numerical) variables. Categorical variables do not have numerical values, or, if their values are numbers, the numbers serve only as labels (a player's jersey number, for example).
…
Be familiar with types of graphs. To graph categorical variables, we use bar graphs or pie graphs. To graph numerical variables, we use histograms, stem plots, or quantile plots (the TI-83 program QUANTILE). In practice, most of our variables will be numerical, but it is still important to choose the right display.
…
Summarize data into a frequency table. The easiest way to make a frequency table is to TRACE the boxes in a histogram and record the classes and counts. You can control the size and number of the classes with Xscl (the class width) and Xmin (the starting point) in the WINDOW menu. The decision as to how many classes to create is arbitrary; there isn't a "right" answer. One popular suggestion is to try the square root of the number of data values. For example, if there are 25 data points, use 5 intervals. If there are 50 data points, try 7 intervals (the square root of 50 is about 7.1). This is a rough rule; you should experiment with it. The TI-83 has its own rule for doing this; I do not know what its rule is. You should experiment by changing the interval width and seeing what happens to the diagram.
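The square-root rule can be turned into a frequency table with a short Python sketch. It assumes whole-number data and picks the smallest whole-number class width that covers the data in that many classes; both the width choice and the function names are assumptions for illustration, and any other sensible width is just as valid.

```python
import math


def suggested_classes(n):
    """Square-root rule of thumb: about sqrt(n) classes."""
    return max(1, round(math.sqrt(n)))


def frequency_table(data):
    """(start, end, count) rows for integer data, using the square-root rule."""
    k = suggested_classes(len(data))
    lo, hi = min(data), max(data)
    # smallest whole-number width so that k classes reach from lo to hi
    width = math.ceil((hi - lo + 1) / k)
    table = []
    for i in range(k):
        start = lo + i * width
        end = start + width - 1  # classes like 0-4, 5-9, ...
        count = sum(start <= x <= end for x in data)
        table.append((start, end, count))
    return table


if __name__ == "__main__":
    for start, end, count in frequency_table(list(range(25))):
        print(f"{start:>3}-{end:<3} {count}")
```

Changing `width` by hand and rerunning is the code version of experimenting with Xscl.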
…
Use the TI-83 to create an appropriate histogram or quantile plot. STAT PLOT is our main tool for viewing distributions of data. Histograms are common displays, but they have flaws; the choice of class width is troubling because it is not unique, and different widths can give different pictures of the same data. The quantile plot is more reliable, but less common. For interpretation purposes, remember that in a histogram tall boxes represent places with lots of data, while in a quantile plot those same high-density places appear as steep sections.
…
Create a stem plot by hand. The stem plot is a convenient manual display; it is most useful for small datasets, but not all datasets make good stem plots. Choosing the "stem" and "leaves" to make reasonable displays will require some practice. Some notes for proper choice of stems: if you have many empty rows, you have too many stems; move the stem one digit to the left and try again. If you have too few rows (all the data is on just one or two stems), you have too few stems; move the stem one digit to the right and try again. Some datasets will not give good pictures for any choice of stem, and some benefit from splitting or rounding (see the example in class).
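The advice about empty rows and crowded rows can be checked quickly before drawing anything. This Python sketch (the name `stem_rows` is hypothetical) reports, for a candidate stem unit, how many rows the plot would have, how many of them are empty, and how many leaves sit on the fullest row. A larger `unit` corresponds to moving the stem one digit to the left; the sample data are made up.

```python
def stem_rows(data, unit):
    """(rows, empty_rows, fullest_row) if the stem is x // unit."""
    counts = {}
    for x in data:
        stem = int(x) // unit
        counts[stem] = counts.get(stem, 0) + 1
    rows = max(counts) - min(counts) + 1  # stems are listed without skipping
    empty = rows - len(counts)
    return rows, empty, max(counts.values())


if __name__ == "__main__":
    data = [12, 15, 23, 27, 31, 48]  # made-up values
    for unit in (1, 10, 100):
        print(unit, stem_rows(data, unit))
    # unit 1: 37 rows, 31 of them empty (too many stems)
    # unit 100: one row holding everything (too few stems)
```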
…
Describe shape,
center, and spread.
From each of our graphs, you should be able to make
general statements about the shape, center, and spread of the distribution of
the variable being explored. Our
descriptors will be simple words like symmetric, skewed, two-peaked, etc.
Day 3
Activity: Cumulative Progress.
Examples: pennant races, running pace, bowling averages.
http://www.alexreisner.com/baseball/history/race Davenport's graphs.
To display cumulative progress, use the program PROGRESS. The
program will prompt you for whether you want the endpoint to be the average of
the list or a number you input.
For the pennant races and other yes/no type responses, use INPUT and give it the value
"0". For the other
examples, we will likely use AVERAGE, but you can explore the shape of the graph with
other values. In all graphs,
regions of similar slope have similar averages. We will discuss this phenomenon in our class examples.
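The PROGRESS program is not reproduced in these notes, so the following Python sketch is only an assumption about the arithmetic behind such a graph: each point is the running total of (value - c), where c is the list's average or a number you supply. Both function names are hypothetical. The second function checks the slope claim: the slope between two points equals the average of the values in that stretch minus c, so stretches with equal slopes have equal averages.

```python
from statistics import mean


def cumulative_progress(values, c=None):
    """Running total of (value - c); c defaults to the list's average.

    With c equal to the average, the last point is 0 (up to rounding).
    """
    if c is None:
        c = mean(values)
    total, points = 0, []
    for v in values:
        total += v - c
        points.append(total)
    return points


def segment_slope(points, i, j):
    """Average rise per step from position i to position j (i < j).

    This equals (average of values i+1..j) - c.
    """
    return (points[j] - points[i]) / (j - i)


if __name__ == "__main__":
    # hypothetical win (1) / loss (0) record; c = 0.5 gives steps of +0.5 and -0.5
    record = [1, 0, 1, 1, 0, 0, 1, 1]
    print(cumulative_progress(record, 0.5))
    print(cumulative_progress(record))  # adjusted so the endpoint is 0
```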
Numerical summaries, including box plots: Our main numerical summaries will be the mean, the median, and the standard deviation. The mean is the arithmetic average, the median is the middle number in the sorted list (or the average of the two middle numbers when the list has an even number of values), and the standard deviation is a measure of how spread out the values are. Roughly, most data sets are 4 to 6 standard deviations wide; that is, the largest value is about 4 to 6 standard deviations above the smallest value.
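As a cross-check on the calculator, here is a short Python sketch using the standard `statistics` module. It uses the sample standard deviation (`statistics.stdev`), which corresponds to Sx in the TI-83's 1-Var Stats output. The data are the Packers final scores from the 2005 Green Bay Packers table in these notes; the function name `summarize` is made up for this example.

```python
import statistics

# Packers final scores, 2005 (the Final column of the Packers table)
packers_final = [3, 24, 16, 29, 52, 20, 14, 10, 33, 17, 14, 7, 16, 3, 17, 23]


def summarize(data):
    """Mean, median, sample standard deviation, and range measured in SDs."""
    m = statistics.mean(data)
    med = statistics.median(data)
    s = statistics.stdev(data)  # sample SD, like Sx on the TI-83
    width_in_sds = (max(data) - min(data)) / s
    return m, med, s, width_in_sds


if __name__ == "__main__":
    m, med, s, w = summarize(packers_final)
    print(f"mean {m}, median {med}, SD {s:.2f}, range = {w:.1f} SDs")
```

For these scores the range works out to about 4 standard deviations, at the low end of the rough 4-to-6 guide.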
The 5-number summary uses the smallest value, the largest value, the median,
and the medians of the two halves of the data. These two other medians are called the quartiles, because
they split the data set up into quarters.
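The "medians of the two halves" recipe can be written out directly. In this Python sketch (the function name is hypothetical), the middle value is left out of both halves when the count is odd; I believe that is the convention the TI-83 follows, but treat it as an assumption, since other software splits the data differently and can report slightly different quartiles. It needs at least two values.

```python
from statistics import median


def five_number_summary(data):
    """(min, Q1, median, Q3, max), with quartiles = medians of the halves."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    # for an odd count, skip the middle value so it is in neither half
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


if __name__ == "__main__":
    packers_final = [3, 24, 16, 29, 52, 20, 14, 10, 33, 17, 14, 7, 16, 3, 17, 23]
    print(five_number_summary(packers_final))
```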
The box plot is a visual picture of the 5-number summary. The calculator has a selection in the STAT PLOT menu for this (the 5th icon). However, I recommend using the modified box plot (the 4th icon), as it has a built-in outlier detector. This outlier detection routine is not foolproof; we still need good judgment. But at least it gives us more than just our opinion.
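A common outlier rule for modified box plots flags any value more than 1.5 times the interquartile range (IQR = Q3 - Q1) below Q1 or above Q3. To my knowledge this is the rule behind the TI-83's modified box plot, but take that as an assumption. This Python sketch (hypothetical function names) applies the rule, computing quartiles as medians of the two halves.

```python
from statistics import median


def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return median(lower), median(upper)


def outliers(data, k=1.5):
    """Values more than k * IQR below Q1 or above Q3."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low_fence or x > high_fence]


if __name__ == "__main__":
    packers_final = [3, 24, 16, 29, 52, 20, 14, 10, 33, 17, 14, 7, 16, 3, 17, 23]
    print(outliers(packers_final))  # fences are -5.25 and 40.75 here
```

As the notes say, the rule only flags candidates; whether 52 points is "unusual" for a football team is still a judgment call.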
Goals: Be able
to make and interpret a cumulative progress graph. Be able to calculate and interpret numerical summaries. Be able to make and interpret a box
plot.
Skills:
…
Know the basics of a cumulative progress graph. Quite simply, record the result over time: up indicates success, down indicates failure. If the result is continuous (as in running or bowling), then it will be appropriate to modify the slope (see the next item).
…
Know the two ways a cumulative progress graph can be drawn. When comparing several subjects (like teams' season records) and the response is yes/no, win/loss, etc., it may make more sense to simply plot the graph without adjustment, to allow a comparison. Up indicates a success, down indicates a failure, and the endpoint (on the right) will not be at zero except by coincidence. When an adjustment is made, we require the right endpoint to be at zero, and the amount for each success and failure is adjusted accordingly. Personally, I think this is best done with a computer program. You are basically subtracting the list's average from each element, so that the ups and downs cancel out by the end of the list. For the yes/no type answers, use the average .5 in the PROGRESS program.
…
Recognize the
features easily seen in a cumulative progress graph. The most visual
feature of a cumulative progress graph is the fact that parallel lines denote
periods of equivalent performance.
For example, if the graph over one period of time has the same slope as
over another period of time, then the performance (batting average, running
pace, or whatever is being measured) is the same for both time
periods.
…
Use the TI-83 to calculate summary statistics. Calculating may be as simple as entering numbers into your calculator and pressing a button. Or, if you are doing some things by hand, you may have to organize information the correct way, such as listing the numbers from low to high. On the TI-83, the numerical measures are accessed with the 1-Var Stats function in the STAT CALC menu. Please get used to using the statistical features of your calculator to produce the mean. I know you can calculate the mean by simply adding up all the numbers and dividing by the sample size, but if you do, you will not be in the habit of using the full features of your machine, and later on you will be missing out.
…
Compare several lists
of numbers using box plots.
For two lists, the best simple approach is the
back-to-back stem plot. For more
than two lists, I suggest trying box plots, side-by-side, or stacked. At a glance, then, you can assess which
lists have typically larger values or more spread out values,
etc.
…
Understand box plots. You should know that the box plots for some lists don't show the interesting part of those lists. For example, box plots do not describe shape very well: you can only see where the quartiles are, so a two-peaked distribution can have a box plot that looks much like that of a single-peaked one. On the other hand, you should know that the box plot can be a very good first quick look.
…
Understand the effect of outliers on the mean. The mean (or average) is unduly influenced by outlying (unusual) observations. Therefore, knowing whether your distribution is skewed or symmetric is helpful: a long tail pulls the mean away from the bulk of the data, toward the tail.
…
Understand the effect
of outliers on the median. The median is almost completely
unaffected by outliers. For
technical reasons, though, the median is not as common in scientific
applications as the mean.
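To see both effects at once, this Python sketch compares the mean and median of the Packers final scores (from the 2005 Green Bay Packers table in these notes) with and without the 52-point game. The helper name `with_and_without` is made up for this example.

```python
from statistics import mean, median

# Packers final scores, 2005 (the Final column of the Packers table)
packers_final = [3, 24, 16, 29, 52, 20, 14, 10, 33, 17, 14, 7, 16, 3, 17, 23]


def with_and_without(data, value):
    """(mean, median) of data, then of data with every copy of value removed."""
    trimmed = [x for x in data if x != value]
    return (mean(data), median(data)), (mean(trimmed), median(trimmed))


if __name__ == "__main__":
    before, after = with_and_without(packers_final, 52)
    print("with 52:    mean", before[0], " median", before[1])
    print("without 52: mean", after[0], " median", after[1])
```

Dropping the one big game moves the mean by more than two points but the median by only half a point.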
Activity: Basketball and football score comparisons. Do teams that score many points also give up many points? Can the final score be predicted from the halftime score? Using the data below, make scatter plots of team score versus opponent score and halftime score versus final score. For each scatter plot, include a correlation coefficient.
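On the TI-83, r is reported by LinReg(ax+b) in the STAT CALC menu once DiagnosticOn has been run from the CATALOG. As a cross-check, here is a minimal Python sketch of the Pearson correlation coefficient; the function name is made up for this example, and the lists are copied from the Opponent, Half, and Final columns of the 2005 Green Bay Packers table below. Lists with no variation (all values equal) would cause a division by zero.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length (at least 2)")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# 2005 Green Bay Packers, weeks 1-17 (no game in week 6)
opponent = [17, 26, 17, 32, 3, 23, 21, 20, 25, 20, 19, 19, 13, 48, 24, 17]
half = [3, 7, 13, 7, 35, 17, 7, 3, 17, 14, 14, 7, 10, 3, 7, 13]
final = [3, 24, 16, 29, 52, 20, 14, 10, 33, 17, 14, 7, 16, 3, 17, 23]

if __name__ == "__main__":
    print("team vs opponent:", round(correlation(final, opponent), 3))
    print("half vs final:   ", round(correlation(half, final), 3))
```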
2005 Green Bay Packers
Week | Opponent | Half | Final | 2nd
1 | 17 | 3 | 3 | 0
2 | 26 | 7 | 24 | 17
3 | 17 | 13 | 16 | 3
4 | 32 | 7 | 29 | 22
5 | 3 | 35 | 52 | 17
7 | 23 | 17 | 20 | 3
8 | 21 | 7 | 14 | 7
9 | 20 | 3 | 10 | 7
10 | 25 | 17 | 33 | 16
11 | 20 | 14 | 17 | 3
12 | 19 | 14 | 14 | 0
13 | 19 | 7 | 7 | 0
14 | 13 | 10 | 16 | 6
15 | 48 | 3 | 3 | 0
16 | 24 | 7 | 17 | 10
17 | 17 | 13 | 23 | 10
Nov 2005 Milwaukee Bucks
Game | Opponent | Half | Final | 2nd
1 | 102 | 50 | 102 | 52
2 | 96 | 46 | 110 | 64
3 | 100 | 49 | 105 | 56
4 | 110 | 53 | 103 | 50
5 | 102 | 40 | 103 | 63
6 | 109 | 46 | 85 | 39
7 | 87 | 48 | 90 | 42
8 | 103 | 44 | 82 | 38
9 | 100 | 39 | 80 | 41
10 | 97 | 51 | 108 | 57
11 | 99 | 44 | 91 | 47
12 | 85 | 35 | 76 | 41
13 |
100 |
55 |
100 |