Day By Day Notes for PBIS 187

Sports Mathematics

Fall 2006

 

Day 1

Activity: Go over the syllabus.  Take roll.  Overview examples: the NCAA tournament, QB rating, batting averages, and "What is random?"

http://www.sabernomics.com/sabernomics/index.php/2006/05/age-cut-offs-and-month-of-birth-in-baseball

http://www-math.bgsu.edu/~albert/papers/saber.html

http://www.sabr.org

http://sabermetrics.hnrc.tufts.edu

http://www.baseball-reference.com

Goals:     Review course objectives: collect data, summarize information, make inferences, reason logically.

Day 2

Activity: Home Run Comparisons.

Pick one of the top home run hitters of all time (get the data from http://www.baseball-reference.com) and create graphical summaries of their yearly home run totals.  Make a histogram, a stem plot, and a quantile plot.

Useful commands for the calculator:

    
STAT EDIT  (Use one of the lists to enter data, L1 for example; the other L's can be used too.)
    
2nd STATPLOT 1 On  (Use this screen to designate the plot settings.  You can have up to three plots on the screen at once.  For now we will only use one at a time.)
    
ZOOM 9  (This command centers the window around your data.)
    
PRGM QUANTILE ENTER  (This program plots the sorted data and "stacks" them up, as opposed to a histogram, which places the boxes side by side.)

From your displays, write a short description of the player's home run history.


To make a histogram:
  Enter data into a list on the TI-83.  Set up one of the plots.  Use ZOOM 9 to fit the window to the data.

To interpret a histogram:
  Each "bin" is represented by a rectangle; the height is proportional to the number of cases in that bin or interval.  Tall boxes mean lots of data; short boxes (or empty boxes) indicate little (or no) data.

To make a stem plot:
  Choose a "numbers place", such as tens, hundreds, etc., for the stem.  (You may also have to consider ones, tenths, hundredths, etc.  The choice of stem is dictated by how many data points end up on each row: with too many stems, each row has just one or two items; with too few stems, one or two stems hold all the data.  Choosing the proper stem requires good judgment.)  After choosing a stem, make a column of stems starting at the lowest value, without skipping any values.  Then go through the data set and record each data point on the appropriate row (stem), writing down only the digit to the right of the stem's digits.  For example, if you have chosen the tens place for the stem, the data value 123 belongs on the stem labeled "12", and you jot down "3" as the leaf.  When you are finished, you may want to sort the leaves on each row.  Note: the stem plot is a visual display, so make sure each digit you write down occupies the same amount of space.  If you are typing, use Monaco, Courier, or some other fixed-width font.  It is especially tempting to squeeze together a string of 1's.

To interpret a stem plot:
  Each row of a stem plot can be interpreted in the same way as a bin in a histogram; long rows (just like tall boxes in a histogram) represent lots of data points.  One advantage of a stem plot over a histogram is that every data point appears in the stem plot; in a histogram, all you know is how many data values fall in each interval.
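As a cross-check on a hand-drawn stem plot, here is a minimal Python sketch of the procedure above for whole-number data, with the ones digit as the leaf and everything to its left as the stem (so 123 goes on stem 12 with leaf 3).  The function name stem_plot is hypothetical, not one of the course programs, and the data are the Packers' final scores from the Day 4 table.

```python
def stem_plot(data):
    """Stem plot for non-negative whole numbers: the leaf is the ones digit
    and the stem is everything to its left (so 123 -> stem 12, leaf 3)."""
    values = sorted(int(x) for x in data)       # sorting also sorts the leaves
    stems = [v // 10 for v in values]
    rows = []
    # One row for every stem from lowest to highest, even if the row is empty.
    for stem in range(min(stems), max(stems) + 1):
        leaves = " ".join(str(v % 10) for v in values if v // 10 == stem)
        rows.append(f"{stem:>3} | {leaves}".rstrip())
    return rows

# Packers 2005 final scores, in week order, from the Day 4 table.
packers_final = [3, 24, 16, 29, 52, 20, 14, 10, 33, 17, 14, 7, 16, 3, 17, 23]
for row in stem_plot(packers_final):
    print(row)   # view in a fixed-width font so the leaves line up
```

For these 16 scores the plot has six rows (stems 0 through 5); stem 4 is empty but still listed, and the lone leaf on stem 5 is the 52-point game.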

To make a quantile plot:
  A quantile plot graphs the rank of each data value (lowest, second lowest, etc.) against the data value itself.  We put the ranks on the left (the vertical scale) and the data values on the bottom (the horizontal scale).  All quantile plots start at the lower left and end at the upper right.  The TI-83 program QUANTILE will graph a quantile plot for you; all you need to tell the calculator is which list your data is in.

To interpret a quantile plot:
  The slope of the graph is the important feature of a quantile plot.  Steep sections represent x-values with lots of data values; flat sections are areas with little or no data.
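The points of a quantile plot are easy to tabulate: sort the data and pair each value with its rank.  The sketch below assumes plain ranks 1, 2, ..., n go on the vertical axis; the internals of the QUANTILE program are not described here, so it may scale the ranks differently.  The name quantile_points is hypothetical, and the data are again the Packers' final scores from Day 4.

```python
def quantile_points(data):
    """(value, rank) pairs for a quantile plot: the sorted data values go on
    the horizontal axis and their ranks 1, 2, ..., n on the vertical axis."""
    return [(value, rank) for rank, value in enumerate(sorted(data), start=1)]

# Packers 2005 final scores, in week order, from the Day 4 table.
packers_final = [3, 24, 16, 29, 52, 20, 14, 10, 33, 17, 14, 7, 16, 3, 17, 23]
for value, rank in quantile_points(packers_final):
    print(value, rank)
```

In these points, ranks 4 through 10 cover only the scores 10 to 17 (a steep stretch, lots of data), while ranks 15 and 16 jump from 33 to 52 (a flat stretch, little data).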

Goals:     Perform graphical summaries (describing data with pictures).  Be able to use the calculator to make a histogram or a quantile plot.  Be able to make a stem plot by hand.

Skills:

                        Identify types of variables.  To choose the proper graphical displays, it is important to be able to differentiate between Categorical and Quantitative (or Numerical) variables.  Categorical variables do not have numerical values, or if they are numerical, the numbers serve only as labels (a jersey number, for example).

                        Be familiar with types of graphs.  To graph categorical variables we use bar graphs or pie graphs.  To graph numerical variables, we use histograms, stem plots, or quantile plots (the QUANTILE program on the TI-83).  In practice, most of our variables will be numerical, but it is still important to choose the right display.

                        Summarize data into a frequency table.  The easiest way to make a frequency table is to TRACE the boxes in a histogram and record the classes and counts.  You can control the size and number of the classes with Xscl and Xmin in the WINDOW menu.  The decision as to how many classes to create is arbitrary; there isn't a "right" answer.  One popular suggestion is to try the square root of the number of data values.  For example, if there are 25 data points, use 5 intervals; if there are 50 data points, try 7 intervals.  This is only a rough rule.  The TI-83 has its own rule for doing this; I do not know what it is.  You should experiment by changing the interval width and seeing what happens to the diagram.

                        Use the TI-83 to create an appropriate histogram or quantile plot.  STAT PLOT is our main tool for viewing distributions of data.  Histograms are common displays, but have flaws; the choice of class width is troubling as it is not unique.  The quantile plot is more reliable, but less common.  For interpretation purposes, remember that in a histogram tall boxes represent places with lots of data, while in a quantile plot those same high-density data places are steep.

                        Create a stem plot by hand.  The stem plot is a convenient manual display; it is most useful for small datasets, but not all datasets make good stem plots.  Choosing the "stem" and "leaves" to make reasonable displays will require some practice.  Some notes for proper choice of stems: if you have many empty rows, you have too many stems.  Move one column to the left and try again.  If you have too few rows (all the data is on just one or two stems) you have too few stems.  Move to the right one digit and try again.  Some datasets will not give good pictures for any choice of stem, and some benefit from splitting or rounding (see the example in class).

                        Describe shape, center, and spread.  From each of our graphs, you should be able to make general statements about the shape, center, and spread of the distribution of the variable being explored.  Our descriptors will be simple words like symmetric, skewed, two-peaked, etc.
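The frequency-table skill above can be mimicked away from the calculator.  This sketch is a stand-in for TRACEing a histogram, under stated assumptions: classes start at a chosen Xmin and are Xscl wide, each class includes its left endpoint but not its right one (check your calculator's convention), and the square-root rule suggests how many classes to try.  The function names are hypothetical; the data are the Packers' final scores from Day 4.

```python
import math

def suggested_classes(n):
    """Square-root rule of thumb: about sqrt(n) classes (25 -> 5, 50 -> 7)."""
    return round(math.sqrt(n))

def frequency_table(data, xmin, xscl):
    """Count values in classes [xmin, xmin + xscl), [xmin + xscl, xmin + 2*xscl), ...
    Assumes each class includes its left endpoint and excludes its right one.
    Empty classes between the first and last occupied class get a count of 0."""
    counts = {}
    for x in data:
        k = math.floor((x - xmin) / xscl)       # index of the class x falls in
        counts[k] = counts.get(k, 0) + 1
    return [(xmin + k * xscl, xmin + (k + 1) * xscl, counts.get(k, 0))
            for k in range(min(counts), max(counts) + 1)]

# Packers 2005 final scores, in week order, from the Day 4 table.
packers_final = [3, 24, 16, 29, 52, 20, 14, 10, 33, 17, 14, 7, 16, 3, 17, 23]
print(suggested_classes(len(packers_final)))    # square-root rule for 16 games
for low, high, count in frequency_table(packers_final, 0, 15):
    print(f"{low:>3} to under {high:>3}: {count}")
```

With 16 games the square-root rule suggests 4 classes, and Xmin = 0 with Xscl = 15 gives counts of 6, 8, 1, and 1.  Changing Xscl to 10 gives 3, 7, 4, 1, 0, 1, which shows how much the picture depends on the class width.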

Day 3

Activity: Cumulative Progress.

Examples:  Pennant races, Running pace, Bowling averages.

http://www.alexreisner.com/baseball/history/race  Davenport's graphs.

To display cumulative progress, use the program PROGRESS.  The program will prompt you for whether you want the endpoint to be the average of the list or a number you input.  For the pennant races and other yes/no type responses, use INPUT and give it the value "0".  For the other examples, we will likely use AVERAGE, but you can explore the shape of the graph with other values.  In all graphs, regions of similar slope have similar averages.  We will discuss this phenomenon in our class examples.
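The PROGRESS program's code is not reproduced in these notes, so the following is only a sketch of the idea, assuming the program plots a running total of (result minus c), where c is either the list's average (which forces the right endpoint to zero) or a number you input.  The Packers' 2005 win/loss sequence is read off the Day 4 table, with wins entered as 1 and losses as -1 (an assumed coding).  The name progress is hypothetical.

```python
from itertools import accumulate
from statistics import mean

def progress(results, endpoint=None):
    """Running total of (result - c) for a cumulative progress graph.

    c is the list's average when endpoint is None (so the graph ends at 0);
    otherwise c is the number supplied (like typing a value at INPUT).
    """
    c = mean(results) if endpoint is None else endpoint
    return list(accumulate(r - c for r in results))

# Packers 2005 results in week order from the Day 4 table: win = 1, loss = -1.
packers_wl = [-1, -1, -1, -1, 1, -1, -1, -1, 1, -1, -1, -1, 1, -1, -1, 1]

unadjusted = progress(packers_wl, 0)   # plain wins minus losses, week by week
adjusted = progress(packers_wl)        # steps shifted by the average (-0.5)
print(unadjusted)
print(adjusted)
```

With an input of 0, the graph ends at -8, the Packers' 4-12 record expressed as wins minus losses.  Using the average instead makes each win worth +1.5 and each loss -0.5, and the graph ends at 0.  If wins are entered as 1 and losses as 0, an input of .5 gives the same shape as the unadjusted graph at half the height.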

Numerical summaries, including box plots:
  Our main numerical summaries will be the mean, the median, and the standard deviation.  The mean is the arithmetic average, the median is the middle number in the sorted list, and the standard deviation is a measure of how spread out the values are.  Roughly, most data sets are 4 to 6 standard deviations wide.  That is, the largest value is close to 4 to 6 standard deviations above the smallest value.
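To check calculator output, the same summaries can be computed with Python's standard statistics module.  Its stdev function is the sample standard deviation (dividing by n - 1), which corresponds to Sx in the TI-83's 1-Var Stats output.  The last entry checks the rough 4-to-6 rule by dividing the range by the standard deviation.  The helper summarize is hypothetical; the data are the Packers' final scores from Day 4.

```python
from statistics import mean, median, stdev

def summarize(data):
    """Mean, median, sample SD, and how many SDs wide the data are."""
    sd = stdev(data)   # sample standard deviation (divides by n - 1), like Sx
    return {
        "mean": mean(data),
        "median": median(data),
        "sd": sd,
        "range_in_sds": (max(data) - min(data)) / sd,
    }

# Packers 2005 final scores, in week order, from the Day 4 table.
packers_final = [3, 24, 16, 29, 52, 20, 14, 10, 33, 17, 14, 7, 16, 3, 17, 23]
for name, value in summarize(packers_final).items():
    print(f"{name:>12}: {value:.2f}")
```

For these scores the mean is 18.625, the median is 16.5, and the standard deviation is about 12.2, so the 3-to-52 range is about 4 standard deviations wide, at the low end of the rough rule.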

The 5-number summary uses the smallest value, the largest value, the median, and the medians of the two halves of the data.  These two other medians are called the quartiles, because they split the data set up into quarters.  The box plot is a visual picture of the 5-number summary.  The calculator has a selection in the STAT PLOT menu for this (the 5th icon).  However, I recommend using the modified box plot (the 4th icon), as it has a built-in outlier detector.  This outlier detection routine is not foolproof; we still need good judgment.  But it at least gives us more than just our opinion.
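Here is a sketch of the 5-number summary as described above (quartiles as medians of the two halves, leaving the middle value out of both halves when n is odd), together with the 1.5 x IQR rule that modified box plots use to flag outliers, where the IQR is Q3 - Q1.  Other software computes quartiles in slightly different ways, so small discrepancies with a calculator are possible.  The function names are hypothetical; the data are the Packers' points scored and allowed from Day 4.

```python
from statistics import median

def five_number_summary(data):
    """(min, Q1, median, Q3, max), with Q1 and Q3 the medians of the lower and
    upper halves. When n is odd, the middle value is left out of both halves."""
    xs = sorted(data)
    n = len(xs)
    lower, upper = xs[: n // 2], xs[(n + 1) // 2 :]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

def flagged_outliers(data, k=1.5):
    """Values more than k * IQR below Q1 or above Q3 (the 1.5 x IQR rule)."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    return [x for x in sorted(data) if x < q1 - k * iqr or x > q3 + k * iqr]

# Packers 2005 points scored and allowed, in week order, from the Day 4 table.
packers_final = [3, 24, 16, 29, 52, 20, 14, 10, 33, 17, 14, 7, 16, 3, 17, 23]
packers_opp = [17, 26, 17, 32, 3, 23, 21, 20, 25, 20, 19, 19, 13, 48, 24, 17]

print(five_number_summary(packers_final))   # (3, 12.0, 16.5, 23.5, 52)
print(flagged_outliers(packers_final))      # [52]
print(flagged_outliers(packers_opp))        # [3, 48]
```

The 52-point game is flagged among points scored, and both the 3-point and the 48-point games are flagged among points allowed.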

Goals:     Be able to make and interpret a cumulative progress graph.  Be able to calculate and interpret numerical summaries.  Be able to make and interpret a box plot.

Skills:

                        Know the basics of a cumulative progress graph.  Quite simply, record the result over time: up indicates success, down indicates failure.  If the result is continuous (as in running or bowling), then it will be appropriate to modify the slope (see the next item).

                        Know the two ways a cumulative progress graph can be drawn.  When comparing several subjects (like teams' season records) and the response is yes/no, win/loss, etc., it may make more sense to simply plot the graph without adjustment, to allow a comparison.  Up indicates a success, down indicates a failure, and the right endpoint will not be at zero except by coincidence.  When an adjustment is made, we require the right endpoint to be at zero, and the amount for each success and failure is adjusted accordingly.  Personally, I think this is best done with a computer program.  In effect, you are subtracting the list's average from each element, so that the adjusted steps add up to zero.  For the yes/no type answers, use the average .5 in the PROGRESS program.

                        Recognize the features easily seen in a cumulative progress graph.  The most visible feature of a cumulative progress graph is that parallel lines denote periods of equivalent performance.  For example, if the graph has the same slope over one period of time as over another, then the performance (batting average, running pace, or whatever is being measured) is the same for both periods.

                        Use the TI-83 to calculate summary statistics.  Calculating may be as simple as entering numbers into your calculator and pressing a button.  Or, if you are doing some things by hand, you may have to organize the information correctly, such as listing the numbers from low to high.  On the TI-83, the numerical measures are found in the 1-Var Stats function in the STAT CALC menu.  Please get used to using the statistical features of your calculator to produce the mean.  I know you can calculate the mean by simply adding up all the numbers and dividing by the sample size, but then you will not be in the habit of using the full features of your machine, and later on you will be missing out.

                        Compare several lists of numbers using box plots.  For two lists, the best simple approach is the back-to-back stem plot.  For more than two lists, I suggest trying box plots, side-by-side, or stacked.  At a glance, then, you can assess which lists have typically larger values or more spread out values, etc.

                        Understand box plots.  You should know that the box plots for some lists don't tell the interesting part of those lists.  For example, box plots do not describe shape very well; you can only see where the quartiles are.  At the same time, you should know that the box plot can be a very good first quick look.

                        Understand the effect of outliers on the mean.  The mean (or average) is unduly influenced by outlying (unusual) observations.  Therefore, knowing when your distribution is skewed or symmetric is helpful.

                        Understand the effect of outliers on the median.  The median is almost completely unaffected by outliers.  For technical reasons, though, the median is not as common in scientific applications as the mean.
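A quick way to see the last two skills in action is to drop one unusual game and watch what happens.  Using the Packers' points allowed from the Day 4 table, removing the 48-point game (the 48-3 loss) pulls the mean down by almost two points while the median does not move.  The variable names are only for illustration.

```python
from statistics import mean, median

# Packers 2005 points allowed, in week order, from the Day 4 table.
packers_opp = [17, 26, 17, 32, 3, 23, 21, 20, 25, 20, 19, 19, 13, 48, 24, 17]
without_48 = [x for x in packers_opp if x != 48]   # drop the 48-3 loss

print(mean(packers_opp), median(packers_opp))            # 21.5 20.0
print(round(mean(without_48), 2), median(without_48))    # 19.73 20
```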

Day 4

Activity: Basketball and football scores comparisons.  Do teams that score many points also give up many points?  Can final score be predicted from half time score?  Using the data below, make scatter plots of team score versus opponent score and half time score versus final score.  For each scatter plot, include a correlation coefficient.

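2005 Green Bay Packers

(Opponent is the opponent's final score; Half, Final, and 2nd are the Packers' points at halftime, at the end of the game, and in the second half.)

Week  Opponent  Half  Final  2nd
   1        17     3      3    0
   2        26     7     24   17
   3        17    13     16    3
   4        32     7     29   22
   5         3    35     52   17
   7        23    17     20    3
   8        21     7     14    7
   9        20     3     10    7
  10        25    17     33   16
  11        20    14     17    3
  12        19    14     14    0
  13        19     7      7    0
  14        13    10     16    6
  15        48     3      3    0
  16        24     7     17   10
  17        17    13     23   10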
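For the correlation coefficients requested in the activity, r can be computed as the sum of the products of z-scores divided by n - 1.  This hand-rolled correlation helper is a hypothetical sketch for checking calculator answers; on the TI-83, r is reported with the LinReg output once DiagnosticOn has been turned on from the CATALOG.  The lists are the Packers columns from the table above, in week order.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Pearson correlation r: sum of products of z-scores, divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)   # sample SDs, matching the n - 1 divisor
    return sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / (len(xs) - 1)

# Packers 2005 columns from the table above, in week order.
packers_opp = [17, 26, 17, 32, 3, 23, 21, 20, 25, 20, 19, 19, 13, 48, 24, 17]
packers_half = [3, 7, 13, 7, 35, 17, 7, 3, 17, 14, 14, 7, 10, 3, 7, 13]
packers_final = [3, 24, 16, 29, 52, 20, 14, 10, 33, 17, 14, 7, 16, 3, 17, 23]

print(round(correlation(packers_half, packers_final), 3))   # half time vs. final
print(round(correlation(packers_opp, packers_final), 3))    # allowed vs. scored
```

Keep the scatter plot in view when reading r: with only 16 games, a single blowout such as the 52-3 win can carry much of the relationship.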


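Nov 2005 Milwaukee Bucks

(Opponent is the opponent's final score; Half, Final, and 2nd are the Bucks' points at halftime, at the end of the game, and in the second half.)

Game  Opponent  Half  Final  2nd
   1       102    50    102   52
   2        96    46    110   64
   3       100    49    105   56
   4       110    53    103   50
   5       102    40    103   63
   6       109    46     85   39
   7        87    48     90   42
   8       103    44     82   38
   9       100    39     80   41
  10        97    51    108   57
  11        99    44     91   47
  12        85    35     76   41
  13       100    55    100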