Day By Day Notes for PBIS 189

Fall 2007

Day 1

Activity:     Go over syllabus.  Take roll.  Overview example: Kristin Gilbert trial.

Goals:         Review course objectives: collect data, summarize information, and make inferences.

I have divided this course into four "units".  Unit 1 (Days 2 through 10) is about summarizing one-dimensional data, that is a single list of data.  Unit 2 (Days 11 through 19) is about summarizing two-dimensional data, that is data that come in two lists.  Unit 3 (Days 20 through 30) is about sampling.  Unit 4 (Days 31 through 42) is about statistical inference.  Throughout the course, we will also focus on the abuses people often make of statistics and statistical methods.  Sometimes these abuses are unintentional.  My hope is that at the end of the course, you will have an appreciation for the "tricks" that people use to lie to you using numbers.

I use the Gilbert trial example because it demonstrates all three of the course's main ideas in action.  The charts we see are examples of how to organize and summarize information that may be complicated by several variables or dimensions (Units 1 and 2).  The argument about whether we would see results such as this one if there really were no relationship between the variables is an example of the probability we will study in Unit 3.  And the trial strategy itself is what statistical inference is all about (Unit 4).  Many of you will encounter inference when you read professional journals in your field and the researchers use statistics to support their conclusions of improved treatments or to estimate a particular proportion or average.

Most if not all of you are taking this course as a general education requirement or a requirement for your major.  There is a reason you have been asked to take such a course: often in our modern world we encounter numerical or logical arguments.  Without proper skills to deal with these arguments, you will often be fooled by "data pushers", those with an agenda and who use slick methods to get you to believe their position.  You should be a skeptic when people use numbers in their persuasive arguments.  Too often, and sometimes unknowingly, such arguments are misleading.  The University believes that a successful college graduate has the ability to reason quantitatively.  Our hope is that your efforts in this course will prepare you appropriately.

I believe that to be successful in this course, you must actually read the text (and these notes) carefully, and work problems.  The most important thing is to engage yourself in the material.  However, our class activities will often be unrelated to the homework you practice and/or turn in for the homework portion of your grade; instead they will aim at understanding the underlying principles.  For example, on Day 21 we will simulate Simple Random Sampling by taking 60 samples from each group in the class.  This is something you would never do in practice, but which I think will demonstrate several lessons for us.  In these notes, I will try to point out to you when we're doing something to gain understanding, and when we're doing something to gain skills.

Each semester, I am disappointed with the small number of students who come to me for help outside of class.  I suspect some of you are embarrassed to seek help, or you may feel I will think less of you for not "getting it" on your own.  Personally, I think that if you are struggling and cannot make sense of what we are doing, and don't seek help, you are cheating yourself out of your own education.  I am here to help you learn statistics.  Please ask questions when you have them; there is no such thing as a stupid question.  Often other students have the same questions but are also too shy to ask them in class.  If you are still reluctant to ask questions in class, come to my office hours or make an appointment.  Incidentally, when I first took statistics, I didn't understand it all on my own either, and I too didn't go to the instructor for help.  I also didn't get as high a grade as I could have!

I believe you get out of something what you put into it.  Very rarely will someone fail a class by attending every day, doing all the assignments, and working many practice problems; typically people fail by not applying themselves enough - either through missing classes, or by not allocating enough time for the material.  Obviously I cannot tell you how much time to spend each week on this class; you must all find the right balance for you and your life's priorities.  One last piece of advice: don't procrastinate.  I believe statistics is learned best by daily exposure.  Cramming for exams may get you a passing grade, but you are only cheating yourself out of understanding and learning.

Reading:    (The reading mentioned in these notes refers to what reading you should do for the next day's material.)

To The Student, pages xxiii-xxix.  Bring to class on Day 2 a list of numbers.  (The source is not important, but knowing where the numbers came from and what they mean will help when you try to explain them.  Sports data is often a convenient source, but of course data come from every field.  Be creative.)

Day 2

Activity:     Discussion of variables and graphs.

Units 1 and 2 are about summarization.  We will work on several types of summary:  graphical summaries of a list of numbers, graphical summaries of a list of characteristics, numerical summaries of a list of numbers, and the summaries for two variables.  Sometimes we will summarize situations with one or two numbers, called summary statistics.  Other times, we need more than that, such as a picture, or many summary statistics.  One of your tasks in this course is to gain enough experience to know what is enough summarization for a particular situation.

The first summaries we will work on are the "one-variable graphical techniques".  If you are working with a list of characteristics (attribute data or categorical data), you will need to use a bar graph, or a frequency table, or a pie graph.  If you are working with a list of numbers, then you have many more options: frequency tables, histograms, quantile plots, stem plots, box plots (later), just to name a few.  Your goal in Units 1 and 2 is to become adept at producing these displays, either by hand or with the calculator, and to interpret the main features from either your own creations or someone else's work.

In these notes, I will put the daily task in gray background.

Your task today is, from your list of numbers, to communicate the important information to the person next to you.  (Work in pairs or groups.)  Specifically, make and interpret a frequency table and a histogram.  In your description to your neighbor, keep in mind these terms: symmetry, skew, center, spread, mode, outlier.  (These terms are in the text.)  On your calculator, make sure that you try different window settings for your histogram.  Remember, you are trying to describe your data to your neighbor without having to tell them every single value; you are trying to point out the main features that you have identified.  (For example, deciding whether some list is "skewed" or "symmetric" will often be a matter of opinion; you need to look at many lists before you can make good judgment calls.)

For our next class day, Day 3, print out the Arizona Temps data set and bring your copy to class.  (You can either find this data set online with the link or at the end of this document.)

In these notes, I will put sections of computer commands in boxes, like this one.

Frequently we will use the TI-83 to make our work easier.  At first, you may think of the machines as burdens, confusing and intimidating.  I believe that if you stick with it and become experienced with the tools, you will discover that the calculators are indispensable for calculating statistics.  I will show you what I can about the TI-83's, but it is up to you to practice using them.  I cannot do that part for you, of course.  As always, ASK questions as you have them!

Generally, we have three chores to perform before our machine will show us the graphical display we want.  We must: 1) Enter the data into the calculator.  2) Choose the right options for the display we want.  And 3) Set up the proper window settings.  The commands to do these activities on the calculator are:

1) STAT EDIT  Use one of the lists to enter data, L1 for example; the other L's can be used too.  The L's are convenient work lists.  At times, you may find that you want more meaningful names.  One way to do this is to store the list in a new named list after entering numbers.  The syntax for this is L1 -> NEWL, assuming the data was entered in L1 and you want the new name to be NEWL.

2) 2nd STATPLOT 1 On  Use this screen to designate the plot settings.  You can have up to three plots on the screen at once.  For histograms, we will only use one at a time.  Later, when we see box plots, we will make multiple displays.

3) ZOOM 9  This command centers the window "around" your data.  It is always a good idea to see what the WINDOW settings are.  If you then change any of the WINDOW settings, you will then press GRAPH to see the changes.  (If you use ZOOM 9 again, the changes you just made don't get used!)

Goals:  (In these notes, I will summarize each day's activity with a statement of goals for the day.)

Begin graphical summaries (describing data with pictures).  Be able to use the calculator to make a histogram.

Skills:  (In these notes, each day I will identify skills I believe you should have after working the day's activity, reading the appropriate sections of the text, and practicing exercises in the text.)

•  Identify types of variables.  To choose the proper graphical displays, it is important to be able to differentiate between Categorical (or Qualitative) and Quantitative (or Numerical) variables.  Of course, it is also necessary to know exactly what a variable is.  Quite simply, a variable is an attribute about a collection of subjects.  As an example, for the students in this class, we could measure your height (a quantitative/numerical variable) or your major (a qualitative/categorical variable).  The important thing is that each subject in the group has a value for the variable.

•  Be familiar with types of graphs.  To graph categorical variables we use bar graphs or pie graphs.  To graph numerical variables, we use histograms, stem plots, or QUANTILE plots (using a TI-83 program we will explore on Day 3).  In practice, most of our variables will be numerical but it is still important to know the categorical displays.

•  Summarize data into a frequency table and produce a histogram.  A frequency table is a list of class intervals (sometimes called bins or classes) and the number of data values in the class intervals.  You can make a frequency table yourself by just counting how many data values are in each interval, but the easiest way is to first use the TI-83 to make a histogram and then to TRACE over the boxes and record the classes and counts.  In fact, the histogram is just a "picture" of the frequency table.

•  Know how to modify the TI-83 default histogram.  You can control the size and number of the classes with Xscl and Xmin in the WINDOW menu.  The decision as to how many classes to create is arbitrary; there isn't a "right" answer, or rather all choices of Xscl and Xmin are "right" answers.  You should experiment every time you use the TI-83 to make a histogram by changing the interval width Xscl and starting point Xmin to see what happens to the display.  I will try to do this in class every time I draw a histogram, so that you get used to it.

•  Know how to create and interpret graphs for categorical variables.  The two main graphs for categorical variables are pie graphs and bar charts.  Pie graphs are difficult to make by hand, but are popular on computer programs like Excel.  Bar charts are also common on spreadsheets.  Data represented by pie graphs and bar charts usually are expressed as percents of the whole; thus they add to 100%.  The ordering for categories is arbitrary; therefore concepts such as skew and center make no sense.
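If you want to see the frequency table idea away from the calculator, here is a minimal sketch in Python.  It is not what the TI-83 does internally; the function name frequency_table and the list of temperatures are made up for illustration, and I am assuming that a value sitting exactly on a class boundary is counted in the class to its right.  The parameters xmin and xscl play the roles of Xmin and Xscl from the WINDOW menu.

```python
import math

def frequency_table(data, xmin, xscl):
    """Count how many values fall in each class interval [left, left + xscl)."""
    counts = {}
    for x in data:
        k = math.floor((x - xmin) / xscl)   # index of the class containing x
        counts[k] = counts.get(k, 0) + 1
    table = []
    # Walk every class from the lowest to the highest, keeping empty ones.
    for k in range(min(counts), max(counts) + 1):
        left = xmin + k * xscl
        table.append((left, left + xscl, counts.get(k, 0)))
    return table

# Made-up values for illustration (not the Arizona Temps data set).
temps = [61, 64, 67, 70, 72, 73, 75, 75, 78, 81, 84, 95]

# Same data, three different choices of starting point and class width.
for xmin, xscl in [(60, 10), (60, 5), (55, 10)]:
    print(f"Xmin={xmin}, Xscl={xscl}")
    for left, right, count in frequency_table(temps, xmin, xscl):
        print(f"  {left:>3} to under {right:<3} {'*' * count} ({count})")
```

Notice that the rows of stars are a sideways histogram, and that the three choices of Xmin and Xscl give three different looking pictures of the same twelve numbers.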

Reading:    Chapter 1.  (Skip Time Plots.)

Day 3

Activity:     Creating and interpreting stem plots, histograms, and quantile plots.

Use the Arizona Temps data set to practice creating the histograms, stem plots, and quantile plots for several lists.  Compare and interpret the graphs.  Identify shape, center, and spread.  Today's new methods, the stem plot and the quantile plot, are alternate ways to summarize a list of numbers.

Answer these questions about the Arizona Temps:

1)     Do any of the lists have outliers?  (There is not a firm definition of exactly what an outlier is.  You will have to develop some personal judgment; the best way is to look at many lists and displays.)

2)    What information does the stem plot show that the histogram hides?

3)    What information does the quantile plot show that the stem plot and histogram hide?

4)    Which display is most "trustworthy"?  That is, which one has the smallest likelihood of misleading you?

The stem plot is a hand technique, most useful for small (under 40 values) data sets.  It is basically a quick way to make a frequency chart, but always using class intervals based on the base 10 system.  This means the intervals will always be ten "units" wide, such as 10 to 19, or 0 to 9, or .00 to .09.  The "unit" chosen is called the stem, and the next digit after the stem is called the leaf.
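Although the stem plot is meant to be made by hand, a short Python sketch can make the stem and leaf bookkeeping concrete.  This is only an illustration under some assumptions: the function name stem_plot and the data values are made up, the data must be non-negative whole numbers, and digits to the right of the leaf are simply dropped (truncated), not rounded.

```python
def stem_plot(data, leaf_unit=1):
    """Return the rows of a stem plot for non-negative whole numbers.

    leaf_unit is the place value of the leaf digit (1 = ones, 10 = tens).
    """
    rows = {}
    for x in sorted(data):
        scaled = x // leaf_unit            # drop any digits beyond the leaf
        stem, leaf = divmod(scaled, 10)    # last digit is the leaf
        rows.setdefault(stem, []).append(leaf)
    lines = []
    # Include empty stems so gaps in the data stay visible.
    for stem in range(min(rows), max(rows) + 1):
        leaves = "".join(str(d) for d in rows.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")
    return lines

# Made-up values for illustration (not the Arizona Temps data set).
temps = [61, 64, 67, 70, 72, 73, 75, 75, 78, 81, 84, 95]

for line in stem_plot(temps):
    print(line)
```

Each row is one class interval ten units wide, so the plot is a frequency table that still shows the individual digits.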

QUANTILE is a program I wrote that plots the sorted data in a list and "stacks" the values up.  This is known as a quantile plot.  Basically we are graphing the individual data value versus the rank, or percentile, in the data set.  Quantile plots always go up from left to right.  The command is PRGM EXEC QUANTILE ENTER.  The program will ask you for the list where you've stored the data.  A and B are temporary lists used by the program, so if you have data in these lists already, store them in another list before executing.
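To show what "data value versus rank, or percentile" means, here is a minimal Python sketch.  It is not the QUANTILE program itself; the function name quantile_points and the data are made up, and I am assuming the simplest convention, percentile = 100 × rank / n.

```python
def quantile_points(data):
    """Pair each sorted value with its rank and its percentile (100 * rank / n)."""
    ordered = sorted(data)
    n = len(ordered)
    return [(rank, 100 * rank / n, value)
            for rank, value in enumerate(ordered, start=1)]

# Made-up values for illustration (not the Arizona Temps data set).
temps = [75, 61, 95, 70, 72, 73, 64, 75, 78, 81, 84, 67]

for rank, pct, value in quantile_points(temps):
    print(f"rank {rank:>2}  percentile {pct:5.1f}  value {value}")
```

Because the values are sorted before they are ranked, the plotted points can never go down as you move to the right.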

Goals:         Be able to make and interpret a quantile plot, using the TI-83 program QUANTILE.  Be able to make and interpret a stem plot by hand.

Skills:

•  Use the TI-83 to create an appropriate histogram or quantile plot.  STAT PLOT and QUANTILE are our two main tools for viewing distributions of data on the TI-83.  Histograms are common displays, but have flaws; the choice of class intervals is troubling as it is not unique.  The quantile plot is more reliable, but less common.  For interpretation purposes, remember that in a histogram tall boxes represent places with lots of data, while in a quantile plot those same high-density data places are represented by steepness.

•  Create a stem plot by hand.  The stem plot is a convenient manual display; it is most useful for small data sets, but not all data sets make good stem plots.  Choosing the "stem" and "leaf" to make reasonable displays will require some practice.  Some ideas for a proper choice of stems: if you have many empty rows, you have too many stems.  Move one column to the left and try again.  If you have too few rows (all the data is on just one or two stems) you have too few stems.  Move to the right one digit and try again.  Some data sets will not give good pictures for any choice of stem, while some benefit from splitting or rounding (see the example on page 21).

•  Describe shape, center, and spread.  From each of your graphs, you should be able to make general statements about the shape, center, and spread of the distribution of the variable being explored.  One of the main conclusions we want to make about lists of data when we are doing inference (Chapters 14 to 22) is whether the data is close to symmetric; many times "close enough" is, well, close enough!  We will discuss this in more detail when we see the Central Limit Theorem in Chapter 11.

Reading:    Chapter 2.

Day 4

Activity:     Dance Fever example.  Using the TI-83 to calculate summary statistics and to make box plot displays.

In addition to graphical displays, we often want to summarize a list of numbers with numerical measures.  You are already familiar with the most famous of these, the average or mean.  Less familiar, but just as important statistically, is the standard deviation, which measures how much the data are spread out.

Use the Arizona Temps data set to calculate the mean, the standard deviation, the 5-number summary, and the associated box plot for any of the variables.

Compare the box plots and numerical measures with the corresponding histograms and quantile plots you made on Day 3.  Note the similarities (where the data values are dense, and where they are sparse) but especially note the differences.  The box plots and numerical measures cannot describe shape very well.  On the other hand, histograms are messy to use to compare two lists.  The stem plot is tedious to modify.

Answer these questions about the Arizona Temps:

1)     Are high and low temperatures distributed the same way, other than the obvious fact that highs are higher than lows?  (When we talk of a distribution we mean a description of how the data values are centered and how they are spread out.  Sometimes this will be a simple statement; other times we need to see a graph.)

2)    How does a single case influence the calculator's answers?  (What if there was an outlier in the list?  If you didn't have an outlier, change one of the values to a ridiculously large value and recalculate the graphs and measures.  Notice the effects of these extreme values.)

3)    What information does the box plot disguise?
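For question 2, the experiment of changing one value to something ridiculous can also be sketched in Python.  The numbers are made up for illustration; the only tools used are the mean, median, and stdev functions from Python's standard statistics module.

```python
import statistics

# Made-up values for illustration (not the Arizona Temps data set).
temps = [61, 64, 67, 70, 72, 73, 75, 75, 78, 81, 84, 95]

# Pretend the last value was mistyped as 950 instead of 95.
with_outlier = temps[:-1] + [950]

for label, values in [("original", temps), ("with outlier", with_outlier)]:
    print(f"{label:>12}: mean = {statistics.mean(values):7.2f}  "
          f"median = {statistics.median(values):5.1f}  "
          f"std dev = {statistics.stdev(values):7.2f}")
```

Compare the two lines of output: one changed value moves the mean and the standard deviation a great deal, while the median does not move at all.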

To calculate our summary statistics, we will use 1-Var Stats (to use List 1) or 1-Var Stats L2 for List 2, for example.  There are two screens of output; we will be mostly concerned with the mean x̄ (pronounced "x bar"), the standard deviation Sx, and the five-number summary from screen two.
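If you would like to check the calculator's answers by another route, here is a minimal Python sketch of the same summary statistics.  The function name five_number_summary and the data are made up for illustration.  I am assuming the quartiles are found as the medians of the lower and upper halves of the sorted list, leaving out the middle value when the list has an odd number of entries; other software may use a slightly different rule and give slightly different quartiles.  The standard deviation here divides by n - 1, as Sx does.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using medians of the two halves."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]             # values below the middle position
    upper = ordered[half + n % 2:]     # values above the middle position
    return (ordered[0],
            statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper),
            ordered[-1])

# Made-up values for illustration (not the Arizona Temps data set).
temps = [61, 64, 67, 70, 72, 73, 75, 75, 78, 81, 84, 95]

print("mean      =", round(statistics.mean(temps), 2))
print("std dev   =", round(statistics.stdev(temps), 2))   # divides by n - 1
print("5-number  =", five_number_summary(temps))
```

The five numbers are exactly what the box plot draws: the ends of the whiskers, the edges of the box, and the line inside the box.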