Math 385/585 Applied Regression Analysis

Fall 2015

Section 001 1:50 to 2:50, M W F

Instructor: Dr. Chris Edwards           Phone: 948-3969        Office: Swart 123

Classroom: Swart 203

Text: Applied Linear Statistical Models, 5th edition, by Kutner, Nachtsheim, Neter, and Li. Earlier editions of the text will likely be adequate, but you will have to allow for different page numbers and homework problem numbers.

Catalog Description: A practical introduction to regression emphasizing applications rather than theory. Simple and multiple regression analysis, basic components of experimental design, and elementary model building. Both conventional and computer techniques will be used in performing the analyses. Prerequisite: Math 201 or Math 301 and Math 256 each with a grade of C or better.

Course Objectives: The goal of statistics is to gain understanding from data. This course focuses on critical thinking and active learning involving statistical regression. Students will be engaged in statistical problem solving and will develop intuition concerning data analysis, including the use of appropriate technology. Specifically, students will develop

• an awareness of the nature and value of regression

• a sound, critical approach to interpreting statistics, including possible misuses

• facility with statistical calculations and evaluations, using appropriate technology

• effective written and oral communication skills

Grading: Final grades are based on these 300 points:

 

            Topic                      Points    Tentative Date   Chapters
Exam 1      Simple Linear Regression   70 pts.   October 9        1 to 4
Exam 2      Multiple Regression I      70 pts.   November 16      5 to 8
Exam 3      Multiple Regression II     70 pts.   December 18      9 to 11, 13 and 14
Homework    15 Points Each             90 pts.

 

 

Final grades are assigned as follows:

270 pts.           A (90 %)

260 pts.           A- (87 %)

250 pts.           B+ (83 %)

240 pts.           B (80 %)

230 pts.           B- (77 %)

220 pts.           C+ (73 %)

210 pts.           C (70 %)

200 pts.           C- (67 %)

190 pts.           D+ (63 %)

180 pts.           D (60 %)

179 pts. or less            F
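
The scale above amounts to a simple lookup from a point total to a letter grade. The sketch below illustrates it in Python, assuming each listed value is the minimum total needed for that grade and that no rounding is applied; the names `GRADE_CUTOFFS` and `letter_grade` and the sample scores are hypothetical, chosen only for illustration.

```python
# Minimum point totals (out of 300) for each letter grade, copied from the scale above.
# Assumption: a total must meet or exceed the cutoff; no rounding is applied.
GRADE_CUTOFFS = [
    (270, "A"), (260, "A-"), (250, "B+"), (240, "B"), (230, "B-"),
    (220, "C+"), (210, "C"), (200, "C-"), (190, "D+"), (180, "D"),
]


def letter_grade(points):
    """Return the letter grade for a point total out of 300."""
    for cutoff, grade in GRADE_CUTOFFS:
        if points >= cutoff:
            return grade
    return "F"  # 179 points or fewer


# Hypothetical student: three exam scores (out of 70 each) plus homework (out of 90).
total = 62 + 58 + 65 + 80
print(total, letter_grade(total))
```

Listing the cutoffs from highest to lowest lets the first match determine the grade.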

Homework: I will collect (around) 5 homework problems approximately once every other week. The due dates are listed on the course outline below. I suggest that you work together in small groups on the homework if you like, but don't forget that I am a resource for you to use. Often we will use computer software to perform our analyses; include printouts where appropriate, but please make your papers readable. In other words, I don't want 25 pages of printout handed in if you can summarize it in two.

Office Hours: Office hours are times when I will be in my office to help you. There are many other times when I am in my office. If I am in and not busy, I will be happy to help. My office hours for the Fall 2015 semester are 3:00 to 4:00 Monday and 9:10 to 10:10 Tuesday, or by appointment.

Philosophy:  I strongly believe that you, the student, are the only person who can make yourself learn. Therefore, whenever it is appropriate, I expect you to discover the mathematics we will be exploring. I do not feel that lecturing to you will teach you how to do mathematics. I hope to be your guide while we learn some mathematics, but you will need to do the learning. I expect each of you to come to class prepared to digest the day's material. That means you will benefit most by having read each section of the text and the Day By Day notes before class.

My personal belief is that one learns best by doing. I believe that you must be truly engaged in the learning process to learn well. Therefore, I do not think that my role as your teacher is to tell you the answers to the problems we will encounter; rather I believe I should point you in a direction that will allow you to see the solutions yourselves. To accomplish that goal, I will find different interactive activities for us to work on. Your job is to use me, your text, your friends, and any other resources to become adept at the material. The Day By Day notes also include Skills that I expect you to attain.

Math 585 Expectations: Expectations for graduate students are understandably more rigorous than for undergraduate students. Students taking Math 585 will have an extra theoretical problem added to each homework, to be assigned during the semester. In addition, a final project worth 50 points will be due at the end of the semester. This project will involve a complete analysis of a data set, including model estimation, development, and validation.


 

 

Date                Day   Topic                                          Sections        Due
Mon, September 7          No Class
Wed, September 9    1     Introduction, Least Squares
Fri, September 11   2     Models                                         1.1 to 1.5
Mon, September 14   3     Estimation                                     1.6 to 1.8
Wed, September 16   4     Inference                                      2.1 to 2.3
Fri, September 18   5     Interval Estimates                             2.4 to 2.6
Mon, September 21   6     ANOVA                                          2.7             Homework 1
Wed, September 23   7     GLM                                            2.8
Fri, September 25   8     Residuals I                                    3.1 to 3.6
Mon, September 28   9     Residuals II                                   3.1 to 3.6
Wed, September 30   10    Lack of Fit                                    3.7
Fri, October 2      11    Transformations                                3.8 to 3.9
Mon, October 5      12    Simultaneous Inference                         4.1 to 4.3      Homework 2
Wed, October 7      13    Review
Fri, October 9      14    Exam 1
Mon, October 12     15    Intro to Matrices                              5.1 to 5.7
Wed, October 14     16    Regression Matrices                            5.8 to 5.13
Fri, October 16     17    Mult. Reg. Models                              6.1 to 6.2
Mon, October 19     18    Inference                                      6.3 to 6.6
Wed, October 21     19    Intervals                                      6.7
Fri, October 23     20    Diagnostics                                    6.8
Mon, October 26     21    Extra SS                                       7.1             Homework 3
Wed, October 28     22    GLM Tests                                      7.2 to 7.3
Fri, October 30     23    Computational Problems and Multicollinearity   7.5 to 7.6
Mon, November 2     24    Polynomial Models                              8.1
Wed, November 4     25    Interactions I                                 8.1
Fri, November 6     26    Interactions II                                8.2
Mon, November 9     27    Dummy Variables I                              8.3 to 8.7
Wed, November 11    28    Dummy Variables II                             8.3 to 8.7
Fri, November 13    29    Review                                                         Homework 4
Mon, November 16    30    Exam 2
Wed, November 18    31    Model Building                                 9.1 to 9.3
Fri, November 20    32    Best Subsets                                   9.4 to 9.6
Mon, November 23    33    Diagnostics                                    10.1 to 10.2
Wed, November 25          No Class
Fri, November 27          No Class
Mon, November 30    34    X Outliers                                     10.3
Wed, December 2     35    Y Outliers                                     10.4            Homework 5
Fri, December 4     36    Trees                                          11.4
Mon, December 7     37    Non-Linear Regression I                        13.1 to 13.2
Wed, December 9     38    Non-Linear Regression II                       13.3 to 13.4
Fri, December 11    39    Logistic Regression                            14.2 to 14.3
Mon, December 14    40    Logistic Inference                             14.5            Homework 6
Wed, December 16    41    Review
Fri, December 18    42    Exam 3

 

Homework Assignments:  (subject to change if we discover difficulties as we go)

Homework 1   Due September 21, 2015

 

1.19, p. 35

Grade Point Average. The director of admissions of a small college selected 120 students at random from the new freshman class in a study to determine whether a student's grade point average (GPA) at the end of the freshman year ($Y$) can be predicted from the ACT test score ($X$). The results of the study follow. Assume that first-order regression model (1.1) is appropriate.

$i$:      1       2       3       ...   118     119     120
$X_i$:    21      14      28      ...   28      16      28
$Y_i$:    3.897   3.885   3.778   ...   3.914   1.860   2.948

a.)   Obtain the least squares estimates of $\beta_0$ and $\beta_1$, and state the estimated regression function.

b.)  Plot the estimated regression function and the data. Does the estimated regression function appear to fit the data well?

c.)   Obtain a point estimate of the mean freshman GPA for students with ACT test score .

d.)  What is the point estimate of the change in the mean response when the entrance test score increases by one point?

1.23, p. 36

Refer to Grade Point Average Problem 1.19.

a.)   Obtain the residuals $e_i$. Do they sum to zero in accord with (1.17)?

b.)  Estimate $\sigma^2$ and $\sigma$. In what units is $\sigma$ expressed?

1.33, p. 37

Refer to the regression model $Y_i = \beta_0 + \varepsilon_i$ in Exercise 1.30. Derive the least squares estimator of $\beta_0$ for this model.

2.4, p. 90

Refer to Grade Point Average Problem 1.19.

a.)   Obtain a 99 percent confidence interval for $\beta_1$. Interpret your confidence interval. Does it include zero? Why might the director of admissions be interested in whether the confidence interval includes zero?

b.)  Test, using the test statistic $t^*$, whether or not a linear association exists between student's ACT score ($X$) and GPA at the end of the freshman year ($Y$). Use a level of significance of 0.01. State the alternatives, decision rule, and conclusion.

c.)   What is the P-value of our test in part (b)? How does it support the conclusion reached in part (b)?

2.55, p. 97

Derive the expression for SSR in (2.51):

$$SSR = b_1^2 \sum (X_i - \bar{X})^2$$

Homework 2   Due October 5, 2015

 

2.23, p. 93

Refer to Grade Point Average Problem 1.19.

a.)   Set up the ANOVA table.

b.)  What is estimated by MSR in your ANOVA table? By MSE? Under what condition do MSR and MSE estimate the same quantity?

c.)   Conduct an $F$ test of whether or not $\beta_1 = 0$. Control the $\alpha$ risk at 0.01. State the alternatives, decision rule, and conclusion.

d.)  What is the absolute magnitude of the reduction in the variation of $Y$ when $X$ is introduced into the regression model? What is the relative reduction? What is the name of the latter measure?

e.)   Obtain $r$ and attach the appropriate sign.

f.)    Which measure, $R^2$ or $r$, has the more clear-cut operational interpretation? Explain.

2.67, p. 99

Refer to Grade Point Average Problem 1.19.

a.)   Plot the data, with the least squares regression line for ACT scores between 20 and 30 superimposed.

b.)  On the plot from part (a), superimpose a plot of the 95 percent confidence band for the true regression line for ACT scores between 20 and 30. Does the confidence band suggest that the true regression relation has been precisely estimated? Discuss.

3.3, p. 146-147

Refer to Grade Point Average Problem 1.19.

a.)   Prepare a box plot for the ACT scores $X_i$. Are there any noteworthy features in this plot?

b.)  Prepare a dot plot of the residuals. What information does this plot provide?

c.)   Plot the residuals $e_i$ against the fitted values $\hat{Y}_i$. What departures from regression model (2.1) can be studied from this plot? What are your findings?

d.)  Prepare a normal probability plot of the residuals. Also obtain the coefficient of correlation between the ordered residuals and their expected values under normality. Test the reasonableness of the normality assumption here using Table B.6 and . What do you conclude?

e.)   Conduct the Brown-Forsythe test to determine whether or not the error variance varies with the level of $X$. Divide the data into the two groups,  and , and use  State the decision rule and conclusion. Does your conclusion support your preliminary findings in part (c)?

f.)    Information is given below for each student on two variables not included in the model, namely, intelligence test score .

3.21, p. 151

Derive the result in (3.29):

SSE = SSPE + SSLF

Homework 3   Due October 26, 2015

 

3.17, p. 150-151

Sales growth. A marketing researcher studied annual sales of a product that had been introduced 10 years ago. The data are as follows, where $X$ is the year (coded) and $Y$ is sales in thousands of units:

$i$:      1     2     3     4     5     6     7     8     9     10
$X_i$:    0     1     2     3     4     5     6     7     8     9
$Y_i$:    98    135   162   178   221   232   283   300   374   395

a.)   Prepare a scatter plot of the data. Does a linear relation appear adequate here?

b.)  Use the Box-Cox procedure and standardization (3.36) to find an appropriate power transformation of $Y$. Evaluate SSE for . What transformation of $Y$ is suggested?

c.)   Use the transformation  and obtain the estimated linear regression function for the transformed data.

d.)  Plot the estimated regression line and the transformed data. Does the regression line appear to be a good fit to the transformed data?

e.)   Obtain the residuals and plot them against the fitted values. Also prepare a normal probability plot. What do your plots show?

f.)    Express the estimated regression function in the original units.
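
Because the sales growth data are listed in full, the starting point for this problem can be checked with a few lines of code. The sketch below fits the untransformed simple linear regression using the closed-form least squares formulas, taking the second data row as the coded year X and the third as sales Y (an assumption about the row labels). It does not perform the Box-Cox search or the transformation asked for in parts (b) and (c); the function name `least_squares` is hypothetical.

```python
# Sales growth data from Problem 3.17.
# Assumption: x is the coded year and y is sales in thousands of units.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [98, 135, 162, 178, 221, 232, 283, 300, 374, 395]


def least_squares(x, y):
    """Return (b0, b1) for the simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum of cross-products over sum of squared deviations in x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through (x_bar, y_bar).
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares(x, y)
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]
sse = sum(e ** 2 for e in residuals)

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, SSE = {sse:.1f}")
```

Plotting `residuals` against `fitted` is one way to judge whether the linear relation in part (a) looks adequate before any transformation is tried.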

4.21, p. 175

When the predictor variable is so coded that $\bar{X} = 0$ and the normal error regression model (2.1) applies, are $b_0$ and $b_1$ independent? Are the joint confidence intervals for $\beta_0$ and $\beta_1$ then independent?

5.7, p. 210

Refer to Plastic hardness Problem 1.22. Using matrix methods, find:

1)           

2)           

3)           

5.20, p. 211

Find the matrix  of the quadratic form: .

5.26, p. 212

Refer to Plastic hardness Problems 1.22 and 5.7.

a)    Using matrix methods, obtain the following:

1)  

2)    

3)  

4)  

5)   SSE

6)  

7)    when .

b)   From part (a6), obtain the following:

1)    

2)    

3)  

c)    Obtain the matrix of the quadratic form for SSE.

Homework 4   Due November 13, 2015

 

6.10, p. 249

Refer to Grocery retailer Problem 6.9.

a)    Fit regression model (6.5) to the data for three predictor variables. State the estimated regression function. How are $b_1$, $b_2$, and $b_3$ interpreted here?

b)   Obtain the residuals and prepare a box plot of the residuals. What information does this plot provide?

c)    Plot the residuals against , , , , and  on separate graphs. Also prepare a normal probability plot. Interpret the plots and summarize your findings.

d)   Prepare a time plot of the residuals. Is there any indication that the error terms are correlated? Discuss.

e)    Divide the 52 cases into two groups, placing the 26 cases with the smallest fitted values  into group 1 and the other 26 cases into group 2. Conduct the Brown-Forsythe test for constancy of the error variance, using . State the decision rule and conclusion.

7.4, p. 289

Refer to Grocery retailer Problem 6.9.

a)    Obtain the analysis of variance table that decomposes the regression sum of squares into extra sums of squares associated with ; with X3, given ; and with , given  and X3.

b)   Test whether  can be dropped from the regression model given that  and X3 are retained. Use the  test statistic and . State the alternatives, decision rule, and conclusion. What is the P-value of the test?

c)    Does SSR SSR equal SSR( SSR() here? Must this always be the case?

7.17, p. 290

Refer to Grocery retailer Problem 6.9.

a)    Transform the variables by means of the correlation transformation (7.44) and fit the standardized regression model (7.45).

b)   Calculate the coefficients of determination between all pairs of predictor variables. Is it meaningful here to consider the standardized regression coefficients to reflect the effect of one predictor variable when the others are held constant?

c)    Transform the estimated standardized regression coefficients by means of (7.53) back to the ones for the fitted regression model in the original variables. Verify that they are the same as the ones obtained in Problem 6.10a.

8.16, p. 337-338

Refer to Grade point average Problem 1.19. An assistant to the director of admission conjectured that the predictive power of the model could be improved by adding information on whether the student had chosen a major field of concentration at the time the application was submitted. Assume that regression model (8.33) is appropriate, where $X_1$ is entrance test score and $X_2 = 1$ if student had indicated a major field of concentration at the time of application and 0 if the major field was undecided. Data for $X_2$ were as follows:

$i$:       1     2     3     ...   118   119   120
$X_{i2}$:  0     1     0     ...   1     1     0

a)    Explain how each regression coefficient in model (8.33) is interpreted here.

b)   Fit the regression model and state the estimated regression function.

c)    Test whether the  variable can be dropped from the regression model; use . State the alternatives, decision rule, and conclusion.

d)   Obtain the residuals for regression model (8.33) and plot them against . Is there any evidence in your plot that it would be helpful to include an interaction term in the model?

8.34, p. 340

In a regression study, three types of banks were involved, namely, commercial, mutual savings, and savings and loan. Consider the following system of indicator variables for type of bank:

Type of bank
Commercial          1     0
Mutual savings      0     1
Savings and loan

a)    Develop a first-order linear regression model for relating last year's profit or loss  to size of bank  and type of bank .

b)   State the response functions for the three types of banks.

c)    Interpret each of the following quantities:

1)  

2)  

3)  

Homework 5   Due December 2, 2015

 

9.15, p. 378-379

Kidney function. Creatinine clearance ($Y$) is an important measure of kidney function, but is difficult to obtain in a clinical office setting because it requires 24-hour urine collection. To determine whether this measure can be predicted from some data that are easily available, a kidney specialist obtained the data that follow for 33 male subjects. The predictor variables are serum creatinine concentration ($X_1$), age ($X_2$), and weight ($X_3$).

a)    Prepare separate dot plots for each of the three predictor variables. Are there any noteworthy features in these plots? Comment.

b)   Obtain the scatter plot matrix. Also obtain the correlation matrix of the $X$ variables. What do the scatter plots suggest about the nature of the functional relationship between the response variable $Y$ and each predictor variable? Discuss. Are any serious multicollinearity problems evident? Explain.

c)    Fit the multiple regression function containing the three predictor variables as first-order terms. Does it appear that all predictor variables should be retained?

9.16, p. 379

Refer to Kidney function Problem 9.15.

a)    Using first-order and second-order terms for each of the three predictor variables (centered around the mean) in the pool of potential  variables (including cross products of the first-order terms), find the three best hierarchical subset regression models according to the  criterion.

b)   Is there much difference in  for the three best subset models?

9.19, p. 379

Refer to Kidney function Problem 9.15.

a)    Using the same pool of potential  variables as in Problem 9.16a, find the best subset of variables according to forward stepwise regression with  limits of  and  to add or delete a variable, respectively.

b)   How does the best subset according to forward stepwise regression compare with the best subset according to the  criterion obtained in Problem 9.16a?

10.10 a, p. 415

Refer to Grocery retailer Problems 6.9 and 6.10.

a)   Obtain the studentized deleted residuals and identify any outlying  observations. Use the Bonferroni outlier test procedure with . State the decision rule and conclusion.

Homework 6   Due December 14, 2015

 

10.10 b-f, p. 415

Refer to Grocery retailer Problems 6.9 and 6.10.

b)   Obtain the diagonal elements of the hat matrix. Identify any outlying $X$ observations using the rule of thumb presented in the chapter.

c)   Management wishes to predict the total labor hours required to handle the next shipment containing  cases whose indirect costs of the total hours is  and  (no holiday in week). Construct a scatter plot of  against  and determine visually whether this prediction involves an extrapolation beyond the range of the data. Also, use (10.29) to determine whether an extrapolation is involved. Do your conclusions from the two methods agree?

d)   Cases 16, 22, 43, and 48 appear to be outlying  observations, and cases 10, 32, 38, and 40 appear to be outlying  observations. Obtain the DFFITS, DFBETAS, and Cook's distance values for each of these cases to assess their influence. What do you conclude?

e)   Calculate the average absolute percent difference in the fitted values with and without each of these cases. What does this measure indicate about the influence of each of the cases?

f)   Calculate Cook's distance  for each case and prepare an index plot. Are any cases influential according to this measure?

11.29, p. 479

Refer to Muscle Mass Problem 1.27.

a)    Fit a two-region regression tree. What is the first split point based on age? What is SSE for this two-region tree?

b)   Find the second split point given the two-region tree in part (a). What is SSE for the resulting three-region tree?

c)    Find the third split point given the three-region tree in part (b). What is SSE for the resulting four-region tree?

d)   Prepare a scatter plot of the data with the four-region tree in part (c) superimposed. How well does the tree fit the data? What does the tree suggest about the change in muscle mass with age?

e)    Prepare a residual plot of  versus  for the four-region tree in part (d). State your findings.

13.10, p. 550

Enzyme kinetics. In an enzyme kinetics study the velocity of a reaction ($Y$) is expected to be related to the concentration ($X$) as follows:

$$Y_i = \frac{\gamma_0 X_i}{\gamma_1 + X_i} + \varepsilon_i$$

Eighteen concentrations have been studied and the results follow:

$i$:      1     2     3     ...   16     17     18
$X_i$:    1     1.5   2     ...   30     35     40
$Y_i$:    2.1   2.5   4.9   ...   19.7   21.3   21.6

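a)    To obtain starting values for $\gamma_0$ and $\gamma_1$, observe that when the error term is ignored we have $Y_i' = \beta_0 + \beta_1 X_i'$, where $Y_i' = 1/Y_i$, $\beta_0 = 1/\gamma_0$, $\beta_1 = \gamma_1/\gamma_0$, and $X_i' = 1/X_i$. Therefore fit a linear regression function to the transformed data to obtain initial estimates $g_0^{(0)} = 1/b_0$ and $g_1^{(0)} = b_1/b_0$.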

b)   Using the starting values obtained in part (a), find the least squares estimates of the parameters $\gamma_0$ and $\gamma_1$.
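
The linearization used for the starting values in part (a) comes from taking reciprocals. The following is a brief sketch of that algebra, assuming the response function has the Michaelis-Menten form $\gamma_0 X / (\gamma_1 + X)$ and ignoring the error term; the symbols $b_0$ and $b_1$ denote the intercept and slope of the fitted line on the transformed data.

```latex
\begin{align*}
Y &= \frac{\gamma_0 X}{\gamma_1 + X} \\
\frac{1}{Y} &= \frac{\gamma_1 + X}{\gamma_0 X}
             = \frac{1}{\gamma_0} + \frac{\gamma_1}{\gamma_0} \cdot \frac{1}{X}
\end{align*}
```

Regressing $1/Y$ on $1/X$ therefore gives an intercept near $1/\gamma_0$ and a slope near $\gamma_1/\gamma_0$, which suggests the starting values $1/b_0$ for $\gamma_0$ and $b_1/b_0$ for $\gamma_1$.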

13.12, p. 550

Refer to Enzyme kinetics Problem 13.10. Assume that the fitted model is appropriate and that large-sample inferences can be employed here.

1)   Obtain an approximate  percent confidence interval for .

2)   Test whether or not ; use . State the alternatives, decision rule, and conclusion.

 

Managed by: chris edwards

Last updated August 29, 2015