Math 385/585 Applied Regression Analysis
Fall 2015
Section 001 1:50 to 2:50, M W F
Instructor: Dr. Chris Edwards Phone: 948-3969 Office: Swart 123
Classroom: Swart 203
Text: Applied Linear Statistical Models, 5th edition, by Kutner, Nachtsheim, Neter, and Li. Earlier editions of the text will likely be adequate, but you will have to allow for different page numbers and homework problem numbers.
Catalog Description: A practical introduction to regression emphasizing applications rather than theory. Simple and multiple regression analysis, basic components of experimental design, and elementary model building. Both conventional and computer techniques will be used in performing the analyses. Prerequisite: Math 201 or Math 301 and Math 256 each with a grade of C or better.
Course Objectives: The goal of statistics is to gain understanding from data. This course focuses on critical thinking and active learning involving statistical regression. Students will be engaged in statistical problem solving and will develop intuition concerning data analysis, including the use of appropriate technology. Specifically, students will develop
• an awareness of the nature and value of regression
• a sound, critical approach to interpreting statistics, including possible misuses
• facility with statistical calculations and evaluations, using appropriate technology
• effective written and oral communication skills
Grading: Final grades are based on these 300 points:
| | Topic | Points | Tentative Date | Chapters |
| Exam 1 | Simple Linear Regression | 70 pts. | October 9 | 1 to 4 |
| Exam 2 | Multiple Regression I | 70 pts. | November 16 | 5 to 8 |
| Exam 3 | Multiple Regression II | 70 pts. | December 18 | 9 to 11, 13 and 14 |
| Homework | 15 Points Each | 90 pts. | | |
Final grades are assigned as follows:
270 pts. A (90 %)
260 pts. A- (87 %)
250 pts. B+ (83 %)
240 pts. B (80 %)
230 pts. B- (77 %)
220 pts. C+ (73 %)
210 pts. C (70 %)
200 pts. C- (67 %)
190 pts. D+ (63 %)
180 pts. D (60 %)
179 pts. or less F
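The point cutoffs above can be read as a simple lookup. The following is a minimal Python sketch of that lookup; the function name `letter_grade` and the sample scores are made up for illustration and are not part of the course materials.

```python
# Cutoffs copied from the grading scale: (minimum points, letter grade).
CUTOFFS = [
    (270, "A"), (260, "A-"), (250, "B+"), (240, "B"), (230, "B-"),
    (220, "C+"), (210, "C"), (200, "C-"), (190, "D+"), (180, "D"),
]


def letter_grade(points):
    """Return the letter grade for a point total out of 300."""
    for minimum, letter in CUTOFFS:
        if points >= minimum:
            return letter
    return "F"  # 179 points or less


# Hypothetical student: three exam scores plus the homework total.
total = 62 + 58 + 65 + 80
print(total, letter_grade(total))
```

A total of 265 points falls between the 260 and 270 cutoffs, so the sketch reports an A-.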
Homework: I will collect about 5 homework problems approximately every other week. The due dates are listed on the course outline below. You are welcome to work together in small groups on the homework, but don't forget that I am a resource for you to use. Often we will use computer software to perform our analyses; include printouts where appropriate, but please make your papers readable. In other words, I don't want 25 pages of printout handed in if you can summarize it in two.
Office Hours: Office hours are times when I will be in my office to help you. There are many other times when I am in my office. If I am in and not busy, I will be happy to help. My office hours for the Fall 2015 semester are 3:00 to 4:00 Monday and 9:10 to 10:10 Tuesday, or by appointment.
Philosophy: I strongly believe that you, the student, are the only person who can make yourself learn. Therefore, whenever it is appropriate, I expect you to discover the mathematics we will be exploring. I do not feel that lecturing to you will teach you how to do mathematics. I hope to be your guide while we learn some mathematics, but you will need to do the learning. I expect each of you to come to class prepared to digest the day's material. That means you will benefit most by having read each section of the text and the Day By Day notes before class.
My personal belief is that one learns best by doing. I believe that you must be truly engaged in the learning process to learn well. Therefore, I do not think that my role as your teacher is to tell you the answers to the problems we will encounter; rather I believe I should point you in a direction that will allow you to see the solutions yourselves. To accomplish that goal, I will find different interactive activities for us to work on. Your job is to use me, your text, your friends, and any other resources to become adept at the material. The Day By Day notes also include Skills that I expect you to attain.
Math 585 Expectations: Expectations for the graduate students are understandably more rigorous than for the undergraduate students. Students taking Math 585 will have an extra theoretical problem added to each homework, to be assigned during the semester. In addition, a final project worth 50 points will be due at the end of the semester. This project will involve a complete analysis of a data set, including model estimation, development, and validation.
| Monday | Wednesday | Friday |
| September 7 | September 9 Day 1 | September 11 Day 2 |
| September 14 Day 3 | September 16 Day 4 | September 18 Day 5 |
| September 21 Day 6 | September 23 Day 7 | September 25 Day 8 |
| September 28 Day 9 | September 30 Day 10 | October 2 Day 11 |
| October 5 Day 12 | October 7 Day 13 | October 9 Day 14 |
| October 12 Day 15 | October 14 Day 16 | October 16 Day 17 |
| October 19 Day 18 | October 21 Day 19 | October 23 Day 20 |
| October 26 Day 21 | October 28 Day 22 | October 30 Day 23 |
| November 2 Day 24 | November 4 Day 25 | November 6 Day 26 |
| November 9 Day 27 | November 11 Day 28 | November 13 Day 29 |
| November 16 Day 30 | November 18 Day 31 | November 20 Day 32 |
| November 23 Day 33 | November 25 | November 27 |
| November 30 Day 34 | December 2 Day 35 | December 4 Day 36 |
| December 7 Day 37 | December 9 Day 38 | December 11 Day 39 |
| December 14 Day 40 | December 16 Day 41 | December 18 Day 42 |
Homework Assignments: (subject to change if we discover difficulties as we go)
Homework 1 Due September 21, 2015
1.19, p. 35
Grade Point Average. The director of admissions of a small college selected 120 students at random from the new freshman class in a study to determine whether a student's grade point average (GPA) at the end of the freshman year ($Y$) can be predicted from the ACT test score ($X$). The results of the study follow. Assume that first-order regression model (1.1) is appropriate.
| $i$: | 1 | 2 | 3 | … | 118 | 119 | 120 |
| $X_i$: | 21 | 14 | 28 | … | 28 | 16 | 28 |
| $Y_i$: | 3.897 | 3.885 | 3.778 | … | 3.914 | 1.860 | 2.948 |
a.) Obtain the least squares estimates of $\beta_0$ and $\beta_1$, and state the estimated regression function.
b.) Plot the estimated regression function and the data. Does the estimated regression function appear to fit the data well?
c.) Obtain a point estimate of the mean freshman GPA for students with the ACT test score $X$ specified in the text.
d.) What is the point estimate of the change in the mean response when the entrance test score increases by one point?
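The calculations this problem asks for can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ACT and GPA values below are made up (they are not the textbook data set), and the function name `least_squares` and the level `x_h` are hypothetical.

```python
def least_squares(x, y):
    """Return (b0, b1) for the fitted line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx             # slope estimate
    b0 = y_bar - b1 * x_bar    # intercept estimate
    return b0, b1


# Made-up ACT scores and GPAs for illustration only (not the textbook data).
act = [18, 20, 22, 24, 26, 28, 30, 32]
gpa = [2.4, 2.9, 2.7, 3.1, 3.0, 3.4, 3.3, 3.7]

b0, b1 = least_squares(act, gpa)
x_h = 25  # hypothetical ACT score at which to estimate the mean GPA
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(act, gpa)]

print("estimated regression function: Y-hat = %.4f + %.4f X" % (b0, b1))
print("point estimate of mean GPA at X = %d: %.3f" % (x_h, b0 + b1 * x_h))
print("estimated change in mean GPA per ACT point: %.4f" % b1)
print("sum of residuals: %.2e" % sum(residuals))
```

The slope estimate doubles as the point estimate of the change in mean response per one-point increase in the predictor, and the residuals should sum to zero up to rounding error.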
1.23, p. 36
Refer to Grade Point Average Problem 1.19.
a.) Obtain the residuals $e_i$. Do they sum to zero in accord with (1.17)?
b.) Estimate $\sigma^2$ and $\sigma$. In what units is $\sigma$ expressed?
1.33, p. 37
Refer to the regression model $Y_i = \beta_0 + \varepsilon_i$ in Exercise 1.30. Derive the least squares estimator of $\beta_0$ for this model.
2.4, p. 90
Refer to Grade Point Average Problem 1.19.
a.) Obtain a 99 percent confidence interval for $\beta_1$. Interpret your confidence interval. Does it include zero? Why might the director of admissions be interested in whether the confidence interval includes zero?
b.) Test, using the test statistic $t^*$, whether or not a linear association exists between student's ACT score ($X$) and GPA at the end of the freshman year ($Y$). Use a level of significance of 0.01. State the alternatives, decision rule, and conclusion.
c.) What is the P-value of your test in part (b)? How does it support the conclusion reached in part (b)?
2.55, p. 97
Derive the expression for SSR in (2.51):
$SSR = b_1^2 \sum (X_i - \bar{X})^2$.
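Before attempting the derivation, it can help to confirm the identity numerically. The sketch below assumes the usual definition of SSR as the sum of squared deviations of the fitted values from the mean response, and compares it with the shortcut form built from the slope estimate. The data are made up for illustration.

```python
# Made-up data for illustration only.
x = [1.0, 2.0, 4.0, 5.0, 7.0, 8.0]
y = [1.2, 2.9, 4.1, 6.3, 7.8, 9.9]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

# Definition: sum of squared deviations of fitted values from the mean of Y.
ssr_from_definition = sum((f - y_bar) ** 2 for f in fitted)
# Shortcut: squared slope times the sum of squared deviations of X.
ssr_from_shortcut = b1 ** 2 * sxx

print(ssr_from_definition, ssr_from_shortcut)
```

The two numbers agree up to floating-point rounding, which is a check on the algebra rather than a substitute for the derivation.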
Homework 2 Due October 5, 2015
2.23, p. 93
Refer to Grade Point Average Problem 1.19.
a.) Set up the ANOVA table.
b.) What is estimated by MSR in your ANOVA table? By MSE? Under what condition do MSR and MSE estimate the same quantity?
c.) Conduct an $F$ test of whether or not $\beta_1 = 0$. Control the $\alpha$ risk at 0.01. State the alternatives, decision rule, and conclusion.
d.) What is the absolute magnitude of the reduction in the variation of $Y$ when $X$ is introduced into the regression model? What is the relative reduction? What is the name of the latter measure?
e.) Obtain $r$ and attach the appropriate sign.
f.) Which measure, $R^2$ or $r$, has the more clear-cut operational interpretation? Explain.
2.67, p. 99
Refer to Grade Point Average Problem 1.19.
a.) Plot the data, with the least squares regression line for ACT scores between 20 and 30 superimposed.
b.) On the plot from part (a), superimpose a plot of the 95 percent confidence band for the true regression line for ACT scores between 20 and 30. Does the confidence band suggest that the true regression relation has been precisely estimated? Discuss.
3.3, p. 146-147
Refer to Grade Point Average Problem 1.19.
a.) Prepare a box plot for the ACT scores $X_i$. Are there any noteworthy features in this plot?
b.) Prepare a dot plot of the residuals. What information does this plot provide?
c.) Plot the residuals $e_i$ against the fitted values $\hat{Y}_i$. What departures from regression model (2.1) can be studied from this plot? What are your findings?
d.) Prepare a normal probability plot of the residuals. Also obtain the coefficient of correlation between the ordered residuals and their expected values under normality. Test the reasonableness of the normality assumption here using Table B.6 and the significance level given in the text. What do you conclude?
e.) Conduct the Brown-Forsythe test to determine whether or not the error variance varies with the level of $X$. Divide the data into the two groups defined in the text and use the significance level given there. State the decision rule and conclusion. Does your conclusion support your preliminary findings in part (c)?
f.) Information is given below for each student on two variables not included in the model, namely, intelligence test score .
3.21, p. 151
Derive the result in (3.29):
SSE = SSPE + SSLF
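The decomposition can be checked numerically before it is derived. The sketch below uses made-up data with replicate observations at each X level, and assumes the usual definitions: SSPE measures deviations of observations from their level means, and SSLF measures deviations of level means from the fitted line, weighted by the number of replicates.

```python
from collections import defaultdict

# Made-up data with replicates at each X level (illustration only).
x = [1, 1, 2, 2, 2, 3, 3, 4, 4, 4]
y = [2.1, 2.5, 3.9, 4.4, 4.0, 5.2, 5.8, 6.1, 6.6, 6.3]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

# Group the responses by X level.
groups = defaultdict(list)
for xi, yi in zip(x, y):
    groups[xi].append(yi)

# Error sum of squares around the fitted line.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
# Pure error: observations around their own level mean.
sspe = sum(sum((yi - sum(ys) / len(ys)) ** 2 for yi in ys)
           for ys in groups.values())
# Lack of fit: level means around the fitted line, weighted by replicates.
sslf = sum(len(ys) * (sum(ys) / len(ys) - (b0 + b1 * xj)) ** 2
           for xj, ys in groups.items())

print(sse, sspe + sslf)
```

The agreement holds because the deviations of the observations from their level mean sum to zero within each level, which is the key step in the derivation.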
Homework 3 Due October 26, 2015
3.17, p. 150-151
Sales growth. A marketing researcher studied annual sales of a product that had been introduced 10 years ago. The data are as follows, where $X$ is the year (coded) and $Y$ is sales in thousands of units:
| $i$: | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| $X_i$: | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| $Y_i$: | 98 | 135 | 162 | 178 | 221 | 232 | 283 | 300 | 374 | 395 |
a.) Prepare a scatter plot of the data. Does a linear relation appear adequate here?
b.) Use the Box-Cox procedure and standardization (3.36) to find an appropriate power transformation of $Y$. Evaluate SSE for the values of $\lambda$ listed in the text. What transformation of $Y$ is suggested?
c.) Use the transformation specified in the text and obtain the estimated linear regression function for the transformed data.
d.) Plot the estimated regression line and the transformed data. Does the regression line appear to be a good fit to the transformed data?
e.) Obtain the residuals and plot them against the fitted values. Also prepare a normal probability plot. What do your plots show?
f.) Express the estimated regression function in the original units.
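The Box-Cox search in part (b) can be sketched as follows, using the sales data listed above. The sketch assumes the standardization takes the common form W = K1 (Y^λ − 1) for λ ≠ 0 and W = K2 ln Y for λ = 0, where K2 is the geometric mean of the responses and K1 = 1 / (λ K2^(λ − 1)); check this against (3.36) in the text. The grid of λ values is an illustrative choice, not the list the problem specifies, and the function names are hypothetical.

```python
import math

# Sales growth data from the problem statement.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [98, 135, 162, 178, 221, 232, 283, 300, 374, 395]


def sse_of_line(x, w):
    """SSE from the least squares line of w on x."""
    n = len(x)
    x_bar = sum(x) / n
    w_bar = sum(w) / n
    b1 = (sum((xi - x_bar) * (wi - w_bar) for xi, wi in zip(x, w))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = w_bar - b1 * x_bar
    return sum((wi - b0 - b1 * xi) ** 2 for xi, wi in zip(x, w))


def standardized(y, lam):
    """Standardized Box-Cox responses (assumed form, see lead-in)."""
    k2 = math.exp(sum(math.log(v) for v in y) / len(y))  # geometric mean
    if lam == 0:
        return [k2 * math.log(v) for v in y]
    k1 = 1.0 / (lam * k2 ** (lam - 1))
    return [k1 * (v ** lam - 1) for v in y]


# Illustrative grid of lambda values (an assumption, not the textbook list).
for lam in [0.0, 0.25, 0.5, 0.75, 1.0]:
    print(lam, round(sse_of_line(x, standardized(y, lam)), 1))
```

Because the standardization puts every transformed response on a comparable scale, the SSE values can be compared directly across λ; the λ with the smallest SSE points to the suggested transformation.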
4.21, p. 175
When the predictor variable is so coded that $\bar{X} = 0$ and the normal error regression model (2.1) applies, are $b_0$ and $b_1$ independent? Are the joint confidence intervals for $\beta_0$ and $\beta_1$ then independent?
5.7, p. 210
Refer to Plastic hardness Problem 1.22. Using matrix methods, find:
1)
2)
3)
5.20, p. 211
Find the matrix of the quadratic form: .
5.26, p. 212
Refer to Plastic hardness Problems 1.22 and 5.7.
a) Using matrix methods, obtain the following:
1)
2)
3)
4)
5) SSE
6)
7) when .
b) From part (a6), obtain the following:
1)
2)
3)
c) Obtain the matrix of the quadratic form for SSE.
Homework 4 Due November 13, 2015
6.10, p. 249
Refer to Grocery retailer Problem 6.9.
a) Fit regression model (6.5) to the data for three predictor variables. State the estimated regression function. How are $b_1$, $b_2$, and $b_3$ interpreted here?
b) Obtain the residuals and prepare a box plot of the residuals. What information does this plot provide?
c) Plot the residuals against $\hat{Y}$, $X_1$, $X_2$, $X_3$, and $X_1 X_2$ on separate graphs. Also prepare a normal probability plot. Interpret the plots and summarize your findings.
d) Prepare a time plot of the residuals. Is there any indication that the error terms are correlated? Discuss.
e) Divide the 52 cases into two groups, placing the 26 cases with the smallest fitted values into group 1 and the other 26 cases into group 2. Conduct the Brown-Forsythe test for constancy of the error variance, using the significance level given in the text. State the decision rule and conclusion.
7.4, p. 289
Refer to Grocery retailer Problem 6.9.
a) Obtain the analysis of variance table that decomposes the regression sum of squares into extra sums of squares associated with $X_1$; with $X_3$, given $X_1$; and with $X_2$, given $X_1$ and $X_3$.
b) Test whether $X_2$ can be dropped from the regression model given that $X_1$ and $X_3$ are retained. Use the $F^*$ test statistic and the significance level given in the text. State the alternatives, decision rule, and conclusion. What is the P-value of the test?
c) Does $SSR(X_1) + SSR(X_2 \mid X_1)$ equal $SSR(X_2) + SSR(X_1 \mid X_2)$ here? Must this always be the case?
7.17, p. 290
Refer to Grocery retailer Problem 6.9.
a) Transform the variables by means of the correlation transformation (7.44) and fit the standardized regression model (7.45).
b) Calculate the coefficients of determination between all pairs of predictor variables. Is it meaningful here to consider the standardized regression coefficients to reflect the effect of one predictor variable when the others are held constant?
c) Transform the estimated standardized regression coefficients by means of (7.53) back to the ones for the fitted regression model in the original variables. Verify that they are the same as the ones obtained in Problem 6.10a.
8.16, p. 337-338
Refer to Grade point average Problem 1.19. An assistant to the director of admissions conjectured that the predictive power of the model could be improved by adding information on whether the student had chosen a major field of concentration at the time the application was submitted. Assume that regression model (8.33) is appropriate, where $X_1$ is entrance test score and $X_2 = 1$ if the student had indicated a major field of concentration at the time of application and 0 if the major field was undecided. Data for $X_2$ were as follows:
| $i$: | 1 | 2 | 3 | … | 118 | 119 | 120 |
| $X_{i2}$: | 0 | 1 | 0 | … | 1 | 1 | 0 |
a) Explain how each regression coefficient in model (8.33) is interpreted here.
b) Fit the regression model and state the estimated regression function.
c) Test whether the $X_2$ variable can be dropped from the regression model; use the significance level given in the text. State the alternatives, decision rule, and conclusion.
d) Obtain the residuals for regression model (8.33) and plot them against $X_1 X_2$. Is there any evidence in your plot that it would be helpful to include an interaction term in the model?
8.34, p. 340
In a regression study, three types of banks were involved, namely, commercial, mutual savings, and savings and loan. Consider the following system of indicator variables for type of bank:
| Type of bank | | |
| Commercial | 1 | 0 |
| Mutual savings | 0 | 1 |
| Savings and loan | | |
a) Develop a first-order linear regression model for relating last year's profit or loss to size of bank and type of bank.
b) State the response functions for the three types of banks.
c) Interpret each of the following quantities:
1)
2)
3)
Homework 5 Due December 2, 2015
9.15, p. 378-379
Kidney function. Creatinine clearance is an important measure of kidney function, but is difficult to obtain in a clinical office setting because it requires 24-hour urine collection. To determine whether this measure can be predicted from some data that are easily available, a kidney specialist obtained the data that follow for 33 male subjects. The predictor variables are serum creatinine concentration ($X_1$), age ($X_2$), and weight ($X_3$).
a) Prepare separate dot plots for each of the three predictor variables. Are there any noteworthy features in these plots? Comment.
b) Obtain the scatter plot matrix. Also obtain the correlation matrix of the variables. What do the scatter plots suggest about the nature of the functional relationship between the response variable and each predictor variable? Discuss. Are any serious multicollinearity problems evident? Explain.
c) Fit the multiple regression function containing the three predictor variables as first-order terms. Does it appear that all predictor variables should be retained?
9.16, p. 379
Refer to Kidney function Problem 9.15.
a) Using first-order and second-order terms for each of the three predictor variables (centered around the mean) in the pool of potential variables (including cross products of the first-order terms), find the three best hierarchical subset regression models according to the criterion.
b) Is there much difference in for the three best subset models?
9.19, p. 379
Refer to Kidney function Problem 9.15.
a) Using the same pool of potential variables as in Problem 9.16a, find the best subset of variables according to forward stepwise regression with limits of and to add or delete a variable, respectively.
b) How does the best subset according to forward stepwise regression compare with the best subset according to the criterion obtained in Problem 9.16a?
10.10 a, p 415
Refer to Grocery retailer Problems 6.9 and 6.10.
a) Obtain the studentized deleted residuals and identify any outlying observations. Use the Bonferroni outlier test procedure with the significance level given in the text. State the decision rule and conclusion.
Homework 6 Due December 14, 2015
10.10 b-f, p 415
Refer to Grocery retailer Problems 6.9 and 6.10.
b) Obtain the diagonal elements of the hat matrix. Identify any outlying observations using the rule of thumb presented in the chapter.
c) Management wishes to predict the total labor hours required to handle the next shipment containing cases whose indirect costs of the total hours is and (no holiday in week). Construct a scatter plot of against and determine visually whether this prediction involves an extrapolation beyond the range of the data. Also, use (10.29) to determine whether an extrapolation is involved. Do your conclusions from the two methods agree?
d) Cases 16, 22, 43, and 48 appear to be outlying $X$ observations, and cases 10, 32, 38, and 40 appear to be outlying $Y$ observations. Obtain the DFFITS, DFBETAS, and Cook's distance values for each of these cases to assess their influence. What do you conclude?
e) Calculate the average absolute percent difference in the fitted values with and without each of these cases. What does this measure indicate about the influence of each of the cases?
f) Calculate Cook's distance for each case and prepare an index plot. Are any cases influential according to this measure?
11.29, p. 479
Refer to Muscle Mass Problem 1.27.
a) Fit a two-region regression tree. What is the first split point based on age? What is SSE for this two-region tree?
b) Find the second split point given the two-region tree in part (a). What is SSE for the resulting three-region tree?
c) Find the third split point given the three-region tree in part (b). What is SSE for the resulting four-region tree?
d) Prepare a scatter plot of the data with the four-region tree in part (c) superimposed. How well does the tree fit the data? What does the tree suggest about the change in muscle mass with age?
e) Prepare a residual plot of $e_i$ versus $\hat{Y}_i$ for the four-region tree in part (d). State your findings.
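The first split of a regression tree on a single predictor can be found by brute force: try every cut point between adjacent distinct predictor values and keep the one with the smallest combined SSE of the two regions. The following Python sketch does this under stated assumptions: cut points are taken at midpoints between adjacent values, the age and muscle mass numbers are made up (not the Problem 1.27 data), and the function names are hypothetical.

```python
def sse(values):
    """Sum of squared deviations of values from their mean."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y):
    """Return (cut point, total SSE) for the best two-region split on x."""
    pairs = sorted(zip(x, y))
    xs = [p[0] for p in pairs]
    best = None
    for k in range(1, len(pairs)):
        if xs[k] == xs[k - 1]:
            continue  # cannot split between tied predictor values
        cut = (xs[k - 1] + xs[k]) / 2  # midpoint convention (assumption)
        left = [p[1] for p in pairs[:k]]
        right = [p[1] for p in pairs[k:]]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (cut, total)
    return best


# Made-up ages and muscle mass values for illustration only.
age = [43, 47, 52, 56, 60, 64, 68, 71, 75, 78]
mass = [106, 100, 97, 95, 90, 84, 80, 78, 73, 70]

print(best_split(age, mass))
```

Later split points can be found the same way by applying `best_split` within each existing region and choosing the split that reduces the overall SSE the most.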
13.10, p. 550
Enzyme kinetics. In an enzyme kinetics study the velocity of a reaction ($Y$) is expected to be related to the concentration ($X$) as follows:
$$Y_i = \frac{\gamma_0 X_i}{\gamma_1 + X_i} + \varepsilon_i$$
Eighteen concentrations have been studied and the results follow:
| $i$: | 1 | 2 | 3 | … | 16 | 17 | 18 |
| $X_i$: | 1 | 1.5 | 2 | … | 30 | 35 | 40 |
| $Y_i$: | 2.1 | 2.5 | 4.9 | … | 19.7 | 21.3 | 21.6 |
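a) To obtain starting values for $\gamma_0$ and $\gamma_1$, observe that when the error term is ignored we have $Y_i' = \beta_0 + \beta_1 X_i'$, where $Y_i' = 1/Y_i$, $\beta_0 = 1/\gamma_0$, $\beta_1 = \gamma_1/\gamma_0$, and $X_i' = 1/X_i$. Therefore fit a linear regression function to the transformed data to obtain initial estimates $g_0^{(0)} = 1/b_0$ and $g_1^{(0)} = b_1/b_0$.
b) Using the starting values obtained in part (a), find the least squares estimates of the parameters $\gamma_0$ and $\gamma_1$.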
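The two steps of this problem can be sketched in Python. The sketch assumes the response function has the form γ0 X / (γ1 + X), so that regressing 1/Y on 1/X gives starting values, and it then refines them with a plain Gauss-Newton iteration (one possible method; statistical software may use a different algorithm). The concentrations and velocities are made up, since the full textbook data are not listed here, and the function names are hypothetical.

```python
def fit_line(x, y):
    """Least squares intercept and slope for y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b1 * x_bar, b1


def starting_values(x, y):
    """Starting values from the reciprocal (linearized) regression."""
    b0, b1 = fit_line([1 / v for v in x], [1 / v for v in y])
    return 1 / b0, b1 / b0  # gamma0 = 1/b0, gamma1 = b1/b0


def gauss_newton(x, y, g0, g1, steps=20):
    """Refine (g0, g1) for the mean function g0 * x / (g1 + x)."""
    for _ in range(steps):
        r = [yi - g0 * xi / (g1 + xi) for xi, yi in zip(x, y)]
        d0 = [xi / (g1 + xi) for xi in x]              # derivative wrt g0
        d1 = [-g0 * xi / (g1 + xi) ** 2 for xi in x]   # derivative wrt g1
        a = sum(u * u for u in d0)
        b = sum(u * v for u, v in zip(d0, d1))
        c = sum(v * v for v in d1)
        p = sum(u * e for u, e in zip(d0, r))
        q = sum(v * e for v, e in zip(d1, r))
        det = a * c - b * b
        # Solve the 2x2 normal equations for the update.
        g0 += (c * p - b * q) / det
        g1 += (a * q - b * p) / det
    return g0, g1


# Made-up concentrations and velocities for illustration only.
conc = [1, 2, 5, 10, 20, 40]
velocity = [1.9, 3.4, 7.6, 12.1, 17.0, 21.9]

g0_start, g1_start = starting_values(conc, velocity)
g0_hat, g1_hat = gauss_newton(conc, velocity, g0_start, g1_start)
print("starting values:", g0_start, g1_start)
print("least squares estimates:", g0_hat, g1_hat)
```

The linearized fit weights the observations differently from the original model, so its estimates serve only as starting values; the iteration then minimizes the error sum of squares in the original units.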
13.12, p. 550
Refer to Enzyme kinetics Problem 13.10. Assume that the fitted model is appropriate and that large-sample inferences can be employed here.
1) Obtain an approximate percent confidence interval for .
2) Test whether or not ; use . State the alternatives, decision rule, and conclusion.
Managed by: Chris Edwards
Last updated August 29, 2015