Day By Day
Notes for MATH 385
Fall 2007
Activity: Go over syllabus. Take roll. Guess some lines.
Goals: Review course objectives: To model linear relationships, interpret the model estimates, and explore the various uses of regression. Introduce Least Squares. Introduce Excel.
Most if not all of you are taking this course as a mathematics elective. The topic of regression is quite useful in fitting equations to data. Such equations can then be used with some degree of predictability; researchers will be able to accurately describe and predict future observations. Regression techniques are a basic part of many software packages, including MINITAB and Excel, both of which we will explore.
I believe that to be successful in this course, you must actually read the text (and these notes) carefully, and work problems. The most important thing is to engage yourself in the material. However, our class activities will sometimes be unrelated to the homework you practice and/or turn in for the homework portion of your grade; instead they will be for understanding of the underlying principles. For example, we will run simulations of the regression model. This is something you would never do in practice, but which I think will demonstrate several lessons for us. In these notes, I will try to point out to you when we're doing something to gain understanding, and when we're doing something to gain skills.
I believe you get out of something what you put into it. Very rarely will someone fail a class by attending every day, doing all the assignments, and working many practice problems; typically people fail by not applying themselves enough - either through missing classes, or by not allocating enough time for the material. Obviously I cannot tell you how much time to spend each week on this class; you must all find the right balance for you and your life's priorities. One last piece of advice: don't procrastinate. I believe statistics is learned best by daily exposure. Cramming for exams may get you a passing grade, but you are only cheating yourself out of understanding and learning.
In these notes, I will put the daily task in gray background.
Today I would like to explore the mathematical idea of Least Squares. With this technique, an equation is "fitted" to data in such a way that the sum of the "squared errors" between the data and the fits is as small as possible. Some history:
In 1795, Carl Friedrich Gauss, at the age of 18, is credited with developing the fundamentals of the basis for least-squares analysis. However, as with many of his discoveries, he did not publish them. The strength of his method was demonstrated in 1801, when it was used to predict the future location of the newly discovered asteroid Ceres.
On January 1st, 1801, the Italian astronomer Giuseppe Piazzi had discovered the asteroid Ceres and had been able to track its path for 40 days before it was lost in the glare of the sun. Based on this data, it was desired to determine the location of Ceres after it emerged from behind the sun without solving the complicated Kepler's nonlinear equations of planetary motion. The only predictions that successfully allowed the German astronomer Franz Xaver von Zach to relocate Ceres were those performed by the 24-year-old Gauss using least-squares analysis. However, Gauss did not publish the method until 1809, when it appeared in volume two of his work on celestial mechanics, Theoria Motus Corporum Coelestium in sectionibus conicis solem ambientium.
The idea of least-squares analysis was independently formulated by the Frenchman Adrien-Marie Legendre in 1805 and the American Robert Adrain in 1808.
(Taken from http://en.wikipedia.org/wiki/Least_squares.)
I want each of us to guess a good fit to some data I will supply, and we will then use the computer to assess which of us made good guesses. I am going to begin using Excel, but later we will use MINITAB quite a bit. However, I like to begin with Excel because we can see dynamically what effect our changes have.
In the spreadsheet, I will use each of your guesses to calculate a "fit" for each point, and from that we will find the error. The sum of the squared errors will be our measure of goodness; small sums mean close (and therefore good) fits.
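The spreadsheet columns described above can also be sketched in a few lines of Python. This is only an illustration: the data values and the two guessed lines are made up, and the function name sse is my own choice, not something from the text or from Excel.

```python
# Hypothetical data, made up for illustration (not the class data set).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

def sse(intercept, slope, x, y):
    """Sum of squared errors for the guessed line y = intercept + slope * x."""
    total = 0.0
    for xi, yi in zip(x, y):
        fit = intercept + slope * xi   # the "fit" column
        error = yi - fit               # the "error" column
        total += error ** 2            # the "squared error" column
    return total

# Two guessed lines; the smaller sum of squared errors is the better guess.
guess_a = sse(0.0, 2.0, x, y)
guess_b = sse(1.0, 1.5, x, y)
```

Here guess_a comes out much smaller than guess_b, so the first guessed line is the closer fit to these made-up points.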
Skills: (In these notes, each day I will identify skills I believe you should have after working the day's activity, reading the appropriate sections of the text, and practicing exercises in the text.)
• Understand the definition of Least Squares. "Least Squares" is a mathematical concept of goodness concerning data and an equation describing the data. Each data value has a "fit" from the model, and the "best fitting equation" is the one that makes the total sum of the squared deviations from the fitted model as small as possible.
• Know how to input the formulas in a spreadsheet (or by hand) to assess the goodness of a model. We usually put our data into columns when we use a spreadsheet. Additional columns needed to calculate Least Squares are "fit", "error", and "squared error". The sum of the squared error column is the measure for how well a model is fitting.
• Realize that the idea of Least Squares is not tied to the notion of Linear Models. The model that we fit can be any calculable equation. It is common to use models that are linear in the parameters, but that is not necessary.
Reading: (The reading mentioned in these notes refers to what reading you should do for the next day's material.)
Sections 1.1 to 1.5.
Activity: Simulate the basic regression model.
The model we will begin using is the basic regression model, also called simple linear regression. It has one independent, or predictor, variable, one dependent, or response, variable, two regression parameters (an intercept and a slope), and a random error term. Notice the model is linear in the parameters because they do not appear multiplied together or with any exponents. This idea becomes very important in Chapter 5 when we use matrices. The error term accounts for otherwise unexplained variation, sometimes called "white noise". These errors may be due to other unmeasured variables, or perhaps to just randomness that we cannot explain. One of our tasks in upcoming sessions is to try to determine if the data we've collected matches the model we've selected. Then we will be paying very close attention to all of our assumptions in this basic model.
The alternate model on page 12 is a centered model. We will want to use this model after we learn about multicollinearity. The important feature of this model is that it yields the same fitted values, and is thus an equivalent model. It is important for you to be able to show the equivalence of the two models algebraically.
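The algebra behind the equivalence is short. The lines below are my own sketch, assuming the basic model is written with intercept \beta_0, slope \beta_1, and error \varepsilon_i; the starred intercept is just a label for the new parameter, and the notation on page 12 may differ slightly.

```latex
\begin{aligned}
Y_i &= \beta_0 + \beta_1 X_i + \varepsilon_i \\
    &= (\beta_0 + \beta_1 \bar{X}) + \beta_1 (X_i - \bar{X}) + \varepsilon_i \\
    &= \beta_0^{*} + \beta_1 (X_i - \bar{X}) + \varepsilon_i,
    \qquad \beta_0^{*} = \beta_0 + \beta_1 \bar{X}.
\end{aligned}
```

Nothing was added to or removed from the right-hand side, so both forms give the same mean response at every x-value; only the meaning of the intercept changes.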
I think the best way to understand the model presented on page 9 (equation 1.1) is to use a computer to simulate responses from it. We can see the true line on a graph, and how data are scattered around the line.
In my simulation I will assume the errors have a normal distribution, but that is not required for estimation purposes. When we apply statistical inference, however, we will require that assumption. In my spreadsheet, we will be able to control which error distribution we use. One of our goals is to see if different distributions create different views.
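For those who would rather simulate outside a spreadsheet, here is a minimal Python sketch of the same idea. The parameter values (intercept 10, slope 2, error standard deviation 3), the x-values, and the seed are all arbitrary choices of mine, and only normal errors are shown.

```python
import random

def simulate(beta0, beta1, sigma, x_values, rng):
    """Draw one response per x from Y = beta0 + beta1 * x + error,
    where each error is an independent normal with mean 0 and sd sigma."""
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in x_values]

rng = random.Random(385)                 # fixed seed so the run is repeatable
x_values = list(range(1, 21))
y_values = simulate(10.0, 2.0, 3.0, x_values, rng)

# The true line, and the errors about it (observable only in a simulation).
true_line = [10.0 + 2.0 * x for x in x_values]
errors = [y - m for y, m in zip(y_values, true_line)]
```

Plotting y_values against x_values together with true_line shows the scatter of data about the true line. Swapping rng.gauss for another random draw with mean zero is one way to try a different error distribution.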
Goals: Understand the notation and ideas of the basic linear model.
Skills:
• Understand each term in the basic regression model. Understanding regression begins with understanding the model we are posing. You need to know what parameters are, and how they differ from random error. You should be able to recite the model we use from memory.
• Know how to simulate the basic regression model. From our class demonstration, you should be able to produce a simulation yourself, using a spreadsheet or other computer program, such as MINITAB. While I haven't gone through the MINITAB commands, if you would like to use that program to do simulations, I can help you outside of class.
• Understand the alternative "centered" model. In some cases we want to use transformed data instead of raw data. The resulting model produces identical fitted values and is thus an equivalent model. However, the parameters we use are different. In a sense, parameters are merely a convenience for us to describe a model, and are not unique.
Reading: Sections 1.6 to 1.8 (first part).
Activity: Estimation of Parameters.
Today we will use Least Squares, and some calculus, to derive the estimates for the simple linear regression model. We will encounter Non-Linear regression later (Chapter 13), but today I will introduce it with a separate model, and we will see how the calculus approach takes us only so far.
To minimize the sum of the squared errors, we will treat the data as fixed, and the parameters as variables. Then, as you know from Calculus I, we find where the derivatives are zero to locate the extrema, in this case the minimum. For Simple Linear Regression, it turns out we can derive these results with simple algebra. Later on, with more variables (Chapter 5), we will have to use linear algebra instead.
The calculus we do today will involve partial derivatives; for those who haven't had Calculus III yet, fortunately these derivatives are not any trickier than regular Calculus I derivatives. The key is to think of the other variable not being looked at as a fixed constant. Once we have found the derivatives, we set them to zero, simultaneously, and solve. In general, this step is quite difficult. For the case of Simple Linear Regression, it turns out to be quite straightforward. The key is that the derivative of a squared function is a linear function (the power rule).
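As a sketch of where the calculus leads, write Q for the sum of squared errors as a function of a trial intercept b_0 and trial slope b_1. The symbols here are my own choice and may differ from the text's notation.

```latex
\begin{aligned}
Q(b_0, b_1) &= \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2 \\
\frac{\partial Q}{\partial b_0} &= -2 \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i) = 0 \\
\frac{\partial Q}{\partial b_1} &= -2 \sum_{i=1}^{n} X_i (Y_i - b_0 - b_1 X_i) = 0 \\
b_1 &= \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2},
\qquad b_0 = \bar{Y} - b_1 \bar{X}.
\end{aligned}
```

Both derivative equations are linear in b_0 and b_1, which is why the pair can be solved by ordinary algebra.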
Today is the first day we formally see the residuals, the deviations from the fitted values. Residual analysis is quite important as it is our chief tool for assessing model adequacy. We will come back to them later (Chapter 3). For now, what you need to know about them is their definition and some basic algebraic facts about them, in particular that they sum to zero.
Our last result today involves the error variance, σ². The reasoning behind our estimate of the variance, which is called the mean square error or MSE, is beyond our abilities; you would use techniques from Math 401. The idea is understandable, but requires that you understand the difference between the residuals eᵢ and the error terms εᵢ. The residuals are calculated from data values; the error terms are unobservable terms in the model, the result of a random selection from a distribution. The key result is that if the model is correct, then the residuals behave approximately like the error terms, so that the variance of the residuals ought to be about the same as the variance of the error terms. There is one additional complication: degrees of freedom. Again, using Math 401 results, we find estimates for variances by dividing sums of squares by degrees of freedom. The ANOVA results from Day 6 will shed additional light on this situation.
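To see the estimates, the residuals, and MSE side by side, here is a small Python sketch. The five data points are made up for illustration, and the helper name least_squares is hypothetical; the formulas are the usual closed-form least squares results for one predictor.

```python
def least_squares(x, y):
    """Least squares intercept b0 and slope b1 for one predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data, made up for illustration.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)
mse = sse / (len(x) - 2)   # n - 2 degrees of freedom: two parameters were estimated
```

For these points the slope is 1.99 and the intercept is 0.05, and the residuals add to zero apart from rounding in the last decimal places.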
Goals: Know how the estimates of the basic model are found.
Skills:
• Know how calculus is used to derive the Least Squares estimates. Because Least Squares is an optimization, we can use basic calculus results to derive the answers. The key idea that makes the solution feasible is that the derivative of "squaring" is linear, and we know lots about solving systems of linear equations. Once we have the partial derivatives for all parameters in the model, we simultaneously set them equal to zero and solve.
• Know the formulas for the model estimates for slope and intercept. While I'm not a fan of memorizing results, it will be helpful to know at least the form of the least squares estimates.
• Know the definition and simple results for residuals. The residuals are the deviations of the data from the model. We can also think of them as the "error" in the fit. You should know the formula for them as well as simple facts about them, such as their sum is zero and their sum of squares gives SSE.
• Know the best estimate for the model variance. Because the residuals behave similarly to the model error terms, their mean square, or sample variance, estimates the model variance. The key idea will come up again: variance is estimated using a ratio of sums of squares and degrees of freedom. See Day 6.
• Know what the Estimation of Mean Response is and how to calculate it. Often we want to estimate a particular point on the regression line. This is a mean response, and should not be confused with a prediction of a new observation. Basically, to estimate a mean response for a particular x-value, substitute that new x-value in the fitted equation.
Reading: Sections 2.1 to 2.3.
Activity: Inference on slope and intercept using a simulation.
I believe the best way to see how the distribution results work is to conduct a simulation. We will do problem 2.66 on page 98 to demonstrate how sampling distributions work. The idea is to generate many, many samples of results and then calculate the estimates for each sample. If we look at appropriate graphs (like a histogram) we can check the theoretical results with the simulated results.
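The following Python sketch shows the general idea of such a simulation. It is not problem 2.66 itself: the true parameter values, the x-values, the number of samples, and the seed are arbitrary choices of mine. It compares the simulated standard deviation of the slope estimates with the theoretical value, σ divided by the square root of the sum of squared deviations of the x-values.

```python
import random
import statistics

def slope(x, y):
    """Least squares slope for one predictor."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx

rng = random.Random(2007)                 # fixed seed so the run is repeatable
x = list(range(1, 11))
beta0, beta1, sigma = 5.0, 1.5, 2.0       # assumed "true" values for the simulation

# Generate many samples from the same model and keep the slope from each one.
slopes = []
for _ in range(2000):
    y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
    slopes.append(slope(x, y))

x_bar = sum(x) / len(x)
sxx = sum((xi - x_bar) ** 2 for xi in x)
theoretical_sd = sigma / sxx ** 0.5
simulated_mean = statistics.mean(slopes)
simulated_sd = statistics.stdev(slopes)
```

A histogram of slopes should be centered near the true slope, with a spread close to theoretical_sd.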
In addition to the simulation, we will also look at the theoretical results. The key on page 42 is that the equations can be rewritten as linear combinations of the y-values. I don't expect you to memorize the details, but you should know the results: the least squares estimates have normal distributions. Therefore, to derive the means and variances we need for the confidence intervals and tests, we use the linear combination formulas from Math 301.
Fortunately, in MINITAB, the calculations are mostly done for us; it is just a matter of interpreting the outputs, which we will spend time doing in class. For hypothesis tests, you must know which null hypothesis is being tested, and how to interpret the P-value. The confidence interval needs to be calculated manually (from the basic output). Of course, it is up to you to learn how to use the software yourself.
Goals: Know the basic confidence interval and hypothesis test results for the slope and intercept.
Skills:
• Know the least squares estimates are linear combinations of the y-values. Once we write the least squares estimates as linear combinations of the y-values, the linear combination formulas can be used to calculate the mean and standard deviation of the estimates. You do not need to memorize the particular weights in the formulas, but you should be able to follow the algebra on page 42.
• Know the least squares estimates are tested with the t-test. We know the estimates are linear combinations. We also know that MSE is an estimate of σ². Using this information, and results from MATH 301, we can test the estimates using the t-distribution results.
• Understand how simulation can be used to observe sampling distributions. Using the spreadsheet in class, you should understand what we mean by the sampling distributions of the least squares estimates. In particular, you need to have a solid understanding of the mean and standard deviation. If our model is correct, we can predict how variable the fitted line can be, and from that information we can assess the fitness of our model.
Reading: Sections 2.4 to 2.6.
Activity: Interval Estimates.
In addition to point estimates, we typically in statistics use interval estimates. Using our simulation from Day 4, we can investigate how variable the interval estimates are, and we can observe how the confidence coefficient is interpreted. The chief idea is that the interval estimate contains a notion of the sampling variability along with it. The confidence coefficient represents the chance that the interval contains the true parameter, in this case the value on the regression line. Our simulation should show us this.
It is important that you recognize the two types of intervals we are generally interested in: estimation of a mean response, and prediction of a new observation. With the mean response we are estimating where the true regression line falls. With the prediction interval, we are recognizing two sources of variation: the sampling variation, and the randomness inherent in the model, encompassed in the error terms. Thus, the formula (on page 59) includes both sources of variation, and the prediction interval is wider than the interval for the estimation of a mean response. I will refer back to these simulations often throughout the rest of the course.
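A short Python sketch makes the comparison concrete. The data are made up, the function name is hypothetical, and the critical value 3.182 is the 97.5th percentile of the t distribution with 3 degrees of freedom, read from a t table rather than computed. I am assuming the standard simple linear regression formulas, which should agree with those in the text.

```python
import math

def interval_half_widths(x, y, x_new, t_crit):
    """Fitted value at x_new plus the half-widths of the mean-response
    interval and the prediction interval, using a supplied t critical value."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    mse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    sampling = 1.0 / n + (x_new - x_bar) ** 2 / sxx    # uncertainty in the fitted line
    se_mean = math.sqrt(mse * sampling)                # mean response: sampling variation only
    se_pred = math.sqrt(mse * (1.0 + sampling))        # prediction: adds the new error term
    return b0 + b1 * x_new, t_crit * se_mean, t_crit * se_pred

# Hypothetical data; 95% intervals at x = 4 with n - 2 = 3 degrees of freedom.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fit, half_mean, half_pred = interval_half_widths(x, y, 4.0, 3.182)
```

The extra "1.0 +" inside se_pred is the second source of variation, which is why half_pred always exceeds half_mean.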
We can construct a confidence band by applying our mean response results for all possible x-values. This yields hyperbolas (see page 62). The key difference between this band and an individual confidence interval is that the band's confidence coefficient is a family confidence coefficient. The proper interpretation is that the coefficient is the chance that the true regression line lies entirely within the band.
Goals: Use the MATH 301 results on confidence intervals to estimate the parameters with intervals.
Skills:
• Know how to use the confidence interval results from MATH 301. In MATH 301 we used the normal curve to estimate a parameter, giving a range of values between which we believed the parameter lies. It was composed of a lower value and an upper value, and a confidence coefficient, which represented the chance that the random interval contained the parameter.
• Know how a prediction interval differs from a confidence interval for a mean response. To estimate a new response, we must not only account for the variation inherent in the model (the epsilon error terms) but also the uncertainty in our estimates themselves. Thus the prediction interval is wider, having two sources of variation.
• Know how to construct and interpret a confidence band for the regression line. A typical confidence interval estimates a single parameter or combination of parameters. A regression band around the line estimates the region where the line may completely fall. In general, due to this global sort of coverage, it must be a larger (wider) interval.
Reading: Section 2.7.
Activity: ANOVA. Homework 1 due today.
The ANOVA table is a convenient way to summarize the information we have from the least squares estimates. We will examine the details today. Some of what we will do is algebraic. The key result is that the sums of squares we are interested in are additive. The total deviations decompose into two orthogonal components. This means their sums of squares are additive, as the cross product term equals zero. The degrees of freedom associated with these sums of squares are also additive. We will be unable to prove this fact, as it relates to advanced linear algebra. It happens that degrees of freedom can be conveniently associated with parameters estimated, although that appears to be a mysterious explanation.
The last column in the ANOVA table is the Mean Square column. You are already familiar with this idea from calculating the sample variance, where you took a sum of squared deviations and divided by one less than the sample size. That calculation was really a "Sum of Squares" divided by a "Degrees of Freedom". In regression, the Sums of Squares, divided by σ² and under suitable conditions and hypotheses, have chi-squared distributions. The ratio of two independent chi-squared variables, each divided by its degrees of freedom, has an F distribution.
Sometimes an additional column is included in the table, representing the Expected Mean Square. Developing these formulas requires much more mathematical statistics than we have so far, so we will accept these formulas from the text on faith. I have found the greatest use for these EMS's is in choosing the proper test statistic in experimental designs, one of the topics of MATH 386, taught in the spring.
With only one independent variable, the ANOVA table shows us nothing we didn't already have with our standard t-tests and intervals. However, when we proceed to more than one independent variable, the ANOVA approach will be needed as the t-procedures will be inadequate.
The main test we perform is whether there is a relationship present or not. This is most easily phrased by equating the slope to zero. Again, we already have a t-test for that, but there is a corresponding F-test too. One difference between the two is that the t-test is a little more flexible in that we can test one-sided alternate hypotheses, whereas with the F-test we must use the two-sided alternate.
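The ANOVA quantities for one predictor can be checked numerically with a short Python sketch. The data are made up and the function name and dictionary keys are my own labels; the point is only to see that the sums of squares add and that the F statistic equals the square of the t statistic for the slope.

```python
import math

def anova_table(x, y):
    """Simple linear regression ANOVA quantities, returned as a dict."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    fits = [b0 + b1 * xi for xi in x]
    ssr = sum((f - y_bar) ** 2 for f in fits)             # regression, 1 df
    sse = sum((yi - f) ** 2 for yi, f in zip(y, fits))    # error, n - 2 df
    ssto = sum((yi - y_bar) ** 2 for yi in y)             # total, n - 1 df
    msr = ssr / 1
    mse = sse / (n - 2)
    return {"SSR": ssr, "SSE": sse, "SSTO": ssto, "MSR": msr, "MSE": mse,
            "F": msr / mse, "t": b1 / math.sqrt(mse / sxx)}

# Hypothetical data, made up for illustration.
table = anova_table([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
```

In this table SSR plus SSE reproduces SSTO, and table["F"] matches table["t"] squared, which is the sense in which the F-test and the two-sided t-test agree.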
Goals: Understand the details of the ANOVA table, including the F-test.
Skills:
• Know the layout and the relationships of the ANOVA table. The sums of squares and degrees of freedom in an ANOVA table are additive. We can show the sums add using algebra; the degrees of freedom require advanced linear algebra. However, we can relate the degrees of freedom to estimation of parameters. The ratio of the two is a Mean Square, and ratios of Mean Squares form the basis of our F-tests.
• Understand the components broken down by the Sums of Squares decomposition. The sketch on page 64 helps remind us what quantities are involved in the sums of squares. The key notion is that we have different estimates involved, and the deviations from these estimates are the components in the sums of squares. Caution: not all sums of squares will add in this way; we must also check the orthogonality. Fortunately, for the regression results we encounter, the sums of squares always decompose.
• Know the F-test for testing the slope is zero. While we already have a test for the slope being zero, the t-test, we will not be able to get by with t-tests when we introduce more than one independent variable. One important note though is that the F-test can only test the two-sided alternate hypothesis.
Reading: Section 2.8.
Activity: GLM and R².
One convenient way of testing regression models is to test individual parameters, as we have been doing. For example, our basic test is whether the slope is zero or not. Another approach is to fit different models, and compare the SSE for each model. This new approach is called the General Linear Models approach, or GLM. We will see more of this approach later, in Chapter 7, but it is appropriate for us to see it now too.
The GLM test is another F-test, so we need a ratio of mean squares. The difference in this test is that one of the mean squares is found by subtraction. We follow the steps on the bottom of page 73. I will demonstrate using various null hypotheses, including β₁ = 0, β₁ = 2, and β₁ = β₀.
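Here is a minimal Python sketch of the full-versus-reduced comparison for two of these null hypotheses. The data and all function names are hypothetical. For the hypothesis that the slope is 2, the reduced model is fitted by first subtracting 2x from each y-value and then fitting a mean only.

```python
def sse_line(x, y):
    """SSE after fitting an intercept and slope by least squares (full model)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

def sse_mean_only(values):
    """SSE after fitting only a constant (reduced model with one parameter)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def glm_f(sse_reduced, df_reduced, sse_full, df_full):
    """F statistic: the numerator mean square is found by subtraction."""
    return ((sse_reduced - sse_full) / (df_reduced - df_full)) / (sse_full / df_full)

# Hypothetical data, made up for illustration.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
sse_full = sse_line(x, y)

# Null hypothesis slope = 0: the reduced model is a constant mean.
f_slope_zero = glm_f(sse_mean_only(y), n - 1, sse_full, n - 2)

# Null hypothesis slope = 2: transform the y-values, then fit a constant mean.
shifted = [yi - 2.0 * xi for xi, yi in zip(x, y)]
f_slope_two = glm_f(sse_mean_only(shifted), n - 1, sse_full, n - 2)
```

With these made-up points f_slope_zero is very large while f_slope_two is close to zero, which fits data that were invented to have a slope near 2.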
We have one last detail before we continue on to diagnostics and model checking. One common measure of the goodness of a model is the value of R². This value is simply the proportion of the total variation accounted for by the regression line. Note the misconceptions on page 75. It is important that we don't misuse this measure. It says what it says, nothing more. Its best use is in comparing different models.
Goals: Introduce the General Linear Models approach to testing hypotheses. Explore the goodness of fit measure R².
Skills:
• Know the strategy behind the General Linear Models approach. For simple hypotheses, like the slope is equal to a constant, we can use the GLM approach for testing. The key is to fit two models, and compare the SSE's appropriately.
• Know the details of using the GLM approach. After fitting the two models, we compare the SSE's in a new F-test. An important detail to worry about is that some models have to have the y-values transformed according to the null hypothesis. See the class notes.
• Know the calculation and interpretation of R². R² is a measure of the goodness of a model. It is the fraction of explained variation, as compared to the overall variation in the y-values.
Reading: Sections 3.1 to 3.6.
Activity: Residuals I.
So far we have discussed fitting a model. However, we must also check to make sure we have a reasonable model, and that all the assumptions of our model seem reasonable. Our chief tool for making these assessments is residual analysis. The behavior of the residuals, the deviations from the model fit, tells us a lot about the effectiveness of the model. We can use them to test for normality, for goodness of fit, for influential outliers, and other departures from the model.
We first will examine the formula for residuals and see what we can deduce about their behavior. For example, are they a linear combination of the y-values, as the slope and intercept were? We will take some time today to try our hand at algebraic manipulation to see if we can answer that question.
Next, we will check for goodness of fit by looking at plots of residuals versus independent variables, both those included in the model and those not yet included in the model. If our residual plots show any patterns, we have evidence of "lack of fit", or in the case of variables not yet included in the model, evidence of missing variables. What we're looking for is a random scatter of points. If we see patterns, such as increasing spread, or organized clustering, we suspect we have a not-so-perfect model. Sometimes we can correct the defect with remedial measures, which we will pursue on Days 10 and 11. Be cautious with your interpretations of these plots. It is tempting to say that something is a pattern when it really is just the result of randomness. Of course, this is somewhat of a judgment call. The more you study regression and use it in real world data, the better you will be at the art of model fitting.
Goals: Introduce residuals as a diagnostic tool.
Skills:
¥
Know the definition of residuals. The departures of the
data from our model are the residuals.
Each data value produces one residual, and they are measured in the same
units as the y-values.
We have encountered residuals before; they are the items being squared
and summed in the least squares exercise.
¥
Know how residuals are (roughly) distributed. Because
the residuals can be written as linear combinations of the y-values,
we know they have normal distributions.
Unfortunately, they aren't distributed with the same variance; their
variance depends on their distance from x-bar. However, we can use MSE as an
approximate variance.
¥
Know about basic residual plots. Our chief diagnostic
tool will be residual plots. If we
plot the residuals against the independent variable, we can see if we have lack
of fit, or non-constant variance, or extreme outliers. We can also plot residuals against
variables not already in the model to see if those variables would help explain
variation.
Reading: Sections 3.1 to 3.6.
Activity: Residuals II.
Continuing our exploration of residuals:
We check for normality of errors by looking at probability plots, histograms, etc., and comparing them to the corresponding normal curve plots. If we use normal probability plots, we can use the Looney table, Table B.6. There are other tests for normality that we can discuss, but the Looney table is the easiest to use. The details of constructing a normal probability plot are on page 111. Essentially we are comparing the actual data with where data of that rank (3rd smallest, 4th smallest, etc.) would fall if the data were truly normally distributed. If the data are normally distributed, this plot will be linear. To use the Looney table, we calculate a correlation coefficient for the normal probability plot. If it is large, the data look normal.
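The rank-based calculation can be sketched in Python as follows. I am assuming the expected value for the k-th smallest residual is the square root of MSE times the standard normal quantile at (k − 0.375)/(n + 0.25), which is my reading of the formula on page 111. The residuals are made up, and the function name is hypothetical. No critical value is computed here; that still comes from Table B.6.

```python
import math
from statistics import NormalDist

def normal_plot_correlation(residuals, mse):
    """Expected values under normality for the ordered residuals, and the
    correlation between the ordered residuals and those expected values."""
    n = len(residuals)
    ordered = sorted(residuals)
    expected = [math.sqrt(mse) * NormalDist().inv_cdf((k - 0.375) / (n + 0.25))
                for k in range(1, n + 1)]
    mean_o = sum(ordered) / n
    mean_e = sum(expected) / n
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(ordered, expected))
    var_o = sum((o - mean_o) ** 2 for o in ordered)
    var_e = sum((e - mean_e) ** 2 for e in expected)
    return expected, cov / math.sqrt(var_o * var_e)

# Hypothetical residuals from a five-point fit, with MSE = 0.107 / 3.
residuals = [0.06, -0.13, 0.18, -0.21, 0.10]
expected, r = normal_plot_correlation(residuals, 0.107 / 3)
```

Plotting the sorted residuals against expected gives the normal probability plot, and r is the correlation to compare with the tabled critical value.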
Another simple test we can perform is the Brown-Forsythe Test. We assume in our model that the variance of the error terms is constant, i.e. not dependent on the value of x. An easy way to check this is to compare the spread of the residuals associated with small x-values to the spread of the residuals associated with large x-values. The details of the test use the 2-sample t-test: we first calculate the absolute deviation of each residual from the median residual of its half, and then perform a pooled 2-sample t-test on these absolute deviations.
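A minimal Python sketch of the test statistic follows. The two groups of residuals are made up, with the second group deliberately more spread out, and the function name is hypothetical. The result would be compared with a t critical value with n − 2 degrees of freedom from a table.

```python
import math
from statistics import median

def brown_forsythe(residuals_low, residuals_high):
    """Pooled two-sample t statistic on the absolute deviations of the
    residuals from their own group median."""
    med1 = median(residuals_low)
    med2 = median(residuals_high)
    d1 = [abs(e - med1) for e in residuals_low]
    d2 = [abs(e - med2) for e in residuals_high]
    n1, n2 = len(d1), len(d2)
    mean1 = sum(d1) / n1
    mean2 = sum(d2) / n2
    pooled_var = (sum((d - mean1) ** 2 for d in d1) +
                  sum((d - mean2) ** 2 for d in d2)) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))

# Hypothetical residuals: the group at large x-values is four times as spread out.
low = [-1.0, 0.0, 1.0, 0.5, -0.5]
high = [-4.0, 0.0, 4.0, 2.0, -2.0]
t_bf = brown_forsythe(low, high)
```

A statistic far from zero in either direction suggests the spread of the residuals differs between the two halves.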
Goals: Know how to create and interpret a probability plot. Understand the Brown-Forsythe test.
Skills:
• Know the steps needed to create a normal probability plot. We have several options for creating a normal probability plot. Of course, we could use software, such as on the TI-83 or in MINITAB. But you will not be able to use the tests from the TI-83. Therefore, you should know how to create one yourself. Basically you are going to translate each rank using an inverse normal calculation, (3.6) on page 111. Then plot these inverses versus the data.
• Know how to use a normal probability plot to detect normality. If the data is normally distributed, the normal probability plot should be a straight line. The Looney and Gulledge Table (B.6) on page 1329 gives us the critical values of the correlation coefficient for assessing whether the observed line is close enough to straight. If the observed correlation is high enough, we conclude that normality is plausible.
• Know the details of the Brown-Forsythe Test. The Brown-Forsythe test helps us determine if the variance is constant. We split the data set into two parts, based on the independent variable, and find the size of the residuals in each half. The test statistic is a modified two-sample t-test, based on the absolute size of the residuals around their respective medians.
Reading: Section 3.7.
Activity: Lack of Fit.
When we have at least one x-value that has more than one observation, we can calculate a standard deviation for that "internal error". Using partitioning similar to that on Day 6, we can formulate a test to check for one kind of "lack of fit". We are able to estimate a "true" variance for the model, by pooling together all the variance estimates from all the unique x-values. If we compare this value to the MSE from the linear model, we have the basis for a test.
The lack-of-fit test is a GLM test. The Full model has a mean for each unique x-value. The Reduced model is the standard linear model we've been using. If we have c unique x-values, then our "pure error" sum of squares has n – c degrees of freedom. (We lose one degree of freedom for each unique mean we have to estimate to calculate the pure error sum of squares.) The standard linear model has n – 2 degrees of freedom. The details of the test are on page 123.
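The partition can be sketched in Python as below. The data are made up (roughly quadratic, with two observations at each of four x-values) so that a straight line fits poorly, and the function name is hypothetical. Here c counts the unique x-values.

```python
from collections import defaultdict

def lack_of_fit(x, y):
    """Return (SSPE, SSLF, c, F) for the lack-of-fit test of a straight line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))   # reduced model

    groups = defaultdict(list)            # responses collected by unique x-value
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    c = len(groups)

    sspe = 0.0                            # pure error: deviations from each group mean
    for ys in groups.values():
        group_mean = sum(ys) / len(ys)
        sspe += sum((yi - group_mean) ** 2 for yi in ys)

    sslf = sse - sspe                     # lack of fit, found by subtraction
    f_stat = (sslf / (c - 2)) / (sspe / (n - c))
    return sspe, sslf, c, f_stat

# Hypothetical replicated data, made up for illustration.
x = [1, 1, 2, 2, 3, 3, 4, 4]
y = [1.1, 0.9, 4.2, 3.8, 9.1, 8.9, 15.8, 16.2]
sspe, sslf, c, f_stat = lack_of_fit(x, y)
```

For these values the F statistic is about 80 on 2 and 4 degrees of freedom, far larger than anything the straight-line model would typically produce.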
A few comments on this technique: we need only have one x-value with repeats. The idea is that we get an estimate of the variance that is independent of the linear model. If we have no replicates, we can use near replicates. These require judgment; if we choose cases too far from each other, our estimate of the variance may be too large. If we choose too few cases, the degrees of freedom may be too small to be useful.
Goals: See how to use internal variances to check for lack of fit.
Skills:
• Know how to set up the Lack of Fit testing procedure. To set up the ANOVA table for the Lack of Fit test, we must calculate the variance of the responses at each unique x-value. (Note that x-values with no replicates contribute zero to the pure error sum of squares.) We pool all the variance estimates together, weighting by the degrees of freedom. The difference between the linear model SSE and this new pure error sum of squares (the numerator of the pooled variance), divided by its c – 2 degrees of freedom, is the numerator for our F-test. The pooled variance, MSPE, is the denominator.
• Know the details of when the Lack of Fit test can be used. When the null hypothesis is true, the test statistic (3.25) on page 124 has an F distribution. So, small values of F support the null hypothesis that the linear model is an appropriate model. The alternate is that some other model is appropriate. Notice we are not specifying what the other model is if we reject the null. The degrees of freedom for MSPE is n – c, so we need at least one repeated x-value. But note that not every x-value need be repeated.
• Near replicates require judgment. In some data sets, and with the judgment of the user, we can use clusters of points as "near replicates", pretending they are repeated x-values for purposes of calculating MSPE. Of course the further apart the real x-values are, the less likely that particular estimate of the variance is to be correct. One must use good judgment as to what constitutes "close enough".
Reading: Sections 3.8 to 3.9.
Activity: Transformations.
After we uncover departures from the model, we often use remedial measures to correct the model. The most common of these is a transformation of the response variable, but sometimes transforming the x-values can be effective.
We can use several prototype plots to help with our choice. A few examples are on pages 130 and 132. The curvature may be severe enough that even these transformations are insufficient. However, they are often a good first attempt at remediating the lack of fit.
Another procedure to select a transformation is the Box-Cox procedure. What we seek is a transformation of the Y variable that corrects the non-constant variance as well as the non-normal errors, if needed. The transformation is given on page 135, along with the formulas (3.36) for finding the optimal power. We will use Excel to implement these, but one could also have MINITAB perform the operations. The key is that we fit the model for many candidate powers and choose the power that makes SSE smallest.
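The search over powers can be sketched as follows. This is a Python illustration rather than our Excel worksheet, and it assumes the usual standardized form of the Box-Cox transformation (a multiple of Y to the power minus one, with the geometric mean of Y setting the scale, and the log used for power zero); check it against the formulas in the text before relying on it. The data are made up and noiseless so that the answer is known in advance, and the function names are only for illustration.

```python
import math


def sse_of_line(x, w):
    """SSE from the least squares line of w on x."""
    n = len(x)
    x_bar = sum(x) / n
    w_bar = sum(w) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxw = sum((xi - x_bar) * (wi - w_bar) for xi, wi in zip(x, w))
    b1 = sxw / sxx
    b0 = w_bar - b1 * x_bar
    return sum((wi - (b0 + b1 * xi)) ** 2 for xi, wi in zip(x, w))


def box_cox_sse(x, y, lam):
    """SSE after the standardized power transformation with power lam.

    Assumes every y is positive. The standardization puts all powers on a
    comparable scale so their SSE values can be compared directly.
    """
    n = len(y)
    k2 = math.exp(sum(math.log(v) for v in y) / n)  # geometric mean of y
    if lam == 0:
        w = [k2 * math.log(v) for v in y]           # power 0 means log
    else:
        k1 = 1.0 / (lam * k2 ** (lam - 1))
        w = [k1 * (v ** lam - 1) for v in y]
    return sse_of_line(x, w)


# Made-up data: y grows exponentially in x, so the log should win.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [math.exp(0.5 * xi) for xi in x]
powers = [i / 2 for i in range(-4, 5)]  # -2.0, -1.5, ..., 2.0
sse_by_power = {lam: box_cox_sse(x, y, lam) for lam in powers}
best_power = min(sse_by_power, key=sse_by_power.get)
print(best_power)
```

With real data the SSE values near the minimum are often close to each other, so a nearby convenient power (such as 0 or 0.5) is usually a sensible choice.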
Goals: Investigate using transformations to improve the model fit.
Skills:
• Be familiar with the prototype plots for making transformations. The prototypes on page 130 show us suggestions for transforming the x-variables to correct certain types of monotonic lack of fit. The ones on page 132 give us an idea of transformations for the y-variable to correct some types of non-constant variance. However, we may not find a suitable transformation just using these diagrams.
• Know the Box-Cox transformation procedure. To find a reasonable transformation of the response variable, the Box-Cox procedure searches for a suitable power. Because different powers put the response on different scales, we standardize the transformed values first; after that, we can compare SSE's for a variety of powers. The power that minimizes SSE is the most reasonable transformation, and it will often correct non-constant variance and lack of fit problems as well.
Reading: Sections 4.1 to 4.3.
Activity: Simultaneous Inference. Homework 2 due today.
Because the two least squares estimates are correlated, it is inappropriate to make inferences about them separately. When x-bar is positive, for example, a slope estimate that is too high tends to come with an intercept estimate that is too low. Simply making two separate interval estimates ignores this relationship: two 95% intervals need not both be correct 95% of the time, and widening them enough to guarantee 95% jointly is conservative, yielding a higher confidence coefficient than is needed. Today we will explore a more efficient method, making use of the correlation between the two estimates.
First, let's look at the two estimates jointly. Equation (4.5) on page 157 shows us the relationship between the two estimates. We can see this correlation from our simulation from Day 4. Notice that if x-bar is zero, the two estimates are uncorrelated, and because they are jointly normal, this makes them independent. Our text does not have a method to construct exact confidence regions when the two estimates are correlated, so I will show you a technique from an earlier edition of the text. Because the joint distribution is bivariate normal, we construct an ellipse around the estimates, lying on a tilt, wide or narrow according to the correlation.
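The correlation between the two estimates can be checked with a small simulation. The sketch below is in Python and is not the Day 4 worksheet; the true line (intercept 2, slope 0.5), the error standard deviation of 1, the x-values, and the seed are all made up for illustration. It compares the simulated correlation with the value implied by the covariance relationship, namely minus x-bar divided by the square root of the average of the squared x-values.

```python
import math
import random


def fit_line(x, y):
    """Least squares intercept and slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def correlation(u, v):
    """Sample correlation between two lists of numbers."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    suv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    suu = sum((a - u_bar) ** 2 for a in u)
    svv = sum((b - v_bar) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def simulated_correlation(x, reps=4000, seed=385):
    """Correlation between b0 and b1 over many simulated data sets."""
    rng = random.Random(seed)
    b0s, b1s = [], []
    for _ in range(reps):
        # Made-up true model: Y = 2 + 0.5 X + normal error with sd 1.
        y = [2.0 + 0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]
        b0, b1 = fit_line(x, y)
        b0s.append(b0)
        b1s.append(b1)
    return correlation(b0s, b1s)


x = list(range(1, 11))
x_centered = [xi - sum(x) / len(x) for xi in x]  # same spacing, x-bar = 0
theory = -(sum(x) / len(x)) / math.sqrt(sum(xi ** 2 for xi in x) / len(x))
r_raw = simulated_correlation(x)
r_centered = simulated_correlation(x_centered)
print(round(theory, 3), round(r_raw, 3), round(r_centered, 3))
```

With x running from 1 to 10 the estimates are strongly negatively correlated, while centering the x-values at zero should bring the simulated correlation close to zero.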
Here are the details of the method. We need to have the two estimates, b0 and b1, the sample mean, the sample sums of squares, MSE, and F. We then assemble them using the following formula:
n(b0 – β0)^2 + 2(ΣXi)(b0 – β0)(b1 – β1) + (ΣXi^2)(b1 – β1)^2 ≤ 2 MSE F(1 – α; 2, n – 2), where ΣXi is n times x-bar.
Note that we are plotting the boundary of this region in the b0/b1 axis system. The interpretation of this region is analogous to the interpretation of a confidence interval: with α = 0.05, there is a 95% chance that the constructed region captures the true parameters. We will compare this approach to the Bonferroni approach next.
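To see how the tilt of the region matters, we can check whether a candidate pair of parameter values falls inside it. The Python sketch below assumes the region has the standard form n(b0 – β0)^2 + 2(ΣXi)(b0 – β0)(b1 – β1) + (ΣXi^2)(b1 – β1)^2 ≤ 2 MSE F(1 – α; 2, n – 2). The summary numbers are made up, and the function names are only for illustration. Because the F distribution here always has 2 numerator degrees of freedom, its critical value has a closed form, so no table lookup is needed.

```python
def f_crit_2(alpha, df2):
    """Upper-alpha critical value of F with 2 and df2 degrees of freedom.

    With 2 numerator degrees of freedom the upper tail probability is
    (1 + 2f/df2) ** (-df2/2); setting that equal to alpha and solving for f
    gives the expression below.
    """
    return (df2 / 2.0) * (alpha ** (-2.0 / df2) - 1.0)


def in_joint_region(beta0, beta1, b0, b1, n, sum_x, sum_x2, mse, alpha=0.05):
    """True if (beta0, beta1) lies inside the joint confidence region."""
    d0 = b0 - beta0
    d1 = b1 - beta1
    lhs = n * d0 ** 2 + 2 * sum_x * d0 * d1 + sum_x2 * d1 ** 2
    return lhs <= 2 * mse * f_crit_2(alpha, n - 2)


# Made-up summary values for illustration only.
summary = dict(b0=10.0, b1=2.0, n=12, sum_x=60.0, sum_x2=400.0, mse=4.0)

center = in_joint_region(10.0, 2.0, **summary)       # the estimates themselves
along_tilt = in_joint_region(12.0, 1.7, **summary)   # intercept up, slope down
against_tilt = in_joint_region(12.0, 2.3, **summary) # intercept up, slope up
print(center, along_tilt, against_tilt)
```

In this made-up case x-bar is positive, so a higher intercept paired with a lower slope stays inside the ellipse, while a higher intercept paired with a higher slope falls outside it, even though each value on its own is the same distance from its estimate.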
The exact joint confidence region above is complicated at best, and is much worse for the multiple regression coming up later. A much simpler approach is to use the Bonferroni inequality (4.2) on page 155, which basically allows us to apportion the error probability into parts. For example, if we make 2 inference statements and want a 95% chance that both statements are true simultaneously, we could use 97.5% confidence levels on each statement. (We have divided up the 5% error into two equal parts in this case.) The 95% figure is a lower bound on the joint confidence, so we have at least a 95% chance that both statements are correct. The real advantage of the Bonferroni method is that it extends easily to multiple regression.
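The Bonferroni calculation for the intercept and slope can be sketched as follows. This is a Python illustration with made-up data, and the function name is hypothetical. The t multiplier is passed in rather than computed: with 2 statements and a 5% family error rate, each 97.5% interval uses the t value at 1 – 0.05/4 = 0.9875 with n – 2 degrees of freedom. The value used below is what I read from a t table for 3 degrees of freedom; verify it against your own table.

```python
import math


def bonferroni_intervals(x, y, t_crit):
    """Joint intervals for the intercept and slope using a supplied t value."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)

    # Estimated standard errors of the slope and the intercept.
    se_b1 = math.sqrt(mse / sxx)
    se_b0 = math.sqrt(mse * (1.0 / n + x_bar ** 2 / sxx))

    return {
        "b0": (b0 - t_crit * se_b0, b0 + t_crit * se_b0),
        "b1": (b1 - t_crit * se_b1, b1 + t_crit * se_b1),
    }


# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
t_crit = 4.177  # assumed table value of t(0.9875; 3); check your table
intervals = bonferroni_intervals(x, y, t_crit)
print(intervals)
```

Passing in a different multiplier is all it takes to split the error among more statements, which is why the method carries over so easily to multiple regression.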
Goals: Explore the important concept of simultaneous inference.
Skills:
• Recognize the issue of making multiple inference statements. When we make several confidence statements, the chance that they are all correct at once gets smaller and smaller as more and more statements are made. We have several approaches to dealing with this. One is to use the actual joint distributions of the estimates, but this approach is often quite complicated. Another approach is to use probability statements to produce conservative families of confidence intervals, such as with the Bonferroni method.
• Know the Bonferroni method. The