Multiple Regression
Models with Multiple Continuous Predictors

If we want to make predictions using several predictors instead of a single predictor, we use Multiple Regression techniques.

Multiple Regression MODEL

Yi = β0 + β1Xi1 + β2Xi2 + . . . + βp-1Xi,p-1 + ei
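Where:
  Yi is the value of the dependent variable for observation i,
  β0 is the constant and β1, . . . βp-1 are the coefficients of the predictors,
  Xi1, Xi2, . . . Xi,p-1 are the values of the p-1 predictors for observation i, and
  ei is the error for observation i.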

In the multiple regression model the coefficients are called partial regression coefficients because, as we will see later, the value of one coefficient, say β1, may well depend on whether another predictor, say Xi2, and its parameter β2 are included in the model.

Each partial regression coefficient is the proportion of an observation's Xij value by which we adjust our prediction, Y-hati.

We often refer to this model as the multiple regression model or the general linear model. Linear simply means that the separate predictors are first weighted by their respective βj and the results are then added together.
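To make "weighted and then added together" concrete, here is a minimal sketch in Python. The coefficient and predictor values are hypothetical, chosen only so the arithmetic is easy to follow; they are not estimates from any data set in this chapter.

```python
# Hypothetical coefficients and predictor values, for illustration only.
b = [30.0, 0.5, 0.25]     # b0 (constant), b1, b2
x = [100.0, 80.0]         # Xi1 and Xi2 for a single observation

def predict(b, x):
    """Weight each predictor by its coefficient, then add the results to b0."""
    weighted = [bj * xj for bj, xj in zip(b[1:], x)]
    return b[0] + sum(weighted)

y_hat = predict(b, x)     # 30 + 0.5*100 + 0.25*80
print(y_hat)
```

The same function works for any number of predictors, because the model only ever multiplies and adds.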

This model can be used to solve a vast array of statistical problems. Even the linear aspect is not a hindrance, as we will see later. However, the power and generality of the multiple regression model do not come without a cost. That cost is a definite increase in complexity and the introduction of some special problems that do not arise in the simple models we have previously discussed.


Redundancy
Suppose we were trying to predict the weights of a group of second graders using their height in centimeters and their height in inches as predictors. Obviously, the second predictor is entirely confounded with the first. Adding the second variable to the prediction would give us absolutely no advantage. We wouldn't be able to reduce our errors any more with both predictors than with just one.
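The height example can be checked numerically. The sketch below uses simulated heights and weights (not real data) and assumes NumPy is available. It fits the model once with height in centimeters alone and once with both height variables, then compares the two SSE values.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
height_cm = rng.normal(125.0, 6.0, n)      # simulated heights in centimeters
height_in = height_cm / 2.54               # the same information in inches
weight = 0.4 * height_cm - 25.0 + rng.normal(0.0, 2.0, n)  # simulated weights

def sse(predictors, y):
    """Least-squares SSE for a model with a constant plus the given predictors."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return float(resid @ resid)

sse_one = sse([height_cm], weight)
sse_both = sse([height_cm, height_in], weight)

# The three columns (constant, cm, inches) carry only two columns' worth of information.
rank_both = np.linalg.matrix_rank(np.column_stack([np.ones(n), height_cm, height_in]))
print(sse_one, sse_both, rank_both)
```

The two SSE values should agree to within rounding error, and the rank of 2 rather than 3 is the numerical signature of complete redundancy.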

Redundancy among predictors is seldom so extreme as measuring height in inches and centimeters. A more typical example in the social sciences would be predicting achievement in school children using verbal intelligence scores (VIQ) and performance intelligence scores (PIQ). Obviously, VIQ and PIQ are related, but they are not perfectly related. Sorting out the effects of predictor variables and their relative importance is a difficult task when there is redundancy among the predictors.

The following figure illustrates differing degrees of redundancy between two predictors:

Statistical Inference
The statistical inference procedures (steps 4 and 5 of our statistical procedure) are the same with multiple regression as they were before.
Step 4 Do the calculation.

  1. We fit our MODEL to the DATA by estimating parameters so as to minimize ERROR.
  2. We calculate PRE for MODEL A vs MODEL C.
  3. We calculate F (index of PRE per added parameter).
Step 5 Make the decision.
If PRE and F are surprising, then we reject MODEL C in favor of MODEL A.
Only the details of estimating parameters are really new. The statistical inference process itself is the same as before.
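
For reference, the two quantities calculated in Step 4 can be written out as follows. The symbols PA and PC (the number of parameters in MODEL A and MODEL C) and n (the number of observations) are notation assumed here, following the usual Judd and McClelland convention; SSE(A) and SSE(C) are the sums of squared errors of the two models.

```latex
\mathrm{PRE} = \frac{\mathrm{SSE}(C) - \mathrm{SSE}(A)}{\mathrm{SSE}(C)},
\qquad
F = \frac{\mathrm{PRE} / (PA - PC)}{(1 - \mathrm{PRE}) / (n - PA)}
```

The numerator of F is the PRE per added parameter, and the denominator is the remaining error per parameter that could still be added.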

Estimating Parameters in Multiple Regression
As in previous models, we want to find the least-squares estimators of β0, β1, . . . βp-1 that will minimize SSE. That is, we want to find estimates b0, b1, . . . bp-1 in the estimated model
Y-hati = b0 + b1Xi1 + b2Xi2 + . . . + bp-1Xi,p-1
so that the SSE is as small as possible.

If there is no redundancy in the predictors, the same formulas to estimate parameters given in Chapter 6 in Judd and McClelland can be used. If the predictors are related to one another (almost always the case except in experimental designs) we will leave all the calculations of parameter estimates to the computer software.

However, we must have a firm understanding of the concepts.
1. For one parameter, we used the mean which was a point which geometrically minimized SSE (variance).
2. For simple regression we estimated a constant and slope which produced a line which minimized SSE (mean square residual).
3. For multiple regression with two predictors we estimate a constant and two partial regression coefficients which produce a plane which minimizes SSE (mean square residual). The intersection of the plane with the Y axis defines b0, the slope of the plane with respect to Xi1 defines b1, and the slope with respect to Xi2 defines b2. Conceptually, the complicated computer algorithms simply find the location of the plane which minimizes the sum of squared errors, where an error is defined as the vertical distance from an observation to the model plane.
The following figure, created using SYSTAT, shows a regression plane created using two predictors to estimate a criterion score.

For more than two predictors, the model is equivalent to a hyperplane in a space with four or more dimensions, which can't be drawn.
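
The idea that the fitted plane is the one that minimizes SSE can be sketched in a few lines. The data below are simulated and NumPy's least-squares routine stands in for whatever algorithm a statistical package actually uses, so treat this as an illustration of the concept rather than a description of Statlets or SYSTAT.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(100.0, 15.0, n)   # simulated predictor 1
x2 = rng.normal(100.0, 15.0, n)   # simulated predictor 2
y = 20.0 + 0.5 * x1 + 0.2 * x2 + rng.normal(0.0, 10.0, n)  # simulated criterion

X = np.column_stack([np.ones(n), x1, x2])   # columns: constant, X1, X2
b, *_ = np.linalg.lstsq(X, y, rcond=None)   # b0, b1, b2 of the fitted plane

def sse(coefs):
    """Sum of squared vertical distances from the observations to a plane."""
    errors = y - X @ coefs
    return float(errors @ errors)

best = sse(b)
# Shifting or tilting the plane away from the least-squares solution raises SSE.
shifted = sse(b + np.array([1.0, 0.0, 0.0]))
tilted = sse(b + np.array([0.0, 0.01, 0.0]))
print(best, shifted, tilted)
```

Any other shift or tilt you try should also give a larger SSE than the fitted plane.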

Statistical Inference in Multiple Regression
The general procedure is the same as before.

The only change is that the extra parameters give us lots of freedom in defining MODEL A and MODEL C. The only difficulty is selecting the appropriate models. Once we have the models, the estimation of the parameters (using Statlets) and the calculation of PRE and F are easy. Each different question will have its own PRE and F value.

There are many possible questions we can ask using the models available with more estimated parameters. There are several models which repeatedly occur. We will evaluate these generic models separately.


Testing an Overall Model
Sometimes we want to ask whether our predictors as a group are any better than the simple model which uses no predictors but predicts the mean for every observation.
Step 1 H0: MODEL C = MODEL A
H1: MODEL C does not equal MODEL A
Step 2
alpha = .05
Step 3
Use the School Referrals data.

There is more data in this file than can be used in the academic version of Statlets. However, it is best to copy the entire file to the clipboard and then use Clip In as we have done in the past. Statlets will use the maximum amount of data it can handle.
Step 4
Select the Models/Regression/Multiple Regression procedure.
Fill out the dialog as follows:

Here are the results for the first 100 cases that Statlets can analyze.

                                       Standard          T
Parameter               Estimate         Error       Statistic      P-Value
---------------------------------------------------------------------------
CONSTANT                 31.5595         7.0776           4.46       1.0E-4
VIQ                     0.558374        0.10748           5.20       1.0E-4
PIQ                    0.0864393       0.105291           0.82       0.4137
---------------------------------------------------------------------------
 
                           Analysis of Variance
---------------------------------------------------------------------------
Source         Sum of Squares     Df    Mean Square    F-Ratio      P-Value
---------------------------------------------------------------------------
Model                 9604.43    2.0        4802.21      35.49       1.0E-4
Residual              13124.3   97.0        135.302
---------------------------------------------------------------------------
Total (Corr.)         22728.8   99.0
 
R-squared = 42.2567 percent
R-squared (adjusted for d.f.) = 41.0662 percent
Standard error of est. = 11.632
Coeff. of variation = 13.347 percent
Mean absolute error = 9.05871
Durbin-Watson statistic = 1.93272
 

Step 5
The decision(s). Look at the output above. What would your decision be?
Step 6
The summary statement(s)

Again, the ANOVA table at the bottom is the most important for the overall test.

The p value of .0001 means that if MODEL C were equal to MODEL A, we would expect to find an F value this large less than 1 time in 10,000. Thus, we clearly reject the null hypothesis that β1 = β2 = 0.

Note that the PRE (R-squared = 42.2567 percent) is reported along with an estimate of eta squared, the population PRE (R-squared adjusted for d.f. = 41.0662 percent).
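
The adjusted value can be reproduced by hand. The sketch below assumes the usual adjustment, 1 - (1 - PRE)(n - PC)/(n - PA), with n = 100 cases, PA = 3 parameters in MODEL A (constant, VIQ, PIQ), and PC = 1 parameter in MODEL C (constant only). The variable names are ours, not Statlets'.

```python
n = 100          # cases analyzed by Statlets
pa = 3           # parameters in MODEL A: constant, VIQ, PIQ
pc = 1           # parameters in MODEL C: constant only
pre = 0.422567   # R-squared from the output, written as a proportion

# Shrink PRE to allow for the parameters used up in fitting MODEL A.
adjusted = 1 - (1 - pre) * (n - pc) / (n - pa)
print(100 * adjusted)   # in percent, for comparison with the output
```

The result should agree with the reported 41.0662 percent up to rounding of the inputs.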

When considering more complex models or hypotheses, the important number to be able to locate in computer output is the SUM OF SQUARES for ERROR or as Statlets calls it the SUM OF SQUARES RESIDUAL. This is equal to SSE for whatever MODEL has been estimated. Once we know the SSE for the different MODELS, calculating PRE and F is easy.
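
As a check on how easy the calculation is, the sketch below recomputes PRE and F for the overall test from the two sums of squares in the output. It assumes SSE for MODEL C is the Total (Corr.) sum of squares, because MODEL C predicts the mean for every observation, and SSE for MODEL A is the Residual sum of squares. The function and argument names are ours.

```python
def pre_and_f(sse_c, sse_a, pc, pa, n):
    """Return PRE and F for MODEL A (pa parameters) vs MODEL C (pc parameters)."""
    pre = (sse_c - sse_a) / sse_c                    # proportional reduction in error
    f = (pre / (pa - pc)) / ((1 - pre) / (n - pa))   # PRE per added parameter over
    return pre, f                                    # remaining error per remaining parameter

# Sums of squares taken from the ANOVA table above.
pre, f = pre_and_f(sse_c=22728.8, sse_a=13124.3, pc=1, pa=3, n=100)
print(round(pre, 4), round(f, 2))
```

Both values should match the R-squared and F-Ratio in the output to within rounding. The same function can be reused for any pair of models once their SSE values are known.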


Note the standard error of estimate which is sometimes also called the ROOT MSE. Why would you guess this second name is used?
Note the parameter estimates.

While many textbooks on regression analysis stop here, these overall tests are quite problematic. First, if there were only one good predictor among the many you used in the model, the calculated F value might not indicate this fact, because the PRE per added parameter would be smaller. Simultaneously, the value of (1 - PRE) per remaining parameter would be larger than need be. Note that the first term is the numerator of the F statistic and the second term is the denominator. As Judd & McClelland note, you risk losing the needle (the one good predictor) in a haystack of useless predictors. In essence, by throwing away parameters on useless predictors you lose statistical power.
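
The needle-in-a-haystack problem can be seen with some made-up numbers. The sketch below holds PRE fixed at a hypothetical 0.10 with n = 100 and compares a focused test (one added parameter) with a diffuse test (ten added parameters). Holding PRE fixed is a simplification, since useless predictors usually raise PRE a little by chance.

```python
def f_stat(pre, added, n, pa):
    """F = (PRE per added parameter) / ((1 - PRE) per remaining parameter)."""
    return (pre / added) / ((1 - pre) / (n - pa))

n = 100
pre = 0.10   # hypothetical PRE, the same in both scenarios

focused = f_stat(pre, added=1, n=n, pa=2)     # constant + the one good predictor
diffuse = f_stat(pre, added=10, n=n, pa=11)   # constant + ten predictors, nine useless
print(focused, diffuse)
```

Spreading the same reduction in error over ten parameters shrinks the numerator tenfold and enlarges the denominator, so the diffuse F is far smaller than the focused one.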

The second reason for avoiding overall tests is that your results are ambiguous. (See all the possibilities on page 162 in Judd and McClelland).

The question we ask with the test of the overall model is so diffuse that we are generally unsure what the answer means. For the reasons of power and removing ambiguity it is almost always better to ask more focused questions involving one additional parameter (one degree of freedom).



Statlets is a wonderful statistical package. I would suggest that you explore all the other output tabs in the multiple regression analysis output. Particularly important are the visualization tools. Below are two possible 3D graphs. The first is found on the 3D Response tab, the second is found in the Plot menu. While both plots can be rotated in space by clicking the directional arrows on the right of the figure, the major advantage of the second plot is that it rotates the individual points quickly. Seeing a plot from a different perspective is often highly informative. Explore - have fun.






Brain Exercise

Do separate calculations using Statlets for these two models.
MODEL C READING = CONSTANT + VIQ
MODEL A READING = CONSTANT + VIQ + PIQ

Then, using hand calculations (you can use the program written for EDPSY 507; remember the SUM OF SQUARES ERROR (RESIDUAL)), calculate both the PRE and F statistics. Then compare them with the t values in the regression analysis above. Remember, to calculate an F from a t, you square the t value.
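
The t-to-F relationship can be sketched with the VIQ row of the output, which is deliberately not the comparison this exercise asks for. The sketch assumes t is the estimate divided by its standard error and that, for one added parameter, PRE can be recovered from F as F / (F + n - PA). The variable names are ours.

```python
estimate = 0.558374    # VIQ estimate from the regression output
std_error = 0.10748    # its standard error
n, pa = 100, 3         # cases analyzed; parameters in MODEL A

t = estimate / std_error     # should agree with the reported T statistic
f = t ** 2                   # F for a test of one added parameter
pre = f / (f + (n - pa))     # rearranged from F = PRE / ((1 - PRE) / (n - PA))
print(round(t, 2), round(f, 2), round(pre, 4))
```

Swapping in the PIQ row gives the F to compare against your hand calculation from the two SSE values.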


Continue to Part 2 of Chapter 8.