If we want to make predictions using several predictors instead of a single predictor, we use Multiple Regression techniques.
Multiple Regression MODEL
In the multiple regression model the coefficients are called partial regression coefficients because, as we will see later, the value of ß1, for example, may well depend on whether the predictor Xi2 and its parameter ß2 are included in the model.
The partial regression coefficients are the proportion of each observation's Xij value by which we adjust our prediction of y-hati.
We often refer to this model as multiple regression or the general linear model. Linear simply means that after the separate predictors are first weighted by their respective ßj, the results are added together.
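Written out, the model described above takes the following form. This is a sketch reconstructed from the estimated model given later in these notes; the error term (epsilon) is the usual assumption and is not shown in the surviving text, and the Greek beta here is the same symbol written as ß elsewhere in the notes.

```latex
% Multiple regression model with p parameters (p - 1 predictors plus an intercept)
Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_{p-1} X_{i,p-1} + \varepsilon_i

% Estimated model: each b_j is the least squares estimate of \beta_j
\hat{Y}_i = b_0 + b_1 X_{i1} + b_2 X_{i2} + \cdots + b_{p-1} X_{i,p-1}
```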
This model can be used to solve a vast array of statistical problems. Even the linear aspect is not a hindrance, as we will see later. However, the power and generality of the multiple regression model do not come without a cost; that cost is a definite increase in complexity and the introduction of some special problems that do not arise in the case of the simple models we have previously discussed.
Redundancy
Suppose we were trying to predict the weights of a group of second graders using their height in centimeters and their height in inches as predictors. Obviously, the second predictor is entirely confounded with the first. Adding the second variable to the prediction would give us absolutely no advantage. We wouldn't be able to reduce our errors any more with both predictors than with just one.
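The point can be checked numerically. The following is a minimal sketch using NumPy and made-up data (the heights, weights, and the helper name sse are all hypothetical, not from the course data): fitting weight on height in centimeters alone gives the same SSE as fitting it on centimeters and inches together.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data for illustration only: 30 hypothetical second graders.
height_cm = rng.normal(125.0, 6.0, size=30)
height_in = height_cm / 2.54  # same information, different units
weight_kg = 0.4 * height_cm - 25.0 + rng.normal(0.0, 2.0, size=30)

def sse(predictors, y):
    """Fit a least squares model with an intercept and return its SSE."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    # lstsq copes with the perfectly redundant column by returning
    # one of the many coefficient vectors that minimize SSE.
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ b
    return float(residuals @ residuals)

sse_one = sse([height_cm], weight_kg)
sse_both = sse([height_cm, height_in], weight_kg)

print(f"SSE, height in cm only: {sse_one:.4f}")
print(f"SSE, cm and inches:     {sse_both:.4f}")
```

The two printed values should agree up to rounding error, which is what "absolutely no advantage" means in terms of SSE.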
Redundancy among predictors is seldom so extreme as measuring height in inches and centimeters. A more typical example in the social sciences would be predicting achievement in school children using verbal intelligence scores (VIQ) and performance intelligence scores (PIQ). Obviously, VIQ and PIQ are related; however, they are not perfectly related. Sorting out the effects of predictor variables and their relative importance is a difficult task when there is redundancy among the predictors.
The following figure illustrates differing degrees of redundancy between two predictors:
Statistical Inference
The statistical inference procedures (steps 4 and 5 in our statistical procedures) are the same with multiple regression as they were before.
Step 4 Do the calculation.
Estimating Parameters in Multiple Regression
As in previous models, we want to find the least squares estimators of ß0, ß1, . . . ßp-1 that will minimize SSE. That is, we want to find estimates b0, b1, . . . bp-1 in the estimated model
Y-hati = b0 + b1Xi1 + b2Xi2 + . . . + bp-1Xi,p-1
so that the SSE is as small as possible.
If there is no redundancy in the predictors, the same formulas to estimate parameters given in Chapter 6 in Judd and McClelland can be used. If the predictors are related to one another (almost always the case except in experimental designs) we will leave all the calculations of parameter estimates to the computer software.
However, we must have a firm understanding of the concepts.
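To make the idea of "estimates that minimize SSE" concrete, here is a minimal sketch using NumPy's least squares solver on a tiny made-up data set (the numbers and variable names are hypothetical). It stands in for what the statistical software does; it is not the Statlets procedure itself.

```python
import numpy as np

# Made-up data for illustration: two predictors and an outcome, 6 observations.
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
Y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])

# Design matrix: a column of ones for b0, then one column per predictor.
X = np.column_stack([np.ones_like(X1), X1, X2])

# Least squares estimates b0, b1, b2.
b, *_ = np.linalg.lstsq(X, Y, rcond=None)

def sse(coefs):
    """Sum of squared errors for a given set of coefficients."""
    resid = Y - X @ coefs
    return float(resid @ resid)

sse_min = sse(b)
# Nudging any estimate away from the least squares solution increases SSE.
sse_nudged = sse(b + np.array([0.0, 0.1, 0.0]))

print("estimates:", b)
print("SSE at estimates:", sse_min)
print("SSE after nudging b1 by 0.1:", sse_nudged)
```

Any change to the estimates, in any direction, gives an SSE at least as large as the one at the least squares solution.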
For more than two predictors, the model is equivalent to a hyperplane in a space with four or more dimensions, which can't be drawn.
Statistical Inference in Multiple Regression
The general procedure is the same as before.
The only change is that the extra parameters give us lots of freedom in defining MODEL A and MODEL C. The only difficulty is selecting the appropriate models. Once we have the models, the estimation of the parameters (using Statlets) and the calculation of PRE and F are easy. Each different question will have its own PRE and F value.
There are many possible questions we can ask using the models available with more estimated parameters. There are several models which repeatedly occur. We will evaluate these generic models separately.
Testing an Overall Model
Sometimes we want to ask whether our predictors as a group are any better than the simple model which uses no predictors but predicts the mean for every observation.
Step 1 Ho: MODEL C = MODEL A
H1: MODEL C does not equal MODEL A
Step 2
alpha = .05
Step 3
Use the School Referrals data.
There is more data in this file than can be used in the academic version of Statlets. However, it is best to copy the entire file and paste it into the clipboard and then use Clip In as we have done in the past. Statlets will use the maximum amount of data it can handle.
Step 4
Select the Models/Regression/Multiple Regression procedure.
Fill out the dialog as follows:
Here are the results for the first 100 cases that Statlets can analyze.
                                   Standard         T
Parameter          Estimate         Error        Statistic     P-Value
---------------------------------------------------------------------------
CONSTANT           31.5595          7.0776          4.46        1.0E-4
VIQ                0.558374         0.10748         5.20        1.0E-4
PIQ                0.0864393        0.105291        0.82        0.4137
---------------------------------------------------------------------------

                         Analysis of Variance
---------------------------------------------------------------------------
Source           Sum of Squares     Df     Mean Square    F-Ratio    P-Value
---------------------------------------------------------------------------
Model                9604.43        2.0      4802.21       35.49     1.0E-4
Residual            13124.3        97.0       135.302
---------------------------------------------------------------------------
Total (Corr.)       22728.8        99.0

R-squared = 42.2567 percent
R-squared (adjusted for d.f.) = 41.0662 percent
Standard error of est. = 11.632
Coeff. of variation = 13.347 percent
Mean absolute error = 9.05871
Durbin-Watson statistic = 1.93272
Again, the ANOVA table at the bottom is the most important for the overall test.
The p value of .0001 (printed as 1.0E-4) means that if MODEL C were equal to MODEL A, we would expect to find an F value this large less than 1 time in 10,000. Thus, we clearly reject the null hypothesis that ß1 = ß2 = 0.
Note that the PRE, or R-squared = 42.2567 percent, is reported along with the estimated value for eta squared, or the population PRE: R-squared (adjusted for d.f.) = 41.0662 percent.
When considering more complex models or hypotheses, the important number to be able to locate in computer output is the SUM OF SQUARES for ERROR or, as Statlets calls it, the SUM OF SQUARES RESIDUAL. This is equal to SSE for whatever MODEL has been estimated. Once we know the SSE for the different MODELS, calculating PRE and F is easy.
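As a check on the output above, PRE, F, and the adjusted R-squared can be recomputed from the two sums of squares alone. The sketch below assumes the usual Judd and McClelland definitions, with PC and PA as the number of parameters in MODEL C and MODEL A; the function name pre_and_f is only illustrative. The Total (Corr.) sum of squares is the SSE for MODEL C (the mean-only model), and the Residual sum of squares is the SSE for MODEL A.

```python
def pre_and_f(sse_c, sse_a, pc, pa, n):
    """Return (PRE, F) for comparing MODEL C with MODEL A."""
    pre = (sse_c - sse_a) / sse_c
    f = (pre / (pa - pc)) / ((1.0 - pre) / (n - pa))
    return pre, f

# Values read from the Statlets output above.
sse_c = 22728.8   # Total (Corr.): SSE for MODEL C (intercept only)
sse_a = 13124.3   # Residual: SSE for MODEL A (intercept, VIQ, PIQ)
n, pc, pa = 100, 1, 3

pre, f = pre_and_f(sse_c, sse_a, pc, pa, n)

# Adjusted R-squared, the estimate of the population PRE.
adjusted = 1.0 - (1.0 - pre) * (n - pc) / (n - pa)

print(f"PRE = {pre:.4f}")                 # about 0.4226
print(f"F({pa - pc}, {n - pa}) = {f:.2f}")  # about 35.49
print(f"adjusted R-squared = {adjusted:.4f}")  # about 0.4107
```

These agree with the R-squared, F-Ratio, and adjusted R-squared lines in the output, up to rounding in the printed sums of squares.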
While many textbooks on regression analysis stop here, these overall tests are quite problematic. First, if there were only one good predictor among the many you used in the model, the F value calculated might not indicate this fact (the PRE per added parameter would be smaller). Simultaneously, the value of (1-PRE) per remaining parameter would be larger than need be. Note that the first term is the numerator in the F statistic and the second term is the denominator. As Judd & McClelland note, you risk losing the needle, the one good predictor, in a haystack of useless predictors. In essence, by wasting parameters on useless predictors you lose statistical power.
The second reason for avoiding overall tests is that your results are ambiguous. (See all the possibilities on page 162 in Judd and McClelland.)
The question we ask with the test of the overall model is so diffuse that we are generally unsure what the answer means. For the reasons of power and removing ambiguity it is almost always better to ask more focused questions involving one additional parameter (one degree of freedom).
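The loss of power can be illustrated with a small calculation. This is a sketch with hypothetical numbers (n = 100 and PRE = .10 are made up), and it assumes for simplicity that the useless predictors add nothing at all to PRE. The same reduction in error is tested once as a focused one-parameter question and once as an overall test of ten predictors.

```python
def f_statistic(pre, pa, pc, n):
    """F for comparing MODEL C (pc parameters) with MODEL A (pa parameters)."""
    return (pre / (pa - pc)) / ((1.0 - pre) / (n - pa))

n = 100       # hypothetical sample size
pre = 0.10    # hypothetical PRE, all of it due to one good predictor

# Focused test: MODEL A adds only the one good predictor to the mean.
f_focused = f_statistic(pre, pa=2, pc=1, n=n)

# Overall test: MODEL A adds the good predictor plus nine useless ones.
f_overall = f_statistic(pre, pa=11, pc=1, n=n)

print(f"focused test: F(1, {n - 2}) = {f_focused:.2f}")    # about 10.89
print(f"overall test: F(10, {n - 11}) = {f_overall:.2f}")  # about 0.99
```

With the same PRE, the focused test gives a large F while the overall test gives an F near 1: the numerator is divided among ten added parameters, and the denominator has fewer remaining parameters to spread (1-PRE) over.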


Exercise
Then, using the hand calculations (you can use the program written for EDPSY 507; remember the SUM OF SQUARES ERROR (RESIDUAL)), calculate both PRE and the F statistics. Then compare them with the t values in the regression analysis above. Remember, to calculate an F from a t, you square the t value.
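For the comparison step, the conversion from t to F can be sketched as follows. The t values and the residual degrees of freedom (97) are read from the Statlets output above; the conversion from F back to PRE assumes a one-parameter test, where F = PRE / ((1-PRE)/(n-PA)). The variable names are only illustrative.

```python
residual_df = 97  # n - PA, from the Residual row of the ANOVA table

# t statistics for the two predictors, from the Statlets output.
t_values = {"VIQ": 5.20, "PIQ": 0.82}

results = {}
for name, t in t_values.items():
    f = t ** 2                     # F with 1 and residual_df degrees of freedom
    pre = f / (f + residual_df)    # PRE implied by that F for a one-parameter test
    results[name] = (f, pre)
    print(f"{name}: t = {t:.2f}, F = t^2 = {f:.2f}, PRE = {pre:.4f}")
```

The F values you compute by hand from the SSE of the appropriate MODEL C and MODEL A should match these squared t values up to rounding.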