Testing Other Special Hypotheses
The number of special hypotheses that can be tested within multiple regression is virtually endless.
These are the general steps in the strategy
Let's work a problem with our School Referrals data set.
Suppose we are school psychologists who have come to believe that PIQ is as important as VIQ for predicting achievement. We can test this hypothesis. The output below, produced with SYSTAT, uses all 200 cases in the School Referrals data, with READ as the criterion measure.
MODEL A: READ = ßo + ß1VIQ + ß2PIQ + ei
STEP 1:
If we believe that VIQ and PIQ are equally important, then we believe that ß1 = ß2 or, stated differently, that ß1 - ß2 = 0.
That is our null hypothesis.
MODEL C: READ = ßo + ß1VIQ + ß1PIQ + ei
because ß1 = ß2 under the null hypothesis.
Or rewritten:
MODEL C: READ = ßo + ß1(VIQ + PIQ) + ei
Now the trick, and it is a fairly general trick, is to construct a new variable, say VIQPIQ for the combined IQs, which equals VIQ + PIQ. Again, do this in the Data Editor.
STEP 3a:
Here is the ANOVA table from MODEL C:
ANALYSIS OF VARIANCE

SOURCE        SUM-OF-SQUARES    DF    MEAN-SQUARE    F-RATIO        P
REGRESSION          9995.606     1       9995.606     60.807    0.000
RESIDUAL           32547.894   198        164.383
STEP 3b:
Here is the unrestricted MODEL A output.
ANALYSIS OF VARIANCE

SOURCE        SUM-OF-SQUARES    DF    MEAN-SQUARE    F-RATIO        P
REGRESSION         12512.502     2       6256.251     41.040    0.000
RESIDUAL           30030.998   197        152.442
STEP 3c:
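PRE = (SSE(C) - SSE(A)) / SSE(C) = (32547.894 - 30030.998) / 32547.894 = 0.07733
PA = 3, PC = 2
F = 16.51 with 1 and 197 degrees of freedom. We reject MODEL C, and with it the hypothesis that ß1 = ß2.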
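The arithmetic in STEP 3c can be checked with a few lines of Python. This is a minimal sketch: the function name `pre_and_f` is ours, not from the text or from SYSTAT, and the inputs are the two residual sums of squares from the ANOVA tables above.

```python
def pre_and_f(sse_c, sse_a, pc, pa, n):
    """Return (PRE, F) for comparing compact MODEL C with augmented MODEL A."""
    # Proportional reduction in error achieved by the extra parameter(s).
    pre = (sse_c - sse_a) / sse_c
    # F compares PRE per added parameter with remaining error per residual df.
    f = (pre / (pa - pc)) / ((1.0 - pre) / (n - pa))
    return pre, f


# Values taken from the MODEL C and MODEL A ANOVA tables (n = 200 cases).
pre, f = pre_and_f(sse_c=32547.894, sse_a=30030.998, pc=2, pa=3, n=200)
print(f"PRE = {pre:.5f}, F(1, 197) = {f:.2f}")
```

The same function applies to any pair of nested models once SSE(C), SSE(A), PC, PA, and n are known.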
This same strategy can be used to solve countless problems using regression techniques.
CROSS VALIDATION
See page 178 of the text. Suppose you know someone else's regression equation. That equation becomes MODEL C, with no parameters estimated from your data (PC = 0). You can calculate SSE(C) in the Data Editor of some statistical packages or in a spreadsheet program such as Microsoft Excel. MODEL A is fit to your sample data, with all of its parameters estimated. The usual PRE and F equations then apply. Of course, you have the same problems as with all tests that have multiple degrees of freedom.
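A rough sketch of the cross-validation comparison follows, for the simplest case of one predictor. The "published" coefficients and the sample data are made up for illustration and do not come from the text.

```python
# Hypothetical published equation: Y = 10.0 + 0.5 * X (MODEL C, PC = 0).
PUBLISHED_B0, PUBLISHED_B1 = 10.0, 0.5

# Made-up sample data, for illustration only.
x = [90, 95, 100, 105, 110, 115, 120, 125]
y = [56, 57, 62, 61, 66, 68, 69, 74]
n = len(y)

# SSE(C): errors of the published equation applied to our sample.
sse_c = sum((yi - (PUBLISHED_B0 + PUBLISHED_B1 * xi)) ** 2 for xi, yi in zip(x, y))

# MODEL A: estimate intercept and slope from our sample by least squares (PA = 2).
mean_x, mean_y = sum(x) / n, sum(y) / n
b1 = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
      / sum((xi - mean_x) ** 2 for xi in x))
b0 = mean_y - b1 * mean_x
sse_a = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

pc, pa = 0, 2
pre = (sse_c - sse_a) / sse_c
f = (pre / (pa - pc)) / ((1 - pre) / (n - pa))
print(f"SSE(C) = {sse_c:.3f}, SSE(A) = {sse_a:.3f}, PRE = {pre:.3f}, F(2, {n - pa}) = {f:.2f}")
```

Because PA - PC = 2 here, the resulting F is a two-degree-of-freedom test, which is why the caution about multiple-df tests applies.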
COMPLEX EXAMPLE
See page 179 of the text. The authors want to determine whether the coefficient for one predictor equals the average of the coefficients for two others, that is, whether ß1 = (ß2 + ß3)/2.

Their augmented model is:
GPA = ßo + ß1HSRANK +ß2SATV +ß3SATM + ei
The compact model was then
GPA = ßo + ((ß2 + ß3)/2)HSRANK + ß2SATV + ß3SATM + ei
multiply every term by 2
2GPA = 2ßo + (ß2 + ß3)HSRANK + ß2(2)SATV + ß3(2)SATM + 2ei
divide every term by 2 (or multiply by .5)
GPA = ßo +(ß2 + ß3)(.5)HSRANK +ß2SATV + ß3SATM + ei
do the multiplications
GPA = ßo + ß2(.5)HSRANK + ß3(.5)HSRANK +ß2SATV + ß3SATM + ei
collect similar terms
GPA = ßo + ß2((.5)HSRANK + SATV) + ß3((.5)HSRANK + SATM) + ei
We then use the trick mentioned previously and construct the two new terms: X'1 = (.5)HSRANK + SATV and X'2 = (.5)HSRANK + SATM.
Using these new terms, the compact model can be estimated.
GPA = ßo + ß2X'1 +ß3X'2 + ei
Notice the superscripts on the predictor variables indicating that they are derived variables.
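The derived-variable trick can be sketched in plain Python. Everything here is an assumption for illustration: the data are made up and rescaled to small numbers, and `ols_sse` is our own helper that solves the normal equations, not a routine from SYSTAT or Statlets.

```python
def ols_sse(predictors, y):
    """Fit y on an intercept plus the given predictor columns; return (coefs, SSE)."""
    n = len(y)
    cols = [[1.0] * n] + [[float(v) for v in c] for c in predictors]
    k = len(cols)
    # Augmented normal equations: (X'X | X'y).
    a = [[sum(ci[r] * cj[r] for r in range(n)) for cj in cols]
         + [sum(ci[r] * y[r] for r in range(n))] for ci in cols]
    # Gauss-Jordan elimination with partial pivoting.
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(a[r][i]))
        a[i], a[p] = a[p], a[i]
        for r in range(k):
            if r != i:
                m = a[r][i] / a[i][i]
                a[r] = [vr - m * vi for vr, vi in zip(a[r], a[i])]
    b = [a[i][k] / a[i][i] for i in range(k)]
    sse = sum((y[r] - sum(b[j] * cols[j][r] for j in range(k))) ** 2 for r in range(n))
    return b, sse


# Made-up, rescaled data for illustration only.
hsrank = [6.1, 7.5, 8.2, 5.4, 9.0, 6.8, 7.9, 8.8, 5.9, 7.2]
satv = [4.8, 5.5, 6.1, 4.2, 6.6, 5.0, 5.9, 6.3, 4.6, 5.2]
satm = [5.1, 5.3, 6.4, 4.9, 6.2, 5.6, 5.5, 6.8, 4.4, 5.8]
gpa = [2.6, 3.0, 3.4, 2.3, 3.7, 2.8, 3.1, 3.6, 2.4, 3.0]
n = len(gpa)

# Derived predictors for the compact model.
x1_prime = [0.5 * h + v for h, v in zip(hsrank, satv)]
x2_prime = [0.5 * h + m for h, m in zip(hsrank, satm)]

_, sse_a = ols_sse([hsrank, satv, satm], gpa)   # augmented model, PA = 4
_, sse_c = ols_sse([x1_prime, x2_prime], gpa)   # compact model, PC = 3

pre = (sse_c - sse_a) / sse_c
f = (pre / (4 - 3)) / ((1 - pre) / (n - 4))
print(f"SSE(A) = {sse_a:.4f}, SSE(C) = {sse_c:.4f}, PRE = {pre:.4f}, F(1, {n - 4}) = {f:.3f}")
```

Fitting the compact model on the two derived columns is what imposes the restriction: the implied HSRANK coefficient is forced to equal half the sum of the other two.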

DEP VAR: UN   N: 10   MULTIPLE R: 0.313   SQUARED MULTIPLE R: 0.098
ADJUSTED SQUARED MULTIPLE R: 0.000   STANDARD ERROR OF ESTIMATE: 0.972

VARIABLE    COEFFICIENT    STD ERROR    STD COEF    TOLERANCE         T    P(2 TAIL)
CONSTANT         -0.035        3.081       0.000            .    -0.011        0.991
IP                0.021        0.022       0.313        1.000     0.931        0.379

ANALYSIS OF VARIANCE

SOURCE        SUM-OF-SQUARES    DF    MEAN-SQUARE    F-RATIO        P
REGRESSION             0.819     1          0.819      0.867    0.379
RESIDUAL               7.557     8          0.945

The same analysis can be run in Statlets using the menus Models/Regression/Simple Regression. Note that everything checks with the SYSTAT output.
Analysis of Variance

Source           Sum of Squares    Df    Mean Square     F-Ratio    P-Value
Model                   0.81931     1        0.81931    0.867375     0.3789
Residual                7.55669     8       0.944586
Total (Corr.)             8.376     9
Obviously the answer is that we fail to reject the null hypothesis.
Multiple Regression Analysis
Dependent variable: UN

Parameter      Estimate    Standard Error    T Statistic    P-Value
CONSTANT     -0.0351724           3.08106          -0.01     0.9912
IP            0.0206897         0.0222152           0.93     0.3789

Analysis of Variance

Source           Sum of Squares     Df    Mean Square    F-Ratio    P-Value
Model                   0.81931    1.0        0.81931       0.87     1.0000
Residual                7.55669    8.0       0.944586
Total (Corr.)             8.376    9.0

Note that the F ratio of .87 with 1 and 8 degrees of freedom is reported here with a p-value of 1.00. This differs dramatically from the p-value of .3789 given for the same statistic above. For now, when doing bivariate regressions, use the Simple Regression routine. If you use the Multiple Regression routine to solve a bivariate problem, square the t value and use its probability.
This counterintuitive result (that industrial production is not related to unemployment) suggests that we look more closely at the data.
Looking at the data and the following scattergram, we see that unemployment increases across the years. Could YR be used to predict Unemployment?

DEP VAR: UN   N: 10   MULTIPLE R: 0.654   SQUARED MULTIPLE R: 0.428
ADJUSTED SQUARED MULTIPLE R: 0.357   STANDARD ERROR OF ESTIMATE: 0.774

VARIABLE    COEFFICIENT    STD ERROR    STD COEF    TOLERANCE         T    P(2 TAIL)
CONSTANT          1.673        0.529       0.000            .     3.166        0.013
YR                0.208        0.085       0.654        1.000     2.447        0.040

ANALYSIS OF VARIANCE

SOURCE        SUM-OF-SQUARES    DF    MEAN-SQUARE    F-RATIO        P
REGRESSION             3.586     1          3.586      5.989    0.040
RESIDUAL               4.790     8          0.599

Notice that YR is a reliable predictor of UN, with PRE = .43.
To examine the data further, let's look at the ERROR, or the residuals, remaining from this prediction. It is easy to produce the residuals using Statlets.
Using the menus Models/Regression/Multiple Regression, click the Report tab, then the Options button, and finally select the Residuals option.

Regression results for UN

Row      Residual
  1       1.21818
  2     -0.190303
  3     -0.598788
  4     -0.907273
  5      0.484242
  6     -0.224242
  7     -0.532727
  8     -0.441212
  9        1.1503
 10     0.0418182

If you want (and we will want), you can add these values to the original data set.
Your residuals should look like the variable UNOYR in the data set. We have named this variable using the same symbolism that the Judd and McClelland text uses on page 191. Below is the Statlets data page showing these data.

Notice how these values are identical to those in the text except for the number of places displayed.
What does UNOYR tell us? The particular values of UNOYR tell us when, relative to our MODEL, UN is unexpectedly large or small. For example, in the first year, unemployment was 3.1 million persons. The residual of 1.218 tells us that this value was 1.218 million higher than expected given the regression equation.
Another way to think about these residuals is that they are the part of the original data that cannot be predicted with YR. If any other variable is to be useful for making conditional predictions in a multiple regression, then that variable must be able to predict these residuals.
It makes sense to reexamine the original question about the relationship between unemployment and industrial production. Although IP was not a useful predictor by itself of UN, maybe IP is useful for predicting when UN is higher or lower than expected relative to the model of yearly changes.
It is important to make the distinction that asking whether a predictor variable is useful by itself is a very different question from asking whether it is useful after controlling for one or more other variables.
We are now asking whether IP is a useful predictor of UN after controlling for YR or after statistically removing the effects of YR. Equivalently, we could say we are testing to see whether IP reduces error in predicting UN over and above the reduction achieved by using YR.
You could proceed directly to the solution, except that the coefficients would be hard to interpret because while YR has been removed from UN in the UNOYR residuals, YR is still confounded with IP (the two predictors are correlated).
To see if the nonredundant part of IP can predict the UNOYR scores, we need to remove YR from the IP scores. To do this simply regress IP on YR and save those residuals. Then copy those residuals to your original data file. In your text as well as in our data set they are named IPOYR.
The data set, scrolled so that the variable IPOYR is visible, is reproduced again directly below.
Note that there is an error on page 193 of your text: when YR is used to predict IP, the coefficient is 4.36, not 4.75.
Now we can ask a critical question of whether IPOYR reduces the error in predicting UNOYR relative to a simple model for UNOYR. Maybe when unemployment is unexpectedly high, industrial production is unexpectedly low? Note that this is a still more sophisticated version of our original question, which simply asked if there was a relationship between unemployment and industrial production without any reference to expectations.
UNOYR tells us when unemployment is unexpectedly high or low given YR.
IPOYR tells us when industrial production is unexpectedly high or low given YR.
The effects of YR have been removed from both UN and IP. If we look at the relationship between UNOYR and IPOYR, we have a purer look at the relationship between UN and IP with the confounding effects of YR eliminated. Statisticians often say that YR has been "controlled for."
Look at this scattergram produced by Statlets. What do you think the answer to the question is?

It certainly looks like a relationship exists. To answer this question statistically, we simply use the procedures we already know. However, there are two small complications.
The first complication is that we know a priori that the means of the residuals are zero because the sums of residuals must be zero in a least squares model. Therefore, the constant or intercept must also be zero. So we won't need to estimate it.
The second complication is that UNOYR is "used" data. We already estimated two parameters (the intercept and the coefficient for YR) to produce it. Therefore, instead of thinking of it as having an n of 10, we must think of it as having an n of 8.
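The first complication can be checked directly from the residuals listed earlier in the Statlets report. This short sketch uses only those ten printed values, so the sum is zero only up to the rounding in the printout.

```python
# UNOYR: residuals from regressing UN on YR, copied from the Statlets report.
unoyr = [1.21818, -0.190303, -0.598788, -0.907273, 0.484242,
         -0.224242, -0.532727, -0.441212, 1.1503, 0.0418182]

total = sum(unoyr)                   # should be zero apart from rounding
mean = total / len(unoyr)            # so the mean is zero as well
sse = sum(r * r for r in unoyr)      # should match RESIDUAL SS for UN on YR

print(f"sum = {total:.6f}, mean = {mean:.6f}, sum of squares = {sse:.3f}")
```

The sum of squares of these residuals is the RESIDUAL sum of squares from the regression of UN on YR, which is the error that IPOYR now has a chance to reduce.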
MODEL C: UNOYR = 0 + ei
MODEL A: UNOYR = 0 + ß1IPOYR + ei
Your text mentions that most regression programs have an option for estimating the regression equation without an intercept, but here it will not matter. The residual SSE, the PRE, and the coefficient b1 will be the same whether or not you estimate the intercept. The F value reported by the program will be incorrect, however, because the df it uses are incorrect. You will need to calculate your own.
Here are the results of the Statlets Simple Regression output using the Summary and ANOVA tabs.

Regression Analysis for UNOYR versus IPOYR
Model type: Linear
Equation: UNOYR = -1.03515E-17 - 0.103313*IPOYR

Coefficient        Estimate    Std. Error        t-value    P-value
Intercept      -1.03515E-17      0.118652    -8.7242E-17        1.0
Slope             -0.103313     0.0202566        -5.1002     9.0E-4

Correlation = -0.8745
R-squared = 76.48%
Std. error of est. = 0.375211

Analysis of Variance

Source           Sum of Squares    Df    Mean Square    F-Ratio    P-Value
Model                   3.66207     1        3.66207    26.0121     9.0E-4
Residual                1.12627     8       0.140783
Total (Corr.)           4.78834     9

Note the agreement with your text on the PRE and the coefficient of -.103.
To calculate the same F (with 1 and 7 degrees of freedom) you will need to use the hand calculation formula. You have PRE for this model.
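The hand calculation can be sketched as follows, using the sums of squares from the Statlets ANOVA table for UNOYR versus IPOYR. The variable names are ours. The only change from what the program prints is the error degrees of freedom: 7 (n - PA = 10 - 3, or equivalently an n of 8 minus the one slope) rather than 8.

```python
# Sums of squares from the Statlets ANOVA table for UNOYR versus IPOYR.
ss_model, ss_residual = 3.66207, 1.12627

pre = ss_model / (ss_model + ss_residual)

# What the program reports: residual mean square based on 8 df.
f_reported = ss_model / (ss_residual / 8)

# Hand calculation with the correct 1 and 7 degrees of freedom.
f_correct = (pre / 1) / ((1 - pre) / 7)

print(f"PRE = {pre:.4f}")
print(f"F as reported (1, 8 df) = {f_reported:.2f}")
print(f"F corrected   (1, 7 df) = {f_correct:.2f}")
```

The corrected F is noticeably smaller than the reported one, and it agrees (within rounding) with the square of the t value for IP in the multiple regression of UN on IP and YR.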
The conclusion from this analysis as it might be stated in a journal article could be:

This conclusion contrasts sharply with the result of our first simple regression. The yearly changes were suppressing the relationship between IP and UN.
Variables which mask or suppress the simple relationship between other variables are known as suppressor variables.
Multiple Regression Analysis
Dependent variable: UN

Parameter     Estimate    Standard Error    T Statistic    P-Value
CONSTANT       13.4539           2.48385           5.42     0.0010
IP           -0.103339         0.0216551          -4.77     0.0020
YR            0.659417          0.104305           6.32     4.0E-4

Analysis of Variance

Source           Sum of Squares     Df    Mean Square    F-Ratio    P-Value
Model                   7.24976    2.0        3.62488      22.53     3.0E-4
Residual                1.12624    7.0       0.160891
Total (Corr.)             8.376    9.0

R-squared = 86.554 percent
R-squared (adjusted for d.f.) = 82.7123 percent
Standard error of est. = 0.401112
Coeff. of variation = 14.2238 percent
Mean absolute error = 0.271585
Durbin-Watson statistic = 1.32794
Remember that the PRE values can be found from the F values and to find the F values we can square the t values. The formula for PRE is:
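For a test of a single added parameter, solving the F equation for PRE gives PRE = F / (F + (n - PA)), with F = t squared. The sketch below applies this to the t values in the multiple regression output above; the helper name `pre_from_t` is ours, and n = 10 and PA = 3 are taken from that output.

```python
def pre_from_t(t, n, pa):
    """Partial PRE for one parameter, from its t statistic in the augmented model."""
    f = t ** 2                 # F for a 1-df test is the squared t
    return f / (f + (n - pa))  # obtained by solving F = PRE / ((1 - PRE) / (n - PA))


n, pa = 10, 3                  # ten years; intercept, IP, and YR estimated

pre_ip = pre_from_t(-4.77, n, pa)   # IP, controlling for YR
pre_yr = pre_from_t(6.32, n, pa)    # YR, controlling for IP

print(f"PRE for IP = {pre_ip:.3f}")
print(f"PRE for YR = {pre_yr:.3f}")
```

Because the printed t values are rounded to two decimals, the results agree with the residual-on-residual PRE only to about three decimal places.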
The interpretation of a parameter estimate in multiple regression is the same as the interpretation we developed regressing residuals on one another.
Clearly and fortunately we do not need to do the laborious series of simple regressions because we get the same information from a single multiple regression analysis.
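The equivalence can be illustrated with a small sketch. The data below are made up for illustration (they are not the UN, IP, and YR values from the text), and the helper names are ours. The slope from regressing one set of residuals on the other is compared with the IP coefficient from the two-predictor regression.

```python
def mean(v):
    return sum(v) / len(v)


def simple_fit(x, y):
    """Least-squares intercept, slope, and residuals for y regressed on x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    resid = [b - (intercept + slope * a) for a, b in zip(x, y)]
    return intercept, slope, resid


def cross(u, v):
    """Centered cross-product of two columns."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


# Made-up data for illustration only.
yr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ip = [100, 104, 111, 113, 121, 127, 130, 138, 141, 149]
un = [3.0, 2.6, 2.4, 2.9, 2.7, 2.9, 3.4, 3.2, 3.9, 3.6]

# Route 1: remove YR from both variables, then regress residual on residual.
_, _, un_o_yr = simple_fit(yr, un)
_, _, ip_o_yr = simple_fit(yr, ip)
_, slope_resid, _ = simple_fit(ip_o_yr, un_o_yr)

# Route 2: coefficient for IP in the regression of UN on both IP and YR,
# from the standard two-predictor normal-equation solution.
s11, s22, s12 = cross(ip, ip), cross(yr, yr), cross(ip, yr)
s1y, s2y = cross(ip, un), cross(yr, un)
b_ip = (s22 * s1y - s12 * s2y) / (s11 * s22 - s12 ** 2)

print(f"slope from residuals = {slope_resid:.6f}")
print(f"multiple regression coefficient for IP = {b_ip:.6f}")
```

The two routes give the same coefficient, which is why the series of simple regressions is unnecessary in practice.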
Remember that the textbook complains about the partial correlations reported by SAS. Compare the PRE values we calculated with the PARTIAL CORR TYPE II values reported on page 197: they are the same. The partial correlation is simply the square root of the PRE value, with the sign of the regression coefficient. So the partial correlation between UN and IP has a magnitude of sqrt(.765) = .875. Interested students can calculate the correlation between UN and IP with YR partialed out (correlate the two residuals, UNOYR and IPOYR). You will find that the Pearson correlation is -.875. The partial correlation between UN and YR equals sqrt(.851) = .922. This value is the correlation between the residuals of UN and YR with IP partialed from each.