
Testing the Addition of One More Predictor

Working Dumb and Hard
We would like to test the idea that adding PIQs to a regression analysis in addition to VIQs improves our prediction of READ scores.

Step 1
Ho: MODEL C = MODEL A
H1: MODEL C not= MODEL A
Note that MODEL C will only contain VIQ as a predictor while MODEL A will contain both VIQ and PIQs as predictors.

Step 2
alpha = .05

Step 3
You will want to use the School Referrals data set.

Step 4
As you remember from the last lecture the SSE in each output is the sum of squared errors for the model being estimated. In most statistical packages this is referred to as the Residual Sum-of-squares. Therefore for MODEL C regress READ on VIQ. Then for MODEL A regress READ on VIQ and PIQ.

The following output was produced using SYSTAT (another professional statistical program). These results are presented here because unlike the version of Statlets we are using, SYSTAT can handle large sample sizes. Later you will be asked to redo this analysis using the School Referrals data set and Statlets.

Here are the results for MODEL C:

Dep var:    READ      N:  200    Multiple R:  .542    Squared multiple R:  .294

Adjusted squared multiple R:  .290     Standard error of estimate:       12.318

  Variable    Coefficient    Std error     Std coef Tolerance    T    P(2 tail)

CONSTANT           42.262        5.086        0.000  .           8.309    0.000
     VIQ            0.530        0.058        0.542  .100E+01    9.077    0.000


                             Analysis of Variance

   Source   Sum-of-squares    DF  Mean-square     F-ratio       P

 Regression      12502.171     1    12502.171      82.401       0.000
   Residual      30041.329   198      151.724

NOTE that the Sum-of-squares residual is 30041.329.

Here is the output for the Augmented model:

Dep var:    READ      N:  200    Multiple R:  .542    Squared multiple R:  .294

Adjusted squared multiple R:  .287     Standard error of estimate:       12.347

  Variable    Coefficient    Std error     Std coef Tolerance    T    P(2 tail)

CONSTANT           42.882        5.627        0.000  .           7.620    0.000
     VIQ            0.543        0.078        0.556 0.5697305    7.006    0.000
     PIQ           -0.019        0.075       -0.021 0.5697305   -0.260    0.795


                             Analysis of Variance

   Source   Sum-of-squares    DF  Mean-square     F-ratio       P

 Regression      12512.502     2     6256.251      41.040       0.000
   Residual      30030.998   197      152.442
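Note that the Sum-of-squares residual for this model is 30030.998.

PRE = (30041.329 - 30030.998)/30041.329 = .00034

F = (PRE/(PA-PC))/((1-PRE)/(n-PA)) = (.00034/1)/(.99966/197) = .067, or if the square root is taken to get a t value it would be = .2588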
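
As a check on the arithmetic, the comparison of MODEL C and MODEL A can be sketched in a few lines of Python (this is not part of the SYSTAT or Statlets workflow). The function name `pre_and_f` is made up for this illustration; the sums of squares, n = 200, and the parameter counts (PC = 2, PA = 3) come from the output above. The sketch works from the unrounded PRE, so the last digit can differ slightly from a hand calculation done with a rounded PRE.

```python
import math

def pre_and_f(sse_c, sse_a, pc, pa, n):
    """Return (PRE, F) for a compact-vs-augmented model comparison.

    sse_c, sse_a: residual sums of squares of the compact and augmented models
    pc, pa:       number of estimated parameters in each model (constant included)
    n:            number of observations
    """
    pre = (sse_c - sse_a) / sse_c
    f = (pre / (pa - pc)) / ((1 - pre) / (n - pa))
    return pre, f

# Values read from the SYSTAT output: READ on VIQ vs. READ on VIQ and PIQ.
pre, f = pre_and_f(sse_c=30041.329, sse_a=30030.998, pc=2, pa=3, n=200)
t = math.sqrt(f)  # magnitude only; the sign comes from the PIQ coefficient

print(f"PRE = {pre:.5f}, F(1, 197) = {f:.3f}, |t| = {t:.2f}")
```

The magnitude of t agrees with the -0.260 reported for PIQ in the augmented output.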

Working Fast and Smart
Look at the augmented output above. First remember that the PRE value we calculated for adding PIQ to the model was .00034. The t value of -.26 for PIQ is reported in the coefficient table. If we square it we get F = .068, which agrees with the F computed from PRE up to rounding. This t value tests whether adding this variable in an augmented model improves prediction accuracy over a compact model that includes every other variable. In SYSTAT you can even get the F values directly by using the Test command after the full regression. Here is the dialog box, and then the Test results from SYSTAT.

TEST FOR EFFECT CALLED:
                   CONSTANT

TEST OF HYPOTHESIS

      SOURCE       SS        DF       MS              F              P

  HYPOTHESIS     8851.675     1     8851.675         58.066          0.000
       ERROR    30030.998   197      152.442

--------------------------------------------------------------------------------------
TEST FOR EFFECT CALLED:
                        VIQ

TEST OF HYPOTHESIS

      SOURCE       SS        DF       MS              F              P

  HYPOTHESIS     7483.192     1     7483.192         49.089          0.000
       ERROR    30030.998   197      152.442

--------------------------------------------------------------------------------------
TEST FOR EFFECT CALLED:
                        PIQ

TEST OF HYPOTHESIS

      SOURCE       SS        DF       MS              F              P

  HYPOTHESIS       10.331     1       10.331          0.068          0.795
       ERROR    30030.998   197      152.442

--------------------------------------------------------------------------------------

Note the F value of .068 reported for PIQ. Obviously, the F value of 49.089 is testing the significance of adding VIQ to a compact model that contains PIQ as the predictor.

From the reported t or F values, the PRE values are easily calculated. Note that these are 1 df tests in the numerator.

This PRE for the addition of exactly one parameter has the special name of coefficient of partial determination. The square root of this special PRE is usually called the partial correlation coefficient because it is the simple correlation between Y and Xp when the effects of the other p-1 predictors have been removed from both Y and Xp.
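
Solving the F formula for PRE gives PRE = F(PA-PC) / (F(PA-PC) + (n-PA)). The sketch below applies that rearrangement to the two F values from the Test output and then takes the square root to get the partial correlation. The helper name `pre_from_f` is hypothetical, and giving the partial correlation the sign of the regression coefficient is done by hand here, since the square root alone only gives the magnitude.

```python
import math

def pre_from_f(f, df_num, df_error):
    """Recover PRE from an F-ratio and its degrees of freedom."""
    return (f * df_num) / (f * df_num + df_error)

# F values from the SYSTAT Test output; both are 1 and 197 df tests.
pre_piq = pre_from_f(0.068, df_num=1, df_error=197)
pre_viq = pre_from_f(49.089, df_num=1, df_error=197)

# Partial correlations: magnitude from sqrt(PRE), sign copied from the
# coefficients in the augmented output (PIQ = -0.019, VIQ = 0.543).
partial_r_piq = math.copysign(math.sqrt(pre_piq), -0.019)
partial_r_viq = math.copysign(math.sqrt(pre_viq), 0.543)

print(f"PIQ: PRE = {pre_piq:.5f}, partial r = {partial_r_piq:.3f}")
print(f"VIQ: PRE = {pre_viq:.4f}, partial r = {partial_r_viq:.3f}")
```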


Using Statlets

Now you know enough not to work dumb and hard. Use Statlets and School Referrals to answer the following question. Does adding PIQ to a model that already contains VIQ when predicting Reading produce a better model?

Imagine you are writing a research report to submit to a Journal for publication. Write this paper and turn it in as your first computer project if assigned by your instructor. You may leave out the sections that would deal with the literature review.

Tolerance
R2p is simply the PRE obtained when all the other predictors are used to predict the predictor in question. Thus, it is a measure of the redundancy of the predictor in question with the other predictors. The term 1 - R2p has the special name tolerance. Tolerance is a measure of the predictor's uniqueness in the regression. Only the unique part of a predictor is useful in reducing error. If tolerance is low (say, below .01 or .001), it will be exceedingly difficult for the predictor to be helpful.

Note that some programs, including Statlets, report a variance inflation factor (VIF), which is the inverse of tolerance (VIF = 1/tolerance).
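
The relationships among R2p, tolerance, and VIF can be illustrated with the tolerance SYSTAT reported for VIQ and PIQ in the two-predictor model (0.5697305). This is only a sketch of the definitions given above; the variable names are chosen for the example.

```python
import math

tolerance = 0.5697305        # from the SYSTAT output for the VIQ + PIQ model

r2_p = 1.0 - tolerance       # PRE when the other predictor(s) predict this one
vif = 1.0 / tolerance        # variance inflation factor, the inverse of tolerance

# With only two predictors, R2p is the squared correlation between them,
# so its square root is the magnitude of the VIQ-PIQ correlation.
r_between_predictors = math.sqrt(r2_p)

print(f"R2p = {r2_p:.4f}, tolerance = {tolerance:.4f}, VIF = {vif:.3f}")
print(f"|r(VIQ, PIQ)| = {r_between_predictors:.3f}")
```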


More than Two Predictors

Let's increase the complexity of our problem by using VIQ, PIQ, AGGRESS & WITHDRAW as predictors. We will simply do the augmented model. To input all four predictors choose the menus Model/Regression/Multiple Regression and complete the Input tab as shown in the figure below.

Here are the results from the Model Fit Tab

---------------------------------------------------------------------------
                                       Standard          T
Parameter               Estimate         Error       Statistic      P-Value
---------------------------------------------------------------------------
CONSTANT                 30.1774        7.02608           4.30       1.0E-4
VIQ                     0.559027        0.10748           5.20       1.0E-4
PIQ                    0.0695531       0.104357           0.67       0.5067
AGGRESS                 0.554008       0.265687           2.09       0.0397
WITHDRAW               0.0629026        0.14494           0.43       0.6653
---------------------------------------------------------------------------
 
                           Analysis of Variance
---------------------------------------------------------------------------
Source         Sum of Squares     Df    Mean Square    F-Ratio      P-Value
---------------------------------------------------------------------------
Model                 10179.4    4.0        2544.84      19.26       1.0E-4
Residual              12549.4   95.0        132.099
---------------------------------------------------------------------------
Total (Corr.)         22728.8   99.0
 
R-squared = 44.7862 percent
R-squared (adjusted for d.f.) = 42.4615 percent
Standard error of est. = 11.4934
Coeff. of variation = 13.1881 percent
Mean absolute error = 8.90248
Durbin-Watson statistic = 1.96066

And here is the output from the Further ANOVA tab.
             Further ANOVA for Variables in the Order Fitted
---------------------------------------------------------------------------
Source         Sum of Squares     Df    Mean Square    F-Ratio      P-Value
---------------------------------------------------------------------------
VIQ                   9513.24    1.0        9513.24      72.02       1.0E-4
PIQ                   91.1905    1.0        91.1905       0.69       0.4081
AGGRESS               550.044    1.0        550.044       4.16       0.0441
WITHDRAW              24.8807    1.0        24.8807       0.19       0.6653
---------------------------------------------------------------------------
Model                 10179.4    4.0



The F-ratios reported directly above are not the squares of the t values in the augmented model. Notice the title of this table, "Further ANOVA for Variables in the Order Fitted." The first F value of 72.02 tests an augmented model where VIQ and the constant are estimated against a compact model where only the mean of reading is used.

The second F-ratio of 0.69 uses a compact model where the constant and the coefficient for VIQ are estimated. The augmented model adds the estimate of PIQ's coefficient.

The last augmented model then becomes the new compact model and the new augmented model now additionally includes the estimate for the AGGRESS coefficient producing an F-ratio of 4.16.

Finally, the compact model is Reading = b0 + b1VIQ + b2PIQ + b3AGGRESS, and the augmented model is Reading = b0 + b1VIQ + b2PIQ + b3AGGRESS + b4WITHDRAW, producing an F-ratio = 0.19.

As noted, these are not the squares of the t statistics. For the t statistics, the compact model contains the constant and every other predictor, excluding the one of interest, and the augmented model contains the constant and all predictors. There are two ways you can produce the F-ratios for these model comparisons. The first way is to square the t value, using the "t to F" tool in this package or hand calculations. A second method is to change the order of the variables in the model. If you include the variable of interest last in the model, its F-ratio in Statlets will contrast the models you are interested in.
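
To see the difference numerically, the sketch below rebuilds each "order fitted" F-ratio as its sequential sum of squares divided by the residual mean square of the full model (132.099), and sets it beside the squared t value from the Model Fit tab. All numbers are copied from the Statlets output above; the dictionary names are just for this illustration.

```python
mse_full = 132.099  # residual mean square of the four-predictor model

# Sequential (order-fitted) sums of squares from the Further ANOVA tab.
sequential_ss = {"VIQ": 9513.24, "PIQ": 91.1905,
                 "AGGRESS": 550.044, "WITHDRAW": 24.8807}

# t statistics from the Model Fit tab (each predictor tested last).
t_stats = {"VIQ": 5.20, "PIQ": 0.67, "AGGRESS": 2.09, "WITHDRAW": 0.43}

sequential_f = {name: ss / mse_full for name, ss in sequential_ss.items()}
t_squared = {name: t ** 2 for name, t in t_stats.items()}

for name in sequential_ss:
    print(f"{name:9s} sequential F = {sequential_f[name]:6.2f}   "
          f"t squared = {t_squared[name]:6.2f}")
```

Only WITHDRAW, the last variable entered, has a sequential F that matches its squared t (within rounding of the printed t), which is why moving the variable of interest to the end of the list works.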

Below is the Statlets Input tab filled out to place VIQ last in the regression model, followed by the Further ANOVA tab output. Notice that the F-ratio now reported for VIQ (27.05) is the square of its t value (5.20) in the Model Fit tab, within rounding.

             Further ANOVA for Variables in the Order Fitted
---------------------------------------------------------------------------
Source         Sum of Squares     Df    Mean Square    F-Ratio      P-Value
---------------------------------------------------------------------------
PIQ                   5952.65    1.0        5952.65      45.06       1.0E-4
AGGRESS               454.016    1.0        454.016       3.44       0.0669
WITHDRAW              199.068    1.0        199.068       1.51       0.2226
VIQ                   3573.62    1.0        3573.62      27.05       1.0E-4
---------------------------------------------------------------------------
Model                 10179.4    4.0
 
 
Statistical Interpreter
-----------------------
This table shows the statistical significance of each variable as it
was added to the model. You can use this table to help determine how
much the model could be simplified, especially if you are fitting a
polynomial. 


As an exercise, list every compact and augmented model comparison that can be evaluated with the output above.

Finally, here is the output from the Coefficients tab.
95.0% confidence intervals for model coefficients
----------------------------------------------------------------------------
                                  Standard        Lower        Upper
Parameter            Estimate       Error         Limit        Limit  V.I.F.
----------------------------------------------------------------------------
CONSTANT              30.1774      7.02608      16.2289       44.126
VIQ                  0.559027      0.10748     0.345652     0.772403    2.15
PIQ                 0.0695531     0.104357    -0.137623     0.276729    2.11
AGGRESS              0.554008     0.265687    0.0265516      1.08146    1.04
WITHDRAW            0.0629026      0.14494     -0.22484     0.350645    1.08
----------------------------------------------------------------------------
 
Correlation matrix for coefficient estimates
---------------------------------------------------------------------------
                      Constant            VIQ            PIQ        AGGRESS
CONSTANT                1.0000        -0.3192        -0.4009        -0.0949
VIQ                    -0.3192         1.0000        -0.7145         0.0077
PIQ                    -0.4009        -0.7145         1.0000        -0.0779
AGGRESS                -0.0949         0.0077        -0.0779         1.0000
WITHDRAW                3.0E-4        -0.1499        -0.0053         0.1775
---------------------------------------------------------------------------
                      WITHDRAW
CONSTANT                3.0E-4
VIQ                    -0.1499
PIQ                    -0.0053
AGGRESS                 0.1775
WITHDRAW                1.0000
---------------------------------------------------------------------------
With all this information you could produce a full ANOVA table like that shown in your text.
See Judd and McClelland page 172

In asking several questions, we have the problem of doing multiple statistical tests on the same set of data (increasing the family-wise error rate). If we use a given level of alpha in repeated tests, our chances of making at least one Type I error increase rapidly. It is safer (but seldom done) to use alpha/p, where p is the number of tests, as the cutoff for each of the repeated tests. Using alpha/p as the criterion is known as the Bonferroni inequality for multiple comparisons.
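
A small sketch of the Bonferroni cutoff, applied to the four predictor p values from the Model Fit tab above. Two things here are assumptions made for illustration: the family is taken to be the four predictor tests, and the family-wise error figure uses the formula for independent tests, which tests on the same data set are not, so it is only a rough guide.

```python
alpha = 0.05

# P values for the four predictors as printed in the Model Fit tab.
p_values = {"VIQ": 1.0e-4, "PIQ": 0.5067, "AGGRESS": 0.0397, "WITHDRAW": 0.6653}

n_tests = len(p_values)
cutoff = alpha / n_tests                    # Bonferroni per-test cutoff

# Chance of at least one Type I error if every test used alpha
# (assumes independent tests; illustrative only).
familywise = 1 - (1 - alpha) ** n_tests

unadjusted = [name for name, p in p_values.items() if p < alpha]
bonferroni = [name for name, p in p_values.items() if p < cutoff]

print(f"per-test cutoff = {cutoff:.4f}, family-wise rate = {familywise:.3f}")
print("significant at alpha:  ", unadjusted)
print("significant at alpha/p:", bonferroni)
```

With the stricter cutoff, AGGRESS (p = 0.0397) no longer counts as significant.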

Notice the tolerance values (the inverse of the V.I.F. values) in the following output, which adds FSIQ as a predictor. What do these values indicate?

                                       Standard          T
Parameter               Estimate         Error       Statistic      P-Value
---------------------------------------------------------------------------
CONSTANT                 45.0232        15.8079           2.85       0.0054
VIQ                    -0.473985       0.991308          -0.48       0.6337
PIQ                    -0.908559       0.938907          -0.97       0.3357
FSIQ                     1.86773        1.78177           1.05       0.2972
AGGRESS                 0.521996       0.267299           1.95       0.0538
WITHDRAW               0.0405218       0.146429           0.28       0.7826
---------------------------------------------------------------------------
 
                           Analysis of Variance
---------------------------------------------------------------------------
Source         Sum of Squares     Df    Mean Square    F-Ratio      P-Value
---------------------------------------------------------------------------
Model                 10324.4    5.0        2064.87      15.65       1.0E-4
Residual              12404.4   94.0        131.962
---------------------------------------------------------------------------
Total (Corr.)         22728.8   99.0
 
R-squared = 45.4242 percent
R-squared (adjusted for d.f.) = 42.5212 percent
Standard error of est. = 11.4875
Coeff. of variation = 13.1812 percent
Mean absolute error = 8.91412
Durbin-Watson statistic = 2.05376
 
             Further ANOVA for Variables in the Order Fitted
---------------------------------------------------------------------------
Source         Sum of Squares     Df    Mean Square    F-Ratio      P-Value
---------------------------------------------------------------------------
VIQ                   9513.24    1.0        9513.24      72.09       1.0E-4
PIQ                   91.1905    1.0        91.1905       0.69       0.4079
FSIQ                  215.397    1.0        215.397       1.63       0.2045
AGGRESS               494.423    1.0        494.423       3.75       0.0559
WITHDRAW              10.1058    1.0        10.1058       0.08       0.7826
---------------------------------------------------------------------------
Model                 10324.4    5.0

95.0% confidence intervals for model coefficients
----------------------------------------------------------------------------
                                  Standard        Lower        Upper
Parameter            Estimate       Error         Limit        Limit  V.I.F.
----------------------------------------------------------------------------
CONSTANT              45.0232      15.8079      13.6361      76.4103
VIQ                 -0.473985     0.991308     -2.44225      1.49428  182.99
PIQ                 -0.908559     0.938907     -2.77278     0.955667  171.05
FSIQ                  1.86773      1.78177     -1.67002      5.40548  605.50
AGGRESS              0.521996     0.267299  -0.00873423      1.05273    1.05
WITHDRAW            0.0405218     0.146429    -0.250218     0.331261    1.10
----------------------------------------------------------------------------
 
Correlation matrix for coefficient estimates
---------------------------------------------------------------------------
                      Constant            VIQ            PIQ           FSIQ
CONSTANT                1.0000        -0.9060        -0.9102         0.8959
VIQ                    -0.9060         1.0000         0.9794        -0.9941
PIQ                    -0.9102         0.9794         1.0000        -0.9938
FSIQ                    0.8959        -0.9941        -0.9938         1.0000
AGGRESS                -0.1442         0.1144         0.1049        -0.1143
WITHDRAW               -0.1305         0.1289         0.1443        -0.1458
---------------------------------------------------------------------------
                       AGGRESS       WITHDRAW
CONSTANT               -0.1442        -0.1305
VIQ                     0.1144         0.1289
PIQ                     0.1049         0.1443
FSIQ                   -0.1143        -0.1458
AGGRESS                 1.0000         0.1911
WITHDRAW                0.1911         1.0000
-------------------------------------------------------

Notice how dramatically the Variance Inflation Factor has changed for the variables that are highly correlated with one another (VIQ, PIQ, and FSIQ).
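
Converting the printed V.I.F. values back to tolerances makes the change easier to judge against the rule of thumb given earlier (tolerance below about .01). The sketch below uses the V.I.F. columns from the two Coefficients tables; the .01 threshold is the looser of the two values suggested in the Tolerance section, not a fixed standard.

```python
# V.I.F. values before and after FSIQ was added, from the Coefficients tab.
vif_before = {"VIQ": 2.15, "PIQ": 2.11, "AGGRESS": 1.04, "WITHDRAW": 1.08}
vif_after = {"VIQ": 182.99, "PIQ": 171.05, "FSIQ": 605.50,
             "AGGRESS": 1.05, "WITHDRAW": 1.10}

THRESHOLD = 0.01  # rule-of-thumb tolerance cutoff from the text

def tolerances(vifs):
    """Tolerance is the inverse of the variance inflation factor."""
    return {name: 1.0 / v for name, v in vifs.items()}

tol_before = tolerances(vif_before)
tol_after = tolerances(vif_after)

flagged = [name for name, tol in tol_after.items() if tol < THRESHOLD]

for name, tol in tol_after.items():
    print(f"{name:9s} tolerance = {tol:.4f}")
print("below threshold:", flagged)
```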


Testing the Addition of a Set of Predictors
Instead of asking whether the addition of just one additional parameter is worthwhile, we sometimes want to know whether the addition of a set of predictors would be useful.

Let's suppose we want to know whether the set of behavioral scores AGGRESS and WITHDRAW help the prediction using VIQ and PIQ.

We start by estimating the compact model:
MODEL C: READ = β0 + β1VIQ + β2PIQ + ei

Here are the Results:

---------------------------------------------------------------------------
Dependent variable: READ
---------------------------------------------------------------------------
                                       Standard          T
Parameter               Estimate         Error       Statistic      P-Value
---------------------------------------------------------------------------
CONSTANT                 31.5595         7.0776           4.46       1.0E-4
VIQ                     0.558374        0.10748           5.20       1.0E-4
PIQ                    0.0864393       0.105291           0.82       0.4137
---------------------------------------------------------------------------
 
                           Analysis of Variance
---------------------------------------------------------------------------
Source         Sum of Squares     Df    Mean Square    F-Ratio      P-Value
---------------------------------------------------------------------------
Model                 9604.43    2.0        4802.21      35.49       1.0E-4
Residual              13124.3   97.0        135.302
---------------------------------------------------------------------------
Total (Corr.)         22728.8   99.0
 
R-squared = 42.2567 percent
R-squared (adjusted for d.f.) = 41.0662 percent
Standard error of est. = 11.632
Coeff. of variation = 13.347 percent
Mean absolute error = 9.05871
Durbin-Watson statistic = 1.93272
 

Remember that the residual sum of squares (SSE) estimates the error for this model. In this case the value for the compact model is 13124.3.

Now estimate the augmented model
MODEL A: READ = β0 + β1VIQ + β2PIQ + β3AGGRESS + β4WITHDRAW + ei
Only the important results are repeated here:

                           Analysis of Variance
---------------------------------------------------------------------------
Source         Sum of Squares     Df    Mean Square    F-Ratio      P-Value
---------------------------------------------------------------------------
Model                 10179.4    4.0        2544.84      19.26       1.0E-4
Residual              12549.4   95.0        132.099
---------------------------------------------------------------------------
Total (Corr.)         22728.8   99.0

Note that PRE = (13124.3 - 12549.4)/13124.3 which equals .0438

The F statistic can now be calculated using (PRE/(PA-PC))/((1-PRE)/(n-PA)), which equals (.0438/2)/(.9562/95), which equals .0219/.010065, which finally equals 2.1758. This F value has 2 and 95 degrees of freedom.
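
The same F can be reached two ways, which makes a useful check: from PRE as above, or directly as the mean square for the added set divided by the residual mean square of the augmented model. The sketch below does both with the unrounded values; the variable names are chosen for this example, and PC = 3, PA = 5, n = 100 are read from the degrees of freedom in the output.

```python
sse_c = 13124.3   # residual SS, READ on VIQ and PIQ
sse_a = 12549.4   # residual SS, READ on VIQ, PIQ, AGGRESS, WITHDRAW
pc, pa, n = 3, 5, 100

# Route 1: through PRE.
pre = (sse_c - sse_a) / sse_c
f_from_pre = (pre / (pa - pc)) / ((1 - pre) / (n - pa))

# Route 2: mean square for the added set over the augmented residual mean square.
ms_added = (sse_c - sse_a) / (pa - pc)
ms_error = sse_a / (n - pa)
f_from_ms = ms_added / ms_error

print(f"PRE = {pre:.4f}")
print(f"F({pa - pc}, {n - pa}) = {f_from_pre:.4f} (via PRE), {f_from_ms:.4f} (via mean squares)")
```

Both routes give about 2.176; the small difference from 2.1758 comes from rounding PRE to .0438 in the hand calculation.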

Using Statlets to calculate the p value

Note: some of the pictures in this section need to be changed; the dfs are incorrect.

To find the probability of the calculated F statistic using Statlets, first choose the Plot/Distributions menus. Select the F distribution by clicking the radio button. Then select the PDF tab, and by clicking the options button, change the degrees of freedom as shown in the figure below.


Finally, select the Tail-Areas tab, and click the Option button as shown below and input the F value. Here we have input an F value of 2.1758.



After clicking OK, the probability will be calculated. The probability value for F(2, 95) = 2.1758 is .1191, which is greater than alpha = .05; therefore we would not reject the null hypothesis.
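
If Statlets is not at hand, this particular tail area can be checked by hand, because the F distribution has a simple closed form when the numerator degrees of freedom equal 2: P(F > f) = (1 + 2f/df_error)^(-df_error/2). The sketch below uses that special case only; the function name is made up for the example, and for other numerator df a statistics library would be needed (for instance, `scipy.stats.f.sf` is one option, assuming SciPy is installed).

```python
def f_upper_tail_df1_equals_2(f_value, df_error):
    """Upper-tail probability of F(2, df_error).

    Valid only when the numerator degrees of freedom are exactly 2.
    """
    return (1.0 + 2.0 * f_value / df_error) ** (-df_error / 2.0)

alpha = 0.05
p_value = f_upper_tail_df1_equals_2(2.1758, df_error=95)
reject_null = p_value < alpha

print(f"p = {p_value:.4f}, reject Ho: {reject_null}")
```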

Using the Web to do F Distribution Calculations

There are also several good tools available at other web locations that calculate cumulative distribution functions for the F statistic. One of the easiest to use is at UCLA, written by Jan de Leeuw.