Chapter 12 Overheads

Factorial ANOVA: Models with Multiple Categorical Predictors and Product Terms

In this chapter we investigate models with two or more categorical predictors.

As the text demonstrates, classical analysis of variance with two or more categorical predictors is nothing more than a one-way ANOVA with a specific, clever set of contrast codes. The only new thing you will learn in this chapter is how to generate an appropriate set of contrast codes. You will also need to learn (or relearn) how to interpret those codes.

Judd & McClelland start with a problem that has two categorical predictor variables. The first factor is type of drug (A, B, C) and the second factor is the presence or absence of an enzyme (E, -E). Older texts would call this a 3 × 2 design.

First, the text demonstrates that you could treat this design as six separate groups and code the data just as you did before, with five contrast codes (Z1 through Z5) for the six groups. If so, here is what your data set would look like:



This data set has also been coded for use in Statlets

If you conduct the regression analysis as before, MYSTAT produces the following results:


Dep var:    MOOD      N:   18    Multiple R:  .975    Squared multiple R:  .950

Adjusted squared multiple R:  .930     Standard error of estimate:        2.041

  Variable    Coefficient    Std error     Std coef Tolerance    T    P(2 tail)

CONSTANT           20.000        0.481        0.000  .          41.569    0.000
      Z1            2.400        0.215        0.716  .100E+01   11.154    0.000
      Z2            2.100        0.264        0.512  .100E+01    7.969    0.000
      Z3            0.500        0.340        0.094  .100E+01    1.470    0.167
      Z4            1.000        0.481        0.133  .100E+01    2.078    0.060
      Z5            5.000        0.833        0.385  .100E+01    6.000    0.000


                             Analysis of Variance

   Source   Sum-of-squares    DF  Mean-square     F-ratio       P

 Regression        960.000     5      192.000      46.080       0.000
   Residual         50.000    12        4.167
Of course, we have seen this many times before.

If you use the Statlets package, your regression results look like the following:
                                       Standard          T
Parameter               Estimate         Error       Statistic      P-Value
---------------------------------------------------------------------------
CONSTANT                    20.0       0.481125          41.57       1.0E-4
Z1                           2.4       0.215166          11.15       1.0E-4
Z2                           2.1       0.263523           7.97       1.0E-4
Z3                           0.5       0.340207           1.47       0.1674
Z4                           1.0       0.481125           2.08       0.0598
Z5                           5.0       0.833333           6.00       1.0E-4
---------------------------------------------------------------------------
 
                           Analysis of Variance
---------------------------------------------------------------------------
Source         Sum of Squares     Df    Mean Square    F-Ratio      P-Value
---------------------------------------------------------------------------
Model                   960.0    5.0          192.0      46.08       1.0E-4
Residual                 50.0   12.0        4.16667
---------------------------------------------------------------------------
Total (Corr.)          1010.0   17.0
 
R-squared = 95.0495 percent
R-squared (adjusted for d.f.) = 92.9868 percent
Standard error of est. = 2.04124
Coeff. of variation = 10.2062 percent
Mean absolute error = 1.33333
Durbin-Watson statistic = 2.08
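The summary lines at the bottom of the Statlets output are simple functions of the ANOVA table. The sketch below recomputes the F-ratio, R-squared, adjusted R-squared, and standard error of estimate from the Model and Residual rows alone; nothing here depends on the raw data, which are not reproduced in these notes.

```python
import math

# Model and Residual rows of the Statlets ANOVA table
ss_model, df_model = 960.0, 5
ss_resid, df_resid = 50.0, 12
ss_total, df_total = ss_model + ss_resid, df_model + df_resid  # 1010, 17

ms_model = ss_model / df_model   # 192.0
ms_resid = ss_resid / df_resid   # the MSE, about 4.16667

f_ratio = ms_model / ms_resid                          # 46.08
r_squared = ss_model / ss_total                        # about 0.950495
adj_r_squared = 1 - ms_resid / (ss_total / df_total)   # about 0.929868
se_estimate = math.sqrt(ms_resid)                      # about 2.04124

print(f"F = {f_ratio:.2f}, R^2 = {r_squared:.6f}, "
      f"adj R^2 = {adj_r_squared:.6f}, SE = {se_estimate:.5f}")
```

The adjusted R-squared replaces each sum of squares with its mean square, which is why it is slightly smaller than the plain R-squared.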


Here are the results you would get with a typical ANOVA program using the two factors:
Dep var:    MOOD      N:   18    Multiple R:  .975    Squared multiple R:  .950


                             Analysis of Variance

   Source   Sum-of-squares    DF  Mean-square     F-ratio       P

     DRUG           453.000    2      226.500      54.360       0.000
   ENZYME           450.000    1      450.000     108.000       0.000
    DRUG*
   ENZYME            57.000    2       28.500       6.840       0.010

    Error            50.000   12        4.167

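A quick arithmetic check on the two-factor table: each F-ratio is the source's mean square divided by the error mean square, and the three factorial sources together account for exactly the 960 units of SS and 5 df that the regression attributed to the model. The sketch below uses only the numbers printed in the table.

```python
ms_error = 50.0 / 12  # Error row: SS = 50 on 12 df

sources = {           # source: (sum of squares, degrees of freedom)
    "DRUG": (453.0, 2),
    "ENZYME": (450.0, 1),
    "DRUG*ENZYME": (57.0, 2),
}

f_ratios = {}
for name, (ss, df) in sources.items():
    f_ratios[name] = (ss / df) / ms_error
    print(f"{name:12s} MS = {ss / df:8.3f}  F = {f_ratios[name]:7.3f}")

# The factorial sources partition the regression (model) SS and df exactly.
ss_effects = sum(ss for ss, _ in sources.values())
df_effects = sum(df for _, df in sources.values())
print(ss_effects, df_effects)
```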

Look at how much more information the regression output provides. Indeed, many ANOVA programs don't even print the regression summary (Multiple R, Squared multiple R) that MYSTAT provides above the table.

However, even though the regression output provides more information, and that information is correct, the questions asked by that set of codes are usually uninteresting or unimportant.

To answer important questions, the strategy is to develop contrast codes identical to the ones we would use if each categorical variable were considered alone in its own one-way ANOVA. That is, your first step is to code each categorical variable as if the other one did not exist. Then multiply the separately developed codes (one from each factor) together to produce the interaction codes.

An important point to remember is that you must develop these codes for all the groups present in the study. For example, if you take just the first factor (Drug A, Drug B, and Placebo C), an important contrast would compare the two drugs with the placebo. You might write the code as follows:

l1 = -1 -1 2

However, this code covers only three groups, not all six in the experiment, so it must be repeated for both the groups with the enzyme and the groups without the enzyme. (Reversing every sign, as in l1 in the table below, does not change the test.) Always make sure you keep your grouping consistent. Here is the group ordering we used in both the first coding scheme and this one.
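The recipe above (code each factor as if the other did not exist, repeat the codes across the other factor's levels, then multiply) can be sketched in a few lines. The group order is an assumption read off the Second Code Solution table: g1 through g3 are Drug A, Drug B, and Placebo C at one enzyme level, and g4 through g6 are the same three drugs at the other level. The helper names `expand_drug` and `expand_enzyme` are hypothetical, introduced only for this illustration.

```python
# One-factor codes, each written as if the other factor did not exist.
drug_codes = {           # levels: Drug A, Drug B, Placebo C
    "l1": [1, 1, -2],    # drugs vs. placebo (equivalent to -1 -1 2)
    "l2": [1, -1, 0],    # Drug A vs. Drug B
}
enzyme_codes = {         # levels: first enzyme condition, second
    "l3": [1, -1],
}

# Hypothetical helpers, assuming enzyme is the outer grouping (g1-g3, g4-g6)
# and drug is the inner grouping (A, B, C within each enzyme level).
def expand_drug(code):
    """Repeat a three-level drug code for both enzyme levels."""
    return code * 2

def expand_enzyme(code):
    """Repeat each enzyme-level code across the three drug groups."""
    return [value for value in code for _ in range(3)]

codes = {name: expand_drug(c) for name, c in drug_codes.items()}
codes.update({name: expand_enzyme(c) for name, c in enzyme_codes.items()})

# Interaction codes: elementwise products of each drug code with the enzyme code.
codes["l4"] = [a * b for a, b in zip(codes["l1"], codes["l3"])]
codes["l5"] = [a * b for a, b in zip(codes["l2"], codes["l3"])]

for name in ["l1", "l2", "l3", "l4", "l5"]:
    print(name, codes[name])
```

Under that ordering assumption, the five printed rows match the Second Code Solution table row for row.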

First Code Solution


Second Code Solution

	g1	g2	g3	g4	g5	g6
l1	1	 1	-2	 1	 1	-2
l2	1	-1	 0	 1	-1	 0
l3	1	 1	 1	-1	-1	-1
l4	1	 1	-2	-1	-1	 2
l5	1	-1	 0	-1	 1	 0
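Because each row of this table sums to zero and every pair of rows has a zero cross-product, the codes are mutually orthogonal. Assuming equal group sizes (18 cases over six groups, so three per group), orthogonal codes yield uncorrelated predictors, which is why MYSTAT reports a tolerance of 1.000 for every code. The sketch below checks both properties and also records the sum of squared code values for each row, a quantity that appears in the standard errors.

```python
from itertools import combinations

# Rows of the Second Code Solution table (groups g1..g6)
codes = {
    "l1": [1, 1, -2, 1, 1, -2],
    "l2": [1, -1, 0, 1, -1, 0],
    "l3": [1, 1, 1, -1, -1, -1],
    "l4": [1, 1, -2, -1, -1, 2],
    "l5": [1, -1, 0, -1, 1, 0],
}

# Each code is a contrast: its values sum to zero.
sums = {name: sum(c) for name, c in codes.items()}

# Every pair of codes is orthogonal: the cross-product is zero.
dots = {
    (a, b): sum(x * y for x, y in zip(codes[a], codes[b]))
    for a, b in combinations(codes, 2)
}

# Sum of squared code values for each contrast.
sum_sq = {name: sum(x * x for x in c) for name, c in codes.items()}

print(all(s == 0 for s in sums.values()), all(d == 0 for d in dots.values()))
print(sum_sq)
```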
Here is the coded data set in MYSTAT


This data set has also been coded for use in Statlets.

Here is the regression output from MYSTAT:

DEP VAR:    MOOD      N:      18   MULTIPLE R: 0.975  SQUARED MULTIPLE R: 0.950
ADJUSTED SQUARED MULTIPLE R: 0.930     STANDARD ERROR OF ESTIMATE:        2.041

  VARIABLE    COEFFICIENT    STD ERROR     STD COEF TOLERANCE    T    P(2 TAIL)

CONSTANT           20.000        0.481        0.000      .      41.569    0.000
      X1            3.500        0.340        0.661     1.000   10.288    0.000
      X2            1.000        0.589        0.109     1.000    1.697    0.115
      X3            5.000        0.481        0.667     1.000   10.392    0.000
      X4            0.500        0.340        0.094     1.000    1.470    0.167
      X5            2.000        0.589        0.218     1.000    3.394    0.005


                             ANALYSIS OF VARIANCE

   SOURCE   SUM-OF-SQUARES    DF  MEAN-SQUARE     F-RATIO       P

 REGRESSION        960.000     5      192.000      46.080       0.000
   RESIDUAL         50.000    12        4.167


Here is the output from Statlets.
                                       Standard          T
Parameter               Estimate         Error       Statistic      P-Value
---------------------------------------------------------------------------
CONSTANT                    20.0       0.481125          41.57       1.0E-4
X1                           3.5       0.340207          10.29       1.0E-4
X2                           1.0       0.589256           1.70       0.1154
X3                           5.0       0.481125          10.39       1.0E-4
X4                           0.5       0.340207           1.47       0.1674
X5                           2.0       0.589256           3.39       0.0053
---------------------------------------------------------------------------
 
                           Analysis of Variance
---------------------------------------------------------------------------
Source         Sum of Squares     Df    Mean Square    F-Ratio      P-Value
---------------------------------------------------------------------------
Model                   960.0    5.0          192.0      46.08       1.0E-4
Residual                 50.0   12.0        4.16667
---------------------------------------------------------------------------
Total (Corr.)          1010.0   17.0
 
R-squared = 95.0495 percent
R-squared (adjusted for d.f.) = 92.9868 percent
Standard error of est. = 2.04124
Coeff. of variation = 10.2062 percent
Mean absolute error = 1.33333
Durbin-Watson statistic = 2.08
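Because this model has as many parameters as groups, its predicted values are the six cell means, so the coefficients can be turned back into cell means with b0 + b1·X1 + ... + b5·X5. The sketch below does that and then checks two standard results for orthogonal contrast codes with equal group sizes: each coefficient equals the sum of (code × cell mean) divided by the sum of squared codes, and its standard error equals the square root of MSE / (n × sum of squared codes). It assumes that X1 through X5 are the rows l1 through l5 of the code table and that n = 3 per group; neither assumption is stated outright in the output.

```python
import math

b0 = 20.0
b = {"X1": 3.5, "X2": 1.0, "X3": 5.0, "X4": 0.5, "X5": 2.0}
codes = {  # assumed to be rows l1..l5 of the code table, groups g1..g6
    "X1": [1, 1, -2, 1, 1, -2],
    "X2": [1, -1, 0, 1, -1, 0],
    "X3": [1, 1, 1, -1, -1, -1],
    "X4": [1, 1, -2, -1, -1, 2],
    "X5": [1, -1, 0, -1, 1, 0],
}

# Predicted value for each group = b0 + sum_k b_k * X_k(group)
cell_means = [b0 + sum(b[k] * codes[k][g] for k in b) for g in range(6)]
print("cell means:", cell_means)

mse = 50.0 / 12
n_per_group = 18 // 6  # assumes equal group sizes

results = {}
for k, lam in codes.items():
    ss_lam = sum(x * x for x in lam)
    b_check = sum(l * m for l, m in zip(lam, cell_means)) / ss_lam
    se = math.sqrt(mse / (n_per_group * ss_lam))
    results[k] = (b_check, se, b_check / se)
    print(f"{k}: b = {b_check:.3f}  SE = {se:.6f}  t = {b_check / se:.3f}")
```

The reconstructed coefficients, standard errors, and t statistics agree with both the MYSTAT and Statlets listings, which is a useful sanity check on the assumed coding.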

Note that you can use the TtoF program to build a full ANOVA table like the one in Exhibit 12.7. Look at page 332 and note how the SS are grouped together.

For X1 and X2, the tconverter results are below. Note that you can add their sums of squares (and degrees of freedom) to get the results for the Drug main effect.



Note that because X1 and X2 are uncorrelated, their sums of squares add to produce the sum of squares attributed to both of them.
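The conversion behind this kind of table is short: a single-df test has F = t², and that contrast's sum of squares is F × MSE. Since the codes are orthogonal, the SS for a set of codes is just the sum of their individual SS. The sketch below is a stand-in for what a t-to-F converter does; it is not the TtoF or tconverter program itself, and it groups the codes the way the text groups them (X1 and X2 for Drug, X3 for Enzyme, X4 and X5 for the interaction).

```python
mse = 50.0 / 12  # residual mean square (SS = 50 on 12 df)

# t statistics as printed by MYSTAT for the second coding solution
t_values = {"X1": 10.288, "X2": 1.697, "X3": 10.392, "X4": 1.470, "X5": 3.394}

# Single-df test: F = t^2, and the contrast's SS = F * MSE.
ss = {name: t ** 2 * mse for name, t in t_values.items()}

# Orthogonal codes, so the SS for a set of codes is the sum of their SS.
sources = {
    "Drug": ["X1", "X2"],
    "Enzyme": ["X3"],
    "Drug x Enzyme": ["X4", "X5"],
}

table = {}
for source, members in sources.items():
    ss_source = sum(ss[name] for name in members)
    df = len(members)
    f = (ss_source / df) / mse
    table[source] = (ss_source, df, f)
    print(f"{source:14s} SS = {ss_source:8.2f}  df = {df}  F = {f:7.2f}")
```

Because the printed t values are rounded to three decimals, the rebuilt sums of squares land within a few hundredths of the two-factor ANOVA values (453, 450, and 57) rather than exactly on them.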

You should prefer the single-degree-of-freedom tests because they ask focused questions, and because rejecting the omnibus main-effect hypothesis β1 = β2 = 0 is ambiguous: it tells you only that at least one of the two contrasts is nonzero, not which one.

The text presents the omnibus test for the drug variable in Exhibit 12.7 even though the authors do not recommend it. Unfortunately, in the traditional approach to ANOVA it is usually the only test presented in the source table.
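To see why the omnibus test is ambiguous, note that with two orthogonal codes the omnibus F on 2 df is simply the average of the two single-df F values: ((SS1 + SS2) / 2) / MSE = (F1 + F2) / 2. The sketch below applies this to the printed t values for X1 and X2. A very large F for drugs versus placebo and a small, nonsignificant F for Drug A versus Drug B (p = 0.115 in the output) average into a large omnibus F, and that omnibus F alone cannot tell you which comparison produced it.

```python
# Printed MYSTAT t values for the two drug codes
t_x1, t_x2 = 10.288, 1.697

f_x1 = t_x1 ** 2               # single-df F for drugs vs. placebo
f_x2 = t_x2 ** 2               # single-df F for Drug A vs. Drug B
omnibus_f = (f_x1 + f_x2) / 2  # F(2, 12) for the Drug main effect

print(f"F(X1) = {f_x1:.2f}, F(X2) = {f_x2:.2f}, omnibus F = {omnibus_f:.2f}")
```

The result agrees, up to rounding of the printed t values, with the DRUG F-ratio of 54.36 in the two-factor ANOVA table.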

Refer back to the first ANOVA output.