Simple Models: Statistical Inferences about Parameter Values.
In Chapter 4 we decided, because of efficiency, practicality, and tradition, to define total ERROR as the sum of the squared errors. Thus, the best estimate for the simple MODEL is the mean.
In this chapter we develop procedures for asking questions or testing hypotheses
about simple models. What is done in this chapter is equivalent to the one-sample
t-test. You may review traditional approaches to the one-sample
t-test in the EDPSY 101 material.
The generic problem is that you have a batch of DATA for which you have calculated the mean (bo) as the model. It is an estimate of ßo. You would like to determine whether ßo is equal to the hypothesized value Bo.
The null hypothesis is:
ßo = Bo
MODEL C: Yi = Bo + ei No parameters are estimated.
MODEL A: Yi = bo + ei Here ßo is estimated with the sample mean (bo).
The question is whether MODEL A is enough better than MODEL C that we should reject the null hypothesis.
To establish this, we first calculate the proportional reduction in error (PRE).
This dialog box using MYSTAT will generate the error values for the simple model with no estimated parameters for the murder data.
This dialog box using MYSTAT will generate the error values for the augmented model where the single parameter was estimated using the sample mean.
Finally, this dialog box calculates both error values shown in the results below:
Results
TOTAL OBSERVATIONS: 25

              ERROR(C)   ERROR(A)
N OF CASES          25         25
SUM              0.642      0.615
To calculate PRE use the following formula:
PRE = (ERROR(C) - ERROR(A)) / ERROR(C) = (0.642 - 0.615) / 0.642 = .042
That is, MODEL A has about 4% less ERROR than MODEL C.
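The PRE arithmetic can be checked with a few lines of Python. This is only a minimal sketch: the function name is made up for illustration, and the two error sums are the SUM values from the Results table above.

```python
def proportional_reduction_in_error(error_c, error_a):
    """PRE = (ERROR(C) - ERROR(A)) / ERROR(C).

    error_c: sum of squared errors for the compact model (MODEL C)
    error_a: sum of squared errors for the augmented model (MODEL A)
    """
    if error_c <= 0:
        raise ValueError("ERROR(C) must be positive")
    return (error_c - error_a) / error_c


# SUM values reported for the murder data (25 cities).
error_c = 0.642
error_a = 0.615

pre = proportional_reduction_in_error(error_c, error_a)
print(f"PRE = {pre:.3f}")  # prints "PRE = 0.042"
```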
SSR = SSE(C) - SSE(A)
The PRE value has a clear descriptive meaning; however, its statistical significance is not obvious. There is a sampling distribution for PRE. Assume that the null hypothesis is true; then eta2, the true proportional reduction in error, is equal to 0. Because the mean is the best estimate for reducing the SSE, the calculated PRE will almost always be at least a little greater than zero (and never less than zero). PRE is therefore a biased estimator of eta2: on average, PRE overestimates the true value of eta2.
Look at the sampling distribution of PRE on page 80.
Tables of critical values for PRE are rare, but PRE can easily be converted to an F statistic.
The two reasons for calculating F are (a) to examine proportional reduction in error per additional parameter added to the model and (b) to compare the proportion of error that was reduced (PRE) to the proportion of error that remains (1-PRE).
We can think of the numerator of F as indicating the average proportional reduction in error per parameter added, and the denominator as the average proportional reduction in error that could be obtained by adding all remaining parameters. For MODEL A to be significantly better than MODEL C, we want the average error reduction for the parameters added to be much greater than the average error reduction we could get by adding the remainder of the possible parameters. Hence, if F is about 1, then we are doing no better than we could expect on average, so values of F near 1 suggest that we should stick with the simpler MODEL C. Values of F much larger than 1 imply that the average PRE per parameter added in MODEL A is much greater than the average which could be obtained by adding still more parameters. In that case, we would want to reject MODEL C in favor of MODEL A.
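The conversion from PRE to F described above can be sketched in Python. The function name is hypothetical; PA and PC are the numbers of parameters estimated in MODEL A and MODEL C, and the example reuses the murder data (n = 25, PA = 1, PC = 0).

```python
def f_from_pre(pre, n, pa, pc):
    """Convert PRE to F.

    Numerator: PRE per parameter added, PRE / (PA - PC).
    Denominator: remaining error per remaining possible parameter,
    (1 - PRE) / (n - PA).
    """
    numerator = pre / (pa - pc)
    denominator = (1 - pre) / (n - pa)
    return numerator / denominator


# Murder data: ERROR(C) = 0.642, ERROR(A) = 0.615, 25 cities.
pre = (0.642 - 0.615) / 0.642
f_stat = f_from_pre(pre, n=25, pa=1, pc=0)
print(f"F = {f_stat:.2f}")  # prints "F = 1.05"
```

An F this close to 1 says that the one added parameter did about as well as an average parameter would, which argues for keeping the simpler MODEL C.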
Read the starred material on page 86 in Judd & McClelland and complete the ANOVA table for our data on page 87.
The starred material I referred to reads:
Note that for most reasonable numbers of observations, the 95% critical value for F is about 4. If we ignore the fractional part of F, then a useful rule of thumb which reduces the need to consult statistical tables frequently is to reject MODEL C in favor of MODEL A whenever F is greater than 5.
This is a typical ANOVA SUMMARY TABLE FROM OUR DATA.
STEPS IN EVERY ANALYSIS
- State the null and alternative hypotheses.
- Often these can be stated as:
- MODEL A = MODEL C
- Set the alpha level.
- Get the data.
- Do the calculation. Usually you will run the regression program.
- Make a decision concerning the null hypothesis. (reject or fail to reject MODEL C)
- Write the summary statement. ANOVA summary tables and verbiage usually suffice.
For step #5, reject MODEL C if any one of these equivalent conditions holds:
The F obtained is greater than the F critical value.
The PRE obtained is greater than the PRE critical value.
The p value is less than the alpha level set in step 2 above.
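The F and PRE criteria always give the same decision, because a critical F can be converted to a critical PRE by inverting the PRE-to-F formula. The sketch below illustrates this for the murder data. The critical value 4.26 is an assumption taken from a standard F table for 1 and 24 degrees of freedom at alpha = .05, and the function name is made up. The p-value criterion is not shown because it needs the F distribution itself.

```python
def pre_critical_from_f(f_crit, n, pa, pc):
    """Invert F = (PRE/(PA-PC)) / ((1-PRE)/(n-PA)) to get the critical PRE."""
    df_num = pa - pc
    df_den = n - pa
    return (f_crit * df_num) / (f_crit * df_num + df_den)


N, PA, PC = 25, 1, 0
F_CRIT = 4.26  # assumption: tabled F for 1 and 24 df, alpha = .05

# Obtained statistics for the murder data.
error_c, error_a = 0.642, 0.615
pre = (error_c - error_a) / error_c
f_obtained = (pre / (PA - PC)) / ((1 - pre) / (N - PA))

pre_crit = pre_critical_from_f(F_CRIT, N, PA, PC)
reject_by_f = f_obtained > F_CRIT
reject_by_pre = pre > pre_crit

print(f"F obtained = {f_obtained:.2f}, F critical = {F_CRIT}")
print(f"PRE obtained = {pre:.3f}, PRE critical = {pre_crit:.3f}")
print("Reject MODEL C" if reject_by_f else "Fail to reject MODEL C")
```

Both comparisons lead to the same conclusion here: we fail to reject MODEL C.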
Power = 1 - P(Type II error)
Statisticians must be able to estimate power, the probability of rejecting MODEL C given that it is incorrect. There is no reason to continue with a research project if you would be unable to reject MODEL C even when it is incorrect.
To determine power, we need to know the sampling distributions of PRE and F, assuming that MODEL C is incorrect. This is quite difficult because saying MODEL C is incorrect is simply saying that eta2, the true proportional reduction in error, is greater than 0. However, statisticians have derived these sampling distributions for specific values of eta2. They have also derived power tables that make these calculations easy. Exhibit 5.13 on page 92 or A 90 will allow us to make power determinations for problems with 1 df in the numerator and alpha = .05. Turn to page 92. What was our power to detect differences using 25 cities, if MODEL A truly reduced error by 5% (eta2 = .05)? Answer = .19. You could say that we had a 19% chance of rejecting MODEL C if it were incorrect.
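If the power table is not at hand, power can be approximated by simulation. The sketch below is not the method used to build the table, and it rests on several assumptions that are not in the text: errors are normal with standard deviation 1, eta2 is taken to be delta^2 / (delta^2 + sigma^2) where delta is the true distance between ßo and Bo, and 4.26 is the tabled critical F for 1 and 24 df at alpha = .05.

```python
import math
import random


def simulate_power(eta_sq, n, f_crit, reps=10000, seed=1):
    """Estimate the probability of rejecting MODEL C by repeated sampling."""
    rng = random.Random(seed)
    # Assumption: with sigma = 1, eta^2 = delta^2 / (delta^2 + 1),
    # so delta = sqrt(eta^2 / (1 - eta^2)). Bo is set to 0 for convenience.
    delta = math.sqrt(eta_sq / (1 - eta_sq))
    rejections = 0
    for _ in range(reps):
        y = [rng.gauss(delta, 1.0) for _ in range(n)]
        mean = sum(y) / n
        sse_c = sum(v * v for v in y)             # MODEL C predicts Bo = 0
        sse_a = sum((v - mean) ** 2 for v in y)   # MODEL A predicts the mean
        f_stat = (sse_c - sse_a) / (sse_a / (n - 1))
        if f_stat > f_crit:
            rejections += 1
    return rejections / reps


power = simulate_power(eta_sq=0.05, n=25, f_crit=4.26)
print(f"Estimated power: {power:.2f}")
```

Under these assumptions the estimate should land near .2, in line with the tabled value of .19. The exact figure depends on the seed and the number of replications.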
Obviously, you need eta2 values to enter this table. Where do they come from?
1. Using previous studies. Remember that these studies report biased values: PRE overestimates eta2. Here is the formula to produce unbiased estimates.
Unbiased estimate of eta2 = 1 - (1 - PRE)(n - PC)/(n - PA)
Many computer programs calculate this value and call it the adjusted R2.
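A short sketch of the adjustment follows. It assumes the usual form 1 - (1 - PRE)(n - PC)/(n - PA), which reduces to the familiar adjusted R2 when the compact model estimates only the mean (PC = 1). The function name and the example numbers are hypothetical and are not taken from any study.

```python
def unbiased_eta_squared(pre, n, pa, pc):
    """Shrink PRE toward zero to offset its upward bias.

    Assumed formula: 1 - (1 - PRE) * (n - PC) / (n - PA).
    The result can be slightly negative when PRE is very small.
    """
    return 1 - (1 - pre) * (n - pc) / (n - pa)


# Hypothetical study: PRE = .20 from 30 observations,
# MODEL A with 2 parameters compared against MODEL C with 1.
estimate = unbiased_eta_squared(pre=0.20, n=30, pa=2, pc=1)
print(f"Unbiased estimate of eta2 = {estimate:.3f}")  # prints 0.171
```

The correction is small when n is large relative to PA, and larger when many parameters are estimated from few observations.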
Brain Exercise
Work a problem: what would the unbiased estimate of eta2 be if PRE was found to be .36, the investigator used a regression model with 50 subjects, and the regression equation estimated 5 parameters?
2. You can use the values suggested by Cohen (1977):
small effect eta2 = .02
medium effect eta2 = .13
large effect eta2 = .26.
Sorry, these exact values are not in our table; use the nearest tabled values of .03, .1, and .3 instead.
3. There is another way, using what one knows about the variance in the data. It is presented on pages 94-95. It usually requires far more experience with the data, so it will not be discussed here; however, it is quite clear in the text.
How do we improve Power?
- Reduce error by using better measurement devices (more reliable instruments).
- Improve the quality of the MODEL
- Increase alpha.
- Increase the sample size.
There are two reasons for not routinely using a large number of observations. First, it may be infeasible due to cost or other data collection constraints to obtain more observations. Second, power might be so high that some statistically significant results may be misleading.
All else being equal, more statistical power is always better; we just need to be careful in interpreting the results as substantively important when power is very high.
Finally, note the error in the text on page 100. The confidence interval for bo should have an F1,n-1,alpha.
Statlets Sample Size
Demonstrate how to use Statlets to find the needed sample size for two and
multiple group samples with known Power.
Statlets Problem
Statlets is unable to calculate new values from old values and store them in the data window. However, you can transform values in the applets. Below we solve this simplest problem using Statlets.
First copy the data from Judd1-1e.html into the computer clipboard.
Next, start the Menu Version of Statlets.
Our compact model is that the accident rate for each of the 50 states is 2.2. This number (Bo) is not estimated from the sample data. To calculate the errors for this model, you need to subtract 2.2 from each of the rates and square the results. The variable ERR_2.2 in the Judd1-1e.html data already has 2.2 subtracted from each value. In Statlets choose the menus Analyze/One Sample/One Variable Analysis. Then click ERR_2.2 and the arrow to place it in the Sample Data box. Next click the arrows on the far right of the empty dialog entry box until the value ^2 is visible. This squares all of the values before any further analysis. Next click the Stats tab. Your output should look like the following:
Summary Statistics for ERR_2.2^2
Sample size = 50
Mean = 2.6032
Median = 1.565
Standard deviation = 3.0452
Minimum = 0.0
Maximum = 12.96
Range = 12.96
Standardized skewness = 5.75857
Standardized kurtosis = 5.09012
Statistical Interpreter
-----------------------
This table shows summary statistics for ERR_2.2^2. It includes
measures of central tendency, measures of variability, and measures of
shape. Of particular interest here are the standardized skewness and
standardized kurtosis, which can be used to determine whether the
sample comes from a normal distribution. Values of these statistics
outside the range of -2 to +2 indicate significant departures from
normality, which would tend to invalidate any statistical test
regarding the standard deviation. In this case, the standardized
skewness is not within the range expected for data from a normal
distribution. The standardized kurtosis is not within the range
expected for data from a normal distribution.
To sum these error values, simply multiply the mean by the sample size. For this compact model the sum of squared errors equals 2.6032 times 50 = 130.16.
The augmented model uses the mean (3.572). The errors for this model are also calculated in the variable MEAN_ERR.
Click the Input tab, and substitute MEAN_ERR for the ERR_2.2 value. Make sure ^2 is still present in the far right input box, then click the Stats tab. Your output should be the following:
Summary Statistics for MEAN_ERR^2
Sample size = 50
Mean = 0.720816
Median = 0.183184
Standard deviation = 1.15748
Minimum = 7.84E-4
Maximum = 4.96398
Range = 4.9632
Standardized skewness = 6.4223
Standardized kurtosis = 6.4951
Statistical Interpreter
-----------------------
This table shows summary statistics for MEAN_ERR^2. It includes
measures of central tendency, measures of variability, and measures of
shape. Of particular interest here are the standardized skewness and
standardized kurtosis, which can be used to determine whether the
sample comes from a normal distribution. Values of these statistics
outside the range of -2 to +2 indicate significant departures from
normality, which would tend to invalidate any statistical test
regarding the standard deviation. In this case, the standardized
skewness is not within the range expected for data from a normal
distribution. The standardized kurtosis is not within the range
expected for data from a normal distribution.
Again, to calculate the sum of squared errors for this augmented model with one estimated parameter, multiply the mean (0.720816) by 50, which gives 36.0408.
The null hypothesis is that the compact and augmented models are equivalent. We have calculated the sum of squared errors from each model, and can now compare the models by calculating PRE and the associated F statistic.
PRE = (130.16 - 36.0408)/130.16 = .7231
F = (PRE/(PA-PC))/((1-PRE)/(n-PA)) or
F = (.7231/1)/(.2769/49) = 127.96
(Rounding PRE to .72 before dividing gives 126.00 instead.)
Reducing the error by 72% with an associated F of about 128 indicates that the augmented model is not equivalent to the compact model. The difference is both significant and important.
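The whole comparison can be reproduced from the two Statlets means of squared errors. This is a sketch only; the variable names are made up, and the inputs are the Mean and Sample size values from the two outputs above. Carrying PRE at full precision gives an F near 128, while rounding PRE to .72 first gives 126.

```python
n = 50          # sample size (states)
pa, pc = 1, 0   # parameters estimated in MODEL A and MODEL C

# Sum of squared errors = mean of the squared errors times n.
sse_c = 2.6032 * n      # compact model, Bo = 2.2
sse_a = 0.720816 * n    # augmented model, sample mean 3.572

pre = (sse_c - sse_a) / sse_c
f_stat = (pre / (pa - pc)) / ((1 - pre) / (n - pa))

print(f"SSE(C) = {sse_c:.2f}, SSE(A) = {sse_a:.4f}")
print(f"PRE = {pre:.4f}")
print(f"F = {f_stat:.2f}")
```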
One-Sample t-test Equivalence
Earlier we said that this procedure was the equivalent of a one-sample t-test. We calculated a PRE and an F statistic. An F statistic with one degree of freedom in the numerator is equivalent to a squared t statistic.
Replace the MEAN_ERR term with the RATE variable in the input tab. Make sure you eliminate the ^2 term in the right input box in the dialog. This input box should be empty, because we are not transforming RATE. Now select the t-test tab. Using the Options button, enter 2.2 as the value in the null hypothesis box. This sets up the applet to do a one-sample t-test investigating whether the mean of RATE is different from the null hypothesis value of 2.2. Clicking OK should yield the following:
Estimation of Population Mean for RATE
Sample size = 50
Mean = 3.572
95.0% confidence interval for mean: 3.572 +/- 0.243736 [3.32826,3.81574]
t-test
------
Null hypothesis: mean = 2.2
Alt. hypothesis: not equal
Computed t-statistic = 11.312
P-value = 2.88658E-15
Reject the null hypothesis for alpha = 0.05
Statistical Interpreter
-----------------------
This table displays the result of a t-test performed to test the null
hypothesis that the mean of the population from which the sample data
come equals 2.2 versus the alternative hypothesis that the mean is not
equal to 2.2. Since the P-value for this test is less than 0.05, we
can reject the null hypothesis at the 95.0% confidence level. Also
shown is a 95.0% confidence interval for the population mean. In
repeated sampling, 95.0% of all such intervals will contain the true
mean.
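If you square the t-statistic value above (11.312) you get 127.96, which equals the F value from the model comparison when PRE is carried at full precision (.7231) and is within rounding error of the 126.00 obtained with PRE rounded to .72.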
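The equivalence can be verified from summary values alone. The sketch below rebuilds the t statistic from the sample mean, the hypothesized value, and the augmented model's sum of squared errors (36.0408, the MEAN_ERR^2 mean times 50), and then squares it. The variable names are illustrative.

```python
import math

n = 50
sample_mean = 3.572     # mean of RATE
b0 = 2.2                # hypothesized value under the null
sse_a = 0.720816 * n    # sum of squared deviations from the mean

# Standard deviation and standard error of the mean.
sd = math.sqrt(sse_a / (n - 1))
se = sd / math.sqrt(n)

t_stat = (sample_mean - b0) / se
print(f"t = {t_stat:.3f}")
print(f"t squared = {t_stat ** 2:.2f}")
```

The squared t agrees with the F obtained from the full-precision PRE, which is the sense in which the model comparison and the one-sample t-test are the same test.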
Why
You may now be asking why we went to all the trouble to explain this model comparison approach when all we did was test something identical to what could be tested with the one-sample t-test. The reason is that the model comparison approach can be generalized to investigating most inferential statistical questions. The one-sample t-test doesn't generalize.