Answers To Student Generated Questions and Comments
You can go directly to the graph created to show the interaction of VIQ with itself in predicting reading here.
The final class contest is now available.
On 1/24/96 I received the following e-mail request:
Dr. Hale:
Can you please remember to mention the text errata (from p. 100 --
formula) in class? From your class notes it is not clear what the entire
formula should read.
Thanks!
Answer
There is a trivial typographical error on page 100 of the text, in formula 5.4. The error is in the subscripts for the F. As listed in the text, they are:
1, n=1, alpha
They should be:
1, n-1, alpha
A Class Contest
On page 119 there is a problem in your text with the first equation. The first student to provide me with the correct equation will win an all-expense-paid trip to Kern for a hamburger and fries. The second place student will win just the hamburger (no fries). The data set the text is using is difficult to find; it is in Appendix B1. You can e-mail me your response at h12@psu.edu. The contest winners will be announced below.
David Hermann -- first place
JoAnn Bartley -- second place
On 1/26/96 I had the following question asked after class.
Do I have the following correct when you are talking about the different models?
Models and Parameters Involved
| Yi = bo + b1X | two parameters |
| Yi = bo | one parameter |
| Yi = b1X | one parameter |
| Yi = Bo | none |
You are correct.
On 1/30/96 I had the following suggestion from a student
>Just thought this site might be of use to others in the class. It
contains Stuffit Expander for both Windows (yeah!) and Mac (boo!)
machines. Just go to the freeware/shareware section of this page:
http://www.aladdinsys.com/#SFP
I tried some of the links to your Judd & McClelland data sets, and the
links worked fine. However when I attempted to use them in MYSTAT, they
appeared to be corrupted, similar to what happened in class.
hope this helps
I Responded
Thanks for the information. I believe I have another URL for Stuffit. It probably leads to your location. I've been able to download and use the files, and a couple of students with Stuffit have too. We will try another adventure in class tomorrow. I'll post your information on the Answers page.
Again thanks,
Bob
On 2/2/96 a student offered this suggestion
Subject: data sets
I have an idea if downloading the data sets doesn't work. I was in a
class where the professor sent us the data sets through e-mail. He just
attached the file and sent it to our access accounts. This would be
easier than carting around disks. Hope this might help.
I Responded
Great idea. I'll send the file to everyone as an attachment to a Eudora message. Those with IBM computers will just need to ignore the files.
On 2/7/96 a student asked
In my notes I have:
NewerrorC = (Rate-(3.8-.002*SDensity))^2
NewerrorA = (Rate-(3.572-.002*SDensity))^2
Then I have a statement about this controlling for density.
In the Web notes, it seems to say that if I regress the deviated rate on
the deviated density I will be controlling for density. Am I correct. And
if so, what are we doing above in the Newerror A and C.
I'm also a little bit at a loss when it comes to interpreting the two
different T values. I think I understand the difference. The top T helps
us determine if the constant is 0 and the bottom T helps us to determine if
the slope is 0. The problem is in using that information for anything.
Just wondering (wandering?) out loud.
Thanks
I Responded
First, remember that what we want to test in the end is whether the average criterion value (Rate) is different from 3.8.
Next, remember that the null hypothesis is always that the two models are equal to one another. The error statements above could be written this way:
NewerrorC = (Rate - MODELC)^2
NewerrorA = (Rate - MODELA)^2
Given the null hypothesis, we could write these two models as follows:
MODELC: Rate = B0 -.002*SDensity
MODELA: Rate = ß0 -.002*SDensity
Now you should see that we are testing whether B0 is equal to ß0. You also need to keep in mind that when you use mean deviated predictors, ß0 is the mean value of the criterion.
Both models contain the identical term -.002*SDensity. Therefore, SDensity is accounted for in both models, and any differences we detect between MODELC and MODELA can't be due to SDensity because it is contained in both. Also remember that the weight ß1 (-.002) does not change from the model where it was estimated with the raw data to the model where it is estimated using the mean deviated values. Therefore, Density is controlled.
Now the two MYSTAT t values. You are correct when you say that the top t tells us whether the constant ß0 is equal to zero. Put another way, it is telling us whether the mean criterion value (3.572) is different than zero. Looking at what we have done so far, this t isn't very helpful-- this is not what we want to detect. We want to test to see if ß0 is different than 3.8. Here is the trick that makes that top t value useful. We also can deviate the criterion score (Rate). If we subtract 3.8 from each of the Rate values (Rate-3.8), then zero on the criterion scale is where 3.8 used to be. This conforms to the question our null hypothesis is asking. The t value tests to see if ß0 is different than zero, but since zero is now where 3.8 was, we are testing whether ß0 is different than 3.8. This is what we wanted all along.
I've included some scatterplots to illustrate this using the Judd exhibit b1 data below. Note how the scatterplots using the raw data always have the same general shape, but when we deviate the predictor variable, the constant is the mean of the criterion (the top t is still of little value). When we deviate the criterion (using 3.8) the t is testing exactly what we want.
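As a rough illustration of the deviation trick described above, here is a minimal Python sketch. The data values and the helper name `simple_ols` are made up for illustration; they are not the Judd data, and this is not how MYSTAT produces its output. The sketch only shows where the constant ends up after each kind of deviation.

```python
from statistics import mean


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical values, NOT the data set used in class.
density = [50.0, 120.0, 200.0, 310.0, 420.0]
rate = [3.9, 3.7, 3.6, 3.4, 3.1]

# 1. Raw predictor, raw criterion.
b0_raw, b1_raw = simple_ols(density, rate)

# 2. Mean-deviated predictor: the slope is unchanged and the
#    constant becomes the mean of the criterion.
dev_density = [d - mean(density) for d in density]
b0_dev, b1_dev = simple_ols(dev_density, rate)

# 3. Also deviate the criterion around 3.8: the constant is now
#    (mean of Rate - 3.8), so a test of "constant = 0" is a test of
#    "mean Rate = 3.8".
dev_rate = [r - 3.8 for r in rate]
b0_both, b1_both = simple_ols(dev_density, dev_rate)

print(b0_raw, b1_raw)
print(b0_dev, b1_dev)
print(b0_both, b1_both)
```

In all three fits the slope is the same; only the meaning of the constant changes, which is why the top t becomes useful only in the third fit.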



On 2/12/96 I received the following
Dr. Hale,
Our syllabus states that our exams will be open book and open note. Is
that correct? I thought that I heard you state something differently in
class.
I am also still confused about deviated scores. In your example on 2-9
both scores were deviated, the third and second grade. How do I know
which scores to deviate? One, both, or not at all?
I Responded
Your exams are open book and note. I hope what I said was that there are "In Class" and "Take Home" portions. The "In Class" test doesn't involve many calculations, but you should know the PRE and F formulas. I would have those memorized.
Whether to deviate scores depends on what models you are comparing and what you want to do. In a simple regression, you wouldn't deviate anything. If these were the models, you would deviate just the predictor:
MODEL C: Y-hat = 0 + ß1X1
MODEL A: Y-hat = ß0 + ß1X1
On the other hand, if these were your models, you would deviate both:
MODEL C: Y-hat = 23 + ß1X1
MODEL A: Y-hat = ß0 + ß1X1
On 2/21/96 I received the following
>Dear Dr. Hale,
>Small problem here...
>I just finished transferring the School Referral data to the Mystat data
>sheet and all went well...
>However I now see that you wish us to substitute a variable AGGRES for a
>variable called HSRANK as we work out the models on 180-181.
>The problem is that AGGRES is not a defined variable in the School
>Referral data set.
>Its variables are:
>Group, ID, Psych, VIQ, PIQ, Read, Spell, Arith, Unx, Onx, Age, and Grade
>Am I slipping into another dissociative state?
>Help! =)
I Responded
There may be a problem with the data set on the net. I'll fix it right now. In the MYSTAT text (Macintosh version page 30), Aggress is defined correctly. Thanks for the information on this problem.
The variable labels have been fixed and should read MDT ID PSYCH VIQ PIQ FSIQ READ SPELL ARITH AGGRESS WITHDRAW AGE GRADE. Sorry for any confusion I might have caused.
A Second Class Contest
The first student responding correctly by e-mail to the following will win an all-expense-paid trip to my house, where I will personally prepare dinner for them and one guest. It won't be quite as nice as going to Martha Stewart's home, but I'll try to come close.
Using the School Referrals data, should the weights be identical for VIQ and PIQ, and should the weights be identical for AGGRESS and WITHDRAW, in predicting READ? Note that the weight for VIQ must equal the weight for PIQ, and the weight for AGGRESS must equal the weight for WITHDRAW, but the weight shared by VIQ and PIQ does not need to equal the weight shared by AGGRESS and WITHDRAW. The compact model might be written like this:
Reading = ß0 + ß1*VIQ + ß1*PIQ + ß2*AGGRESS + ß2*WITHDRAW + error
Your answer should include your decision, the PRE value, and the F statistic. You can e-mail your answer to me at h12@psu.edu. I'll post the winner's name below.
The winners of this contest were Terri Wall and Mark Toci -- nice jobs.
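For readers who want to check their own answer, the PRE and F asked for in this contest can be computed from the error sums of squares of the two models. The Python sketch below uses the usual model-comparison formulas; the function name `pre_and_f` and the SSE values are hypothetical and are not the contest answer. The parameter counts assume the compact model above (3 parameters: ß0, ß1, ß2) against an augmented model with a separate weight for each predictor (5 parameters), and n = 200 is an assumption about the number of cases.

```python
def pre_and_f(sse_c, sse_a, pc, pa, n):
    """Compare a compact model (C) with an augmented model (A).

    sse_c, sse_a: error sums of squares for models C and A
    pc, pa:       number of estimated parameters in each model
    n:            number of observations
    """
    pre = (sse_c - sse_a) / sse_c                    # proportional reduction in error
    f = (pre / (pa - pc)) / ((1 - pre) / (n - pa))   # F on (pa - pc, n - pa) df
    return pre, f


# Hypothetical SSE values, for illustration only.
pre, f = pre_and_f(sse_c=15000.0, sse_a=14800.0, pc=3, pa=5, n=200)
print(f"PRE = {pre:.4f}, F(2, 195) = {f:.3f}")
```

The resulting F would then be compared with the critical F for (PA - PC, n - PA) degrees of freedom to reach the decision.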
On 2/26/96 I received the following
Concerning the tables you presented at the end of class on Monday 2/26
(and included on class notes) where you "TEST FOR EFFECT CALLED":
Could you review how you are generating each one of those separate
tables?
Also, in reviewing past chapters from the text, I find that I'm confused
about calculating confidence intervals. On p. 100 (text) the formula
for calculating the confidence interval for ß0 uses "n" in the
denominator. On p. 131 (text) the formula for calculating ß1 uses
"SSX" in the denominator. In class, you outlined a procedure for
calculating a confidence interval as:
Coef +/- (critical t x Std Err)
How are all these related?
I Responded
Let me take the first question and start with the MYSTAT regression output and then the generated ANOVA table. With both in view, I will then explain how they were generated.
Below is the output generated using the School Referrals data and regressing ARITH on VIQ, PIQ, and AGGRESS.
Dep var: ARITH   N: 200   Multiple R: .632   Squared multiple R: .399
Adjusted squared multiple R: .390   Standard error of estimate: 9.212
| Variable | Coefficient | Std error | Std coef | Tolerance | T | P(2 tail) |
| CONSTANT | 34.595 | 4.256 | 0.000 | . | 8.129 | 0.000 |
| VIQ | 0.370 | 0.058 | 0.469 | 0.5696278 | 6.390 | 0.000 |
| PIQ | 0.164 | 0.056 | 0.216 | 0.5691935 | 2.942 | 0.004 |
| AGGRESS | 0.091 | 0.144 | 0.035 | 0.9989783 | 0.634 | 0.527 |
Analysis of Variance
| Source | Sum-of-squares | DF | Mean-square | F-ratio | P |
| Regression | 11061.646 | 3 | 3687.215 | 43.451 | 0.000 |
| Residual | 16632.229 | 196 | 84.858 | | |
Below is a full ANOVA table like the one I would suggest in class.
| Source | Sum-of-squares | DF | Mean-square | F-ratio | P | PRE |
| Regression | 11061.646 | 3 | 3687.215 | 43.451 | 0.000 | .399 |
| VIQ | 3464.943 | 1 | 3464.943 | 40.832 | 0.000 | .172 |
| PIQ | 734.483 | 1 | 734.483 | 8.655 | 0.004 | .042 |
| AGGRESS | 34.113 | 1 | 34.113 | 0.402 | 0.527 | .002 |
| Residual | 16632.229 | 196 | 84.858 | | | |
Notice that the lines for Regression and Residual are identical to the lines in the MYSTAT output above. Next notice that the p values for VIQ, PIQ, and AGGRESS are identical to the p values listed for the tabled t values in the MYSTAT output.
Next, the F ratios are simply the squares of the tabled t values in the MYSTAT output. Because these are squared t values, the DF entries for VIQ, PIQ, and AGGRESS are all 1. By using your calculator, the ttoF Calculator program in Netscape, or the EDPSY 507 program, you can calculate the PRE values and the Mean-square and Sum-of-squares columns: each Sum-of-squares is the F ratio times the Residual Mean-square, and each PRE is that Sum-of-squares divided by the sum of itself and the Residual Sum-of-squares. Note that you could also use the MYSTAT output to generate a row for Total by adding together the Regression and Residual rows in the Sum-of-squares and DF columns.
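The steps just described can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the ttoF Calculator or the EDPSY 507 program; the function name `anova_row` is made up. The t values, the Residual Mean-square, and the sums of squares are copied from the MYSTAT output above. Because the t values are rounded to three decimals, the sums of squares come out a few hundredths away from the tabled ones.

```python
# Values copied from the MYSTAT output above.
MS_RESIDUAL = 84.858
SS_RESIDUAL = 16632.229
SS_REGRESSION = 11061.646
T_VALUES = {"VIQ": 6.390, "PIQ": 2.942, "AGGRESS": 0.634}


def anova_row(t, ms_resid, ss_resid):
    """Turn one tabled t value into (F ratio, sum of squares, PRE)."""
    f_ratio = t ** 2               # F with 1 df in the numerator
    ss = f_ratio * ms_resid        # df = 1, so mean square == sum of squares
    pre = ss / (ss + ss_resid)     # reduction in error from adding this predictor
    return f_ratio, ss, pre


rows = {name: anova_row(t, MS_RESIDUAL, SS_RESIDUAL)
        for name, t in T_VALUES.items()}

# PRE for the whole regression: Regression SS over the Total SS.
pre_regression = SS_REGRESSION / (SS_REGRESSION + SS_RESIDUAL)

for name, (f_ratio, ss, pre) in rows.items():
    print(f"{name:8s} F = {f_ratio:7.3f}  SS = {ss:9.3f}  PRE = {pre:.3f}")
print(f"Regression PRE = {pre_regression:.3f}")
```

The PRE for the Regression row computed this way agrees with the Squared multiple R reported at the top of the MYSTAT output.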
I've run out of time and will answer the second question later.
We generated an interaction model with VIQ interacting with itself to predict reading. The regression equation was:
Read = 104.401 - 0.992*VIQ + .009*VIQ^2
I did not have time to demonstrate the graph of this equation. The following figure does so.

Notice that I have allowed VIQ to range from 0 to 145. In actuality, VIQ would only range from 45 to 145.
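For anyone who wants to reproduce the curve numerically, here is a small Python sketch that evaluates the regression equation above and its simple slope. The function names are made up for illustration; the coefficients are the ones in the equation. The slope formula is just the first derivative of the equation with respect to VIQ.

```python
B0, B1, B2 = 104.401, -0.992, 0.009   # coefficients from the equation above


def predicted_read(viq):
    """Predicted reading score at a given VIQ."""
    return B0 + B1 * viq + B2 * viq ** 2


def simple_slope(viq):
    """Slope of the curve at a given VIQ (first derivative w.r.t. VIQ)."""
    return B1 + 2 * B2 * viq


# The curve bottoms out where the simple slope is zero.
turning_point = -B1 / (2 * B2)

for viq in range(45, 146, 20):   # the realistic range of VIQ noted above
    print(f"VIQ = {viq:3d}  Read = {predicted_read(viq):7.3f}  "
          f"slope = {simple_slope(viq):6.3f}")
print(f"slope is zero near VIQ = {turning_point:.1f}")
```

Under these coefficients the slope changes sign at a VIQ in the mid-50s, so across most of the realistic range higher VIQ goes with higher predicted reading, and increasingly so.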
On 3/24/96 I received the following
Dr. Hale--I'm a little (actually, a lot) confused about exponents when
figuring out derivatives. I'm afraid I'm oversimplying & don't see where
they fit in. I'm looking at this process as taking the variable of
interest w/its appropriate b weights & that's the slope; everything else is
the constant. I've worked the problems (without looking at the answers) &
get the correct equations. Am I oversimplifying? Am I leaving out
something important & inadvertently doing the process incorrectly? Will I
need to use the exponent process in the future?
I Responded
It certainly doesn't appear that you are oversimplifying if you are getting the correct answers. This is the process.
You have a complicated regression equation that you need to convert into simple form. You know that this equation contains both a slope which is multiplied by the variable of interest and the constant. Unfortunately, they are mixed up within the complicated equation. You also know that if you take the first derivative of this complex equation with respect to the variable of interest that you solve for the slope. Now you know the slope. If you simply multiply the slope by the variable of interest you have the major part of the simple regression equation. Now if you subtract this same product (slope*variable of interest) from the original regression equation, you will be left with the constant. Therefore, we say that we add this product to and subtract it from the complex equation.
Now let's try a few derivatives of complex regression equations.
What is the derivative dy/dx of:
y = 2 + 2x + x^2
The answer is:
dy/dx = 2 + 2x
Now try this one:
What is the derivative of
y = 14x + 7x^2 + 2.3333x^3
The answer is:
dy/dx = 14 + 14x + 7x^2 (3 times 2.3333 rounds to 7)
If these didn't give you any problem, then you know what you are doing. If you are confused about what is going on, then please set up an appointment to see me.
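If you would like to check practice problems like these, the power rule they rely on (the derivative of c*x^k is k*c*x^(k-1)) can be sketched in Python. The function name `derivative` and the list-of-coefficients layout are choices made for this illustration only.

```python
def derivative(coeffs):
    """Differentiate a polynomial given as a list of coefficients.

    coeffs[k] is the coefficient of x**k, so [2, 2, 1] means 2 + 2x + x^2.
    The result uses the same layout for dy/dx.
    """
    # Power rule: c * x**k becomes k * c * x**(k - 1); the constant term drops out.
    return [k * c for k, c in enumerate(coeffs)][1:]


first = derivative([2, 2, 1])             # y = 2 + 2x + x^2
second = derivative([0, 14, 7, 2.3333])   # y = 14x + 7x^2 + 2.3333x^3

print(first)
print(second)
```

Each entry of the result is the coefficient on the matching power of x in dy/dx, starting with the constant.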
Here is the big one. Ten winners are possible!!! Each prize is an ice cream at the Penn State Dairy after class. Your problem is to use Helmert contrasts and effect code a 3X2X2 ANOVA problem. I have tried to indicate the group and group assignment in the figure below. Be sure to include all main and interaction effects.

Here is the solution to the contest.
On 4/29 the following questions and answers were exchanged.
Dr. Hale,
I have a few questions concerning #5 of the take home final regarding
discriminant analysis. I was hoping that there would be some helpful
information on the 507 page, but unfortunately, the server has been down
all weekend, so as of this moment, I still don't know!
Yes, the server was down. I discovered this early Monday morning and reset it. While this won't help now, in the future, if someone calls about problems like this, we can reset the server.
I've noticed that the frequency table at the end of the discriminant
analysis output has a total frequency of 300,
That is correct
whereas our data set has only 100.
That's correct; you have only partial data from this data set.
I had planned on doing a chi-square of the hit-rates versus the
actual races of the subjects in order to answer 5a, but I cannot do that if the analysis used 300 subjects.
I don't understand why you couldn't do the chi-square if you had the 300 subjects. Obviously you couldn't use the small data set you had, but you have all the values in the table. Fortunately, this is not what you need to do to answer the question. We discussed this when we went over the IRIS output in class. There is a single statistical calculation that tells you whether there is a significant ability to discriminate between the races with the variables. Later, the output tells you whether each of the possible discriminant functions does a significant job of discriminating between the groups.
Also with regards to 5a is using a chi-square a correct approach?
No, it is not necessary.
Or should I use another technique? Quite honestly, my notes from your demonstration on SYSTAT aren't complete with respect to answering this question, and others in the class that I know have had difficulty with this question as well. Any helpful comments or hints would be appreciated!
Thanks,
You are welcome.