production date 2/5/00

Inferential Statistics & Hypothesis Testing

Table of Contents (section: objective)
Hypotheses: Learn about the Null and Alternative Hypotheses.
Critical Z Values: Calculating Critical Z Values.
Using Statlets: Using Statlets to Calculate Critical Z Values.
Computer Project 17: Calculating Critical Z Values.
Z Statistic: Calculating the One-Case Z Statistic.
Power: Defining Power.
Final Notes: Reviewing the Normality Assumption of the Z Test.
Computer Project 18: Conducting a One-Case Z Test.
Additional Information: Discover other distributions.
Questions/Test: Take the End of Chapter Test.
Report: Send a Chapter Report to your Instructor.


Suppose you are a third-grade reading teacher and know that, in the population of third-grade readers, the mean on an achievement test is 100 and the standard deviation is 15 (mu = 100, sigma = 15). You have developed a new reading program. You place a single student in this reading program and later test the student. This student obtains a score of 122 on the achievement test. The question that must be answered is whether the reading program affected this youngster's score.

There are two conflicting statements to consider. The first is that the reading program had no effect on the student's reading ability, and the difference between the student's score (122) and the population mean (100) is simply due to chance in sampling. The second is that the reading program had an effect: the student's score (122) is so different from the population mean (100) that it would be exceedingly unlikely to occur by chance in that population.

To decide between these statements, we use an indirect proof: we assume one of the statements is true and then try to reject it. This line of reasoning is used in several fields other than statistics, including navigation and law. In law, for example, we assume that a person is innocent and reject that assumption only if the evidence is so strong that there is essentially no doubt that the assumption of innocence is wrong.

Our first inferential tests will be conducted on distributions that are normal in shape. The figure above, on the right, shows two normal distributions (A and B). Note that a score at line one has a high probability of being in either distribution A or B. A score at line two has a fairly high probability of being in distribution B, but a very low probability of being in distribution A. If you assumed that the score came from distribution A, you would reject this assumption if you found a score at line two's position and fail to reject it if the score was at line one's position.



Hypotheses   

In statistics, these conflicting statements are called hypotheses, and their truth is tested using hypothesis tests. There are two conflicting hypotheses. The hypothesis that states that there is no effect is called the null hypothesis. The statement that specifies an effect or a difference is called the alternative hypothesis. The null hypothesis is symbolized by Ho, while the alternative hypothesis is symbolized by H1.

These hypotheses can take either directional or nondirectional forms. Consider again the reading teacher's experiment. If the teacher simply wishes to test whether the reading program changed the student's reading, the null hypothesis would state that the reading program did not change the score. Stated another way, this student's reading score comes from a population with a mean of 100. The alternative hypothesis would state that there was a change; that is, the youngster's score did not come from a population where the mean is equal to 100. The alternative hypothesis could be true if the youngster's score came from a population where the mean was greater than 100 or from one where the mean was less than 100. The direction of the difference (positive or negative with respect to 100) would not matter. These hypotheses are normally stated using symbols instead of words. In symbol form, these nondirectional null and alternative hypotheses are written as below.

Ho: µ = 100
H1: µ ≠ 100

Again, the null hypothesis simply says that the population mean for this student is equal to 100 (no change). The alternative hypothesis states that the population mean for this student is not equal to 100 (a change where direction is not important).

If the teacher were only concerned with detecting whether the reading program increased the youngster's reading score, then we would have a directional test. A good rule in the directional situation is to always write the alternative hypothesis first. In this case, the alternative hypothesis would state only that the child's score increased, that is, that the population mean from which the child's score came was above 100. The null hypothesis must then capture all the other possibilities. Written with symbols, the null and alternative hypotheses are as follows.

Ho: µ ≤ 100
H1: µ > 100

As a general rule, nondirectional hypotheses are preferred unless there are very strong theoretical reasons for predicting a direction. However, when directional hypotheses can be justified, they allow more powerful tests; we will see why power increases in directional tests later. In this learning environment (as well as in real life), nondirectional tests are usually used.

Statistical testing involves determining whether Ho can be rejected. As Sir Ronald Fisher stated in 1935:

In relation to any experiment we may speak of this hypothesis as the "null hypothesis," and it should be noted that the null hypothesis is never proved or established, but is possibly disproved, in the course of experimentation. Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis.

This rejection is based on a conditional probability [p(result | Ho is true)]: the probability of obtaining a result at least as extreme as the one observed, given that the null hypothesis is true. We reject the null hypothesis if this probability is small (our result would be unlikely if the null hypothesis were true). We fail to reject the null hypothesis if this probability is not small (our result could easily happen if the null hypothesis were true). In a sense, researchers only make a decision when they reject the null hypothesis. If they fail to reject the null hypothesis, they have not really made a decision, because they never accept the null hypothesis; they only fail to reject it.
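To make this conditional probability concrete, here is a minimal Python sketch for the reading example. It assumes Ho is true, so scores come from a normal population with mu = 100 and sigma = 15, and it takes "result" to mean a score at least as far from 100 as 122 is, in either direction (a nondirectional question). It uses `statistics.NormalDist` from the Python 3.8+ standard library; the variable names are illustrative only.

```python
from statistics import NormalDist

# Population assumed under Ho for the reading example: mu = 100, sigma = 15.
null_population = NormalDist(mu=100, sigma=15)

score = 122
distance = abs(score - null_population.mean)  # 22 points from the mean

# Probability of a score at least this far from 100 in either direction,
# computed as if Ho were true (both tails, because the question is nondirectional).
lower_tail = null_population.cdf(null_population.mean - distance)      # P(X <= 78)
upper_tail = 1 - null_population.cdf(null_population.mean + distance)  # P(X >= 122)
p_given_ho = lower_tail + upper_tail

print(f"p(score at least 22 points from 100 | Ho) = {p_given_ho:.4f}")
```

The probability comes out well above .05, so a score of 122 is not a rare event under Ho.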

The Decision Box figure below illustrates the decisions that can be made when the null hypothesis is either true or false. The upper left and lower right boxes contain the correct decisions. This is what is wanted: researchers want their decisions to be correct. The decisions in the upper right and lower left boxes are errors, called type 1 and type 2 errors, respectively. Type 1 errors are symbolized with the Greek letter alpha, and type 2 errors with the Greek letter beta. Type 1 and type 2 errors are inversely related: for a given experiment (the same sample size and the same true difference), as the likelihood of a type 1 error decreases, the likelihood of a type 2 error increases.



One of the important properties of making decisions with inferential statistical procedures is that you get to set the level of your type 1 error. You know how often you will reject the null hypothesis when it is true. This type 1 error rate is often called the alpha level, alpha risk, significance level, p level, or simply the probability level.

The most typical values for type 1 errors are .05, .01, and .001. These are very low levels. With alpha = .05, for example, we will make a type 1 error at most 5 times out of every 100 tests in which the null hypothesis is actually true.

This is a very important advantage of statistical decision making: a researcher using statistical techniques decides in advance how often they will make a type 1 error, that is, wrongly reject a true null hypothesis. This makes statistical decision making unique. There are almost no other areas of human decision making where the error rate of the decision process is established before the decision is made. Usually, humans make a decision and then simply wait to see whether they were correct, sometimes with disastrous results.
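One way to see that alpha really fixes the type 1 error rate in advance is a small simulation. The Python sketch below assumes Ho is true by construction (every z value is drawn from a standard normal distribution) and applies a nondirectional test with critical values of ±1.96; the share of false rejections should land near .05. The function name `simulate_type1_rate` and its defaults (trial count, seed) are illustrative choices, not part of any package.

```python
import random

def simulate_type1_rate(critical_value=1.96, trials=100_000, seed=1):
    """Draw z values under a true null and return the share that are rejected."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    rejections = 0
    for _ in range(trials):
        z = rng.gauss(0.0, 1.0)        # Ho is true: z comes from the standard normal
        if abs(z) > critical_value:    # nondirectional test at alpha = .05
            rejections += 1
    return rejections / trials

rate = simulate_type1_rate()
print(f"Share of true null hypotheses rejected: {rate:.4f}")
```

Changing `critical_value` to a larger number (a smaller alpha) lowers the simulated rate accordingly.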

The Legal Decision Box below is equivalent to the Statistical Decision Box above, but substitutes law for statistics. Note that only the column and row headings have been changed. Also note that in courts of law, as in statistics, type 1 errors are set low: society does not want an innocent person sent to jail. To keep type 1 errors very low, our legal system has been designed to tolerate a higher rate of type 2 errors. The way juries are instructed makes it more likely that their decisions will let the guilty go free (type 2 error) than convict the innocent (type 1 error).




Critical Z values    

By far the most common value that scientists set for the alpha level (type 1 error) is .05, and it is also the largest value normally used. Using that value, researchers can mark off regions on the normal curve. They want to reject the null hypothesis no more than 5% of the time when it is true, so they fail to reject it whenever a value falls in the central region that contains 95% of the distribution under Ho. Recall from the Empirical Rule that the mean ±2z contains about 95% of the scores. Using the Z table or the other applets found in Chapter 6, you can look up the exact value: instead of ±2, the correct value is ±1.96z. You should memorize this value. Thus, if we assume a normal distribution and that the null hypothesis is true (Ho: mu = 100), any z value we calculate between -1.96 and +1.96 occurs frequently. Z values beyond ±1.96 occur 5% of the time or less. These are rare values, and if we find one, we reject the null hypothesis. With alpha = .05, the values ±1.96 that mark off these rare regions are called critical values (zcv). The figure directly below illustrates this situation.
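The difference between the Empirical Rule's ±2 and the exact ±1.96 can be checked numerically. This Python sketch (standard-library `statistics.NormalDist`, Python 3.8+) computes the area of the standard normal curve between the two bounds; the helper name `central_area` is illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1: the z distribution

def central_area(bound):
    """Proportion of the standard normal curve between -bound and +bound."""
    return std_normal.cdf(bound) - std_normal.cdf(-bound)

empirical_rule = central_area(2.0)  # the Empirical Rule's "about 95%"
exact_region = central_area(1.96)   # the region bounded by the critical values

print(f"Between -2 and +2:       {empirical_rule:.4f}")
print(f"Between -1.96 and +1.96: {exact_region:.4f}")
print(f"Beyond +/-1.96 (alpha):  {1 - exact_region:.4f}")
```

The ±2 region holds slightly more than 95% of the curve, which is why ±1.96 is the value to memorize.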




Using Statlets    

Statlets can be used to find critical values when the type 1 error is .05, as well as in situations where the test is directional and/or a different type 1 error value is used.

Alpha .05, Nondirectional Test

To have Statlets calculate the critical value for the Z score in a normal distribution, first select the Plot/Probability Distributions procedure and then select the Normal distribution. By default, this procedure uses a mean of zero and a standard deviation of one, which are the values needed for Z scores. Confirm these values by clicking the PDF tab and, if necessary, change them using the Options button.

If the type 1 error is set at 5% and the test is nondirectional, then 2.5% of the normal distribution lies in each tail. To find the critical values for this test, click the Critical Values tab, click the Options button, and set the Tail Areas to 0.025 and 0.975, as shown in the figure below.

The first tail area, 0.025, has Statlets calculate the critical z value that has 2.5% of the distribution below it. The second tail area, 0.975, has Statlets calculate the critical z value that has 97.5% of the distribution below it, and therefore 2.5% above it. The two tails together (2.5% below the lower value and 2.5% above the upper value) make up the type 1 error of 5%. The calculated values of -1.95997 and 1.95997 are shown in the figure above.
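Outside Statlets, the same critical values come from the inverse of the cumulative normal distribution. This Python sketch assumes, as the text describes, that each Tail Area is the proportion of the distribution below the critical value, and uses `NormalDist.inv_cdf` from the standard library. Expect agreement with the Statlets figures to about four decimal places; the last digit can differ because of rounding.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Tail Areas as entered in the Statlets Options dialog:
# the proportion of the distribution below each critical value.
tail_areas = (0.025, 0.975)
critical = [std_normal.inv_cdf(p) for p in tail_areas]

print([round(c, 5) for c in critical])
```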

Alpha .05, Directional Test

To have Statlets calculate the critical value for alpha = .05 and a directional test, the direction of the test must be determined first. Suppose that the null and alternative hypotheses were written as:
Ho: µ ≤ 100
H1: µ > 100
Hypotheses stated in this manner indicate that all 5% of the type 1 error should be placed in the upper tail. Stated conversely, 95% of the normal distribution lies below the critical z value, so the Tail Area is set to 0.95. The correct tail area value and the calculation of the critical z value from the Plot/Probability Distributions procedure are shown directly below.


Alpha .01, Nondirectional Test

The correct tail area values and the calculation of the critical z values when the type 1 error is set to .01 and the test is nondirectional are shown in the figure below. Remember that because the test is nondirectional, the type 1 error must be divided equally between the two tails, giving Tail Areas of 0.005 and 0.995.


Alpha .001, Directional Test

Finally, the figure directly below shows the correct tail area value and the calculated critical z value when the type 1 error is set at .001, the test is directional, and the alternative hypothesis is:
H1: µ > 100
Remember that because the test is directional, all of the error is placed in one tail, here the upper tail (a Tail Area of 0.999).

If the alternative hypothesis had instead been written as:
H1: µ < 100
then 0.001 would have been entered as the Tail Area value using the Critical Values' Options button, and the calculated critical z value would have been -3.09024.
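The four Statlets examples in this section follow one rule: split alpha between both tails for a nondirectional test, or put all of it in the tail named by H1 for a directional test. The Python sketch below collects that rule in one function. `critical_z` and its `tail` labels are hypothetical names chosen for this illustration; results should match the Statlets values to about four decimal places.

```python
from statistics import NormalDist

def critical_z(alpha, tail="two"):
    """Critical z value(s) for a given type 1 error rate.

    tail="two"   -> nondirectional (H1: mu != value), alpha split between tails
    tail="upper" -> directional   (H1: mu > value), all of alpha in the upper tail
    tail="lower" -> directional   (H1: mu < value), all of alpha in the lower tail
    """
    std_normal = NormalDist()
    if tail == "two":
        cv = std_normal.inv_cdf(1 - alpha / 2)
        return (-cv, cv)
    if tail == "upper":
        return (std_normal.inv_cdf(1 - alpha),)
    if tail == "lower":
        return (std_normal.inv_cdf(alpha),)
    raise ValueError("tail must be 'two', 'upper', or 'lower'")

cases = [(0.05, "two"), (0.05, "upper"), (0.01, "two"), (0.001, "upper"), (0.001, "lower")]
for alpha, tail in cases:
    print(alpha, tail, [round(c, 5) for c in critical_z(alpha, tail)])
```

Note that the directional critical value for alpha = .05 (about 1.645) is closer to zero than the nondirectional one (1.96), which is one way to see why directional tests have more power.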


Computer Project 17   

Use the Statlets' procedure Plot/Probability Distributions to determine the critical z value for a directional test where the alternative hypothesis is:
H1: µ < 200
and the type 1 error rate is set at .025
If your instructor requests, submit the project 17 report.


The Z statistic    

Finally, we are ready to formally calculate the z statistic for a single case. This is no different from calculating the z score for a subject, which you have done previously. The only assumption we must make is that the original population data are normally distributed. If the population is not normally distributed, then the percentages above, below, and between the z values may be dramatically different from those given for normal distributions. Assuming a normal distribution, you test whether the one-subject score is different from the hypothesized population mean.

Every statistic calculated from now on in this environment involves six separate steps. Return to the reading teacher example, where the teacher knows that the population of readers has a mean of 100 and a standard deviation of 15 on the test. A child is placed in a new reading program, and the teacher wants to know whether the program changed the child's reading score. Because the teacher is looking only for a change, and not for a particular direction of change, this is a nondirectional experiment.

Step 1 Specify the null and alternative hypotheses

Ho: µ = 100
H1: µ ≠ 100

Step 2 Set the probability level

If you are doing hand calculations, also find the critical values for your statistic. Let's set alpha = .05.
For a nondirectional test, this always sets the z critical values at ±1.96.
If you set other type 1 (alpha) error values, or use a directional test, use Statlets to find the critical value(s).

Step 3 Collect the data

In this case, the data are simple: our child scored 122 on the reading test.

Step 4 Calculate the statistic

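In this case, we calculate the z statistic with the equation below.

z = (X - µ) / σ

where X is the individual's score, µ is the population mean stated in Ho, and σ is the population standard deviation.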

For our example, z = (122 - 100)/15 = 22/15 ≈ 1.467.

Step 5 Make a decision concerning the null hypothesis

In this case, because z = 1.467 falls between -1.96 and +1.96 (the fail-to-reject region), we fail to reject Ho.

Step 6 Write a summary statement

These summary statements have an English component and a summary of the statistical analysis. For example, in this case we might report in our article: "This child's score was not found to be significantly different from the population mean of 100 (z = 1.467, p > .05)."
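The calculation and decision steps for the reading example fit in one small Python function. This is a sketch only: `one_case_z_test` is a hypothetical name (Statlets has no such procedure), it handles only the nondirectional case, and it assumes a normal population with known mu and sigma, as the text requires. Besides the decision, it reports the two-tailed probability of a result at least as extreme as the observed one, given Ho.

```python
from statistics import NormalDist

def one_case_z_test(score, mu, sigma, alpha=0.05):
    """Nondirectional one-case z test; assumes a normal population with known mu and sigma."""
    std_normal = NormalDist()
    z_cv = std_normal.inv_cdf(1 - alpha / 2)   # Step 2: critical values are -z_cv and +z_cv
    z = (score - mu) / sigma                   # Step 4: the z statistic
    p = 2 * (1 - std_normal.cdf(abs(z)))       # two-tailed p(result at least this extreme | Ho)
    reject = abs(z) > z_cv                     # Step 5: decision about Ho
    return z, z_cv, p, reject

# Step 3: the data (one child's score), with Ho: mu = 100 and sigma = 15.
z, z_cv, p, reject = one_case_z_test(score=122, mu=100, sigma=15)
decision = "reject Ho" if reject else "fail to reject Ho"
print(f"z = {z:.3f}, critical values = +/-{z_cv:.2f}, p = {p:.3f}: {decision}")
```

For the reading example this reproduces the decision above: z is about 1.467, p is well above .05, and Ho is not rejected.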


Power    

Before proceeding, let's review some definitions and give a formal definition of power. The probability of a type 1 error is set by the researcher and is called the alpha risk. The probability of a type 2 error is symbolized as beta; other things being equal, it increases as the alpha level decreases. The power of a statistical test is its ability to detect a difference when one exists, that is, the probability of rejecting Ho when Ho is false. Note that this is a conditional probability: the probability of rejecting Ho, given that Ho is false. Because beta is the probability of failing to reject Ho when it is false, power must be 1 - beta. The figure below illustrates this point.



There are many books and mathematical procedures that allow scientists to estimate the power of an experiment. We will learn to use Statlets to do this in the following chapters. Before learning how to calculate power, however, here are the factors that affect the power of an experiment.

  1. As alpha gets smaller, power decreases.
  2. As the difference between Ho and H1 increases, power increases.
  3. As the sample size in the experiment increases, power increases.
  4. Directional hypotheses lead to more power (when the true difference is in the predicted direction).
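The four factors can be checked numerically. The Python sketch below computes power = 1 - beta for a z test of Ho: mu = mu0 when the true mean is mu1, assuming a normal population with known sigma. Two assumptions go beyond this chapter's one-case test and are labeled as such: for n > 1 the test is taken to use the mean of n scores with standard error sigma / sqrt(n), and the directional version assumes H1 predicts mu1 > mu0. The function name `z_test_power` and the example numbers are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()

def z_test_power(mu0, mu1, sigma, alpha=0.05, n=1, directional=False):
    """Probability of rejecting Ho: mu = mu0 when the true mean is mu1.

    Assumption: with n > 1 the test uses the mean of n scores (standard error sigma / sqrt(n)).
    Assumption: the directional version tests H1: mu > mu0.
    """
    shift = (mu1 - mu0) / (sigma / n ** 0.5)  # center of the z statistic when mu1 is true
    if directional:
        z_cv = std_normal.inv_cdf(1 - alpha)
        return 1 - std_normal.cdf(z_cv - shift)
    z_cv = std_normal.inv_cdf(1 - alpha / 2)
    # Nondirectional: a rejection can happen in either tail.
    return (1 - std_normal.cdf(z_cv - shift)) + std_normal.cdf(-z_cv - shift)

base = z_test_power(100, 115, 15)  # true mean one standard deviation above 100
print(f"baseline: power = {base:.3f}, beta = {1 - base:.3f}")
print("1. smaller alpha:    ", round(z_test_power(100, 115, 15, alpha=0.01), 3))
print("2. larger difference:", round(z_test_power(100, 130, 15), 3))
print("3. larger sample:    ", round(z_test_power(100, 115, 15, n=4), 3))
print("4. directional test: ", round(z_test_power(100, 115, 15, directional=True), 3))
```

When mu1 equals mu0 (Ho is actually true), the function returns alpha itself, which ties power back to the type 1 error rate.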

Final Notes    

In using the z score to test hypotheses, you must assume that the original population is normally distributed. The z-score probabilities provided in this environment are valid only for normal distributions. If the population data are not normally distributed, then you should use distribution-free statistics instead.


Computer Project 18   

The average score on a psychology test administered to students entering the School Psychology Program at State University is 32, with a standard deviation of 6.2. Is a score of 42 so different that it would not be expected to occur by chance? This is a nondirectional test with alpha set at .03. If your instructor requests, submit the project 18 report.

Statlets does not have a specific procedure to calculate the one-case Z statistic. This is not unusual: almost no statistical software programs perform this calculation, because it is so easy to do by hand. So, for this project, a hand calculation of the Z statistic is necessary. Use the Statlets procedure Plot/Probability Distributions to determine the critical Z values for a nondirectional test where the alpha level is set at .03. In the past, such an unusual type 1 error rate would almost never have been chosen: without software to calculate critical values, researchers relied on tables, and an alpha of .03 was typically not found in them. With Statlets, researchers can use any type 1 error value they want. However, because alpha values of .05, .01, and .001 are so commonly used, you are unlikely to see this value used elsewhere.




Additional Information    

When the alpha level is set for an inferential test, the confidence level for the test is also set (confidence level = 1 - alpha). In Computer Project 18, an unusual alpha level (.03, corresponding to a confidence level of .97) was set for the hypothesis test. As was noted in that section, alpha levels of .05, .01, and .001 are quite typical. In Statlets, you can set the default confidence level, and thus the alpha level, for all your statistical tests by changing the Preference settings; read the user manual section on changing Statlets' preferences. Note that if you set a confidence level of .90, the alpha level for your tests will be .10. Likewise, if you set a confidence level of .95, alpha is .05, and a confidence level of .99 automatically sets an alpha level of .01. Making sure the confidence level is set where you want it is important, because it affects how the software interprets P values.

There is also considerable controversy about whether hypothesis testing is even appropriate. You should read some of the articles in the Fall 1998 issue of Research in the Schools.

Questions/Test    

This link allows you to take a computer scored end-of-chapter test. If your instructor requests to see the results of this examination, you can either copy and e-mail or print the feedback you will receive immediately after taking the test.

Report    

Please send a report indicating your understanding of this chapter to your instructor. You will need to know both your and your instructor's e-mail addresses.