production date 11/9/00

Measures of Central Tendency

Table of Contents and Objectives
Mode: Mode = most frequent score.
Median: Median = middle score.
Mean: Mean = average score.
Skew Revisited: Learn more about skew.
What to Report When: Learn when to report the mean, median, or mode.
Lies Lies Lies: Telling lies with statistics.
Additional Information: Discover interesting Web links.
Computer Project 7: Using Statlets to calculate means, medians, and modes.
Computer Project 8: Calculating means, medians, and modes for Obedience.
Central Tendency for multiple variables: Using the Statlets Summarize/Statistics procedure.
Computer Project 9: Calculating means, medians, and modes for multiple variables.
Questions/Test: Take the End of Chapter Test.
Report: Send a Chapter Report to your Instructor.


Another way to summarize data, in addition to the graphical methods you learned in Chapter 3, is to select a single number to stand for, or represent, the entire data set. Measures of central tendency indicate where the center or the most typical score in a distribution resides. There are three common measures of central tendency: mode, median, and mean.


Mode   

The mode is the most frequently occurring score in a distribution. There is no formula for the mode; it is found simply by inspecting the frequency distribution. In the distribution (shown on the left) the mode equals 8.

The mode is not the most useful of the three measures of central tendency, at least for conducting further statistical analyses. First, the mode is not a stable measure: different samples from the same population will often have different modes. Remember, one of the major advantages of statistical analysis is that values calculated from samples (statistics) can be used to estimate the same property of the population (the population parameter). If the sample mode changes from sample to sample, it cannot be a good estimate of the population mode. A second problem is that modes are dead ends as far as further statistical calculations are concerned. After we have found the mode, we are done with it. Modes are rarely used in other calculations.

However, there are situations when modes are preferred as the measure of central tendency. When the variable is measured at the nominal level, modes are usually reported. Modes are also useful if several modes exist in a sample or population. Finally, if you must use one measure of central tendency to represent all the values in your sample, and error is defined simply as being incorrect, there is no better representative measure than the mode.

For example, suppose a psychologist were looking at a data set listing the handicapped children in a large urban school system, where each record consisted of the child's name and the diagnostic label given to the youngster (e.g., Emotionally Disturbed, Mentally Retarded, Learning Disabled). The mode for the diagnostic label would be the label that occurred most frequently. If the psychologist did not have the list available but knew that most of the children were Learning Disabled (the mode), then simply calling every child Learning Disabled would be incorrect less often than choosing any other label.

Count of Errors

When error is defined as simply being incorrect when you guess a variable's value, you measure your error by simply counting how many times you are incorrect. The Count of Errors page illustrates how selecting the mode as the one value to represent your data minimizes error defined in this manner.
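
As a rough illustration of the count-of-errors idea, the Python sketch below guesses one label for every case and counts the misses. The label counts are hypothetical, invented for this illustration, and count_errors is a helper name that does not come from the Count of Errors page.

```python
from collections import Counter

# Hypothetical diagnostic labels for 10 children (made-up counts).
labels = (["Learning Disabled"] * 5
          + ["Emotionally Disturbed"] * 3
          + ["Other"] * 2)

def count_errors(guess, data):
    """Count how many observations differ from the single guessed value."""
    return sum(1 for value in data if value != guess)

# The mode is the most frequently occurring label.
mode = Counter(labels).most_common(1)[0][0]

# Number of wrong guesses for each candidate label.
errors = {label: count_errors(label, labels) for label in set(labels)}

print(mode)
print(errors)
```

In this made-up sample, guessing the mode for every child is wrong 5 times, while guessing either of the other labels is wrong 7 or 8 times.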

Major and Minor Modes

Some textbooks define major and minor modes in a distribution. However, this distinction is not definitionally correct. The figure on the left shows a distribution that some statisticians would say has a major and a minor mode. There are clearly two distinct humps, but only one score has the highest frequency.

Using Statlets to calculate the Mode

You can use this Statlets page to quickly calculate the mode for a data set. Simply type the data into the spreadsheet, click the Stats tab, and then click the Options button. Make sure that the Mode is selected as one of the statistics to calculate in the Options dialog.

Median   

The median is the middle score in a distribution in terms of frequencies. If a distribution has 5 scores then 2.5 of those frequencies are above the median and 2.5 of the frequencies are below the median. If there are 62 scores in a distribution then 31 of those frequencies are below and 31 of the frequencies are above the median. The median is the halfway point in a distribution in terms of frequencies.

Some textbooks go into great detail describing how to calculate the median, presenting quite complicated formulas. Actually, medians are easy to calculate if you always remember that they are the halfway point in terms of frequencies, and that they minimize the sum of absolute error. The sum of absolute error will be explained later in this section. First, because it is easy to visualize, we will describe how to calculate the median when each value in a distribution occurs only once. When this happens we have an untied distribution (no two values are identical). With any distribution, the number of frequencies may be either odd or even. The median formula changes slightly depending on whether the number of frequencies is odd or even.

Frequencies odd and untied

The figure on the left shows a frequency distribution with an odd number of frequencies (5) in which no two people have the same score (untied). By looking at this distribution you can tell that the middle score is 13.

If the number of frequencies is odd, then the median is the score at position (N + 1)/2 when the scores are ordered from lowest to highest. In this example, N = 5 (for the 5 frequencies), so the median is the score at position (5 + 1)/2 = 3. This value is 13. Remember, the median is a score value and not a frequency.

Frequencies even and untied

The figure on the left shows a frequency distribution with an even number of frequencies (6) in which no two people have the same score (untied). By looking at this distribution one can see that the median must be halfway between 17 and 18. Thus, the median is 17.5.

When the number of frequencies is even, the median is the average of two scores: the score at position N/2 in the ordered list and the score at position (N/2) + 1. In this example, N = 6, so the median is the average of the 3rd and 4th scores, 17 and 18, which is 17.5.
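
The two positional rules for untied data can be sketched in Python as follows. The score lists are hypothetical, chosen only so that their medians match the values quoted in the text (13 and 17.5); the actual scores in the figures are not reproduced here, and median_by_position is an illustrative name.

```python
def median_by_position(scores):
    """Median via the positional rules: (N + 1)/2 for odd N,
    the average of positions N/2 and N/2 + 1 for even N."""
    ordered = sorted(scores)
    n = len(ordered)
    if n % 2 == 1:
        position = (n + 1) // 2        # 1-based position of the middle score
        return ordered[position - 1]   # convert to a 0-based index
    lower = ordered[n // 2 - 1]        # score at position N/2
    upper = ordered[n // 2]            # score at position N/2 + 1
    return (lower + upper) / 2

odd_scores = [11, 12, 13, 14, 15]        # hypothetical, N = 5
even_scores = [15, 16, 17, 18, 19, 20]   # hypothetical, N = 6

print(median_by_position(odd_scores))    # 13
print(median_by_position(even_scores))   # 17.5
```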

Tied Frequencies

This figure shows a data set with tied frequencies. This frequency distribution has lines drawn indicating each class interval. Each line is drawn on a limit. For example, the line below the score of 18 is at 17.5, its lower limit, and the line above 18 is at 18.5, its upper limit. The class interval size is the difference between a score's upper and lower limits. In this case each class interval is one unit long. Cumulative frequencies (cf) are written on the right side of the distribution. Remember from Chapter 3 that cumulative frequencies should be written on the lines and not in the spaces between the lines.

From the figure we see that there are a total of 27 frequencies in this distribution. By definition, the median must have half the frequencies above it and half below it, so here it has 13.5 frequencies on each side. If we work our way up the cumulative frequency column, we find that 13.5 falls between the 12 frequencies associated with the limit of 17.5 and the 18 frequencies associated with the limit of 18.5. Thus, the median is somewhere between 17.5 and 18.5.

There is little difficulty in calculating the median. We have a distribution with an odd number of frequencies (27), so the rule for odd frequencies given above applies. First add one to the number of frequencies (27 + 1 = 28), then divide by 2 (28/2 = 14). The score associated with the 14th frequency is 18, so the median is 18.

Important Steps

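Below are the steps for calculating a median:

1. Order the scores from lowest to highest.
2. Count the number of frequencies (N).
3. If N is odd, the median is the score at position (N + 1)/2.
4. If N is even, the median is the average of the scores at positions N/2 and (N/2) + 1.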




Most computer programs use these same rules to calculate the median. Below is the stem and leaf output from a popular computer software package. The median is reported on the "Median is:" line of the output.
Stem and leaf plot of variable:    SCORE    , N =    27

Minimum is:       15.000
Lower hinge is:       16.500
Median is:        18.000
Upper hinge is:       19.000
Maximum is:       20.000

              15   000
              16 H 0000
              17   00000
              18 M 000000
              19 H 00000
              20   0000


Note that the value of the median (18) is the same as the value just calculated.
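
The cumulative-frequency lookup used for the tied data can be sketched in Python. The frequencies are read off the stem and leaf output above; the variable names are illustrative only, and the sketch handles just the odd-N case that applies to this example.

```python
# Score -> frequency, read from the stem and leaf output above.
frequencies = {15: 3, 16: 4, 17: 5, 18: 6, 19: 5, 20: 4}

n = sum(frequencies.values())   # total number of frequencies
target = (n + 1) // 2           # odd N: position of the middle score

cumulative = 0
median = None
for score in sorted(frequencies):
    cumulative += frequencies[score]   # cumulative frequency through this score
    if cumulative >= target:           # the target position falls in this score
        median = score
        break

print(n, target, median)   # 27 14 18
```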

Sum of Absolute Differences

Another way to define error, instead of simply counting whether you are correct or incorrect, is by summing the absolute differences between the value selected to represent all the data, and each variable's value. The Sum of Absolute Differences page shows that if error is defined in this manner, then the median would minimize error.
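
A small Python sketch can make this concrete. The scores below are hypothetical, and the search over whole-number candidates is only an illustration of the idea, not the method used on the Sum of Absolute Differences page.

```python
from statistics import median

def sum_abs_differences(center, data):
    """Sum of absolute differences between one representative value and each score."""
    return sum(abs(x - center) for x in data)

scores = [1, 2, 3, 4, 100]    # hypothetical scores with one extreme value
candidates = range(1, 101)    # try every whole number from 1 to 100

# Candidate with the smallest sum of absolute differences.
best = min(candidates, key=lambda c: sum_abs_differences(c, scores))

print(best, median(scores))                # 3 3
print(sum_abs_differences(3, scores))      # 101 (at the median)
print(sum_abs_differences(22, scores))     # 156 (at the mean, 110/5 = 22)
```

For these scores the median gives a smaller sum of absolute differences than the mean does.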

Using Statlets to calculate the Median

You can use the same Statlets page to quickly calculate the median. Simply type the data into the spreadsheet, and click the Stats tab. If the median is not displayed in the output, make sure it is selected in the Options button's dialog.


Mean   

The single most used measure of central tendency is the mean. The mean is the score from which all the algebraic deviations sum to zero. It is often called the simple average. There are two different symbols for the mean, depending on whether it is a population parameter or a sample statistic. The equations for these means are shown below.

Mean Equations

Population mean: $\mu = \frac{\sum x}{N}$

Sample mean: $\bar{x} = \frac{\sum x}{n}$

Inspecting mean formulas

The new symbol in these equations ($\Sigma$, the capital Greek letter sigma) simply means to add up all of what follows.

$\sum x$ can be read as "add all the x values together." "N" stands for all the members of the population, while "n" stands for all the members of the sample. Throughout this learning environment, we use N when considering population size and n when using a sample size in an equation.

A few quick calculations are in order. For this sample data set {10, 9, 8, 7, 6, 5}, when we add all the numbers together we get $\sum x = 45$.

When we divide 45 by the number of scores (6) we find the mean equal to 7.5.

x     f
10    2
9     4
8     6
7     3
6     2
Using the data set shown in the frequency table above, calculate the mean by adding all the numbers together and dividing by the number of scores. Add two tens, four nines, six eights, and so on, for a total of 137. Then divide by 17, the number of scores that were added together.

The mean is equal to 8.0588.
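
The same calculation can be checked with a short Python sketch. The frequencies come from the table in the text; the variable names are illustrative.

```python
# Score x -> frequency f, from the frequency table in the text.
frequencies = {10: 2, 9: 4, 8: 6, 7: 3, 6: 2}

n = sum(frequencies.values())                        # number of scores
total = sum(x * f for x, f in frequencies.items())   # each score counted f times
mean = total / n

print(n, total, round(mean, 4))   # 17 137 8.0588
```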

Mean Calculators

Working several mean calculations will provide the practice necessary to calculate means accurately. This environment provides several tools for making those calculations. The first tool is the calculator built into your computer. Don't forget it; it will allow you to calculate means quickly and efficiently. There is also a Mean Calculator page that allows you to calculate means for data sets that have 10 or fewer values. This page can be used for small problems like those you might find on an examination. The same Statlets page can also be used to calculate means. Simply type the data into the spreadsheet and click the Stats tab. If the mean is not displayed, make sure it is selected in the Options button's dialog. Finally, for larger problems, when you may want to do additional statistical analyses or copy and paste data instead of typing it directly into the spreadsheet, you can use the Statlets menu applet found by clicking the Statlets button in the Navigation bar.

Sum of Squared Error

The final, and most frequently used, definition of error is the sum of squared error. Instead of a simple count of the number of times your measure of central tendency is wrong, or the sum of the absolute differences between the measure of central tendency and each variable's value, the sum of squared error is the sum of those differences after each one has been squared. The Sum of Squared Error page shows you that error defined in this manner is minimized by using the mean.
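
As a hedged illustration, the Python sketch below computes the sum of squared error for the sample data set used earlier in this section, {10, 9, 8, 7, 6, 5}, at a handful of candidate values. The candidate grid and the function name are assumptions made for the illustration, not part of the Sum of Squared Error page.

```python
scores = [10, 9, 8, 7, 6, 5]        # sample data set from the Mean section
mean = sum(scores) / len(scores)    # 7.5

def sum_squared_error(center, data):
    """Sum of squared differences between one representative value and each score."""
    return sum((x - center) ** 2 for x in data)

# Candidate representative values: 5.0, 5.5, ..., 10.0
candidates = [5 + 0.5 * k for k in range(11)]
best = min(candidates, key=lambda c: sum_squared_error(c, scores))

print(mean, best)                         # 7.5 7.5
print(sum_squared_error(mean, scores))    # 17.5
print(sum_squared_error(7, scores))       # 19
```

Among these candidates the mean gives the smallest sum of squared error; moving even half a point away from it raises the total.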


Skew Revisited   

x     f
20    2
21    3
22    4
23    8
24    4
25    3
26    2
In Chapter 3 we talked about the shape of distributions. A distribution's skew is particularly important when calculating measures of central tendency, because it affects whether the mean, median, and mode are identical or different from one another. If a distribution is unimodal and symmetrical (skew = 0), then the mean, median, and mode are all identical. The frequency table above presents a unimodal symmetrical distribution. It is left as an exercise to calculate the mean and median and to look up the mode. They are all identical. You may want to enter the data into the Statlets Calculation Page to conduct your calculations. After entering the data, click the Stats tab and select the three measures of central tendency using the Options button. You may also want to select the Skew calculation using the Options button.
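
If you would like to check your hand calculations without Statlets, one option is Python's standard statistics module. This is only a sketch for checking the exercise; the frequencies are those listed at the start of this section.

```python
from statistics import mean, median, mode

# Score x -> frequency f, from the frequency table in this section.
frequencies = {20: 2, 21: 3, 22: 4, 23: 8, 24: 4, 25: 3, 26: 2}

# Expand the table into a flat list of individual scores.
scores = [x for x, f in frequencies.items() for _ in range(f)]

print(len(scores))      # number of scores
print(mean(scores))     # mean
print(median(scores))   # median
print(mode(scores))     # mode
```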

In skewed distributions the mean is pulled toward the tail of the distribution, the median shifts less than the mean, and the mode stays at the highest point of the distribution. The figure on the left illustrates the effect of negative and positive skew on the measures of central tendency.
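
The ordering of the three measures under skew can be illustrated with a small Python sketch. Both data sets are hypothetical: the first has a long right tail, and the second is its mirror image.

```python
from statistics import mean, median, mode

# Hypothetical positively skewed scores: most are low, one is far to the right.
positive = [1] * 5 + [2] * 4 + [3] * 3 + [4] * 2 + [10]

# Mirror image of the same scores: negatively skewed.
negative = [11 - x for x in positive]

for name, scores in (("positive skew", positive), ("negative skew", negative)):
    print(name, "mode:", mode(scores), "median:", median(scores),
          "mean:", round(mean(scores), 2))
```

In the positively skewed set the mode (1) is below the median (2), which is below the mean (about 2.67); in the mirrored set the order is reversed.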


What To Report When   



If the standardized skewness is outside of ±2, then it is suggested that the median be reported as the best measure of central tendency. Of course, you can always report all three measures and expect that your audience is sophisticated enough to determine which is best.
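
The text does not say how Statlets computes standardized skewness. The Python sketch below assumes one common convention: the moment-based skewness divided by an approximate standard error of sqrt(6/n). It also assumes, beyond what the text states, that the mean is the measure reported when the value falls inside ±2. The function names and the skewed data set are hypothetical.

```python
import math

def standardized_skewness(scores):
    """Moment-based skewness divided by sqrt(6/n).
    ASSUMPTION: this convention may differ from the one Statlets uses."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in scores) / n   # third central moment
    skewness = m3 / m2 ** 1.5
    return skewness / math.sqrt(6 / n)

def suggested_measure(scores):
    """Median outside +/-2 (from the text); mean otherwise (an assumption)."""
    return "median" if abs(standardized_skewness(scores)) > 2 else "mean"

# Symmetrical data from the Skew Revisited frequency table.
symmetric = [20] * 2 + [21] * 3 + [22] * 4 + [23] * 8 + [24] * 4 + [25] * 3 + [26] * 2

# Hypothetical strongly skewed data: nineteen low scores and one extreme score.
skewed = [1] * 19 + [100]

print(suggested_measure(symmetric))   # mean
print(suggested_measure(skewed))      # median
```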


Lies, Lies, Lies   

Reporting one of these measures of central tendency rather than another can often be to someone's advantage. Pretend that you are attempting to sell real estate. You would like to tell people that the typical house in your community has the highest value possible. Then, when you sell a home of lesser value, people will believe they are getting a deal: this home is less expensive than the typical home in town.

It shouldn't take much thought to see that the distribution of real estate prices in almost every community is positively skewed. There are many lower cost homes, fewer medium priced houses, and, in every town, a section where the wealthy people live that has only a few homes. If your newspaper reports the average (mean) price of a home in town, that value will be higher than either the median price or the modal price. Statisticians always have an ethical imperative to tell what is closest to the truth.

Try a Cyber Real Estate Company. Choose your area and then look for homes without price limits, with two bedrooms and one bathroom. Note the prices that are listed in this search. Use Statlets to analyze the data you collect. Is the distribution positively skewed? If it isn't, why do you think it is distributed the way it is?


Additional Information   

Listservs are important information sharing tools. When you join a listserv, you join a community of people interested in the same information. This link provides you with a list of statistics listservs.


Computer Project 7   

Calculating Means, Medians, and Modes

To do this seventh computer project, you first need to read the Statlets user manual for the Analyze/One Sample/One Variable Analysis procedure that will be used. Pay particular attention to the information concerning the Input and Stats tabs. While not everything in this set of procedures has been covered, a quick reading of the entire procedure will be helpful.

Next read the directions for this project. Then, close the page with the directions and look at the questions and possible answers in the project report.

To view the project report, you must be able to establish an active internet connection.

The project report will appear on a secondary page. After reading that secondary page, do not close it. Simply move the report page so that you can see this page (click and drag the window's title bar to expose this primary page). Click this page to activate it, and start Statlets by clicking the Statlets button on the Navigation panel. After completing the project, and if instructed to do so by your instructor, click the project report page to activate it and answer the questions. After clicking the submit button, close both the report window and the Statlets windows.


Computer Project 8   

Using Obedience Data

For computer project 8, calculate the mean, median, and mode of the variable Volts in the Milgram Obedience to Authority Data. You can use the directions provided for Project 7 above, and simply substitute the Obedience data. After completing the project, and if instructed to do so, click the project report page and submit this project to your instructor.


Multiple Numeric Variables

Quick Calculations and Plots

Often data sets consist of several numeric variables. Calculating descriptive statistics one variable at a time with the Statlets Analyze/One Sample/One Variable Analysis procedure can take quite some time. While that procedure produces a more extensive selection of tables and plots for a single variable, the procedure under the menu selections Summarize/Statistics quickly produces descriptive statistics and some plots for multiple numeric variables in the same data set. To understand this procedure, first read the user manual section for it. Understanding this procedure will allow you to complete Project 9 below.


Computer Project 9   

Using Data with Multiple Numeric Variables

For computer project 9, calculate the mean, median, and mode of all the numeric variables in the Grades data set. This is a simple data set collected by a teacher, containing the scores on 5 tests for the 20 children in a class. While this is a very small data set, it would be no more difficult to do the same calculations in Statlets with a much larger data set. After completing the project, and if instructed to do so, click the project report page and submit this project to your instructor.


Questions/Test   

This link allows you to take a computer scored end-of-chapter test. If your instructor requests to see the results of this examination, you can either copy and e-mail or print the feedback you will receive immediately after taking the test.

Report   

Please send a report indicating your understanding of this chapter to your instructor. You will need to know both your and your instructor's e-mail addresses.