Graphing DATA and ERROR

To display the basic equation DATA = MODEL + ERROR, we need to discuss graphic representations of the DATA and ERROR components.

Jittered Plots

A popular graph type for a single variable is the jittered plot. A jittered plot places each value along the x-axis and draws a symbol above that value each time it occurs. The symbols are randomly jittered (moved) along the y-axis so that they do not overlap. Directly below is a Statlet applet that constructs jittered plots and time-sequence plots. First use it to create a jittered plot for the following data: {10, 9, 9, 8, 8, 8, 8, 7, 7, 7, 7, 7, 7, 7, 7, 6, 6, 6, 6, 5, 5, 4}. Enter the 22 values in the Col_1 column on the Input tab.
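The jittering step itself takes only a few lines. The following is a minimal Python sketch, not a description of how Statlets draws its plot. It assumes that each point keeps its true value on the x-axis and gets a vertical offset drawn uniformly from a band of half-width `spread`. `jitter_points` and `spread` are hypothetical names, and 0.4 is an arbitrary choice.

```python
import random

def jitter_points(values, spread=0.4, seed=None):
    """Return (x, y) pairs for a jittered plot of one variable.

    x is the data value itself; y is a random offset in [-spread, spread]
    so that repeated values are spread out instead of overlapping.
    """
    rng = random.Random(seed)  # private generator so a seed gives repeatable output
    return [(v, rng.uniform(-spread, spread)) for v in values]

data = [10, 9, 9, 8, 8, 8, 8, 7, 7, 7, 7, 7, 7, 7, 7, 6, 6, 6, 6, 5, 5, 4]
points = jitter_points(data, seed=1)
for x, y in points[:3]:
    print(f"x = {x:2d}, y = {y:+.3f}")
```

Passing the coordinates to any plotting library as a scatter plot produces the jittered display. The vertical position carries no information and only separates the symbols.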

Sometimes the scroll bar for input does not show. Simply click the top or bottom right cell in the input grid to scroll the grid.

After entering all the data, click the Jittered Plot tab. Notice that the plot is most dense at the point where the x-axis value is seven.
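The plot is densest at seven because seven is the most frequent value, or mode. A quick frequency count confirms this. The sketch below uses Python's standard `collections.Counter` and prints one row of symbols per value, which is a jittered plot with the jitter removed.

```python
from collections import Counter

data = [10, 9, 9, 8, 8, 8, 8, 7, 7, 7, 7, 7, 7, 7, 7, 6, 6, 6, 6, 5, 5, 4]
counts = Counter(data)

# One row of symbols per distinct value, smallest value first.
for value in sorted(counts):
    print(f"{value:2d} | {'*' * counts[value]}")

mode, freq = counts.most_common(1)[0]
print(f"Densest at x = {mode} ({freq} of {len(data)} values)")
```

Here the value 7 occurs 8 times out of 22, which is more often than any other value.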



Warning

The discussion below uses the Judd 1-1 data set (state fatality rates). The Statlet directly above accepts only data typed in by hand, which is not recommended for a data set as large as Judd 1-1. To put Judd 1-1 into the Statlets program, do the following:
You should see the Judd 1-1 data in the Data window. Now, from the Analyze menu, choose "One Sample/One Variable Analysis".

Histograms

In the discussion below, output from other statistics packages is shown in addition to Statlets output. Statlets, MYSTAT, and SYSTAT do not provide the frequency polygons illustrated in the text, so histograms will suffice. Here is the MYSTAT histogram of the Judd 1-1 data, displaying DATA.

You may want to try creating this histogram yourself using Statlets and the directions above.
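A histogram is built by counting how many values fall into each of a set of equal-width intervals, or bins. The following minimal sketch bins the 22-value example data from the jittered plot, because the Judd 1-1 values are not reproduced on this page. `histogram`, `bin_width`, and `start` are hypothetical names. The sketch assumes half-open bins [edge, edge + width) and a starting edge no larger than the smallest value. Packages choose bin edges in different ways, so their bars may differ.

```python
import math

def histogram(values, bin_width, start):
    """Return (lower_edge, count) pairs for bins [edge, edge + bin_width).

    Assumes start <= min(values). Empty bins are kept so that gaps stay visible.
    """
    n_bins = math.floor((max(values) - start) / bin_width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[math.floor((v - start) / bin_width)] += 1
    return [(start + i * bin_width, c) for i, c in enumerate(counts)]

data = [10, 9, 9, 8, 8, 8, 8, 7, 7, 7, 7, 7, 7, 7, 7, 6, 6, 6, 6, 5, 5, 4]
width = 2
bins = histogram(data, bin_width=width, start=4)
for edge, count in bins:
    print(f"[{edge:2d}, {edge + width:2d})  {'#' * count}")
```

Changing `bin_width` shows how strongly the shape of a histogram depends on the choice of bins.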

Stem-and-leaf plot

The stem-and-leaf plot is shown below.
 Stem and leaf plot of variable:     RATE    , N =    50

Minimum is:        1.000
Lower hinge is:       12.000
Median is:        26.000
Upper hinge is:       39.500
Maximum is:       50.000

               0   1234
               0   557999
               1 H 22244
               1   777
               2   0000444
               2 M 88888
               3   2224
               3 H 5669999
               4   234
               4   55789
               5   0
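In the display above, each number is split into a stem (its tens digit) and a leaf (its last digit). Each stem appears on two lines, with leaves 0 to 4 on the first and 5 to 9 on the second. The following minimal Python sketch reproduces that layout for non-negative integers. It leaves out the H and M markers. `stem_and_leaf` is a hypothetical helper. The sample is the ten values on the two stem-0 lines above, given in shuffled order.

```python
def stem_and_leaf(values):
    """Text stem-and-leaf display for non-negative integers.

    The leaf is the last digit; each stem is split over two lines,
    leaves 0-4 on the first line and 5-9 on the second.
    """
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        line = 2 * stem + (0 if leaf < 5 else 1)  # index of the half-stem line
        rows.setdefault(line, []).append(str(leaf))
    # Walk every half-stem line in range so empty lines still appear.
    return [
        f"{line // 2:>3}   {''.join(rows.get(line, []))}"
        for line in range(min(rows), max(rows) + 1)
    ]

sample = [9, 1, 5, 3, 9, 2, 7, 4, 9, 5]
for text_line in stem_and_leaf(sample):
    print(text_line)
```

Sorting the values first makes the leaves on each line come out in increasing order, as they do in the SYSTAT display.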

Again, you may want to use the Judd 1-1 data and the Menu Version of Statlets available in this package to recreate the Statlets version of this stem-and-leaf plot.

Box-and-Whiskers plot


Again, interested students are encouraged to use the Judd 1-1 data and the Statlets Menu Version to reproduce this plot.
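The box is drawn from hinges like the ones listed in the stem-and-leaf output. The following minimal sketch uses Tukey's depth rule for the median and the hinges. It places the fences 1.5 H-spreads beyond the hinges and extends the whiskers to the most extreme values inside the fences. This is the common Tukey convention, and the sketch assumes it. Statlets may use a different rule. `box_plot_summary`, `depth_value`, and `k` are hypothetical names. With N = 50, as in the stem-and-leaf output above, the hinge depth is 13. The 13th smallest value in that display is 12, which is the reported lower hinge.

```python
import math

def depth_value(ordered, depth):
    """Value at a (possibly half-integer) depth counted from the start of `ordered`."""
    below = ordered[math.floor(depth) - 1]
    above = ordered[math.ceil(depth) - 1]
    return (below + above) / 2  # averages two neighbors when depth ends in .5

def box_plot_summary(values, k=1.5):
    xs = sorted(values)
    n = len(xs)
    median_depth = (n + 1) / 2
    hinge_depth = (math.floor(median_depth) + 1) / 2
    lower_hinge = depth_value(xs, hinge_depth)
    upper_hinge = depth_value(xs[::-1], hinge_depth)  # same depth, counted from the top
    h_spread = upper_hinge - lower_hinge
    lower_fence = lower_hinge - k * h_spread
    upper_fence = upper_hinge + k * h_spread
    inside = [x for x in xs if lower_fence <= x <= upper_fence]
    return {
        "median": depth_value(xs, median_depth),
        "hinges": (lower_hinge, upper_hinge),
        "fences": (lower_fence, upper_fence),
        "whiskers": (inside[0], inside[-1]),
        "outside": [x for x in xs if x < lower_fence or x > upper_fence],
    }

data = [10, 9, 9, 8, 8, 8, 8, 7, 7, 7, 7, 7, 7, 7, 7, 6, 6, 6, 6, 5, 5, 4]
print(box_plot_summary(data))
```

Any value listed under "outside" would be drawn as a separate symbol beyond the whiskers.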


Statlets Examples

All of these graph types and more are available through the Statlets applet below. Unfortunately, this stand-alone applet will not allow you to input data through the Clip In button. If you haven't already used the Menu Version of Statlets to evaluate the Judd 1-1 data, enter the data used earlier to construct the frequency polygon, the jittered plot, and the histogram: {10, 9, 9, 8, 8, 8, 8, 7, 7, 7, 7, 7, 7, 7, 7, 6, 6, 6, 6, 5, 5, 4}.



After entering the data, click each of the available tabs. For each graph, be sure also to click the Interpret button to read a full description of it.

Brain Exercise




Here are the descriptive statistics from SYSTAT:

TOTAL OBSERVATIONS:     50


                     RATE

  N OF CASES               50
  MINIMUM               1.700
  MAXIMUM               5.800
  RANGE                 4.100
  MEAN                  3.572
  VARIANCE              0.736
  STANDARD DEV          0.858
  STD. ERROR            0.121
  SKEWNESS(G1)          0.699
  KURTOSIS(G2)          0.527
  SUM                 178.600
  C.V.                  0.240

Try this one with Judd 1-1.
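Several of the SYSTAT columns follow from the others by simple formulas. For example, SUM / N = 178.6 / 50 = 3.572 (the mean), STANDARD DEV / sqrt(N) = 0.858 / 7.071 ≈ 0.121 (the standard error), and STANDARD DEV / MEAN = 0.858 / 3.572 ≈ 0.240 (the C.V.). The following minimal Python sketch computes these columns. It assumes that VARIANCE uses the usual n - 1 denominator. Skewness and kurtosis are left out because packages use different small-sample formulas and the output does not say which one SYSTAT applies. `describe` is a hypothetical helper. It is run on the 22-value example data, because the Judd 1-1 values are not reproduced here.

```python
import math
import statistics

def describe(values):
    """A subset of SYSTAT-style descriptive statistics for one variable."""
    n = len(values)
    mean = statistics.fmean(values)
    variance = float(statistics.variance(values))  # sample variance, n - 1 denominator
    sd = math.sqrt(variance)
    return {
        "N OF CASES": n,
        "MINIMUM": min(values),
        "MAXIMUM": max(values),
        "RANGE": max(values) - min(values),
        "MEAN": mean,
        "VARIANCE": variance,
        "STANDARD DEV": sd,
        "STD. ERROR": sd / math.sqrt(n),  # standard error of the mean
        "SUM": sum(values),
        "C.V.": sd / mean,                # coefficient of variation
    }

data = [10, 9, 9, 8, 8, 8, 8, 7, 7, 7, 7, 7, 7, 7, 7, 6, 6, 6, 6, 5, 5, 4]
for name, value in describe(data).items():
    print(f"{name:<14}{value:>10.3f}")
```

For these data, the squared deviations from the mean of 7 sum to 42, so the variance is 42 / 21 = 2 and the standard deviation is the square root of 2.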

Graphics of errors

To graph errors using any statistical program, we first have to create them. Many statistical applications have utilities that let users create new variables from the values of variables already in the data set. Unfortunately, the Statlets Data window does not allow you to generate new variables from the values of other variables, so you will need to compute these values yourself with a calculator or a spreadsheet. In the data set Judd1-1e.html, we have generated three error terms for the Judd 1-1 data. The variable names are now STATE, RATE, MED_ERR, MEAN_ERR, and ERR_2.2. The MED_ERR variable was calculated by subtracting the median, 3.45, from each RATE value; in Excel (R1C1 reference style) the formula would be =RC[-1]-3.45. The MEAN_ERR variable was produced by subtracting the mean, 3.572, from each RATE value; in Excel the formula would be =RC[-2]-3.572. Finally, ERR_2.2 was formed by subtracting 2.2 from each RATE value. We will use these values later.
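Rearranging DATA = MODEL + ERROR gives ERROR = DATA - MODEL, so each error column is a RATE value minus a single model value. The following minimal Python sketch does what the spreadsheet formulas above do. The three model values come from the text. `errors` is a hypothetical helper, and the three RATE values are made up for illustration and are not rows of Judd 1-1.

```python
MEDIAN_RATE = 3.45   # median of RATE, as given in the text
MEAN_RATE = 3.572    # mean of RATE, as given in the text
FIXED_MODEL = 2.2    # constant used for ERR_2.2

def errors(rates, model_value):
    """ERROR = DATA - MODEL for a model that predicts one value for every case."""
    return [rate - model_value for rate in rates]

# Hypothetical RATE values for illustration only (not the Judd 1-1 data).
rates = [1.7, 3.45, 5.8]
med_err = errors(rates, MEDIAN_RATE)
mean_err = errors(rates, FIXED_MODEL if False else MEAN_RATE)
err_2_2 = errors(rates, FIXED_MODEL)

print("   RATE  MED_ERR MEAN_ERR  ERR_2.2")
for row in zip(rates, med_err, mean_err, err_2_2):
    print(" ".join(f"{v:8.3f}" for v in row))
```

The errors keep their signs here. Whether they are made absolute or squared is decided only when they are summed.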

Remember that when total error is finally calculated, these error values will be summed differently. For median differences, absolute values are calculated first and then summed; mean differences are squared first and then summed.
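The two summing rules fit their models. The median minimizes the sum of absolute errors, and the mean minimizes the sum of squared errors. The following minimal Python sketch applies both rules to a small hypothetical data set, chosen to be skewed so that its median (3) and mean (4) differ. `sum_abs` and `sum_squares` are hypothetical names.

```python
def sum_abs(errors):
    """Median-style total: take absolute values first, then sum."""
    return sum(abs(e) for e in errors)

def sum_squares(errors):
    """Mean-style total: square first, then sum."""
    return sum(e * e for e in errors)

data = [1, 2, 3, 4, 10]           # hypothetical, skewed data
med_err = [v - 3 for v in data]   # errors around the median, 3
mean_err = [v - 4 for v in data]  # errors around the mean, 4

print(sum_abs(med_err), sum_abs(mean_err))          # 11 12
print(sum_squares(mean_err), sum_squares(med_err))  # 50 55
```

Each model has the smaller total under its own rule. The median wins on absolute errors (11 vs. 12), and the mean wins on squared errors (50 vs. 55).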

Using Judd1-1e.html, which has these difference values already calculated, first copy everything from the variable names to the end of the file. Then start the Menu Version of Statlets and, using the steps above, place those values into the applet. Now graph these error values. Do the error values conform to a normal distribution?
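A normal probability (Q-Q) plot is one common way to judge normality, besides looking at a histogram. It pairs each sorted error with the standard normal quantile expected at its rank. If the points fall close to a straight line, the errors are roughly normal. The following minimal Python sketch computes the plot coordinates with the standard library's `statistics.NormalDist`. It assumes plotting positions of (i - 0.5) / n, which is one of several conventions. `normal_qq_points` is a hypothetical helper, and the sample errors are made up.

```python
from statistics import NormalDist

def normal_qq_points(errors):
    """Return (normal quantile, sorted error) pairs for a normal probability plot.

    Uses plotting positions (i - 0.5) / n; other packages may use a
    slightly different formula.
    """
    ordered = sorted(errors)
    n = len(ordered)
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return [(z.inv_cdf((i - 0.5) / n), e) for i, e in enumerate(ordered, start=1)]

sample_errors = [0.3, -1.2, 0.9]  # hypothetical error values
for q, e in normal_qq_points(sample_errors):
    print(f"{q:+.3f}  {e:+.3f}")
```

Plotting the error column (for example MEAN_ERR) on the vertical axis against these quantiles shows skewness as a curve and heavy tails as points bending away from the line at the ends.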