Overview
For help with research and writing, here is an excellent link to the EOU library. Tap here.
Who's going to win the next baseball game? Who is going to
win the election? What are my chances of getting pregnant?
Each of these questions has parameters with very concrete
(discrete) characteristics. For example, if the opposing team
is really made up of volleyball players, who do you think would
win? If the opposing candidate is not legally certified to run
in this election, guess who wins (well... most likely wins).
If you carry an X and a Y chromosome, your chances of getting
pregnant are zero. In all of the above, the question is about
prediction: taking some variables and trying to predict an
outcome given these known conditions.
Science is interested not only in observing phenomena but also in predicting what might occur given certain variables. Swarms of earthquakes under Mt. St. Helens might suggest an eruption; how many quakes, at what depth, and at what intensity might be the variables that go into forecasting an eruption. One way statistical predictions are made is through correlation: seeing how well one thing (the Y variable) fits against another thing (the X variable). Appendix F describes Correlation-Regression Analysis.
Pearson Product Moment Coefficient of Correlation
What is the Pearson Product Moment Coefficient of Correlation?
Basically it is a way to get a number, a coefficient, which suggests
how correlated (or together, man) the variables are to one another.
In other words, how strong is the relationship between two variables,
and how well does one predict the other? Of course none of this is
as deep as
29 dimensions of compatibility, but for geographic field work
this method is pretty popular, especially when you have data that
you want to analyze for its ability to predict phenomena. Take
a look at the Pearson Product Moment Coefficient of Correlation
formula in the text. It looks somewhat complicated but it really
is not. It's just a series of procedural steps that bring you to a
number, the number being the Coefficient of Correlation. This statistic
requires two groups of data in which each observation is measured
twice, once for each variable.
Basically it works like this. The bottom line is the Coefficient of Correlation number. Whether that number turns out positive or negative tells you something about the correlation. For example, if a sample scores high on one measure, what tendency is there for that sample to score high on the other measure? If it scores low, will it score low on the other measure too? If the answer is yes, that is, if one goes up and the other goes up, OR if one goes down and the other goes down, there is a POSITIVE or DIRECT relationship and the Coefficient of Correlation will be a positive number. On the other hand, if one measure goes up and the other goes down, the relationship is said to be negative or inverse, and the coefficient will be a negative number. The Coefficient of Correlation itself runs from -1 to +1. Further, +1 and -1 are called perfect correlations, which means that for any increase or decrease in one score there is a perfectly proportional increase or decrease in the matched score. If you get a zero, it means there is no linear relationship at all between the scores.
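To make the sign of the coefficient concrete, here is a minimal Python sketch. It uses the standard deviation-from-the-mean form of the Pearson formula, which may be laid out differently from the version in the text. The function name `pearson_r` and all of the scores are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of the deviations from each mean
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores, invented for this example only
hours_studied = [1, 2, 3, 4, 5]
test_score = [52, 60, 71, 75, 90]   # rises as hours rise: direct
errors_made = [14, 11, 9, 6, 2]     # falls as hours rise: inverse

r_direct = pearson_r(hours_studied, test_score)
r_inverse = pearson_r(hours_studied, errors_made)
print(round(r_direct, 3), round(r_inverse, 3))
```

The first coefficient comes out positive and the second negative, and both stay inside the -1 to +1 range.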
| X | Y |
| 1 | 5.0 |
| 2 | 4.8 |
| 3 | 4.6 |
| 4 | 4.4 |
| 5 | 4.2 |
| 6 | 4.0 |
| 7 | 3.8 |
| 8 | 3.6 |
| 9 | 3.4 |
| 10 | 3.2 |
The scores above have a correlation
of -1. It's a perfect correlation because for every one-point change
in X, the change in Y is the same: 0.20 points. Both X and Y are
scores here. Can you also see where the negative sign comes from?
The higher the X score, the lower the Y score. This gives the
negative sign in the -1 correlation. Figures 6.1, 6.2 and 6.3 show
this relationship spatially with a scatter diagram. The method of
calculation in the book is given for a mean of zero. Also keep in mind
that it is suggested that the Pearson Product Moment Coefficient works
best with about 30 observations and that the observations maintain
a state of homoscedasticity. This word means that the scatter of the
Y scores around the line is about the same all along the range of X.
The scores should also not be badly skewed and should fit relatively
well to a line; that is, the distribution tends to be linear. Other
kinds of distributions, U-shaped for example, would not work with
this statistic.
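The two points above can be checked with a short Python sketch. It assumes Y starts at 5.0 and drops by exactly 0.2 for each one-point step in X, as the paragraph describes, and it adds a made-up symmetric U-shaped series for contrast. The function name `pearson_r` and the U-shaped data are illustrative assumptions, not something taken from the text.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation (deviation-score form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = list(range(1, 11))                      # X scores 1 through 10

# Straight line: Y falls 0.2 for every one-point rise in X
ys_line = [5.0 - 0.2 * (x - 1) for x in xs]

# Hypothetical U shape: Y depends entirely on X, but not along a line
ys_u = [(x - 5.5) ** 2 for x in xs]

r_line = pearson_r(xs, ys_line)
r_u = pearson_r(xs, ys_u)
print(round(r_line, 6), round(r_u, 6))
```

The straight-line series gives a coefficient of -1 (up to floating-point rounding). The U-shaped series gives a coefficient of essentially zero even though Y is completely determined by X, which is why this statistic is a poor fit for non-linear distributions.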
Regression
Regression, a notion directly related to my age and the number
of teddy bears in my office? Hey, don't mess with my bears! Actually,
it sounds strange, but this kind of regression refers to dots being
moved back (regressed) theoretically to a line. The idea of regression
is once again a means of prediction. It asks how the distribution
of a set of data measures up to its overall tendency: what is its
collective relationship? The text shows this relationship
"fitted" to a line in figure 6.3. The line carries
with it the characteristic of slope, either positive or negative.
The formula for regression analysis (simple, single regression)
is found in Appendix F. You can also interact with this process
automatically at this amazing site, where you can see the line
"fit" instantly as you supply the Y and X variables.
TAP
HERE.
When you look at a scatter plot, keep in mind that the points are bivariate. That means each point has a value on both the Y and X axes. The line of regression is the line that lies closest to this distribution of bivariate data points, taken all together. In the formula Y = bX + a, Y is the predicted value, that is, where the point is expected to fall; b is the slope of the regression line; and a is the Y intercept, the point where the line crosses the Y axis. To compute b and a, see Appendix F. The way the prediction works is this: once the line is established, plug in any value of X and you will generate, that is predict, a value of Y.
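The fit-then-predict process can be sketched in a few lines of Python. This uses the ordinary least-squares formulas for b and a, which is an assumption here, since the layout in Appendix F may differ. The function names `fit_line` and `predict` and the data are hypothetical, chosen so the points fall exactly on Y = 2X + 1.

```python
def fit_line(xs, ys):
    """Return (b, a) for the least-squares line Y = bX + a."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: cross-products of deviations over squared X deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the line passes through the point (mean_x, mean_y)
    a = mean_y - b * mean_x
    return b, a

def predict(x, b, a):
    """Plug a value of X into the fitted line to get a predicted Y."""
    return b * x + a

# Hypothetical data lying exactly on Y = 2X + 1
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

b, a = fit_line(xs, ys)
print(b, a)               # prints 2.0 1.0
print(predict(10, b, a))  # prints 21.0
```

With real field data the points will not sit exactly on the line, so the predicted Y is an estimate of where a new point is expected to fall rather than a guarantee.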
(hand/mail
these in using the GAP).
6. Analysis of Data
1. Draw three simple sample plots showing a correlation of 1, -1, and 0. Label each plot with its correlation. (Let 0 be the point of intersection of the Y and X axes.)
2. What would the predicted value of Y be if X were 20? Use this formula for the regression line: Y = 1.35X + 4.20
3. Draw sample plots showing the appropriate axes for the following variables given their (assumed) correlation, positive or negative. Label the X and Y axes, state the subject with each graph, and show and state the correlation.
   - House Values and Closeness to Inter city regions of disparity
   - Bee Population to the number of Varroa Mite Infestations
   - Summer Rainfall and Tree Growth
   - Snowfall at Mountain Resorts and Ski Revenue
   - Lake Trout production to the lake's food supply
4. Use the following data set to compute a Pearson Product Moment Coefficient of Correlation; show your calculations. Use the following data table for your data processing.
   r =
   What does the correlation suggest given these data?
5. Use the above data to calculate the regression formula.
END OF ASSIGNMENT 1.