Unit 3
Quantitative Surveys
Environmental Research 401 DDE 2004
GEOG 401, 1-5 Credit
Instructor: Dr. M. Mustoe, Eastern Oregon University
This syllabus can be found at: http://www.eou.edu/~mmustoe/sgeog410.html
INSTRUCTOR CONTACT: My office is Zabel 203, GIS Lab. E-mail me at: mmustoe@eou.edu
(EOU ACCOUNTS ONLY)
Telephone 541- WOodland-2 3502. Office Hours: 2:30- 4 PM Pacific Time, or by appointment.


Overview
For help with research and writing, the EOU library is an excellent resource.
Who's going to win the next baseball game? Who is going to win the election? What are my chances of getting pregnant?
There are parameters to each of the above questions which have very concrete (discrete) characteristics. For example, if the opposing team is really made up of the volleyball squad, who would you think might win? If the opposing candidate is not legally certified to run in this election, guess who wins (well... most likely wins). If you carry an X and a Y chromosome, your chances of getting pregnant are zero. In all of the above, the question alludes to prediction: taking some variables and trying to predict an outcome given these known conditions.

Science is interested not only in observing phenomena, but also in predicting what might occur given certain variables. Swarms of earthquakes under Mt. St. Helens might suggest an eruption; how many quakes, at what depth, and at what intensity might be the variables attached to making a forecast for an eruption. One way that statistical predictions are accomplished is via a correlation: seeing how well one thing (the Y variable) fits against another thing (the X variable). Appendix F describes Correlation-Regression Analysis.

Pearson Product Moment Coefficient of Correlation
What is the Pearson Product Moment Coefficient of Correlation? Basically it is a way to get a number, a coefficient, which suggests how correlated (or together, man) the variables are with one another. In other words, how strong is the relationship between two variables, and how well does one predict the other? Of course none of this is as deep as 29 dimensions of compatibility, but for geographic field work this method is pretty popular, especially when you have data that you want to analyze for its ability to predict phenomena. Take a look at the Pearson Product Moment Coefficient of Correlation formula in the text. It looks somewhat complicated, but it really is not. It's just procedural steps that bring you to a number, the number being the Coefficient of Correlation. This statistic requires two groups of data, so each observation is measured twice: once for the X variable and once for the Y variable.

Basically it works like this. The bottom line is the Coefficient of Correlation number. Whether that number turns out positive or negative tells you something about the correlation. For example, if a sample scores high on one measure, what tendency is there for that sample to score high on the other measure? If it scores low, will it score low on the other measure too? If the answer is yes, that is, if one goes up and the other goes up, OR if one goes down and the other goes down, there is a POSITIVE or DIRECT relationship and the Coefficient of Correlation will be a positive number. On the other hand, if one measure goes up and the other goes down, the relationship is said to be negative or inverse, and the coefficient will be a negative number. The Coefficient of Correlation itself runs from -1 to +1 (or just plain 1). Further, +1 and -1 are called perfect correlations, which means that for any increase or decrease in a score there will be a perfectly matched increase or decrease in the paired score. If you get a zero, it means there is no linear relationship at all between the scores.
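As a rough sketch of those procedural steps, the short Python example below computes the coefficient from the sums of X, Y, X squared, Y squared, and XY. It assumes the common raw-score form of the formula, which may be laid out differently from the one in the text. The function name `pearson_r` and the two small data sets are made up for illustration only.

```python
import math


def pearson_r(xs, ys):
    """Pearson coefficient from raw sums (hypothetical helper, raw-score form)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 scores")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)              # column "X Squared"
    sum_y2 = sum(y * y for y in ys)              # column "Y Squared"
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # column "XY"

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("one variable never changes, so r is undefined")
    return numerator / denominator


# Made-up scores, not taken from the text or the assignment.
station = [1, 2, 3, 4, 5]
direct = [2, 4, 5, 4, 6]    # tends to rise as station rises
inverse = [6, 4, 5, 4, 2]   # tends to fall as station rises

print(round(pearson_r(station, direct), 2))   # positive: direct relationship
print(round(pearson_r(station, inverse), 2))  # negative: inverse relationship
```

The first pair of lists rises together, so the result is positive; the second pair moves in opposite directions, so the result is negative. Neither is a perfect correlation, because the points do not fall exactly on a line.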

 X    Y
 1    5.0
 2    4.8
 3    4.6
 4    4.4
 5    4.2
 6    4.0
 7    3.8
 8    3.6
 9    3.4
 10   3.2

The scores above have a correlation of -1. It's a perfect correlation because for every change in score the change is the same, a change of .20 points. Both X and Y are scores here. In addition, can you see where the negative comes from?
The higher the X score, the lower the Y score. This is what gives the -1 correlation its negative sign. Figures 6.1, 6.2, and 6.3 show this relationship spatially with a scatter diagram. The method of calculation in the book is given for a mean of zero. Also keep in mind that it is suggested that the Pearson Product Moment Coefficient works best with about 30 observations and that the observations maintain a state of homoscedasticity. In practice this word means that the X and Y scores are not skewed, that the points scatter about a line with roughly the same spread along its whole length, and that the distribution tends to be linear. Other kinds of distributions, U-shaped for example, would not work with this statistic.
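To check the claim about a perfect correlation, here is a small Python sketch that works from deviations about the mean, so that each variable is recentered on a mean of zero. It assumes Y falls by exactly 0.2 for every one-unit rise in X (so the tenth score is taken as 3.2). The U-shaped data set and the function name `pearson_r_from_deviations` are invented for illustration.

```python
import math


def pearson_r_from_deviations(xs, ys):
    """Pearson coefficient using deviations from each mean (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]   # X scores recentered on zero
    dy = [y - mean_y for y in ys]   # Y scores recentered on zero
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    return sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)


# Straight-line scores: Y drops 0.2 each time X rises by 1 (assumed values).
x_scores = list(range(1, 11))
y_scores = [round(5.2 - 0.2 * x, 1) for x in x_scores]
r_line = pearson_r_from_deviations(x_scores, y_scores)
print(round(r_line, 3))   # very close to -1

# Invented U-shaped scores: a strong pattern, but not a linear one.
u_x = [-3, -2, -1, 0, 1, 2, 3]
u_y = [x * x for x in u_x]
r_u = pearson_r_from_deviations(u_x, u_y)
print(round(r_u, 3))      # essentially zero
```

The U-shaped set shows why the statistic suits linear distributions only: Y clearly depends on X, yet the coefficient comes out at about zero.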

Regression
Regression, a notion directly related to my age and the number of teddy bears in my office? Hey, don't mess with my bears! Actually, it sounds strange, but this kind of regression refers to dots being moved back (regressed) theoretically to a line. The idea of regression is once again a means of prediction. It asks how the distribution of a set of data measures up to an overall tendency: what is its collective relationship? The text shows this relationship "fitted" to a line in Figure 6.3. The line carries with it the characteristic of slope, either positive or negative. The formula for regression analysis (simple, single regression) is found in Appendix F. You can also interact with this process automatically at this amazing site, where you can see the line "fit" instantly as you supply the Y and X variables: TAP HERE.

When you look at a scatter plot, keep in mind that points are bivariate. That means each point presents values on both the Y and X axes. The line of regression is the line to which this distribution of bivariate data points lies closest. The formula Y = bX + a is interpreted as follows: Y is the predicted value, where the point will locate; b is the slope of the regression line; a is the point where the line crosses the Y axis, also known as the Y intercept. To compute b and a, see Appendix F. The way the prediction works is that, once the line is established, you plug in any value of X and generate a Y value, that is, you predict a value of Y.
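The prediction step can be sketched in a few lines of Python. The example assumes the ordinary least-squares formulas for b and a, which may be presented differently in Appendix F. The function names `fit_line` and `predict` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Return slope b and intercept a for Y = bX + a (least-squares assumption)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    a = (sum_y - b * sum_x) / n                                   # Y intercept
    return b, a


def predict(b, a, x):
    """Plug a value of X into the established line to generate a Y value."""
    return b * x + a


# Made-up bivariate points, not taken from the text or the assignment.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 6]

b, a = fit_line(xs, ys)
print(round(b, 2), round(a, 2))       # slope and intercept of the fitted line
print(round(predict(b, a, 6), 2))     # predicted Y for a new X of 6
```

Once b and a are known, the original data are no longer needed for prediction; any X can be run through `predict`. Predictions far outside the range of the observed X values should be treated with caution.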

 


(hand/mail these in using the GAP).

UNIT THREE READING
6. Analysis of Data



Discussion

UNIT THREE ESSAY QUESTIONS
1. Draw three simple sample plots showing a correlation of 1, -1, and 0.
Label each plot with its correlation. (Let 0 be the point of intersection of the Y and X axes.)
2. What would the predicted value of Y be if X were 20?
Use this formula for the regression line: Y = 1.35X + 4.20
3. Draw sample plots showing the appropriate axes for the following variables, given their (assumed) correlation, positive or negative. Label the X and Y axes, state the subject of each graph, and show and state the correlation.

House Values and Closeness to Inner-City Regions of Disparity
Bee Population and the Number of Varroa Mite Infestations
Summer Rainfall and Tree Growth
Snowfall at Mountain Resorts and Ski Revenue
Lake Trout production to the lake's food supply

4. Use the following data set to compute a Pearson Product Moment Coefficient of Correlation, and show your calculations. Use the data table below for your data processing.

 OBS   X    Y    X Squared   Y Squared   XY
 1     16   14
 2     14   17
 3     13   11
 4     10   5
 5     8    8
 6     7    15
 7     5    6
 8     4    9
 9     2    2
 10    1    3
 Sum

r=

What does the correlation suggest given these data?

5. Use the above data to calculate the regression formula.

END OF ASSIGNMENT 1.