Unit Ten - Statistics for Fun and Analysis
The following are the first two short chapters from PDQ Statistics by Norman & Streiner. This is a fun discussion, comprehensible to the non-mathematically inclined (Source: Norman & Streiner). For a more detailed discussion of statistics in social science, see Social Statistics by Hubert M. Blalock, Jr.
Chapter I - Names & Numbers: Types of Variables

There are four types of variables. Nominal and ordinal variables consist of counts in categories and must be analysed using "non-parametric" statistics. Interval and ratio variables are actual quantitative measurements and are analysed using "parametric" methods.

Statistics provide a way of dealing with numbers. Before leaping headlong into statistical tests, it is necessary to get some idea of how these numbers come about, what they represent, and the various forms the numbers can take.

Let's begin by examining a simple experiment. Suppose an investigator has a hunch that clam juice is an effective treatment for the misery of psoriasis. He proceeds to assemble a group of patients, randomizes them to a treatment or control group, and gives clam juice to his treatment group and something which looks, smells and tastes like clam juice, but isn't, to his control group. After a few weeks, he measures the extent of psoriasis on the patients, perhaps by estimating the percent of body involvement, or by looking at the change in size of a particular lesion. He then proceeds to do some number-crunching to determine if clam juice is as good as he hopes it is.

Let's have a closer look at the data from this experiment. To begin with, there are at least two variables. A definition of the term variable is a little hard to come up with, but basically it relates to anything which is measured or manipulated in the study. The most obvious variable in the experiment is the measurement of the extent of psoriasis. It is pretty evident that this is something which can be measured. A less obvious variable is the nature of treatment--drug or placebo. Although it is less evident how you might convert this to a number, still it is clearly something which is varied in the course of the experiment.

A few more definitions are in order. We frequently speak of independent and dependent variables.
In an experiment, the independent variables are those which are varied by, and under the control of, the experimenter, and the dependent variables are those which respond to the experimental manipulation. In the present example, the independent variable is the type of therapy--clam juice or placebo--and the dependent variable is the size of lesions or body involvement.

Although in this example the identification of independent and dependent variables is straightforward, the distinction is not always so obvious. Frequently researchers must rely on natural variation in both types of variables and look for a relationship between the two. For example, in looking for a relation between smoking and lung cancer, an ethics committee would probably take a dim view of ordering a thousand children to smoke a pack a day for 20 years. Instead, the investigator must simply look for a relationship between smoking and cancer in the general population and assume smoking is the independent variable and lung cancer is the dependent variable; that is, the extent of lung cancer depends on variations in smoking.

There are other ways of defining types of variables which turn out to be essential in determining the ways the numbers will be analyzed. Variables are frequently classified as nominal, ordinal, interval, or ratio. A nominal variable is simply a named category. Our clam juice vs. placebo is one such variable, as is the sex of the patient or the diagnosis given to a group of patients.

An ordinal variable is a set of ordered categories. A common example in the
medical literature is the subjective judgment of disease staging in cancer, using
categories such as stage I, II, or III. Although we can safely say that stage II is worse
than stage I, and better than stage III, we don't really know by how much.

So where does the classification lead us? The important distinction is between the nominal and ordinal variables on one hand, and the interval and ratio variables on the other. It makes no sense to speak of the average value of a nominal or ordinal variable--the average sex of a sample of patients or, strictly speaking, the average disability expressed on an ordinal scale. However, it is sensible to speak of the average blood pressure or average height of a sample of patients. For nominal variables, all we can really deal with is the number of patients in each category.

Statistical methods applied to these two broad classes of data are very different. For measured variables, it is generally assumed that the data follow a bell curve and that the statistics focus on the center and width of the curve. These are the so-called parametric statistics. By contrast, nominal and ordinal data consist of counts of people or things in different categories, and a different class of statistics, called non-parametric statistics (obviously!) is used in dealing with these data.

EXAMPLE

To examine a program for educating health professionals in a sports injury clinic about the importance of keeping detailed records, a researcher does a controlled trial in which the dependent variable is range of motion of injured joints, which is classified as (a) worse, (b) same, or (c) better, and the independent variable is (a) program or (b) no program.
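
For readers who like to see the idea in action, here is a rough Python sketch of the distinction drawn in this chapter. It counts patients in categories for a study laid out like the example (program vs. no program, and worse, same, or better), and it computes a center and a width for a measured variable such as blood pressure. Every number below is invented for illustration, and the function names `cross_tabulate` and `summarize_measurements` are hypothetical; none of it comes from Norman & Streiner.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical records, invented purely for illustration.
# Each record pairs the independent variable (group) with the
# dependent variable (an ordinal range-of-motion rating).
records = [
    ("program", "better"),
    ("program", "better"),
    ("program", "same"),
    ("program", "worse"),
    ("no program", "better"),
    ("no program", "same"),
    ("no program", "same"),
    ("no program", "worse"),
]


def cross_tabulate(rows):
    """Count how many records fall into each (group, rating) cell."""
    return Counter(rows)


def summarize_measurements(values):
    """Return the center (mean) and width (sample standard deviation)."""
    return mean(values), stdev(values)


# Categorical data: all we can do is count patients per category.
counts = cross_tabulate(records)
for (group, rating), n in sorted(counts.items()):
    print(f"{group:>10} / {rating:<6}: {n}")

# Measured data: an average is meaningful here.
# Hypothetical systolic blood pressures in mmHg.
blood_pressures = [118, 122, 130, 126, 124]
center, width = summarize_measurements(blood_pressures)
print(f"mean = {center}, standard deviation = {width:.2f}")
```

Notice that the sketch never tries to average the ratings: "worse", "same", and "better" are ordered, but the spacing between them is unknown, so the counts in each cell are the natural summary.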