Methodology - Unit 10 Section 2 Page 1/5

Unit Ten - Statistics for Fun and Analysis

The following are the first two short chapters from PDQ Statistics by Norman & Streiner. This is a fun discussion, comprehensible to the non-mathematically inclined (Source: Norman & Streiner). For a more detailed discussion of statistics in social science, see Social Statistics by Hubert M. Blalock, Jr.

Chapter I - Names & Numbers: Types of Variables

There are four types of variables. Nominal and ordinal variables consist of counts in categories and must be analyzed using "non-parametric" statistics. Interval and ratio variables are actual quantitative measurements and are analyzed using "parametric" methods.

Statistics provide a way of dealing with numbers. Before leaping headlong into statistical tests, it is necessary to get some idea of how these numbers come about, what they represent, and the various forms the numbers can take.

Let's begin by examining a simple experiment. Suppose an investigator has a hunch that clam juice is an effective treatment for the misery of psoriasis. He proceeds to assemble a group of patients, randomizes them to a treatment or control group, and gives clam juice to his treatment group and something which looks, smells, and tastes like clam juice, but isn't, to his control group. After a few weeks, he measures the extent of psoriasis on the patients, perhaps by estimating the percent of body involvement or by looking at the change in size of a particular lesion. He then proceeds to do some number-crunching to determine if clam juice is as good as he hopes it is.
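The randomization step can be illustrated with a short sketch. This is only one simple way to split patients into two groups, not necessarily what the investigator in the example would do; the function name `randomize`, the patient ID numbers, and the seed are all assumptions made for illustration.

```python
import random


def randomize(patient_ids, seed=None):
    """Shuffle patients and split them into two groups of (nearly) equal size.

    Hypothetical helper: simple randomization only, with no stratification.
    """
    rng = random.Random(seed)      # a seeded generator makes the split repeatable
    shuffled = list(patient_ids)   # copy so the caller's data is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}


# Twenty made-up patient IDs, numbered 1 to 20.
groups = randomize(range(1, 21), seed=42)
```

Every patient lands in exactly one group, and chance alone decides which one, so the investigator cannot steer the milder cases toward the clam juice.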

Let's have a closer look at the data from this experiment. To begin with, there are at least two variables. A definition of the term variable is a little hard to come up with, but basically it relates to anything which is measured or manipulated in the study. The most obvious variable in the experiment is the measurement of the extent of psoriasis. It is pretty evident that this is something which can be measured. A less obvious variable is the nature of treatment--drug or placebo. Although it is less evident how you might convert this to a number, still it is clearly something which is varied in the course of the experiment.

A few more definitions are in order. We frequently speak of independent and dependent variables. In an experiment, the independent variables are those which are varied by, and under the control of, the experimenter, and the dependent variables are those which respond to the experimental manipulation. In the present example, the independent variable is the type of therapy--clam juice or placebo--and the dependent variable is the size of lesions or body involvement. Although in this example the identification of independent and dependent variables is straightforward, the distinction is not always so obvious. Frequently researchers must rely on natural variation in both types of variables and look for a relationship between the two. For example, in looking for a relation between smoking and lung cancer, an ethics committee would probably take a dim view of ordering a thousand children to smoke a pack a day for 20 years. Instead, the investigator must simply look for a relationship between smoking and cancer in the general population and assume smoking is the independent variable and lung cancer is the dependent variable; that is, the extent of lung cancer depends on variations in smoking.

There are other ways of defining types of variables which turn out to be essential in determining the ways the numbers will be analyzed. Variables are frequently classified as nominal, ordinal, interval, or ratio. A nominal variable is simply a named category. Our clam juice vs. placebo is one such variable, as is the sex of the patient or the diagnosis given to a group of patients.

An ordinal variable is a set of ordered categories. A common example in the medical literature is the subjective judgment of disease staging in cancer, using categories such as stage I, II, or III. Although we can safely say that stage II is worse than stage I, and better than stage III, we don't really know by how much.

The other kinds of variables consist of actual measurements on individuals, such as height, weight, blood pressure, or serum electrolytes. Statisticians distinguish between interval variables, where the interval between measurements is meaningful (e.g., the difference between 38 and 32 degrees Celsius), and ratio variables, where the ratio of the numbers has some meaning. Having made the distinction, they then go and analyze them all the same anyway. The important distinction is that these variables are measured quantities, unlike nominal and ordinal variables that are qualitative in nature.
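A small sketch may help show why the interval/ratio distinction exists at all. It assumes the usual explanation: Celsius has an arbitrary zero point, so differences survive a change of units but ratios do not, whereas weight has a true zero. The weights, the variable names, and the approximate conversion factor are made up for illustration.

```python
import math


def celsius_to_fahrenheit(c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32


KG_TO_LB = 2.20462  # approximate conversion factor, used only for illustration

# Interval variable: the two temperatures from the text.
hot_c, warm_c = 38.0, 32.0
hot_f, warm_f = celsius_to_fahrenheit(hot_c), celsius_to_fahrenheit(warm_c)

# The difference is meaningful: it simply rescales by 9/5 in the new unit.
diff_c = hot_c - warm_c
diff_f = hot_f - warm_f
interval_consistent = math.isclose(diff_f, diff_c * 9 / 5)

# The ratio is not meaningful: it changes when the unit changes.
temperature_ratio_stable = math.isclose(hot_c / warm_c, hot_f / warm_f)

# Ratio variable: two hypothetical body weights. "Twice as heavy" holds in any unit.
heavy_kg, light_kg = 80.0, 40.0
weight_ratio_kg = heavy_kg / light_kg
weight_ratio_lb = (heavy_kg * KG_TO_LB) / (light_kg * KG_TO_LB)
weight_ratio_stable = math.isclose(weight_ratio_kg, weight_ratio_lb)
```

Here `interval_consistent` and `weight_ratio_stable` come out true while `temperature_ratio_stable` comes out false, which is the whole distinction in three flags.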

So where does the classification lead us? The important distinction is between the nominal and ordinal variables on one hand, and the interval and ratio variables on the other. It makes no sense to speak of the average value of a nominal or ordinal variable--the average sex of a sample of patients or, strictly speaking, the average disability expressed on an ordinal scale. However, it is sensible to speak of the average blood pressure or average height of a sample of patients. For nominal variables, all we can really deal with is the number of patients in each category. Statistical methods applied to these two broad classes of data are very different. For measured variables, the data are generally assumed to follow a bell curve, and the statistics focus on the center and width of the curve. These are the so-called parametric statistics. By contrast, nominal and ordinal data consist of counts of people or things in different categories, and a different class of statistics, called non-parametric statistics (obviously!), is used in dealing with these data.
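The two styles of summary can be sketched in a few lines. The data below are invented for illustration, and the helper names `summarize_categories` and `summarize_measurements` are assumptions; the mean and standard deviation stand in for the "center and width" of the curve.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical data for five patients, invented purely for illustration.
treatment = ["clam juice", "placebo", "clam juice", "placebo", "clam juice"]  # nominal
systolic_bp = [120, 135, 128, 142, 130]                                       # measured


def summarize_categories(values):
    """For nominal or ordinal data, all we can report is the count per category."""
    return dict(Counter(values))


def summarize_measurements(values):
    """For measured data, report the center (mean) and width (sample SD)."""
    return {"mean": mean(values), "sd": stdev(values)}


category_summary = summarize_categories(treatment)
measurement_summary = summarize_measurements(systolic_bp)
```

Calling `summarize_measurements(treatment)` would fail, since there is no such thing as the average of "clam juice" and "placebo", which is the point of the paragraph above.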

EXAMPLE

To examine a program for educating health professionals in a sports injury clinic about the importance of keeping detailed records, a researcher does a controlled trial in which the dependent variable is range of motion of injured joints, which is classified as (a) worse, (b) same, or (c) better, and the independent variable is (a) program or (b) no program.

Question
What kind of variables are they--nominal or ordinal? Are they appropriate?

Answer
The independent variable is nominal, and the dependent variable, as stated, is ordinal. However, there are two problems with the choice. First, detailed medical records may be a good thing and may even save some lives somewhere. But range of motion is unlikely to be sensitive to changes in recording behavior. A better choice would be some rating of record quality. Second, range of motion is a nice ratio variable. To shove it into three categories is just throwing away information.

Remember:

Dependent variables should be sensible. Ideally, they should be clinically important, but also related to the independent variable.
In general, the amount of information increases as one goes from nominal to ratio. Classifying good ratio measures into large categories is akin to throwing away data.
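The loss of information in the example can be made concrete with a sketch. The changes in range of motion below are invented numbers in degrees, and the function `categorize_change` with its `tolerance` parameter is a hypothetical way of producing the three categories, not a method taken from the source.

```python
def categorize_change(change_degrees, tolerance=0):
    """Collapse a change in range of motion (degrees) into three ordered categories.

    Hypothetical rule: anything within +/- tolerance counts as "same".
    """
    if change_degrees > tolerance:
        return "better"
    if change_degrees < -tolerance:
        return "worse"
    return "same"


# Invented changes in range of motion for five patients, in degrees.
changes = [2, 45, -3, 0, 30]
categories = [categorize_change(c) for c in changes]

# Patients who gained 2, 45, and 30 degrees all collapse into one label.
distinct_measurements = len(set(changes))
distinct_categories = len(set(categories))
```

Five distinct measurements shrink to three labels, and a patient who gained 2 degrees becomes indistinguishable from one who gained 45.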
