Limitations of Statistics

College and university buildings are usually named after the individual(s) who provide all or most of the money needed to design and build them. On rare occasions, however, a building is named in honor of a professor. That’s the case with the Lindquist Center at the University of Iowa. It is named after E. F. Lindquist, a teacher and scholar who made major contributions to the fields of statistics and testing.

In one of the books Lindquist authored, he provided some sage advice to those who analyze data with statistical tools and to those who read or hear the research-based claims made by those who have analyzed data statistically. Here is what Lindquist said:

“Sound statistical judgment involves a keen appreciation of the inherent LIMITATIONS of statistical techniques and of the original data to which they are applied. In the derivation of these techniques, assumptions are frequently made which cannot be satisfied completely in practical applications. The failure to satisfy these conditions necessitates many qualifications in the interpretations of the results obtained.”

In the middle sentence of this passage, notice that Lindquist points out that important assumptions (concerning data and analytic tools) frequently are not satisfied in studies conducted out in the “real world.” As a consequence of these assumptions being violated, Lindquist then asserts, research findings need to be qualified. Being aware of the LIMITATIONS of statistics, he argues, is necessary for sound statistical judgment.

Unfortunately, many applied researchers who publish research reports based on the statistical analysis of numerical data pay little or no attention to the limitations of their data and of the statistical tools they use. Theoretically, the review process used by good journals is supposed to prevent the publication of articles lacking the “sound statistical judgment” called for by Lindquist. In practice, however, not-so-good articles sometimes slip through the review process.

When reading or listening to the summary of a statistically based research investigation, be vigilant and try to discern whether the researcher(s) who conducted the investigation used what Lindquist referred to as “sound statistical judgment.” If so, be more inclined to be influenced by the study’s finding(s). If not, resist the temptation to believe all you read or hear simply because it’s a summary of research.


Filed under Famous People, Mini-Lessons, Quotes


Misconception #5

Imagine that each of N=6 men has a hat. Also imagine that these hats are identical except that each man’s name is written inside his hat. Finally, imagine that the 6 hats are taken up and then later, because they look alike, randomly returned to the men.

As the 6 hats are returned to the 6 men, there’s a chance that no man will receive his own hat. The chance of this happening is a tad greater than 1 in 3. To be more precise, the probability (to 3 decimal places) of all 6 hats going to the wrong individuals is .368.

Now, let’s add a new wrinkle to this imaginary situation. Suppose the number of men (each with a hat) is greater than 6. What if there are 7 men? Or 8? Or more? As N increases, what happens to the probability that no hat will be returned to its proper owner? Some people guess that this probability goes up as N increases. Others guess that this probability goes down.

Both thoughts are wrong.

That’s because the likelihood of no correct “match” is virtually the same for any N > 5, whether N = 6 or N = 600 or N = 600,000!

The actual probability (p) of having no hat returned to its proper owner is given by this formula:

p  =  1/(2!)  –  1/(3!)  +  1/(4!)  –  1/(5!)  +  . . .

where there are N-1 terms on the right side of the equation. With the symbol “!” standing for “factorial,” we could rewrite the above formula as

p  =  1/2  –  1/6  +  1/24  –  1/120  +  . . .

As either of the above formulas shows, each additional term on the right side of the equation has a smaller impact on the value of p than the term before it. Moreover, the drop-off of this impact is sharp, not gradual. This fact is made clear by the following table showing the value of p, to 6 decimal places, for the cases where N = 2, 3, 4, … , 10.

N = 2     p = .500000

N = 3     p = .333333

N = 4     p = .375000

N = 5     p = .366667

N = 6     p = .368056

N = 7     p = .367857

N = 8     p = .367882

N = 9     p = .367879

N = 10     p = .367879
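
For readers who want to check the arithmetic, here is a minimal Python sketch that adds up the N-1 terms of the formula and prints p for N = 2 through 10. The function name prob_no_match is my own label, not something from the original puzzle, and exact fractions are used only to keep rounding error out of the sum.

```python
from fractions import Fraction
from math import factorial


def prob_no_match(n):
    """Exact probability that none of n hats goes back to its owner (n >= 2).

    Sums 1/2! - 1/3! + 1/4! - ... up to the 1/n! term, i.e. n-1 terms.
    """
    return sum(Fraction((-1) ** k, factorial(k)) for k in range(2, n + 1))


# Print p to 6 decimal places for N = 2, 3, ..., 10.
for n in range(2, 11):
    print(f"N = {n:<2}   p = {float(prob_no_match(n)):.6f}")
```

Changing the upper limit of the loop shows how little p moves once N gets past 9 or 10.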

It should be noted that this puzzle question is sometimes referred to as “Montmort’s Problem.” Montmort was a Frenchman who studied the probability behind a game called “Treize.” (Treize is the French word for 13.) In its original form, the puzzle question dealt with a jar containing identical balls numbered 1, 2, 3, … , 13. The balls are randomly pulled out of the jar, one at a time, and the puzzle question was stated like this: “What’s the probability that the 1st ball taken from the jar will not be the ball numbered 1, that the 2nd ball will not be the ball numbered 2, and so on, with the end result being that no number on any ball matches the order in which the ball is removed from the jar?”
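
The 13-ball version can also be checked by brute force. The sketch below is only a simulation under my own assumptions (20,000 shuffles, a fixed random seed, and the made-up function name estimate_no_match): it shuffles the balls many times and counts how often no ball comes out in the position that matches its number.

```python
import random


def estimate_no_match(n, trials=20_000, seed=0):
    """Estimate the chance that no ball is drawn in the position matching its number."""
    rng = random.Random(seed)      # fixed seed so the run is repeatable
    order = list(range(n))         # ball k is "in place" if drawn at position k
    no_match_count = 0
    for _ in range(trials):
        rng.shuffle(order)         # one random drawing order
        if all(ball != position for position, ball in enumerate(order)):
            no_match_count += 1
    return no_match_count / trials


# Estimated probability for the 13 balls of Treize.
print(estimate_no_match(13))
```

Because this is a simulation, the printed estimate will wobble a little around the exact value from one seed to the next.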




Filed under Mini-Lessons, Misconceptions, Puzzles/Games



(The following effort at statistical humor comes from S. Huck)


True to the weather forecast, the college campus was being blasted by a heavy snowfall. Inside a small dining hall, students were eating, studying, talking, & texting. Suddenly, Bubba darted outside where he scooped up some of the white stuff, packed it together in his hands, and then quickly returned inside. Getting everyone’s attention, Bubba held up the cold, white sphere he had just made and said: “Hey, I learned about this in my stats course. Guess what it is?”

BUBBA’S ANSWER: “A snowball sample!”

*     *     *     *     *     *     *     *     *     *

Beyond the Joke: Things worth knowing about snowball samples:

1. Definition: A snowball sample is formed during the time period when people are being recruited to serve as a study’s research participants. Through face-to-face contact or indirect methods (such as posted notices), the researcher successfully solicits certain individuals to voluntarily enter the study. Next, those initial volunteers are asked to recruit additional participants. This process—of existing volunteers recruiting new volunteers—continues until the desired sample size has been achieved.

2. Idea Behind the Name: Imagine a snowball rolling down a steep, snow-covered hill. At first, the snowball is small. But it gets larger and larger as it heads toward the bottom of the hill. In a similar fashion, a snowball sample grows in size as volunteer participants successfully recruit additional participants.

3. When Used: Snowball samples are used mainly in studies wherein (1) the researcher doesn’t know who the potential participants are or how to contact them, or (2) potential volunteers are more likely to agree to be in a study if they are recruited by a “peer” rather than by an unknown researcher.

4. Example: In a research report entitled “The ‘Staying Safe’ Intervention: Training People Who Inject Drugs in Strategies to Avoid Injection-Related HCV and HIV Infections” (from the journal: AIDS EDUCATION AND PREVENTION), the researchers stated that “Snowball sampling of participants began with eight participants directly recruited from two sources…. These eight participants then recruited 60 eligible peers.”

5. Quality: Because of the way snowball samples are formed, it is difficult to generalize information about them to larger populations. (Such generalizations are much easier to make with stratified random samples and other kinds of samples classified as “probability samples.”) Thus, snowball samples are most useful in studies wherein (1) the goal is to generate rather than confirm hypotheses or (2) the participants, collectively, are considered to be the target group of interest.
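
The recruiting process described in point 1 can be sketched in a few lines of Python. Everything here is hypothetical: the names, the contacts dictionary (who is able to recruit whom), and the function snowball_sample are invented for illustration, and real recruitment is far messier than a tidy lookup table.

```python
from collections import deque


def snowball_sample(contacts, seeds, target_size):
    """Grow a sample by letting each participant recruit peers, wave by wave."""
    sample = list(seeds)           # participants in the order they joined
    recruited = set(seeds)         # guards against recruiting someone twice
    waiting = deque(seeds)         # participants who have not yet recruited
    while waiting and len(sample) < target_size:
        recruiter = waiting.popleft()
        for peer in contacts.get(recruiter, []):
            if peer not in recruited:
                recruited.add(peer)
                sample.append(peer)
                waiting.append(peer)
                if len(sample) >= target_size:
                    break          # desired sample size reached
    return sample


# Hypothetical network: each person lists the peers he or she could recruit.
contacts = {
    "Ana": ["Ben", "Cy"],
    "Ben": ["Dee"],
    "Cy": ["Dee", "Eli"],
    "Dee": ["Fay"],
}
print(snowball_sample(contacts, seeds=["Ana"], target_size=5))
```

Notice that who ends up in the sample depends entirely on whom the first volunteers happen to know, which is one way to see why generalizing from such a sample is difficult.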


Filed under Jokes & Humor, Mini-Lessons


“In our case in Rwanda, we are determined to engender a statistics culture…. We are not only talking of professional statisticians in central statistics bureaus; rather, a whole range of policy makers, business operators, civil society, and indeed, engendering a culture of statistics across the board.”

(From a 2007 speech made by His Excellency Paul Kagame, President of the Republic of Rwanda)


Filed under Importance of Statistics, Quotes


Worthy Point-of-View

The following passage appeared in a recent article in The New York Times.

“The rising stature of statisticians [comes as] a byproduct of the recent explosion of digital data. In field after field, computing and the Web are creating new realms of data to explore…. It is the size of the data sets on the Web that opens new worlds of discovery. Traditionally, social sciences tracked people’s behavior by interviewing or surveying them. ‘But the Web provides this amazing resource for observing how millions of people interact,’ said Jon Kleinberg, a computer scientist and social networking researcher at Cornell.”

The full NYT article is entitled “For Today’s Graduate, Just One Word: Statistics.”


Filed under Importance of Statistics


Important Number (.7071)

If data exist on 2 variables (X & Y), the square of the correlation coefficient is called the “coefficient of determination.” This latter coefficient, if multiplied by 100, indicates the % of variability in either variable that’s associated with (or explained by) variability in the other variable.

For example, if r = .80, 64% of the variability in X is associated with variability in Y. Or, if r = –.40, 16% of the variability in X is associated with variability in Y.

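To have at least 50% “explained variability,” the absolute value of the correlation must be at least .7071 (the square root of .50). In other words, r must be .7071 or higher, or –.7071 or lower.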

This number, .7071, is worth remembering because many researchers report that a correlation is “moderate” or has “medium strength” if r is near ±.50. In reality, such correlations are not so strong; they indicate that only about 25% of the variability in Y is associated with variability in X.
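
The arithmetic behind these percentages is easy to verify. The short Python sketch below uses two helper functions of my own naming (percent_explained and min_abs_r). It squares r to get the percentage of shared variability, and works backward from a target percentage to the size of correlation needed.

```python
import math


def percent_explained(r):
    """Coefficient of determination (r squared), expressed as a percentage."""
    return 100 * r ** 2


def min_abs_r(percent):
    """Smallest absolute correlation that yields the given percentage."""
    return math.sqrt(percent / 100)


# The three correlations discussed in this post.
for r in (0.80, -0.40, 0.50):
    print(f"r = {r:+.2f}   explained variability = {percent_explained(r):.0f}%")

# Size of correlation needed for 50% explained variability.
print(f"|r| needed for 50%: {min_abs_r(50):.4f}")
```

Because r is squared, the sign of the correlation has no effect on the percentage.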


Filed under Important Numbers


Baseball Pitcher


In major league baseball, pitchers usually throw four pitches. Three of these are the typical fastball, the ordinary slider, and the conventional knuckleball. The fourth kind of pitch? It’s the familiar kind that starts low to the left, then gradually goes up, and finally drifts downward slowly to the right.


What’s the name of this fourth kind of pitch?


It’s called the normal curve!

(NOTE: This little effort at statistical humor comes from S. Huck)


Filed under Jokes & Humor