Category Archives: Jokes & Humor


The extrapolation in the accompanying cartoon (from Randall Munroe’s website) is ridiculous. No one would ever do that kind of silly data-based projection into the future. In many areas of our daily lives, however, people make unjustified predictions based on existing, accurate data. Consider these 2 examples, one dealing with the stock market and the other concerning survey research.

How do people tend to invest their money in the stock market? From controlled experiments as well as from observational studies, the findings are the same. When the stock market has been doing well, most people are “bullish” and want to invest more. In contrast, when there’s been a recent drop in stock values, the typical investor gets “bearish” and wants to sell. The term extrapolation bias has been coined to describe cases like these wherein people think that the future will be a continuation of the past. You, too, possess this bias if you make short-term predictions that fail to consider (1) the variability of data points used to form a “trend line” and (2) the possibility that a trend line can change its direction and, for example, begin to angle down even though it has been angling up.
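To make the idea concrete, here is a minimal Python sketch. The monthly index values are made up for illustration, and `fit_line` is a plain least-squares helper written for this example, not any particular library’s routine. A straight trend line is fit to the first seven months, when values were mostly rising, and then projected into later months when the trend reversed.

```python
# Hypothetical monthly index values (invented for illustration): a rise, then a reversal.
prices = [100, 103, 101, 106, 108, 107, 111, 109, 104, 98, 95]

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept of a straight trend line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Form the "trend line" from the recent past only (months 0 through 6).
history = prices[:7]
slope, intercept = fit_line(list(range(len(history))), history)

# Naively project that line forward and compare with what actually happened.
for month in range(len(history), len(prices)):
    projected = intercept + slope * month
    print(f"month {month}: projected {projected:.1f}, actual {prices[month]}")
```

The projections keep climbing while the actual values fall, which is the kind of change in direction that a short-term, straight-line forecast cannot anticipate.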

If you receive a mailed or online survey, do you fill it out and send it in? If you do, you help to increase the survey’s response rate: the percentage of contacted people who complete and return the survey. In a recent research study, the response rate was only 8.41%. Despite receiving completed surveys from just 241 of the 2,865 people initially contacted, the researcher extrapolated the study’s findings to all of the individuals to whom the survey was sent. This is an unjustified thing to do because “nonrespondents” may well be different from those from whom data are collected. Most likely, we all are guilty of this kind of extrapolation-beyond-the-data. We hear opinions expressed by trusted friends, relatives, co-workers, neighbors, bloggers, or TV analysts, and then we presume that others have the same thoughts. ’Tis a risky thing to do!
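The response-rate arithmetic for that study can be checked with a few lines of Python (the function name `response_rate` is just an illustrative choice):

```python
def response_rate(completed: int, contacted: int) -> float:
    """Percentage of contacted people who completed and returned the survey."""
    if contacted <= 0:
        raise ValueError("contacted must be positive")
    return 100 * completed / contacted

rate = response_rate(241, 2865)
print(f"Response rate: {rate:.2f}%")        # Response rate: 8.41%
print(f"Nonrespondents: {2865 - 241}")       # Nonrespondents: 2624
```

More than nine out of ten contacted people supplied no data at all, which is why generalizing to everyone who received the survey is so risky.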



Filed under Jokes & Humor, Mini-Lessons




Year after year, the annual statistical convention is held in the same city. This metropolis has 26 pubs, each named by an alphabet letter: A, B, C, … , Y, Z. The statisticians who are nice, kind, & considerate people go to a wide variety of these “drinking holes.” The mean statisticians, however, patronize just one of them. Which one?


(This little effort at statistical humor comes from S. Huck)

*   *   *   *   *   *   *   *   *

Beyond the Joke:

In statistics, the concept or numerical value of the arithmetic mean can be symbolized in various ways.

Many people use the letter M (often capitalized and italicized) to represent the arithmetic mean. A second way to do this is with the lower-case Greek letter, mu. (In several textbooks, M is used to represent the sample mean whereas mu designates the population mean.)

A third way to symbolize the arithmetic mean is with the letter X accompanied by a short, horizontal line positioned directly above the X. This line is referred to as a “bar,” and the entire symbol is read as “X-bar.”

[Figure: symbols used for the arithmetic mean]

In written statistical discussions, a bar can be positioned above letters (or symbols) other than the letter X. When this occurs, the bar indicates that the arithmetic mean has been (or should be) computed for the various numerical values of the variable represented by whatever letter or symbol has the bar above it. For example, if you see the lower-case letter r with a bar above it, you should refer to it as “r-bar” and guess that it represents the mean of a group of correlation coefficients.  
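A short Python sketch, using made-up numbers, shows what “X-bar” and “r-bar” stand for. Both are simply the arithmetic mean, computed here with the standard library’s `statistics.mean`; the data values and variable names are hypothetical.

```python
from statistics import mean

# Hypothetical scores on a variable X.
x_scores = [4, 7, 5, 8, 6]
x_bar = mean(x_scores)      # "X-bar": the mean of the X values
# (If these five values were an entire population, the same computation would give mu.)

# Hypothetical correlation coefficients, e.g. from several different studies.
r_values = [0.30, 0.45, 0.60]
r_bar = mean(r_values)      # "r-bar": the mean of the r values

print(f"X-bar = {x_bar}, r-bar = {r_bar:.2f}")
```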


Filed under Jokes & Humor, Mini-Lessons



(The following effort at statistical humor comes from S. Huck)


True to the weather forecast, the college campus was being blasted by a heavy snowfall. Inside a small dining hall, students were eating, studying, talking, & texting. Suddenly, Bubba darted outside where he scooped up some of the white stuff, packed it together in his hands, and then quickly returned inside. Getting everyone’s attention, Bubba held up the cold, white sphere he had just made and said: “Hey, I learned about this in my stats course. Guess what it is?”

BUBBA’S ANSWER: “A snowball sample!”

*     *     *     *     *     *     *     *     *     *

Beyond the Joke: Things worth knowing about snowball samples:

1. Definition: A snowball sample is formed during the time period when people are being recruited to serve as a study’s research participants. Through face-to-face contact or indirect methods (such as posted notices), the researcher successfully solicits certain individuals to voluntarily enter the study. Next, those initial volunteers are asked to recruit additional participants. This process—of existing volunteers recruiting new volunteers—continues until the desired sample size has been achieved.

2. Idea Behind the Name: Imagine a snowball rolling down a steep, snow-covered hill. At first, the snowball is small. But it gets larger and larger as it heads toward the bottom of the hill. In a similar fashion, a snowball sample grows in size as volunteer participants successfully recruit additional participants.

3. When Used: Snowball samples are used mainly in studies wherein (1) the researcher doesn’t know who the potential participants are or how to contact them, or (2) potential volunteers are more likely to agree to be in a study if they are recruited by a “peer” rather than by an unknown researcher.

4. Example: In a research report entitled “The ‘Staying Safe’ Intervention: Training People Who Inject Drugs in Strategies to Avoid Injection-Related HCV and HIV Infections” (published in the journal AIDS Education and Prevention), the researchers stated that “Snowball sampling of participants began with eight participants directly recruited from two sources…. These eight participants then recruited 60 eligible peers.”

5. Quality: Because of the way snowball samples are formed, it is difficult to generalize findings from them to larger populations. (Such generalizations are much easier to make with stratified random samples and other kinds of samples classified as “probability samples.”) Thus, snowball samples are most useful in studies wherein (1) the goal is to generate rather than confirm hypotheses or (2) the participants, collectively, are considered to be the target group of interest.
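The recruitment process described in item 1 can be mimicked with a toy simulation. This is only a sketch under invented assumptions: each volunteer recruited in the most recent wave refers a random number of peers (0 to 3), and recruiting stops when the desired sample size is reached or no new referrals come in. The starting values of 8 seeds and a target of 68 are borrowed from the “Staying Safe” example (8 initial participants plus 60 peers); the referral pattern is hypothetical and not taken from that study.

```python
import random

def snowball_sample(seeds: int, target: int, max_referrals: int = 3,
                    rng_seed: int = 1) -> list[int]:
    """Return the total sample size after each recruitment wave.

    Hypothetical model: every volunteer recruited in the latest wave
    refers between 0 and max_referrals new volunteers.
    """
    rng = random.Random(rng_seed)
    total = seeds     # initial volunteers solicited directly by the researcher
    newest = seeds    # volunteers who have not yet been asked to recruit
    sizes = [total]
    while total < target and newest > 0:
        referrals = sum(rng.randint(0, max_referrals) for _ in range(newest))
        referrals = min(referrals, target - total)  # stop at the desired sample size
        total += referrals
        newest = referrals                          # only new recruits recruit next
        sizes.append(total)
    return sizes

print(snowball_sample(seeds=8, target=68))
```

Each wave starts only from the newest recruits, so the sample rolls outward from the original seeds through their social networks rather than being drawn from the population as a whole.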


Filed under Jokes & Humor, Mini-Lessons


Baseball Pitcher


In major league baseball, pitchers usually throw four pitches. Three of these are the typical fastball, the ordinary slider, and the conventional knuckleball. The fourth kind of pitch? It’s the familiar kind that starts low on the left, then gradually goes up, and finally drifts slowly downward to the right.


What’s the name of this fourth kind of pitch?


It’s called the normal curve!

(NOTE: This little effort at statistical humor comes from S. Huck)


Filed under Jokes & Humor



“What is a two-tailed test?”

ANSWER FROM BUBBA (who came to class with a severe hangover):

“It’s when you’re forced to sit down and write an essay on the book by Charles Dickens in which he reveals the best and worst times to go visit London and Paris.”

(NOTE: This little effort at statistical humor comes from S. Huck)

Tail Not Tale


Filed under Jokes & Humor


Regression Line 2 (real people)

QUESTION: What do you call a group of middle-aged adults standing in an orderly fashion waiting for the start of a store’s sale of bikini swim suits, muscle shirts, & tattoos?

ANSWER: A regression line

(NOTE: This little effort at statistical humor comes from S. Huck)

Beyond the Joke:

In statistics, a regression line is typically used to predict a person’s status on a criterion variable of interest. This prediction is based upon that person’s status on a different variable that hopefully serves as a good predictor. For example, a regression line could be used to predict how high a college applicant’s GPA will be at the end of his or her first year in college based upon that person’s score on a college entrance exam. Or, a regression line could be used to predict a person’s systolic blood pressure based upon his or her BMI (body mass index).

To develop a regression line, a group of people initially must be measured on both the predictor variable and the criterion variable. After these data are used to identify the regression line, predictions can be made for new people who have scores on the predictor variable but are yet to be measured on the criterion variable. (It’s important, of course, that the new people for whom predictions are made do not differ in dramatic ways from those used to develop the regression line.)

Using data from the initial group of people, a regression line can be displayed visually in a bivariate scatter plot. In such a picture, the criterion variable (often called the dependent variable) is positioned on the Y-axis while the predictor variable (typically called the independent variable) is put on the X-axis. Each data point indicates a given individual’s scores on the two variables. To illustrate, the following scatter plot shows data for a hypothetical group of 19 students measured in terms of how long they studied for an essay exam and how well they did once the exam was scored.

[Figure: scatter plot of exam scores versus study time for 19 hypothetical students, with the regression line]

The position of a regression line in this or any other scatter plot is determined by analyzing the data to identify two properties of the sought-after line: its slope and the place where the line passes through the Y-axis. These two features of the regression line are computed so as to minimize the sum of the squared vertical distances between the data points and the line. Because of this, the regression line, once determined, is considered to be the “best-fitting straight line.”
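Here is a minimal Python sketch of how the slope and Y-intercept can be computed with the ordinary least-squares formulas. The five data points are hypothetical (not the 19 students in the plot). The sketch also compares the total squared vertical distance for the fitted line with that of a slightly steeper line, to show that the fitted line does better.

```python
def least_squares_line(xs, ys):
    """Slope and Y-intercept of the line minimizing the sum of squared vertical distances."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def sum_squared_residuals(xs, ys, slope, intercept):
    """Total of the squared vertical distances between the points and a given line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data: hours studied (X) and exam scores (Y).
hours = [0, 1, 2, 3, 4]
scores = [1, 3, 2, 4, 5]

slope, intercept = least_squares_line(hours, scores)
best = sum_squared_residuals(hours, scores, slope, intercept)
# Any other line, such as one with a slightly steeper slope, leaves a larger total.
other = sum_squared_residuals(hours, scores, slope + 0.2, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, SSE={best:.2f} vs {other:.2f}")
```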

To use the regression line for predictive purposes, a new person is identified for whom there exists a score only on the predictor variable. First, that score is located on the scatter plot’s X-axis. We start at that point, move in a vertical direction until reaching the regression line, and then move to the left in a horizontal fashion until ending up on the Y-axis. That “destination point” represents the predicted score on the criterion variable. For example, we’d predict that a new student who studies only 1 hour for the exam will receive an exam score of 2.

Instead of making predictions via the scatter plot’s regression line, it’s possible to accomplish the same objective by using a formula. Any regression line can be converted into a formula that has this form:

predicted Y-score  =  Y-intercept + (slope)(observed X-score)

Using this formula, we predict that a person who studies 1 hour will earn an exam score equal to 1.5 + (0.5)(1) = 2.
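The formula translates directly into code. In this sketch the Y-intercept (1.5) and slope (0.5) come from the worked example above; they apply only to that hypothetical scatter plot, and the function name `predict` is illustrative.

```python
def predict(x: float, intercept: float = 1.5, slope: float = 0.5) -> float:
    """predicted Y-score = Y-intercept + (slope)(observed X-score)"""
    return intercept + slope * x

print(predict(1))   # 2.0, matching the 1-hour example

# Predictions for a few other study times.
for hours in [0, 2, 4]:
    print(hours, predict(hours))
```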

The degree to which the regression line can make accurate predictions is influenced by the correlation between the predictor and criterion variables. To the extent that the correlation is high, data points in the scatter diagram will lie closer to the regression line, thus increasing predictability (so long as the “new” people for whom predictions are made resemble those in the initial group). Typically, the square of the correlation coefficient is used as an indicator of how well the regression line will work. For the data in the accompanying scatter plot, r = 0.50 and r-squared = 0.25. This means that 25% of the variability in the exam scores is associated with (i.e., explained by) study time.
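The link between r-squared and “variability explained” can be verified numerically. This Python sketch uses five hypothetical data points (not those in the scatter plot), fits the least-squares line, and compares r squared with 1 minus the ratio of the residual (unexplained) sum of squares to the total sum of squares; the two values agree. It also squares the r = 0.50 reported above.

```python
import math

# Hypothetical predictor (X) and criterion (Y) values, made up for illustration.
hours = [1, 2, 3, 4, 5]
scores = [2, 2, 4, 3, 5]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(scores) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))
sxx = sum((x - mean_x) ** 2 for x in hours)
ss_total = sum((y - mean_y) ** 2 for y in scores)       # total variability in Y

r = sxy / math.sqrt(sxx * ss_total)                     # Pearson correlation
slope = sxy / sxx
intercept = mean_y - slope * mean_x
ss_residual = sum((y - (intercept + slope * x)) ** 2    # variability not explained by X
                  for x, y in zip(hours, scores))

print(f"r = {r:.3f}")
print(f"r-squared = {r ** 2:.3f}")
print(f"1 - SS_residual / SS_total = {1 - ss_residual / ss_total:.3f}")

# The value reported in the text: r = 0.50 gives r-squared = 0.25 (25%).
print(f"{0.50 ** 2:.2f}")
```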

If you’d like to watch a good, 6-minute tutorial on the basic concept of a regression line and how it helps with prediction, click this link:


Filed under Jokes & Humor, Mini-Lessons


Big Data

One day, Bubba walked into his stats class carrying 10 huge posters, each with a single-digit number on it. The numbers, all different, ranged from 0 to 9. Upon seeing Bubba’s posters, Professor Garcia asked in a loud voice: “Why in the world, Bubba, did you bring those giant posters to class?” Without hesitation, Bubba responded confidently: “I thought these oversized numbers would help us, Doc, because the course syllabus says that today we’ll be dealing with ‘Big Data.’ ”

 (Note: This little effort at statistical humor comes from S. Huck)

Beyond the Joke: The phrase “Big Data” is a technical term that refers to data sets so gigantic that special tools are required to store, analyze, and “visualize” the data. If you don’t know much about the massive amounts of data that are being collected (routinely) these days, take a look at these 6 items. Items “a,” “b,” and “c” are YouTube videos that show how much data currently exists, items “d” and “e” are videos of TED Talks in which we see illustrative uses of large data sets, and item “f” is the Wikipedia information on “Big Data.”

a. Big Data Will Change Our World

b. Explanation of Big Data

c. Big Data, Big Opportunity

d. Big Data for Tomorrow

e. The Beauty of Data Visualization

f. Written info on “Big Data”


Filed under Jokes & Humor