QUESTION: What do you call a group of middle-aged adults standing in an orderly fashion waiting for the start of a store’s sale of bikini swim suits, muscle shirts, & tattoos?
ANSWER: A regression line
(NOTE: This little effort at statistical humor comes from S. Huck)
Beyond the Joke:
In statistics, a regression line is typically used to predict a person’s status on a criterion variable of interest. This prediction is based upon that person’s status on a different variable that hopefully serves as a good predictor. For example, a regression line could be used to predict how high a college applicant’s GPA will be at the end of his or her first year in college based upon that person’s score on a college entrance exam. Or, a regression line could be used to predict a person’s systolic blood pressure based upon his or her BMI (body mass index).
To develop a regression line, a group of people initially must be measured on both the predictor variable and the criterion variable. After these data are used to identify the regression line, predictions can be made for new people who have scores on the predictor variable but are yet to be measured on the criterion variable. (It’s important, of course, that the new people for whom predictions are made do not differ in dramatic ways from those used to develop the regression line.)
Using data from the initial group of people, a regression line can be displayed visually in a bivariate scatter plot. In such a picture, the criterion variable (often called the dependent variable) is positioned on the Y-axis while the predictor variable (typically called the independent variable) is put on the X-axis. Each data point indicates a given individual’s scores on the two variables. To illustrate, the following scatter plot shows data for a hypothetical group of 19 students measured in terms of how long they studied for an essay exam and how well they did once the exam was scored.
The position of a regression line in this or any other scatter plot is determined by analyzing the data to identify two properties of the sought-after line: its slope and the place where the line passes through the Y-axis (the Y-intercept). These two features of the regression line are computed so as to minimize the sum of the squared vertical distances between the data points and the line. Because of this, the regression line, once determined, is considered to be the “best-fitting straight line.”
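To make the idea of a “best-fitting” line concrete, here is a minimal Python sketch of the usual least-squares calculation, which picks the slope and Y-intercept that minimize the sum of squared vertical distances. The five data pairs and the names fit_line and sum_squared_errors are made up for illustration; they are not the 19 students from the scatter plot.

```python
# Hypothetical data for illustration only; these are NOT the 19 students
# from the scatter plot described in the text.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]    # predictor (X): hours studied
scores = [2.0, 2.5, 3.5, 3.0, 4.0]   # criterion (Y): exam score


def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line for paired data."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Co-variation of X and Y, and variation of X, around their means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_errors(x, y, slope, intercept):
    """Sum of squared vertical distances between the points and a line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


slope, intercept = fit_line(hours, scores)
best = sum_squared_errors(hours, scores, slope, intercept)
# A line with a slightly different slope fits these points worse.
nudged = sum_squared_errors(hours, scores, slope + 0.1, intercept)

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
print("fitted line beats nudged line:", best < nudged)
```

For these made-up numbers the fitted slope is 0.45 and the Y-intercept is 1.65, and tilting the line even slightly increases the total squared error.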
To use the regression line for predictive purposes, a new person is identified for whom there exists a score only on the predictor variable. First, that score is located on the scatter plot’s X-axis. We start at that point, move in a vertical direction until reaching the regression line, and then move to the left in a horizontal fashion until ending up on the Y-axis. That “destination point” represents the predicted score on the criterion variable. For example, we’d predict that a new student who studies only 1 hour for the exam will receive an exam score of 2.
Instead of making predictions via the scatter plot’s regression line, it’s possible to accomplish the same objective by using a formula. Any regression line can be converted into a formula that has this form:
predicted Y-score = Y-intercept + (slope)(observed X-score)
Using this formula, we predict that a person who studies 1 hour will earn an exam score equal to 1.5 + (0.5)(1) = 2.
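The same arithmetic can be written as a tiny Python function. The Y-intercept (1.5) and slope (0.5) are the values used in the worked example above; the function name predict_score is just an illustrative choice.

```python
# Values taken from the worked example in the text.
Y_INTERCEPT = 1.5
SLOPE = 0.5


def predict_score(hours_studied, intercept=Y_INTERCEPT, slope=SLOPE):
    """predicted Y-score = Y-intercept + (slope)(observed X-score)"""
    return intercept + slope * hours_studied


print(predict_score(1))  # 2.0, matching the prediction read off the scatter plot
```

Passing a different number of hours gives the prediction for that student, provided he or she resembles the group used to build the line.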
The degree to which the regression line can make accurate predictions is influenced by the correlation between the predictor and criterion variables. To the extent that the correlation is high, data points in the scatter plot will lie closer to the regression line, thus increasing predictability (so long as the “new” people for whom predictions are made resemble those in the initial group). Typically, the square of the correlation coefficient is used as an indicator of how well the regression line will work. For the data in the accompanying scatter plot, r = 0.50 and r-squared = 0.25. This means that 25% of the variability in the exam scores is associated with (i.e., explained by) study time.
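As a rough sketch of how r and r-squared are computed, the Python snippet below calculates the Pearson correlation for a small made-up data set and checks that r-squared equals the proportion of variability in Y accounted for by the least-squares line. The data and the name pearson_r are assumptions for illustration; the last line simply squares the r = 0.50 reported in the text.

```python
import math

# Hypothetical data for illustration only (not the 19 students in the text).
hours = [1.0, 2.0, 3.0, 4.0, 5.0]    # predictor (X)
scores = [2.0, 2.5, 3.5, 3.0, 4.0]   # criterion (Y)


def pearson_r(x, y):
    """Pearson correlation coefficient for paired data."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(hours, scores)
r_squared = r ** 2

# Cross-check: 1 - (unexplained variability / total variability in Y),
# using the least-squares line fitted to the same data.
n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(scores) / n
sxx = sum((xi - mean_x) ** 2 for xi in hours)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(hours, scores))
slope = sxy / sxx
intercept = mean_y - slope * mean_x
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(hours, scores))
sst = sum((yi - mean_y) ** 2 for yi in scores)
explained = 1 - sse / sst

print(f"r = {r:.2f}, r-squared = {r_squared:.2f}, explained = {explained:.2f}")
print(0.50 ** 2)  # 0.25, the r-squared value reported in the text
```

For this made-up data, r works out to 0.90 and r-squared to 0.81, a much tighter fit than the r = 0.50 described for the 19 students.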
If you’d like to watch a good, 6-minute tutorial on the basic concept of a regression line and how it helps with prediction, click this link: