
3.1: Correlation and Variate Relationships

Two-Variable (Bi-Variate) Relationships

  • Explanatory variable: a variable that attempts to explain or influence observed outcomes

    • What is being used to make the prediction

    • Displayed on the x-axis

  • Response variable: a variable that measures some outcome

    • What is being predicted

    • Displayed on the y-axis

Describing Scatterplots and Bi-Variate Data: FUDS

  • Form: linear, curve, u-shape, etc.

  • Unusual Points: outliers, influential points

    • Outlier: a point with a large residual (usually decreases the correlation)

    • Influential point: a point that pulls the regression line toward it (usually increases the correlation)

  • Direction: positive or negative association (or neither)

    • Positive association—as one variable increases, so does the other

    • Negative association—as one variable increases, the other decreases

  • Strength: how closely the points follow the form

    • Strong, weak, moderately strong/weak
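The "usually decreases" and "usually increases" notes on unusual points can be checked numerically. The sketch below uses small hypothetical data sets made up for illustration. It computes r for a base data set, then adds an outlier (a typical x value far from the pattern) and, separately, an influential point (extreme in x but roughly in line with the trend). With these numbers r falls from 0.8 to about 0.35 with the outlier and rises to about 0.98 with the influential point. Other data can behave differently, which is why the notes say "usually".

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r for paired quantitative data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data with a moderately strong, positive, linear form
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r_base = pearson_r(x, y)

# Outlier: ordinary x value, but far above the pattern (large residual)
r_outlier = pearson_r(x + [3], y + [10])

# Influential point: extreme in x, roughly following the trend
r_influential = pearson_r(x + [15], y + [13])

print(f"base r           = {r_base:.3f}")
print(f"with outlier     = {r_outlier:.3f}")
print(f"with influential = {r_influential:.3f}")
```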

Residuals

  • A residual is the observed y minus the predicted y (y - ŷ); individual points with large residuals are outliers in the y direction because they lie far from the line that describes the overall pattern

  • Individual points that are extreme in the x direction may not have large residuals, but can be very important; such points are influential if removing them would markedly change the regression line or the correlation
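The two residual bullets can be seen together in a short sketch. It fits a least-squares line to hypothetical data in which the last point, (15, 3), is extreme in the x direction. Because the line is pulled toward that point, its residual (about -0.62) is not the largest one, yet removing it changes the slope from about 0.06 to 0.8. That change is what makes the point influential. The helper names `least_squares` and `residuals` are illustrative, not from the original guide.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def residuals(xs, ys):
    """Residual = observed y - predicted y, for each point."""
    b, a = least_squares(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical data; the last point (15, 3) is extreme in the x direction
x = [1, 2, 3, 4, 5, 15]
y = [2, 1, 4, 3, 5, 3]

res = residuals(x, y)
slope_all, _ = least_squares(x, y)
slope_without, _ = least_squares(x[:-1], y[:-1])  # drop the extreme point

for xi, yi, e in zip(x, y, res):
    print(f"({xi:>2}, {yi}) residual = {e:+.3f}")
print(f"slope with (15, 3):    {slope_all:.3f}")
print(f"slope without (15, 3): {slope_without:.3f}")
```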

Correlation (r)

  • Gives the direction and strength of a linear relationship

    • Does not imply causation

  • Makes no distinction between explanatory and response variables

    • Switching x and y gives the same value of r

  • Both variables must be quantitative

  • Because r is calculated from standardized values, it does not change if we convert the units of measurement of x, y, or both

  • r itself has no units

  • Positive r = positive association

    • Negative r = negative association

  • Correlation only measures strength and direction of linear relationships

  • -1 ≤ r ≤ 1 always

  • The closer r is to 1 or -1, the stronger the linear form

    • The closer r is to 0, the weaker the linear form and the more scattered the points are

  • r does not tell the whole story: always look at the scatterplot, since r alone cannot reveal a curved form or unusual points
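Several of the properties above can be checked directly. Here r is computed as r = (1/(n-1)) Σ z_x z_y, written in the equivalent form Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² Σ(y - ȳ)²). The height and weight values are hypothetical. The sketch shows that swapping x and y leaves r unchanged, that converting inches to centimeters and pounds to kilograms leaves r unchanged, and that a perfect U-shape, a strong but non-linear relationship, has r = 0.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r for paired quantitative data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical heights (inches) and weights (pounds)
height_in = [60, 62, 65, 68, 70, 72]
weight_lb = [110, 120, 135, 150, 160, 175]

r = pearson_r(height_in, weight_lb)
r_swapped = pearson_r(weight_lb, height_in)     # explanatory/response switched

height_cm = [h * 2.54 for h in height_in]       # 1 in = 2.54 cm
weight_kg = [w * 0.45359237 for w in weight_lb] # 1 lb = 0.45359237 kg
r_metric = pearson_r(height_cm, weight_kg)      # same data, new units

# Perfect U-shape: y is completely determined by x, but not linearly
u_x = [-2, -1, 0, 1, 2]
u_y = [xi ** 2 for xi in u_x]
r_u = pearson_r(u_x, u_y)

print(f"r = {r:.4f}, swapped = {r_swapped:.4f}, metric = {r_metric:.4f}")
print(f"U-shape r = {r_u:.3f}")
```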

Displaying Data

Two-Way Tables

  • Two-way table: a table that displays data for two categorical variables about the same group of individuals

  • Marginal distribution: the distribution of one categorical variable alone, found from the row or column totals

  • The yellow box shows the marginal distribution for gender, and the purple box is the marginal distribution of opinions

  • Conditional distribution: the distribution of one variable among only the individuals who share a specific value of the other variable

    • Often uses language of the probability of A “given” B
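The table from the original figure is not reproduced here, so the sketch below uses hypothetical counts for gender and opinion to show how each distribution is computed. Marginal distributions divide row or column totals by the grand total; a conditional distribution divides the counts in one row by that row's total only. With these counts, the conditional distribution of opinion given "Female" is 62.5% Yes, 25% No, 12.5% Undecided.

```python
# Hypothetical counts (illustrative only): rows = gender, columns = opinion
table = {
    "Male":   {"Yes": 30, "No": 20, "Undecided": 10},
    "Female": {"Yes": 25, "No": 10, "Undecided": 5},
}

def grand_total(t):
    return sum(sum(row.values()) for row in t.values())

def marginal_rows(t):
    """Marginal distribution of the row variable (gender)."""
    total = grand_total(t)
    return {g: sum(row.values()) / total for g, row in t.items()}

def marginal_cols(t):
    """Marginal distribution of the column variable (opinion)."""
    total = grand_total(t)
    col_totals = {}
    for row in t.values():
        for opinion, count in row.items():
            col_totals[opinion] = col_totals.get(opinion, 0) + count
    return {op: c / total for op, c in col_totals.items()}

def conditional(t, given):
    """Distribution of opinion among only the individuals in row `given`."""
    row = t[given]
    n = sum(row.values())
    return {op: c / n for op, c in row.items()}

print(marginal_rows(table))
print(marginal_cols(table))
print(conditional(table, "Female"))
```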

Segmented Bar Graphs

  • Also known as segmented bar charts

  • Segmented bar graph: a chart that displays categorical data as percentages of the whole, so each bar totals 100%

    • Similar to a pie chart
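Each bar in a segmented bar graph is one conditional distribution. The sketch below draws text-based segmented bars from hypothetical counts; the symbols and the 40-character width are arbitrary choices. Cumulative rounding is used so every bar has exactly the same length, which mirrors the idea that each bar represents 100% of its group.

```python
# Hypothetical counts (illustrative only): one bar per gender
counts = {
    "Male":   {"Yes": 30, "No": 20, "Undecided": 10},
    "Female": {"Yes": 25, "No": 10, "Undecided": 5},
}
SYMBOLS = {"Yes": "#", "No": "=", "Undecided": "."}
WIDTH = 40  # characters per bar (arbitrary)

def segment_percents(row):
    """Percent of the group falling in each category."""
    n = sum(row.values())
    return {k: 100 * v / n for k, v in row.items()}

def text_bar(row, width=WIDTH):
    """Draw one bar; cumulative rounding keeps the total length exact."""
    n = sum(row.values())
    bar, cum, drawn = "", 0, 0
    for k, v in row.items():
        cum += v
        end = round(width * cum / n)
        bar += SYMBOLS[k] * (end - drawn)
        drawn = end
    return bar

for group, row in counts.items():
    pct = segment_percents(row)
    labels = ", ".join(f"{k} {p:.1f}%" for k, p in pct.items())
    print(f"{group:<7}|{text_bar(row)}| {labels}")
```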

Mosaic Plots

  • Mosaic plot: a segmented bar graph used to compare groups where the widths of the bars are proportional to the size of the groups

  • Mosaic plots of the same data from the previous section:
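The layout rule for a mosaic plot can be sketched as rectangle coordinates on a unit square, using hypothetical counts. Each bar's width is the group's share of all individuals, and each segment's height is the conditional proportion within that group, so every rectangle's area equals that cell's share of the whole table. The function name `mosaic_rectangles` is illustrative only.

```python
# Hypothetical counts (illustrative only): rows = gender, columns = opinion
counts = {
    "Male":   {"Yes": 30, "No": 20, "Undecided": 10},
    "Female": {"Yes": 25, "No": 10, "Undecided": 5},
}

def mosaic_rectangles(t):
    """Return (group, category, x, y, width, height) on a unit square."""
    total = sum(sum(row.values()) for row in t.values())
    rects = []
    x0 = 0.0
    for group, row in t.items():
        n = sum(row.values())
        width = n / total          # bar width proportional to group size
        y0 = 0.0
        for cat, c in row.items():
            height = c / n         # segment height = conditional proportion
            rects.append((group, cat, x0, y0, width, height))
            y0 += height
        x0 += width
    return rects

rects = mosaic_rectangles(counts)
for g, c, x, y, w, h in rects:
    print(f"{g:<7}{c:<10} x={x:.2f} y={y:.3f} w={w:.2f} h={h:.3f}")
```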
