frequency

the count of individuals in a category (or with a given value)

relative frequency

The percentage or proportion of all observations that fall in a category.
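
As a small illustration of the two terms above, the following sketch tallies a made-up list of category labels (the `colors` data is hypothetical) and divides each count by the total:

```python
from collections import Counter

# Hypothetical data: favorite colors reported by 10 students.
colors = ["red", "blue", "blue", "green", "red",
          "blue", "red", "blue", "green", "blue"]

# Frequency: the count of individuals in each category.
frequency = Counter(colors)

# Relative frequency: each count divided by the total number of observations.
total = len(colors)
relative_frequency = {category: count / total
                      for category, count in frequency.items()}

for category in frequency:
    print(category, frequency[category], relative_frequency[category])
```

The relative frequencies always add up to 1 (or 100%).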

pie chart

a circular chart divided into sectors (wedges) whose areas are proportional to the percentages of the whole

bar graph

a graph that uses vertical or horizontal bars to show comparisons among two or more items
spaces between the bars
categorical variables on the x-axis

two-way table

A table containing counts for two categorical variables. It has r rows and c columns.

marginal relative frequency

Gives the percent or proportion of individuals that have a specific value for one categorical variable on a two-way table

joint relative frequency

gives the percent or proportion of individuals that have a specific value for one categorical variable and a specific value for another categorical variable on a two-way table

conditional relative frequency

gives the percent or proportion of individuals that have a specific value for one categorical variable among individuals who share the same value of another categorical variable (the condition)
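
The three kinds of relative frequency differ only in what is divided by what. A minimal sketch, assuming a made-up two-way table of counts (the grade levels and pet preferences are hypothetical):

```python
# Hypothetical two-way table of counts:
# rows are grade levels, columns are pet preference.
table = {
    "junior": {"cat": 10, "dog": 30},
    "senior": {"cat": 20, "dog": 40},
}

grand_total = sum(sum(row.values()) for row in table.values())
senior_total = sum(table["senior"].values())

# Marginal: proportion of ALL individuals who are seniors.
marginal_senior = senior_total / grand_total

# Joint: proportion of ALL individuals who are seniors AND prefer cats.
joint_senior_cat = table["senior"]["cat"] / grand_total

# Conditional: proportion who prefer cats AMONG seniors only.
conditional_cat_given_senior = table["senior"]["cat"] / senior_total

print(marginal_senior, joint_senior_cat, round(conditional_cat_given_senior, 3))
```

Marginal and joint relative frequencies divide by the grand total; a conditional relative frequency divides by the total for the group named in the condition.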

side-by-side bar graph

Used to compare the distribution of a categorical variable in each of several groups. For each value of the categorical variable, there is a bar corresponding to each group. The height of each bar is determined by the count or percent of individuals in the group with that value.

segmented bar graph

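a bar graph with one bar per group, where each bar is divided into stacked segments showing the distribution of a categorical variable within that group (all bars have the same width, unlike a mosaic plot)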

mosaic plot

a modified segmented bar graph in which the width of each rectangle is proportional to the number of individuals in the corresponding category

association

exists between two variables if knowing the value of one variable helps predict the value of the other

Dot Plot

a graphical device that summarizes data by the number of dots above each data value on the horizontal axis

stem plot

A graphical display of quantitative data that splits each value into two components: a stem (the leading digits) and a leaf (the final digit).
The number of leaves on each stem represents the frequency of data points.

histogram

A graph of vertical bars representing the frequency distribution of a set of data.

symmetric

the left and right sides of the distribution are approximately mirror images of each other

skewed left

the tail to the left of the peak is longer than the tail to the right of the peak

skewed right

the tail to the right of the peak is longer than the tail to the left of the peak

shape

the overall form of a distribution: symmetric or skewed, and how many peaks it has

center

a typical or middle value of the distribution (mean or median)

variability

in a set of numbers, how widely dispersed the values are from each other and from the center; measured by the range, IQR, or standard deviation

outliers

Numbers that are much greater or much less than the other numbers in the set

mean > median

skewed right (typically)

mean = median

roughly symmetric (typically)

mean < median

skewed left (typically)
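
The mean/median comparisons above are rules of thumb, not guarantees. A quick sketch with made-up data shows the usual pattern, where one large value pulls the mean above the median:

```python
import statistics

# Hypothetical right-skewed data: one large value stretches the right tail.
data = [1, 2, 2, 3, 3, 3, 4, 20]

mean = statistics.mean(data)      # 38 / 8 = 4.75
median = statistics.median(data)  # average of the two middle values, 3 and 3

if mean > median:
    suggestion = "skewed right"
elif mean < median:
    suggestion = "skewed left"
else:
    suggestion = "roughly symmetric"

print(mean, median, suggestion)
```

A graph of the data is still the safer way to judge shape.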

standard deviation

a measure of variability that describes an average distance of every score from the mean
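
To make the idea of a typical distance from the mean concrete, here is a minimal sketch that computes a sample standard deviation by hand and checks it against Python's `statistics` module. The data are made up, and dividing by n - 1 (the sample formula) is an assumption about which version is wanted:

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores

mean = statistics.mean(data)

# Squared distance of every score from the mean.
squared_deviations = [(x - mean) ** 2 for x in data]

# Sample standard deviation: divide by n - 1, then take the square root.
sample_sd = math.sqrt(sum(squared_deviations) / (len(data) - 1))

print(round(sample_sd, 3))
print(round(statistics.stdev(data), 3))  # same calculation from the library
```

Dividing by n instead of n - 1 gives the population standard deviation, available as `statistics.pstdev`.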

Interquartile Range (IQR)

A measure of variability, defined to be the difference between the third and first quartiles.

first quartile

the median of the lower half of the data set

third quartile

the median of the upper half of the data set

resistant

relatively unaffected by extreme observations
examples: median and IQR

1.5 x IQR rule

identifies as an outlier any individual value that falls more than 1.5 x IQR above the 3rd quartile or below the 1st quartile.
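
The quartile, IQR, and outlier definitions above can be sketched together. This example uses the median-of-halves method for quartiles; the helper name `quartiles` and the data are hypothetical, and statistical software may compute quartiles slightly differently:

```python
import statistics

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]          # lower half of the data
    upper = ordered[half + n % 2:]  # upper half (middle value skipped if n is odd)
    return (statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper))

data = [1, 3, 4, 5, 6, 7, 8, 9, 10, 30]  # hypothetical data

q1, med, q3 = quartiles(data)
iqr = q3 - q1

# 1.5 x IQR rule: values beyond these fences are flagged as outliers.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]

print(q1, med, q3, iqr, outliers)
```

Here Q1 = 4 and Q3 = 9, so the IQR is 5 and the fences are -3.5 and 16.5; only the value 30 falls outside them.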

box plots

A data display that shows the five-number summary. The whiskers, stretching outward from the first and third quartiles, are no longer than 1.5 times the interquartile range (IQR). Outliers beyond that are marked separately.

five-number summary

minimum, 1st quartile, median, 3rd quartile, maximum

percentiles

(number of values less than or equal to the given value) / (total number of values), expressed as a percent
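
A minimal sketch of this calculation, using the "less than or equal to" convention stated above (some textbooks count only values strictly less than the given value). The function name `percentile_of` and the scores are hypothetical:

```python
def percentile_of(value, data):
    """Percent of observations less than or equal to `value`."""
    at_or_below = sum(1 for x in data if x <= value)
    return 100 * at_or_below / len(data)

scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]  # hypothetical test scores

print(percentile_of(85, scores))  # 7 of 10 scores are at or below 85 -> 70.0
```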

standardized scores (z score)

The number of standard deviations a value lies above or below the mean: z = (value - mean) / standard deviation.
The z-score gives the exact location of a value within the distribution.
Z-scores can be positive (above the mean) or negative (below the mean).
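
A short sketch of the standardization formula z = (value - mean) / standard deviation. The function name `z_score` and the numbers (a mean of 70 and a standard deviation of 10) are made up for illustration:

```python
def z_score(value, mean, sd):
    """Number of standard deviations `value` lies from `mean`."""
    return (value - mean) / sd

# Hypothetical test with mean 70 and standard deviation 10.
above = z_score(85, 70, 10)  # positive: 1.5 SDs above the mean
below = z_score(62, 70, 10)  # negative: 0.8 SDs below the mean

print(above, below)
```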

cumulative relative frequency

The term applies to an ordered set of observations from smallest to largest.
The sum of the relative frequencies for all values that are less than or equal to the given value.
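
As a sketch of the definition above, the following example orders the distinct values of a made-up data set, accumulates their counts, and divides by the total:

```python
from collections import Counter
from itertools import accumulate

data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]  # hypothetical observations

counts = Counter(data)
values = sorted(counts)  # distinct values, smallest to largest

# Running total of counts, then divide by the number of observations.
cumulative_counts = list(accumulate(counts[v] for v in values))
cumulative_relative = [c / len(data) for c in cumulative_counts]

for v, crf in zip(values, cumulative_relative):
    print(v, crf)
```

The last cumulative relative frequency is always 1, because every observation is less than or equal to the largest value.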

transform data

add or subtract a constant:
- mean, five-number summary, and percentiles change
- range, IQR, and SD don't
multiply or divide by a positive constant:
- center, location, and variability all change
shape never changes
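
These effects can be checked numerically. A minimal sketch with made-up data, using the population standard deviation (`statistics.pstdev`) only because it gives round numbers here:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical data
shifted = [x + 10 for x in data]   # add a constant
scaled = [x * 3 for x in data]     # multiply by a positive constant

def summary(values):
    """Return (mean, range, population SD) for a list of numbers."""
    return (statistics.mean(values),
            max(values) - min(values),
            statistics.pstdev(values))

print(summary(data))     # mean 5, range 7, SD 2
print(summary(shifted))  # mean moves to 15; range and SD unchanged
print(summary(scaled))   # mean, range, and SD are all multiplied by 3
```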

density curve

a curve that is always on or above the horizontal axis and has area exactly 1 underneath it
the area under the curve over an interval estimates the proportion of observations that fall in that interval

mean of the density curve

the balance point, at which the curve would balance if made of solid material

standard deviation of a density curve

σ (sigma)

normal distributions

data representation with a distinctive bell-shaped curve, symmetric about the mean

mean and median on density curves

equal if symmetric
mean is pushed toward the long tail if skewed

68-95-99.7 rule (empirical rule)

In a normal distribution, about 68% of values fall within 1 standard deviation (SD) of the mean, about 95% within 2 SDs, and about 99.7% within 3 SDs.
(Almost all values fall within 3 SDs of the mean.)
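
The percentages in the rule are rounded areas under the normal curve. A minimal sketch that recovers them with `statistics.NormalDist` from the Python standard library, using the standard normal distribution (mean 0, SD 1) as a convenient choice:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Area under the curve within k standard deviations of the mean.
within = {k: standard_normal.cdf(k) - standard_normal.cdf(-k)
          for k in (1, 2, 3)}

for k, proportion in within.items():
    print(k, round(proportion, 4))
```

The results are roughly 0.6827, 0.9545, and 0.9973, which round to the 68%, 95%, and 99.7% in the rule.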

normal probability plot

Used to assess normality.
If the plot has a roughly linear form, the data are approximately normal.
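
One way such a plot can be built is to pair each ordered data value with the z-score expected at its position in a normal distribution. The sketch below assumes plotting positions of (i - 0.5) / n, which is only one of several conventions, and the helper name `normal_plot_points` and the data are hypothetical:

```python
from statistics import NormalDist

def normal_plot_points(data):
    """Pair each ordered value with its expected standard normal z-score."""
    ordered = sorted(data)
    n = len(ordered)
    # Assumed plotting position for the i-th smallest value: (i - 0.5) / n.
    return [(x, NormalDist().inv_cdf((i - 0.5) / n))
            for i, x in enumerate(ordered, start=1)]

data = [61, 64, 67, 70, 73]  # hypothetical, evenly spread measurements

for value, expected_z in normal_plot_points(data):
    print(value, round(expected_z, 3))
```

Plotting the pairs and checking whether they fall close to a straight line is the visual test described above.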

Skewed Probability Distribution

skewed left - curves on the right/top
skewed right - curves on the left/bottom