Two-way Table
Table summarizing relationship between 2 categorical variables
Marginal Relative Frequency
Percentage/proportion of individuals with specific value for 1 categorical variable
Joint Relative Frequency
Percentage/proportion of individuals with a specific value for one categorical variable and a specific value for another categorical variable
Conditional Relative Frequency
Percentage/proportion of individuals with a specific value for one categorical variable among those who share a specific value of another categorical variable
Conditional Distribution
Distribution of one variable's values among individuals who have a specific value of another variable
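The three relative frequencies differ only in what gets divided by what. The sketch below uses a made-up two-way table (class year by answer to a yes/no question); the counts and names are assumptions for illustration only.

```python
# Hypothetical two-way table: rows are class year, columns are the answer.
table = {
    "Freshman": {"Yes": 30, "No": 20},
    "Senior": {"Yes": 10, "No": 40},
}

# Grand total of all individuals in the table.
total = sum(sum(row.values()) for row in table.values())

# Row total for freshmen (used by the marginal and conditional calculations).
freshman_total = sum(table["Freshman"].values())

# Marginal relative frequency: one variable only (freshmen out of everyone).
marginal_freshman = freshman_total / total

# Joint relative frequency: a specific value of BOTH variables
# (freshman AND "Yes") out of everyone.
joint_freshman_yes = table["Freshman"]["Yes"] / total

# Conditional relative frequency: "Yes" among freshmen only.
cond_yes_given_freshman = table["Freshman"]["Yes"] / freshman_total

# Conditional distribution: every answer's share among freshmen.
cond_dist_given_freshman = {
    answer: count / freshman_total for answer, count in table["Freshman"].items()
}

print(marginal_freshman, joint_freshman_yes, cond_yes_given_freshman)
print(cond_dist_given_freshman)
```

Note that the joint and marginal values divide by the grand total, while the conditional values divide by a row total.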
Association
Relationship between variables where knowing one helps predict the other
Individual
object described in a set of data (people, animals, things)
Variable
attribute that can take different values for different individuals
Categorical Variable
assigns labels that place each individual into a particular group, called a category
Quantitative Variable
takes number values that are quantities (counts or measurements)
Distribution
tells us what values the variable takes and how often it takes those values
Histogram
shows each interval of values as a bar; the bar's height gives the number or percent of values in that interval
Formula for low outliers
Low outliers < Q₁ - 1.5(IQR)
Formula for high outliers
High outliers > Q₃ + 1.5(IQR)
Mean
average of all individual data values in a distribution of quantitative data; written x̄ (x-bar, an x with a line above it)
Resistance
A statistical measure is resistant if it isn’t sensitive to extreme values
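A quick way to see resistance is to add one extreme value to a small data set and compare how far the mean and the median move. The data values below are invented for illustration.

```python
from statistics import mean, median

# Hypothetical data set, then the same data with one extreme value added.
data = [2, 3, 3, 4, 5]
with_extreme = data + [50]

mean_before = mean(data)
mean_after = mean(with_extreme)
median_before = median(data)
median_after = median(with_extreme)

# The mean shifts a lot; the median barely moves.
print("mean:", mean_before, "->", round(mean_after, 2))
print("median:", median_before, "->", median_after)
```

In this sketch the median changes by half a unit while the mean more than triples, which is why the median is called resistant and the mean is not.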
Range
distance between min & max value (max-min)
Standard deviation
measures typical distance of values in a distribution from the mean
sₓ =
√[ Σ(xᵢ - x̄)² / (n - 1) ]
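The standard deviation formula can be followed step by step in code. This is a minimal sketch on an invented data set; the variable names are assumptions, and the result is cross-checked against Python's built-in `statistics.stdev`, which also divides by n - 1.

```python
import math
from statistics import stdev

# Hypothetical data set.
data = [2, 4, 4, 6, 9]
n = len(data)

# Step 1: the mean (x-bar).
x_bar = sum(data) / n

# Step 2: squared deviation of each value from the mean, summed.
sum_sq_dev = sum((x - x_bar) ** 2 for x in data)

# Step 3: divide by n - 1 (this intermediate value is the variance).
variance = sum_sq_dev / (n - 1)

# Step 4: take the square root to get the standard deviation.
s_x = math.sqrt(variance)

print(x_bar, variance, round(s_x, 4))
print(math.isclose(s_x, stdev(data)))
```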
Quartile
divides ordered data into 4 groups having roughly the same # of values
How to find quartiles
Arrange data values from smallest to largest & find the median
Q₁
median of the data values that are to the left of the median in the ordered list
Q₃
median of the data values that are to the right of the median in the ordered list
IQR
distance between the 1st & 3rd quartiles of a distribution
Q₃-Q₁
5 number summary
{ min, Q₁ , median, Q₃, max}
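The quartile steps, the IQR, the five-number summary, and the 1.5 × IQR outlier rule can be combined in one short sketch. The function name `five_number_summary` and the data are hypothetical; the quartiles use the median-of-each-half method described in these cards, which can differ slightly from the interpolation methods some software uses. It assumes at least two data values.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]          # values to the left of the median
    upper = data[half + n % 2:]  # values to the right (skips the middle value when n is odd)
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Hypothetical data set with one suspiciously large value.
values = [1, 3, 4, 5, 6, 7, 8, 9, 30]

summary = five_number_summary(values)
minimum, q1, med, q3, maximum = summary

iqr = q3 - q1
low_fence = q1 - 1.5 * iqr   # values below this are low outliers
high_fence = q3 + 1.5 * iqr  # values above this are high outliers

outliers = [x for x in values if x < low_fence or x > high_fence]

print(summary)
print(iqr, low_fence, high_fence, outliers)
```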
Boxplot
visual representation of five-number summary
Variance
average squared deviation of the values from the mean (dividing by n - 1); the value under the square root in the standard deviation formula, so variance = sₓ²
What should you mention when asked to describe the distribution of a quantitative variable?
shape, center, variability, outliers
Frequency table
shows the number of individuals having each value
Relative frequency table
shows the proportion or % of individuals having each value
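Both kinds of table come from the same counts; the relative frequency table simply divides each count by the total. A minimal sketch using an invented list of favorite colors:

```python
from collections import Counter

# Hypothetical categorical data: one favorite color per individual.
colors = ["red", "blue", "red", "green", "blue", "red", "red", "blue"]

# Frequency table: number of individuals having each value.
frequency = Counter(colors)

# Relative frequency table: proportion of individuals having each value.
relative_frequency = {
    value: count / len(colors) for value, count in frequency.items()
}

for value, count in frequency.items():
    print(value, count, relative_frequency[value])
```

The relative frequencies always add to 1 (or 100%), which is a handy check on the arithmetic.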