The percentage or proportion of all data values that fall in a given category.
pie chart
a circular chart divided into sectors (wedges) whose areas are proportional to each category's percentage of the whole
bar graph
a graph that uses vertical or horizontal bars to show comparisons among two or more items
spaces between the bars
categorical variables on the x-axis
two-way table
A table containing counts for two categorical variables. It has r rows and c columns.
marginal relative frequency
Gives the percent or proportion of individuals that have a specific value for one categorical variable on a two-way table
joint relative frequency
gives the percent or proportion of individuals that have a specific value for one categorical variable and a specific value for another categorical variable on a two-way table
conditional relative frequency
gives the percent or proportion of individuals that have a specific value for one categorical variable among individuals who share the same value of another categorical variable (the condition)
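The three relative frequencies above differ only in what is divided by what. A minimal sketch, assuming a made-up two-way table of counts (the grade levels, sports, and numbers are hypothetical, not from these notes):

```python
# Hypothetical two-way table of counts: rows = grade level, columns = sport.
table = {
    "junior": {"soccer": 30, "tennis": 20},
    "senior": {"soccer": 10, "tennis": 40},
}

# Grand total of individuals in the table.
total = sum(sum(row.values()) for row in table.values())

# Marginal relative frequency: one variable only (all juniors / everyone).
marginal_junior = sum(table["junior"].values()) / total

# Joint relative frequency: both values at once (junior AND soccer / everyone).
joint_junior_soccer = table["junior"]["soccer"] / total

# Conditional relative frequency: soccer among juniors only (the condition).
cond_soccer_given_junior = table["junior"]["soccer"] / sum(table["junior"].values())

print(marginal_junior, joint_junior_soccer, cond_soccer_given_junior)
```

Note that the joint and marginal frequencies divide by the grand total, while the conditional frequency divides by the total of the conditioning row.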
side-by-side bar graph
Used to compare the distribution of a categorical variable in each of several groups. For each value of the categorical variable, there is a bar corresponding to each group. The height of each bar is determined by the count or percent of individuals in the group with that value.
segmented bar graph
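a bar graph in which each bar represents one group and is divided into stacked segments showing the percent of that group in each category; every bar totals 100% and, unlike a mosaic plot, all bars have the same width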
mosaic plot
a modified segmented bar graph in which the width of each rectangle is proportional to the number of individuals in the corresponding category
association
exists between two variables if knowing the value of one variable helps predict the value of the other
Dot Plot
a graphical device that summarizes data by the number of dots above each data value on the horizontal axis
stem plot
A graphical display of quantitative data that splits each individual value into two components: a stem (the leading digits) and a leaf (the final digit)
represents frequency of data points
histogram
A graph of vertical bars representing the frequency distribution of a set of data.
symmetric
the left and right sides of the distribution are approximately mirror images of each other
skewed left
the tail to the left of the peak is longer than the tail to the right of the peak
skewed right
the tail to the right of the peak is longer than the tail to the left of the peak
shape
the overall pattern of a distribution: symmetric, skewed left, or skewed right
center
the middle or typical value of a distribution, measured by the median or the mean
variability
in a set of numbers, how widely dispersed the values are from each other and from the mean (measured by the range, IQR, or standard deviation)
outliers
Numbers that are much greater or much less than the other numbers in the set
mean > median
skewed right
mean = median
symmetric
mean < median
skewed left
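The mean/median comparisons above can be checked on small data sets. A minimal sketch, assuming three made-up lists chosen to have a long right tail, a symmetric shape, and a long left tail:

```python
from statistics import mean, median

# Hypothetical data sets (not from the notes), one per shape.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]           # long tail to the right
symmetric = [1, 2, 3, 4, 5]                        # mirror image around 3
left_skewed = [-20, -4, -3, -3, -3, -2, -2, -1]    # long tail to the left

for name, data in [("right", right_skewed),
                   ("symmetric", symmetric),
                   ("left", left_skewed)]:
    # The mean is pulled toward the long tail; the median is not.
    print(name, "mean =", mean(data), "median =", median(data))
```

This comparison is a rule of thumb: it suggests a shape, but a graph of the data is still the better check.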
standard deviation
a measure of variability that describes an average distance of every score from the mean
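One way to see what "distance from the mean" means is to compute the standard deviation step by step. A minimal sketch, assuming a made-up data set; the sample version divides by n - 1 and the population version divides by n:

```python
from math import sqrt
from statistics import pstdev, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical values
n = len(data)
m = sum(data) / n                  # mean of the data

# Squared deviations of each value from the mean.
squared_devs = [(x - m) ** 2 for x in data]

sample_sd = sqrt(sum(squared_devs) / (n - 1))   # sample SD (divide by n - 1)
population_sd = sqrt(sum(squared_devs) / n)     # population SD (divide by n)

print(sample_sd, population_sd)
# The standard library gives the same results.
print(stdev(data), pstdev(data))
```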
Interquartile Range (IQR)
A measure of variability, defined to be the difference between the third and first quartiles.
first quartile
the median of the lower half of the data set
third quartile
the median of the upper half of the data set
resistant
relatively unaffected by extreme observations
examples: median and IQR (the mean, range, and standard deviation are not resistant)
1.5 x IQR rule
identifies as an outlier any individual value that falls more than 1.5 x IQR above the 3rd quartile or more than 1.5 x IQR below the 1st quartile.
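The quartile, IQR, and 1.5 x IQR definitions above can be combined in a short sketch. It assumes the "median of each half" definition of the quartiles used in these notes (software often uses slightly different quartile rules), and the data set and the helper name `quartiles` are made up for illustration:

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-each-half rule.

    Assumes at least two values. When n is odd, the overall median
    is left out of both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]      # lower half of the ordered data
    upper = data[-half:]     # upper half of the ordered data
    return median(lower), median(data), median(upper)

data = [1, 3, 4, 5, 6, 7, 8, 9, 30]   # hypothetical values
q1, med, q3 = quartiles(data)
iqr = q3 - q1

# Fences for the 1.5 x IQR rule.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]

print(q1, med, q3, iqr, low_fence, high_fence, outliers)
```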
box plots
A data display that shows the five-number summary. The whiskers stretch outward from the first quartile and third quartile and are no longer than 1.5 times the interquartile range (IQR). Outliers beyond that are marked separately.
five-number summary
minimum, 1st quartile, median, 3rd quartile, maximum
percentiles
(number of values less than or equal to the given value) / (total number of values), expressed as a percent
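The percentile calculation above is a count divided by a total. A minimal sketch, assuming the "less than or equal to" convention from this card; the scores and the function name `percentile_of` are hypothetical:

```python
def percentile_of(value, data):
    """Percent of values in data that are less than or equal to value."""
    at_or_below = sum(1 for x in data if x <= value)
    return 100 * at_or_below / len(data)

scores = [55, 60, 70, 70, 80, 85, 90, 95, 100, 100]   # hypothetical test scores
print(percentile_of(80, scores))   # 5 of the 10 scores are <= 80
```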
standardized scores (z score)
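z = (value - mean) / standard deviation, so the mean and standard deviation of the distribution must be known.
The z-score gives the location of a value within the distribution: the number of standard deviations it lies from the mean.
Z-scores can be positive (above the mean) or negative (below the mean).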
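A z-score is computed by subtracting the mean and dividing by the standard deviation. A minimal sketch, assuming a made-up distribution with mean 70 and standard deviation 10 (the function name `z_score` is hypothetical):

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

# Hypothetical test with mean 70 and standard deviation 10.
above = z_score(85, mean=70, sd=10)   # above the mean, so positive
below = z_score(60, mean=70, sd=10)   # below the mean, so negative
print(above, below)
```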
cumulative relative frequency
The term applies to an ordered set of observations from smallest to largest.
The sum of the relative frequencies for all values that are less than or equal to the given value.
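Cumulative relative frequencies can be built by running a total over the ordered values. A minimal sketch, assuming a small made-up data set:

```python
from collections import Counter
from itertools import accumulate

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]   # hypothetical observations
counts = Counter(data)
values = sorted(counts)                  # order from smallest to largest

# Running total of counts at or below each value, then divide by n.
cumulative_counts = list(accumulate(counts[v] for v in values))
cumulative_rel_freq = [c / len(data) for c in cumulative_counts]

for v, crf in zip(values, cumulative_rel_freq):
    print(v, crf)
```

The last cumulative relative frequency is always 1, because every observation is less than or equal to the largest value.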
transform data
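add or subtract a constant:
- mean, five-number summary, percentiles change
- range, IQR, SD don't
multiply or divide by a constant:
- center, location, variability change
shape never changes (although multiplying by a negative constant reverses the direction of any skew)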
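The effects of transforming data can be checked numerically. A minimal sketch, assuming a made-up data set, a shift of +10, and a scale factor of 3 (the helper name `summary` is hypothetical):

```python
from statistics import mean, median, pstdev

def summary(values):
    """Return (mean, median, range, population SD) for a list of numbers."""
    return mean(values), median(values), max(values) - min(values), pstdev(values)

data = [2, 4, 4, 4, 5, 5, 7, 9]       # hypothetical values
shifted = [x + 10 for x in data]      # add a constant
scaled = [x * 3 for x in data]        # multiply by a constant

print("original:", summary(data))
print("shifted: ", summary(shifted))  # center moves; range and SD stay the same
print("scaled:  ", summary(scaled))   # center, range, and SD are all multiplied by 3
```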
density curve
a curve that is always on or above the horizontal axis and has area exactly 1 underneath it
estimates the proportion of observations that falls in an interval of values
mean of the density curve
the balance point, at which the curve would balance if made of solid material
standard deviation of a density curve
σ (sigma)
normal distributions
data representation with a distinctive bell-shaped curve, symmetric about the mean
mean and median on density curves
equal if symmetric
mean is pushed toward the long tail if skewed
68-95-99.7 rule (empirical rule)
In a normal distribution, approximately 68% of values fall within 1 standard deviation (SD) of the mean, 95% within 2 SDs of the mean, and 99.7% within 3 SDs of the mean.
(Almost all values fall within 3 SDs of the mean.)
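The 68-95-99.7 percentages are rounded areas under the normal curve. A minimal sketch that recovers them with Python's `statistics.NormalDist`, using the standard normal distribution (mean 0, SD 1):

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1

# Area within k standard deviations of the mean, for k = 1, 2, 3.
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, area in within.items():
    print(f"within {k} SD: {area:.4f}")
```

The same areas hold for any normal distribution, since the rule is stated in units of standard deviations from the mean.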
normal probability plot
Used to assess normality: if the plot has a roughly linear form, the data are approximately normal.
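The points of a normal probability plot pair each ordered data value with the z-score expected at its position in a normal distribution. A minimal sketch, assuming a made-up sample and the plotting position (i - 0.5) / n, which is one common convention among several:

```python
from statistics import NormalDist

data = [12, 15, 9, 20, 14, 11, 17]   # hypothetical sample
ordered = sorted(data)
n = len(ordered)
std_normal = NormalDist()

# Assumed plotting position for the i-th smallest value: (i - 0.5) / n.
expected_z = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# Each point is (data value, expected z-score); a roughly straight
# pattern in these points suggests approximate normality.
points = list(zip(ordered, expected_z))
for value, z in points:
    print(f"{value:>3}  {z:+.2f}")
```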
Skewed Probability Distribution
skewed left - curves on the right/top
skewed right - curves on the left/bottom