population
the entire group of individuals that information is about
sample
subset of the population from which data is collected
parameter
a number that describes some attribute of the population
statistic
value calculated from sample data
sampling bias
a systematic failure of a sampling method to represent the population
simple random sample
every individual, and every possible sample of the same size, is equally likely to be chosen (unbiased)
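A minimal sketch of drawing a simple random sample, assuming a hypothetical population of ID numbers 1 to 100 (the population, sample size, and seed are illustrative and not from these notes):

```python
import random

# Hypothetical population: ID numbers 1..100 (an assumption for illustration).
population = list(range(1, 101))

# Fixed seed only so the sketch is repeatable.
rng = random.Random(0)

# sample() draws without replacement; every subset of size 10
# is equally likely to be the result.
srs = rng.sample(population, 10)
print(sorted(srs))
```

Because the draw is without replacement, no individual can appear in the sample twice.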
voluntary response
self-selected sample; the people who choose to participate tend to have strong opinions, so those without strong opinions are underrepresented
convenience sample
surveying only individuals who are easily accessible; e.g. surveying only seniors in AP Stats instead of all seniors
bias
whether an estimator tends to systematically overestimate or underestimate the parameter
variability
how much estimates vary from sample to sample
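Bias and variability can be seen by repeating the sampling many times. The sketch below assumes a made-up population and uses the sample mean as the estimator; all names and numbers are illustrative.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed only for repeatability

# Hypothetical population of 10,000 values (an assumption for illustration).
population = [rng.gauss(50, 10) for _ in range(10_000)]
parameter = statistics.mean(population)  # the true population mean

def sample_means(n, reps=2000):
    """Return the sample mean from each of `reps` simple random samples of size n."""
    return [statistics.mean(rng.sample(population, n)) for _ in range(reps)]

small = sample_means(10)
large = sample_means(100)

# Bias: how far the estimates are from the parameter on average.
bias_small = statistics.mean(small) - parameter
# Variability: how much the estimates differ from sample to sample.
spread_small = statistics.stdev(small)
spread_large = statistics.stdev(large)

print(round(bias_small, 2), round(spread_small, 2), round(spread_large, 2))
```

The average of the estimates should sit close to the parameter (low bias), and the larger samples should give a smaller spread (lower variability).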
cluster sample
population is divided into several heterogeneous groups (clusters); some clusters are randomly selected and all individuals from the selected clusters are included in the sample
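A short sketch of a cluster sample, assuming hypothetical classrooms as the clusters (room and student names are invented for illustration):

```python
import random

rng = random.Random(2)  # fixed seed only for repeatability

# Hypothetical clusters: 10 classrooms with 5 students each.
clusters = {
    f"room_{r}": [f"room_{r}_student_{s}" for s in range(1, 6)]
    for r in range(1, 11)
}

# Randomly select whole clusters, not individuals.
chosen_rooms = rng.sample(sorted(clusters), 3)

# Everyone in a selected cluster is included in the sample.
cluster_sample = [student for room in chosen_rooms for student in clusters[room]]
print(chosen_rooms, len(cluster_sample))
```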
systematic random sample
a starting point is randomly selected, then a fixed rule is used to select the rest (e.g. starting at #5, every 11th individual)
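The "start at #5, take every 11th" rule can be sketched as follows, assuming a hypothetical numbered list of 100 individuals:

```python
import random

# Hypothetical numbered list of individuals 1..100 (an assumption).
population = list(range(1, 101))
k = 11  # take every 11th individual

# The example from the definition: start at #5.
example = population[5 - 1::k]
print(example)  # [5, 16, 27, 38, 49, 60, 71, 82, 93]

# In practice the starting point is chosen at random among the first k.
rng = random.Random(3)  # fixed seed only for repeatability
start = rng.randint(1, k)
systematic = population[start - 1::k]
print(start, systematic)
```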
stratified random sample
population is divided into homogeneous groups (strata) and a random sample is taken from each group
strata
homogeneous groups within a population
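A minimal sketch of a stratified random sample, assuming grade levels as hypothetical strata (the group names and sizes are invented for illustration):

```python
import random

rng = random.Random(4)  # fixed seed only for repeatability

# Hypothetical strata: students grouped by grade level.
strata = {
    "freshmen": [f"fr_{i}" for i in range(40)],
    "sophomores": [f"so_{i}" for i in range(35)],
    "juniors": [f"jr_{i}" for i in range(30)],
    "seniors": [f"sr_{i}" for i in range(25)],
}

per_stratum = 5

# Take a separate simple random sample from every stratum.
stratified = {
    name: rng.sample(members, per_stratum) for name, members in strata.items()
}
for name, picked in stratified.items():
    print(name, picked)
```

Unlike a cluster sample, every group contributes some individuals, and no group is included in full.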
multistage sample
sampling done in stages, often combining methods; e.g. randomly select 3 states, then randomly select 2 cities in each state
census
not a sample; surveys every member of a population
undercoverage
part of the population has a reduced chance of being included in the sample, so the sample is unrepresentative of the population (e.g. a convenience sample)
nonresponse bias
individuals chosen as part of a sample group do not respond (e.g. ignoring a phone call)
response bias - voluntary
sample is composed entirely of volunteers (who tend to have strong opinions) and tends not to be representative of the population
response bias - question
problems in how data is collected, such as leading question wording or tone, or the interviewer's influence, that push responses in one direction
sampling frame
list from which the sample is drawn
observational study
observe individuals without intervention (no treatments imposed)
retrospective observational study
uses data on events that have already taken place
prospective observational study
follows individuals forward in time and collects data as events occur
experiment
treatments are imposed; must be randomly assigned to determine causation
experimental units
subjects/units that receive treatment
factors
explanatory variables - what is manipulated
levels
different values of a factor
treatments
a specific combination of factor levels that is applied to experimental units
confounding
when other variables associated with a factor have an effect on the response; prevents us from knowing if response is due to the factor or the other variable
control group
used to provide baseline data for comparison; may receive placebo or no treatment
single-blind
when the subject does not know which treatment they are receiving
double-blind
when neither the subject nor the person administering the treatment knows which treatment the subject is receiving
placebo effect
showing a response to a fake treatment
completely randomized design
randomly assign units to treatments, with roughly equal numbers in each treatment group
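One way to sketch a completely randomized design with two groups, assuming 20 hypothetical experimental units (unit names and group labels are illustrative):

```python
import random

rng = random.Random(5)  # fixed seed only for repeatability

# Hypothetical experimental units.
units = [f"unit_{i}" for i in range(1, 21)]

# Shuffle a copy, then split it in half: chance alone decides the groups.
shuffled = units[:]
rng.shuffle(shuffled)
half = len(shuffled) // 2
assignment = {"treatment": shuffled[:half], "control": shuffled[half:]}

print(assignment["treatment"])
print(assignment["control"])
```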
comparison
must have 2+ treatments
control
keep all variables (besides treatments) the same to reduce confounding
replication
using enough experimental units to distinguish treatment effects from chance variability
block
group of experimental units that are similar in some way that affects the response
block design
choose blocks to reduce variability in responses and randomly assign treatments in each block
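A short sketch of a randomized block design, assuming two hypothetical age blocks; the random assignment happens separately inside each block:

```python
import random

rng = random.Random(6)  # fixed seed only for repeatability

# Hypothetical blocks: units grouped by a variable expected to affect the response.
blocks = {
    "younger": [f"young_{i}" for i in range(1, 9)],
    "older": [f"old_{i}" for i in range(1, 9)],
}

assignment = {}
for name, units in blocks.items():
    # Randomize within the block, never across blocks.
    shuffled = units[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    assignment[name] = {"treatment": shuffled[:half], "control": shuffled[half:]}

for name, groups in assignment.items():
    print(name, groups)
```

Each treatment group ends up with the same number of units from every block, so the blocking variable cannot be confounded with the treatment.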
matched pairs
special type of block design in which two similar units are paired up and split into different treatment groups
statistically significant
when results are too unusual to be plausibly explained by chance alone
simulation
models a random process in order to estimate a probability
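A minimal simulation sketch, assuming the question "how likely is it to get at least 8 heads in 10 flips of a fair coin?" (the scenario and repetition count are invented for illustration):

```python
import random

rng = random.Random(7)  # fixed seed only for repeatability

def heads_in_ten():
    """Model one trial: flip a fair coin 10 times and count the heads."""
    return sum(rng.random() < 0.5 for _ in range(10))

reps = 20_000

# Repeat the trial many times and record how often the event occurs.
hits = sum(heads_in_ten() >= 8 for _ in range(reps))
estimate = hits / reps
print(round(estimate, 3))
```

The exact binomial probability is 56/1024, about 0.055, so the estimate should land near that value. A result this rare under the "fair coin" model is the kind of evidence that gets called statistically significant.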
scope of random sample
can generalize to the population we sampled
scope of random assignment
balances the groups on other variables, so differences in response can be attributed to the treatment (causation)