4.1: Sampling and Surveys

Introductory Terms

  • Population: the entire group of individuals (people, objects, or events) about which information is sought

  • Sample: the actual part of the population studied in order to gather information

    • Information from the sample is used to draw conclusions about the entire population

    • Subset of total population

  • Census: an attempt to contact everyone in a population

    • Very difficult to obtain

    • The US attempts a national census only once every 10 years

  • It is only appropriate to generalize about a population if the sample is randomly selected or otherwise representative of that population

    • A sample is only generalizable to the population from which it was selected

  • Sample design: the method used to choose a sample from a population

  • Sampling frame: the list of individuals from which a sample is drawn

  • Biased sample/biased study: a sample or study which systematically favors certain individuals or outcomes

    • Does not represent the population

    • Consistently overestimates or underestimates the value sought

Replacement Sampling

  • Sampling with replacement: when an item from a population can be selected more than once

  • Sampling without replacement: when an item from a population cannot be selected more than once
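
The difference can be sketched in Python. This is a minimal illustration, assuming a hypothetical population of ten numbered individuals; it relies on the standard library's random.choices (draws with replacement) and random.sample (draws without replacement).

```python
import random

# Hypothetical population: individuals numbered 1 through 10 (an assumption).
population = list(range(1, 11))

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# With replacement: the same individual may be drawn more than once.
with_replacement = rng.choices(population, k=5)

# Without replacement: each individual can appear at most once.
without_replacement = rng.sample(population, k=5)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```

Most survey samples are drawn without replacement, since there is usually no reason to ask the same individual twice.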

Types of Sampling

Relatively Ineffective Methods

  • Convenience sampling: choosing individuals who are in close proximity or otherwise easy to reach

    • Often produces unrepresentative data

    • Almost guaranteed to show bias

  • Voluntary response sample: individuals choose themselves as participants by responding to a general appeal

    • Shows bias because people with strong opinions (often negative) are more likely to respond

    • E.g., call-in opinion polls

Generally Effective Methods (if used correctly)

  • Good sampling designs have the goal of creating a sample which is representative of the population

  • Random sampling: the use of chance to select a sample; an essential principle of statistical sampling

    • E.g., dice, spinners, cards

  • Simple random sample (SRS): choosing individuals from a population in such a way that every individual in the population has an equal chance of being chosen and every possible sample has an equal chance of being chosen

  • The hat method is one way to select an SRS

    • Number the individuals on identical slips of paper

    • Place them in a hat

    • Mix thoroughly

    • Draw one at a time until the desired sample size has been selected

    • The numbers you draw represent the individuals that are chosen to be in the sample

  • Stratified random sample

    • More complicated than an SRS

    • Divide the population into groups of similar individuals based on something that might influence results

      • These groups are called strata (singular: stratum)

    • Select an SRS from each stratum and combine to form a full sample

      • Multiple hats; take a little from each

    • Because an SRS is taken from every stratum, each group is guaranteed to be represented

    • The individuals within each stratum are less varied than the population as a whole

    • Can produce better information about the population than an SRS of the same size

  • Cluster sample

    • The population is naturally divided into groups called clusters, each containing a mixture of individuals, like a mini population

    • Number each cluster, then choose an SRS from the clusters

    • Use all of the individuals in the chosen clusters for the sample

  • Multistage sample

    • Select the sample in stages, e.g., an SRS of clusters followed by an SRS of individuals within each chosen cluster; often used for national samples

  • Systematic sample

    • Order the list by a feature across which you want a range of responses

      • E.g., height, GPA, income

    • Select every nth item from the ordered list

      • To find n, divide the number of individuals in the list by the desired sample size

    • Starting point should be randomized

    • Will spread the sample more evenly throughout the population

  • Systematic random sample: a method in which sample members from a population are selected according to a random starting point and a fixed, periodic interval

    • Starting point should be randomized
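
The four designs above (SRS, stratified, cluster, systematic) can be compared side by side in a short Python sketch. Everything here is an assumption made for illustration: the 40-student roster, the "grade" field used as the stratifying variable, the mixed-grade homerooms used as clusters, and the function names. It is one possible way to code these designs, not a standard implementation.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical roster (an assumption): 40 students, 10 in each of grades 9-12.
roster = [{"id": i, "grade": 9 + i // 10} for i in range(40)]


def simple_random_sample(individuals, size):
    """SRS (the hat method): every group of `size` individuals is equally likely."""
    return rng.sample(individuals, size)


def stratified_sample(individuals, key, per_stratum):
    """Split into strata by `key`, take an SRS from each stratum, then combine."""
    strata = {}
    for person in individuals:
        strata.setdefault(person[key], []).append(person)
    chosen = []
    for members in strata.values():
        chosen.extend(rng.sample(members, per_stratum))
    return chosen


def cluster_sample(clusters, n_clusters):
    """Take an SRS of whole clusters and keep everyone in the chosen clusters."""
    picked = rng.sample(clusters, n_clusters)
    return [person for cluster in picked for person in cluster]


def systematic_sample(ordered, size):
    """Random start, then every nth individual, with n = list length // sample size."""
    step = len(ordered) // size
    start = rng.randrange(step)  # random starting point within the first interval
    return ordered[start::step][:size]


srs = simple_random_sample(roster, 8)
stratified = stratified_sample(roster, "grade", 2)  # 2 students from each grade

# Hypothetical clusters: 5 homerooms of 8 students, each mixing all four grades.
homerooms = [roster[i::5] for i in range(5)]
cluster = cluster_sample(homerooms, 1)

systematic = systematic_sample(roster, 8)  # n = 40 // 8 = 5

print("SRS ids:       ", sorted(p["id"] for p in srs))
print("stratified ids:", sorted(p["id"] for p in stratified))
print("cluster ids:   ", sorted(p["id"] for p in cluster))
print("systematic ids:", sorted(p["id"] for p in systematic))
```

All four samples have 8 students, but only the stratified sample is guaranteed to include exactly 2 from every grade, and only the systematic sample is guaranteed to be evenly spaced through the ordered list.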

Bias

  • Bias: when certain responses are systematically favored over others

  • When writing about bias, you must:

    • Identify the population and sample

    • Explain how the sampled individuals might differ from the general population

    • Explain how this leads to an overestimate or underestimate

  • Non-random sampling methods have the potential for bias because they do not use chance to select the individuals

    • Two such methods are voluntary response sampling and convenience sampling

      • Voluntary response bias: when a sample is composed entirely of volunteers, the sample will typically not be representative of the population

      • Convenience bias: when those that are most convenient to access get selected for a sample
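
A small simulation can make the overestimate concrete. The sketch below is hypothetical throughout: the population size, the 30% share of dissatisfied people, and the response probabilities are all assumed numbers chosen only to illustrate how voluntary response can inflate an estimate compared with an SRS of the same size.

```python
import random

rng = random.Random(2)  # fixed seed so the sketch is repeatable

# Hypothetical population (assumption): 10,000 people, about 30% dissatisfied (True).
population = [rng.random() < 0.30 for _ in range(10_000)]
true_share = sum(population) / len(population)

# Assumed response behavior: dissatisfied people answer a general appeal
# with probability 0.50, everyone else with probability 0.10.
volunteers = [unhappy for unhappy in population
              if rng.random() < (0.50 if unhappy else 0.10)]
voluntary_share = sum(volunteers) / len(volunteers)

# An SRS of the same size as the volunteer group, for comparison.
srs = rng.sample(population, len(volunteers))
srs_share = sum(srs) / len(srs)

print(f"true share dissatisfied:      {true_share:.2f}")
print(f"voluntary response estimate:  {voluntary_share:.2f}")
print(f"SRS estimate:                 {srs_share:.2f}")
```

Under these assumptions the volunteers differ from the population (they are more likely to be dissatisfied), so the voluntary response estimate lands well above the true share, while the SRS estimate stays close to it.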

Types of Bias

  • In addition to the two types covered above:

  • Undercoverage bias: when some groups of the population are left out in the process of choosing a sample

  • Response bias: when the behavior of the respondent or the interviewer causes bias

    • Can be intentional or unintentional

  • Nonresponse bias: when an individual chosen for a sample can’t be reached or chooses not to respond

  • Question wording bias: when the complexity, style, or order of the questions influences a response

  • Self-reported responses: when individuals inaccurately report their own data
