AP Statistics Ultimate Guide

98 Terms

Categorical Variables

Variables that take on values as category names or group labels, organized into frequency tables or represented by displays like bar graphs, dot plots, and pie charts.

Quantitative Variables

Variables with numerical values for measured quantities, organized into frequency tables or represented by displays like histograms, dot plots, and box plots.

Discrete Quantitative Variable

Takes on a countable number of values with gaps between them.

Continuous Quantitative Variable

Can take on any value within an interval, with no gaps between possible values, like heights and weights.

Center

The value that separates the data roughly in half, indicating the middle.

Spread

The range of values from smallest to largest, showing the variability.

Clusters

Natural subgroups in the data, indicating where values fall.

Gaps

Holes in the distribution where no values fall.

Unimodal Distribution

Distribution with one peak.

Bimodal Distribution

Distribution with two peaks.

Skewed Distribution

A distribution whose tail stretches toward higher values (right-skewed) or lower values (left-skewed).

Bell-shaped Distribution

Symmetric with a center mound and sloping tails.

Descriptive Statistics

Data presentation including average values, variability measures, and distribution shape.

Inferential Statistics

Drawing conclusions about a population from limited sample data; discussed in later units.

Median

Middle number in an ordered set.

Mean

Average found by summing the values and dividing by the number of items.

Variability

Key concept in statistics, described by range, interquartile range, variance, and standard deviation.
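
As a minimal sketch of the center and variability measures named in the cards above, the following uses Python's standard `statistics` module. The data set is hypothetical and chosen only for illustration, and the quartile convention used by `statistics.quantiles` may differ slightly from the one a textbook or calculator uses.

```python
import statistics

# Hypothetical data set, chosen only for illustration.
data = [2, 4, 4, 5, 7, 9, 11, 12]

# Measures of center.
mean = statistics.mean(data)
median = statistics.median(data)

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# Quartiles and interquartile range (IQR = Q3 - Q1).
# statistics.quantiles uses the "exclusive" method by default.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Sample variance (divides by n - 1) and sample standard deviation.
sample_variance = statistics.variance(data)
sample_sd = statistics.stdev(data)

print(mean, median, data_range, iqr, sample_variance, sample_sd)
```

The standard deviation is the square root of the variance, so it is expressed in the same units as the data.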

Parallel Boxplots

Boxplots drawn side by side on a common scale to compare distributions, for example stock prices across different years, showing each year's median, quartiles, low, and interquartile range.

Normal Distribution

A bell-shaped and symmetric distribution used to model various natural phenomena, with the mean equal to the median and points of inflection at one standard deviation from the mean.

Empirical Rule

Also known as the 68-95-99.7 rule; states that in a normal distribution about 68%, 95%, and 99.7% of values fall within 1, 2, and 3 standard deviations of the mean, respectively.
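
The percentages in the rule can be checked numerically. This short sketch uses the standard relationship between the normal distribution and the error function; the helper name `normal_within` is made up for this example.

```python
import math

def normal_within(k: float) -> float:
    """Probability that a normal variable falls within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {normal_within(k):.4f}")
```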

Two-Way Table

A table displaying counts for two categorical variables, often used to calculate marginal frequencies and distributions.

Scatterplot

A visual representation of the relationship between two quantitative variables, showing form, direction, strength, and unusual features like outliers.

Correlation

A measure (r) of the strength of a linear relationship between two variables, ranging from -1 to +1, with r^2 indicating the proportion of variance explained by the relationship.

Coefficient of Determination (r^2)

The percentage of variation in the response variable explained by the linear regression model, derived from the correlation coefficient.

Least Squares Regression

A method to find the best-fitting line through a set of points by minimizing the sum of squared vertical differences between the points and the line, with slope b = r(s_y/s_x), where r is the correlation coefficient and s_x and s_y are the standard deviations of x and y.

Residuals

The differences between observed and predicted values (observed minus predicted) in a regression model; for a least squares regression line, the residuals always sum to zero.
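
The correlation, least squares, and residual cards above fit together in one small calculation. The sketch below uses hypothetical (x, y) data and plain Python; the variable names are illustrative and not taken from the guide.

```python
import statistics

# Hypothetical (x, y) data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar, y_bar = statistics.mean(x), statistics.mean(y)
s_x, s_y = statistics.stdev(x), statistics.stdev(y)

# Correlation r: sum of products of z-scores, divided by n - 1.
r = sum((xi - x_bar) / s_x * (yi - y_bar) / s_y for xi, yi in zip(x, y)) / (n - 1)

# Least squares line: slope from r, intercept so the line passes through (x_bar, y_bar).
slope = r * s_y / s_x
intercept = y_bar - slope * x_bar

# Residual = observed y minus predicted y.
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - y_hat for yi, y_hat in zip(y, predicted)]

# Coefficient of determination.
r_squared = r ** 2

print(slope, intercept, r_squared, sum(residuals))
```

Up to floating-point rounding, the residuals sum to zero, which is a property of the least squares line rather than of any particular data set.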

Outliers

Data points that significantly deviate from the overall pattern in a scatterplot, often identified by large discrepancies in the response variable compared to predicted values.

Influential Scores

Scores whose removal would sharply change the regression line, especially points with extreme x-values.

High Leverage

Points with x-values far from the mean x-value, having the potential to strongly influence the regression line.

Regression Outlier

A point with a large residual compared to others; it may or may not be influential.

Correlation Coefficient (r)

Indicates the strength and direction of a linear relationship between two variables.

Simple Random Sampling (SRS)

A sampling method where every possible sample of the desired size has an equal chance of being selected.

Stratified Sampling

Involves dividing the population into homogeneous groups (strata) and selecting random samples from each stratum.

Cluster Sampling

Divides the population into heterogeneous groups (clusters) and selects entire clusters randomly.

Systematic Sampling

Involves selecting every kth individual from a list after choosing a random starting point.
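
The four sampling methods above can be contrasted on one small list. This is a sketch under stated assumptions: the population of 20 students, the `grade` strata, and the `room` clusters are all hypothetical, and the random seed is fixed only so the run is repeatable.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 20 students, each with a grade (stratum) and a homeroom (cluster).
population = [{"id": i, "grade": 9 + i % 4, "room": i // 5} for i in range(20)]

# Simple random sample: every possible set of 4 students is equally likely.
srs = rng.sample(population, 4)

# Stratified sample: one randomly chosen student from each grade.
grades = sorted({p["grade"] for p in population})
stratified = [rng.choice([p for p in population if p["grade"] == g]) for g in grades]

# Cluster sample: choose one homeroom at random and take everyone in it.
rooms = sorted({p["room"] for p in population})
chosen_room = rng.choice(rooms)
cluster = [p for p in population if p["room"] == chosen_room]

# Systematic sample: random starting point, then every k-th student on the list.
k = 5
start = rng.randrange(k)
systematic = population[start::k]

print(len(srs), len(stratified), len(cluster), len(systematic))
```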

Sampling Variability

The natural variation of a statistic from sample to sample, which can be described using probability and tends to decrease with larger sample sizes.

Observational Studies

Studies where observations and measurements are made without influencing the subjects, aiming to show associations between variables.

Experiments

Studies where treatments are imposed on subjects to measure responses, aiming to establish cause-and-effect relationships.

Experimental Units

The objects on which an experiment is performed; when the units are people, they are called subjects.

Explanatory Variables

Factors in an experiment believed to affect the response variables, with different levels of treatment applied to groups.

Control Group

A group in an experiment that does not receive the treatment of interest, or receives a placebo, to determine the treatment's effect.

Placebo Effect

The phenomenon where individuals respond to any perceived treatment, even if it is inactive.

Blinding

When subjects are unaware of the treatment they are receiving in an experiment.

Double-blinding

When both subjects and evaluators are unaware of the treatment assignments in an experiment.

Matched Pairs Design

A design where two treatments are compared based on responses from paired subjects, often involving single subjects receiving both treatments in random order.

Guess Strategy

A strategy in a standard literacy test where the test taker selects answers randomly when the correct answer is unknown.

Score 60-79

A range of scores in a standard literacy test considered passing but not superior, falling between 60 and 79.

Does not score 60-79

The probability of a test taker not achieving a score between 60 and 79 in a standard literacy test.

Strategy "Answer (c)" and Scores 80-100

The joint probability of a test taker choosing answer (c) and scoring between 80 and 100 in a standard literacy test.

Strategy "Longest Answer" or Scores 0-59

The probability of a test taker either choosing the longest answer or scoring between 0 and 59 in a standard literacy test.

Guess Strategy given Score 0-59

The probability of a test taker using the guess strategy given that their score falls between 0 and 59 in a standard literacy test.

Scored 80-100 given Strategy "Longest Answer"

The probability of a test taker scoring between 80 and 100 given that they chose the strategy "longest answer" in a standard literacy test.

Guess Strategy and Scoring 0-59 Independence

The assessment of whether the strategy "guess" and scoring between 0 and 59 are independent events in a standard literacy test.

Strategy "Longest Answer" and Scoring 80-100 Mutual Exclusivity

The evaluation of whether the strategy "longest answer" and scoring between 80 and 100 are mutually exclusive events in a standard literacy test.
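
The literacy-test cards above each name a different probability rule: complement, joint ("and"), union ("or"), conditional ("given"), independence, and mutual exclusivity. The counts in the example these cards come from are not reproduced here, so the sketch below uses a hypothetical two-way table purely to show how each quantity is computed.

```python
# Hypothetical counts (NOT the data from the original example):
# rows are strategies, columns are score ranges.
counts = {
    "guess":          {"0-59": 20, "60-79": 8,  "80-100": 2},
    "answer (c)":     {"0-59": 10, "60-79": 10, "80-100": 10},
    "longest answer": {"0-59": 5,  "60-79": 20, "80-100": 15},
}
total = sum(sum(row.values()) for row in counts.values())

def p_strategy(s):
    """Marginal probability of a strategy (row total / grand total)."""
    return sum(counts[s].values()) / total

def p_score(c):
    """Marginal probability of a score range (column total / grand total)."""
    return sum(row[c] for row in counts.values()) / total

def p_joint(s, c):
    """Joint probability of a strategy AND a score range."""
    return counts[s][c] / total

# Complement rule: P(not 60-79) = 1 - P(60-79).
p_not_60_79 = 1 - p_score("60-79")

# Joint probability: P(answer (c) and 80-100).
p_c_and_high = p_joint("answer (c)", "80-100")

# General addition rule: P(A or B) = P(A) + P(B) - P(A and B).
p_longest_or_low = (p_strategy("longest answer") + p_score("0-59")
                    - p_joint("longest answer", "0-59"))

# Conditional probability: P(A | B) = P(A and B) / P(B).
p_guess_given_low = p_joint("guess", "0-59") / p_score("0-59")
p_high_given_longest = p_joint("longest answer", "80-100") / p_strategy("longest answer")

# Independence: A and B are independent when P(A and B) = P(A) * P(B).
independent = abs(p_joint("guess", "0-59")
                  - p_strategy("guess") * p_score("0-59")) < 1e-12

# Mutual exclusivity: A and B are mutually exclusive when P(A and B) = 0.
mutually_exclusive = p_joint("longest answer", "80-100") == 0

print(p_not_60_79, p_c_and_high, p_longest_or_low)
print(p_guess_given_low, p_high_given_longest, independent, mutually_exclusive)
```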

Cumulative Probability Distribution

A function, table, or graph linking each outcome with the probability that the random variable is less than or equal to that outcome.

Normal Distribution

Provides a model for how sample statistics vary under random sampling, often calculated using z-scores.

Central Limit Theorem

States that for sufficiently large sample sizes, the sampling distribution of the mean will be approximately normal.
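
A small simulation can make the theorem concrete. This sketch assumes a right-skewed population (exponential with mean 1 and standard deviation 1), a sample size of 40, and 5000 repetitions; all three choices are arbitrary and only for illustration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

def sample_mean(n: int) -> float:
    """Mean of n draws from a right-skewed population (exponential, mean 1, SD 1)."""
    return statistics.mean([rng.expovariate(1.0) for _ in range(n)])

n = 40
means = [sample_mean(n) for _ in range(5000)]

# Center and spread of the simulated sampling distribution of the mean.
center = statistics.mean(means)
spread = statistics.stdev(means)

print(f"mean of sample means: {center:.3f}")
print(f"SD of sample means:   {spread:.3f}")
print(f"sigma / sqrt(n):      {1 / n ** 0.5:.3f}")
```

Even though the population is strongly skewed, a histogram of `means` would look roughly bell-shaped, centered near the population mean with a spread near the population standard deviation divided by the square root of n.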

Biased and Unbiased Estimators

Bias indicates the sampling distribution is not centered on the population parameter; unbiased estimators are centered on the population parameter.

Sampling Distribution for Sample Proportions

Focuses on the proportion of successes in a sample, approximating a normal distribution for large sample sizes.

Sampling Distribution for Differences in Sample Proportions

Deals with differences obtained by subtracting sample proportions of one population from another.

Sampling Distribution for Sample Means

The variance of sample means is the population variance divided by the sample size.

Sampling Distribution

The distribution of a statistic, such as the sample mean or sample proportion, over all possible samples of a given size from a population; for sample means, its mean equals the population mean and its standard deviation equals the population standard deviation divided by the square root of the sample size.
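
For reference, the standard formulas behind the sampling distribution cards above can be written as follows, where n is the sample size, μ and σ are the population mean and standard deviation, and p is the population proportion. The formulas assume independent observations within each sample and, for the difference, independent samples.

```latex
% Sample means
\mu_{\bar{x}} = \mu, \qquad
\sigma_{\bar{x}}^{2} = \frac{\sigma^{2}}{n}, \qquad
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}

% Sample proportions
\mu_{\hat{p}} = p, \qquad
\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}

% Difference in sample proportions (independent samples)
\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2, \qquad
\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}
```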

Confidence Interval

A range of values that is likely to contain the true population parameter with a certain level of confidence, typically expressed as (point estimate ± margin of error).
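
The "point estimate ± margin of error" form can be sketched for a single proportion. The counts below (130 successes in a sample of 250) and the 95% level are hypothetical; the critical value comes from the standard normal distribution via `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

# Hypothetical sample: 130 successes out of 250, 95% confidence.
successes, n = 130, 250
confidence = 0.95

p_hat = successes / n                                    # point estimate
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)      # estimated SD of p-hat
margin_of_error = z_star * standard_error

interval = (p_hat - margin_of_error, p_hat + margin_of_error)
print(f"{p_hat:.3f} ± {margin_of_error:.3f} -> ({interval[0]:.3f}, {interval[1]:.3f})")
```

A larger sample or a lower confidence level would shrink the margin of error.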

Standard Error

A measure of how much the sample statistic typically varies from the population parameter, calculated as the standard deviation of the sampling distribution.

Normality Assumption

The assumption that the sampling distribution of sample means or proportions is approximately normal if certain conditions are met, like the sample size being large enough.

Type I Error

Mistakenly rejecting a true null hypothesis in hypothesis testing, with a probability denoted as α (alpha).

Type II Error

Mistakenly failing to reject a false null hypothesis in hypothesis testing, with a probability denoted as β (beta).

Power of a Test

The probability of correctly rejecting a false null hypothesis, influenced by the sample size and significance level chosen for the test.
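
To see how sample size and significance level affect power, here is a sketch for one simple case: a one-sided z-test of H0: μ = μ0 against Ha: μ > μ0 with a known population standard deviation. The function name `power_one_sided_z` and all numeric inputs are hypothetical.

```python
import math
from statistics import NormalDist

def power_one_sided_z(mu0, mu_true, sigma, n, alpha):
    """Power of the test H0: mu = mu0 vs Ha: mu > mu0 when the true mean is mu_true."""
    se = sigma / math.sqrt(n)
    # Reject H0 when the sample mean exceeds this cutoff.
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # Power = probability the sample mean lands above the cutoff under the true mean.
    return 1 - NormalDist(mu_true, se).cdf(cutoff)

# Hypothetical inputs: mu0 = 100, true mean = 105, sigma = 15, alpha = 0.05.
power_n25 = power_one_sided_z(100, 105, 15, n=25, alpha=0.05)
power_n100 = power_one_sided_z(100, 105, 15, n=100, alpha=0.05)

# Probability of a Type II error is the complement of power.
beta_n25 = 1 - power_n25

print(f"power (n=25):  {power_n25:.3f}")
print(f"power (n=100): {power_n100:.3f}")
print(f"beta  (n=25):  {beta_n25:.3f}")
```

With everything else fixed, the larger sample gives higher power, and raising alpha would also raise power at the cost of more Type I errors.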

P-value

The probability, computed assuming the null hypothesis is true, of obtaining results at least as extreme as those observed; a small P-value indicates strong evidence against the null hypothesis.

Type I error

Occurs when the null hypothesis is rejected when it is actually true, leading to a false positive conclusion.

Type II error

Occurs when the null hypothesis is not rejected when it is false, resulting in a false negative conclusion.

Confidence Interval

A range of values that is likely to contain the true parameter being estimated, with a specified level of confidence.

Difference of Two Proportions

Refers to the contrast between two population proportions, often analyzed using hypothesis tests or confidence intervals.

t-distribution

A probability distribution used in place of the normal distribution when the population standard deviation is unknown and must be estimated from the sample; it has heavier tails than the normal distribution, especially for small sample sizes.

Standard Error

An estimate of the standard deviation of a sampling distribution, often used to calculate confidence intervals and conduct hypothesis tests for means.

Significance Test

A statistical method used to determine whether there is enough evidence to reject the null hypothesis in favor of an alternative hypothesis.

Type-I Error

Mistakenly rejecting a true null hypothesis, leading to the consumer agency discouraging customers from purchasing a new brand of air-conditioning unit that could actually save on electricity consumption.

Confidence Interval

A range of values that is likely to contain the true parameter, such as the 95% confidence interval for the mean difference in accidents per month between two departments.

Type-II Error

Mistakenly failing to reject a false null hypothesis, potentially resulting in a company not making necessary fixes, affecting future sales.

Paired Data

Involves one-sample analysis on the differences from paired data, like finding a 90% confidence interval of the mean improvement in test scores for a SAT preparation class.

P-Value

A measure that helps determine the strength of the evidence against the null hypothesis, as seen in the simulation example where a recalibration of machinery was deemed necessary based on the P-value.

Power

The probability of correctly rejecting a false null hypothesis, contrasting with Type II error, as illustrated in the scenario where the candidate's true support was 63% but might not be recognized due to a Type II error.

Hypothesis Test

Involves making a claim about a population parameter and testing it, like the significance test for the difference of two means in the example of comparing computer downtimes.

Parameter

A characteristic of a population, such as the mean electricity usage of a new brand of air-conditioning units, denoted by μ in hypothesis testing.

Chi-Square Statistic

The sum of the weighted squared differences between observed and expected counts, Σ (observed − expected)² / expected, used in the Chi-Square test and denoted χ².

P-value

The probability of obtaining a Chi-Square value at least as large as the one observed if the null hypothesis is true.

Degrees of Freedom (df)

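The value that selects which Chi-Square distribution to use when finding the critical value: the number of categories minus one for a goodness-of-fit test, or (rows − 1)(columns − 1) for a test on a two-way table.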

Goodness-of-Fit Test

A test to determine if a given theoretical distribution correctly describes a situation, problem, or activity.
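
A goodness-of-fit calculation can be sketched in a few lines. The observed counts and the claimed equal proportions below are hypothetical; the critical value 7.815 is the standard table value for 3 degrees of freedom at the 0.05 significance level.

```python
# Hypothetical observed counts in four categories.
observed = [18, 22, 30, 30]

# Hypothetical null hypothesis: all four categories are equally likely.
expected_proportions = [0.25, 0.25, 0.25, 0.25]

n = sum(observed)
expected = [n * p for p in expected_proportions]

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Degrees of freedom for goodness of fit: number of categories minus one.
df = len(observed) - 1

# Table critical value for df = 3 at alpha = 0.05.
critical_value = 7.815
reject_null = chi_square > critical_value

print(f"chi-square = {chi_square:.2f}, df = {df}, reject H0: {reject_null}")
```

Here the statistic falls below the critical value, so these hypothetical data would not give enough evidence to reject the claimed distribution.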

Chi-Square Test for Independence

A test to determine if there is a significant association between two categorical variables.

Chi-Square Test for Homogeneity

A test to compare samples from two or more populations to see if they are homogeneous.

Sampling Distribution for the Slope

The distribution of the sample slope b, with mean μ_b and standard deviation σ_b.

Confidence Interval for the Slope

An interval estimate for the slope of the regression line using t-scores with degrees of freedom n-2.
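
Written out, the interval described above takes the usual form below. The symbols are standard notation rather than anything defined in this guide: b is the sample slope, t* is the critical value from the t-distribution with n − 2 degrees of freedom, s is the standard deviation of the residuals, and s_x is the standard deviation of the x-values.

```latex
b \pm t^{*}_{\,n-2}\,\mathrm{SE}_b,
\qquad
\mathrm{SE}_b = \frac{s}{s_x\sqrt{n-1}},
\qquad
s = \sqrt{\frac{\sum_{i}\left(y_i - \hat{y}_i\right)^{2}}{n-2}}
```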

Confidence Interval

A range of values that is likely to contain the true slope of the regression line with a certain level of confidence.

Null Hypothesis (H0)

The assumption that there is no relationship or no effect in a statistical test.

Residuals Plot

A graph of the residuals (observed values minus predicted values) plotted against the explanatory variable or the predicted values in a regression analysis.

P-Value

The probability of obtaining results at least as extreme as the observed results of a statistical hypothesis test, assuming that the null hypothesis is true.

Least Squares Regression Line

The line that minimizes the sum of the squared differences between the observed values and the values predicted by the line.

Slope

The measure of the steepness of a line, indicating the rate of change of the dependent variable with respect to the independent variable.

Linear Relationship

A relationship between two variables that can be represented by a straight line.

Scatterplot

A graph that shows the relationship between two variables by displaying data points on a two-dimensional plane.
