Probability
The likelihood of an event occurring
Probability (p) equation
p = number of outcomes classified as the event / total number of possible outcomes
A deck of 52 cards contains 12 royalty cards. If you randomly select a card from the deck, what is the probability of obtaining a royalty card?
(Express in proportion, decimal, percentage)
12/52 or 3/13
0.23
23.08%
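The card example above can be checked with a short pure-Python sketch (no external libraries; `fractions` reduces the proportion automatically):

```python
from fractions import Fraction

def probability(favorable, total):
    """Probability = specified outcomes / total outcomes."""
    return Fraction(favorable, total)

p = probability(12, 52)                 # 12 royalty cards out of 52
proportion = p                          # reduces to 3/13
decimal = round(float(p), 2)            # 0.23
percentage = round(float(p) * 100, 2)   # 23.08
```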
Role of Probability
Used to calculate the likelihood of obtaining a specific sample from a given population
If the probability of obtaining a specific sample is low, we can conclude that the sample probably came from some other population
Random Sampling
Random sampling is a statistical technique used to select a subset of individuals or items from a larger population. It involves selecting individuals or items in such a way that each member of the population has an equal chance of being chosen. This method helps to ensure that the sample is representative of the population and reduces bias.
sampling with replacement
Items can be selected from a population multiple times
Each time an item is selected, it is returned to the population before the next selection is made.
Commonly used in probability theory and statistics to simulate random processes and estimate population parameters.
each member of the population has an equal chance of being chosen no matter what has already been selected
sampling without replacement
Items can be selected from a population only once
Each time an item is selected, it is removed from the population before the next selection is made.
Commonly used in surveys and experiments to ensure that no member of the population is selected more than once (note that successive selections are not independent)
Each remaining member’s chance of being selected increases after selections have been made
How to find probability given a normal distribution
Draw a vertical line at the data point
The line divides the distribution into 2 sections: the body and the tail
The exact location of the line can be specified by a z-score
Look at the z-score table to find the proportion in the body and in the tail.
To find probability corresponding to an X
First, transform the score into a z-score
Then look up the z-score in the table and read across the row to find the appropriate probability
To find the x corresponding to a proportion
First, look up the proportion in the table and read across the row to find the appropriate z-score
Then transform the z-score into an X value
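Both lookups above can be sketched in code. This is a minimal sketch using the exact normal CDF via `math.erf` in place of the printed z table; the mean and standard deviation (μ = 100, σ = 15) are hypothetical example values:

```python
import math

def z_score(x, mu, sigma):
    """Transform a raw score X into a z-score."""
    return (x - mu) / sigma

def body_proportion(z):
    """Proportion of the normal distribution at or below z
    (the 'body' when z is positive); replaces the table lookup."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# e.g., P(X <= 108) for a hypothetical mu = 100, sigma = 15
z = z_score(108, 100, 15)   # about 0.53
p = body_proportion(z)      # about 0.70, matching the unit normal table
```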
Critical value of z for a two-tailed significance test with α = .05
±1.96
percentile
Indicates the percentage of data points that are equal to or below a specific value in a dataset
For example, if a student scores in the 80th percentile on a standardized test, it means they performed better than 80% of the test-takers.
how to calculate the percentile
To calculate the percentile, follow these steps:
Arrange the data in ascending order.
Determine the proportion of scores equal to or less than the score.
Multiply that proportion by 100 to express it as a percentile.
Normal distribution
A probability distribution that is symmetric and bell-shaped. It is characterized by its mean (μ) and standard deviation (σ).
The majority of the data falls near the mean, with fewer data points further away from the mean.
The shape of the distribution is determined by the mean and standard deviation.
The area under the curve represents the probability of observing a particular value or range of values.
Sampling Error
The difference between a sample statistic and the true population parameter it represents.
It occurs due to the inherent variability in the process of selecting a sample from a larger population.
Quantified using measures such as margin of error or standard error.
Standard Error of the Mean
Provides a measure of the average expected distance between the sample mean (M) and the population mean (μ)
Law of large numbers
The larger the sample size (n) in a specific sample, the more probable it is that M is close to μ
(the larger the sample size, the smaller the standard error of the mean)
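The shrinking standard error can be seen directly from the formula σM = σ/√n; the population σ below is a hypothetical value:

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: sigma_M = sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 15   # hypothetical population standard deviation
errors = {n: standard_error(sigma, n) for n in (4, 25, 100)}
# n=4 -> 7.5, n=25 -> 3.0, n=100 -> 1.5: larger n, smaller error
```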
Understand how n and σ affect the distribution
The larger σ is, the wider the distribution
The expected value of the sample means is always
the population mean
Central limit theorem
The shape of the distribution of sample means is typically normal
Distribution of sample means approaches a normal distribution as n approaches infinity
the larger the sample size the _ the variability
the larger the sample size the smaller the variability
It is guaranteed for a distribution to be almost perfectly normal if either:
the population the samples are obtained from is normal
the sample size is n=30 or more
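A quick simulation illustrates the theorem: even when the population is decidedly non-normal (uniform on [0, 1], a hypothetical choice), the sample means for n = 30 cluster around the population mean (0.5) with spread close to σ/√n:

```python
import random
import statistics

random.seed(0)  # fixed seed so the simulation is reproducible

def sample_mean(n):
    """Mean of one random sample of size n from a uniform population."""
    return statistics.mean(random.random() for _ in range(n))

means = [sample_mean(30) for _ in range(2000)]  # distribution of sample means

center = statistics.mean(means)   # close to the population mean, 0.5
spread = statistics.stdev(means)  # close to sigma/sqrt(n) = 0.2887/sqrt(30) ~ 0.053
```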
Hypothesis Testing
Statistical method that uses sample data to evaluate a hypothesis about a population
Hypothesis Testing Goal
To rule out chance (sampling error) as a plausible explanation for the results from a research study
Null hypothesis
The observed findings are due to random chance (there does not appear to be a real effect)
Predicts that the independent variable had no effect on the dependent variable
Alternative hypothesis
The observed findings cannot be explained by sampling error (there does appear to be a real effect)
Predicts that the independent variable did have an effect on the dependent variable
Symbol for null hypothesis
H0
Symbol for alternative hypothesis
H1
Process of hypothesis testing (4 steps)
State your hypothesis about the population
Set the alpha level: the probability threshold below which results are taken to indicate a true effect
Obtain and test a sample from the population
Compare data with the hypothesis predictions, make conclusion
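The four steps can be sketched with a two-tailed z-test at α = .05 (critical values ±1.96); all sample values below are hypothetical:

```python
import math

def z_test(sample_mean, mu, sigma, n, critical=1.96):
    """Four-step hypothesis test sketch (two-tailed z-test, alpha = .05)."""
    # Step 1: H0: population mean = mu; H1: population mean != mu
    # Step 2: critical region is |z| >= 1.96
    sigma_m = sigma / math.sqrt(n)      # standard error of the mean
    z = (sample_mean - mu) / sigma_m    # Step 3: compute the test statistic
    # Step 4: compare with the critical region and decide
    return z, ("reject H0" if abs(z) >= critical else "fail to reject H0")

z, decision = z_test(sample_mean=106, mu=100, sigma=15, n=36)
# z = 6 / 2.5 = 2.4, which falls in the critical region
```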
Critical Region
Consists of outcomes very unlikely to occur if the null hypothesis is true
Defined by sample outcomes that are very unlikely to be obtained (typically less than a 5% chance) if no effect exists
Alpha level
Establishes a criterion, or “cut-off,” for deciding whether to reject the null hypothesis
Typically α = .05 (rarely α = .10 or α = .01)
Test statistic
Forms a ratio comparing the obtained difference between M and μ to the amount of difference expected without any treatment effect (the standard error, σM)
When do we reject the null hypothesis?
If the test statistic results are in the critical region, we conclude the difference is significant (an effect exists)
When do we fail to reject the null hypothesis?
If the test statistic is not in the critical region, conclude that the difference is not significant (any difference is just due to chance)
Factors that influence a hypothesis test
There are several factors that can influence a hypothesis test:
Sample size: A larger sample size generally provides more reliable results and increases the power of the test.
Significance level: The chosen level of significance (e.g., 0.05) determines the likelihood of rejecting the null hypothesis.
Effect size: The magnitude of the difference or relationship being tested can influence the outcome of the test.
Variability: The amount of variability in the data can affect the precision of the test and the ability to detect significant differences.
Assumptions: Violations of assumptions, such as normality or independence, can impact the validity of the test.
Type of test: The specific hypothesis test used (e.g., t-test, ANOVA, chi-square) will depend on the research question and data type.
These factors should be carefully considered when conducting a hypothesis test to ensure accurate and meaningful results.
Type I Error
Occurs when the sample data indicate an effect when no effect actually exists.
Rejecting the null hypothesis when the null is true.
Caused by unusual, unrepresentative samples that fall in the critical region even though no true effect exists.
Hypothesis tests are structured to make Type I errors unlikely.
Type II Errors
Occur when the hypothesis test does not indicate an effect but in reality an effect does exist.
We fail to reject the null hypothesis even though it was actually false.
More likely with a small treatment effect or poor study design (sample size too small).
Directional Test
includes a directional prediction in the statement of the hypotheses and in the location of the critical region
Used less often; a directional test cannot detect an effect in the unpredicted direction, so two-tailed tests are generally preferred
Critical region for directional tests
All probability located in one tail, making that tail larger
p-Values
The probability of obtaining an effect at least as extreme as the one in your sample data, assuming the truth of the null hypothesis
Used when performing hypothesis tests; the statistical significance is checked by seeing if our test scores (ex z-score) indicate a p-value of less than our α
Chance of a Type I error if p = .05
Not simply 5%; the chance that a significant result is a false positive depends on context, with estimates often around 20-50%
Assuming the null is true, you’d obtain the observed difference or more in 5% of studies due to random sampling error.
Effect size
Measures of the absolute magnitude of an effect, independent of sample size
Cohen’s d
Standardized effect size
Like a z-score, it measures the mean difference in terms of the standard deviation
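The computation is a one-liner; the treatment values below (M = 106, μ = 100, σ = 15) are hypothetical:

```python
def cohens_d(mean_diff, sd):
    """Cohen's d: mean difference divided by the standard deviation."""
    return mean_diff / sd

d = cohens_d(106 - 100, 15)  # 0.4, conventionally a small-to-medium effect
# Note that n appears nowhere: d is unaffected by sample size
```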
Why should hypothesis tests be accompanied by effect size?
Because effect size provides a measure of the practical significance of the results.
Hypothesis tests determine if there is a statistically significant difference, and effect size quantifies the magnitude of the difference.
How is effect size influenced by sample size and standard deviation?
Sample Size: Effect size is independent of sample size
Standard Deviation: Effect size is inversely related to the standard deviation
Power of a hypothesis test
The probability that the test will reject the null hypothesis when there is actually an effect
Importance of the power of a hypothesis test
It represents the probability of detecting a true effect or relationship between variables.
A higher power indicates a greater likelihood of correctly rejecting the null hypothesis and avoiding a Type II error (false negative).
A well-powered test increases the confidence in the results and enhances the reliability of scientific conclusions.
4 factors that impact how hard it is to find effects
effect size (larger effects are easier to find)
sample size (larger samples make it easier to find effects)
alpha level (a larger alpha level makes it easier to find effects)
non-directional vs directional hypothesis (directional tests make it easier to find effects)
t-value
An “estimated z-test”
Estimated because we are using the sample standard deviation to estimate the unknown population standard deviation.
Why would you use t statistic instead of z test
Allows researchers to use sample data to test hypotheses about the difference between a sample mean and a population mean.
The t statistic does not require knowledge of the population standard deviation (σ)
Can be used for a completely unknown population (both μ and σ are unknown)
All that is required is a sample and a reasonable hypothesis about μ
3 types of t-tests
One-sample t-test
Independent Samples or Independent Measures t-test
Dependent Samples or Repeated Measures t-test
Estimated Standard Error
How much difference between M and μ is reasonable to expect
“The Noise”
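The “noise” term is computed from the sample standard deviation, sM = s/√n; the numbers below are hypothetical:

```python
import math

def estimated_standard_error(sample_sd, n):
    """s_M = s / sqrt(n): expected distance between M and mu ('the noise')."""
    return sample_sd / math.sqrt(n)

s_m = estimated_standard_error(sample_sd=9, n=9)  # 9 / 3 = 3.0
```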
Steps of conducting a t-test
The hypothesis test with a t statistic follows the same four-step procedure that was used with z-tests:
State the hypotheses and select a value for α. (Note: The null always states a value for μ.)
Locate the critical region. (Note: You must find the value for df and use the t distribution table.)
Calculate the test statistic.
Make a decision. (Either "reject" or "fail to reject" the null hypothesis.)
How to locate the critical region in t-test
Need to calculate the degrees of freedom
For example, with n = 25: df = n – 1 = 25 – 1 = 24
Then we go to the t distribution table.
Since we are using a non-directional hypothesis the test is two-tailed
Since our alpha level is .05 go to column with .05 in two tails
Go to the row corresponding to our df.
Take note of Critical Region
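Putting the steps together, here is a one-sample t-test sketch for the n = 25 example; the sample values are hypothetical, and 2.064 is the two-tailed critical t for df = 24 at α = .05 from the t distribution table:

```python
import math

def one_sample_t(sample_mean, mu, sample_sd, n):
    """t = (M - mu) / s_M, where s_M = s / sqrt(n)."""
    s_m = sample_sd / math.sqrt(n)   # estimated standard error
    return (sample_mean - mu) / s_m

t = one_sample_t(sample_mean=13, mu=10, sample_sd=5, n=25)  # hypothetical data
df = 25 - 1                   # degrees of freedom
t_crit = 2.064                # from the t table: two tails, alpha = .05, df = 24
decision = "reject H0" if abs(t) >= t_crit else "fail to reject H0"
# t = 3 / 1 = 3.0, which lands in the critical region
```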
Effect of sample size on t-test
With large samples, the t value will be very similar to a z-test.
With small samples, however, the t-value will provide a relatively poor estimate of z.
influence of sample variance on t-test
Sample variance affects the t-test by influencing the calculation of the t-statistic
A larger sample variance leads to a larger standard error and a smaller t-statistic, making it less likely to reject the null hypothesis
A smaller sample variance leads to a smaller standard error and a larger t-statistic, making it more likely to reject the null hypothesis.
r2
For t-test
Percentage of Variance Accounted for by the IV
Scores differ across individuals for many reasons.
By measuring the amount of variability that can be attributed to the IV, we obtain a new measure of effect size
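For a t-test, this proportion is computed as r² = t²/(t² + df); the t and df values below are hypothetical:

```python
def r_squared(t, df):
    """Proportion of variance accounted for by the IV: r^2 = t^2 / (t^2 + df)."""
    return t ** 2 / (t ** 2 + df)

r2 = r_squared(t=3.0, df=24)  # 9 / 33, roughly 27% of the variance
```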
SPSS Output for t test
Decision Tree for Comparing Two Populations
Independent-measures Designs
Used in situations where a researcher has no prior knowledge about either of the two populations (or treatments) being compared.
Allows evaluation of the mean difference between two unknown populations using data from two samples.
Independent-measures designs use two separate and independent samples.
Use #1: Test for mean differences between two distinct populations (those with college degrees and those without).
Use #2: Test for mean differences between two different conditions (Acceptance and Commitment Therapy vs. placebo).
Effect size for independent-measures t test
Measured in the same way that we measured effect size for the one-sample t-test
Specifically, can calculate Cohen’s d or r2
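For two samples, the estimated d uses the pooled standard deviation in the denominator; the SS values, means, and sample sizes below are hypothetical:

```python
import math

def pooled_variance(ss1, ss2, df1, df2):
    """Pooled variance for the independent-measures t-test."""
    return (ss1 + ss2) / (df1 + df2)

def cohens_d_independent(m1, m2, sp2):
    """Estimated d: mean difference over the pooled standard deviation."""
    return (m1 - m2) / math.sqrt(sp2)

# Hypothetical samples: SS1 = 180, SS2 = 120, n1 = n2 = 16 (df = 15 each)
sp2 = pooled_variance(180, 120, 15, 15)  # 300 / 30 = 10.0
d = cohens_d_independent(93, 90, sp2)    # 3 / sqrt(10), about 0.95
```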
Four assumptions should be true (or close to true) when using the t-statistic
1. The data are continuous.
2. The sample data have been randomly sampled from the population.
3. The variability of the data in each group is similar.
For Independent-Measures t-Tests this means the variances of the populations that samples are drawn from are similar.
4. The sampled population is approximately normally distributed.
SPSS OUTPUT: INDEPENDENT SAMPLES t-TEST