Biometry
t-test hypothesis tests: The mean/median of Group A = the mean/median of Group B
One-way ANOVA hypothesis tests: 3 or more groups. Tests means/medians of A = B = C
Descriptive Statistics: (e.g., mean, s.d., SEM, etc.) help organize/summarize data
Inferential Statistics: (e.g., t-test and ANOVA) allows us to generalize conclusions
Manipulative - Manipulating the application of the ind. variable
Mensurative - NOT manipulating the application of the ind. variable
Both are experimental studies, both have ind. & dep. variables, subtle difference
Variable - "characteristic that may differ from one biological entity to another" Zar (2010)
Dependent Variables - Response variables
Independent Variables - Treatments, Factors
All experiments have at least 1 of each
How many variables & data type and scale of measurement dictate which inferential tool to apply
Experimental Goal?
Establish cause/effect relationship between ind. variable & dep. variable
To accomplish this ideally, all subjects must be identical/similar except for the level of the ind. variable they "receive"
Establish that variables are associated
Significant differences in responses correlate to good variables
Nuisance Variable Example
Prairie lizard manipulative study
Dep. - Snout vent length
Ind. - Temperature, at three levels:
n= 10 @ 10ºC
n = 10 @ 15ºC
n = 10 @ 20ºC
Other factors (nuisance/confounding variables)
Sex (controlled by only using one gender)
Diet (controlled by feeding the same thing in same amounts)
Age
Stress
Hormone levels
Reproductive status
Genetics
Competition
Approaches to nuisance/confounding variables
Identify variables beforehand and hold conditions constant across subjects and treatments
Distribute symmetrically across groups & have high sample size
Incorporate into the experimental model by adding another potential independent variable
Disperse the "nuisance effect" across all treatment conditions via randomization procedures (random assignment of subjects to treatments/conditions)
If concerned about "Procedural Effects": Implement a Control(s)
Negative control - test subjects/sample units that receive all "procedures" except the experimental treatment/manipulation (saline, sugar pill, non-restored study site, etc.)
Positive Control - test subjects/sample units receive all procedures except the experimental treatment/manipulation but you expect a known outcome from this group. This provides a group to compare with that controls for unknown sources of nuisance. (aspirin-headache example-give a group a drug known to deal with headaches)
"Controls" in Mensurative Experiments? Don't necessarily have a control for procedural effects, but have a good comparison across groups. Known benchmark
Examples of variable types and their scales of measurement
Attribute - Nominal - Sex of snake: male or female
Ranks - Ordinal - Pigmentation levels: going from no pigmentation to full pigmentation
Discrete Measurement - Ratio - Number of points on deer antlers
Continuous Measurement - Interval/Ratio - Body temperature (ºCelsius; interval), weight of a warthog (kg; ratio)
Converting Data from One Scale to Another Example
Continuous variable (e.g., tree height) measured on a continuous scale ----> Convert to ranked data on an ordinal scale
Continuous in cm --> Ordinal in ranks
100 --> 6
500 --> 5
525 --> 4
1000 --> 1
642 --> 3
701 --> 2
10 --> 7
Continuous data work with the mean; after converting to ordinal you work with the median
Distances between data points are not retained, but the ranked data can be easier to work with
The reduction in variation between data points may allow better testing
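The conversion above can be sketched in Python (a sketch using scipy's `rankdata`; the tree heights are the hypothetical values from the table, ranked so the tallest tree gets rank 1):

```python
import numpy as np
from scipy.stats import rankdata

# Hypothetical tree heights (cm) from the table above.
heights_cm = np.array([100, 500, 525, 1000, 642, 701, 10])

# rankdata assigns rank 1 to the smallest value, so negate to give
# the tallest tree rank 1 (matching the table's ordinal column).
ranks = rankdata(-heights_cm).astype(int)
print(dict(zip(heights_cm.tolist(), ranks.tolist())))

# The mean suits the continuous scale; the median suits the ranks.
print(np.mean(heights_cm), np.median(ranks))
```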
Descriptive Statistics - way to summarize and organize data
Measures of Central Tendency
location of sample along the measurement scale
what is the location of the "typical" individual?
Arithmetic Mean
μ = population or universal mean (the true mean)
x̄ = sample mean (an estimate of the true mean)
Geometric Mean
antilog of arithmetic mean of log-transformed data
Median (M)
middle value of a ranked data set; most appropriate when data are highly skewed or you're dealing with data on an ordinal scale
Mode
the value that occurs most frequently; number of modes can be useful
Skewness
measure of symmetry
0 = symmetrical (normal distribution)
> 0 = tail to the right (positive skew)
< 0 = tail to the left (negative skew)
Measures of Dispersion and Variability
the distribution or spread of measurements
Range
difference between largest & smallest observation
usually given as minimum & maximum value
Variance
σ² = population variance
s² = sample variance
mean of squared deviations of measurements from their mean; typically not reported since it is in different units from the original data (units squared); used to calculate many statistical tests
cannot be negative
increases as dispersion or variability increases
(n-1) = degrees of freedom (df); real units of information about deviation from the average
Standard Deviation (sd)
s = square root of variance (s²)
Coefficient of Variation (CV): CV = (sd/mean) x 100%
a measure of relative variability
has no units so is useful to compare sets of data collected on different scales
(e.g., morphological data in mm and m; T to Dissolved Oxygen (DO))
most applicable to ratio scale data
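A minimal sketch of the CV's scale-free property, using invented morphological data on two different scales (mm vs m):

```python
import numpy as np

# Hypothetical morphological data on different scales: the CV is unitless,
# so the two data sets can be compared directly.
femur_mm = np.array([38.2, 40.1, 41.5, 39.7, 40.9])
body_m = np.array([1.42, 1.55, 1.49, 1.61, 1.38])

def cv(x):
    """Coefficient of variation: (sample SD / mean) x 100%."""
    return np.std(x, ddof=1) / np.mean(x) * 100

print(f"femur CV = {cv(femur_mm):.1f}%, body CV = {cv(body_m):.1f}%")
```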
Indices of Diversity: distribution of observations among categories
Types of Distributions
Many statistical tests are based on assumptions that the data adhere to the properties of a given distribution
Discrete Distributions
Poisson distribution
items distributed randomly (independently)
Binomial distribution
two possible outcomes, each with a fixed probability of occurrence
Continuous Distributions
Normal distribution
symmetrical; bell-shaped curve
t-distribution
symmetrical; related to normal distribution
Chi-square distribution
asymmetrical
Normal Distribution
symmetrical, continuous distribution
described by the mean and standard deviation (estimated by sample mean & sd)
most values lie in proximity of the mean
random samples of a given n from a normal population will be normally distributed
Central Limit Theorem: at some large n even means of samples from a non-normal population will approach normality (even means from Poisson & Binomial distr.)
normal distribution is the basis of many statistical tests
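The CLT claim above can be checked by simulation; this sketch (parameters invented) draws from a skewed Poisson population and shows that sample means have skewness much closer to 0:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Draw from a skewed (Poisson) population, then look at the distribution
# of sample MEANS: by the CLT they approach normality as n grows.
population_draws = rng.poisson(lam=2, size=(10_000, 50))
sample_means = population_draws.mean(axis=1)  # 10,000 means of n = 50

# Skewness near 0 suggests symmetry (a normal distribution has skewness 0).
print(f"skewness of raw draws: {stats.skew(population_draws.ravel()):.2f}")
print(f"skewness of the means: {stats.skew(sample_means):.2f}")
```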
Are Sample Data Normally Distributed?
Despite CLT, sample data may not be normally distributed due to small n or, more typically, for unknown reasons
Will want to check sample data to see if it is approximately normally distributed
"Goodness-of-fit Tests": (not really recommended)
Kolmogorov-Smirnov goodness-of-fit test
Chi-square goodness-of-fit test
Normal Quantile Plot
If data are perfectly normal they will lie along a straight line, inside the LCI's
Lilliefors Confidence Intervals
used to test for normality in a graphical way; if points fall outside the CI (confidence intervals) then data are significantly different from normal at alpha = 0.05
Shapiro-Wilk test
What null hypothesis is it actually testing?
The distribution of the sample data is equal to the normal distribution
After Normal Quantile Plot, click continuous then normal
Go down to data and click under the Fitted Normal option triangle, then choose Goodness-of-Fit
Then JMP gives you the Probability so you can choose to reject or fail to reject
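Outside JMP, the same Shapiro-Wilk test is available as `scipy.stats.shapiro`; a small sketch with made-up samples:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
normal_sample = rng.normal(loc=50, scale=5, size=40)  # made-up data
skewed_sample = rng.exponential(scale=5, size=40)     # clearly non-normal

# H0: the sample was drawn from a normal distribution.
for name, x in [("normal", normal_sample), ("skewed", skewed_sample)]:
    w, p = stats.shapiro(x)
    verdict = "fail to reject H0" if p > 0.05 else "reject H0"
    print(f"{name}: W = {w:.3f}, P = {p:.3f} -> {verdict}")
```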
If not normally distributed...
ignore it?
Inflates the chance of a Type 1 error
When you reject the null when the null is actually true
transform raw data to "fix it" & resume test?
Doesn't provide much evidence that it fixed it
choose a nonparametric equivalent?
Usually the parametric tool has more statistical power than the nonparametric equivalent
Statistical Testing and Probability
Probability is the likelihood of an event
Statistical tests provide a P-value: the probability of obtaining the observed result (or one more extreme) if the null HO were true
At low P-values the null is rejected and the alternate is accepted
The lower the P-value, the more confident you are that the null is false
What is a "low" P-value?
Researchers arbitrarily set the probability used as the criterion for rejection of the null
This value is called the significance level or α (alpha)
Convention is to apply an alpha level of 0.050
If α = 0.05 then at P-values less than 0.05 you reject the null HO (i.e. means are "significantly different")
Statistical Errors in Hypothesis Testing (Zar Section 6.3)
In reality, the null hypothesis is either true or false
Because inferences are made from samples, there is always the possibility of making the wrong inference
2 ways of making the wrong inference:
Type 1 Error
Rejecting the null hypothesis when in fact the null is true; a "false positive"; you determine the means are significantly different when in fact they are not (We must control this error rate)
α error
Type 2 Error
Not rejecting the null when in fact the null is false; you determine the means are not significantly different when they really are (Considered to be a "less dangerous" error)
Designing experiments that give you the best chance possible to reject the null when it is in fact false is the best way to avoid Type 2 Error
β error
Insert probability notes here
Prospective Power Analysis
performed during planning stages of a study to explore how changes in study design (e.g., n, alpha, and effect size) impact objectives/goals of the study including interpretations of statistical tests & potential outcomes
Common applications of Prospective Power Analysis are:
to determine n required to attain a desired level of power at a specified minimum effect size, alpha-level, and standard deviation
to determine power of a test when n is constrained logistically (perhaps you then need to adjust alpha if power is too low)
to determine the minimum detectable meaningful effect size (the question here is what is a biologically meaningful difference)
JMP gives you big N (total sample size); divide big N by however many groups you have to obtain little n (sample size per group)
Can increase alpha level to increase power
Can increase effect size to increase power
How to perform Power Analysis in JMP:
DOE > Design Diagnostics > Sample Size & Power
Depending on scenario, choose
Ex. 2-sample means
Not messing around with Extra Parameters yet
Can change alpha level, Std. Dev = Dispersion, difference to detect = effect size
Std.Dev & Effect Size need to be in same units
Leave sample size & power blank
Click Continue & you will get a curve
Sample size on graph is always in Big N
Not using to get an exact # of sample size, power analysis is a guide
Increasing alpha level increases power, which will decrease sample size
Increasing Std.Dev (Dispersion across the Dependent Variable) increases sample size
Decreasing effect size increases sample size
Ex. Clinical Research - Experiments investigating treatment of tumors
Will Drug A reduce the size of brain tumors?
Minimum Effect Size that is Biologically Relevant?
Using a minimum effect size of 50%
Decided alpha level of 0.01, defended because really want to make sure Drug works
Need an idea of size of wild-type tumor to get 50% into units
Need some measure of Dispersion
Collect some data by giving study specimens the drug
Has a good idea that wild-type tumor size will be about 30 cubic millimeters
Effect size will be 15 cubic millimeters since using minimum of 50%
OR look within the literature to see if somebody has done similar things
If Dispersions are different, pick the bigger one
Using 12 cubic millimeters for Dispersion based on the literature
Big N shows 36 at Power of 0.8
Based on biological ethics will only give 10 mice cancer, so N=20
Doesn't give us a good idea of if Drug will work or not
Increasing alpha level to 0.05, makes our N=20 look a lot better
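JMP computes power analytically; as an added illustration, the tumor example's power can be approximated by Monte Carlo simulation (the ~30 mm³ wild-type size, 15 mm³ effect, SD of 12, and n = 10 per group come from the notes; the simulation approach itself is a sketch, not the JMP method):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Tumor example from the notes: detect a 15 mm^3 difference (50% of a
# ~30 mm^3 wild-type tumor) with SD = 12 mm^3 and n = 10 mice per group.
n, diff, sd = 10, 15.0, 12.0

def simulated_power(alpha, n_sim=5_000):
    """Fraction of simulated experiments in which the t-test rejects H0."""
    rejections = 0
    for _ in range(n_sim):
        control = rng.normal(30.0, sd, n)
        treated = rng.normal(30.0 - diff, sd, n)
        if stats.ttest_ind(control, treated).pvalue < alpha:
            rejections += 1
    return rejections / n_sim

p01 = simulated_power(0.01)
p05 = simulated_power(0.05)
print(f"power at alpha = 0.01: {p01:.2f}")
print(f"power at alpha = 0.05: {p05:.2f}")  # raising alpha raises power
```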
Standard Error of the Mean (SEM; SE)
SE is the standard deviation of a set of sample means repeatedly calculated from a statistical population or universe
SE is a measure of the precision of x̄ as an estimate of μ
as SE gets smaller, the precision of x-bar increases
SE = s(standard deviation)/square root of n
incorporates sd & n, two factors that will impact reliability
SD - a measure of the dispersion or spread of the sample data
SE - a measure of the sampling error or uncertainty in the sample mean as an estimate of the population mean
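A quick numeric sketch of the SD/SE relationship with invented data:

```python
import numpy as np

rng = np.random.default_rng(42)
sample = rng.normal(loc=100, scale=15, size=25)  # hypothetical measurements

sd = np.std(sample, ddof=1)     # spread of the sample data
se = sd / np.sqrt(len(sample))  # uncertainty in x-bar as an estimate of mu

print(f"SD = {sd:.2f}, SE = {se:.2f}")
# With n = 25, SE = SD / 5: quadrupling n halves the SE.
```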
Confidence Intervals & the Student's t distribution (will never test over confidence intervals)
The t distribution
Family of distributions related to the normal distribution; shape depends on degrees of freedom
Reporting Rules and Conventions
Zar 2010 (Section 7.4 page 108)
"No widely accepted convention", but the measure of dispersion must be clearly stated
n should be stated somewhere
As Text in manuscript: (mean = 27.4 g +/- 2.80 SD) or SE or 95% CI
In a Table or Figure
Two-Sample Hypotheses
Do differences exist b/w two samples; i.e. are the two samples from two different statistical populations?
A number of types of comparisons:
means
medians
variances
CV
indices of diversity
We will explore 2-sample comparisons involving means:
comparison of independent samples
nonparametric tests of independent samples
comparison of paired samples
Comparison of Two Independent Samples
For Example: You measure hematocrit in two groups of 17 year olds, males (n=600) and females (n=600)
Is hematocrit different between groups?
Males - 45.8 +/- 2.8 SD
Females - 40.6 +/- 2.9 SD
What are the independent and dependent variables?
Independent: Sex (male or female)
Dependent: Hematocrit values
What would the data model types be in JMP?
Two columns, sex and hematocrit values
Independence of "Samples" or sample units?
Each individual person should be a sample unit
Can take multiple measurements, just make sure to average before inputting into chart
Pseudoreplication?
Analyzing the data as if you have more independent replicates than you actually do
A 2-tailed test always has the exact same null hypothesis
That the means are the same
Comparing Means from Two Independent Samples
Under the following experimental conditions:
1 Dependent Variable that is continuous
1 Independent Variable that is nominal (grouping/categorical) with 2 levels/groups/categories
Apply the following test if assumptions hold:
Student's t-test
Prob > absolute value of t = 2-tailed either direction
Prob > t = 1-tailed to the right
Prob < t = 1-tailed to the left
Always double check degrees of freedom
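The three JMP probabilities map onto scipy's `alternative` argument in `ttest_ind`; a sketch with invented data (note the two one-tailed P-values sum to 1, and the smaller one is half the 2-tailed P):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
group_a = rng.normal(10.0, 2.0, 12)  # hypothetical measurements
group_b = rng.normal(12.0, 2.0, 12)

# Prob > |t| (2-tailed), Prob > t (right tail), Prob < t (left tail)
two = stats.ttest_ind(group_a, group_b, alternative="two-sided").pvalue
right = stats.ttest_ind(group_a, group_b, alternative="greater").pvalue
left = stats.ttest_ind(group_a, group_b, alternative="less").pvalue

print(f"2-tailed P = {two:.4f}, right P = {right:.4f}, left P = {left:.4f}")
```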
Writing a Concise, Publication-Quality Interpretation of Results of a Statistical Test: "Manuscript Style" (Make it OBVIOUS)
Be direct, say what you mean, mean what you say
Don't just say means were different, or you rejected the null or that you detected a significant difference. State the direction of the effect (e.g., high/low, etc.)
Provide the statistical test, df (or n), test statistic (only go to 2 decimal places), and P-value (only go to 3 decimal places).
This is typically put in parentheses following the sentence
People given drug G had significantly longer blood clotting times than people given drug B (Student's t-test, df = 11, t = -2.47, P = 0.031) (Figure 1)
Assumptions of the Two-Sample t-test
Both samples were taken randomly; i.e., sample units are independent of each other & unbiased
The dependent variable is normally distributed (or ~ normal)
(combination of frequency distribution, normal quantile plot, and S-W test)
Variances of the two groups are equal (or almost equal)
(eyeball SD - 2-fold difference?, variance tests, e.g., Levene's Test)
What is the risk?
You risk elevating your Type I error rate above the stated alpha level
Violations are more serious if sample sizes are small (n < 30; Zar 2010), you are doing a 1-tailed test, or your sample sizes are severely unbalanced
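A sketch of the equal-variance check using Levene's test in scipy (data invented; H0 is that group variances are equal):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
group_a = rng.normal(50, 3, 30)   # similar spread
group_b = rng.normal(55, 3, 30)
group_c = rng.normal(50, 12, 30)  # much larger spread (SDs differ > 2-fold)

# H0: the group variances are equal.
stat_ok, p_ok = stats.levene(group_a, group_b)
stat_bad, p_bad = stats.levene(group_a, group_c)
print(f"equal spread:   P = {p_ok:.3f}")
print(f"unequal spread: P = {p_bad:.4f}")
```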
If all assumptions are confirmed = Run of the mill t-test
If Assumptions are severely violated, what do you do?
If normality (normal distribution) is off, apply a data transformation, if "corrected", proceed w/ the t-test using the transformed data set
Report testing results from transformed analysis, but usually report sample means and dispersion on the original scale in text and/or in graphs/tables
(this branch assumes normality was the problem and variances are fine)
If only variances are violated you can conduct a t-test that has been "corrected for" unequal variances
Welch's t-test
Zar pg. 138
If normality cannot be "fixed"...
Conduct the Mann-Whitney U test which is a nonparametric equivalent of the t-test
JMP calculates the Wilcoxon Test which is equivalent to the Mann-Whitney test
Zar pg. 146
Wilcoxon test, Mann-Whitney U test, or the Wilcoxon-Mann-Whitney test
Nonparametric equivalent of the t-test for independent samples
Nonparametric test
Distribution-free test, where no or few assumptions are made about the shape of a distribution
Does not focus on any specific parameter such as the mean
Test specifics:
Used to test 2 groups
Assumes nothing about the underlying distribution or homogeneity of variances
H0: population distribution of sample 1 = sample 2 (test of medians)
Calculates test statistic based on ranks (position) of the raw data
Good when data set has extreme values in it, but has lower power than t-test, unless assumptions are severely violated
Why not just always apply a nonparametric test with every data set?
Usually has less power
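A sketch of the Mann-Whitney U test in scipy with invented data containing one extreme value (the situation where ranks are safer than means):

```python
import numpy as np
from scipy import stats

# Hypothetical data with an extreme value, where ranks beat raw values.
group_a = np.array([3.1, 2.8, 3.5, 2.9, 3.3, 3.0, 41.0])  # one outlier
group_b = np.array([4.2, 4.8, 4.5, 5.1, 4.0, 4.6, 4.4])

# Test statistic is computed from the ranks (positions) of the raw data.
u, p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"Mann-Whitney U = {u}, P = {p:.4f}")
```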
Welch's test
A special derivation of the t-test
Used when normality is correct, but variances are not equal
Can be identified when df are not a whole number
e.g. df on normal t-test are 11, on Welch's they may be 10.70
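The non-integer df comes from the Welch-Satterthwaite formula; a sketch computing it by hand alongside scipy's Welch t-test (data invented):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
group_a = rng.normal(20, 2, 8)   # hypothetical: small spread
group_b = rng.normal(24, 9, 10)  # larger spread, unequal n

# Welch's t-test is equal_var=False in scipy
t, p = stats.ttest_ind(group_a, group_b, equal_var=False)

# Welch-Satterthwaite degrees of freedom (usually not a whole number)
va = np.var(group_a, ddof=1) / len(group_a)
vb = np.var(group_b, ddof=1) / len(group_b)
df = (va + vb) ** 2 / (va**2 / (len(group_a) - 1) + vb**2 / (len(group_b) - 1))

pooled_df = len(group_a) + len(group_b) - 2
print(f"t = {t:.2f}, Welch df = {df:.2f} (vs {pooled_df} for the pooled t-test)")
```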
Comparison of Paired Samples (Non-Independent)
In contrast to independent samples, in a "paired" design, sample units are linked or correlated in some way with a member in the other group(s)
i.e. members of a pair have more in common than with members of another pair
This dependency is planned or by design. However, you make a critical mistake if you apply the wrong inferential test
Paired t-test
Wilcoxon paired-sample test/Wilcoxon signed rank test (nonparametric equivalent)
What if you conduct test assuming independence?
Catastrophic failure
What happens to Type I error?
Increases if highest variance = lowest value
Decreases if highest variance = highest value
Family-wise error inflation occurs when doing multiple comparisons with the same data at the same alpha level; it raises the Type I error rate
Multisample Hypotheses
H0: u1 = u2 = u3 ...
Design:
1 Dependent Variable (continuous)
1 Independent Variable (categorical/nominal): 3 or more levels
Why not conduct a series of t-tests?
Type I Error is inflated beyond your stated alpha
Type I errors accumulate with each statistical test conducted on the same data set
Experimentwise or familywise error rate - must be controlled
Analysis of Variance (ANOVA)
The ANOVA family of tests are the most commonly applied statistical tests
Inferences about means are made by analyzing variability in the data
One model is constructed that includes all means simultaneously; therefore, it controls for familywise error
F-value (F-ratio) is the test statistic (Sir Ronald Aylmer Fisher 1918)
A factor/treatment is an independent variable whose values are controlled and varied by the experimenter (e.g., drug type)
Are categorical/nominal variables
A level is a specific value of a factor (e.g., drug A, drug B, drug C)
Analyzes and partitions sources of variation in a dataset
2 kinds of variability
Between sample means (among groups)
Within groups
Total variability comprises within group variability and variability between groups
Between Group Variability
Treatment effects
Group
Within Group Variability
Individual differences
Errors of measurement
Error
F = Group/Error
As the between-group variation gets bigger relative to the within-group variation, the test statistic gets bigger. A higher F-value is desired
In ANOVA, the total variation in the response measurements is divided into portions that may be attributed to various factors, (e.g., amount of variation due to Drug A and amount due to Drug B) Which factor(s) or combination of factors account for significant amounts of the total variation?
Partitioning of the variance within the data set
If a factor/treatment represents a lot of the total variability relative to variability within groups (error) then it is an important “player”
Example: Sandwich types. On One-Way ANOVA Powerpoint
ANOVA-F distribution is the underlying distribution
F = (Between Group Variability/Within Group Variability)=(MS-group/MS-error)
MS stands for Mean Square
Under H0, F ≈ 1 -> No treatment effects (sample means are drawn from same population). (No Sandwich effect)
Large F -> Means are different (sample means are from different populations). (There is a Sandwich effect)
Summary of Logic
Calculate two estimates of the population variance: MS-error, based on variability within groups, is independent of H0; MS-group, based on variability between group means, is inflated by treatment effects when H0 is false.
Calculations for the ANOVA
In order to calculate MS-groups and MS-error we must first calculate the appropriate sums of squares (SS)
SS-total
Represents sum of squared deviations of all observations from the grand mean
SS-total = SS-group + SS-error
SS-group
Sum of squared deviations of group means from the grand mean. In effect, a measure of differences between groups
SS-group = Σ [nᵢ × (group meanᵢ − grand mean)²]
SS-error
Sum of squared deviations within each group. Usually obtained by subtraction
SS-error = SS-total - SS-group
Degrees of Freedom
In order to calculate MS-group and MS-error we need to know the degrees of freedom associated with SS-group and SS-error
df-total = N - 1 (where N is total number of observations)
df-group = k - 1 (where k is the number of groups)
df-error = df-total - df-group
MS-group = (SS-group/df-group)
MS-error = (SS-error/df-error)
F-value
Having calculated MSgroup and MSerror we can now calculate F
F = MS-group / MS-error
Between groups estimate of the population variance is much larger than the within groups estimate -> F value greater than 1
How much larger than 1.0 must the value of F be to decide that there are differences among the means?
Use tables of the F distribution, Zar Table B4, Appendix 21.
Gives critical values of F corresponding to the degrees of freedom for the two mean squares (dfgroup and dferror).
dfgroup = numerator df (2)
dferror = denominator df (12)
From tables (alpha = 0.05): Fcrit = 5.10; obtained F2,12 = 8.45
Because Fobt > Fcrit we can reject Ho and conclude that the groups were sampled from populations with different means. There is an effect of Sandwich Type
The ANOVA Table (HAVE TO BE ABLE TO BUILD FOR MIDTERM)
Source ----> SS ----> df ----> MS ----> F ----> P-value
Group
Error
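The table can be built by hand from the formulas above; a sketch with three invented groups of n = 5 (so df-group = 2 and df-error = 12, as in the worked example), cross-checked against scipy:

```python
import numpy as np
from scipy import stats

# Hypothetical response values for k = 3 groups, n = 5 each (N = 15).
groups = [np.array([4.0, 5.0, 6.0, 5.5, 4.5]),
          np.array([7.0, 8.0, 7.5, 9.0, 8.5]),
          np.array([5.0, 6.0, 5.5, 6.5, 6.0])]

all_obs = np.concatenate(groups)
grand_mean = all_obs.mean()

ss_total = ((all_obs - grand_mean) ** 2).sum()
ss_group = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_error = ss_total - ss_group  # obtained by subtraction

df_group, df_error = len(groups) - 1, len(all_obs) - len(groups)
ms_group, ms_error = ss_group / df_group, ss_error / df_error
f = ms_group / ms_error

print(f"Group  SS={ss_group:.2f} df={df_group} MS={ms_group:.2f} F={f:.2f}")
print(f"Error  SS={ss_error:.2f} df={df_error} MS={ms_error:.2f}")

# Cross-check against scipy's one-way ANOVA
f_check, p = stats.f_oneway(*groups)
print(f"scipy: F = {f_check:.2f}, P = {p:.4f}")
```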
JMP Analysis of One-Way ANOVA
Fit Model
Do NOT go into Fit Y by X and conduct the One-Way ANOVA !!!
Dependent variable goes into Y box
Add independent variables into model effects (bottom) box
JMP has different columns for every Independent Variable
P-value in JMP under Analysis of Variance = Prob > F
Capture Analysis of Variance and Effect Test boxes to give evaluation of the Omnibus Hypothesis
What is a Residual?
Distance between the Observed Y and the Predicted Y on a Y by X chart with line of fit
How to graph Residuals in JMP
1 variable is Nominal, 1 variable is Continuous
Fit Model & run ANOVA
Click on Response carat on top left and Click Save Columns
Click Residuals, then puts it into spreadsheet
Testing assumptions using Residuals in JMP
Analyze distribution, add residuals into columns, check distributions, quantile plots, Shapiro-Wilk test
Shows us distribution and assumptions of the data
If non-normal, transform the RAW DATA and NOT the residuals
Then re-find the residuals using the transformed data and THEN test assumptions
Run the ANOVA and click Row Diagnostics
If the Residual by Predicted plot is not shown by default, add it using Row Diagnostics
Assumptions of ANOVA (Step 1)
Observations/sample units are independent of each other. (i.e. no systematic biases within the data set). Best achieved via random sampling
The data are normally distributed, better yet, the residuals are normally distributed
Save residuals to the data spreadsheet
Examine frequency distribution, normal quantile plot, and Shapiro-Wilk of residuals & interpret
Homogeneity of variance (i.e. the variances among groups are equal)
Examine the plot of residuals (residual by predicted plot) vs the predicted values. Are the points equally scattered for each group
Pig mass varied significantly with type of food (One-way ANOVA, F with subscripts of numerator df (model) and denominator df (error), P-value). The next sentence or two would add biological/supporting stats to build the answer. (Slide 22)
Assessing Normality within the ANOVA framework
In JMP:
Fit the ANOVA model
Go to "Save Columns"
Click on "Residuals"
Residuals should be in the data spreadsheet
A Significant Overall F... What's next?
Significant overall F-test does not indicate that all group means are different from each other
Don't know how many means are different, nor which means are different
Due to experiment wise error inflation, you cannot proceed with a series of run-of-the-mill t-tests
The proper statistical approach is to employ a multiple comparison test (i.e. post hoc testing)
Do NOT go through with this if you do not reject the Omnibus hypothesis in Step 1
What if the overall F-test is not significant?
You cannot proceed
Multiple Comparison Tests - (Parametric) (Step 2)
Also known as post hoc or a posteriori tests
Many different ones
Tukey test, Student Newman-Keuls test, Duncan test, LSD test, Scheffe's test, Fisher test, Bonferroni adjusted t-tests (a special case)
Their application is debated in the literature and there is no absolute agreement on the best to use
Although, Tukey & SNK are the most commonly employed and, therefore, accepted techniques
They operate under the same assumptions as ANOVA and must follow a significant F test
Post hoc testing usually involves testing of all possible combinations of means, even comparisons you might not be interested in
This is why post hoc testing is often referred to as the testing of "unplanned comparisons"
Post hoc tests have built-in procedures that correct for experiment wise error and its influence on Type I error inflation
Each one differs slightly in how conservative it is
Post hoc Testing: The Tukey HSD Test
Rank the means in ascending order
First, compare largest to smallest, then largest to next smallest, etc.
You will need to use the table of critical values of the q distribution on Zar pg. 723
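SciPy (version 1.8+) provides `scipy.stats.tukey_hsd`, which handles the q-distribution internally; a sketch with invented groups (run only after a significant omnibus F):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
# Hypothetical data: group_b clearly higher than the other two
group_a = rng.normal(10, 1.5, 8)
group_b = rng.normal(14, 1.5, 8)
group_c = rng.normal(10, 1.5, 8)

# Step 1: the omnibus F-test must be significant before post hoc testing
f, p = stats.f_oneway(group_a, group_b, group_c)
print(f"omnibus: F = {f:.2f}, P = {p:.4f}")

# Step 2: all pairwise comparisons with experimentwise error controlled
res = stats.tukey_hsd(group_a, group_b, group_c)
print(res)  # table of pairwise differences, P-values, and CIs
```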
Post hoc Testing: The Student Newman-Keuls test (Never have to calculate in this class)
SNK is conservative enough (i.e. it controls experimentwise error) but it has more power than the Tukey test; SNK calculations are very similar to Tukey
"Multiple Range Test". The "family" of comparisons changes
Graphical Display of post hoc Results
Put explanation of notation in figure caption along with results of F-test
Start by assigning A to the mean(s) of highest magnitude
Shared letters indicate means were not significantly different
Strontium concentrations varied significantly (F4,25 = 56.2, P < 0.001) across water bodies, and concentrations were highest in Rock River, moderate in Angler’s Cove, Appletree Lake, and Beaver Pond and lowest in Grayson’s Pond (SNK) (Figure 1).
Tukey vs. SNK
Both tests adequately control experiment wise error rate and are appropriate post hoc tests when multiple comparisons are desirable following a significant F test
Both can be applied at a specified alpha level (e.g., 0.05)
Both are better approaches than multiple t-tests
Tukey will result in fewer Type 1 errors than SNK
SNK has more power than Tukey (>3 means)
I apply Tukey when I want a more conservative test and SNK when the research is more exploratory
Tukey is probably more commonly used by Biologists; SNK common in Psychology (Zar recommends Tukey)
The Bonferroni Method: An Additional Way to Control Experimentwise Error
The Bonferroni adjustment to alpha levels is commonly used to control experimentwise error in situations where multiple tests are applied (e.g. post hoc comparisons and multiple correlations)
adjusted α = stated α / # of comparisons (e.g., 0.05 / # of comparisons)
You can “start” with whatever alpha you want
For example: You want to conduct 5 tests (denominator) @ an initial stated alpha of 0.05 (numerator)
0.05/5 = 0.01
All 5 tests would actually each be conducted @ alpha = 0.01
5 comp. = 0.01, 8 comp. = 0.006, 12 comp. = 0.004
An acceptable approach to post hoc testing is to conduct multiple t-tests but with Bonferroni adjusted alpha levels for each comparison
Dunnett's Test (Control Group vs Other Groups Individually)
Accepted post hoc test provided in JMP for this special case
Multiple Comparison Study
We learned 3 techniques to control experimentwise error during post hoc testing:
Tukey Test
built-in adjustments/corrections such that you actually conduct the test at the stated alpha
SNK
built-in adjustments/corrections such that you actually conduct the test at the stated alpha
Bonferroni Method
Directly adjust stated alpha based on # of comparisons (You don't have to do all possible comparisons)
All three approaches are conservative relative to not controlling experimentwise error.
Bonferroni Method is ultraconservative, particularly at > 5 comparisons (0.01)
Basically, you pay a “penalty” when you test all possible, unplanned comparisons following a significant F test because these comparisons have been adjusted to control for experimentwise error
There is another option that circumvents being penalized, but you cannot make all pairwise comparisons
Nonparametric ANOVA: The Kruskal-Wallis test
Apply this nonparametric equivalent to One-way ANOVA when k>2
It is a distribution-free method that analyzes the ranks of the data
Sometimes called "ANOVA by ranks"
ANOVA is generally more powerful, but K-W provides an alternative when assumptions are not met and a transformation doesn't help
The test statistic is H
The K-W test is equivalent to the Omnibus F-test in ANOVA
K-W in JMP
Use the Fit Y by X platform
Go to "nonparametric"
Choose "Wilcoxon"
Desired info is under 1-way Test, ChiSquare Approximation
Do NOT report ChiSquare value
DO report df, test statistic, and p-value
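The same test outside JMP is `scipy.stats.kruskal`; a sketch with invented skewed data for k = 3 groups:

```python
import numpy as np
from scipy import stats

# Hypothetical skewed data in 3 groups (k > 2), where K-W may be preferred
group_a = [1.2, 1.5, 1.1, 1.8, 9.5]  # extreme value present
group_b = [3.2, 3.8, 3.5, 4.1, 3.6]
group_c = [6.0, 6.4, 5.8, 7.1, 6.6]

# H is computed from the ranks of the pooled data
h, p = stats.kruskal(group_a, group_b, group_c)
df = 3 - 1  # k - 1
print(f"Kruskal-Wallis: H = {h:.2f}, df = {df}, P = {p:.4f}")
```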
Post hoc testing Following significant K-W test
Dunn Method for Joint Ranking - (Zar pg. 240-241)
Preferred, more powerful method for nonparametric
Steel-Dwass procedure
Less power, doesn't work well when sample sizes are unequal
Wilcoxon all pairs (apply Bonferroni adjusted alpha)
Less power, highly conservative when groups 5 or more
In JMP
Fit Y by X
Run ANOVA
Click carat and choose "nonparametric"
Click "Nonparametric Multiple Comparisons"
Will show up under "Nonparametric Comparisons"
Report p-value and possibly Z-value
Planned Comparisons (Contrasts) vs Unplanned Comparisons
Typically, when you design an experiment with multiple levels of the independent variable, you have particular comparisons of interest in mind
Planned comparisons are stated a priori while unplanned comparisons are a posteriori, or "thought of after the data are collected"
You pay a price for conducting post hoc tests because they incorporate a correction for experimentwise (family wise) error
In contrast, planned comparisons (at least some special combinations) are made at the stated alpha, even if the omnibus F is not significant, within the ANOVA itself because they partition the SS-Model
Orthogonal Contrasts/Comparisons
Statistical Orthogonality
Usually in reference to groups or multiple independent variables
Non-overlapping, independent, not correlated
Assumption for planned contrasts & multiple regression modeling
Set of Planned Comparisons Must = Orthogonal Contrasts
If you want to conduct planned comparisons, you need to decide how many and which ones to make
To enjoy the luxury of testing multiple comparisons at the stated alpha, you must follow certain rules (i.e. the comparisons must be orthogonal)
A full set of orthogonal contrasts completely partition the SS-Model
Therefore, they represent independent pieces of information (i.e., this allows you to work at the originally stated alpha)
There are up to a-1 possible contrasts that can comprise a full orthogonal set (but you don't have to conduct all a-1 comparisons)
a = # of groups
Planned contrasts are 1 df comparisons & you cannot use more than the a-1 df
How do you establish a set of orthogonal contrasts?
Coding Planned Contrasts
Coding is achieved by assigning weights/coefficients to groups to indicate contrasts
Rules
Groups coded with positive weights will be compared to groups coded with negative weights
The sum of weights for a single comparison/contrast should be zero
Group(s) not involved in a specific comparison is/are given a zero
To be orthogonal, the sum across groups of the products of the coefficients from any two contrasts must equal zero
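The coding rules above can be checked mechanically. A minimal Python sketch (a hypothetical 4-group contrast set, not from the lecture) that verifies both rules:

```python
# Hypothetical full orthogonal set for a 4-group design (a-1 = 3 contrasts):
# Contrast 1: groups 1&2 vs. groups 3&4
# Contrast 2: group 1 vs. group 2
# Contrast 3: group 3 vs. group 4
contrasts = [
    [1, 1, -1, -1],
    [1, -1, 0, 0],
    [0, 0, 1, -1],
]

def sums_to_zero(c):
    # Rule: weights within a single contrast must sum to zero
    return sum(c) == 0

def orthogonal(c1, c2):
    # Rule: the sum of products of paired coefficients must equal zero
    return sum(a * b for a, b in zip(c1, c2)) == 0

all_zero = all(sums_to_zero(c) for c in contrasts)
all_orth = all(orthogonal(contrasts[i], contrasts[j])
               for i in range(len(contrasts))
               for j in range(i + 1, len(contrasts)))
print(all_zero, all_orth)  # True True
```

Groups not involved in a contrast carry a zero weight, which is what keeps the products summing to zero across the set.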
In JMP
Build ANOVA
Find normal Sum of Squares and record
Click the caret where you'd normally run the Tukey test
Build the contrast table using positive and negative weights, adding a new column after every row
Click Done and check the contrast SS; the sum must be less than or equal to the SS-Model recorded earlier
Planned Comparisons: Wrap-up
Incorporate planned comparisons if you can to avoid experimentwise error.
Can be a useful approach if grouping “groups” for comparisons is insightful and a goal of the research.
It’s best if each comparison represents a unique portion of the SSModel so that comparisons meet the orthogonal requirement.
You don’t have to perform all a-1 comparisons, but beware of “unexplained” blocks of variance in SSModel.
Don’t “force” orthogonality. In other words, if the planned comparisons of interest aren’t orthogonal, proceed but Bonferroni adjust the alpha levels for each comparison.
If all pairwise comparisons of groups are of interest, you should probably just proceed with Tukey or SNK (Season example)
Data Transformations (Zar Chapter 13)
To apply parametric statistics, the data set must meet (or approximate) the assumptions of normality, equality of variances, and additivity (the magnitude of the variances does not increase with the magnitude of the means). If you judge that the data violate these assumptions, one option is to try to "correct" the data by applying a transformation.
When applying a transformation you change the raw data to a different form or scale. (e.g., F to C is a transformation)
After you perform a transformation and judge that it "fixed" the data, conduct parametric tests on the transformed data set. Transform all the data, not just one level of a variable!
Complications arise when reporting means and variability around the means following transformation. You should report the mean in the original scale; the most appropriate approach is to report the antilog of the transformed mean (the geometric mean), although this is rarely done in practice. How to report the variability is another issue ... Be sure you inform readers you analyzed transformed data!
Three Common Transformations
The Logarithmic Transformation
The Square Root Transformation
The Arcsine Transformation
The Logarithmic Transformation
X'=log10(X)
X'=log10(X+1) when the data contain zeros or to avoid negative transformed values
The log family of transformations is the most common
It is a variance-stabilizing transformation that will also address nonadditivity and non-normality if the data are right skewed
You can apply a log of any base, but log10 appears to be used the most
Beware of “log” “log10” “ln” – this is particularly important when reporting a model designed to predict y’s based on inputs of x
Always check your transformation with a calculator
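As a supplement to checking by calculator, a small Python sketch (illustrative numbers, not lecture data) showing the log10(X+1) transformation and the back-transformed, geometric-type mean for reporting:

```python
import numpy as np

# Hypothetical right-skewed counts (some zeros), so log10(X + 1) is used
x = np.array([0, 2, 4, 9, 19, 49, 99])
x_t = np.log10(x + 1)          # transformed data analyzed with parametric tests

# Back-transform the mean for reporting: antilog of the transformed mean,
# minus the added constant, gives a geometric-type mean on the original scale
mean_t = x_t.mean()
back_mean = 10 ** mean_t - 1
print(round(mean_t, 3), round(back_mean, 3))
```

Note the back-transformed mean is smaller than the arithmetic mean of the raw data, which is exactly why the reporting issue above matters.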
The Square Root Transformation
X'=SqRt(X+0.5)
Variance-stabilizing transformation, particularly when variances increase as the means increase, also when the variances & means are of similar magnitude and aren't independent of each other (i.e. Poisson distribution)
Helpful to try this transformation if log doesn't work, especially when a nonparametric tool is not at your fingertips
May help transform percentage data when data range is between 0-20% or between 80-100%
Similar reporting issues (square the transformed mean & calculate CI's)
The ArcSine Transformation
p'=arcsin(SqRt(p))
Proportions tend to form a binomial distribution vs a normal distribution
This transformation will "centralize" the data - bring values closer to 50%
Arcsin is the inverse sine (sin^-1)
Radians vs degrees .. Ugh!
Check your transformation vs Zar Appendix Table B.24
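The radians-vs-degrees point is easy to verify in code: NumPy's arcsin returns radians, so a conversion is needed to match Zar's degree-based Table B.24 (sketch with illustrative proportions):

```python
import numpy as np

# Illustrative proportions spanning the 0-1 range
p = np.array([0.05, 0.25, 0.50, 0.75, 0.95])
p_rad = np.arcsin(np.sqrt(p))   # radians (NumPy's default)
p_deg = np.degrees(p_rad)       # degrees, as tabulated in Zar's Table B.24

print(p_deg.round(2))  # p = 0.50 maps to 45 degrees
```

The transformation pulls extreme proportions toward the middle, which is the "centralizing" effect noted above.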
Only Pre-Midterm Topics Above
Power and Sample Size in ANOVA
Power = 1-Beta
In JMP
Set up regular Power Analysis
Choose k sample means option
Set alpha for Omnibus F test
Enter SD, variability among all groups combined
Enter estimated means of each group (represent smallest detectable difference)
Leave sample size & power blank to examine power curves
Sample size gives Big N. Remember to divide N by k (# of groups)
Reports sample size required to reject the Omnibus
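Outside JMP, the same k-sample power calculation can be sketched with statsmodels (hypothetical means and SD; statsmodels expects the effect size as Cohen's f and solves for total N, matching the "Big N" note above):

```python
import numpy as np
from statsmodels.stats.power import FTestAnovaPower

# Hypothetical 3-group design: smallest detectable group means and a
# pooled within-group SD (illustrative numbers, not from the lecture)
means = np.array([10.0, 12.0, 14.0])
sd_within = 3.0
k = len(means)

# Cohen's f = SD of the group means / within-group SD
f = means.std() / sd_within

# Solve for the total N needed to reject the omnibus F at alpha = 0.05
# with power = 0.80; divide by k to get the per-group n
N = FTestAnovaPower().solve_power(effect_size=f, k_groups=k,
                                  alpha=0.05, power=0.80)
print(round(f, 3), int(np.ceil(N / k)))
```

Leaving power or N unspecified and sweeping the other is how you would reproduce JMP's power curves.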
Different Types of ANOVA Models
Fixed-Effects Model (Model I ANOVA)
Levels of the factor are specifically chosen by the experimenter; it is these specific groups about which the experimenter is trying to draw conclusions
Most common
Random-Effects Model (Model II ANOVA)
Levels of the factor are a random sample of all possible levels, a wider universe of groups
Instead of being concerned with effects of specific levels, you are trying to generalize effects across a random selection
Mixed-Effects Model (Model III ANOVA)
Some factors/treatments are fixed & some are random
SS & MS calculated the same; ANOVA table looks similar
Differences in the MS term used in F test for some HO's and how secondary analyses are performed
Factorial ANOVA (Zar Chapter 12)
Multisample HO:
One-Way ANOVA model
1 Independent Variable
Nominal w/ more than two levels
1 Dependent Variable
Continuous
Factorial Analysis of Variance
Consider the effects of more than 1 independent variable on a dependent variable simultaneously (in the same model)
Advantages
No need for multiple 1-way ANOVAs;
Can test for interaction among factors
Two-way ANOVA model
2 Independent Variables
BOTH Nominal each w/ two or more levels
1 Dependent Variable
Continuous
Two-way ANOVA/Two-factor ANOVA
2 independent variables = 2 treatments = 2 factors = 2 main effects
Don't confuse with multiple levels or groups
Let's add a variable to the experiment that tested the effect of 5 sugars. Now we want to test the effect of both sugar and pH on pea growth
5x2 factorial design
Each level of 1 factor is in combination with each level of the second factor; "crossed"
Balanced design (equal replication)
10 combinations, 50 observations
Sugar will have an F value, pH will have an F value, Sugar x pH (interaction between 2 variables) will have an F value
Certain tests are informative (Tukey is good, provides pairwise comparisons)
Examine interaction plots to help us see visually why we have interaction
2x2 Design Rat & Lard Example
Effect of lard type on food consumption of rats (N=12; n=6 per main effect)
2 Main Effects (Fixed) each w/ 2 levels:
Fat (Fresh, Rancid)
Sex (Female, Male)
3 replicates per subgroup
Fit Model Platform
Dependent Variable into Y box
Independent Variables & Interaction into model effects box
Analysis of Variance Prob>F = Omnibus F
Manuscript statement looks at Effect Test results & post hoc results
To find the variation explained by a given effect, divide that effect's sum of squares by the total sum of squares
Grab LS Means Plots for both Effects and the Interaction
If Main Effect is not significant, then post hoc testing is not necessary
Publication Statement:
Consumption by rats was significantly higher for fresh fat versus rancid fat (Two-way ANOVA, F1,8 = 41.96, P < 0.001), and main effect “Fat” accounted for 79.0% of the total variation in rat consumption (Table/Figure 1). Sex (F1,8= 2.59, P = 0.146) and Fat*Sex (F1,8= 0.63, P = 0.450) were not significant.
The Effect Tests F & P values tell us whether each variable has an effect. REPORT EFFECT TESTS F & P VALUES
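For readers working outside JMP, a two-way ANOVA with an interaction term can be sketched in Python with statsmodels. The layout mimics the 2x2 rat/lard design, but the numbers below are illustrative, not the lecture data:

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Hypothetical consumption data in the 2x2 Fat x Sex layout,
# 3 replicates per cell (N = 12; values are invented for illustration)
data = pd.DataFrame({
    "fat": ["fresh"] * 6 + ["rancid"] * 6,
    "sex": (["F"] * 3 + ["M"] * 3) * 2,
    "cons": [70, 68, 72, 66, 60, 68,    # fresh
             59, 54, 48, 51, 50, 54],   # rancid
})

# "fat * sex" expands to fat + sex + fat:sex (both main effects + interaction)
model = smf.ols("cons ~ fat * sex", data=data).fit()
table = anova_lm(model, typ=2)   # effect tests for Fat, Sex, Fat:Sex
print(table)

# % of total variation explained by the Fat main effect (SS-effect / SS-total)
pct_fat = table.loc["fat", "sum_sq"] / table["sum_sq"].sum() * 100
print(round(pct_fat, 1))
```

The printed table is the Effect Tests analogue: one F and P per main effect and one for the interaction, with error df = 8 as in the publication statement.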
3x2 Factorial Design Sandwich Data
Sandwich: Meatball, BLT, Spicy Italian
Season: Spring, Summer
Dependent Variable: Sales from ___ Subway Stores
Significant variation within Sandwich Types, so run post hoc analysis to see how and why
When a main effect is significant, and has more than 2-levels, proceed as you would in a one-way ANOVA
Sales of BLT are significantly higher than sales of Meatball & Spicy Italian
Sales of sandwiches did not differ between spring and summer
There was no interaction of the main effects, indicating that Season affected sales equally across all Sandwiches
Analysis of Significant Interactions
The interaction term allows for examination of the joint effect of factors on the dependent variable (advantage of factorial design)
If the interaction term is significant, it means the nature of the effect of one factor on the dependent variable is dependent on levels of the other factor
In factorial anova, you should first look to see if the interaction term is significant, because if it is, then biological conclusions made about the main effects are unreliable or not applicable in all instances (combinations)
Interaction among factors indicates the effect of the two Main Effects are not independent of each other
What if you have a significant Interaction Effect?
Effect of Sex and Season on hematocrit of the dark-eyed junco:
No effect of Sex (p=0.26)
Significant effect of Season (p=0.021) (Hematocrit was higher in the Spring)
Significant interaction of Sex X Season (p=0.032)
The F-test of the interaction is enough statistical information to conclude that Hc of female juncos in spring is higher than Hc of female juncos in summer
Note: You could not draw strong conclusions about the Season effect without accounting for the significant Sex X Season interaction
Interpreting Interaction: Factors with >2-levels
A professor gives a final exam that's an essay. Students are randomly assigned to either take the exam with laptops or write in blue-books. Additionally, students are put into three categories based on typing ability: None, Moderate, Skilled. The instructor was interested in the effect of Method, Ability, and the interaction of those two on score on the essay. Grades assigned "blindly".
Method: Laptop, blue-book
Ability: None, Moderate, Skilled
Method X Ability
Dependent Variable: Essay score
Main Effects:
Ability - Significant
Proceed with post hoc testing (Tukey HSD)
Method - Not Significant
Interaction - Significant
Now, you must analyze the simple main effects. This entails examining the changes in effect of one factor over levels of the other
Focus on key comparisons, don't have to run all tests
Presenting results
For F-ratio of ability, present F 2,12 then p-value
For F-ratio of ability*method, present F 2,12 then p-value
There was a significant effect of Ability (Test, F, p=0.032), where scores of students with moderate typing ability were higher than scores of students with no typing ability (Tukey HSD). The effect of Method was not significant (F, p=0.901); however, there was a significant AbilityXMethod interaction (F, p=0.0465). Examination of simple main effects was inconclusive, but there was a trend of lower scores in students skilled in typing and using laptops versus skilled students using bluebooks (test).
A researcher is interested in studying the effect of group psychotherapy and medication on depression. 30 patients participated in the study
The researcher designed this experiment to examine if types of therapy, psychotherapy and medication, interact in their effect on depression
2x3 factorial design
30 total patients
6 subgroups
5 patients per subgroup
Psychotherapy: Psychotherapy, No Psychotherapy
Medication: Placebo, low Dose, High Dose
Dependent Variable: Depression scores
Both main effects are Statistically Significant
The Interaction Term is significant
Run a Tukey HSD after to find why
Group psychotherapy influenced subjects in the placebo and low dose treatments, but it had no influence on people given the high dose treatment
Efficient Use of Resources
Interested in testing the effects of 2 environmental variables, air temperature and nitrate, on the growth of cotton
For each independent variable, you have 3 levels
High, Medium, and Low
Produces a 3x3 design
For power purposes, you want to have 27 cotton plants per level of each main effect
Unexplained variance is reduced in the 2-way ANOVA over the 1-way ANOVA
Results in higher F-ratios
2-way ANOVA has more Power
Incorporating variables into models that explain or account for observed variability in the dependent variable is a good thing, for this is one of our major goals as researchers
Two Types of Independent Variables incorporated into factorial Designs:
Experimental Variables
We are directly interested in the effect of all of those variables, including their interaction, on the dependent variable;
These are typically fixed effects
Control or Block(ing) Variables
Incorporated solely to reduce the amount of experimental error, giving more resolution for exploring effects of Experimental Variables of interest
Typically the Block Variable is a random effect
The Randomized Group Design
This is the "typical" factorial design we've considered so far
3x2 factorial design with 2 experimental factors
Method and ability are fixed effects
Method: laptop, bluebook
Ability: none, moderate, skilled
Method*Ability
Dependent Variable: essay score
In this design, each cell (6 subgroup combinations) has multiple subjects or replicates (3). Multiple subjects were "randomly" assigned to each cell in the 3x2 design.
In this design, you apply a 2-way ANOVA and include the interaction term
The Simple Randomized Block Design
5 m-squared of Bermuda grass
Enough space for 3 plots per location
Dependent Variable - amount of above ground grass (kg)
There is 1 replicate per cell
There are 5 replicates per level of Nutrient
The factor of interest is Nutrient
Can calculate Block Effect since n=3 per block
Why is it advantageous to include Location as a Block factor?
What is the unit of replication in this design?
The Simple Randomized Block Design (A Simple Mixed Model)
"Randomized Complete Blocks", "ANOVA w/o Replication"
Interspersion of treatments across "Blocks"
This design is used to reduce the amount of experimental error through the inclusion of a block variable that is usually a random effect factor
This design can be viewed as somewhat of a hybrid between a 1-way and 2-way ANOVA because you have 2 factors in the model, but you are only interested in the effect of one fixed Independent Variable (at least in the simple randomized block design)
The simple randomized block design has 1 fixed effect and 1 random effect
The statistical model is a mixed, Model III, 2-way ANOVA without replication
The interaction term is not included in the model because there is not enough replication per cell to calculate it
In the simple randomized block design, you assume no interaction between Experimental factor and the Block factor
In JMP:
3 total columns
1 fixed effect column
1 random effect column
1 dependent variable column
1 dependent and 2 independent
Test assumptions of the dependent variable the same way we've been doing it
Save residuals and test assumptions
Analyze --> Fit Model
Dependent variable into Y
Fixed effect into model effects
To change a variable to a random effect click "Attributes" and then click Random Effect
If balanced, change "method" to "Traditional"
If imbalanced, change "method" to "REML"
Make sure "Effect Details" is enabled
Enables you to perform Tukey HSD and post hoc tests
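The same simple randomized block model can be sketched outside JMP as a two-way ANOVA without replication (hypothetical 4-treatment x 5-block data; the block enters as a second factor, so its SS is removed from the error term):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Hypothetical randomized block layout: 4 diets (fixed) x 5 blocks,
# one observation per cell, so no interaction term can be fit
rng = np.random.default_rng(1)
diets = np.repeat(["A", "B", "C", "D"], 5)
blocks = np.tile([1, 2, 3, 4, 5], 4)
gain = (rng.normal(50, 3, 20)
        + np.repeat([0, 2, 4, 6], 5)      # simulated diet effects
        + np.tile([0, 1, 2, 3, 4], 4))    # simulated block effects

df = pd.DataFrame({"diet": diets, "block": blocks, "gain": gain})

# Additive model only: no Diet x Block interaction, per the design assumption
model = smf.ols("gain ~ C(diet) + C(block)", data=df).fit()
table = anova_lm(model, typ=2)
print(table)
```

The error df is (a-1)(b-1) = 12; everything the blocks explain has been pulled out of that error line, which is the whole point of blocking.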
Randomized Block - A Biological Example (Zar 12.4)
HO: The mean weight of guinea pigs is the same on four specified diets
1 block comprised of 4 cages
1 guinea pig per cage
1 replicate of each diet per block (assigned randomly)
Expected gradients within barn:
Temperature
Light
Noise
Draft
Pigs in Blocks will experience similar conditions
Interspersion of Treatments
N=20 pigs
5 replicates per diet
Columns
Block - nominal
Diet - nominal
Weight gain - Continuous discrete
Repeated Measures Design
Each subject receives all levels of Factor A
Slide 65 2-Way ANOVA ppt
Possibly subject to the "carry-over effect"
Subject goes through Treatment 1 and then Treatment 2, but going through the 1st treatment may have affected its response to the 2nd
Dependencies/correlations across treatments
Advantages:
Individuals/subjects are acting like "blocks" - homogeneity of potential sources of error
Experimental error introduced into the study due to variability between subjects can be accounted for
Also called a within-subjects or treatment-by-subject design
Subjects are receiving all levels of Factor A
Among subject variability can be accounted for and "factored out"
RMD design is similar to randomized block design b/c subjects function similarly to "blocks"
Intra subject dependencies present both positives & negatives
Dependencies are a negative if "carry-over" effects exist across treatments
Assess normality using residuals as in one-way ANOVA (will require re-entering data in traditional form)
Assess correlation structure (Test of sphericity)
Addressing the Sphericity Assumption
JMP provides a test for Sphericity using the Multivariate framework:
If the assumption is met, proceed with the unadjusted, univariate F-test
If the assumption is not met (sphericity test p < 0.05):
Apply an adjusted, univariate F-test (Greenhouse-Geisser or Huynh-Feldt)
Apply an F-test generated from MANOVA - "Multivariate F" (multiple dependent variables)
In JMP:
Go into Graph Builder and compare individuals (ex. Cholesterol vs. Drug)
Add random effect into right-hand overlay (top right box above "Color")
Examine correlation structure within Subjects across Groups
Note the y-intercept variation (Subject Variation is Important)
Analyzing Repeated Measures Design using Mixed Model
Assess normality and variances using residuals in One-Way ANOVA
Assess correlations within subjects across groups visually (Graph Builder)
Paired t-Test is most appropriate post hoc analysis w/ Bonferroni adjusted alpha level, but can do Tukey HSD
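A repeated measures ANOVA of the cholesterol-vs-drug sort can be sketched with statsmodels' AnovaRM (hypothetical subjects and values; note AnovaRM reports the unadjusted univariate F, so the sphericity check above still matters):

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical within-subjects data: each of 8 subjects measured under
# all 3 drug conditions (names and values are illustrative only)
rng = np.random.default_rng(7)
subjects = np.repeat(np.arange(8), 3)
drug = np.tile(["A", "B", "C"], 8)
baseline = np.repeat(rng.normal(200, 20, 8), 3)   # subject "block" effect
chol = baseline + np.tile([0, -10, -20], 8) + rng.normal(0, 5, 24)

df = pd.DataFrame({"subject": subjects, "drug": drug, "chol": chol})

# Among-subject variability is factored out, as in a randomized block design
res = AnovaRM(df, depvar="chol", subject="subject", within=["drug"]).fit()
print(res.anova_table)
```

The denominator df is (k-1)(n-1) = 14, confirming that subjects are serving as blocks.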
Multiway Factorial Analysis of Variance
You can extend the basic ANOVA design to experiments with more than 2 factors
In these multiway designs you can examine the effects of numerous factors (3,4,5,etc) simultaneously, with interactions, in one model
Factors can be a combination of both fixed and random effects
Typically, you never see more than 3 or 4 factors (3-way & 4-way ANOVA)
Tests of Difference vs Tests of Relationships
Tests of Difference
Is this group different from that group(s)?
t-Tests, ANOVA's
Independent variable is typically nominal/categorical
Tests of Relationships
Is variable A related (co-vary) with variable B?
Correlation and Regression
Independent variable is typically continuous (but doesn't have to be, particularly in correlation, where there isn't a "dependent" or "independent" variable)
Correlation
To what degree does one variable vary with another? (Does not imply cause & effect)
Regression
To what degree is variable Y dependent on variable X? (Implies a cause-and-effect relationship exists)
Correlation
Research question: Are two (or >2) variables "associated" with each other?
Important Point: The research question is not one of cause & effect
Examples:
Do two methods of measuring blood pressure tend to give corresponding results?
Blood pressure measurements with 2 methods on the same units
How strongly associated are pairs of morphometric characteristics of grizzly bears?
Data points are 1 grizzly bear
Y and X values are leg length and arm length
Is there correspondence between concentrations of cadmium and lead in sediments of streams in a watershed impacted by industrial pollution?
Sample unit is sediment core sample
From that sample we get a measurement of [Cadmium] and a measurement of [Lead]. These are X and Y values
In JMP:
Make sure both are normally distributed & relationship is linear
Go to Analyze -> Multivariate Methods -> Multivariate -> Pairwise Correlations
Put Correlation value & P-value (r=0.952, p<0.001)
Simple Linear Correlation
Three questions:
Are two measurement variables related in a linear fashion?
If they are related, what is the direction (+ or -)?
How strong is the relationship? (Differences, parallel to effect size)
Smoking Example: cig smoking/day & CHD mortality/10,000 people
Null H0: There is no correlation b/w Smoking and CHD. (correlation coefficient = 0)
X-axis and Y-axis placement does NOT matter in correlations
Do NOT put a line of best fit in the scatterplot. If adding a visual aid, add an ellipse
The Correlation Coefficient (r)
aka Pearson's r or the Pearson product-moment correlation coefficient
r is a measure of association between two variables (X&Y)
Two pieces of information obtained from r values:
Sign (+ or -) of r indicates whether the association is positive or negative
Size of r (from -1 to 1) indicates the magnitude of the association (further from 0 = stronger)
Since r values range from -1 to 1, 0.85 indicates a strong positive relationship exists b/w cigarette consumption and CHD
We can make this conclusion regardless of the underlying distribution of the two variables. In this sense, we view the r value as an index (rules of thumb)
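Computing Pearson's r takes one call in scipy (illustrative smoking-style data, not the lecture's numbers):

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical data: X = cigarettes/day, Y = CHD deaths per 10,000
x = np.array([5, 10, 15, 20, 25, 30, 35, 40])
y = np.array([40, 60, 75, 110, 130, 170, 180, 220])

r, p = pearsonr(x, y)
print(round(r, 3), round(p, 4))  # sign and size of r; p is the significance test
```

The sign gives the direction and the magnitude gives the strength, exactly the two pieces of information described above.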
The Correlation Coefficient (r): Test of Significance
There are no statistical assumptions associated with calculating r, and if an index is what you need to make the inference of interest, then stop here. (However, this is typically not how Biologists use r)
Assumptions of Normality
Both variables X&Y, were sampled randomly from a population with a normal distribution (bivariate normal distribution) and the relationship between the variables is linear
To calculate a p-value for r, X&Y need to be normally distributed and the relationship needs to be linear (DO IN GRAPH BUILDER)
If performing a transformation to one variable, apply to both
Nonparametric alternatives (Spearman's rs & Kendall's tau) have good power, so transformation is often unnecessary
Nonparametric Correlation (Ranks)
Spearman's correlation coefficient (rs); ranges from -1 to 1 or Kendall's tau. Analyses based on ranks
Apply when bivariate normal assumptions are violated, when data are ordinal, or the relationship is nonlinear
In JMP:
Analyze -> Multivariate Methods -> Multivariate
Then you can find "linear correlations" and "nonparametric correlations"
Reid usually goes with Spearman's over Kendall's
Reporting is (r(sub-s), p-value)
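The rank-based coefficients are equally direct in scipy (hypothetical monotonic-but-curved data, the situation where rank methods are appropriate):

```python
import numpy as np
from scipy.stats import spearmanr, kendalltau

# Hypothetical monotonic but nonlinear relationship (cubic with small noise)
x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = x ** 3 + np.array([0, 1, -1, 2, 0, -2, 1, 0])

rs, p_s = spearmanr(x, y)
tau, p_t = kendalltau(x, y)
print(round(rs, 3), round(tau, 3))
```

Because the relationship is perfectly monotonic, both rank coefficients come out at 1 even though Pearson's r on the raw values would not, which is why these tools suit nonlinear (but monotonic) data.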
The more elliptical the scatter of points, the more intense the correlation
Be careful interpreting significant r's
As sample size increases, the critical value decreases
Two-tailed usually
Multiple Correlations: The Correlation Matrix
What if you have more than 2 measurement variables
8 variables: 28 total comparisons
alpha per correlation = ?
Holm-Bonferroni Method (Holm 1979) WILL NOT BE TESTED OVER
Stop at the first non-significant outcome
Order the p-values from smallest to greatest
H4 = 0.005
H1 = 0.01
H3 = 0.03
H2 = 0.04
Work the Holm-Bonferroni formula for the first rank:
HB = Target alpha / (n-rank+1)
HB = 0.05 / (4 -1+1) = 0.0125
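The worked Holm-Bonferroni numbers above can be reproduced with statsmodels, which implements the same step-down procedure:

```python
from statsmodels.stats.multitest import multipletests

# The four ordered p-values from the notes (H4, H1, H3, H2)
pvals = [0.005, 0.01, 0.03, 0.04]

# Step-down Holm thresholds: alpha / (n - rank + 1), rank = 1..n
n = len(pvals)
thresholds = [0.05 / (n - i) for i in range(n)]
print(thresholds)  # first threshold is 0.05 / 4 = 0.0125

reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="holm")
print(list(reject))  # [True, True, False, False]
```

H4 and H1 clear their thresholds; H3 fails its 0.025 threshold, so testing stops there and H2 is not rejected either, matching the "stop at the first non-significant outcome" rule.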
The Limitations of Correlation
Correlation analysis indicates cigarette consumption and CHD are related. It also tells us the relationship is positive and relatively strong (r= 0.85)
BUT...
You may want to predict incidence of CHD for given levels of cigarette consumption or how much does CHD increase with a unit increase in cigarette consumption. This hypothesis of causality about the relationship requires Regression Analysis
Sparrow Wing Length as a Function of Age
Correlation--> X <--> Y (Covariation)
Just has 2 variables, technically not independent or dependent variables
Regression--> X ---> Y
Age (Independent) ---> Wing Length (Dependent)
A Regression Example
A snake physiologist wished to investigate the effect of temperature on the heart rate of juniper pythons. She selected nine specimens of approximately the same age, size, and sex and placed each animal at a preselected temperature between 2 and 18ºC. After the snakes equilibrated to their ambient temperatures, she measured their heart rates. n=9
Temp (IV) (Fixed)
Heart Rate (DV)
Simple Linear Regression
Simple vs. Multiple regression (one predictor variable vs multiple predictors)
Linear regression vs non-linear regression
Simple Linear Regression:
Specifies a straight-line relationship between two variables
Predictor variable (X) (usually a Fixed Effect-Model 1)
Response variable (Y)
Specifies a predictive relationship between X&Y
SLR analysis involves producing
A regression line or "best fit" line through points on a scatterplot of X&Y
A regression equation that relates X&Y
Building the Regression Equation
Regression implies a functional (cause-effect) relationship between variables
Two components need to be calculated from the data:
Slope (b) (the "regression coefficient")
y-intercept (a)
Y=bX+a
The Regression Analysis
The regression analysis calculates values of a and b for a data set so that the resulting equation is the best obtainable for the data
The "Best Fit" Line
Not all observed values of y fall on the line
The values of y that fall directly on the line are the predicted values "y-hat"
The sum of squares obtained by squaring the deviations y - y-hat is much smaller than the SS of y without consideration of x
Testing the Statistical Significance of the Regression Model
Evaluate significance with ANOVA where the F test is testing the overall significance of the model
3 Sums of Squares calculations (& df's) are needed:
Total SS (df = N-1)
Regression or "Model" SS (df=1)
Residual/Error SS (df = N-2)
In JMP:
Fit "Y by X"
Input variables
Select "Fit Line" at the red caret to get the "Linear Fit" results box
Results are in "Analysis of Variance" under Prob > F (significance of Model)
The F-test tells us we have a very low probability of committing a Type I error and that python heart rate does vary linearly with temperature
How good is the model?
Big F tells us that it's good
Calculate % of how much variation is within the model (Model/C. Total)
The Coefficient of Determination (r^2)
Tells us what % of the variability in the dependent variable (y) is explained by the independent variable (x)
r^2 = SSmodel/SStotal
93.9% of the variability observed in example
Using the Model: Predicting y from x
y=2.14+1.77x
Heart Rate=2.14+1.77(Temp.)
Plug in values of x and solve for y
Can predict heart rate at temps that were not tested (e.g., 5 and 9ºC)
Model can be used by other researchers
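A simple linear regression takes one call in scipy's linregress (the temperature/heart-rate values below are illustrative, in the same 2-18ºC range as the example):

```python
import numpy as np
from scipy.stats import linregress

# Hypothetical temperature (x) / heart rate (y) pairs, n = 9
temp = np.array([2, 4, 6, 8, 10, 12, 14, 16, 18])
hr = np.array([5, 11, 11, 14, 22, 23, 32, 29, 32])

fit = linregress(temp, hr)
# slope (b), y-intercept (a), and coefficient of determination r^2
print(round(fit.slope, 2), round(fit.intercept, 2), round(fit.rvalue ** 2, 3))

# Predict heart rate at an untested temperature, e.g. 9 C: y-hat = a + b*x
hr_at_9 = fit.intercept + fit.slope * 9
print(round(hr_at_9, 1))
```

Note rvalue**2 is the r² described below: the proportion of variability in y explained by x.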
Assumptions of Model 1 Simple Linear Regression
The IV is usually fixed but can be random
The y observations are independent
The functional relationship is linear
Residuals are normally distributed
Variances are equal
After you've built the model, fit a line, and are looking at the JMP output:
Click on the red triangle beside "Linear Fit" (Use graph in analysis)
To check normality - click on "save residuals"
To assess variances - click on "plot residuals"
If non-normal or variances are violated, transform both variables
Regression or ANOVA?
Age as a continuous variable
Data from basically 13 levels of the IV (Age)
"Replicated" regression is best
Relationship between y and all x's within the range of values tested is quantified
Problem Background:
A researcher is interested in site-specific differences in body size among populations of rattlesnakes. Why an interest in body size, well, reproductive traits in animals (e.g., number and size of offspring) often vary with body size. Populations may vary in body size due to differences in resource availability, resource quality, size-specific predation, population density, etc. Most importantly, size of rattlesnakes may vary with age.
How can the researcher examine for geographic differences in body size between two populations knowing that size will also vary due to differences in age?
Y-variable is body size (Continuous discrete)
Location (Categorical)
X-Variable is Age (Continuous discrete)
ANCOVA: Analysis of Covariance, 2 X-variables that are Continuous & Categorical
ANCOVA Requirements
1 Dependent Variable (Continuous)
1 IV (Categorical)
1 Covariate (Continuous)
Covariate
Variable that is related to the DV, which you can't manipulate, but you want to account for its relationship with the DV
Increased sensitivity of tests of main effects and interactions since usage of a covariate will result in a reduction of error variance
ANCOVA Assumptions
Residuals are normally distributed and variances are homogenous
Linearity - significant linear relationship between covariate and DV
Since covariate is used as a linear predictor of the DV yet it is not a fixed effect, the covariate is assumed to be measured without any error
Homogeneity of regressions (i.e. no significant interaction of GroupXCovariate)
ANCOVA In JMP:
Use the "Fit Model" Platform
Model should contain:
IV (Drug)
Covariate (X)
Interaction Term (Drug*X)
Look if Covariate is significant. (Example is significant)
Look at Interaction Term (Example is NOT significant, lines are statistically parallel)
Look at IV (Example is not significant, no drug effect)
Adjusted Means
When using ANCOVA, the means for each group get adjusted by the Covariate-Dependent Variable relationship
If the Covariate has a significant relationship with the Dependent Variable then comparisons are made on the adjusted means
When doing ANCOVA, you should graph/report adjusted means
ANCOVA - A Biological Example
Fish inhabiting caves often have small eyes relative to fish living in surface streams, which is thought to be an example of adaptation to life in a cave environment. The Banded Sculpin is a common fish found in surface streams of North America, but it can also sometimes be found living in caves. A cave population of Banded Sculpin in Missouri is showing signs of cave adaptation similar to "true" cavefishes. Researchers are interested in whether or not sculpin in surface streams have different eye size relative to sculpin living in caves. Eye size of individual sculpin may also vary with total length of the fish
Construct the model
DV: Y-variable: (Eye size)
IV: (Location)
Covariate: X-variable (Total length)
Interaction Term: Location*Total length
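The sculpin ANCOVA model can be sketched in Python with statsmodels; fit the full model with the interaction first, since a non-significant Location x Length term is the homogeneity-of-regressions check (simulated, illustrative data):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Hypothetical sculpin data: eye size scales with total length (covariate),
# with a possible cave/surface offset (all values simulated for illustration)
rng = np.random.default_rng(3)
n = 15
length_cave = rng.uniform(60, 120, n)
length_surf = rng.uniform(60, 120, n)
eye_cave = 0.5 + 0.04 * length_cave + rng.normal(0, 0.2, n)
eye_surf = 1.0 + 0.04 * length_surf + rng.normal(0, 0.2, n)

df = pd.DataFrame({
    "location": ["cave"] * n + ["surface"] * n,
    "length": np.concatenate([length_cave, length_surf]),
    "eye": np.concatenate([eye_cave, eye_surf]),
})

# Full model: IV, covariate, and the GroupXCovariate interaction
full = smf.ols("eye ~ C(location) * length", data=df).fit()
table = anova_lm(full, typ=2)
print(table)
```

If the interaction line is non-significant (statistically parallel slopes), refit without it and compare the groups on their covariate-adjusted means.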
Advanced Regression Techniques
Multiple regression (>1 IV)
Analysis of Covariance (ANCOVA)-mixture of regression & ANOVA
Analysis of Frequencies
Interest in the frequency that an event occurs...
How does an observed outcome compare to an expected outcome or distribution?
Is the sex ratio (M:F) in a population of box turtles the expected 1:1 ratio?
Do the frequencies of observed phenotypes conform to the expected 3:1 ratio?
Do mountain lions eat equal amounts of white-tailed deer and mule deer?
Goodness of Fit?
Data Characteristics
"Count data" - discrete number of observations
1 variable with categories or "bins"
2 or more categories/bins
Multiple independent observations within categories (5 minimum; >10 recommended)
Chi-Square Goodness of Fit Test
Quarter tossing
Probability of Heads? Tails?
Is observed significantly different from expected? Is the disparity due to random chance?
OR is the deviation not due to random chance?
JMP does test for us, but simple to calculate by hand
We can test to see if our observed frequencies "Fit" our expectations
This is the chi-squared Goodness-of-Fit test
Converts the differences between observed and expected frequencies into a test statistic
Nonparametric test (no distributional assumptions)
Data are frequencies (counts)
Observations are independent
Categories have large enough expected frequencies. When there are 4 or fewer categories, none of the expected frequencies should be less than 5
Conducting Chi-Square Analysis: Goodness of Fit IN JMP
2 variables, so 2 columns
Frequency/Count column & IV column
"Analyze" -> "Distributions"
Input IV into Y column spot
Input Frequency/Count data into Frequency spot
JMP spits out percentages
Click the caret next to the IV and click "Test Probabilities"
Input expected/hypothesized probabilities
Should add to 1.0
Run the test
ONLY report Pearson test results. ChiSquare value, df, & p-value
Report as: (Chi-Square test, ChiSquare value, df, p-value)
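Outside JMP, the goodness-of-fit test is one scipy call (quarter-tossing-style counts, illustrative):

```python
from scipy.stats import chisquare

# Quarter tossing: observed heads/tails vs. the expected 1:1 ratio
observed = [45, 55]
expected = [50, 50]

chi2, p = chisquare(f_obs=observed, f_exp=expected)
print(round(chi2, 2), round(p, 4))  # chi2 = 1.0 with df = k - 1 = 1
```

Here the deviation from 1:1 is easily attributable to random chance (p ≈ 0.32), so observed frequencies "fit" the expectation.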
Chi-Square Test for Association or Independence
Are two categorical/nominal variables related/associated
Same data type and assumptions as Goodness of Fit Tests
Calculations are similar, except the expected frequencies come from the marginal totals rather than a hypothesized ratio
Contingency Table
Example
H0: Age 0 male & female Ohio shrimp captures at McCallie Access, Mississippi River are not associated (do not differ) with month
HA: Captures of male and female Age 0 Ohio shrimp are related to month
Variable A: June, July, August
Variable B: Male Age 0, Female Age 0
Fit Y by X
Y: Sex
X: Month
Frequency: Count
JMP gives "Contingency Table" and resulting Chi-Square values (look at Pearson results)
Can apply Bonferroni adjusted alpha level, but not necessary if risk of family-wise error is low
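A chi-square test of association can be sketched the same way. The shrimp-by-month counts below are made up for illustration; scipy's chi2_contingency derives the expected frequencies from the row and column totals:

```python
from scipy.stats import chi2_contingency

# rows = months (June, July, August); columns = sex (male, female)
# counts are hypothetical, for illustration only
table = [[30, 20],
         [25, 25],
         [10, 40]]

# dof = (rows - 1) x (columns - 1) = 2; expected comes from the marginal totals
chi2, p, dof, expected = chi2_contingency(table)
```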
If concerned about "Procedural Effects": Implement a Control(s)
Negative control - test subjects/sample units that receive all "procedures" except the experimental treatment/manipulation (saline, sugar pill, non-restored study site, etc.)
Positive Control - test subjects/sample units receive all procedures except the experimental treatment/manipulation but you expect a known outcome from this group. This provides a group to compare with that controls for unknown sources of nuisance. (aspirin-headache example-give a group a drug known to deal with headaches)
"Controls" in Mensurative Experiments? Don't necessarily have a control for procedural effects, but have a good comparison across groups. Known benchmark
Examples of variable types and their scales of measurement
Attribute - Nominal - Sex of snake: male or female
Ranks - Ordinal - Pigmentation levels: going from no pigmentation to full pigmentation
Discrete Measurement - Ratio - Number of points on deer antlers
Continuous Measurement - Interval/Ratio - Body temperature (ºCelsius; interval), Weight of a warthog (kg; ratio)
Converting Data from One Scale to Another Example
Continuous variable (e.g., tree height) measured on a continuous scale ----> Convert to ranked data on an ordinal scale
Continuous in cm --> Ordinal in ranks
100 --> 6
500 --> 5
525 --> 4
1000 --> 1
642 --> 3
701 --> 2
10 --> 7
Continuous works with the mean while converting to ordinal works with the median
Distances between data points have not been retained, but the data become easier to work with
Reduction in variation between data may allow better testing
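The conversion above (rank 1 = tallest tree) can be reproduced with scipy's rankdata, using the heights from the table:

```python
from scipy.stats import rankdata

heights_cm = [100, 500, 525, 1000, 642, 701, 10]

# rankdata assigns rank 1 to the smallest value, so negate the
# heights to give rank 1 to the tallest, as in the table above
ranks = [int(r) for r in rankdata([-h for h in heights_cm])]
```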
Descriptive Statistics - a way to summarize and organize data
Measures of Central Tendency
location of sample along the measurement scale
what is the location of the "typical" individual?
Arithmetic Mean
μ = population or universal mean (the true mean)
x̄ = sample mean (an estimate of the true mean)
Geometric Mean
antilog of arithmetic mean of log-transformed data
Median (M)
middle value of a ranked data set; most appropriate when data are highly skewed or you're dealing with data on an ordinal scale
Mode
the value that occurs most frequently; number of modes can be useful
Skewness
measure of symmetry
0 = symmetrical (normal distribution)
> 0 = tail to the right (positive skew)
< 0 = tail to the left (negative skew)
Measures of Dispersion and Variability
the distribution or spread of measurements
Range
difference between largest & smallest observation
usually given as minimum & maximum value
Variance
σ² = population variance
s² = sample variance
mean of squared deviations of measurements from their mean; typically not reported since it is in different (squared) units from the original data, but used to calculate many statistical tests
cannot be negative
increases as dispersion or variability increases
(n-1) = degrees of freedom (df); real units of information about deviation from the average
Standard Deviation (sd)
s = square root of variance (s²)
Coefficient of Variation (CV): CV = (sd/mean) x 100%
a measure of relative variability
has no units so is useful to compare sets of data collected on different scales
(e.g., morphological data in mm and m; T to Dissolved Oxygen (DO))
most applicable to ratio scale data
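The descriptive statistics above (mean, sample SD, SEM, CV) can be computed with Python's standard library; the data values here are hypothetical:

```python
import math
import statistics

data = [4.1, 5.0, 5.2, 6.3, 4.9]   # hypothetical measurements

n = len(data)
mean = statistics.mean(data)
sd = statistics.stdev(data)        # sample sd; divides by n - 1 (df)
sem = sd / math.sqrt(n)            # standard error of the mean
cv = (sd / mean) * 100             # coefficient of variation, unitless (%)
```

CV's lack of units is what lets you compare variability across data collected on different scales.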
Indices of Diversity: distribution of observations among categories
Types of Distributions
Many statistical tests are based on assumptions that the data adhere to the properties of a given distribution
Discrete Distributions
Poisson distribution
items distributed randomly (independently)
Binomial distribution
two possible outcomes w/ a fixed prob. of occurrence on each trial
Continuous Distributions
Normal distribution
symmetrical; bell-shaped curve
t-distribution
symmetrical; related to normal distribution
Chi-square distribution
asymmetrical
Normal Distribution
symmetrical, continuous distribution
described by the mean and standard deviation (estimated by sample mean & sd)
most values lie in proximity of the mean
random samples of a given n from a normal population will be normally distributed
Central Limit Theorem: at some large n even means of samples from a non-normal population will approach normality (even means from Poisson & Binomial distr.)
normal distribution is the basis of many statistical tests
Are Sample Data Normally Distributed?
Despite CLT, sample data may not be normally distributed due to small n or, more typically, for unknown reasons
Will want to check sample data to see if it is approximately normally distributed
"Goodness-of-fit Tests": (not really recommended)
Kolmogorov-Smirnov goodness-of-fit test
Chi-square goodness-of-fit test
Normal Quantile Plot
If data are perfectly normal they will lie along a straight line, inside the LCI's
Lilliefors Confidence Intervals
used to test for normality in a graphical way; if points fall outside the CI (confidence intervals) then data are significantly different from normal at alpha = 0.05
Shapiro-Wilk test
What null hypothesis is it actually testing?
The distribution of the sample data is equal to the normal distribution
After Normal Quantile Plot, click continuous then normal
Go down to data and click under the Fitted Normal option triangle, then choose Goodness-of-Fit
Then JMP gives you the Probability so you can choose to reject or fail to reject
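Outside JMP, the same Shapiro-Wilk test is available in scipy; the sample below is hypothetical. The null hypothesis is that the sample came from a normal distribution, so a P-value above alpha means you fail to reject normality:

```python
from scipy.stats import shapiro

# hypothetical sample measurements
sample = [4.2, 5.1, 4.8, 5.5, 4.9, 5.0, 4.6, 5.3, 4.7, 5.2]

stat, p = shapiro(sample)   # stat is the W statistic (0 < W <= 1)
looks_normal = p > 0.05     # fail to reject H0 at alpha = 0.05
```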
If not normally distributed...
ignore it?
Inflates the chance of committing a Type I error
When you reject the null when the null is actually true
transform raw data to "fix it" & resume test?
Doesn't provide much evidence that it fixed it
choose a nonparametric equivalent?
Usually the parametric tool has more statistical power than the nonparametric equivalent
Statistical Testing and Probability
Probability is the likelihood of an event
Statistical tests provide a P-value: the probability of obtaining data at least this extreme if the null H0 were true
At low P-values the null is rejected and the alternate is accepted
The lower the P-value, the more confident you are that the null is false
What is a "low" P-value?
Researchers arbitrarily set the probability used as the criterion for rejection of the null
This value is called the significance level or α (alpha)
Convention is to apply an alpha level of 0.050
If α = 0.05 then at P-values less than 0.05 you reject the null HO (i.e. means are "significantly different")
Statistical Errors in Hypothesis Testing (Zar Section 6.3)
In reality, the null hypothesis is either true or false
Because inferences are made from samples, there is always the possibility of making the wrong inference
2 ways of making the wrong inference:
Type 1 Error
Rejecting the null hypothesis when in fact the null is true; a "false positive"; you determine the means are significantly different when in fact they are not (We must control this error rate)
α error
Type 2 Error
Not rejecting the null when in fact the null is false; you determine the means are not significantly different when they really are (Considered to be a "less dangerous" error)
Designing experiments that give you the best chance possible to reject the null when it is in fact false is the best way to avoid Type 2 Error
β error
Insert probability notes here
Prospective Power Analysis
performed during planning stages of a study to explore how changes in study design (e.g., n, alpha, and effect size) impact objectives/goals of the study including interpretations of statistical tests & potential outcomes
Common applications of Prospective Power Analysis are:
to determine n required to attain a desired level of power at a specified minimum effect size, alpha-level, and standard deviation
to determine power of a test when n is constrained logistically (perhaps you then need to adjust alpha if power is too low)
to determine the minimum detectable meaningful effect size (the question here is what is a biologically meaningful difference)
JMP gives you big N (the total sample size); divide big N by the number of groups to obtain little n, the sample size per group
Can increase alpha level to increase power
Can increase effect size to increase power
How to perform Power Analysis in JMP:
DOE > Design Diagnostics > Sample Size & Power
Depending on scenario, choose
e.g., 2-sample means
Not messing around with Extra Parameters yet
Can change alpha level, Std. Dev = Dispersion, difference to detect = effect size
Std.Dev & Effect Size need to be in same units
Leave sample size & power blank
Click Continue & you will get a curve
Sample size on graph is always in Big N
Not using to get an exact # of sample size, power analysis is a guide
Increasing alpha level increases power, which will decrease sample size
Increasing Std.Dev (Dispersion across the Dependent Variable) increases sample size
Decreasing effect size increases sample size
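JMP's Sample Size & Power platform does this calculation for you, but the core relationships above can be sketched with the standard normal-approximation formula. This is a sketch, not JMP's exact algorithm (JMP iterates on the t distribution, so its n comes out slightly larger):

```python
import math
from scipy.stats import norm

def n_per_group(effect, sd, alpha=0.05, power=0.80):
    """Approximate per-group n for a 2-sample comparison of means.

    Normal approximation: n = 2 * ((z_alpha/2 + z_beta) * sd / effect)^2.
    effect and sd must be in the same units.
    """
    z_alpha = norm.ppf(1 - alpha / 2)   # two-tailed critical z
    z_beta = norm.ppf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sd / effect) ** 2)

# As in the notes: raising alpha or the effect size lowers the required n;
# raising the dispersion (sd) raises it.
```

For example, with effect = 15, sd = 12, and alpha = 0.01 this gives 15 per group; loosening alpha to 0.05 drops it to 11.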
e.g., Clinical Research - Experiments investigating treatment of tumors
Will Drug A reduce the size of brain tumors?
Minimum Effect Size that is Biologically Relevant?
Using a minimum of at least 50%
Decided alpha level of 0.01, defended because really want to make sure Drug works
Need an idea of size of wild-type tumor to get 50% into units
Need some measure of Dispersion
Collect some data by giving study specimens the drug
Has a good idea that wild-type tumor size will be about 30 cubic millimeters
Effect size will be 15 cubic millimeters since using minimum of 50%
OR look within the literature to see if somebody has done similar things
If Dispersions are different, pick the bigger one
Using 12 cubic millimeters for Dispersion based on the literature
Big N shows 36 at Power of 0.8
Based on biological ethics will only give 10 mice cancer, so N=20
Doesn't give us a good idea of if Drug will work or not
Increasing alpha level to 0.05, makes our N=20 look a lot better
Standard Error of the Mean (SEM; SE)
SE is the standard deviation of measurements around a set of means repeatedly calculated from a statistical population or universe
SE is a measure of the precision of x̄ as an estimate of μ
as SE gets smaller, the precision of x̄ increases
SE = s / √n (sample standard deviation divided by the square root of n)
incorporates sd & n, two factors that will impact reliability
SD - a measure of the dispersion or spread of the sample data
SE - a measure of the sampling error or uncertainty in the sample mean as an estimate of the population mean
Confidence Intervals & the Student's t distribution (will never test over confidence intervals)
The t distribution
Family of distributions related to the normal distribution; shape depends on degrees of freedom
Reporting Rules and Conventions
Zar 2010 (Section 7.4 page 108)
"No widely accepted convention", but the measure of dispersion must be clearly stated
n should be stated somewhere
As Text in manuscript: (mean = 27.4 g +/- 2.80 SD) or SE or 95% CI
In a Table or Figure
Two-Sample Hypotheses
Do differences exist b/w two samples; i.e. are the two samples from two different statistical populations?
A number of types of comparisons:
means
medians
variances
CV
indices of diversity
We will explore 2-sample comparisons involving means:
comparison of independent samples
nonparametric tests of independent samples
comparison of paired samples
Comparison of Two Independent Samples
For Example: You measure hematocrit in two groups of 17 year olds, males (n=600) and females (n=600)
Is hematocrit different between groups?
Males - 45.8 +/- 2.8 SD
Females - 40.6 +/- 2.9 SD
What are the independent and dependent variables?
Independent: Sex (male or female)
Dependent: Hematocrit values
What would the data model types be in JMP?
Two columns, sex and hematocrit values
Independence of "Samples" or sample units?
Each individual person should be a sample unit
Can take multiple measurements, just make sure to average before inputting into chart
Pseudoreplication?
Analyzing the data as if you have more independent replicates than you actually do
A 2-tailed test always has the same null hypothesis:
That the means are equal
Comparing Means from Two Independent Samples
Under the following experimental conditions:
1 Dependent Variable that is continuous
1 Independent Variable that is nominal (grouping/categorical) with 2 levels/groups/categories
Apply the following test if assumptions hold:
Student's t-test
Prob > |t| = 2-tailed (either direction)
Prob > t = 1-tailed to the right
Prob < t = 1-tailed to the left
Always double check degrees of freedom
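As a sketch of the same comparison outside JMP (the numbers below are hypothetical, not the hematocrit data), scipy gives both the Student's and Welch's forms:

```python
from scipy import stats

group_a = [1, 2, 3, 4, 5]   # hypothetical measurements
group_b = [3, 4, 5, 6, 7]

# Student's t-test (assumes equal variances); df = n1 + n2 - 2 = 8
t_stat, p_two_tailed = stats.ttest_ind(group_a, group_b)

# Welch's form (equal_var=False) corrects for unequal variances
t_welch, p_welch = stats.ttest_ind(group_a, group_b, equal_var=False)
```

ttest_ind reports the two-tailed P by default; halve it (in the predicted direction) for a one-tailed test.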
Writing a Concise, Publication-Quality Interpretation of Results of a Statistical Test: "Manuscript Style" (Make it OBVIOUS)
Be direct, say what you mean, mean what you say
Don't just say means were different, or you rejected the null or that you detected a significant difference. State the direction of the effect (e.g., high/low, etc.)
Provide the statistical test, df (or n), test statistic (only go to 2 decimal places), and P-value (only go to 3 decimal places).
This is typically put in parentheses following the sentence
People given drug G had significantly longer blood clotting times than people given drug B (Student's t-test, df = 11, t = -2.47, P = 0.031) (Figure 1)
Assumptions of the Two-Sample t-test
Both samples were taken randomly; i.e., sample units are independent of each other & unbiased
The dependent variable is normally distributed (or ~ normal)
(combination of frequency distribution, normal quantile plot, and S-W test)
Variances of the two groups are equal (or almost equal)
(eyeball SD - 2-fold difference?, variance tests, e.g., Levene's Test)
What is the risk?
You risk elevating your Type I error rate above the stated alpha level
Violations are more serious if sample sizes are small (<30; Zar 2010), you are doing a 1-tailed test, or your sample sizes are severely unbalanced
If all assumptions are confirmed = Run of the mill t-test
If Assumptions are severely violated, what do you do?
If normality (normal distribution) is off, apply a data transformation, if "corrected", proceed w/ the t-test using the transformed data set
Report testing results from transformed analysis, but usually report sample means and dispersion on the original scale in text and/or in graphs/tables
If normality is fine but the variances are violated, you can conduct a t-test that has been "corrected for" unequal variances
Welch's t-test
Zar pg. 138
If normality cannot be "fixed"...
Conduct the Mann-Whitney U test which is a nonparametric equivalent of the t-test
JMP calculates the Wilcoxon Test which is equivalent to the Mann-Whitney test
Zar pg. 146
Wilcoxon test, Mann-Whitney U test, or the Wilcoxon-Mann-Whitney test
Nonparametric equivalent of the t-test for independent samples
Nonparametric test
Distribution-free test, where no or few assumptions are made about the shape of a distribution
Does not focus on any specific parameter such as the mean
Test specifics:
Used to test 2 groups
Assumes nothing about the underlying distribution or homogeneity of variances
H0: population distribution of sample 1 = sample 2 (test of medians)
Calculates test statistic based on ranks (position) of the raw data
Good when data set has extreme values in it, but has lower power than t-test, unless assumptions are severely violated
Why not just always apply a nonparametric test with every data set?
Usually has less power
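A minimal scipy sketch of the Mann-Whitney U test, with tiny hypothetical samples and no ties:

```python
from scipy.stats import mannwhitneyu

sample_1 = [1, 2, 3]   # hypothetical data
sample_2 = [4, 5, 6]

# H0: the two samples come from the same population distribution
u_stat, p = mannwhitneyu(sample_1, sample_2, alternative='two-sided')
# With every value in sample_1 below every value in sample_2, U = 0:
# the most extreme ranking possible for these sample sizes.
```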
Welch's test
A special derivation of the t-test
Used when normality is correct, but variances are not equal
Can be identified when df are not a whole number
e.g. df on normal t-test are 11, on Welch's they may be 10.70
Comparison of Paired Samples (Non-Independent)
In contrast to independent samples, in a "paired" design, sample units are linked or correlated in some way with a member in the other group(s)
i.e. members of a pair have more in common than with members of another pair
This dependency is planned or by design. However, you make a critical mistake if you apply the wrong inferential test
Paired t-test
Wilcoxon paired-sample test/Wilcoxon signed rank test (nonparametric equivalent)
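Both paired tests are available in scipy; the before/after values below are hypothetical:

```python
from scipy import stats

before = [10, 12, 14, 16]   # hypothetical paired measurements
after = [12, 13, 18, 19]    # same subjects, re-measured

# paired t-test: analyzes the within-pair differences
t_stat, p = stats.ttest_rel(before, after)

# Wilcoxon signed-rank test: the nonparametric equivalent
w_stat, p_wilcoxon = stats.wilcoxon(before, after)
```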
What if you conduct test assuming independence?
Catastrophic failure
What happens to Type I error?
Increases if the group with the highest variance has the smallest sample size
Decreases if the group with the highest variance has the largest sample size
Family-wise error inflation occurs when conducting multiple comparisons on the same data at the same alpha level; it raises the Type I error rate
Multisample Hypotheses
H0: u1 = u2 = u3 ...
Design:
1 Dependent Variable (continuous)
1 Independent Variable (categorical/nominal): 3 or more levels
Why not conduct a series of t-tests?
Type I Error is inflated beyond your stated alpha
Type I errors accumulate with each statistical test conducted on the same data set
Experimentwise or familywise error rate - must be controlled
Analysis of Variance (ANOVA)
The ANOVA family of tests are the most commonly applied statistical tests
Inferences about means are made by analyzing variability in the data
One model is constructed that includes all means simultaneously; therefore, it controls for familywise error
F-value (F-ratio) is the test statistic (Sir Ronald Aylmer Fisher 1918)
A factor/treatment is an independent variable whose values are controlled and varied by the experimenter (e.g., drug type)
Are categorical/nominal variables
A level is a specific value of a factor (e.g., drug A, drug B, drug C)
Analyzes and partitions sources of variation in a dataset
2 kinds of variability
Between sample means (among groups)
Within groups
Total variability comprises within group variability and variability between groups
Between Group Variability
Treatment effects
Group
Within Group Variability
Individual differences
Errors of measurement
Error
F = Group/Error
As between-group variation grows relative to within-group variation, the F-value gets bigger; a higher F-value is stronger evidence of a treatment effect
In ANOVA, the total variation in the response measurements is divided into portions that may be attributed to various factors (e.g., amount of variation due to Drug A and amount due to Drug B). Which factor(s) or combination of factors account for significant amounts of the total variation?
Partitioning of the variance within the data set
If a factor/treatment represents a lot of the total variability relative to variability within groups (error) then it is an important “player”
Example: Sandwich types. On One-Way ANOVA Powerpoint
ANOVA-F distribution is the underlying distribution
F = (Between Group Variability/Within Group Variability)=(MS-group/MS-error)
MS stands for Mean Square
Under H0, F ≈ 1 -> No treatment effects (sample means are drawn from same population). (No Sandwich effect)
Large F -> Means are different (sample means are from different populations). (There is a Sandwich effect)
Summary of Logic
Calculate two estimates of the population variance. MS-error, based on variability within groups, is independent of H0.
Calculations for the ANOVA
In order to calculate MS-groups and MS-error we must first calculate the appropriate sums of squares (SS)
SS-total
Represents sum of squared deviations of all observations from the grand mean
SS-total = SS-group + SS-error
SS-group
Sum of squared deviations of group means from the grand mean. In effect, a measure of differences between groups
SS-group = Σ nᵢ(x̄ᵢ - x̄-grand)²
SS-error
Sum of squared deviations within each group. Usually obtained by subtraction
SS-error = SS-total - SS-group
Degrees of Freedom
In order to calculate MS-group and MS-error we need to know the degrees of freedom associated with SS-group and SS-error
df-total = N - 1 (where N is total number of observations)
df-group = k - 1 (where k is the number of groups)
df-error = df-total - df-group
MS-group = (SS-group/df-group)
MS-error = (SS-error/df-error)
F-value
Having calculated MSgroup and MSerror we can now calculate F
F = MS-group / MS-error
Between-groups estimate of the population variance much larger than the within-groups estimate -> F-value greater than 1
How much larger than 1.0 must the value of F be to decide that there are differences among the means?
Use tables of the F distribution, Zar Table B4, Appendix 21.
Gives critical values of F corresponding to the degrees of freedom for the two mean squares (dfgroup and dferror).
dfgroup = numerator df (2)
dferror = denominator df (12)
From tables (alpha = 0.05): F-crit = 5.10; obtained F(2,12) = 8.45
Because Fobt > Fcrit we can reject Ho and conclude that the groups were sampled from populations with different means. There is an effect of Sandwich Type
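The whole partition can be sketched in Python with three hypothetical groups of n = 5 (so df-group = 2 and df-error = 12, matching the table lookup above); scipy's f_oneway should agree with the hand calculation:

```python
from scipy import stats

# three hypothetical groups, n = 5 each
g1 = [1, 2, 3, 4, 5]
g2 = [3, 4, 5, 6, 7]
g3 = [5, 6, 7, 8, 9]
groups = [g1, g2, g3]

all_obs = [x for g in groups for x in g]
grand_mean = sum(all_obs) / len(all_obs)

# SS-group: squared deviations of group means from the grand mean
ss_group = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
# SS-error: squared deviations of observations from their own group mean
ss_error = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

df_group = len(groups) - 1               # k - 1
df_error = len(all_obs) - len(groups)    # N - k
ms_group = ss_group / df_group
ms_error = ss_error / df_error
f_by_hand = ms_group / ms_error

# scipy's one-way ANOVA should give the same F
f_stat, p = stats.f_oneway(g1, g2, g3)
```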
The ANOVA Table (HAVE TO BE ABLE TO BUILD FOR MIDTERM)
Source ----> SS ----> df ----> MS ----> F ----> P-value
Group
Error
JMP Analysis of One-Way ANOVA
Fit Model
Do NOT go into Fit Y by X and conduct the One-Way ANOVA !!!
Dependent variable goes into Y box
Add independent variables into model effects (bottom) box
JMP has different columns for every Independent Variable
P-value in JMP under Analysis of Variance = Prob > F
Capture Analysis of Variance and Effect Test boxes to give evaluation of the Omnibus Hypothesis
What is a Residual?
Distance between the Observed Y and the Predicted Y on a Y by X chart with line of fit
How to graph Residuals in JMP
1 variable is Nominal, 1 variable is Continuous
Fit Model & run ANOVA
Click on Response carat on top left and Click Save Columns
Click Residuals, then puts it into spreadsheet
Testing assumptions using Residuals in JMP
Analyze distribution, add residuals into columns, check distributions, quantile plots, Shapiro-Wilk test
Shows us distribution and assumptions of the data
If non-normal, transform the RAW DATA and NOT the residuals
Then re-find the residuals using the transformed data and THEN test assumptions
Run the ANOVA and click Row Diagnostics
If the Residual by Predicted Plot is not shown by default, add it using Row Diagnostics
Assumptions of ANOVA (Step 1)
Observations/sample units are independent of each other. (i.e. no systematic biases within the data set). Best achieved via random sampling
The data are normally distributed, better yet, the residuals are normally distributed
Save residuals to the data spreadsheet
Examine frequency distribution, normal quantile plot, and Shapiro-Wilk of residuals & interpret
Homogeneity of variance (i.e. the variances among groups are equal)
Examine the plot of residuals (residual by predicted plot) vs the predicted values. Are the points equally scattered for each group
Pig mass varied significantly with type of food (One-way ANOVA, F(numerator df (model), denominator df (error)) = value, P = value). The next sentence or two would add biological context / supporting statistics to build the answer. (Slide 22)
Assessing Normality within the ANOVA framework
In JMP:
Fit the ANOVA model
Go to "Save Columns"
Click on "Residuals"
Residuals should be in the data spreadsheet
A Significant Overall F... What's next?
Significant overall F-test does not indicate that all group means are different from each other
Don't know how many means are different, nor which means are different
Due to experiment wise error inflation, you cannot proceed with a series of run-of-the-mill t-tests
The proper statistical approach is to employ a multiple comparison test (i.e. post hoc testing)
Do NOT go through with this if you do not reject the Omnibus hypothesis in Step 1
What if the overall F-test is not significant?
You cannot proceed
Multiple Comparison Tests - (Parametric) (Step 2)
Also known as post hoc or a posteriori tests
Many different ones
Tukey test, Student Newman-Keuls test, Duncan test, LSD test, Scheffe's test, Fisher test, Bonferroni adjusted t-tests (a special case)
Their application is debated in the literature and there is no absolute agreement on the best to use
Although, Tukey & SNK are the most commonly employed and, therefore, accepted techniques
They operate under the same assumptions as ANOVA and must follow a significant F test
Post hoc testing usually involves testing of all possible combinations of means, even comparisons you might not be interested in
This is why post hoc testing is often referred to as the testing of "unplanned comparisons"
Post hoc tests have built-in procedures that correct for experiment wise error and its influence on Type I error inflation
Each one differs slightly in how conservative it is
Post hoc Testing: The Tukey HSD Test
Rank the means in ascending order
First, compare largest to smallest, then largest to next smallest, etc.
You will need to use the table of critical values of the q distribution on Zar pg. 723
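scipy also provides a Tukey HSD procedure (scipy.stats.tukey_hsd); a sketch with three clearly separated hypothetical groups:

```python
from scipy.stats import tukey_hsd

# hypothetical groups, well separated
g1 = [1, 2, 3, 4, 5]
g2 = [10, 11, 12, 13, 14]
g3 = [20, 21, 22, 23, 24]

res = tukey_hsd(g1, g2, g3)
# res.pvalue is a k x k matrix of pairwise comparison p-values;
# with groups this far apart, every pairwise comparison is significant
```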
Post hoc Testing: The Student Newman-Keuls test (Never have to calculate in this class)
SNK is conservative enough (i.e. it controls experimentwise error) but it has more power than the Tukey test; SNK calculations are very similar to Tukey
"Multiple Range Test". The "family" of comparisons changes
Graphical Display of post hoc Results
Put explanation of notation in figure caption along with results of F-test
Start by assigning A to the mean(s) of highest magnitude
Shared letters indicate means were not significantly different
Strontium concentrations varied significantly (F4,25 = 56.2, P < 0.001) across water bodies, and concentrations were highest in Rock River, moderate in Angler’s Cove, Appletree Lake, and Beaver Pond and lowest in Grayson’s Pond (SNK) (Figure 1).
Tukey vs. SNK
Both tests adequately control experiment wise error rate and are appropriate post hoc tests when multiple comparisons are desirable following a significant F test
Both can be applied at a specified alpha level (e.g., 0.05)
Both are better approaches than multiple t-tests
Tukey will result in fewer Type 1 errors than SNK
SNK has more power than Tukey (>3 means)
I apply Tukey when I want a more conservative test and SNK when the research is more exploratory
Tukey is probably more commonly used by Biologists; SNK common in Psychology (Zar recommends Tukey)
The Bonferroni Method: An Additional Way to Control Experimentwise Error
The Bonferroni adjustment to alpha levels is commonly used to control experimentwise error in situations where multiple tests are applied (e.g. post hoc comparisons and multiple correlations)
α = 0.05 / # of comparisons
You can “start” with whatever alpha you want
For example: You want to conduct 5 tests (denominator) @ an initial stated alpha of 0.05 (numerator)
0.05/5 = 0.01
All 5 tests would actually each be conducted @ alpha = 0.01
5 comp. = 0.01, 8 comp. = 0.006, 12 comp. = 0.004
An acceptable approach to post hoc testing is to conduct multiple t-tests but with Bonferroni adjusted alpha levels for each comparison
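The adjustment is just division, matching the numbers above:

```python
stated_alpha = 0.05

# Bonferroni: each individual comparison is run at stated_alpha / m,
# where m is the number of comparisons
adjusted = {m: stated_alpha / m for m in (5, 8, 12)}
```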
Dunnett's Test (Control Group vs Other Groups Individually)
Accepted post hoc test provided in JMP for this special case
Multiple Comparison Study
We learned 3 techniques to control experimentwise error during post hoc testing:
Tukey Test
built-in adjustments/corrections such that you actually conduct the test at the stated alpha
SNK
built-in adjustments/corrections such that you actually conduct the test at the stated alpha
Bonferroni Method
Directly adjust stated alpha based on # of comparisons (You don't have to do all possible comparisons)
All three approaches are conservative relative to not controlling experimentwise error.
Bonferroni Method is ultraconservative, particularly at > 5 comparisons (0.01)
Basically, you pay a “penalty” when you test all possible, unplanned comparisons following a significant F test because these comparisons have been adjusted to control for experimentwise error
There is another option that circumvents being penalized, but you cannot make all pairwise comparisons
Nonparametric ANOVA: The Kruskal-Wallis test
Apply this nonparametric equivalent to One-way ANOVA when k>2
It is a distribution-free method that analyzes the ranks of the data
Sometimes called "ANOVA by ranks"
ANOVA is generally more powerful, but K-W provides an alternative when assumptions are not met and a transformation doesn't help
The test statistic is H
The K-W test is equivalent to the Omnibus F-test in ANOVA
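A minimal scipy sketch of the Kruskal-Wallis test with hypothetical data (k = 3 groups, no ties):

```python
from scipy.stats import kruskal

g1 = [1, 2, 3]   # hypothetical data
g2 = [4, 5, 6]
g3 = [7, 8, 9]

# H is computed from the ranks of the pooled data, not the raw values
h_stat, p = kruskal(g1, g2, g3)
```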
K-W in JMP
Use the Fit Y by X platform
Go to "nonparametric"
Choose "Wilcoxon"
Desired info is under 1-way Test, ChiSquare Approximation
Do NOT report ChiSquare value
DO report df, test statistic, and p-value
Post hoc testing Following significant K-W test
Dunn Method for Joint Ranking - (Zar pg. 240-241)
Preferred, more powerful method for nonparametric
Steel-Dwass procedure
Less power, doesn't work well when sample sizes are unequal
Wilcoxon all pairs (apply Bonferroni adjusted alpha)
Less power, highly conservative when groups 5 or more
In JMP
Fit Y by X
Run ANOVA
Click carat and choose "nonparametric"
Click "Nonparametric Multiple Comparisons"
Will show up under "Nonparametric Comparisons"
Report p-value and possibly Z-value
Planned Comparisons (Contrasts) vs Unplanned Comparisons
Typically, when you design an experiment with multiple levels of the independent variable, you have particular comparisons of interest in mind
Planned comparisons are stated a priori while unplanned comparisons are a posteriori, or "thought of after the data are collected"
You pay a price for conducting post hoc tests because they incorporate a correction for experimentwise (family wise) error
In contrast, planned comparisons (at least some special combinations) are made at the stated alpha, even if the omnibus F is not significant, within the ANOVA itself because they partition the SS-Model
Orthogonal Contrasts/Comparisons
Statistical Orthogonality
Usually in reference to groups or multiple independent variables
Non-overlapping, independent, not correlated
Assumption for planned contrasts & multiple regression modeling
Set of Planned Comparisons Must = Orthogonal Contrasts
If you want to conduct planned comparisons, you need to decide how many and which ones to make
To enjoy the luxury of testing multiple comparisons at the stated alpha, you must follow certain rules (i.e. the comparisons must be orthogonal)
A full set of orthogonal contrasts completely partition the SS-Model
Therefore, they represent independent pieces of information (i.e., this allows you to work at the originally stated alpha)
There are up to a-1 possible contrasts that can comprise a full orthogonal set (but you don't have to conduct all a-1 comparisons)
a = # of groups
Planned contrasts are 1 df comparisons & you cannot use more than the a-1 df
How do you establish a set of orthogonal contrasts?
Coding Planned Contrasts
Coding is achieved by assigning weights/coefficients to groups to indicate contrasts
Rules
Groups coded with positive weights will be compared to groups coded with negative weights
The sum of weights for a single comparison/contrast should be zero
Group(s) not involved in a specific comparison is/are given a zero
To be orthogonal, for each pair of contrasts, the sum of the products of their coefficients (multiplied group by group) must equal zero
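The coding rules can be verified programmatically. A minimal Python sketch, using a hypothetical full set of contrasts for a = 4 groups (the weights are invented for illustration):

```python
# Check a set of planned contrasts for orthogonality (hypothetical weights, a = 4 groups).
contrasts = [
    (1, 1, -1, -1),   # groups 1 & 2 vs. groups 3 & 4
    (1, -1, 0, 0),    # group 1 vs. group 2 (groups not involved get zero)
    (0, 0, 1, -1),    # group 3 vs. group 4
]

def weights_sum_to_zero(c):
    # Rule: the weights within a single contrast must sum to zero
    return sum(c) == 0

def orthogonal(c1, c2):
    # Rule: the sum of products of coefficients, group by group, must equal zero
    return sum(x * y for x, y in zip(c1, c2)) == 0

assert all(weights_sum_to_zero(c) for c in contrasts)
pairs = [(contrasts[i], contrasts[j])
         for i in range(len(contrasts)) for j in range(i + 1, len(contrasts))]
assert all(orthogonal(c1, c2) for c1, c2 in pairs)
print("Full orthogonal set of a - 1 =", len(contrasts), "contrasts")
```

Since all a-1 = 3 contrasts check out, this set fully partitions the SS-Model and each comparison can be tested at the stated alpha.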
In JMP
Build ANOVA
Find normal Sum of Squares and record
Click the caret where you'd normally run the Tukey test
Build the contrast table using positive and negative weights, adding a new column for each contrast
Click Done and check the SS; the combined contrast SS must be less than or equal to the model Sum of Squares
Planned Comparisons: Wrap-up
Incorporate planned comparisons if you can to avoid experimentwise error.
Can be a useful approach if grouping “groups” for comparisons is insightful and a goal of the research.
It’s best if each comparison represents a unique portion of the SSModel so that comparisons meet the orthogonal requirement.
You don’t have to perform all a-1 comparisons, but beware of “unexplained” blocks of variance in SSModel.
Don’t “force” orthogonality. In other words, if the planned comparisons of interest aren’t orthogonal, proceed but Bonferroni adjust the alpha levels for each comparison.
If all pairwise comparisons of groups are of interest, you should probably just proceed with Tukey or SNK (Season example)
Data Transformations (Zar Chapter 13)
To apply parametric statistics, the data set must meet (or approximate) the assumptions of normality, equality of variances, and that the magnitude of the variances doesn't increase with the magnitude of the means (nonadditivity)
If you judge that the data violate assumptions, then one option is to try to "correct" the data by applying a transformation
When applying a transformation you change the raw data to a different form or scale (e.g., ºF to ºC is a transformation)
After you perform a transformation, and judge the transformation "fixed" the data, conduct parametric tests on the transformed data set. Transform all data, not just one level of a variable!
Complications arise when reporting means and variability around the means following transformation. You should probably report the mean in the original scale. The most appropriate thing is to report the antilog of the transformed mean (the geometric mean). I don't see people doing this? How to report the variability is another issue ... Be sure you inform readers you analyzed transformed data!
Three Common Transformations
The Logarithmic Transformation
The Square Root Transformation
The Arcsine Transformation
The Logarithmic Transformation
X'=log10(X)
X'=log10(X+1): use when the data contain zeros (log of 0 is undefined) or to avoid negative transformed values
The log family of transformations are the most common
It is a variance-stabilizing transformation that will also address nonadditivity and non-normality if the data are right skewed
You can apply a log of any base, but log10 appears to be used the most
Beware of “log” “log10” “ln” – this is particularly important when reporting a model designed to predict y’s based on inputs of x
Always check your transformation with a calculator
The Square Root Transformation
X'=SqRt(X+0.5)
Variance-stabilizing transformation, particularly when variances increase as the means increase, also when the variances & means are of similar magnitude and aren't independent of each other (i.e. Poisson distribution)
Helpful to try this transformation if the log transformation doesn't work, especially when a nonparametric tool is not at your fingertips
May help transform percentage data when data range is between 0-20% or between 80-100%
Similar reporting issues (square the transformed mean & calculate CI's)
The ArcSine Transformation
p' = arcsin(SqRt(p))
Proportions tend to form a binomial distribution vs a normal distribution
This transformation will "centralize" the data - bring values closer to 50%
Arcsin is the inverse sine (sin^-1)
Radians vs degrees .. Ugh!
Check your transformation vs Zar Appendix Table B.24
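All three transformations are one-liners, so checking them with a calculator (or a few lines of code) is easy. A minimal Python sketch with invented input values; the arcsine result is returned in radians by default with a degrees option, since the radians-vs-degrees distinction is exactly what trips people up:

```python
import math

def log_transform(x):
    # X' = log10(X + 1): handles zeros and avoids negative transformed values
    return math.log10(x + 1)

def sqrt_transform(x):
    # X' = sqrt(X + 0.5): variance-stabilizing for count (Poisson-like) data
    return math.sqrt(x + 0.5)

def arcsine_transform(p, degrees=False):
    # p' = arcsin(sqrt(p)) for proportions; radians by default
    val = math.asin(math.sqrt(p))
    return math.degrees(val) if degrees else val

print(log_transform(99))                               # 2.0
print(sqrt_transform(3.5))                             # 2.0
print(round(arcsine_transform(0.5, degrees=True), 6))  # 45.0
```

Note how the arcsine transformation pulls p = 0.5 to 45 degrees, the midpoint of the 0-90 degree transformed scale, consistent with the "centralizing" behavior described above.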
Only Pre-Midterm Topics Above
Power and Sample Size in ANOVA
Power = 1-Beta
In JMP
Set up regular Power Analysis
Choose k sample means option
Set alpha for Omnibus F test
Enter SD, variability among all groups combined
Enter estimated means of each group (represent smallest detectable difference)
Leave sample size & power blank to examine power curves
Sample size output is the total N. Remember to divide N by k (# of groups) to get n per group
Reports sample size required to reject the Omnibus
Different Types of ANOVA Models
Fixed-Effects Model (Model I ANOVA)
Levels of the factor are specifically chosen by the experimenter; it is these specific groups about which the experimenter is trying to draw conclusions
Most common
Random-Effects Model (Model II ANOVA)
Levels of the factor are a random sample of all possible levels, a wider universe of groups
Instead of being concerned with effects of specific levels, you are trying to generalize effects across a random selection
Mixed-Effects Model (Model III ANOVA)
Some factors/treatments are fixed & some are random
SS & MS calculated the same; ANOVA table looks similar
Differences are in the MS term used in the F test for some H0's and in how secondary analyses are performed
Factorial ANOVA (Zar Chapter 12)
Multisample H0:
One-Way ANOVA model
1 Independent Variable
Nominal w/ more than two levels
1 Dependent Variable
Continuous
Factorial Analysis of Variance
Consider the effects of more than 1 independent variable on a dependent variable simultaneously (in the same model)
Advantages
No need for multiple 1-way ANOVAs;
Can test for interaction among factors
Two-way ANOVA model
2 Independent Variables
BOTH Nominal each w/ two or more levels
1 Dependent Variable
Continuous
Two-way ANOVA/Two-factor ANOVA
2 independent variables = 2 treatments = 2 factors = 2 main effects
Don't confuse with multiple levels or groups
Let's add a variable to the experiment that tested the effect of 5 sugars. Now we want to test the effect of both sugar and pH on pea growth
5x2 factorial design
Each level of 1 factor is in combination with each level of the second factor; "crossed"
Balanced design (equal replication)
10 combinations, 50 observations
Sugar will have an F value, pH will have an F value, Sugar x pH (interaction between 2 variables) will have an F value
Certain tests are informative (Tukey is good, provides pairwise comparisons)
Examine interaction plots to help us see visually why we have interaction
2x2 Design Rat & Lard Example
Effect of lard type on food consumption of rats (N=12; n=6 per main effect)
2 Main Effects (Fixed) each w/ 2 levels:
Fat (Fresh, Rancid)
Sex (Female, Male)
3 replicates per subgroup
Fit Model Platform
Dependent Variable into Y box
Independent Variables & Interaction into model effects box
Analysis of Variance Prob>F = Omnibus F
Manuscript statement looks at Effect Test results & post hoc results
To find the proportion of variation explained by an effect, divide that effect's sum of squares by the total sum of squares
Grab LS Means Plots for both Effects and the Interaction
If Main Effect is not significant, then post hoc testing is not necessary
Publication Statement:
Consumption by rats was significantly higher for fresh fat versus rancid fat (Two-way ANOVA, F1,8 = 41.96, P < 0.001), and main effect “Fat” accounted for 79.0% of the total variation in rat consumption (Table/Figure 1). Sex (F1,8= 2.59, P = 0.146) and Fat*Sex (F1,8= 0.63, P = 0.450) were not significant.
Effect test F & P values tell us whether we have a variable effect. REPORT EFFECT TEST F & P VALUES
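The percent-of-variation figure in the publication statement (79.0% for "Fat") is just an effect's sum of squares divided by the total sum of squares. A minimal Python sketch; the SS values here are made up for illustration, chosen so the proportion matches the statement:

```python
# Percent of total variation explained by each effect (hypothetical SS values).
ss = {"Fat": 61.2, "Sex": 3.8, "Fat*Sex": 0.9, "Error": 11.6}
ss_total = sum(ss.values())  # SS-Total is the sum of all component SS

def pct_variation(effect):
    # Effect SS / Total SS, expressed as a percentage
    return 100 * ss[effect] / ss_total

print(round(pct_variation("Fat"), 1))  # 79.0
```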
3x2 Factorial Design Sandwich Data
Sandwich: Meatball, BLT, Spicy Italian
Season: Spring, Summer
Dependent Variable: Sales from ___ Subway Stores
Significant variation within Sandwich Types, so run post hoc analysis to see how and why
When a main effect is significant, and has more than 2-levels, proceed as you would in a one-way ANOVA
Sales of BLT are significantly higher than sales of Meatball & Spicy Italian
Sales of sandwiches did not differ between spring and summer
There was no interaction of the main effects, indicating that Season affected sales equally across all Sandwiches
Analysis of Significant Interactions
The interaction term allows for examination of the joint effect of factors on the dependent variable (advantage of factorial design)
If the interaction term is significant, it means the nature of the effect of one factor on the dependent variable is dependent on levels of the other factor
In factorial anova, you should first look to see if the interaction term is significant, because if it is, then biological conclusions made about the main effects are unreliable or not applicable in all instances (combinations)
Interaction among factors indicates the effect of the two Main Effects are not independent of each other
What if you have a significant Interaction Effect?
Effect of Sex and Season on hematocrit of the dark-eyed junco:
No effect of Sex (p=0.26)
Significant effect of Season (p=0.021) (Hematocrit was higher in the Spring)
Significant interaction of Sex X Season (p=0.032)
The F-test of the interaction is enough statistical information to conclude that Hc of female juncos in spring is higher than Hc of female juncos in summer
Note: You could not draw strong conclusions about the Season effect alone, because the significant interaction means the effect of Season depends on Sex
Interpreting Interaction: Factors with >2-levels
A professor gives a final exam that's an essay. Students are randomly assigned to either take the exam with laptops or write in blue-books. Additionally, students are put into three categories based on typing ability: None, Moderate, Skilled. The instructor was interested in the effect of Method, Ability, and the interaction of those two on score on the essay. Grades assigned "blindly".
Method: Laptop, blue-book
Ability: None, Moderate, Skilled
Method X Ability
Dependent Variable: Essay score
Main Effects:
Ability - Significant
Proceed with post hoc testing (Tukey HSD)
Method - Not Significant
Interaction - Significant
Now, you must analyze the simple main effects. This entails examining the changes in effect of one factor over levels of the other
Focus on key comparisons, don't have to run all tests
Presenting results
For F-ratio of ability, present F 2,12 then p-value
For F-ratio of ability*method, present F 2,12 then p-value
There was a significant effect of Ability (Test, F, p=0.032), where scores of students with moderate typing ability were higher than scores of students with no typing ability (Tukey HSD). The effect of Method was not significant (F, p=0.901); however, there was a significant AbilityXMethod interaction (F, p=0.0465). Examination of simple main effects was inconclusive, but there was a trend of lower scores in students skilled in typing and using laptops versus skilled students using bluebooks (test).
A researcher is interested in studying the effect of group psychotherapy and medication on depression. 30 patients participated in the study
The researcher designed this experiment to examine if types of therapy, psychotherapy and medication, interact in their effect on depression
2x3 factorial design
30 total patients
6 subgroups
5 patients per subgroup
Psychotherapy: Psychotherapy, No Psychotherapy
Medication: Placebo, low Dose, High Dose
Dependent Variable: Depression scores
Both main effects are Statistically Significant
The Interaction Term is significant
Run a Tukey HSD after to find why
Group psychotherapy influenced subjects in the placebo and low dose treatments, but it had no influence on people given the high dose treatment
Efficient Use of Resources
Interested in testing the effects of 2 environmental variables, air temperature and nitrate, on the growth of cotton
For each independent variable, you have 3 levels
High, Medium, and Low
Produces a 3x3 design
For power purposes, you want to have 27 cotton plants per level of each main effect
Unexplained variance is reduced in the 2-way ANOVA over the 1-way ANOVA
Results in higher F-ratios
2-way ANOVA has more Power
Incorporating variables into models that explain or account for observed variability in the dependent variable is a good thing, for this is one of our major goals as researchers
Two Types of Independent Variables incorporated into factorial Designs:
Experimental Variables
We are directly interested in the effect of all of those variables, including their interaction, on the dependent variable;
These are typically fixed effects
Control or Block(ing) Variables
Incorporated solely to reduce the amount of experimental error, giving more resolution for exploring effects of Experimental Variables of interest
Typically the Block Variable is a random effect
The Randomized Group Design
This is the "typical" factorial design we've considered so far
3x2 factorial design with 2 experimental factors
Method and ability are fixed effects
Method: laptop, bluebook
Ability: none, moderate, skilled
Method*Ability
Dependent Variable: essay score
In this design, each cell (6 subgroup combinations) has multiple subjects or replicates (3). Multiple subjects were "randomly" assigned to each cell in the 3x2 design.
In this design, you apply a 2-way ANOVA and include the interaction term
The Simple Randomized Block Design
5 m-squared of Bermuda grass
Enough space for 3 plots per location
Dependent Variable - amount of above ground grass (kg)
There is 1 replicate per cell
There are 5 replicates per level of Nutrient
The factor of interest is Nutrient
Can calculate Block Effect since n=3 per block
Why is it advantageous to include Location as a Block factor?
What is the unit of replication in this design?
The Simple Randomized Block Design (A Simple Mixed Model)
"Randomized Complete Blocks", "ANOVA w/o Replication"
Interspersion of treatments across "Blocks"
This design is used to reduce the amount of experimental error through the inclusion of a block variable that is usually a random effect factor
This design can be viewed as somewhat of a hybrid between a 1-way and 2-way ANOVA because you have 2 factors in the model, but you are only interested in the effect of one fixed Independent Variable (at least in the simple randomized block design)
The simple randomized block design has 1 fixed effect and 1 random effect
The statistical model is a mixed, Model III, 2-way ANOVA without replication
The interaction term is not included in the model because there is not enough replication per cell to calculate it
In the simple randomized block design, you assume no interaction between Experimental factor and the Block factor
In JMP:
3 total columns
1 fixed effect column
1 random effect column
1 dependent variable column
1 dependent and 2 independent
Test assumptions of the dependent variable the same way we've been doing it
Save residuals and test assumptions
Analyze --> Fit Model
Dependent variable into Y
Fixed effect into model effects
To change a variable to a random effect click "Attributes" and then click Random Effect
If balanced, change "method" to "Traditional"
If imbalanced, change "method" to "REML"
Make sure "Effect Details" is enabled
Enables you to perform Tukey HSD and post hoc tests
Randomized Block - A Biological Example (Zar 12.4)
H0: The mean weight of guinea pigs is the same on four specified diets
1 block comprised of 4 cages
1 guinea pig per cage
1 replicate of each diet per block (assigned randomly)
Expected gradients within barn:
Temperature
Light
Noise
Draft
Pigs in Blocks will experience similar conditions
Interspersion of Treatments
N=20 pigs
5 replicates per diet
Columns
Block - nominal
Diet - nominal
Weight gain - Continuous discrete
Repeated Measures Design
Each subject receives all levels of Factor A
Slide 65 2-Way ANOVA ppt
Possibly subject to the "carry-over effect"
A subject goes through Treatment 1 and then Treatment 2, but going through the 1st treatment may have affected its response to the 2nd
Dependencies/correlations across treatments
Advantages:
Individuals/subjects are acting like "blocks" - homogeneity of potential sources of error
Experimental error introduced into the study due to variability between subjects can be accounted for
Also called a within-subjects or treatment-by-subject design
Subjects are receiving all levels of Factor A
Among subject variability can be accounted for and "factored out"
RMD design is similar to randomized block design b/c subjects function similarly to "blocks"
Intra subject dependencies present both positives & negatives
Dependencies are a negative if "carry-over" effects exist across treatments
Assess normality using residuals as in one-way ANOVA (will require re-entering data in traditional form)
Assess correlation structure (Test of sphericity)
Addressing the Sphericity Assumption
JMP provides a test for Sphericity using the Multivariate framework:
If the assumption is met, proceed with the unadjusted, univariate F-test
If the assumption is not met (sphericity test P < 0.05), either:
Apply an adjusted, univariate F-test (Geisser-Greenhouse or Huynh-Feldt)
Apply the F-test generated by MANOVA, the "multivariate F" (treats the repeated measures as multiple dependent variables)
In JMP:
Go into Graph Builder and compare individuals (ex. Cholesterol vs. Drug)
Add random effect into right-hand overlay (top right box above "Color")
Examine correlation structure within Subjects across Groups
Note the y-intercept variation (Subject Variation is Important)
Analyzing Repeated Measures Design using Mixed Model
Assess normality and variances using residuals in One-Way ANOVA
Assess correlations within subjects across groups visually (Graph Builder)
Paired t-Test is most appropriate post hoc analysis w/ Bonferroni adjusted alpha level, but can do Tukey HSD
Multiway Factorial Analysis of Variance
You can extend the basic ANOVA design to experiments with more than 2 factors
In these multiway designs you can examine the effects of numerous factors (3,4,5,etc) simultaneously, with interactions, in one model
Factors can be a combination of both fixed and random effects
Typically, you never see more than 3 or 4 factors (3-way & 4-way ANOVA)
Tests of Difference vs Tests of Relationships
Tests of Difference
Is this group different from that group(s)?
t-Tests, ANOVA's
Independent variable is typically nominal/categorical
Tests of Relationships
Is variable A related (co-vary) with variable B?
Correlation and Regression
Independent variable is typically continuous (but doesn't have to be, particularly in correlation, where there isn't a "dependent" or "independent" variable)
Correlation
To what degree does one variable vary with another? (Does not imply cause & effect)
Regression
To what degree is variable Y dependent on variable X? (Implies a cause-and-effect relationship exists)
Correlation
Research question: Are two (or more) variables "associated" with each other?
Important Point: The research question is not one of cause & effect
Examples:
Do two methods of measuring blood pressure tend to give corresponding results?
Blood pressure measurements with 2 methods on the same units
How strongly associated are pairs of morphometric characteristics of grizzly bears?
Data points are 1 grizzly bear
Y and X values are leg length and arm length
Is there correspondence between concentrations of cadmium and lead in sediments of streams in a watershed impacted by industrial pollution?
Sample unit is sediment core sample
From that sample we get a measurement of [Cadmium] and a measurement of [Lead]. These are X and Y values
In JMP:
Make sure both are normally distributed & relationship is linear
Go to Analyze -> Multivariate Methods -> Multivariate -> Pairwise Correlations
Put Correlation value & P-value (r=0.952, p<0.001)
Simple Linear Correlation
Three questions:
Are two measurement variables related in a linear fashion?
If they are related, what is the direction (+ or -)?
How strong is the relationship? (Differences, parallel to effect size)
Smoking Example: cig smoking/day & CHD mortality/10,000 people
Null H0: There is no correlation b/w Smoking and CHD. (correlation coefficient = 0)
X-axis and Y-axis placement does NOT matter in correlations
Do NOT put a line of best fit in scatterplot. If putting a visual aide, add an ellipse
The Correlation Coefficient (r)
aka Pearson's r or the Pearson product-moment correlation coefficient
r is a measure of association between two variables (X&Y)
Two pieces of information are obtained from r values:
Sign (+ or -) of r indicates whether the association is positive or negative
Size of r (from -1 to 1) indicates the magnitude of the association (further from 0 = stronger)
Since r values range from -1 to 1, 0.85 indicates a strong positive relationship exists b/w cigarette consumption and CHD
We can make this conclusion regardless of the underlying distribution of the two variables. In this sense, we view the r value as an index (rules of thumb)
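Pearson's r can be computed directly from the definitional formula (sum of cross-products over the square root of the product of the sums of squares). A minimal Python sketch; the data are illustrative values, not the actual smoking/CHD dataset:

```python
import math

def pearson_r(x, y):
    # r = sum((x - x_bar)(y - y_bar)) / sqrt(SSx * SSy)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

cigs = [5, 10, 15, 20, 30]       # cigarettes/day (illustrative)
chd  = [60, 90, 110, 150, 200]   # CHD mortality per 10,000 (illustrative)
print(round(pearson_r(cigs, chd), 3))  # close to +1: strong positive association
```

The sign and magnitude of the result can be read as an index exactly as described above, with no distributional assumptions needed until you want a p-value.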
The Correlation Coefficient (r): Test of Significance
There are no statistical assumptions associated with calculating r, and if an index is what you need to make the inference of interest, then stop here. (However, this is typically not how Biologists use r)
Assumptions of Normality
Both variables X&Y, were sampled randomly from a population with a normal distribution (bivariate normal distribution) and the relationship between the variables is linear
To calculate a p-value for r, X&Y need to be normally distributed and the relationship needs to be linear (DO IN GRAPH BUILDER)
If performing a transformation to one variable, apply to both
Nonparametric alternatives (Spearman's rs & Kendall's tau) have good power, so transformation is often not necessary
Nonparametric Correlation (Ranks)
Spearman's correlation coefficient (rs); ranges from -1 to 1 or Kendall's tau. Analyses based on ranks
Apply when bivariate normal assumptions are violated, when data are ordinal, or the relationship is nonlinear
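As a sketch, Spearman's rs can be computed by ranking both variables and applying the d-squared shortcut formula. This simple version assumes no ties (tied values would need average ranks); the data are invented:

```python
# Spearman's rank correlation, no-ties case (ties would require average ranks).
def ranks(values):
    # Rank 1 = smallest value
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = float(rank)
    return r

def spearman_rs(x, y):
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    # With no ties: rs = 1 - 6 * sum(d^2) / (n * (n^2 - 1))
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

x = [2, 5, 1, 8, 4]
y = [3, 6, 2, 9, 5]          # perfectly monotone with x
print(spearman_rs(x, y))     # 1.0
```

Because rs works on ranks, any perfectly monotone relationship (even a curved one) gives rs = 1, which is why it suits nonlinear but monotonic data.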
In JMP:
Analyze -> Multivariate Methods -> Multivariate
Then you can find "linear correlations" and "nonparametric correlations"
Reid usually goes with Spearman's over Kendall's
Report as (rs, p-value)
The more elliptical the scatter of points, the more intense the correlation
Be careful interpreting significant r's
As sample size increases, the critical value decreases
Two-tailed usually
Multiple Correlations: The Correlation Matrix
What if you have more than 2 measurement variables
8 variables: 28 total comparisons
alpha per correlation = 0.05/28 ≈ 0.0018 (Bonferroni adjustment)
Holm-Bonferroni Method (Holm 1979) WILL NOT BE TESTED OVER
Stop at the first non-significant outcome
Order the p-values from smallest to greatest
H4 = 0.005
H1 = 0.01
H3 = 0.03
H2 = 0.04
Work the Holm-Bonferroni formula for the first rank:
HB = Target alpha / (n-rank+1)
HB = 0.05 / (4 -1+1) = 0.0125
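The step-down procedure above, using the same four H1-H4 p-values, can be sketched in Python (stopping at the first non-significant outcome, as the method requires):

```python
# Holm-Bonferroni step-down procedure.
def holm_bonferroni(p_values, alpha=0.05):
    ordered = sorted(p_values)            # order p-values smallest to greatest
    n = len(ordered)
    significant = []
    for rank, p in enumerate(ordered, start=1):
        threshold = alpha / (n - rank + 1)  # HB = alpha / (n - rank + 1)
        if p > threshold:
            break                           # stop at first non-significant outcome
        significant.append(p)
    return significant

# H4 = 0.005, H1 = 0.01, H3 = 0.03, H2 = 0.04 (from the example above)
print(holm_bonferroni([0.01, 0.04, 0.03, 0.005]))  # [0.005, 0.01]
```

Rank 1 tests 0.005 against 0.05/4 = 0.0125 (significant), rank 2 tests 0.01 against 0.05/3 ≈ 0.0167 (significant), and rank 3 tests 0.03 against 0.05/2 = 0.025, which fails, so testing stops there.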
The Limitations of Correlation
Correlation analysis indicates cigarette consumption and CHD are related. It also tells us the relationship is positive and relatively strong (r= 0.85)
BUT...
You may want to predict incidence of CHD for given levels of cigarette consumption or how much does CHD increase with a unit increase in cigarette consumption. This hypothesis of causality about the relationship requires Regression Analysis
Sparrow Wing Length as a Function of Age
Correlation--> X <--> Y (Covariation)
Just has 2 variables, technically not independent or dependent variables
Regression--> X ---> Y
Age (Independent) ---> Wing Length (Dependent)
A Regression Example
A snake physiologist wished to investigate the effect of temperature on the heart rate of juniper pythons. She selected nine specimens of approximately the same age, size, and sex and placed each animal at preselected temperature between 2 and 18 C. After the snakes equilibrated to their ambient temperatures, she measured their heart rates. n=9
Temp (IV) (Fixed)
Heart Rate (DV)
Simple Linear Regression
Simple vs. Multiple regression (one predictor variable vs multiple predictors)
Linear regression vs non-linear regression
Simple Linear Regression:
Specifies a straight-line relationship between two variables
Predictor variable (X) (usually a Fixed Effect-Model 1)
Response variable (Y)
Specifies a predictive relationship between X&Y
SLR analysis involves producing
A regression line or "best fit" line through points on a scatterplot of X&Y
A regression equation that relates X&Y
Building the Regression Equation
Regression implies a functional (cause-effect) relationship between variables
Two components need to be calculated from the data:
Slope (b) (the "regression coefficient")
y-intercept (a)
Y=bX+a
The Regression Analysis
The regression analysis calculates values of a and b for a data set so that the resulting equation is the best obtainable for the data
The "Best Fit" Line
Not all observed values of y fall on the line
The values of y that fall directly on the line are the predicted values "y-hat"
The sum of squares obtained by squaring the deviations y - y hat is much smaller than the SS of y without consideration of x
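The least-squares values of b and a follow directly from the deviations of X and Y: b is the sum of cross-products divided by the SS of X, and the line passes through the point (x-bar, y-bar). A minimal Python sketch with invented, exactly linear temperature/heart-rate data (not the actual juniper python data):

```python
# Least-squares slope (b) and intercept (a) for Y = a + bX.
def fit_line(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # b = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx   # the best-fit line passes through (x_bar, y_bar)
    return a, b

temp = [2, 6, 10, 14, 18]       # invented X values
hr   = [6, 13, 20, 27, 34]      # invented Y values, exactly linear here
a, b = fit_line(temp, hr)
print(a, b)   # 2.5 1.75

# Residual SS around the fitted line is never larger than SS of y ignoring x
ss_total = sum((yi - sum(hr) / len(hr)) ** 2 for yi in hr)
ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(temp, hr))
assert ss_resid <= ss_total
```

With real data the points scatter around the line, ss_resid is nonzero, and r-squared = 1 - ss_resid/ss_total gives the coefficient of determination discussed below.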
Testing the Statistical Significance of the Regression Model
Evaluate significance with ANOVA where the F test is testing the overall significance of the model
3 Sums of Squares calculations (& df's) are needed:
Total SS (df = N-1)
Regression or "Model" SS (df=1)
Residual/Error SS (df = N-2)
In JMP:
Fit "Y by X"
Input variables
Select "Fit Line" at the red caret to get the "Linear Fit" results box
Results are in "Analysis of Variance" under Prob > F (significance of Model)
The F-test tells us we have a very low probability of committing a Type I error and that python heart rate does vary linearly with temperature
How good is the model?
Big F tells us that it's good
Calculate % of how much variation is within the model (Model/C. Total)
The Coefficient of Determination (r^2)
Tells us what % of the variability in the dependent variable (y) is explained by the independent variable (x)
r^2 = SSmodel/SStotal
93.9% of the variability observed in example
Using the Model: Predicting y from x
y=2.14+1.77x
Heart Rate=2.14+1.77(Temp.)
Plug in values of x and solve for y
Can predict heart rate at temps that were not tested (e.g., 5 and 9 ºC)
Model can be used by other researchers
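Using the fitted equation from the example above, prediction is just substitution:

```python
# Predict heart rate from temperature with the fitted model above.
def predicted_heart_rate(temp_c):
    # Heart Rate = 2.14 + 1.77 * (Temp)
    return 2.14 + 1.77 * temp_c

for t in (5, 9):
    # Temperatures not tested in the original experiment
    print(t, round(predicted_heart_rate(t), 2))
```

One caution implied by the "within the range of values tested" point later in these notes: predictions are safest for temperatures inside the 2-18 ºC range the model was built from; extrapolating beyond it is risky.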
Assumptions of Model 1 Simple Linear Regression
The IV is usually fixed but can be random
The y observations are independent
The functional relationship is linear
Residuals are normally distributed
Variances are equal
After you've built the model, Fit a line, and looking at the JMP output..
Click on the red triangle beside "Linear Fit" (Use graph in analysis)
To check normality - click on "save residuals"
To assess variances - click on "plot residuals"
If non-normal or variances are violated, transform both variables
Regression or ANOVA?
Age as a continuous variable
Data from basically 13 levels of the IV (Age)
"Replicated" regression is best
Relationship between y and all x's within the range of values tested is quantified
Problem Background:
A researcher is interested in site-specific differences in body size among populations of rattlesnakes. Why an interest in body size? Well, reproductive traits in animals (e.g., number and size of offspring) often vary with body size. Populations may vary in body size due to differences in resource availability, resource quality, size-specific predation, population density, etc. Most importantly, size of rattlesnakes may vary with age.
How can the researcher examine for geographic differences in body size between two populations knowing that size will also vary due to differences in age?
Y-variable is body size (Continuous discrete)
Location (Categorical)
X-Variable is Age (Continuous discrete)
ANCOVA: Analysis of Covariance, with 2 X-variables: one continuous (the covariate) & one categorical
ANCOVA Requirements
1 Dependent Variable (Continuous)
1 IV (Categorical)
1 Covariate (Continuous)
Covariate
Variable that is related to the DV, which you can't manipulate, but you want to account for its relationship with the DV
Increased sensitivity of tests of main effects and interactions since usage of a covariate will result in a reduction of error variance
ANCOVA Assumptions
Residuals are normally distributed and variances are homogenous
Linearity - significant linear relationship between covariate and DV
Since covariate is used as a linear predictor of the DV yet it is not a fixed effect, the covariate is assumed to be measured without any error
Homogeneity of regressions (i.e. no significant interaction of GroupXCovariate)
ANCOVA In JMP:
Use the "Fit Model" Platform
Model should contain:
IV (Drug)
Covariate (X)
Interaction Term (Drug*X)
Look if Covariate is significant. (Example is significant)
Look at Interaction Term (Example is NOT significant, lines are statistically parallel)
Look at IV (Example is not significant, no drug effect)
Adjusted Means
When using ANCOVA, the means for each group get adjusted by the Covariate-Dependent Variable relationship
If the Covariate has a significant relationship with the Dependent Variable then comparisons are made on the adjusted means
When doing ANCOVA, you should graph/report adjusted means
ANCOVA - A Biological Example
Fish inhabiting caves often have small eyes relative to fish living in surface streams, which is thought to be an example of adaptation to life in a cave environment. The Banded Sculpin is a common fish found in surface streams of North America, but it can also sometimes be found living in caves. A cave population of Banded Sculpin in Missouri is showing signs of cave adaptation similar to "true" cavefishes. Researchers are interested in whether or not sculpin in surface streams have different eye size relative to sculpin living in caves. Eye size of individual sculpin may also vary with total length of the fish
Construct the model
DV: Y-variable: (Eye size)
IV: (Location)
Covariate: X-variable (Total length)
Interaction Term: Location*Total length
Advanced Regression Techniques
Multiple regression (>1 IV)
Analysis of Covariance (ANCOVA)-mixture of regression & ANOVA
Analysis of Frequencies
Interest in the frequency that an event occurs...
How does an observed outcome compare to an expected outcome or distribution?
Is the sex ratio (M:F) in a population of box turtles the expected 1:1 ratio?
Do the frequencies of observed phenotypes conform to the expected 3:1 ratio?
Do mountain lions eat equal amounts of white-tailed deer and mule deer?
Goodness of Fit?
Data Characteristics
"Count data" - discrete number of observations
1 variable with categories or "bins"
2 or more categories/bins
Multiple independent observations within categories (5 minimum; >10 recommended)
Chi-Square Goodness of Fit Test
Quarter tossing
Probability of Heads? Tails?
Is observed significantly different from expected? Is the disparity due to random chance?
OR is the deviation not due to random chance?
JMP does test for us, but simple to calculate by hand
We can test to see if our observed frequencies "Fit" our expectations
This is the chi-squared Goodness-of-Fit test
Converts the difference between frequencies we observe and frequencies we expected
Nonparametric test (no distributional assumptions), but three data requirements apply:
Data are frequencies (counts)
Observations are independent
Categories have large enough expected frequencies. When there are 4 or fewer categories, none of the expected frequencies should be less than 5
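The goodness-of-fit statistic is indeed simple to calculate by hand: sum (observed - expected)^2 / expected over all categories. A Python sketch using the coin-toss setup (the counts are illustrative):

```python
# Chi-square goodness-of-fit statistic, computed by hand.
def chi_square(observed, expected):
    # Sum of (O - E)^2 / E over all categories/bins
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [45, 55]   # e.g., 100 quarter tosses: 45 heads, 55 tails (illustrative)
expected = [50, 50]   # expected 1:1 under H0
chi2 = chi_square(observed, expected)
df = len(observed) - 1
print(chi2, df)   # 1.0 1
```

The resulting chi-square value is then compared to the chi-square distribution with (number of categories - 1) df to get the p-value, which is what JMP's Pearson test reports.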
Conducting Chi-Square Analysis: Goodness of Fit IN JMP
2 variables, so 2 columns
Frequency/Count column & IV column
"Analyze" -> "Distributions"
Input IV into Y column spot
Input Frequency/Count data into Frequency spot
JMP spits out percentages
Click the caret next to the IV and click "Test Probabilities"
Input expected/hypothesized probabilities
Should add to 1.0
Run the test
ONLY report Pearson test results. ChiSquare value, df, & p-value
Report as: (Chi-Square test, ChiSquare value, df, p-value)
Chi-Square Test for Association or Independence
Are two categorical/nominal variables related/associated?
Same data type and assumptions as Goodness of Fit Tests
Calculations are similar, except the expected frequencies come from the data (row total × column total / grand total) rather than a hypothesized ratio
Contingency Table
Example
H0: Age 0 male & female Ohio shrimp captures at McCallie Access, Mississippi River are not associated (do not differ) with month
HA: Captures of male and female Age 0 Ohio shrimp are related to month
Variable A: June, July, August
Variable B: Male Age 0, Female Age 0
Fit Y by X
Y: Sex
X: Month
Frequency: Count
JMP gives "Contingency Table" and resulting Chi-Square values (look at Pearson results)
Can apply Bonferroni adjusted alpha level, but not necessary if risk of family-wise error is low
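A sketch of the contingency-table calculation: expected cell frequencies are (row total x column total) / grand total, and the chi-square statistic sums (O - E)^2 / E over all cells. The counts below are invented for illustration, not the actual Ohio shrimp data:

```python
# Chi-square test of independence for a 2x3 contingency table (invented counts).
table = [
    [30, 20, 10],   # e.g., Male Age 0:   June, July, August
    [25, 15, 20],   # e.g., Female Age 0: June, July, August
]
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand = sum(row_totals)

# Expected frequency for each cell: E = (row total * column total) / grand total
expected = [[r * c / grand for c in col_totals] for r in row_totals]

chi2 = sum((table[i][j] - expected[i][j]) ** 2 / expected[i][j]
           for i in range(len(table)) for j in range(len(table[0])))
df = (len(table) - 1) * (len(table[0]) - 1)   # (rows - 1) * (columns - 1)
print(round(chi2, 3), df)
```

Comparing chi2 against the chi-square distribution with df = (rows - 1)(columns - 1) gives the p-value JMP reports as the Pearson result.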