
Biometry

t-test null hypothesis (H0): the mean/median of Group A = the mean/median of Group B

One-way ANOVA null hypothesis (H0): 3 or more groups; tests whether the means/medians of A = B = C

Descriptive Statistics: (e.g., mean, s.d., SEM, etc.) help organize/summarize data

Inferential Statistics: (e.g., t-test and ANOVA) allow us to generalize conclusions from samples to populations

Manipulative - manipulating the application of the ind. variable
Mensurative - NOT manipulating the application of the ind. variable

Both are experimental studies and both have ind. & dep. variables; the difference between them is subtle

Variable - "characteristic that may differ from one biological entity to another" Zar (2010)

Dependent Variables - Response variables

Independent Variables - Treatments, Factors

All experiments have at least 1 of each

The number of variables, the data type, and the scale of measurement dictate which inferential tool to apply

Experimental Goal?

  • Establish cause/effect relationship between ind. variable & dep. variable

    • To accomplish this ideally, all subjects must be identical/similar except for the level of the ind. variable they "receive"

  • Establish that variables are associated

  • Significant differences in responses suggest the independent variable(s) chosen were meaningful ones

Nuisance Variable Example

Prairie lizard manipulative study

Dep. - Snout vent length

Ind. - Temperature, at three levels:

  • n= 10 @ 10ºC

  • n = 10 @ 15ºC

  • n = 10 @ 20ºC

Other factors (nuisance/confounding variables)

  • Sex (controlled by only using one gender)

  • Diet (controlled by feeding the same thing in same amounts)

  • Age

  • Stress

  • Hormone levels

  • Reproductive status

  • Genetics

  • Competition

Approaches to nuisance/confounding variables

  • Identify variables beforehand and hold conditions constant across subjects and treatments

  • Distribute symmetrically across groups & have high sample size

  • Incorporate into the experimental model by adding another potential independent variable

  • Disperse the "nuisance effect" across all treatment conditions via randomization procedures (random assignment of subjects to treatments/conditions)

If concerned about "Procedural Effects": Implement a Control(s)

  • Negative control - test subjects/sample units that receive all "procedures" except the experimental treatment/manipulation (saline, sugar pill, non-restored study site, etc.)

  • Positive Control - test subjects/sample units receive all procedures except the experimental treatment/manipulation but you expect a known outcome from this group. This provides a group to compare with that controls for unknown sources of nuisance. (aspirin-headache example-give a group a drug known to deal with headaches)

  • "Controls" in Mensurative Experiments? Don't necessarily have a control for procedural effects, but have a good comparison across groups. Known benchmark

Examples of variable types and their scales of measurement

  • Attribute - Nominal - Sex of snake: male or female

  • Ranks - Ordinal - Pigmentation levels: going from no pigmentation to full pigmentation

  • Discrete Measurement - Ratio - Number of points on deer antlers

  • Continuous Measurement - Interval or Ratio - Body temperature (°C; interval scale), weight of a warthog (kg; ratio scale)

Converting Data from One Scale to Another Example

  • Continuous variable (e.g., tree height) measured on a continuous scale ----> Convert to ranked data on an ordinal scale

Continuous in cm --> Ordinal in ranks

  • 100 --> 6

  • 500 --> 5

  • 525 --> 4

  • 1000 --> 1

  • 642 --> 3

  • 701 --> 2

  • 10 --> 7

  • Continuous data work with the mean, while data converted to an ordinal scale work with the median

  • Distances between data points are not retained, but the ranked data can be more desirable to work with

  • Reduction in variation between data may allow better testing
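A minimal Python sketch (not from the course materials; assumes scipy is available) of the same conversion of the tree heights above from a continuous scale to ranks, with 1 = tallest as in the notes:

```python
# Minimal sketch: convert the tree heights above (cm) to ordinal ranks.
from scipy.stats import rankdata

heights_cm = [100, 500, 525, 1000, 642, 701, 10]

# rankdata assigns rank 1 to the smallest value, so rank the negated heights
ranks = rankdata([-h for h in heights_cm]).astype(int)

for h, r in zip(heights_cm, ranks):
    print(f"{h:>5} cm -> rank {r}")
# Rank-based (median) tests would then be used instead of mean-based tests.
```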

Descriptive Statistics - a way to summarize and organize data

Measures of Central Tendency

  • location of the sample along the measurement scale

  • what is the location of the "typical" individual?

  • Arithmetic Mean

    • μ = population or universal mean (the true mean)

    • x̄ = sample mean (an estimate of the true mean)

  • Geometric Mean

    • antilog of arithmetic mean of log-transformed data

  • Median (M)

    • middle value of a ranked data set; most appropriate when data are highly skewed or you're dealing with data on an ordinal scale

  • Mode

    • the value that occurs most frequently; number of modes can be useful

  • Skewness

    • measure of symmetry

      • 0 = symmetrical (normal distribution)

      • positive (> 0) = tail to the right

      • negative (< 0) = tail to the left
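A minimal Python sketch (my own toy numbers; assumes numpy and scipy) computing the measures of central tendency and skewness described above:

```python
# Minimal sketch: central tendency measures on a small hypothetical data set.
import numpy as np
from scipy import stats

x = np.array([2.1, 2.5, 2.8, 3.0, 3.0, 3.6, 9.5])   # hypothetical measurements

print("arithmetic mean (x-bar):", x.mean())
print("geometric mean (antilog of mean of log data):", stats.gmean(x))
print("median (M):", np.median(x))
print("mode (most frequent value):", stats.mode(x).mode)
print("skewness (0 = symmetrical; > 0 = right tail):", stats.skew(x))
```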

Measures of Dispersion and Variability

  • the distribution or spread of measurements

  • Range

    • difference between largest & smallest observation

    • usually given as minimum & maximum value

  • Variance

    • σ² = population variance

    • s² = sample variance

    • mean of the squared deviations of measurements from their mean; typically not reported since it is in different units from the original data; used in calculating many statistical tests

    • s² = Σ(xᵢ - x̄)² / (n - 1)

    • cannot be negative

    • increases as dispersion or variability increases

    • (n-1) = degrees of freedom (df); real units of information about deviation from the average

  • Standard Deviation (sd)

    • s = square root of variance (s²)

  • Coefficient of Variation (CV): CV = (sd/mean) x 100%

    • a measure of relative variability

    • has no units so is useful to compare sets of data collected on different scales

    • (e.g., morphological data in mm vs. m; temperature vs. dissolved oxygen (DO))

    • most applicable to ratio scale data

  • Indices of Diversity: distribution of observations among categories
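A minimal Python sketch (same hypothetical data as above; assumes numpy) of the measures of dispersion just listed:

```python
# Minimal sketch: range, sample variance, SD, and CV on toy data.
import numpy as np

x = np.array([2.1, 2.5, 2.8, 3.0, 3.0, 3.6, 9.5])   # hypothetical measurements
n = len(x)

range_min, range_max = x.min(), x.max()
s2 = x.var(ddof=1)          # sample variance, s^2 = sum((x - x-bar)^2) / (n - 1)
sd = x.std(ddof=1)          # standard deviation, s = sqrt(s^2)
cv = sd / x.mean() * 100    # coefficient of variation (%), unit-free

print(f"range: {range_min} to {range_max}")
print(f"s^2 = {s2:.3f}  (df = n - 1 = {n - 1})")
print(f"sd  = {sd:.3f}")
print(f"CV  = {cv:.1f}%")
```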

Types of Distributions

  • Many statistical tests are based on assumptions that the data adhere to the properties of a given distribution

  • Discrete Distributions

    • Poisson distribution

      • items distributed randomly (independently)

    • Binomial distribution

      • two possible outcomes per trial, each with a fixed probability of occurrence

  • Continuous Distributions

    • Normal distribution

      • symmetrical; bell-shaped curve

    • t-distribution

      • symmetrical; related to normal distribution

    • Chi-square distribution

      • asymmetrical

  • Normal Distribution

    • symmetrical, continuous distribution

    • described by the mean and standard deviation (estimated by sample mean & sd)

    • most values lie in proximity of the mean

    • random samples of a given n from a normal population will be normally distributed

    • Central Limit Theorem: at some large n even means of samples from a non-normal population will approach normality (even means from Poisson & Binomial distr.)

    • normal distribution is the basis of many statistical tests
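A minimal Python sketch (toy simulation, not course data; assumes numpy and scipy) illustrating the Central Limit Theorem claim above: means of samples drawn from a non-normal (Poisson) population look much closer to normal than the raw draws do.

```python
# Minimal sketch: sample means from a Poisson population approach normality.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
draws = rng.poisson(lam=3, size=(10_000, 30))   # 10,000 samples of n = 30
sample_means = draws.mean(axis=1)

print("skewness of raw Poisson draws:", stats.skew(draws.ravel()))
print("skewness of the sample means: ", stats.skew(sample_means))
# The means' skewness is much closer to 0, i.e., closer to a normal distribution.
```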

Are Sample Data Normally Distributed?

  • Despite CLT, sample data may not be normally distributed due to small n or, more typically, for unknown reasons

  • Will want to check sample data to see if it is approximately normally distributed

  • "Goodness-of-fit Tests": (not really recommended)

    • Kolmogorov-Smirnov goodness-of-fit test

    • Chi-square goodness-of-fit test

Normal Quantile Plot

  • If data are perfectly normal they will lie along a straight line, inside the LCI's

  • Lilliefors Confidence Intervals

    • used to test for normality in a graphical way; if points fall outside the CI (confidence intervals) then data are significantly different from normal at alpha = 0.05

Shapiro-Wilk test

  • What null hypothesis is it actually testing?

    • The distribution of the sample data is equal to the normal distribution

  • After Normal Quantile Plot, click continuous then normal

  • Go down to data and click under the Fitted Normal option triangle, then choose Goodness-of-Fit

  • Then JMP gives you the Probability so you can choose to reject or fail to reject

  • If not normally distributed...

    • ignore it?

      • Inflates the chance of committing a Type I error

        • When you reject the null when the null is actually true

    • transform raw data to "fix it" & resume test?

      • Transforming doesn't guarantee the problem is fixed, so re-check the assumptions on the transformed data

    • choose a nonparametric equivalent?

      • Usually the parametric tool has more statistical power than the nonparametric equivalent
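A minimal Python sketch (toy data; assumes scipy and matplotlib), doing the same two normality checks described above, the normal quantile (Q-Q) plot and the Shapiro-Wilk test, outside of JMP:

```python
# Minimal sketch: Q-Q plot and Shapiro-Wilk test on a deliberately skewed sample.
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
sample = rng.lognormal(mean=0, sigma=0.6, size=40)   # right-skewed on purpose

w, p = stats.shapiro(sample)
print(f"Shapiro-Wilk: W = {w:.3f}, P = {p:.4f}")
# H0: the sample comes from a normal distribution.
# P < alpha (e.g., 0.05) -> reject H0 -> data are significantly non-normal.

stats.probplot(sample, dist="norm", plot=plt)        # normal quantile plot
plt.title("Normal quantile plot (points on the line = approx. normal)")
plt.show()
```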

Statistical Testing and Probability

  • Probability is the likelihood of an event

  • Statistical tests provide the probability of obtaining the observed results if the null H0 were true (P-values)

  • At low P-values the null is rejected and the alternate is accepted

  • The lower the P-value, the more confident you are that the null is false

What is a "low" P-value?

  • Researchers arbitrarily set the probability used as the criterion for rejection of the null

  • This value is called the significance level or α (alpha)

  • Convention is to apply an alpha level of 0.050

  • If α = 0.05 then at P-values less than 0.05 you reject the null HO (i.e. means are "significantly different")



Statistical Errors in Hypothesis Testing (Zar Section 6.3)

  • In reality, the null hypothesis is either true or false

  • Because inferences are made from samples, there is always the possibility of making the wrong inference

  • 2 ways of making the wrong inference:

  • Type 1 Error

    • Rejecting the null hypothesis when in fact the null is true; a "false positive"; you determine the means are significantly different when in fact they are not (we must control this error rate)

    • α error

  • Type 2 Error

    • Not rejecting the null when in fact the null is false; you determine the means are not significantly different when they really are (Considered to be a "less dangerous" error)

    • Designing experiments that give you the best chance possible to reject the null when it is in fact false is the best way to avoid Type 2 Error

    • β error

Insert probability notes here

Prospective Power Analysis

  • performed during planning stages of a study to explore how changes in study design (e.g., n, alpha, and effect size) impact objectives/goals of the study including interpretations of statistical tests & potential outcomes

  • Common applications of Prospective Power Analysis are:

    • to determine n required to attain a desired level of power at a specified minimum effect size, alpha-level, and standard deviation

    • to determine power of a test when n is constrained logistically (perhaps you then need to adjust alpha if power is too low)

    • to determine the minimum detectable meaningful effect size (the question here is what is a biologically meaningful difference)

    • JMP gives you big N (total sample size); divide big N by the number of groups to obtain little n (sample size per group)

    • Can increase alpha level to increase power

    • Can increase effect size to increase power

  • How to perform Power Analysis in JMP:

    • DOE > Design Diagnostics > Sample Size & Power

    • Depending on scenario, choose

    • E.x. 2-sample means

    • Not messing around with Extra Parameters yet

    • Can change alpha level, Std. Dev = Dispersion, difference to detect = effect size

    • Std.Dev & Effect Size need to be in same units

    • Leave sample size & power blank

    • Click Continue & you will get a curve

    • Sample size on graph is always in Big N

    • Not using to get an exact # of sample size, power analysis is a guide

    • Increasing alpha level increases power, which will decrease sample size

    • Increasing Std.Dev (Dispersion across the Dependent Variable) increases sample size

    • Decreasing effect size increases sample size

E.x. Clinical Research - Experiments investigating treatment of tumors

  • Will Drug A reduce the size of brain tumors?

  • Minimum Effect Size that is Biologically Relevant?

    • Using a minimum of at least 50%

    • Decided on an alpha level of 0.01, defended because they really want to be sure the drug works before claiming an effect

    • Need an idea of size of wild-type tumor to get 50% into units

    • Need some measure of Dispersion

      • Collect some data by giving study specimens the drug

      • Has a good idea that wild-type tumor size will be about 30 cubic millimeters

      • Effect size will be 15 cubic millimeters since using minimum of 50%

      • OR look within the literature to see if somebody has done similar things

      • If Dispersions are different, pick the bigger one

      • Using 12 cubic millimeters for Dispersion based on the literature

    • Big N shows 36 at Power of 0.8

    • Based on biological ethics, only 10 mice per group will be given tumors, so N = 20

    • N = 20 doesn't give us a good chance (power) of detecting whether the drug works

    • Increasing the alpha level to 0.05 makes our N = 20 look a lot better; a worked sketch follows below
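A minimal Python sketch of the tumor power analysis above, using statsmodels instead of JMP (an assumption on my part; the course uses JMP's Sample Size & Power platform). Inputs come from the notes: difference to detect = 15 mm³, SD = 12 mm³, alpha = 0.01. The numbers won't match JMP exactly, but should be in the same ballpark.

```python
# Minimal sketch: prospective power analysis for a 2-sample means comparison.
from statsmodels.stats.power import TTestIndPower

effect_size = 15 / 12          # Cohen's d = (difference to detect) / SD
analysis = TTestIndPower()

# n per group needed for power = 0.8 at alpha = 0.01 (big N = 2 * n)
n_per_group = analysis.solve_power(effect_size=effect_size, alpha=0.01, power=0.8)
print("alpha = 0.01:", round(n_per_group), "mice per group")

# power actually achieved with the ethically constrained n = 10 per group
print("power at n = 10, alpha = 0.01:",
      analysis.solve_power(effect_size=effect_size, nobs1=10, alpha=0.01))
print("power at n = 10, alpha = 0.05:",
      analysis.solve_power(effect_size=effect_size, nobs1=10, alpha=0.05))
```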

Standard Error of the Mean (SEM; SE)

  • SE is the standard deviation of a set of sample means repeatedly calculated from samples drawn from a statistical population or universe

  • SE is a measure of the precision of x̄ as an estimate of μ

  • as SE gets smaller, the precision of x̄ increases

    • SE = s (standard deviation) / √n

    • incorporates sd & n, two factors that will impact reliability

  • SD - a measure of the dispersion or spread of the sample data

  • SE - a measure of the sampling error or uncertainty in the sample mean as an estimate of the population mean
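A minimal Python sketch (toy data; assumes numpy) contrasting SD and SEM as defined above:

```python
# Minimal sketch: SD describes the data's spread; SEM describes the mean's precision.
import numpy as np

x = np.array([45.1, 44.8, 46.0, 45.5, 44.2, 45.9])   # hypothetical measurements
sd = x.std(ddof=1)
sem = sd / np.sqrt(len(x))    # SE = s / sqrt(n); shrinks as n grows

print(f"mean = {x.mean():.2f}, SD = {sd:.2f} (spread of the data), "
      f"SEM = {sem:.2f} (precision of the mean)")
```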

Confidence Intervals & the Student's t distribution (will never test over confidence intervals)

  • The t distribution

    • Family of distributions related to the normal distribution; shape depends on degrees of freedom

Reporting Rules and Conventions

  • Zar 2010 (Section 7.4 page 108)

  • "No widely accepted convention", but the measure of dispersion must be clearly stated

  • n should be stated somewhere

  • As Text in manuscript: (mean = 27.4 g +/- 2.80 SD) or SE or 95% CI

  • In a Table or Figure

Two-Sample Hypotheses

  • Do differences exist b/w two samples; i.e. are the two samples from two different statistical populations?

  • A number of types of comparisons:

    • means

    • medians

    • variances

    • CV

    • indices of diversity

  • We will explore 2-sample comparisons involving means:

    • comparison of independent samples

    • nonparametric tests of independent samples

    • comparison of paired samples

Comparison of Two Independent Samples

  • For Example: You measure hematocrit in two groups of 17 year olds, males (n=600) and females (n=600)

    • Is hematocrit different between groups?

    • Males - 45.8 +/- 2.8 SD

    • Females - 40.6 +/- 2.9 SD

  • What are the independent and dependent variables?

    • Independent: Sex (male or female)

    • Dependent: Hematocrit values

  • What would the data model types be in JMP?

    • Two columns, sex and hematocrit values

  • Independence of "Samples" or sample units?

    • Each individual person should be a sample unit

    • Can take multiple measurements per individual; just make sure to average them before entering into the data table

  • Pseudo replication?

    • Analyzing the data as if you have more true replicates than you actually do

  • A 2-tailed test always has the exact same null hypothesis

    • That the means are the same

Comparing Means from Two Independent Samples

  • Under the following experimental conditions:

    • 1 Dependent Variable that is continuous

    • 1 Independent Variable that is nominal (grouping/categorical) with 2 levels/groups/categories

  • Apply the following test if assumptions hold:

    • Student's t-test

    • Prob > |t| = 2-tailed (either direction)

    • Prob > t = 1-tailed (to the right)

    • Prob < t = 1-tailed (to the left)

    • Always double check degrees of freedom

  • Writing a Concise, Publication-Quality Interpretation of Results of a Statistical Test: "Manuscript Style" (Make it OBVIOUS)

    1. Be direct, say what you mean, mean what you say

    2. Don't just say means were different, or you rejected the null or that you detected a significant difference. State the direction of the effect (e.g., high/low, etc.)

    3. Provide the statistical test, df (or n), test statistic (only go to 2 decimal places), and P-value (only go to 3 decimal places).

    • This is typically put in parentheses following the sentence

  • People given drug G had significantly longer blood clotting times than people given drug B (Student's t-test, df = 11, t = -2.47, P = 0.031) (Figure 1)
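A minimal Python sketch (simulated values, not the hematocrit data set; assumes numpy and scipy) running a Student's t-test on two independent groups and reporting it manuscript-style:

```python
# Minimal sketch: Student's t-test for two independent samples.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
males = rng.normal(45.8, 2.8, size=30)      # hypothetical hematocrit values
females = rng.normal(40.6, 2.9, size=30)

t, p = stats.ttest_ind(males, females)      # equal_var=True -> Student's t-test
df = len(males) + len(females) - 2

print(f"Student's t-test, df = {df}, t = {t:.2f}, P = {p:.3f}")
# Manuscript style: state the direction of the effect, then the test in parentheses.
```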

Assumptions of the Two-Sample t-test

  • Both samples were taken randomly; i.e., sample units are independent of each other & unbiased

  • The dependent variable is normally distributed (or ~ normal)

    • (combination of frequency distribution, normal quantile plot, and S-W test)

  • Variances of the two groups are equal (or almost equal)

    • (eyeball the SDs - is there more than a 2-fold difference? - or use variance tests, e.g., Levene's Test)

  • What is the risk?

    • You risk elevating your Type I error rate above the stated alpha level

    • Violations are more serious if sample sizes are small (< 30; Zar 2010), you are doing a 1-tailed test, or your sample sizes are severely unbalanced

  • If all assumptions are confirmed = Run of the mill t-test

If Assumptions are severely violated, what do you do?

  • If normality (normal distribution) is off, apply a data transformation, if "corrected", proceed w/ the t-test using the transformed data set

    • Report testing results from transformed analysis, but usually report sample means and dispersion on the original scale in text and/or in graphs/tables

      • (assuming the variances are fine)

  • If only variances are violated you can conduct a t-test that has been "corrected for" unequal variances

    • Welch's t-test

      • Zar pg. 138

  • If normality cannot be "fixed"...

    • Conduct the Mann-Whitney U test which is a nonparametric equivalent of the t-test

      • JMP calculates the Wilcoxon Test which is equivalent to the Mann-Whitney test

        • Zar pg. 146

Wilcoxon test, Mann-Whitney U test, or the Wilcoxon-Mann-Whitney test

  • Nonparametric equivalent of the t-test for independent samples

  • Nonparametric test

    • Distribution-free test, where no or few assumptions are made about the shape of a distribution

    • Does not focus on any specific parameter such as the mean

  • Test specifics:

    • Used to test 2 groups

    • Assumes nothing about the underlying distribution or homogeneity of variances

    • H0: population distribution of sample 1 = sample 2 (test of medians)

    • Calculates test statistic based on ranks (position) of the raw data

    • Good when data set has extreme values in it, but has lower power than t-test, unless assumptions are severely violated

    • Why not just always apply a nonparametric test with every data set?

      • Usually has less power

Welch's test

  • A special derivation of the t-test

    • Used when normality holds but the variances are not equal

  • Can be identified when df are not a whole number

    • e.g. df on normal t-test are 11, on Welch's they may be 10.70
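A minimal Python sketch (toy data; assumes numpy and scipy) of the two fallback tests above, Welch's t-test and the Mann-Whitney U (Wilcoxon rank-sum) test:

```python
# Minimal sketch: Welch's t-test (unequal variances) and Mann-Whitney U.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
group_a = rng.normal(10, 1, size=12)
group_b = rng.normal(12, 4, size=12)        # much larger variance on purpose

# Welch's t-test: normality OK but variances unequal (its df is non-integer)
t_w, p_w = stats.ttest_ind(group_a, group_b, equal_var=False)
print(f"Welch's t-test: t = {t_w:.2f}, P = {p_w:.3f}")

# Mann-Whitney U / Wilcoxon rank-sum: nonparametric, based on ranks
u, p_u = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"Mann-Whitney U: U = {u:.1f}, P = {p_u:.3f}")
```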

Comparison of Paired Samples (Non-Independent)

  • In contrast to independent samples, in a "paired" design, sample units are linked or correlated in some way with a member in the other group(s)

    • i.e. members of a pair have more in common than with members of another pair

    • This dependency is planned or by design. However, you make a critical mistake if you apply the wrong inferential test

  • Paired t-test

    • Wilcoxon paired-sample test/Wilcoxon signed rank test (nonparametric equivalent)

  • What if you conduct test assuming independence?

    • Catastrophic failure

    • What happens to Type I error?

      • Increases if highest variance = lowest value

      • Decreases if highest variance = highest value
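A minimal Python sketch (toy before/after pairs; assumes numpy and scipy) of the paired t-test and its nonparametric equivalent, the Wilcoxon signed-rank test:

```python
# Minimal sketch: paired comparisons, where each subject contributes a linked pair.
import numpy as np
from scipy import stats

before = np.array([12.1, 11.4, 13.0, 12.7, 11.9, 12.5, 13.2, 12.0])
after  = np.array([11.5, 11.0, 12.4, 12.6, 11.2, 12.1, 12.5, 11.6])

t, p = stats.ttest_rel(before, after)         # paired t-test (df = n - 1 = 7)
w, p_w = stats.wilcoxon(before, after)        # Wilcoxon signed-rank test

print(f"paired t-test: t = {t:.2f}, P = {p:.3f}")
print(f"Wilcoxon signed-rank: W = {w:.1f}, P = {p_w:.3f}")
# Using ttest_ind() here would wrongly treat the samples as independent.
```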

Family-wise error inflation occurs when multiple comparisons are conducted on the same data at the same alpha level; it raises the Type I error rate



Multisample Hypotheses

H0: μ1 = μ2 = μ3 ...

  • Design:

    • 1 Dependent Variable (continuous)

    • 1 Independent Variable (categorical/nominal): 3 or more levels

  • Why not conduct a series of t-tests?

    • Type I Error is inflated beyond your stated alpha

    • Type I errors accumulate with each statistical test conducted on the same data set

      • Experimentwise or familywise error rate - must be controlled

Analysis of Variance (ANOVA)

  • The ANOVA family of tests are the most commonly applied statistical tests

  • Inferences about means are made by analyzing variability in the data

  • One model is constructed that includes all means simultaneously; therefore, it controls for familywise error

  • F-value (F-ratio) is the test statistic (Sir Ronald Aylmer Fisher 1918)

    • A factor/treatment is an independent variable whose values are controlled and varied by the experimenter (e.g., drug type)

      • Are categorical/nominal variables

    • A level is a specific value of a factor (e.g., drug A, drug B, drug C)

  • Analyzes and partitions sources of variation in a dataset

    • 2 kinds of variability

      • Between sample means (among groups)

      • Within groups

    • Total variability comprises within group variability and variability between groups

    • Between Group Variability

      • Treatment effects

        • Group

    • Within Group Variability

      • Individual differences

      • Errors of measurement

        • Error

    • F = Group/Error

    • As between-group variation gets bigger relative to within-group variation, the F value gets bigger; a higher F value is stronger evidence of a treatment effect

  • In ANOVA, the total variation in the response measurements is divided into portions that may be attributed to various factors, (e.g., amount of variation due to Drug A and amount due to Drug B) Which factor(s) or combination of factors account for significant amounts of the total variation?

    • Partitioning of the variance within the data set

    • If a factor/treatment represents a lot of the total variability relative to variability within groups (error) then it is an important “player”

  • Example: Sandwich types. On One-Way ANOVA Powerpoint

  • ANOVA-F distribution is the underlying distribution

  • F = (Between Group Variability/Within Group Variability)=(MS-group/MS-error)

    • MS stands for Mean Square

  • H0: F = 1 -> No treatment effects (sample means are drawn from same population). (No Sandwich effect)

  • Large F -> Means are different (sample means are from different populations). (There is a Sandwich effect)

Summary of Logic

  • Calculate two estimates of the population variance: MS-error, based on variability within groups, is independent of H0; MS-group, based on variability among group means, is inflated by treatment effects when H0 is false

Calculations for the ANOVA

  • In order to calculate MS-groups and MS-error we must first calculate the appropriate sums of squares (SS)

  • SS-total

    • Represents sum of squared deviations of all observations from the grand mean

      • SS-total = SS-group + SS-error

  • SS-group

    • Sum of squared deviations of group means from the grand mean. In effect, a measure of differences between groups

      • SS-group = Σ nᵢ(x̄ᵢ - x̄grand)², where x̄grand is the grand mean

  • SS-error

    • Sum of squared deviations within each group. Usually obtained by subtraction

      • SS-error = SS-total - SS-group

Degrees of Freedom

  • In order to calculate MS-group and MS-error we need to know the degrees of freedom associated with SS-group and SS-error

    • df-total = N - 1 (where N is total number of observations)

    • df-group = k - 1 (where k is the number of groups)

    • df-error = df-total - df-group

  • MS-group = (SS-group/df-group)

  • MS-error = (SS-error/df-error)

F-value

  • Having calculated MSgroup and MSerror we can now calculate F

    • F = MS-group / MS-error

  • Between-groups estimate of the population variance is much larger than the within-groups estimate → F value greater than 1

  • How much larger than 1.0 must the value of F be to decide that there are differences among the means?

  • Use tables of the F distribution, Zar Table B4, Appendix 21.

    • Gives critical values of F corresponding to the degrees of freedom for the two mean squares (dfgroup and dferror).

      • dfgroup = numerator df (2)

      • dferror = denominator df (12)

        • From the table (alpha = 0.05): Fcrit = 5.10; obtained F2,12 = 8.45

        • Because Fobt > Fcrit we can reject Ho and conclude that the groups were sampled from populations with different means. There is an effect of Sandwich Type

The ANOVA Table (HAVE TO BE ABLE TO BUILD FOR MIDTERM)

  • Source ----> SS ----> df ----> MS ----> F ----> P-value

  • Group ----> SS-group ----> k - 1 ----> MS-group ----> MS-group / MS-error ----> P

  • Error ----> SS-error ----> N - k ----> MS-error
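A minimal Python sketch (made-up data in a 3-group, n = 5 layout; assumes pandas and statsmodels, not part of the course's JMP workflow) that builds the same ANOVA table (SS, df, MS, F, P):

```python
# Minimal sketch: one-way ANOVA table with statsmodels.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# hypothetical data: k = 3 groups, 5 observations each (N = 15 -> df 2 and 12)
df = pd.DataFrame({
    "group": ["A"] * 5 + ["B"] * 5 + ["C"] * 5,
    "y": [21, 24, 23, 25, 22,  30, 28, 31, 29, 32,  26, 27, 25, 28, 26],
})

model = smf.ols("y ~ C(group)", data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=1)   # rows: group (model) and residual (error)
print(anova_table)                               # columns: df, sum_sq, mean_sq, F, PR(>F)
```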

JMP Analysis of One-Way ANOVA

  • Fit Model

    • Do NOT go into Fit Y by X and conduct the One-Way ANOVA !!!

  • Dependent variable goes into Y box

  • Add independent variables into model effects (bottom) box

  • JMP has different columns for every Independent Variable

  • P-value in JMP under Analysis of Variance = Prob > F

  • Capture Analysis of Variance and Effect Test boxes to give evaluation of the Omnibus Hypothesis

What is a Residual?

  • Distance between the Observed Y and the Predicted Y on a Y by X chart with line of fit

  • How to graph Residuals in JMP

    • 1 variable is Nominal, 1 variable is Continuous

    • Fit Model & run ANOVA

    • Click on Response carat on top left and Click Save Columns

    • Click Residuals, then puts it into spreadsheet

    • Testing assumptions using Residuals in JMP

    • Analyze distribution, add residuals into columns, check distributions, quantile plots, Shapiro-Wilk test

      • Shows us distribution and assumptions of the data

        • If non-normal, transform the RAW DATA and NOT the residuals

        • Then re-find the residuals using the transformed data and THEN test assumptions

      • Run the ANOVA and click Row Diagnostics

        • If the Residual by Predicted Plot is not shown by default, add it using Row Diagnostics
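A minimal Python sketch (same made-up data as the ANOVA-table sketch; assumes pandas, statsmodels, scipy, matplotlib) of the "save residuals, then test assumptions on them" workflow described above:

```python
# Minimal sketch: extract residuals from a fitted one-way ANOVA and check them.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy import stats
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "group": ["A"] * 5 + ["B"] * 5 + ["C"] * 5,
    "y": [21, 24, 23, 25, 22,  30, 28, 31, 29, 32,  26, 27, 25, 28, 26],
})
model = smf.ols("y ~ C(group)", data=df).fit()

residuals = model.resid                 # observed y minus predicted (group mean)
predicted = model.fittedvalues

print("Shapiro-Wilk on residuals:", stats.shapiro(residuals))  # normality check

# residual-by-predicted plot: points should scatter equally around 0 for each group
plt.scatter(predicted, residuals)
plt.axhline(0, linestyle="--")
plt.xlabel("Predicted")
plt.ylabel("Residual")
plt.show()
```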

Assumptions of ANOVA (Step 1)

  • Observations/sample units are independent of each other. (i.e. no systematic biases within the data set). Best achieved via random sampling

  • The data are normally distributed, better yet, the residuals are normally distributed

    • Save residuals to the data spreadsheet

    • Examine frequency distribution, normal quantile plot, and Shapiro-Wilk of residuals & interpret

  • Homogeneity of variance (i.e. the variances among groups are equal)

    • Examine the plot of residuals (residual by predicted plot) vs the predicted values. Are the points equally scattered for each group

  • Example statement: Pig mass varied significantly with type of food (One-way ANOVA, F with numerator df (model) and denominator df (error) as subscripts, P-value). The next sentence or two would add biological / supporting stats to build the answer. (Slide 22)

Assessing Normality within the ANOVA framework

  • In JMP:

    • Fit the ANOVA model

    • Go to "Save Columns"

    • Click on "Residuals"

    • Residuals should be in the data spreadsheet

A Significant Overall F... What's next?

  • A significant overall F-test does not indicate that all group means are different from each other

    • Don't know how many means are different, nor which means are different

  • Due to experiment wise error inflation, you cannot proceed with a series of run-of-the-mill t-tests

    • The proper statistical approach is to employ a multiple comparison test (i.e. post hoc testing)

      • Do NOT go through with this if you do not reject the Omnibus hypothesis in Step 1

  • What if the overall F-test is not significant?

    • You cannot proceed

Multiple Comparison Tests - (Parametric) (Step 2)

  • Also known as post hoc or a posteriori tests

  • Many different ones

    • Tukey test, Student Newman-Keuls test, Duncan test, LSD test, Scheffe's test, Fisher test, Bonferroni adjusted t-tests (a special case)

  • Their application is debated in the literature and there is no absolute agreement on the best to use

    • Although, Tukey & SNK are the most commonly employed and, therefore, accepted techniques

  • They operate under the same assumptions as ANOVA and must follow a significant F test

  • Post hoc testing usually involves testing all possible combinations of means, even comparisons you might not be interested in

    • This is why post hoc testing is often referred to as the testing of "unplanned comparisons"

  • Post hoc tests have built-in procedures that correct for experiment wise error and its influence on Type I error inflation

    • Each one differs slightly in how conservative it is

Post hoc Testing: The Tukey HSD Test

  • Rank the means in ascending order

  • First, compare largest to smallest, then largest to next smallest, etc.

  • You will need to use the table of critical values of the q distribution on Zar pg. 723
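A minimal Python sketch (same toy 3-group layout; assumes pandas and statsmodels) of a Tukey HSD run after a significant overall F, in place of the hand calculation with Zar's q table:

```python
# Minimal sketch: Tukey HSD multiple comparisons with statsmodels.
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

df = pd.DataFrame({
    "group": ["A"] * 5 + ["B"] * 5 + ["C"] * 5,
    "y": [21, 24, 23, 25, 22,  30, 28, 31, 29, 32,  26, 27, 25, 28, 26],
})

tukey = pairwise_tukeyhsd(endog=df["y"], groups=df["group"], alpha=0.05)
print(tukey)   # one row per pairwise comparison: mean difference, adjusted P, reject?
```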

Post hoc Testing: The Student Newman-Keuls test (Never have to calculate in this class)

  • SNK is conservative enough (i.e. it controls experimentwise error) but it has more power than the Tukey test; SNK calculations are very similar to Tukey

  • "Multiple Range Test". The "family" of comparisons changes

Graphical Display of post hoc Results

  • Put explanation of notation in figure caption along with results of F-test

  • Start by assigning A to the mean(s) of highest magnitude

  • Shared letters indicate means were not significantly different

  • Strontium concentrations varied significantly (F4,25 = 56.2, P < 0.001) across water bodies, and concentrations were highest in Rock River, moderate in Angler’s Cove, Appletree Lake, and Beaver Pond and lowest in Grayson’s Pond (SNK) (Figure 1).

Tukey vs. SNK

  • Both tests adequately control experiment wise error rate and are appropriate post hoc tests when multiple comparisons are desirable following a significant F test

  • Both can be applied at a specified alpha level (e.g., 0.05)

  • Both are better approaches than multiple t-tests

  • Tukey will result in fewer Type 1 errors than SNK

  • SNK has more power than Tukey (>3 means)

  • I apply Tukey when I want a more conservative test and SNK when the research is more exploratory

  • Tukey is probably more commonly used by Biologists; SNK common in Psychology (Zar recommends Tukey)

The Bonferroni Method: An Additional Way to Control Experimentwise Error

  • The Bonferroni adjustment to alpha levels is commonly used to control experimentwise error in situations where multiple tests are applied (e.g. post hoc comparisons and multiple correlations)

  • α = 0.05 / # of comparisons

    • You can “start” with whatever alpha you want

    • For example: You want to conduct 5 tests (denominator) @ an initial stated alpha of 0.05 (numerator)

    • 0.05/5 = 0.01

      • All 5 tests would actually each be conducted @ alpha = 0.01

      • 5 comp. = 0.01, 8 comp. = 0.006, 12 comp. = 0.004

  • An acceptable approach to post hoc testing is to conduct multiple t-tests but with Bonferroni adjusted alpha levels for each comparison
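A minimal Python sketch (hypothetical P-values; assumes statsmodels) of the Bonferroni adjustment above, which is equivalent to testing each comparison at alpha divided by the number of comparisons:

```python
# Minimal sketch: Bonferroni-adjusted decisions for 5 comparisons at stated alpha = 0.05.
from statsmodels.stats.multitest import multipletests

p_values = [0.005, 0.012, 0.030, 0.040, 0.20]     # hypothetical P-values from 5 t-tests

reject, p_adjusted, _, alpha_per_test = multipletests(p_values, alpha=0.05,
                                                      method="bonferroni")
print("per-comparison alpha:", alpha_per_test)     # 0.05 / 5 = 0.01
print("adjusted P-values:   ", p_adjusted)
print("reject H0?           ", reject)
```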

Dunnett's Test (Control Group vs Other Groups Individually)

  • Accepted post hoc test provided in JMP for this special case

Multiple Comparison Study

  • We learned 3 techniques to control experimentwise error during post hoc testing:

    • Tukey Test

      • built-in adjustments/corrections such that you actually conduct the test at the stated alpha

    • SNK

      • built-in adjustments/corrections such that you actually conduct the test at the stated alpha

    • Bonferroni Method

      • Directly adjust stated alpha based on # of comparisons (You don’t have to do all possible comparisons) All three approaches are conservative relative to not controlling experimentwise error.

    • Bonferroni Method is ultraconservative, particularly at > 5 comparisons (0.01)

  • Basically, you pay a “penalty” when you test all possible, unplanned comparisons following a significant F test because these comparisons have been adjusted to control for experimentwise error

  • There is another option that circumvents being penalized, but you cannot make all pairwise comparisons

Nonparametric ANOVA: The Kruskal-Wallis test

  • Apply this nonparametric equivalent to One-way ANOVA when k>2

  • It is a distribution-free method that analyzes the ranks of the data

  • Sometimes called "ANOVA by ranks"

  • ANOVA is generally more powerful, but K-W provides an alternative when assumptions are not met and a transformation doesn't help

  • The test statistic is H

  • The K-W test is equivalent to the Omnibus F-test in ANOVA

K-W in JMP

  • Use the Fit Y by X platform

  • Go to "nonparametric"

  • Choose "Wilcoxon"

  • Desired info is under 1-way Test, ChiSquare Approximation

  • Do NOT report ChiSquare value

  • DO report df, test statistic, and p-value
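A minimal Python sketch (toy data; assumes scipy) of the Kruskal-Wallis test, where H is the test statistic:

```python
# Minimal sketch: Kruskal-Wallis "ANOVA by ranks" for k = 3 groups.
from scipy import stats

group_a = [2.1, 2.4, 2.2, 2.8, 2.5]
group_b = [3.9, 3.5, 4.1, 3.8, 3.6]
group_c = [2.9, 3.1, 2.7, 3.0, 3.2]

h, p = stats.kruskal(group_a, group_b, group_c)
df = 3 - 1                                   # k - 1
print(f"Kruskal-Wallis: H = {h:.2f}, df = {df}, P = {p:.4f}")
```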

Post hoc testing Following significant K-W test

  • Dunn Method for Joint Ranking - (Zar pg. 240-241)

    • Preferred, more powerful method for nonparametric

  • Steel-Dwass procedure

    • Less power, doesn't work well when sample sizes are unequal

  • Wilcoxon all pairs (apply Bonferroni adjusted alpha)

    • Less power; highly conservative when there are 5 or more groups

  • In JMP

    • Fit Y by X

    • Run ANOVA

    • Click carat and choose "nonparametric"

    • Click "Nonparametric Multiple Comparisons"

    • Will show up under "Nonparametric Comparisons"

    • Report p-value and possibly Z-value

Planned Comparisons (Contrasts) vs Unplanned Comparisons

  • Typically, when you design an experiment with multiple levels of the independent variable, you have particular comparisons of interest in mind

  • Planned comparisons are stated a priori while unplanned comparisons are a posteriori, or "thought of after the data are collected"

  • You pay a price for conducting post hoc tests because they incorporate a correction for experimentwise (family wise) error

  • In contrast, planned comparisons (at least some special combinations) are made at the stated alpha, even if the omnibus F is not significant, within the ANOVA itself because they partition the SS-Model

  • Orthogonal Contrasts/Comparisons

Statistical Orthogonality

  • Usually in reference to groups or multiple independent variables

  • Non-overlapping, independent, not correlated

  • Assumption for planned contrasts & multiple regression modeling

Set of Planned Comparisons Must = Orthogonal Contrasts

  • If you want to conduct planned comparisons, you need to decide how many and which ones to make

  • To enjoy the luxury of testing multiple comparisons at the stated alpha, you must follow certain rules (i.e. the comparisons must be orthogonal)

    • A full set of orthogonal contrasts completely partition the SS-Model

    • Therefore, they represent independent pieces of information (i.e., this allows you to work at the originally stated alpha)

    • There are up to a-1 possible contrasts that can comprise a full orthogonal set (but you don't have to conduct all a-1 comparisons)

      • a = # of groups

    • Planned contrasts are 1 df comparisons & you cannot use over the a-1 df

  • How do you establish a set of orthogonal contrasts?

Coding Planned Contrasts

  • Coding is achieved by assigning weights/coefficients to groups to indicate contrasts

  • Rules

    • Groups coded with positive weights will be compared to groups coded with negative weights

    • The sum of weights for a single comparison/contrast should be zero

    • Group(s) not involved in a specific comparison is/are given a zero

    • To be orthogonal, the sum of the products of the corresponding coefficients of any two contrasts must equal zero

  • In JMP

    • Build ANOVA

    • Find normal Sum of Squares and record

    • Click carat where you'd normally run Tukey test

    • Build table using positives and negatives, adding new column after every row

    • Click Done and look at the SS; the contrast SS must be less than or equal to the model Sum of Squares
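A minimal Python sketch (my own example contrasts for a = 4 groups; assumes numpy) checking the coding rules above numerically:

```python
# Minimal sketch: verify a full orthogonal set of a - 1 = 3 planned contrasts.
# c1 compares groups 1+2 vs 3+4; c2 compares group 1 vs 2; c3 compares group 3 vs 4.
import numpy as np

c1 = np.array([ 1,  1, -1, -1])
c2 = np.array([ 1, -1,  0,  0])
c3 = np.array([ 0,  0,  1, -1])

for name, c in [("c1", c1), ("c2", c2), ("c3", c3)]:
    print(name, "weights sum to", c.sum())      # each contrast must sum to 0

print("c1·c2 =", np.dot(c1, c2))   # sums of products between pairs of contrasts
print("c1·c3 =", np.dot(c1, c3))   # all 0 -> the set is orthogonal
print("c2·c3 =", np.dot(c2, c3))
```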

Planned Comparisons: Wrap-up

  • Incorporate planned comparisons if you can to avoid experimentwise error.

  • Can be a useful approach if grouping “groups” for comparisons is insightful and a goal of the research.

  • It’s best if each comparison represents a unique portion of the SSModel so that comparisons meet the orthogonal requirement.

  • You don’t have to perform all a-1 comparisons, but beware of “unexplained” blocks of variance in SSModel.

  • Don’t “force” orthogonality. In other words, if the planned comparisons of interest aren’t orthogonal, proceed but Bonferroni adjust the alpha levels for each comparison.

  • If all pairwise comparisons of groups are of interest, you should probably just proceed with Tukey or SNK (Season example)



Data Transformations (Zar Chapter 13)

  • To apply parametric statistics, the data set must meet (or approximate)
    the assumptions of normality, equality of variances, and that the magnitude
    of the variances don’t increase with the magnitude of the means (nonadditivity).

  • If you judge that the data violate assumptions, then one option is to try and “correct” the data by applying a transformation.

  • When applying a transformation you change the raw data to a different form or scale. (e.g., F to C is a transformation)

  • After you perform a transformation, and judge the transformation “fixed” the data, conduct parametric tests on the transformed data set. Transform all data,
    not just one level of a variable!

  • Complications arise when reporting means and variability around the means following transformation. You should probably report the mean in the original scale. The most appropriate thing is to report the antilog of the transformed mean (geometric mean). I don’t see people doing this? How to report the variability is another issue ... Be sure you inform readers you analyzed transformed data!

Three Common Transformations

  • The Logarithmic Transformation

  • The Square Root Transformation

  • The Arcsine Transformation

The Logarithmic Transformation

  • X'=log10(X)

  • X' = log10(X + 1) --> use when the data contain 0's or to avoid (-) values

  • The log family of transformations are the most common

  • It is a variance-stabilizing transformation that will also address nonadditivity and non-normality if the data are right skewed

  • You can apply a log of any base, but log10 appears to be used the most

  • Beware of "log" vs "log10" vs "ln" – this is particularly important when reporting a model designed to predict y's based on inputs of x

  • Always check your transformation with a calculator

The Square Root Transformation

  • X'=SqRt(X+0.5)

  • Variance-stabilizing transformation, particularly when variances increase as the means increase, also when the variances & means are of similar magnitude and aren't independent of each other (i.e. Poisson distribution)

  • Helpful to try this transformation if log doesn't work, especially when a nonparametric tool is not at your fingertips

  • May help transform percentage data when data range is between 0-20% or between 80-100%

  • Similar reporting issues (square the transformed mean & calculate CI's)

The ArcSine Transformation

  • p' = arcsin(√p)

  • Proportions tend to form a binomial distribution vs a normal distribution

  • This transformation will "centralize" the data - bring values closer to 50%

  • Arcsin is the inverse sine (sin^-1)

  • Radians vs degrees .. Ugh!

  • Check your transformation vs Zar Appendix Table B.24
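A minimal Python sketch (hypothetical values; assumes numpy) applying the three transformations above:

```python
# Minimal sketch: log, square root, and arcsine transformations.
import numpy as np

counts = np.array([0, 3, 12, 45, 160])              # hypothetical right-skewed counts
proportions = np.array([0.05, 0.10, 0.50, 0.90, 0.95])

log_t = np.log10(counts + 1)                 # X' = log10(X + 1), handles the zero
sqrt_t = np.sqrt(counts + 0.5)               # X' = sqrt(X + 0.5)
arcsine_t = np.arcsin(np.sqrt(proportions))  # p' = arcsin(sqrt(p)), in radians

print("log10(X + 1):   ", np.round(log_t, 3))
print("sqrt(X + 0.5):  ", np.round(sqrt_t, 3))
print("arcsin(sqrt(p)):", np.round(arcsine_t, 3))
# Analyze the transformed values, but tell readers the analysis used transformed data.
```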

Only Pre-Midterm Topics Above


Power and Sample Size in ANOVA

  • Power = 1-Beta

  • In JMP

    • Set up regular Power Analysis

    • Choose k sample means option

    • Set alpha for Omnibus F test

    • Enter SD, variability among all groups combined

    • Enter estimated means of each group (represent smallest detectable difference)

    • Leave sample size & power blank to examine power curves

    • Sample size output is big N; remember to divide N by k (# of groups) to get n per group

    • Reports sample size required to reject the Omnibus
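A minimal Python sketch (hypothetical group means and SD; assumes statsmodels) of an ANOVA power analysis. Converting the group means and common SD into Cohen's f is my own step to roughly mirror JMP's "k sample means" input, not the course's procedure.

```python
# Minimal sketch: total N for a one-way ANOVA at a given power.
import numpy as np
from statsmodels.stats.power import FTestAnovaPower

group_means = np.array([20.0, 24.0, 28.0])   # smallest meaningful group differences
sd = 6.0                                     # within-group SD
k = len(group_means)

# Cohen's f = SD of the group means around the grand mean, divided by the within-group SD
cohens_f = np.sqrt(np.mean((group_means - group_means.mean()) ** 2)) / sd

n_total = FTestAnovaPower().solve_power(effect_size=cohens_f, alpha=0.05,
                                        power=0.8, k_groups=k)
print(f"Cohen's f = {cohens_f:.2f}, big N ~ {n_total:.0f}, n per group ~ {n_total / k:.0f}")
```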



Different Types of ANOVA Models

Fixed-Effects Model (Model I ANOVA)

  • Levels of the factor are specifically chosen by the experimenter; it is these specific groups about which the experimenter is trying to draw conclusions

  • Most common

Random-Effects Model (Model II ANOVA)

  • Levels of the factor are a random sample of all possible levels, a wider universe of groups

  • Instead of being concerned with effects of specific levels, you are trying to generalize effects across a random selection

Mixed-Effects Model (Model III ANOVA)

  • Some factors/treatments are fixed & some are random

  • SS & MS calculated the same; ANOVA table looks similar

  • Differences lie in the MS term used in the F test for some H0's and in how secondary analyses are performed



Factorial ANOVA (Zar Chapter 12)

Multisample H0:

  • One-Way ANOVA model

    • 1 Independent Variable

      • Nominal w/ more than two levels

    • 1 Dependent Variable

      • Continuous

Factorial Analysis of Variance

  • Consider the effects of more than 1 independent variable on a dependent variable simultaneously (in the same model)

  • Advantages

    • No need for multiple 1-way ANOVAs;

      • Can test for interaction among factors

  • Two-way ANOVA model

    • 2 Independent Variables

      • BOTH Nominal each w/ two or more levels

    • 1 Dependent Variable

      • Continuous

Two-way ANOVA/Two-factor ANOVA

  • 2 independent variables = 2 treatments = 2 factors = 2 main effects

    • Don't confuse with multiple levels or groups

  • Let's add a variable to the experiment that tested the effect of 5 sugars. Now we want to test the effect of both sugar and pH on pea growth

    • 5x2 factorial design

    • Each level of 1 factor is in combination with each level of the second factor; "crossed"

    • Balanced design (equal replication)

    • 10 combinations, 50 observations

    • Sugar will have an F value, pH will have an F value, Sugar x pH (interaction between 2 variables) will have an F value

    • Certain tests are informative (Tukey is good, provides pairwise comparisons)

      • Examine interaction plots to help us see visually why we have interaction

  • 2x2 Design Rat & Lard Example

    • Effect of lard type on food consumption of rats (N=12; n=6 per main effect)

    • 2 Main Effects (Fixed) each w/ 2 levels:

      • Fat (Fresh, Rancid)

      • Sex (Female, Male)

      • 3 replicates per subgroup

    • Fit Model Platform

    • Dependent Variable into Y box

    • Independent Variables & Interaction into model effects box

    • Analysis of Variance Prob>F = Omnibus F

    • Manuscript statement looks at Effect Test results & post hoc results

    • To find the % of variation explained by a certain effect, divide that effect's sum of squares by the total sum of squares

    • Grab LS Means Plots for both Effects and the Interaction

    • If Main Effect is not significant, then post hoc testing is not necessary

  • Publication Statement:

    • Consumption by rats was significantly higher for fresh fat versus rancid fat (Two-way ANOVA, F1,8 = 41.96, P < 0.001), and main effect “Fat” accounted for 79.0% of the total variation in rat consumption (Table/Figure 1). Sex (F1,8= 2.59, P = 0.146) and Fat*Sex (F1,8= 0.63, P = 0.450) were not significant.

    • The Effect Tests F & P values tell us whether each variable has an effect. REPORT EFFECT TESTS F & P VALUES

  • 3x2 Factorial Design Sandwich Data

    • Sandwich: Meatball, BLT, Spicy Italian

    • Season: Spring, Summer

    • Dependent Variable: Sales from ___ Subway Stores

      • Significant variation within Sandwich Types, so run post hoc analysis to see how and why

    • When a main effect is significant, and has more than 2-levels, proceed as you would in a one-way ANOVA

      • Sales of BLT are significantly higher than sales of Meatball & Spicy Italian

      • Sales of sandwiches did not differ between spring and summer

      • There was no interaction of the main effects, indicating that Season affected sales equally across all Sandwiches
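A minimal Python sketch (simulated data in the 2x2 rat-and-lard layout above; assumes pandas and statsmodels) of a two-way ANOVA with the interaction term included:

```python
# Minimal sketch: 2x2 factorial ANOVA (Fat x Sex) with interaction.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(11)
df = pd.DataFrame({
    "fat": ["fresh"] * 6 + ["rancid"] * 6,
    "sex": (["F"] * 3 + ["M"] * 3) * 2,
})
# hypothetical consumption: higher for fresh fat, no sex or interaction effect
df["consumption"] = np.where(df["fat"] == "fresh", 720, 530) + rng.normal(0, 40, 12)

model = smf.ols("consumption ~ C(fat) * C(sex)", data=df).fit()   # * adds the interaction
table = sm.stats.anova_lm(model, typ=2)    # F and P for Fat, Sex, and Fat:Sex
print(table)

# % of total variation explained by "Fat" = SS(fat) / SS(total)
pct = 100 * table.loc["C(fat)", "sum_sq"] / table["sum_sq"].sum()
print(f"Fat explains {pct:.1f}% of the total variation")
```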

Analysis of Significant Interactions

  • The interaction term allows for examination of the joint effect of factors on the dependent variable (advantage of factorial design)

  • If the interaction term is significant, it means the nature of the effect of one factor on the dependent variable is dependent on levels of the other factor

  • In factorial anova, you should first look to see if the interaction term is significant, because if it is, then biological conclusions made about the main effects are unreliable or not applicable in all instances (combinations)

  • Interaction among factors indicates the effect of the two Main Effects are not independent of each other

  • What if you have a significant Interaction Effect?

    • Effect of Sex and Season on hematocrit of the dark-eyed junco:

      • No effect of Sex (p=0.26)

      • Significant effect of Season (p=0.021) (Hematocrit was higher in the Spring)

      • Significant interaction of Sex X Season (p=0.032)

        • The F-test of the interaction is enough statistical information to conclude that Hc of female juncos in spring is higher than Hc of female juncos in summer

        • Note: You could not draw strong conclusions about the Season effect without accounting for the significant Sex × Season interaction

Interpreting Interaction: Factors with >2-levels

  • A professor gives a final exam that's an essay. Students are randomly assigned to either take the exam with laptops or write in blue-books. Additionally, students are put into three categories based on typing ability: None, Moderate, Skilled. The instructor was interested in the effect of Method, Ability, and the interaction of those two on score on the essay. Grades assigned "blindly".

    • Method: Laptop, blue-book

    • Ability: None, Moderate, Skilled

    • Method X Ability

    • Dependent Variable: Essay score

  • Main Effects:

    • Ability - Significant

      • Proceed with post hoc testing (Tukey HSD)

    • Method - Not Significant

    • Interaction - Significant

      • Now, you must analyze the simple main effects. This entails examining the changes in effect of one factor over levels of the other

      • Focus on key comparisons, don't have to run all tests

  • Presenting results

    • For F-ratio of ability, present F 2,12 then p-value

    • For F-ratio of ability*method, present F 2,12 then p-value

    • There was a significant effect of Ability (Test, F, p=0.032), where scores of students with moderate typing ability were higher than scores of students with no typing ability (Tukey HSD). The effect of Method was not significant (F, p=0.901); however, there was a significant AbilityXMethod interaction (F, p=0.0465). Examination of simple main effects was inconclusive, but there was a trend of lower scores in students skilled in typing and using laptops versus skilled students using bluebooks (test).

  • A researcher is interested in studying the effect of group psychotherapy and medication on depression. 30 patients participated in the study

  • The researcher designed this experiment to examine if types of therapy, psychotherapy and medication, interact in their effect on depression

    • 2x3 factorial design

      • 30 total patients

      • 6 subgroups

      • 5 patients per subgroup

    • Psychotherapy: Psychotherapy, No Psychotherapy

    • Medication: Placebo, low Dose, High Dose

    • Dependent Variable: Depression scores

    • Both main effects are Statistically Significant

    • The Interaction Term is significant

      • Run a Tukey HSD after to find why

    • Group psychotherapy influenced subjects in the placebo and low dose treatments, but it had no influence on people given the high dose treatment

Efficient Use of Resources

  • Interested in testing the effects of 2 environmental variables, air temperature and nitrate, on the growth of cotton

  • For each independent variable, you have 3 levels

    • High, Medium, and Low

  • Produces a 3x3 design

  • For power purposes, you want to have 27 cotton plants per level of each main effect

  • Unexplained variance is reduced in the 2-way ANOVA over the 1-way ANOVA

    • Results in higher F-ratios

    • 2-way ANOVA has more Power

  • Incorporating variables into models that explain or account for observed variability in the dependent variable is a good thing, for this is one of our major goals as researchers

  • Two Types of Independent Variables incorporated into factorial Designs:

    • Experimental Variables

      • We are directly interested in the effect of all of those variables, including their interaction, on the dependent variable;

      • These are typically fixed effects

    • Control or Block(ing) Variables

      • Incorporated solely to reduce the amount of experimental error, giving more resolution for exploring effects of Experimental Variables of interest

      • Typically the Block Variable is a random effect

The Randomized Group Design

  • This is the "typical" factorial design we've considered so far

  • 3x2 factorial design with 2 experimental factors

  • Method and ability are fixed effects

    • Method: laptop, bluebook

    • Ability: none, moderate, skilled

    • Method*Ability

    • Dependent Variable: essay score

  • In this design, each cell (6 subgroup combinations) has multiple subjects or replicates (3). Multiple subjects were "randomly" assigned to each cell in the 3x2 design.

  • In this design, you apply a 2-way ANOVA and include the interaction term

The Simple Randomized Block Design

  • 5 m-squared of Bermuda grass

  • Enough space for 3 plots per location

  • Dependent Variable - amount of above ground grass (kg)

    • There is 1 replicate per cell

    • There are 5 replicates per level of Nutrient

    • The factor of interest is Nutrient

    • Can calculate Block Effect since n=3 per block

  • Why is it advantageous to include Location as a Block factor?

  • What is the unit of replication in this design?

The Simple Randomized Block Design (A Simple Mixed Model)

  • "Randomized Complete Blocks", "ANOVA w/o Replication"

  • Interspersion of treatments across "Blocks"

  • This design is used to reduce the amount of experimental error through the inclusion of a block variable that is usually a random effect factor

  • This design can be viewed as somewhat of a hybrid between a 1-way and 2-way ANOVA because you have 2 factors in the model, but you are only interested in the effect of one fixed Independent Variable (at least in the simple randomized block design)

  • The simple randomized block design has 1 fixed effect and 1 random effect

  • The statistical model is a mixed, Model III, 2-way ANOVA without replication

  • The interaction term is not included in the model because there is not enough replication per cell to calculate it

  • In the simple randomized block design, you assume no interaction between Experimental factor and the Block factor

  • In JMP:

    • 3 total columns

      • 1 fixed effect column

      • 1 random effect column

      • 1 dependent variable column

      • 1 dependent and 2 independent

        • Test assumptions of the dependent variable the same way we've been doing it

          • Save residuals and test assumptions

    • Analyze --> Fit Model

    • Dependent variable into Y

    • Fixed effect into model effects

    • To change a variable to a random effect click "Attributes" and then click Random Effect

    • If balanced, change "method" to "Traditional"

    • If imbalanced, change "method" to "REML"

    • Make sure "Effect Details" is enabled

      • Enables you to perform Tukey HSD and post hoc tests

Randomized Block - A Biological Example (Zar 12.4)

  • HO: The mean weight of guinea pigs is the same on four specified diets

  • 1 block comprised of 4 cages

  • 1 guinea pig per cage

  • 1 replicate of each diet per block (assigned randomly)

  • Expected gradients within barn:

    • Temperature

    • Light

    • Noise

    • Draft

  • Pigs in Blocks will experience similar conditions

    • Interspersion of Treatments

  • N=20 pigs

  • 5 replicates per diet

  • Columns

    • Block - nominal

    • Diet - nominal

    • Weight gain - continuous
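A minimal Python sketch (made-up weight gains, not Zar's data; assumes pandas and statsmodels) of the randomized block analysis above, treated as a two-way ANOVA without replication: Diet is the fixed effect of interest and Block is included only to absorb barn-gradient variation.

```python
# Minimal sketch: randomized complete block design, 5 blocks x 4 diets, 1 pig per cell.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

gains = [                              # hypothetical weight gains (g), one row per block
    [7.0, 5.3, 4.9, 8.8],
    [9.9, 5.7, 7.6, 8.9],
    [8.5, 4.7, 5.5, 8.1],
    [5.1, 3.5, 2.8, 3.3],
    [10.3, 7.7, 8.4, 9.1],
]
rows = []
for b, row in enumerate(gains, start=1):
    for diet, gain in zip(["A", "B", "C", "D"], row):
        rows.append({"block": f"block{b}", "diet": diet, "gain": gain})
df = pd.DataFrame(rows)

# no interaction term: with 1 replicate per cell there is no df left to estimate it
model = smf.ols("gain ~ C(diet) + C(block)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))   # F for Diet is tested against the residual MS
```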

Repeated Measures Design

  • Each subject receives all levels of Factor A

  • Slide 65 2-Way ANOVA ppt

  • Possibly subject to the "carry-over effect"

    • A subject goes through Treatment 1 and then Treatment 2, but going through the 1st treatment may have affected its response to the 2nd

  • Dependencies/correlations across treatments

  • Advantages:

    • Individuals/subjects are acting like "blocks" - homogeneity of potential sources of error

    • Experimental error introduced into the study due to variability between subjects can be accounted for

  • Also called a within-subjects or treatment-by-subject design

  • Subjects are receiving all levels of Factor A

  • Among subject variability can be accounted for and "factored out"

  • RMD design is similar to randomized block design b/c subjects function similarly to "blocks"

  • Intra subject dependencies present both positives & negatives

  • Dependencies are a negative if "carry-over" effects exist across treatments

  • Assess normality using residuals as in one-way ANOVA (will require re-entering data in traditional form)

  • Assess correlation structure (Test of sphericity)

Addressing the Sphericity Assumption

  • JMP provides a test for Sphericity using the Multivariate framework:

    • If the assumption is met, proceed with the unadjusted, univariate F-test

    • If the assumption is not met (Chi-Square <0.05 Sphericity Test)

        1. Apply an adjusted, univariate F-test (Geisser-Greenhouse or Huynh-Feldt)

        2. Apply an F-test generated from MANOVA - "Multivariate F" (multiple dependent variables)

    • In JMP:

      • Go into Graph Builder and compare individuals (ex. Cholesterol vs. Drug)

      • Add random effect into right-hand overlay (top right box above "Color")

        • Examine correlation structure within Subjects across Groups

        • Note the y-intercept variation (Subject Variation is Important)

Analyzing Repeated Measures Design using Mixed Model

  • Assess normality and variances using residuals in One-Way ANOVA

  • Assess correlations within subjects across groups visually (Graph Builder)

  • Paired t-Test is most appropriate post hoc analysis w/ Bonferroni adjusted alpha level, but can do Tukey HSD
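A minimal Python sketch (toy within-subjects data; assumes pandas and statsmodels' AnovaRM, which is my choice of tool, not the course's JMP mixed-model route) of a repeated-measures ANOVA in which every subject receives all levels of the factor:

```python
# Minimal sketch: repeated-measures (within-subjects) ANOVA.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# 6 hypothetical subjects, each measured under 3 drug conditions
df = pd.DataFrame({
    "subject": [s for s in range(1, 7) for _ in range(3)],
    "drug": ["placebo", "low", "high"] * 6,
    "cholesterol": [210, 200, 188,  230, 222, 205,  195, 190, 183,
                    250, 241, 228,  205, 198, 192,  220, 210, 199],
})

res = AnovaRM(data=df, depvar="cholesterol", subject="subject", within=["drug"]).fit()
print(res)   # univariate F-test; subject-to-subject variability is factored out
```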

Multiway Factorial Analysis of Variance

  • You can extend the basic ANOVA design to experiments with more than 2 factors

  • In these multiway designs you can examine the effects of numerous factors (3,4,5,etc) simultaneously, with interactions, in one model

  • Factors can be a combination of both fixed and random effects

  • Typically, you never see more than 3 or 4 factors (3-way & 4-way ANOVA)


Tests of Difference vs Tests of Relationships

  • Tests of Difference

    • Is this group different from that group(s)?

      • t-Tests, ANOVA's

      • Independent variable is typically nominal/categorical

  • Tests of Relationships

    • Is variable A related (co-vary) with variable B?

      • Correlation and Regression

      • Independent variable is typically continuous (but doesn't have to be, particularly in correlation, where there isn't a "dependent" or "independent" variable)

  • Correlation

    • To what degree does one variable vary with another? (Does not imply cause & effect)

  • Regression

    • To what degree is variable Y dependent on variable X? (Implies a cause-and-effect relationship exists)

Correlation

  • Research question: Are two (or more than 2) variables "associated" with each other?

  • Important Point: The research question is not one of cause & effect

    • Examples:

      • Do two methods of measuring blood pressure tend to give corresponding results?

        • Blood pressure measurements with 2 methods on the same units

      • How strongly associated are pairs of morphometric characteristics of grizzly bears?

        • Data points are 1 grizzly bear

        • Y and X values are leg length and arm length

      • Is there correspondence between concentrations of cadmium and lead in sediments of streams in a watershed impacted by industrial pollution?

        • Sample unit is sediment core sample

        • From that sample we get a measurement of [Cadmium] and a measurement of [Lead]. These are X and Y values

  • In JMP:

    • Make sure both are normally distributed & relationship is linear

    • Go to Analyze -> Multivariate Methods -> Multivariate -> Pairwise Correlations

      • Put Correlation value & P-value (r=0.952, p<0.001)

Simple Linear Correlation

  • Three questions:

      1. Are two measurement variables related in a linear fashion?

      2. If they are related, what is the direction (+ or -)?

      3. How strong is the relationship? (Differences, parallel to effect size)

  • Smoking Example: cig smoking/day & CHD mortality/10,000 people

  • Null H0: There is no correlation b/w Smoking and CHD. (correlation coefficient = 0)

  • X-axis and Y-axis placement does NOT matter in correlations

  • Do NOT put a line of best fit in the scatterplot. If adding a visual aid, add an ellipse

The Correlation Coefficient (r)

  • aka Pearson's r or the Pearson product-moment correlation coefficient

  • r is a measure of association between two variables (X&Y)

  • Two pieces of information obtained from r values:

      1. Sign (+ or -) of r indicates whether the association is positive or negative

      2. Size of r (from -1 to 1) indicates the magnitude of the association (further from 0 = stronger)

  • Since r values range from -1 to 1, 0.85 indicates a strong positive relationship exists b/w cigarette consumption and CHD

  • We can make this conclusion regardless of the underlying distribution of the two variables. In this sense, we view the r value as an index (rules of thumb)

The Correlation Coefficient (r): Test of Significance

  • There are no statistical assumptions associated with calculating r, and if an index is what you need to make the inference of interest, then stop here. (However, this is typically not how Biologists use r)

  • Assumptions of Normality

    • Both variables X&Y, were sampled randomly from a population with a normal distribution (bivariate normal distribution) and the relationship between the variables is linear

    • To calculate a p-value for r, X&Y need to be normally distributed and the relationship needs to be linear (DO IN GRAPH BUILDER)

      • If performing a transformation to one variable, apply to both

    • Nonparametric alternatives (Spearman's rs & Kendall's tau) have good power, so transformation is not necessary

Nonparametric Correlation (Ranks)

  • Spearman's correlation coefficient (rs); ranges from -1 to 1 or Kendall's tau. Analyses based on ranks

  • Apply when bivariate normal assumptions are violated, when data are ordinal, or the relationship is nonlinear

    • In JMP:

      • Analyze -> Multivariate Methods -> Multivariate

      • Then you can find "linear correlations" and "nonparametric correlations"

      • Reid usually goes with Spearman's over Kendall's

        • Report as (rs, p-value)

    • The more elliptical the scatter of points, the more intense the correlation

    • Be careful interpreting significant r's

    • As sample size increases, the critical value decreases

    • Two-tailed usually
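
A minimal scipy sketch of the rank-based alternatives described in this section; the values are arbitrary examples, not course data.

```python
# Spearman's r_s and Kendall's tau, both computed from ranks.
from scipy import stats

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 6, 5, 8, 7]

rs, p_s = stats.spearmanr(x, y)      # Spearman's rank correlation
tau, p_t = stats.kendalltau(x, y)    # Kendall's tau

print(f"Spearman r_s = {rs:.3f}, p = {p_s:.3f}")   # report as (r_s, p-value)
print(f"Kendall tau  = {tau:.3f}, p = {p_t:.3f}")
```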

Multiple Correlations: The Correlation Matrix

  • What if you have more than 2 measurement variables?

    • 8 variables: 28 total comparisons

    • alpha per correlation = ?

  • Holm-Bonferroni Method (Holm 1979) WILL NOT BE TESTED OVER

    • Order the p-values from smallest to largest

    • Test each in order and stop at the first non-significant outcome

      • H4 = 0.005

      • H1 = 0.01

      • H3 = 0.03

      • H2 = 0.04

    • Work the Holm-Bonferroni formula for the first rank:

      • HB = Target alpha / (n-rank+1)

      • HB = 0.05 / (4 - 1 + 1) = 0.0125
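
A minimal sketch of the Holm-Bonferroni walk-through above in plain Python, using the four example p-values and the target alpha of 0.05 from the notes.

```python
# Holm-Bonferroni: order p-values, compare each to alpha / (n - rank + 1),
# and stop at the first non-significant outcome.
alpha = 0.05
pvals = {"H4": 0.005, "H1": 0.01, "H3": 0.03, "H2": 0.04}

ordered = sorted(pvals.items(), key=lambda kv: kv[1])   # smallest to largest
for rank, (name, p) in enumerate(ordered, start=1):
    threshold = alpha / (len(pvals) - rank + 1)         # rank 1 -> 0.05/4 = 0.0125
    if p < threshold:
        print(f"{name}: p = {p} < {threshold:.4f} -> significant")
    else:
        print(f"{name}: p = {p} >= {threshold:.4f} -> stop; this and all later tests are non-significant")
        break
```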

The Limitations of Correlation

  • Correlation analysis indicates cigarette consumption and CHD are related. It also tells us the relationship is positive and relatively strong (r= 0.85)

    • BUT...

      • You may want to predict incidence of CHD for given levels of cigarette consumption or how much does CHD increase with a unit increase in cigarette consumption. This hypothesis of causality about the relationship requires Regression Analysis

Sparrow Wing Length as a Function of Age

  • Correlation--> X <--> Y (Covariation)

    • Just has 2 variables, technically not independent or dependent variables

  • Regression--> X ---> Y

    • Age (Independent) ---> Wing Length (Dependent)

A Regression Example

  • A snake physiologist wished to investigate the effect of temperature on the heart rate of juniper pythons. She selected nine specimens of approximately the same age, size, and sex and placed each animal at a preselected temperature between 2 and 18 ºC. After the snakes equilibrated to their ambient temperatures, she measured their heart rates. n=9

    • Temp (IV) (Fixed)

    • Heart Rate (DV)

Simple Linear Regression

  • Simple vs. Multiple regression (one predictor variable vs multiple predictors)

  • Linear regression vs non-linear regression

  • Simple Linear Regression:

    • Specifies a straight-line relationship between two variables

      • Predictor variable (X) (usually a Fixed Effect-Model 1)

      • Response variable (Y)

    • Specifies a predictive relationship between X&Y

    • SLR analysis involves producing

      • A regression line or "best fit" line through points on a scatterplot of X&Y

      • A regression equation that relates X&Y

Building the Regression Equation

  • Regression implies a functional (cause-effect) relationship between variables

  • Two components need to be calculated from the data:

    • Slope (b) (the "regression coefficient")

    • y-intercept (a)

    • Y=bX+a
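
The standard least-squares formulas behind these two components (not written out in the notes) are:

```latex
% Least-squares slope (b) and y-intercept (a) for the line Y = bX + a
b = \frac{\sum_{i}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i}(X_i - \bar{X})^2},
\qquad
a = \bar{Y} - b\,\bar{X}
```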

The Regression Analysis

  • The regression analysis calculates values of a and b for a data set so that the resulting equation is the best obtainable for the data

The "Best Fit" Line

  • Not all observed values of y fall on the line

  • The values of y that fall directly on the line are the predicted values "y-hat"

  • The sum of squares obtained by squaring the deviations y - ŷ is much smaller than the SS of y computed without consideration of x

Testing the Statistical Significance of the Regression Model

  • Evaluate significance with ANOVA where the F test is testing the overall significance of the model

  • 3 Sums of Squares calculations (& df's) are needed:

    • Total SS (df = N-1)

    • Regression or "Model" SS (df=1)

    • Residual/Error SS (df = N-2)

  • In JMP:

    • Fit "Y by X"

    • Input variables

    • Select "Fit Line" at the red carat to get "Linear Fit" results box

    • Results are in "Analysis of Variance" under Prob > F (significance of Model)

  • The F-test tells us we have a very low probability of committing a Type I error and that python heart rate does vary linearly with temperature

  • How good is the model?

    • Big F tells us that it's good

    • Calculate the % of total variation explained by the model (SS Model / SS C. Total)

The Coefficient of Determination (r^2)

  • Tells us what % of the variability in the dependent variable (y) is explained by the independent variable (x)

  • r^2 = SSmodel/SStotal

  • 93.9% of the variability observed in example

Using the Model: Predicting y from x

  • y=2.14+1.77x

  • Heart Rate=2.14+1.77(Temp.)

  • Plug in values of x and solve for y

  • Can predict heart rate at temps that were not tested (e.g., 5 and 9 ºC)

  • Model can be used by other researchers
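
A minimal scipy sketch of fitting a simple linear regression and using it to predict; the temperature/heart-rate numbers are invented, not the juniper python data, so the fitted coefficients will not match y = 2.14 + 1.77x exactly.

```python
# Simple linear regression: slope, intercept, r^2, P-value, and prediction.
from scipy import stats

temp_c     = [2, 4, 6, 8, 10, 12, 14, 16, 18]      # IV (fixed)
heart_rate = [6, 10, 13, 17, 19, 24, 27, 30, 34]   # DV

fit = stats.linregress(temp_c, heart_rate)
print(f"slope b = {fit.slope:.2f}, intercept a = {fit.intercept:.2f}")
print(f"r^2 = {fit.rvalue**2:.3f}, p = {fit.pvalue:.4f}")   # r^2 = SSmodel/SStotal

# Predict heart rate at temperatures that were not tested (e.g., 5 and 9 C)
for t in (5, 9):
    print(f"predicted heart rate at {t} C: {fit.intercept + fit.slope * t:.1f}")
```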

Assumptions of Model 1 Simple Linear Regression

  • The IV is usually fixed but can be random

  • The y observations are independent

  • The functional relationship is linear

  • Residuals are normally distributed

  • Variances are equal

    • After you've built the model, fit a line, and are looking at the JMP output...

    • Click on the red triangle beside "Linear Fit" (Use graph in analysis)

      • To check normality - click on "save residuals"

      • To assess variances - click on "plot residuals"

        • If non-normal or variances are violated, transform both variables
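
A minimal sketch of the same residual checks done outside JMP: save the residuals, run Shapiro-Wilk on them, and inspect them against the predicted values. The data are the same invented values as the sketch above, not course data.

```python
# Residual diagnostics for a simple linear regression.
import numpy as np
from scipy import stats

x = np.array([2, 4, 6, 8, 10, 12, 14, 16, 18], dtype=float)
y = np.array([6, 10, 13, 17, 19, 24, 27, 30, 34], dtype=float)

fit = stats.linregress(x, y)
predicted = fit.intercept + fit.slope * x
residuals = y - predicted                      # "save residuals"

w, p = stats.shapiro(residuals)                # normality of the residuals
print(f"Shapiro-Wilk on residuals: W = {w:.3f}, p = {p:.3f}")

# "plot residuals": residual vs predicted should show no pattern and roughly
# equal scatter; printed here, or plot with matplotlib if available.
for yhat, res in zip(predicted, residuals):
    print(f"predicted = {yhat:6.2f}   residual = {res:6.2f}")
```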

Regression or ANOVA?

  • Age as a continuous variable

    • Data from basically 13 levels of the IV (Age)

    • "Replicated" regression is best

    • Relationship between y and all x's within the range of values tested is quantified

Problem Background:

  • A researcher is interested in site-specific differences in body size among populations of rattlesnakes. Why an interest in body size? Well, reproductive traits in animals (e.g., number and size of offspring) often vary with body size. Populations may vary in body size due to differences in resource availability, resource quality, size-specific predation, population density, etc. Most importantly, size of rattlesnakes may vary with age.

  • How can the researcher examine for geographic differences in body size between two populations knowing that size will also vary due to differences in age?

    • Y-variable is body size (Continuous discrete)

    • Location (Categorical)

    • X-Variable is Age (Continuous discrete)

    • ANCOVA: Analysis of Covariance; 2 X-variables, one Continuous & one Categorical

ANCOVA Requirements

  • 1 Dependent Variable (Continuous)

  • 1 IV (Categorical)

  • 1 Covariate (Continuous)

Covariate

  • Variable that is related to the DV, which you can't manipulate, but you want to account for its relationship with the DV

  • Increased sensitivity of tests of main effects and interactions since usage of a covariate will result in a reduction of error variance

ANCOVA Assumptions

  • Residuals are normally distributed and variances are homogenous

  • Linearity - significant linear relationship between covariate and DV

  • Since covariate is used as a linear predictor of the DV yet it is not a fixed effect, the covariate is assumed to be measured without any error

  • Homogeneity of regressions (i.e. no significant interaction of GroupXCovariate)

ANCOVA In JMP:

  • Use the "Fit Model" Platform

  • Model should contain:

    • IV (Drug)

    • Covariate (X)

    • Interaction Term (Drug*X)

  • Look if Covariate is significant. (Example is significant)

  • Look at Interaction Term (Example is NOT significant, lines are statistically parallel)

  • Look at IV (Example is not significant, no drug effect)

    • Adjusted Means

      • When using ANCOVA, the means for each group get adjusted by the Covariate-Dependent Variable relationship

      • If the Covariate has a significant relationship with the Dependent Variable then comparisons are made on the adjusted means

      • When doing ANCOVA, you should graph/report adjusted means

ANCOVA - A Biological Example

  • Fish inhabiting caves often have small eyes relative to fish living in streams on the surface, which is thought to be an example of adaptation to life in a cave environment. Banded Sculpin is a common fish found in surface streams of North America, but it can also sometimes be found living in caves. A cave population of Banded Sculpin in Missouri is showing signs of cave adaptation similar to "true" cavefishes. Researchers are interested in whether or not sculpin in surface streams have different eye size relative to sculpin living in caves. Eye size of individual sculpin may also vary with total length of the fish

  • Construct the model

    • DV: Y-variable: (Eye size)

    • IV: (Location)

    • Covariate: X-variable (Total length)

    • Interaction Term: Location*Total length
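
A minimal sketch of this ANCOVA model built outside JMP with statsmodels; the data values are invented placeholders, and only the model structure (DV ~ IV + covariate + interaction) mirrors the notes.

```python
# ANCOVA-style model: eye size ~ location + total length + location:total length.
# All numbers are invented for illustration.
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

sculpin = pd.DataFrame({
    "eye_size":     [2.1, 2.3, 2.6, 2.9, 3.1, 1.6, 1.8, 2.0, 2.2, 2.5],
    "location":     ["surface"] * 5 + ["cave"] * 5,
    "total_length": [60, 68, 75, 84, 92, 58, 66, 74, 85, 95],
})

model = smf.ols("eye_size ~ location * total_length", data=sculpin).fit()
print(anova_lm(model, typ=2))   # effect tests for location, the covariate,
                                # and the homogeneity-of-slopes interaction
                                # (Type II SS; JMP's defaults may differ)
```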

Advanced Regression Techniques

  • Multiple regression (>1 IV)

  • Analysis of Covariance (ANCOVA)-mixture of regression & ANOVA

Analysis of Frequencies

  • Interest in the frequency that an event occurs...

  • How does an observed outcome compare to an expected outcome or distribution?

    • Is the sex ratio (M:F) in a population of box turtles the expected 1:1 ratio?

    • Do the frequencies of observed phenotypes conform to the expected 3:1 ratio?

    • Do mountain lions eat equal amounts of white-tailed deer and mule deer?

      • Goodness of Fit?

Data Characteristics

  • "Count data" - discrete number of observations

  • 1 variable with categories or "bins"

  • 2 or more categories/bins

  • Multiple independent observations within categories (5 minimum; >10 recommended)

Chi-Square Goodness of Fit Test

  • Quarter tossing

    • Probability of Heads? Tails?

    • Is observed significantly different from expected? Is the disparity due to random chance?

      • OR is the deviation not due to random chance?

  • JMP does test for us, but simple to calculate by hand

  • We can test to see if our observed frequencies "Fit" our expectations

    • This is the chi-squared Goodness-of-Fit test

    • Converts the differences between observed and expected frequencies into a test statistic

  • Nonparametric test (no assumptions about the underlying distribution)

  • Data are frequencies (counts)

  • Observations are independent

  • Categories have large enough expected frequencies. When there are 4 or fewer categories, none of the expected frequencies should be less than 5

Conducting Chi-Square Analysis: Goodness of Fit IN JMP

  • 2 variables, so 2 columns

  • Frequency/Count column & IV column

    • "Analyze" -> "Distributions"

      • Input IV into Y column spot

      • Input Frequency/Count data into Frequency spot

    • JMP spits out percentages

    • Click the caret next to the IV and click "Test Probabilities"

      • Input expected/hypothesized probabilities

        • Should add to 1.0

        • Run the test

          • ONLY report Pearson test results. ChiSquare value, df, & p-value

          • Report as: (Chi-Square test, ChiSquare value, df, p-value)
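
A minimal scipy sketch of a goodness-of-fit test; the counts (e.g., 46 male vs. 54 female box turtles tested against a 1:1 expectation) are hypothetical.

```python
# Chi-square goodness-of-fit: observed counts vs expected counts from
# hypothesized probabilities (which must sum to 1.0).
from scipy import stats

observed = [46, 54]                  # e.g., males, females
n = sum(observed)
expected = [0.5 * n, 0.5 * n]        # 1:1 ratio

chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)
df = len(observed) - 1
print(f"Chi-square test, X^2 = {chi2:.2f}, df = {df}, p = {p:.3f}")
```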

Chi-Square Test for Association or Independence

  • Are two categorical/nominal variables related/associated

  • Same data type and assumptions as Goodness of Fit Tests

  • Calculations are similar, except the expected frequencies come from the table margins rather than being specified a priori

  • Contingency Table

  • Example

    • H0: Age 0 male & female Ohio shrimp captures at McCallie Access, Mississippi River are not associated (do not differ) with month

    • HA: Captures of male and female Age 0 Ohio shrimp are related to month

      • Variable A: June, July, August

      • Variable B: Male Age 0, Female Age 0

        • Fit Y by X

          • Y: Sex

          • X: Month

          • Frequency: Count

            • JMP gives "Contingency Table" and resulting Chi-Square values (look at Pearson results)

              • Can apply Bonferroni adjusted alpha level, but not necessary if risk of family-wise error is low
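
A minimal scipy sketch of the test for association; the month-by-sex counts below are placeholders, not the Ohio shrimp data.

```python
# Chi-square test of independence on a 2 x 3 contingency table.
from scipy import stats

#              June  July  Aug
counts = [[30,   45,   25],      # Male Age 0
          [22,   50,   40]]      # Female Age 0

chi2, p, df, expected = stats.chi2_contingency(counts)
print(f"X^2 = {chi2:.2f}, df = {df}, p = {p:.3f}")
# 'expected' holds the expected cell counts under independence (from the margins)
```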








  • When continuous data are converted to ranks (ordinal scale), the distances between data values are not retained, but the data can be more desirable to work with

  • The reduction in variation between data points may allow better testing

Descriptive Statistics - way to summarize and organize data

Measures of Central Tendency

  • location of sample along the measurement scale

  • what is the location of the "typical" individual?

  • Arithmetic Mean

    • µ = population or universal mean (the true mean)

    • x̄ = sample mean (an estimate of the true mean)

  • Geometric Mean

    • antilog of arithmetic mean of log-transformed data

  • Median (M)

    • middle value of a ranked data set; most appropriate when data are highly skewed or you're dealing with data on an ordinal scale

  • Mode

    • the value that occurs most frequently; number of modes can be useful

  • Skewness

    • measure of symmetry

      • 0 = symmetrical (normal distribution)

        • Positive (> 0) = tail to the right

        • Negative (< 0) = tail to the left

Measures of Dispersion and Variability

  • the distribution or spread of measurements

  • Range

    • difference between largest & smallest observation

    • usually given as minimum & maximum value

  • Variance

    • σ² = population variance

    • s² = sample variance

    • mean of squared deviations of measurements from their mean; typically not reported since it is in different (squared) units from the original data; used to calculate many statistical tests

    • s² = Σ(xᵢ - x̄)² / (n - 1)

    • cannot be negative

    • increases as dispersion or variability increases

    • (n-1) = degrees of freedom (df); real units of information about deviation from the average

  • Standard Deviation (sd)

    • s = square root of variance (s²)

  • Coefficient of Variation (CV): CV = (sd/mean) x 100%

    • a measure of relative variability

    • has no units so is useful to compare sets of data collected on different scales

    • (e.g., morphological data in mm and m; Temperature to Dissolved Oxygen (DO))

    • most applicable to ratio scale data

  • Indices of Diversity: distribution of observations among categories
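
A minimal numpy sketch of these dispersion measures (plus the SE discussed later in the notes); the sample values are arbitrary.

```python
# Sample variance, SD, SE, and CV for a small made-up sample.
import numpy as np

x = np.array([12.1, 14.3, 13.8, 15.2, 16.0, 13.1, 14.9])

n        = x.size
mean     = x.mean()
variance = x.var(ddof=1)        # s^2: divide by (n - 1) degrees of freedom
sd       = x.std(ddof=1)        # s = square root of s^2
sem      = sd / np.sqrt(n)      # standard error of the mean (SEM section below)
cv       = sd / mean * 100      # coefficient of variation, in %

print(f"n = {n}, mean = {mean:.2f}, s^2 = {variance:.2f}, s = {sd:.2f}")
print(f"SE = {sem:.2f}, CV = {cv:.1f}%, range = {x.min()} to {x.max()}")
```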

Types of Distributions

  • Many statistical tests are based on assumptions that the data adhere to the properties of a given distribution

  • Discrete Distributions

    • Poisson distribution

      • items distributed randomly (independently)

    • Binomial distribution

      • two possible outcomes per trial, each with a fixed probability of occurrence

  • Continuous Distributions

    • Normal distribution

      • symmetrical; bell-shaped curve

    • t-distribution

      • symmetrical; related to normal distribution

    • Chi-square distribution

      • asymmetrical

  • Normal Distribution

    • symmetrical, continuous distribution

    • described by the mean and standard deviation (estimated by sample mean & sd)

    • most values lie in proximity of the mean

    • random samples of a given n from a normal population will be normally distributed

    • Central Limit Theorem: at some large n even means of samples from a non-normal population will approach normality (even means from Poisson & Binomial distr.)

    • normal distribution is the basis of many statistical tests

Are Sample Data Normally Distributed?

  • Despite CLT, sample data may not be normally distributed due to small n or, more typically, for unknown reasons

  • Will want to check sample data to see if it is approximately normally distributed

  • "Goodness-of-fit Tests": (not really recommended)

    • Kolmogorov-Smirnov goodness-of-fit test

    • Chi-square goodness-of-fit test

Normal Quantile Plot

  • If data are perfectly normal they will lie along a straight line, inside the LCI's

  • Lilliefors Confidence Intervals

    • used to test for normality in a graphical way; if points fall outside the CI (confidence intervals) then data are significantly different from normal at alpha = 0.05

Shapiro-Wilk test

  • What null hypothesis is it actually testing?

    • The distribution of the sample data is equal to the normal distribution

  • After Normal Quantile Plot, click continuous then normal

  • Go down to data and click under the Fitted Normal option triangle, then choose Goodness-of-Fit

  • Then JMP gives you the Probability so you can choose to reject or fail to reject

  • If not normally distributed...

    • ignore it?

      • Inflates the chance of committing a Type I error

        • When you reject the null when the null is actually true

    • transform raw data to "fix it" & resume test?

      • Doesn't provide much evidence that it fixed it

    • choose a nonparametric equivalent?

      • Usually the parametric tool has more statistical power than the nonparametric equivalent
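
A minimal scipy sketch of the Shapiro-Wilk check described above; the sample is arbitrary.

```python
# Shapiro-Wilk test of normality (H0: the sample comes from a normal distribution).
import numpy as np
from scipy import stats

sample = np.array([4.1, 5.0, 4.7, 5.3, 4.9, 5.6, 4.4, 5.1, 4.8, 5.2])

w, p = stats.shapiro(sample)
print(f"Shapiro-Wilk: W = {w:.3f}, p = {p:.3f}")
# p > alpha -> fail to reject (no evidence of non-normality)
# p < alpha -> reject; consider a transformation or a nonparametric test
```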

Statistical Testing and Probability

  • Probability is the likelihood of an event

  • Statistical tests provide a P-value: the probability of obtaining the observed result (or one more extreme) if the null H0 were true

  • At low P-values the null is rejected and the alternate is accepted

  • The lower the P-value, the more confident you are that the null is false

What is a "low" P-value?

  • Researchers arbitrarily set the probability used as the criterion for rejection of the null

  • This value is called the significance level or α (alpha)

  • Convention is to apply an alpha level of 0.050

  • If α = 0.05 then at P-values less than 0.05 you reject the null HO (i.e. means are "significantly different")



Statistical Errors in Hypothesis Testing (Zar Section 6.3)

  • In reality, the null hypothesis is either true or false

  • Because inferences are made from samples, there is always the possibility of making the wrong inference

  • 2 ways of making the wrong inference:

  • Type 1 Error

    • Rejecting the null hypothesis when in fact the null is true; a "false positive"; you determine the means are significantly different when in fact they are not (We must control this error rate)

    • α error

  • Type 2 Error

    • Not rejecting the null when in fact the null is false; you determine the means are not significantly different when they really are (Considered to be a "less dangerous" error)

    • Designing experiments that give you the best chance possible to reject the null when it is in fact false is the best way to avoid Type 2 Error

    • β error

Insert probability notes here

Prospective Power Analysis

  • performed during planning stages of a study to explore how changes in study design (e.g., n, alpha, and effect size) impact objectives/goals of the study including interpretations of statistical tests & potential outcomes

  • Common applications of Prospective Power Analysis are:

    • to determine n required to attain a desired level of power at a specified minimum effect size, alpha-level, and standard deviation

    • to determine power of a test when n is constrained logistically (perhaps you then need to adjust alpha if power is too low)

    • to determine the minimum detectable meaningful effect size (the question here is what is a biologically meaningful difference)

    • JMP gives you big N, you need little n to determine sample size per group. Divide big N by however many groups you have to obtain little n

    • Can increase alpha level to increase power

    • Can increase effect size to increase power

  • How to perform Power Analysis in JMP:

    • DOE > Design Diagnostics > Sample Size & Power

    • Depending on the scenario, choose the appropriate option

    • e.g., 2-sample means

    • Not messing around with Extra Parameters yet

    • Can change alpha level, Std. Dev = Dispersion, difference to detect = effect size

    • Std.Dev & Effect Size need to be in same units

    • Leave sample size & power blank

    • Click Continue & you will get a curve

    • Sample size on graph is always in Big N

    • Not using to get an exact # of sample size, power analysis is a guide

    • Increasing alpha level increases power, which will decrease sample size

    • Increasing Std.Dev (Dispersion across the Dependent Variable) increases sample size

    • Decreasing effect size increases sample size

Example: Clinical Research - Experiments investigating treatment of tumors

  • Will Drug A reduce the size of brain tumors?

  • Minimum Effect Size that is Biologically Relevant?

    • Using a minimum of at least 50%

    • Decided on an alpha level of 0.01, defended because they really want to make sure the Drug works

    • Need an idea of size of wild-type tumor to get 50% into units

    • Need some measure of Dispersion

      • Collect some data by giving study specimens the drug

      • Has a good idea that wild-type tumor size will be about 30 cubic millimeters

      • Effect size will be 15 cubic millimeters since using minimum of 50%

      • OR look within the literature to see if somebody has done similar things

      • If Dispersions are different, pick the bigger one

      • Using 12 cubic millimeters for Dispersion based on the literature

    • Big N shows 36 at Power of 0.8

    • Based on biological ethics will only give 10 mice cancer, so N=20

    • Doesn't give us a good idea of whether the Drug will work or not

    • Increasing alpha level to 0.05, makes our N=20 look a lot better
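
A minimal sketch of the same sample-size question answered outside JMP with statsmodels; the inputs follow the notes (difference to detect = 15 mm³, SD = 12 mm³, alpha = 0.01, power = 0.80), but the exact n returned may differ slightly from JMP's curve.

```python
# Prospective power analysis for a 2-sample comparison of means.
from statsmodels.stats.power import TTestIndPower

effect_size = 15 / 12            # difference to detect / SD (Cohen's d)
analysis = TTestIndPower()

n_per_group = analysis.solve_power(effect_size=effect_size,
                                   alpha=0.01, power=0.80,
                                   alternative="two-sided")
print(f"n per group ~ {n_per_group:.1f}  (big N ~ {2 * n_per_group:.0f})")

# With only 10 mice per group available, solve for the achievable power instead:
power = analysis.solve_power(effect_size=effect_size, nobs1=10, alpha=0.01)
print(f"power with n = 10 per group: {power:.2f}")
```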

Standard Error of the Mean (SEM; SE)

  • SE is the standard deviation of a set of sample means repeatedly calculated from a statistical population or universe (i.e., the SD of the sampling distribution of the mean)

  • SE is a measure of the precision of x-bar as an estimate of µ

  • as SE gets smaller, the precision of x-bar increases

    • SE = s (standard deviation) / √n

    • incorporates sd & n, two factors that will impact reliability

  • SD - a measure of the dispersion or spread of the sample data

  • SE - a measure of the sampling error or uncertainty in the sample mean as an estimate of the population mean

Confidence Intervals & the Student's t distribution (will never test over confidence intervals)

  • The t distribution

    • Family of distributions related to the normal distribution; shape depends on degrees of freedom

Reporting Rules and Conventions

  • Zar 2010 (Section 7.4 page 108)

  • "No widely accepted convention", but the measure of dispersion must be clearly stated

  • n should be stated somewhere

  • As Text in manuscript: (mean = 27.4 g +/- 2.80 SD) or SE or 95% CI

  • In a Table or Figure

Two-Sample Hypotheses

  • Do differences exist b/w two samples; i.e. are the two samples from two different statistical populations?

  • A number of types of comparisons:

    • means

    • medians

    • variances

    • CV

    • indices of diversity

  • We will explore 2-sample comparisons involving means:

    • comparison of independent samples

    • nonparametric tests of independent samples

    • comparison of paired samples

Comparison of Two Independent Samples

  • For Example: You measure hematocrit in two groups of 17 year olds, males (n=600) and females (n=600)

    • Is hematocrit different between groups?

    • Males - 45.8 +/- 2.8 SD

    • Females - 40.6 +/- 2.9 SD

  • What are the independent and dependent variables?

    • Independent: Sex (male or female)

    • Dependent: Hematocrit values

  • What would the data model types be in JMP?

    • Two columns, sex and hematocrit values

  • Independence of "Samples" or sample units?

    • Each individual person should be a sample unit

    • Can take multiple measurements, just make sure to average before inputting into chart

  • Pseudo replication?

    • Analyzing the data as if you have more independent replicates than you actually have

  • 2-tail always has the exact same null hypotheses

    • That the means are the same

Comparing Means from Two Independent Samples

  • Under the following experimental conditions:

    • 1 Dependent Variable that is continuous

    • 1 Independent Variable that is nominal (grouping/categorical) with 2 levels/groups/categories

  • Apply the following test if assumptions hold:

    • Student's t-test

    • Prob > |t| = 2-tailed (either direction)

    • Prob > t = 1-tailed to the right

    • Prob < t = 1-tailed to the left

    • Always double check degrees of freedom

  • Writing a Concise, Publication-Quality Interpretation of Results of a Statistical Test: "Manuscript Style" (Make it OBVIOUS)

    1. Be direct, say what you mean, mean what you say

    2. Don't just say means were different, or you rejected the null or that you detected a significant difference. State the direction of the effect (e.g., high/low, etc.)

    3. Provide the statistical test, df (or n), test statistic (only go to 2 decimal places), and P-value (only go to 3 decimal places).

    • This is typically put in parentheses following the sentence

  • People given drug G had significantly longer blood clotting times than people given drug B (Student's t-test, df = 11, t = -2.47, P = 0.031) (Figure 1)
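
A minimal scipy sketch of the two-sample t-test itself; the two groups are invented values, not the hematocrit or blood-clotting data.

```python
# Student's two-sample t-test (equal variances assumed).
from scipy import stats

group_a = [12.1, 13.4, 11.8, 14.2, 12.9, 13.7]
group_b = [10.2, 11.1, 10.8, 11.9, 10.5, 11.4]

t, p = stats.ttest_ind(group_a, group_b)
df = len(group_a) + len(group_b) - 2
print(f"Student's t-test, df = {df}, t = {t:.2f}, P = {p:.3f}")
# This two-tailed P corresponds to JMP's "Prob > |t|"
```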

Assumptions of the Two-Sample t-test

  • Both samples were taken randomly; i.e., sample units are independent of each other & unbiased

  • The dependent variable is normally distributed (or ~ normal)

    • (combination of frequency distribution, normal quantile plot, and S-W test)

  • Variances of the two groups are equal (or almost equal)

    • (eyeball SD - 2-fold difference?, variance tests, e.g., Levene's Test)

  • What is the risk?

    • You risk elevating your Type I error rate above the stated alpha level

    • Violations are more serious if sample sizes are small (< 30; Zar 2010), you are doing a 1-tailed test, or your sample sizes are severely unbalanced

  • If all assumptions are confirmed = run-of-the-mill t-test

If Assumptions are severely violated, what do you do?

  • If normality (normal distribution) is off, apply a data transformation, if "corrected", proceed w/ the t-test using the transformed data set

    • Report testing results from transformed analysis, but usually report sample means and dispersion on the original scale in text and/or in graphs/tables

      • Variance Fine

  • If only variances are violated you can conduct a t-test that has been "corrected for" unequal variances

    • Welch's t-test

      • Zar pg. 138

  • If normality cannot be "fixed"...

    • Conduct the Mann-Whitney U test which is a nonparametric equivalent of the t-test

      • JMP calculates the Wilcoxon Test which is equivalent to the Mann-Whitney test

        • Zar pg. 146

Wilcoxon test, Mann-Whitney U test, or the Wilcoxon-Mann-Whitney test

  • Nonparametric equivalent of the t-test for independent samples

  • Nonparametric test

    • Distribution-free test, where no or few assumptions are made about the shape of a distribution

    • Does not focus on any specific parameter such as the mean

  • Test specifics:

    • Used to test 2 groups

    • Assumes nothing about the underlying distribution or homogeneity of variances

    • H0: population distribution of sample 1 = sample 2 (test of medians)

    • Calculates test statistic based on ranks (position) of the raw data

    • Good when data set has extreme values in it, but has lower power than t-test, unless assumptions are severely violated

    • Why not just always apply a nonparametric test with every data set?

      • Usually has less power

Welch's test

  • A special derivation of the t-test

    • Used when normality holds, but variances are not equal

  • Can be identified when df are not a whole number

    • e.g. df on normal t-test are 11, on Welch's they may be 10.70
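
A minimal scipy sketch of both fallbacks described above (Welch's t-test for unequal variances, and the Wilcoxon-Mann-Whitney test when normality cannot be fixed); the data are invented.

```python
# Welch's t-test and the Mann-Whitney U test on the same made-up groups.
from scipy import stats

group_a = [12.1, 13.4, 11.8, 14.2, 12.9, 13.7, 15.0]
group_b = [10.2, 11.1, 10.8, 11.9, 10.5, 19.4]      # note the extreme value

t_w, p_w = stats.ttest_ind(group_a, group_b, equal_var=False)   # Welch's
print(f"Welch's t-test: t = {t_w:.2f}, p = {p_w:.3f}")

u, p_u = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"Mann-Whitney U = {u:.1f}, p = {p_u:.3f}")
```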

Comparison of Paired Samples (Non-Independent)

  • In contrast to independent samples, in a "paired" design, sample units are linked or correlated in some way with a member in the other group(s)

    • i.e. members of a pair have more in common than with members of another pair

    • This dependency is planned or by design. However, you make a critical mistake if you apply the wrong inferential test

  • Paired t-test

    • Wilcoxon paired-sample test/Wilcoxon signed rank test (nonparametric equivalent)

  • What if you conduct test assuming independence?

    • Catastrophic failure

    • What happens to Type I error?

      • Increases if highest variance = lowest value

      • Decreases if highest variance = highest value
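
A minimal scipy sketch of the paired alternatives; the before/after values are invented, with each position belonging to the same subject.

```python
# Paired t-test and its nonparametric equivalent (Wilcoxon signed-rank).
from scipy import stats

before = [210, 198, 225, 240, 205, 218, 230]
after  = [202, 195, 215, 228, 200, 210, 224]

t, p = stats.ttest_rel(before, after)
print(f"Paired t-test: t = {t:.2f}, p = {p:.3f}")

w, p_w = stats.wilcoxon(before, after)
print(f"Wilcoxon signed-rank: W = {w:.1f}, p = {p_w:.3f}")
```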

Family-wise error inflation occurs when doing multiple comparisons with the same data at the same alpha level; it raises the Type I error rate



Multisample Hypotheses

H0: µ1 = µ2 = µ3 ...

  • Design:

    • 1 Dependent Variable (continuous)

    • 1 Independent Variable (categorical/nominal): 3 or more levels

  • Why not conduct a series of t-tests?

    • Type I Error is inflated beyond your stated alpha

    • Type I errors accumulate with each statistical test conducted on the same data set

      • Experimentwise or familywise error rate - must be controlled

Analysis of Variance (ANOVA)

  • The ANOVA family of tests are the most commonly applied statistical tests

  • Inferences about means are made by analyzing variability in the data

  • One model is constructed that includes all means simultaneously; therefore, it controls for familywise error

  • F-value (F-ratio) is the test statistic (Sir Ronald Aylmer Fisher 1918)

    • A factor/treatment is an independent variable whose values are controlled and varied by the experimenter (e.g., drug type)

      • Are categorical/nominal variables

    • A level is a specific value of a factor (e.g., drug A, drug B, drug C)

  • Analyzes and partitions sources of variation in a dataset

    • 2 kinds of variability

      • Between sample means (among groups)

      • Within groups

    • Total variability comprises within group variability and variability between groups

    • Between Group Variability

      • Treatment effects

        • Group

    • Within Group Variability

      • Individual differences

      • Errors of measurement

        • Error

    • F = Group/Error

    • As the between-group variation gets bigger relative to the within-group variation, the test statistic gets bigger; a higher F value is desired

  • In ANOVA, the total variation in the response measurements is divided into portions that may be attributed to various factors, (e.g., amount of variation due to Drug A and amount due to Drug B) Which factor(s) or combination of factors account for significant amounts of the total variation?

    • Partitioning of the variance within the data set

    • If a factor/treatment represents a lot of the total variability relative to variability within groups (error) then it is an important “player”

  • Example: Sandwich types. On One-Way ANOVA Powerpoint

  • ANOVA-F distribution is the underlying distribution

  • F = (Between Group Variability/Within Group Variability)=(MS-group/MS-error)

    • MS stands for Mean Square

  • H0: F = 1 -> No treatment effects (sample means are drawn from same population). (No Sandwich effect)

  • Large F -> Means are different (sample means are from different populations). (There is a Sandwich effect)

Summary of Logic

  • Calculate two estimates of the population variance. MS-error, based on variability within groups, is independent of H0; MS-group, based on variability among the group means, is not. If H0 is true, both estimate the same population variance and F will be near 1; if H0 is false, MS-group is inflated and F grows large.

Calculations for the ANOVA

  • In order to calculate MS-groups and MS-error we must first calculate the appropriate sums of squares (SS)

  • SS-total

    • Represents sum of squared deviations of all observations from the grand mean

      • SS-total = SS-group + SS-error

  • SS-group

    • Sum of squared deviations of group means from the grand mean. In effect, a measure of differences between groups

      • Insert formula

  • SS-error

    • Sum of squared deviations within each group. Usually obtained by subtraction

      • SS-error = SS-total - SS-group

Degrees of Freedom

  • In order to calculate MS-group and MS-error we need to know the degrees of freedom associated with SS-group and SS-error

    • df-total = N - 1 (where N is total number of observations)

    • df-group = k - 1 (where k is the number of groups)

    • df-error = df-total - df-group

  • MS-group = (SS-group/df-group)

  • MS-error = (SS-error/df-error)

F-value

  • Having calculated MSgroup and MSerror we can now calculate F

    • F = MS-group / MS-error

  • Between groups estimate of the population variance is much larger than the within groups estimate -> F value greater than 1

  • How much larger than 1.0 must the value of F be to decide that there are differences among the means?

  • Use tables of the F distribution, Zar Table B4, Appendix 21.

    • Gives critical values of F corresponding to the degrees of freedom for the two mean squares (dfgroup and dferror).

      • dfgroup = numerator df (2)

      • dferror = denominator df (12)

        • From the table (alpha = 0.05): Fcrit = 5.10; the obtained F2,12 = 8.45

        • Because Fobt > Fcrit we can reject H0 and conclude that the groups were sampled from populations with different means. There is an effect of Sandwich Type

The ANOVA Table (HAVE TO BE ABLE TO BUILD FOR MIDTERM)

  • Columns: Source, SS, df, MS, F, P-value

  • Rows: Group, Error
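
A minimal statsmodels sketch that builds this table outside JMP; the sales numbers are invented stand-ins for the sandwich example, but with 3 groups of 5 the df column matches the 2 and 12 used above.

```python
# One-way ANOVA and its ANOVA table (Source, SS, df, MS, F, P).
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

data = pd.DataFrame({
    "sales": [12, 14, 11, 13, 15,  20, 22, 19, 21, 23,  15, 16, 14, 17, 18],
    "sandwich": ["Meatball"] * 5 + ["BLT"] * 5 + ["SpicyItalian"] * 5,
})

model = smf.ols("sales ~ sandwich", data=data).fit()
print(anova_lm(model))   # 'sandwich' row = Group, 'Residual' row = Error
```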

JMP Analysis of One-Way ANOVA

  • Fit Model

    • Do NOT go into Fit Y by X and conduct the One-Way ANOVA !!!

  • Dependent variable goes into Y box

  • Add independent variables into model effects (bottom) box

  • JMP has different columns for every Independent Variable

  • P-value in JMP under Analysis of Variance = Prob > F

  • Capture Analysis of Variance and Effect Test boxes to give evaluation of the Omnibus Hypothesis

What is a Residual?

  • Distance between the Observed Y and the Predicted Y on a Y by X chart with line of fit

  • How to graph Residuals in JMP

    • 1 variable is Nominal, 1 variable is Continuous

    • Fit Model & run ANOVA

    • Click on the Response caret (red triangle) at the top left and click Save Columns

    • Click Residuals, then puts it into spreadsheet

    • Testing assumptions using Residuals in JMP

    • Analyze distribution, add residuals into columns, check distributions, quantile plots, Shapiro-Wilk test

      • Shows us distribution and assumptions of the data

        • If non-normal, transform the RAW DATA and NOT the residuals

        • Then re-find the residuals using the transformed data and THEN test assumptions

      • Run the ANOVA and click Row Diagnostics

        • If the Residual by Predicted Plot is not shown by default, add it using Row Diagnostics

Assumptions of ANOVA (Step 1)

  • Observations/sample units are independent of each other. (i.e. no systematic biases within the data set). Best achieved via random sampling

  • The data are normally distributed, better yet, the residuals are normally distributed

    • Save residuals to the data spreadsheet

    • Examine frequency distribution, normal quantile plot, and Shapiro-Wilk of residuals & interpret

  • Homogeneity of variance (i.e. the variances among groups are equal)

    • Examine the plot of residuals (residual by predicted plot) vs the predicted values. Are the points equally scattered for each group

  • Pig mass varied significantly with type of food (One way ANOVA, F (subscript numerator df (model) denominator df (error)), p-value). Next sentence or 2 would add biological / supporting stats to build answers. (Slide 22)

Assessing Normality within the ANOVA framework

  • In JMP:

    • Fit the ANOVA model

    • Go to "Save Columns"

    • Click on "Residuals"

    • Residuals should be in the data spreadsheet

A Significant Overall F... What's next?

  • A significant overall F-test does not indicate that all group means are different from each other

    • Don't know how many means are different, nor which means are different

  • Due to experiment wise error inflation, you cannot proceed with a series of run-of-the-mill t-tests

    • The proper statistical approach is to employ a multiple comparison test (i.e. post hoc testing)

      • Do NOT go through with this if you do not reject the Omnibus hypothesis in Step 1

  • What if the overall F-test is not significant?

    • You cannot proceed

Multiple Comparison Tests - (Parametric) (Step 2)

  • Also known as post hoc or a posteriori tests

  • Many different ones

    • Tukey test, Student Newman-Keuls test, Duncan test, LSD test, Scheffe's test, Fisher test, Bonferroni adjusted t-tests (a special case)

  • Their application is debated in the literature and there is no absolute agreement on the best to use

    • Although, Tukey & SNK are the most commonly employed and, therefore, accepted techniques

  • They operate under the same assumptions as ANOVA and must follow a significant F test

  • Post hoc testing usually involves testing all possible combinations of means, even comparisons you might not be interested in

    • This is why post hoc testing is often referred to as the testing of "unplanned comparisons"

  • Post hoc tests have built-in procedures that correct for experiment wise error and its influence on Type I error inflation

    • Each one differs slightly in how conservative it is

Post hoc Testing: The Tukey HSD Test

  • Rank the means in ascending order

  • First, compare largest to smallest, then largest to next smallest, etc.

  • You will need to use the table of critical values of the q distribution on Zar pg. 723
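
A minimal statsmodels sketch of a Tukey HSD run (JMP and Zar's q table do the same job); the strontium values are invented, though the group labels borrow the water-body names from the strontium example cited later in these notes.

```python
# Tukey HSD pairwise comparisons following a significant F-test.
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

data = pd.DataFrame({
    "strontium": [28, 30, 27, 31, 29,  55, 58, 54, 57, 56,  40, 42, 39, 41, 43],
    "waterbody": ["GraysonsPond"] * 5 + ["RockRiver"] * 5 + ["BeaverPond"] * 5,
})

result = pairwise_tukeyhsd(endog=data["strontium"], groups=data["waterbody"], alpha=0.05)
print(result)   # pairwise mean differences, adjusted p-values, reject yes/no
```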

Post hoc Testing: The Student Newman-Keuls test (Never have to calculate in this class)

  • SNK is conservative enough (i.e. it controls experimentwise error) but it has more power than the Tukey test; SNK calculations are very similar to Tukey

  • "Multiple Range Test". The "family" of comparisons changes

Graphical Display of post hoc Results

  • Put explanation of notation in figure caption along with results of F-test

  • Start by assigning A to the mean(s) of highest magnitude

  • Shared letters indicate means were not significantly different

  • Strontium concentrations varied significantly (F4,25 = 56.2, P < 0.001) across water bodies, and concentrations were highest in Rock River, moderate in Angler’s Cove, Appletree Lake, and Beaver Pond and lowest in Grayson’s Pond (SNK) (Figure 1).

Tukey vs. SNK

  • Both tests adequately control experiment wise error rate and are appropriate post hoc tests when multiple comparisons are desirable following a significant F test

  • Both can be applied at a specified alpha level (e.g., 0.05)

  • Both are better approaches than multiple t-tests

  • Tukey will result in fewer Type 1 errors than SNK

  • SNK has more power than Tukey (>3 means)

  • I apply Tukey when I want a more conservative test and SNK when the research is more exploratory

  • Tukey is probably more commonly used by Biologists; SNK common in Psychology (Zar recommends Tukey)

The Bonferroni Method: An Additional Way to Control Experimentwise Error

  • The Bonferroni adjustment to alpha levels is commonly used to control experimentwise error in situations where multiple tests are applied (e.g. post hoc comparisons and multiple correlations)

  • alpha = 0.05 / # of comparisons

    • You can “start” with whatever alpha you want

    • For example: You want to conduct 5 tests (denominator) @ an initial stated alpha of 0.05 (numerator)

    • 0.05/5 = 0.01

      • All 5 tests would actually each be conducted @ alpha = 0.01

      • 5 comp. = 0.01, 8 comp. = 0.006, 12 comp. = 0.004

  • An acceptable approach to post hoc testing is to conduct multiple t-tests but with Bonferroni adjusted alpha levels for each comparison

Dunnett's Test (Control Group vs Other Groups Individually)

  • Accepted post hoc test provided in JMP for this special case

Multiple Comparison Study

  • We learned 3 techniques to control experimentwise error during post hoc testing:

    • Tukey Test

      • built-in adjustments/corrections such that you actually conduct the test at the stated alpha

    • SNK

      • built-in adjustments/corrections such that you actually conduct the test at the stated alpha

    • Bonferroni Method

      • Directly adjust the stated alpha based on the # of comparisons (You don't have to do all possible comparisons)

    • All three approaches are conservative relative to not controlling experimentwise error

    • Bonferroni Method is ultraconservative, particularly at > 5 comparisons (0.01)

  • Basically, you pay a “penalty” when you test all possible, unplanned comparisons following a significant F test because these comparisons have been adjusted to control for experimentwise error

  • There is another option that circumvents being penalized, but you cannot make all pairwise comparisons

Nonparametric ANOVA: The Kruskal-Wallis test

  • Apply this nonparametric equivalent to One-way ANOVA when k>2

  • It is a distribution-free method that analyzes the ranks of the data

  • Sometimes called "ANOVA by ranks"

  • ANOVA is generally more powerful, but K-W provides an alternative when assumptions are not met and a transformation doesn't help

  • The test statistic is H

  • The K-W test is equivalent to the Omnibus F-test in ANOVA

K-W in JMP

  • Use the Fit Y by X platform

  • Go to "nonparametric"

  • Choose "Wilcoxon"

  • Desired info is under 1-way Test, ChiSquare Approximation

  • Do NOT report ChiSquare value

  • DO report df, test statistic, and p-value
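
A minimal scipy sketch of the Kruskal-Wallis test; the three groups are arbitrary values, not course data.

```python
# Kruskal-Wallis "ANOVA by ranks" for k = 3 groups.
from scipy import stats

g1 = [3.1, 2.8, 3.5, 3.0, 2.9]
g2 = [4.2, 4.8, 4.5, 5.1, 4.0]
g3 = [3.6, 3.9, 3.4, 4.1, 3.8]

h, p = stats.kruskal(g1, g2, g3)
df = 3 - 1                          # k - 1
print(f"Kruskal-Wallis: H = {h:.2f}, df = {df}, p = {p:.3f}")
```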

Post hoc testing Following significant K-W test

  • Dunn Method for Joint Ranking - (Zar pg. 240-241)

    • Preferred, more powerful method for nonparametric

  • Steel-Dwass procedure

    • Less power, doesn't work well when sample sizes are unequal

  • Wilcoxon all pairs (apply Bonferroni adjusted alpha)

    • Less power, highly conservative when there are 5 or more groups

  • In JMP

    • Fit Y by X

    • Run ANOVA

    • Click the caret and choose "nonparametric"

    • Click "Nonparametric Multiple Comparisons"

    • Will show up under "Nonparametric Comparisons"

    • Report p-value and possibly Z-value

Planned Comparisons (Contrasts) vs Unplanned Comparisons

  • Typically, when you design an experiment with multiple levels of the independent variable, you have particular comparisons of interest in mind

  • Planned comparisons are stated a priori while unplanned comparisons are a posteriori, or "thought of after the data are collected"

  • You pay a price for conducting post hoc tests because they incorporate a correction for experimentwise (family wise) error

  • In contrast, planned comparisons (at least some special combinations) are made at the stated alpha, even if the omnibus F is not significant, within the ANOVA itself because they partition the SS-Model

  • Orthogonal Contrasts/Comparisons

Statistical Orthogonality

  • Usually in reference to groups or multiple independent variables

  • Non-overlapping, independent, not correlated

  • Assumption for planned contrasts & multiple regression modeling

Set of Planned Comparisons Must = Orthogonal Contrasts

  • If you want to conduct planned comparisons, you need to decide how many and which ones to make

  • To enjoy the luxury of testing multiple comparisons at the stated alpha, you must follow certain rules (i.e. the comparisons must be orthogonal)

    • A full set of orthogonal contrasts completely partitions the SS-Model

    • Therefore, they represent independent pieces of information (i.e., this allows you to work at the originally stated alpha)

    • There are up to a-1 possible contrasts that can comprise a full orthogonal set (but you don't have to conduct all a-1 comparisons)

      • a = # of groups

    • Planned contrasts are 1 df comparisons & you cannot use more than the a-1 df

  • How do you establish a set of orthogonal contrasts?

Coding Planned Contrasts

  • Coding is achieved by assigning weights/coefficients to groups to indicate contrasts

  • Rules

    • Groups coded with positive weights will be compared to groups coded with negative weights

    • The sum of weights for a single comparison/contrast should be zero

    • Group(s) not involved in a specific comparison is/are given a zero

    • To be orthogonal, the sum of the products of the corresponding coefficients for any two contrasts must equal zero

  • In JMP

    • Build ANOVA

    • Find normal Sum of Squares and record

    • Click the caret where you'd normally run the Tukey test

    • Build the table using positives and negatives, adding a new column after every row

    • Click Done and look at the SS; it needs to be less than or equal to the model Sum of Squares
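
A minimal numpy sketch that checks the coding rules above for a hypothetical set of contrasts over a = 4 groups (so at most a - 1 = 3 contrasts).

```python
# Verify a set of planned contrasts is orthogonal.
import numpy as np

contrasts = np.array([
    [ 1,  1, -1, -1],   # groups 1 + 2 vs groups 3 + 4
    [ 1, -1,  0,  0],   # group 1 vs group 2
    [ 0,  0,  1, -1],   # group 3 vs group 4
])

# Rule: the weights within each contrast sum to zero
print("row sums:", contrasts.sum(axis=1))

# Rule: for any two contrasts, the sum of the products of their
# corresponding coefficients is zero (pairwise dot products)
for i in range(len(contrasts)):
    for j in range(i + 1, len(contrasts)):
        print(f"contrast {i+1} . contrast {j+1} =", contrasts[i] @ contrasts[j])
```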

Planned Comparisons: Wrap-up

  • Incorporate planned comparisons if you can to avoid experimentwise error.

  • Can be a useful approach if grouping “groups” for comparisons is insightful and a goal of the research.

  • It’s best if each comparison represents a unique portion of the SSModel so that comparisons meet the orthogonal requirement.

  • You don’t have to perform all a-1 comparisons, but beware of “unexplained” blocks of variance in SSModel.

  • Don’t “force” orthogonality. In other words, if the planned comparisons of interest aren’t orthogonal, proceed but Bonferroni adjust the alpha levels for each comparison.

  • If all pairwise comparisons of groups are of interest, you should probably just proceed with Tukey or SNK (Season example)



Data Transformations (Zar Chapter 13)

  • To apply parametric statistics, the data set must meet (or approximate)
    the assumptions of normality, equality of variances, and that the magnitude
    of the variances don’t increase with the magnitude of the means (nonadditivity).

  • If you judge that the data violate assumptions, then one option is to try and “correct” the data by applying a transformation.

  • When applying a transformation you change the raw data to a different form or scale. (e.g., F to C is a transformation)

  • After you perform a transformation, and judge the transformation “fixed” the data, conduct parametric tests on the transformed data set. Transform all data,
    not just one level of a variable!

  • Complications arise when reporting means and variability around the means following transformation. You should probably report the mean in the original scale. The most appropriate thing is to report the antilog of the transformed mean (geometric mean). I don’t see people doing this? How to report the variability is another issue ... Be sure you inform readers you analyzed transformed data!

Three Common Transformations

  • The Logarithmic Transformation

  • The Square Root Transformation

  • The Arcsine Transformation

The Logarithmic Transformation

  • X'=log10(X)

  • X' = log10(X + 1) --> use when the data contain 0's or to avoid negative transformed values

  • The log family of transformations are the most common

  • It is a variance-stabilizing transformation that will also address nonadditivity and non-normality if the data are right skewed

  • You can apply a log of any base, but log10 appears to be used the most

  • Beware of “log” “log10” “ln” – this is particularly important when reporting a model designed to predict y’s based on inputs of x
    Always check your transformation with a calculator

The Square Root Transformation

  • X'=SqRt(X+0.5)

  • Variance-stabilizing transformation, particularly when variances increase as the means increase, also when the variances & means are of similar magnitude and aren't independent of each other (i.e. Poisson distribution)

  • Helpful to try this transformation if Log doesn't work, especially when a nonparametric tool is not at your fingertips

  • May help transform percentage data when data range is between 0-20% or between 80-100%

  • Similar reporting issues (square the transformed mean & calculate CI's)

The ArcSine Transformation

  • p' = arcsin(SqRt(p))

  • Proportions tend to form a binomial distribution vs a normal distribution

  • This transformation will "centralize" the data - bring values closer to 50%

  • Arcsin is the inverse sine (sin^-1)

  • Radians vs degrees .. Ugh!

  • Check your transformation vs Zar Appendix Table B.24
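
A minimal numpy sketch of the three transformations; x and p are arbitrary example values (p as proportions, not percentages).

```python
# Log, square-root, and arcsine transformations.
import numpy as np

x = np.array([0, 3, 7, 12, 30, 55], dtype=float)
p = np.array([0.05, 0.15, 0.50, 0.85, 0.95])

log_x    = np.log10(x + 1)         # X' = log10(X + 1); the +1 handles zeros
sqrt_x   = np.sqrt(x + 0.5)        # X' = sqrt(X + 0.5)
arcsin_p = np.arcsin(np.sqrt(p))   # p' = arcsin(sqrt(p)); numpy returns RADIANS

print(log_x)
print(sqrt_x)
print(np.degrees(arcsin_p))        # convert to degrees before comparing with
                                   # Zar Appendix Table B.24
```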

Only Pre-Midterm Topics Above


Power and Sample Size in ANOVA

  • Power = 1-Beta

  • In JMP

    • Set up regular Power Analysis

    • Choose k sample means option

    • Set alpha for Omnibus F test

    • Enter SD, variability among all groups combined

    • Enter estimated means of each group (represent smallest detectable difference)

    • Leave sample size & power blank to examine power curves

    • Sample size gives Big N. Remember to divide N by k (the # of groups)

    • Reports sample size required to reject the Omnibus



Different Types of ANOVA Models

Fixed-Effects Model (Model I ANOVA)

  • Levels of the factor are specifically chosen by the experimenter; it is these specific groups about which the experimenter is trying to draw conclusions

  • Most common

Random-Effects Model (Model II ANOVA)

  • Levels of the factor are a random sample of all possible levels, a wider universe of groups

  • Instead of being concerned with effects of specific levels, you are trying to generalize effects across a random selection

Mixed-Effects Model (Model III ANOVA)

  • Some factors/treatments are fixed & some are random

  • SS & MS calculated the same; ANOVA table looks similar

  • Differences in the MS term used in the F test for some H0's and how secondary analyses are performed



Factorial ANOVA (Zar Chapter 12)

Multisample H0:

  • One-Way ANOVA model

    • 1 Independent Variable

      • Nominal w/ more than two levels

    • 1 Dependent Variable

      • Continuous

Factorial Analysis of Variance

  • Consider the effects of more than 1 independent variable on a dependent variable simultaneously (in the same model)

  • Advantages

    • No need for multiple 1-way ANOVAs;

      • Can test for interaction among factors

  • Two-way ANOVA model

    • 2 Independent Variables

      • BOTH Nominal each w/ two or more levels

    • 1 Dependent Variable

      • Continuous

Two-way ANOVA/Two-factor ANOVA

  • 2 independent variables = 2 treatments = 2 factors = 2 main effects

    • Don't confuse with multiple levels or groups

  • Let's add a variable to the experiment that tested the effect of 5 sugars. Now we want to test the effect of both sugar and pH on pea growth

    • 5x2 factorial design

    • Each level of 1 factor is in combination with each level of the second factor; "crossed"

    • Balanced design (equal replication)

    • 10 combinations, 50 observations

    • Sugar will have an F value, pH will have an F value, Sugar x pH (interaction between 2 variables) will have an F value

    • Certain tests are informative (Tukey is good, provides pairwise comparisons)

      • Examine interaction plots to help us see visually why we have interaction

  • 2x2 Design Rat & Lard Example

    • Effect of lard type on food consumption of rats (N=12; n=6 per main effect)

    • 2 Main Effects (Fixed) each w/ 2 levels:

      • Fat (Fresh, Rancid)

      • Sex (Female, Male)

      • 3 replicates per subgroup

    • Fit Model Platform

    • Dependent Variable into Y box

    • Independent Variables & Interaction into model effects box

    • Analysis of Variance Prob>F = Omnibus F

    • Manuscript statement looks at Effect Test results & post hoc results

    • To find variation of a certain effect, divide sum of squares model/group by sum of squares total

    • Grab LS Means Plots for both Effects and the Interaction

    • If Main Effect is not significant, then post hoc testing is not necessary

  • Publication Statement:

    • Consumption by rats was significantly higher for fresh fat versus rancid fat (Two-way ANOVA, F1,8 = 41.96, P < 0.001), and main effect “Fat” accounted for 79.0% of the total variation in rat consumption (Table/Figure 1). Sex (F1,8= 2.59, P = 0.146) and Fat*Sex (F1,8= 0.63, P = 0.450) were not significant.

    • The Effect Tests F & P values tell us whether we have an effect of that variable. REPORT EFFECT TESTS F & P VALUES

  • 3x2 Factorial Design Sandwich Data

    • Sandwich: Meatball, BLT, Spicy Italian

    • Season: Spring, Summer

    • Dependent Variable: Sales from ___ Subway Stores

      • Significant variation within Sandwich Types, so run post hoc analysis to see how and why

    • When a main effect is significant, and has more than 2-levels, proceed as you would in a one-way ANOVA

      • Sales of BLT are significantly higher than sales of Meatball & Spicy Italian

      • Sales of sandwiches did not differ between spring and summer

      • There was no interaction of the main effects, indicating that Season affected sales equally across all Sandwiches

Analysis of Significant Interactions

  • The interaction term allows for examination of the joint effect of factors on the dependent variable (advantage of factorial design)

  • If the interaction term is significant, it means the nature of the effect of one factor on the dependent variable is dependent on levels of the other factor

  • In factorial ANOVA, you should first check whether the interaction term is significant, because if it is, then biological conclusions made about the main effects are unreliable or not applicable in all instances (combinations)

  • Interaction among factors indicates the effect of the two Main Effects are not independent of each other

  • What if you have a significant Interaction Effect?

    • Effect of Sex and Season on hematocrit of the dark-eyed junco:

      • No effect of Sex (p=0.26)

      • Significant effect of Season (p=0.021) (Hematocrit was higher in the Spring)

      • Significant interaction of Sex X Season (p=0.032)

        • The F-test of the interaction is enough statistical information to conclude that Hc of female juncos in spring is higher than Hc of female juncos in summer

        • Note: You could not draw strong conclusions about the Season effect without first examining the interaction (see the interaction-plot sketch below)
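
A minimal Python sketch of an interaction plot for the Sex x Season junco example, using statsmodels' interaction_plot; the hematocrit values are hypothetical and chosen only so the lines are non-parallel.

```python
# Minimal sketch of an interaction plot (non-parallel lines indicate interaction).
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.factorplots import interaction_plot

juncos = pd.DataFrame({
    "season": ["Spring"] * 6 + ["Summer"] * 6,
    "sex": (["Female"] * 3 + ["Male"] * 3) * 2,
    "hematocrit": [52, 54, 53, 48, 49, 47,   # Spring (hypothetical)
                   45, 44, 46, 48, 50, 49],  # Summer (hypothetical)
})

# Crossing lines are the visual signature of a Sex x Season interaction.
fig = interaction_plot(x=juncos["season"], trace=juncos["sex"],
                       response=juncos["hematocrit"], func=np.mean,
                       xlabel="Season", ylabel="Mean hematocrit")
plt.show()
```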

Interpreting Interaction: Factors with >2-levels

  • A professor gives a final exam that's an essay. Students are randomly assigned to either take the exam with laptops or write in blue-books. Additionally, students are put into three categories based on typing ability: None, Moderate, Skilled. The instructor was interested in the effect of Method, Ability, and the interaction of those two on score on the essay. Grades assigned "blindly".

    • Method: Laptop, blue-book

    • Ability: None, Moderate, Skilled

    • Method X Ability

    • Dependent Variable: Essay score

  • Main Effects:

    • Ability - Significant

      • Proceed with post hoc testing (Tukey HSD)

    • Method - Not Significant

    • Interaction - Significant

      • Now, you must analyze the simple main effects. This entails examining the changes in effect of one factor over levels of the other

      • Focus on key comparisons, don't have to run all tests

  • Presenting results

    • For F-ratio of ability, present F 2,12 then p-value

    • For F-ratio of ability*method, present F 2,12 then p-value

    • There was a significant effect of Ability (Test, F, p=0.032), where scores of students with moderate typing ability were higher than scores of students with no typing ability (Tukey HSD). The effect of Method was not significant (F, p=0.901); however, there was a significant AbilityXMethod interaction (F, p=0.0465). Examination of simple main effects was inconclusive, but there was a trend of lower scores in students skilled in typing and using laptops versus skilled students using bluebooks (test).

  • A researcher is interested in studying the effect of group psychotherapy and medication on depression. 30 patients participated in the study

  • The researcher designed this experiment to examine if types of therapy, psychotherapy and medication, interact in their effect on depression

    • 2x3 factorial design

      • 30 total patients

      • 6 subgroups

      • 5 patients per subgroup

    • Psychotherapy: Psychotherapy, No Psychotherapy

    • Medication: Placebo, low Dose, High Dose

    • Dependent Variable: Depression scores

    • Both main effects are Statistically Significant

    • The Interaction Term is significant

      • Run a Tukey HSD on the subgroup (cell) means afterward to find out why (see the sketch below)

    • Group psychotherapy influenced subjects in the placebo and low dose treatments, but it had no influence on people given the high dose treatment
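
A minimal Python sketch of following up a significant interaction with a Tukey HSD on the subgroup (cell) means; the depression scores are hypothetical, patterned so psychotherapy matters for the Placebo and Low-dose groups but not for the High-dose group.

```python
# Minimal sketch of pairwise Tukey HSD comparisons among the 2 x 3 = 6 cells.
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

df = pd.DataFrame({
    "therapy": ["Psych"] * 15 + ["None"] * 15,
    "dose": (["Placebo"] * 5 + ["Low"] * 5 + ["High"] * 5) * 2,
    "depression": [38, 40, 36, 39, 41, 30, 28, 31, 29, 32, 20, 22, 19, 21, 23,
                   48, 47, 49, 46, 50, 42, 44, 41, 43, 45, 21, 20, 22, 19, 23],
})

# One label per cell, then compare all cells pairwise.
df["cell"] = df["therapy"] + "/" + df["dose"]
print(pairwise_tukeyhsd(endog=df["depression"], groups=df["cell"], alpha=0.05))
```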

Efficient Use of Resources

  • Interested in testing the effects of 2 environmental variables, air temperature and nitrate, on the growth of cotton

  • For each independent variable, you have 3 levels

    • High, Medium, and Low

  • Produces a 3x3 design

  • For power purposes, you want to have 27 cotton plants per level of each main effect

  • Unexplained variance is reduced in the 2-way ANOVA over the 1-way ANOVA

    • Results in higher F-ratios

    • 2-way ANOVA has more Power

  • Incorporating variables into models that explain or account for observed variability in the dependent variable is a good thing, for this is one of our major goals as researchers

  • Two Types of Independent Variables incorporated into factorial Designs:

    • Experimental Variables

      • We are directly interested in the effect of all of those variables, including their interaction, on the dependent variable;

      • These are typically fixed effects

    • Control or Block(ing) Variables

      • Incorporated solely to reduce the amount of experimental error, giving more resolution for exploring effects of Experimental Variables of interest

      • Typically the Block Variable is a random effect

The Randomized Group Design

  • This is the "typical" factorial design we've considered so far

  • 3x2 factorial design with 2 experimental factors

  • Method and ability are fixed effects

    • Method: laptop, bluebook

    • Ability: none, moderate, skilled

    • Method*Ability

    • Dependent Variable: essay score

  • In this design, each cell (6 subgroup combinations) has multiple subjects or replicates (3). Multiple subjects were "randomly" assigned to each cell in the 3x2 design.

  • In this design, you apply a 2-way ANOVA and include the interaction term

The Simple Randomized Block Design

  • 5 m² of Bermuda grass available at each location

  • Enough space for 3 plots per location

  • Dependent Variable - amount of above ground grass (kg)

    • There is 1 replicate per cell

    • There are 5 replicates per level of Nutrient

    • The factor of interest is Nutrient

    • Can calculate Block Effect since n=3 per block

  • Why is it advantageous to include Location as a Block factor?

  • What is the unit of replication in this design?

The Simple Randomized Block Design (A Simple Mixed Model)

  • "Randomized Complete Blocks", "ANOVA w/o Replication"

  • Interspersion of treatments across "Blocks"

  • This design is used to reduce the amount of experimental error through the inclusion of a block variable that is usually a random effect factor

  • This design can be viewed as somewhat of a hybrid between a 1-way and 2-way ANOVA because you have 2 factors in the model, but you are only interested in the effect of one fixed Independent Variable (at least in the simple randomized block design)

  • The simple randomized block design has 1 fixed effect and 1 random effect

  • The statistical model is a mixed, Model III, 2-way ANOVA without replication

  • The interaction term is not included in the model because there is not enough replication per cell to calculate it

  • In the simple randomized block design, you assume no interaction between Experimental factor and the Block factor

  • In JMP:

    • 3 total columns

      • 1 fixed effect column

      • 1 random effect column

      • 1 dependent variable column

      • 1 dependent and 2 independent

        • Test assumptions of the dependent variable the same way we've been doing it

          • Save residuals and test assumptions

    • Analyze --> Fit Model

    • Dependent variable into Y

    • Fixed effect into model effects

    • To change a variable to a random effect click "Attributes" and then click Random Effect

    • If balanced, change "method" to "Traditional"

    • If imbalanced, change "method" to "REML"

    • Make sure "Effect Details" is enabled

      • Enables you to perform Tukey HSD and post hoc tests

Randomized Block - A Biological Example (Zar 12.4)

  • HO: The mean weight of guinea pigs is the same on four specified diets

  • 1 block comprised of 4 cages

  • 1 guinea pig per cage

  • 1 replicate of each diet per block (assigned randomly)

  • Expected gradients within barn:

    • Temperature

    • Light

    • Noise

    • Draft

  • Pigs in Blocks will experience similar conditions

    • Interspersion of Treatments

  • N=20 pigs

  • 5 replicates per diet

  • Columns

    • Block - nominal

    • Diet - nominal

    • Weight gain - Continuous discrete
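
A minimal Python sketch of the randomized block (mixed-model) analysis for the guinea pig example, treating Diet as the fixed effect and Block as a random effect; the weight-gain values are simulated placeholders, not Zar's data.

```python
# Minimal sketch of a simple randomized block analysis as a mixed model.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
pigs = pd.DataFrame({
    "block": np.repeat(["B1", "B2", "B3", "B4", "B5"], 4),  # 5 blocks of 4 cages
    "diet": ["D1", "D2", "D3", "D4"] * 5,                   # 1 replicate of each diet per block
})
diet_effect = {"D1": 7.0, "D2": 7.5, "D3": 8.5, "D4": 9.0}               # hypothetical
block_effect = {"B1": -0.5, "B2": 0.0, "B3": 0.3, "B4": -0.2, "B5": 0.4}  # hypothetical
pigs["weight_gain"] = (pigs["diet"].map(diet_effect)
                       + pigs["block"].map(block_effect)
                       + rng.normal(0, 0.3, size=len(pigs)))

# REML mixed model: fixed effect Diet, random intercept for Block
# (analogous to giving Block the "Random Effect" attribute in JMP's Fit Model).
mixed = smf.mixedlm("weight_gain ~ diet", data=pigs, groups=pigs["block"]).fit(reml=True)
print(mixed.summary())
```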

Repeated Measures Design

  • Each subject receives all levels of Factor A

  • Slide 65 2-Way ANOVA ppt

  • Possibly subject to the "carry-over effect"

    • A subject goes through Treatment 1 and then Treatment 2, but going through the 1st treatment may have affected its response to the 2nd

  • Dependencies/correlations across treatments

  • Advantages:

    • Individuals/subjects are acting like "blocks" - homogeneity of potential sources of error

    • Experimental error introduced into the study due to variability between subjects can be accounted for

  • Also called a within-subjects or treatment-by-subject design

  • Subjects are receiving all levels of Factor A

  • Among subject variability can be accounted for and "factored out"

  • RMD design is similar to randomized block design b/c subjects function similarly to "blocks"

  • Intra subject dependencies present both positives & negatives

  • Dependencies are a negative if "carry-over" effects exist across treatments

  • Assess normality using residuals as in one-way ANOVA (will require re-entering data in traditional form)

  • Assess correlation structure (Test of sphericity)

Addressing the Sphericity Assumption

  • JMP provides a test for Sphericity using the Multivariate framework:

    • If the assumption is met, proceed with the unadjusted, univariate F-test

    • If the assumption is not met (sphericity test p < 0.05), either:

        1. Apply an adjusted, univariate F-test (Geisser-Greenhouse or Huynh-Feldt), or

        2. Apply an F-test generated from MANOVA - the "Multivariate F" (treats the repeated measures as multiple dependent variables)

    • In JMP:

      • Go into Graph Builder and compare individuals (ex. Cholesterol vs. Drug)

      • Add random effect into right-hand overlay (top right box above "Color")

        • Examine correlation structure within Subjects across Groups

        • Note the y-intercept variation (Subject Variation is Important)

Analyzing Repeated Measures Design using Mixed Model

  • Assess normality and variances using residuals in One-Way ANOVA

  • Assess correlations within subjects across groups visually (Graph Builder)

  • Paired t-Test is most appropriate post hoc analysis w/ Bonferroni adjusted alpha level, but can do Tukey HSD
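
A minimal Python sketch of a one-factor repeated-measures ANOVA (the cholesterol-by-drug idea above) using statsmodels' AnovaRM; subject IDs and cholesterol values are hypothetical, and no sphericity adjustment is applied here.

```python
# Minimal sketch of a one-factor repeated-measures ANOVA with AnovaRM.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rm = pd.DataFrame({
    "subject": [s for s in range(1, 7) for _ in range(3)],   # 6 subjects
    "drug": ["A", "B", "C"] * 6,                             # each subject gets all 3 levels
    "cholesterol": [210, 195, 180, 230, 220, 205, 190, 185, 170,
                    250, 235, 215, 220, 205, 195, 240, 230, 210],
})

# Univariate, unadjusted F-test with subjects acting like "blocks";
# if sphericity is violated, use an adjusted F or a multivariate test instead.
res = AnovaRM(data=rm, depvar="cholesterol", subject="subject", within=["drug"]).fit()
print(res)
```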

Multiway Factorial Analysis of Variance

  • You can extend the basic ANOVA design to experiments with more than 2 factors

  • In these multiway designs you can examine the effects of numerous factors (3,4,5,etc) simultaneously, with interactions, in one model

  • Factors can be a combination of both fixed and random effects

  • Typically, you never see more than 3 or 4 factors (3-way & 4-way ANOVA)


Tests of Difference vs Tests of Relationships

  • Tests of Difference

    • Is this group different from that group(s)?

      • t-Tests, ANOVA's

      • Independent variable is typically nominal/categorical

  • Tests of Relationships

    • Is variable A related (co-vary) with variable B?

      • Correlation and Regression

      • Independent variable is typically continuous (but doesn't have to be, particularly in correlation, where there isn't a "dependent" or "independent" variable)

  • Correlation

    • To what degree does one variable vary with another? (Does not imply cause & effect)

  • Regression

    • To what degree is variable Y dependent on variable X? (Implies a cause-and-effect relationship exists)

Correlation

  • Research question: Are two (or >2) variables "associated" with each other?

  • Important Point: The research question is not one of cause & effect

    • Examples:

      • Do two methods of measuring blood pressure tend to give corresponding results?

        • Blood pressure measurements with 2 methods on the same units

      • How strongly associated are pairs of morphometric characteristics of grizzly bears?

        • Data points are 1 grizzly bear

        • Y and X values are leg length and arm length

      • Is there correspondence between concentrations of cadmium and lead in sediments of streams in a watershed impacted by industrial pollution?

        • Sample unit is sediment core sample

        • From that sample we get a measurement of [Cadmium] and a measurement of [Lead]. These are X and Y values

  • In JMP:

    • Make sure both are normally distributed & relationship is linear

    • Go to Analyze -> Multivariate Methods -> Multivariate -> Pairwise Correlations

      • Report the correlation value & P-value (e.g., r = 0.952, p < 0.001)
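
A minimal Python sketch of the same Pearson correlation using scipy; the cadmium/lead concentrations are hypothetical sediment-core values.

```python
# Minimal sketch of a Pearson correlation (JMP: Multivariate -> Pairwise Correlations).
from scipy import stats

cadmium = [0.8, 1.2, 2.5, 3.1, 4.0, 5.2, 6.3, 7.1]   # one value per sediment core (hypothetical)
lead    = [12,  15,  30,  35,  44,  58,  66,  75]

r, p = stats.pearsonr(cadmium, lead)
print(f"r = {r:.3f}, p = {p:.4f}")   # report as (r = ..., p = ...)
```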

Simple Linear Correlation

  • Three questions:

      1. Are two measurement variables related in a linear fashion?

      2. If they are related, what is the direction (+ or -)?

      3. How strong is the relationship? (analogous to effect size in tests of differences)

  • Smoking Example: cig smoking/day & CHD mortality/10,000 people

  • Null H0: There is no correlation b/w Smoking and CHD. (correlation coefficient = 0)

  • X-axis and Y-axis placement does NOT matter in correlations

  • Do NOT put a line of best fit in the scatterplot. If adding a visual aid, add an ellipse

The Correlation Coefficient (r)

  • aka Pearson's r or the Pearson product-moment correlation coefficient

  • r is a measure of association between two variables (X&Y)

  • Two pieces of information are obtained from r values:

      1. The sign (+ or -) of r indicates whether the association is positive or negative

      2. The size of r (from -1 to 1) indicates the magnitude of the association (further from 0 = stronger)

  • Since r values range from -1 to 1, 0.85 indicates a strong positive relationship exists b/w cigarette consumption and CHD

  • We can make this conclusion regardless of the underlying distribution of the two variables. In this sense, we view the r value as an index (rules of thumb)

The Correlation Coefficient (r): Test of Significance

  • There are no statistical assumptions associated with calculating r, and if an index is what you need to make the inference of interest, then stop here. (However, this is typically not how Biologists use r)

  • Assumptions of Normality

    • Both variables X&Y, were sampled randomly from a population with a normal distribution (bivariate normal distribution) and the relationship between the variables is linear

    • To calculate a p-value for r, X&Y need to be normally distributed and the relationship needs to be linear (DO IN GRAPH BUILDER)

      • If performing a transformation to one variable, apply to both

    • Nonparametric alternatives (Spearman's rs & Kendall's tau) have good power, so transformation is not necessary

Nonparametric Correlation (Ranks)

  • Spearman's correlation coefficient (rs) or Kendall's tau; both range from -1 to 1. Analyses are based on ranks

  • Apply when bivariate normal assumptions are violated, when data are ordinal, or the relationship is nonlinear

    • In JMP:

      • Analyze -> Multivariate Methods -> Multivariate

      • Then you can find "linear correlations" and "nonparametric correlations"

      • Reid usually goes with Spearman's over Kendall's

        • Report as (rs, p-value)

    • The more elliptical the scatter of points, the more intense the correlation

    • Be careful interpreting significant r's

    • As sample size increases, the critical value decreases

    • Two-tailed usually
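
A minimal Python sketch of the rank-based correlations using scipy; the x/y values are hypothetical, deliberately monotonic but nonlinear, the kind of case where ranks are appropriate.

```python
# Minimal sketch of Spearman's rs and Kendall's tau in scipy.
from scipy import stats

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 3, 5, 9, 17, 33, 65, 129]   # monotonic, clearly nonlinear (hypothetical)

rs, p_s = stats.spearmanr(x, y)
tau, p_k = stats.kendalltau(x, y)
print(f"Spearman rs = {rs:.3f}, p = {p_s:.4f}")
print(f"Kendall tau = {tau:.3f}, p = {p_k:.4f}")
```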

Multiple Correlations: The Correlation Matrix

  • What if you have more than 2 measurement variables

    • 8 variables: 28 total comparisons

    • alpha per correlation = ?

  • Holm-Bonferroni Method (Holm 1979) WILL NOT BE TESTED OVER

    • Stop at the first non-significant outcome

    • Order the p-values from smallest to greatest

      • H4 = 0.005

      • H1 = 0.01

      • H3 = 0.03

      • H2 = 0.04

    • Work the Holm-Bonferroni formula for the first rank:

      • HB = Target alpha / (n-rank+1)

      • HB = 0.05 / (4 -1+1) = 0.0125
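
A minimal Python sketch of the Holm-Bonferroni step-down procedure applied to the four p-values above (target alpha = 0.05).

```python
# Minimal sketch of the Holm-Bonferroni step-down procedure.
pvalues = {"H4": 0.005, "H1": 0.01, "H3": 0.03, "H2": 0.04}
alpha = 0.05

# Order p-values smallest to largest; compare each to alpha / (n - rank + 1);
# stop at the first non-significant outcome.
ordered = sorted(pvalues.items(), key=lambda kv: kv[1])
n = len(ordered)
for rank, (name, p) in enumerate(ordered, start=1):
    threshold = alpha / (n - rank + 1)
    if p < threshold:
        print(f"{name}: p = {p} < {threshold:.4f} -> significant")
    else:
        print(f"{name}: p = {p} >= {threshold:.4f} -> stop; this and all remaining tests are not significant")
        break
```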

The Limitations of Correlation

  • Correlation analysis indicates cigarette consumption and CHD are related. It also tells us the relationship is positive and relatively strong (r= 0.85)

    • BUT...

      • You may want to predict incidence of CHD for given levels of cigarette consumption or how much does CHD increase with a unit increase in cigarette consumption. This hypothesis of causality about the relationship requires Regression Analysis

Sparrow Wing Length as a Function of Age

  • Correlation--> X <--> Y (Covariation)

    • Just has 2 variables, technically not independent or dependent variables

  • Regression--> X ---> Y

    • Age (Independent) ---> Wing Length (Dependent)

A Regression Example

  • A snake physiologist wished to investigate the effect of temperature on the heart rate of juniper pythons. She selected nine specimens of approximately the same age, size, and sex and placed each animal at a preselected temperature between 2 and 18 ºC. After the snakes equilibrated to their ambient temperatures, she measured their heart rates. n=9

    • Temp (IV) (Fixed)

    • Heart Rate (DV)

Simple Linear Regression

  • Simple vs. Multiple regression (one predictor variable vs multiple predictors)

  • Linear regression vs non-linear regression

  • Simple Linear Regression:

    • Specifies a straight-line relationship between two variables

      • Predictor variable (X) (usually a Fixed Effect-Model 1)

      • Response variable (Y)

    • Specifies a predictive relationship between X&Y

    • SLR analysis involves producing

      • A regression line or "best fit" line through points on a scatterplot of X&Y

      • A regression equation that relates X&Y

Building the Regression Equation

  • Regression implies a functional (cause-effect) relationship between variables

  • Two components need to be calculated from the data:

    • Slope (b) (the "regression coefficient")

    • y-intercept (a)

    • Y=bX+a

The Regression Analysis

  • The regression analysis calculates values of a and b for a data set so that the resulting equation is the best obtainable for the data

The "Best Fit" Line

  • Not all observed values of y fall on the line

  • The values of y that fall directly on the line are the predicted values "y-hat"

  • The sum of squares obtained by squaring the deviations (y - y-hat) is much smaller than the SS of y without consideration of x

Testing the Statistical Significance of the Regression Model

  • Evaluate significance with ANOVA where the F test is testing the overall significance of the model

  • 3 Sums of Squares calculations (& df's) are needed:

    • Total SS (df = N-1)

    • Regression or "Model" SS (df=1)

    • Residual/Error SS (df = N-2)

  • In JMP:

    • Fit "Y by X"

    • Input variables

    • Select "Fit Line" at the red carat to get "Linear Fit" results box

    • Results are in "Analysis of Variance" under Prob > F (significance of Model)

  • The F-test tells us we have a very low probability of committing a Type I error and that python heart rate does vary linearly with temperature

  • How good is the model?

    • Big F tells us that it's good

    • Calculate the % of variation explained by the model (SS Model / SS C. Total)

The Coefficient of Determination (r^2)

  • Tells us what % of the variability in the dependent variable (y) is explained by the independent variable (x)

  • r^2 = SSmodel/SStotal

  • 93.9% of the variability observed in example

Using the Model: Predicting y from x

  • y=2.14+1.77x

  • Heart Rate=2.14+1.77(Temp.)

  • Plug in values of x and solve for y

  • Can predict heart rate at temps that were not tested (e.g., 5 and 9 ºC)

  • Model can be used by other researchers
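
A minimal Python sketch of this simple linear regression using scipy's linregress; the temperature/heart-rate values are hypothetical stand-ins chosen to fall near the reported equation, not the actual data.

```python
# Minimal sketch of a simple linear regression with scipy.
from scipy import stats

temp = [2, 4, 6, 8, 10, 12, 14, 16, 18]            # degrees C (n = 9)
heart_rate = [5, 11, 11, 14, 22, 23, 32, 29, 32]   # beats per minute (hypothetical)

fit = stats.linregress(temp, heart_rate)
print(f"slope b = {fit.slope:.2f}, intercept a = {fit.intercept:.2f}")
print(f"r^2 = {fit.rvalue ** 2:.3f}, p = {fit.pvalue:.4f}")   # p tests H0: slope = 0

# Predict heart rate at an untested temperature, e.g. 9 C:
print("Predicted heart rate at 9 C:", fit.intercept + fit.slope * 9)
```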

Assumptions of Model 1 Simple Linear Regression

  • The IV is usually fixed but can be random

  • The y observations are independent

  • The functional relationship is linear

  • Residuals are normally distributed

  • Variances are equal

    • After you've built the model, fit a line, and are looking at the JMP output:

    • Click on the red triangle beside "Linear Fit" (Use graph in analysis)

      • To check normality - click on "save residuals"

      • To assess variances - click on "plot residuals"

        • If non-normal or variances are violated, transform both variables

Regression or ANOVA?

  • Age as a continuous variable

    • Data from basically 13 levels of the IV (Age)

    • "Replicated" regression is best

    • Relationship between y and all x's within the range of values tested is quantified

Problem Background:

  • A researcher is interested in site-specific differences in body size among populations of rattlesnakes. Why an interest in body size? Reproductive traits in animals (e.g., number and size of offspring) often vary with body size. Populations may vary in body size due to differences in resource availability, resource quality, size-specific predation, population density, etc. Most importantly, size of rattlesnakes may vary with age.

  • How can the researcher examine for geographic differences in body size between two populations knowing that size will also vary due to differences in age?

    • Y-variable is body size (Continuous discrete)

    • Location (Categorical)

    • X-Variable is Age (Continuous discrete)

    • ANCOVA: Analysis of Covariance, with 2 X-variables: one continuous (the covariate) and one categorical

ANCOVA Requirements

  • 1 Dependent Variable (Continuous)

  • 1 IV (Categorical)

  • 1 Covariate (Continuous)

Covariate

  • Variable that is related to the DV, which you can't manipulate, but you want to account for its relationship with the DV

  • Increased sensitivity of tests of main effects and interactions since usage of a covariate will result in a reduction of error variance

ANCOVA Assumptions

  • Residuals are normally distributed and variances are homogenous

  • Linearity - significant linear relationship between covariate and DV

  • Since covariate is used as a linear predictor of the DV yet it is not a fixed effect, the covariate is assumed to be measured without any error

  • Homogeneity of regressions (i.e. no significant interaction of GroupXCovariate)

ANCOVA In JMP:

  • Use the "Fit Model" Platform

  • Model should contain:

    • IV (Drug)

    • Covariate (X)

    • Interaction Term (Drug*X)

  • Look if Covariate is significant. (Example is significant)

  • Look at Interaction Term (Example is NOT significant, lines are statistically parallel)

  • Look at IV (Example is not significant, no drug effect)

    • Adjusted Means

      • When using ANCOVA, the means for each group get adjusted by the Covariate-Dependent Variable relationship

      • If the Covariate has a significant relationship with the Dependent Variable then comparisons are made on the adjusted means

      • When doing ANCOVA, you should graph/report adjusted means
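
A minimal Python sketch of the ANCOVA workflow described above using statsmodels: first check the homogeneity-of-regressions (Group x Covariate) interaction, then, if the slopes are parallel, test the group effect adjusted for the covariate. The drug labels, covariate x, and responses y are hypothetical.

```python
# Minimal sketch of an ANCOVA with statsmodels formulas.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

d = pd.DataFrame({
    "drug": ["A"] * 6 + ["B"] * 6,
    "x":    [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6],              # covariate
    "y":    [10, 12, 15, 17, 20, 22, 11, 13, 14, 18, 19, 23],  # dependent variable
})

# Step 1: homogeneity of regressions -- the Drug*x interaction should NOT be significant.
full = smf.ols("y ~ C(drug) * x", data=d).fit()
print(sm.stats.anova_lm(full, typ=2))

# Step 2: if slopes are statistically parallel, drop the interaction; the C(drug)
# row then tests group differences in covariate-adjusted (LS) means.
ancova = smf.ols("y ~ C(drug) + x", data=d).fit()
print(sm.stats.anova_lm(ancova, typ=2))
```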

ANCOVA - A Biological Example

  • Fish inhabiting caves often have small eyes relative to fish living in surface streams, which is thought to be an example of adaptation to life in a cave environment. The Banded Sculpin is a common fish found in surface streams of North America, but it can also sometimes be found living in caves. A cave population of Banded Sculpin in Missouri is showing signs of cave adaptation similar to "true" cavefishes. Researchers are interested in whether or not sculpin in surface streams have different eye size relative to sculpin living in caves. Eye size of individual sculpin may also vary with total length of the fish.

  • Construct the model

    • DV: Y-variable: (Eye size)

    • IV: (Location)

    • Covariate: X-variable (Total length)

    • Interaction Term: Location*Total length

Advanced Regression Techniques

  • Multiple regression (>1 IV)

  • Analysis of Covariance (ANCOVA)-mixture of regression & ANOVA

Analysis of Frequencies

  • Interest in the frequency that an event occurs...

  • How does an observed outcome compare to an expected outcome or distribution?

    • Is the sex ratio (M:F) in a population of box turtles the expected 1:1 ratio?

    • Do the frequencies of observed phenotypes conform to the expected 3:1 ratio?

    • Do mountain lions eat equal amounts of white-tailed deer and mule deer?

      • Goodness of Fit?

Data Characteristics

  • "Count data" - discrete number of observations

  • 1 variable with categories or "bins"

  • 2 or more categories/bins

  • Multiple independent observations within categories (5 minimum; >10 recommended)

Chi-Square Goodness of Fit Test

  • Quarter tossing

    • Probability of Heads? Tails?

    • Is observed significantly different from expected? Is the disparity due to random chance?

      • OR is the deviation not due to random chance?

  • JMP does test for us, but simple to calculate by hand

  • We can test to see if our observed frequencies "Fit" our expectations

    • This is the chi-squared Goodness-of-Fit test

    • Converts the differences between observed and expected frequencies into a test statistic

  • Nonparametric test (NO distributional assumptions)

  • Data are frequencies (counts)

  • Observations are independent

  • Categories have large enough expected frequencies; when there are 4 or fewer categories, none of the expected frequencies should be less than 5

Conducting Chi-Square Analysis: Goodness of Fit IN JMP

  • 2 variables, so 2 columns

  • Frequency/Count column & IV column

    • "Analyze" -> "Distributions"

      • Input IV into Y column spot

      • Input Frequency/Count data into Frequency spot

    • JMP spits out percentages

    • Click the caret next to the IV and click "Test Probabilities"

      • Input expected/hypothesized probabilities

        • Should add to 1.0

        • Run the test

          • ONLY report Pearson test results. ChiSquare value, df, & p-value

          • Report as: (Chi-Square test, ChiSquare value, df, p-value)
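
A minimal Python sketch of a chi-square goodness-of-fit test using scipy, with hypothetical box-turtle counts tested against the expected 1:1 sex ratio.

```python
# Minimal sketch of a chi-square goodness-of-fit test.
from scipy.stats import chisquare

observed = [58, 42]                   # males, females (hypothetical counts)
n = sum(observed)
expected = [n * 0.5, n * 0.5]         # expected counts under the hypothesized 1:1 ratio

chi2, p = chisquare(f_obs=observed, f_exp=expected)
print(f"Chi-square test, X2 = {chi2:.2f}, df = {len(observed) - 1}, p = {p:.4f}")
```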

Chi-Square Test for Association or Independence

  • Are two categorical/nominal variables related/associated?

  • Same data type and assumptions as Goodness of Fit Tests

  • Calculations are similar, except the expected frequencies come from the row and column totals rather than an a priori ratio

  • Contingency Table

  • Example

    • H0: Age 0 male & female Ohio shrimp captures at McCallie Access, Mississippi River are not associated (do not differ) with month

    • HA: Captures of male and female Age 0 Ohio shrimp are related to month

      • Variable A: June, July, August

      • Variable B: Male Age 0, Female Age 0

        • Fit Y by X

          • Y: Sex

          • X: Month

          • Frequency: Count

            • JMP gives "Contingency Table" and resulting Chi-Square values (look at Pearson results)

              • Can apply Bonferroni adjusted alpha level, but not necessary if risk of family-wise error is low
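
A minimal Python sketch of the chi-square test for association using scipy, with a hypothetical month x sex contingency table of Age-0 Ohio shrimp counts.

```python
# Minimal sketch of a chi-square test for association (independence).
import numpy as np
from scipy.stats import chi2_contingency

#                  June  July  August
counts = np.array([[34,   61,   90],    # Male Age 0 (hypothetical counts)
                   [28,   70,   65]])   # Female Age 0 (hypothetical counts)

chi2, p, df, expected = chi2_contingency(counts)
print(f"Chi-square test of association, X2 = {chi2:.2f}, df = {df}, p = {p:.4f}")
print("Expected cell counts:\n", expected.round(1))
```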