Reliability of Test Scores and Test Items

Reliability

  • an umbrella term under which different types of score stability are assessed

  • suggests trustworthiness and stability

  • can pertain to stability of scores over time (test-retest), stability of item scores across items (internal consistency), or stability of ratings across judges, or raters, of a person, object, event, and so on (interrater reliability)

  • a quality of test scores that suggests they are sufficiently consistent and free from measurement error to be useful

  • the evaluation of score reliability involves a 2-step process that consists of (a) determining what possible sources of error may enter into test scores and (b) estimating the magnitude of those errors

Sources of Error in Psychological Testing

  • Error can enter into the scores of psychological tests for an enormous number of reasons, many of which are outside the purview of psychometric estimates of reliability.

  • Generally speaking, however, the errors that enter into test scores may be categorized as stemming from one or more of the following 3 sources:

    • the context in which the testing takes place

    • test taker

    • test itself

Sources of Measurement Error with Typical Reliability Coefficients Used to Estimate Them

Interscorer or Interrater Differences

  • arise whenever a test is scored with a degree of subjectivity

  • the label assigned to the error that may enter into scores whenever the element of subjectivity plays a part in scoring a test

  • it is assumed that different judges will not always assign the same exact scores or ratings to a given test performance even if:

    • the scoring directions specified in the test manual are explicit and detailed

    • the scorers are conscientious in applying those directions

  • it refers to variations in scores that stem from differences in the subjective judgement of the scorers

  • Scorer Reliability

Time Sampling Error

  • refers to the variability inherent in test scores as a function of the fact that they are obtained at one point in time rather than at another

  • whereas a certain amount of time sampling error is assumed to enter into all test scores, as a rule, one should expect less of it in the scores of tests that assess relatively stable traits

  • Test-Retest Reliability

Content Sampling Error

  • the term used to label the trait-irrelevant variability that can enter into test scores as a result of fortuitous factors related to the content of the specific items included in a test

  • Alternate-Form Reliability

    • To investigate this kind of reliability, 2 or more different forms of the test -- identical in purpose but differing in specific content -- need to be prepared and administered to the same group of subjects. The test takers’ scores on each of the versions are then correlated to obtain alternate-form reliability coefficients

  • Split-Half Reliability

    • Administer a test to a group of individuals and create 2 scores for each person by splitting the test into halves

Inter-Item Inconsistency

  • refers to error in scores that results from fluctuations in items across an entire test, as opposed to the content sampling error emanating from the particular configuration of items included in the test as a whole

  • Such inconsistencies can be due to a variety of factors, including content sampling error and content heterogeneity

  • Content Heterogeneity

    • results from the inclusion of items or sets of items that tap content knowledge or psychological functions that differ from those tapped by other items in the same test

    • can be checked using split-half reliability or inter-item consistency estimates

    • the 2 most frequently used formulas for calculating inter-item consistency are the Kuder-Richardson Formula 20 (KR-20) and coefficient alpha (α), also known as Cronbach’s alpha
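
The formulas themselves are compact: coefficient alpha = k/(k-1) × (1 - sum of item variances / variance of total scores), and KR-20 is the same expression with the sum of p·q as the item-variance term for dichotomously scored (0/1) items. Below is a minimal NumPy sketch of both, assuming a score matrix with one row per examinee and one column per item; the function names are illustrative.

```python
import numpy as np

def cronbach_alpha(scores):
    """Coefficient alpha for an (examinees x items) matrix of item scores."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0)               # variance of each item across examinees
    total_var = scores.sum(axis=1).var()         # variance of examinees' total scores
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

def kr20(scores):
    """KR-20: the special case of coefficient alpha for dichotomous (0/1) items."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    p = scores.mean(axis=0)                      # proportion of examinees passing each item
    total_var = scores.sum(axis=1).var()
    return (k / (k - 1)) * (1.0 - (p * (1 - p)).sum() / total_var)
```

For 0/1 data the two functions return the same value, since p(1 - p) is simply the variance of a dichotomous item.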

Tests for Reliability

Test-Retest Reliability

  • An estimate of reliability obtained by correlating pairs of scores from the same people on 2 different administrations of the same test

  • Appropriate when evaluating the reliability of a test that purports to measure something that is relatively stable over time, such as a personality trait
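
As an illustration, the computation itself is just a Pearson correlation between the 2 sets of scores; the score values below are hypothetical. The same correlation, computed between scores on 2 parallel forms rather than 2 administrations, gives the alternate-forms coefficient described in the next section.

```python
import numpy as np

# Hypothetical scores for the same 6 examinees on 2 administrations of one test.
time_1 = np.array([24, 31, 18, 27, 35, 22], dtype=float)
time_2 = np.array([26, 30, 20, 25, 36, 21], dtype=float)

# The test-retest coefficient is the Pearson r between the 2 sets of scores.
r_test_retest = np.corrcoef(time_1, time_2)[0, 1]
print(f"test-retest reliability: {r_test_retest:.2f}")
```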

Alternate-Forms Reliability

  • The degree of the relationship between various forms of a test can be evaluated by means of an alternate-forms coefficient of reliability, which is often termed the coefficient of equivalence

  • Alternate-Forms

    • simply different versions of a test that have been constructed so as to be parallel

Split-Half Reliability

  • An estimate of split-half reliability is obtained by correlating the 2 sets of scores obtained from equivalent halves of a single test administered once

  • It is a useful measure of reliability when it is impractical or undesirable to assess reliability with 2 tests or to administer a test twice (because of factors such as time and expense)

    • Step 1: Divide the test into equivalent halves

    • Step 2: Calculate a Pearson r between scores on the 2 halves of the test

    • Step 3: Adjust the half-test reliability using the Spearman-Brown formula
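
A minimal sketch of these 3 steps, assuming an examinee-by-item NumPy score matrix and using an odd-even item split as one common way to form equivalent halves (the function name is illustrative):

```python
import numpy as np

def split_half_reliability(scores):
    """Split-half reliability of an (examinees x items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    odd_half = scores[:, 0::2].sum(axis=1)            # Step 1: half-scores from items 1, 3, 5, ...
    even_half = scores[:, 1::2].sum(axis=1)           #         and from items 2, 4, 6, ...
    r_half = np.corrcoef(odd_half, even_half)[0, 1]   # Step 2: Pearson r between the halves
    return (2 * r_half) / (1 + r_half)                # Step 3: Spearman-Brown adjustment to full length
```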

Inter-Item Consistency

  • refers to the degree of correlation among all the items on a scale

  • An index of inter-item consistency, in turn, is useful in assessing the homogeneity of the test.

Interrater Reliability

  • the degree of agreement or consistency between 2 or more scorers (judges or raters) with regard to a particular measure
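
As an illustrative sketch (the ratings below are hypothetical), 2 simple ways to quantify agreement between raters are the correlation between their scores and their proportion of exact agreement:

```python
import numpy as np

# Hypothetical ratings assigned by 2 judges to the same 8 test performances.
rater_a = np.array([4, 3, 5, 2, 4, 5, 3, 4])
rater_b = np.array([4, 3, 4, 2, 5, 5, 3, 4])

# One simple interrater index: the Pearson r between the 2 sets of ratings.
r_interrater = np.corrcoef(rater_a, rater_b)[0, 1]

# Another: the proportion of performances the judges score identically.
exact_agreement = np.mean(rater_a == rater_b)

print(f"interrater correlation: {r_interrater:.2f}")
print(f"exact agreement: {exact_agreement:.0%}")
```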

Heterogeneity

  • describes the degree to which a test measures different factors. A heterogeneous (nonhomogeneous) test is composed of items that measure more than one trait

What to Do When the Reliability of a Test Is Low

  • increase the number of items (see the Spearman-Brown sketch below)

  • conduct factor and item analysis
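
For the first option, the general Spearman-Brown formula can estimate how much reliability would rise if the test were lengthened with comparable items; the sketch below assumes the current reliability is known and that the added items behave like the existing ones.

```python
def spearman_brown(r_current, length_factor):
    """Estimated reliability if the test is lengthened by `length_factor`
    (e.g., 2.0 means doubling the number of comparable items)."""
    return (length_factor * r_current) / (1 + (length_factor - 1) * r_current)

# e.g., a test with reliability .60, doubled in length with comparable items:
print(f"{spearman_brown(0.60, 2.0):.2f}")   # -> 0.75
```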
