Summary of the resource
X = T + B + W + Er
Notes:
- Explanation of symbols:
X = measured score
T = 'real' score (imaginary)
B = bias
W = validity error
Er = error, assumed to be random
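A minimal simulation sketch may make the decomposition concrete (Python with numpy; the numbers, the constant-bias assumption, and the choice of an "unintended construct" are hypothetical, not from the book):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

T = rng.normal(100, 15, n)    # 'real' score on the intended construct (never observed)
C = rng.normal(0, 10, n)      # an unintended construct that also influences answers
B = -5.0                      # bias: a systematic shift (kept constant here for simplicity)
W = 0.5 * C                   # validity error: part of the score reflects C instead of T
Er = rng.normal(0, 8, n)      # random error

X = T + B + W + Er            # measured score

print(round(X.mean() - T.mean(), 1))      # roughly -5: the bias shifts every score
print(round(np.corrcoef(X, C)[0, 1], 2))  # X partly reflects the wrong construct
```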
- Bias (B): nonrandom source of error, caused by a measurement instrument that measures on a different scale than planned
Notes:
- Example: mistaking a meter stick for a yardstick. Everything you measure will then read shorter than it really is. The absolute error is linear: if you measure something twice as long, the absolute error is twice as large. The relative error is constant.
- Can be overcome by: standardization of tests.
- Caused by a rater or judge, or by differences in testing situations.
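A tiny numeric sketch of the yardstick example (Python; the lengths are made up) shows why the absolute error is linear while the relative error stays constant:

```python
# Hypothetical lengths illustrating the meter-stick/yardstick mix-up (1 yard = 0.9144 m).
for true_yards in (1, 2, 4):
    true_meters = true_yards * 0.9144
    reported_yards = true_meters / 1.0   # read off the meter stick, but labeled 'yards'
    abs_error = reported_yards - true_yards
    print(true_yards, round(abs_error, 3), round(abs_error / true_yards, 3))
# The absolute error doubles when the length doubles (linear),
# while the relative error stays at about -8.6 % (constant).
```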
- Validity error (W): error from measuring the wrong construct
Notes:
- For example: if X = W, the score contains no bias or random error, but you are still not measuring the construct (e.g. creativity) you want to measure
- Can be caused by
- Response style
Notes:
- The tendency to answer 'no' or 'yes' regardless of content, or to choose extreme answer options.
- Can be overcome by balancing 'yes' and 'no' answers across items
- Response set
Notes:
- Answering items in a socially desirable way
- Can be overcome by
- use items that don't arouse defensiveness
- use answer options that don't differ on social desirability
- measure social desirability and then remove the effect
Notes:
- Does not always succeed in improving validity
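One common way to "measure social desirability and then remove the effect" is to regress the scale score on a social-desirability score and keep the residuals. A minimal sketch, assuming simulated data and hypothetical variable names (a generic partialling approach, not necessarily the book's exact procedure):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

social_des = rng.normal(0, 1, n)                 # social-desirability score
trait = rng.normal(0, 1, n)                      # the construct we actually want
scale_score = trait + 0.6 * social_des + rng.normal(0, 0.5, n)

# Regress the scale score on social desirability and keep the residuals.
slope, intercept = np.polyfit(social_des, scale_score, 1)
corrected = scale_score - (intercept + slope * social_des)

print(round(np.corrcoef(scale_score, social_des)[0, 1], 2))  # contaminated score
print(round(np.corrcoef(corrected, social_des)[0, 1], 2))    # ~ 0 after removal
```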
- Measuring validity
Notes:
- There are three types, though some have argued that we should think of validation as a whole
- Criterion-related
Notes:
- Comparing the new measurement instrument with an existing one
1. Predictive validity: comparing predictions from the new instrument with later actual results (the 'existing instrument')
2. Concurrent validity: taking the sample with the existing instrument at the same time as with the new measurement instrument
3. Postdictive validity: the same as (2), except that the sample from the existing instrument is taken first.
Disadvantage of criterion validity: the existing measurement might not be perfect (at all)
- Content-related
Notes:
- Inspecting the test's content and judging how suitable the separate questions are for measuring what you want to measure
Is often too subjective for scientific purposes, though it can be used for achievement tests
One thing to look out for when inspecting an instrument's content is whether the questions are obvious in their content.
- Construct-related
Notes:
- How well does the test reflect the target construct? Kind of summarizes the two beneath.
The difference with criterion-related validity is that here we don't assume the validity of the criterion (= the existing instrument)
- Factor analysis
Notes:
- How many constructs (= factors) are measured by the test's items?
To what extent is each item related to each factor?
If an instrument is intended to measure one construct, you expect one dominant factor. But even then you don't know whether this dominant factor represents the construct you want to measure; from here, you can check the content again (content validity).
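As a rough sketch of what such a check can look like in practice, one quick approach (of several) is to inspect the eigenvalues of the inter-item correlation matrix: one eigenvalue much larger than the rest is consistent with a single dominant factor. Python with numpy, simulated data:

```python
import numpy as np

rng = np.random.default_rng(2)
n_persons, n_items = 300, 8

# Simulate items that all load on one underlying factor plus noise.
factor = rng.normal(0, 1, n_persons)
loadings = rng.uniform(0.5, 0.8, n_items)
items = np.outer(factor, loadings) + rng.normal(0, 0.6, (n_persons, n_items))

corr = np.corrcoef(items, rowvar=False)        # inter-item correlation matrix
eigenvalues = np.linalg.eigvalsh(corr)[::-1]   # sorted, largest first

# One dominant eigenvalue suggests a single factor; whether that factor is
# the intended construct remains a content question, not a statistical one.
print(np.round(eigenvalues, 2))
```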
- Check correlation with similar instruments
Notes:
- This was already mentioned under criterion-related validity.
Convergent validity: the new instrument correlates with an existing instrument for the same construct
Discriminant validity: there is no correlation with an instrument that measures a different construct
A lack of discriminant validity (correlation where none is expected) can be due to method effects. To check this, compare the results of the instrument with the results of another instrument that is intended to measure something else but uses the same method. If those two correlate, that reflects a method effect.
The presence of a method effect can be checked more elaborately with a multitrait-multimethod matrix.
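A small simulated sketch of the convergent-correlation versus method-effect comparison (Python with numpy; the traits, methods, and effect sizes are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300

anxiety = rng.normal(0, 1, n)        # trait 1
sociability = rng.normal(0, 1, n)    # trait 2, unrelated to trait 1
method = rng.normal(0, 1, n)         # shared self-report method component

anxiety_self_report = anxiety + 0.5 * method + rng.normal(0, 0.5, n)
anxiety_observer = anxiety + rng.normal(0, 0.5, n)   # same trait, different method
sociability_self_report = sociability + 0.5 * method + rng.normal(0, 0.5, n)

r = lambda a, b: round(np.corrcoef(a, b)[0, 1], 2)
print("convergent (same trait, different method):", r(anxiety_self_report, anxiety_observer))
print("method effect (different trait, same method):", r(anxiety_self_report, sociability_self_report))
```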
- Theoretical-experimental
Notes:
- Look at how well the instrument conforms to the theory.
Assumptions: the theory is correct, and both instruments to be compared are more or less correct.
The instrument that shows stronger agreement with the theory is concluded to measure best.
- Random error (Er): unpredictable
- Measured by: the reliability coefficient, in practice estimated by a correlation coefficient applied to two tests measuring the same variable
Notes:
- The reliability coefficient ranges from 0 to 1. In theory it is calculated as Var(T + W) / Var(X), the share of the systematic (true plus validity) variance in the total observed variance.
With a reliability coefficient, true scores can be estimated, assuming that there is no validity error.
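A minimal sketch of both points: the reliability coefficient as the correlation between two parallel tests, and true-score estimation using Kelley's formula (a standard classical-test-theory approach; the book may phrase it differently). Simulated data in Python:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500

T = rng.normal(50, 10, n)             # systematic ('true') part of the score, no validity error assumed
test_a = T + rng.normal(0, 6, n)      # two parallel tests: same systematic part,
test_b = T + rng.normal(0, 6, n)      # independent random errors

# In practice: reliability coefficient = correlation between the two tests.
reliability = np.corrcoef(test_a, test_b)[0, 1]
print(round(reliability, 2))          # close to Var(T) / Var(X)

# Estimated true scores (Kelley's formula): shrink each observed score toward the mean.
true_hat = reliability * test_a + (1 - reliability) * test_a.mean()
```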
- Within-test consistency
- Odd-even method
- Split-half method (see the sketch after the Cronbach's alpha notes below)
- Average correlation of each item with every other item
Notes:
- Widely used, built-in function in SPSS.
It estimates reliability of one item, not of the whole test.
- Cronbach's coefficient alpha
Notes:
- Takes into account average reliability of items and the number of items.
'Scales with items in a two-answer or true/false format often use a related reliability coefficient called KR-20' (pg. 83)
- Can be improved by 1. making the test longer, or 2. replacing items with items that agree more with the existing items
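A minimal sketch of these within-test consistency estimates on a persons-by-items score matrix (Python with numpy, simulated data): Cronbach's alpha from item and total-score variances, and an odd-even split-half correlation stepped up with the Spearman-Brown formula (the step-up is standard practice, though not spelled out in these notes).

```python
import numpy as np

def cronbach_alpha(items):
    """items: persons x items score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

def odd_even_split_half(items):
    """Correlate odd- and even-numbered item halves, then apply the
    Spearman-Brown correction for the halved test length."""
    odd = items[:, 0::2].sum(axis=1)
    even = items[:, 1::2].sum(axis=1)
    r_half = np.corrcoef(odd, even)[0, 1]
    return 2 * r_half / (1 + r_half)

# Simulated example data: 200 persons, 10 items loading on one trait.
rng = np.random.default_rng(5)
trait = rng.normal(0, 1, 200)
items = trait[:, None] + rng.normal(0, 1, (200, 10))

print(round(cronbach_alpha(items), 2))
print(round(odd_even_split_half(items), 2))
```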
- Taking into account variations caused by the occasion
Notes:
- Called: between-occasion error
- Test-retest
Notes:
- Evidence for the stability of the measured trait as well as for the quality of the measure
Disadvantage: participants may remember the questions
- Parallel test
Notes:
- Overcomes the memory effects that play a role in test-retest
For both methods, the development of the measured trait over time has to be taken into account
- Interrater reliability
Notes:
- estimated by 'the correlation of scores from two observers of the same behavioral sample' (p. 84)
- Percent agreement
Notes:
- Percent of agreements per unit of time
Disadvantages:
1. adjacencies (does not give credit to near misses)
2. fails to allow for chance agreements
- Kappa coefficient
Notes:
- measures agreement adjusted for chance
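A minimal sketch of percent agreement and Cohen's kappa for two raters coding the same behavioral sample (Python with numpy; the category labels and codings are made up):

```python
import numpy as np

def percent_agreement(r1, r2):
    r1, r2 = np.asarray(r1), np.asarray(r2)
    return (r1 == r2).mean()

def cohens_kappa(r1, r2):
    """Agreement between two raters, adjusted for chance agreement."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    categories = np.union1d(r1, r2)
    p_observed = (r1 == r2).mean()
    # Chance agreement: product of the raters' marginal category proportions.
    p_chance = sum((r1 == c).mean() * (r2 == c).mean() for c in categories)
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical codings of the same behavioral sample by two observers.
rater1 = ["on-task", "on-task", "off-task", "on-task", "off-task", "on-task"]
rater2 = ["on-task", "off-task", "off-task", "on-task", "on-task", "on-task"]

print(round(percent_agreement(rater1, rater2), 2))   # 0.67
print(round(cohens_kappa(rater1, rater2), 2))        # 0.25: lower once chance is removed
```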
- Low reliability often leads to attenuation
Notes:
- Attenuation = underestimation of the relationship between the studied constructs.
The relationship can be estimated as if reliability were perfect by using the reliability coefficients (correcting for the attenuation).
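The usual correction for attenuation from classical test theory divides the observed correlation by the square root of the product of the two reliability coefficients; a tiny worked sketch with hypothetical numbers:

```python
import math

# Observed correlation between two measures and their reliability coefficients
# (hypothetical numbers).
r_observed = 0.30
rel_x, rel_y = 0.70, 0.60

# Correction for attenuation: the correlation the constructs would show
# if both were measured with perfect reliability.
r_corrected = r_observed / math.sqrt(rel_x * rel_y)
print(round(r_corrected, 2))   # 0.46
```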
- Caused by: guessing, skipping or misreading items, and variations in performance due to the occasion.
Notes:
- The other two causes listed in the book, 'variations in grading' and 'variation in difficulty of items', can be seen as bias.
- Reliability can be improved by
- increasing within-test consistency (see above)
- Standardizing data collection process
Notes:
- For raters: training them can reduce errors.
One could argue that this is more bias than random error.