Construct validity in psychological tests.
Psychological Bulletin (1955)
- PubMed: 13245896
Available from www.ncbi.nlm.nih.gov
Abstract
"Construct validation was introduced in order to specify types of research required in developing tests for which the conventional views on validation are inappropriate. Personality tests, and some tests of ability, are interpreted in terms of attributes for which there is no adequate criterion. This paper indicates what sorts of evidence can substantiate such an interpretation, and how such evidence is to be interpreted." 60 references. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
VOL. 52, No. 4 JULY, 1955
Psychological Bulletin
CONSTRUCT VALIDITY IN PSYCHOLOGICAL TESTS
LEE J. CRONBACH
University of Illinois
AND
PAUL E. MEEHL¹
University of Minnesota
Validation of psychological tests has not yet been adequately conceptualized, as the APA Committee on Psychological Tests learned when it undertook (1950-54) to specify what qualities should be investigated before a test is published. In order to make coherent recommendations the Committee found it necessary to distinguish four types of validity, established by different types of research and requiring different interpretation. The chief innovation in the Committee’s report was the term construct validity.² This idea was first formulated by a subcommittee (Meehl and R. C. Challman) studying how proposed recommendations would apply to projective techniques, and later modified and clarified by the entire Committee (Bordin, Challman, Conrad, Humphreys, Super, and the present writers). The statements agreed upon by the Committee (and by committees of two other associations) were published in the Technical Recommendations (59). The present interpretation of construct validity is not "official" and deals with some areas where the Committee would probably not be unanimous. The present writers are solely responsible for this attempt to explain the concept and elaborate its implications.
¹ The second author worked on this problem in connection with his appointment to the Minnesota Center for Philosophy of Science. We are indebted to the other members of the Center (Herbert Feigl, Michael Scriven, Wilfrid Sellars), and to D. L. Thistlethwaite of the University of Illinois, for their major contributions to our thinking and their suggestions for improving this paper.
² Referred to in a preliminary report (58) as congruent validity.
Identification of construct validity was not an isolated development. Writers on validity during the preceding decade had shown a great deal of dissatisfaction with conventional notions of validity, and introduced new terms and ideas, but the resulting aggregation of types of validity seems only to have stirred the muddy waters. Portions of the distinctions we shall discuss are implicit in Jenkins’ paper, "Validity for what?" (33), Gulliksen’s "Intrinsic validity" (27), Goodenough’s distinction between tests as "signs" and "samples" (22), Cronbach’s separation of "logical" and "empirical" validity (11), Guilford’s "factorial validity" (25), and Mosier’s papers on "face validity" and "validity generalization" (49, 50). Helen Peak (52) comes close to an explicit statement of construct validity as we shall present it.
FOUR TYPES OF VALIDATION
The categories into which the Recommendations divide validity studies are: predictive validity, concurrent validity, content validity, and construct validity. The first two of these may be considered together as criterion-oriented validation procedures.
The pattern of a criterion-oriented
study is familiar. The investigator is primarily interested in some criterion which he wishes to predict. He administers the test, obtains an independent criterion measure on the same subjects, and computes a correlation. If the criterion is obtained some time after the test is given, he is studying predictive validity. If the test score and criterion score are determined at essentially the same time, he is studying concurrent validity. Concurrent validity is studied when one test is proposed as a substitute for another (for example, when a multiple-choice form of spelling test is substituted for taking dictation), or a test is shown to correlate with some contemporary criterion (e.g., psychiatric diagnosis).
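The criterion-oriented pattern described above (administer the test, obtain an independent criterion measure on the same subjects, compute a correlation) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say which correlation coefficient is meant, so the Pearson product-moment coefficient is assumed here, and the function names and the `concurrent_window_days` cutoff for "essentially the same time" are hypothetical choices, not anything given in the paper.

```python
from math import sqrt


def pearson_r(test_scores, criterion_scores):
    """Pearson correlation between paired test and criterion scores.

    Assumption: the paper says only "computes a correlation"; Pearson's r
    is used here as the most common choice.
    """
    if len(test_scores) != len(criterion_scores):
        raise ValueError("scores must be paired, one pair per subject")
    n = len(test_scores)
    if n < 2:
        raise ValueError("need at least two subjects")
    mean_x = sum(test_scores) / n
    mean_y = sum(criterion_scores) / n
    dx = [x - mean_x for x in test_scores]
    dy = [y - mean_y for y in criterion_scores]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("both sets of scores must vary across subjects")
    return sxy / sqrt(sxx * syy)


def criterion_study_label(lag_days, concurrent_window_days=7):
    """Label a criterion-oriented study by the test-to-criterion time lag.

    Hypothetical helper: the 7-day window standing in for "essentially
    the same time" is an arbitrary illustrative cutoff.
    """
    if lag_days < 0:
        raise ValueError("criterion cannot precede the test in this sketch")
    return "concurrent" if lag_days <= concurrent_window_days else "predictive"


if __name__ == "__main__":
    # Made-up scores for five subjects, for illustration only.
    test = [12, 15, 9, 20, 17]
    criterion = [30, 34, 25, 41, 37]
    print(criterion_study_label(lag_days=180), round(pearson_r(test, criterion), 3))
```

The computation is identical for predictive and concurrent studies; only the timing of the criterion measure differs, which is why the label is kept separate from the coefficient.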
Content validity is established by showing that the test items are a sample of a universe in which the investigator is interested. Content validity is ordinarily to be established deductively, by defining a universe of items and sampling systematically within this universe to establish the test.
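As a rough illustration of defining a universe of items and sampling within it, the sketch below draws items at a fixed interval from an ordered list. This is one possible reading of "sampling systematically": the paper does not prescribe a procedure, and the function name, the `start` parameter, and the example item labels are all hypothetical.

```python
def systematic_sample(universe, n_items, start=0):
    """Pick n_items from an ordered item universe at a roughly fixed interval.

    Assumption: fixed-interval selection stands in for the paper's
    unspecified "sampling systematically".
    """
    size = len(universe)
    if not 0 < n_items <= size:
        raise ValueError("n_items must be between 1 and the universe size")
    interval = size // n_items
    if not 0 <= start < interval:
        raise ValueError("start must fall within the first interval")
    # Integer arithmetic keeps every index inside the universe.
    return [universe[start + (i * size) // n_items] for i in range(n_items)]


if __name__ == "__main__":
    # Hypothetical universe of twelve spelling words, identified by label only.
    universe = [f"spelling_word_{i:02d}" for i in range(12)]
    print(systematic_sample(universe, n_items=4))
```

In practice the universe would usually be organized by content area before sampling, so that each area is represented in the test; that step is left out here to keep the sketch short.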
Construct validation is involved whenever a test is to be interpreted as a measure of some attribute or quality which is not "operationally defined." The problem faced by the investigator is, "What constructs account for variance in test performance?" Construct validity calls for no new scientific approach. Much current research on tests of personality (9) is construct validation, usually without the benefit of a clear formulation of this process.
Construct validity is not to be identified solely by particular investigative procedures, but by the orientation of the investigator. Criterion-oriented validity, as Bechtoldt emphasizes (3, p. 1245), "involves the acceptance of a set of operations as an adequate definition of whatever is to be measured." When an investigator believes that no criterion available to him is fully valid, he perforce becomes interested in construct validity because this is the only way to avoid the "infinite frustration" of relating every criterion to some more ultimate standard (21). In content validation, acceptance of the universe of content as defining the variable to be measured is essential. Construct validity must be investigated whenever no criterion or universe of content is accepted as entirely adequate to define the quality to be measured. Determining what psychological constructs account for test performance is desirable for almost any test. Thus, although the MMPI was originally established on the basis of empirical discrimination between patient groups and so-called normals (concurrent validity), continuing research has tried to provide a basis for describing the personality associated with each score pattern. Such interpretations permit the clinician to predict performance with respect to criteria which have not yet been employed in empirical validation studies (cf. 46, pp. 49-50, 110-111).
We can distinguish among the four types of validity by noting that each involves a different emphasis on the criterion. In predictive or concurrent validity, the criterion behavior is of concern to the tester, and he may have no concern whatsoever with the type of behavior exhibited in the test. (An employer does not care if a worker can manipulate blocks, but the score on the block test may predict something he cares about.) Content validity is studied when the tester is concerned with the type of behavior involved in the test performance. Indeed, if the test is a work sample, the behavior represented in the test may be an end in itself. Construct validity is ordinarily studied when the tester has no definite criterion measure of the quality with which he is concerned, and must use indirect measures. Here the trait or quality underlying the test is of central importance, rather than either the test behavior or the scores on the criteria (59, p. 14).


