ED 110
TM 004 754
Hecht, Kathryn A.
Overview of Problems Involved in Validating
Professional Licensing and Certification
Examinations.
22p.; Paper presented at the Annual Meeting of the
National Council on Measurement in Education
(Chicago, Illinois, April 16-18, 1974)
MF-$0.76 HC-$1.58 PLUS POSTAGE
*Certification; *Nurses; Performance Tests;
Predictive Validity; Professional Occupations;
Selection; State Licensing Boards; Testing; Testing
Problems; *Test Validity
A large amount of professional interest has been focused upon the ambiguities and problems involved in the conduct of professional licensing and certification through examinations. What seems to be a simple problem on the surface, that being the policing of professionals for competence and the practice of conducting this policing so that it offers equal fairness to all, turns out to be a very complex problem involving unresolved conceptual, legal, and methodological issues, particularly with examination validity. There are four main areas of concern: (1) criticism of testing, (2) the growing number of jobs requiring licensing, (3) discriminatory practices in hiring and occupations access, and (4) validity of certification through testing. The example used is the field of nursing.
OVERVIEW OF PROBLEMS INVOLVED IN
VALIDATING PROFESSIONAL LICENSING AND CERTIFICATION EXAMINATIONS

Kathryn A. Hecht, Ed.D.
Center for Northern Educational Research
University of Alaska
Fairbanks, Alaska 99701
Presented at the 1974 Annual Meeting of the
National Council on Measurement in Education
Symposium (Session 7A)
Validation of Professional Licensing and Certification Examinations:
A Methodological Dilemma
How does an evaluator from Alaska come to be addressing you today on validation of licensing exams? Last year, while working as an independent consultant, I was asked by the National League for Nursing to do a background paper on the validation of the RN (Registered Nurse) licensing examination and related work on performance testing. Naively, I thought it would be a simple task of pulling together what had been done in other professions. It turned out to be a much more complex and interesting task than I had expected, and questions and concerns raised during that study led directly to our meeting together today. (One of our participants, Paul Jacobs, is now validation study director for the NLN and he will tell you more about the specifics of that effort.)
Definitions of Licensure and Certification

First, as part of an overview there is the simple matter of defining licensure and certification . . . only it is not so simple. There is no standard definition nor usage of the terms. For the purposes of our discussion I think the most useful definitions are those proposed by O. Jensen (1972). In an unpublished paper, he discusses licensure and certification as two types of minimum competency testing, in that the purpose of the tests is to establish an individual's status with respect to an established go/no-go criterion. Licensing is usually a mandatory program designed to protect the public from incompetent practitioners, that is, to prevent an individual with particular deficiencies from entering practice; Jensen calls this "selecting out". Certification, on the other hand, is usually a voluntary program with the emphasis on granting special status to an individual with more than run-of-the-mill knowledge, ability, and skill. This Jensen calls "selecting in".
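Jensen's distinction reduces to two go/no-go decision rules applied at different points on a score scale. A minimal sketch, with invented cut scores (no real board uses these numbers), might look like this:

```python
# Illustrative cut scores only; real boards set their own standards.
LICENSE_CUTOFF = 350   # minimum competency floor ("selecting out")
CERTIFY_CUTOFF = 600   # superior-standing bar ("selecting in")

def may_practice(score: int) -> bool:
    """Licensure: bar only those who fall below minimum competency."""
    return score >= LICENSE_CUTOFF

def earns_certification(score: int) -> bool:
    """Certification: grant special status only above a high bar."""
    return score >= CERTIFY_CUTOFF

# A candidate can be judged safe to practice without earning
# any claim to superior standing.
print(may_practice(420), earns_certification(420))  # True False
```

The point of the sketch is that the two rules answer different questions about the same examinee, which is why, as argued below, one test cannot serve both purposes.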
Perhaps the best known example of a "selecting-out" exam would be a driving license, where the public is protected from those whose driving knowledge is judged not to be up to standard. Another example is the RN exam, whose espoused purpose is to "measure minimum safety and effectiveness of practice, for the protection of the public" (N.L.N., 1961). Both licenses represent a legal right to engage in the appropriate activity.
Examples of "selecting-in" or certification are the "diplomate" program for medical specialties and the new certification program for automobile mechanics. They are both exams designed for experienced practitioners which provide evidence of superior capability in a specialty within the occupation.
Since validation deals with the purpose for which the test is intended, I believe these interpretations and distinctions to be important for our discussion. It should be obvious that the same test could not serve both licensure and certification purposes as defined here.
Unfortunately, neither this distinction nor any other I can locate fits current usages of the terms. For example, teacher certification I believe to be a misnomer. It is a legal requirement to begin teaching, intended to protect the public from incompetence, and signifies no special standing within the profession. I am sure you can think of other cases which do not fit the given definition.
Why the Concern?
Next, a brief look at why the growing concern about licensing at this time? There are four concerns I will outline briefly. (Several of the participants and discussants are especially well qualified to discuss them.)

The first is the criticism of testing in general, which in the past decade has become a popular cause making frequent headlines and even best sellers (Hoffmann, B., 1962).
Second, there has been a proliferation of jobs requiring licensing and a hodgepodge of local and state legislated bodies emerging to control the process. Benjamin Shimberg (one of our discussants) and others (1972) have written a report entitled Occupational Licensing and Public Policy, which raises these issues. It is the only up-to-date and comprehensive document I was able to locate, and it provided an excellent overview in itself of licensing practices in various occupations and their dubious quality.
Third, the civil rights movement has continued to make inroads against discrimination, specifically here concerned with discriminatory practices in hiring and occupations access. The Equal Employment Opportunity Commission Guidelines, 1970, focus attention on test validation in employment situations, and there is reason to believe from various recent court decisions (such as Griggs v. Duke Power Company, 401 U.S. 424, 1971) that the federal guidelines could be applied to licensing situations. The guidelines require that evidence of a test's validity:
". . . should consist of empirical data demonstrating that the test is predictive of or significantly correlated with important elements of work behavior which comprise or are relevant to the jobs for which candidates are being evaluated. Empirical evidence in support of a test's validity must be based on studies emphasizing generally accepted procedures, such as those described in Standards for Educational and Psychological Tests and Manuals, published by the American Psychological Association. However, evidence for content or construct validity should be accompanied by sufficient information from job analysis to demonstrate the relevance of the content or construct."
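As a worked illustration of what "significantly correlated" demands in practice, an observed validity coefficient can be converted to a t statistic. The numbers below are hypothetical and are not drawn from any licensure study:

```python
import math

def t_statistic(r: float, n: int) -> float:
    """t test of H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# An r of .30 between test scores and a job criterion is comfortably
# significant on 100 cases (two-tailed critical t, 98 df, is about 1.98)...
print(round(t_statistic(0.30, 100), 2))  # 3.11
# ...but the same r on 20 cases falls short (critical t, 18 df, about 2.10).
print(round(t_statistic(0.30, 20), 2))   # 1.33
```

The same correlation can thus satisfy or fail the guidelines' empirical standard depending only on how many licensees a board can follow up, a point that recurs below.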
The November "APA Monitor" clipping I included describes some recent extensions of the guidelines to local and state governments. (Our next speaker, Thomas Goolsby, Jr., will bring us up to date and discuss the legal questions.)
Fourth, challenges are being made to many professions over access: obtaining status through alternatives to the traditional curriculum/school-based training routes. This becomes a question of who qualifies to take a licensure exam. Are such exams really to protect the public, or to limit access by those who have already made it? If exams are not proven valid in terms of job needs, and as they are in most cases controlled by the professions themselves, then this is a meaningful issue for those who seek entry through alternative routes. For example, cases are reported of returning army medics who sought to take the RN licensure exam and were denied on the grounds of not having graduated from an approved nursing program.
It can be said that licensing is going through a period of questioning. For a number of reasons, including questions of federal legality, licensing agencies are apt soon to be challenged to prove their tests are valid predictors of job performance, significantly measuring job-related skills. It seems unlikely that any less will be acceptable.
Availability of Information
Despite a growing concern for licensure and validation in particular, there is a surprising lack of information and research on the topic. This is especially true in attempting to relate licensure to job performance. The information I was able to locate on licensing and related performance testing was scanty, often in progress, and done in subject matter areas rather than considered collectively as a methodological problem. In many cases, material was not available through generally accessible professional media, and in some cases, professions considered such information confidential.

(This lack of information encouraged me to include with this paper the complete bibliography from my NLN study, hoping to save you the considerable trouble I went through in collecting sources.)
Maslow (1971) (one of our discussants), who was at the time with the Civil Service Commission Research Center, advised the Council on Occupational Licensing:
"I am convinced that we need to sharpen our ability to develop and demonstrate the rational relationship between the job requirements and the measurement system used to certify or qualify people for an occupation. A number of techniques are available to improve the process of job analysis to get a much more exact fix on the critical requirements for the work to be done. I would urge, therefore, that especially in examinations for occupational knowledge and proficiency, you insist, at the very least, on a clearcut showing of how one proceeds from the decision as to the skills and abilities required for effective performance to the decisions that certain tests or other measures will insure that the applicant can adequately perform in that occupation."
Validation Studies: The Problem
How have licensure validation studies been done? How should they be done?
What do the studies available tell us? (This audience need not be reminded of the four generally accepted types of validity.)

Validation studies of licensure exams are rare. Seldom is the test development process that sophisticated or comprehensive. Many occupational groups call in teachers of their trade and/or practitioners at some point in the test development process. At worst, it is a rubber stamp operation. At best, it can approach a content validation methodology, but the quality of the process is limited by the adequacy of the universe specifications, or how well the content from which the sample or test is drawn is defined and described. A second limiting factor has to do with how systematically the comments are requested, recorded and used. Such exercises are seldom reported except to say that they exist.
In my opinion, predictive criterion-related validation studies are the type most closely fitting the expressed purposes of licensure exams, that of assuring minimal competency on the job for the protection of the public. (The second "APA Monitor" clipping I have attached speaks to some professional disagreement on this matter.) Concern is with a criterion not yet obtainable at the time of testing, and one wishes to predict an individual's outcome prior to that situation occurring. They are "selecting out" tests, as licensure was previously defined. Clearly, this suggests a research problem in itself, as those who fail are kept from practice and usually are not considered part of a validity study, since they are not practicing and available for observation in that job.
However, the major problem in predictive studies is finding appropriate job-related criteria. As Anastasi (1972) said:

"Insofar as predictors are evaluated on the basis of their criterion measures, a validation study can be no better than the quality of its criterion data. Yet, in real-life situations, good criterion data are hard to come by."
Shimberg and others (1972) cite a similar, added logistical problem in regard to validation of licensure tests:

"Individuals are licensed by a board, but once licensed they work for different employers--possibly in widely scattered locations. Any board that seeks to validate its tests by following up on the performance of each licensee faces a formidable task."
I think it can fairly be said that validation studies of the predictive type demanding job-related criteria are difficult to develop, time-consuming, impractical, and expensive to perform. Psychometric methodology offers little guidance for such validation studies. The area of licensure in particular lacks the "classic" studies familiar to those schooled in psychological testing. Once this is comprehended, the fact that such validations are rare, almost nonexistent, is less surprising but nevertheless disconcerting.
An Example of Validity Evidence: RN Licensure
Nursing was selected as the occupational example because it is the one with which I am most familiar, and because, based on many of the findings in the Occupational Licensing and Public Policy report referred to above, the exam for RN licensure would rate high in comparison with other licensure exams reported upon. It is developed according to accepted test procedures, given under carefully controlled conditions, scored objectively, and serves all states. To illustrate by comparison, some occupations build tests upon available textbook questions (barbering) or make choices from a local file of essay questions (merchant marines). Most local or state exams have no reciprocity arrangements.
The RN licensure exam has never been directly validated, though rather typical content checks by nurse educators are routinely done. However, two types of studies are available which used the licensure exam as the criterion data: those that use the exam scores as a criterion variable in validity studies of other nursing tests, and studies which attempted to predict RN licensure scores directly. It is easier to use success on the licensure exam than to determine what constitutes success on the job or to build an instrument to cover a multitude of job situations. For this reason, the NLN uses the licensure exam to validate the predictive use of their pre-nursing exam. A high degree of relationship is found between the two. The RN licensure exam also correlates highly with the NLN achievement tests. However, a number of smaller studies, less definite but fairly consistent, found that though theory grades were good predictors of licensure scores, clinical course grades were not, and correlations between theory and clinical course grades were lower than expected.
One can say with some confidence, then, that the RN licensure exams are highly related to academic achievement. But are such achievement measures necessarily related to the minimum competency required for the practice of beginning nursing? Obviously there is a necessary cognitive knowledge component to any job, but is it sufficient? "Is it possible, for instance," as one researcher asks (Taylor, and others, 1966), "that students who do better in clinical practice courses than in more traditional academic classes will be more successful in actual work situations?" If in this or other fields licensure exams are more related to academic success than job performance, such findings will not only require changes in the licensure exams but more far-reaching questioning of the curriculum and of the underlying occupational structure.
Testing Research and Job Performance
What does testing research suggest concerning the predictive validity of paper-and-pencil tests which are known to be highly related to success in school (school curriculum or academic grades)? World War II Naval research is commonly credited as the point at which it became recognized that paper-and-pencil tests, though highly correlated with final course grades, were not efficient predictors of job performance.
"Although it had been assumed that written tests sufficed to indicate what a man had learned in a service school, the evidence showed that performance tests and improved shop grades were not closely correlated with written test grades. During tryout in Gunners' Mates School, performance tests correlated from .14 to .35 with written tests and only slightly higher with final grades which were based largely on written tests."

These same written tests were also found to correlate well with reading tests (Gulliksen, 1950). Efforts were made following these findings to introduce more practical work and performance testing into the training.
This lack of relation between achievement as measured by traditional paper-and-pencil tests and performance measures, which appears in studies in fields as diverse as education (Quirk and others, 1972) and engineering (Hemphill, 1963), suggests the great importance of test validation for licensure and certification. Although much lip service is given to the concept, it is seldom performed in an acceptable manner.
Ryans and Frederiksen (1951) sum up this point from a measurement standpoint:

"From the standpoint of validity one of the most serious errors committed in the field of human measurement has been that which assumes the high correlation of knowledge of facts and principles on the one hand and performance on the other. Nevertheless, examinations for admission to the bar, for medical practice, for teaching . . . are predominantly verbal tests of fact and principle in the respective fields."
If training and knowledge variables are not necessarily sufficient to define job proficiency, where does one look?
Performance Testing: Examples and Development

If one accepts Fitzpatrick's and Morrison's (1971) definition of performance testing as a test which is relatively realistic, then it is logical to look here for the answer to our questions of (1) how to validate licensure exams more effectively and (2) how to revise licensure tests if necessary.
The most interesting and well documented use I found of performance measures in predictive validation research was in the area of employee selection and promotion. Besides the monetary incentive for making a correct decision, an employer's situation has numerous advantages over a licensure board's, such as control over subjects, the limited range of jobs and job descriptive information, and the possibility of gradually implementing a testing program, allowing research time to study predictions without actually implementing them.
Assessment centers are a performance-based type of employment or promotion screening device. The technique was originally devised to select secret service agents during World War II and applied in industrial situations by AT&T in the fifties. The procedure (Byham, 1970):

". . . simulates 'live' the basic situations with which a manager would be faced if he were moved up and develops information about how well he will cope at the higher level before the decision to promote him is actually made."

The assessors at the centers are trained observers, the exercises are standardized, and the conditions are constant and relatively realistic. This allows more valid comparative judgments to be made than in the "real world".
Two kinds of validity studies have been done. In an experimental setting, reports of the assessment are not released to management; thus no decisions are made on the basis of the assessment. The predictions are then compared with actual performance by some rating and/or observation technique, and other indicators of job success. If reports are released, which is more common but less conducive to sound validation, studies are then based on comparing those promoted before assessment center results were available to those promoted with this information, or by simply comparing the progress of candidates promoted using assessment center reports with their subsequent performance. According to Byham, all validation methods have tentatively pointed to the same conclusion:

"The assessment center technique has shown itself a better indicator of future success than any other tool management has."

A more descriptive example of how one such center works and of the validation process was given by Bray and Campbell of AT&T (1968). Though the assessment center concept could be used as a validation tool, as an ongoing technique for licensure examinations it is obviously unrealistic.
To illustrate a more practical approach to introducing reality into the testing situation, the medical profession has developed two types of programmed testing of clinical competence to simulate performance on objectively scored paper-and-pencil tests. The National Board of Medical Examiners first introduced the concept (Hubbard, 1964) and now uses programmed testing for the medical licensing exam Part III on clinical competence, which previously was a practical bedside type of oral exam. This is a linear model, while certification specialty exams use a branching model (McGuire and Babbott, 1967). In both, the examinee is confronted by a realistic clinical situation and proceeds through a series of decision choices, each step accompanied by an increment of information upon which the next depends, similar to programmed teaching. In the branching model the difference is that decision choices change based upon previous choices, allowing more than one route to a solution.
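A branching clinical problem of this kind is, in effect, a decision tree in which each choice both yields an increment of information and gates the next set of available decisions. A toy sketch (the clinical content is invented, not taken from any actual exam):

```python
# Each node carries an increment of information and the decision
# choices it makes available; earlier choices gate later ones.
CASE = {
    "start":   {"info": "Patient presents with chest pain.",
                "choices": {"order ECG": "ecg",
                            "prescribe antacid": "antacid"}},
    "ecg":     {"info": "ECG shows ST elevation.",
                "choices": {"begin treatment": "treated"}},
    "antacid": {"info": "Symptoms worsen; the infarction is missed.",
                "choices": {}},
    "treated": {"info": "Patient stabilized.",
                "choices": {}},
}

def run_case(decisions):
    """Walk the branching exam, returning the route taken."""
    node, route = "start", ["start"]
    for choice in decisions:
        node = CASE[node]["choices"][choice]  # unavailable choices raise
        route.append(node)
    return route

# One route to a solution; a linear-model exam would fix the sequence.
print(run_case(["order ECG", "begin treatment"]))
```

In a linear model every examinee would traverse the same fixed sequence of nodes; the branching model scores the route itself, which is what makes more than one path to a solution possible.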
Neither variation has been validated in relation to predicting on-the-job performance, but some work is in progress. The Part III or clinical competence exam is said to derive its validity from, among other things, measuring something different from Parts I and II, which are related to medical school course work. Cronbach (1970), having reviewed the validity evidence on Part III, notes: "Follow-up studies are needed to be sure that the test measures a skill of medical practice and not just ingenuity in test taking."

(Other examples of performance tests can be found in The Handbook of Performance Testing by Boyd and Shimberg (1971), although most are of a mechanical nature.)
Similar to the problems confronted by those attempting predictive validation of licensure tests, performance test development logically begins with a study of the specific skills and abilities involved in the activities the test is designed to measure or predict. The next step is the choice of representative tasks, which strongly influences the validity of the performance test(s). Other difficulties with performance testing come from a lack of applied methodology, in that performance tests are by nature criterion-referenced, and procedures for estimating reliability and validity are meager.*
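The criterion-referenced/norm-referenced contrast can be made concrete with a handful of invented scores; the numbers below are illustrative only:

```python
import statistics

# Invented scores for eight examinees; the contrast is the point.
scores = [52, 61, 68, 74, 74, 81, 90, 95]

# Criterion-referenced: judge each examinee against an absolute
# standard, however the rest of the group performed.
CRITERION = 60
criterion_pass = [s >= CRITERION for s in scores]

# Norm-referenced: judge each examinee by standing in the group,
# e.g. at or above the group median, whatever that happens to be.
norm_cut = statistics.median(scores)
norm_pass = [s >= norm_cut for s in scores]

# The two models license different people from the same score list.
print(sum(criterion_pass), sum(norm_pass))  # 7 5
```

A norm-referenced cut moves with the cohort, so an examinee's fate depends on who else sat the exam, which is exactly what a minimum-competency rationale for licensure says should not happen.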
* Licensure and certification exams have been discussed as types of minimal competency exams and, like performance measures, would normally be considered criterion referenced. The examinee is theoretically tested in terms of an absolute criterion, and comparisons among test takers are not a licensing purpose. However, most licensing tests are developed on norm-referenced models and purposes. (I hope Robert Frary will bring this point into his discussion of methodology.)

Most literature on performance tests discusses them as a new form of assessment used to increase the realism of the test. The primary interest in performance tests expressed here is less commonly discussed, that of providing the criteria
for predictive validation. The only description of such a research use I found was a theoretical discussion on "Providing a Criterion Measure" by Ryans and Frederiksen, 22 years ago (1951):
"When the behavior involved in a situation is broad enough and representative enough of the situation as a whole, the performance is itself the criterion behavior for that situation. Consequently, performance test data, particularly when they refer to work samples, provide a more satisfactory measure of criterion behavior than is usually available. Because performance tests serve as a measure of the criterion, they may be of use in several important ways.

"Performance test data may provide, first of all, a criterion for research. Information yielded by performance tests makes possible the validation of other measures which, although of a more indirect nature, may be more economical in administration. In many situations it is difficult and expensive to administer performance tests to large numbers of examinees. Such situations demand the construction of psychometric instruments that will yield measurements related to the criterion and will also be practicable. In the construction of aptitude tests for various skills and operations, performance tests may provide the criterion against which the available second order test can be judged."
For us as methodologists, the validation of licensing and certification exams presents real and immediate challenges. Here are practical problems, based on real and current concerns. If each occupation continues struggling on its own, without serious attempts from a group such as we have here today to provide an integrated conceptual and methodological framework, solutions will remain a long way off.
"APA Monitor," November, 1973

With the establishment of the Equal Employment Opportunity Commission (EEOC), and later the Office of Federal Contract Compliance (OFCC), there were powerful forces examining some of the discriminatory employment practices in both the public and private sectors. Aggrieved groups began to marshall the law in order to overcome the past effects of employment practices. Tests and test usage became the key issues in the development of these cases. Suddenly, terms which had been sacrosanct and strictly within the domain of psychology were being defined by opinions of judges in court cases. At the beginning of a case, a judge might have believed that the validity of a test depended on the presence of a stamp, but in the end, by the opinion he handed down, he was defining construct, content, or criterion-related validity based on the construction of the test.

Governmental guidelines were drawn up by both the OFCC and the EEOC to apply to all instances of test usage in employment. Although these Guidelines cited the APA Standards, they clearly stood on their own merits. In 1972, the Civil Rights Act was amended to give regulatory powers to the EEOC. This amendment also established the Equal Employment Opportunity Coordinating Council (EEOCC), which is empowered with "the responsibility for developing and implementing agreements, policies and practices designed to maximize effort, promote efficiency, and eliminate conflict, competition, duplication and inconsistency among the operations, functions and jurisdictions of the various departments, agencies and branches of the Federal government responsible for the implementation and enforcement of equal employment opportunity legislation, orders, and policies." Development of the "Uniform Guidelines on Employee Selection Procedures" is the first significant cooperative effort of the Council, composed of representatives from member agencies--Civil Rights Commission, Civil Service Commission, Department of Justice, EEOC and the Department of Labor. The 1972 amendment to the Civil Rights Act, in addition to creating the Council, puts local and state governments under the jurisdiction of the EEOC and spells out nondiscrimination requirements for the federal government. This means that as many as an additional 18 million employees are brought under protection of this Act and will be affected by implementation of these Guidelines. The Guidelines will also have a profound effect on test development and usage.

It is refreshing that the government is being pro-active in seeking the counsel of the psychological profession prior to the adoption of these Guidelines as part of official policy. I urge members of APA to review this important document and to make their views publicly known. You can receive a copy of the Uniform Guidelines on Employee Selection Procedures by writing directly to the Office of Scientific Affairs, APA, 1200 17th Street, N.W., Washington, D.C. 20036. You should then make your comments directly to one of the member agencies of the Equal Employment Opportunity Coordinating Council.

The Council will hold an open meeting for psychologists to discuss the proposed Uniform Guidelines on Employee Selection Procedures in Washington, D.C. at the U.S. Civil Service Commission, Room 1304, 1900 E St. N.W., beginning at 9 AM on November 15, 1973. Anyone interested in making a public statement on the guidelines should contact Mr. David Rose, Employment Section, Civil Rights Division, U.S. Department of Justice, 550 11th St. N.W., Room 1138, Washington, D.C. -- Leona Tyler
"APA ^tonito^" ?, 1973
Civil Service, EEOC
spar over test validity
Two key federal agencies ex-
pressed diametrically opposing
views on the validation of em-
ployment tests during an APA-
sponsored open hearing on the
revised Standards for Develop*
ment and Use of Educational
and Psychological Tests held in
Washington last March.
While praising the Standards
in general. Dr. John S. How-
land, director of tne U.S. Civil
Service Commission's Personnel
Research and Development
Center* took exception with
what he interpreted a.^ an im-
plied endorsement of criterion-
related validity as "the preferred
"Very real constrictions on the
_Tactical usefulness of classic
criterion-related validity make it
less and less attractive in em-
ployment situations" said How-
land. . . Construct validity
deserves equal time and atten-
tion and in the long run may be
the most useful stra^?gy for de-
veloping generality for tests."
^Criterion-related validity stud-
ies are usually not appropri-
ate in our employment situations
were persons are hired from the
top of a list dowr\ in descending
score order. We ;nust rely on a
system in whic!i relevant job
loiowledfi^es, skills, and abilities,
identified through careful job
analysis, are used to identify
content and construct domains
appropriate for assessing job ap-
Dr.. William H Enneis. chief
of Research Studies for the
Equal Employment Opportunit3\
Commission said a *':najor con- ,
cem*^ of his agency "has been/
the shift away from clearly ap-'
propriate criterion-related valid^
ity investigations and the conse-
quent attempt to justify employ-
ee selection procedures under
concepts that have no definite
psychological meaning or empir-
Enneis called the Standards
treatment of criterion-related
validity "excellent," and sug-
gested that if test developers
and users heeded it they would
"in most situations, sati.sfy the
requirements of the EEOC
"Construct validity," he
added, "from an employment
viewpoint, is .extremely diflBcult
to accept because claims have
been made for it without one
shred of evidence that the con-
structs purportedly measured by
the test are actually the same as
those allegedly required on the
Issues of validity will again be high on the agenda when the Joint Committee charged with revising the Standards meets in mid-May to consider the suggestions made by Howland, Enneis, and other concerned parties who appeared at the Washington hearings and at similar meetings in San Francisco and New York. The final document is due to be published this summer.
At the February New Orleans meeting sponsored by the American Educational Research Association and the National Council on Measurement in Education (the two organizations collaborating with APA in the revision effort), some educators recommended deletion of Section L of the Standards, which deals with program evaluation.

They argued that program evaluation involves more than just testing and could not be treated properly within the limited time available before publication of the Standards. Others advocated a separate document on program evaluation.
VALIDATION OF THE STATE BOARD TEST POOL
EXAMINATION FOR RN LICENSURE:
Working Paper prepared for the
National League for Nursing
in conjunction with the
SBTPE Research Steering Committee
for the RN Examination
By Kathryn A. Hecht, Ed.D.
National League for Nursing
10 Columbus Circle
New York, New York
Entries marked with an asterisk (*) are cited in this paper.
Anastasi, A. "Technical Critique." In An Investigation of Sources of Bias in the Prediction of Job Performance: A Six Year Study, Proceedings of Invitational Conference. Princeton: Educational Testing Service, 1972.
Andrews, E. "Certification." In Houston, W. R., and Howsam (Eds.), Competency-Based Teacher Education. Chicago: Science Research Associates, 1972.
Bailey, J. T., McDonald, F. J., and Klaus, K. E. An Experiment in Nursing Curriculums at a University. San Francisco: San Francisco Medical Center, University of California.
Baldwin, J. P., Mowbray, J. K., and Raymond, T. G. "Factors Influencing Performance on State Board Test Pool Examinations." Nursing Research, March-April 1968, Vol. 17, pp. 170-2.
Bircher, A. U. "Nursing Licensure and Nursing Turf." Supervisor Nurse, May 1972, Vol. 3:5, pp. 60-71.
Boyd, J. L., Jr., and Shimberg, B. Handbook on Performance Testing. Princeton: Educational Testing Service, 1971.
Brandt, E. M., Hastie, B., and Schumann, D. "Predicting Success on State Board Examinations." Nursing Research, Winter 1966, Vol. 15:1, pp. 62-8.
Brandt, J. "Licensure, Pros, Cons and Proposals." Chart, May 1972, pp. 74-7.
Bray, D. W., and Campbell, R. J. "Selection of Salesmen by Means of an Assessment Center." Journal of Applied Psychology, 1968, Vol. 52:1, pp. 36-41.
Byham, W. "Assessment Centers for Spotting Future Managers." Harvard Business Review, July-August 1970, Vol. 48:4, pp. 150-67.
Cronbach, L. J. Essentials of Psychological Testing (Third Edition). New York: Harper and Row.
Cronbach, L. J. "Test Validation." In Thorndike, R. L. (Ed.), Educational Measurement. Washington, D.C.: American Council on Education, 1971.
Educational Testing Service. Bulletin of Information: General Automobile Mechanic. Princeton: Educational Testing Service.
Educational Testing Service. An Investigation of Sources of Bias in the Prediction of Job Performance: A Six Year Study. Proceedings of Invitational Conference. Princeton: Educational Testing Service, 1972.
Fine, J. L., Malfetti, J. L., and Shoben, E. J. The Development of a Criterion for Driver Behavior. New York: Columbia University Press, 1965.
* Fitzpatrick, R., and Morrison, E. J. "Performance and Product Evaluation." In Thorndike, R. L. (Ed.), Educational Measurement (2nd edition). Washington, D.C.: American Council on Education, 1971.
Fivars, G., and Gosnell, D. Nursing Evaluation: The Problem and the Process; The Critical Incident Technique. New York: Macmillan Co., 1964.
Flanagan, J. C. "The Critical Incident Technique." Psychological Bulletin, 1954, Vol. 51.
Gorham, W. A. "Methods for Measuring Staff Nursing Performance." Nursing Research, Winter 1963, Vol. 12:1, pp. 4-11.
Gorham, W. A. "Staff Nursing Behaviors Contributing to Patient Care and Improvement." Nursing Research, Spring 1962, Vol. 2:2, pp. 68-77.
* Gulliksen, H. "Intrinsic Validity." American Psychologist, 1950, Vol. 5, pp. 511-17.
* Hemphill, J. K. A Summary of the Engineering Study. Princeton: Educational Testing Service.
Highland, R. W. A Guide for Use in Performance Testing in Air Force Technical Schools. Colorado: Lowry Air Force Base, 1955. (see Cronbach, 1971)
* Hoffmann, B. The Tyranny of Testing. New York: Crowell-Collier, 1962.
Hubbard, J. P. Measuring Medical Education. Philadelphia: Lea and Febiger, 1971.
* Hubbard, J. P. "Programmed Testing in the Examinations of the National Board of Medical Examiners." In 1963 Invitational Conference on Testing Problems. Princeton: Educational Testing Service, 1964.
* Jensen, O. "Some Thoughts on Minimum Competency Interpretations of Performance on (M-C Item) Achievement Tests." (mimeo). Educational Testing Service, March 1972.
Journal of the American Medical Association (JAMA). "Medical Licensure Statistics, 1970." JAMA, June 14, 1971, Vol. 216, p. 11.
Klaus, D. J., Gosnell, D. E., Chowla, M. C., and Reilly, P. C. Controlling Experience to Improve Nursing Proficiency: Vol. III, Determining Proficient Performance. Pittsburgh: American Institutes for Research, 1968.
Klaus, D. J., Gosnell, D. E., Jacobs, A. M., Reilly, P. C., and Taylor, J. A. Controlling Experience to Improve Nursing Proficiency: Vol. I, Background and Study Plan. Pittsburgh: American Institutes for Research, 1966.
Ledbetter, P. J. "An Analysis of the Performance of Graduates of a Selected Baccalaureate Program in Nursing with Regard to Selected Standardized Examinations." Dissertation Abstracts. University Microfilms: Ann Arbor. Vol. 29, p. 3331-A, 1969.
Lewy, A., and McGuire, C. "A Study of Alternative Approaches in Estimating the Reliability of Unconventional Tests." (presented at A.E.R.A., 1966) Center for the Study of Medical Education. Chicago: University of Illinois College of Medicine, 1966.
Litwack, L., Sakata, R., and Wykle, M. Counseling Evaluation and Student Development. Philadelphia: W. B. Saunders Co., 1972.
Magnusson, D. Test Theory. Reading, Massachusetts: Addison-Wesley Publishing Co.
* Maslow, A. P. "Licensing Tests - Occupational Bridge or Barrier?" Proceedings, Third Annual Conference. Council on Occupational Licensing, July 1971. (see Shimberg, 1972)
* McDonald, F. J. "Evaluation of Teaching Behavior." In Houston, W. R., and Howsam (Eds.), Competency-Based Teacher Education. Chicago: Science Research Associates, 1972.
McGuire, C. H. "An Evaluation Model for Professional Education: Medical Education." 1967 Invitational Conference on Testing Problems. Princeton: Educational Testing Service.
McGuire, C. H. Clinical Simulations. New York: Appleton-Century-Crofts, 1972.
* McGuire, C. H., and Babbott, D. "Simulation Techniques in the Measurement of Problem-Solving Skills." Journal of Educational Measurement, Spring 1967, Vol. 4:1.
Mueller, E. J., and Lyman, H. B. "The Prediction of Scores on the State Board Test Pool Examination." Nursing Research, May-June 1969, Vol. 18:3, pp. 263-7.
National League for Nursing. "Analysis of State Board Test Scores 1957." Nursing Outlook, 1959, Vol. 7:5.
National League for Nursing. (1963) In Katzell, M. "Comments on Moll's Report." NLN.
National League for Nursing. "Safety and Effectiveness of Practice." Nursing Outlook ("Let's Examine" series), October 1962, Vol. 10.
National League for Nursing. "Standardization, Scoring and Use of Scores on the SBTPE." SB162-28, pp. 7-11.
* National League for Nursing. "Where Those Tests Come From and What They're For." Nursing Outlook ("Let's Examine" series), April 1961, Vol. 9.
Nealy, S. M., and Owen, T. W. "A Multitrait-Multimethod Analysis of Predictors and Criteria of Nursing Performance." Organizational Behavior and Human Performance, 1970, Vol. 5, pp. 348-65.
Polef, S., and Stewart, C. P. "Development of a Rating Scale." Studies in Personnel Psychology, October 1971, Vol. 3:2, pp. 7-20.
Popham, W. J. "Performance Tests of Teaching Proficiency." American Educational Research Journal, January 1971, Vol. 8:1, pp. 105-15.
Popham, W. J., and Husek, T. R. "Implications of Criterion-Referenced Measurement." Journal of Educational Measurement, Spring 1969, Vol. 6:1, pp. 1-9.
Quirk, T. J. "Performance Tests for the Beginning Teachers: Why All the Fuss?" (Invited address, N.J. Performance Evaluation Project, 1971). Princeton: Educational Testing Service.
* Quirk, T. J., Witten, B. J., and Weinberg, S. F. A Critical Review of Research Related to the National Teacher Examinations. Princeton: Educational Testing Service, 1972.
Rosen, A., and Abraham, G. E. "Evaluation of a Procedure for Assessing the Performance of Staff Nurses." Nursing Research, February 1963, Vol. 14, pp. 78-82.
* Ryans, D. G., and Frederiksen, N. "Performance Tests of Educational Achievement." In Lindquist, E. F. (Ed.), Educational Measurement. Washington, D.C.: American Council on Education, 1951.
Schmidt, S. "Special Background Presentation." A presentation at the April 1972 meeting of the SBTPE Research Steering Committee for the RN Examination. (NLN) July 1972.
Schorr, T. M. "A Critical Balance." American Journal of Nursing, November 1971, Vol. 71:11, p. 2127.
Shields, M. R. "A Project for Curriculum Improvement." Nursing Research, October 1952, Vol. 1:2, pp. 4-13.
* Shimberg, B., Esser, B. P., and Kruger, D. H. Occupational Licensing and Public Policy. Princeton: Educational Testing Service, October 1972.
Slater, D. "The Slater Nursing Competencies Rating Scale." Detroit: College of Nursing, Wayne State University, 1967.
Smith, P. C., and Kendall, L. M. "Retranslation of Expectations: An Approach to the Construction of Unambiguous Anchors for Rating Scales." Journal of Applied Psychology, 1963, Vol. 47:2, pp. 149-55.
* Stuit, D. B. (Ed.) Personnel Research and Test Development in the U.S. Bureau of Naval Personnel. Project N-106 and the College Entrance Examination Board. Princeton: Princeton University Press, 1947. (see Boyd, 1972)
* Taylor, C. W., Nahm, H., Loy, L., Harms, M., Berthold, J., and Wolfer, J. Selection and Recruitment of Nurses and Nursing Students: A Review of Research Studies and Practices. Salt Lake City: University of Utah Press (rev.), 1966.
Taylor, C. W., Nahm, H., Quinn, M., Harms, M., Mulaik, J., and Mulaik, S. Report of Measurement and Prediction of Nursing Performance. Part I. Factor Analysis of Nursing Students' Application Data, Entrance Test Scores, Achievement Test Scores and Grades in Nursing School. Salt Lake City: University of Utah, 1965.
Tiffin, J., and McCormick, E. J. Industrial Psychology (Fifth Edition). Englewood Cliffs: Prentice-Hall.
* U.S. Government. Guidelines on Employee Selection Procedures. Title 29, Labor, Chapter XIV, Equal Employment Opportunity Commission. Federal Register, August 1970, Vol. 35.
Wandelt, M. A., and Ager, J. Quality Patient Care Scale. Detroit: College of Nursing, Wayne State University, 1970.
Wrigley, M. H. "A Study of the Relationship Between the Scores of Practical Nurses on the Licensing Examination and Ratings on Their Performance on the Job." Dissertation Abstracts. University Microfilms: Ann Arbor. Vol. 30, p. 1763-A, 1969.