
ED 110 481

TM 004 754

Hecht, Kathryn A.
Overview of Problems Involved in Validating
Professional Licensing and Certification Examinations.
[Apr 74]

22p.; Paper presented at the Annual Meeting of the
National Council on Measurement in Education
(Chicago, Illinois, April 16-18, 1974)

MF-$0.76 HC-$1.58 PLUS POSTAGE
*Certification; *Nurses; Performance Tests;
Predictive Validity; Professional Occupations;
Selection; State Licensing Boards; Testing; Testing
Problems; *Test Validity


A large amount of professional interest has been focused upon the ambiguities and problems involved in the conduct of professional licensing and certification through examinations. What seems to be a simple problem on the surface, that being the policing of professionals for competence and the practice of conducting this policing so that it offers equal fairness to all, turns out to be a very complex problem involving unresolved conceptual, legal, and methodological issues particularly with examination validity. There are four main areas of concern: (1) criticism of testing, (2) the growing number of jobs requiring licensing, (3) discriminatory practices in hiring and occupations access, and (4) validity of certification through testing. The example used is the field of nursing. (Author/DEP)








Kathryn A. Hecht, Ed.D.

Center for Northern Educational Research
University of Alaska
Fairbanks, Alaska 99701



Presented at the 1974 Annual Meeting of the
National Council on Measurement in Education
Symposium (Session 7A)
Validation of Professional Licensing and Certification Examinations:
A Methodological Dilemma
Chicago, 1974




How does an evaluator from Alaska come to be addressing you today on validation of licensing exams? Last year, while working as an independent consultant, I was asked by the National League for Nursing to do a background paper on the validation of the RN (Registered Nurse) licensing examination and related work on performance testing. Naively, I thought it would be a simple task of pulling together what had been done in other professions. It turned out to be a much more complex and interesting task than I had expected, and questions and concerns raised during that study led directly to our meeting together today. (One of our participants, Paul Jacobs, is now validation study director for the NLN and he will tell you more about the specifics of that effort.)

Definitions of Licensure and Certification 

First, as part of an overview, there is the simple matter of defining licensure and certification . . . only it is not so simple. There is no standard definition nor usage of the terms. For the purposes of our discussion I think the most useful definitions are those proposed by O. Jensen (1972). In an unpublished paper, he discusses licensure and certification as two types of minimum competency testing, in that the purpose of the tests is to establish an individual's status with respect to an established go/no-go criterion. Licensing is usually a mandatory program designed to protect the public from incompetent practitioners, that is, to prevent an individual with particular deficiencies from entering practice. Jensen calls this "selecting-out". Certification, on the other hand, is usually a voluntary program with the emphasis on granting special status to an individual with more than run-of-the-mill knowledge, ability, and skill. This Jensen calls "selecting-in".


Perhaps the best known example of a "selecting-out" exam would be a driving license, where the public is protected from those whose driving knowledge is judged not to be up to standard. Another example is the RN exam, whose espoused purpose is to "measure minimum safety and effectiveness of practice, for the protection of the public" (N.L.N. 1961). Both licenses represent a legal right to engage in the appropriate activity.

Examples of "selecting-in" or certification are the "diplomate" program for medical specialties and the new certification program for automobile mechanics. They are both exams designed for experienced practitioners which provide evidence of superior capability in a specialty within the occupation.

Since validation deals with the purpose for which the test is intended, I believe these interpretations and distinctions to be important for our discussion. It should be obvious that the same test could not serve both licensure and certification purposes as defined here.

Unfortunately, neither this distinction nor any other I can locate fits current usages of the terms. For example, teacher certification I believe to be a misnomer. It is a legal requirement to begin teaching, to protect the public from incompetence, and signifies no special standing within the profession. I am sure you can think of other cases which do not fit the given definition.

Why the Concern? 

Next, a brief look at why there is growing concern about licensing at this time. There are four concerns I will outline briefly. (Several of the participants and discussants are especially well qualified to discuss them.)

The first is the criticism of testing in general, which in the past decade has become a popular cause making frequent headlines and even best sellers (Hoffmann, B., 1962).

Second, there has been a proliferation of jobs requiring licensing and a hodgepodge of local and state legislated bodies emerging to control the process. Benjamin Shimberg (one of our discussants) and others (1972) have written a report entitled Occupational Licensing and Public Policy, which raises these issues. It is the only up-to-date and comprehensive document I was able to locate, and it provided an excellent overview in itself of licensing practices in various occupations and their dubious quality.

Third, the civil rights movement has continued to make inroads against discrimination, specifically here concerned with discriminatory practices in hiring and occupations access. The Equal Employment Opportunity Commission Guidelines, 1970, focus attention on test validation in employment situations, and there is reason to believe from various recent court decisions (such as Griggs v. Duke Power Company, 401 U.S. 424, 1971) that the federal guidelines could be applied to licensing situations. The guidelines require that evidence of a test's validity:

". . . should consist of empirical data demonstrating that
the test is predictive of or significantly correlated
with important elements of work behavior which comprise
or are relevant to the jobs for which candidates are being
evaluated. Empirical evidence in support of a test's
validity must be based on studies employing generally
accepted procedures, such as those described in Standards
for Educational and Psychological Tests and Manuals, published
by the American Psychological Association. However, evidence
for content or construct validity should be accompanied by
sufficient information from job analysis to demonstrate
the relevance of the content or construct."

The November "APA Monitor" clipping I included describes some recent extensions of the guidelines to local and state governments. (Our next speaker, Thomas Goolsby, Jr., will bring us up to date and discuss the legal questions further.)



Fourth are the challenges being made to many professions to obtain status through alternatives to the traditional curriculum/school-based training routes. This becomes a question of who qualifies to take a licensure exam. Are such exams really to protect the public, or to limit access by those who have already made it? If exams are not proven valid in terms of job needs, and as they are in most cases controlled by the professions themselves, then this is a meaningful issue for those who seek entry through alternative routes. For example, in reported cases, returning army medics who sought to take the RN licensure exam were denied on the grounds of not having graduated from nursing school.

It can be said that licensing is going through a period of questioning. For a number of reasons, including questions of federal legality, licensing agencies are apt soon to be challenged to prove their tests are valid predictions of job performance, significantly measuring job-related skills. It seems unlikely that any less will be acceptable.

Availability of Information 

Despite a growing concern for licensure, and validation in particular, there is a surprising lack of information and research on the topic. This is especially true in attempting to relate licensure to job performance. The information I was able to locate on licensing and related performance testing was scanty, often in progress, and done in subject matter areas rather than considered collectively as a methodological problem. In many cases, material was not available through generally accessible professional media, and in some cases, professions considered such information confidential.

(This lack of information encouraged me to include with this paper the complete bibliography from my NLN study, hoping to save you the considerable trouble I went through in collecting sources.)


Maslow (1971) (one of our discussants), who was at the time with the Civil Service Commission Research Center, advised the Council on Occupation:

"I am convinced that we need to sharpen our ability to develop and demonstrate the rational relationship between the job requirements and the measurement system used to certify or qualify people for an occupation. A number of techniques are available to improve the process of job analysis to get a much more exact fix on the critical requirements for the work to be done. I would urge, therefore, that especially in examinations for occupational knowledge and proficiency, you insist, at the very least, on a clear-cut showing of how one proceeds from the decision as to the skills and abilities required for effective performance to the decisions that certain tests or other measures will insure that the applicant can adequately perform in that occupation."

Validation Studies: The Problem

How have licensure validation studies been done? How should they be done? What do the studies available tell us? (This audience need not be reminded of the four generally accepted types of validity.)

Validation studies of licensure exams are rare. Seldom is the test development process that sophisticated or comprehensive. Many occupational groups call in teachers of their trade and/or practitioners at some point in the test development process. At worst, it is a rubber stamp operation. At best, it can approach a content validation methodology, but the quality of the process is limited by the adequacy of the universe specifications, or how well the content from which the sample or test is drawn is defined and described. A second limiting factor has to do with how systematically the comments are requested, recorded and used. Such exercises are seldom reported except to say that they exist.
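As a small illustration of the universe-specification limit just described, a content-validation exercise can be reduced to checking how closely a drawn test form matches a stated content blueprint. The sketch below is hypothetical: the content areas, weights, and item counts are invented for illustration and are not taken from any actual licensure exam.

```python
# Hypothetical sketch of a content-blueprint check. The universe
# specification is expressed as target proportions of items per content
# area; the adequacy of a drawn test form is judged by how closely its
# item counts match those targets. All names and numbers are invented.

blueprint = {          # target share of items per content area
    "medical nursing": 0.40,
    "surgical nursing": 0.30,
    "pharmacology": 0.20,
    "ethics and law": 0.10,
}

def blueprint_deviation(blueprint, item_counts):
    """Largest absolute gap between target and actual item proportions."""
    total = sum(item_counts.values())
    return max(
        abs(share - item_counts.get(area, 0) / total)
        for area, share in blueprint.items()
    )

# An illustrative drawn 100-item form:
form = {"medical nursing": 45, "surgical nursing": 30,
        "pharmacology": 15, "ethics and law": 10}

gap = blueprint_deviation(blueprint, form)
```

Of course, such a check is only as good as the blueprint itself, which is exactly the limitation noted above: if the universe specification is vague, agreement with it demonstrates little.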



In my opinion, predictive criterion-related validation studies are the type most closely fitting the expressed purposes of licensure exams, that of assuring minimal competency on the job for the protection of the public. (The second "APA Monitor" clipping I have attached speaks to some professional disagreement on this matter.) Concern is with a criterion not yet obtainable at the time of testing, and one wishes to predict an individual's outcome prior to that situation occurring. They are 'selecting-out' tests, as licensure was previously defined. Clearly, this suggests a research problem in itself, as those who fail are kept from practice and usually are not considered part of a validity study, as they are not practicing and available for observation in that job.
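To make the predictive design concrete, here is a minimal sketch (not from the paper; all scores and names are invented) of the basic computation such a study rests on: correlating licensure exam scores with a later job-performance criterion for those licensees who could be followed up.

```python
# Hypothetical sketch of a predictive criterion-related validity check:
# correlate licensure exam scores with later job-performance ratings.
# All names and data are illustrative, not from the paper.

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two score lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Illustrative data: exam scores and supervisor ratings for licensees only.
# Failing candidates never practice, so they are absent from the criterion
# data -- the restriction-of-range problem noted in the text.
exam_scores = [520, 480, 610, 550, 590, 470, 630]
job_ratings = [3.1, 2.8, 4.0, 3.4, 3.9, 2.9, 4.2]

validity_coefficient = pearson_r(exam_scores, job_ratings)
```

Because the failing candidates never appear in the criterion data, a coefficient computed this way understates the relationship in the full candidate group, which is one reason the design is so hard to carry out well.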

However, the major problem in predictive studies is finding appropriate job-related criteria. As Anastasi (1972) said:

"Insofar as predictors are evaluated on the basis of their
criterion measures, a validation study can be no better than the
quality of its criterion data. Yet, in real-life situations, good
criterion data are hard to come by."

Shimberg and others (1972) cite a similar, added logistical problem in regard to validation of licensure tests:

"Individuals are licensed by a board, but once licensed they
work for different employers--possibly in widely scattered locations.
Any board that seeks to validate its tests by following up on the
performance of each licensee faces a formidable task."

I think it can fairly be said that validation studies of the predictive type demanding job-related criteria are difficult to develop, time-consuming, impractical and expensive to perform. Psychometric methodology offers little guidance for such validation studies. The area of licensure in particular lacks the "classic" studies familiar to those schooled in psychological testing. Once this is comprehended, the fact that such validations are rare, almost non-existent, is less surprising but nevertheless disconcerting.




An Example of Validity Evidence: RN Licensure

Nursing was selected as the occupational example because it is the field with which I am most familiar, and because, based on many of the findings in the Occupational Licensing and Public Policy report referred to above, the exam for RN licensure would rate high in comparison with other licensure exams reported upon. It is developed according to accepted test procedures, given under carefully controlled conditions, scored objectively, and serves all states. To illustrate by comparison, some occupations build tests upon available textbook questions (barbering) or make choices from a local file of essay questions (merchant marines). Most local or state exams have no reciprocity arrangements.

The RN licensure exam has never been directly validated, though rather typical content checks by nurse educators are routinely done. However, two types of studies are available which used the licensure exam as the criterion data: those that use the exam scores as a criterion variable in validity studies of other nursing tests, and studies which attempted to predict RN licensure scores directly. It is easier to use success on the licensure exam than to determine what constitutes success on the job or build an instrument to cover a multitude of job situations. For this reason, the NLN uses the licensure exam to validate the predictive use of their pre-nursing exam. A high degree of relationship is found between the two. The RN licensure exam also correlates highly with the NLN achievement tests. However, a number of smaller studies, less definite but fairly consistent, found that though theory grades were good predictors of licensure scores, clinical course grades were not; and correlations between theory and clinical course grades were lower than expected.

One can say with some confidence, then, that the RN licensure exams are highly related to academic achievement. But are such achievement measures necessarily related to the minimum competency required for the practice of beginning nursing? Obviously there is a necessary cognitive knowledge component to any job, but is it sufficient? "Is it possible, for instance," as one researcher asks (Taylor, and others, 1966), "that students who do better in clinical practice courses than in more traditional academic classes will be more successful in actual work situations?" If in this or other fields licensure exams are more related to academic success than job performance, such findings will not only require changes in the licensure exams but more far-reaching questioning of the curriculum and of the underlying occupational structure.

Testing Research and Job Performance

What does testing research suggest concerning the predictive validity of paper and pencil tests which are known to be highly related to success in school, school curriculum, or academic grades? World War II Naval research is commonly credited as the point at which it became recognized that paper and pencil tests, though highly correlated with final course grades, were not efficient predictors of job performance.

"Although it had been assumed that written tests sufficed
to indicate what a man had learned in a service school, the
evidence showed that performance tests and improved shop grades
were not closely correlated with written test grades. During
tryout in Gunners' Mates School, performance tests correlated
from .14 to .35 with written tests and only slightly higher
with final grades which were based largely on written tests."
(Stuit, 1947)

These same written tests were also found to correlate well with reading tests (Gulliksen, 1950). Efforts were made following these findings to introduce more practical work and performance testing to the training.

This lack of relation between achievement as measured by traditional paper-and-pencil tests and performance measures, which appears in studies as diverse as education (Quirk and others, 1972) and engineering (Hemphill, 1963), suggests the great importance of test validation for licensure and certification. Although much lip service is given to the concept, it is seldom performed in an acceptable manner.




Ryans and Frederiksen (1951) sum up this point from a measurement viewpoint:

"From the standpoint of validity one of the most serious
errors committed in the field of human measurement has been that
which assumes the high correlation of knowledge of facts and
principles on the one hand and performance on the other. Nevertheless,
examinations for admission to the bar, for medical practice, for
teaching . . . are predominantly verbal tests of fact and principle
in the respective fields."

If training and knowledge variables are not necessarily sufficient to define job proficiency, where does one look?

Performance Testing: Examples and Development

If one accepts Fitzpatrick's and Morrison's (1971) definition of performance testing as a test which is relatively realistic, then it is logical to look here for the answer to our questions of (1) how to validate licensure exams more effectively and (2) how to revise licensure tests if necessary.

The most interesting and well documented use I found concerning performance measures in predictive validation research was in the area of employee selection and promotion. Besides the monetary incentive for making a correct decision, an employer's situation has numerous advantages over licensure boards, such as control over subjects, the limited range of jobs and job descriptive information, and the possibility of gradually implementing a testing program, allowing research time to study predictions without actually implementing them.

Assessment centers are a performance-based type of employment or promotion screening device. The technique was originally devised to select secret service agents during World War II and applied in industrial situations by AT&T in the fifties. The procedure (Byham, 1970):

". . . simulates 'live' the basic situations with which a manager
would be faced if he were moved up and develops information about
how well he will cope at the higher level before the decision to
promote him is actually made."




The assessors at the centers are trained observers, the exercises are standardized, and the conditions are constant and relatively realistic. This allows more valid comparative judgments to be made than in the 'real world'.

Two kinds of validity studies have been done. In an experimental setting, reports of the assessment are not released to management; thus no decisions are made on the basis of the assessment. The predictions are then compared with actual performance by some rating and/or observation technique, and other indicators of job success. If reports are released, which is more common but less conducive to sound validation, studies are then based on comparing those promoted before assessment center results were available to those promoted with this information, or by simply comparing progress of candidates promoted using assessment center reports and subsequent performance. According to Byham, all validation methods have tentatively pointed to the same conclusion:

"The assessment center technique has shown itself a better
indicator of future success than any other tool management has
yet devised."

A more descriptive example of how one such center works, and of the validation process, was given by Bray and Campbell of AT&T (1968). Though the assessment center concept could be used as a validation tool, as an ongoing technique for licensure examinations it is obviously unrealistic.

To illustrate a more practical approach to introducing reality into the testing situation, the medical profession has developed two types of programmed testing of clinical competence to simulate performance on objectively scored paper and pencil tests. The National Board of Medical Examiners first introduced the concept (Hubbard, 1964) and now uses programmed testing for the medical licensing exam Part III on clinical competence, which previously was a practical bedside type of oral exam. This is a linear model, while certification specialty exams use a branching model (McGuire and Babbott, 1967). In both, the examinee is confronted by a realistic clinical situation and proceeds through a series of decision choices, each step accompanied by an increment of information upon which the next depends, similar to programmed teaching. In the branching model the difference is that decision choices change based upon previous choices, allowing more than one route to a solution.
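The branching model just described can be sketched as a small data structure in which the options presented at each step depend on the choices already made, so more than one route can reach a solution. The clinical scenario, option names, and routes below are invented for illustration and are not from the NBME or specialty exams.

```python
# Minimal sketch of a branching programmed test: each node offers choices,
# and each choice determines which node (and which options) comes next.
# The scenario and choices are invented for illustration.

# Each node maps to (prompt, {choice: next_node_key}).
branching_exam = {
    "start": ("Patient presents with chest pain. First step?",
              {"take history": "history", "order ECG": "ecg"}),
    "history": ("History suggests cardiac origin. Next?",
                {"order ECG": "ecg"}),
    "ecg": ("ECG shows ST elevation. Next?",
            {"begin treatment": "end"}),
    "end": ("Case complete.", {}),
}

def run_exam(exam, choices):
    """Walk the exam with a fixed list of choices; return nodes visited."""
    node, path = "start", ["start"]
    for choice in choices:
        prompt, options = exam[node]
        node = options[choice]          # next node depends on prior choice
        path.append(node)
    return path

# Two different routes reach the same end point:
route_a = run_exam(branching_exam, ["take history", "order ECG", "begin treatment"])
route_b = run_exam(branching_exam, ["order ECG", "begin treatment"])
```

A linear model, by contrast, would give every examinee the same fixed sequence of nodes regardless of the choices made.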

Neither variation has been validated in relation to predicting on-the-job performance, but some work is in progress. The Part III or clinical competence exam is said to derive its validity from, among other things, measuring something different from Parts I and II, which are related to medical school course work. Cronbach (1970), having reviewed the validity evidence on Part III, notes: "Follow-up studies are needed to be sure that the test measures a skill of medical practice and not just ingenuity in test taking."

(Other examples of performance tests can be found in The Handbook of Performance Testing by Boyd and Shimberg (1971), although most are of a mechanical/technical variety.)

Similar to the problems confronted by those attempting predictive validation of licensure tests, performance test development logically begins with a study of the specific skills and abilities involved in the activities the test is designed to measure or predict. The next step is the choice of representative tasks, which strongly influences the validity of the performance test(s). Other difficulties with performance testing come from a lack of applied methodology, in that performance tests are by nature criterion-referenced, and procedures for estimating reliability and validity are meager.*

* Licensure and certification exams have been discussed as types of minimal competency exams and, like performance measures, would normally be considered criterion-referenced. The examinee is theoretically tested in terms of an absolute criterion, and comparisons among test takers are not a licensing purpose. However, most licensing tests are developed on norm-referenced models and purposes. (I hope Robert Frary will bring this point into his discussion of methodology.)

Most literature on performance tests discusses them as a new form of assessment used to increase the realism of the test. The primary interest in performance tests expressed here is less commonly discussed, that of providing the criteria for predictive validation. The only description of such a research use I found was a theoretical discussion on "Providing a Criterion Measure" by Ryans and Frederiksen, 22 years ago (1951):

"When the behavior involved in a situation is broad enough and representative enough of the situation as a whole, the performance is itself the criterion behavior for that situation. Consequently, performance test data, particularly when they refer to work samples, provide a more satisfactory measure of criterion behavior than is usually available. Because performance tests serve as a measure of the criterion, they may be of use in several important ways.

"Performance test data may provide, first of all, a criterion for research. Information yielded by performance tests makes possible the validation of other measures which, although of a more indirect nature, may be more economical in administration. In many situations it is difficult and expensive to administer performance tests to large numbers of examinees. Such situations demand the construction of psychometric instruments that will yield measurements related to the criterion and will also be practicable. In the construction of aptitude tests for various skills and operations, performance tests may provide the criterion against which the available second order test can be judged."

For methodologists, the validation of licensing and certification exams presents real and immediate challenges. Here are practical problems, based on real and current concerns. If each occupation continues struggling on its own, without serious attempts from a group such as we have here today to provide an integrated conceptual and methodological framework, solutions will remain a long way off.


"APA Monitor," November 1973

Testing and Equal Employment

With the establishment of the Equal Employment Opportunity Commission (EEOC), and later the Office of Federal Contract Compliance (OFCC), there were powerful forces examining some of the discriminatory employment practices in both the public and private sectors. Aggrieved groups began to marshal the law in order to overcome the past effects of employment practices. Tests and test usage became the key issues in the development of these cases. Suddenly, terms which had been sacrosanct and strictly within the domain of psychology were being defined by opinions of judges in court cases. At the beginning of a case, a judge might have believed that the validity of a test depended on the presence of a stamp, but in the end, by the opinion he handed down, he was defining construct, content, or criterion-related validity based on the construction of the test.

Governmental guidelines were drawn up by both the OFCC and the EEOC to apply to all instances of test usage in employment. Although these Guidelines cited the APA Standards, they clearly stood on their own merits. In 1972, the Civil Rights Act was amended to give regulatory powers to the EEOC. This amendment also established the Equal Employment Opportunity Coordinating Council (EEOCC), which is empowered with "the responsibility for developing and implementing agreements, policies and practices designed to maximize effort, promote efficiency, and eliminate conflict, competition, duplication and inconsistency among the operations, functions and jurisdictions of the various departments, agencies and branches of the Federal government responsible for the implementation and enforcement of equal employment opportunity legislation, orders, and policies." Development of the "Uniform Guidelines on Employee Selection Procedures" is the first significant cooperative effort of the Council, composed of representatives from member agencies--Civil Rights Commission, Civil Service Commission, Department of Justice, EEOC and the Department of Labor. The 1972 amendment to the Civil Rights Act, in addition to creating the Council, puts local and state governments under the jurisdiction of the EEOC and spells out nondiscrimination requirements for the federal government. This means that as many as an additional 18 million employees are brought under protection of this Act and will be affected by implementation of these Guidelines. The Guidelines will also have a profound effect on test development and usage.

It is refreshing that the government is being pro-active in seeking the counsel of the psychological profession prior to the adoption of these Guidelines as part of official policy. I urge members of APA to review this important document and to make their views publicly known. You can receive a copy of the Uniform Guidelines on Employee Selection Procedures by writing directly to the Office of Scientific Affairs, APA, 1200 17th Street, N.W., Washington, D.C. 20036. You should then make your comments directly to one of the member agencies of the Equal Employment Opportunity Council.

The Council will hold an open meeting for psychologists to discuss the proposed Uniform Guidelines in Employee Selection Procedures in Washington, D.C. at the U.S. Civil Service Commission, Room 1304, 1900 E St. N.W., beginning at 9 AM on November 15, 1973. Anyone interested in making a public statement on the guidelines should contact Mr. David Rose, Employment Section, Civil Rights Division, U.S. Department of Justice, 550 11th St. N.W., Room 1138, Washington, D.C.

Leona Tyler
APA President



"APA ^tonito^" ?, 1973 

Civil Service, EEOC 
spar over test validity 

Two key federal agencies ex- 
pressed diametrically opposing 
views on the validation of em- 
ployment tests during an APA- 
sponsored open hearing on the 
revised Standards for Develop* 
ment and Use of Educational 
and Psychological Tests held in 
Washington last March. 

While praising the Standards 
in general. Dr. John S. How- 
land, director of tne U.S. Civil 
Service Commission's Personnel 
Research and Development 
Center* took exception with 
what he interpreted a.^ an im- 
plied endorsement of criterion- 
related validity as "the preferred 

"Very real constrictions on the 
_Tactical usefulness of classic 
criterion-related validity make it 
less and less attractive in em- 
ployment situations" said How- 
land. . . Construct validity 
deserves equal time and atten- 
tion and in the long run may be 
the most useful stra^?gy for de- 
veloping generality for tests." 

^Criterion-related validity stud- 
ies are usually not appropri- 
ate in our employment situations 
were persons are hired from the 
top of a list dowr\ in descending 
score order. We ;nust rely on a 
system in whic!i relevant job 
loiowledfi^es, skills, and abilities, 
identified through careful job 
analysis, are used to identify 
content and construct domains 
appropriate for assessing job ap- 

Dr. William H. Enneis, chief of Research Studies for the Equal Employment Opportunity Commission, said a "major concern" of his agency "has been the shift away from clearly appropriate criterion-related validity investigations and the consequent attempt to justify employee selection procedures under concepts that have no definite psychological meaning or empirical legitimacy."

Enneis called the Standards' treatment of criterion-related validity "excellent," and suggested that if test developers and users heeded it they would "in most situations, satisfy the requirements of the EEOC."

"Construct validity," he 
added, "from an employment 
viewpoint, is .extremely diflBcult 
to accept because claims have 
been made for it without one 
shred of evidence that the con- 
structs purportedly measured by 
the test are actually the same as 
those allegedly required on the 

Issues of validity will again be high on the agenda when the Joint Committee charged with revising the Standards meets in mid-May to consider the suggestions made by Howland, Enneis and other concerned parties who appeared at the Washington hearings and similar meetings in San Francisco and New York.

The final document is due to be published this summer.

At the February New Orleans meeting sponsored by the American Educational Research Association and the National Council on Measurement in Education (the two organizations collaborating with APA in the revision effort), some educators recommended deletion of Section L of the Standards, which deals with program evaluation.

They argued that program evaluation involves more than just testing and could not be treated properly within the limited time available before publication of the Standards. Others advocated a separate document on program evaluation.



Working Paper prepared for the
National League for Nursing
in conjunction with the
SBTPE Research Steering Committee
for the RN Examination

By Kathryn A. Hecht, Ed.D.
February, 1975

National League for Nursing
10 Columbus Circle
New York, New York

Entries marked with an asterisk (*) are cited in this paper.



Anastasi, A. "Technical Critique." In An Investigation of Sources of Bias in the Prediction of Job Performance: A Six Year Study. Proceedings of Invitational Conference. Princeton: Educational Testing Service, 1972.

Andrews, E. "Certification." In Houston, W. R., and Howsam (Eds.), Competency-Based Teacher Education. Chicago: Science Research Associates, 1972.

Bailey, J. T., McDonald, F. J., and Klaus, K. E. An Experiment in Nursing Curriculums at a University. San Francisco: San Francisco Medical Center, University of California, 1971.

Baldwin, J. P., Mowbray, J. K., and Raymond, T. G. "Factors Influencing Performance on State Board Test Pool Examinations." Nursing Research, March-April 1968, Vol. 17, pp. 170-2.

Bircher, A. U. "Nursing Licensure and Nursing Turf." Supervisor Nurse, May 1972, Vol. 3:5, pp. 60-71.

Boyd, J. L., Jr., and Shimberg, B. Handbook on Performance Testing. Princeton: Educational Testing Service, 1971.

Brandt, E. M., Hastie, B., and Schumann, D. "Predicting Success on State Board Examinations." Nursing Research, Winter 1966, Vol. 15:1, pp. 62-8.

Brandt, J. "Licensure, Pros, Cons and Proposals." Chart, May 1972, pp. 74-7. 

Bray, D. W., and Campbell, R. J. "Selection of Salesmen by Means of an Assessment Center." Journal of Applied Psychology, 1968, Vol. 52:1, pp. 36-41.

Byham, W. "Assessment Centers for Spotting Future Managers." Harvard Business Review, July-August 1970, Vol. 48:4, pp. 150-67.

Cronbach, L. J. Essentials of Psychological Testing. (Third Edition) New York: Harper and 
Row, 1970. 

Cronbach, L. J. "Test Validation." In Thorndike, R. L. (Ed.), Educational Measurement. 
Washington, D.C.: American Council on Education, 1971. 

Educational Testing Service. Bulletin of Information: General Automobile Mechanic. Princeton: 1972.

Educational Testing Service. An Investigation of Sources of Bias in the Prediction of Job Performance: A Six Year Study. Proceedings of Invitational Conference. Princeton: Educational Testing Service, 1972.

Fine, J. L., Malfette, J. I., and Shoben, E. J. The Development of Criterion for Driver Behavior. New York: Columbia University Press, 1965.


* Fitzpatrick, R., and Morrison, E. J. "Performance and Product Evaluation." In Thorndike, R. L. (Ed.), Educational Measurement (2nd edition). Washington, D.C.: American Council on Education, 1971.

Fivars, G., and Gosnell, D. Nursing Evaluation: The Problem and the Process; The Critical Incident Technique. New York: Macmillan Co., 1964.

* Flanagan, J. C. "The Critical Incident Technique." Psychological Bulletin, 1954, Vol. 51, pp. 327-58.

Gorham, W. A. "Methods for Measuring Staff Nursing Performance." Nursing Research, Winter 1963, Vol. 12:1, pp. 4-11.

Gorham, W. A. "Staff Nursing Behaviors Contributing to Patient Care and Improvement." Nursing Research, Spring 1962, Vol. 2:2, pp. 68-77.

* Gulliksen, H. "Intrinsic Validity." American Psychologist, 1950, Vol. 5, pp. 511-17.

* Hemphill, J. K. A Summary of the Engineering Study. Princeton: Educational Testing Service, 1963.

Highland, R. W. A Guide for Use in Performance Testing in Air Force Technical Schools. Colorado: Lowry Air Force Base, 1955. (see Cronbach, 1971)


* Hoffmann, B. The Tyranny of Testing. New York: Crowell-Collier, 1962.

Hubbard, J. P. Measuring Medical Education. Philadelphia: Lea and Febiger, 1971.

* Hubbard, J. P. "Programmed Testing in the Examinations of the National Board of Medical Examiners." In 1963 Invitational Conference on Testing Problems. Princeton: Educational Testing Service, 1964.

* Jensen, O. "Some Thoughts on Minimum Competency Interpretations of Performance on (M-C Item) Achievement Tests." (mimeo). Educational Testing Service, March 1972.

Journal of the American Medical Association (JAMA). "Medical Licensure Statistics, 1970." JAMA, June 14, 1971, Vol. 216, p. 11.

Klaus, D. J., Gosnell, D. E., Chowla, M. C., and Reilly, P. C. Controlling Experience to Improve Nursing Proficiency: Vol. III, Determining Proficient Performance. Pittsburgh: American Institutes for Research, 1968.

Klaus, D. J., Gosnell, D. E., Jacobs, A. M., Reilly, P. C., and Taylor, J. A. Controlling Experience to Improve Nursing Proficiency: Vol. I, Background and Study Plan. Pittsburgh: American Institutes for Research, 1966.

Ledbetter, P. J. "An Analysis of the Performance of Graduates of a Selected Baccalaureate Program in Nursing with Regard to Selected Standardized Examinations." Dissertation Abstracts. University Microfilms: Ann Arbor. Vol. 29, p. 3331-A, 1969. (see Litwack, 1972)

Lewy, A., and McGuire, C. "A Study of Alternative Approaches in Estimating the Reliability of Unconventional Tests." (presented at A.E.R.A., 1966) Center for the Study of Medical Education. Chicago: University of Illinois College of Medicine, 1966.

Litwack, L., Sakata, R., and Wykle, M. Counseling Evaluation and Student Development. Philadelphia: W. B. Saunders Co., 1972.




Magnusson, D. Test Theory. Reading, Massachusetts: Addison-Wesley Publishing Co.
*Maslow, A. P. "Licensing Tests: Occupational Bridge or Barrier?" Proceedings Third Annual Conference, Council on Occupational Licensing, July 1971. (see Shimberg, 1972)

*McDonald, F. J. "Evaluation of Teaching Behavior." In Houston, W. R., and Howsam (Eds.), Competency-Based Teacher Education. Chicago: Science Research Assoc., Inc., 1972.

McGuire, C. H. "An Evaluation Model for Professional Education—Medical Education." 

1967 Invitational Conference on Testing Problems. Princeton: Educational Testing 
Service, 1968. 

McGuire, C. H. Clinical Simulations. Appleton-Century-Crofts, 1972.

*McGuire, C. H., and Babbott, D. "Simulation Techniques in the Measurement of Problem- 
Solving Skills." Journal of Educational Measurement, Spring 1967, Vol. 4:1. 

Mueller, E. J., and Lyman, H. B. "The Prediction of Scores on the State Board Test Pool Examination." Nursing Research, May-June 1969, Vol. 18:3, pp. 263-7.

National League for Nursing. "Analysis of State Board Test Scores 1957." Nursing Outlook, 1959, Vol. 7:5.

National League for Nursing. (1963) In Katzell, M. "Comments on Moll's Report." NLN.

National League for Nursing. "Safety and Effectiveness of Practice." Nursing Outlook ("Let's Examine" series), October 1962, Vol. 10.

National League for Nursing. "Standardization, Scoring and Use of Scores on the SBTPE." 
SB162-28, pp. 7-11. 

*National League for Nursing. "Where Those Tests Come From and What They're For." Nursing Outlook ("Let's Examine" series), April 1961, Vol. 9.

Nealy, S. M., and Owen, T. W. "A Multitrait-Multimethod Analysis of Predictors and Criteria of Nursing Performance." Organizational Behavior and Human Performance, 1970, Vol. 5, pp. 348-65.

Polef, S., and Stewart, C. P. "Development of a Rating Scale." Studies in Personnel Psychology, October 1971, Vol. 3:2, pp. 7-20.

Popham, W. J. "Performance Tests of Teaching Proficiency." American Educational Research Journal, January 1971, Vol. 8:1, pp. 105-15.

Popham, W. J., and Husek, T. R. "Implications of Criterion-Referenced Measurement." Journal of Educational Measurement, Spring 1969, Vol. 6:1, pp. 1-9.



Quirk, T. J. "Performance Tests for the Beginning Teachers: Why All the Fuss?" (Invited address, N.J. Performance Evaluation Project, 1971). Princeton: Educational Testing Service.

* Quirk, T. J., Witten, B. J., and Weinberg, S. F. A Critical Review of Research Related to the National Teacher Examinations. Princeton: Educational Testing Service, 1972.

Rosen, A., and Abraham, G. E. "Evaluation of a Procedure for Assessing the Performance of 
Staff Nurses." Nursing Research, February 1963, Vol. 14, pp. 78-82. 

* Ryans, D. G., and Frederiksen, N. "Performance Tests of Educational Achievement." In Lindquist, E. F. (Ed.), Educational Measurement. Washington, D.C.: American Council on Education, 1951.

Schmidt, S. "Special Background Presentation." A presentation at the April 1972 meeting of the SBTPE Research Steering Committee for the RN Examination. (NLN) July 1972.

Schorr, T. M. "A Critical Balance." American Journal of Nursing, November 1971, Vol. 
71:11, p. 2127. 

Shields, M. R. "A Project for Curriculum Improvement." Nursing Research, October 1952, 
Vol. 1:2, pp. 4-13. 

* Shimberg, B., Esser, B. P., and Kruger, D. H. Occupational Licensing and Public Policy. Princeton: Educational Testing Service, October 1972.

Slater, D. "The Slater Nursing Competencies Rating Scale." Detroit: College of Nursing, Wayne State University, 1967.

Smith, P. C., and Kendall, L. M. "Retranslation of Expectations: An Approach to the Construction of Unambiguous Anchors for Rating Scales." Journal of Applied Psychology, 1963, Vol. 47:2, pp. 149-55.

* Stuit, D. B. (Ed.) Personnel Research and Test Development in the U.S. Bureau of Naval 

Personnel. Project N-106 and the College Entrance Examination Board. Princeton: 
Princeton University Press, 1947. (see Boyd, 1972) 

* Taylor, C. W., Nahm, H., Loy, L., Harms, M., Berthold, J., and Wolfer, J. Selection and Recruitment of Nurses and Nursing Students: A Review of Research Studies and Practices. Salt Lake City: University of Utah Press (rev.), 1966.

Taylor, C. W., Nahm, H., Quinn, M., Harms, M., Mulaik, J., Mulaik, S. Report of Measurement and Prediction of Nursing Performance. Part I. Factor Analysis of Nursing Students' Application Data, Entrance Test Scores, Achievement Test Scores and Grades in Nursing School. Salt Lake City: University of Utah, 1965.

Tiffin, J., and McCormick, E.J. Industrial Psychology (Fifth Edition). Englewood Cliffs: 
Prentice-Hall, 1965. 



* U.S. Government. Guidelines on Employee Selection Procedures. Title 29, Labor, Chapter XIV, Equal Employment Opportunity Commission. Federal Register, August 1970, Vol. 35:149.

Wandelt, M. A., and Ager, J. Quality Patient Care Scale. Detroit: College of Nursing, Wayne State University, 1970.

Wrigley, M. H. "A Study of the Relationship Between the Scores of Practical Nurses on the 
Licensing Examination and Ratings on Their Performance on the Job." Dissertation 
Abstracts. University Microfilms: Ann Arbor. Vol. 30, p. 1763-A, 1969. 
