Failure to consider errors of measurement when using partial correlation or analysis of covariance techniques can result in erroneous conclusions. Certain aspects of this problem are discussed and particular attention is given to issues raised in a recent article by Brewer, Campbell, and Crano. (Author)

A project was designed to develop and test a library of cassette audiotapes for improving the technical skills of educational researchers. Fourteen outstanding researchers from diverse fields were identified, and a short instructional tape was prepared by each. Subjects of the tapes included instructional objectives for intellectual skills, sources of bias in surveys, implications for the next 20 years of change, some precepts for conducting educational research, statistical interactions,...

The purpose of this study was to empirically determine the effects of quantified violations of the underlying assumptions of parametric statistical tests commonly used in educational research, namely the correlation coefficient (r) and the t test. The effects of heterogeneity of variance, nonnormality, and nonlinear transformations of scales were studied separately and in all combinations. Monte Carlo procedures were followed to generate random digits which had the following shapes: normal,...

The statistician has n independent estimates of a parameter he knows is positive, but, as is the case in components-of-variance problems, some of the estimates may be negative. If the n estimates are to be combined into a single number, we compare the obvious rule, that of averaging the n values and taking the positive part of the result, with that of averaging the positive parts. Although the estimator generated by the second rule is not consistent, it is shown by numerical calculation that...
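
The two combination rules compared in the abstract above can be sketched in a few lines. This is a minimal illustration only: the function names and the sample values are hypothetical, and the sketch does not reproduce the numerical calculation the abstract refers to.

```python
def positive_part(x):
    """Return x if it is positive, otherwise 0."""
    return max(x, 0.0)


def mean_then_truncate(estimates):
    """Rule 1: average the n values, then take the positive part of the result."""
    return positive_part(sum(estimates) / len(estimates))


def truncate_then_mean(estimates):
    """Rule 2: take the positive part of each value, then average."""
    return sum(positive_part(e) for e in estimates) / len(estimates)


# Hypothetical estimates of a positive parameter; two of them are negative.
estimates = [0.8, -0.3, 0.5, -0.1]

print("rule 1:", mean_then_truncate(estimates))
print("rule 2:", truncate_then_mean(estimates))
```

Because the positive-part function is convex, the second rule never gives a smaller value than the first on the same set of estimates.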

In some correlational studies it is not reasonable to assume that bivariate observations are uncorrelated. An example would be a configural analysis in which two individuals are correlated across several variables (e.g., Q-technique). The present study was a Monte Carlo investigation of the robustness of techniques used in judging the magnitude of a sample correlation coefficient when observations are correlated. Empirical distributions of r, t, and Fisher's z were generated. Patterns of...

When random assignment has been accomplished and an analysis of covariance (ANCOVA) is being used to correct for initial differences among treatment groups, use of unreliable covariables not only decreases the power of ANCOVA, but also causes ANCOVA to test biased treatment effects. Several correction procedures have been suggested for the single fallible covariable design. The purpose of this paper is to extend the earlier work by describing two alternative correction procedures for the...

An influential statistics text recommends a Levene test for homogeneity of variance. A recent note suggests that Levene's test is upwardly biased for small samples. Another report shows inflated Alpha estimates and low power. Neither study utilized more than two sample sizes. This Monte Carlo study involved sampling from a normal population for all combinations of two to seven variances and equal sample sizes from three to twelve. Alpha was upwardly biased for smaller sample sizes; Alpha was...

The Anchor Test Study provides a method for translating a pupil's score on any one of eight widely used standardized reading tests for Grades 4, 5, and 6 to a corresponding score on any of the other seven tests, as well as furnishing new nationally representative norms for each of the eight tests. In addition, the study presents new estimates of alternate form reliability for each test, provides estimates of the intercorrelations among the tests, and explores empirically some methodological...

The mathematical derivation of the statistics used for inference in some linear models assumes that the values of the independent variables are measured without error. This assumption is often disregarded when these models are utilized in research. This study is an investigation of the consequences of the violation of this assumption for one family of linear models on the magnitude of these statistics and the frequency of Type I error in these models. The results of this study indicate that...
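
As background to the abstract above, the best-known consequence of error in a single predictor can be written out. The notation is an assumption for illustration (the classical errors-in-variables setup with one predictor) and is not necessarily the family of models the study examined. Suppose the observed predictor is X = T + e, where T is the error-free value and e is measurement error uncorrelated with T and with the regression disturbance u.

```latex
\[
  Y = \alpha + \beta T + u, \qquad X = T + e,
\]
\[
  \operatorname{plim}\,\hat{\beta}_{Y \cdot X}
    = \beta \, \frac{\sigma_T^{2}}{\sigma_T^{2} + \sigma_e^{2}} .
\]
```

The ratio on the right is the reliability of X, so under these assumptions the least-squares slope computed from the fallible predictor is pulled toward zero in proportion to the unreliability.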

The relationship of sample size to number of variables in the use of factor analysis has been treated by many investigators. In attempting to explore what the minimum sample size should be, none of these investigators pointed out the constraints imposed on the dimensionality of the variables by using a sample size smaller than the number of variables. A review of studies in this area is made as well as suggestions for resolution of the problem. (Author)

A computer-simulated study was made of the sampling distribution of omega squared, a measure of strength of relationship in multivariate analysis of variance which had earlier been proposed by the author. It was found that this measure was highly positively biased when the number of variables is large and the sample size is small. A correction formula for reducing the bias was developed by the method of least squares and was found to yield nearly unbiased corrected values. A simpler,...

The investigation focused on the effects of using grouped data to estimate the relations that exist in data on individuals. Different research contexts were identified in which researchers group observations though interested in relations among measurements on individuals. The consequences of estimating regression coefficients from grouped data were examined from a "structural equations" perspective. A simple linear regression model was hypothesized and then modified by the...

When two groups, initially dissimilar, undergo different treatments, can subsequent differences be partitioned in such a way that the difference between the two treatments is unbiased? This is the central problem of this paper, and it is confronted by the examination of two levels of information using a Follow Through Evaluation. The first information level contains, in addition to outcome variables (achievement tests), information on child characteristics and family background. The second...

Intended as a reference for the convenience of students in sampling, this monograph attempts to express relevant, introductory mathematics and probability in the context of sample surveys. Although some proofs are presented, the emphasis is more on exposition of mathematical language and concepts than on the mathematics per se and rigorous proofs. Many problems are given as exercises so a student may test his interpretation or understanding of the concepts. Most of the mathematics is...

This paper was intended to promote a deeper understanding of a statistical method called balancing developed by National Assessment of Educational Progress. Problems in estimating main effects when populations are disproportionate, balancing solutions to these problems, methods equivalent to balancing, interpretation of balanced results, and some applications are considered and accompanied with examples. It is concluded that properly balanced results or the adjusted marginal means should be...

The effects of the violation of the assumption of normality coupled with the condition of multicollinearity upon the outcome of testing the hypothesis Beta equals zero in the two-predictor regression equation are investigated. A Monte Carlo approach was utilized in which three different distributions were sampled for two sample sizes over thirty-four population correlation matrices. The preliminary results indicate that the violation of the assumption of normality has a significant effect upon...

The known interval scale, referred to as the 7.8 scale, has been criticized as an invalid measuring instrument in the form of an attitude scale. It is the purpose of this paper to demonstrate that this scale can produce spuriously inflated correlation coefficients, high reliability, and false significance on statistical tests. The case will be made along two general lines. First, the effects of the scale on reliability, validity, and significance testing will be presented and second the...

Investigated empirically through post mortem item-examinee sampling were the relative merits of two alternative procedures for allocating items to subtests in multiple matrix sampling and the feasibility of using the jackknife in approximating standard errors of estimate. The results indicate clearly that a partially balanced incomplete block design is preferable to random sampling in allocating items to subtests. The jackknife was found to better approximate standard errors of estimate in the...

An attempt was made to determine differences in reading achievement gains and student attitudes towards school between groups of third grade children enrolled in either modified open or traditional classrooms in the same school. The Metropolitan Achievement Test in Reading was used for pre- and posttest comparisons of achievement, and a questionnaire on student attitudes was administered at the end of the school year. Radical differences in the kinds of children assigned to either modified open...

A procedure for predicting categorical outcomes using categorical predictor variables was described by Moonan. This paper describes a related technique which uses prior probabilities, updated by joint likelihoods, as classification criteria. The procedure differs from Moonan's in that the outcome having the greatest posterior probability is selected as the prediction regardless of misclassification cost. It also differs in method of screening and weighting the predictor variables, and treats...

A method of interpolation has been derived that should be superior to linear interpolation in computing the percentile ranks of test scores for unimodal score distributions. The superiority of the logistic interpolation over the linear interpolation is most noticeable for distributions consisting of only a small number of score intervals (say fewer than 10), particularly distributions that are relatively unskewed. Logistic interpolation thus should be useful in practical situations in which...

For one important set of data, namely, the data on unemployment and employment collected by the Census Bureau in its monthly Current Population Survey (CPS), some information on nonsampling error is available that can be used to evaluate the regularly reported labor force figures. The paper is concerned with the nonsampling error that relates specifically to measurement, or response bias (the bias resulting from the interview and enumeration process itself). It examines and tabulates the extent...

Suggesting that James Coleman's paper on massive school desegregation reveals methodological flaws of such magnitude that they raise serious questions as to the validity of the conclusions, this paper addresses a full sequence of perceived methodological errors found in the Coleman document, but does not dismiss the conclusions based upon initial errors, no matter how cogent they be. This approach makes the implicit assumption that, at each step, all previous operations upon the data are both...

This report presents the findings of a research study, conducted by the College of the Mainland (COM) as a subcontractor for Project FOLLOW-UP, designed to test the accuracy of random sampling and to measure non-response bias in mail surveys. In 1975, a computer-generated random sample of 500 students was drawn from a population of 1,256 students who had attended COM in the spring of 1972. A 48% response to a follow-up survey of the sample was achieved. A random subsample of 70 non-respondents...

The unbiased estimate of a "treatment effect" reached by analysis of covariance in a nonrandomized experiment would often require that a different covariate be used in each treatment. A sufficient but unlikely condition for an unbiased estimate is that the covariate for each treatment is (1) the complete covariate that predicts the outcome as fully as possible from initial characteristics of the case, or (2) the complete discriminant that fully represents differences between group...

The development of an index reflecting the probability that the observed correspondence between multiple choice test responses of two examinees was due to chance in the absence of copying was previously reported. The present paper reports the implementation of a statistic requiring less restrictive underlying assumptions but more computation time and a related Bayesian procedure designed to adjust the standard error estimates to counteract the effect of the presence of a substantial proportion...

The National Food and Agriculture Council of the Philippines regularly requires rapid feedback data for analysis, which will assist in monitoring programs to improve and increase the production of selected crops by small scale farmers. Since many other development programs in various subject matter areas also require similar statistical appraisals, this handbook was developed to present and explain the underlying principles and processes of scientific surveying. This includes the fundamentals...

A general interest in attrition, or loss of units from a study, stems in part from the observation that the infrequency of attention to attrition exacerbates problems of data interpretation. As a substudy of the national evaluation of Project Follow Through, the potential biasing effects of attrition of subjects from the sites were investigated. Policy attrition, the administrative dropping of a unit; program attrition, loss of subjects due to mobility, dislike of the treatment, etc.; and...

This study bears on Arthur R. Jensen's latest statement on the heritability of intelligence. Allowing for gene-environment correlation, Jensen (1975) reports that under a wide range of assumptions, the twin data show that one-half to three-fourths of IQ variance is accounted for by genetic factors. This conclusion falls when an arbitrary specification is relaxed. The present study presents Jensen's model, along with a modification. (Author/AM)

Extending the classical twin method, Arthur R. Jensen claims to find unique estimates of the variance and covariance of the genetic and environmental components of IQ. The analyses presented in this paper show that a wide variety of parameter estimates are compatible with his model and data. But the extent of indeterminacy is deemed to be even wider. Furthermore, his model is shown to be not immutable. In particular, Jensen's assumption that one twin's environment is as highly correlated with...

The purpose of this study was to investigate the comparability of counselor effectiveness ratings made by four different groups. This study examined the relationships among student counselors' self-ratings, peer ratings, supervisor ratings, and client ratings on the Counselor Effectiveness Scale, Form 2, a semantic differential rating scale appropriate for immediate use with raters of varying sophistication. No significant relationships were found between pairs of rating groups on the total...

Using examples from evaluations of the Emergency School Aid Act (ESAA) Basic Grants Program, the ESAA Pilot Program, and the sustaining effects of compensatory education programs, school and student attrition are discussed. As in the example cases, appreciable attrition can be expected in most longitudinal studies. The possible effects of this attrition on descriptive analyses, analyses of student gains for each school year, and analyses of differential achievement gains for different treatment...

Federal assistance for special educational programs makes necessary the regular study of evaluations of thousands of innovations in compensatory education, bilingual education, and reading programs. The results are reported to the President and to Congress. However, investigating organizations find only a few programs with adequate evidence and thousands with faulty evaluation designs. Some of the most common faults are discussed, with examples. There are other factors which lower hopes. If...

Aggregation, or grouping, is a statistical procedure through which all members of a study within a specified range of scores (usually observed scores) are assigned a common or "group" score (for example, the group mean). The various social science methodology literatures agree on the costs of grouping: not only does one always lose information in grouping, in a wide variety of situations grouping introduces systematic error (bias). For most educational research applications the...

Grouping is a statistical procedure through which members of the same group are considered as a single unit of observation. There are methodological and inferential problems associated with various grouping procedures in various settings. This extensive paper focuses on making inferences about individuals when the analysis uses data that is grouped over individuals (for example, school means). The paper identifies five research contexts in which grouping is used, reviews the literature on...

Grouping is a statistical procedure through which members of the same group are considered as a single unit of observation. There are various ways to assign group membership and various ways to assign values of variables to groups. There are methodological problems associated with grouping in general and with particular methods of grouping. This paper argues that a wide variety of complex analytical problems concerning inferences from grouped observations can be understood from the use of a few...

The application of a multivariate analytic technique for the analysis of data from longitudinal designs with multiple dependent variables is presented. The technique is the multivariate generalization of univariate repeated measures ANOVA. An application of the technique to data collected using materials from the Asian Studies Curriculum Project is included. The example analysis indicated the technique is viable and should be a useful tool for the methodologist/evaluator. (Author)

Fairness or unfairness may be an attribute of a test per se, or of its use, or of its statistical treatment. An hypothetical situation designed to be intrinsically fair and unbiased is used to show that analysis of covariance as a statistical method may introduce bias to the treatment of test scores. In contrast, equipercentile equating methods are shown, in this situation, to result in a fair and unbiased treatment of test scores. A graphic figure illustrates the comparison of the two...

This paper presents a discussion of issues raised in the evaluation of Project Follow Through reported by Abt Associates. The paper suggests that many of the problems inherent in the design of both the program and the evaluation stem from the underlying assumption that one educational model could be found which would best alleviate the educational problems of the poor. The paper suggests that even when the original evaluation design was modified, substantial problems remained. The major issues...

This paper examines the question of the hereditary nature of intelligence and the validity of some of the statistical procedures which have been used in measuring the degree of hereditability. The author feels that proof of the question lacks sufficient scientific rigor for the support of any conclusion, particularly for a question of such political and emotional importance. (CTM)

The Survey of Doctoral Scientists and Engineers (SDSE) itself was the first of a planned series of biennial surveys of manpower in the physical, life and social sciences, mathematics, and engineering, prepared for the National Science Foundation by the Commission on Human Resources of the National Research Council. This evaluation report attempted to examine the SDSE for evidence of nonresponse bias and to identify strengths and weaknesses for the improvement of future manpower studies....

Synthetic estimation is a statistical technique that estimates small-area statistics by combining national estimates of the relevant characteristics with estimates of other known characteristics of the small geographic area. The advantages of the synthetic estimation approach to local estimation are its intuitive appeal, its simplicity, and its low cost. A major disadvantage is its possible lack of sensitivity to certain local characteristics. Another method used for the same purpose is the...
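
The basic arithmetic of a synthetic estimate can be sketched as follows. This is one common form of the estimator, stated as an assumption rather than as the procedure of the report: national rates within demographic groups are weighted by the local area's known share of each group. The group labels and all numbers are hypothetical.

```python
def synthetic_estimate(national_rates, local_shares):
    """Weight national group-level rates by the local population shares."""
    if abs(sum(local_shares.values()) - 1.0) > 1e-9:
        raise ValueError("local population shares must sum to 1")
    return sum(local_shares[group] * national_rates[group]
               for group in local_shares)


# Hypothetical national rates of some characteristic, by age group.
national_rates = {"under_18": 0.10, "18_to_64": 0.05, "65_plus": 0.20}

# Hypothetical age composition of the small area (known from other sources).
local_shares = {"under_18": 0.30, "18_to_64": 0.50, "65_plus": 0.20}

print("synthetic local rate:", synthetic_estimate(national_rates, local_shares))
```

The sketch also makes the stated disadvantage concrete: the local estimate can differ from the national one only through the area's composition, so local conditions that change the rate within a group go undetected.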

As in any very large survey, the sample that provided data for the National Longitudinal Study of the High School Class of 1972 (NLS) differed from the random sample of the study design because of nonresponse from 226 of the 1,200 schools in the primary sample, and a small amount of other missing data. Part A of this report briefly describes the design of the sample, including provisions for selecting alternate schools for those that declined to participate, the detailed stratification plan,...

A simulation study was designed to assess the severity of regression effects when a set of selection scores is also used as pretest scores as this pertains to RMC Model A of the Elementary and Secondary Education Act Title I evaluation and reporting system. Data sets were created with various characteristics (varying data reliability and extremeness of subgroups) that are relevant to the regression phenomenon. These data sets were analyzed to obtain indices of the amount of regression which...
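
The regression effect at issue can be illustrated with a small simulation. This is a sketch under assumed conditions (normally distributed true scores, equal pretest and posttest reliability, no treatment effect at all) and is not the design of the study above; the function name, parameter names, and default values are hypothetical.

```python
import random


def simulate_regression_effect(n=10000, reliability=0.8,
                               selected_fraction=0.2, seed=0):
    """Select the lowest pretest scorers and return their (pretest, posttest) means.

    No treatment effect is simulated, so any apparent gain is a regression effect.
    """
    rng = random.Random(seed)
    true_sd = reliability ** 0.5          # observed-score variance is fixed at 1
    error_sd = (1.0 - reliability) ** 0.5
    pupils = []
    for _ in range(n):
        true_score = rng.gauss(0.0, true_sd)
        pretest = true_score + rng.gauss(0.0, error_sd)
        posttest = true_score + rng.gauss(0.0, error_sd)
        pupils.append((pretest, posttest))
    # The selection score doubles as the pretest score.
    pupils.sort(key=lambda pair: pair[0])
    selected = pupils[: int(n * selected_fraction)]
    pre_mean = sum(pre for pre, _ in selected) / len(selected)
    post_mean = sum(post for _, post in selected) / len(selected)
    return pre_mean, post_mean


pre_mean, post_mean = simulate_regression_effect()
print("selected group pretest mean: ", pre_mean)
print("selected group posttest mean:", post_mean)
```

Under these assumptions the selected group's posttest mean moves back toward the overall mean although nothing was done to the pupils, and lowering `reliability` or choosing a more extreme subgroup enlarges the spurious gain.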

Structural equation models incorporating unmeasured variables make possible the rigorous testing of theories previously difficult to test adequately because of fallible measures of the theoretic variables. This paper first discusses a simple causal model incorporating a single unmeasured variable, for the purpose of exposition. A substantive example follows, incorporating several unmeasured variables for which multiple indicators were available. Joreskog's LISREL model for the analysis of...

Problems and procedures in assessing and obtaining fit of data to the Rasch model are treated and assumptions embodied in the Rasch model are made explicit. It is concluded that statistical tests are needed which are sensitive to deviations so that more than one item parameter would be needed for each item, and more than one person parameter would be needed for each person. Statistical goodness-of-fit tests--based on the conditional maximum likelihood estimates of the item parameters--which can...

Normal curve equivalent achievement gains estimates were compared with RMC Title I evaluation Models A1 and B1. The comparison focused upon the amount of bias introduced by Model A1 when its underlying assumptions were violated. The model assumes, first, that the local school population is accurately represented by the national norm group; and secondly, that the percentile standing of the treatment group on the pretest remained unchanged on the posttest in the absence of treatment effect. Data...
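
For reference, the normal curve equivalent (NCE) scale is a normalized score with mean 50 and standard deviation 21.06, chosen so that NCEs of 1, 50, and 99 coincide with percentile ranks 1, 50, and 99. The sketch below converts percentile ranks to NCEs and computes a gain under the second assumption quoted above, that percentile standing would stay the same without treatment. The function names and the example percentiles are hypothetical.

```python
from statistics import NormalDist


def percentile_to_nce(percentile):
    """Convert a national percentile rank (strictly between 0 and 100) to an NCE."""
    if not 0.0 < percentile < 100.0:
        raise ValueError("percentile must lie strictly between 0 and 100")
    z = NormalDist().inv_cdf(percentile / 100.0)
    return 50.0 + 21.06 * z


def nce_gain(pretest_percentile, posttest_percentile):
    """Gain in NCE units, taking unchanged percentile standing as the no-treatment expectation."""
    return percentile_to_nce(posttest_percentile) - percentile_to_nce(pretest_percentile)


# Hypothetical group standing: 20th percentile at pretest, 27th at posttest.
print("NCE gain:", nce_gain(20.0, 27.0))
```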

The document, part of a series of chapters described in SO 011 759, considers the problem of censoring in the analysis of event-histories (data on dated events, including dates of change from one qualitative state to another). Censoring refers to the lack of information on events that occur before or after the period for which data are available. Unless censorship is dealt with, researchers are likely to make erroneous inferences about the change process. The report considers several approaches...

Four types of control groups are commonly used in cognitive manipulation studies: (1) no-treatment; (2) practice with own methods; (3) practice and training with competing treatments; and (4) practice and training with irrelevant treatments. There are problems associated with the use of each group as a baseline for identifying the "true" treatment effect. A general, two-step control procedure was advocated in which the researcher first identifies the most appropriate control group for...

The accuracy with which regression models estimate treatment effects is dependent upon a number of conditions. The stability of the regression line (a function of sample size and correlation between pretest and posttest) is said to be the most important of these conditions. The utility of regression models is proportional to the size of the correlation between pretest and posttest. As the size of the correlation increases, the predicted posttest scores of the treatment group decrease. This...

