Instrumental variable estimators hold the promise of enabling researchers to estimate the effects of educational treatments that are not (or cannot be) randomly assigned but that may be affected by randomly assigned interventions. Examples of the use of instrumental variables in such cases are increasingly common in educational and social science research. The most commonly used instrumental variables estimator is two-stage least squares (2SLS). Many of the properties of the 2SLS estimator are...
Topics: ERIC Archive, Social Science Research, Least Squares Statistics, Computation, Correlation,...
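As a concrete illustration of the 2SLS estimator discussed above, the following sketch runs both stages by hand on simulated data. The data-generating process and all variable names are hypothetical; the instrument plays the role of a randomly assigned intervention that affects an endogenous treatment.

```python
# A hedged sketch of two-stage least squares (2SLS); everything here is illustrative.
import numpy as np

rng = np.random.default_rng(0)
n = 5_000
z = rng.binomial(1, 0.5, n)                 # random assignment (the instrument)
u = rng.normal(size=n)                      # unobserved confounder
d = (0.8 * z + u + rng.normal(size=n) > 0).astype(float)   # endogenous treatment take-up
y = 1.5 * d + u + rng.normal(size=n)        # true effect of d on y is 1.5

X = np.column_stack([np.ones(n), d])        # outcome-equation regressors
Z = np.column_stack([np.ones(n), z])        # instruments

# Stage 1: project the endogenous regressors onto the instruments.
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
# Stage 2: regress the outcome on the fitted values.
beta_2sls = np.linalg.lstsq(X_hat, y, rcond=None)[0]
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print("OLS (biased):", beta_ols[1], "  2SLS:", beta_2sls[1])
```

Because the confounder u raises both take-up and the outcome, naive OLS overstates the effect; the 2SLS estimate stays close to 1.5.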
A procedure for predicting categorical outcomes using categorical predictor variables was described by Moonan. This paper describes a related technique which uses prior probabilities, updated by joint likelihoods, as classification criteria. The procedure differs from Moonan's in that the outcome having the greatest posterior probability is selected as the prediction regardless of misclassification cost. It also differs in method of screening and weighting the predictor variables, and treats...
Topics: ERIC Archive, Bayesian Statistics, Behavioral Science Research, Classification, Higher Education,...
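A minimal sketch of classification by maximum posterior probability for categorical predictors, in the spirit of the technique described above. The toy data, the Laplace smoothing, and the conditional-independence simplification are assumptions of this illustration, not details taken from the paper.

```python
# Priors updated by likelihoods; the outcome with the greatest posterior is
# chosen regardless of misclassification cost. All data here are illustrative.
from collections import Counter, defaultdict

def fit(X, y):
    """X: list of tuples of categorical predictors; y: list of outcome labels."""
    priors = Counter(y)
    cond = defaultdict(Counter)              # counts of value v for (feature j, class c)
    for xs, c in zip(X, y):
        for j, v in enumerate(xs):
            cond[(j, c)][v] += 1
    return priors, cond, len(y)

def predict(xs, priors, cond, n):
    posts = {}
    for c, nc in priors.items():
        p = nc / n                           # prior probability of class c
        for j, v in enumerate(xs):
            # Laplace-smoothed conditional likelihood of value v given class c
            p *= (cond[(j, c)][v] + 1) / (nc + len(cond[(j, c)]) + 1)
        posts[c] = p
    return max(posts, key=posts.get)         # maximum posterior, cost ignored

X = [("hi", "yes"), ("hi", "no"), ("lo", "no"), ("lo", "yes")]
y = ["pass", "pass", "fail", "fail"]
priors, cond, n = fit(X, y)
print(predict(("hi", "yes"), priors, cond, n))   # -> "pass"
```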
This article presents the author's address at the 2007 "Journal of Applied Quantitative Methods" ("JAQM") prize-awarding ceremony. The ceremony was part of the opening of the 4th International Conference on Applied Statistics, November 22, 2008, Bucharest, Romania. In the address, the author reflects on three theses that question the gnoseological and operational efficiency of quantitative methods in the social domain. The first refers to symbolical analysis, where the...

Topics: ERIC Archive, Research Methodology, Epistemology, Statistical Analysis, Statistical Studies,...
Selection bias is problematic when evaluating the effects of postsecondary interventions on college students, and can lead to biased estimates of program effects. While instrumental variables can be used to account for endogeneity due to self-selection, current practice requires that all five assumptions of instrumental variables be met in order to credibly estimate the causal effect of a program. Using the Pike et al. (2011) study of selection bias and learning communities as an example, the...
Topics: ERIC Archive, Statistical Bias, College Students, Educational Research, Statistical Analysis,...
Beginning with the planning stages of the National Assessment of Educational Progress (NAEP), careful attention has been given to the design of efficient probability sampling methods for the selection of class-age respondents and the assignment of test packages. With these methods, it is possible for NAEP researchers to make relatively precise statements about population characteristics on the basis of fairly small samples. The purpose of this monograph is to describe what is meant by...
Topics: ERIC Archive, Educational Assessment, Elementary Secondary Education, Error of Measurement,...
The results of six major projects are discussed including a comprehensive mathematical and statistical analysis of the problems caused by errors of measurement in linear models for assessing change. In a general matrix representation of the problem, several new analytic results are proved concerning the parameters which affect bias in observed-score regression statistics. The bias in ordinary least squares estimators is expressed as a function of covariances among true scores, among the...
Topics: ERIC Archive, Algorithms, Analysis of Covariance, Change, Error of Measurement, Mathematical...
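The bias that measurement error induces in ordinary least squares can be made concrete with the classical attenuation result, plim(b) = beta * Var(T) / (Var(T) + Var(E)); that is, the slope shrinks by the predictor's reliability. The short simulation below, with illustrative settings, shows the effect.

```python
# Errors-in-variables attenuation of an OLS slope; settings are illustrative.
import numpy as np

rng = np.random.default_rng(1)
n, beta, rel = 100_000, 1.0, 0.8
T = rng.normal(0, 1, n)                                    # true score, variance 1
X = T + rng.normal(0, np.sqrt((1 - rel) / rel), n)         # observed score, reliability 0.8
y = beta * T + rng.normal(0, 1, n)

b = np.cov(X, y)[0, 1] / np.var(X, ddof=1)
print(b)   # close to beta * 0.8 = 0.8, not 1.0
```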
The National Household Education Survey (NHES) was conducted for the first time in 1991 as a way to collect data on the early childhood education experiences of young children and participation in adult education. Because the NHES methodology is relatively new, field tests were necessary. A large field test of approximately 15,000 households was conducted during the fall of 1989 to examine several methodological issues. This report examines a technique that was used to increase the coverage of...
Topics: ERIC Archive, Adolescents, Adult Education, Age Groups, Data Collection, Dropout Research,...
Quality assurance and evaluation research, like other fields of social research and its application, are confronted with a series of problems. In the present paper, I first give a list of such problems, necessarily incomplete. It is then claimed that while there is no "perfect" solution to these problems, critical multiplism offers a set of approaches that might attenuate the problems or at least make them more visible so that one can deal with them. Critical...
Topics: ERIC Archive, Evaluation Research, Research Problems, Guidelines, Evaluation Methods, Theories,...
Evaluators of education interventions are increasingly designing studies to detect impacts much smaller than the 0.20 standard deviations that Cohen (1988) characterized as "small." While the need to detect smaller impacts is based on compelling arguments that such impacts are substantively meaningful, the drive to detect smaller impacts may create a new challenge for researchers: the need to guard against smaller biases. The purpose of this paper is twofold. First, we examine the...
Topics: ERIC Archive, Intervention, Educational Research, Research Problems, Statistical Bias, Statistical...
Previous studies have indicated that the reliability of test scores composed of testlets is overestimated by conventional item-based reliability estimation methods (S. Sireci, D. Thissen, and H. Wainer, 1991; H. Wainer, 1995; H. Wainer and D. Thissen, 1996; G. Lee and D. Frisbie). In light of these studies, it seems reasonable to ask whether the item-based estimation methods for the conditional standard errors of measurement (SEM) would provide underestimates for tests composed of testlets. The...
Topics: ERIC Archive, Definitions, Error of Measurement, Estimation (Mathematics), Reliability, Statistical...
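The overestimation at issue can be illustrated with coefficient alpha: items within a testlet share variance beyond the general trait, which an item-based estimate mistakes for true-score variance. The sketch below simulates testlet-structured data under assumed settings; it illustrates the reliability side of the argument, not the conditional-SEM estimators studied in the paper.

```python
# Item-level vs. testlet-level coefficient alpha on simulated testlet data.
import numpy as np

def cronbach_alpha(parts):
    k = parts.shape[1]
    return k / (k - 1) * (1 - parts.var(axis=0, ddof=1).sum()
                          / parts.sum(axis=1).var(ddof=1))

rng = np.random.default_rng(2)
n, testlets, items_per = 2_000, 5, 6
theta = rng.normal(0, 1, (n, 1))                 # general trait
scores = np.concatenate(
    [theta + rng.normal(0, 1, (n, 1))            # shared testlet effect
           + rng.normal(0, 1, (n, items_per))    # item-specific noise
     for _ in range(testlets)], axis=1)

item_alpha = cronbach_alpha(scores)              # treats all 30 items as independent
testlet_alpha = cronbach_alpha(
    scores.reshape(n, testlets, items_per).sum(axis=2))   # 5 testlet scores
print(item_alpha, testlet_alpha)                 # item-based alpha is noticeably larger
```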
This study addresses the sample error and linking bias that occur with small and unrepresentative samples in a non-equivalent groups anchor test (NEAT) design. We propose a linking method called the "synthetic function," which is a weighted average of the identity function (the trivial equating function for forms that are known to be completely parallel) and a traditional equating function (in this case, the chained linear equating function) used in the normal case in which forms are...
Topics: ERIC Archive, Equated Scores, Sample Size, Test Items, Statistical Bias, Comparative Analysis, Test...
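The synthetic function itself is simple to state: a weighted combination of the identity function and a traditional linking function. The sketch below uses a direct linear link in place of a full chained linear function through the anchor, and the weight and moment values are placeholders.

```python
# A minimal sketch of the "synthetic function": a weighted average of the
# identity and a linear linking function. All numeric values are hypothetical.
def linear_link(x, mx_new, sx_new, mx_old, sx_old):
    """Linear linking of a new-form score x onto the old form's scale."""
    return mx_old + (sx_old / sx_new) * (x - mx_new)

def synthetic(x, w, **moments):
    """w = 1 trusts the identity (parallel forms); w = 0 trusts the data."""
    return w * x + (1 - w) * linear_link(x, **moments)

print(synthetic(30, w=0.5, mx_new=28.0, sx_new=6.0, mx_old=29.0, sx_old=6.5))
```

Leaning on the identity function stabilizes the link when the equating sample is small, at the cost of bias when the forms are not in fact parallel, which is exactly the trade-off the weight controls.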
In practical applications of item response theory (IRT), item parameters are usually estimated first from a calibration sample. After treating these estimates as fixed and known, ability parameters are then estimated. However, the statistical inferences based on the estimated abilities can be misleading if the uncertainty of the item parameter estimates is ignored. Instead, estimated item parameters can be regarded as covariates measured with error. Along the line of this...
Topics: ERIC Archive, Item Response Theory, Ability, Error of Measurement, Maximum Likelihood Statistics,...
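The standard practice being questioned, treating estimated item parameters as fixed and known when estimating ability, looks like the following sketch under an assumed 2PL model with illustrative parameter values.

```python
# Maximum likelihood ability estimation with item parameters held fixed.
import numpy as np
from scipy.optimize import minimize_scalar

a = np.array([1.0, 1.5, 0.8, 1.2])       # discriminations (point estimates)
b = np.array([-0.5, 0.0, 0.5, 1.0])      # difficulties (point estimates)
u = np.array([1, 1, 0, 1])               # one response pattern

def neg_loglik(theta):
    p = 1 / (1 + np.exp(-a * (theta - b)))
    return -np.sum(u * np.log(p) + (1 - u) * np.log(1 - p))

theta_hat = minimize_scalar(neg_loglik, bounds=(-4, 4), method="bounded").x
print(theta_hat)
# Any uncertainty in a and b is ignored here, which is exactly the concern
# the paper raises: inferences based on theta_hat come out overconfident.
```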
Extending the classical twin method, Arthur R. Jensen claims to find unique estimates of the variance and covariance of the genetic and environmental components of IQ. The analyses presented in this paper show that a wide variety of parameter estimates are compatible with his model and data. But the extent of indeterminacy is deemed to be even wider. Furthermore, his model is shown not to be immutable. In particular, Jensen's assumption that one twin's environment is as highly correlated with...
Topics: ERIC Archive, Environmental Influences, Environmental Standards, Genetics, Heredity, Intelligence...
Recent investigations of the extent to which students with disabilities are allowed to participate in major national data collections used in measurement-driven education reform suggest that 40 to 50 percent of students with disabilities are typically excluded from major assessments, although they are included to a greater degree in assessments that do not require completion of cognitive tests. The problem is one of accurate statistical reporting and modeling educational processes and...
Topics: ERIC Archive, Attrition (Research Studies), Disabilities, Educational Assessment, Educational...
This paper evaluates the logic underlying various criticisms of statistical significance testing and makes specific recommendations for scientific and editorial practice that might better increase the knowledge base. Reliance on the traditional hypothesis testing model has led to a major bias against nonsignificant results and to misinterpretation of significant results. A finding of statistical significance does not mean that the null hypothesis is false, since there are many factors affecting...
Topics: ERIC Archive, Analysis of Variance, Data Interpretation, Editors, Effect Size, Error of...
A study was conducted to investigate whether augmenting the calibration of items using computerized adaptive test (CAT) data matrices produced estimates that were unbiased and improved the stability of existing item parameter estimates. Item parameter estimates from four pools of items constructed for operational use were used in the study to arrive at a final number of 1,392 unique items. Fifty sets of true parameter estimates were generated from the base item prior information, and each true...
Topics: ERIC Archive, Adaptive Testing, Bayesian Statistics, Computer Assisted Testing, Estimation...
The National Household Education Survey (NHES) is a telephone survey of the noninstitutionalized civilian population of the United States that collects data on educational issues that are best explored through contact with households rather than with institutions. The NHES has been conducted in 1991, 1993, 1995, and 1996. In the 1996 NHES (NHES:96), the topical components were parent/family involvement in education and civic involvement. The 1996 expanded screener feature included a set of...
Topics: ERIC Archive, Adults, Blacks, Citizen Participation, Cost Effectiveness, Estimation (Mathematics),...
Of particular import to this study is collider bias originating from stratification on pretreatment variables forming an embedded M or bowtie structural design. That is, rather than assume an M structural design, which suggests that "X" is a collider but not a confounder, the authors adopt what they consider a more reasonable position: that "X" is both a collider and a confounder. Accordingly, in this study they examined the extent to which confounder-induced bias...
Topics: ERIC Archive, Statistical Bias, Statistical Analysis, Psychometrics, Elementary School Teachers,...
A computer simulation study was conducted of the sampling distribution of omega squared, a measure of strength of relationship in multivariate analysis of variance that had earlier been proposed by the author. This measure was found to be highly positively biased when the number of variables is large and the sample size is small. A correction formula for reducing the bias was developed by the method of least squares and was found to yield nearly unbiased corrected values. A simpler,...
Topics: ERIC Archive, Analysis of Variance, Computer Programs, Matrices, Multivariate Analysis, Sampling,...
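The direction of the bias is easy to demonstrate in the univariate case: under a true null effect, the uncorrected strength-of-association estimate stays well above zero, while an omega-squared-style correction removes most of the bias. The simulation below is an illustration of the phenomenon, not the author's multivariate correction formula.

```python
# Under a null one-way design, eta squared is positively biased; omega squared
# corrects most of the bias. Settings are illustrative.
import numpy as np

rng = np.random.default_rng(3)
k, n_per, reps = 4, 5, 10_000
eta2, omega2 = [], []
for _ in range(reps):
    y = rng.normal(0, 1, (k, n_per))              # no true group differences
    ss_b = n_per * ((y.mean(axis=1) - y.mean()) ** 2).sum()
    ss_w = ((y - y.mean(axis=1, keepdims=True)) ** 2).sum()
    ms_w = ss_w / (k * (n_per - 1))
    eta2.append(ss_b / (ss_b + ss_w))
    omega2.append((ss_b - (k - 1) * ms_w) / (ss_b + ss_w + ms_w))
print(np.mean(eta2), np.mean(omega2))             # roughly 0.16 vs. roughly 0.00
```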
This report presents the findings of a research study, conducted by the College of the Mainland (COM) as a subcontractor for Project FOLLOW-UP, designed to test the accuracy of random sampling and to measure non-response bias in mail surveys. In 1975, a computer-generated random sample of 500 students was drawn from a population of 1,256 students who had attended COM in the spring of 1972. A 48% response to a follow-up survey of the sample was achieved. A random subsample of 70 non-respondents...
Topics: ERIC Archive, Community Colleges, Followup Studies, Institutional Research, Questionnaires,...
In the social sciences, evaluating the effectiveness of a program or intervention often leads researchers to draw causal inferences from observational research designs. Bias in estimated causal effects becomes an obvious problem in such settings. This paper presents the Heckman Model as an approach sometimes applied to observational data for the purpose of estimating an unbiased causal effect. The paper shows how the Heckman model can be viewed as an extension of the linear regression model,...
Topics: ERIC Archive, Causal Models, College Entrance Examinations, Program Effectiveness, Regression...
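A bare-bones version of the two-step Heckman estimator makes the "extension of linear regression" point concrete: a probit selection equation supplies an inverse Mills ratio, which then enters the outcome regression as an additional control. Everything in the sketch, the data-generating values included, is illustrative.

```python
# Two-step Heckman selection correction on simulated data.
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(4)
n = 20_000
z = rng.normal(size=n)                        # selection-equation covariate
x = rng.normal(size=n)                        # outcome-equation covariate
e_sel = rng.normal(size=n)
e_out = 0.6 * e_sel + rng.normal(size=n)      # correlated errors cause the bias
selected = (0.5 + 1.0 * z - 1.0 * x + e_sel) > 0
y = 1.0 + 2.0 * x + e_out                     # true slope on x is 2.0

# Step 1: probit selection equation, fitted by maximum likelihood.
W = np.column_stack([np.ones(n), z, x])
def probit_nll(g):
    p = norm.cdf(W @ g).clip(1e-10, 1 - 1e-10)
    return -(selected * np.log(p) + (~selected) * np.log(1 - p)).sum()
g = minimize(probit_nll, x0=np.zeros(3)).x
mills = norm.pdf(W @ g) / norm.cdf(W @ g)     # inverse Mills ratio

# Step 2: OLS on the selected sample, with the Mills ratio as a regressor.
sel = selected
naive = np.linalg.lstsq(np.column_stack([np.ones(sel.sum()), x[sel]]),
                        y[sel], rcond=None)[0]
heck = np.linalg.lstsq(np.column_stack([np.ones(sel.sum()), x[sel], mills[sel]]),
                       y[sel], rcond=None)[0]
print(naive[1], heck[1])                      # naive slope is biased; corrected is near 2.0
```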
The effects of variations in degree of range restriction and different subgroup sample sizes on the validity of several item bias detection procedures based on Item Response Theory (IRT) were investigated in a simulation study. The degree of range restriction for each of two subpopulations was varied by cutting the specified subpopulation ability distribution at different locations and retaining the upper portion of the distribution. It was found that range restriction did have an effect on the...
Topics: ERIC Archive, Computer Simulation, Item Analysis, Latent Trait Theory, Mathematical Models, Sample...
The effectiveness of Stout's procedure for assessing latent trait unidimensionality was studied. Strong empirical evidence of the utility of the statistical test in a variety of settings is provided. The procedure was modified to correct for increased bias, and a new algorithm to determine the size of assessment sub-tests was used. The following two issues were addressed via a Monte Carlo simulation: (1) the ability to approximate the nominal level of significance via the observed level of...
Topics: ERIC Archive, Algorithms, Latent Trait Theory, Monte Carlo Methods, Standardized Tests, Statistical...
Advantages and disadvantages of standard Rasch analysis computer programs are discussed. The unconditional maximum likelihood algorithm allows all observations to participate equally in determining the measures, and it allows calibrations to be obtained quickly from a data set. On the advantage side, standard Rasch programs can be used immediately, are debugged and accurate, and can report statistics that are difficult to calculate. On the disadvantage side, the user must have the correct hardware,...
Topics: ERIC Archive, Algorithms, Computer Assisted Testing, Computer Graphics, Computer Simulation, Error...
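The unconditional (joint) maximum likelihood algorithm can be sketched compactly: alternate Newton-style updates of person measures and item calibrations until the estimates settle. Production Rasch programs add bias corrections and more careful handling of extreme scores than this illustration does.

```python
# A compact, illustrative JMLE loop for the Rasch model on simulated data.
import numpy as np

rng = np.random.default_rng(5)
n_persons, n_items = 500, 20
theta_true = rng.normal(0, 1, n_persons)
delta_true = rng.normal(0, 1, n_items)
P = 1 / (1 + np.exp(-(theta_true[:, None] - delta_true[None, :])))
X = (rng.random((n_persons, n_items)) < P).astype(float)

# Zero and perfect raw scores have no finite estimates; drop them, as Rasch
# programs do before estimation.
r = X.sum(axis=1)
X = X[(r > 0) & (r < n_items)]

theta = np.zeros(X.shape[0])
delta = np.zeros(n_items)
for _ in range(100):
    p = 1 / (1 + np.exp(-(theta[:, None] - delta[None, :])))
    info = p * (1 - p)
    theta += np.clip((X - p).sum(axis=1) / info.sum(axis=1), -1, 1)   # person update
    delta -= np.clip((X - p).sum(axis=0) / info.sum(axis=0), -1, 1)   # item update
    delta -= delta.mean()                                             # anchor the scale origin
print(np.round(delta[:5], 2), np.round(delta_true[:5] - delta_true.mean(), 2))
```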
This study uses simulation examples representing three types of treatment assignment mechanisms in data generation (the random intercept and slopes setting, the random intercept setting, and a third setting with a cluster-level treatment and an individual-level outcome) in order to determine optimal procedures for reducing bias and improving precision in each of these three settings. Evaluation criteria include bias, variance, MSE, confidence interval coverage rate, and remaining sample size....
Topics: ERIC Archive, Probability, Statistical Analysis, Statistical Bias, Data Analysis, Yu, Bing, Hong,...
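One of the basic procedures evaluated in studies of this kind can be sketched as follows: estimate propensity scores with a logistic model, form inverse-probability weights, and judge the estimator by bias and MSE over replications. The single-level data-generating process below is an assumption of the illustration, not one of the study's three settings.

```python
# Inverse-probability weighting with an estimated propensity score, evaluated
# by bias and MSE over simulated replications. All settings are illustrative.
import numpy as np
from scipy.optimize import minimize

def logit_ps(W, t):
    def nll(g):
        p = (1 / (1 + np.exp(-(W @ g)))).clip(1e-6, 1 - 1e-6)
        return -(t * np.log(p) + (1 - t) * np.log(1 - p)).sum()
    g = minimize(nll, np.zeros(W.shape[1])).x
    return 1 / (1 + np.exp(-(W @ g)))

rng = np.random.default_rng(6)
est = []
for _ in range(200):
    n = 1_000
    x = rng.normal(size=n)
    t = (rng.random(n) < 1 / (1 + np.exp(-x))).astype(float)   # confounded assignment
    y = 1.0 * t + 2.0 * x + rng.normal(size=n)                 # true effect 1.0
    ps = logit_ps(np.column_stack([np.ones(n), x]), t)
    w = t / ps + (1 - t) / (1 - ps)                            # inverse-probability weights
    est.append(np.average(y[t == 1], weights=w[t == 1])
               - np.average(y[t == 0], weights=w[t == 0]))
est = np.array(est)
print("bias:", est.mean() - 1.0, "MSE:", ((est - 1.0) ** 2).mean())
```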
The Education Trust research report "Stuck Schools" suggests a framework for identifying chronically low-performing schools in need of turnaround. The study uses Maryland and Indiana to show that some low-performing schools make progress while others remain stagnant. The report has four serious problems of reliability and validity, however. First, the norm-referenced methodology guarantees "failed" schools independent of any true performance or improvement level by the...
Topics: ERIC Archive, Elementary Secondary Education, Educational Improvement, Identification, Data...
When most people think of the perks of teaching, an image that comes to mind is a shiny apple presented by a gap-toothed pupil. A recent paper by Jason Richwine of the Heritage Foundation and Andrew Biggs of the American Enterprise Institute claims that public school teachers enjoy lavish benefits that are more valuable than their base pay and twice as generous as those of private-sector workers (Richwine and Biggs 2011). According to Richwine and Biggs, this makes teachers' total compensation...
Topics: ERIC Archive, Educational Attainment, Public School Teachers, Salary Wage Differentials, Teacher...
In the Spring 2003 issue of "Harvard Educational Review," Roy Freedle stated that the SAT® is both culturally and statistically biased, and he proposed a solution to ameliorate this bias. His claims, which garnered national attention, were based on serious errors in his analysis. We begin our analyses by assessing the psychometric properties of Freedle's recommended hard half-test that he thinks should form the basis for the supplemental SAT score he proposes to report. Next we...
Topics: ERIC Archive, Test Bias, Statistical Bias, Psychometrics, College Entrance Examinations, Scoring,...
The National Household Education Survey (NHES) is a data collection system of the National Center for Education Statistics. The NHES is a telephone survey of the noninstitutionalized civilian population, with households selected through random digit dialing methods. Approximately 60,000 households are screened for each administration, and people who meet predetermined criteria are sampled for more detailed or extended interviews. This report is a continuation of research on issues related to...
Topics: ERIC Archive, Data Collection, National Surveys, Probability, Research Methodology, Responses,...
Since 1973, the National Assessment of Educational Progress (NAEP) has gathered information about levels of student proficiency in mathematics. These assessments are reported by NAEP periodically and present information on the strengths and weaknesses in students' mathematical understanding and their ability to apply that understanding in problem solving situations. This document presents the mathematics framework for the 1996 and 2000 NAEP assessments. The suggested revisions in the framework...
Topics: ERIC Archive, Educational Assessment, Elementary Secondary Education, Evaluation, Mathematics...
John J. Cannell's late 1980s "Lake Wobegon" reports suggested widespread deliberate educator manipulation of norm-referenced standardized test (NRT) administrations and results, resulting in artificial test score gains. The Cannell studies have been referenced in education research since, but as evidence that high stakes (and not cheating or lax security) cause test score inflation. This article examines that research and Cannell's data for evidence that high stakes cause test score...
Topics: ERIC Archive, Testing Programs, Achievement Gains, Standardized Tests, Norm Referenced Tests, High...
Large-scale educational surveys are low-stakes assessments of educational outcomes conducted using nationally representative samples. In these surveys, students do not receive individual scores, and the outcome of the assessment is inconsequential for respondents. The low-stakes nature of these surveys, as well as variations in average performance across countries and other factors such as different testing traditions, are contributing factors to the amount of omitted responses in these...
Topics: ERIC Archive, Item Response Theory, Educational Assessment, Data Analysis, Case Studies,...
Suggesting that James Coleman's paper on massive school desegregation reveals methodological flaws of such magnitude that they raise serious questions as to the validity of the conclusions, this paper addresses a full sequence of perceived methodological errors found in the Coleman document, but does not dismiss the conclusions based upon initial errors, no matter how cogent they may be. This approach makes the implicit assumption that, at each step, all previous operations upon the data are both...
Topics: ERIC Archive, Desegregation Effects, Desegregation Methods, Educational Policy, Educational...
Heckman's correction for regression in selected samples for predictive validity studies was applied to a large data file on 7,984 law school applicants. Data included ethnic group, sex, socioeconomic status, undergraduate degree, school, scores on the Law School Admission Test (LSAT), writing ability, undergraduate grade point average, and age. The final selection criteria were not known. Data on the 1,845 applicants who were accepted included year of entrance, sex, date of birth, undergraduate...
Topics: ERIC Archive, Admission Criteria, College Entrance Examinations, Error of Measurement, Grade...
The agreement between item response theory-based and Mantel Haenszel (MH) methods in identifying biased items on tests was studied. Data came from item responses of four spaced samples of 1,000 examinees each--two samples of 1,000 Anglo-American and two samples of 1,000 Native American students taking the New Mexico High School Proficiency Examination in 1982. In addition, a matched group analysis was conducted using a third sample of 650 Native Americans and 650 Anglo Americans. The item...
Topics: ERIC Archive, Comparative Analysis, High School Students, High Schools, Item Analysis, Latent Trait...
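The MH method itself reduces to pooling 2x2 tables across total-score strata. The sketch below builds the Mantel-Haenszel common odds ratio on simulated data with uniform DIF; all settings are illustrative.

```python
# Mantel-Haenszel common odds ratio for DIF, stratified on total score.
import numpy as np

def mh_odds_ratio(item, group, strata):
    """item: 0/1 responses; group: 0 = reference, 1 = focal; strata: matching scores."""
    num = den = 0.0
    for s in np.unique(strata):
        m = strata == s
        a = ((group == 0) & (item == 1) & m).sum()   # reference correct
        b = ((group == 0) & (item == 0) & m).sum()   # reference incorrect
        c = ((group == 1) & (item == 1) & m).sum()   # focal correct
        d = ((group == 1) & (item == 0) & m).sum()   # focal incorrect
        nt = m.sum()
        if nt:
            num += a * d / nt
            den += b * c / nt
    return num / den                                  # alpha_MH; 1.0 means no DIF

rng = np.random.default_rng(7)
n = 2_000
group = rng.integers(0, 2, n)
theta = rng.normal(0, 1, n)
other = (rng.random((n, 20)) < 1 / (1 + np.exp(-theta[:, None]))).astype(int)
# The studied item is harder for the focal group (uniform DIF of 0.5 logits).
item = (rng.random(n) < 1 / (1 + np.exp(-(theta - 0.5 * group)))).astype(int)
total = other.sum(axis=1)
print(mh_odds_ratio(item, group, total))   # well above 1.0, flagging DIF against the focal group
```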
Analysis of covariance (ANCOVA) has been recommended as one vehicle with which to evaluate special education and other intervention impacts (M. J. Taylor and M. S. Innocenti, 1993). Common misinterpretations of this methodology for these purposes are explained. These misapplications of ANCOVA include: (1) ignoring the assumption of homogeneity of regression; (2) using ANCOVA even given a lack of random assignment of subjects to groups; and (3) lack of attention to reliability of covariate...
Topics: ERIC Archive, Analysis of Covariance, Compensatory Education, Evaluation Methods, Evaluation...
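The first misapplication, ignoring homogeneity of regression, can be checked directly by testing a group-by-covariate interaction before trusting the common-slope ANCOVA model, as in the following sketch on simulated data with illustrative variable names.

```python
# Nested-model F test of ANCOVA's homogeneity-of-regression assumption.
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n = 120
group = rng.integers(0, 2, n)                 # 0 = comparison, 1 = intervention
pretest = rng.normal(50, 10, n)
# Slopes deliberately differ by group, violating homogeneity of regression.
posttest = 10 + (0.5 + 0.4 * group) * pretest + 3 * group + rng.normal(0, 5, n)

def rss(X, y):
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    return ((y - X @ b) ** 2).sum()

ones = np.ones(n)
full = np.column_stack([ones, group, pretest, group * pretest])
reduced = np.column_stack([ones, group, pretest])
f = (rss(reduced, posttest) - rss(full, posttest)) / (rss(full, posttest) / (n - 4))
p = stats.f.sf(f, 1, n - 4)
print(f, p)   # a small p-value says the common-slope ANCOVA model is inappropriate
```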
This review examines the recently released Thomas B. Fordham Institute report, "Education Olympics: The Games in Review." Published just after the completion of the 2008 Beijing Summer Olympics, Education Olympics strategically parallels the international competition by awarding gold, silver and bronze medals to top performing countries based on indicators including scores from international assessments in reading, mathematics, and science. The report contrasts American students'...
Topics: ERIC Archive, Evidence, Academic Achievement, Achievement Rating, Comparative Analysis, Comparative...
To assess the relative effectiveness of public and private school environments, researchers must distinguish between the effects of the schools' programs and the students' innate abilities. Student background variables do not appear to account for all the important differences among students attending public and private schools. This document proposes and tests a model of school selection that relates school choice to a family's assessment of public and private school quality. The assumption is...
Topics: ERIC Archive, Educational Quality, Models, Private Schools, Public Schools, Research Methodology,...
The National Household Education Survey (NHES) was conducted for the first time in 1991 as a way to collect data on the early childhood education experiences of young children and participation in adult education. Because the NHES methodology is relatively new, field tests were necessary. A large field test of approximately 15,000 households was conducted during the fall of 1989 to examine several methodological issues. This report focuses on measurement errors arising from the use of proxy...
Topics: ERIC Archive, Adolescents, Adult Education, Data Collection, Dropout Research, Dropouts,...
A project was designed to develop and test a library of cassette audiotapes for improving the technical skills of educational researchers. Fourteen outstanding researchers from diverse fields were identified, and a short instructional tape was prepared by each. Subjects of the tapes included instructional objectives for intellectual skills, sources of bias in surveys, implications for the next 20 years of change, some precepts for conducting educational research, statistical interactions,...
Topics: ERIC Archive, Audiotape Recordings, Behavioral Objectives, Educational Media, Educational Research,...
Many arguments have been made against allowing examinees to review and change their answers after completing a computer adaptive test (CAT). These arguments include: (1) increased bias; (2) decreased precision; and (3) susceptibility to test-taking strategies. Results of simulations suggest that the strength of these arguments is reduced or eliminated by using specific information item selection (SIIS), under which items are selected to meet information targets, instead of the more common...
Topics: ERIC Archive, Adaptive Testing, Algorithms, Computer Assisted Testing, Review (Reexamination),...
A report from the School Choice Demonstration Project examines issues concerning the funding formula used for the Milwaukee Parental Choice Program (MPCP). It finds that the program generates a net saving to taxpayers in Wisconsin but imposes a significant fiscal burden on taxpayers in Milwaukee. However, these findings depend significantly on how many students would have attended public school if the voucher option were not available, as well as on the actual resource requirements for those...
Topics: ERIC Archive, Funding Formulas, School Choice, Demonstration Programs, Program Effectiveness,...
A single group (SG) equating design with nearly equivalent test forms (SiGNET) design was developed by Grant (2006) to equate small volume tests. The basis of this design is that examinees take two largely overlapping test forms within a single administration. The scored items for the operational form are divided into mini-tests called testlets. An additional testlet is created but not scored for the first form. If the scored testlets are Testlets 1-6 and the unscored testlet is Testlet 7, then...
Topics: ERIC Archive, Data Collection, Equated Scores, Item Sampling, Sample Size, Test Format, Test...
This simulation study focused on the power of detecting group differences in linear growth trajectory parameters within the framework of structural equation modeling (SEM) and compared this approach with the more traditional repeated measures analysis of variance (ANOVA) approach. Three broad conditions of group differences in linear growth trajectory were considered. SEM latent growth modeling consistently showed higher statistical power for detecting group differences in the linear growth...
Topics: ERIC Archive, Analysis of Variance, Groups, Power (Statistics), Sample Size, Simulation,...
Federal assistance for special educational programs makes necessary the regular study of evaluations of thousands of innovations in compensatory education, bilingual education, and reading programs. The results are reported to the President and to Congress. However, investigating organizations find only a few programs with adequate evidence and thousands with faulty evaluation designs. Some of the most common faults are discussed, with examples. There are other factors which lower hopes. If...
Topics: ERIC Archive, Achievement Gains, Analysis of Covariance, Compensatory Education, Criterion...
The effects of monetary gratuities on response rates to mail surveys have been considered in a number of studies. This meta-analysis examined: (1) the nature of the population surveyed; (2) the effects of gratuities in relation to the number of follow-ups; (3) whether the gratuity was equally effective across different populations; (4) whether the gratuity was promised or enclosed; and (5) the year of publication of the study. The bulk of the studies was done in the context of market research....
Topics: ERIC Archive, Comparative Analysis, Evaluation Methods, Mail Surveys, Meta Analysis,...
In this paper, the detection of response patterns aberrant from the Rasch model is considered. For this purpose, a new person fit index recently developed by I. W. Molenaar (1987) and an iterative estimation procedure are used in a simulation study of Rasch model data mixed with aberrant data. Three kinds of aberrant response behavior are considered: (1) guessing to complete the test; (2) guessing in accordance with the three-parameter logistic model; and (3) responding with different...
Topics: ERIC Archive, Computer Assisted Testing, Computer Simulation, Difficulty Level, Estimation...
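For readers who want the flavor of a person-fit index, the sketch below computes the standardized log-likelihood statistic lz (Drasgow et al.), which is related in spirit to, though not the same as, the Molenaar index used in the study. Large negative values flag aberrant response patterns.

```python
# The lz person-fit statistic under the Rasch model; data are illustrative.
import numpy as np

def lz(u, theta, b):
    """u: 0/1 responses; theta: person estimate; b: Rasch item difficulties."""
    p = 1 / (1 + np.exp(-(theta - b)))
    l0 = np.sum(u * np.log(p) + (1 - u) * np.log(1 - p))      # observed log-likelihood
    e = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))       # its expectation
    v = np.sum(p * (1 - p) * np.log(p / (1 - p)) ** 2)        # its variance
    return (l0 - e) / np.sqrt(v)

b = np.linspace(-2, 2, 10)
normal = np.array([1, 1, 1, 1, 1, 0, 1, 0, 0, 0])   # Guttman-like, fits theta near 0
guessy = np.array([0, 0, 1, 0, 0, 1, 0, 1, 1, 1])   # misses easy items, hits hard ones
print(lz(normal, 0.0, b), lz(guessy, 0.0, b))       # the second is strongly negative
```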
The purpose of this study is to compare, through Monte Carlo simulation, several propensity score methods for approximating a factorial experimental design, and to identify the best approaches for reducing bias and mean square error of the parameter estimates of the main and interaction effects of two factors. Previous studies focused more on unbiased estimates of the effects of one factor, or the effects of one factor by the subgroups of another factor. The approaches for the unbiased estimates of the main and...
Topics: ERIC Archive, Research Design, Probability, Monte Carlo Methods, Simulation, Scores, Computation,...
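One simple way to approximate a two-factor factorial design with propensity scores, assuming the factors are conditionally independent given the covariates, is to weight each observation by the inverse of its joint assignment probability. The sketch below illustrates that idea; it is not necessarily the best-performing approach identified by the study.

```python
# Joint inverse-probability weights for a 2x2 factorial approximation.
import numpy as np
from scipy.optimize import minimize

def fit_ps(W, t):
    nll = lambda g: -np.sum(t * (W @ g) - np.logaddexp(0, W @ g))   # logistic log-lik
    g = minimize(nll, np.zeros(W.shape[1])).x
    return 1 / (1 + np.exp(-(W @ g)))

rng = np.random.default_rng(10)
n = 4_000
x = rng.normal(size=n)
fa = (rng.random(n) < 1 / (1 + np.exp(-0.8 * x))).astype(float)   # factor A
fb = (rng.random(n) < 1 / (1 + np.exp(0.5 * x))).astype(float)    # factor B
y = 1.0 * fa + 0.5 * fb + 0.25 * fa * fb + x + rng.normal(size=n)

W = np.column_stack([np.ones(n), x])
pa, pb = fit_ps(W, fa), fit_ps(W, fb)
w = 1 / ((fa * pa + (1 - fa) * (1 - pa)) * (fb * pb + (1 - fb) * (1 - pb)))

# Weighted cell means recover the 2x2 factorial structure.
cells = {(i, j): np.average(y[(fa == i) & (fb == j)],
                            weights=w[(fa == i) & (fb == j)])
         for i in (0, 1) for j in (0, 1)}
interaction = cells[1, 1] - cells[1, 0] - cells[0, 1] + cells[0, 0]
print(round(interaction, 2))   # close to the true interaction effect, 0.25
```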
Among the most popular techniques used to estimate item response theory (IRT) parameters are those used in the LOGIST and BILOG computer programs. Because of its accuracy with smaller sample sizes or differing test lengths, BILOG has become the standard to which new estimation programs are compared. However, BILOG is still complex and labor-intensive, and the sample sizes required are still rather large. For this reason, J. Ramsay developed the program TESTGRAF (1989), which uses nonparametric...
Topics: ERIC Archive, Comparative Analysis, Effect Size, Estimation (Mathematics), Item Response Theory,...
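The nonparametric idea behind TESTGRAF-style estimation can be sketched briefly: smooth each item's responses against a rank-based ability proxy with a Gaussian kernel, yielding an item characteristic curve without assuming a parametric IRT form. The details below are illustrative rather than Ramsay's exact algorithm.

```python
# Kernel-smoothed item characteristic curve from a rank-based ability proxy.
import numpy as np
from scipy.stats import norm

def kernel_icc(theta_proxy, u, grid, h=0.3):
    """Nadaraya-Watson estimate of P(correct | theta) on a grid of points."""
    w = np.exp(-0.5 * ((grid[:, None] - theta_proxy[None, :]) / h) ** 2)
    return (w * u).sum(axis=1) / w.sum(axis=1)

rng = np.random.default_rng(9)
n = 2_000
theta = rng.normal(0, 1, n)
X = (rng.random((n, 25))
     < 1 / (1 + np.exp(-(theta[:, None] - np.linspace(-2, 2, 25))))).astype(float)
# Rank-based ability proxy: total-score ranks mapped to normal quantiles.
proxy = norm.ppf((X.sum(axis=1).argsort().argsort() + 0.5) / n)
grid = np.linspace(-2, 2, 9)
print(np.round(kernel_icc(proxy, X[:, 12], grid), 2))   # a rising curve for item 13
```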
The present paper reviews the techniques commonly used to correct an observed correlation coefficient for the simultaneous influence of attenuation and range restriction effects. It is noted that the procedure which is currently in use may be somewhat biased because it treats range restriction and attenuation as independent restrictive influences. Subsequently, an equation was derived which circumvents this difficulty and provides a more general solution to the problem of estimating the true...
Topics: ERIC Archive, Correlation, Measurement Techniques, Psychometrics, Research Problems, Statistical...
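The conventional procedure the paper critiques chains two classical corrections, Spearman's correction for attenuation and Thorndike's Case II for range restriction, as if the two influences were independent. The sketch below shows that naive chaining with illustrative values; the paper's derived equation replaces it, and in practice the order of the corrections and whether the reliabilities come from the restricted or unrestricted group also matter.

```python
# The two classical corrections, applied in sequence (the procedure at issue).
import math

def correct_range_restriction(r, U):
    """Thorndike Case II; U = unrestricted SD / restricted SD of the predictor."""
    return U * r / math.sqrt(1 + r**2 * (U**2 - 1))

def correct_attenuation(r, rxx, ryy):
    """Spearman's correction: divide by the square root of the reliabilities."""
    return r / math.sqrt(rxx * ryy)

r_obs, rxx, ryy, U = 0.30, 0.80, 0.90, 1.5   # illustrative values
step1 = correct_attenuation(r_obs, rxx, ryy)
print(correct_range_restriction(step1, U))   # the chained "true" correlation estimate
```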