A simulation study is presented to evaluate and compare three methods for estimating the variance of the estimates of the parameters "d'" and "c" of signal detection theory (SDT). Several methods have been proposed to calculate the variance of their estimators, "d'" and "c." These methods have mostly been assessed in simulation studies by comparing the empirical means and variances with the calculations done with the parametric values of the probabilities of giving a...
Topics: ERIC Archive, Evaluation Methods, Theories, Simulation, Statistical Analysis, Stimuli, Maximum...
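As an illustration of the quantities at issue in the abstract above, here is a minimal sketch, assuming the standard equal-variance Gaussian SDT model, of the point estimates of d' and c together with a standard delta-method variance approximation; the three estimators compared in the study itself may differ.

```python
# Sketch: point estimates and delta-method variances for d' and c
# under the equal-variance Gaussian SDT model (illustrative only).
from scipy.stats import norm

def sdt_estimates(hits, n_signal, false_alarms, n_noise):
    H = hits / n_signal          # hit rate
    F = false_alarms / n_noise   # false-alarm rate
    zH, zF = norm.ppf(H), norm.ppf(F)
    d_prime = zH - zF
    c = -0.5 * (zH + zF)
    # Var(z(p)) ~= p(1-p) / (n * phi(z(p))^2), by the delta method
    vH = H * (1 - H) / (n_signal * norm.pdf(zH) ** 2)
    vF = F * (1 - F) / (n_noise * norm.pdf(zF) ** 2)
    return d_prime, c, vH + vF, 0.25 * (vH + vF)  # d', c, Var(d'), Var(c)

print(sdt_estimates(hits=80, n_signal=100, false_alarms=20, n_noise=100))
```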
Multilevel models (MLMs) have proven very useful in social science research, as data from a variety of sources are sampled such that individuals at level-1 are nested within clusters such as schools, hospitals, counseling centers, and business entities at level-2. MLMs using restricted maximum likelihood estimation (REML) provide researchers with accurate estimates of parameters and standard errors at all levels of the data when the assumption of normality is met, and outliers...
Topics: ERIC Archive, Hierarchical Linear Modeling, Comparative Analysis, Computation, Robustness...
This article presents a method for addressing the self-selection bias of students who participate in learning communities (LCs). More specifically, this research utilizes equivalent comparison groups based on selected incoming characteristics of students, known as bootstraps, to account for self-selection bias. To address the differences in academic preparedness in the fall 2012 cohort, three stratified random samples of students were drawn from the non-LC population to match the LC cohort in...
Topics: ERIC Archive, College Freshmen, First Year Seminars, Student Participation, Communities of...
The purpose of this study was to empirically evaluate the impact of loglinear presmoothing accuracy on equating bias and variability across chained and post-stratification equating methods, kernel and percentile-rank continuization methods, and sample sizes. The results of evaluating presmoothing on equating accuracy generally agreed with those of previous presmoothing studies, suggesting that less parameterized presmoothing models are more biased and less variable than highly parameterized...
Topics: ERIC Archive, Equated Scores, Statistical Analysis, Accuracy, Sample Size, Statistical Bias, Error...
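For readers unfamiliar with loglinear presmoothing, the sketch below fits a degree-C polynomial loglinear model to a univariate score distribution by Poisson maximum likelihood; the function name and the toy frequencies are illustrative, not taken from the study.

```python
# Sketch: polynomial loglinear presmoothing of a score distribution,
# as applied before equating (illustrative data and naming).
import numpy as np
import statsmodels.api as sm

def presmooth(freqs, degree=3):
    """Fit log f(x) = b0 + b1*x + ... + bC*x^C by Poisson ML.

    A model of degree C preserves the first C moments of the
    observed score distribution."""
    scores = np.arange(len(freqs), dtype=float)
    X = sm.add_constant(np.column_stack(
        [scores ** p for p in range(1, degree + 1)]))
    fit = sm.GLM(freqs, X, family=sm.families.Poisson()).fit()
    return fit.fittedvalues  # smoothed frequencies

raw = np.array([2, 5, 11, 20, 31, 38, 35, 24, 14, 6, 3])
print(np.round(presmooth(raw, degree=3), 1))
```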
In practical applications of item response theory (IRT), item parameters are usually estimated first from a calibration sample. After treating these estimates as fixed and known, ability parameters are then estimated. However, the statistical inferences based on the estimated abilities can be misleading if the uncertainty of the item parameter estimates is ignored. Instead, estimated item parameters can be regarded as covariates measured with error. Along the line of this...
Topics: ERIC Archive, Item Response Theory, Ability, Error of Measurement, Maximum Likelihood Statistics,...
This study evaluated the impact of unequal reliability on test equating methods in the nonequivalent groups with anchor test (NEAT) design. Classical true score-based models were compared in terms of their assumptions about how reliability impacts test scores. These models were related to treatment of population ability differences by different NEAT equating methods. A score model was then developed based on the most important features of the reviewed score models and used to study reliability...
Topics: ERIC Archive, Reliability, Equated Scores, Test Items, Statistical Analysis, True Scores,...
In this paper, a data perturbation method for minimizing the possibility of disclosure of participants' identities on a survey is described in the context of the National Assessment of Educational Progress (NAEP). The method distinguishes itself from most approaches because of the presence of cognitive tasks. Hence, a data edit should have minimal impact on both relations among demographic variables and relations between demographic and proficiency variables. Furthermore, since only a few...
Topics: ERIC Archive, Student Surveys, Risk, National Competency Tests, Data Analysis, Data Collection,...
A multitude of methods has been proposed to estimate the sampling variance of ratio estimates in complex samples (Wolter, 1985). Hansen and Tepping (1985) studied some of those variance estimators and found that a high coefficient of variation (CV) of the denominator of a ratio estimate is indicative of a biased estimate of the standard error of a ratio estimate. Using the same populations, Kovar (1985) and Kovar, Rao, and Wu (1988) repeated the research and showed that the relation between a...
Topics: ERIC Archive, Statistical Analysis, Computation, Sampling, Statistical Bias, National Competency...
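A brief sketch of a Taylor-linearization variance estimator for a ratio estimate (one of the families reviewed in Wolter, 1985), together with the coefficient of variation of the denominator that Hansen and Tepping (1985) flag as diagnostic; the data are simulated for illustration only.

```python
# Sketch: Taylor-linearization variance estimate for a ratio
# R = ybar/xbar from a simple random sample (illustrative data).
import numpy as np

rng = np.random.default_rng(0)
y = rng.gamma(4.0, 2.0, size=200)
x = y + rng.normal(0.0, 1.0, size=200)

n = len(x)
R = y.mean() / x.mean()
z = y - R * x                      # linearized residuals
var_R = z.var(ddof=1) / (n * x.mean() ** 2)
cv_denom = x.std(ddof=1) / (np.sqrt(n) * x.mean())  # CV of the denominator

print(R, np.sqrt(var_R), cv_denom)
```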
This study addresses the sampling error and linking bias that occur with small and unrepresentative samples in a non-equivalent groups anchor test (NEAT) design. We propose a linking method called the "synthetic function," which is a weighted average of the identity function (the trivial equating function for forms that are known to be completely parallel) and a traditional equating function (in this case, the chained linear equating function) used in the normal case in which forms are...
Topics: ERIC Archive, Equated Scores, Sample Size, Test Items, Statistical Bias, Comparative Analysis, Test...
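The synthetic function itself is simple to state; a minimal sketch follows, with placeholder slope and intercept standing in for a chained linear equating function estimated through the anchor test.

```python
# Sketch of the synthetic linking function: a weighted average of
# the identity function and a chained linear equating function.
def chained_linear(x, slope=1.05, intercept=-1.2):
    # Placeholder values; in practice the slope and intercept are
    # estimated by chained linear equating through the anchor test.
    return slope * x + intercept

def synthetic(x, w):
    """w = 1 reproduces the identity; w = 0 the traditional equating."""
    return w * x + (1.0 - w) * chained_linear(x)

for w in (0.0, 0.5, 1.0):
    print(w, synthetic(20, w))
```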
Lord's bias function and the weighted likelihood estimation method are effective in reducing the bias of the maximum likelihood estimate of an examinee's ability under the assumption that the true item parameters are known. This paper presents simulation studies to determine the effectiveness of these two methods in reducing the bias when the item parameters are unknown. The simulation results show that Lord's bias function and the weighted likelihood estimation method might not be as effective...
Topics: ERIC Archive, Statistical Bias, Maximum Likelihood Statistics, Computation, Ability, Test Items,...
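As context for the simulations described above, here is a minimal sketch of Warm's weighted likelihood estimator for the Rasch model under the baseline assumption of known item difficulties, the very assumption the study relaxes; the difficulties and response pattern are toy values.

```python
# Sketch: weighted likelihood estimate (WLE) of ability for the
# Rasch model with known item difficulties (toy values only).
import numpy as np

def wle_rasch(responses, b, theta=0.0, iters=50):
    """Solve  sum(u - P) + I'(theta) / (2 I(theta)) = 0  by Newton steps."""
    u = np.asarray(responses, dtype=float)
    for _ in range(iters):
        P = 1.0 / (1.0 + np.exp(-(theta - b)))
        info = np.sum(P * (1 - P))                  # test information
        d_info = np.sum(P * (1 - P) * (1 - 2 * P))  # its derivative
        g = np.sum(u - P) + d_info / (2 * info)     # weighted score function
        theta += g / info                           # quasi-Newton step
        if abs(g) < 1e-8:
            break
    return theta

b = np.array([-1.5, -0.5, 0.0, 0.5, 1.5])
print(wle_rasch([1, 1, 1, 0, 0], b))
```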
The method of maximum likelihood is typically applied to item response theory (IRT) models when the ability parameter is estimated while conditioning on the true item parameters. In practice, the item parameters are unknown and need to be estimated first from a calibration sample. Lewis (1985) and Zhang and Lu (2007) proposed the expected response functions (ERFs) and the corrected weighted-likelihood estimator (CWLE), respectively, to take into account the uncertainty regarding item parameters...
Topics: ERIC Archive, Item Response Theory, Comparative Analysis, Computation, Ability, Test Items, Test...
The single group (SG) equating design with nearly equivalent test forms (SiGNET) was developed by Grant (2006) to equate small-volume tests. The basis of this design is that examinees take two largely overlapping test forms within a single administration. The scored items for the operational form are divided into mini-tests called testlets. An additional testlet is created but not scored for the first form. If the scored testlets are Testlets 1-6 and the unscored testlet is Testlet 7, then...
Topics: ERIC Archive, Data Collection, Equated Scores, Item Sampling, Sample Size, Test Format, Test...
In the Spring 2003 issue of "Harvard Educational Review," Roy Freedle stated that the SAT® is both culturally and statistically biased, and he proposed a solution to ameliorate this bias. His claims, which garnered national attention, were based on serious errors in his analysis. We begin our analyses by assessing the psychometric properties of Freedle's recommended hard half-test that he thinks should form the basis for the supplemental SAT score he proposes to report. Next we...
Topics: ERIC Archive, Test Bias, Statistical Bias, Psychometrics, College Entrance Examinations, Scoring,...
Criteria for prediction of multinomial responses are examined in terms of estimation bias. Logarithmic penalty and least squares are quite similar in behavior but quite different from maximum probability. The differences ultimately reflect deficiencies in the behavior of the criterion of maximum probability.
Topics: ERIC Archive, Probability, Prediction, Classification, Computation, Statistical Bias, Least Squares...
The purpose of this inquiry was to investigate the effectiveness of item response theory (IRT) proficiency estimators in terms of estimation bias and error under multistage testing (MST). We chose a 2-stage MST design in which 1 adaptation to the examinees' ability levels takes place. It includes 4 modules (1 at Stage 1, 3 at Stage 2) and 3 paths (low, middle, and high). When creating 2-stage MST panels (i.e., forms), we manipulated 2 assembly conditions in each module, such as difficulty level...
Topics: ERIC Archive, Item Response Theory, Computation, Statistical Bias, Error of Measurement, Difficulty...
Since there is no standard national Pre and Post Test for Principles of Finance, akin to the one for Economics, the authors created one by selecting questions from previously administered examinations. The Cronbach's alpha of 0.851, exceeding the minimum of 0.70 for a reliable pen-and-paper test, indicates that our Test can detect differences in learning outcomes. Improvements between Pre and Post Test scores, statistically significant at the 1% level, in the entire sample and within different...
Topics: ERIC Archive, Finance Occupations, Business Administration Education, Educational Principles,...
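The reliability figure cited above is Cronbach's alpha; a minimal sketch of its computation follows, with simulated item responses standing in for the authors' test data.

```python
# Sketch: Cronbach's alpha, the statistic behind the 0.851
# reliability figure above (item data here are simulated).
import numpy as np

def cronbach_alpha(items):
    """items: (n_examinees, k_items) array of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()     # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)       # total-score variance
    return k / (k - 1) * (1 - item_vars / total_var)

rng = np.random.default_rng(1)
ability = rng.normal(size=500)
items = (ability[:, None] + rng.normal(size=(500, 12)) > 0).astype(int)
print(round(cronbach_alpha(items), 3))
```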
The surgical theatre educational environment measures STEEM, OREEM, and mini-STEEM for students (student-STEEM) contain a hitherto disregarded systematic overestimation (OE) due to inaccurate percentage calculation. The aim of the present study was to investigate the magnitude of this systematic bias and to suggest a correction for it. After an initial theoretical exploration of the problem, published scores were retrieved from the literature and corrected using statistical theorems....
Topics: ERIC Archive, Educational Environment, Scores, Grade Prediction, Academic Standards, Scoring,...
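The abstract does not reproduce the faulty percentage formula. Assuming the common error of dividing the mean of a 1-to-max rating scale by the maximum, instead of rescaling the 1-to-max range onto 0-100, the overestimation looks like this (an assumption for illustration; the paper's exact formulas may differ):

```python
# Sketch of a percentage correction for 1..max rating scales,
# assuming the naive formula mean/max*100 is the source of the bias.
def naive_percent(mean_score, max_score=5):
    return mean_score / max_score * 100.0

def corrected_percent(mean_score, max_score=5, min_score=1):
    return (mean_score - min_score) / (max_score - min_score) * 100.0

m = 3.8
print(naive_percent(m), corrected_percent(m))  # 76.0 vs 70.0
```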
This paper examines sources of potential bias in systematic reviews and meta-analyses which can distort their findings, leading to problems with interpretation and application by practitioners and policymakers. It follows from an article that was published in the "Canadian Journal of Communication" in 1990, "Integrating Research into Instructional Practice: The Use and Abuse of Meta-analysis," which introduced meta-analysis as a means for estimating population parameters and...
Topics: ERIC Archive, Meta Analysis, Statistical Bias, Data Interpretation, Accuracy, Research Problems,...
The initial state parameters s₀ and w₀ are intricate issues in the averaging cognitive models of Information Integration Theory. Usually they are defined as a measure of prior information (Anderson, 1981; 1982), but there are no general rules for dealing with them. In fact, there is no agreement as to their treatment except in specific situations, such as linear models, where they can be merged with the arbitrary zero of the inter-response scale C₀. We present some...
Topics: ERIC Archive, Models, Psychometrics, Experimental Psychology, Measurement Techniques, Statistical...
Six rater agreement measures obtained using three different approaches were compared by means of a simulation study. The rater coefficients Bennett's σ (1954), Scott's π (1955), Cohen's κ (1960), and Gwet's γ (2008) were selected to represent the classical, descriptive approach; the α agreement parameter from Aickin (1990) to represent the loglinear and mixture model approaches; and the Δ measure from Martin and Femia (2004) to represent...
Topics: ERIC Archive, Interrater Reliability, Measurement, Comparative Analysis, Statistical Analysis,...
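Three of the classical coefficients named above can be computed from a single two-rater contingency table; the sketch below does so for Bennett's σ, Scott's π, and Cohen's κ with toy counts (the study's simulation conditions are more elaborate).

```python
# Sketch: Bennett's sigma, Scott's pi, and Cohen's kappa from a
# square two-rater contingency table of counts (toy data).
import numpy as np

def agreement_coefficients(table):
    t = np.asarray(table, dtype=float)
    n, q = t.sum(), t.shape[0]
    po = np.trace(t) / n                  # observed agreement
    p1, p2 = t.sum(axis=1) / n, t.sum(axis=0) / n
    pe_sigma = 1.0 / q                    # uniform chance agreement
    pe_pi = np.sum(((p1 + p2) / 2) ** 2)  # pooled marginals
    pe_kappa = np.sum(p1 * p2)            # rater-specific marginals
    coef = lambda pe: (po - pe) / (1 - pe)
    return coef(pe_sigma), coef(pe_pi), coef(pe_kappa)

print(agreement_coefficients([[40, 5], [10, 45]]))
```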
Regression, weighting and related approaches to estimating a population mean from a sample with nonrandom missing data often rely on the assumption that conditional on covariates, observed samples can be treated as random. Standard methods using this assumption generally will fail to yield consistent estimators when covariates are measured with error. We review approaches to consistent estimation of a population mean of an incompletely observed variable using error-prone covariates, noting...
Topics: ERIC Archive, Simulation, Computation, Statistical Analysis, Statistical Bias, Regression...
Evaluators of education interventions are increasingly designing studies to detect impacts much smaller than the 0.20 standard deviations that Cohen (1988) characterized as "small." While the need to detect smaller impacts is based on compelling arguments that such impacts are substantively meaningful, the drive to detect smaller impacts may create a new challenge for researchers: the need to guard against smaller biases. The purpose of this paper is twofold. First, we examine the...
Topics: ERIC Archive, Intervention, Educational Research, Research Problems, Statistical Bias, Statistical...
This report presents findings on crime and violence in U.S. public schools, using data from the 2015-16 School Survey on Crime and Safety (SSOCS:2016). First administered in school year 1999-2000 and repeated in school years 2003-04, 2005-06, 2007-08, 2009-10, and 2015-16, SSOCS provides information on school crime-related topics from the perspective of schools. Developed and managed by the National Center for Education Statistics (NCES) within the Institute of Education Sciences of the U.S....
Topics: ERIC Archive, Crime, Violence, Discipline, School Safety, Public Schools, School Surveys, National...
Poor quality early childhood education and care (ECEC) can be detrimental to the development of children as it could lead to poor social, emotional, educational, health, economic, and behavioural outcomes. The lack of consensus as to the strength of the relationship between teacher qualification and the quality of the early childhood learning environment has made it difficult for policymakers and educational practitioners alike to settle on strategies that would enhance the learning outcomes...
Topics: ERIC Archive, Early Childhood Education, Teacher Qualifications, Educational Environment,...
The research reported here uses a pre/post-test model and stimulated recall interviews to assess the statistical reasoning about comparing distributions of teachers enrolled in a graduate-level statistics education course. We discuss key aspects of the course design aimed at improving teachers' learning and teaching of statistics, and the different ways of reasoning about comparing distributions that teachers exhibited before and after the course.
Topics: ERIC Archive, Faculty Development, Thinking Skills, Graduate Students, Statistics, Instructional...
Observational studies are common in educational research, where subjects self-select or are otherwise non-randomly assigned to different interventions (e.g., educational programs, grade retention, special education). Unbiased estimation of a causal effect with observational data depends crucially on the assumption of ignorability, which specifies that potential outcomes under different treatment conditions are independent of treatment assignment, given the observed covariates. The primary goals...
Topics: ERIC Archive, Computation, Influences, Observation, Data, Selection, Simulation, Methods,...
This report is the methodology report for the National Longitudinal Study of the High School Class of 1972 follow-up in 1986. The fifth follow-up survey of the National Longitudinal Study of the High School Class of 1972 (NLS-72) took place during spring and summer of 1986. A mail questionnaire was sent to a subsample of 14,489 members of the original sample of 22,652. A total of 12,841 persons returned the questionnaire, for a response rate of 89 percent. By the time of the survey, the sample...
Topics: ERIC Archive, Longitudinal Studies, High Schools, National Surveys, Annual Reports, Questionnaires,...
We explore the use of instrumental variables (IV) analysis with a multi-site randomized trial to estimate the effect of a mediating variable on an outcome in cases where it can be assumed that the observed mediator is the only mechanism linking treatment assignment to outcomes, an assumption known in the instrumental variables literature as the exclusion restriction. We use a random-coefficient IV model that allows both the impact of program assignment on the mediator (compliance with...
Topics: ERIC Archive, Statistical Bias, Statistical Analysis, Least Squares Statistics, Sampling,...
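A minimal two-stage least squares sketch conveys the core idea: random assignment Z instruments the mediator M under the exclusion restriction. The data are simulated, and the study's random-coefficient IV model is considerably richer.

```python
# Sketch: two-stage least squares with random assignment Z as the
# instrument for a mediator M, under the exclusion restriction.
import numpy as np

rng = np.random.default_rng(2)
n = 5000
Z = rng.integers(0, 2, n)                   # random assignment
u = rng.normal(size=n)                      # unobserved confounder
M = 0.5 * Z + 0.8 * u + rng.normal(size=n)  # mediator (compliance)
Y = 1.0 * M + 0.8 * u + rng.normal(size=n)  # outcome; true effect = 1.0

# Stage 1: project the mediator on the instrument.
X1 = np.column_stack([np.ones(n), Z])
M_hat = X1 @ np.linalg.lstsq(X1, M, rcond=None)[0]

# Stage 2: regress the outcome on the fitted mediator.
X2 = np.column_stack([np.ones(n), M_hat])
beta = np.linalg.lstsq(X2, Y, rcond=None)[0]

ols = np.linalg.lstsq(np.column_stack([np.ones(n), M]), Y, rcond=None)[0]
print("IV:", beta[1], "OLS (biased):", ols[1])
```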
The increasing availability of data from multi-site randomized trials provides a potential opportunity to use instrumental variables methods to study the effects of multiple hypothesized mediators of the effect of a treatment. We derive nine assumptions needed to identify the effects of multiple mediators when using site-by-treatment interactions to generate multiple instruments. Three of these assumptions are unique to the multiple-site, multiple-mediator case: 1) the assumption that the...
Topics: ERIC Archive, Causal Models, Measures (Individuals), Research Design, Context Effect, Compliance...
In a provocative and influential paper, Jesse Rothstein (2010) finds that standard value-added models (VAMs) suggest implausible future teacher effects on past student achievement, a finding that obviously cannot be viewed as causal. This is the basis of a falsification test (the Rothstein falsification test) that appears to indicate bias in VAM estimates of current teacher contributions to student learning. Rothstein's finding is significant because there is considerable interest in using VAM...
Topics: ERIC Archive, Value Added Models, Academic Achievement, Teacher Effectiveness, Correlation,...
Although sampling has been mentioned as part of the chance and data component of the mathematics curriculum since about 1990, little research attention has been aimed specifically at school students' understanding of this descriptive area. This study considers the initial understanding of bias in sampling by 639 students in grades 3, 5, 7, and 9. Three hundred and forty-one of these students then undertook a series of lessons on chance and data with an emphasis on chance, data handling,...
Topics: ERIC Archive, Schemata (Cognition), Sampling, Mathematics Instruction, Mathematics Curriculum,...
In recent years there has been a renewed interest in understanding the levels and trends in high school graduation in the U.S. A big and influential literature has argued that the "true" high school graduation rate remains at an unsatisfactory level, and that the graduation rates for minorities (Blacks and Hispanics) are alarmingly low. In this paper we take a closer look at the different measures of high school graduation which have recently been proposed and which yield such low...
Topics: ERIC Archive, High Schools, Graduation Rate, Graduation, Educational Trends, Measures...
This article presents the author's address at the 2007 "Journal of Applied Quantitative Methods" ("JAQM") prize-awarding ceremony. The ceremony was included in the opening of the 4th International Conference on Applied Statistics, November 22, 2008, Bucharest, Romania. In the address, the author reflects on three theses that question the gnoseological and operational efficiency of quantitative methods in the social domain. The first refers to symbolical analysis, where the...
Topics: ERIC Archive, Research Methodology, Epistemology, Statistical Analysis, Statistical Studies,...
Compared to its predecessor, "Answers in the Tool Box," the preponderance of the "Toolbox Revisited" story has been on the postsecondary side of the matriculation line. Implicitly, it calls on colleges, universities, and community colleges to be a great deal more interventionary in the precollegiate world, to be more self-reflective about the paths they offer from high school through their own territories. It also calls on them both to fortify their institutional research...
Topics: ERIC Archive, High Schools, Institutional Research, Transitional Programs, Developmental Studies...
The South Australian Certificate of Education (SACE), introduced in 1992-93, is a credential and formal qualification within the Australian Qualifications Framework (AQF). SACE was recently subjected to a review that led to a series of significant recommendations. These recommendations came out of a process that began with the Review Panel scrutinizing existing SACE structures for continuing validity and effectiveness. This paper critically examines claims made by the Review Panel of a...
Topics: ERIC Archive, Qualitative Research, Research Reports, Planning Commissions, Educational...
The South Australian Certificate of Education (SACE) is a credential and formal qualification within the Australian Qualifications Framework. A recent review of the SACE outlined a number of recommendations for significant changes to this certificate. These recommendations were the result of a process that began with the review panel "scrutinizing carefully [existing SACE structures for] continuing validity and effectiveness". This paper critiques the "careful examination"...
Topics: ERIC Archive, Statistical Data, Evaluation Methods, Statistical Bias, Educational Change,...
I take issue with several points in the Howleys' reanalysis (Vol. 12 No. 52 of this journal) of "High School Size: Which Works Best and for Whom?" (Lee & Smith, 1997). That the original sample of NELS schools might have underrepresented small rural public schools would not bias results, as they claim. Their assertion that our conclusions about an ideal high-school size privileged excellence over equity ignores the fact that our multilevel analyses explored the two outcomes...
Topics: ERIC Archive, Achievement Gains, Academic Achievement, School Size, High Schools, Statistical Bias,...
Most of the recent literature on the achievement effects of school size has examined school and district performance. These studies have demonstrated substantial benefits of smaller school and district size in impoverished settings. To date, however, no work has adequately examined the relationship of size and socioeconomic status (SES) with students as the unit of analysis. One study came close (Lee & Smith, 1997), but failed to adjust its analyses or conclusions to the...
Topics: ERIC Archive, Socioeconomic Status, Academic Achievement, School Size, Rural Areas, Data...
Quality assurance and evaluation research, like other fields of social research and its application, are confronted with a series of problems. In the present paper, I want first to give a list of such problems, although necessarily incomplete. It is then claimed that while there is no "perfect" solution to these problems, critical multiplism may be a set of approaches which might attenuate the problems or at least make them more visible so that one can deal with them. Critical...
Topics: ERIC Archive, Evaluation Research, Research Problems, Guidelines, Evaluation Methods, Theories,...
In the first part of the study, nine estimators of the first-order autoregressive parameter are reviewed and a new estimator is proposed. The relationships and discrepancies between the estimators are discussed in order to achieve a clear differentiation. In the second part of the study, the precision in the estimation of autocorrelation is studied. The performance of the ten lag-one autocorrelation estimators is compared in terms of Mean Square Error (combining bias and variance) using data...
Topics: ERIC Archive, Computation, Hypothesis Testing, Correlation, Monte Carlo Methods, Sampling,...
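The conventional lag-one estimator, and a Monte Carlo check of its bias, variance, and MSE for short AR(1) series, can be sketched as follows; the remaining estimators compared in the study are variations on this statistic.

```python
# Sketch: conventional lag-one autocorrelation estimator and a
# Monte Carlo estimate of its bias, variance, and MSE for short
# AR(1) series (illustrative; the study compares ten estimators).
import numpy as np

def r1(x):
    d = x - x.mean()
    return np.sum(d[:-1] * d[1:]) / np.sum(d * d)

def mc_mse(phi=0.5, n=20, reps=10000, seed=3):
    rng = np.random.default_rng(seed)
    est = np.empty(reps)
    for r in range(reps):
        x = np.empty(n)
        x[0] = rng.normal() / np.sqrt(1 - phi ** 2)  # stationary start
        for t in range(1, n):
            x[t] = phi * x[t - 1] + rng.normal()
        est[r] = r1(x)
    bias = est.mean() - phi
    return bias, est.var(), bias ** 2 + est.var()  # bias, variance, MSE

print(mc_mse())  # r1 is noticeably biased toward zero in short series
```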
A key issue in quasi-experimental studies, and in many evaluations that require a treatment-effects design (i.e., a control and an experimental group), is selection bias (Shadish et al., 2002). Selection bias refers to the selection of individuals, groups, or data for analysis such that proper randomization is not achieved, thereby ensuring that the sample obtained is not representative of the population intended to be analyzed (Shadish et al., 2002). There are many ways in which selection bias...
Topics: ERIC Archive, Quasiexperimental Design, Probability, Scores, Least Squares Statistics, Regression...
Propensity score analysis (PSA) is a methodological technique which may correct for selection bias in a quasi-experiment by modeling the selection process using observed covariates. Because logistic regression is well understood by researchers in a variety of fields and easy to implement in a number of popular software packages, it has traditionally been the most frequently used method for modeling selection in PSA. There are, however, circumstances under which logistic regression may not...
Topics: ERIC Archive, Probability, Scores, Statistical Analysis, Statistical Bias, Quasiexperimental...
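A minimal sketch of the traditional approach the abstract describes: propensity scores from logistic regression followed by 1:1 nearest-neighbor matching on the logit. The data are simulated; the circumstances under which this model breaks down are the paper's subject.

```python
# Sketch: propensity scores via logistic regression, then 1:1
# nearest-neighbor matching on the logit (illustrative data).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n = 1000
X = rng.normal(size=(n, 3))                              # observed covariates
T = rng.binomial(1, 1 / (1 + np.exp(-X @ [0.8, -0.5, 0.3])))
Y = X @ [1.0, 1.0, 0.5] + 2.0 * T + rng.normal(size=n)   # true effect = 2

ps = LogisticRegression().fit(X, T).predict_proba(X)[:, 1]
logit = np.log(ps / (1 - ps))

treated, control = np.where(T == 1)[0], np.where(T == 0)[0]
matches = control[np.abs(logit[treated][:, None] -
                         logit[control][None, :]).argmin(axis=1)]
print("matched ATT estimate:", (Y[treated] - Y[matches]).mean())
```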
A central issue in nonexperimental studies is identifying comparable individuals to remove selection bias. One common way to address this selection bias is through propensity score (PS) matching. PS methods use a model of the treatment assignment to reduce the dimensionality of the covariate space and identify comparable individuals. Parallel to the PS, recent literature has developed the prognosis score (PG) to construct models of the potential outcomes (Hansen, 2008). Whereas PSs summarize...
Topics: ERIC Archive, Probability, Scores, Statistical Bias, Prediction, Monte Carlo Methods, Kelcey, Ben
There has been an active debate in the literature over the validity of value-added models. In this study, the author tests the central assumption of value-added models that school assignment is random relative to expected test scores conditional on prior test scores, demographic variables, and other controls. He uses a Chicago charter school's lottery to identify school effects, and then compares this "experimental" estimate to that of a school value-added model, which is estimated...
Topics: ERIC Archive, Value Added Models, Charter Schools, School Effectiveness, Statistical Bias,...
The main goal of this study was to illustrate and provide some direction for dealing with the complexities of propensity score matching within different multilevel contexts. Special attention is given to how procedures typically applied in a non-hierarchical setting may be modified to properly reduce the expected bias in the estimated treatment effect of a high school-level intervention on college-level outcomes. In particular, students self-selected into a high school level intervention and...
Topics: ERIC Archive, Probability, Scores, Statistical Bias, High School Students, College Students,...
The goal of this paper is to provide guidance for applied education researchers in using multi-level data to study the effects of interventions implemented at the school level. Two primary approaches are currently employed in observational studies of the effect of school-level interventions. One approach employs intact school matching: matching schools that are implementing the treatment to schools not implementing the treatment that are similar in observable characteristics. An alternative...
Topics: ERIC Archive, Matched Groups, Intervention, Randomized Controlled Trials, Elementary Schools,...
Meeting the What Works Clearinghouse (WWC) attrition standard (or one of the attrition standards based on the WWC standard) is now an important consideration for researchers conducting studies that could potentially be reviewed by the WWC (or other evidence reviews). Understanding the basis of this standard is valuable for anyone seeking to meet existing standards and for anyone interested in adopting this approach to developing a standard (that is, combining a theoretical model with empirical...
Topics: ERIC Archive, Attrition (Research Studies), Student Attrition, Randomized Controlled Trials,...
A central goal of the education literature is to demonstrate that specific educational interventions--instructional interventions at the student or classroom level, structural interventions at the school level, or funding interventions at the school district level, for example--have a "treatment effect" on student achievement. This paper has three objectives. First, Theobald and Richardson explain both how Single World Intervention Templates (SWITs) unify two existing approaches to...
Topics: ERIC Archive, Intervention, Educational Research, Pretests Posttests, Outcome Measures,...
When randomized control trials (RCT) are not feasible, researchers seek other methods to make causal inference, e.g., propensity score methods. One of the underlying assumptions required for propensity score methods to obtain unbiased treatment effect estimates is the ignorability assumption, that is, conditional on the propensity score, treatment assignment is independent of the outcome. The purpose of this study is to use within-study comparisons to assess how well propensity score methods can...
Topics: ERIC Archive, Educational Research, Benchmarking, Statistical Analysis, Computation, Comparative...
Randomized controlled trials (RCTs) and regression discontinuity (RD) studies both provide estimates of causal effects. A major difference between the two is that RD only estimates local average treatment effects (LATE) near the cutoff point of the forcing variable. This has been cited as a drawback to RD designs (Cook & Wong, 2008). Comparisons of RCT estimates of average treatment effect (ATE) and RD estimates of LATE are rare because few studies have both randomized assignment and a...
Topics: ERIC Archive, Randomized Controlled Trials, Regression (Statistics), Research Problems, Comparative...
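For concreteness, here is a sharp-RD sketch of the LATE at the cutoff via local linear regression with a fixed bandwidth, on simulated data; an applied RD analysis would tune the bandwidth and add robustness checks.

```python
# Sketch: sharp regression discontinuity estimate of the local
# average treatment effect (LATE) at the cutoff, by local linear
# regression within a fixed bandwidth (simulated data).
import numpy as np

rng = np.random.default_rng(5)
n = 4000
running = rng.uniform(-1, 1, n)             # forcing variable, cutoff at 0
T = (running >= 0).astype(float)
Y = 0.5 * running + 1.2 * T + rng.normal(0, 0.5, n)  # true LATE = 1.2

h = 0.25                                    # bandwidth (would be tuned)
w = np.abs(running) <= h
X = np.column_stack([np.ones(n), T, running, T * running])[w]
beta = np.linalg.lstsq(X, Y[w], rcond=None)[0]
print("RD estimate of LATE at cutoff:", beta[1])
```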