Grouping is a statistical procedure through which members of the same group are considered as a single unit of observation. There are methodological and inferential problems associated with various grouping procedures in various settings. This extensive paper focuses on making inferences about individuals when the analysis uses data that are grouped over individuals (for example, school means). The paper identifies five research contexts in which grouping is used, reviews the literature on...

Topics: ERIC Archive, Correlation, Research Methodology, Sampling, Statistical Bias

Grouping is a statistical procedure through which members of the same group are considered as a single unit of observation. There are various ways to assign group membership and various ways to assign values of variables to groups. There are methodological problems associated with grouping in general and with particular methods of grouping. This paper argues that a wide variety of complex analytical problems concerning inferences from grouped observations can be understood from the use of a few...

Topics: ERIC Archive, Multiple Regression Analysis, Research Methodology, Sampling, Statistical Bias

A Monte Carlo investigation of six robust correlation estimators was conducted for data from distributions with longer than Gaussian tails: a bisquare coefficient, the Tukey correlation, the standardized sums and differences, a biweight standardized sums and differences, the transformed Spearman's rho and a bivariate trimmed Pearson. Evaluation of the estimators was based on bias and variability as measured by mean square error, and efficiency relative to the Pearson correlation coefficient....

Topics: ERIC Archive, Comparative Analysis, Correlation, Mathematical Formulas, Sample Size, Statistical...
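None of the six estimators is reproduced in the abstract, but the general rank-based strategy behind robust alternatives such as Spearman's rho is easy to sketch (illustrative Python, not the study's code): rank-transform each variable, then apply the ordinary Pearson formula, so that a long-tailed observation can only shift ranks rather than dominate the cross-products.

```python
from statistics import mean

def ranks(xs):
    """Average ranks (1-based), with ties sharing the mean rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the tied positions, 1-based
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(x, y):
    """Ordinary product-moment correlation."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def spearman(x, y):
    """Rank first, then correlate: outliers only move ranks."""
    return pearson(ranks(x), ranks(y))
```

A single wild observation (say, 100 among single-digit values) barely moves the rank-based estimate while it can noticeably depress or inflate the raw Pearson coefficient.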

The investigation focused on the effects of using grouped data to estimate the relations that exist in data on individuals. Different research contexts were identified in which researchers group observations though interested in relations among measurements on individuals. The consequences of estimating regression coefficients from grouped data were examined from a "structural equations" perspective. A simple linear regression model was hypothesized and then modified by the...

Topics: ERIC Archive, Classification, College Freshmen, Data Analysis, Induction, Prediction, Statistical...

A central issue in nonexperimental studies is identifying comparable individuals to remove selection bias. One common way to address this selection bias is through propensity score (PS) matching. PS methods use a model of the treatment assignment to reduce the dimensionality of the covariate space and identify comparable individuals. Parallel to the PS, recent literature has developed the prognosis score (PG) to construct models of the potential outcomes (Hansen, 2008). Whereas PSs summarize...

Topics: ERIC Archive, Probability, Scores, Statistical Bias, Prediction, Monte Carlo Methods, Kelcey, Ben
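As a rough illustration of the PS idea (not the authors' procedure), greedy one-to-one nearest-neighbor matching on already-estimated propensity scores can be sketched in a few lines; the unit labels and score values below are hypothetical:

```python
def nn_match(treated, controls):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, controls: dicts mapping unit id -> estimated propensity score.
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(controls)
    pairs = []
    # Match treated units in descending score order so extreme scores,
    # which have the fewest plausible matches, are handled first.
    for t_id, t_ps in sorted(treated.items(), key=lambda kv: -kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_ps))
        pairs.append((t_id, c_id))
        del available[c_id]
    return pairs

# Hypothetical scores, e.g. from a logistic model of treatment assignment.
treated = {"t1": 0.80, "t2": 0.55}
controls = {"c1": 0.78, "c2": 0.50, "c3": 0.20}
print(nn_match(treated, controls))
```

Real applications add refinements (calipers, matching with replacement, balance checks) that this sketch omits.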

An empirical investigation of alternate item nonresponse adjustment procedures in the National Longitudinal Study (NLS), which contains missing and faulty data, indicates that in some cases imputation can reduce the accuracy of survey estimates. The national sample of the high school class of 1972 was designed to provide statistics on students moving into early adulthood. The bias resulting from nonresponse and response errors is evaluated using hot deck and weighting class adjustment techniques to...

Topics: ERIC Archive, Graduate Surveys, Item Analysis, Longitudinal Studies, Statistical Analysis,...

The primary objective of this paper is to encourage survey researchers not to become overly reliant on the literature for generic solutions to non-response bias problems. In addition, the paper recounts an example of how a non-traditional approach was used to maximize the usefulness of data collected under unusual constraints and with an a priori expectation of a high rate of non-response. The author was charged with testing the ability of the National Science Foundation to conduct a biennial...

Topics: ERIC Archive, Engineers, Immigrants, National Surveys, Research Problems, Scientists, Statistical...

Aggregation, or grouping, is a statistical procedure through which all members of a study sample within a specified range of scores (usually observed scores) are assigned a common or "group" score (for example, the group mean). The various social science methodology literatures agree on the costs of grouping: not only does one always lose information in grouping, but in a wide variety of situations grouping also introduces systematic error (bias). For most educational research applications the...

Topics: ERIC Archive, Error Patterns, Multiple Regression Analysis, Research Methodology, Statistical Bias,...
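The central cost described above can be demonstrated in a few lines: replacing individuals with their group means discards within-group variance, which typically inflates the correlation relative to the individual-level value. A minimal sketch (the data and grouping are invented for illustration, not from the paper):

```python
from statistics import mean

def corr(x, y):
    """Product-moment correlation."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# Six individuals in three groups of two.
x = [1, 3, 4, 6, 7, 9]
y = [2, 1, 5, 4, 9, 7]
groups = [(0, 1), (2, 3), (4, 5)]

r_individual = corr(x, y)
# Aggregate: each group contributes its mean once.
gx = [mean([x[i] for i in g]) for g in groups]
gy = [mean([y[i] for i in g]) for g in groups]
r_grouped = corr(gx, gy)
print(r_individual, r_grouped)  # the grouped correlation is larger
```

Inferring the individual-level correlation from the grouped one here would be the classic aggregation (ecological) error.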

This study uses simulation examples representing three types of treatment assignment mechanisms in data generation (the random intercept and slopes setting, the random intercept setting, and a third setting with a cluster-level treatment and an individual-level outcome) in order to determine optimal procedures for reducing bias and improving precision in each of these three settings. Evaluation criteria include bias, variance, MSE, confidence interval coverage rate, and remaining sample size....

Topics: ERIC Archive, Probability, Statistical Analysis, Statistical Bias, Data Analysis, Yu, Bing, Hong,...

Canonical correlation analysis is illustrated and three common fallacious interpretation practices are described. Simply put, bivariate correlation is a special case of canonical correlation. Like all parametric methods, canonical analysis involves the creation of synthetic scores for each person. It presumes at least two predictor variables and at least two criterion variables. Weights, usually labelled standardized function coefficients, are applied to each individual's data to yield the synthetic variables which are...

Topics: ERIC Archive, Correlation, Multivariate Analysis, Research Problems, Statistical Bias, Statistical...

Fleiss' popular multirater kappa is known to be influenced by prevalence and bias, which can lead to the paradox of high agreement but low kappa. It also assumes that raters are restricted in how they can distribute cases across categories, which is not a typical feature of many agreement studies. In this article, a free-marginal, multirater alternative to Fleiss' multirater kappa is introduced. Free-marginal Multirater Kappa (multirater K[free]), like its birater free-marginal counterparts...

Topics: ERIC Archive, Multivariate Analysis, Statistical Distributions, Statistical Bias, Interrater...
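As usually presented, the free-marginal statistic replaces Fleiss' marginal-dependent chance term with a fixed 1/k, where k is the number of categories, while keeping the Fleiss-style computation of observed pairwise agreement. A sketch of that computation (the function name is illustrative):

```python
def multirater_kappa_free(counts, k):
    """Free-marginal multirater kappa.

    counts: one list per case, giving how many of the raters chose each
    of the k categories (each inner list sums to the number of raters,
    n >= 2). Chance agreement is fixed at 1/k because raters are assumed
    free to assign any category to any case.
    """
    p_cases = []
    for row in counts:
        n = sum(row)
        # Proportion of agreeing rater pairs for this case (Fleiss-style).
        p_cases.append((sum(c * c for c in row) - n) / (n * (n - 1)))
    p_obs = sum(p_cases) / len(p_cases)
    p_chance = 1.0 / k
    return (p_obs - p_chance) / (1.0 - p_chance)
```

With perfect agreement on every case the statistic is 1 regardless of how lopsided the category prevalences are, which is the point of the free-marginal construction.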

The feature that makes item response theory (IRT) models the models of choice for many psychometric data analysts is parameter invariance, the equality of item and examinee parameters from different populations. Using the well-known fact that item and examinee parameters are identical only up to a set of linear transformations specific to the functional form of a given IRT model, violations of these transformations for unidimensional IRT models are algebraically investigated and coefficients...

Topics: ERIC Archive, Item Response Theory, Mathematical Models, Statistical Bias, Rupp, Andre A., Zumbo,...
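For the 2PL model, for instance, the linear indeterminacy takes a concrete form: rescaling ability as theta* = A*theta + B leaves item response probabilities unchanged provided b* = A*b + B and a* = a/A. A quick numerical check (the parameter values and rescaling constants are arbitrary):

```python
import math

def p_2pl(theta, a, b):
    """2PL item response function: P(correct | theta, a, b)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Arbitrary parameters and an arbitrary linear rescaling of the theta scale.
theta, a, b = 0.7, 1.3, -0.4
A, B = 1.9, 0.25

p_original = p_2pl(theta, a, b)
p_rescaled = p_2pl(A * theta + B, a / A, A * b + B)
print(p_original, p_rescaled)  # identical up to floating-point rounding
```

Algebraically, (a/A) * ((A*theta + B) - (A*b + B)) = a * (theta - b), so the logit, and hence the probability, is invariant.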

In a provocative and influential paper, Jesse Rothstein (2010) finds that standard value-added models (VAMs) suggest implausible future teacher effects on past student achievement, a finding that obviously cannot be viewed as causal. This is the basis of a falsification test (the Rothstein falsification test) that appears to indicate bias in VAM estimates of current teacher contributions to student learning. More precisely, the falsification test is designed to identify whether or not students...

Topics: ERIC Archive, Teacher Effectiveness, Academic Achievement, Models, Statistical Bias, Computation,...

Monte Carlo methods were used to investigate the effects of removing extreme data points identified by five indices of influence. Multivariate normal data were simulated and observations were removed from samples if they exceeded the criteria suggested in the literature for each influence statistic. Factors included in the design of the Monte Carlo study were the number of regressor variables, population multiple correlation, degree of multicollinearity, and sample size. Conditions were...

Topics: ERIC Archive, Monte Carlo Methods, Multivariate Analysis, Simulation, Statistical Bias, Kromrey,...
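Cook's distance is one of the standard influence indices of this kind; for simple linear regression it can be computed directly (a sketch, not the study's code; the cutoff D greater than 1 is only one of several criteria suggested in the literature):

```python
def cooks_distance(x, y):
    """Cook's D for each point in a simple linear regression of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    p = 2  # fitted parameters: intercept and slope
    s2 = sum(e * e for e in resid) / (n - p)
    d = []
    for xi, ei in zip(x, resid):
        h = 1.0 / n + (xi - mx) ** 2 / sxx  # leverage of this point
        d.append(ei * ei / (p * s2) * h / (1.0 - h) ** 2)
    return d

# Four points near the line y = x plus one high-leverage outlier.
x = [1.0, 2.0, 3.0, 4.0, 10.0]
y = [1.0, 2.0, 3.0, 4.0, 20.0]
d = cooks_distance(x, y)
print(d)  # the last point dominates
```

Screening would then remove observations whose D exceeds the chosen criterion before refitting.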

This study used real data to construct testing conditions for comparing results of chained linear, Tucker, and Levine-observed score equatings. The comparisons were made under conditions where the new- and old-form samples were similar in ability and when they differed in ability. The length of the anchor test was also varied to enable examination of its effect on the three different equating methods. Two tests were used in the study, and the three equating methods were compared to a criterion...

Topics: ERIC Archive, Equated Scores, Comparative Analysis, Statistical Analysis, Statistical Bias, Error...

The question of how high a response rate is needed for telephone surveys to obtain data that accurately represent the entire sample was investigated by reevaluating results of three previously published studies and reporting on three 1989 studies for the first time. The three previous studies indicated that, if the sample characteristics had been estimated on the basis of respondents rather than the entire sample, the conclusions would have deviated from the true sample by 4.8%,...

Topics: ERIC Archive, Interviews, Meta Analysis, Response Rates (Questionnaires), Sampling, Statistical...

Previous studies have indicated that the reliability of test scores composed of testlets is overestimated by conventional item-based reliability estimation methods (S. Sireci, D. Thissen, and H. Wainer, 1991; H. Wainer, 1995; H. Wainer and D. Thissen, 1996; G. Lee and D. Frisbie). In light of these studies, it seems reasonable to ask whether the item-based estimation methods for the conditional standard errors of measurement (SEM) would provide underestimates for tests composed of testlets. The...

Topics: ERIC Archive, Definitions, Error of Measurement, Estimation (Mathematics), Reliability, Statistical...

A method of interpolation has been derived that should be superior to linear interpolation in computing the percentile ranks of test scores for unimodal score distributions. The superiority of the logistic interpolation over the linear interpolation is most noticeable for distributions consisting of only a small number of score intervals (say fewer than 10), particularly distributions that are relatively unskewed. Logistic interpolation thus should be useful in practical situations in which...

Topics: ERIC Archive, Comparative Analysis, Intervals, Mathematical Models, Percentage, Scores, Scoring...
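Linear interpolation of percentile ranks, the baseline the proposed logistic method improves on, amounts to assuming that the scores in each unit interval spread uniformly across it. A sketch of the conventional computation at an observed integer score (illustrative, not the paper's code):

```python
def percentile_rank(score, freq):
    """Percentile rank of an integer score under linear interpolation.

    freq maps score -> frequency; score x occupies the interval
    [x - 0.5, x + 0.5), and half of its frequency is assumed to lie
    below the point x itself (the uniform-within-interval convention
    that logistic interpolation replaces).
    """
    n = sum(freq.values())
    cf_below = sum(f for s, f in freq.items() if s < score)
    return 100.0 * (cf_below + 0.5 * freq.get(score, 0)) / n
```

A logistic alternative would replace the uniform within-interval assumption with an S-shaped cumulative curve fitted to the distribution, which matters most when there are few score intervals.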

This paper is primarily concerned with determining the statistical bias in the maximum likelihood estimate of the examinee ability parameter in item response theory, and of certain functions of such parameters. Given known item parameters, unbiased estimators are derived for (1) an examinee's ability parameter and proportion-correct true score; (2) the variance of ability parameters across examinees in the group tested, and the variance of proportion-correct true scores; and (3) the...

Topics: ERIC Archive, Estimation (Mathematics), Latent Trait Theory, Mathematical Formulas, Maximum...

Selectivity bias arises in program evaluation when the treatment or control status of the subjects is related to unmeasured characteristics that themselves are related to the program outcome under study. This situation has the potential to lead to an incorrect estimation of the treatment effect when assignment to treatment and control groups is not random. This paper adopts techniques that were recently developed in the econometric analysis of labor markets and applies these techniques in...

Topics: ERIC Archive, Control Groups, Evaluation Methods, Experimental Groups, Models, Program Evaluation,...

The relative merits of two alternative procedures for allocating items to subtests in multiple matrix sampling, and the feasibility of using the jackknife to approximate standard errors of estimate, were investigated empirically through post mortem item-examinee sampling. The results indicate clearly that a partially balanced incomplete block design is preferable to random sampling in allocating items to subtests. The jackknife was found to better approximate standard errors of estimate in the...

Topics: ERIC Archive, Error of Measurement, Item Sampling, Matrices, Sampling, Statistical Analysis,...

The primary purpose of this study was to investigate the appropriateness and implication of incorporating a testlet definition into the estimation of the conditional standard error of measurement (SEM) for tests composed of testlets. The five conditional SEM estimation methods used in this study were classified into two categories: item-based and testlet-based methods. When individual items are used as the fundamental measurement unit, the assumptions required by measurement modeling for tests...

Topics: ERIC Archive, Definitions, Error of Measurement, Estimation (Mathematics), Reliability, Statistical...

The implications of data from a review of ten years of the American Educational Research Journal (AERJ) indicating that random sampling is rare and that there is increased use of quasi-experimental designs lacking in random assignment are considered. It is suggested that tests of significance could be abandoned or at least placed in a subsidiary role as a test of one rival internal validity hypothesis (i.e., that chance caused the results) rather than being the center of data analysis. The use...

Topics: ERIC Archive, Data Analysis, Educational Research, Graduate Study, Higher Education, Sampling,...

The relationship of sample size to the number of variables in the use of factor analysis has been treated by many investigators. In attempting to explore what the minimum sample size should be, none of these investigators pointed out the constraints imposed on the dimensionality of the variables by using a sample size smaller than the number of variables. A review of studies in this area is made, as well as suggestions for resolution of the problem. (Author)

Topics: ERIC Archive, Correlation, Factor Analysis, Factor Structure, Matrices, Sample Size, Sampling,...
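The constraint in question is mechanical: with n observations, every variable's deviation vector lies in an (n-1)-dimensional space, so a correlation matrix of p greater than n variables has rank at most n-1 and cannot support p factors. A small numerical check (rank computed by Gaussian elimination; the data are invented):

```python
def matrix_rank(m, tol=1e-9):
    """Rank of a matrix via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]
    rows, cols = len(m), len(m[0])
    rank = 0
    for col in range(cols):
        pivot = max(range(rank, rows), key=lambda r: abs(m[r][col]), default=None)
        if pivot is None or abs(m[pivot][col]) < tol:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rows):
            if r != rank and abs(m[r][col]) > tol:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == rows:
            break
    return rank

def corr_matrix(data):
    """Correlation matrix of the columns of data (rows = observations)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    dev = [[row[j] - means[j] for j in range(p)] for row in data]
    sd = [sum(dev[i][j] ** 2 for i in range(n)) ** 0.5 for j in range(p)]
    return [[sum(dev[i][j] * dev[i][k] for i in range(n)) / (sd[j] * sd[k])
             for k in range(p)] for j in range(p)]

# Three observations on four variables: rank cannot exceed n - 1 = 2.
data = [[1.0, 2.0, 0.5, 3.0],
        [2.0, 1.0, 2.5, 1.0],
        [4.0, 5.0, 3.0, 6.0]]
r = corr_matrix(data)
print(matrix_rank(r))  # at most 2, despite four variables
```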

Item response theory (IRT) has been adopted as the theoretical foundation of computerized adaptive testing (CAT) for several decades. In applying IRT to CAT, there are certain considerations that are essential, and yet tend to be neglected. These essential issues are addressed in this paper, and then several ways of eliminating noise and bias in estimating the individual parameter, theta, of person "a" are proposed and discussed, so that accuracy and efficiency in ability estimation...

Topics: ERIC Archive, Ability, Adaptive Testing, Estimation (Mathematics), Item Response Theory,...

The known interval scale, referred to as the 7.8 scale, has been criticized as an invalid measuring instrument when used as an attitude scale. It is the purpose of this paper to demonstrate that this scale can produce spuriously inflated correlation coefficients, high reliability, and false significance on statistical tests. The case will be made along two general lines. First, the effects of the scale on reliability, validity, and significance testing will be presented and second the...

Topics: ERIC Archive, Attitude Measures, Predictive Validity, Statistical Bias, Statistical Significance,...

The present paper reviews the techniques commonly used to correct an observed correlation coefficient for the simultaneous influence of attenuation and range restriction effects. It is noted that the procedure which is currently in use may be somewhat biased because it treats range restriction and attenuation as independent restrictive influences. Subsequently, an equation was derived which circumvents this difficulty and provides a more general solution to the problem of estimating the true...

Topics: ERIC Archive, Correlation, Measurement Techniques, Psychometrics, Research Problems, Statistical...
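The two restrictive influences being combined each have a classical stand-alone correction; sketches of both follow (the paper's joint equation is not reproduced here, and applying these two formulas in sequence treats the effects as independent, which is precisely the practice the paper criticizes):

```python
def correct_attenuation(r_xy, rel_x, rel_y):
    """Spearman's correction for attenuation due to unreliability."""
    return r_xy / (rel_x * rel_y) ** 0.5

def correct_range_restriction(r, sd_unrestricted, sd_restricted):
    """Thorndike Case II correction for direct range restriction on x."""
    k = sd_unrestricted / sd_restricted
    return k * r / (1 + r * r * (k * k - 1)) ** 0.5
```

For example, an observed r of .30 between measures with reliabilities .81 and 1.00 disattenuates to .30 / .90 = .33, and the range-restriction correction leaves r unchanged when the restricted and unrestricted standard deviations are equal.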

Model-based methods for estimating the population mean in stratified clustered sampling are described. The importance of adjusting the weights is assessed by an approach considering the sampling variation of the adjusted weights and its (variance) components. The resulting estimators are more efficient than the jackknife estimators for a variety of datasets obtained from the 1990 Mathematics Trial State Assessment of the National Assessment of Educational Progress (NAEP). The methods can be...

Topics: ERIC Archive, Data Analysis, Estimation (Mathematics), Models, National Surveys, Regression...

Problems associated with low response rates to surveys are considered, drawing from the literature on the methodology of survey research. A series of analyses are presented which were designed to examine the efficacy of Astin and Molm's procedure to adjust for nonresponse biases. Data were obtained from the Cooperative Institutional Research Program survey of 1987 incoming students and the 1991 followup survey. Data were analyzed for 209,627 students at 390 institutions. After separating the...

Topics: ERIC Archive, Followup Studies, Higher Education, Institutional Research, Longitudinal Studies,...

Eta-Squared (ES) is often used as a measure of strength of association of an effect, a measure often associated with effect size. It is also considered the proportion of total variance accounted for by an independent variable. It is simple to compute and interpret. However, it has one critical weakness cited by several authors (C. Huberty, 1994; P. Snyder and S. Lawson, 1993; and T. Snijders, 1996), and that is a sampling bias that leads to an inflated judgment of true effect. The purpose of...

Topics: ERIC Archive, Effect Size, Monte Carlo Methods, Sampling, Statistical Bias, Barnette, J. Jackson,...
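For a one-way between-subjects design, eta-squared and a common bias-adjusted alternative, omega-squared, can be computed side by side (a sketch with invented data; omega-squared is one of several corrections discussed in this literature):

```python
def eta_omega_squared(groups):
    """Eta-squared and omega-squared for a one-way between-subjects design.

    groups: list of lists of scores, one inner list per group.
    """
    all_scores = [x for g in groups for x in g]
    n = len(all_scores)
    k = len(groups)
    grand = sum(all_scores) / n
    ss_total = sum((x - grand) ** 2 for x in all_scores)
    ss_between = sum(len(g) * ((sum(g) / len(g)) - grand) ** 2 for g in groups)
    ss_within = ss_total - ss_between
    ms_within = ss_within / (n - k)
    eta2 = ss_between / ss_total
    # Omega-squared subtracts the expected chance contribution to SS_between.
    omega2 = (ss_between - (k - 1) * ms_within) / (ss_total + ms_within)
    return eta2, omega2

groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
eta2, omega2 = eta_omega_squared(groups)
print(eta2, omega2)  # omega-squared is the smaller, less inflated value
```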

Criteria for prediction of multinomial responses are examined in terms of estimation bias. Logarithmic penalty and least squares are quite similar in behavior but quite different from maximum probability. The differences ultimately reflect deficiencies in the behavior of the criterion of maximum probability.

Topics: ERIC Archive, Probability, Prediction, Classification, Computation, Statistical Bias, Least Squares...
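The three criteria compared can be written as scoring rules over a predicted probability vector p and an observed category y (a sketch with illustrative names, not the paper's notation):

```python
import math

def log_penalty(p, y):
    """Logarithmic penalty: minus the log probability assigned to y."""
    return -math.log(p[y])

def least_squares(p, y):
    """Brier-style squared error between p and the observed indicator."""
    return sum((pi - (1.0 if j == y else 0.0)) ** 2 for j, pi in enumerate(p))

def max_probability(p, y):
    """0-1 loss for predicting the modal category; ignores calibration."""
    return 0.0 if max(range(len(p)), key=lambda j: p[j]) == y else 1.0
```

Note that max_probability returns the same loss whether the modal probability is .9 or .4: it is invariant to the confidence of the forecast, which is the kind of deficiency the paper attributes to the maximum-probability criterion.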

An investigation of the effects of randomly missing data in two-predictor regression analyses is described. The differences in the effectiveness of five common treatments of missing data on estimates of R-squared values and each of the two standardized regression weights are also investigated. Bootstrap sample sizes of 50, 100, and 200 were drawn from three sets of actual field data. Randomly missing data were created within each sample, and the parameter estimates were compared with those...

Topics: ERIC Archive, Comparative Analysis, Computer Simulation, Estimation (Mathematics), Mathematical...

The effectiveness of Stout's procedure for assessing latent trait unidimensionality was studied. Strong empirical evidence of the utility of the statistical test in a variety of settings is provided. The procedure was modified to correct for increased bias, and a new algorithm to determine the size of assessment sub-tests was used. The following two issues were addressed via a Monte Carlo simulation: (1) the ability to approximate the nominal level of significance via the observed level of...

Topics: ERIC Archive, Algorithms, Latent Trait Theory, Monte Carlo Methods, Standardized Tests, Statistical...

The application of a multivariate analytic technique for the analysis of data from longitudinal designs with multiple dependent variables is presented. The technique is the multivariate generalization of univariate repeated measures ANOVA. An application of the technique to data collected using materials from the Asian Studies Curriculum Project is included. The example analysis indicated the technique is viable and should be a useful tool for the methodologist/evaluator. (Author)

Topics: ERIC Archive, Analysis of Variance, Hypothesis Testing, Longitudinal Studies, Multivariate...

In this study, the effect of appended pretesting was evaluated with regard to item statistics and examinee scores for groups of items that were pretested as part of a large-scale operational testing program. In appended pretesting, items are administered in a separately timed section at the end of an operational test battery. Two evaluations were conducted: one using a pretest unit consisting of a reading passage and the other using a pretest unit consisting of mathematics items, most of which...

Topics: ERIC Archive, Context Effect, Pretesting, Simulation, Statistical Bias, Test Construction, Test...

An overview of the state of the art in psychological research is presented, with an emphasis on the attention given to effect sizes. The acceptance of small effect sizes for biomedical research is contrasted with the rejection of similar effect sizes for psychological research. The Binomial Effect Size Display is used to depict the practical magnitude of an effect size regardless of whether the dependent variable is dichotomous or continuous. Other topics discussed include: (1) the meaning of...

Topics: ERIC Archive, Effect Size, Mathematical Models, Meta Analysis, Psychological Studies, Research...

The study evaluated the effectiveness of log-linear presmoothing (Holland & Thayer, 1987) on the accuracy of small sample chained equipercentile equatings under two conditions (i.e., using small samples that differed randomly in ability from the target population "versus" using small samples that were distinctly different from the target population). Results showed that equating with small samples (e.g., N less than 50) using either raw or smoothed score distributions can result...

Topics: ERIC Archive, Equated Scores, Data Analysis, Accuracy, Sample Size, Tests, Statistical Bias, Error...

Drawing on a previous study exploring expertise development in management (J. Arts, W. Gijselaers, and H. Boshuizen, 2000), this study investigated how dimensions of mood may affect recall in studies on expertise development and whether sample composition (distribution of males and females) may account for possible differences in the recall function. A data sample taken from the previous study's data set contained 18 novices, 14 students at the end of the first year of the management program,...

Topics: ERIC Archive, Administration, Case Studies, Moods, Recall (Psychology), Sex Differences,...

This exploratory study extends the work done by B. Plake and others (2000) and R. Guille and others (2001) by investigating whether a negligible occasion facet would still be found when ratings for licensure and certification examinations were completed in isolation. A set of items was sent to a standard-setting committee to be reviewed at home, completely independently of all other members of the committee. Seven to nine raters reviewed each item. The examination was a medical certification...

Topics: ERIC Archive, Certification, Interaction, Judges, Licensing Examinations (Professions), Medical...

Qualitative research evokes rather stereotyped responses from the mainstream of social science. The following 10 standardized responses to the stimulus "qualitative research interview" (QRI) are discussed: (1) it is not scientific, only common sense; (2) it is not objective, but subjective; (3) it is not trustworthy, but biased; (4) it is not reliable, but rests on leading questions; (5) it is not intersubjective, as different interpreters find different meanings; (6) it is not...

Topics: ERIC Archive, Foreign Countries, Interviews, Qualitative Research, Reliability, Research...

The development of an index reflecting the probability that the observed correspondence between multiple choice test responses of two examinees was due to chance in the absence of copying was previously reported. The present paper reports the implementation of a statistic requiring less restrictive underlying assumptions but more computation time and a related Bayesian procedure designed to adjust the standard error estimates to counteract the effect of the presence of a substantial proportion...

Topics: ERIC Archive, Bayesian Statistics, Cheating, Data Processing, Multiple Choice Tests, Probability,...

Since 1973, the National Assessment of Educational Progress (NAEP) has gathered information about levels of student proficiency in mathematics. These assessments are reported by NAEP periodically and present information on the strengths and weaknesses in students' mathematical understanding and their ability to apply that understanding in problem solving situations. This document presents the mathematics framework for the 1996 and 2000 NAEP assessments. The suggested revisions in the framework...

Topics: ERIC Archive, Educational Assessment, Elementary Secondary Education, Evaluation, Mathematics...

A computer simulation study was conducted of the sampling distribution of omega squared, a measure of strength of relationship in multivariate analysis of variance that the author had proposed earlier. This measure was found to be highly positively biased when the number of variables is large and the sample size is small. A correction formula for reducing the bias was developed by the method of least squares and was found to yield nearly unbiased corrected values. A simpler,...

Topics: ERIC Archive, Analysis of Variance, Computer Programs, Matrices, Multivariate Analysis, Sampling,...

This simulation study focused on the power of detecting group differences in linear growth trajectory parameters within the framework of structural equation modeling (SEM) and compared this approach with the more traditional repeated measures analysis of variance (ANOVA) approach. Three broad conditions of group differences in linear growth trajectory were considered. SEM latent growth modeling consistently showed higher statistical power for detecting group differences in the linear growth...

Topics: ERIC Archive, Analysis of Variance, Groups, Power (Statistics), Sample Size, Simulation,...

Canonical correlation analysis is a powerful statistical method subsuming other parametric significance tests as special cases, and which can often best honor the complex reality to which most researchers wish to generalize. However, it has been suggested that the canonical correlation coefficient is positively biased. A Monte Carlo study involving 1,000 random samples from each of 64 different population matrices was conducted to investigate bias in both canonical correlation and redundancy...

Topics: ERIC Archive, Computer Simulation, Correlation, Error of Measurement, Monte Carlo Methods,...

The effects of monetary gratuities on response rates to mail surveys have been considered in a number of studies. This meta-analysis examined: (1) the nature of the population surveyed; (2) the effects of gratuities in relation to the number of follow-ups; (3) whether the gratuity was equally effective across different populations; (4) whether the gratuity was promised or enclosed; and (5) the year of publication of the study. The bulk of the studies was done in the context of market research....

Topics: ERIC Archive, Comparative Analysis, Evaluation Methods, Mail Surveys, Meta Analysis,...

The statistician has n independent estimates of a parameter he knows is positive, but, as is the case in components-of-variance problems, some of the estimates may be negative. If the n estimates are to be combined into a single number, we compare the obvious rule, that of averaging the n values and taking the positive part of the result, with that of averaging the positive parts. Although the estimator generated by the second rule is not consistent, it is shown by numerical calculation that...

Topics: ERIC Archive, Computation, Mathematical Applications, Mathematical Models, Measurement Techniques,...
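The two combination rules being compared can be stated directly (a sketch; the function names are illustrative):

```python
def positive_part_of_mean(estimates):
    """Rule 1: average the n estimates, then truncate the result at zero."""
    return max(0.0, sum(estimates) / len(estimates))

def mean_of_positive_parts(estimates):
    """Rule 2: truncate each estimate at zero, then average."""
    return sum(max(0.0, e) for e in estimates) / len(estimates)
```

Rule 2 is never smaller than Rule 1, which is the source of its inconsistency: negative estimates are discarded outright rather than being allowed to offset positive ones, so the average drifts upward even when the true parameter is near zero.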

Observational studies are common in educational research, where subjects self-select or are otherwise non-randomly assigned to different interventions (e.g., educational programs, grade retention, special education). Unbiased estimation of a causal effect with observational data depends crucially on the assumption of ignorability, which specifies that potential outcomes under different treatment conditions are independent of treatment assignment, given the observed covariates. The primary goals...

Topics: ERIC Archive, Computation, Influences, Observation, Data, Selection, Simulation, Methods,...

Information about the world and how it works is often hard to locate and difficult to understand. The objectives and activities in this teaching guide were developed to complement the "World Military and Social Expenditures" (WMSE) report in the study of global issues in secondary school classrooms. The report contains well-documented and up-to-date statistics presented in concise narrative, charts, graphs, and maps. The WMSE report encourages the reader to make conscious and direct...

Topics: ERIC Archive, Cost Effectiveness, Data Analysis, Input Output Analysis, Journalism, National...

Large scale surveys usually employ a complex sampling design and, as a consequence, no standard methods for estimating the standard errors associated with the estimates of population means are available. Resampling methods, such as the jackknife or bootstrap, are often used, with reference to their properties of robustness and reduction of bias. A method based on variance component models is proposed as an alternative to the jackknife procedure used for calculation of the standard errors for the...

Topics: ERIC Archive, Error of Measurement, Estimation (Mathematics), Prediction, Research Design,...
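For a statistic as simple as the mean, the delete-one jackknife standard error that such model-based methods aim to replace reduces to a short computation (a sketch; the data are invented):

```python
def jackknife_se(data, stat):
    """Delete-one jackknife standard error of the statistic `stat`."""
    n = len(data)
    replicates = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    rep_mean = sum(replicates) / n
    return ((n - 1) / n * sum((r - rep_mean) ** 2 for r in replicates)) ** 0.5

data = [2.0, 4.0, 6.0, 8.0]
mean = lambda xs: sum(xs) / len(xs)
se = jackknife_se(data, mean)
print(se)
```

For the mean, this reproduces the classical s / sqrt(n); the jackknife earns its keep on statistics, and complex sampling designs, where no closed-form standard error exists.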