The uses and misuses of assessments in education policy : studies at the student, teacher, and system levels



This dissertation examines uses and misuses of assessments to inform policies at the student, teacher, and system levels. The three levels are considered in three separate studies, each described below. Much research shows that assessment policies can lead to unintended and, in some cases, perverse outcomes for educational stakeholders. This dissertation adds to that literature. Beyond identifying technical measurement problems in how policies use test scores, it also examines the usefulness of the inferences drawn from educational assessments.

Study 1: Are Early Warning Systems an Improvement on Teacher Judgment?
The first study examines whether assessments and other measures of college readiness (including surveys of so-called "noncognitive" skills) add anything to teacher intuition about students. Each year, more districts implement early warning systems (EWS). These systems predict negative student outcomes, such as dropping out, before they occur. The predictions are then used to match off-track students to appropriate supports and interventions. Research suggests these systems help educators respond to student needs early by generating conversation around specific students at risk of dropping out. However, no research considers what new information teachers gain from having a specific prediction for a student. This study compares teacher and EWS predictions of whether students will complete high school and enroll in college to see if EWS improve on teacher judgment. Further, it assesses whether accuracy in teacher judgment stems from additional information not in the models, especially information related to noncognitive skills, and from biases like self-fulfilling prophecies. Generally, EWS can provide benefits both as organizational tools and by increasing the precision with which students are identified for supports and interventions.

Study 2: Is Teacher Value Added Still Valid if an Equal-Interval Scale Assumption is Violated?
The next study considers a divisive topic in education policy: the use of value added to measure teacher effectiveness and make personnel decisions. Specifically, the study examines whether assumptions about the units of the test scale, rather than actual teacher quality, may be driving value-added estimates. A test is said to be equal interval when gaining a unit at one end of the scale is equivalent to gaining a unit anywhere else along the scale. Research shows that wrongly assuming a test scale is equal interval can be problematic, especially when the assessment is used to achieve a policy aim like evaluating growth over time. However, little research considers whether teacher value added is sensitive to the underlying test scale, and in particular whether treating an ordinal scale as interval might lead to erroneous conclusions about teacher quality. This study addresses the issue by estimating teacher value added, then applying mild non-linear transformations to the original scale and re-estimating the value added. Results show that value added is sensitive to the scale used, and that even mild departures from the original scale can change a teacher's odds of being considered high- or low-performing by a factor of 5.

Study 3: Are Conclusions About the Impact of Charter Schools and Vouchers on Noncognitive Outcomes Based on Improper Inferences Drawn from Surveys?
Program evaluators increasingly use surveys to measure the impact of educational programs on noncognitive outcomes. Evaluators with the ability to randomize students into control and treatment conditions typically do not worry that these surveys perform differently across ethnicities, native languages, and ages, because randomization helps account for observed and unobserved differences between the conditions. That assumption can be violated when the instrument is not measurement invariant across two groups, that is, when it measures different constructs in each group. This study tests the measurement invariance of noncognitive surveys from two U.S. government-funded evaluations, one of charter schools and the other of vouchers. Results show not only that these surveys measure different constructs across age groups, but also that this measurement problem can reverse estimated treatment effects.
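The re-estimation procedure described under Study 2 can be illustrated with a minimal Python sketch. It is not the dissertation's model or data: the simulated scores, the simple residual-based value-added estimator, the exponential `stretch` transformation, and every function and parameter name below are assumptions chosen only to show the idea of estimating value added, monotonically rescaling the test scale, and estimating again.

```python
import math
import random
from statistics import mean


def value_added(prior, current, teacher):
    """Mean residual per teacher from an OLS fit of current on prior score.

    A deliberately simple stand-in for a value-added model (assumption).
    """
    mx, my = mean(prior), mean(current)
    sxx = sum((x - mx) ** 2 for x in prior)
    slope = sum((x - mx) * (y - my) for x, y in zip(prior, current)) / sxx
    intercept = my - slope * mx
    residuals = {}
    for x, y, t in zip(prior, current, teacher):
        residuals.setdefault(t, []).append(y - (intercept + slope * x))
    return {t: mean(r) for t, r in residuals.items()}


def rank(va):
    """Teacher ids ordered from highest to lowest estimated value added."""
    return sorted(va, key=va.get, reverse=True)


def simulate(n_teachers=20, class_size=25, seed=0):
    """Hypothetical data: classes differ in prior achievement and teacher effect."""
    rng = random.Random(seed)
    prior, current, teacher = [], [], []
    for t in range(n_teachers):
        effect = rng.gauss(0, 0.2)      # hypothetical true teacher effect
        class_mean = rng.gauss(0, 0.5)  # hypothetical sorting of students
        for _ in range(class_size):
            x = rng.gauss(class_mean, 1)
            prior.append(x)
            current.append(0.7 * x + effect + rng.gauss(0, 0.5))
            teacher.append(t)
    return prior, current, teacher


def stretch(score, k=0.15):
    """Mild monotone (order-preserving) transformation that widens the
    upper end of the scale; one arbitrary choice among many."""
    return math.exp(k * score) / k


prior, current, teacher = simulate()
original = value_added(prior, current, teacher)
rescaled = value_added([stretch(x) for x in prior],
                       [stretch(y) for y in current],
                       teacher)

# Compare which teachers land in the top quintile under each scale.
top_original = set(rank(original)[:4])
top_rescaled = set(rank(rescaled)[:4])
print("top quintile, original scale:", sorted(top_original))
print("top quintile, rescaled scale:", sorted(top_rescaled))
print("teachers whose classification changed:",
      sorted(top_original ^ top_rescaled))
```

Because `stretch` preserves the ordering of individual student scores, any change in which teachers are classified as top-quintile comes from the change in scale units alone. An affine rescaling (for example `2 * x + 3`) would leave the ranking untouched, which is why the comparison of interest uses a non-linear transformation.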


Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2015
Issuance monographic
Language English


Associated with Soland, James
Associated with Stanford University, Graduate School of Education.
Primary advisor Haertel, Edward
Primary advisor Hakuta, Kenji
Thesis advisor Haertel, Edward
Thesis advisor Hakuta, Kenji
Thesis advisor Loeb, Susanna
Advisor Loeb, Susanna


Genre Theses

Bibliographic information

Statement of responsibility James Soland.
Note Submitted to the Graduate School of Education.
Thesis Thesis (Ph.D.)--Stanford University, 2015.
Location electronic resource

Access conditions

© 2015 by James Gilbert Soland
