Multiple views on multiple metrics: how teachers perceive the validity and utility of metrics in a multi-measure evaluation system
- Over the past five years, policy makers have shown a renewed interest in using teacher evaluation as a lever for increasing the quality of the teacher labor market. States have been spurred by federal policies such as Race to the Top, and by influential foundations such as the Bill & Melinda Gates Foundation, to adopt a more reliable, more nuanced method of evaluation: multiple metrics. Rather than relying solely on test scores or observation checklists, these new composite metrics were touted as more stable estimates for both human resource decisions and for targeting professional development. These multiple measures vary across districts and states, but they are largely a combination of growth-based student test metrics, standards-based observation metrics, and surveys from stakeholders such as students or teachers. This study focuses on a state that uses the first two of these metrics, student growth and teacher observation, to ask whether such metrics are providing information to teachers that they can use to inform their instruction.

This mixed-methods study takes place in Delaware, which was the first winner of President Obama's federal education initiative, Race to the Top. As such, it won $100 million to invest in reforming aspects of its educational system, teacher evaluation among them. Delaware overhauled its existing evaluation system, the Delaware Performance Appraisal System (DPAS), and unveiled DPAS-II in 2012. The data utilized for this study include a survey of all teachers in the state of Delaware, administrative data from the Delaware Department of Education, and interviews with 32 Math and English/Language Arts teachers.

The findings of this study are organized into three chapters. The first chapter investigates teachers' perceptions of the validity and utility of information they receive as part of their test-based student growth metrics. In general, teachers do not find test-based metrics to be accurate or valid methods of capturing teacher performance.
Major themes that emerged regarding the lack of validity included teachers' ability to manipulate scores on the exams, the subjectivity of individualized growth goals for certain metrics, and bias in test scores that correlated with student demographics. Major themes that emerged regarding teachers' ability to use test-based data included the specificity of the data (more detail was generally regarded as better), the frequency of the test administration (more frequent was preferred), and the timeliness with which teachers received the data (sooner was better).

The second findings chapter investigates teachers' perceptions of the validity and utility of information teachers receive as part of their observation-based metrics. On average, teachers find observation metrics to be more valid than test-based metrics; most teachers also reported that the Framework for Teaching aligned with their vision of good instruction. Furthermore, teachers liked that the observations were grounded in a rubric that demanded low-inference data from classrooms. However, teachers reported validity concerns around the representativeness of announced observations as well as how student composition affected teacher ratings on the observation rubric. Despite their generally positive view of observations, teachers reported that the feedback they received from principals was largely unhelpful; rather, teachers seemed to derive utility from the process of the observation—including time for detailed planning and reflection—as well as the time set aside to engage in meaningful dialogue with administrators.

The third findings chapter uses the survey and administrative data to investigate systematic variations in teacher perceptions by teacher and school characteristics. Regression analyses find that novice teachers were more likely to report that the conferences in the DPAS process were useful and that the evaluation is related to practice, both between and within schools.
In addition to novice status, the number of observations conducted by an administrator was positively predictive of teachers' perceptions of evaluators, the utility of the conferences, and the utility of the evaluation process for practice. Teachers in high-needs schools were also more likely to rate the utility of the evaluation process and its conferences favorably; this may be the result of such schools participating in a special program called the "Delaware Talent Cooperative" meant to support struggling schools. Finally, administrators' average years of experience in a school was positively predictive of teachers' ratings of their evaluator as well as their trust in observation scores.

The final chapter offers a discussion of the implications of this work for policy, practitioners, and research. Taken as a whole, this study demonstrates that teachers may receive limited additional evidence from multiple-metric evaluation systems; moreover, serious validity concerns often impede the use of this information. If policymakers indeed desire teachers to use evaluation information formatively, they must invest in systems and human resources that afford teachers the ability to do so. If the system is instead meant to monitor teachers, policymakers should temper their insistence that such metrics are useful for the improvement of practice.
|Type of resource
|electronic; electronic resource; remote
|1 online resource.
|Stanford University, Graduate School of Education.
|Grossman, Pamela L. (Pamela Lynn), 1953-
|Statement of responsibility
|Submitted to the Graduate School of Education.
|Thesis (Ph.D.)--Stanford University, 2016.
- © 2016 by Lindsay Brown
- This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).