This study examines the accuracy of performance ratings from the Framework for Leadership (FFL), Pennsylvania's tool for evaluating the leadership practices of principals and assistant principals. The study analyzed four key properties of the FFL: score variation, internal consistency, year-to-year stability, and concurrent validity. Score variation was characterized by the percentages of school leaders earning scores in different portions of the rating scale. To measure the internal consistency of the FFL, Cronbach's alpha was calculated for the full FFL and for each of its four categories of leadership practices. Analyses of score stability used the FFL scores of school leaders rated in both pilot years to calculate Pearson's correlation coefficient. Concurrent validity was assessed through a regression model for the relationship between school leaders' estimated contributions to student achievement growth and their FFL scores. This report is based primarily on the 2013/14 pilot, in which 517 principals and 123 assistant principals were rated by their supervisors; an interim report examined data from the 2012/13 pilot year. The study finds that the FFL is a reliable measure, with good internal consistency and a moderate level of year-to-year stability in scores. The study also finds evidence of the FFL's concurrent validity: principals with higher scores on the FFL, on average, make larger estimated contributions to student achievement growth. Higher total FFL scores and scores in two of the four FFL domains are significantly or marginally significantly associated with both value-added in all subjects combined and value-added in math specifically. This evidence of the validity of the FFL sets it apart from other principal evaluation tools: No other measures of principals' professional practice have been shown to be related to principals' effects on student achievement.
However, in both pilot years, variation in scores was limited, with most school leaders scoring in the upper third of the rating scale. As the FFL is implemented statewide, continued examination of evidence on its statistical properties, especially the variation in scores, is important. The following are appended: (1) Prior research on measuring principal effectiveness; (2) Structure of the Framework for Leadership; (3) Data used in the study; (4) Technical details and supplementary findings on variation in Framework for Leadership scores; (5) Technical details and supplementary findings on the internal consistency of the Framework for Leadership; (6) Technical details and supplementary findings on year-to-year stability; (7) Technical details of school and principal value-added models; and (8) Technical details and supplementary findings on the relationships between Framework for Leadership scores and principals' value-added.
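For readers unfamiliar with the two reliability statistics named above, the following is a minimal sketch of how Cronbach's alpha and Pearson's correlation coefficient are computed. It is not the study's code: the function names, the four-item layout, and all rating values are hypothetical and serve only to illustrate the formulas.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of rows, one row of item ratings per leader."""
    k = len(rows[0])                                   # number of rated items
    item_vars = [variance(col) for col in zip(*rows)]  # sample variance of each item
    total_var = variance([sum(r) for r in rows])       # variance of the summed score
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson_r(x, y):
    """Pearson's correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Hypothetical ratings: five leaders, four items each (illustrative values only).
ratings = [
    [3.0, 3.0, 2.0, 3.0],
    [2.0, 2.0, 2.0, 2.0],
    [3.0, 2.0, 3.0, 3.0],
    [1.0, 2.0, 1.0, 1.0],
    [2.0, 3.0, 2.0, 2.0],
]
alpha = cronbach_alpha(ratings)

# Hypothetical total scores for the same five leaders in two consecutive years.
year1_totals = [sum(r) for r in ratings]
year2_totals = [10.0, 8.0, 11.0, 6.0, 9.0]
stability = pearson_r(year1_totals, year2_totals)
```

In this sketch, internal consistency is computed from item-level ratings within a single year, while stability compares the same leaders' total scores across two years.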