It's One of the Biggest Failures Yet in K-12 Philanthropy. What Are the Lessons?

Public education is a legendary graveyard for ambitious philanthropic plans.

The pantheon of failures includes the $500 million Annenberg Challenge for School Reform, which was launched in 1993 and yielded disappointing results; the $200 million push to overhaul Newark's public school system, which was kickstarted in 2010 with a huge gift from Mark Zuckerberg and turned into a fiasco (although it produced more gains than some critics think); and a $100 million student data collection effort, inBloom, which was bankrolled by the Gates Foundation before it was shut down in 2014.

Now, we can add to this list the Gates-backed Intensive Partnerships for Effective Teaching initiative, which aimed to improve student achievement through greater access to effective teachers. The effort cost a total of $575 million between 2009 and 2016, with Gates contributing $212 million of that. Yet according to a 500-plus-page study released in late June by the RAND Corporation, the initiative failed to improve student graduation rates, the effectiveness of teachers, or retention of the most effective teachers. The results should come as no surprise to Bill Gates, who told the Council of the Great City Schools last fall that the foundation would no longer invest directly in teacher evaluation, but would continue to gather data on such systems and encourage the use of these tools to help teachers improve.

Before getting into the details of the study, which some critics have seized on to amplify longstanding critiques of ed reform funders generally and the Gates Foundation specifically, it's worth saying a few words about risk and philanthropy. A great strength of private grantmaking is that funders can try things that might not work. Philanthropic dollars have been called "society's risk capital," and when things work out, such investments can act—in another memorable phrase—as "society's passing gear."

We want philanthropists to take risks and often criticize them when they're overly cautious. Risk-taking is especially important for tackling stubborn problems that have repeatedly defied successful intervention. Funders need to keep trying new things. Making progress in a tough area like K-12 education requires research and development, including large-scale experiments, to see what works. Bill Gates has often described the mission of the Gates Foundation's education work in precisely these terms.

If you agree that philanthropy should be taking big risks, you shouldn't be too surprised by big failures. Nor should you be reflexively critical of the funders behind them, since they're doing what we want them to do. At Inside Philanthropy, we make a point of not piling on when funders fail. If donors and foundations are kicked around too harshly for their mistakes, they'll take fewer risks and we'll all be worse off.

All that said, risk-taking in philanthropy needs to be approached with great care, especially in an area like public education. When risky initiatives go wrong, they can impose major costs on everyone involved. In the case of K-12, that means students, parents, teachers, administrators and the taxpayers who pick up the bill for failure.

The Gates Foundation has attracted so much criticism over the years not just because of its perceived top-down approach, but because of the real costs its mistakes have imposed on intended beneficiaries—costs that some critics predicted in advance while urging caution.

The big Gates push on teacher effectiveness is a case in point.

The Intensive Partnerships for Effective Teaching initiative grew out of Gates' Measures of Effective Teaching (MET) project, which sought to determine and quantify what quality teaching actually looks like. The logical next step from this work was to transform the funder's learning into a set of policy levers that would increase student access to effective teachers, leading to improved academic outcomes.

The initiative sought to create teacher evaluation systems that relied on a combination of standardized test scores and observations by peer evaluators to identify which teachers were most effective in improving student performance. This would help schools place these top teachers where they were needed most, as well as offer professional development to make lower performers more effective. Measuring teacher effectiveness was also seen as the key to merit pay, an idea championed by many reformers.

Gates believed the controversial teacher evaluation system at the heart of the initiative would boost student achievement. The foundation had a powerful ally in then-Education Secretary Arne Duncan, who also supported the use of value-added systems such as the one championed by Gates.

Gates' vision made some sense conceptually. Decades of research have shown that effective teachers are the single most important in-school factor affecting student academic achievement. But as with most major policy initiatives, the way in which concepts are operationalized and implemented is critical. Gates and Duncan thought student test scores should be a data point in measuring teacher effectiveness. Assessment experts, meanwhile, were waving a large caution flag, warning that assessments designed to gauge student learning should not be used to evaluate teaching. They stated that value-added models lacked statistical validity. Student scores are also shaped by many factors both in and out of school, which added to misgivings about using them for high-stakes personnel and compensation decisions.

We've written in the past about how this high-profile Gates project played out on the ground, paying particular attention to the initiative's largest participant: the Hillsborough County Public Schools (HCPS) in Tampa. But HCPS was not the only participant. Gates also funded this work in the Pittsburgh and Memphis public schools, as well as four charter school networks: Alliance, Aspire, Green Dot, and Partnerships to Uplift Communities Schools.

The experience in HCPS is instructive. A Gates-funded 2014 study published in the peer-reviewed journal Educational Evaluation and Policy Analysis found little or no correlation between quality of instruction and value-added measures of effectiveness. No doubt this study was a factor in the funder's decision to back away from the evaluation models it had previously championed. A "rushed" system could have the effect of punishing teachers rather than providing guides for improvement, the foundation concluded. The American Statistical Association also discouraged the use of value-added models in staffing and compensation decisions.

Unfortunately, lawmakers in Florida and other states had codified such systems as part of school accountability laws, binding many schools and districts to unproven policies that carry a high price tag. In Florida, the evaluation system ended up costing $70 million more than the $200 million initially budgeted. HCPS often had to dip into its reserves to meet the project's expenses. What's more, the rising costs neither improved graduation rates nor channeled more effective teachers into schools with the neediest students.

RAND conceded that it could not say for certain why this initiative failed to deliver. Helping teachers improve appears to have been a major weak link. The RAND study notes that while the project succeeded in helping the participating schools and districts measure effectiveness, it did not succeed in using those measures to improve effectiveness. Observers who do not buy into the definition of effectiveness underlying this project may dispute the report's positive conclusion about measuring it.

The study's authors also suggested that there may not have been enough time for positive effects to appear, but they warned that focusing on teacher effectiveness alone, to the exclusion of everything from early education to family and community influences, is insufficient to improve student achievement.

We've previously written about how the Gates Foundation has issued mea culpas for some of its past missteps, including its high-handedness in pushing the Common Core, and how it has pivoted in its K-12 work to a less top-down approach that includes support for local networks. Presumably, the disappointing results from its teacher effectiveness work (which, as noted, have been clear for a while now) have helped inform changes in how the foundation approaches its education work.

The moral of the story here isn't that the Gates Foundation shouldn't take big risks—let's hope that it keeps doing exactly that—but that it needs to operate in a more conscientious and collaborative way. The good news: As a "learning organization," it appears that Gates is already well along in internalizing some of the hard lessons of its past K-12 work.