Armor’s Reply to Barnett: Research on Early Childhood Ed Still Unpersuasive

W. Steven Barnett’s attempt to rebut my review of preschool research begins with an ad hominem attack on my (and Cato’s) motives for publishing this piece, calling it an “October Surprise” with an aim “to raise a cloud of uncertainty regarding preschool’s benefits that is difficult to dispel in the time before the election.” He omits that my first review of preschool research was published in January, the same month Cato sponsored a public forum on the topic with both pro and con speakers.  The current, expanded review was published now because it took me that long to finish it.  

Of course, it is crucial to let the research and arguments speak for themselves, but for what it is worth, I have no formal affiliation with Cato or with any organization other than George Mason University, while Barnett is Director of The National Institute for Early Education Research (NIEER), whose mission is to “support high-quality, effective early childhood education for all young children.”  Barnett is a long-time advocate of universal preschool, while I had no position on pre-K until I read reports from the national Head Start Impact Study (HSIS).

Moving on to substantive matters, Barnett says that because the successful Perry and Abecedarian programs were small and more intensive than current proposals, we should devote more resources to replicating them at scale rather than discount them as a poor guide to how much larger and different programs would work.  But current “high quality” pre-K programs, including Abbott pre-K, do not in fact replicate either of these programs.  Moreover, Barnett ignores the national Early Head Start demonstration, a program similar to Abecedarian, which found no significant long-term effects in Grade 5 except for a few social behaviors of black parents, hardly an endorsement for making it universal.  Even this one area of positive effects is tempered by significant negative effects on certain cognitive skills for the most at-risk students.

The difference in outcomes between the tiny Abecedarian project and the national Early Head Start demonstration program may simply be one of scale and bureaucracy.  There is an enormous difference between designing and implementing a program for a few dozen mothers and infants in a single community and doing the same for thousands of children in many different communities across the country.  A national rollout offers many more opportunities for problems in leadership, staffing, program design, and so forth.

Regarding my criticism of Regression Discontinuity Design (RDD) studies, Barnett says the flaws I describe are “purely hypothetical and unsubstantiated.” But I’m not alone in perceiving them; my concerns are shared by Russ Whitehurst, former Director of the Institute of Education Sciences in the U.S. Department of Education. More importantly, Barnett says my criticism about attrition (or program dropouts) is pure speculation, which is simply untrue.

In my review, I reported that in both the Tulsa, Oklahoma, and Georgia studies, the treatment and control groups differed significantly in family background characteristics (mother’s education and limited English proficiency) that are known to be related to achievement test scores.  The Boston study reported a 20 percent dropout rate from the treatment group, and those students were more disadvantaged than the stay-ins.  It is true, though, that I can’t estimate the dropout problem in Barnett’s 2007 Abbott study; he does not report or describe attrition rates, nor does he provide any benchmark data that would allow a reader to compare the treatment and control groups prior to testing.  For the RDD studies that provide data (many do not), there is no empirical support for Tom Bartik’s suggestion that the dropouts could be children from wealthier families.  Where data have been reported, the dropouts are more disadvantaged on one or more socioeconomic characteristics.

Barnett next claims that the Head Start and Tennessee evaluations are “not experimental” but quasi-experimental, like the Chicago Longitudinal Study.  Regarding Head Start, Barnett, relying on Tom Bartik, misunderstands a reanalysis of the Head Start data by Peter Bernardy, which I cited.  The original Head Start study found no significant long-term effects.  Bernardy simply did a sensitivity analysis by excluding control group children who had some type of preschool; he found no long-term effects, the same as the original Head Start study.

Regarding the Tennessee experiment, Barnett is correct that I omitted a small positive effect for a single outcome: grade retention.  But he conveniently fails to mention that the Tennessee experiment found no long-term effects for the major outcome variables, including cognitive performance and social behaviors, and there was even a statistically significant negative effect for one of the math outcomes.

He then complains that my cost figures are miscalculated, saying I should subtract costs for existing pre-K programs and use “marginal” rather than average costs per child.  On the first point, my review simply says that “states could be spending nearly $50 billion per year to fund universal preschool”; there is no need to subtract existing costs because it is simply an estimate of possible peak expenditures.  Regarding marginal vs. average costs, my $12,000 figure is based on 2010 per-pupil costs. Current average costs are almost certainly higher (the federal Digest of Education Statistics actually places total per-pupil costs at roughly $13,000), and $12,000 is not unreasonable for marginal costs since 80% of education costs are for teacher salaries and benefits, the primary marginal cost components.

Barnett then argues that I “omit[ted] much of the relevant research,” implying I would get different results had I included other reviews, especially one by the Washington State Institute for Public Policy (WSIPP) published in January. This is simply inaccurate.  The WSIPP report breaks programs down by state/city, Head Start, and “Model” programs.  Of the 13 state preschool evaluations, my review included 8 (WSIPP counted three different reports for the Tulsa, Oklahoma, program as separate studies).  Of the remaining five programs not in my review, three are RDD studies for Arkansas, New Mexico, and North Carolina with no information on attrition or relevant statistics to compare the treatment and control groups at the time of testing.  I did include the New Jersey Abbott program, despite this problem of interpretation, because it is frequently mentioned and promoted as a high-quality preschool program.

Of the three model programs in the WSIPP study, my review included two, Perry Preschool and the Abecedarian project. The third is the IDS program mentioned by Barnett and also reviewed by him in other reports.  I have been unable to obtain a copy of this 1974 study, and Barnett’s review provides very little information about it.  He reports a standardized effect size of .4 at the end of pre-K, but with no documentation about treatment and control group equivalence at the start of preschool. Neither is there information about attrition or dropout rates during the preschool year. It is therefore hard to assess the reliability of this effect.  He acknowledges that a later follow-up study to document long-term effects in adulthood suffered from “severe attrition” and may not be reliable.

Most important, the average standardized effect that WSIPP found for all test scores across all state/city programs was .31, which is only somewhat higher than the average Head Start standardized effect of about .2 across all tests.  One reason the WSIPP effect is higher than the Head Start effect is the inclusion of the extraordinary standardized effects (.9 or so) for the three Tulsa studies.  Furthermore, the WSIPP study also documented the fade-out effect.

I do not understand Barnett’s claim that the New Jersey Abbott program has effects “three times as large” as the Head Start study.  His 2007 report says the gain in reading at age four “…represents an improvement of about 28 percent of the standard deviation for the control (No Preschool) group” and the gain in math “…represents an improvement of about 36 percent of the standard deviation for the control (No Preschool) group.”  These represent standardized effects of .28 and .36, which average out to .32 and thus are about the same as the WSIPP average for all state/city programs.  This is somewhat higher than the Head Start Impact Study (.2) but certainly not three times higher.  Moreover, his study presents no information about attrition for the treatment group, nor does he provide the reader with a table that compares the treatment and control groups on socioeconomic characteristics prior to or at the time of testing.
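
As a rough check on the arithmetic in this comparison, the short Python sketch below takes the two Abbott effects quoted from the 2007 report and the approximate Head Start figure of .2, then computes their unweighted mean and its ratio to the Head Start effect. The variable and function names are illustrative only, and the unweighted mean is an assumption; a different weighting of reading and math would shift the result slightly.

```python
# Figures quoted in the text; names here are illustrative, not from any report.
ABBOTT_READING = 0.28  # reading gain, in control-group standard deviations
ABBOTT_MATH = 0.36     # math gain, in control-group standard deviations
HEAD_START = 0.20      # approximate Head Start Impact Study average ("about .2")


def mean_effect(effects):
    """Unweighted mean of a list of standardized effects."""
    return sum(effects) / len(effects)


def ratio_to_baseline(effect, baseline):
    """How many times larger one standardized effect is than another."""
    return effect / baseline


abbott_mean = mean_effect([ABBOTT_READING, ABBOTT_MATH])
abbott_ratio = ratio_to_baseline(abbott_mean, HEAD_START)

print(f"Abbott mean effect: {abbott_mean:.2f}")
print(f"Ratio to Head Start: {abbott_ratio:.1f}")
```

With these inputs the mean Abbott effect is 0.32 and the ratio to the Head Start figure is 1.6, well short of a factor of three.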

Perhaps the most important point in this debate is something that Barnett does not explain, which is how any of the studies we have discussed support universal preschool.  The only studies that give reliable information (meaning valid research designs with statistically significant results) on long-term outcomes such as crime, educational attainment, and employment are the Abecedarian and Perry Preschool programs.  Even assuming that these programs could be generalized to larger populations (holding aside the contrary implications of Early Head Start), these programs apply only to disadvantaged children who need a boost.  There is little justification based on these programs to claim that middle-class children will experience the same benefits.