In Defense of Error-Laden Reporting

Tempted though I am to join the pile-on over the many inaccuracies in the data on the Recovery.gov stimulus reporting site, including claims of jobs created in non-existent congressional districts, I think the White House actually makes a good point here: You can get something out fast, or you can get it out bug-free, but you usually can’t do both. In fact, concerns about “data quality” at government agencies have often been a great enemy of transparency. It is, after all, embarrassing when your department puts out information that’s poorly formatted, riddled with typos, or just plain wrong. But in practice, that means agencies sit on the data until someone gets around to fixing it, which is seldom a high priority. The insight behind open source is that the best debugger is a release: Ten thousand coders actually using software are going to find and patch problems faster and better than any in-house team. The same holds here: Get the data out, and dumb mistakes get spotted.

There are, to be sure, ways some of these errors could have been avoided. As David Freddoso points out, it would have been trivial to design the backend to accept only legitimate congressional districts. But again, getting the site up quickly means the administration can count on critics to point out exactly those sorts of possible improvements. That said, Freddoso surely has a point when he argues that there’s no sane reason this kludgy beast of a site should have cost $18 million. Far better would have been to take the open-source logic to its conclusion and simply dump the raw data on a server in XML format, then let outside groups (maybe the Sunlight Foundation, Americans for Tax Reform, or just some clever lone hacker) figure out how best to mash it up and present it.
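Freddoso’s point about the backend is easy to illustrate. What follows is a minimal Python sketch, not anything Recovery.gov actually ran: it checks each submitted state and district against a lookup table of House seats and rejects impossible combinations before they are stored. The table is a hypothetical five-state excerpt (using the seat counts in effect in 2009), and the function names are illustrative; a real system would load all fifty states.

```python
# Hypothetical excerpt of a lookup table: U.S. House seats per state under the
# apportionment in effect in 2009. A real backend would load all 50 states.
DISTRICTS_PER_STATE = {
    "AZ": 8,
    "CA": 53,
    "CT": 5,
    "TX": 32,
    "WY": 1,  # at-large state
}


def is_valid_district(state: str, district: int) -> bool:
    """Return True if `district` actually exists in `state`.

    Assumption: at-large seats are numbered 1 here; some systems code them as 0.
    """
    seats = DISTRICTS_PER_STATE.get(state.upper())
    if seats is None:
        return False  # unknown state code
    return 1 <= district <= seats


def validate_report(state: str, district: int) -> None:
    """Reject a bad submission at entry time instead of publishing it."""
    if not is_valid_district(state, district):
        raise ValueError(f"{state}-{district} is not a congressional district")
```

With a check like this in place, a report claiming jobs in, say, Arizona’s 15th district (Arizona had eight at the time) would be bounced back to the recipient rather than posted to the site.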