Yesterday, I shared my doubts about the prospect of getting budget and organizational data from the White House. Today, I'm happy to report genuine progress on open data from Congress.
The Government Printing Office announced today that it will be making House bills available in XML format and in bulk through FDsys, GPO’s Federal Digital System. House bills now join other material on GPO's bulk data page.
If you're like me, following that link gives you some idea of what's there, but clicking through any further tells you nothing about how to use it that you couldn't get from any other copy of a bill. That's OK, because the kids with the computers do know how to use it. And they can take well-structured, timely data reflecting the proposals in Congress and turn it into various information services, applications, and web sites that make all of us better aware of what's happening.
I believe the public has an Internet-fueled expectation that they should understand what happens in Congress. That this expectation goes unmet is one explanation for rock-bottom esteem for government in opinion polls. Access to good data would help produce better public understanding of what goes on in Washington and also, I believe, more felicitous policy outcomes: not only reduced demand for government, but better administered government in the areas the public wants it. (If you're a reader of a certain partisan bent, you might appreciate the idea that the era of passing bills to find out what's in them will end.)
Upon the release of my Cato Policy Analysis, "Grading the Government's Data Publication Practices," I characterized President Obama as lagging House Republicans in terms of transparency. Today's development helps solidify Republicans' small lead. The GPO release says the initiative comes "[a]t the direction of the House Appropriations Committee, and in support of the task force on bulk data established by House report 112-511."
The administration has plenty of capacity to retake the lead, of course, and could do so quite easily. I'll call it like I see it, doing my best to reflect consensus among the transparency community as to the quality of data publication, when we return to grading the data produced by various organs of government in another year or so.
Did you think this praise would come without garnish? It's like you don't know me at all.
For now, this data is of limited use because it includes only House bills. The entire oeuvre of congressional bill-writers should be published the same way in the same place so that contrasts and comparisons can be drawn among House and Senate work. In short, why is the Senate not on board?
As far as I've been able to find, the XML is not well documented. What each of the technical codes means is understood by several people in Washington's transparency community, but the idea is to make the data available very broadly, so the documentation should be very strong. The information at xml.house.gov should be updated, tightened up, and made easily available to the people gathering bill data on FDsys.
The XML data structures in bills are limited in what they convey. There is rudimentary information about who introduced and cosponsored bills, what committees they were referred to, and other procedural information. That's good. But the effects of bills on agencies, existing law, programs, and places are not available in machine-readable code. Having that would be great.
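For the curious, here is a rough sketch of how someone might pull that procedural information out of a bill's XML. It is an illustration under stated assumptions, not a reference: the sample fragment is invented, and the element and attribute names it relies on (sponsor, cosponsor, committee-name, legis-num, name-id, committee-id) are my best understanding of the House markup and should be checked against real files from the bulk data page. The function name summarize_bill is hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented, heavily trimmed bill fragment for illustration only.
# Element and attribute names are assumptions about the House bill XML;
# verify them against actual bulk data files before relying on them.
SAMPLE_BILL = """
<bill>
  <form>
    <legis-num>H. R. 0000</legis-num>
    <action>
      <action-desc>
        <sponsor name-id="X000001">Ms. Example</sponsor> (for herself and
        <cosponsor name-id="X000002">Mr. Sample</cosponsor>) introduced the
        following bill; which was referred to the
        <committee-name committee-id="HXX00">Committee on Examples</committee-name>
      </action-desc>
    </action>
  </form>
</bill>
"""


def summarize_bill(xml_text):
    """Return the bill number, sponsors, cosponsors, and committees of referral."""
    root = ET.fromstring(xml_text)

    def pairs(tag, id_attr):
        # Collect (identifier, display text) for every element with this tag.
        return [(el.get(id_attr), (el.text or "").strip()) for el in root.iter(tag)]

    return {
        "number": (root.findtext(".//legis-num") or "").strip(),
        "sponsors": pairs("sponsor", "name-id"),
        "cosponsors": pairs("cosponsor", "name-id"),
        "committees": pairs("committee-name", "committee-id"),
    }


if __name__ == "__main__":
    print(summarize_bill(SAMPLE_BILL))
```

Notice what the sketch can and cannot do. It recovers who and where, because those facts are tagged. It has nothing to say about which agencies, programs, or provisions of existing law a bill touches, because nothing in the markup identifies them.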
Watch this space. In the coming weeks and months, we'll show how semantically rich data can automatically reveal more about what happens in the legislative process. Technical people will be able to draw insights about legislation and the legislative process that were never available before. They will translate those insights for us in myriad ways, better equipping the public to oversee the government.