Tag: House Administration Committee

Helping the House Advance Data Transparency

The House of Representatives is poised to make great strides forward in transparency, and our work over the last year aims to help it get there. Here’s how this spreadsheet (.xls) will help.

In December, the House Administration Committee announced a plan to improve the publication of House documents. In January, a new site—docs.house.gov—went live. (It’s attractive looking, but still bare-bones.) On Thursday this week, the Committee is hosting a “Legislative Data and Transparency Conference” to examine what data is out there and what data should be out there. Little information is on the Web yet, but you can sign up to attend at the link just above.

I’ll be speaking on the last panel of the day, which deals with measuring transparency success. Likely, they chose me for this panel because I’ve already been grading the government on its publication practices.

Last September, you see, we graded Congress on how well it publishes data that would assist the public in computer-aided oversight. The summary blog post is called “Needs Improvement.” And then in December, we graded the government on publication of budget, appropriations, and spending data. That’s a joint legislative-executive responsibility, but mostly executive. The message was: “‘Needs Improvement’ is Understatement.”

How do you grade Congress and the government on their data publication?

You start out by modeling the data government should publish. We put together a data model for legislative process, for example, and then a data model for budgeting, appropriating, and spending. We got a great deal of help from folks at the Sunlight Foundation, OMB Watch, and others such as the National Priorities Project, as well as data guru Josh Tauberer, whose latest project is PopVox.

Even with all this help, these models won’t be the last word—there is much to learn yet about the data structure that will serve every use the public may want to make of information. But it’s a strong start.

Then we compared the data that’s actually out there to the practices described in my paper, “Publication Practices for Transparent Government,” and out popped the grades! They were pretty bad…

The House of Representatives aims to fix that—for its part, at least.

Now to this spreadsheet: it’s a list of the things that should be identified in congressional documents so that computers can find the most salient information in them. It also indicates the “vocabularies” that already exist for identifying many of them: members of Congress, bills, laws, statutes, committees, agencies, programs, and so on. We’ve talked about how to identify “budget authority” and appropriations (spending) so that computers can capture that information from bills and committee reports. Locations, state and foreign governments, times, meetings—all these things can be put into electronic versions of documents to allow computer-aided public oversight.
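To make the idea concrete, here is a minimal sketch of what tagged entities in a bill could look like and how a program might pull them out. The markup is entirely hypothetical: the `entity` element, its `type`, `ref`, and `amount` attributes, and identifiers like `agency:labor` are illustrative assumptions, not the vocabularies listed in the spreadsheet or any format the House has adopted.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element names, attributes, and identifiers below are
# assumptions made for illustration, not an actual House or Senate schema.
BILL_XML = """
<bill id="example-bill">
  <section>
    The <entity type="agency" ref="agency:labor">Department of Labor</entity>
    is authorized <entity type="budget-authority" amount="5000000">$5,000,000</entity>
    to carry out <entity type="program" ref="program:example">the Example Program</entity>.
  </section>
</bill>
"""


def extract_entities(xml_text):
    """Return (type, identifier or amount, display text) for each tagged entity."""
    root = ET.fromstring(xml_text.strip())
    found = []
    for node in root.iter("entity"):
        # Use the vocabulary reference if present, otherwise the dollar amount.
        key = node.get("ref") or node.get("amount")
        found.append((node.get("type"), key, (node.text or "").strip()))
    return found


entities = extract_entities(BILL_XML)
for kind, key, text in entities:
    print(kind, key, text)
```

Once every document carries identifiers like these, a computer no longer has to guess that “Department of Labor” and “Labor Department” are the same thing. It just reads the identifier.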

Once documents contain data like this in the proper structures, literally thousands of questions about Congress will be answered instantly.

  • How much new budget authority has each member of Congress proposed? Voted for? Voted against? Allowed to go through on voice vote or unanimous consent? How about this same information by state? By region? Or by seniority?
  • What title of the U.S. Code do members of Congress most often propose to amend? What title do they actually amend the most?
  • What bills affect my state specifically, such as by naming buildings, creating wilderness areas, changing boundaries on parks, or giving land to localities?
  • How often do my member of Congress and senators break with their party?
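As a rough illustration of how quickly such questions fall out of structured data, here is a sketch that answers the last one, how often a member breaks with their party. The vote records, member names, and party labels are made up, and the definition of a “break” (voting against the majority position of one’s own party on a given vote) is an assumption chosen for simplicity.

```python
from collections import Counter

# Hypothetical roll-call records: (vote_id, member, party, position).
# Names, parties, and positions are invented for illustration.
VOTES = [
    ("v1", "Member A", "X", "yea"),
    ("v1", "Member B", "X", "yea"),
    ("v1", "Member C", "X", "nay"),
    ("v1", "Member D", "Y", "nay"),
    ("v2", "Member A", "X", "nay"),
    ("v2", "Member B", "X", "nay"),
    ("v2", "Member C", "X", "nay"),
    ("v2", "Member D", "Y", "yea"),
]


def party_break_rate(votes, member):
    """Share of a member's votes cast against their own party's majority position."""
    # Tally positions per (vote, party) to find each party's majority position.
    tallies = {}
    for vote_id, _, party, position in votes:
        tallies.setdefault((vote_id, party), Counter())[position] += 1

    cast = 0
    breaks = 0
    for vote_id, name, party, position in votes:
        if name != member:
            continue
        cast += 1
        # Ties are resolved arbitrarily by Counter; a real analysis would
        # need an explicit rule for them.
        majority = tallies[(vote_id, party)].most_common(1)[0][0]
        if position != majority:
            breaks += 1
    return breaks / cast if cast else 0.0


print(party_break_rate(VOTES, "Member C"))
```

The same few lines work for every member at once, which is the point: the hard part is getting the data published in a usable structure, not the arithmetic.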

These are just a few examples. In the hands of varied users, the data will be converted to hundreds or thousands of uses. It will go into studies performed by political scientists and it will supercharge news reporting. But more importantly, it will go into services that inform people directly and quickly about how their own representatives in Congress are acting and what they’re saying.

It will give people insight into where the money goes—from the moment new spending is proposed all the way through to when Congress spends it—or declines to spend.

Credit is due to the leadership in the House of Representatives for starting this work. There is a lot to do before they show clear success. But they are way ahead of President Obama, whose Sunlight Before Signing transparency promise lags badly, and who has yet to put together a machine-readable organization chart for the executive branch of the federal government. He can easily do the latter, and coordination with Congress is essential for transparency success. The sooner that happens the better.

House Transparency Slated to Improve

Perhaps my mean grading has contributed to nascent competition between the Republican House and the Democratic administration for the transparency prize. Last Friday, the House Administration Committee adopted standards that “require all House legislative documents be published electronically in an open, searchable format on one centralized website.”

At a September Cato Capitol Hill briefing, I rated Congress on the quality of the data it publishes reflecting its membership, activities, documents, and decisions. Its grades weren’t that good. At a briefing last week, I graded the data about federal budgeting, appropriations, and spending, which is largely an executive branch responsibility. Those grades weren’t very good either.

Able and dogged transparency advocate Daniel Schuman at the Sunlight Foundation has a good write-up of the House’s move to produce good data (he and Sunlight certainly did their part to encourage it), though I’ll quibble with one particular. The adoption of the document, a two-page outline of what should be standardized and not a standards document itself, was not really “a tremendous step into the 21st Century.” It was an outline of a course to improved transparency. 21st-Century transparency.

What is required to produce that transparency? My recent paper “Publication Practices for Transparent Government” sought to establish guideposts for publication of data that will foster public access to meaningful information about what happens in Washington, D.C. The practices, in ascending order of importance and difficulty, are: authority, availability, machine-discoverability, and machine-readability.

Putting all documents on a single site will enhance authority. People will know where to look, and what source to trust. In our rough grading system, we weighted the simple practice of authoritative publishing at 10% of the total grade.

The second practice, availability, means ensuring that the data is complete, that it remains permanently in the same location, that it is not proprietary itself, and that it is not in a proprietary format. This is likely to be fulfilled by adherence to the Committee’s language and basic good practices. Availability we weighted at 20% of the total grade.

Machine-discoverability means that data is identified and located according to good practices for naming and locating Internet resources. It’s weighted at 30% of the total grade in our system for rating data publication. It is likely that the House will develop good practices, but it will be important to watch and see that it does.

Machine-readability is the most important part of transparency. It means publishing data so that the logical relationships among elements are clear, and so that computers can automatically detect the semantic meaning of the documents and data they examine.
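The weighting described above can be sketched as a simple weighted sum. The 10%, 20%, and 30% weights come from the text. The 40% for machine-readability is an assumption: it is only the share left over after the other three, not a figure stated here. The per-practice scores in the example are invented, not actual grades.

```python
# Weights in percent. The first three are taken from the text above.
WEIGHTS = {
    "authority": 10,
    "availability": 20,
    "machine_discoverability": 30,
    # Assumption: not stated in the text; this is just the remaining share.
    "machine_readability": 40,
}


def overall_score(scores):
    """Combine per-practice scores (0-100) into a weighted total (0-100)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Invented scores for illustration only.
example = {
    "authority": 90,
    "availability": 80,
    "machine_discoverability": 50,
    "machine_readability": 20,
}
total = overall_score(example)
print(total)
```

With weights like these, a publisher that does well on the easy practices and poorly on the hard ones still ends up with a poor overall score.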

Machine-readability is where the House Administration Committee’s release is least clear. Documents like bills and committee reports could be published so that each reference to existing law, to federal agencies, bureaus, and programs, to newly authorized spending, and to a variety of other items and entities is automatically discoverable in the document.

You should be able to do a quick search, rather than labor for hours, to see what bills affect the Labor Department. You should be able to see every dollar authorized or appropriated in every bill, nearly instantly. The data should be a foundation for dozens of sites and services that disseminate information in different ways to different audiences.
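Here is a minimal sketch of what that quick search could look like once bills carry machine-readable references. The bill records, the `agency:labor` style identifiers, and the dollar figures are all hypothetical. They stand in for whatever structured data the House ends up publishing.

```python
# Hypothetical bill records, as they might look after being extracted from
# machine-readable documents. All identifiers and amounts are invented.
BILLS = [
    {"id": "bill-1", "agencies": {"agency:labor", "agency:education"},
     "authorized": [5_000_000, 250_000]},
    {"id": "bill-2", "agencies": {"agency:interior"},
     "authorized": []},
    {"id": "bill-3", "agencies": {"agency:labor"},
     "authorized": [1_000_000]},
]


def bills_affecting(bills, agency):
    """List the ids of bills that reference the given agency identifier."""
    return [bill["id"] for bill in bills if agency in bill["agencies"]]


def total_authorized(bills):
    """Map each bill id to the sum of dollar amounts it authorizes."""
    return {bill["id"]: sum(bill["authorized"]) for bill in bills}


print(bills_affecting(BILLS, "agency:labor"))
print(total_authorized(BILLS))
```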

Here’s hoping that the House Administration Committee’s standards drive all the way to machine-readability. It will be a step into the 21st century if the House provides data the Internet can use and that the Internet-connected public very much wants to see.

Coming through with robust machine-readability will handily take the transparency mantle from President Obama, who promised transparency as a campaigner, but who has not produced the vibrant, different government people wanted. As I noted in a write-up last week, the administration has some low-hanging transparency fruit that could bring its grades up decisively. House Republicans are first out of the gate.