"Needs improvement" is the understated theme of a Capitol Hill briefing this morning entitled "Publication Practices for Transparent Government: Rating the Congress." (It is being live-streamed starting at 9:00 am. If you're in time, check it out; the video will also be posted before too long. Join the conversation on Twitter at the #RateCongress hashtag.)
Congress needs to improve its data publication practices if it's going to be the transparent legislature that it should be.
How did we arrive at this conclusion? We're doing more than stating the obvious.
A Cato Briefing Paper released today entitled "Publication Practices for Transparent Government" goes through some technically challenging but essential concepts in data publication: authoritative sourcing, availability, machine-discoverability, and machine-readability. Together, these practices will allow computers to automatically generate the myriad stories that the data Congress produces have to tell. Following these practices will allow many different users to put the data to hundreds of new uses in government oversight.
At the event, we're releasing informal grades that rate how well each of the major parts of the legislative process is published as data. To produce the grades, we constructed a "data model" of formal federal legislative processes (HTML version, Word version).
Data modeling is pretty arcane stuff, but in this model we reduced everything to "entities," each having various "properties." The entities and their properties describe the logical relationships of things in the real world, like members of Congress, votes, bills, and so on. We also loosely defined several "markup types" guiding how documents that come out of the legislative process should be structured and published.
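To make the entity-and-property idea concrete, here is a minimal sketch of three entities and the relationships between them. The class names, field names, and the "hr1234" bill-ID scheme are illustrative assumptions, not the property names used in the actual data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical entities; field names are illustrative, not the data model's own.

@dataclass
class Member:
    bioguide_id: str   # unique ID for the person (placeholder values below)
    name: str
    chamber: str       # "House" or "Senate"
    state: str

@dataclass
class Bill:
    bill_id: str       # hypothetical ID scheme, e.g. "hr1234"
    title: str
    sponsor_id: str    # refers to a Member.bioguide_id
    cosponsor_ids: List[str] = field(default_factory=list)

@dataclass
class Vote:
    vote_id: str
    bill_id: Optional[str]  # the bill voted on, if any
    positions: Dict[str, str] = field(default_factory=dict)  # member ID -> "Yea"/"Nay"

def sponsor_of(bill: Bill, members: Dict[str, Member]) -> Member:
    """Follow the bill -> member relationship through the shared ID."""
    return members[bill.sponsor_id]

members = {"X000001": Member("X000001", "Jane Example", "House", "VA")}
bill = Bill("hr1234", "An Example Act", sponsor_id="X000001")
vote = Vote("2012-101", bill.bill_id, {"X000001": "Yea"})
print(sponsor_of(bill, members).name)  # Jane Example
```

The point is that properties like `sponsor_id` hold identifiers of other entities, so a computer can move from a bill to its sponsor to that sponsor's votes without human interpretation.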
Then we compared the publication practices in the briefing paper to the "entities" in the model. Are data about the key entities in the legislative process well published? That's what we graded on, with a little commentary pointing toward what is good and bad in current publication practices. The grades are listed on this report card, which you can use to cut to the chase, but the real story is in the assessment below.
Are we stating the obvious? Yes. But a little humility and grace are in order. This stuff is tough sledding. The data model isn't the last word, and there are things happening in various places on and around Capitol Hill to improve matters. Nobody has ever talked about publishing several pieces of the legislative process as data before, so we forgive the fact that this isn't already being done. If things haven't improved in another year, then you might start to see a little more piquant commentary.
Without further ado, here is the full listing of Congress' transparency grades. When it comes to data publication for transparency, Congress needs improvement.
Publication Practices for Transparent Government: Rating the Congress
How well can the Internet access data about Congress’ work? In consultation with transparency experts, the Cato Institute’s director of information policy studies, Jim Harper, rated how Congress publishes key legislative data in terms of authoritative sourcing, availability, machine-discoverability, and machine-readability. These criteria envision a world where there is one authoritative source for each category of information. Unfortunately, today’s congressional data are published by a lot of sources that have grown up haphazardly. There might even be some sources we don’t know about. Future grades will undoubtedly reflect improvements in what researchers, reporters, websites, and the public at large can see and use, aided by their computers.
House and Senate Membership: B+
How does the public find out about who holds office in the House of Representatives and Senate? A couple of ways.
The Biographical Directory of the United States Congress is a compendium of information about all present and former members of the United States Congress (as well as the Continental Congress), including delegates and resident commissioners. The “Bioguide” website is a great resource for searching out historical information.
But there’s no sign that it’s Congress’ repository of record, and it’s little known by users, giving it low authority marks. Bioguide scores highly on availability—we know of no problems with up-time or completeness (though it could use quicker updating when new members are elected).
Bioguide isn’t structured for discoverability. Most people haven’t seen it, because search engines aren’t finding it. Bioguide does a good thing in terms of machine-readability, though. It assigns a unique ID to each of the people in its database. This is the first, basic step in machine-readability, and the Bioguide ID should probably be the standard for machine-identification of elected officials wherever they are referred to in data. Unfortunately, the biographical content in Bioguide is not machine-readable.
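Why a shared ID matters: when two data sets published by different offices refer to people by the same identifier, a program can join them without guessing from names, which vary ("Jim," "James," "Rep. Smith"). The sketch below assumes two hypothetical data sets keyed by Bioguide-style IDs; the IDs are placeholders, not real members.

```python
# Two hypothetical data sets from different publishers, both keyed by the
# same Bioguide-style member ID (placeholder values).
committee_roster = {
    "X000001": "Committee on Example Affairs",
    "X000002": "Committee on Example Affairs",
}
vote_record = {
    "X000001": "Yea",
    "X000002": "Nay",
    "X000003": "Yea",
}

def votes_by_committee_members(roster, votes):
    """Return each committee member's vote, matched on the shared ID."""
    return {mid: votes[mid] for mid in roster if mid in votes}

print(votes_by_committee_members(committee_roster, vote_record))
# {'X000001': 'Yea', 'X000002': 'Nay'}
```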
The other ways of learning about House and Senate membership are nothing if not ad hoc. The lists of members that appear on the House and Senate websites are adequate for some purposes. They’re authoritative, available, and discoverable due to their prime location on the top-level House and Senate domains. But the HTML presentation on the House side does not break out key information in ways useful for computers. The Senate includes a link to an XML representation that is machine-readable. Good job, Senate.
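What an XML listing buys a programmer can be sketched with Python's standard-library XML parser. The XML below is a simplified, hypothetical stand-in: its element names are assumptions and do not reproduce the Senate's actual schema.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical membership XML (element names are assumptions).
SAMPLE = """
<members>
  <member>
    <last_name>Example</last_name>
    <first_name>Jane</first_name>
    <party>D</party>
    <state>VA</state>
  </member>
  <member>
    <last_name>Sample</last_name>
    <first_name>John</first_name>
    <party>R</party>
    <state>OH</state>
  </member>
</members>
"""

def parse_members(xml_text):
    """Turn each <member> element into a dict of its child fields."""
    root = ET.fromstring(xml_text.strip())
    return [{child.tag: child.text for child in m} for m in root.iter("member")]

def count_by_party(members):
    """Tally members per party code."""
    counts = {}
    for m in members:
        counts[m["party"]] = counts.get(m["party"], 0) + 1
    return counts

print(count_by_party(parse_members(SAMPLE)))  # {'D': 1, 'R': 1}
```

Doing the same with an HTML page means scraping presentation markup, which breaks whenever the page layout changes.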
The rest of the information flows to the public via congressmembers’ individual websites. These sites are not authoritative, but search engines, by crawling them, effectively stitch them together into a record of the Congress’s membership. They are available and discoverable, again because of that prime house.gov and senate.gov real estate. But they only reveal data about the membership of Congress incidentally to communicating the press releases, photos, and announcements that representatives want to have online.
So far there is no authoritative, really well-published source of information about House and Senate membership, but the variety of sources that exist combine to give Congress a pretty good grade on publishing information about who represents Americans in Washington, D.C.
Committees and Subcommittees: C
If you want to find out about the committees to which Congress delegates much of its work, and the subcommittees to which the work gets further distributed, you might have to form a commit— … a search party.
The Senate has committee names and URLs prominently available on its main website, and the House does too. But that would just be the starting point for researching what all these committees do and who serves on them. For that, you’d go to individual committee websites, each one different from the others.
With the data scattered about this way, the Internet can’t really see it. The Senate has a little-known machine-readable listing of its membership and their assignments. More prominence, data such as subcommittees and jurisdiction, and use of a recognized set of standard identifiers would take this resource a long way.
Without a recognized place to go to get data about committees, this area suffers from a lack of authority. To the extent there are data, availability is not a problem, but machine-discoverability suffers because each committee separately publishes, in formats like HTML, who its members are, who its leaders are, and what its jurisdiction is.
Until committee data are centrally published using standard identifiers (for both committees and their members), machine-readability will be very low. The Internet makes sense of congressional committees as best it can, but a whole lot of organizing and centralizing—with a definitive, always-current, and machine-readable record of committees, their memberships, and their jurisdictions—would create a lot of clarity in this area with a minimum of effort.
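A centralized committee record of the kind described above might look like the sketch below. The committee codes, member IDs, and jurisdiction keywords are hypothetical placeholders; the point is that one registry, keyed by standard identifiers, answers questions that today require visiting many separate committee websites.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Committee:
    code: str                # hypothetical standard committee code
    name: str
    chamber: str
    member_ids: List[str]    # Bioguide-style member IDs (placeholders)
    jurisdiction: List[str]  # subject keywords (illustrative)
    parent_code: str = ""    # set only for subcommittees

REGISTRY = [
    Committee("XCOM", "Committee on Examples", "Senate",
              ["X000001", "X000002"], ["examples", "samples"]),
    Committee("XCOM01", "Subcommittee on Small Examples", "Senate",
              ["X000001"], ["small examples"], parent_code="XCOM"),
]

def assignments(member_id, registry):
    """All committee and subcommittee codes a member sits on."""
    return [c.code for c in registry if member_id in c.member_ids]

def subcommittees(code, registry):
    """Codes of subcommittees under a given parent committee."""
    return [c.code for c in registry if c.parent_code == code]

print(assignments("X000001", REGISTRY))  # ['XCOM', 'XCOM01']
print(subcommittees("XCOM", REGISTRY))   # ['XCOM01']
```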
Meetings of House, Senate, and Committees (Senate: B+ / House: D+)
When the House, the Senate, committees, and subcommittees have their meetings, the business of Congress is being done. Can the public learn easily about what meetings are happening, when, and what they are about? It depends on which side of the Capitol you’re on.
The Senate is pretty good about publishing notices of committee meetings. In addition to a webpage with meeting notices on it, it publishes an XML page with lots of good features, like distinct codes for each committee. (If only we knew whose codes they were, and if only they could be used consistently throughout legislative data….) If a particular bill is under consideration in a Senate committee meeting, this is a way for the public to learn about it. This is authoritative, it’s available, it’s machine-discoverable, and it’s got some machine-readable features. That means any website, researcher, or reporter can quickly use these data to generate more—and more useful—information about Congress.
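Here is a sketch of how a reporter's script might use a meeting-notice feed to find every upcoming meeting of one committee. The XML is hypothetical: the `committee`, `date`, and `time` attributes and the `<matter>` element are assumptions, not the Senate's published schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical meeting-notice XML (element and attribute names are assumed).
NOTICES = """
<meetings>
  <meeting committee="XCOM" date="2012-05-01" time="10:00">
    <matter>H.R. 1234, An Example Act</matter>
  </meeting>
  <meeting committee="YCOM" date="2012-05-02" time="14:30">
    <matter>Nomination hearing</matter>
  </meeting>
</meetings>
"""

def meetings_for(committee_code, xml_text):
    """Return (date, time, matter) for each meeting of one committee."""
    root = ET.fromstring(xml_text.strip())
    return [
        (m.get("date"), m.get("time"), m.findtext("matter"))
        for m in root.iter("meeting")
        if m.get("committee") == committee_code
    ]

print(meetings_for("XCOM", NOTICES))
# [('2012-05-01', '10:00', 'H.R. 1234, An Example Act')]
```

The filter works only because each notice carries a distinct committee code; if those codes were shared across all legislative data, the same key could link meetings to rosters and reports.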
The House doesn’t have anything similar. To learn about meetings of its committees, you might have to scroll through page after page of committee announcements or calendars. The House can catch up with the Senate in this area, and we are aware that they are working on it. This area is ripe for rapid improvement.
Meeting Records: C-
There is lots of work to do before meeting records can be called transparent. We have one thing, the Congressional Record. It is the authoritative record of what transpires on the House and Senate floors, but nothing similar reveals the content of committee meetings. Those meeting records are produced after much delay—sometimes an incredibly long delay—by the committees themselves. These records are obscure, not being published in ways that make things easy for computers to find and to comprehend.
The Congressional Record also doesn’t have the machine-discoverable publication or machine-readable structure that it could and should. Giving unique, consistent IDs in the Record to members of Congress, to bills, and to other regular subjects of this publication would go a long way to improving it. The same IDs would improve transcripts of committee meetings.
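Without IDs in the Record, anyone wanting to link it to bills has to guess from plain text. The sketch below does that with a regular expression for two bill-citation forms; the normalized IDs ("hr1234", "s567") are a hypothetical scheme. Note the lookbehind needed just to avoid misreading "U.S.C." as a Senate bill: after-the-fact parsing is fragile, which is why IDs embedded at the source are better.

```python
import re

# Matches "H.R. 1234" or "S. 567"; the lookbehind keeps "U.S.C." from
# matching as "S." (a typical pitfall of parsing free text).
BILL_REF = re.compile(r"(?<![\w.])(H\.\s?R\.|S\.)\s?(\d+)")

def bill_ids(text):
    """Return hypothetical normalized IDs for bill citations found in text."""
    ids = []
    for m in BILL_REF.finditer(text):
        prefix = "hr" if m.group(1).startswith("H") else "s"
        ids.append(prefix + m.group(2))
    return ids

record = "Mr. Example moved to proceed to H.R. 1234 and S. 567, citing 42 U.S.C. 1983."
print(bill_ids(record))  # ['hr1234', 's567']
```

Even this handles only two citation styles; resolutions, amendments, and informal references ("the farm bill") would slip through.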
Another form of meeting record exists: videos. These have yet to be standardized, organized, and published in a reliable and uniform way, though the HouseLive site is a significant step in the right direction. Real-time flagging of members and key subjects of debate in the video stream would be a great improvement in transparency. Setting video and video metadata standards for use by both Houses of Congress, by committees, and by subcommittees would improve things dramatically.
Committee Reports: D+
Committee reports are important parts of the legislative process, documenting the findings and recommendations that committees report to the full House and Senate. They do see publication on the most authoritative resource for committee reports, the Library of Congress’s THOMAS system. They are technically machine discoverable, but without good semantic information embedded in them, committee reports are barely visible to the Internet.
Rather than being published only in HTML and PDF, committee reports should be published with the full array of signals that reveal what bills, statutes, and agencies they deal with, as well as authorizations and appropriations, so that the Internet can discover and make use of these documents.
Bills are a “pretty-good-news” story in legislative transparency. Most are promptly published. It would be better, of course, if they were all immediately published at the moment they were introduced, and if both the House and Senate published last-minute, omnibus bills before debating and voting on them.
A small gap in authority exists around bills: people look to the Library of Congress rather than Congress or the Government Printing Office, which are better authorities for bill content, but this has not caused any problems. Once published, bill information remains available, which is good.
Publication of bills in HTML on the THOMAS site makes them reasonably machine-discoverable. Witness the fact that searching for a bill will often turn up the version at that source.
Where bills could improve is machine-readability. Some information, such as sponsorship and U.S. Code references, is present in the bills that are published in XML, and nearly all bills are now published in XML, which is great. Much more information should be published machine-readably in bills, though, such as references to agencies and programs, to states or localities, and so on, referred to using standard identifiers.
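The sketch below shows how a program could pull facts out of a bill once they are marked up. The XML is simplified and hypothetical: `<sponsor member-id>` and `<usc-ref>` stand in for the sponsorship and U.S. Code markup the text says exists today, and `<agency-ref>` with its placeholder ID illustrates the kind of standard-identifier reference the text asks for. None of these element names are taken from the actual bill schema.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical bill XML (element/attribute names are assumed).
BILL_XML = """
<bill id="hr1234">
  <sponsor member-id="X000001">Ms. Example</sponsor>
  <section>
    <text>Section 101 of title 42, United States Code, is amended...</text>
    <usc-ref title="42" section="101"/>
  </section>
  <section>
    <agency-ref id="AGENCY-001">Department of Examples</agency-ref>
  </section>
</bill>
"""

def bill_facts(xml_text):
    """Extract sponsor, U.S. Code citations, and agency references."""
    root = ET.fromstring(xml_text.strip())
    return {
        "bill": root.get("id"),
        "sponsor": root.find("sponsor").get("member-id"),
        "usc": [f'{r.get("title")} USC {r.get("section")}' for r in root.iter("usc-ref")],
        "agencies": [a.get("id") for a in root.iter("agency-ref")],
    }

print(bill_facts(BILL_XML))
```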
With the work that the THOMAS system does to gather information in one place, bill data are good. This is relative to other, less-well-published data, though. There is still room for improvement.
Amendments (House and Senate: C / Committees: I)
Amendments are not the good-news story that bills are. With a few exceptions, amendments are hard to track in any systematic way. When it comes to the House and Senate floors, amendment text is often available, but the authoritative source differs depending on whether you want the text (GPO) or the status (THOMAS) of an amendment. It is very hard to see how amendments affect the bills they would change.
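If amendments were published as machine-readable instructions tied to the bill text they change, a computer could show their effect as a redline. The sketch below assumes a single hypothetical "strike X, insert Y" instruction and uses Python's standard `difflib` to display the result; real amendments are far more varied, so this illustrates the idea rather than a workable amendment processor.

```python
import difflib

def apply_amendment(bill_text, strike, insert):
    """Apply one hypothetical 'strike X, insert Y' instruction."""
    if strike not in bill_text:
        raise ValueError("text to strike not found in bill")
    return bill_text.replace(strike, insert, 1)

bill = "SEC. 2. FUNDING.\nThere is authorized $10,000,000 for fiscal year 2013.\n"
amended = apply_amendment(bill, "$10,000,000", "$5,000,000")

# Line-by-line redline of the bill before and after the amendment.
diff = list(difflib.unified_diff(bill.splitlines(), amended.splitlines(),
                                 "bill", "as amended", lineterm=""))
for line in diff:
    print(line)
```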
In committees, the story is quite a bit worse. Committee amendments are almost completely opaque. There is almost no publication of amendments at all—certainly not amendments that have been withdrawn or defeated. Some major revisions in process are due if committee amendments are going to see the light of day as they should.
When the House, the Senate, or a committee is going to take some kind of action, it does so on the basis of a motion. If the public is going to have insight into the decisions Congress makes, it should have access to the motions on which Congress acts.
But motions are something of a black hole. Many of them can be found in the Congressional Record, but finding them really takes a human who understands procedure reading through the Congressional Record. That’s not modern transparency.
Motions can be articulated as data. There are distinct types of motions. Congress can publish which meeting a motion occurs in, when the motion occurs, what the proposition is, what the object of the motion is, and so on. Along with decisions, motions are key elements of the legislative process. They can and should be published as data.
When a motion is pending, a body such as the House, the Senate, or a committee will make a decision on it, often using votes. These decisions are crucial moments in the legislative process, which should be published as data. Like motions, these are not yet published usefully. Decisions made in the House or Senate are published in text form as part of the Congressional Record, but they are not published as data, so they remain opaque to the Internet.
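Here is a minimal sketch of motions and decisions as linked data records, using the properties the text lists: motion type, meeting, time, proposition, and object. All field names, IDs, and type labels are hypothetical. Once each decision points back to its motion, a program can report everything that happened to a given bill or amendment.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Motion:
    motion_id: str
    motion_type: str   # e.g. "motion to table" (vocabulary assumed)
    meeting_id: str    # floor session or committee meeting (hypothetical ID)
    timestamp: str     # ISO 8601 time the motion was made
    proposition: str   # what the body is asked to do
    object_id: str     # the bill, amendment, etc. the motion acts on

@dataclass
class Decision:
    motion_id: str     # links back to the Motion decided
    method: str        # e.g. "recorded vote", "voice vote"
    result: str        # e.g. "agreed to", "rejected"
    vote_id: Optional[str] = None  # set only when a recorded vote was taken

def fate_of(object_id, motions, decisions):
    """List (motion type, result) for every decided motion on an object."""
    by_motion = {d.motion_id: d for d in decisions}
    return [(m.motion_type, by_motion[m.motion_id].result)
            for m in motions
            if m.object_id == object_id and m.motion_id in by_motion]

motions = [Motion("M1", "motion to table", "MTG-1", "2012-05-01T10:15:00",
                  "That the amendment be laid on the table", "AMDT-9")]
decisions = [Decision("M1", "voice vote", "agreed to")]
print(fate_of("AMDT-9", motions, decisions))  # [('motion to table', 'agreed to')]
```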
Voting puts members of Congress on record about where they stand. And happily, vote information is in pretty good shape. Each chamber publishes data about votes, meaning authority is well handled. Vote data are available and timely.
Both sides could sorely use an index that lists all votes, though, along with an indication of the last time the vote was modified (i.e., if corrections to the original data have been posted). But both the House and Senate produce vote information in XML, which is useful for computers and the Internet. Both houses also use unique identifiers for their members, though they’re not so good at indicating who those unique IDs refer to. (The House does not have a list-of-people database, and the Senate uses lis_member_ids rather than Bioguide IDs.) Overall, though, voting data are pretty well handled.
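Two improvements suggested above, a vote index with last-modified times and a clear mapping from chamber-specific member IDs to Bioguide IDs, can be sketched as follows. The vote IDs, timestamps, and member IDs are placeholders; the chamber-specific IDs merely stand in for values like the Senate's lis_member_ids.

```python
from datetime import datetime

# Hypothetical vote index: vote ID -> last-modified time (ISO 8601).
VOTE_INDEX = {
    "2012-101": "2012-05-01T16:02:00",
    "2012-102": "2012-05-03T09:30:00",  # corrected after first posting
    "2012-103": "2012-05-02T11:45:00",
}

# Hypothetical crosswalk: chamber-specific ID -> Bioguide-style ID.
CROSSWALK = {"S900": "X000001", "S901": "X000002"}

def changed_since(index, last_checked):
    """Vote IDs modified after the last time a client checked."""
    cutoff = datetime.fromisoformat(last_checked)
    return sorted(v for v, ts in index.items()
                  if datetime.fromisoformat(ts) > cutoff)

def to_bioguide(positions, crosswalk):
    """Re-key a vote's positions from chamber IDs to Bioguide-style IDs."""
    return {crosswalk[k]: v for k, v in positions.items() if k in crosswalk}

print(changed_since(VOTE_INDEX, "2012-05-02T00:00:00"))  # ['2012-102', '2012-103']
print(to_bioguide({"S900": "Yea", "S901": "Nay"}, CROSSWALK))
```

With an index like this, a website mirroring vote data can re-fetch only the votes that changed instead of re-downloading everything to catch corrections.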
Communications (Inter- and Intra-Branch): I
The messages sent among the House, Senate, and Executive Branch are essential parts of the legislative process, but they do not see publication. Putting these communications online—including unique identifiers, the sending and receiving body, any meeting that produced the communication, the text of the communication, and key subjects such as bills—would complete the picture that is available to the public.
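A communication record carrying the elements listed above could be as simple as the following sketch, serialized to JSON for publication. The field names, IDs, and choice of JSON are assumptions for illustration only.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional

@dataclass
class Communication:
    comm_id: str               # unique identifier (hypothetical)
    sender: str                # e.g. "Senate"
    recipient: str             # e.g. "House" or "Executive Branch"
    meeting_id: Optional[str]  # meeting that produced it, if any
    text: str
    subject_ids: List[str] = field(default_factory=list)  # bills and other subjects

msg = Communication("COMM-1", "Senate", "House", "MTG-7",
                    "The Senate has passed H.R. 1234 with an amendment.",
                    ["hr1234"])
print(json.dumps(asdict(msg), indent=2))
```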