Tag: transparency

House Transparency Slated to Improve

Perhaps my mean grading has contributed to nascent competition between the Republican House and the Democratic administration for the transparency prize. Last Friday, the House Administration Committee adopted standards that “require all House legislative documents be published electronically in an open, searchable format on one centralized website.”

At a September Cato Capitol Hill briefing, I rated Congress on the quality of the data it publishes reflecting its membership, activities, documents, and decisions. Its grades weren’t that good. At a briefing last week, I graded the data about federal budgeting, appropriations, and spending, which is largely an executive branch responsibility. Those grades weren’t very good either.

Able and dogged transparency advocate Daniel Schuman at the Sunlight Foundation has a good write-up of the House’s move to produce good data (he and Sunlight certainly did their part to encourage it), though I’ll quibble with one point. The adoption of the document, a two-page outline of what should be standardized rather than a standards document itself, was not really “a tremendous step into the 21st Century.” It outlined a course toward improved transparency: 21st-century transparency.

What is required to produce that transparency? My recent paper “Publication Practices for Transparent Government” sought to establish guideposts for publication of data that will foster public access to meaningful information about what happens in Washington, D.C. The practices, in ascending order of importance and difficulty, are: authority, availability, machine-discoverability, and machine-readability.

Putting all documents on a single site will enhance authority. People will know where to look, and what source to trust. In our rough grading system, we weighted the simple practice of authoritative publishing at 10% of the total grade.

The second practice, availability, means ensuring that the data is complete, that it remains permanently in the same location, that it is not proprietary itself, and that it is not in a proprietary format. This is likely to be fulfilled by adherence to the Committee’s language and basic good practices. Availability we weighted at 20% of the total grade.

Machine-discoverability means that data is identified and located in keeping with a variety of good practices for naming and locating Internet resources. It’s weighted at 30% of the total grade in our system for rating data publication. The House is likely to develop good practices here, but it will be important to watch and make sure that it does.
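To make consistent naming a little more concrete, here is a minimal Python sketch of a predictable address scheme for bills. The domain, the path layout, and the `bill_uri` function are all hypothetical, invented for illustration; they are not the House’s actual URL structure.

```python
# Hypothetical naming scheme: the domain and path layout are invented
# to illustrate predictable, guessable resource addresses.
BASE_URL = "https://legislation.example.gov"

def bill_uri(congress: int, bill_type: str, number: int, version: str = "latest") -> str:
    """Build a stable address for one version of a bill from facts a user already knows."""
    if congress < 1 or number < 1:
        raise ValueError("congress and bill number must be positive")
    # Normalize "H.R." / "H R" / "hr" to a single lowercase token.
    token = bill_type.lower().replace(".", "").replace(" ", "")
    return f"{BASE_URL}/{congress}/{token}/{number}/{version}"

print(bill_uri(112, "H.R.", 2017))
# prints https://legislation.example.gov/112/hr/2017/latest
```

The point of a scheme like this is that a person or a program can compute where a document lives (Congress, bill type, number) without searching for it.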

Machine-readability is the most important part of transparency. It means publishing data so that the logical relationships among elements are clear, and so that computers can automatically detect the semantic meaning of the documents and data they examine.

This is where the House Administration Committee’s release is least clear. Documents like bills and committee reports could be published so that each reference to existing law; to federal agencies, bureaus, and programs; to newly authorized spending; and to a variety of other items and entities is automatically discoverable in the document.

You should be able to do a quick search, rather than labor for hours, to see what bills affect the Labor Department. You should be able to see every dollar authorized or appropriated in every bill, nearly instantly. The data should be a foundation for dozens of sites and services that disseminate information in different ways to different audiences.
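As an illustration of what tagged references would enable, here is a minimal Python sketch that assumes bills are published as XML with agency references and appropriation amounts explicitly marked up. The element and attribute names (`agency`, `ref`, `appropriation`, `amount`), the bill IDs, and the dollar figure are hypothetical, not an official House schema or real data.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element/attribute names and bill IDs are invented for illustration.
BILLS = {
    "hr-1001": """<bill>
      <section>Funds for <agency ref="labor">the Department of Labor</agency>:
        <appropriation amount="5000000">$5,000,000</appropriation></section>
    </bill>""",
    "hr-1002": """<bill>
      <section>Amends rules for <agency ref="interior">the Department of the Interior</agency>.</section>
    </bill>""",
}

def bills_affecting(agency_ref: str) -> list:
    """Return IDs of bills that contain a tagged reference to the given agency."""
    hits = []
    for bill_id, xml_text in BILLS.items():
        root = ET.fromstring(xml_text)
        if any(a.get("ref") == agency_ref for a in root.iter("agency")):
            hits.append(bill_id)
    return hits

def dollars_in(bill_id: str) -> int:
    """Sum every tagged appropriation amount in one bill."""
    root = ET.fromstring(BILLS[bill_id])
    return sum(int(a.get("amount")) for a in root.iter("appropriation"))

print(bills_affecting("labor"))  # ['hr-1001']
print(dollars_in("hr-1001"))     # 5000000
```

With markup like this, the “quick search” is a loop over elements rather than hours of reading.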

Here’s hoping that the House Administration Committee’s standards drive all the way to machine-readability. It will be a step into the 21st century if the House provides data the Internet can use and that the Internet-connected public very much wants to see.

Coming through with robust machine-readability will handily take the transparency mantle from President Obama, who promised transparency as a campaigner but has not produced the vibrant, different government people wanted. As I noted in a write-up last week, the administration has some low-hanging transparency fruit that could bring its grades up decisively. House Republicans are first out of the gate.

The DATA Act and Cato’s Transparency Work

In his final “Chairman’s Corner” blog post as head of the White House’s Recovery Act Transparency and Accountability Board, Earl Devaney highlights the need for orderly publication of data about government spending.

There is bi-partisan legislation now in the Congress—it’s called the Digital Accountability and Transparency Act, or DATA Act—that could accomplish this mission. But the reform bill faces an uphill battle, primarily because some in the bureaucracy prefer the status quo—a hodgepodge of data collection and display sites that, frankly, makes no sense at all unless you believe your government should confuse you.

The DATA Act would establish an independent board within the executive branch to track federal spending, and it would require federal agencies and recipients of federal funds to comply with reporting requirements set up by the board.

The board would “designate common data elements, such as codes, identifiers, and fields, for information required to be reported by recipients or agencies” (section 102 of the reported version, adding a new §3611 to title 31 of the U.S. Code). The bill’s author, Rep. Darrell Issa (R-CA), spoke at our September Capitol Hill briefing, rolling out our legislative data model.

On Wednesday, another Cato Capitol Hill briefing highlighted the results of our work over the last few months to model federal budgeting, appropriating, and spending. Should the DATA Act become law, the model we’ve been working on can illuminate the work of the proposed board. Use of our model will help ensure that the structure of government spending data supports public oversight use cases.

I don’t know that there needs to be a board, and certainly not a permanent one. The bill authorizes more money than I think is required for the board, and the Congressional Budget Office’s cost estimate for implementing the requirements of the DATA Act seems wildly high. But the dynamics set in motion by making government spending more transparent may well reduce government spending by far more than even these high estimated costs.

Government Spending Transparency: ‘Needs Improvement’ Is Understatement

Back in September, I rated Congress on how well it is publishing information about its deliberations and decisions. “Needs Improvement” was the understated theme.

Now we’re looking at the government’s publication of data that reflects budgeting, appropriations, and spending. “Needs improvement” isn’t just understated in this area. It’s really, really understated.

On the budgeting, appropriations, and spending transparency report card I’m putting out today, B+ is the best grade—and it goes to just half of one subject area. There are 2.5 Cs, 3 Ds, and 4 incompletes. This area needs improvement.

What is transparency, anyway? In my briefing paper, “Publication Practices for Transparent Government,” I wrote about the publication practices that support transparency. They are: authority, availability, machine-discoverability, and machine-readability. That means putting good data out from a consistent source in sensible ways, and, especially, structuring the data so that computers can interpret it.

You know what the World Wide Web is? It’s a whole bunch of structured data. If you want the kind of breakthrough in transparency for government data that the Web was for communications, you want the data structured right.

Our draft structure for data in this area is in our “Conceptual Data Model of the U.S. Federal Government Budgetary Process.” (HTML version, Word version)

Structured data doesn’t really exist yet in the area of budgeting, appropriating, and spending. The one bright spot is the president’s annual budget submission, which includes some information in a workable structure, but there is much room for improvement even there.

Because I’m so nice, I’ve given a lot of “incompletes” where I could have—and some say should have—given Fs. Believe it or not, there is NO federal government “organization chart” that is published in a way computers can use. That’s one of the building blocks of computerized oversight, and its absence is easily rectified.

When we return to these issues in the summer or fall of next year, and review more formally how Congress and the administration have done on transparency, I expect these things to be fixed. (Fear the blog post!)

In the meantime, here’s a run-down of the grades and why they were given. A Hill briefing today might be available online at the page for the event. (It’s somewhat symbolic that the room we have on Capitol Hill is ill-equipped for live-streaming, but we’re going to try.)

I’ve alternated in this post between “I” and “we” because I’ve gotten so much help on this. People from OMB Watch, the National Priorities Project, and the Sunlight Foundation have helped a great deal with this project, to name a few—and omit many others! The grades, the commentary, the errors, the misstatements, and omissions are all mine. And there are going to be plenty of gaps in this work. That’s why this is a blog post and not a formal Cato publication.


Publication Practices for Transparent Government: Budgeting, Appropriations, and Spending

How well can the Internet access data about the federal government’s budgeting, appropriating, and spending? In consultation with transparency experts, the Cato Institute’s director of information policy studies, Jim Harper, rated how Congress and the administration publish key spending-cycle data in terms of authoritative sourcing, availability, machine-discoverability, and machine-readability.

These criteria envision a world where there is one authoritative source for each category of information. Unfortunately, what spending data there is appears in a lot of sources that have grown up haphazardly. There might even be some sources we don’t know about. Future grades will undoubtedly reflect improvements in what researchers, reporters, websites, and the public at large can see and use, aided by their computers.

Agencies: I

Federal agencies are the “agents” of Congress and the president. They carry out federal policy and spending decisions. Accordingly, one of the building blocks of data about spending is going to be a definitive list of the organizational units that do the spending.

Is there such a list? Yes! It’s Appendix C of OMB Circular A-11, “Listing of OMB Agency/Bureau and Treasury Codes.” But this list is a PDF document that is found on the Office of Management and Budget website.

Believe it or not, there is NO federal government “organization chart” that is published in a way amenable to computer processing!

There are distinct identifiers for agencies in both the Treasury Department and the Office of Management and Budget. Either of these could be published as the executive branch’s definitive list of its agencies. This fruit is hanging so low that a gopher could snack on it without leaving its hole, but nobody seems to have thought of publishing data about the basic units of the executive branch online in a machine-discoverable and machine-readable format.
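Publishing such a list need not be elaborate. The Python sketch below serializes an agency list as JSON carrying both identifiers and rejects duplicate codes. The code values, field names, and agency names are placeholders, not real OMB or Treasury codes.

```python
import json

# Placeholder records: codes, field names, and names are hypothetical.
AGENCIES = [
    {"omb_code": "001", "treasury_code": "01", "name": "Sample Agency"},
    {"omb_code": "000", "treasury_code": "00", "name": "Example Department"},
]

def publish_agency_list(agencies) -> str:
    """Serialize a definitive agency list as JSON, refusing duplicate OMB codes."""
    seen = set()
    for agency in agencies:
        if agency["omb_code"] in seen:
            raise ValueError(f"duplicate OMB code {agency['omb_code']}")
        seen.add(agency["omb_code"])
    # Sort by code so every published copy is byte-for-byte comparable.
    return json.dumps(sorted(agencies, key=lambda a: a["omb_code"]), indent=2)

doc = publish_agency_list(AGENCIES)
lookup = {a["omb_code"]: a["name"] for a in json.loads(doc)}
print(lookup["001"])  # Sample Agency
```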

A pathological excess of generosity spurs us to give this category an “incomplete” rather than a straight F. We expect improvement in publication of this data, pronto.

Bureaus: I

The sub-units of agencies are bureaus, and the same situation applies to data about the offices where the work of agencies gets divided up. Bureaus have identifiers. It’s just that nobody publishes a list of bureaus, their parent agencies, and other key information for the Internet-connected public to use in coordinating its oversight.

Again, an “incomplete” in this area will quickly convert to an F if this gap in data publication is not soon rectified.

Programs: I

The work of the government is parceled out for actual execution in programs. Like information about their parental units, the agencies and bureaus, data that identifies and distinguishes programs is not comprehensively published.

There is some information about programs available in usable forms. The Catalog of Federal Domestic Assistance website (www.cfda.gov) has a useful aggregation of some information on programs, but the canonical guide to government programs, along with the bureaus and agencies that run them, does not exist.

This is a little bit heavier a lift than agencies and bureaus—the number of programs exceeds the number of bureaus by something like an order of magnitude (much as the number of bureaus exceeds the number of agencies). And it might be that some programs have more than one agency/bureau parent. But today’s powerful computers can keep track of these things—they can count pretty high! And the government should figure out all the programs it has, keep that list up to date, and publish it for public consumption.

Until it does, the program category gets an “incomplete” and the threat of a future F. (Or maybe a D thanks to the CFDA.)
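Keeping track of a program with more than one parent is a routine data-structure problem. The Python sketch below links programs to one or more bureaus, and bureaus to a single parent agency, so either direction can be queried. All codes and names are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical identifiers throughout; none of these are real codes.
@dataclass
class Bureau:
    code: str
    name: str
    agency_code: str  # each bureau has exactly one parent agency

@dataclass
class Program:
    code: str
    name: str
    bureau_codes: list = field(default_factory=list)  # a program may have several parents

BUREAUS = {
    "B1": Bureau("B1", "Bureau One", agency_code="A1"),
    "B2": Bureau("B2", "Bureau Two", agency_code="A2"),
}
PROGRAMS = [
    Program("P1", "Program One", ["B1"]),
    Program("P2", "Joint Program", ["B1", "B2"]),
]

def agencies_running(program: Program) -> set:
    """Resolve a program's bureau parents up to their agencies."""
    return {BUREAUS[b].agency_code for b in program.bureau_codes}

def programs_under(agency_code: str) -> list:
    """List codes of programs run by at least one of the agency's bureaus."""
    return [p.code for p in PROGRAMS if agency_code in agencies_running(p)]

print(programs_under("A1"))  # ['P1', 'P2']
print(programs_under("A2"))  # ['P2']
```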

Projects: D-

Projects are where the rubber hits the road. These are the organizational vehicles the government uses to enter into contracts and create other obligations that deliver on government services.

Some project information gets published—we finally have an item that is not incomplete—but the publication is so bad that we give this area a low grade indeed.

Information about projects can be found. You can search for projects by name on USASpending.gov, and descriptions of projects appear in USASpending/FAADS downloads (“FAADS” is the Federal Assistance Award Data System). But there is no canonical list of projects that we could find. There should be, and there should have been for a long time now.

The generosity and patience we showed with respect to agencies, bureaus, and programs has run out. There’s more than nothing here, but projects get a D-.

Budget Documents — Congress: D / White House: B+

The president’s annual budget submission and the congressional budget resolutions are the planning documents that the president and Congress use to map the direction of government spending each year. These documents are published authoritatively, and they are consistently available, which is good. They are kind of machine-discoverable, but they are not terribly machine-readable.

The appendices to the president’s budget are published in XML format, which vastly reduces the time it takes to work with the data in them. That’s really good. But the congressional budget resolutions have no similar organization, and there is low correspondence between the budget resolutions that Congress puts out and the budget the president puts out. You would think that a person—or better yet, a computer—should be able to lay these documents side by side for comparison, but you can’t.
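If both documents used shared identifiers for their line items, laying them side by side would be trivial. The Python sketch below assumes exactly that: the category codes and dollar figures are hypothetical, and today’s documents do not share keys like these.

```python
# Hypothetical figures (billions of dollars) keyed by hypothetical category codes.
president_budget = {"A": 700, "B": 12, "C": 540}
congress_resolution = {"A": 690, "C": 560, "D": 25}

def side_by_side(pres: dict, cong: dict) -> list:
    """Align two budgets on shared codes; None marks a line missing from one side."""
    rows = []
    for code in sorted(pres.keys() | cong.keys()):
        p, c = pres.get(code), cong.get(code)
        diff = None if p is None or c is None else c - p
        rows.append((code, p, c, diff))
    return rows

for row in side_by_side(president_budget, congress_resolution):
    print(row)
# ('A', 700, 690, -10)
# ('B', 12, None, None)
# ('C', 540, 560, 20)
# ('D', None, 25, None)
```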

For its use of XML, the White House gets a B+. Congress gets a flat D.

Budget Authority—Congress: C- / Executive Branch: D

“Budget authority” is a term of art for what probably should be called “spending authority.” It’s the power to spend money, created when Congress and the president pass a law containing such authority.

Proposed budget authority is pretty darn opaque. The bills in Congress that contain proposed budget authority are consistently published online (that’s good), but they don’t highlight budget authority in machine-readable ways. No computer can figure out how much budget authority is out there in pending legislation.

Existing budget authority is pretty well documented in the Treasury Department’s FAST book (Federal Account Symbols and Titles). This handy resource lists Treasury accounts and the statutes and laws that provide their budget authority. The FAST book is not terrible, but the only form we’ve found it in is PDF. PDF is terrible.

Congress can do a lot better, but because some of the publication basics are there, we give it a C-. The administration gets a D for publishing the obscure FAST book in PDF.

Ideally, there would be a nice, neat connection from budget authority right down to every outlay of funds, and back up again from every outlay to its budget authority. These connections, published online in useful ways, would allow public oversight to blossom.
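Here is a minimal Python sketch of that two-way connection, assuming each outlay record carries the ID of the budget authority it draws on. The record IDs and amounts are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: each outlay names the budget authority it draws on.
authorities = {"BA-1": 1_000, "BA-2": 500}  # amount of authority provided
outlays = [("OUT-1", "BA-1", 300), ("OUT-2", "BA-1", 200), ("OUT-3", "BA-2", 450)]

def spent_by_authority(records) -> dict:
    """Roll outlays up to their budget authority (top-down view)."""
    totals = defaultdict(int)
    for _, ba, amount in records:
        totals[ba] += amount
    return dict(totals)

def authority_for(outlay_id: str, records) -> str:
    """Trace one outlay back to its budget authority (bottom-up view)."""
    for oid, ba, _ in records:
        if oid == outlay_id:
            return ba
    raise KeyError(outlay_id)

spent = spent_by_authority(outlays)
remaining = {ba: authorities[ba] - spent.get(ba, 0) for ba in authorities}
print(remaining)                        # {'BA-1': 500, 'BA-2': 50}
print(authority_for("OUT-3", outlays))  # BA-2
```

The single shared field (the authority ID on every outlay) is what makes both directions of oversight possible.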

Warrants, Apportionments, and Allocations: I

After Congress and the president create budget authority, that authority gets divvied up to different agencies, bureaus, programs and projects. How well documented are these processes? Not well.

An appropriation warrant is an assignment of funds by the Treasury to a treasury account to serve a particular budget authority. It’s the indication that there is money in an account for an agency to obligate and then spend.

Where is warrant data? We can’t find it. Given Treasury’s thoroughness, it probably exists, but it’s just not out there for public consumption. We’ve again generously given this area an “incomplete.”

An apportionment is an instruction from the Office of Management and Budget to an agency about how much it may spend from a treasury account in service of given budget authority in a given period of time.

We haven’t seen any data about this, and we’re less sure that any exists. There should be. And we should get to see it. Incomplete.

An allocation is a similar division of budget authority by an agency into programs or projects. We don’t see any data on this either. And we should. Incomplete.

Step up, Executive Branch, or we’ll convert these incompletes to very low grades, indeed…

Obligations: C+

Obligations are the commitments to spend money into which government agencies enter. Things like contracts to buy pens, hiring of people to write with those pens, and much, much more.

There are several different data sources that reveal obligations: FAADS/FAADS+ and CFDA, for example. But their numbers don’t match up, and—unless you’re going to have each agency uniformly publish its own data—obligations shouldn’t be published in different places. It’s hard to consider either one authoritative (even if the law says they both are). FAADS+/FPDS (via USASpending.gov), CFDA, and FPDS (the Federal Procurement Data System) are online and stable, but they are potentially incomplete because not all agencies may report to them. The use of proprietary DUNS numbers also weakens them in terms of availability.
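Until there is one authoritative source, someone has to reconcile the existing ones. The Python sketch below compares per-agency obligation totals from two sources and flags mismatches and agencies that only one source reports. The agency names and figures are hypothetical; real reconciliation would also have to account for differences in timeliness and in which agencies report where.

```python
# Hypothetical obligation totals (millions of dollars) from two sources.
source_a = {"Agency X": 120, "Agency Y": 80, "Agency Z": 15}
source_b = {"Agency X": 120, "Agency Y": 95}

def discrepancies(a: dict, b: dict) -> dict:
    """Return agencies whose totals differ, or that only one source reports."""
    out = {}
    for agency in sorted(a.keys() | b.keys()):
        x, y = a.get(agency), b.get(agency)
        if x != y:
            out[agency] = (x, y)
    return out

print(discrepancies(source_a, source_b))
# {'Agency Y': (80, 95), 'Agency Z': (15, None)}
```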

Just sorting through all the acronyms can get you down. Ask data experts to get into the quality of each data source, and you’ll be boggled by the questions regarding which agencies’ obligations are reported at which source, whether given sources dumb down the data by excluding small dollar amounts or by aggregating data about smaller agencies. Some sources are more timely than others. Etc. etc. etc.

All these issues frustrate transparency. Data about obligations is not clean, complete, and well documented. The ideal is to have one source of obligation data that combines the strengths of all the existing sources and that includes every agency, bureau, program, and project. Because a decent amount of data useful to experts is already out there, though, this category gets a C+.

Parties: D+

Of course, you want to know where the money is going. That is what we’re calling the “parties” category. (“Parties” sounds kinda fun, don’t it?)

Right now, reporting on parties is dominated by the DUNS number. That’s the Data Universal Numbering System, which provides a unique identifier for each business entity. It was developed by Dun & Bradstreet in the 1960s. It’s very nice to have a distinct identifier for every entity doing business with the government, but it is not very nice to have the numbering system be a proprietary one.

Parties would grade well in terms of machine-readability, which is one of the most important measures of transparency, but because this category scores so low on availability, its machine-readability is kind of moot. Until the government moves to an open identifier system for recipients of funds, it will get weak grades on publication of this essential data.

Outlays: C-

For a lot of folks, the big kahuna is knowing where the money goes: outlays. An outlay—literally, the laying out of funds—satisfies an obligation. It’s the movement of money from the U.S. Treasury to the outside world.

Outlay numbers are fairly well reported after the fact and in the aggregate. All you have to do is look at the appendices to the president’s budget to see how much money has been spent in the past.

But outlay data can be much, much more detailed and timely than that. Each outlay goes to a particular party. Each outlay is done on a particular project or program at the behest of a particular bureau and agency. And each outlay occurs because of a particular budget authority. Right now these details about outlays are nowhere to be found.

Now, there are plenty of people inside the government who are very familiar with the movement of taxpayer money in the government. They will be inclined to say, “it’s more complicated than that,” and it is! But it’s going to have to get quite a bit less complicated before these processes can be called transparent.

The time to de-complicate outlays is now. It’s another feat of generosity to give this area a C-, and that is simply because there is an authoritative source for aggregate past outlay data. As the grades in other areas come up, outlay data that stays the same could go down. Waaaayyy down.

Transparency and Its Discontents

Remember when you had to wait until the end of the month to see your bank statement?

Last week, on the cusp of failing to pass any annual appropriations bills ahead of the October 1 start of the new fiscal year, congressional leaders came up with a short-term government funding bill (or “continuing resolution”) that would fund the government until November 18th. For whatever reason, that deal (H.R. 2608) wasn’t ready to go before the end of the week, so Congress passed an even shorter-term continuing resolution (H.R. 2017) that funds the government until tomorrow, October 4th.

Every weekend, I hunch over my computer and update key records in the database of WashingtonWatch.com, a government transparency website I run as a non-partisan, non-ideological resource (disclosure: it’s my own, not a Cato project). Then I put a summary of what’s going on into an email like this one (subscribe!) that goes out to 7,000 or so of my closest friends.

Last weekend, the Library of Congress’ THOMAS website, which is one of my resources, was down a good chunk of the time for maintenance. Even after it came up again, some materials such as bill text and committee reports weren’t available. (They had come up by the wee hours this morning.) Maintenance is necessary sometimes, though when the service provider I use for the WashingtonWatch.com email does maintenance, it’s usually for an hour or so in the middle of a weekend night.

But when I went to update the database to reflect last week’s passage of H.R. 2017, I could find no record of its public law number. When a bill becomes a law, it gets a public law number made up of the number of the Congress that passed it followed by a sequential number, like Public Law No. 112-29. The Government Printing Office’s FDsys system lets you browse public laws. At this writing, it isn’t updated to reflect the passage of new laws last week. When THOMAS came back up, its public laws page also had no data to reflect the passage of that continuing resolution last week (and still doesn’t, also at this writing).
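Because that numbering follows a fixed pattern, a program can pull it apart once the number is published. The Python sketch below assumes citations are written as “Public Law No. 112-29” or “P.L. 112-29”; other spellings are not handled, and `parse_public_law` is a hypothetical helper, not part of any government system.

```python
import re

# Accepts only "Public Law 112-29", "Public Law No. 112-29", or "P.L. 112-29".
PL_PATTERN = re.compile(r"(?:Public Law(?: No\.)?|P\.L\.)\s*(\d+)-(\d+)")

def parse_public_law(text: str) -> tuple:
    """Split a public law citation into (Congress number, sequence number)."""
    match = PL_PATTERN.search(text)
    if match is None:
        raise ValueError(f"no public law number in {text!r}")
    return int(match.group(1)), int(match.group(2))

print(parse_public_law("Public Law No. 112-29"))  # (112, 29)
```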

There is barely any news reporting on humdrum details about governing like the passage of a law expending $40 billion in taxpayer funds. (That’s about what H.R. 2017 spends to operate the government four more days, roughly $400 per U.S. family.) Where can you confirm with an official source that this happened?
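The two figures in that parenthetical imply a couple of others. This short Python calculation uses only the post’s own round numbers, so the results are approximate in the same way the inputs are.

```python
# Round figures from the post: about $40 billion for about four days,
# roughly $400 per U.S. family.
total = 40e9
per_family = 400
days = 4

implied_families = total / per_family  # about 100 million families
per_day = total / days                 # about $10 billion per day
print(f"{implied_families:,.0f} families, ${per_day:,.0f} per day")
# 100,000,000 families, $10,000,000,000 per day
```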

The winning data resource this week, if by default, is Whitehouse.gov, which has a page dedicated to laws the president has signed. That page says that President Obama signed four new laws on Friday (Sept. 30). When might FDsys or THOMAS reflect this information? It’ll happen soon, and that data will start to propagate out to society.

But I think that’s not soon enough. A couple of days’ delay is a big deal.

If I were to take $400 in cash out of my bank account at an ATM, I could review that transaction from that instant forward on my bank’s website. If I had a concern or even a passing interest, I could just go look. That is an utterly unremarkable service in this day and age.

But it’s remarkable that such a service doesn’t exist in systems that are as important as our bank accounts. When Congress and the president pass a bill to spend $40 billion, the fact of its passage is pretty much undocumented by any official sources until enough Mon-Fri, 9-to-5 work hours have passed.

In my recently published paper, Publication Practices for Transparent Government, I go through the things the government should do to make itself more transparent (thus improving public oversight and producing lots of felicitous outcomes). A practice I cite is “real-time or near-real-time publication.” Why? Because then any of the 300 million Americans who have an interest, real or passing, can see what is happening with their money as it happens, just like they can with their bank holdings. People like me (and many more) can propagate complete and timely information, making it that much more accessible.

When you’re talking about a potential audience of 200 million people and $40 billion in expense (one of the tiniest spending bills—others are much larger), it is not too much to ask to have the data published in real time.

I don’t expect a lot of people to join me at the barricades with pitchforks and torches on this one. Government transparency is an area ruled by implicit demand. People don’t know what they are missing, so they don’t know to suffer a sense of deprivation. I do that for them—all of them. (Heroic, idn’t it?)

Before too long, though, the government’s opacity will be recognized as a contributor to the public’s general—and strong—distaste for all that goes on in Washington, D.C. The idea of spending $400 per U.S. family without documenting every detail of it on the Internet will seem as absurd as waiting until the end of the month to see what happened in your bank account.

A ‘Soviet-Style Power-Grab,’ to Squelch Bad Press for ObamaCare

The Department of Health and Human Services has released new guidelines on communications between department employees and the media.  The guidelines evidently require all communications to be approved by the Assistant Secretary for Public Affairs.  Also: no off-the-record communications.

The media are not happy.  The editor of FDA Webview & FDA Review writes (via Poynter; more here):

The new formal HHS Guidelines on the Provision of Information to the News Media represent, to this 36-year veteran of reporting FDA news, a Soviet-style power-grab. By requiring all HHS employees to arrange their information-sharing with news media through their agency press office, HHS has formalized a creeping information-control mechanism that informally began during the Clinton Administration and was accelerated by the Bush and Obama administrations. The U.S. now takes a large step toward joining other information-controlling countries like my native Australia, where government employees who talk with the news media without permission commit a federal crime. I came to the U.S. in 1974 to escape this oppression.

The HHS guidelines once again show that the purpose of a public information office is not to disseminate information to the public but to withhold information from the public.

Since this came on the heels of an HHS official announcing that the agency is scuttling ObamaCare’s long-term care entitlement, a.k.a. the “CLASS Act,” one wonders if there is a connection.  Or maybe HHS is just motivated by a general fear that the more the public learns about ObamaCare, the less we will like it.

(Update: Turns out, HHS released their new guidelines the same day that agency official voiced his opinion about the future of the CLASS Act. HT: Chris Jacobs.)

Congress on Transparency: ‘Needs Improvement’

“Needs improvement” is the understated theme of a Capitol Hill briefing this morning entitled “Publication Practices for Transparent Government: Rating the Congress.” (Live-streamed starting at 9:00 am. If timely, check it out—the video will come up before too long also—and join the conversation on Twitter at the #RateCongress hashtag.)

Congress needs to improve its data publication practices if it’s going to be the transparent legislature that it should be.

How did we arrive at this conclusion? We’re doing more than stating the obvious.

A Cato Briefing Paper released today entitled “Publication Practices for Transparent Government” goes through some technically challenging but essential concepts in data publication: authoritative sourcing, availability, machine-discoverability, and machine-readability. Together, these practices will allow computers to automatically generate the myriad stories that the data Congress produces have to tell. Following these practices will allow many different users to put the data to hundreds of new uses in government oversight.

At the event, we’re releasing informal grades that rate how each of the major parts of the legislative process are published as data. To produce the grades, we constructed a “data model” of formal federal legislative processes (HTML version, Word version).

Data modeling is pretty arcane stuff, but in this model we reduced everything to “entities,” each having various “properties.” The entities and their properties describe the logical relationships of things in the real world, like members of Congress, votes, bills, and so on. We also loosely defined several “markup types” guiding how documents that come out of the legislative process should be structured and published.
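For readers who don’t work with data models, here is a rough Python sketch of what “entities with properties” looks like in practice. The entity names, fields, and the ID value are hypothetical simplifications, not the actual contents of our model.

```python
from dataclasses import dataclass

# Hypothetical, simplified entities; the real model's names and fields may differ.
@dataclass(frozen=True)
class Member:
    bioguide_id: str  # unique identifier for the person
    name: str
    chamber: str      # "House" or "Senate"

@dataclass(frozen=True)
class Bill:
    congress: int
    bill_type: str
    number: int
    sponsor_id: str   # refers to Member.bioguide_id

@dataclass(frozen=True)
class Vote:
    bill: Bill
    member_id: str    # refers to Member.bioguide_id
    position: str     # e.g. "Yea" or "Nay"

members = {m.bioguide_id: m for m in [Member("X000001", "Rep. Example", "House")]}
bill = Bill(112, "hr", 1, sponsor_id="X000001")
votes = [Vote(bill, "X000001", "Yea")]

# Relationships are followed through identifiers, not by matching names.
print(members[bill.sponsor_id].name)  # Rep. Example
```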

Then we compared the publication practices in the briefing paper to the “entities” in the model. Are data about the key entities in the legislative process well published? That’s what we graded on, with a little commentary pointing toward what is good and bad in current publication practices. The grades are listed on this report card, which you can use to cut to the chase, but the real story is in the assessment below.

Are we stating the obvious? Yes. But a little humility and grace is in order. This stuff is tough sledding. The data model isn’t the last word, and there are things happening in varied places on and around Capitol Hill to improve matters. Several pieces of the legislative process nobody has ever talked about publishing as data before, so we forgive the fact that this isn’t already being done. If things haven’t improved in another year, then you might start to see a little more piquant commentary.

Without further ado, here is the full listing of Congress’ transparency grades. As far as data publication for transparency, Congress needs improvement.

Publication Practices for Transparent Government: Rating the Congress

 

How well can the Internet access data about Congress’ work? In consultation with transparency experts, the Cato Institute’s director of information policy studies, Jim Harper, rated how Congress publishes key legislative data in terms of authoritative sourcing, availability, machine-discoverability, and machine-readability.

These criteria envision a world where there is one authoritative source for each category of information. Unfortunately, today’s congressional data are published by a lot of sources that have grown up haphazardly. There might even be some sources we don’t know about. Future grades will undoubtedly reflect improvements in what researchers, reporters, websites, and the public at large can see and use, aided by their computers.

House and Senate Membership: B+

How does the public find out about who holds office in the House of Representatives and Senate? A couple of ways.

The Biographical Directory of the United States Congress is a compendium of information about all present and former members of the United States Congress (as well as the Continental Congress), including delegates and resident commissioners. The “Bioguide” website is a great resource for searching out historical information.

But there’s no sign that it’s Congress’ repository of record, and it’s little known by users, giving it low authority marks. Bioguide scores highly on availability—we know of no problems with up-time or completeness (though it could use quicker updating when new members are elected).

Bioguide isn’t structured for discoverability. Most people haven’t seen it, because search engines aren’t finding it. Bioguide does a good thing in terms of machine-readability, though. It assigns a unique ID to each of the people in its database. This is the first, basic step in machine-readability, and the Bioguide ID should probably be the standard for machine-identification of elected officials wherever they are referred to in data. Unfortunately, the biographical content in Bioguide is not machine-readable.

The other ways of learning about House and Senate membership are nothing if not ad hoc. The lists of members that appear on the House and Senate websites are adequate for some purposes. They’re authoritative, available, and discoverable due to their prime location on the top-level House and Senate domains. But the HTML presentation on the House side does not break out key information in ways useful for computers. The Senate includes a link to an XML representation that is machine readable. Good job, Senate.

The rest of the information flows to the public via congressmembers’ individual websites. These are non-authoritative websites that search engines, through spidering, stitch together into a de facto record of Congress’ membership. They are available and discoverable, again because of that prime house.gov and senate.gov real estate. But they reveal data about the membership of Congress only incidentally, as a byproduct of communicating the press releases, photos, and announcements that representatives want to have online.

So far there is no authoritative, really well-published source of information about House and Senate membership, but the variety of sources that exist combine to give Congress a pretty good grade on publishing information about who represents Americans in Washington, D.C.

Committees and Subcommittees: C

If you want to find out about the committees to which Congress delegates much of its work, and the subcommittees to which the work gets further distributed, you might have to form a commit— … a search party.

The Senate has committee names and URLs prominently available on its main website, and the House does too. But that would just be the starting point for researching what all these committees do and who serves on them. For that, you’d go to individual committee websites, each one different from the others.

With the data scattered about this way, the Internet can’t really see it. The Senate has a little-known machine-readable listing of its membership and their assignments. More prominence, data such as subcommittees and jurisdiction, and use of a recognized set of standard identifiers would take this resource a long way.

Without a recognized place to go to get data about committees, this area suffers from a lack of authority. To the extent there are data, availability is not a problem, but machine-discoverability suffers because each committee publishes separately, in formats like HTML, who its members are, who its leaders are, and what its jurisdiction is.

Until committee data are centrally published using standard identifiers (for both committees and their members), machine-readability will be very low. The Internet makes sense of congressional committees as best it can, but a whole lot of organizing and centralizing—with a definitive, always-current, and machine-readable record of committees, their memberships, and their jurisdictions—would create a lot of clarity in this area with a minimum of effort.

Meetings of House, Senate, and Committees—Senate: B+ / House: D+

When the House, the Senate, committees, and subcommittees have their meetings, the business of Congress is being done. Can the public learn easily about what meetings are happening, when, and what they are about? It depends on which side of the Capitol you’re on.

The Senate is pretty good about publishing notices of committee meetings. In addition to a webpage with meeting notices on it, it publishes an XML page with lots of good features, like distinct codes for each committee. (If only we knew whose codes they were, and if only they could be used consistently throughout legislative data….) If a particular bill is under consideration in a Senate committee meeting, this is a way for the public to learn about it. This is authoritative, it’s available, it’s machine-discoverable, and it’s got some machine-readable features. That means any website, researcher, or reporter can quickly use these data to generate more—and more useful—information about Congress.
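As a rough illustration of how a reporter or website might use a meeting-notice feed like this, here is a Python sketch that scans the feed for meetings mentioning a given bill. The XML element names and sample data are assumptions made up for the example, not the Senate's actual schema. Note that the matching has to fall back on free text, because the bill is named in prose rather than by a standard identifier; that is exactly the weakness the parenthetical above points to.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Made-up feed shaped like a meeting-notice listing; the element names are
# assumptions, not the Senate's actual schema.
SAMPLE_FEED = """<meetings>
  <meeting>
    <committee_code>AAAA</committee_code>
    <date>2011-09-14</date>
    <matter>Hearing to examine S. 100, an illustrative bill</matter>
  </meeting>
  <meeting>
    <committee_code>BBBB</committee_code>
    <date>2011-09-15</date>
    <matter>Business meeting on nominations</matter>
  </meeting>
</meetings>"""

def meetings_considering(feed: str, bill_label: str) -> List[Tuple[str, str]]:
    """Return (committee code, date) for each meeting whose description mentions bill_label."""
    root = ET.fromstring(feed)
    hits = []
    for meeting in root.iter("meeting"):
        matter = meeting.findtext("matter", default="")
        # Free-text matching is fragile: "S. 100" would also match "S. 1000",
        # because the bill is named in prose rather than by a standard identifier.
        if bill_label in matter:
            hits.append((meeting.findtext("committee_code"), meeting.findtext("date")))
    return hits

print(meetings_considering(SAMPLE_FEED, "S. 100"))  # [('AAAA', '2011-09-14')]
```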

The House doesn’t have anything similar. To learn about meetings of its committees, you might have to scroll through page after page of committee announcements or calendars. The House can catch up with the Senate in this area, and we are aware that they are working on it. This area is ripe for rapid improvement.

Meeting Records: C-

There is lots of work to do before meeting records can be called transparent. We have one thing, the Congressional Record. It is the authoritative record of what transpires on the House and Senate floors, but nothing similar reveals the content of committee meetings. Those meeting records are produced after much delay—sometimes an incredibly long delay—by the committees themselves. These records are obscure, because they are not published in ways that make them easy for computers to find and comprehend.

The Congressional Record also doesn’t have the machine-discoverable publication or machine-readable structure that it could and should. Giving unique, consistent IDs in the Record to members of Congress, to bills, and to other regular subjects of this publication would go a long way to improving it. The same step would improve transcripts of committee meetings.

Another form of meeting record exists: videos. These have yet to be standardized, organized, and published in a reliable and uniform way, though the HouseLive site is a significant step in the right direction. Real-time flagging of members and key subjects of debate in the video stream would be a great improvement in transparency. Setting video and video meta-data standards for use by both Houses of Congress, by committees, and by subcommittees would improve things dramatically.

Committee Reports: D+

Committee reports are important parts of the legislative process, documenting the findings and recommendations that committees report to the full House and Senate. They do see publication on the most authoritative resource for committee reports, the Library of Congress’s THOMAS system. They are technically machine discoverable, but without good semantic information embedded in them, committee reports are barely visible to the Internet.

Rather than publication in HTML and PDF, committee reports should be published with the full array of signals that reveal what bills, statutes, and agencies they deal with, as well as authorizations and appropriations, so that the Internet can discover and make use of these documents.

Bills: A-

Bills are a “pretty-good-news” story in legislative transparency. Most are promptly published. It would be better, of course, if they were all immediately published at the moment they were introduced, and if both the House and Senate published last-minute, omnibus bills before debating and voting on them.

A small gap in authority exists around bills: people look to the Library of Congress rather than Congress or the Government Printing Office, which are better authorities for bill content, but this has not caused any problems. Once published, bill information remains available, which is good.

Publication of bills in HTML on the THOMAS site makes them reasonably machine-discoverable. Witness the fact that searching for a bill will often turn up the version at that source.

Where bills could improve is in machine-readability. Some information, such as sponsorship and U.S. Code references, is present in the bills that are published in XML, and nearly all bills are now published in XML, which is great. Much more information should be published machine-readably in bills, though, such as references to agencies and programs, to states or localities, and so on, referred to using standard identifiers.

With the work that the THOMAS system does to gather information in one place, bill data are good. This is relative to other, less-well-published data, though. There is yet room for improvement.

Amendments—House and Senate: C / Committees: I

Amendments are not the good-news story that bills are. With a few exceptions, amendments are hard to track in any systematic way. When it comes to the House and Senate floors, amendment text is often available, but the authoritative source differs depending on whether you want to see the text (GPO) or the status (THOMAS) of an amendment. It is very hard to see how amendments affect the bills they would change.

In committees, the story is quite a bit worse. Committee amendments are almost completely opaque. There is almost no publication of amendments at all—certainly not amendments that have been withdrawn or defeated. Major process revisions are due if committee amendments are going to see the light of day as they should.

Motions: I

When the House, the Senate, or a committee is going to take some kind of action, it does so on the basis of a motion. If the public is going to have insight into the decisions Congress makes, it should have access to the motions on which Congress acts.

But motions are something of a black hole. Many of them can be found in the Congressional Record, but it really takes a human who understands procedure reading the Congressional Record to find them. That’s not modern transparency.

Motions can be articulated as data. There are distinct types of motions. Congress can publish which meeting a motion occurs in, when the motion occurs, what the proposition is, what the object of the motion is, and so on. Along with decisions, motions are key elements of the legislative process. They can and should be published as data.
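Here is a minimal Python sketch of what publishing a motion as data might look like, using the attributes listed above: a distinct type, the meeting it occurs in, when it occurs, the proposition, and the object of the motion. The three motion types and all field names are hypothetical choices for illustration; a real schema would need a vetted list of motion types drawn from House and Senate procedure.

```python
import json
from dataclasses import dataclass, asdict
from enum import Enum

# Hypothetical motion types and field names for illustration only.
class MotionType(str, Enum):
    TO_PROCEED = "to_proceed"
    TO_RECOMMIT = "to_recommit"
    TO_TABLE = "to_table"

@dataclass
class Motion:
    motion_id: str
    motion_type: MotionType
    meeting_id: str    # which meeting the motion occurs in
    occurred_at: str   # when it occurs (ISO 8601 timestamp)
    proposition: str   # what is being proposed
    object_id: str     # what the motion acts on, e.g. a bill or amendment ID

def to_json(motion: Motion) -> str:
    """Serialize a motion as a JSON record any computer can read."""
    record = asdict(motion)
    record["motion_type"] = motion.motion_type.value  # emit the plain string value
    return json.dumps(record, sort_keys=True)

# Made-up example record.
m = Motion("m-1", MotionType.TO_TABLE, "meeting-1", "2011-09-14T10:30:00-04:00",
           "That the amendment be laid on the table", "amdt-1")
print(to_json(m))
```

A decision record could then point back at `motion_id`, tying each vote or ruling to the proposition it settled.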

Decisions: I

When a motion is pending, a body such as the House, the Senate, or a committee will make a decision on it, often using votes. These decisions are crucial moments in the legislative process, which should be published as data. Like motions, these are not yet published usefully. Decisions made in the House or Senate are published in text form as part of the Congressional Record, but they are not published as data, so they remain opaque to the Internet.

Votes: A-

Voting puts members of Congress on record about where they stand. And happily, vote information is in pretty good shape. Each chamber publishes data about votes, meaning authority is well handled. Vote data are available and timely.

Both sides could sorely use an index that lists all votes, though, along with an indication of the last time the vote was modified (i.e., if corrections to the original data have been posted). But both the House and Senate produce vote information in XML, which is useful for computers and the Internet. Both houses also use unique identifiers for their members, though they’re not so good at indicating who those unique IDs refer to. (The House does not have a list-of-people database, and the Senate uses lis_member_ids rather than Bioguide IDs.) Overall, though, voting data are pretty well handled.
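The identifier mismatch has a practical cost: anyone combining Senate vote data with other data keyed on Bioguide IDs has to translate one ID scheme into the other. The Python sketch below shows that step. The XML fragment is a simplified, made-up sample (real Senate vote files contain more elements), and the crosswalk table is hypothetical, standing in for a mapping a user would have to build or find. Any member the crosswalk cannot resolve is reported rather than silently dropped.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Simplified, made-up fragment; real Senate vote XML has more elements.
SAMPLE = """<roll_call_vote>
  <members>
    <member><lis_member_id>S001</lis_member_id><vote_cast>Yea</vote_cast></member>
    <member><lis_member_id>S002</lis_member_id><vote_cast>Nay</vote_cast></member>
  </members>
</roll_call_vote>"""

# Hypothetical crosswalk from lis_member_id to placeholder Bioguide-style IDs.
LIS_TO_BIOGUIDE = {"S001": "X000001", "S002": "X000002"}

def votes_by_bioguide(xml_text: str, crosswalk: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Re-key each vote from lis_member_id to Bioguide ID; collect IDs the crosswalk lacks."""
    root = ET.fromstring(xml_text.strip())
    result: Dict[str, str] = {}
    unmatched: List[str] = []
    for member in root.iter("member"):
        lis_id = member.findtext("lis_member_id")
        cast = member.findtext("vote_cast")
        bioguide = crosswalk.get(lis_id)
        if bioguide is None:
            unmatched.append(lis_id)
        else:
            result[bioguide] = cast
    return result, unmatched

print(votes_by_bioguide(SAMPLE, LIS_TO_BIOGUIDE))
```

If both chambers published votes keyed on Bioguide IDs, this translation step, and the errors it invites, would disappear.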

Communications (Inter- and Intra-Branch): I

The messages sent among the House, Senate, and Executive Branch are essential parts of the legislative process, but they do not see publication. Putting these communications online—including unique identifiers, the sending and receiving body, any meeting that produced the communication, the text of the communication, and key subjects such as bills—would complete the picture that is available to the public.

Rating Congress on Transparency

Tomorrow morning, I’ll be officially releasing a paper entitled “Publication Practices for Transparent Government” at a Hill briefing entitled “Publication Practices for Transparent Government: Rating the Congress.”

If you’re a smart and savvy Internet user, you probably noticed that the paper is there at the first link above, unofficially released just for you. This qualifies you to read it and get a head start on some of the fascinating technical aspects of transparency.

This is all a teaser for our release tomorrow of “grades” on how Congress is doing with publishing data about the essential parts of its legislative work. For that, you’ll have to attend the event or watch it live-streamed (here, commencing at 9:00 Eastern with remarks from House Oversight and Government Reform Committee Chairman Darrell Issa (R-CA)).

If you like transparency—and chances are you do—you can help spur discussion tomorrow (or even today) using the hashtag #RateCongress, along with, of course, #transparency. (Don’t know what a hashtag is? Well, here’s a little help.)

Despite good-faith efforts on the part of the Obama administration and congressional leaders, government transparency hasn’t flourished as it could over the last few years. The paper, event, and “report card” are intended to spur progress on that front.

Transparency is interesting not only technically and administratively, but ideologically. Libertarians and conservatives believe it will expose waste and corruption, fomenting downward pressure on the size and scope of government. Liberals and progressives believe transparency will expose waste and corruption, validating many government programs and roles.

I say let’s get on with exposing waste and corruption, so we can find out what happens next!