Tag: transparency

SOPA/PIPA: Harbinger or Aberration?

He’s not unrestrained, but Larry Downes sees the remarkable downfall of legislation to regulate the Internet’s engineering as a harbinger of things to come. Jerry Brito, meanwhile, tells us “Why We Won’t See Many Protests like the SOPA Blackout.”

They’re both right—over different time-horizons. The information environment and economics of political organization today are still quite stacked against public participation in our unwieldy federal government. But in time this will change. Congress and Washington, D.C.’s advocacy and lobbying groups now have some idea what the future will feel like.

There’s No Machine-Readable Government Org Chart

At a recent Cato event on transparency, I emphasized that there is no federal government “organization chart” published in a way computers can use.

Here’s what I mean:

Appendix C of the Office of Management and Budget’s Circular A-11 is the White House’s definitive public listing of agencies and bureaus, along with their OMB and Treasury codes—unique identifiers for the agencies and bureaus of the federal government.

First problem: It’s a PDF document. To be computer-usable this should be represented in digital form as a lookup table.

But beyond that, it doesn’t follow a coherent organization. There’s an agency code (“200”) called “Other Defense Civil Programs,” for example. There’s obviously no agency called “Other Defense Civil Programs.” That’s a catch-all description, not an agency.

With most agencies, the bureau codes refer to bureaus, such as the Bureau of Land Management (bureau code: “04”) in the Department of the Interior (agency code: “010”), but with respect to the Department of Defense (agency code: “007”), the bureau codes become functional descriptions such as “Military Personnel” (“05”). There is no bureau in the Department of Defense called “Military Personnel.”

Even the most basic organizational information is a hash, and it’s published in PDF, unusable for computer-assisted oversight of the government!
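To make "lookup table" concrete, here is a minimal sketch in Python of how the agency and bureau codes could be published as plain CSV and loaded by any program. The two rows use only the codes quoted above. The column names and the `load_bureaus` helper are my own assumptions for illustration, not anything OMB publishes.

```python
import csv
import io

# Rows taken from the examples in the text. A real table would cover every
# entry in Appendix C of OMB Circular A-11. Column names are assumptions.
CSV_TEXT = """agency_code,bureau_code,agency_name,bureau_name
010,04,Department of the Interior,Bureau of Land Management
007,05,Department of Defense,Military Personnel
"""


def load_bureaus(text):
    """Return a dict keyed by (agency_code, bureau_code).

    Codes stay as strings so that leading zeros ("010", "04") survive.
    """
    reader = csv.DictReader(io.StringIO(text))
    return {(row["agency_code"], row["bureau_code"]): row for row in reader}


bureaus = load_bureaus(CSV_TEXT)
print(bureaus[("010", "04")]["bureau_name"])
```

Keying on the pair of codes rather than on names is the point: names get abbreviated and misspelled, while identifiers let different data sets be joined reliably.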

The House appears committed to improving its publication practices. If the administration wants to advance the ball on transparency for its part, it will begin to publish coherent information—starting with basic information about the organization of the executive branch—in machine-readable form, using standardized identifiers. An edict from OMB to harmonize on identifiers down to the program level could be implemented in months, if not weeks.

My recent paper “Publication Practices for Transparent Government” talks about what to do. Our data model for budgeting, appropriating, and spending articulates how government agencies, bureaus, programs, and projects—and the relationships among them—should be represented.

Why Data Transparency?

At a recent Capitol Hill briefing on government transparency, I made an effort to describe the importance of getting data from the government reflecting its deliberations, management, and results.

I analogized to the World Wide Web. The structure that allows you to find and then view a blog post as a blog post is called Hypertext Markup Language, or HTML. HTML is what made the Internet into the huge, rollicking information machine you see today. Think of the darkness we lived in before we had it.

Government information is not yet published in usable formats—as data—for the public to use as it sees fit. We need government information published as data, so we can connect it in new ways, the way the World Wide Web allowed connections among documents, images, and sounds.

“And when you connect data together, you get power in a way that doesn’t happen with the web, with documents. You get this really huge power out of it.”

Tim Berners-Lee was not thinking of wresting power from government when he said that, but the inventor of the World Wide Web does a better job than I could of arguing for getting data and making it available for any use. We’ll look back on today with bemusement and surprise at the paucity of information we had about our government’s activities and expenditures.

House Transparency Slated to Improve

Perhaps my mean grading has contributed to nascent competition between the Republican House and the Democratic administration for the transparency prize. Last Friday, the House Administration Committee adopted standards that “require all House legislative documents be published electronically in an open, searchable format on one centralized website.”

At a September Cato Capitol Hill briefing, I rated Congress on the quality of the data it publishes reflecting its membership, activities, documents, and decisions. Its grades weren’t that good. At a briefing last week, I graded the data about federal budgeting, appropriations, and spending, which is largely an executive branch responsibility. Those grades weren’t very good either.

Able and dogged transparency advocate Daniel Schuman at the Sunlight Foundation has a good write-up of the House’s move to produce good data (he and Sunlight certainly did their part to encourage it), though I’ll quibble with one particular. The adoption of the document, a two-page outline of what should be standardized and not a standards document itself, was not really “a tremendous step into the 21st Century.” It was an outline of a course to improved transparency. 21st-Century transparency.

What is required to produce that transparency? My recent paper “Publication Practices for Transparent Government” sought to establish guideposts for publication of data that will foster public access to meaningful information about what happens in Washington, D.C. The practices, in ascending order of importance and difficulty, are: authority, availability, machine-discoverability, and machine-readability.

Putting all documents on a single site will enhance authority. People will know where to look, and what source to trust. In our rough grading system, we weighted the simple practice of authoritative publishing at 10% of the total grade.

The second practice, availability, means ensuring that the data is complete, that it remains permanently in the same location, that it is not proprietary itself, and that it is not in a proprietary format. This is likely to be fulfilled by adherence to the Committee’s language and basic good practices. Availability we weighted at 20% of the total grade.

Machine-discoverability is when data is identified and located consistent with a variety of good practices going to the naming and locating of Internet resources. It’s weighted at 30% of the total grade in our system for rating data publication. It is likely that the House will develop good practices, but it will be important to watch and see that it does.
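For the arithmetic-minded, the weighting can be sketched in a few lines of Python. The 10, 20, and 30 percent weights are the ones stated above. The 40 percent for machine-readability is my assumption here (it is simply what remains), and the example scores are invented for illustration.

```python
# Weights in percent. The first three come from the text; the 40 for
# machine-readability is an assumption (the remainder), not a stated figure.
WEIGHTS = {
    "authority": 10,
    "availability": 20,
    "machine_discoverability": 30,
    "machine_readability": 40,  # assumed
}


def weighted_score(scores):
    """Combine per-practice scores (0-100) into one weighted score."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores, for illustration only.
example = {
    "authority": 90,
    "availability": 80,
    "machine_discoverability": 50,
    "machine_readability": 20,
}
print(weighted_score(example))
```

Under a weighting like this, a publisher that nails authority and availability but neglects the machine-oriented practices still lands below the halfway mark.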

Machine-readability is the most important part of transparency. It means publishing data so that the logical relationships among elements are clear, and so that computers can automatically detect the semantic meaning of the documents and data they examine.

This is where the House Administration Committee’s release is least clear. Documents like bills and committee reports could be published so that each reference to existing law, to federal agencies, bureaus, and programs, to newly authorized spending, and to a variety of other items and entities are automatically discoverable in the document.

You should be able to do a quick search, rather than labor for hours, to see what bills affect the Labor Department. You should be able to see every dollar authorized or appropriated in every bill, nearly instantly. The data should be a foundation for dozens of sites and services that disseminate information in different ways to different audiences.
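Here is a rough sketch of what that could look like in practice, using Python's standard XML parser. The markup is entirely hypothetical: the element names, attribute names, agency codes, and dollar figures are invented for this example and are not any official House schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup. Element names, attribute names, codes, and amounts
# are invented for this sketch and are not an official schema.
BILL_XML = """
<bill id="example-bill">
  <section>
    <agency-ref code="LABOR">Department of Labor</agency-ref>
    is authorized <authorization amount="1000000">$1,000,000</authorization>.
  </section>
  <section>
    <agency-ref code="INTERIOR">Department of the Interior</agency-ref>
    is authorized <authorization amount="2500000">$2,500,000</authorization>.
  </section>
</bill>
"""


def agencies_mentioned(root):
    """Return the sorted set of agency codes referenced anywhere in the bill."""
    return sorted({el.get("code") for el in root.iter("agency-ref")})


def total_authorized(root):
    """Add up every tagged authorization amount, in whole dollars."""
    return sum(int(el.get("amount")) for el in root.iter("authorization"))


bill = ET.fromstring(BILL_XML.strip())
print(agencies_mentioned(bill))
print(total_authorized(bill))
```

With tags like these in place, "what bills affect the Labor Department" becomes a one-line filter rather than an afternoon of reading.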

Here’s hoping that the House Administration Committee’s standards drive all the way to machine-readability. It will be a step into the 21st century if the House provides data the Internet can use and that the Internet-connected public very much wants to see.

Coming through with robust machine-readability will handily take the transparency mantle from President Obama, who promised transparency as a campaigner, but who has not produced the vibrant, different government people wanted. As I noted in a write-up last week, the administration has some low-hanging transparency fruit that could bring its grades up decisively. House Republicans are first out of the gate.

The DATA Act and Cato’s Transparency Work

In his final “Chairman’s Corner” blog post as head of the White House’s Recovery Act Transparency and Accountability Board, Earl Devaney highlights the need for orderly publication of data about government spending.

There is bi-partisan legislation now in the Congress—it’s called the Digital Accountability and Transparency Act, or DATA Act—that could accomplish this mission. But the reform bill faces an uphill battle, primarily because some in the bureaucracy prefer the status quo—a hodgepodge of data collection and display sites that, frankly, makes no sense at all unless you believe your government should confuse you.

The DATA Act would establish an independent board within the executive branch to track federal spending, and it would require federal agencies and recipients of federal funds to comply with reporting requirements set up by the board.

The board would “designate common data elements, such as codes, identifiers, and fields, for information required to be reported by recipients or agencies” (section 102 of the reported version, adding a new §3611 to title 31 of the U.S. Code). The bill’s author, Rep. Darrell Issa (R-CA), spoke at our September Capitol Hill briefing, rolling out our legislative data model.

On Wednesday, another Cato Capitol Hill briefing highlighted the results of our work the last few months to model federal budgeting, appropriating, and spending. Should the DATA Act become law, the model we’ve been working on can illuminate the work of the proposed board. Use of our model will help ensure that the structure of government spending data supports public oversight use cases.

I don’t know that there needs to be a board—certainly not a permanent one. The bill authorizes more money than I think is required for the board, and the Congressional Budget Office’s cost estimate for implementing the requirements of the DATA Act seems wildly high. But the dynamics set in motion by making government spending more transparent may well reduce government spending by well more than even these high estimated costs.

Government Spending Transparency: ‘Needs Improvement’ Is Understatement

Back in September, I rated Congress on how well it is publishing information about its deliberations and decisions. “Needs Improvement” was the understated theme.

Now we’re looking at the government’s publication of data that reflects budgeting, appropriations, and spending. “Needs improvement” isn’t just understated in this area. It’s really, really understated.

On the budgeting, appropriations, and spending transparency report card I’m putting out today, B+ is the best grade—and it goes to just half of one subject area. There are 2.5 Cs, 3 Ds, and 4 incompletes. This area needs improvement.

What is transparency, anyway? In my briefing paper, “Publication Practices for Transparent Government,” I wrote about the publication practices that support transparency. They are: authority, availability, machine-discoverability, and machine-readability. That means putting good data out from a consistent source in sensible ways, and, especially, structuring the data so that computers can interpret it.

You know what the World Wide Web is? It’s a whole bunch of structured data. If you want the kind of breakthrough in transparency for government data that the Web was for communications, you want the data structured right.

Our draft structure for data in this area is in our “Conceptual Data Model of the U.S. Federal Government Budgetary Process.” (HTML version, Word version)

Structured data doesn’t really exist yet in the area of budgeting, appropriating, and spending. The one bright spot is the president’s annual budget submission, which includes some information in a workable structure, but there is much room for improvement even there.

Because I’m so nice, I’ve given a lot of “incompletes” where I could have—and some say should have—given Fs. Believe it or not, there is NO federal government “organization chart” that is published in a way computers can use. That’s one of the building blocks of computerized oversight, and its absence is easily rectified.

When we return to these issues in the summer or fall of next year, and review more formally how Congress and the administration have done on transparency, I expect these things to be fixed. (Fear the blog post!)

In the meantime, here’s a run-down of the grades and why they were given. A Hill briefing today might be available online at the page for the event. (It’s somewhat symbolic that the room we have on Capitol Hill is ill-equipped for live-streaming, but we’re going to try.)

I’ve alternated in this post between “I” and “we” because I’ve gotten so much help on this. People from OMB Watch, the National Priorities Project, and the Sunlight Foundation have helped a great deal with this project, to name a few—and omit many others! The grades, the commentary, the errors, the misstatements, and omissions are all mine. And there are going to be plenty of gaps in this work. That’s why this is a blog post and not a formal Cato publication.

Publication Practices for Transparent Government: Budgeting, Appropriations, and Spending

How well can the Internet access data about the federal government’s budgeting, appropriating, and spending? In consultation with transparency experts, the Cato Institute’s director of information policy studies, Jim Harper, rated how Congress and the administration publish key spending-cycle data in terms of authoritative sourcing, availability, machine-discoverability, and machine-readability.

These criteria envision a world where there is one authoritative source for each category of information. Unfortunately, what spending data there is appears in a lot of sources that have grown up haphazardly. There might even be some sources we don’t know about. Future grades will undoubtedly reflect improvements in what researchers, reporters, websites, and the public at large can see and use, aided by their computers.

Agencies: I

Federal agencies are the “agents” of Congress and the president. They carry out federal policy and spending decisions. Accordingly, one of the building blocks of data about spending is going to be a definitive list of the organizational units that do the spending.

Is there such a list? Yes! It’s Appendix C of OMB Circular A-11, “Listing of OMB Agency/Bureau and Treasury Codes.” But this list is a PDF document that is found on the Office of Management and Budget website.

Believe it or not, there is NO federal government “organization chart” that is published in a way amenable to computer processing!

There are distinct identifiers for agencies in both the Treasury Department and the Office of Management and Budget. Either of these could be published as the executive branch’s definitive list of its agencies. This fruit is hanging so low that a gopher could snack on it without leaving its hole, but nobody seems to have thought of publishing data about the basic units of the executive branch online in a machine-discoverable and machine-readable format.

A pathological excess of generosity spurs us to give this category an “incomplete” rather than a straight F. We expect improvement in publication of this data, pronto.

Bureaus: I

The sub-units of agencies are bureaus, and the same situation applies to data about the offices where the work of agencies gets divided up. Bureaus have identifiers. It’s just that nobody publishes a list of bureaus, their parent agencies, and other key information for the Internet-connected public to use in coordinating its oversight.

Again, an “incomplete” in this area will quickly convert to an F if this gap in data publication is not soon rectified.

Programs: I

The work of the government is parceled out for actual execution in programs. Like information about their parental units, the agencies and bureaus, data that identifies and distinguishes programs is not comprehensively published.

There is some information about programs available in usable forms. The Catalog of Federal Domestic Assistance website (www.cfda.gov) has a useful aggregation of some information on programs, but the canonical guide to government programs, along with the bureaus and agencies that run them, does not exist.

This is a little bit heavier a lift than agencies and bureaus—the number of programs exceeds the number of bureaus by something like an order of magnitude (much as the number of bureaus exceeds the number of agencies). And it might be that some programs have more than one agency/bureau parent. But today’s powerful computers can keep track of these things—they can count pretty high! And the government should figure out all the programs it has, keep that list up to date, and publish it for public consumption.

Until it does, the program category gets an “incomplete” and the threat of a future F. (Or maybe a D thanks to the CFDA.)

Projects: D-

Projects are where the rubber hits the road. These are the organizational vehicles the government uses to enter into contracts and create other obligations that deliver on government services.

Some project information gets published—we finally have an item that is not incomplete—but the publication is so bad that we give this area a low grade indeed.

Information about projects can be found. You can search for projects by name on USASpending.gov, and descriptions of projects appear in USASpending/FAADS downloads (“FAADS” is the Federal Assistance Award Data System), but there is no canonical list of projects that we could find. There should be, and there should have been for a long time now.

The generosity and patience we showed with respect to agencies, bureaus, and programs have run out. There’s more than nothing here, but projects get a D-.

Budget Documents — Congress: D / White House: B+

The president’s annual budget submission and the congressional budget resolutions are the planning documents that the president and Congress use to map the direction of government spending each year. These documents are published authoritatively, and they are consistently available, which is good. They are kind-of machine-discoverable, but they are not terribly machine-readable.

The appendices to the president’s budget are published in XML format, which vastly reduces the time it takes to work with the data in them. That’s really good. But the congressional budget resolutions have no similar organization, and there is low correspondence between the budget resolutions that Congress puts out and the budget the president puts out. You would think that a person—or better yet, a computer—should be able to lay these documents side by side for comparison, but you can’t.

For its use of XML, the White House gets a B+. Congress gets a flat D.

Budget Authority—Congress: C- / Executive Branch: D

“Budget authority” is a term of art for what probably should be called “spending authority.” It’s the power to spend money, created when Congress and the president pass a law containing such authority.

Proposed budget authority is pretty darn opaque. The bills in Congress that contain proposed budget authority are consistently published online, which is good, but they don’t highlight budget authority in machine-readable ways. No computer can figure out how much budget authority is out there in pending legislation.

Existing budget authority is pretty well documented in the Treasury Department’s FAST book (Federal Account Symbols and Titles). This handy resource lists Treasury accounts and the statutes and laws that provide their budget authority. The FAST book is not terrible, but the only form we’ve found it in is PDF. PDF is terrible.

Congress can do a lot better, but because some of the publication basics are there, we give it a C-. The administration gets a D for publishing the obscure FAST book in PDF.

Ideally, there would be a nice, neat connection from budget authority right down to every outlay of funds, and back up again from every outlay to its budget authority. These connections, published online in useful ways, would allow public oversight to blossom.

Warrants, Apportionments, and Allocations: I

After Congress and the president create budget authority, that authority gets divvied up to different agencies, bureaus, programs and projects. How well documented are these processes? Not well.

An appropriation warrant is an assignment of funds by the Treasury to a treasury account to serve a particular budget authority. It’s the indication that there is money in an account for an agency to obligate and then spend.

Where is warrant data? We can’t find it. Given Treasury’s thoroughness, it probably exists, but it’s just not out there for public consumption. We’ve again generously given this area an “incomplete.”

An apportionment is an instruction from the Office of Management and Budget to an agency about how much it may spend from a treasury account in service of given budget authority in a given period of time.

We haven’t seen any data about this, and we’re less sure that there is some. There should be. And we should get to see it. Incomplete.

An allocation is a similar division of budget authority by an agency into programs or projects. We don’t see any data on this either. And we should. Incomplete.

Step up, Executive Branch, or we’ll convert these incompletes to very low grades, indeed…

Obligations: C+

Obligations are the commitments to spend money into which government agencies enter. Things like contracts to buy pens, hiring of people to write with those pens, and much, much more.

There are several different data sources that reveal obligations: FAADS/FAADS+ and CFDA, for example. But their numbers don’t match up, and—unless you’re going to have each agency uniformly publish its own data—obligations shouldn’t be published in different places. It’s hard to consider either one authoritative (even if the law says they both are). FAADS+/FPDS (via USASpending.gov), CFDA, and FPDS (the Federal Procurement Data System) are online and stable, but they are potentially incomplete because not all agencies may report to them. The use of proprietary DUNS numbers also weakens them in terms of availability.

Just sorting through all the acronyms can get you down. Ask data experts to get into the quality of each data source, and you’ll be boggled by the questions regarding which agencies’ obligations are reported at which source, whether given sources dumb down the data by excluding small dollar amounts or by aggregating data about smaller agencies. Some sources are more timely than others. Etc. etc. etc.

All these issues frustrate transparency. Data about obligations is not clean, complete and well documented. The ideal is to have one source of obligation data that combines the strengths of all the existing sources and that includes every agency, bureau, program, and project. With a decent amount of data out there, though, useful for experts, this category gets a C+.

Parties: D+

Of course, you want to know where the money is going. That is what we’re calling the “parties” category. (“Parties” sounds kinda fun, don’t it?)

Right now, reporting on parties is dominated by the DUNS number. That’s the Data Universal Numbering System, which provides a unique identifier for each business entity. It was developed by Dun & Bradstreet in the 1960s. It’s very nice to have a distinct identifier for every entity doing business with the government, but it is not very nice to have the numbering system be a proprietary one.

Parties would grade well in terms of machine-readability, which is one of the most important measures of transparency, but because the category scores so low on availability, its machine-readability is kind of moot. Until the government moves to an open identifier system for recipients of funds, it will get weak grades on publication of this essential data.

Outlays: C-

For a lot of folks, the big kahuna is knowing where the money goes: outlays. An outlay—literally, the laying out of funds—satisfies an obligation. It’s the movement of money from the U.S. Treasury to the outside world.

Outlay numbers are fairly well reported after the fact and in the aggregate. All you have to do is look at the appendices to the president’s budget to see how much money has been spent in the past.

But outlay data can be much, much more detailed and timely than that. Each outlay goes to a particular party. Each outlay is done on a particular project or program at the behest of a particular bureau and agency. And each outlay occurs because of a particular budget authority. Right now these details about outlays are nowhere to be found.
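To show what those connections might look like as data, here is a minimal sketch in Python. It is not Cato's data model. The `Outlay` record, its field names, and the sample rows are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outlay:
    """One movement of money out of the Treasury, with its full context."""
    amount_dollars: int
    party: str             # who received the money
    project: str
    program: str
    bureau: str
    agency: str
    budget_authority: str  # the law that authorized the spending


def total_by(outlays, field):
    """Sum outlay amounts grouped by any one field, e.g. 'agency'."""
    totals = {}
    for outlay in outlays:
        key = getattr(outlay, field)
        totals[key] = totals.get(key, 0) + outlay.amount_dollars
    return totals


# Made-up records for illustration; none of these are real outlays.
SAMPLE = [
    Outlay(1200, "Vendor A", "Project 1", "Program X",
           "Bureau A1", "Agency A", "Hypothetical Act, sec. 1"),
    Outlay(800, "Vendor B", "Project 1", "Program X",
           "Bureau A1", "Agency A", "Hypothetical Act, sec. 1"),
    Outlay(500, "Vendor A", "Project 2", "Program Y",
           "Bureau B1", "Agency B", "Hypothetical Act, sec. 2"),
]

print(total_by(SAMPLE, "party"))
print(total_by(SAMPLE, "budget_authority"))
```

Because every record carries its party, project, program, bureau, agency, and budget authority, the same data answers "who got paid?" and "which law paid for it?" without any extra bookkeeping.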

Now, there are plenty of people inside the government who are very familiar with the movement of taxpayer money in the government. They will be inclined to say, “it’s more complicated than that,” and it is! But it’s going to have to get quite a bit less complicated before these processes can be called transparent.

The time to de-complicate outlays is now. It’s another feat of generosity to give this area a C-. That’s simply because there is an authoritative source for aggregate past outlay data. As the grades in other areas come up, outlay data that stays the same could go down. Waaaayyy down.

Transparency and Its Discontents

Remember when you had to wait until the end of the month to see your bank statement?

Last week, on the cusp of failing to pass any annual appropriations bills ahead of the October 1 start of the new fiscal year, congressional leaders came up with a short-term government funding bill (or “continuing resolution”) that would fund the government until November 18th. For whatever reason, that deal (H.R. 2608) wasn’t ready to go before the end of the week, so Congress passed an even shorter-term continuing resolution (H.R. 2017) that funds the government until tomorrow, October 4th.

Every weekend, I hunch over my computer and update key records in the database of WashingtonWatch.com, a government transparency website I run as a non-partisan, non-ideological resource (disclosure: it’s my own, not a Cato project). Then I put a summary of what’s going on into an email like this one (subscribe!) that goes out to 7,000 or so of my closest friends.

Last weekend, the Library of Congress’ THOMAS website, which is one of my resources, was down a good chunk of the time for maintenance. Even after it came up again, some materials such as bill text and committee reports weren’t available. (They had come up by the wee hours this morning.) Maintenance is necessary sometimes, though when the service provider I use for the WashingtonWatch.com email does maintenance, it’s usually for an hour or so in the middle of a weekend night.

But when I went to update the database to reflect last week’s passage of H.R. 2017, I could find no record of its public law number. When a bill becomes a law, it gets a public law number starting with the number of the Congress that passed it and then a sequential number, like Public Law No. 112-29. The Government Printing Office’s FDsys system lets you browse public laws. At this writing, it isn’t updated to reflect the passage of new laws last week. When THOMAS came back up, its public laws page also had no data to reflect the passage of that continuing resolution last week (and still doesn’t, also at this writing).
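The numbering scheme is simple enough that a few lines of Python can take it apart. This is only a sketch: the `parse_public_law` helper and the exact text it accepts are my assumptions, and official sources may write the citation in other ways.

```python
import re

# Accepts "Public Law No. 112-29" or the bare "112-29". Other citation
# styles exist and are deliberately not handled in this sketch.
PUBLIC_LAW = re.compile(r"(?:Public Law No\.\s*)?(\d+)-(\d+)")


def parse_public_law(text):
    """Split a public law number into (congress, sequence) integers."""
    match = PUBLIC_LAW.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a public law number: {text!r}")
    return int(match.group(1)), int(match.group(2))


def format_public_law(congress, sequence):
    """Build the citation back from its two parts."""
    return f"Public Law No. {congress}-{sequence}"


congress, sequence = parse_public_law("Public Law No. 112-29")
print(congress, sequence)
```

A site like mine could use the two parts to notice gaps: if the highest sequence number on an official list has not moved after the president signs something, the list is stale.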

There is barely any news reporting on humdrum details about governing like the passage of a law expending $40 billion in taxpayer funds. (That’s about what H.R. 2017 spends to operate the government four more days, roughly $400 per U.S. family.) Where can you confirm with an official source that this happened?

The winning data resource this week, if by default, is Whitehouse.gov, which has a page dedicated to laws the president has signed. That page says that President Obama signed four new laws on Friday (Sept. 30). When might FDsys or THOMAS reflect this information? It’ll happen soon, and that data will start to propagate out to society.

But I think that’s not soon enough. A couple of days’ delay is a big deal.

If I were to take $400 in cash out of my bank account at an ATM, I could review that transaction from that instant forward on my bank’s website. If I had a concern or even a passing interest, I could just go look. That is an utterly unremarkable service in this day and age.

But it’s remarkable that such a service doesn’t exist in systems that are as important as our bank accounts. When Congress and the president pass a bill to spend $40 billion, the fact of its passage is pretty much undocumented by any official sources until enough Mon-Fri, 9-to-5 work hours have passed.

In my recently published paper, Publication Practices for Transparent Government, I go through the things the government should do to make itself more transparent (thus improving public oversight and producing lots of felicitous outcomes). A practice I cite is “real-time or near-real-time publication.” Why? Because then any of the 300 million Americans who have an interest, real or passing, can see what is happening with their money as it happens, just like they can with their bank holdings. People like me (and many more) can propagate complete and timely information, making it that much more accessible.

When you’re talking about a potential audience of 300 million people and $40 billion in expense (one of the tiniest spending bills; others are much larger), it is not too much to ask to have the data published in real time.

I don’t expect a lot of people to join me at the barricades with pitchforks and torches on this one. Government transparency is an area ruled by implicit demand. People don’t know what they are missing, so they don’t know to suffer a sense of deprivation. I do that for them—all of them. (Heroic, idn’t it?)

Before too long, though, the government’s opacity will be recognized as a contributor to the public’s general—and strong—distaste for all that goes on in Washington, D.C. The idea of spending $400 per U.S. family without documenting every detail of it on the Internet will seem as absurd as waiting until the end of the month to see what happened in your bank account.