Big Data Tool For Trump’s Big Government Immigration Plans

During his campaign President Trump made it clear that his administration would strictly enforce immigration law while also seeking to limit immigration. Trump’s executive orders so far are consistent with his campaign rhetoric, including a revitalization of the controversial 287(g) program, threats to withdraw grants from so-called “Sanctuary Cities,” the construction of a wall on the southern border, a temporary ban on immigration from six Muslim-majority countries, and the hiring of 10,000 more Immigration and Customs Enforcement (ICE) agents. Recent reporting reveals that these agents, tasked with implementing significant parts of Trump’s immigration policy agenda, will have access to an intelligence system that should concern all Americans who value civil liberties.

Earlier this month, The Intercept reported on Investigative Case Management (ICM), a system designed by Palantir Technologies. ICE awarded Palantir a $41 million contract in 2014 to build ICM, which is scheduled to be fully operational by September of this year.

Here is The Intercept’s breakdown of how ICM works:

ICM funding documents analyzed by The Intercept make clear that the system is far from a passive administrator of ICE’s case flow. ICM allows ICE agents to access a vast “ecosystem” of data to facilitate immigration officials in both discovering targets and then creating and administering cases against them. The system provides its users access to intelligence platforms maintained by the Drug Enforcement Administration, the Bureau of Alcohol, Tobacco, Firearms and Explosives, the Federal Bureau of Investigation, and an array of other federal and private law enforcement entities. It can provide ICE agents access to information on a subject’s schooling, family relationships, employment information, phone records, immigration history, foreign exchange program status, personal connections, biometric traits, criminal records, and home and work addresses.

Better Data, More Light on Congress

There’s an old joke about a drunk looking for his keys under a lamp post. A police officer comes along and helps with the search for a while, then asks if it’s certain that the keys were lost in that area.

“Oh no,” the drunk says. “I lost them on the other side of the road.”

“Why are we looking here?!”

“Because the light is better!”

In a way, the joke captures the situation with public oversight of politics and public policy. The field overall is poorly illuminated, but the best light shines on campaign finance. There’s more data there, so we hear a lot about how legislators get into office. We don’t keep especially close tabs on what elected officials do once they’re in office, even though that’s what matters most.

(That’s my opinion, anyway, animated by the vision of an informed populace keeping tabs on legislation and government spending as closely as they track, y’know, baseball, the stock market, and the weather.)

Our Deepbills project just might help improve things. As I announced in late August, we recently achieved the milestone of marking up every version of every bill in the 113th Congress with semantically rich XML. That means that computers can automatically discover references in federal legislation to existing laws in every citation format, to agencies and bureaus, and to budget authorities (both authorizations of appropriations and appropriations).
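
To make “semantically rich XML” concrete, here is a minimal Python sketch of how tagged bill text could be queried for references to laws and agencies. The element and attribute names (cato:entity, entity-type, value) and the namespace URL are illustrative assumptions, not a reproduction of the actual Deepbills schema.

# Sketch: pull law citations and agency references out of a semantically
# tagged bill fragment. Tag names and the namespace are assumed, not the
# real Deepbills schema.
import xml.etree.ElementTree as ET

SAMPLE_BILL = """\
<bill xmlns:cato="http://example.org/cato-ns">
  <section>
    Amends <cato:entity entity-type="law-citation" value="42 U.S.C. 1395">
    section 1395 of title 42, United States Code</cato:entity> and directs the
    <cato:entity entity-type="federal-body" value="Department of Health and Human Services">
    Secretary</cato:entity> to report annually.
  </section>
</bill>
"""

NS = {"cato": "http://example.org/cato-ns"}
root = ET.fromstring(SAMPLE_BILL)
for entity in root.iterfind(".//cato:entity", NS):
    print(entity.get("entity-type"), "->", entity.get("value"))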

Is There No “Hiatus” in Global Warming After All?

A new paper posted today on ScienceXpress (from Science magazine) by Thomas Karl, Director of NOAA’s Climate Data Center, and several co-authors[1] seeks to disprove the “hiatus” in global warming, and it prompts many serious scientific questions.

The authors’ main claim[2], that they have uncovered a significant recent warming trend, is dubious. The significance level they report for their findings (0.10) is hardly the norm, and its use should prompt members of the scientific community to question the reasoning behind such a lax standard.
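
To illustrate why the choice of threshold matters, here is a small Python sketch, using made-up numbers rather than Karl et al.’s data, that fits a linear trend and checks it against both the 0.10 level the authors used and the more conventional 0.05 level.

# Sketch: a weak trend in noisy data can clear a 0.10 significance bar
# while failing the conventional 0.05 bar. All numbers are synthetic.
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(0)
years = np.arange(1998, 2013)   # 1998-2012, the period in the quoted claim
temps = 0.005 * (years - 1998) + rng.normal(0, 0.08, years.size)

fit = linregress(years, temps)
print(f"trend = {fit.slope:.4f} C/yr, p-value = {fit.pvalue:.3f}")
for alpha in (0.05, 0.10):
    verdict = "significant" if fit.pvalue < alpha else "not significant"
    print(f"  at alpha = {alpha:.2f}: {verdict}")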

In addition, the authors’ treatment of buoy sea-surface temperature (SST) data was guaranteed to create a warming trend. The data were adjusted upward by 0.12°C to make them “homogeneous” with the longer-running temperature records taken from engine intake channels in marine vessels. 

As numerous scientists have acknowledged, the engine intake data are clearly contaminated by heat conduction from the engine itself and were never intended for scientific use. Environmental monitoring, on the other hand, is the specific purpose of the buoys. Adjusting good data upward to match bad data seems questionable, and because the buoy network has become increasingly dense over the last two decades, this adjustment must put a warming trend in the data.
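
A toy simulation can show the mechanism. In the sketch below (all numbers invented), the true sea-surface temperature is flat, ship engine-intake readings carry a warm bias, and the share of buoy observations grows over time; adjusting the buoys upward by 0.12°C raises the blended trend relative to leaving the buoy data alone.

# Toy simulation of the blending argument above; every number is invented.
import numpy as np

years = np.arange(1995, 2015)
true_sst = np.full(years.size, 20.0)            # flat "true" SST, deg C
ship = true_sst + 0.12                          # warm-biased engine-intake readings
buoy = true_sst.copy()                          # unbiased buoy readings
buoy_share = np.linspace(0.1, 0.9, years.size)  # buoy network densifies over time

blend_raw = buoy_share * buoy + (1 - buoy_share) * ship
blend_adj = buoy_share * (buoy + 0.12) + (1 - buoy_share) * ship

def trend_per_decade(series):
    return np.polyfit(years, series, 1)[0] * 10

print(f"unadjusted blend: {trend_per_decade(blend_raw):+.3f} C/decade")
print(f"buoys +0.12 C:    {trend_per_decade(blend_adj):+.3f} C/decade")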

The extension of high-latitude arctic land data over the Arctic Ocean is also questionable. Much of the Arctic Ocean is ice-covered even in high summer, meaning the surface temperature must remain near freezing. Extending land data out into the ocean will obviously induce substantially exaggerated temperatures.

Additionally, there exist multiple measures of bulk lower-atmosphere temperature, independent of the surface measurements, which indicate the existence of a “hiatus”[3]. If the Karl et al. result were in fact robust, it could only mean that the disparity between surface and mid-tropospheric temperatures is even larger than previously noted.

Getting the vertical distribution of temperature wrong invalidates virtually every forecast of sensible weather made by a climate model, as much of that weather (including rainfall) is determined in large part by the vertical structure of the atmosphere.

Instead, it would seem more logical to seriously question the Karl et al. result in light of the fact that, compared to those bulk temperatures, it is an outlier, showing a recent warming trend that is not in line with these other global records.

And finally, even presuming all the adjustments applied by the authors ultimately prove to be accurate, the temperature trend reported during the “hiatus” period (1998-2014) remains significantly below (using Karl et al.’s measure of significance) the mean trend projected by the collection of climate models used in the most recent report of the United Nations’ Intergovernmental Panel on Climate Change (IPCC).

It is important to recognize that the central issue of human-caused climate change is not a question of whether it is warming or not, but rather a question of how much. And to this relevant question, the answer has been, and remains, that the warming is taking place at a much slower rate than is being projected.

The distribution of trends of the projected global average surface temperature for the period 1998-2014 from 108 climate model runs used in the latest report of the U.N.’s Intergovernmental Panel on Climate Change (IPCC) (blue bars). The models were run with historical climate forcings through 2005 and extended to 2014 with the RCP4.5 emissions scenario. The surface temperature trend over the same period, as reported by Karl et al. (2015), is included in red. It falls at the 2.4th percentile of the model distribution and indicates a value that is (statistically) significantly below the model mean projection.
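
The percentile comparison in the figure is straightforward to reproduce in outline. The sketch below uses randomly generated placeholder trends rather than the actual 108 model runs or the Karl et al. value.

# Sketch: where does an observed trend fall within a distribution of model
# trends? Placeholder numbers only, not the real model runs.
import numpy as np
from scipy.stats import percentileofscore

rng = np.random.default_rng(1)
model_trends = rng.normal(0.2, 0.1, 108)   # hypothetical trends, C/decade
observed_trend = 0.1                       # hypothetical observed trend, C/decade

pct = percentileofscore(model_trends, observed_trend)
print(f"observed trend falls at the {pct:.1f}th percentile of the model distribution")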


[1] Karl, T. R., et al., “Possible artifacts of data biases in the recent global surface warming hiatus,” ScienceXpress, embargoed until 1400 EDT, June 4, 2015.

[2] “It is also noteworthy that the new global trends are statistically significant and positive at the 0.10 significance level for 1998-2012…”

[3] Both the UAH and RSS satellite records are now in their 21st year without a significant trend, for example.

In Holding NSA Spying Illegal, the Second Circuit Treats Data as Property

The U.S. Court of Appeals for the Second Circuit has ruled that Section 215 of the USA PATRIOT Act never authorized the National Security Agency’s collection of all Americans’ phone calling records. It’s pleasing to see the opinion parallel arguments that Randy Barnett and I have put forward over the last couple of years.

Two points from different parts of the opinion can help structure our thinking about constitutional protection for communications data and other digital information. Data is property, which can be unconstitutionally seized.

As cases like this often do, the decision spends much time on niceties like standing to sue. In that discussion—finding that the ACLU indeed has legal standing to challenge government collection of its calling data—the court parried the government’s argument that the ACLU suffers no offense until its data is searched.

“The Fourth Amendment protects against unreasonable searches and seizures,” the court emphasized. Data is a thing that can be owned, and when the government takes someone’s data, that data has been seized.

In this situation, the data is owned jointly by telecommunications companies and their customers. The companies hold it subject to obligations they owe their customers limiting what they can do with it. Think of covenants that run with land. These covenants run with data for the benefit of the customer.

Will the Administration Make a Run at Transparency?

Last fall, I reported that the Obama administration lagged the House of Representatives on transparency. The conclusion was driven by a study of the quality of data publication regarding key elements of budgeting, appropriating, spending, and the legislative process. (Along with monitoring progress in these areas, we’ve been producing data to show that it can be done, to build a cadre of users, and to simply deliver government transparency at a less plodding pace.)

There are signs that the administration may make a run at improving its transparency record. Buried deep in the FY 2014 budget justification for the Treasury Department’s Bureau of the Fiscal Service is a statement that funds will support “government-wide data standardization efforts to increase accuracy and transparency of Federal financial reporting.” That means the public may get better access to where the money goes – outlays – in formats that permit computer-aided oversight.

In parallel, a Performance.gov effort called the Federal Program Inventory says that, in May of 2014, it will publish a Unique Federal Program Inventory number (pg. 4-5) for each federal program, along with agency IDs and bureau IDs. This may be the machine-readable federal government organization chart whose non-existence I have lamented for some time.
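
To picture what a machine-readable organization chart buys you, here is a small Python sketch. The record layout (agency_id, bureau_id, and program_id fields) is a hypothetical stand-in, not the actual Federal Program Inventory format.

# Sketch: once programs carry consistent agency and bureau identifiers,
# grouping and lookups become trivial. Field names and values are hypothetical.
programs = [
    {"agency_id": "015", "bureau_id": "20", "program_id": "015-20-0001",
     "name": "Example fiscal service program"},
    {"agency_id": "015", "bureau_id": "45", "program_id": "015-45-0002",
     "name": "Example tax administration program"},
]

def programs_by_agency(records, agency_id):
    """Return every program record belonging to the given agency code."""
    return [r for r in records if r["agency_id"] == agency_id]

for program in programs_by_agency(programs, "015"):
    print(program["program_id"], "-", program["name"])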

If this sounds jargon-y, you’re normal. Think of federal spending as happening on a remote jungle island, where all the inhabitants speak their own language. On Federal Spending Island, no visitor from the U.S. mainland can understand where things are or who is saying what to whom.

True machine-readable data will turn Federal Spending Island into a place where English is spoken, or at least some kind of Federal Spending-English dialect that makes the movement of our tax dollars easier to track.

The Data Says Open-Ended Spending Bills Are Common

Let’s start with a little civics lesson: Congress spends money through a two-step process. Spending must first be authorized. That’s called an authorization of appropriations. Then, in a second step, the money is actually appropriated. There are exceptions, but on the whole this is how spending works. Authorizing bills go to authorizing committees, and appropriations bills go to the appropriations committees. When both do their thing, money gets spent. It’s good to keep an eye on.

In our project to generate better data about what Congress is doing, we’ve “marked up” over 80 percent of the bills introduced in Congress so far this year, adding richer and more revealing computer-readable data to the text of bills. That’s over 4,000 of the 5,000-plus bills introduced in Congress since January. We’re to the point where we can learn things.

I was surprised to find just how often the bills that authorize spending leave the amounts open-ended. A recent sample of the bills we’ve marked up includes 428 bills with authorizations of appropriations. Just over 40 percent of them place no limit on how much money will be spent. They say things like “such sums as may be necessary,” leaving it entirely to the appropriations committees to decide how much to spend. (There are many bills with both defined amounts and open-ended spending. To be conservative, we treated any bill with at least one limited authorization as not open-ended.)
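
In code, the conservative counting rule described above looks roughly like this sketch; the record format is an assumption for illustration, not our actual markup.

# Sketch: a bill counts as open-ended only if every authorization in it is
# indefinite ("such sums as may be necessary"). Data format is illustrative.
bills = [
    {"bill": "H.R. 1001", "authorizations": [{"amount": 5_000_000}]},
    {"bill": "H.R. 1002", "authorizations": [{"amount": None}]},              # "such sums"
    {"bill": "H.R. 1003", "authorizations": [{"amount": None}, {"amount": 250_000}]},
]

def is_open_ended(bill):
    return all(auth["amount"] is None for auth in bill["authorizations"])

open_ended = [b["bill"] for b in bills if is_open_ended(b)]
print(f"open-ended: {open_ended} ({len(open_ended) / len(bills):.0%} of this sample)")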

This leads me to two related conclusions. First, because authorizations of appropriations are a potential brake on spending, this surprisingly common practice is part of Congress’s fiscal indiscipline. The members of Congress and senators who introduce such bills and vote to authorize open-ended spending are avoiding their responsibility to determine how much a program is worth to us, the taxpayers.

Cato’s “Deepbills” Project Advances Government Transparency

It’s not the culmination–that will come soon–but a major step in our work to improve government transparency has been achieved. I’ll be announcing and extolling it Wednesday at the House Administration Committee’s Legislative Data and Transparency conference. Here’s a quick survey of what we’ve been doing and the results we see on the near horizon.

After President Obama’s election in 2008, we recognized transparency as a bipartisan and pan-ideological goal at an event entitled “Just Give Us the Data.” Widespread agreement and cooperation on transparency has held. But by the midpoint of the president’s first term, the deep-running change most people expected was not materializing, and it still has not. So I began working more assiduously on what transparency is and what delivers it.

In “Publication Practices for Transparent Government” (Sept. 2011), I articulated ways the government should deliver information so that it can be absorbed by the public through the intermediary of web sites, apps, information services, and so on. We graded the quality of government data publication in the aptly named November 2012 paper: “Grading the Government’s Data Publication Practices.”

But there’s no sense in sitting around waiting for things to improve. Given the incentives, transparency is something that we will have to force on government. We won’t receive it like a gift.

So, with software we acquired and modified for the purpose, we’ve been adding data to the bills in Congress, making it possible to automatically learn more about what they do. The bills published by the Government Printing Office have data about who introduced them and the committees to which they were referred. We are adding data that reflects:

- What agencies and bureaus the bills in Congress affect;

- What laws the bills in Congress affect: by popular name, U.S. Code section, Statutes at Large citation, and more;

- What budget authorities bills include, the amount of this proposed spending, its purpose, and the fiscal year(s).

We are capturing proposed new bureaus and programs, proposed new sections of existing law, and other subtleties in legislation. Our “Deepbills” project is documented at cato.org/resources/data.
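
As a rough illustration of the kind of question the budget-authority data can answer, the sketch below totals proposed spending by fiscal year. The flat record layout is assumed for clarity; in practice the annotations live inside the bill XML.

# Sketch: total proposed budget authority by fiscal year. Bills, years,
# and amounts are invented for illustration.
from collections import defaultdict

budget_authorities = [
    {"bill": "H.R. 2001", "fiscal_year": 2014, "amount": 10_000_000},
    {"bill": "H.R. 2001", "fiscal_year": 2015, "amount": 12_000_000},
    {"bill": "S. 300",    "fiscal_year": 2014, "amount": 4_500_000},
]

totals = defaultdict(int)
for authority in budget_authorities:
    totals[authority["fiscal_year"]] += authority["amount"]

for year in sorted(totals):
    print(f"FY{year}: ${totals[year]:,} in proposed budget authority")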

This data can tell a more complete story of what is happening in Congress. Given the right Web site, app, or information service, you will be able to tell who proposed to spend your taxpayer dollars and in what amounts. You’ll be able to tell how your member of Congress and senators voted on each one. You might even find out about votes you care about before they happen!

Having introduced ourselves to the community in March, we’re beginning to help disseminate legislative information and data on Wikipedia.

The uses of the data are limited only by the imagination of the people building things with it. The data will make it easier to draw links between campaign contributions and legislative activity, for example. People will be able to automatically monitor ALL the bills that affect laws or agencies they are interested in. The behavior of legislators will be more clear to more people. Knowing what happens in Washington will be less the province of an exclusive club of lobbyists and congressional staff.

In no sense will this work make the government entirely transparent, but by adding data sets to what’s available about government deliberations, management and results, we’re multiplying the stories that the data can tell and beginning to lift the fog that allows Washington, D.C. to work the way it does–or, more accurately, to fail the way it does.

At this point, data curator Molly Bohmer and Cato interns Michelle Newby and Ryan Mosely have marked up 75% of the bills introduced in Congress so far. As we fine-tune our processes, we expect essentially to stay current with Congress, making timely public oversight of government easier.

This is not the culmination of the work. We now require people to build things with the data–the Web sites, apps, and information services that can deliver transparency to your door. I’ll be promoting our work at Wednesday’s conference and in various forums over the coming weeks and months. Watch for government transparency to improve when coders get a hold of the data and build the tools and toys that deliver this information to the public in accessible ways.
