
The Difficulty With Finding Rare Events in Data

John Brennan, assistant to the president for homeland security and counterterrorism, made the rounds of the Sunday political shows this weekend. He’ll be reviewing the attempted bombing of Northwest flight 253 for the president. 

His appearance on ABC’s This Week program revealed his struggle with the limitations on data mining for counterterrorism purposes. His interviewer, Terry Moran, betrayed even less awareness of the challenge. Their conversation is revealing:

Moran: Who dropped the ball here? Where did the system fail?

Brennan: Well, first of all, there was no single piece of intelligence or “smoking gun,” if you will, that said that Mr. Abdulmutallab was going to carry out this attack against that aircraft. What we had, looking back on it now, were a number of streams of information. We had the information that came from his father, where he was concerned about his son going to Yemen, consorting with extremists, and that he was not going to go back.

We also, though, had other streams of information, coming from intelligence channels that were little snippets. We might have had a partial name, we might have had indication of a Nigerian, but there was nothing that brought it all together.

What we need to do as a government and as a system is to bring that information together so when a father comes in with information and we have intelligence, we can map that up so that we can stop individuals like Abdulmutallab from getting on a plane.

Moran: But that is exactly the conversation we had after 9/11, about connecting these disparate dots. You were one of the architects of the system put in place after that, the National Counterterrorism Center. That’s where the failure occurred, right? The dots weren’t connected.

Brennan: Well, in fact, prior to 9/11, I think there was reluctance on the part of a lot of agencies and departments to share information. There is no evidence whatsoever that any agency or department was reluctant to share.

Moran: Including the NSA? Were the NSA intercepts shared with the National Counterterrorism Center?

Brennan: Absolutely. All the information was shared. Except that there are millions upon millions of bits of data that come in on a regular basis. What we need to do is make sure the system is robust enough that we can bring that information to the surface that really is a threat concern. We need to make the system stronger. That’s what the president is determined to do.

Moran: You see millions upon millions of bits of data that—Facebook has 350 million users who put out 3.5 billion pieces of content a week, and it’s always drawing connections. In the era of Google, why does [the] U.S. intelligence community not have the sophistication and power of Facebook?

Brennan: Well, in fact, we do have the sophistication and power of Facebook, and well beyond that. That’s why we were able to stop Mr. Najibullah Zazi, David Headley, [and] other individuals from carrying out attacks, because we were able to do that on a regular basis. In this one instance, the system didn’t work. There were some human errors. There were some lapses. We need to strengthen it.

In our paper, “Effective Counterterrorism and the Limited Role of Predictive Data Mining,” Jeff Jonas, distinguished engineer and chief scientist with IBM’s Entity Analytic Solutions Group, and I distinguished between what we called subject-based data analysis and pattern-based analysis.

Subject-based data analysis seeks to trace links from known individuals or things to others…. In pattern-based analysis, investigators use statistical probabilities to seek predicates in large data sets. This type of analysis seeks to find new knowledge, not from the investigative and deductive process of following specific leads, but from statistical, inductive processes. Because it is more characterized by prediction than by the traditional notion of suspicion, we refer to it as “predictive data mining.”

The “power” that Facebook has is largely subject-based. People connect themselves to other people and things in Facebook’s data through “friending,” posting of pictures, and other uses of the site. Given a reason to suspect someone, Facebook data could reveal some of his or her friends, compatriots, and communications.
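Subject-based analysis of this kind is, at bottom, graph traversal: start from a known subject and walk outward along recorded connections. A minimal sketch, using an entirely hypothetical friend graph (the names and links are invented for illustration):

```python
from collections import deque

# Hypothetical social graph: each person maps to the people they have
# "friended" or contacted. All names and links are illustrative.
GRAPH = {
    "suspect": ["alice", "bob"],
    "alice": ["suspect", "carol"],
    "bob": ["suspect", "dave"],
    "carol": ["alice"],
    "dave": ["bob", "erin"],
    "erin": ["dave"],
}

def contacts_within(graph, start, max_hops):
    """Breadth-first search: everyone reachable from `start` in at
    most `max_hops` links -- tracing connections outward from a
    known subject, which is all subject-based analysis does."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if seen[person] == max_hops:
            continue  # don't expand beyond the hop limit
        for friend in graph.get(person, []):
            if friend not in seen:
                seen[friend] = seen[person] + 1
                queue.append(friend)
    seen.pop(start)  # report contacts, not the subject himself
    return seen

print(contacts_within(GRAPH, "suspect", 2))
# {'alice': 1, 'bob': 1, 'carol': 2, 'dave': 2}
```

Note what the traversal requires: a known starting point. The graph answers “who is connected to whom,” but it generates no suspicion on its own.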

That’s a lot compared to what existed in the recent past, but it’s nothing special, and it’s nothing like what Brennan wants from the data collection done by our intelligence services. He appears to want data analysis that can produce suspicion in the absence of good intelligence—without the “smoking gun” he says we lacked here.

Unfortunately, the dearth of patterns indicative of terrorism planning will deny success to that project. There isn’t a system “robust” enough to identify current attacks or attempts in data when we have seen examples of them only a few times before. Pattern-based data mining works when there are thousands and thousands of examples from which to build a model of what certain behavior looks like in data.
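The arithmetic behind that claim is worth making explicit. A back-of-the-envelope sketch, with the accuracy figures chosen purely for illustration (no real screening system is this good):

```python
# Purely illustrative numbers: a screening model that is 99% accurate
# in both directions, applied to a large population containing a
# vanishingly small number of actual plotters.
population = 300_000_000    # individuals screened
actual_plotters = 100       # rare events: a tiny base rate

true_positive_rate = 0.99   # plotters correctly flagged
false_positive_rate = 0.01  # innocents incorrectly flagged

true_positives = actual_plotters * true_positive_rate
false_positives = (population - actual_plotters) * false_positive_rate

print(f"true positives:  {true_positives:,.0f}")
print(f"false positives: {false_positives:,.0f}")
print(f"odds a flagged person is a plotter: "
      f"1 in {false_positives / true_positives:,.0f}")
```

Even at an implausible 99 percent accuracy, roughly three million innocents are flagged for every hundred plotters caught: about 30,000 false leads for each true one. That is the base-rate problem, and it cannot be engineered away by making the system “stronger.”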

If Brennan causes the country to double down on data collection and pattern-based data mining, plan on having more conversations about failures to “connect the dots” in the future.

As George Will said on the same show, “When you have millions of dots, you cannot define as systemic failure—catastrophic failure—anything short of perfection. Our various intelligence agencies suggest 1,600 names a day to be put on the terrorist watch list. He is a known extremist, as the president said. There are millions of them out there. We can’t have perfection here.”

We’ll have far less than perfection—more like wasted intelligence efforts—if we rely on pattern-based or predictive data mining to generate suspicions about terrorism.

Fort Hood: Reaction, Response, and Rejoinder

Commentary on the Fort Hood incident can be categorized three ways: reaction, response, and rejoinder (commentary on the commentary).

Reactions generally consist of pundits pouring their preconceptions over what is known of the facts. These are the least worthy of our time, and rejoinders like this one from Stephen M. Walt of Harvard University in the Fort Hood section of The Politico’s Arena blog dispense with them well:

Of course [Fort Hood] is being politicized; there is no issue that is immune to exploitation by politicians and media commentators. The problem is that there are an infinite number of “lessons” one can draw from a tragic event like this — the strain on our troops from a foolish war, the impact of hateful ideas from the fringe of a great religion (and most religions have them), the individual demons that drove one individual to a violent and senseless act, etc., — and so no limits to the ways it can be used by irresponsible politicians (is that redundant?) and pundits.

My favorite response—by “response,” I mean careful, productive analysis—was written last year as a general admonition about events like this (which at least has terrorist connotations):

Above all else is the imperative to think beyond the passions of those who are hurt, frightened or angry. Policymakers who become caught up in the short-term goals and spectacle of terrorist attacks relinquish the broader historical perspective and phlegmatic approach that is crucial to the reassertion of state power. Their goal must be to think strategically and avoid falling into the trap of reacting narrowly and directly to the violent initiatives taken by these groups.

That’s Audrey Kurth Cronin, Professor of Strategy at the U.S. National War College in her monograph, Ending Terrorism: Lessons for Defeating al-Qaeda.

But I want to turn to a critique leveled against my recent post, “The Search for Answers in Fort Hood,” which discussed how little Fort Hood positions us to prevent similar incidents in the future. (I hope it was response and not reaction, but readers can judge for themselves.)

A thoughtful Cato colleague emailed me suggesting that there may have been enough indication in Nidal Hasan’s behavior—in particular, correspondence with Anwar al-Awlaki—to stop him before his shooting spree.

There may have been. Current reporting has it that his communications with al-Awlaki were picked up and examined, but because they were about a research paper that he was in fact writing, he was deemed not to merit any further investigation.

This can only be called error with the benefit of hindsight. And it tells us nothing about what might prevent a future attack, which was my subject.

If humans were inert objects, investigators could simply tweak the filter that caused this false negative. They could not only investigate people who contact known terrorists, as they did Nidal Hasan; they could also know to disregard claimed academic interests. Poof! The next Nidal Hasan would be thwarted at a small cost to actual researchers.

But future attacks are not like past attacks. Tweaking the filter to eliminate this source of false negatives would simply increase false positives without homing in on the next attacker. Terrorists and terrorist wannabes will change their behavior based on known and imagined measures to thwart them. Nobody’s going to be emailing this al-Awlaki guy for a while.
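The tradeoff can be made concrete with a toy model. Assume (purely for illustration) that every lead receives a suspicion score and investigators open a case on anything above a threshold. Lowering the threshold just enough to capture the one case the old filter missed sweeps in many more innocents:

```python
# Toy model: 100,000 innocent leads with suspicion scores spread
# deterministically from 0 to 99, plus one attacker whose score
# fell just below the old cutoff. All numbers are invented.
innocents = [i % 100 for i in range(100_000)]
attacker_score = 79

def flagged(threshold):
    """Count innocent leads at or above the threshold."""
    return sum(1 for score in innocents if score >= threshold)

old, new = 80, 79  # "tweak the filter" down by one notch
print("innocents flagged at old threshold:", flagged(old))
print("innocents flagged at new threshold:", flagged(new))
print("extra innocent investigations to catch one missed case:",
      flagged(new) - flagged(old))
```

In this sketch, catching the single missed attacker costs a thousand additional investigations of innocents—and that assumes the next attacker obligingly scores the same way, which, as noted above, he won’t.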

In “Effective Counterterrorism and the Limited Role of Predictive Data Mining,” IBM distinguished engineer Jeff Jonas and I used examples from medicine to illustrate the problem of false positives when searching for terrorism in large data sets, concluding:

The question is not simply one of medical ethics or Fourth Amendment law but one of resources. The expenditure of resources needed to investigate 3,000,000, 15,000,000, or 30,000,000 fellow citizens is not practical from a budgetary point of view, to say nothing of the risk that millions of innocent people would likely be under the microscope of progressively more invasive surveillance as they were added to suspect lists by successive data-mining operations.

The same problems exist here, where tens of thousands of leads may present themselves to investigators each year. They must balance the likelihood of harm coming to U.S. interests against the rights of U.S. citizens and the costs of investigating all these potential suspects.
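The figures in the passage quoted above follow directly from applying modest false-positive rates to the whole population. Assuming a U.S. population of roughly 300 million for illustration:

```python
population = 300_000_000  # rough U.S. population, assumed for illustration

def suspect_pool(fp_rate, population=population):
    """People swept in at a given false-positive rate."""
    return int(population * fp_rate)

for rate in (0.01, 0.05, 0.10):
    print(f"{rate:.0%} false-positive rate -> "
          f"{suspect_pool(rate):,} people to investigate")
```

A false-positive rate of just 1 percent—far better than any known predictive model achieves for rare events—still yields the three million suspects at the low end of the paper’s range.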

Armchair terror warriors may criticize these conclusions in a variety of ways, believing that post hoc outrage or limitless grants of money and power to government can produce investigative perfection. (N.B.: Getting victim states to dissipate their own money and power is how terrorism does its work.) But none can accurately say, based on currently available facts, that anyone made an error. Much less can anyone say that we know any better how to prevent essentially random violent incidents like this in the future.

Report to DoD: Data Mining Won’t Catch Terrorism

Via Secrecy News, “JASON,” an independent scientific advisory group administered by defense contractor the MITRE Corporation, has reported to the Department of Defense on the weakness of data mining for predicting or discovering inchoate terrorist attacks.

“[I]t is simply not possible to validate (evaluate) predictive models of rare events that have not occurred, and unvalidated models cannot be relied upon,” says the report.

In December 2006, Jeff Jonas and I published a paper making the case that predictive modeling won’t discover rare events like terrorism. The paper, “Effective Counterterrorism and the Limited Role of Predictive Data Mining,” was featured prominently in a Senate Judiciary Committee hearing early the next year.

Privacy gives way to appropriate security measures, as the Fourth Amendment suggests, where it approves “reasonable” searches and seizures. Given the incapacity of data mining to catch terrorism and the massive data collection required to “mine” for terrorism, data mining for terrorism is a wrongful invasion of Americans’ privacy—and a waste of time.