The Difficulty With Finding Rare Events in Data

John Brennan, assistant to the president for homeland security and counterterrorism, made the rounds of the Sunday political shows this weekend. He’ll be reviewing the attempted bombing of Northwest flight 253 for the president. 

His appearance on ABC’s This Week program revealed his struggle with the limitations on data mining for counterterrorism purposes. His interviewer, Terry Moran, betrayed even less awareness of the challenge. Their conversation is revealing:

Moran: Who dropped the ball here? Where did the system fail?

Brennan: Well, first of all, there was no single piece of intelligence or “smoking gun,” if you will, that said that Mr. Abdulmutallab was going to carry out this attack against that aircraft. What we had, looking back on it now, were a number of streams of information. We had the information that came from his father, where he was concerned about his son going to Yemen, consorting with extremists, and that he was not going to go back.

We also, though, had other streams of information, coming from intelligence channels that were little snippets. We might have had a partial name, we might have had indication of a Nigerian, but there was nothing that brought it all together.

What we need to do as a government and as a system is to bring that information together so when a father comes in with information and we have intelligence, we can map that up so that we can stop individuals like Abdulmutallab from getting on a plane.

Moran: But that is exactly the conversation we had after 9/11, about connecting these disparate dots. You were one of the architects of the system put in place after that, the National Counterterrorism Center. That’s where the failure occurred, right? The dots weren’t connected.

Brennan: Well, in fact, prior to 9/11, I think there was reluctance on the part of a lot of agencies and departments to share information. There is no evidence whatsoever that any agency or department was reluctant to share.

Moran: Including the NSA? Were the NSA intercepts shared with the National Counterterrorism Center?

Brennan: Absolutely. All the information was shared. Except that there are millions upon millions of bits of data that come in on a regular basis. What we need to do is make sure the system is robust enough that we can bring that information to the surface that really is a threat concern. We need to make the system stronger. That’s what the president is determined to do.

Moran: You see millions upon millions of bits of data that—Facebook has 350 million users who put out 3.5 billion pieces of content a week, and it’s always drawing connections. In the era of Google, why does [the] U.S. intelligence community not have the sophistication and power of Facebook?

Brennan: Well, in fact, we do have the sophistication and power of Facebook, and well beyond that. That’s why we were able to stop Mr. Najibullah Zazi, David Headley, [and] other individuals from carrying out attacks, because we were able to do that on a regular basis. In this one instance, the system didn’t work. There were some human errors. There were some lapses. We need to strengthen it.

In our paper, Effective Counterterrorism and the Limited Role of Predictive Data Mining, Jeff Jonas, distinguished engineer and chief scientist with IBM’s Entity Analytic Solutions Group, and I drew a line between what we called subject-based data analysis and pattern-based analysis.

Subject-based data analysis seeks to trace links from known individuals or things to others… . In pattern-based analysis, investigators use statistical probabilities to seek predicates in large data sets. This type of analysis seeks to find new knowledge, not from the investigative and deductive process of following specific leads, but from statistical, inductive processes. Because it is more characterized by prediction than by the traditional notion of suspicion, we refer to it as “predictive data mining.”

The “power” that Facebook has is largely subject-based. People connect themselves to other people and things in Facebook’s data through “friending,” posting of pictures, and other uses of the site. Given a reason to suspect someone, Facebook data could reveal some of his or her friends, compatriots, and communications.
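To make “subject-based” concrete, here is a minimal sketch in Python of tracing links outward from one known individual. It is an illustration only, not Facebook’s system or anything from the paper: the function name linked_entities, the max_hops cutoff, and the toy link data are all hypothetical.

```python
from collections import deque

def linked_entities(links, subject, max_hops=2):
    """Breadth-first walk outward from a known subject.

    `links` maps each entity to the entities it is directly connected to.
    Returns a dict of reachable entities and how many hops away each one is.
    """
    hops = {subject: 0}
    queue = deque([subject])
    while queue:
        current = queue.popleft()
        if hops[current] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in links.get(current, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[current] + 1
                queue.append(neighbor)
    del hops[subject]  # report only the other entities
    return hops

# Hypothetical toy data: who is directly connected to whom.
links = {
    "suspect": ["friend_a", "friend_b"],
    "friend_a": ["suspect", "contact_c"],
    "friend_b": ["suspect"],
    "contact_c": ["friend_a", "stranger_d"],
    "stranger_d": ["contact_c"],
}

result = linked_entities(links, "suspect")
print(result)
```

The walk only begins once there is a named starting point. Without a reason to suspect someone in particular, there is nothing to pass in as the subject.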

That’s a lot compared to what existed in the recent past, but it’s nothing special, and it’s nothing like what Brennan wants from the data collection done by our intelligence services. He appears to want data analysis that can produce suspicion in the absence of good intelligence, without the “smoking gun” he says we lacked here.

Unfortunately, the dearth of patterns indicative of terrorism planning will deny success to that project. There isn’t a system “robust” enough to identify current attacks or attempts in data when we have seen examples of them only a few times before. Pattern-based data mining works when there are thousands and thousands of examples from which to build a model of what certain behavior looks like in data.
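A related way to see the difficulty with rare events is simple base-rate arithmetic. The sketch below is in Python, and every number in it is a made-up assumption chosen for illustration, not a figure from any agency or from our paper. It also assumes a far more accurate model than a handful of past examples could support.

```python
def flagged_counts(population, true_targets, detection_rate, false_positive_rate):
    """Expected alerts from a screening model applied to a whole population.

    All inputs are hypothetical. Returns (true alerts, false alerts, and the
    share of all alerts that point at a real target).
    """
    true_alerts = true_targets * detection_rate
    false_alerts = (population - true_targets) * false_positive_rate
    precision = true_alerts / (true_alerts + false_alerts)
    return true_alerts, false_alerts, precision

# Assumed numbers: 300 million people, 100 real plotters, and a model that
# catches 99 percent of plotters while wrongly flagging 0.1 percent of everyone else.
true_alerts, false_alerts, precision = flagged_counts(
    population=300_000_000,
    true_targets=100,
    detection_rate=0.99,
    false_positive_rate=0.001,
)

print(f"true alerts:  {true_alerts:,.0f}")
print(f"false alerts: {false_alerts:,.0f}")
print(f"share of alerts that are real: {precision:.4%}")
```

Under these assumptions, the roughly 99 real leads sit among about 300,000 false ones, so well under one alert in a thousand points at an actual plotter. A model built from only a few prior examples would do worse on both rates.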

If Brennan causes the country to double down on data collection and pattern-based data mining, plan on having more conversations about failures to “connect the dots” in the future.

As George Will said on the same show, “When you have millions of dots, you cannot define as systemic failure—catastrophic failure—anything short of perfection. Our various intelligence agencies suggest 1,600 names a day to be put on the terrorist watch list. He is a known extremist, as the president said. There are millions of them out there. We can’t have perfection here.”

We’ll have far less than perfection—more like wasted intelligence efforts—if we rely on pattern-based or predictive data mining to generate suspicions about terrorism.