Tag: jeff jonas

Good News! Online Tracking is Slightly Boring

You have to wade through a lot to reach the good news at the end of Time reporter Joel Stein’s article about “data mining”—or at least data collection and use—in the online world. There’s some fog right there: what he calls “data mining” is actually ordinary one-to-one correlation of bits of information, not mining historical data to generate patterns that are predictive of present-day behavior. (See my data mining paper with Jeff Jonas to learn more.) There is some true data mining in the online advertising industry’s use of the data consumers emit online, of course.

Next, get over Stein’s introductory language about the “vast amount of data that’s being collected both online and off by companies in stealth.” That’s some kind of stealth if a reporter can write a thorough and informative article in Time magazine about it. Does the moon rise “in stealth” if you haven’t gone outside at night and looked at the sky? Perhaps so.

Now take a hard swallow as you read about Senator John Kerry’s (D-Mass.) plans for government regulation of the information economy.

Kerry is about to introduce a bill that would require companies to make sure all the stuff they know about you is secured from hackers and to let you inspect everything they have on you, correct any mistakes and opt out of being tracked. He is doing this because, he argues, “There’s no code of conduct. There’s no standard. There’s nothing that safeguards privacy and establishes rules of the road.”

Securing data from hackers and letting people correct mistakes in data about them pull in opposite directions. If you’re going to make data about people available to them, you’re going to create opportunities for other people—it won’t even take hacking skills, really—to impersonate them, gather private data, and scramble data sets.

If the “rules of the road” Senator Kerry has in mind would point us off that cliff, I’ll take market regulation. Drivers like you and me are constantly and spontaneously writing the rules through our actions and inactions, clicks and non-clicks, purchases and non-purchases.

There are other quibbles. “Your political donations, home value and address have always been public,” says Stein, “but you used to have to actually go to all these different places — courthouses, libraries, property-tax assessors’ offices — and request documents.”

This is correct insofar as it describes the modern decline in practical obscurity. But your political donations were not public records before passage of the Federal Election Campaign Act in the early 1970s. That’s when the federal government started subordinating this particular dimension of your privacy to others’ collective values.

But these pesky details can be put aside. The nuggets of wisdom in the article predominate!

“Since targeted ads are so much more effective than nontargeted ones,” Stein writes, “websites can charge much more for them. This is why — compared with the old banners and pop-ups — online ads have become smaller and less invasive, and why websites have been able to provide better content and still be free.”

The Internet is a richer, more congenial place because of ads targeted for relevance.

And the conclusion of the article is a dose of smart, well-placed optimism that contrasts with Senator Kerry’s sloppy FUD.

We’re quickly figuring out how to navigate our trail of data — don’t say anything private on a Facebook wall, keep your secrets out of e-mail, use cash for illicit purchases. The vast majority of it, though, is worthless to us and a pretty good exchange for frequent-flier miles, better search results, a fast system to qualify for credit, finding out if our babysitter has a criminal record and ads we find more useful than annoying. Especially because no human being ever reads your files. As I learned by trying to find out all my data, we’re not all that interesting.

Consumers are learning how to navigate the online environment. They are not menaced or harmed by online tracking. Indeed, commercial tracking is congenial and slightly boring. That’s good news that you rarely hear from media or politicians because good news doesn’t generally sell magazines or legislation.

The Difficulty With Finding Rare Events in Data

John Brennan, assistant to the president for homeland security and counterterrorism, made the rounds of the Sunday political shows this weekend. He’ll be reviewing the attempted bombing of Northwest flight 253 for the president. 

His appearance on ABC’s This Week program revealed his struggle with the limitations on data mining for counterterrorism purposes. His interviewer, Terry Moran, betrayed even less awareness of the challenge. Their conversation is revealing:

Moran: Who dropped the ball here? Where did the system fail?

Brennan: Well, first of all, there was no single piece of intelligence or “smoking gun,” if you will, that said that Mr. Abdulmutallab was going to carry out this attack against that aircraft. What we had, looking back on it now, were a number of streams of information. We had the information that came from his father, where he was concerned about his son going to Yemen, consorting with extremists, and that he was not going to go back.

We also, though, had other streams of information, coming from intelligence channels that were little snippets. We might have had a partial name, we might have had indication of a Nigerian, but there was nothing that brought it all together.

What we need to do as a government and as a system is to bring that information together so when a father comes in with information and we have intelligence, we can map that up so that we can stop individuals like Abdulmutallab from getting on a plane.

Moran: But that is exactly the conversation we had after 9/11, about connecting these disparate dots. You were one of the architects of the system put in place after that, the National Counterterrorism Center. That’s where the failure occurred, right? The dots weren’t connected.

Brennan: Well, in fact, prior to 9/11, I think there was reluctance on the part of a lot of agencies and departments to share information. There is no evidence whatsoever that any agency or department was reluctant to share.

Moran: Including the NSA? Were the NSA intercepts shared with the National Counterterrorism Center?

Brennan: Absolutely. All the information was shared. Except that there are millions upon millions of bits of data that come in on a regular basis. What we need to do is make sure the system is robust enough that we can bring that information to the surface that really is a threat concern. We need to make the system stronger. That’s what the president is determined to do.

Moran: You see millions upon millions of bits of data that—Facebook has 350 million users who put out 3.5 billion pieces of content a week, and it’s always drawing connections. In the era of Google, why does [the] U.S. intelligence community not have the sophistication and power of Facebook?

Brennan: Well, in fact, we do have the sophistication and power of Facebook, and well beyond that. That’s why we were able to stop Mr. Najibullah Zazi, David Headley, [and] other individuals from carrying out attacks, because we were able to do that on a regular basis. In this one instance, the system didn’t work. There were some human errors. There were some lapses. We need to strengthen it.

In our paper, Effective Counterterrorism and the Limited Role of Predictive Data Mining, Jeff Jonas, distinguished engineer and chief scientist with IBM’s Entity Analytic Solutions Group, and I drew a distinction between what we called subject-based data analysis and pattern-based analysis.

Subject-based data analysis seeks to trace links from known individuals or things to others… . In pattern-based analysis, investigators use statistical probabilities to seek predicates in large data sets. This type of analysis seeks to find new knowledge, not from the investigative and deductive process of following specific leads, but from statistical, inductive processes. Because it is more characterized by prediction than by the traditional notion of suspicion, we refer to it as “predictive data mining.”

The “power” that Facebook has is largely subject-based. People connect themselves to other people and things in Facebook’s data through “friending,” posting of pictures, and other uses of the site. Given a reason to suspect someone, Facebook data could reveal some of his or her friends, compatriots, and communications.
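
To make the distinction concrete, here is a minimal sketch of subject-based analysis in Python. The link graph and every name in it are invented for illustration; the point is only that, given one known subject, the analysis walks outward along explicit links rather than predicting anything:

```python
from collections import deque

# Toy link graph: explicit connections (friendships, calls, co-tagged
# photos). Every name here is hypothetical.
links = {
    "suspect_a": ["contact_b", "contact_c"],
    "contact_b": ["suspect_a", "contact_d"],
    "contact_c": ["suspect_a"],
    "contact_d": ["contact_b", "contact_e"],
    "contact_e": ["contact_d"],
}

def associates(start, max_hops=2):
    """Breadth-first walk: everyone reachable within max_hops links."""
    found = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if found[person] == max_hops:
            continue
        for neighbor in links.get(person, []):
            if neighbor not in found:
                found[neighbor] = found[person] + 1
                queue.append(neighbor)
    return found

print(associates("suspect_a"))
# {'suspect_a': 0, 'contact_b': 1, 'contact_c': 1, 'contact_d': 2}
```

No statistics and no model: just following leads through data, which is why this kind of analysis works even when examples of the behavior you care about are vanishingly rare.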

That’s a lot compared to what existed in the recent past, but it’s nothing special, and it’s nothing like what Brennan wants from the data collection done by our intelligence services. He appears to want data analysis that can produce suspicion in the absence of good intelligence—without the “smoking gun” he says we lacked here.

Unfortunately, the dearth of patterns indicative of terrorism planning will deny success to that project. There isn’t a system “robust” enough to identify current attacks or attempts in data when we have seen examples of them only a few times before. Pattern-based data mining works when there are thousands and thousands of examples from which to build a model of what certain behavior looks like in data.
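
The validation problem behind that claim can be put in numbers. A quick sketch with invented figures: suppose a candidate model “caught” four of the five past attacks available to test it against. A standard Clopper-Pearson confidence interval around that hit rate is so wide as to be useless, and it only tightens when positive examples number in the thousands:

```python
from scipy.stats import beta

# 95% Clopper-Pearson interval for a hit rate of `hits` out of `trials`.
def detection_rate_interval(hits, trials):
    lower = beta.ppf(0.025, hits, trials - hits + 1) if hits > 0 else 0.0
    upper = beta.ppf(0.975, hits + 1, trials - hits) if hits < trials else 1.0
    return lower, upper

print(detection_rate_interval(4, 5))        # ~(0.28, 0.99): tells us almost nothing
print(detection_rate_interval(4000, 5000))  # ~(0.79, 0.81): thousands of examples needed
```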

If Brennan causes the country to double down on data collection and pattern-based data mining, plan on having more conversations about failures to “connect the dots” in the future.

As George Will said on the same show, “When you have millions of dots, you cannot define as systemic failure—catastrophic failure—anything short of perfection. Our various intelligence agencies suggest 1,600 names a day to be put on the terrorist watch list. He is a known extremist, as the president said. There are millions of them out there. We can’t have perfection here.”

We’ll have far less than perfection—more like wasted intelligence efforts—if we rely on pattern-based or predictive data mining to generate suspicions about terrorism.

Fort Hood: Reaction, Response, and Rejoinder

Commentary on the Fort Hood incident can be categorized three ways: reaction, response, and rejoinder (commentary on the commentary).

Reactions generally consist of pundits pouring their preconceptions over what is known of the facts. These are the least worthy of our time, and rejoinders like this one from Stephen M. Walt of Harvard University in the Fort Hood section of The Politico’s Arena blog dispense with them well:

Of course [Fort Hood] is being politicized; there is no issue that is immune to exploitation by politicians and media commentators. The problem is that there are an infinite number of “lessons” one can draw from a tragic event like this — the strain on our troops from a foolish war, the impact of hateful ideas from the fringe of a great religion (and most religions have them), the individual demons that drove one individual to a violent and senseless act, etc., — and so no limits to the ways it can be used by irresponsible politicians (is that redundant?) and pundits.

My favorite response—by “response,” I mean careful, productive analysis—was written last year as a general admonition about events like this one (which at least has terrorist connotations):

Above all else is the imperative to think beyond the passions of those who are hurt, frightened or angry. Policymakers who become caught up in the short-term goals and spectacle of terrorist attacks relinquish the broader historical perspective and phlegmatic approach that is crucial to the reassertion of state power. Their goal must be to think strategically and avoid falling into the trap of reacting narrowly and directly to the violent initiatives taken by these groups.

That’s Audrey Kurth Cronin, Professor of Strategy at the U.S. National War College, in her monograph Ending Terrorism: Lessons for Defeating al-Qaeda.

But I want to turn to a critique leveled against my recent post, “The Search for Answers in Fort Hood,” which discussed how little Fort Hood positions us to prevent similar incidents in the future. (I hope it was response and not reaction, but readers can judge for themselves.)

A thoughtful Cato colleague emailed me suggesting that there may have been enough indication in Nidal Hasan’s behavior—in particular, correspondence with Anwar al-Awlaki—to stop him before his shooting spree.

There may have been. Current reporting has it that his communications with al-Awlaki were picked up and examined, but because they were about a research paper that he was in fact writing, he was deemed not to merit any further investigation.

This can only be called error with the benefit of hindsight. And it tells us nothing about what might prevent a future attack, which was my subject.

If humans were inert objects, investigators could simply tweak the filter that caused this false negative to occur. They could not only investigate the people who contact known terrorists, as they did Nidal Hasan; they could also know to disregard claimed academic interests. Poof! The next Nidal Hasan would be thwarted at a small cost to actual researchers.

But future attacks are not like past attacks. Tweaking the filter to eliminate this source of false negatives would simply increase false positives without homing in on the next attacker. Terrorists and terrorist wannabes will change their behavior based on known and imagined measures to thwart them. Nobody’s going to be emailing this al-Awlaki guy for a while.

In “Effective Counterterrorism and the Limited Role of Predictive Data Mining,” IBM distinguished engineer Jeff Jonas and I used examples from medicine to illustrate the problem of false positives when searching for terrorism in large data sets, concluding:

The question is not simply one of medical ethics or Fourth Amendment law but one of resources. The expenditure of resources needed to investigate 3,000,000, 15,000,000, or 30,000,000 fellow citizens is not practical from a budgetary point of view, to say nothing of the risk that millions of innocent people would likely be under the microscope of progressively more invasive surveillance as they were added to suspect lists by successive data-mining operations.

The same problems exist here, where tens of thousands of leads may present themselves to investigators each year. They must balance the likelihood of harm coming to U.S. interests against the rights of U.S. citizens and the costs of investigating all these potential suspects.
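
The arithmetic behind figures like those is worth seeing once. A minimal sketch, under assumptions I am supplying purely for illustration (300 million people screened, 1,000 actual plotters, a detector that catches 99 percent of them): false-positive rates of just 1, 5, and 10 percent yield suspect pools of roughly 3, 15, and 30 million people, vanishingly few of whom are plotters.

```python
# Base-rate arithmetic; all inputs are illustrative assumptions.
population = 300_000_000
true_plotters = 1_000   # hypothetical, and almost certainly too high
hit_rate = 0.99         # a generously accurate detector

for fpr in (0.01, 0.05, 0.10):
    flagged_innocent = fpr * (population - true_plotters)
    flagged_guilty = hit_rate * true_plotters
    precision = flagged_guilty / (flagged_guilty + flagged_innocent)
    print(f"FPR {fpr:.0%}: {flagged_innocent + flagged_guilty:,.0f} flagged, "
          f"{precision:.4%} actually plotters")
# FPR 1%: 3,000,980 flagged, 0.0330% actually plotters
# FPR 5%: 15,000,940 flagged, 0.0066% actually plotters
# FPR 10%: 30,000,890 flagged, 0.0033% actually plotters
```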

Armchair terror warriors may criticize these conclusions in a variety of ways, believing that post hoc outrage or limitless grants of money and power to government can produce investigative perfection. (N.B.: Getting victim states to dissipate their own money and power is how terrorism does its work.) But none can accurately say, based on currently available facts, that anyone made an error. Much less can anyone say that we know any better how to prevent essentially random violent incidents like this in the future.

Report to DoD: Data Mining Won’t Catch Terrorism

Via Secrecy News, “JASON”—a scientific advisory panel administered by the MITRE Corporation—has reported to the Department of Defense on the weakness of data mining for predicting or discovering inchoate terrorist attacks.

“[I]t is simply not possible to validate (evaluate) predictive models of rare events that have not occurred, and unvalidated models cannot be relied upon,” says the report.

In December 2006, Jeff Jonas and I published a paper making the case that predictive modeling won’t discover rare events like terrorism. The paper, Effective Counterterrorism and the Limited Role of Predictive Data Mining, was featured prominently in a Senate Judiciary Committee hearing early the next year.

Privacy gives way to appropriate security measures, as the Fourth Amendment suggests, where it approves “reasonable” searches and seizures. Given the incapacity of data mining to catch terrorism and the massive data collection required to “mine” for terrorism, data mining for terrorism is a wrongful invasion of Americans’ privacy—and a waste of time.

600 Billion Data Points Per Day? It’s Time to Restore the Fourth Amendment

Jeff Jonas has published an important post: “Your Movements Speak for Themselves: Space-Time Travel Data is Analytic Super-Food!”

More than you probably realize, your mobile device is a digital sensor, creating records of your whereabouts and movements:

Mobile devices in America are generating something like 600 billion geo-spatially tagged transactions per day. Every call, text message, email and data transfer handled by your mobile device creates a transaction with your space-time coordinate (to roughly 60 meters accuracy if there are three cell towers in range), whether you have GPS or not. Got a Blackberry? Every few minutes, it sends a heartbeat, creating a transaction whether you are using the phone or not. If the device is GPS-enabled and you’re using a location-based service your location is accurate to somewhere between 10 and 30 meters. Using Wi-Fi? It is accurate below 10 meters.
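
For concreteness, here is roughly what one such record might look like. This is purely illustrative: the field names are invented, and only the accuracy figures are drawn from Jonas’ description.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Worst-case accuracy per positioning source, per the figures quoted above
# (GPS is given as 10 to 30 meters; the conservative bound is used here).
ACCURACY_METERS = {"cell_towers": 60, "gps": 30, "wifi": 10}

@dataclass
class SpaceTimeTransaction:
    device_id: str       # identifier tying many records to one device
    event: str           # call, text, email, data transfer, heartbeat...
    latitude: float
    longitude: float
    timestamp: datetime
    source: str          # which positioning method produced the fix

    @property
    def accuracy_m(self):
        return ACCURACY_METERS[self.source]

ping = SpaceTimeTransaction(
    device_id="device-123", event="heartbeat",
    latitude=38.8977, longitude=-77.0365,
    timestamp=datetime(2010, 4, 1, 9, 30, tzinfo=timezone.utc),
    source="cell_towers",
)
print(ping.event, ping.accuracy_m, "meters")  # heartbeat 60 meters
```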

The process of deploying this data to markedly improve our lives is underway. A friend of Jonas’ says that space-time travel data used to reveal traffic tie-ups shaves two to four hours off his commute each week. When it is put to full use, “the world we live in will fundamentally change. Organizations and citizens alike will operate with substantially more efficiency. There will be less carbon emissions, increased longevity, and fewer deaths.”

This progress is not without cost:

A government not so keen on free speech could use such data to see a crowd converging towards a protest site and respond before the swarm takes form – detected and preempted, this protest never happens. Or worse, it could be used to understand and then undermine any political opponent.

Very few want government to be able to use this data as Jonas describes, and not everybody wants to participate in the information economy quite so robustly. But the public can’t protect itself against what it can’t see. So Jonas invites holders of space-time data to reveal it:

[O]ne way to enlighten the consumer would involve holders of space-time-travel data [permitting] an owner of a mobile device the ability to also see what they can see:

(a) The top 10 places you spend the most time (e.g., 1. a home address, 2. a work address, 3. a secondary work facility address, 4. your kids school address, 5. your gym address, and so on);

(b) The top three most predictable places you will be at a specific time when on the move (e.g., Vegas on the 215 freeway passing the Rainbow exit on Thursdays 6:07 - 6:21pm – 57% of the time);

(c) The first name and first letter of the last name of the top 20 people that you regularly meet-up with (turns out to be wife, kids, best friends, and co-workers – and hopefully in that order!)

(d) The best three predictions of where you will be for more than one hour (in one place) over the next month, not counting home or work.
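
Item (a) on that list is simple enough to sketch. Assuming location fixes have already been snapped to named places, ranking places by dwell time takes only a few lines (all data here is invented):

```python
from collections import Counter

# Pings as (timestamp_in_seconds, place) pairs, already snapped to
# known addresses. Entirely made-up data; the last ping gets no credit.
pings = [
    (0, "home"), (28_800, "office"), (61_200, "gym"),
    (64_800, "home"), (86_400, "home"), (115_200, "office"),
]

dwell = Counter()
for (t0, place), (t1, _) in zip(pings, pings[1:]):
    dwell[place] += t1 - t0   # credit the gap to where the device was

for place, seconds in dwell.most_common(10):
    print(f"{place:<8}{seconds / 3600:5.1f} h")
# home     22.0 h
# office    9.0 h
# gym       1.0 h
```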

Google’s Android and Latitude products are candidates to take the lead, he says, and I agree. Google collectively understands both openness and privacy, and it’s still nimble enough to execute something like this. Other mobile providers would be forced to follow this innovation.

What should we do to reap the benefits while minimizing the costs? The starting point is you: It is your responsibility to deal with your mobile provider as an adult. Have you read your contract? Have you asked them whether they collect this data, how long they keep it, whether they share it, and under what terms?

Think about how you can obscure yourself. Put your phone in airplane mode when you are going someplace unusual, or someplace usual. (You might find that taking a break from being connected opens new vistas in front of your eyes.) Trade phones with others from time to time. There are probably hacks on the mobile phone system that could allow people to protect themselves to some degree.

Privacy self-help is important, but obviously it can be costly. And you shouldn’t have to obscure yourself from your mobile communications provider, giving up the benefits of connected living, to maintain your privacy from government.

The emergence of space-time travel data begs for restoration of Fourth Amendment protections in communications data. In my American University Law Review article, “Reforming Fourth Amendment Privacy Doctrine,” I described the sorry state of the Fourth Amendment as to modern communications.

The “reasonable expectation of privacy” doctrine that arose out of the Supreme Court’s 1967 Katz decision is wrong—it isn’t even founded in the majority holding of the case. The “third-party doctrine,” which followed Katz in a pair of mid-1970s Bank Secrecy Act cases, denies individuals Fourth Amendment claims on information held by service providers. Smith v. Maryland brought it home to communications in 1979, holding that people do not have a “reasonable expectation of privacy” in the telephone numbers they dial. (Never mind that they actually have privacy—the doctrine trumps it.)

Concluding, apropos of Jonas’ post, I wrote:

These holdings were never right, but they grow more wrong with each step forward in modern, connected living. Incredibly deep reservoirs of information are constantly collected by third-party service providers today.

Cellular telephone networks pinpoint customers’ locations throughout the day through the movement of their phones. Internet service providers maintain copies of huge swaths of the information that crosses their networks, tied to customer identifiers. Search engines maintain logs of searches that can be correlated to specific computers and usually the individuals that use them. Payment systems record each instance of commerce, and the time and place it occurred.

The totality of these records is very, very revealing of people’s lives. They are a window onto each individual’s spiritual nature, feelings, and intellect. They reflect each American’s beliefs, thoughts, emotions, and sensations. They ought to be protected, as they are the modern iteration of our “papers and effects.”