Google Flu Trends and Privacy

The recent privacy dust-up about Google’s Flu Trends service is interesting - and confounding.

Flu Trends is one of many cool things that can be done with data. By tracking searches that suggest the existence of flu symptoms, Google can identify influenza outbreaks about two weeks faster than the Centers for Disease Control, as illustrated by this video graph.

Searches reveal our interests, and this service highlights that fact. So the good folks at Patient Privacy Rights and the Electronic Privacy Information Center wrote a letter to Google asking for more information about the privacy consequences of Google Flu Trends. This kind of inquiry and exposure is important to the successful operation of markets because it helps educate both the public and businesses about the privacy consequences of services like Google's.

The letter is a little confounding, though. It asks, “Would you agree to publish the technique that Google has adopted to protect the privacy of search queries for Google Flu Trends?”

It’s an inartfully drawn question. Search queries don’t have privacy - they’re inanimate character strings. What the letter intends, I think, is to ask how the privacy of Google users is protected in developing the data for Google Flu Trends. Still, the request is a bit incoherent.

Google said in response to the letter:

Flu Trends uses aggregated data from hundreds of millions of searches over time. Flu Trends uses aggregations of search query data which contain no information that can identify users personally. We also never reveal how many users are searching for particular queries. The only information released publicly or to the CDC is what is seen on the Flu Trends website now: estimates of the percentages of people with influenza-like illnesses.

It’s essentially a given that data aggregated from hundreds of millions of searches is not identifiable, so the data Google Flu Trends displays is not identifiable either. Google Flu Trends doesn’t affect the privacy of Google users. It’s using Google at all that affects their privacy.

There’s value in exploring these issues, though, and here’s where I think there is pay dirt in the PPR/EPIC letter:

[T]he question is how to ensure that Google Flu Trends and similar techniques will only produce aggregate data and will not open the door to user-specific investigations, which could be compelled, even over Google’s objection, by court order or Presidential authority.

The rule of law has fallen this far: Advocates must cite the privacy threat from unconstrained, unilateral “Presidential authority.” The letter is also right to point out that courts can strip away Google’s control of the information it collects about its users.

This is a problem with the use of all Google services, and indeed with the use of all Internet services. The heart of the problem lies not with the current leader in search, or any other Internet innovator. The problem lies with our unconstrained government.

Yes, Google is playing a dangerous game with the data it collects from us. It has nonchalantly beaten the CDC at its own game, and one can’t predict how the agency will respond. The CDC may seek to deputize Google as its public health agent. As the PPR/EPIC letter points out, the agency may press Google to reveal more precise, and more identifiable, information about health-related searches.

Any agency could do this to any Internet service provider while our law of privacy, and of search and seizure, is in such a shambles.

Again, I think advocacy of this type is a valuable part of market processes because of its educational value. But if I had written the letter, I would have addressed it to the head of the Centers for Disease Control, asking for a pledge that the agency will not use any informal or extra-judicial means to collect personally identifiable health information.