Misleading Project Veritas Accusations of Google “Bias” Could Prompt Bad Law

Tomorrow, the Senate Judiciary Committee’s Subcommittee on the Constitution will hold a hearing on Google’s alleged anti-conservative bias and “censorship.” In a video released last month, James O’Keefe, a conservative activist, interviews an unnamed Google insider. The film, which has been widely shared by conservative outlets and cited by Sen. Ted Cruz (R-TX) and President Donald Trump, stitches together a narrative of Orwellian, politically motivated algorithmic bias out of contextless hidden-camera footage, anodyne efforts to improve search results, and presumed links between unrelated products. Although the film’s claims are misleading and its findings unconvincing, lawmakers are taking them seriously and risk using them to justify needless legislation and regulation. For that reason, they are worth engaging. (The time stamps throughout this post refer to the Project Veritas video.)

Search algorithms use predefined processes to sift through the universe of available data to locate specific pieces of information. Simply put, they sort information in response to queries, surfacing whatever seems most relevant according to their preset rules. Algorithms that make use of artificial intelligence and machine learning draw upon past inputs to increase the accuracy of their results over time. These technologies have been adopted to improve the efficacy of search, particularly in bridging the gulf between how users are expected to phrase search queries and the language they actually use. They are likely to be adopted only to the extent that they improve the user’s search experience. When someone searches for something on Google, it is in the interest of both Google and the user for Google to return the most pertinent and useful results.

Board game enthusiasts, economics students, and those taking part in furious public policy debates over dinner all may have reasons to search for “Monopoly.” A company that makes it easiest for such a diverse group of people to find what they’re looking for will enjoy more traffic and profit than its competitors. Search histories, location, trends, and additional search terms (e.g., “board game” or “antitrust”) help yield more tailored, helpful results.
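To make the “Monopoly” example concrete, here is a toy sketch of how extra search terms or a search-history signal can reorder results for an ambiguous query. It is not how Google ranks pages: the three documents, the `context` parameter standing in for search history, and the simple word-overlap score are all hypothetical, chosen only to show the idea.

```python
from collections.abc import Iterable

# Hypothetical documents, each reduced to a set of keywords.
DOCUMENTS = {
    "How to win at the Monopoly board game": {"monopoly", "board", "game", "rules", "win"},
    "Monopoly pricing in introductory economics": {"monopoly", "economics", "pricing", "market"},
    "Antitrust law and monopoly power": {"monopoly", "antitrust", "law", "power", "competition"},
}


def rank(query: str, context: Iterable[str] = ()) -> list[str]:
    """Rank document titles by how many query and context words they share.

    `context` is a hypothetical stand-in for signals such as search history.
    """
    terms = set(query.lower().split()) | {word.lower() for word in context}
    scored = [(len(terms & keywords), title) for title, keywords in DOCUMENTS.items()]
    # Highest overlap first; ties fall back to alphabetical order by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored]


print(rank("monopoly board game")[0])               # the board game page
print(rank("monopoly", context=["economics"])[0])   # the economics page
```

A bare “monopoly” query ties every document; one added word, typed or inferred from context, is enough to surface the page the user most likely wants.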

Project Veritas’ film is intended to lend credence to the conservative concern that culturally liberal tech firms design their products to exclude and suppress the political right. Although the evidence for it is largely anecdotal, this concern has spurred hearings and regulatory proposals. Sen. Josh Hawley (R-MO) recently introduced legislation that would require social media companies to prove their political neutrality in order to receive immunity from liability for their users’ speech. Last week, President Trump hosted a social media summit featuring prominent conservative activists and conspiracy theorists who claim to have run afoul of politically biased platform rules.

The film begins by focusing on Google’s efforts to promote fairer algorithms, which it treats as attempts to introduce political bias into search results. The insider claims that while working at Google, he found “a machine learning algorithm called ML fairness, ML standing for machine learning, and fairness meaning whatever they want to define as fair.” (6:34) The implication is that Google employees actively take steps to ensure that Google search yields anti-conservative content rather than what a neutral search algorithm would return. Unfortunately, the film never discusses what a “neutral” algorithm would look like.

Although we’re living in the midst of a new tech panic, we should remember that questions about bias in machine learning, and attempts to answer them, are neither new nor merely a concern of the right. Rep. Alexandria Ocasio-Cortez (D-NY) and the International Committee of the Fourth International have both expressed concerns about algorithmic bias. What counts as adequate or correct representation is subjective, and increasingly a political question. In 2017, the World Socialist Web Site, which the International Committee publishes, sent a letter to Google bemoaning the tech giant’s “anti-left bias” and claiming that Google is “‘disappearing’ the WSWS from the results of search requests.”

For all the breathlessness with which O’Keefe “exposes” Google’s efforts to reduce bias in its algorithms, he brings us little new information. The documents he presents alongside contextless hidden-camera clips of Google employees fail to paint a picture of fairness in machine learning run amok.

One of the key problems with O’Keefe’s video is that it creates a false dichotomy between pure, user-created signals and machine learning inputs that have been curated to reduce bias in the eventual output. The unnamed insider claims that attempts to rectify algorithmic bias amount to vandalism: “because that source of truth (organic user input) has been vandalized, the output of the algorithm is also reflecting that vandalism” (8:14).

But there is little reason to presumptively expect organic data to generate more “truthful” or “correct” outputs than training data that has been curated in some fashion. Algorithms sort and classify data, rendering raw input useful. Part of tuning any given machine learning algorithm is providing it with training data, looking at its output, and then comparing that output to what we already know to be true.

Take a recent example from Wimbledon. IBM uses machine learning to select highlight clips; its algorithm’s inputs include player movements and crowd reactions. Crowd reactions can provide valuable signals, but they can also be misleading: “An American playing on an outside court on 4 July may get a disproportionate amount of support, throwing the highlight picking algorithm out of sync.” We expect the crowd to cheer in appreciation of a player’s skill, but it may also cheer to celebrate the appearance of an American on Independence Day. If IBM wants to identify moments of skillful play rather than the mere appearance of Americans on the court, it must reduce the relative importance of audience applause in its algorithm, debiasing it.
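The Wimbledon case shows what “debiasing” usually means in practice: adjusting how much weight a misleading signal gets. IBM has not published its highlight model in this form, so the sketch below is hypothetical throughout. The two features, their 0-to-1 scales, and both crowd weights are assumptions, and a simple weighted sum stands in for whatever model IBM actually uses.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    name: str
    player_movement: float  # hypothetical 0-1 measure of athletic play
    crowd_noise: float      # hypothetical 0-1 measure of audience reaction


def highlight_score(clip: Clip, crowd_weight: float) -> float:
    """Weighted sum of the two signals; player movement keeps a fixed weight of 1."""
    return clip.player_movement + crowd_weight * clip.crowd_noise


def best_clip(clips: list[Clip], crowd_weight: float) -> str:
    """Return the name of the highest-scoring clip under the given crowd weight."""
    return max(clips, key=lambda clip: highlight_score(clip, crowd_weight)).name


CLIPS = [
    Clip("long baseline rally", player_movement=0.9, crowd_noise=0.5),
    Clip("routine point by an American on 4 July", player_movement=0.3, crowd_noise=1.0),
]

# Hypothetical starting weight: applause dominates, so patriotic cheering wins.
print(best_clip(CLIPS, crowd_weight=2.0))  # routine point by an American on 4 July
# Down-weighting applause lets the skillful rally surface instead.
print(best_clip(CLIPS, crowd_weight=0.5))  # long baseline rally
```

Checking which clip wins against what human viewers already know to be the better highlight is the “compare output to what we know to be true” step described above. Curating the inputs this way does not vandalize the signal; it corrects for a known distortion in it.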

Despite the insider’s claim that “they would never admit this publicly,” (9:45) Google is quite open about its efforts to prevent algorithmic bias. The firm maintains a list of machine learning fairness resources, including an extensive glossary of terms describing different sorts of bias, and sample code demonstrating how to train classifiers while avoiding bias. These public resources are, frankly, far more extensive, and reveal more about Google’s efforts to prevent machine learning bias, than anything in the latest Veritas video.

The fact that Google News is not an organic, unfiltered search product (11:30) is not news either. Google’s news content policies are open to the public, and Google gives publishers further public guidance on what its algorithms prioritize on news pages.

The “demonstration” of Google search bias that follows, which relies on autocomplete suggestions rather than actual search results, is far from “undeniable.” O’Keefe first types “Hillary Clinton’s emails are” into Google’s search bar and notes that Google does not autocomplete the query. Without ever running the search, the film concludes that “Google is suggesting that people do not search for this term” and that “its not even worth returning any results for” (15:48). Had O’Keefe actually run the search, Google would have returned millions of web pages concerning Clinton’s use of a private email server as a government employee. With a shorter, more generic query, such as typing “clinton ema” into the search bar, Google autosuggests “clinton emails on film” and “clinton emails foia,” and surfaces results from Judicial Watch and the Daily Caller. While few people may use O’Keefe’s convoluted search term, Google’s autocomplete doesn’t shy away from suggesting searches for Clinton’s emails, and the search itself returns a deluge of results regardless of the phrasing used.
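Why would a longer, more specific phrase produce no suggestions while a short fragment produces several? A rough sketch: autocomplete can be thought of as finding previously seen queries that begin with what the user has typed, keeping only those that clear some popularity bar. Google has not published its autocomplete system in this form; the `suggest` function, the `min_count` threshold, and the query counts below are hypothetical, invented purely to illustrate prefix matching.

```python
from collections.abc import Mapping

# Hypothetical query log: query text -> number of times it was searched.
QUERY_LOG = {
    "clinton emails": 5000,
    "clinton emails foia": 1200,
    "clinton email server": 900,
    "hillary clinton's emails are classified": 40,
}


def suggest(prefix: str, query_log: Mapping[str, int],
            min_count: int = 100, limit: int = 3) -> list[str]:
    """Return up to `limit` popular logged queries that start with `prefix`.

    Queries seen fewer than `min_count` times are skipped, so a long,
    unusual prefix can yield nothing even when the topic itself is popular.
    """
    typed = prefix.lower()
    matches = [(count, query) for query, count in query_log.items()
               if query.startswith(typed) and count >= min_count]
    # Most frequent first; ties broken alphabetically.
    matches.sort(key=lambda pair: (-pair[0], pair[1]))
    return [query for _, query in matches[:limit]]


print(suggest("clinton ema", QUERY_LOG))
print(suggest("hillary clinton's emails are", QUERY_LOG))  # []
```

The empty list for the longer prefix says only that few logged queries begin with those exact words, not that the topic is suppressed or that a search for it would return nothing.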

Next, O’Keefe uses Google Trends to compare searches for “Hillary Clinton’s emails” with searches for “Donald Trump’s emails.” Relative to searches for Clinton’s emails, searches for Trump’s emails are effectively nonexistent. However, these are relative trends, so the fact that many more people searched for “Clinton’s emails” than for “Trump’s emails” does not mean that no one has ever searched for “Trump’s emails.” Nevertheless, O’Keefe claims that the low relative interest in “Trump’s emails” implies that there ought to be no autocomplete suggestions for the search term: “Now let’s go back to Google.com and search for Donald Trump’s emails and it should show us no autocomplete because according to Google no one searches for it compared to Hillary Clinton’s emails.” (17:07) Despite this expectation, Google does indeed suggest search queries when one types “Donald Trump’s emails” into the search bar. The Google insider explains this result by saying: “according to them (Google) Hillary Clinton’s emails is a conspiracy theory and it’s unfair to return results based on her emails” (17:40).

Is this the incontrovertible evidence of Google’s bias that conservatives have been searching for? Not at all. Simply because one query is used more often than another does not imply that the seldom-used query ought not to generate search suggestions. Remember, too, that O’Keefe never actually searched for “Clinton’s emails”; he produced the absence of autocomplete suggestions by typing “Hillary Clinton’s emails are,” a more specific search term less likely to generate autofill suggestions. Finally, far from refraining from returning “results based on her emails” because it would be “unfair,” Google returns a host of topical results for any of these Clinton email query permutations once one actually runs the search.
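The arithmetic behind “relative trends” is easy to see in a simplified sketch. Google describes Trends values as search interest relative to the highest point on the chart, with 100 marking peak popularity; its actual pipeline also samples data and normalizes by total search volume, which this sketch omits. The function name and the weekly counts below are hypothetical, invented only to show how a small but nonzero series can round down to zero next to a much larger one.

```python
def relative_interest(counts: dict[str, list[int]]) -> dict[str, list[int]]:
    """Scale every series against the single highest value among all terms.

    The overall peak becomes 100 and every other point is rounded to a
    whole number, so a small but nonzero series can round down to 0.
    """
    peak = max(max(series) for series in counts.values())
    return {term: [round(100 * count / peak) for count in series]
            for term, series in counts.items()}


# Hypothetical weekly search counts, invented purely for illustration.
RAW = {
    "hillary clinton's emails": [50_000, 200_000, 80_000],
    "donald trump's emails": [300, 150, 400],
}
SCALED = relative_interest(RAW)
print(SCALED)
```

Every raw count for the second term is above zero, yet its scaled series is flat at 0. A chart built this way shows which query is more popular, not whether anyone searched for the less popular one, and it says nothing about whether autocomplete should offer it.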

While undercover footage of a Google employee discussing the 2016 election is supposed to provide evidence of anti-conservative animus, the clips are so stripped of context that they prove little. The employee (whom I have not named here because of the threats she has received following her involuntary appearance in O’Keefe’s film) could be discussing either nefarious efforts to prevent Trump’s reelection or Google’s very public work to secure its services against Russian manipulation. Absent context, it’s impossible to tell.

While perhaps inarticulately phrased, “We got screwed over in 2016, it wasn’t just us, it was, the people got screwed over, the news media got screwed over, like everyone got screwed over so we’re rapidly been like, what happened there and how do we prevent it from happening again?” (3:12) does not a smoking gun make. The employee, for her part, says that she had been approached to discuss mentorship programs for women of color and was filmed recounting how Google has worked to prevent election interference.

The film escalates from mere misrepresentation to outright falsehood when the unnamed insider discusses Section 230 of the Communications Decency Act. CDA 230 contains two substantive provisions. The first, (c)(1), prevents “providers of an interactive computer service” from being “treated as the publisher or speaker of any information provided by another information content provider.” This prevents Planned Parenthood from holding YouTube liable for hosting O’Keefe’s undercover videos of Planned Parenthood employees.

The second provision, (c)(2), allows content hosts to moderate content, or “restrict access to or availability of material that the provider or user considers to be obscene, lewd, lascivious, filthy, excessively violent, harassing, or otherwise objectionable.” This ensures that if YouTube does not want to play host to O’Keefe’s misleading videos, it does not have to.

After reading only CDA 230(c)(1) to viewers, O’Keefe asks the unnamed insider about proposals to amend the statute.

19:14 O’Keefe: Some people think a solution is this section 230 and taking it away?

19:20 Unnamed Insider: I mean, they violated not only the letter of the law but the spirit of law, section 230 says that in order for them to be a platform they can’t censor the content that they have, instead they decided to act as a publisher, making them responsible for everything they put on and they’re still masquerading as a platform even though they’re acting as a publisher.

This is not a case of poor legal interpretation or clumsy phrasing. It is simply not true. CDA 230(c)(2) directly contradicts the insider’s claim: the full statute explicitly provides for private moderation, or “censorship” in the insider’s parlance. It also makes no mention of a publisher/platform distinction. As far as the law is concerned, whether a website that hosts user-submitted content is a “platform” or a “publisher” simply does not matter.

The comparisons that follow (22:57) between discreet refusals to host videos and Nazi book burnings do not inspire greater confidence. While conservatives may have reason to suspect that their cultural distance from Silicon Valley makes fair moderation difficult, this concern should not spur the embrace of conspiratorial claims made by those with a history of misleadingly editing video. Public policy must rest on firm factual ground, not on aspersions and the deliberate misreading of existing statutes. Unfortunately, some members of Congress seem poised to legislate on the basis of misleading propaganda instead of taking the time to understand how algorithms actually work.

Special thanks to Cato Institute research assistant William Duffield for research assistance.