The significant difference is that they are now automated, often with the aid of artificial intelligence such as machine learning. Consider just one of these three examples: how the credit approval process has evolved over the past 100 years. In the early part of the 20th century, a loan decision was based in large measure on character as judged in a face-to-face meeting with a loan officer, supported by a few pages of financial paperwork dropped into a loan file. When the credit card industry was in its infancy in the mid-20th century, a retail sales clerk used a rotary phone to call an authorization clerk, who would look through stacks of computer-generated paper reports to determine whether someone was approved for a credit card or an individual purchase.
That clunky process has been modernized, as explained by Michael Kearns and Aaron Roth in their book The Ethical Algorithm. They write:
When you apply for a credit card, your application may never be examined by a human being. Instead an algorithm pulling in data about you (and perhaps also about people “like you”) from many different sources might automatically approve or deny your request.
Kearns and Roth are faculty members in the computer science department at the University of Pennsylvania. They specialize in and have published widely on algorithms, machine learning, and algorithmic game theory. Both have also co-authored academic books in this field: Kearns with An Introduction to Computational Learning Theory and Roth with The Algorithmic Foundations of Differential Privacy. In The Ethical Algorithm the authors try to address the less technical reader. To that end, they start with a simplified definition of an algorithm: “a very precisely specified series of instructions for performing a concrete task.”
The “ethical” in the book’s title is apt: a theme that arises throughout the book is the privacy, fairness, and other ethical issues that occur in the development and application of algorithms. The authors also apply game theory to how users interact with algorithms and assess the reliability of the data used in typical evaluations of algorithms.
Widely applied algorithms / Everyone who follows the financial markets on a regular basis has heard of the “FANG” stocks: Facebook, Amazon, Netflix, and Google. Each member of this American technology club has its own famous underlying algorithm that contributed to its phenomenal success. Facebook has its news feed and advertising algorithms, Amazon has its “customers who bought this item also bought” algorithm, Netflix has a similar algorithm to recommend movies, and Google’s Search and Maps applications are driven by algorithms.
Most of these algorithms fall into the category of “collaborative filtering.” According to Kearns and Roth, these algorithms are “collaborative” because an individual user’s data are blended with the available data of other users to create recommendations. The authors take this approach frequently throughout The Ethical Algorithm, twinning a recognizable or easily explained algorithm with the technical term computer scientists use for it.
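To make the idea concrete, here is a minimal sketch of user-based collaborative filtering. This is my own toy illustration, not code from the book, and the ratings matrix is entirely made up:

```python
import numpy as np

# Hypothetical user-item rating matrix (rows: users, columns: items; 0 = unrated).
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def cosine_similarity(a, b):
    # Cosine similarity between two users' rating vectors.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

def recommend(user, k=2):
    # Score the target user's unrated items using the k most similar other users.
    sims = np.array([cosine_similarity(ratings[user], ratings[u])
                     for u in range(len(ratings))])
    sims[user] = -1.0                    # exclude the user's own row
    neighbors = np.argsort(sims)[-k:]    # indices of the k nearest neighbors
    scores = sims[neighbors] @ ratings[neighbors]
    scores[ratings[user] > 0] = -np.inf  # only recommend items not yet rated
    return int(np.argmax(scores))

print(recommend(0))  # item 2: the item user 0's closest "collaborators" rated highly
```

The “collaboration” is visible in the last few lines: the recommendation for one user is computed entirely from other users’ data, which is exactly why these systems raise the privacy questions the authors turn to next.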
Privacy concerns / Kearns and Roth break down case studies showing how the expansion of these algorithms erodes the anonymity of our personal data. This includes what they call “reidentification,” the risk of exposing a data contributor’s identity or other personal details. A troublesome example of this phenomenon is the release of fitness data compiled from the contributions of users of Fitbit, an application that allows a user to track progress and set fitness goals. Fitbit relies on GPS coordinates, which have a benign purpose: allowing precise distance measurements and enabling those who want to keep fit while traveling to find popular running routes in an unfamiliar city. But those coordinates also reveal the location of American military bases in countries like Afghanistan because U.S. military personnel are some of the biggest users of Fitbits. Given the dearth of other Fitbit users in such countries, the bases stand out clearly in the aggregated data.
The authors also delve into very sensitive data issues such as health records:
In the mid-1990s, a government agency in Massachusetts called the Group Insurance Commission (GIC) decided to help academic researchers by releasing data summarizing hospital visits for every state employee. To keep the records anonymous, the GIC removed explicit patient identifiers.
The governor at the time, William Weld, assured voters that patient privacy was protected. “Latanya Sweeney, who was a PhD student at MIT at the time … set out to find William Weld’s medical records from the anonymous data release,” Kearns and Roth explain. Sweeney was able to narrow the data set to six records based on Weld’s birthday, and then narrowed it down to one because, of the six, “only one lived in the Governor’s zip code.” In her final act of this research, “She sent them to [Weld’s] office.”
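To see how little it takes to mount such a linkage attack, consider a toy illustration. This is my own sketch with made-up records, not Sweeney’s actual data:

```python
import pandas as pd

# "Anonymized" hospital records: explicit identifiers removed, but
# quasi-identifiers (birth date, zip code, sex) remain. All values are made up.
records = pd.DataFrame({
    "birth_date": ["1945-07-31", "1945-07-31", "1960-01-15", "1945-07-31"],
    "zip":        ["02138",      "02139",      "02138",      "02138"],
    "sex":        ["M",          "M",          "F",          "F"],
    "diagnosis":  ["dx_a",       "dx_b",       "dx_c",       "dx_d"],
})

# Public records (e.g., voter rolls) supply the target's quasi-identifiers.
target = {"birth_date": "1945-07-31", "zip": "02138", "sex": "M"}

match = records[(records["birth_date"] == target["birth_date"])
                & (records["zip"] == target["zip"])
                & (records["sex"] == target["sex"])]
print(match)  # one surviving row: the "anonymous" record is reidentified
```

As in Sweeney’s study, each filter is innocuous on its own; it is the combination that isolates a single record.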
Kearns and Roth explain the concept of “k‑anonymity” as one potential way to address these privacy concerns:
An initial idea for a solution … is to redact information from individual records so that no set of characteristics matches just a single data record. Individual characteristics are divided into “sensitive” and “insensitive” attributes…. The goal of k‑anonymity is to make it hard to link insensitive attributes to sensitive attributes.
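Here is a rough sketch of what checking k‑anonymity, and restoring it by generalizing quasi-identifiers, might look like in practice. This is my own illustration with made-up records; the book itself contains no code:

```python
import pandas as pd

# Toy records: quasi-identifiers (zip, age) plus a sensitive attribute.
df = pd.DataFrame({
    "zip":       ["02138", "02139", "02138", "02139"],
    "age":       [34, 36, 35, 37],
    "condition": ["flu", "asthma", "flu", "diabetes"],  # sensitive
})

QUASI = ["zip", "age"]

def is_k_anonymous(table, k):
    # k-anonymity holds if every combination of quasi-identifier
    # values is shared by at least k records.
    return table.groupby(QUASI).size().min() >= k

print(is_k_anonymous(df, 2))   # False: every (zip, age) pair is unique

# Generalize: coarsen zip codes to a prefix and ages to ten-year bands.
coarse = df.assign(zip=df["zip"].str[:4] + "*", age=(df["age"] // 10) * 10)
print(is_k_anonymous(coarse, 2))  # True: no record's quasi-identifiers are unique
```

The cost of the fix is apparent even in this tiny example: the generalized data are safer but less precise.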
Applying an accepted definition of fairness / The concept of privacy is something most people understand and appreciate, but the notion of fairness has a broad range of interpretations. Kearns and Roth explain that, in the case of some of the FANG algorithms, “controlled online experiments have demonstrated racial, gender, political and other types of bias in Google search results, Facebook advertising, and other Internet services.” The authors then devote some time to defining fairness in terms of statistical parity in a world of two races of people, Circles and Squares. They write:
Suppose for some reason we are concerned about discrimination against Squares in the granting of loans by a lender…. Statistical parity simply asks that the fraction of Square applicants that are granted loans be approximately the same as the fraction of Circle applicants that are granted loans, a crude constraint saying that the rate of granted loans has to be roughly the same for both races.
After walking through the mechanics of how this would work in practice, Kearns and Roth summarize the two ways a lender would likely achieve statistical parity: “one by denying loans to creditworthy Circle applicants and the other by granting loans to Square applicants we know (or at least predict) will default.” They conclude that “in an era of data and machine learning, society will have to accept, and make decisions about, trade-offs between how fair models are and how accurate they are.” Similar choices arise in designing a “fair” university admissions process.
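The parity calculation itself is simple enough to sketch. Again, these are my own toy numbers, not the authors’:

```python
import numpy as np

# Hypothetical loan decisions for two groups, Circles ("C") and Squares ("S").
group    = np.array(["C", "C", "C", "C", "C", "S", "S", "S", "S", "S"])
approved = np.array([ 1,   1,   1,   0,   1,   1,   0,   0,   1,   0 ])

def approval_rate(g):
    # Fraction of applicants in group g whose loans were approved.
    return approved[group == g].mean()

gap = abs(approval_rate("C") - approval_rate("S"))
print(f"Circle rate: {approval_rate('C'):.2f}, "
      f"Square rate: {approval_rate('S'):.2f}, parity gap: {gap:.2f}")
# Circle rate: 0.80, Square rate: 0.40, parity gap: 0.40
# Closing the gap means denying some creditworthy Circles or approving
# some Squares predicted to default -- the trade-off the authors describe.
```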
Conclusion / The authors open the book by arguing that, rather than addressing these algorithmic trade-offs through post hoc regulation, “the idea is to fix them from the inside.” One example of this approach is the k‑anonymity concept, a solution that reduces the likelihood of reidentification. The authors also argue early on for developing “quantitative definitions of social values that many of us can agree on.”
I was expecting a final chapter (or two) that would bring together the strands of thought on these topics, but the conclusion is a bit of a disappointment. It is quite brief (half a dozen pages), and the closing discussion of the design of ethical algorithms ends abruptly, falling back on a “case-by-case” approach to developing solutions, although many of the solutions posited throughout the book are helpful in giving a sense of possible approaches. The authors emphasize that avoiding algorithms is simply not an option; given their omnipresent and growing nature, it is not possible to “avoid algorithms altogether … [as] all decision-making — including that carried out by human beings — is ultimately algorithmic.”
I admit that I was a bit out of my comfort zone in reading this book. The case studies on the FANG companies were understandable and relatable, as I had not given much thought to how these algorithms come together. But the discussions of the technical issues were a tough climb at times. I consider myself comfortable with high-level discussions of statistical and technology issues, but some of the computer science terminology was a bit too far in the weeds for my taste. The discussions became difficult to follow once Kearns and Roth strayed from the case studies and tried to link them to statistical or technological concepts.
Serves me right for accepting a book recommendation from someone who has a doctorate in economics.