Injustice Ex Machina: Predictive Algorithms in Criminal Sentencing
By Andrew Lee Park | Law Meets World | February 19, 2019
Introduction
The notion of crime-prediction technology has been explored in science fiction for quite some time. Though first rearing its head in Philip K. Dick’s 1956 short story, “The Minority Report,” it remained somewhat dormant in popular consciousness until a half-century later, when Steven Spielberg brought the story to the screen as a thrilling tech-noir blockbuster. The premise was simple: In the year 2054, crime in America is all but nonexistent thanks to the advent of “PreCrime” technology, which allows law enforcement to “see” a crime before it occurs.

In the opening scene, Chief John Anderton of the PreCrime Division, played by the inimitable Tom Cruise, is presented with an urgent PreCrime report—that later that day, a man would violently stab his wife to death in their home. Immediately, the authorities mobilize like clockwork, and within minutes we see Anderton burst through the front door of the home. There, he finds a dowdy man of middle age standing in the living room, a large pair of scissors in his hands. Seated before him are two lovers—his unfaithful wife and her paramour, caught in the act. Our hero rushes to accost the man in the nick of time, scarcely a moment before the predicted killing would be realized. As the man is whisked away by authorities for the future murder of his wife, he is heard screaming, “I didn’t do anything! I wasn’t gonna do anything!” But this is America in the year 2054, in a world with infallible PreCrime technology, with Tom Cruise of all people heading the division. Due process be damned when you’ve got a sure thing at stake.

Yet the scene evokes a shade of dystopian anxiety, even assuming that the murder was certain to occur. Being punished and condemned for an unrealized crime offends our ideas of blameworthiness, of moral agency and free will. But lucky for us, we would strain to imagine an America that would allow for such a system to exist. Constitutional protections, as well as the limits of technology as far as we know, would presumably bar PreCrime-like measures from ever being enacted.

But if we were to imagine, for just a moment, that we are indeed living in such a timeline, we might also imagine that the seeds of this future were planted in the summer of 2016. It was then that State v. Loomis1 was decided, in which the Wisconsin Supreme Court upheld the use of a particular algorithm in judicial decisionmaking—a form of machine intelligence that had become an indispensable part of Wisconsin’s sentencing procedure. The algorithm at issue, Correctional Offender Management Profiling for Alternative Sanctions (COMPAS), was designed to assess a defendant’s risk of recidivism—that is, the potential risk that the defendant will commit a crime in the future.2

I. The Use of COMPAS in Sentencing
In the 1990s, a company called Northpointe, Inc. set out to create what is now known as COMPAS, a statistically based algorithm designed to assess the risk that a given defendant will commit a crime after release.3 In 2012, after years of development, Wisconsin incorporated COMPAS into its state sentencing procedures, at which point COMPAS assessments officially became a part of a defendant’s presentence investigation (PSI) report.4

COMPAS’s algorithm uses a variety of factors, including a defendant’s own responses to a lengthy questionnaire, to generate a recidivism-risk score between 1 and 10.5 In general terms, this is accomplished by comparing an individual’s attributes and qualities to those of known high-risk offenders.6 Based on this score, COMPAS classifies the risk of recidivism as low-risk (1 to 4), medium-risk (5 to 7), or high-risk (8 to 10).7 This score is then included in a defendant’s PSI report supplied to the sentencing judge. As a result, a defendant’s sentence is determined—to at least some degree—by COMPAS’s recidivism risk assessment.
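Although COMPAS’s scoring model is proprietary, the banding just described is simple enough to express concretely. The short sketch below is purely illustrative: it assumes nothing about how the underlying 1-to-10 score is actually computed, and merely maps a hypothetical score onto the low-, medium-, and high-risk bands reported in a PSI.

```python
# Purely illustrative: COMPAS's actual model is proprietary, so nothing here
# reflects how the underlying score is computed. The cutoffs come only from
# the banding described in the text (1-4 low, 5-7 medium, 8-10 high).

def risk_band(decile_score: int) -> str:
    """Map a 1-10 recidivism-risk score to the band reported in a PSI."""
    if not 1 <= decile_score <= 10:
        raise ValueError("score must be between 1 and 10")
    if decile_score <= 4:
        return "low"
    if decile_score <= 7:
        return "medium"
    return "high"

print(risk_band(7))  # prints "medium"
```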

But in more precise terms, how does COMPAS calculate its risk score? And what specific kinds of data does it consider? Surprisingly, aside from Northpointe, no one truly knows. One would expect, at minimum, that the court implementing COMPAS would understand how it functions, but that is sadly not the case. In a concurring opinion in Loomis, Justice Shirley S. Abrahamson bemoans this point, noting that “the court repeatedly questioned both the State’s and defendant’s counsel about how COMPAS works. Few answers were available.”8 Though Northpointe has been asked to reveal COMPAS’s source code, it has staunchly refused to do so.9 And as troubling as it sounds, such a refusal is squarely within Northpointe’s legal rights.

As a privately developed algorithm, COMPAS is afforded the protections of trade secret law.10 That means that COMPAS’s algorithm—including its software, the types of data it uses, and how it weighs each data point—is all but immune from third-party scrutiny.11 This shield extends not only to those who might exploit the algorithm for pecuniary gain, but also to the prosecutors who put forth sentencing recommendations, to the defendants who are sentenced under consideration of COMPAS scores, and—as Kafka rolls furiously in his grave—to the judges who use those scores in their sentencing decisions.12

Put simply, COMPAS answers to no one but its creators. Perhaps if COMPAS could at least be demonstrated to treat all defendants fairly and reliably, there might be less cause for concern. But as a recent study has shown, that is almost certainly not the case.

II. Bias in the Machine
A. Competing Notions of Fairness
A 2016 study by the nonprofit news organization ProPublica found that COMPAS exhibited a noticeable bias against black defendants.13 After examining over ten thousand criminal defendants in Broward County, Florida, who were sentenced with the assistance of COMPAS scores, the study reached two conclusions: (1) that black defendants were more likely than white defendants to be incorrectly judged to be at a higher risk of recidivism, and (2) that white defendants were more likely than black defendants to be incorrectly judged as low risk.14 Northpointe criticized the study in a 37-page defense, which ProPublica rebutted soon after.15 Northpointe contended that its scores were fair because its rate of accuracy in predicting recidivism was the same for black and white defendants—about 60 percent.16 This contention is true, and even ProPublica does not dispute that figure.17 But ProPublica also stood by its finding that COMPAS was unfair because it treated black and white defendants differently when its scores were wrong. So how can a score be both fair and unfair at the same time?

A group of researchers studied this phenomenon and published their findings in a Washington Post blog.18 The problem is not explicitly about race, as COMPAS purportedly does not use race as a factor in its risk score.19 The problem, according to the researchers, lies in competing notions of what constitutes fairness in the first place. COMPAS’s model captures fairness in terms of accurately predicting those who do, in fact, reoffend; for example, those who received a score of seven went on to reoffend roughly 60 percent of the time, regardless of race. ProPublica, however, shifts the focus away from reoffenders and toward those who end up not reoffending. Under this lens, black defendants were roughly twice as likely as white defendants to be mistakenly classified as medium or high risk, even though they ultimately stayed on the straight and narrow after release.

But here is the problem: It is mathematically impossible for a model to satisfy both fairness criteria at the same time. Correcting for one necessarily erodes the accuracy of the other.20 As long as COMPAS calibrates its algorithm according to its notion of fairness, the incongruity noted by ProPublica will inevitably occur. This leads us to an important question: Are the benefits of COMPAS getting it right worth the costs to black defendants when it gets it wrong? It is clear how Northpointe would respond. But given COMPAS’s potential for widespread use, perhaps the question isn’t Northpointe’s to answer.
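To make the incompatibility concrete, consider a toy calculation. The numbers below are invented for illustration, not drawn from the Broward County data; they simply show that when two groups’ underlying rates of reoffense differ, a score that is equally accurate for both groups (Northpointe’s sense of fairness) will misclassify the two groups’ non-reoffenders at different rates (the disparity ProPublica measured).

```python
# Hypothetical numbers, not the Broward County data. Both groups are scored
# by a model that is "fair" in Northpointe's sense: 60 percent of defendants
# flagged as high risk go on to reoffend, regardless of group.

def error_rates(n, reoffenders, flagged, flagged_who_reoffend):
    """Return (precision among those flagged, false positive rate)."""
    false_positives = flagged - flagged_who_reoffend
    non_reoffenders = n - reoffenders
    precision = flagged_who_reoffend / flagged
    false_positive_rate = false_positives / non_reoffenders
    return precision, false_positive_rate

# Group A: 1,000 defendants, 500 of whom later reoffend; 500 are flagged.
# Group B: 1,000 defendants, 300 of whom later reoffend; 300 are flagged.
prec_a, fpr_a = error_rates(n=1000, reoffenders=500,
                            flagged=500, flagged_who_reoffend=300)
prec_b, fpr_b = error_rates(n=1000, reoffenders=300,
                            flagged=300, flagged_who_reoffend=180)

print(prec_a, prec_b)  # 0.6 and 0.6: equally "accurate" for both groups
print(fpr_a, fpr_b)    # 0.40 vs. roughly 0.17: group A's non-reoffenders
                       # are wrongly flagged more than twice as often
```

So long as the two groups’ measured rates of reoffense differ, no amount of recalibration can bring both sets of numbers into agreement at once, which is precisely the impossibility the researchers describe.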

B. Structural Biases in the Data Itself
The worry doesn’t stop there, unfortunately. As stated above, race is purportedly not a factor in COMPAS’s algorithm. But there are likely other factors COMPAS analyzes that serve as proxies for race, which may lead to racial bias in its results. Even seemingly innocuous data points can work to the prejudice of marginalized demographics. Consider, for example, one’s area of residence. Heavier policing in minority-dominated neighborhoods often inflates arrest statistics for individuals residing in those areas.21 As COMPAS is a statistics-based algorithm, this could make black defendants more prone to inordinately high risk scores by virtue of where they live.

Such biases would not necessarily be caused by a racist algorithm, as algorithms can largely be characterized as number crunchers. The problem is that algorithms are trained on data produced by humans.22 If the data itself is tainted with historic and structural biases, that taint will necessarily be imputed onto an algorithm’s output. Algorithms in other contexts have already demonstrated this phenomenon, with gender stereotyping skewing Google Image searches,23 racial stereotyping influencing the appearance of targeted advertisements,24 and gay stereotyping leading to absurd recommendations in Google Play’s algorithm.25

As a result, there is a growing fear that algorithms can cause “runaway feedback loops,” in which historic biases are reflected in an algorithm’s results, further skewing the data against marginalized groups, which is then processed again by the algorithm to produce even more biased results.26 Viewing COMPAS through this lens, the situation grows dire. The immediate risk is that overrepresented minorities will be issued errant risk scores at an ever-increasing rate. Looking beyond that, however, we can see how longer sentences for certain groups can push them further into poverty and joblessness, into deeper marginalization and disillusionment.
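The feedback dynamic is easier to see in miniature. The simulation below is a deliberately crude toy model, not a reproduction of any cited study: two neighborhoods have identical true crime rates, but patrols are allocated according to the arrest record, and arrests can be recorded only where patrols are sent. A trivial initial imbalance in the data, rather than any difference in underlying behavior, ends up steering enforcement.

```python
import random

# A deliberately simplified toy model, not a reproduction of any cited study.
# Two neighborhoods have identical true crime rates, but patrols follow the
# arrest record, and arrests can only be recorded where patrols are sent.
random.seed(0)
true_crime_rate = {"A": 0.1, "B": 0.1}   # identical underlying behavior
recorded_arrests = {"A": 11, "B": 10}    # a tiny initial imbalance in the data
patrols_per_day = 100

for day in range(200):
    total_recorded = sum(recorded_arrests.values())
    for hood, rate in true_crime_rate.items():
        # enforcement is allocated by the historical record, not by reality
        patrols = round(patrols_per_day * recorded_arrests[hood] / total_recorded)
        # each patrol records an arrest with probability equal to the true rate
        recorded_arrests[hood] += sum(random.random() < rate
                                      for _ in range(patrols))

# The identical true rates never pull the record back toward parity; the
# early imbalance persists because the data, not the underlying behavior,
# decides where enforcement goes.
print(recorded_arrests)
```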

But given the black box of trade secret law, it is impossible to know whether race proxies are indeed being considered, and if so, to what extent. It is telling, however, that even Northpointe’s founder has suggested that race-correlated factors may be at play.27 At the very least, ProPublica’s study reveals that there is some data point causing bias against black defendants. Northpointe’s defense was to contend that COMPAS was accurate under its own measure of fairness. To lean on such a defense, unfortunately, is to burn ants with a looking glass in one hand while pointing to its shadow with the other.

III. Loomis’s Blind Spot: Heuristics and Cognitive Biases
A. How the Loomis Court Got It Wrong
If COMPAS does indeed discriminate based on race, should we be afraid that sentencing decisions will carry the taint of racial prejudice? The Loomis court certainly didn’t think so. The court acknowledged ProPublica’s study in one breath and discounted it in another.28 And though the court failed to properly assess COMPAS’s reliability, the court’s more egregious oversight was in failing to consider the fallibility of the relevant decisionmakers at issue: the sentencing judges.

Of all the arguments raised by the defendant in Loomis, two due process challenges are particularly important: first, that the use of a COMPAS risk assessment at sentencing violates the right to be sentenced based upon accurate information, and second, that it violates a defendant’s right to an individualized sentence.29 These challenges were grounded in two fundamental aspects of COMPAS assessments: that trade secret protections prevent defendants from verifying a risk score’s accuracy, and that COMPAS uses group data to determine its score.30

In responding to the first challenge, the court misses the mark by a mile. The court held that despite COMPAS’s trade secret protections, a defendant could verify that she correctly filled out her COMPAS questionnaire.31 There is no mention of whether a defendant has access to specific data inputs, the factors considered, or how heavily certain factors were weighed—information that is essential to understanding how a COMPAS score is achieved.

But more significant is the court’s response to the second challenge. It held that the use of COMPAS did not violate the defendant’s right to an individualized sentence based on three assumptions: (1) that the COMPAS score, despite its use of group data, was not the determinative factor at sentencing; (2) that risk scores give judges more “complete information,” which allows them to better weigh sentencing factors; and (3) that trial courts can “exercise discretion” when assessing a defendant’s score and, as a result, disregard scores that are inconsistent with a defendant’s other factors.32 How the Loomis court arrived at these assumptions is unclear, to say the least. Recall that sentencing judges do not have access to COMPAS’s algorithm. Like everyone else, they have access only to the naked score, which is appended to the back of their PSI report.33 The court assumes that judges—without any qualifications regarding COMPAS—will not over-rely on a score, will weigh the other sentencing factors fairly, and will know when a score is wrong. But if the wealth of studies regarding cognitive biases is any indication, these assumptions are misguided.

B. COMPAS and Cognitive Bias
Research has established that even the most prudent decisionmakers are prone to severe errors in judgment. These cognitive biases result from the brain’s natural tendency to rely on heuristics, or simple rules of thumb, when dealing with complicated mental tasks.34 Though several types of cognitive bias can cause a judge to over-rely on a COMPAS score, one is particularly salient: automation bias.

Automation bias refers to the tendency to “ascribe greater power and authority to automated aids than to other sources of advice.”35 Studies show that automation bias rears its head in a wide variety of situations, from evacuees blindly following malfunctioning robots in emergency situations,36 to seasoned radiologists relying on faulty diagnostic aids when they would have fared better without them.37 This occurs because humans subconsciously prefer to delegate difficult tasks to machines, which we view as powerful agents with superior analytical capabilities.38 And the more difficult the task—and the less time there is to do it—the more powerful this bias becomes.39

So was the Loomis court right in believing COMPAS scores would not become a determinative factor at sentencing? Or that judges would know when to disregard an errant score? Almost certainly not. Given the inherent complexities and time constraints of sentencing, judges are prone to place undue weight on a COMPAS score. This is exacerbated by the fact that COMPAS’s manual informs judges that a “counter-intuitive risk assessment” is not an indicator that the algorithm has functioned improperly.40

Other cognitive biases serve to further entrench COMPAS’s role in sentencing decisions. Confirmation bias, or the tendency to seek out information that validates one’s preconceived notions while rejecting information that contradicts them,41 may lead judges to disregard other factors in a PSI report that counter a given risk score. Bias blind spot, or the tendency for people to see themselves as less susceptible to bias than others,42 may cause judges to underestimate the degree to which their sentencing decisions are being skewed by COMPAS assessments. It is therefore likely not only that defendants’ due process rights are violated at sentencing when COMPAS is involved, but also that sentencing judges, through this overreliance, are inadvertently acting as facilitators of racial prejudice.

Conclusion
In the final act of Minority Report, PreCrime is abolished after the discovery of a crucial flaw—once potential criminals become aware of their future, they have the power to avert it. In our world, however, defendants assessed under COMPAS remain at the mercy of its imperfections.43 The ramifications of this are clear: Crime-prediction technology like COMPAS will continue to disfavor marginalized groups, judges will continue reinforcing this bias through their sentencing decisions, and defendants will continue to be sentenced without knowing whether they were afforded due process or equal treatment under the law.

Given how prevalent AI has become in our everyday lives,44 predictive algorithms like COMPAS will likely become increasingly common in our criminal justice system. However, the outlook is not all doom and gloom. When wielded correctly, AI may indeed promote efficiency without implicating significant concerns about machine bias. Thus, the solution must be more nuanced than banishing these algorithms entirely.

For one, our laws must adapt to the novel challenges these technologies present. Specifically, trade secret law should not serve to bar defendants from raising and investigating valid due process questions. As scholars have noted, the law and policy governing trade secrets must be reformed to account for the individual rights and social interests at stake.45 Protections meant to safeguard a company’s economic interests should not be blindly applied where legitimate issues of social justice are implicated.46

In addition, data scientists must work with the state to develop ways to mitigate algorithmic prejudice and potential feedback cycles. It would be difficult to fully strip data of its structural and historical biases. However, research can be done to flag the types of data points that are most prone to racial bias. Algorithms can then be trained to weigh those data points less heavily, or to compute them in a manner that minimizes the probability of promoting racial disparity.47
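To give a flavor of what such mitigation might look like in practice, the sketch below illustrates one preprocessing idea from the fairness literature, often called reweighing, in which training records are weighted so that group membership and the recorded outcome become statistically independent. It is offered only as an example of the general approach described above, not as a technique that COMPAS or any state actually employs.

```python
from collections import Counter

# A sketch of one preprocessing idea from the fairness literature, often
# called "reweighing": weight each training record so that, after weighting,
# group membership and the recorded outcome are statistically independent.
# Illustrative only; this is not anything COMPAS or Wisconsin actually does.

def reweigh(records):
    """records: list of (group, outcome) pairs; returns one weight per record."""
    n = len(records)
    group_counts = Counter(group for group, _ in records)
    outcome_counts = Counter(outcome for _, outcome in records)
    pair_counts = Counter(records)
    weights = []
    for group, outcome in records:
        expected = (group_counts[group] / n) * (outcome_counts[outcome] / n)
        observed = pair_counts[(group, outcome)] / n
        weights.append(expected / observed)
    return weights

# Toy data: group "A" has more recorded arrests (outcome 1) than group "B",
# an imbalance that heavier policing alone could produce. The resulting
# weights (0.75 for A's recorded arrests, 1.5 for B's, and so on) offset
# that imbalance when the records are later used to train a risk model.
data = [("A", 1)] * 40 + [("A", 0)] * 60 + [("B", 1)] * 20 + [("B", 0)] * 80
for pair, weight in sorted(set(zip(data, reweigh(data)))):
    print(pair, round(weight, 3))
```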

But most of all, there must be a broader discussion among policymakers regarding the role machine intelligence should play in our judicial system as a whole. Fortunately, this discussion need not start from scratch. At a developer conference in 2016, Microsoft CEO Satya Nadella shared his approach to AI: first, machine intelligence should augment rather than displace human decisionmaking; second, trust must be built directly into new technologies by infusing them with modes of transparency and accountability; and third, technology must be inclusive and respectful to everyone.48 Principles such as these must anchor and guide future policy discussions regarding AI and criminal justice, in which the need for judicial efficiency must be balanced against society’s interests in judicial fairness, transparency, and racial equality.

As there are signs we are edging towards a PreCrime-like future in more ways than one,49 we find ourselves at a critical juncture. If we proceed without due consideration of the risks these algorithms pose, we may find ourselves relying far too much on technologies we do not fully understand. We may unwittingly begin perpetuating past injustices on a widespread, systematic level. And chillingly, we may be headed for a future where individuals are regularly condemned for prospective crimes they may never commit. And that, we may agree, is a future best left for the realm of science fiction.