The Guardian ran an article this weekend discussing predictive policing and its future. Read it; it’s worthwhile.
I greatly appreciated a number of the moral concerns Morozov raises, and he does an excellent job of connecting the issue to much of its surrounding social context. He is also quite balanced in his approach, urging caution while remaining cognizant of the real, on-the-ground benefits of the technology.
Unfortunately, he falls into the same trap as Eli Pariser (The Filter Bubble) in ascribing algorithms’ deficiencies to the questionable allegiances of their creators:
But how do we know that the algorithms used for prediction do not reflect the biases of their authors? For example, crime tends to happen in poor and racially diverse areas. Might algorithms – with their presumed objectivity – sanction even greater racial profiling?
I am sure there are people doing adversarial data mining and engaging in unscrupulous analytic activities, both in execution and intent. At RecSys 2010, for instance, the industry keynote discussed how one gambling operation uses customer modeling to predict when a patron is approaching the risk level at which gamblers quit cold turkey, and intervenes based on this knowledge, not to protect the gambler from their habit but to keep them coming back the next weekend.
But programmers building their algorithms inappropriately are not, in my opinion, the biggest threat. There is plenty of opportunity for racial profiling, religious bias, and other troubling bases for law enforcement to enter the equation without the need for complicit algorithmists.
First, the algorithm’s output is only as good as its input. The article acknowledges this to some extent, observing that police records are limited in their coverage and scope, while data from sources such as Facebook could provide a much broader corpus to mine for crime signals. It does not, however, connect this observation to the deeper problem: not only are police records incomplete, they are tainted by the biases of current enforcement and law. If the police enforce laws disproportionately against certain groups (or perhaps worse, if the laws are written to have more impact on those groups), then those groups will be over-represented in the data; if the crime model learns this trait, it lends analytic support to a vicious cycle, closing the loop.
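This feedback loop can be sketched as a toy simulation. Everything here is hypothetical: two neighborhoods with identical true offense rates, where police start out patrolling one twice as heavily, and recorded crime is simply what the patrols are present to observe:

```python
# Toy model of the enforcement feedback loop. All numbers are made up.
TRUE_RATE = {"A": 0.05, "B": 0.05}   # identical underlying offense rates
patrols = {"A": 2.0, "B": 1.0}       # biased initial patrol allocation
TOTAL_PATROLS = 3.0

for year in range(10):
    # Recorded crime reflects presence: you log what you are there to see.
    recorded = {n: TRUE_RATE[n] * patrols[n] for n in patrols}
    # A "predictive" model trained on these records reallocates patrols
    # in proportion to recorded crime.
    total = sum(recorded.values())
    patrols = {n: TOTAL_PATROLS * recorded[n] / total for n in recorded}

print({n: round(p, 2) for n, p in patrols.items()})
# Neighborhood A still draws twice the patrols, and the records now
# "confirm" it has twice the crime, despite identical true rates.
```

The model never corrects the initial imbalance, because the only evidence it sees was generated by that imbalance.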
But suppose the data were perfect and the model fair. Suppose it then identifies “young male of Scandinavian descent”¹ as a trait predicting significantly² higher-than-normal probability for criminal involvement.
We are then faced with a profound moral question: is using a model that has identified such a feature, even from unbiased data, moral? Is it acceptable to increase police scrutiny (and likely hassling) of a certain ethnic, racial, or socio-economic group, even if we have data that says it is accurate?
Or is it better to sacrifice the accuracy of our crime modeling to sustain a democratic society with liberty and justice for all?
As data miners, model-builders, and members of society, we must think about these questions.
¹ For the record, I’m under 30 and half-Swedish.
² I use ‘significantly’ in the statistical sense, meaning that it is a robust predictor instead of noise or happenstance, not that it increases the probability by a substantial amount.
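To make that footnote concrete, here is a back-of-the-envelope two-proportion z-test with entirely hypothetical numbers: with large samples, even a 0.2-percentage-point bump in observed rate clears the conventional significance bar while remaining practically tiny.

```python
import math

# Hypothetical numbers: an attribute raises the observed offense rate
# from 5.0% to 5.2%. With a million observations per group, this tiny
# bump is a statistically robust predictor, even though the practical
# effect is only 0.2 percentage points.
n1 = n2 = 1_000_000
p1, p2 = 0.052, 0.050

p_pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se

print(f"z = {z:.1f}")  # well past the 1.96 threshold for p < 0.05
```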