Dirty data, the pursuit of profit and a lack of diversity in the tech sector: There are many reasons why algorithms discriminate. But lawyers, regulators and, most importantly, critical techies have started standing up against A.I.’s destructive potential. Will human intelligence win?
Twenty years ago, when I was in high school, we were asked to write a simple algorithm – and backed away once we understood what that meant. Our physical education teacher no longer wanted to grade students by the absolute height they jumped but by the individual progress they made in comparison to others, also factoring in their body weight, height, and gender. And, being curious and progressive back in the 1990s, when nobody we knew talked about algorithms yet, he asked us math majors to develop a formula for that.
Social justice in sports – how cool was that!? Well, until we felt the awkward responsibility that had been placed on us. Controlling for gender was easy, but how heavily should we factor in weight and height – how much harder was it for a short, heavier student to clear that high jump bar than for a tall, slim one? Or was it harder at all? Also, would that mean everybody would be asked their weight? And shouldn’t we introduce more variables, like a past knee injury or nearsightedness? We argued and shifted numbers around, but it just didn’t seem right, despite the teacher’s good intentions. We refused to arbitrarily write up a golden rule; our teacher would have to continue to rely on his judgment and justify his decisions. For the first time, we had a premonition that mathematics can be destructive if practiced with exaggerated positivism and with ethics kept out of the equation.
Of course, we had no idea to what extent such formulas would be used and abused a few years later – and most often with the goal of maximizing not fairness but profit. Our little formula would have done little harm in comparison: Some students might have felt treated more fairly, but others might have received undeservedly bad grades. And some might have started to tweak their data – more weight, a lower previous result – to make their progress look better and harder earned.
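In hindsight, the formula we refused to write could have looked something like the sketch below – a hypothetical Python version, with invented variable names and, crucially, an adjustment constant we would have had to pick arbitrarily:

```python
from statistics import mean, stdev

def progress_score(current_jump_m, previous_jump_m, weight_kg, height_cm):
    """Raw high-jump improvement, adjusted for body build.

    The 0.5 adjustment constant is entirely arbitrary -- exactly the
    problem we ran into: there is no principled value for how much
    harder a given build makes the jump.
    """
    improvement = current_jump_m - previous_jump_m
    build = weight_kg / height_cm  # heavier per unit of height -> larger bonus
    return improvement * (1 + 0.5 * build)

def grade_within_group(scores):
    """Standardize scores within one group (e.g. one gender), so that
    grades reflect relative progress rather than absolute performance."""
    m, s = mean(scores), stdev(scores)
    return [(x - m) / s for x in scores]
```

Every choice here – the constant, the grouping, even which variables appear at all – encodes a value judgment, which is precisely why we handed the problem back to our teacher.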
Two decades later, all high school students know how ambivalent algorithms are. The US credit score algorithm, for instance, traps people in a cycle of poverty and reinforces social inequalities in the US, as discussed in my post last week. But why do so many algorithms discriminate, and what can we do about it?
Dirty data can lead to algorithmic bias
Even with good intentions, an algorithm can be biased if the data it is trained on is flawed – incomplete, unbalanced or poorly selected. And this is not a rarity: If you train an algorithm on everything written on the Internet – as has been done with Google Translate – it will, according to one study, develop a prejudice against black people and women.
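A toy example makes this concrete. Suppose a hiring model is trained on historical decisions that were themselves biased against one group (the data below is entirely made up for illustration); even the simplest “training” – memorizing per-group outcome rates – bakes the bias of the data into the model:

```python
from collections import defaultdict

# Hypothetical historical outcomes, skewed against group "B".
history = [
    ("A", True), ("A", True), ("A", False),
    ("B", False), ("B", False), ("B", True),
]

# "Training": memorize the hire rate per group.
counts = defaultdict(lambda: [0, 0])  # group -> [hired, total]
for group, hired in history:
    counts[group][0] += hired
    counts[group][1] += 1

def predict(group):
    hired, total = counts[group]
    # Predict the historical majority outcome for the group.
    return hired / total >= 0.5

print(predict("A"), predict("B"))  # the skew in the data becomes the model's rule
```

Real models are far more complex, but the mechanism is the same: an algorithm trained on discriminatory decisions will faithfully reproduce the discrimination.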
One place where the prevalence of prejudice in user-generated data becomes visible is Google’s search autocomplete feature – the one that proposes endings for our search terms based on previous searches by internet users in the same region. Google started manually removing racist, sexist and antisemitic entries in 2016: Indeed, the phrase “black people are” yields no suggestions anymore, and “women are” only harmless ones. However, discrimination still finds its way into the feature: For this article, I reluctantly went down the rabbit hole to trigger stereotypes by typing “migrants are” into the search field – only to find that Google wants me to search whether they are “troublemakers” or whether they carry diseases (screenshots). Luckily, one can report such proposals, which I did.
However, I believe it should be the responsibility of algorithm developers to select appropriate data or, if none is available, to admit that their task is impossible. Unfortunately, tech companies rarely muster the courage to retract an expensive algorithm because of its bias. One example where this did happen is an Amazon hiring tool that discriminated against women and was therefore never used.
Young, white, male, anonymous – our non-elected leaders
Dirty data is not the only reason for the bias of artificial intelligence (A.I.) against women, black and poor people. Companies that want to raise their profits by using an algorithm often rely on problematic framing: Credit scores, for example, unduly conflate consumption patterns logged by US credit cards with the moral category of trustworthiness.
Dirty data, dirty framing – where does all that come from? One reason is that the A.I. industry is far from diverse. Walk down San Francisco’s Market Street at lunchtime and you will get a feeling for that: The Equal Employment Opportunity Commission has found that data scientists in the US are predominantly white and male; in A.I. more specifically, only 12 percent of researchers are women, according to a Wired estimate. Furthermore, their salaries are among the highest of all professions: Young A.I. specialists with little or no industry experience can already make between $300,000 and $500,000 a year in salary and stock in the US – for many, this might be an incentive to stay on the job even when it gets ethically questionable.
But what does it mean for our democracies if we shift decision-making power to a group of mostly young white men earning a fortune in Silicon Valley – self-declared leaders who have never been elected and remain as anonymous as the code they have written?
While journalists and medical doctors are bound by specific ethical guidelines in most countries, data scientists, Cathy O’Neil criticizes, tend to be disconnected from the people affected by their code. “So many of the data scientists that are in work right now think of themselves as technicians and think that they can blithely follow textbook definitions of optimization, without considering the wider consequences of their work,” she said in a 2018 interview with Wired.
That is why, as early as 2011, O’Neil launched her blog MathBabe: “to mobilize fellow mathematicians against the use of sloppy statistics and biased models that created their own toxic feedback loops” (WMD). And they did mobilize.
Thousands of tech workers have started to organize in recent years. As the Tech Workers Coalition tweeted:

“It’s a beautiful thing to realize we’re stronger when we stand together. This is a great look into how various organizing efforts have gone down. Keep spreading the word, keep talking to your co workers. https://t.co/nR6MOTUuc2”

— Tech Workers Coalition (@techworkersco) February 24, 2020
Since the election of Donald Trump in 2016, activism has been on the rise in the US – and tech workers have become part of that wave. They organize walkouts, protests, and petitions whenever they feel that their employers are collaborating with an unethical client such as the Chinese government (Google’s censored Dragonfly search engine), the Pentagon (Google’s Project Maven for drone warfare) or ICE, the U.S. Immigration and Customs Enforcement (Palantir’s database on undocumented immigrants). In some cases, senior engineers have quit over such disputes. Sometimes they succeed in pushing their bosses to discontinue a project (as with Project Maven), sometimes not – but they always succeed in drawing public attention to these cases.
Tech workers, judges, and regulators demand accountability
“Tech workers understand the consequences of technology at a theoretical level, but some are unwilling to be responsible for addressing or mitigating those consequences,” then-startup founder Lea Yu told me at a protest in front of the Palantir headquarters in Palo Alto in 2017. She and her organization – the Tech Workers Coalition, a labor rights group active in 13 tech industry centers worldwide – want to change that. “We believe that tech workers are especially resourced and privileged and that we need to use these resources to better stand in solidarity with other workers and fight for the open, inclusive and more equal society we want to see.”
To this day, algorithms are black boxes; their creators can refuse to reveal their ingredients – a secret sauce of code and data – by invoking intellectual property law. Hence, one cannot hold algorithms accountable unless the case is ironclad, as O’Neil criticizes: “The human victims of [Weapons of Math Destruction…] are held to far higher standard of evidence than the algorithms themselves.” However, even this might be changing: The first lawsuits are being filed against the most dubious algorithms. The legal community has become increasingly self-critical about the use of algorithm-based risk assessment tools in courts: In 2017, an appeals court ruled that the secretive character of the algorithm in question made it impossible for the offender to challenge the decision. And even the U.S. Department of Justice has acknowledged that the tools’ lack of transparency raises constitutional questions.
A.I. innovation is so fast-paced that regulation has not yet caught up. The European Union, which has some of the strictest privacy laws worldwide, is trying to prevent algorithmic discrimination through its 2018 General Data Protection Regulation; it gives people “a right to ‘meaningful information’ about the logic underlying automated decisions”. However, the EU acknowledges that this is still difficult to enforce: “Properly auditing algorithms can be extremely complex. […] Copyright rules, companies’ interest in protecting their business secrets, as well as privacy rules can all discourage openness about the data used, hampering meaningful reviews.”
In contrast, the Trump administration does not even try to enforce regulations: In its 2020 guidelines, it urges federal regulators not to “needlessly” intervene because this could hamper innovation.
But it looks as if the current US president’s approval is not necessary. In 2016, Barack Obama appealed to companies to voluntarily audit their algorithms – and Cathy O’Neil has taken up that idea, offering to take a deep look into companies’ black boxes without compromising their trade secrets. If she finds that the algorithm under scrutiny does not violate anybody’s rights under any conditions, she awards it her ORCAA (O’Neil Risk Consulting and Algorithmic Auditing) seal – like an “organic” sticker for algorithms. Let me guess: The credit score algorithm discussed in my last post would not receive an ORCAA seal.
By the way, our physical education teacher seemed oddly happy with our refusal to arbitrarily write up a golden rule. Who knows why he gave us the assignment in the first place…
If you were the CEO of a tech company, how would you make sure that the algorithms you develop do not discriminate against some groups in society? Let us know in the comments below or on Twitter. Looking forward to your thoughts!