Date posted: 29/03/2018 · 5 min read

The toxic cocktail of human bias in data

Hear from Cathy O’Neil, an American mathematician, data scientist, TED speaker and author, on just how inaccurate data really is, and how far its ethical implications reach.

In brief

  • Data is never fully accurate, because it embeds human bias
  • Everyone needs to understand big data, because its ethical implications reach into work prospects, education, health systems, politics and democracy
  • A new field of algorithm auditing is emerging to hold algorithms accountable

So much incidental data is collected about us that anyone who understands big data techniques can infer all sorts of things from it.

The scary thing is that big data is being applied in places where we might assume very high standards: hiring people for jobs, deciding who gets fired, or determining how long someone should go to prison. Or, most recently, in the case of Cambridge Analytica, which made headlines over the use of data to influence the US presidential campaign.

  So, with so much data available on virtually everything, how can we be sure the data is accurate?

Data is wildly inaccurate 

Cathy O’Neil argues that when used correctly, data can be enormously powerful, yet data is also “wildly inaccurate”. However, she says, we're not trying to get perfect accuracy; it “only has to be 51% accurate.” We will never know exactly what people want to buy, but we'll have a better guess than nothing. And as we gather more information and history about people, the “wild guesses get less wild”, though she believes it will “never approach 99% accuracy.”
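To see why an edge of just one point over a coin flip is still valuable at scale, here is a minimal simulation sketch; the audience size and the code itself are illustrative assumptions, not from the interview:

    import random

    random.seed(42)

    N = 1_000_000  # hypothetical audience size

    def correct_calls(accuracy: float) -> int:
        """Count how many of N independent predictions land correctly,
        given a model that is right with probability `accuracy`."""
        return sum(random.random() < accuracy for _ in range(N))

    coin_flip = correct_calls(0.50)   # guessing at random
    weak_model = correct_calls(0.51)  # O'Neil's "51% accurate" threshold

    print(f"Random guessing: {coin_flip:,} correct")
    print(f"51% model:       {weak_model:,} correct")
    print(f"Extra correct calls at scale: {weak_model - coin_flip:,}")

Across a million predictions, that single percentage point yields roughly ten thousand extra correct calls, which is why advertisers can profit from models that never come close to 99% accuracy.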

The recent conversation around the US presidential election and the involvement of Cambridge Analytica highlights the problems with data accuracy and privacy. Cathy O’Neil commented that she didn’t know how successful Cambridge Analytica was, but suspected its effects had been exaggerated.

“I do think political micro-targeting is potentially a very dangerous tool, because it's a propaganda tool, but I would say that in this country, Fox News was just as powerful as any political campaign at streaming and disseminating propaganda as well as all the fake news stuff that wasn't Cambridge Analytica at all.”

O’Neil believes that political micro-targeting campaigns have enormous power to send information designed to influence individual voters.

“The point here is that it's not informative. I think the worst example, which I do think maybe had something to do with Cambridge Analytica, of this kind of emotional cueing, was that they actually sent out voter suppression ads right before the election.”

She explained that this “dark” campaign, which the public never sees, ran on Facebook prior to the US election, sending suppression ads specifically to African Americans to convince them not to vote at all.

“The word on the street is that these voter suppression ads had something to do with Clinton and it was in the style of South Park. And, it was supposed to be like funny but anti-Clinton ... I feel like that really undermines democracy.

“I'm not an expert in propaganda. But, I do know that what we've built, especially the Facebook part of the internet, is a perfect mechanism to deliver propaganda, and that's a threat to democracy,” says O’Neil.

Power and ethics of big data

The real issue is that data is created by humans, and data scientists are doing two things at once: they're writing code, but they're also implicitly or explicitly inserting human values and ethics into that code. Often they don't think about the ethical implications of what they build. The problem is that these default ethics are often extremely destructive and lay waste to thousands of people's lives, and because we're not acknowledging or addressing this, it continues to happen.

“When these algorithms are being applied to hundreds if not thousands of people in high stake situations, I think everyone deserves to assume a standard of accuracy.” 

Cathy O’Neil cites the example of the attempt to improve education in the US. Calls for accountability in education became calls to “make the teachers accountable.” So-called “bad teachers” were identified by an inaccurate algorithm based solely on student test scores. It was a high stakes situation “with power behind it”, in which some teachers were denied tenure because of bad scores. Teachers were never even told how the algorithm worked, and couldn’t appeal if they believed their score was wrong.

“People are intimidated by math and big data algorithms,” Cathy says, so they don’t have the power to question and push back. “Instead we’re just going to replace this entire conversation with a silver bullet algorithm.”

Algorithms need to be accountable

Big data is treated as a silver bullet for many problems, which makes it all the more important to hold algorithms accountable. Cathy believes there should be some kind of check, which she calls “ground truth”, to ensure that an algorithm is performing as expected.

To ensure that what we're doing isn't discriminatory, Cathy advises thinking about every algorithmic process as an old-fashioned human process, and applying the same checks to ensure fairness.

We can ask for “an algorithmic audit”, in which we build a methodology to test whether algorithms are meaningful, accurate, fair and legal. “And, these are all questions that we absolutely must start asking.”
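As a concrete illustration of one check such an audit might run, the sketch below computes selection rates by group and applies the “four-fifths” disparate-impact heuristic from employment law. The data, group labels and threshold are assumptions for illustration, not O’Neil’s own methodology:

    from collections import defaultdict

    # Hypothetical (decision, group) pairs logged from an algorithm's output.
    # 1 = favourable outcome (e.g. shortlisted for a job), 0 = unfavourable.
    decisions = [
        (1, "group_a"), (0, "group_a"), (1, "group_a"), (1, "group_a"),
        (0, "group_b"), (0, "group_b"), (1, "group_b"), (0, "group_b"),
    ]

    def selection_rates(records):
        """Rate of favourable outcomes per group."""
        totals, favourable = defaultdict(int), defaultdict(int)
        for outcome, group in records:
            totals[group] += 1
            favourable[group] += outcome
        return {g: favourable[g] / totals[g] for g in totals}

    rates = selection_rates(decisions)
    baseline = max(rates.values())  # the best-treated group's rate

    # Flag any group whose rate falls below 80% of the baseline.
    for group, rate in rates.items():
        ratio = rate / baseline
        status = "FLAG" if ratio < 0.8 else "ok"
        print(f"{group}: rate={rate:.2f} ratio={ratio:.2f} [{status}]")

This mirrors O’Neil’s advice: treat the algorithm like an old-fashioned human process and ask the same fairness questions you would put to any hiring panel.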

Talent for the future of data

A new field of expertise is emerging around interpreting and auditing algorithms, one Cathy believes will grow over the next ten years. If we're successful, we’ll demystify the technical aspects of algorithms and raise awareness of the ethical problem of human bias, increasing fairness and transparency.

“I think as more and more data is collected, assuming it continues to be legal to do such surveillance and collection, that algorithms will become more accurate over time.”

Hear the full podcast with Cathy O'Neil

Big Data is a Weapon of Math Destruction – Ep. 12

Listen here
