Biased #data

Diana, October 5.


Continuing from my previous post, I’ll talk more about biased data in this one.

As I mentioned before, Big Data is, unfortunately, not objective but a human creation. Taylor and Schroeder stress that even having the whole information on a matter can lead to difficulty in understanding it and to unwillingness to share it. And if we are not critical enough of the data we receive, we may accept false information at face value, without evidence.

Big Data is everywhere. Large organisations and “development professionals” such as the United Nations (UN) and the Organisation for Economic Co-operation and Development (OECD) use these kinds of data for research and exploration. They encounter many technical concerns along the way, and risks and issues of bias have tended to dominate the discussion so far.

Taylor and Schroeder point out the role of biased data in development politics. One example is how data is politicised: even correct data may not be accepted, since all information has to be agreed upon in order to be useful to country authorities as support for policy decisions. Many developing countries have this problem, where reliable information is hard to acquire and officials censor information that comes from sectors of the population who feel underrepresented.


Kate Crawford, a Principal Researcher at Microsoft Research New York City, a Visiting Professor at MIT’s Center for Civic Media and a Senior Fellow at NYU’s Information Law Institute, researches the social impacts of big data and is currently writing a book on data and power with Yale University Press. She published an article in Harvard Business Review, “The Hidden Biases in Big Data”.

Hidden biases in both the collection and analysis stages present considerable risks and are as important to the big-data equation as the numbers themselves. — Kate Crawford.

Crawford takes up an example to explain hidden bias in data. There were more than 20 million tweets about Hurricane Sandy between October 27 and November 1, but a study shows that these data don’t represent the whole picture. The highest number of tweets about Sandy came from Manhattan, a borough with high levels of smartphone ownership and Twitter use. This creates the illusion that Manhattan was the hub of the disaster: far fewer messages originated from affected locations such as Breezy Point, Coney Island and Rockaway, and fewer still came from the worst-hit areas.
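The mechanism behind this illusion can be sketched numerically: if observed tweet volume is roughly proportional to impact times smartphone penetration, the best-connected area dominates the count even when it was hit least. A toy sketch in Python, with all figures invented for illustration (they are not real Sandy data):

```python
# Hypothetical areas: (true storm impact score, smartphone/Twitter penetration).
# Every number here is made up purely to illustrate the sampling bias.
areas = {
    "Manhattan":    {"impact": 3, "penetration": 0.60},
    "Breezy Point": {"impact": 9, "penetration": 0.15},
    "Coney Island": {"impact": 8, "penetration": 0.20},
    "Rockaway":     {"impact": 9, "penetration": 0.10},
}

# Observed signal = impact * penetration: what a naive tweet count "sees".
observed = {name: a["impact"] * a["penetration"] for name, a in areas.items()}

# Ranking by observed tweets puts the least-affected, best-connected area first.
by_tweets = sorted(observed, key=observed.get, reverse=True)
by_impact = sorted(areas, key=lambda n: areas[n]["impact"], reverse=True)

print("Ranked by tweet volume:", by_tweets)  # Manhattan comes out on top
print("Ranked by true impact: ", by_impact)  # the worst-hit areas come first
```

Under these made-up numbers, Manhattan produces the most tweets despite having the lowest impact, which is exactly the distortion Crawford describes.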

Here we can ask ourselves: how do the people outside of affected areas know about what is really happening there?

We rely more and more on Big Data’s numbers to speak for themselves, but we risk misunderstanding the results and, in turn, misdirecting important public resources. “Development professionals” make this mistake too: they rely on information without questioning it. Such misinformation can send the wrong kind of help to the wrong place, or become an obstacle to aid relief.

Taylor and Schroeder give a similar example of biased data: mobile phone data used by “development professionals” to track population movement in disaster relief. The problem with collecting this data is that it is incomplete: not everyone uses a mobile phone, and usage is particularly low among vulnerable and ‘hidden’ populations such as children, the elderly, the poorest and women.

As we move into an era in which personal devices are seen as proxies for public needs, we run the risk that already existing inequities will be further entrenched. Thus, with every big data set, we need to ask which people are excluded. Which places are less visible? What happens if you live in the shadow of big data sets? — Kate Crawford.


