Oct 17

Who are they? Citizens who are using the big data?

Diana, October 26.

In this, my last post, I would like to return to the topic of Big Data one final time. I have written about the challenges of Big Data, as well as about the hidden bias in it. One of the points I raised in earlier posts was that it is citizens, ordinary people, who create Big Data. By sharing, tagging, commenting and writing posts, people create all the information that ends up in the big sea of data. As this group has mentioned many times before, data can be very useful and helpful on the one hand, but damaging and risky on the other.

Let’s, then, start at the beginning: the people who write, share, like and tag. Who are they?

I can see two types of people who create information on the internet: “development professionals” and “readers”. I have already introduced “development professionals”: professional organisations such as NGOs. Thomas Tufte states that technological frameworks have opened up better ways to spread information to local communities, governments and donors in order to improve aid, for example in health and education.

“Development professionals” expect transparent, legitimate data so that they can collect the information they need and, in turn, know where they are needed and what kind of help is expected of them. I won’t go into detail here; my group mates and I have already discussed both the risks and the opportunities of big data use by organisations, and the group has also highlighted some positive and negative examples. Instead, I’ll go directly to the second type of people responsible for creating big data.

On the other side, Michael Mandiberg talks about “readers”: people who used to just read information on the internet, but have become writers themselves and now have control over the information on social media.

Give the people control of media, they will use it. The corollary: Don’t give the people control of media, and you will lose. Whenever citizens can exercise control, they will. – Jeff Jarvis

Recent developments in social media have given people the freedom not just to communicate with each other, but also to express their opinions and thoughts online. New technological frameworks have emerged that enable media creation, such as blogging, where people can speak openly about any issue. These frameworks gave rise to so-called amateur media, which risks becoming bigger than any of the professional organisations.

Furthermore, “amateurs” can be a source of biased data, since they don’t always double-check the legitimacy of the information they spread. I won’t go into more detail here, as I have already given some examples of biased data in my previous posts.

In conclusion, I have to say that this post is ironic, because we are newly created bloggers (well, some of us) writing on the subject of #BigData. Who is to say that we are legitimate? We might just be those so-called “amateurs”. Are we contributing to transparent data or to biased data? Yes, we use academic references. Yes, we do our research before publishing our posts. No, we are not professionals. Who are we, then? Here, at the end of my discussion, I have discovered a third group of people who create information on the internet: us 🙂

Thank you for the time you spent on our blog: reading, commenting, joining the discussions and sharing! It was a fun time for all of us, and I hope you enjoyed it as much as we in this group did!


Oct 17

Big (biased) Data. What can go wrong?

Diana, October 16.

The question I want to ask today is: what can go wrong when using Big Data? With this, I move from theoretical posts to a more explanatory one, using a particular example to show how Big Data can become a risk for “development professionals” trying to provide aid.

In my previous posts, I talked about the risks and hidden bias in Big Data. I mentioned that people are the ones responsible for constructing data.

Google, and the Internet in general, are widely seen as groundbreaking innovations that have changed the way millions of people live their lives. And yet researchers and practitioners in the field of ICT and development often struggle to demonstrate the concrete impact of the technology to “development professionals”. There are definite reasons why certain projects fail, and there are even some generalisable patterns of failure.

One example is Google, the most used search engine in the world, where millions of people find all kinds of information that affects their daily lives. In 2008, Google launched what it considered a brilliant application, Google Flu Trends, to track the flu and its spread around the world. The aim was to help “development professionals” provide aid to affected areas. Google claimed that it could see the flu advancing based on people’s searches. The essential idea was that when people are sick with the flu, they search for flu-related information on Google, providing almost instant signals of overall flu prevalence.

But this concept didn’t work. Why? Let’s examine.

David Lazer and Ryan Kennedy write in Science that Google relied too heavily on simple search data, which led to the spectacular failure of Google Flu Trends: at the peak of the 2013 flu season, the application missed by 140 percent.

As I mentioned in my previous post, it is hard to know what is really happening in an affected area if you are not actually present there.

That is what happened here: Google didn’t take into account that many people with the flu never use a search engine to look for flu-related information. Furthermore, Google didn’t research how many people rely on the internet to find information about the flu. Nor did it take into account all those people who use Yahoo or Bing instead of Google.

David Lazer, a professor in the Department of Political Science and the College of Computer and Information Sciences at Northeastern University, and Ryan Kennedy, a professor of political science at the University of Houston, continue that Google’s algorithm was quite vulnerable to overfitting to seasonal terms unrelated to the flu. With millions of search terms being fitted to the data, some searches were bound to be strongly correlated with flu cases by pure chance.

These terms were unlikely to be driven by actual flu cases or to be predictive of future trends. Moreover, Google did not take into account changes in search behaviour over time. The errors were not randomly distributed: one week’s error predicted the next week’s error, and the size of the error varied with the time of year (seasonality). These patterns mean that Google Flu Trends overlooked significant information that could have been extracted with traditional statistical methods.
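To make the overfitting point more concrete, here is a toy simulation in Python. It is not Google’s actual model: the weekly “flu” numbers and the “search terms” are all invented random noise, and the function names are hypothetical. It only illustrates what happens when you screen many unrelated series against one target: the best match can look impressive by pure chance, and then fall apart on a new year of data.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)


def best_chance_correlation(n_terms, n_weeks=52, seed=0):
    """Screen n_terms random 'search term' series against a random weekly
    'flu' series. Return the strongest correlation found in the training year
    and the correlation of that same term in the following (test) year."""
    rng = random.Random(seed)
    # Invented data: one year to fit on, one year to check against.
    flu_train = [rng.gauss(0, 1) for _ in range(n_weeks)]
    flu_test = [rng.gauss(0, 1) for _ in range(n_weeks)]
    best_train_r, best_test_r = 0.0, 0.0
    for _ in range(n_terms):
        # Every term is independent noise, so it has no real link to the flu.
        term_train = [rng.gauss(0, 1) for _ in range(n_weeks)]
        term_test = [rng.gauss(0, 1) for _ in range(n_weeks)]
        r = pearson(flu_train, term_train)
        if abs(r) > abs(best_train_r):
            best_train_r = r
            best_test_r = pearson(flu_test, term_test)
    return best_train_r, best_test_r


if __name__ == "__main__":
    for n_terms in (10, 1_000, 10_000):
        train_r, test_r = best_chance_correlation(n_terms)
        print(f"{n_terms:>6} terms: best r = {train_r:+.2f}, next year r = {test_r:+.2f}")
```

With more candidate terms, the strongest training-year correlation typically keeps climbing, while the same term’s correlation in the following year usually stays close to zero. That is the trap Lazer and Kennedy describe: the more terms you search through, the more chance correlations you are likely to find.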

Google, like the Internet as a whole, is continuously changing because of the activities of millions of engineers and users. Researchers need a better understanding of how these changes unfold over time. Scientists need to replicate findings using these data sources across time, as well as with other data sources, to make sure that they are observing robust patterns and not temporary trends. For instance, it is entirely feasible to run controlled experiments with Google, e.g., observing how search results differ based on location and past searches.

More generally, studying the evolution of socio-technical systems embedded in our societies is fundamentally important. The algorithms underlying Google help determine what we find out about our health, politics and friends.

It’s not just about the size of the data. There is a tendency for big data research and more traditional applied statistics to live in two different realms, aware of each other’s existence but generally not very trusting of each other (Science).

Big data offers enormous potential for understanding human interactions at a societal scale, with rich spatial and temporal dynamics, and for detecting complex interactions and nonlinearities among variables. These are the most exciting frontiers in the study of human behaviour.

Instead of focusing on a “big data revolution,” perhaps it is time to concentrate on an “all data revolution,” recognising that the critical change in the world has been innovative analytics that use data from all sources, traditional and new, to provide a deeper, clearer understanding of our world.


Oct 17

Biased #data

Diana, October 5.

Continuing from my previous post, I’ll talk more about biased data in this one.

As I mentioned before, Big Data is, unfortunately, not objective but a human creation. Taylor and Schroeder emphasise that having all the information on a matter can make it difficult to understand and can lead to an unwillingness to share it. Also, if we are not critical enough of the data we receive, we may accept false information at face value, without evidence.

Big Data is everywhere. Big companies and “development professionals” such as the United Nations (UN) or the Organisation for Economic Co-operation and Development (OECD) use these types of data for research and exploration. Along the way, they encounter many technical concerns, and risks and issues of bias have tended to dominate the discussion so far.

Taylor and Schroeder point out the role of biased data in development politics. One example is how data becomes politicised, so that even correct data may not be accepted: all information has to be agreed upon before country authorities can use it to support policy decisions. Many developing countries face this problem, where reliable information is hard to obtain. Officials censor information that comes from sectors of the population who feel underrepresented.

Kate Crawford, a Principal Researcher at Microsoft Research New York City, a Visiting Professor at MIT’s Center for Civic Media and a Senior Fellow at NYU’s Information Law Institute, published an article in Harvard Business Review titled “The Hidden Biases in Big Data”. Her research addresses the social impacts of big data, and she is currently writing a new book on data and power with Yale University Press.

Hidden biases in both the collection and analysis stages present considerable risks and are as important to the big-data equation as the numbers themselves. — Kate Crawford.

Crawford gives an example to explain the hidden bias in data. There were more than 20 million tweets about Hurricane Sandy between October 27 and November 1, yet a study shows that these data don’t represent the whole picture. The greatest number of tweets about Sandy came from Manhattan. This makes sense given the city’s high level of smartphone ownership and Twitter use, but it creates the illusion that Manhattan was the hub of the disaster. Very few messages originated from more severely affected locations, such as Breezy Point, Coney Island and Rockaway, and as extended power blackouts drained batteries and limited cellular access, even fewer tweets came from the worst-hit areas.

Here we can ask ourselves: how can people outside the affected areas know what is really happening there?

We increasingly rely on Big Data’s numbers to speak for themselves, but in doing so we risk misunderstanding the results and, in turn, misdirecting important public resources; these risks are as big as the data itself. “Development professionals” make this mistake too when they rely on information without questioning it. All that misinformation can send the wrong type of help to the wrong place or become an obstacle to aid relief.

Taylor and Schroeder give a similar example of biased data: mobile phone data used by “development professionals” to track population movement during disaster relief. The problem with this data is that it is incomplete: not everyone uses a mobile phone, and usage is particularly low among vulnerable and ‘hidden’ populations such as children, the elderly, the poorest and women.

As we move into an era in which personal devices are seen as proxies for public needs, we run the risk that already existing inequities will be further entrenched. Thus, with every big data set, we need to ask which people are excluded. Which places are less visible? What happens if you live in the shadow of big data sets? — Kate Crawford.
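The mobile phone example can also be worked through as simple arithmetic. The Python sketch below uses two invented groups (the numbers are assumptions for illustration, not figures from Taylor and Schroeder or Crawford) and assumes that, within each group, phone owners move at the same rate as everyone else. Because phone data only counts people who own phones, each group ends up weighted by its phones rather than by its people.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """A hypothetical population group (all numbers are illustrative)."""
    name: str
    size: int               # number of people in the group
    phone_share: float      # fraction of the group that owns a mobile phone
    displaced_share: float  # fraction of the group that actually had to move


def true_displaced_rate(groups):
    """Share of the whole population that was displaced."""
    displaced = sum(g.size * g.displaced_share for g in groups)
    return displaced / sum(g.size for g in groups)


def phone_based_rate(groups):
    """Displacement rate as seen in phone data: only phone owners are counted,
    assuming owners in a group move at the same rate as non-owners."""
    phones = sum(g.size * g.phone_share for g in groups)
    displaced_phones = sum(g.size * g.phone_share * g.displaced_share for g in groups)
    return displaced_phones / phones


def excluded_share(groups):
    """Share of the population that never appears in the phone data."""
    without_phone = sum(g.size * (1 - g.phone_share) for g in groups)
    return without_phone / sum(g.size for g in groups)


if __name__ == "__main__":
    population = [
        Group("well-connected residents", 6000, phone_share=0.9, displaced_share=0.1),
        Group("'hidden' residents", 4000, phone_share=0.2, displaced_share=0.5),
    ]
    print(f"actually displaced: {true_displaced_rate(population):.0%}")
    print(f"phone data says:    {phone_based_rate(population):.0%}")
    print(f"not in phone data:  {excluded_share(population):.0%}")
```

With these made-up numbers, 26 percent of people were actually displaced, but the phone data suggests about 15 percent, and 38 percent of the population never appears in the data at all. The group that moved the most is exactly the group the data sees the least.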


Sep 17

Think About It – Is Bigger Better?

Diana, September 26.

Not so long ago, information and communication technology (ICT) as we know it today did not exist. Now, it helps us interact in the digital world. ICT gave rise to the Big Data revolution, that is, to the huge volume of structured and unstructured data waiting to be mined for information. The amount of data being created and stored globally is almost inconceivable, and it just keeps growing.

I challenge you to think about it one more time – is bigger really better? Let’s try to answer this question.

Taylor and Schroeder argue that the development of data and technologies, as well as people’s use of them, has the potential to give the public a rich mine of information about health interventions, human mobility, conflict and violence, technology adoption, communication dynamics and economic behaviour.

The bigger the data, the better: it allows us to perceive our environment in new ways. With more information, we can do things we couldn’t do before. We can collect information, share it, analyse it, learn from it and store it for years to come. Big data is also a good tool for tackling some of the world’s problems, such as global food insecurity, medical care, energy and climate change.

Additionally, data and technologies bring together a heterogeneous group of “development professionals”, such as donors, activists from non-governmental organisations, government policy officers, consultants, academics, intended beneficiaries and so forth, who are active in various development aid organisations all over the world.

In a data-driven world, using data is also a necessity. “Development professionals” use data not only to promote and endorse development discourse but also to save time. Accessible big data and the availability of useful information allow organisations to better understand the changing dynamics of local field environments and, in turn, to make better decisions. Big data is a game changer if it is good, clean, accurate and transparent.

Nevertheless, what about the risk of data getting lost in the sea of information overload?

Taylor and Schroeder stress that bigger is not better, since good data is often lacking. They point out a few drawbacks of “bigger” data. Data is not always simple and stable, so we need to know how to use it. Most of the time, some basic knowledge is enough; it depends, of course, on the purpose, whether that is a post on Facebook or managing a website.

Further, data can be biased. Having all the information on a matter can make it difficult to understand and can lead to an unwillingness to share it. Another risk is that we are not critical enough of the data we receive and simply accept it at face value, without evidence.

Moreover, there are risks associated with the absence of a clear ethical framework, as well as of rules for handling and sharing data. So far, the data revolution has been mainly a technical one: the power of data to sort, categorise and intervene has not yet been clearly linked to a moral basis. In fact, while data-driven unfairness is evolving at the same pace as data processing technologies, awareness of it and tools for fighting it are not.

Furthermore, anonymization techniques are unreliable. Data anonymization is a process intended to make it impossible to identify a particular individual from the stored data related to him or her. Unfortunately, it doesn’t always work. One aspect that worries people who value their privacy is that the process can be reversed, for example by linking the “anonymous” records with other data sets.

“The only way to stop big data from becoming big brother is to introduce privacy laws that protect the basic human rights online.”
― Arzak Khan
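To show how “anonymized” data can be reversed, here is a small Python sketch of a so-called linkage attack. All records, names and field choices are invented for illustration. The health records have had their names removed, but still contain postcode (zip), birth year and gender; joining them with a public list that does include names is enough to put a name back on some of them.

```python
from collections import defaultdict

# Hypothetical "anonymized" health records: names removed, quasi-identifiers kept.
ANONYMIZED_HEALTH = [
    {"zip": "21101", "birth_year": 1980, "gender": "F", "diagnosis": "asthma"},
    {"zip": "21101", "birth_year": 1975, "gender": "M", "diagnosis": "diabetes"},
    {"zip": "21102", "birth_year": 1990, "gender": "F", "diagnosis": "flu"},
    {"zip": "21102", "birth_year": 1990, "gender": "F", "diagnosis": "migraine"},
]

# Hypothetical public list (for example a membership register) that has names.
PUBLIC_REGISTER = [
    {"name": "Anna", "zip": "21101", "birth_year": 1980, "gender": "F"},
    {"name": "Ben", "zip": "21101", "birth_year": 1975, "gender": "M"},
    {"name": "Cleo", "zip": "21102", "birth_year": 1990, "gender": "F"},
    {"name": "Dora", "zip": "21102", "birth_year": 1990, "gender": "F"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")


def quasi_key(record):
    """The combination of fields that was left in the 'anonymous' data."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(anonymized, register):
    """Link the two data sets on quasi-identifiers and return {name: diagnosis}
    for every anonymized record that matches exactly one named person."""
    names_by_key = defaultdict(list)
    for person in register:
        names_by_key[quasi_key(person)].append(person["name"])
    matches = {}
    for record in anonymized:
        candidates = names_by_key.get(quasi_key(record), [])
        if len(candidates) == 1:  # a unique match reveals whose record it is
            matches[candidates[0]] = record["diagnosis"]
    return matches


if __name__ == "__main__":
    print(reidentify(ANONYMIZED_HEALTH, PUBLIC_REGISTER))
```

Here Anna’s and Ben’s diagnoses are revealed, because each of them is the only person with that combination of postcode, birth year and gender. Cleo and Dora stay hidden only because they share the same combination, which is why simply deleting names is not the same as making data anonymous.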


Sep 17

Hello there!

@Data4ComDev, September 10.


Welcome to our newly started blog. We are happy that you have found your way here already. You may wonder: who are you and what is this blog all about?

We are five students enrolled in a Master’s Programme in Communication for Development at Malmö University. We are currently in the process of exploring the possibilities and challenges of starting up this joint blog. No doubt it will be a little bit hard, very rewarding, extremely educational and, for sure, heaps of fun.

The theme we will be exploring is called Social Media, Data and Development. We will hence be looking into data in the context of development communication. That is what we have decided for now, and we are looking forward to figuring out the rest.

While you wait for us to finalise our plans, please take some time to have a look at our brief bios in the “about us” section in the top right corner. And make sure to come back soon; we are all up for some intense weeks of blogging.