Big Data and You: Accessing Big Data

Johannes Kast on open data and big data… and the data revolution.

Big Data is shaping the way we look at the world and offers an alternative way of predicting what is going to happen next. And the amount of data is increasing exponentially. While 2.8 billion terabytes of data were saved in 2012, IDC predicts that this number will increase to 40 billion terabytes in 2020. Data is changing how we make sense of the world, it is drastically changing classic business models, and it has the potential to revolutionise the social sciences and the development sector.

There is an obvious benefit for companies in using their collected user data to analyse their markets and consumers, a practice that social media has monetized for a while now. And the tendency of government agencies to collect massive amounts of data has been demonstrated by the scope of the recent NSA scandal. However, the Open Data and Open Government trend, which is essentially unstructured data being made publicly available to everyone, is growing as well and can potentially open up new possibilities for non-profits (or other third parties) to play a more active and creative role in shaping our world.

It can be argued that data is currently released in a supply-driven way when it should be demand driven, but several access points have already been made available. With more than 150,000 data sets and tools to use them, the US Open Data initiative is a step in the right direction, offering raw information on over twenty topics, such as agriculture, climate and education.

Big, BIG, Data Warbles

Abigail Leffler perches on the development branch and broods over the content analysis of multilingual tweets and posts

Any collection of signs systematically arranged (or the absence thereof) can be read and interpreted. Edgar Allan Poe’s A Dream within a Dream, Edvard Munch’s The Scream painting, Ludwig van Beethoven’s Fifth Symphony, a tiger’s territorial markings in the Amur region, mobile phone traffic in the aftermath of the Haiti earthquake and all the electronic footprints we ever leave behind by virtue of our Internet usage are examples of this. The key point is that, in our search for patterns or for elements that maintain or break patterns in a sample, we are searching for clues to predicting behaviour or finding trends and hidden messages.

Now for the sake of simplicity and to keep true to the title of this post, let us alight on the analysis and derivation of meaning (a.k.a. interpretation) of our Internet footprints. Let us, furthermore, focus on blogging and microblogging in the context of communication for development.

How do we analyse data from blogs and microblogs? We could use quantitative methods, such as counting tweets and posts and measuring their frequency, and we could look at the geographical distribution of such entries or at the speed at which they arrive during or after an event. We could consider which entries are the most influential within a specific period of time. We could also look into the qualitative content of such data, using a keyword analysis to gauge sentiments or determine key topics in discourse. And now let us expand on this last point. What are the caveats we need to bear in mind when the analysis is conducted within a globalised, multicultural environment, and where tweets and posts come in forms as diverse as chatter, clucks, quacks, chirps, hoots, coos and caws?
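As a rough illustration of the quantitative and keyword approaches mentioned above, here is a minimal Python sketch. The sample posts, the field names `day` and `text`, and the keyword list are all invented for the example; a real analysis would start from a collected corpus. Note how an English-only keyword list silently misses the Spanish post, which is exactly the kind of caveat a multilingual environment raises.

```python
import re
from collections import Counter

# Hypothetical sample data; the field names and texts are made up for illustration.
POSTS = [
    {"day": "2013-10-01", "text": "Flood warning issued, we need water and shelter"},
    {"day": "2013-10-01", "text": "Water is running out in the camp"},
    {"day": "2013-10-02", "text": "Agua y refugio, por favor"},
]


def posts_per_day(posts):
    """Count how many posts were published on each day."""
    return Counter(post["day"] for post in posts)


def keyword_counts(posts, keywords):
    """Count case-insensitive occurrences of the given keywords across all posts."""
    wanted = {keyword.lower() for keyword in keywords}
    counts = Counter()
    for post in posts:
        tokens = re.findall(r"\w+", post["text"].lower())
        counts.update(token for token in tokens if token in wanted)
    return counts


if __name__ == "__main__":
    print(posts_per_day(POSTS))
    print(keyword_counts(POSTS, ["water", "shelter"]))
```

The third post asks for the same things in Spanish, yet contributes nothing to the keyword counts.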

A story about when public data was too big, and maybe not so public after all

How open is Sweden to open data? Charlotta Duse investigates.

A daily routine at the local newspaper where I work, and at many others, is that the news chief goes to the town hall to fetch the daily public documents. In these documents one finds correspondence between institutions, decisions made in the municipality, prosecutions, judgments, new guidelines, construction permits etc.; basically anything that goes on nearby.

Anyone can get these documents. The data is public and protected by the principle of public access (http://www.regeringen.se/content/1/c6/24/55/92/61c8bc18.pdf): a principle to make sure that the democratic system can be looked into, as well as to promote civic participation. Just as we saw in Abigail's post Open Data, transparency is, and should be, one part of "the good" of open data.

After getting the daily documents, the editorial team sorts out what is of interest to its readers. (It should be pointed out that this is no objective process: here lies a big risk of misinterpretation, of focusing on some things while ignoring others, of judging what is in the public interest and what is not, etc.) After choosing the events of interest, the reporter writes his or her article based on the document, which is often written in complicated language, in such a way that anyone can understand the information in it.

But some time ago, colleagues in Kalmar had trouble getting access to these public papers. The reason?

Visualising aid – Sweden’s Open Aid project

Catarina Nilsson presents a practical application of open data in development.

The Swedish government, through the Ministry for Foreign Affairs and in collaboration with the Swedish International Development Cooperation Agency (Sida), launched the website www.openaid.se in 2011. The idea was to make the whole chain of events in a development aid contribution available to the public, with transparency as a leading principle.
In line with the increased use of open data, www.openaid.se was recently redeveloped and updated. The new site is built as an open source site, which makes it possible for anyone to fetch data and use the software.

openaid.se display of aid to Uganda 2013

The data is available through an API (Application Programming Interface) that enables anyone to construct a query. Sida chose to develop its system in a way that allows anyone to visualise chosen data in their own way.
But which data? A fancy visualisation is never better than the data it is based on. The data used in Open Aid comes from Sida's systems, so however openly it is shown, it still depends on the agency having its statistics and financial systems in order.
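To give a feel for what "constructing a query" against such an API can look like, here is a small Python sketch that only builds a query URL and does not contact any server. The base URL and the parameter names (`country`, `year`, `sector`) are placeholders invented for this example, not the actual Open Aid endpoints; the real ones would have to be taken from the site's own API documentation.

```python
from urllib.parse import urlencode

# Placeholder endpoint, NOT the real Open Aid API address.
BASE_URL = "https://api.example.org/activities"


def build_query(country, year, **extra):
    """Build a query URL from hypothetical filter parameters.

    Parameters are sorted by name so the same filters always give the same URL.
    """
    params = {"country": country, "year": year, **extra}
    return f"{BASE_URL}?{urlencode(sorted(params.items()))}"


if __name__ == "__main__":
    # Aid to Uganda in 2013, as in the display shown above.
    print(build_query("UG", 2013))
    print(build_query("UG", 2013, sector="education"))
```

The resulting URL could then be fetched with any HTTP client and the response fed into a visualisation of one's own choosing.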

All data is packaged in the so-called IATI standard, making it somewhat comparable in an international setting. The International Aid Transparency Initiative (IATI) was launched as a collaborative initiative at the High Level Forum on Aid Effectiveness in Accra in 2008 and has become an international standard in aid transparency.
The IATI standard as used on Open Aid has its problems though. For example, substantial areas of aid that are not technically classified as sectors, such as research support or ICT, cannot yet be identified.
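The classification problem can be made concrete with a short Python sketch that sums transaction values per sector code from IATI-style XML. The sample document is invented and heavily simplified (real IATI files carry many more elements and attributes), and the identifiers, sector codes and amounts are illustrative only. The point is that the totals can only be broken down by whatever sector codes the activities carry, so aid that has no code of its own never shows up as a separate line.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented, simplified sample in the spirit of IATI activity files.
SAMPLE = """
<iati-activities>
  <iati-activity>
    <iati-identifier>XX-EXAMPLE-1</iati-identifier>
    <recipient-country code="UG"/>
    <sector code="11220"/>
    <transaction><value currency="SEK">1000</value></transaction>
  </iati-activity>
  <iati-activity>
    <iati-identifier>XX-EXAMPLE-2</iati-identifier>
    <recipient-country code="UG"/>
    <sector code="11220"/>
    <transaction><value currency="SEK">500</value></transaction>
  </iati-activity>
  <iati-activity>
    <iati-identifier>XX-EXAMPLE-3</iati-identifier>
    <recipient-country code="UG"/>
    <sector code="12220"/>
    <transaction><value currency="SEK">250</value></transaction>
  </iati-activity>
</iati-activities>
"""


def totals_by_sector(xml_text):
    """Sum transaction values per sector code (first sector of each activity)."""
    root = ET.fromstring(xml_text.strip())
    totals = defaultdict(float)
    for activity in root.iter("iati-activity"):
        sector = activity.find("sector")
        code = sector.get("code") if sector is not None else "unclassified"
        for value in activity.iterfind("transaction/value"):
            totals[code] += float(value.text)
    return dict(totals)


if __name__ == "__main__":
    print(totals_by_sector(SAMPLE))
```

If one of these activities were in fact research support, nothing in the per-sector totals would reveal it.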

Further reading:
About Open Aid www.openaid.se
About IATI www.aidtransparency.net