Big Data and You: Accessing Big Data

Johannes Kast on open data and big data… and the data revolution.

Big Data is shaping the way we look at the world and offers an alternative way of predicting what is going to happen next. And the amount of data is increasing exponentially: while 2.8 billion terabytes of data were stored in 2012, the IDC predicts that this figure will grow to 40 billion terabytes by 2020. Data is changing how we make sense of the world, it is changing classic business models drastically, and it has the potential to revolutionise the social sciences and the development sector.
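As a rough back-of-the-envelope check on what "increasing exponentially" means here, the two IDC figures can be turned into an implied compound annual growth rate. This is a minimal sketch that assumes smooth exponential growth between 2012 and 2020, which is an idealisation rather than a claim made by IDC; the helper name `cagr` is purely illustrative.

```python
import math


def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate implied by growing from start to end over `years` years."""
    return (end / start) ** (1 / years) - 1


# IDC figures quoted above, in billions of terabytes (2012 and 2020).
# Assumption: growth is perfectly exponential over the whole period.
rate = cagr(2.8, 40.0, 2020 - 2012)

# Years needed for the stored volume to double at that constant rate.
doubling_time = math.log(2) / math.log(1 + rate)
```

With these inputs the implied rate is roughly 39% per year, which means the amount of stored data would double about every two years.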

Companies have an obvious incentive to use the user data they collect to analyse their markets and consumers, a practice that social media has been monetising for a while now. The tendency of government agencies to collect massive amounts of data has likewise been demonstrated by the scope of the recent NSA scandal. However, the Open Data and Open Government trend, which essentially means making (often raw) data publicly available to everyone, is growing as well, and it could open up new possibilities for non-profits (or other third parties) to play a more active and creative role in shaping our world.

It can be argued that the data currently being released is supply-driven when it should be demand-driven, but several access points are already available. With more than 150,000 data sets and tools to use them, the US Open Data initiative is a step in the right direction, offering raw information on over twenty topics such as agriculture, climate and education.
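Portals like this can often be queried by programs as well as browsed. The sketch below assumes the catalog runs on CKAN and exposes CKAN's standard Action API endpoint `package_search` at `https://catalog.data.gov/api/3/action`; both the base URL and the helper names (`build_search_url`, `parse_search_response`, `search_datasets`) are assumptions for illustration, not something documented in this post.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL of a CKAN catalog's Action API (hypothetical for this sketch).
CKAN_BASE = "https://catalog.data.gov/api/3/action"


def build_search_url(query: str, rows: int = 5, base: str = CKAN_BASE) -> str:
    """Build a CKAN package_search URL for a free-text query, limited to `rows` results."""
    params = urllib.parse.urlencode({"q": query, "rows": rows})
    return f"{base}/package_search?{params}"


def parse_search_response(payload: dict) -> tuple:
    """Return (total number of matches, titles of the datasets in this page)."""
    if not payload.get("success"):
        raise ValueError("the catalog reported an unsuccessful request")
    result = payload["result"]
    # Fall back to the dataset's short name if it has no human-readable title.
    titles = [pkg.get("title", pkg.get("name", "")) for pkg in result["results"]]
    return result["count"], titles


def search_datasets(query: str, rows: int = 5) -> tuple:
    """Fetch and parse one page of search results (requires network access)."""
    with urllib.request.urlopen(build_search_url(query, rows), timeout=30) as resp:
        return parse_search_response(json.load(resp))
```

For example, `search_datasets("agriculture")` would return the total number of matching datasets together with the titles of the first five. Keeping the URL building and response parsing separate from the network call makes the parsing easy to check against a saved response.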

Many other nations and government agencies also publish portions of the data they collect. Data Catalogs is the most comprehensive list, now covering 390 Open Data catalogs provided by local, regional and national governments as well as by international organisations such as the United Nations, the World Bank and NGOs.

Universities, as centres of knowledge and research, are another place to find and make use of data. Several universities have taken part in Dataverse Network projects, which originated at Harvard University in 2006. The Dataverse Network is an open source application for “sharing, citing, analysing and preserving research data.” Further Dataverse Networks are being launched by universities in the US and in other countries, such as the Netherlands and Denmark.

Some of the largest, and arguably the most insightful, bodies of data when it comes to mirroring our societies are those created by the collective users of social media sites like Twitter and Facebook. It is estimated that Facebook ingests 500 times more data than the New York Stock Exchange. In cooperation with the most popular social media sites, GNIP is trying to make this ever-expanding wealth of information available to everyone; however, this comes at a price that depends on the size and focus of the project the data is used for.

Another example of a public Open Data map is Natural Earth, which presents data flexibly through visually appealing maps. Freebase is a community-driven, open repository of structured, graph-based data on over 39 million topics about real-world entities such as people, places and things. Individuals, too, come up with innovative ways to collect and interpret data: the reddit user ieeamo, for example, created an interactive visualisation of a number of data sets on different topics.

Bernard Marr has also shared a collection of 20 Big Data repositories, posted on Data Science Central, that is worth checking out.

Big Data, Open Data and Open Government are on the rise, and many describe them as an impending Data Revolution. How will these massive amounts of data at our fingertips change our world? And are they potentially impactful enough to eradicate poverty? With the increasing speed at which information is created, shared, stored and made accessible, we might soon find out.

  1. Thank you, Johannes, for this insightful post, which I read with interest. The list of dataset sites is comprehensive! I have bookmarked a few.

    Your post set me thinking about inclusion, though. Open data should be accessible in order to be actionable, but I noticed that the level of abstraction in many of the sites is quite high. Put simply, ‘open data has little value if people cannot use it’ (Hammer: 2013).

    Hopefully we are moving towards a model of more ‘user friendly’ dataset tools.

    Well done and best wishes from a fellow student
    Abigail Leffler

    Cf. Craig Hammer at [Accessed 18 October 2014]

    • This is an observation I have made as well. Big Data can be accessed through “Application Programming Interfaces” (APIs), which are usually provided by the holders of the data. Considering the current hype around Big Data, it seems likely that this software will become more user-friendly and more flexible in how data can be accessed and assessed in the future.