By Krystle van Hoof
Big Data is one of those buzzwords that everyone seems to be using but no one can clearly define. While I believe it’s always preferable to be clear, the lack of clarity around what is or is not big data is not necessarily a bad thing.
According to Kenneth Cukier, data editor at The Economist, “to define [big data] is to constrain it.”
And why would we want to constrain big data?
The optimists will tell you that big data represents a relatively untapped spring of knowledge that can help solve some of the world’s most pressing and complex problems—from poverty to climate change.
The pessimists, on the other hand, may tell you about a scenario where big-data algorithms could enable authorities to predict and preemptively act on things that have yet to (and may never) happen in a dystopian, Minority Report pre-crime sort of way.
As with most extreme predictions about new advances in technology, I tend to think the truth will likely take a far more boring route along the middle road.
So what is it?
A single, clear, agreed-upon definition may be lacking, but there are a few characteristics of big data that most people are willing to accept.
The 3 (or 4 or 5 or 6…) Vs of Big Data
- Volume: There’s a lot of it—so much that any meaningful analysis requires a level of automation (i.e. with computers).
- Variety: It comes from a variety of sources and in a variety of forms (documents, cell phone GPS data, environmental sensors).
- Velocity: More is being added all the time, more and more of it very recent, and it must be fast to store and retrieve.
Some other Vs you might encounter in a search for big data definitions could include: Value, Veracity, Variability, Viscosity, Virality, Visibility…
Another aspect of big data that comes up in a lot of definitions is that it is made up of information that was not originally collected for the purpose of mining it. Its new use, in the context of big data, is a secondary purpose. For instance, a store may have been collecting sales information for years; now it can analyze that information across data points to anticipate and prepare for particular purchasing trends.
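To make that secondary-use idea concrete, here is a minimal sketch in Python. The data and numbers are entirely hypothetical: a handful of per-month sales records stand in for what would, in a real big-data setting, be millions of transaction rows. The point is only that data collected for bookkeeping can be repurposed to surface a seasonal purchasing trend.

```python
from collections import defaultdict

# Hypothetical sales records: (year, month, units sold) for one product.
# Originally collected just for accounting, not for trend analysis.
sales = [
    (2012, 11, 480), (2012, 12, 610), (2013, 1, 200),
    (2013, 11, 520), (2013, 12, 655), (2014, 1, 210),
]

# Secondary use: average units per calendar month, to anticipate
# seasonal purchasing trends the data was never collected to reveal.
by_month = defaultdict(list)
for year, month, units in sales:
    by_month[month].append(units)

seasonal_avg = {m: sum(v) / len(v) for m, v in by_month.items()}
peak_month = max(seasonal_avg, key=seasonal_avg.get)
print(peak_month)  # in this toy sample, month 12 (December) peaks
```

A real pipeline would do the same grouping and averaging at scale, across far more data points (weather, location, promotions), which is exactly where the automation implied by the “Volume” characteristic comes in.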
If we must force the idea into a nutshell-sized explanation, I personally like this definition from The Guardian:
“Big data is a moniker for the astonishing amount of information that is created as a byproduct of the growing digitisation of our lives – our use of mobile phones, social networks, mobile money, search engines, online shopping, dating apps and so on. What excites policymakers and development practitioners is that if we can mine these datasets we could suddenly have a whole range of information about people that previously would only have been available with months of painstaking planning, travelling and surveying, or, as is often the case in the poorest countries, not at all.”
Big Data for Development
The big data revolution isn’t just about corporations’ bottom lines—the potential benefits for development cooperation are just starting to be uncovered.
The untapped potential of massive amounts of digital information is a promise that has made the UN sit up and take notice. In 2014, UN Secretary-General Ban Ki-moon set up a group of experts to make recommendations on how to bring about a data revolution for sustainable development.
The Data Revolution Group put out its report in November 2014, which outlines several high-level recommendations; lays out some ideas about what the data revolution means for sustainable development; identifies gaps in current data that need to be filled; and provides a few case studies, which illustrate how the data revolution is playing out around the world.
Apart from the threat of turning the world into a Big-Brother-like dystopia, big data in the service of development faces some challenges:
- Existence of data/reliable collection systems
- Barriers to open data (government and corporate control)
- Privacy Issues
- Access & Representation (who is able to provide data? Who has access/can use it?)
- Standardisation (for better/more accurate comparison)
- Timeliness (can the right people get access and react in time?)
- High-Quality Analysis (do the people with access have the experience and expertise to accurately interpret the data?)
Set against those challenges, timely big data also offers real opportunities:
- Early warning: When I was working for WFP in Mali, we collaborated with a number of other organizations on the SAP, an early warning system designed to anticipate, and allow us to react to, food security emergencies. That system was fed mainly by traditional survey data, which takes weeks to collect and analyze. Big data, collected and analyzed in a timely way, could have a significant impact in cases like this.
- Real-time: If programs can respond in a nimble way, real-time information can mean better programs and policies.
- Immediate feedback: If you can continually monitor a population across several data sources, you can adjust and improve policies and programs where needed.
Making it Work (for real)
According to the UN Global Pulse report, to get Big Data working for development, we need two key ingredients:
- Contextualization (if you don’t know what’s normal in a particular country or region, you won’t be able to accurately analyze the data)
- Becoming sophisticated users of information: This comes back to some of the challenges listed above. If you’re planning to spend months or years analyzing your big data, writing white papers on your big findings and setting up committees to discuss it before you do anything useful with it, you might as well flush it down the toilet. (Not mentioning any bureaucracy in particular…)