Big data in an African context

Shahin Madjidian


Many development goals, policies and programs are based on numbers and statistics. How accurate are these numbers on the African continent and can big data help in improving the accuracy?

African statistics today

In his book, Jerven offers a devastating critique of the state of statistics on the African continent. He notes that the numbers being produced and published are neither reliable nor valid, often resting on estimates, guesswork or assumptions. These assumptions are frequently based in turn on older data, sometimes dating back decades. Such old baseline numbers bear very little relation to how things look today.

Several consequences can be identified here. Firstly, different actors may look at the same old data from different angles and thus produce very different numbers (in Jerven’s case, mostly GDP per capita). Secondly, these poor numbers feed into a larger picture of how African countries are depicted: which problems they have, where these can be found, and which solutions might remedy them and develop the nations.

Jerven laments the poor state of the countries’ statistical offices and argues that they are basically there to serve actors from the international aid, donor and development communities (Jerven 2013:105). “International institutions are the main providers and disseminators”, as he notes (Jerven 2013:8f).

Jerven calls for new baseline estimates from which fresh statistics can be extrapolated. However, he stresses that “these must be based on local applicability, not solely on theoretical or political preference” (Jerven 2013:xiii) and also highlights the necessity of local knowledge and input. Data and statistics ought to serve the needs of the people on the ground, not help some faraway aid organization reach its targets.

Big data replacing statistics?

Can big data replace the poor statistics on the African continent and help improve public policy and development goals? Before answering the question, let us quickly go through what big data is and how it works.

In their book, Mayer-Schönberger & Cukier provide a clear overview of what big data is. They note that “at its core, big data is about predictions” (Mayer-Schönberger & Cukier 2013:11), about inferring probabilities. Furthermore, big data is about finding the general direction, about a trade-off between being accurate at the micro level and gaining insights at the macro level (Mayer-Schönberger & Cukier 2013:12f). So far, big data seems like a useful tool. In fact, big data can be viewed as pure statistics. But which data can be, and currently are, collected in big data sets?

A lot of the data comes from people using communication tools, such as cell phones and computers, while connected to the Internet. Taylor & Schroeder warn that in developing countries far from everyone uses a cell phone or is connected to the Internet. This results in user bias and a situation where vulnerable or ‘hidden’ populations, such as children, the elderly and the poorest in society, are left out of the data collection (Taylor & Schroeder 2015:510f). They argue that “mobile phone use is highly differentiated by gender and income level” in India (Taylor & Schroeder 2015:506), and a qualified guess is that many African countries exhibit the same patterns.

Meier concurs, saying that “not everyone is on social media. In fact, social media users tend to represent a very distinct demographic, one that is younger, more urban, and more affluent than the norm” (Meier 2015:37). So perhaps inferring national probabilities from a rather narrow subset of the population is a fairly poor idea: it will not deliver the reliable big picture that is supposed to be one of big data’s strengths.
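The user bias described above can be illustrated with a small Python sketch. All group names, population shares, ownership rates and income figures below are made up for illustration and do not come from any of the sources cited here. The sketch only shows the mechanism: if richer groups are more likely to appear in phone data, an average computed from that data will overstate the national average.

```python
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    population_share: float  # share of the national population (hypothetical)
    phone_ownership: float   # fraction of the group visible in phone data (hypothetical)
    income: float            # average monthly income in the group (hypothetical)


def true_mean(groups):
    """National average income, weighting every group by its population share."""
    return sum(g.population_share * g.income for g in groups)


def observed_mean(groups):
    """Average income as seen through phone data only.

    Each group is weighted by the share of the population that both belongs
    to the group and owns a phone, so poorly covered groups count for less.
    """
    covered = sum(g.population_share * g.phone_ownership for g in groups)
    weighted = sum(g.population_share * g.phone_ownership * g.income for g in groups)
    return weighted / covered


# Purely illustrative numbers, not real statistics for any country.
groups = [
    Group("urban, affluent", 0.20, 0.90, 300.0),
    Group("urban, poor", 0.30, 0.50, 100.0),
    Group("rural, poor", 0.50, 0.20, 50.0),
]

if __name__ == "__main__":
    print(f"True national average:     {true_mean(groups):.1f}")
    print(f"Average seen in phone data: {observed_mean(groups):.1f}")
```

In this made-up example the phone-based figure comes out well above the true average, because the affluent urban group is over-represented among phone owners. Adding more data from the same users would not close the gap, since the missing groups stay missing.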

Quality of analysis

If the previous section discussed the quality of data, this one delves deeper into the quality of analysis regarding big data. In the previous post I briefly mentioned that big data actors are mostly big corporations and governments. What they have in common is that the majority of them are based in the global North, far away from the realities of Africa.

Jerven writes: “In order to employ the evidence usefully, one must know the conditions under which the data were produced. This is readily recognized in qualitative analysis, but somehow these principles have not been applied to quantitative evidence” (Jerven 2013:7).

Read, Taithe & MacGinty are even more pessimistic and question the quality, reliability and validity of data when “field level information may be sent to headquarters in a different country, collated with other data and then sent back to the country of operation” (Read, Taithe & MacGinty 2016:7). They go on to say that there is a risk that the people analyzing the data are cut off from local knowledge and context, looking only at numbers (Read, Taithe & MacGinty 2016:12).

Mayer-Schönberger & Cukier in turn touch upon the very real possibility of a situation where “data-driven decisions are poised to augment or overrule human judgment” (Mayer-Schönberger & Cukier 2013:141). Let us hand over everything to the machines!

Big data the statistical saviour?

Based on the literature reviewed here, this question can only be answered with a resounding no. Jerven complained about the dominance of outsiders in producing statistics, and I cannot see how things would be any different if big data actors were to run the show instead of today’s powerhouses within the statistical field. The same objections, such as a democratic deficit and being out of touch with local circumstances, can be raised, and further ones, such as the gender and income gap among users, may even be added.

Big data proponents argue that big data “offers new and higher knowledge ‘with the aura of truth, objectivity, and accuracy’” (Read, Taithe & MacGinty 2016:10). But statistics, whether presented as big data or as traditional surveys carried out on the ground, are always subject to human bias. This is something that Meier, himself a big proponent of big data, confirms when he says that everything is biased (Meier 2015:39).


Jerven, M. 2013: Poor Numbers: How We Are Misled By African Development Statistics and What To Do About It. Ithaca, NY: Cornell University Press.

Mayer-Schönberger, V., Cukier, K. 2013: Big Data: A Revolution That Will Transform How We Live, Work, and Think. London: John Murray Publishers.

Meier, P. 2015: Digital Humanitarians: How BIG DATA Is Changing the Face of Humanitarian Response. Boca Raton, FL: CRC Press.

Read, R., Taithe, B., MacGinty, R. 2016: Data hubris? Humanitarian information systems and the mirage of technology, Third World Quarterly, forthcoming.

Taylor, L., Schroeder, R. 2015: Is bigger better? The emergence of big data as tool for international development policy, GeoJournal 80: 503-528.

1 comment

  1. Every decision we make in our lives is based on some kind of prior information. In our private lives, these decisions will in most cases affect only a limited number of people in our immediate environment. However, when we extrapolate to decisions on a political, national or international level, the impacts will be much larger, especially for already vulnerable and sensitive populations, as many are in the African context you mention here.
    Taking a step back, as Hilbert generally notes for big data, “if we improve the structure of prior information on which to base our estimates, our uncertainty will on average be reduced. The better the prior, the better the estimate, the better the decision”. However, there is always the risk of a garbage in, garbage out (GIGO) effect, where the analysis of a low-quality or inappropriate data set gives low-quality and questionable output.
    This could affect very legitimate efforts to use big data to improve the lives of populations already at risk, as for example in the Haitian cholera outbreak study I also mentioned in one of my posts on this blog. There, the researchers behind the Karolinska/Columbia study were a bit lucky in terms of the quality of the data; however, they were very aware that the representativeness of the dataset is crucial when mobile data is used for epidemiological purposes.
    There is also a reference to this Haitian study in the Taylor & Schroeder paper you cite here, where they say that if, for example, “the mobile operator who has donated the data is favoured by richer mobile phone users, this represents a systematic bias which affects the data’s ability to predict population movements”. This point is also linked to what you cite in your post about mobile phone use in India, which is highly differentiated by gender and income level and could indeed support a qualified guess about the situation in many African countries, as you say.
    Furthermore, the points by Taylor & Schroeder which you mention, regarding the bias that arises when ‘hidden’ populations, such as children, the elderly and the poorest in society, are left out of the data collection, are also linked to the above point and are indeed shared by many scholars.
    As Hilbert notes, the Big data market has already “become bigger than the size of half of the world’s national economies … buying the privilege of access for a fee ‘produces considerable unevenness in the system: those with money – or those inside the company – can produce a different type of research than those outside’”. Andrejevic also notes (as cited in Gillingham) that “big data has a tendency to exacerbate power imbalances in the digital era, as its techniques are employed by an elite few to make decisions with wide-ranging effects for the many” because “those without access can neither reproduce nor evaluate the methodological claims of those who have privileged access”.

    This “digital divide” is widening at the international level too, together with the “increasing concentration of technological capacity among an ever smaller number of ever more powerful devices … [where] the vast majority of this Big data hardware capacity resides in highly developed countries”, as Hilbert says.

    As a closing remark, I agree with your conclusions on the statistical value of Big data analysis. I would only add that big data analysis, like all data analysis, can very well be biased too, and we should always remember that “data, which may seem to be innocuous, can have major societal repercussions” (as Scott, 1998, has argued and as cited in the Taylor & Schroeder 2015 paper which you cite here too).