Eraptis, October 12
In my last post, I asked how data could be used to measure the impact of the #HeForShe movement on women’s empowerment, and argued that theory could guide us in interpreting such data. But logically, data must be generated before it can be analyzed. So how does data generation work in practice?
According to Morten Jerven, the basic point of departure when we want to know something about a population is to first establish what that population is. Only then can we know something about the other properties affecting it. From a development perspective, measurements of economic growth, agricultural production, education and health all depend on population data to be meaningful. Often, however, the real population number is not actually known but estimated through a counting process commonly referred to as a census. Jerven illustrates the possible implications of census-taking in a development context through a case study from Nigeria, saying that:
“Today, we can only guess at the size of the total Nigerian population. In particular, very little is known about the population growth rate. The history of census-taking in Nigeria is an instructive example of the measurement problems that can arise in sub-Saharan Africa. It is also a powerful lens through which the legitimacy of the Nigerian colonial and postcolonial state can be observed” – Morten Jerven
Without getting into the particulars of the Nigerian census-taking case, this quote points to an interesting aspect that Jerven elaborates further elsewhere in his book: the involvement of the state in the production of data and official statistics. Jerven argues that if a particular state is interested in achieving development, we should expect it to also have an interest in measuring that particular development. If that’s the case, then the availability of state-generated data should reflect the state’s statistical priorities, which are likely to mirror its political priorities. We therefore arrive, once again, at the question asked in my first blog post: is all data created equal?
Directing that question to Data2x seems to yield the simple answer “No”, or at least “not yet”. Data2x is a joint initiative of the UN Foundation, the Bill & Melinda Gates Foundation, and the William & Flora Hewlett Foundation dedicated to “improving the quality, availability, and use of gender data in order to make a practical difference in the lives of women and girls worldwide”. According to this report produced by Data2x, approximately eighty percent of countries produce sex-disaggregated data on education and training, labour force participation, and mortality, but only one third do the same on informal employment, unpaid work, and violence against women. A mapping of the gender data gap across five development and women’s empowerment domains, using 28 indicators, identified several types of gaps for each indicator, as shown in this table:
To close these gaps, Data2x argues that existing data sources should be mined for sex-disaggregated data, and that new data collection should be designed as a tool for social change that takes gender disaggregation into account from the planning stages onward. But useful as they are, conventional forms of data generated by household surveys, institutional records, and national economic accounts are not well suited to capturing a detailed account of the lives, experiences, and expressions of women and girls.
Can big data help close this gap? In this new report, Data2x shows that it might, by profiling a set of innovative approaches to harnessing big data to close the gender data gap even further. For example, have you ever wondered how data generated by 500 million daily tweets across 25,000 development keywords from 50 million Twitter users could be disaggregated by sex and location for analysis? I admit that, before reading the report, I cannot recall being struck by that thought. But, apparently, open data generated from social media platforms may not be sex-disaggregated from the outset. To solve this, UN Global Pulse and the University of Leiden collaborated with Data2x to develop and test an algorithm that infers the sex of Twitter users. The tool takes into account a number of classifiers, such as name and profile picture, to determine the sex of the user producing a tweet, and was developed so that it could be applied on a global scale across a variety of languages. The accuracy of the algorithm was assessed by comparing its gender classification results with those of a crowdsourced panel, whose results were assumed to be correct. In 74% of cases the algorithm indicated the correct sex, a number that UN Global Pulse deems could be improved through further system development. The results of the project show great potential for generating new insights on development concerns, disaggregated by both sex and location, from user-generated data on social media channels, as shown in this screenshot of the online dashboard (go and explore it for yourself!):
Before ending this post, I’d like to highlight three other relevant posts from our blog that take up the questions of bias in data generated from social media, the issue of privacy, and the geo-mapping and visualization of big data. Read, reflect, and tell us what you think in the comments field below!