(Big) Data for women’s empowerment? – How does it work?

Eraptis, October 12

In my last post, I asked how data could be used to measure the impact of the #HeForShe movement on women’s empowerment, and argued that theory could guide us in interpreting such data. But data must be generated before it can be analyzed. So how is data actually generated in practice?

According to Morten Jerven, the basic point of departure when we want to know something about a population is to first establish what that population is. Only once this is established can we know anything about other properties of that population. From a development perspective, measures of economic growth, agricultural production, education, and health all depend on population data to be meaningful. Often, however, the actual population is not known with certainty but estimated through a counting process commonly referred to as a census. Jerven illustrates what census-taking can imply in a development context through a case study from Nigeria:

“Today, we can only guess at the size of the total Nigerian population. In particular, very little is known about the population growth rate. The history of census-taking in Nigeria is an instructive example of the measurement problems that can arise in sub-Saharan Africa. It is also a powerful lens through which the legitimacy of the Nigerian colonial and postcolonial state can be observed” – Morten Jerven
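A quick worked illustration shows why so much hinges on the population count. The Python sketch below uses purely hypothetical figures, not Nigerian census data: the same aggregate divided by two different population estimates gives per-capita values that differ by roughly a fifth.

```python
def per_capita(total: float, population: float) -> float:
    """Divide an aggregate (e.g. output or enrolment) by a population estimate."""
    if population <= 0:
        raise ValueError("population must be positive")
    return total / population

# Hypothetical numbers, chosen only for illustration.
total_output = 500_000_000_000   # annual output in some currency unit
low_estimate = 140_000_000       # one possible population count
high_estimate = 170_000_000      # another possible population count

low = per_capita(total_output, low_estimate)
high = per_capita(total_output, high_estimate)

# Relative difference in the per-capita figure caused only by the denominator.
gap = (low - high) / high
print(f"{low:,.0f} vs {high:,.0f} per person ({gap:.1%} higher with the lower count)")
```

The same reasoning applies to any indicator expressed per person or per age group: an error in the count feeds directly into every figure derived from it.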

Without getting into the particulars of the Nigerian case, the quote highlights an aspect that Jerven elaborates elsewhere in his book: the state’s involvement in producing data and official statistics. If a particular state is interested in achieving development, Jerven argues, we should expect it also to be interested in measuring that development. If so, the availability of state-generated data should reflect the state’s statistical priorities, which are likely to mirror its political priorities. We therefore arrive, once again, at the question asked in my first blog post: is all data created equal?

Video: https://www.youtube.com/watch?=2&v=YABFcA8yHZ0

Directing that question to Data2x seems to yield a simple answer: “No”, at least not yet. Data2x is a joint initiative of the UN Foundation, the Bill & Melinda Gates Foundation, and the William & Flora Hewlett Foundation, dedicated to “improving the quality, availability, and use of gender data in order to make a practical difference in the lives of women and girls worldwide”. According to a report produced by Data2x, approximately eighty percent of countries produce sex-disaggregated data on education and training, labour force participation, and mortality, but only one third do the same for informal employment, unpaid work, and violence against women. Mapping the gender data gap across five domains of development and women’s empowerment, using 28 indicators, the report identified several types of gaps for each indicator, as shown in this table:

[Table: types of gender data gaps for each of the 28 indicators across five domains]

Source: Data2x

To close these gaps, Data2x argues that existing data sources should be mined for sex-disaggregated data, and that new data collection should be designed as a tool for social change, with gender disaggregation built in from the planning stage. Useful as they are, however, conventional data sources such as household surveys, institutional records, and national economic accounts are poorly suited to capturing the lives, experiences, and expressions of women and girls in detail.
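One way to read “gender disaggregation built in from the planning stage” is that the record format itself carries the respondent’s sex, so any indicator can later be tabulated by it. The Python sketch below assumes a hypothetical survey record with `sex` and `region` fields and a hypothetical `unpaid_work_hours` indicator, filled with toy values; it is an illustration, not a Data2x specification.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

@dataclass(frozen=True)
class SurveyRecord:
    # Hypothetical schema: sex is collected by design, not added afterwards.
    respondent_id: int
    sex: str                  # "female" or "male" in this toy example
    region: str
    unpaid_work_hours: float  # weekly hours, a hypothetical indicator

def disaggregate_by_sex(records):
    """Return the mean of the indicator for each sex present in the records."""
    groups = defaultdict(list)
    for r in records:
        groups[r.sex].append(r.unpaid_work_hours)
    return {sex: mean(values) for sex, values in groups.items()}

# Toy records, not real survey data.
records = [
    SurveyRecord(1, "female", "North", 30.0),
    SurveyRecord(2, "male", "North", 10.0),
    SurveyRecord(3, "female", "South", 26.0),
    SurveyRecord(4, "male", "South", 14.0),
]
print(disaggregate_by_sex(records))  # {'female': 28.0, 'male': 12.0}
```

Because `sex` is a required field, the breakdown costs nothing extra at analysis time; retrofitting it onto data collected without it is far harder.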

Can big data help close this gap? In a new report, Data2x suggests that it might, profiling a set of innovative approaches to harnessing big data to narrow the gender data gap further. For example, have you ever wondered how data generated by 500 million daily tweets across 25,000 development keywords from 50 million Twitter users could be disaggregated by sex and location for analysis? I admit that, before reading the report, the thought had never struck me. But open data from social media platforms is not necessarily sex-disaggregated from the outset. To solve this, UN Global Pulse and the University of Leiden collaborated with Data2x to develop and test an algorithm that infers the sex of Twitter users. The tool combines a number of classifiers, based for example on the user’s name and profile picture, to determine the sex of the person behind a tweet, and was developed to work on a global scale across a variety of languages. Its accuracy was assessed by comparing its classifications with those of a crowdsourced panel, whose judgments were taken as the correct answers. In 74% of cases the algorithm identified the correct sex, a figure that UN Global Pulse considers could be improved through further development. The project shows great potential for generating new insights into development concerns, disaggregated by both sex and location, from user-generated social media data, as shown in this screenshot of the online dashboard (go and explore it for yourself!):

[Screenshot: UN Global Pulse online dashboard]

Source: UN Global Pulse
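The accuracy check described above boils down to comparing predicted labels with reference labels. The Python sketch below is not the UN Global Pulse/Leiden algorithm: as an assumption, a toy lookup on the first name (using a hypothetical four-name lexicon) stands in for the real combination of classifiers, and profile pictures are ignored. The crowdsourced panel is represented by a hypothetical list of reference labels, and accuracy is the share of users for whom prediction and panel agree.

```python
# Hypothetical first-name lexicon; a real system would rely on far larger,
# multilingual resources plus other signals such as profile pictures.
NAME_LEXICON = {"maria": "female", "amina": "female", "john": "male", "ahmed": "male"}

def infer_sex(display_name: str) -> str:
    """Guess sex from the first token of a display name, or return 'unknown'."""
    tokens = display_name.strip().lower().split()
    if not tokens:
        return "unknown"
    return NAME_LEXICON.get(tokens[0], "unknown")

def accuracy(predictions, reference):
    """Share of users whose predicted sex matches the crowdsourced label."""
    if len(predictions) != len(reference) or not reference:
        raise ValueError("need equal-length, non-empty lists")
    correct = sum(p == r for p, r in zip(predictions, reference))
    return correct / len(reference)

# Toy users and toy panel labels, not real data.
users = ["Maria Lopez", "John K", "Sam Taylor", "Amina", "Chris Obi"]
crowd_labels = ["female", "male", "female", "female", "male"]

preds = [infer_sex(u) for u in users]
print(preds, accuracy(preds, crowd_labels))
```

Here "unknown" counts as a miss, giving 3 correct out of 5. That is one design choice among several: an evaluation could instead report accuracy only over the users the classifier was willing to label, which would produce a different figure for the same predictions.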

Before ending this post, I’d like to highlight three other posts on our blog that take up related questions: bias in data generated from social media, privacy, and the geo-mapping and visualization of big data. Read, reflect, and tell us what you think in the comments field below!

 


3 comments

  1. Excellent article, with a lot of new information for me personally, and indeed big data can open big doors. Reading this brought to mind the questions of data bias and the misuse of data. This massive amount of data can surely bring good solutions to global problems and provide better analysis that leads to better planning. But data can also be manipulated or misused to serve other purposes.
    How does Data2x protect the data? And who owns the data they have?

    • Thanks Ali! I agree with you that Big Data certainly is a coin with two sides. Diana has written excellently on the topic of bias in Big Data elsewhere on our blog, so I will not go further into that in this reply. But I like the question about the role of Data2x and its agenda. In the example I discuss here, Data2x uses open source data from Twitter and adds to it a property (sex) that it did not have from the outset. In this sense the data has been manipulated: is it then the same data as before, or has something new been created? Arguably, combining open source data with a new attribute creates something new, which could then be used in various ways. One intended aspect is Data2x’s purpose of creating sex-disaggregated data to highlight gender differences. But this process could also have unintended outcomes when other actors access the same data. This risk might be even more prominent when, for example, geographical properties are added to existing data in order to map certain aspects, or even populations, geographically.