Big, BIG, Data Warbles

Abigail Leffler perches on the development branch and broods over the content analysis of multilingual tweets and posts

Any collection of signs arranged systematically (or, indeed, the absence of such signs) can be read and interpreted. Edgar Allan Poe’s poem A Dream within a Dream, Edvard Munch’s painting The Scream, Ludwig van Beethoven’s Fifth Symphony, a tiger’s territorial markings in the Amur region, mobile phone traffic in the aftermath of the Haiti earthquake and all the electronic footprints we leave behind through our Internet use are examples of this. The key point is that when we search a sample for patterns, or for elements that maintain or break those patterns, we are looking for clues that help us predict behaviour or uncover trends and hidden messages.

Now, for the sake of simplicity and to stay true to the title of this post, let us alight on the analysis of our Internet footprints and the derivation of their meaning (a.k.a. interpretation). Let us, furthermore, focus on blogging and microblogging in the context of communication for development.

How do we analyse data from blogs and microblogs? We could use quantitative methods, such as counting the number of tweets and posts and measuring how often they appear. We could also look at the geographical distribution of such entries, or at the speed at which they arrive during or after an event. We could consider which entries are the most influential within a specific period of time. We could also examine the qualitative content of such data, for instance through a keyword analysis that gauges sentiment or identifies key topics in the discourse. Let us now expand on this last point. What caveats do we need to bear in mind when the analysis is conducted in a globalised, multicultural environment, where tweets and posts come in forms as diverse as chatter, clucks, quacks, chirps, hoots, coos and caws?
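To make the quantitative side concrete, the Python sketch below counts posts per hour as a rough stand-in for the speed at which entries arrive, and it also tallies keywords. It assumes that each post is a (timestamp, text) pair. The sample posts and the stopword list are invented for illustration and are not drawn from Denskus and Esser's data or any other dataset.

```python
from collections import Counter
from datetime import datetime
import re

# Hypothetical sample of (ISO timestamp, text) pairs, invented for illustration.
POSTS = [
    ("2010-09-20T09:15", "Leaders gather in New York for the #MDG Summit"),
    ("2010-09-20T09:40", "Poverty and hunger top the #MDG agenda"),
    ("2010-09-20T10:05", "Will the #MDG Summit deliver on poverty?"),
]

# A deliberately tiny, assumed English stopword list; a real study needs a
# fuller list, and a separate one for each language in the corpus.
STOPWORDS = {"the", "and", "for", "in", "on", "of", "will"}


def posts_per_hour(posts):
    """Count posts per clock hour, a rough proxy for how fast entries arrive."""
    hours = Counter()
    for stamp, _ in posts:
        hour = datetime.fromisoformat(stamp).strftime("%Y-%m-%d %H:00")
        hours[hour] += 1
    return hours


def keyword_counts(posts):
    """Lower-case each post, split it into word and hashtag tokens, drop stopwords."""
    words = Counter()
    for _, text in posts:
        # \w is Unicode-aware in Python 3, so accented words stay whole.
        for token in re.findall(r"#?\w+", text.lower()):
            if token not in STOPWORDS:
                words[token] += 1
    return words


if __name__ == "__main__":
    print(posts_per_hour(POSTS))
    print(keyword_counts(POSTS).most_common(3))
```

The stopword list and the tokeniser are both language-specific choices. If this sketch were applied unchanged to Spanish or French posts, it would count function words such as 'de' or 'pour' as keywords. That is one small instance of the caveat raised above.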

On the one hand, we have Sharath Srinivasan, who, referring to the phenomenon of big data at the Voice and Matter Glocal Conference on Communication for Development last September, articulated the need to find a voice ‘despite the diversity of voices and the big data agenda’ (Srinivasan: 2014). On the other, we have Tobias Denskus and Daniel Esser, who, in their illuminating article Social Media and Global Development Rituals, concluded that ‘the hope that social media might make a significant contribution toward global democratic participation in agenda setting was not fulfilled by social media content generated during the MDG Summit’ (Denskus and Esser: 2013, p. 418). How do we link the two? To be clear, it is unlikely that either was preoccupied with what we will try to demonstrate below. To be fair, Denskus and Esser’s article is brilliant and should be read by everyone.

Denskus and Esser started from the assumption that ‘conferences are not only diplomatic focal points but also serve as vehicles through which epistemic communities create shared discourses and thus maintain their identity and cohesion’ (op. cit., p. 406). We could be led to believe that the epistemic community, which we assume to be multicultural and multilingual, was not granted a voice, or that it was poorly represented, whether at the conference or in the related tweets and blog posts generated during this period. We would like to build on Srinivasan’s point about the ‘diversity of voices and the big data agenda’. Diversity. Very important. Denskus and Esser analysed 108 blog entries and 3007 Summit-related tweets. How diverse were these, and what issues did Denskus and Esser face when collecting (and analysing) samples that needed to be representative of the group they reflected?

Firstly, we note that Denskus and Esser report performing searches in Topsy, a real-time search engine powered by the Social Web. They searched for the hashtag #MDG, which we assume returned English-only results. Searches for the hashtags #ODM (Spanish: Objetivos de Desarrollo del Milenio) and #OMD (French: Objectifs du millénaire pour le développement), for example, also return results, though admittedly in more modest volumes than #MDG. At the time of writing, the #MDG search returned 3107 results, whilst #ODM returned 75 and #OMD 915. Our search was not restricted to the 20-22 September 2010 date range.
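The point about language-specific hashtags can be sketched in Python. The sketch assumes a simple, hypothetical mapping from each hashtag to the language it signals (#MDG for English, #ODM for Spanish and #OMD for French, as glossed above) and uses invented sample posts. It is not how Topsy worked. The `shares` function converts the result counts reported above into proportions of their combined total. This rests on the simplifying assumption that the three result sets do not overlap.

```python
import re
from collections import Counter

# Hypothetical mapping from MDG hashtags (lower-cased) to the language they signal.
HASHTAG_LANG = {"#mdg": "en", "#odm": "es", "#omd": "fr"}


def tag_languages(text):
    """Return the set of languages whose MDG hashtag appears in a post."""
    tags = {t.lower() for t in re.findall(r"#\w+", text)}
    # Exact lookup only: variants such as #MDGs are deliberately not matched.
    return {HASHTAG_LANG[t] for t in tags if t in HASHTAG_LANG}


def count_by_language(posts):
    """Count posts per language; a bilingual post counts once for each language."""
    counts = Counter()
    for text in posts:
        counts.update(tag_languages(text))
    return counts


def shares(result_counts):
    """Proportion of each hashtag in the combined total (assumes no overlap)."""
    total = sum(result_counts.values())
    return {tag: n / total for tag, n in result_counts.items()}


# Result counts reported in the text (Topsy, at the time of writing, no date range).
REPORTED = {"#MDG": 3107, "#ODM": 75, "#OMD": 915}

if __name__ == "__main__":
    sample = ["Cumbre #ODM en Nueva York", "Sommet #OMD à New York", "#MDG summit opens"]
    print(count_by_language(sample))
    for tag, share in shares(REPORTED).items():
        print(f"{tag}: {share:.1%}")
```

When run as a script, it prints shares of about 75.8% (#MDG), 1.8% (#ODM) and 22.3% (#OMD). On these figures, roughly a quarter of the combined results carried a non-English hashtag. Because of the no-overlap assumption and the open date range, this figure is indicative only. An exact-match lookup also misses variants such as #MDGs, which would have to be added to the mapping by hand.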

Secondly, having established that MDG entries exist in other languages, we would like to raise, by way of example, one linguistic issue that we imagine could have arisen in a keyword analysis of monolingual English corpora of this scope. Denskus and Esser ‘conducted a content analysis in NVivo[i] in order to identify salient themes and issues. The analysis was based on both manual and computer coding’ (Denskus and Esser: 2013, p. 412).

Even with monolingual corpora, problems could arise when handling linguistic variables. Dialectal variation (e.g. Indian English, British English, American English) may affect content, as may the sociolinguistic competence of the tweeters and bloggers: native English speakers versus users of ELF (English as a Lingua Franca, used by non-native speakers). Eric Friginal and Jack Hardy point out differences in word repetition rate between native and non-native speakers of English (cf. Friginal and Hardy: 2013, p. 37). In the interest of robust data, we think it would be worthwhile to separate entries written by native English speakers from those written by non-native speakers and to conduct a parallel analysis. Admittedly, this entails more work, and the fluency level within the second group may vary substantially.
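The parallel analysis we suggest could start with something as simple as the Python sketch below. We do not reproduce Friginal and Hardy's own measure here. Instead, we assume the type-token ratio (distinct words divided by total words) as a common, if crude, proxy for lexical repetition: the lower the ratio, the more words are repeated. The group labels and sample posts are hypothetical. In practice, deciding whether an author is a native speaker is itself a difficult annotation task.

```python
import re
from collections import defaultdict


def tokens(text):
    """Lower-case word tokens (Python 3 regex word characters are Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def type_token_ratio(words):
    """Distinct words divided by total words; lower values mean more repetition."""
    return len(set(words)) / len(words) if words else 0.0


def ttr_by_group(posts):
    """Pool the tokens of each speaker group and compute one ratio per group.

    posts is an iterable of (group_label, text) pairs.
    """
    pooled = defaultdict(list)
    for group, text in posts:
        pooled[group].extend(tokens(text))
    return {group: type_token_ratio(words) for group, words in pooled.items()}


# Hypothetical posts under neutral labels "A" and "B" (e.g. native and ELF authors);
# invented for illustration, they say nothing about which group repeats more.
SAMPLE = [
    ("A", "Summit pledges sound bold but funding remains thin"),
    ("B", "The goals are good goals and the goals need the money"),
]

if __name__ == "__main__":
    print(ttr_by_group(SAMPLE))
```

Here group A scores 1.0 and group B about 0.64. The type-token ratio falls as texts get longer, so a fair comparison needs samples of similar size for each group, or a length-normalised variant of the measure.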

Denskus and Esser lament the dearth of social media entries during events such as the MDG Summit, and in their research they confirm that ‘emerging international development policies continue to be framed “offline”, with very limited input provided by social media’ (Denskus and Esser: 2013, p. 418). In response, we would like to endorse the view of Carolyn Heitmeyer and Murali Shanmugavelan. Presenting their paper on voice and citizenship at the Voice and Matter Glocal Conference on Communication for Development, they stated that ‘silence is a form of voice in the communication ecology’ (Heitmeyer and Shanmugavelan: 2014). This apparent silence, together with the lack of linguistic diversity (most of the blogs and microblogs were produced in English), might indicate the balance of power during the period analysed and the tone of development discourse in the international arena.




Denskus, T. and Esser, D. (2013) Social Media and Global Development Rituals: a content analysis of blogs and tweets on the 2010 MDG Summit. Third World Quarterly, Vol. 34 Issue 3.

Friginal, E. and Hardy, J. (2013) Corpus-based Sociolinguistics. New York and London: Routledge.

Heitmeyer, C. and Shanmugavelan, M. (2014) Critique of Voices. Paper session at the Voice and Matter Glocal Conference on Communication for Development, Roskilde University, 18 September 2014.

Srinivasan, S. (2014) ICT4D and Citizen Engagement. Panel at the Voice and Matter Glocal Conference on Communication for Development, Roskilde University, 18 September 2014.



[i] Unfortunately, we do not have access to NVivo, the tool that Denskus and Esser used to carry out the corpus analysis.
