UN Global Pulse recently announced a partnership with Twitter (Tatevossian 2016):
“The Sustainable Development Goals are first and foremost about people, and Twitter’s unique data stream can help us truly take a real-time pulse on priorities and concerns – particularly in regions where social media use is common – to strengthen decision-making.”
Robert Kirkpatrick, Director of UN Global Pulse
It sounds so simple. But there are a lot of big questions here: about who is included in this data stream, about exclusion and the digital divide, and about the self-selection of the data set. Here, though, I’m focusing on the interpretation of this massive data set, and especially whether the content of tweets can really be used accurately enough to “strengthen decision-making”.
The content of tweets is difficult for computers to interpret, and a massive effort goes into producing algorithms to deal with the data – to filter out the irrelevant, incomprehensible and downright rude, to work with the many different language sets, ever-changing slang, and the abbreviations forced by the brevity of a tweet. Sentiment analysis (also known as opinion mining) has become an industry, with the main aims of offering data analysis for product marketing and for political polling.
140 characters certainly focuses the expression of opinion. But Twitter encourages conversations, and human interaction is not straightforward. Big data proponents and programmers don’t take this lightly. The academic community SemEval, which carries out evaluations of computational semantic analysis systems, is increasingly focusing on sentiment analysis of social media. And a major subset of sentiment analysis research focuses on artificial intelligence and recognition of sarcasm/irony, which poses a major problem for interpreting this type of data.
Image credit: “The Simpsons” © ™ 21st Century Fox and its related companies. All rights reserved. Available as promotional material.
‘Poe’s law’ states that, without a clear indicator of the author’s intent, parodies of extreme views on the internet will be mistaken by some readers or viewers as sincere expressions of the parodied views. There have been debates about introducing a specific font (backward italic has been suggested), or dedicated punctuation marks (such as the upside-down exclamation mark), in the hope that they would become universally adopted to indicate online sarcasm (Judkis 2011).
Many data programmers like to regard hashtagged terms as more secure statements of sentiment (e.g. Riloff et al. 2013). But their sarcastic use also has the potential to be misinterpreted on a major scale. In just one example, in 2011 #blamethemuslims trended globally. The ironic hashtag was posted when initial speculation that extremist Islamic groups were responsible for the Norway attacks turned out to be false – the perpetrator was a far-right, white man. Although many people responded with humorous tweets incorporating the hashtag, others who had no idea why it was trending were outraged, and a third group actually started making offensive racist comments (Bell 2011).
One research approach is to focus on a subset of tweets that specify the sarcastic nature of the comment (using e.g. #sarcasm, #irony, #haha, #not, and the ~ironic~ tilde), then to analyse the structure of the language using self-learning, pattern-seeking programs that look for recurring words and phrases that people tend to reference when they’re being sarcastic. Some researchers verify data by comparing different methods of identifying sarcasm – machine learning versus crowdsourced interpretation versus individual, human readers (Filatova 2012) – but even the humans score relatively low when reading ironic comments out of context. “Perhaps unsurprisingly, neither the human judges nor the machine learning techniques perform very well” (González-Ibáñez et al. 2011), bringing to mind The Big Bang Theory character Sheldon Cooper who, despite (or because of?) his genius-level IQ, famously struggles with sarcasm:
Image credit: “The Big Bang Theory”, CBS
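The hashtag-based approach described above can be sketched in a few lines of Python. This is only a rough illustration, not the method of any of the cited studies: the marker list, the invented sample tweets and all function names are assumptions made up for the example, and real systems use far larger data sets and proper machine learning rather than simple counting.

```python
import re
from collections import Counter

# Hypothetical marker list, based on the examples mentioned in the text.
SARCASM_MARKERS = {"#sarcasm", "#irony", "#haha", "#not"}


def tokenize(text):
    """Lowercase a tweet and split it into words and hashtags."""
    return re.findall(r"#?\w+", text.lower())


def split_by_marker(tweets):
    """Treat a tweet as sarcastic if its author tagged it as such.

    The marker is removed from the tokens so that any later analysis
    looks at the remaining wording rather than at the label itself.
    """
    sarcastic, other = [], []
    for tweet in tweets:
        tokens = tokenize(tweet)
        cleaned = [t for t in tokens if t not in SARCASM_MARKERS]
        if len(cleaned) < len(tokens):
            sarcastic.append(cleaned)
        else:
            other.append(cleaned)
    return sarcastic, other


def bigram_counts(token_lists):
    """Count adjacent word pairs across a list of tokenized tweets."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(zip(tokens, tokens[1:]))
    return counts


def recurring_sarcastic_bigrams(tweets, min_count=2):
    """Return word pairs that recur in tagged tweets but never in the rest."""
    sarcastic, other = split_by_marker(tweets)
    sarcastic_counts = bigram_counts(sarcastic)
    other_counts = bigram_counts(other)
    return sorted(
        pair for pair, n in sarcastic_counts.items()
        if n >= min_count and pair not in other_counts
    )


# Invented sample data, for illustration only.
sample = [
    "Oh great, another Monday #sarcasm",
    "Oh great, the train is late again #not",
    "Great goal by the home team",
    "The train is on time today",
]
print(recurring_sarcastic_bigrams(sample))
```

On this toy sample the only recurring pattern is the pair “oh great”, which hints at both the appeal and the weakness of the approach: it can only learn from people who chose to label their own jokes.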
One online sarcasm detector described by Dewey (2015) offers instant interpretation of your own phrasing, but perhaps also illustrates the difficulty.
I tried out Sentiment140, a sentiment analysis tool for tweets (which I found confusing, and may have misunderstood the purpose of – please do set me straight), by keying in “Brexit”, a topic guaranteed to elicit high levels of positive and negative sentiment here in the UK. The first listed tweet at that time, classed as pro-Brexit, was:
“A man on last night’s #bbcqt asked for ‘more positivity’ in relation to #Brexit. Happy to oblige. I’m positive it’ll be a?”
This is a classic use of sarcasm – the poster is positive it will be a … (fill in the blank – fiasco, or something less repeatable). But the combined use of “positivity”, “happy” and “positive” in the same tweet has resulted in a positive score.
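To see how easily this can happen, here is a deliberately naive word-counting scorer in Python. It is an assumption-laden sketch and not how Sentiment140 or any other real tool works: the two tiny word lists and the function name `naive_sentiment` are hypothetical, chosen only to show how surface vocabulary can outweigh intent.

```python
import re

# Tiny hypothetical lexicons; real tools rely on much larger word lists
# or on trained statistical models.
POSITIVE = {"positivity", "positive", "happy", "good", "great"}
NEGATIVE = {"fiasco", "disaster", "bad", "awful", "sad"}


def naive_sentiment(text):
    """Score a text by counting positive words minus negative words."""
    words = re.findall(r"[a-z]+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    return label, score


# The tweet quoted above, reproduced as it appears in the text.
tweet = (
    "A man on last night’s #bbcqt asked for ‘more positivity’ in relation "
    "to #Brexit. Happy to oblige. I’m positive it’ll be a?"
)
print(naive_sentiment(tweet))
```

The scorer counts “positivity”, “happy” and “positive”, finds no negative words, and labels the tweet positive, because the word that would have carried the real meaning is the one the poster left unsaid.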
More recent work (Bamman and Smith 2015) combines linguistic signals with contextual analysis – including both the poster’s and the likely readers’ historical topics, tweets and profile information. They claim 85% success in pinpointing sarcasm, a big improvement on previous attempts.
One study (Kreuz and Caucci 2007) attempted to develop linguistic patterns based on a literary data set drawn from Google Books, containing statements from fiction followed by the text “he/she said sarcastically” – raising slightly worrying questions about the quality of any literature that relies on such unsubtle signposting. For that’s perhaps one of the central problems – irony and wit depend on a lightness of touch that is negated by labelling a joke as a joke. Explaining a joke kills it. Deadpan delivery has always been central to comedy.
While researchers are quick to point out the limitations, more worrying is the prospect of a level of legitimacy being placed on data built on such shaky foundations – and, as UN Global Pulse suggests, basing public decision-making on the outcomes. It could be argued that the more urgent, humanitarian uses of big data shouldn’t be affected by this relatively trivial issue (Meier 2015) – people in adversity are not likely to tweet ironically. But there is a long human history of the use of humour to shore up resilience and empowerment, and as a way of asserting humanity and individuality in the face of chaos (Masters 2014; also see Joseph Heller’s Catch-22, M*A*S*H, and many others). The possibility of confusing positive and negative even in these circumstances can’t be discounted.
It’s easy enough to misinterpret the intention of a human being standing right next to us – even a friend, or even someone we love. Human communication is nuanced, complex and unpredictable. So what hope is there for a computer to get the joke?
Bamman, D. and Smith, N.A. (2015) ‘Contextualized sarcasm detection on Twitter’, presented at the 9th International AAAI Conference on Web and Social Media (ICWSM), 26–29 May, Oxford, UK.
Bell, M. (2011) ‘It’s easy to misconstrue trending topics on Twitter’, The Washington Post, 29 July.
Dewey, C. (2015) ‘Inside the surprisingly high-stakes quest to design a computer program that “gets” sarcasm online’, The Washington Post, 18 August.
Filatova, E. (2012) ‘Irony and sarcasm: Corpus generation and analysis using crowdsourcing’, in Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC12), Istanbul, 23–25 May.
González-Ibáñez, R., Muresan, S. and Wacholder, N. (2011) ‘Identifying sarcasm in Twitter: A closer look’, in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, Portland, OR, 19–24 June, pp. 581–586.
Judkis, M. (2011) ‘Should sarcasm have its own font style?’, The Washington Post, 13 December.
Kreuz, R.J. and Caucci, G.M. (2007) ‘Lexical influences on the perception of sarcasm’, in Proceedings of the Workshop on Computational Approaches to Figurative Language, Rochester, NY, 26 April, pp. 1–4.
Masters, T. (2014) ‘World War One exhibition explores role of black humour’, BBC News, 18 June.
Meier, P. (2015) Digital Humanitarians: How Big Data is Changing the Face of Humanitarian Response. New York: Routledge.
Riloff, E., Qadir, A., Surve, P., De Silva, L., Gilbert, N. and Huang, R. (2013) ‘Sarcasm as contrast between a positive sentiment and negative situation’, in Proceedings of the Conference on Empirical Methods in Natural Language Processing, Seattle, WA, 18–21 October, pp. 704–714.
Tatevossian, A.R. (2016) ‘Twitter and UN Global Pulse announce data partnership’, press release, UN Global Pulse, 23 September.