Infodemiology in the Battle Against Ebola: Mining the Web for Public Health Surveillance

“Infodemiology includes the analysis of queries from Internet search engines to predict disease outbreaks; monitoring people’s status updates on microblogs such as Twitter for syndromic surveillance; detecting and quantifying disparities in health information availability; identifying and monitoring of public health relevant publications on the Internet” (Eysenbach, 2009).

Internet data, especially search engine queries and social media postings, have shown promise in contributing to syndromic surveillance for several communicable diseases, including Ebola. Much has been written about the global response to the 2014 Ebola outbreak in West Africa. The “lessons learned” have often focused on the operational reasons why health systems faltered and why the humanitarian response came late, often taking donors and international aid agencies such as the World Health Organisation (WHO) to task for mishandling the crisis.

A systematic review published in 2014 by Nuti and colleagues highlighted that, in recent years, researchers have increasingly used online search data for a diverse range of health topics, with some successful applications in infectious disease surveillance, especially in countries with high Internet penetration (Nuti et al., 2014).

At the most general level, data can provide snapshots of the well-being of populations at high frequency, at high degrees of granularity, and from a wide range of angles, narrowing both time and knowledge gaps. Traditional disease surveillance relies on data obtained from doctors, hospitals or laboratories through formal reporting systems. This yields valid and accurate data about emerging outbreaks and the impact of control strategies such as vaccination or quarantine, but these systems are often not timely. As the Ebola outbreak showed, epidemics can rapidly spiral out of control. At the height of the outbreak, cases were doubling every two to three weeks. This exponential growth demonstrated that, when it comes to infectious diseases, speed of response matters.
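To make the cost of delay concrete, a doubling time can be turned into a rough projection. The sketch below is illustrative only: the starting count of 100 cases and the function names are assumptions rather than figures from the outbreak, and it assumes unchecked exponential growth, which real epidemics follow only approximately and only for limited periods.

```python
import math


def growth_rate(doubling_time_days: float) -> float:
    """Daily exponential growth rate implied by a doubling time: r = ln(2) / T_d."""
    return math.log(2) / doubling_time_days


def projected_cases(initial_cases: float, doubling_time_days: float, days: float) -> float:
    """Cases after `days` of unchecked growth: N(t) = N0 * 2 ** (t / T_d)."""
    return initial_cases * 2 ** (days / doubling_time_days)


if __name__ == "__main__":
    initial = 100  # hypothetical starting case count, not outbreak data
    for doubling_time in (14, 21):  # doubling every two and every three weeks
        rate = growth_rate(doubling_time)
        print(f"Doubling time {doubling_time} days (daily growth rate {rate:.3f}):")
        for delay in (0, 30, 60, 90):  # days elapsed before a response begins
            cases = projected_cases(initial, doubling_time, delay)
            print(f"  after {delay:>2} days: about {cases:,.0f} cases")
```

Under these assumptions, a response that arrives two doubling times late faces four times as many cases as one that arrives immediately, which is why even a few days of earlier warning from online data could matter.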

Once the Ebola outbreak had been brought under control, many sought to use data to understand how the virus had spread, in order to prevent future outbreaks. Detecting and acting on the epidemic at its earliest stages could have prevented many cases and potentially saved the 4600 people who died between June and October 2014. Across the board, experts agree that the earlier epidemics are detected, the easier they are to control and contain. In Guinea, it took nearly three months for health officials and their international partners to identify the Ebola virus as the causative agent. By that time, the virus was firmly entrenched and its spread was primed to explode.

The expanding reach and size of online social networks mean that the nature and speed of information sharing are changing (Meikle, 2016). As Taylor and Schroeder (2015) state, the growing use of ICT by individuals has resulted in the emission of large volumes of data of many kinds, blurring the line between personal and public communications. On the use of data for epidemiological purposes, the authors note that while data-driven health surveillance has been celebrated, for example in tracking the cholera outbreak after the 2010 Haiti earthquake, it also has limitations. In particular, data sets derived from digital communications technologies are not necessarily representative of all socio-economic groups.

The data ecosystem includes data generated and captured from various sources. Some data is generated actively, for example on social media platforms or applications, and some is generated passively as a by-product of other activities, for example searching for a hotel or flight online (Spratt & Baker, 2015). The bulk of this passive information is sometimes called “data exhaust”. One of the reasons social media is so valuable in tracking epidemics is that its popularity and sharing functions mean that the private becomes public very quickly. By using data-sharing technologies to track social media data accurately, it is potentially possible to use this information to design early warning systems and support outbreak response. The Ebola epidemic was, and still is, heavily discussed across social media platforms, with information about the disease and new developments in research into a cure or vaccine being widely shared. One such site already does this. By using feeds from social media, online news aggregators, and Twitter chats, along with official public health reports, the system is able to provide a comprehensive view of current global trends in infectious diseases.

Systems using a combination of modelling layers, including real-time social media data, generated much excitement during the Ebola outbreak, especially when a group of researchers, epidemiologists and software developers at Boston Children’s Hospital were able to pinpoint a “mystery hemorrhagic fever” spreading in Guinea nine days before WHO announced the outbreak.

While internet search data have been exploited both for public health surveillance and for analysing the public’s searching behaviour in reaction to infectious disease outbreaks (Al-Garadi et al., 2016), research on where the data emanates from geographically has also highlighted the digital divide between the global South and the global North. One example is worldwide social media and search engine traffic about Ebola. Data collected between September and November 2014 show that this traffic increased dramatically when news spread of the first case diagnosed in the US (Fung et al., 2014). Prior to this, the number of daily tweets containing the word “Ebola” was comparatively very low (Rodriguez-Morales et al., 2015).

Geotagged tweets mentioning “Ebola” over a two-month period in 2014

While social media presents an opportunity to enhance epidemic detection and control, it also played an adverse role during the Ebola outbreak, specifically in the rapid spread of misinformation and the creation of public panic (Towers et al., 2015), for example in countries such as the US, where there were only 4 confirmed cases.

On the one hand, social media platforms mean that information about outbreaks and treatments can reach remote areas within milliseconds, warning would-be victims of potential and ongoing outbreaks in real time (assuming they have the necessary technology and access). On the other hand, these platforms are highly unregulated in terms of content. In this unregulated environment, on platforms such as Twitter, only snippets of information are shared, which risks the spread of distorted, false or exaggerated narratives. In Nigeria, for example, rumoured preventatives and cures for Ebola rapidly gained traction online as people desperately searched for any method to counteract the untreatable disease. Various social media platforms were swamped with information from concerned Nigerians, but one message stood out. It suggested that drinking salt water could protect against or cure the disease, and it led not only to the hospitalisation of dozens of people but also to the deaths of at least two. Perhaps a more promising response that took hold online during the outbreak was the movement to counteract misinformation, with Twitter users starting hashtags such as #EbolaFacts and #FactsOnEbola in an attempt to compile accurate information from authoritative and expert sources across the web.

Nigeria’s Premium Times newspaper covering the “salt water cure” rumors

The emergence of social media has had a profound effect on many aspects of how society shares and receives information. For global health, social media data is potentially a double-edged sword, balancing the benefit of reaching a broader audience against the need for accurate information. When it comes to disease surveillance, it is important to weigh the benefits of social media visibility against the more urgent need for robust public health agendas and systems, without which healthcare professionals would not be able to respond effectively to outbreaks, no matter how much data is placed at their disposal.


Image credit: Getty Images


  • Al-Garadi, M.A., Khan, M.S., Varathan, K.D., Mujtaba, G. & Al-Kabsi, A.M. (2016). Using online social networks to track a pandemic: A systematic review. Journal of Biomedical Informatics, 62: 1-11.
  • Eysenbach, G. (2009). Infodemiology and Infoveillance: Framework for an Emerging Set of Public Health Informatics Methods to Analyze Search, Communication and Publication Behavior on the Internet. J Med Internet Res., 11(1): e11. Doi: 10.2196/jmir.1157
  • Fung, I.C., Tse, Z.H., Cheung, C., Miu, A.S. & Fu, K. (2014). Ebola and the social media. The Lancet, 384(9961): 2207. Doi: 10.1016/S0140-6736(14)62418-1
  • Meikle, G. (2016). Social Media: Communication, Sharing and Visibility. New York: Routledge.
  • Nuti, S., Wayda, B., Ranasinghe, I., Wang, S., Dreyer, R., Chen, S. & Murugiah, K. (2014). The Use of Google Trends in Health Care Research: A Systematic Review. PLoS One, 9(10). Doi: 10.1371/journal.pone.0109583
  • Read, R., Taithe, B. & Mac Ginty, R. (2016). Data hubris? Humanitarian information systems and the mirage of technology. Third World Quarterly, 37(8): 1314-1331. Doi: 10.1080/01436597.2015.1136208
  • Rodriguez-Morales, A.J., Castañeda-Hernández, D.M. & McGregor, A. (2015). What makes people talk about Ebola on social media? A retrospective analysis of Twitter use. Travel Medicine and Infectious Disease, 13(1): 100-101.
  • Spratt, S. & Baker, J. (2015). Big Data and International Development: Impacts, Scenarios and Policy Options. Brighton: IDS.
  • Taylor, L. & Schroeder, R. (2015). Is bigger better? The emergence of big data as a tool for international development policy. GeoJournal, 80: 503-528.
  • Towers, S., Afzal, S., Bernal, G., Bliss, N., Brown, S. & Espinoza, B. (2015). Mass Media and the Contagion of Fear: The Case of Ebola in America. PLoS One, 10(6): e0129179.

2 thoughts on “Infodemiology in the Battle Against Ebola: Mining the Web for Public Health Surveillance”

  1. Issues around data collection in the context of the 2014 Ebola outbreak in West Africa are still discussed. On 8 March 2017, a tweet was sent to advertise an article called “Ebola: A Big Data Disaster” written by Sean McDonald. On the basis of the Ebola crisis response in Liberia, this 2016 paper “calls for a critical discussion around the experimental nature of data modeling in emergency response due to mismanagement of information”.
    While McDonald highlights the significant legal risks posed by the use of Call Detail Records for public health response, others, including mobile data providers, emphasise how the use of mobile phones improved Ebola control.
    For his part, Chris Grundy, an expert in information management at the London School of Hygiene and Tropical Medicine, noted that many opportunities to collect spatial data were missed during the Ebola outbreak.
