Big Data in healthcare: Dr Jekyll or Mr Hyde? Part II

Part II: the bad Mr Hyde?

By Athanasia K.

In a previous post, I tried to show that Big Data applications could offer innovative and effective solutions for better healthcare services.

Despite the many promises, however, Big Data applications in healthcare are not a panacea against all evils; they can also have negative impacts and raise challenging questions. And these challenges have remained unresolved for years now, despite the exponential technological development of the field.

In developed countries, one of the biggest concerns seems to be the protection of personal data, a concern that is even more acute when the data are medical. Kaplan very rightly notes that “data can be sold and replicated anywhere and, once sold, may be used for good or ill”. Furthermore, as Lunshof et al have shown, with current IT technology privacy and confidentiality can no longer be guaranteed. On the contrary, when we are dealing with the analysis of genetic samples, re-identifying data samples back to their donor is more than possible, as Malin et al showed some years ago. And even if there were a way to securely and totally anonymise the samples in a genetic database, this could limit the usefulness of the data, as shown by Budimir et al.

One more concern about the impact of Big Data in healthcare is that not all data are reliable. In fact, as Kaplan notes, “people change their behavior and withhold information in order to protect their health information privacy” and, “according to a 1999 survey, nearly one in six patients withheld information, provided inaccurate information, doctor-hopped, paid out of pocket instead of using insurance, or even avoided care”. This has led experts to fear a GIGO effect (i.e., garbage in, garbage out) and to question the reliability of this methodology for vulnerable groups and poorer regions, as also analysed previously by Shahin. However, other scholars such as Alemayehu argue that “although much of real world data is sparse and a lot of the data is ‘dirty’, with proper analytical, computational and data management tools, it is still useful and can support health policy decision-making”.

Adding to this conundrum of confidentiality versus usefulness, the lack of transparency in the acquisition and ownership of data raises further questions. It is common practice for Big Data vendor companies not to disclose their contracts on the acquisition of these data. As Kaplan notes, the legal framework in the United States and abroad “does not address health data ownership clearly; it is not clear who the owner should be … Furthermore, it is also not clear where those who sell data analytics services obtain the data, or how they might use them.” Moreover, as Kaplan continues, “vendors often consider their contracts intellectual property and do not reveal these and other contract provisions”.

But who benefits from this?

One could very logically assume that the companies involved in Big Data gain some sort of profit from this business. But what about everyone else? As Kaplan notes, “the cost [of data gathering] is passed on to patients and payers, whether private or confidential. These individuals gain little benefit from the aggregation and sale of data about them, and they may even be harmed by it”. Indeed, Kaplan continues, “patients can be harmed when data about them are violated: to deny employment, credit, insurance”.

This imbalance in the distribution of benefits is even more evident when we look at developing countries. As Rudan et al note, nearly all biobanks (at least back in 2011) “have been developed to address the health problems relevant to the minority of people living in wealthy countries”. This has made developing countries reluctant to share their national data or to let foreign researchers access them, for fear of exploitation. An illustrative example, cited by Staunton and Moodley, is that in “2007, Indonesia refused to share its H5N1 samples without a legally binding agreement which addressed among others, benefit arrangements and intellectual property rights”.

Apart from the imbalance of benefits, one more real concern regarding data collection in healthcare is the possible stigmatisation of patients if the confidentiality of their data is breached. This has been reflected even in court cases: as cited by Staunton and Moodley, in April 2010 Arizona State University paid $700,000 to the Havasupai Indian tribe as a settlement of claims that an improper use of blood samples had stigmatised the tribe. This fear of stigmatisation is also reported in African studies, where research participants fear discrimination and possible stigmatisation of themselves and their families (see again the Staunton and Moodley paper). This aspect is harder to tackle because of cultural differences: as Kaplan notes, “what is considered as very private, embarrassing, stigmatising, or posing grounds for discrimination varies among individuals and groups, and also differs between cultural backgrounds, places or time periods”.

But is it really such a black-and-white, Dr Jekyll vs Mr Hyde situation when we speak about Big Data in healthcare? In a forthcoming post, I’ll try to find a third way of looking at this.


  1. In this text I will share a series of thoughts that came to my mind after reading Athanasia’s four posts on this blog.
    Data mining and tools for building upon data are the next useful thing for journalists, and possibly for communicators. Such technologies are increasingly taught as courses in the curricula of US universities, and this trend is rapidly spreading all over the world.
    My prediction is that staying out of this world will be impossible for media people, at least as impossible as it turned out to be when some journalists wanted to stay out of social media at the very beginning of the 21st century. They were simply cut off from the new connections happening there and from the news first emerging in those spaces. They continued to be journalists, but less and less informed, because social media could cover more ground than the best agenda of a journalist.

    Athanasia opens up the discussion of how valid Big Data can be for a specific purpose, staying mainly in the health and biology domains in her successive posts.
    Big Data must usually pass a validation test, and it is up to every researcher’s or scientist’s ability and personal interest to work more on this part of their work in order to provide more trustworthy results. She points out a series of problems that can emerge, using various sources of recent bibliography. There is almost not a technological step forward without some kind of side effects, I would say.
    Mark Graham, writing on Big Data recently, asked “What sorts of things aren’t transferrable?” (Graham, 2016). That is another key question to follow in any Big Data research.
    Athanasia also asks about the balance between benefits for the individual user or a state and benefits for the Big Data holders, a point that is crucial to this discussion.

    Younger generations have tended not to give much consideration to privacy rights since the beginning of their digital lives, as they enter this area before they are mature enough to make such decisions. Millions of kids under 12 are exposed to the dangers of the web, knowing little of where their personal data will go… Although their parents should help them make choices and prevent them from being on the web unattended, it is not guaranteed that this will happen.
    Less literate adults may also fail to understand the negative effects of their actions, unless something very negative strikes them or their close environment. A recent paper suggests the application of “strong principles and strict rules” (Letouzé et al., 2012, p. 25) when working on such data.
    On the other hand, people can be assisted by Big Data in any domain, and even the poorest and most disconnected countries can get to know their citizens better and plan their future for a given social purpose.
    “Long term sustainability” (CRCSI, 2016) of access to data, and a longer duration of projects around Big Data, could be much more meaningful for social change.


    CRCSI (2016). Vanuatu Google Globe. Retrieved 18 December 2016.

    Graham, M. (2016, September 16). Symposium on Big Data and Human Development – Closing remarks. Retrieved 19 December 2016.

    Letouzé, E. et al. (2012, May). Big Data for Development: Challenges and Opportunities. New York: UN Global Pulse. Retrieved 29 September 2016.


    • Thank you, K. Tatakis, for your comment and the additional literature you recommend.

      I would definitely agree with you when you say that “there is almost not a technological step forward without some kind of side effects”. This is exactly what I wanted (or hoped) to be the main message of my first post on this blog.

      Thank you also for bringing up children and less literate persons, whom I had missed mentioning in my posts. This is an aspect where we would indeed need to “dig more” when we study the impacts of Big Data, because these groups do not necessarily realise the dangers arising from a possible misuse of their data (with child pornography, identity theft and cyber-bullying being some examples of such imminent dangers). Education is probably the key to this problem, but when we speak about Big Data analysis there is still a big gap between countries. This is because this high-tech field requires extensive (and expensive) infrastructure, which might indeed be available in US universities, as you mention, but is not necessarily there in developing countries. As Rudan et al also mention, “nearly all the progress made by the powerful new high-throughput research technologies was currently [in 2011] confined to wealthy countries and their [health] needs”, which illustrates the gap of opportunities in new technologies in general, with Big Data analysis being a big part of this inequality.