Dec 16

Big Data in healthcare: Dr Jekyll or Mr Hyde? Part II

Part II: the bad Mr Hyde?

By Athanasia K.

In a previous post, I tried to show that Big Data applications could offer innovative and effective solutions for better healthcare services.

Despite their many promises, however, Big Data applications in healthcare are not a panacea; they can also have negative impacts and raise difficult challenges. These challenges have remained unresolved for years, despite the exponential technological development of the field.

In developed countries, one of the biggest concerns seems to be the protection of personal data, which is even more sensitive when it is medical data. Kaplan very rightly notes that “data can be sold and replicated anywhere and, once sold, may be used for good or ill”. Furthermore, as Lunshof et al have shown, with current information technology, privacy and confidentiality can no longer be guaranteed. Indeed, when we are dealing with the analysis of genetic samples, tracing data samples back to their donor is more than possible, as Malin et al showed some years ago. And even if there is a way to securely and totally anonymise the samples in a genetic database, doing so can limit the usefulness of the data, as shown by Budimir et al.

One more point of concern about the impact of Big Data in healthcare is that not all data are reliable. In fact, “people change their behavior and withhold information in order to protect their health information privacy” and “according to a 1999 survey, nearly one in six patients withheld information, provided inaccurate information, doctor-hopped, paid out of pocket instead of using insurance, or even avoided care”, as Kaplan notes. This has led experts to fear a GIGO effect (i.e., garbage in, garbage out) and to question the reliability of this methodology for vulnerable groups and poorer regions, as also analysed previously by Shahin. However, other scholars such as Alemayehu argue that “although much of real world data is sparse and a lot of the data is ‘dirty’, with proper analytical, computational and data management tools, it is still useful and can support health policy decision-making”.

Beyond this conundrum of confidentiality vs usefulness, the lack of transparency in the acquisition and ownership of the data raises further questions. It is common practice for Big Data vendor companies not to disclose their contracts on the acquisition of these data. As Kaplan notes, the legal framework in the United States and abroad “does not address health data ownership clearly; it is not clear who the owner should be … Furthermore, it is also not clear where those who sell data analytics services obtain the data, or how they might use them.” Moreover, as Kaplan continues, “vendors often consider their contracts intellectual property and do not reveal these and other contract provisions”.

But who benefits from this?

One could quite logically assume that the companies involved in Big Data gain some sort of profit from this business. But what about the rest? As Kaplan notes, “the cost [of data gathering] is passed on to patients and payers, whether private of confidential. These individuals gain little benefit from the aggregation and sale of data about them, and they may even be harmed by it”. Indeed, Kaplan continues, “patients can be harmed when data about them are violated: to deny employment, credit, insurance”.

This imbalance in the distribution of benefits is more evident when we look at developing countries. As Rudan et al note, nearly all biobanks (at least back in 2011) “have been developed to address the health problems relevant to the minority of people living in wealthy countries”. This has made developing countries reluctant to share their national data or to permit foreign researchers to access them, for fear of exploitation. One example, cited by Staunton and Moodley, illustrates this well: in “2007, Indonesia refused to share its H5N1 samples without a legally binding agreement which addressed among others, benefit arrangements and intellectual property rights”.

Apart from the imbalance of benefits, one more real concern regarding data collection in healthcare is the possible stigmatisation of patients if the confidentiality of their data is breached. This has been reflected even in court cases: as cited by Staunton and Moodley, in April 2010 Arizona State University paid $700,000 to the Havasupai Indian tribe to settle claims of an improper use of blood samples which stigmatised the tribe. This fear of stigmatisation is also reported in African studies, where research participants fear discrimination and possible stigmatisation of themselves and their families (see again the Staunton and Moodley paper). This aspect is harder to tackle, since cultural differences complicate the analysis. As Kaplan notes, “what is considered as very private, embarrassing, stigmatising, or posing grounds for discrimination varies among individuals and groups, and also differs between cultural backgrounds, places or time periods”.

But is it really such a black-and-white, Dr Jekyll vs Mr Hyde situation when we speak about Big Data for healthcare? In a forthcoming post, I’ll try to find a third way of looking at this.

Dec 16

Big Data in healthcare: Dr Jekyll or Mr Hyde? Part I

By Athanasia K.

Healthcare is probably the field of Big Data applications with the most heated discussions about the controversial impacts of this new technology. In fact, I would bet that there are not many other discourses in this field with such a clear contrast of benefits vs harm, individual vs common good, public vs private, data identification vs identity or, last but not least, the virtual vs the real, as Kaplan also observes. It’s like an old spaghetti western, where after a closer look at the plot we realise that what counts as “good” and justified versus “bad” and condemned behaviour really depends on the observer. I’ll start with the positive part:

Part I: the good Dr Jekyll

According to Ya-Ri Lee et al, “the field that shows the most promise among the application areas of Big Data is the medical sector”.

As Alemayehu lists in a recent paper, in the context of healthcare Big Data includes “not only electronic health records, claims data but also data captured through every conceivable medium, including Social Media, Internet search, wearable devices, video streams, and personal genomic services; it may also include data collected from randomized controlled clinical trials (particularly when dealing with high dimensional data, including genomic, laboratory, or imaging data)”. And all this vast information could be exploited in different applications.

In epidemiology, applications of Big Data analysis can indeed offer innovative approaches to investigating outbreaks of communicable diseases, adding useful tools for more effective and cost-efficient ways to prevent and manage outbreaks. One such example is a study by the Karolinska Institute and Columbia University in response to the cholera outbreak in Haiti, where researchers used data from mobile phone providers to get a better overview of population movements, and thus devise a better and more efficient action plan for managing the outbreak.

The positive impact of what Big Data has to offer is probably even more visible in the field of human genetics, which traditionally progressed rather slowly due to the nature of the experiments needed to prove the field’s theoretical models (most of the experiments could not be performed due to ethical concerns). However, following the sequencing of the human genome at the beginning of our century, a brave new world has opened up for human geneticists, since a vast volume of raw data is waiting to be analysed. Terms like “computational biology and medicine” have entered medical students’ curricula, and at least a basic knowledge of database and system analytics is now a must in the modern bioscience researcher’s armoury.

Genome-wide data analysis can indeed identify the causes of rare or other serious hereditary diseases, which would otherwise be difficult to identify and investigate because of their rarity. For example, the analysis of the Icelandic genetic database led to the identification of genes linked to human diseases, as cited by Kaplan.

Moreover, as Alemayehu notes, Big Data and the use of biobanks are very useful in drug development: they open revolutionary possibilities for developing more efficient and safer drugs, in the direction of completely personalised medicine and patient care. Furthermore, the use of smartphone applications (such as apps which measure blood pressure via a smartphone screen) opens up a new field of real-time monitoring of patients, as well as healthy persons, which in turn provides an unprecedented level of statistical information to researchers.

Apart from the science-related opportunities, however, Big Data applications in healthcare could also lead to a reduction in costs. As Kaplan notes: multiple healthcare professionals, payers, researchers, and commercial enterprises can access data and reduce costs by eliminating duplication of services and conducting research on effective care. In other words, Big Data is good for business too, since healthcare organisations may benefit financially by selling the medical records of their patients, at least in the US context described by Kaplan.

By browsing tech-related articles, blogs and webpages, one can find even more current or futuristic applications of Big Data which are meant to make our lives easier, safer and healthier.

But what’s the price of all this? I’ll try to analyse some of the negative aspects of Big Data applications in healthcare in my next post.