There is an emerging class of data so pervasive and prolifically used that, according to the Independent Ethics Committee in the US, it merits new regulatory standards. The pharmaceutical industry, scholarly researchers, tech companies, and others collect this new kind of data, mining it and selling it in massive quantities. Karger Publishers defines this nebulous data class as “objective, quantifiable physiological and behavioral data that are collected and measured by means of digital devices such as portables, wearables, implantables, or digestibles.” In short, this is health and healthcare data being used to influence, comprehend, and even predict health-related outcomes in the broadest sense.
Many refer to certain vast data sets within this incredibly broad definition as digital biomarkers, and some companies identify them for reasons you may not have considered. Carlos Rodarte, founder and managing director of the digital health consultancy Volar Health, says that “Objective data captured through digital devices can be wide-ranging and not traditionally viewed as health information. However, the intentionality of analysis can make many types of data fall into the realm of health or healthcare.” In other words, the data some companies are collecting might not even seem health-related on the surface.
We’re talking about companies devising ways to determine your mood from how you tap and swipe your smartphone. Another example of what Rodarte is describing might be studying whether you are more or less likely to take your medication depending on how engaged you are on Facebook. We could even be talking about data gathered by a smart cuff that measures blood pressure and could potentially predict whether you’ll develop a comorbidity. Where things really get murky, though, is with so-called wild data (wild in the feral, untamed sense): large data sets gathered with no real study purpose but retrospectively mined for significant correlations.
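To make the tap-and-swipe example concrete, here is a minimal sketch of the kind of low-level signal such systems might start from. Everything here is illustrative, not any company’s actual pipeline: the feature names and thresholds are invented, and a real mood-inference model would feed features like these into a trained classifier.

```python
from statistics import mean, stdev

def tap_features(timestamps):
    """Derive simple interaction features from a list of tap times (seconds).

    Inter-tap intervals and their variability are the sort of behavioral
    signals a mood-inference model might consume. Names are illustrative.
    """
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return {
        "mean_interval": mean(intervals),
        "interval_jitter": stdev(intervals) if len(intervals) > 1 else 0.0,
        "tap_rate": len(timestamps) / (timestamps[-1] - timestamps[0]),
    }

# A slow, erratic tapping session versus a quick, steady one:
slow = tap_features([0.0, 1.2, 2.9, 4.1, 6.0])
fast = tap_features([0.0, 0.3, 0.6, 0.9, 1.2])
print(slow["mean_interval"] > fast["mean_interval"])  # True
```

The point is that nothing in the raw input looks like health data; it becomes health data only through the intentionality of the analysis, exactly as Rodarte describes.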
Rock Health reports that uses of wild data range from delivering targeted interventions to optimizing recruitment for human studies to individualizing medical policy plans. Wild data gets collected by all kinds of companies and then sold to whoever deems it not only valuable but germane to their bottom line. All data is technically valuable, which is where companies like Cambridge Analytica come into the picture. That’s the British firm under dual investigations in the UK and the US over its influence on the 2016 US presidential election and the Brexit vote; whistleblowers like Christopher Wylie, as well as thorough investigations, have revealed that the firm opened an American shell company with money from US billionaire and political donor Robert Mercer.
According to the whistleblowers who have come forward, Cambridge Analytica relied on all kinds of data, but the most critical source for its algorithms was personal data mined from 50 million Facebook accounts. The same techniques Cambridge Analytica used, however, are actually industry standard in both the UK and the US in the patient marketing sector when it comes to personal health information. The only difference is the sensitivity of the data involved. HIPAA compliance is a must and fairly cut and dried for industry entities, yet the regulation has a few widely misunderstood areas dealing with specific patient and consumer data.
Companies can currently target you based on your personal health information with pinpoint accuracy, to the point of violating your privacy without violating the law. They do this by triangulating first-party identifiable data, third-party advertising data, and social media activity. Remember: this is a time when even data seemingly unrelated to your health can be deemed health data, and this is one of the main reasons why. Much like the people in your own social network, data sets are all only six degrees separated from one another, so to speak.
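A toy sketch of what that triangulation can look like in practice, assuming nothing beyond shared quasi-identifiers: a de-identified health record and an identified ad-tech profile that happen to agree on ZIP code and birth year. Every record and field name below is invented for illustration.

```python
# Hypothetical illustration of "triangulation": linking a de-identified
# health record to an identified ad-tech profile via shared quasi-identifiers.

health_records = [  # first-party data, no names attached
    {"zip": "92093", "birth_year": 1984, "condition": "hypertension"},
]

ad_profiles = [  # third-party advertising data, identified
    {"zip": "92093", "birth_year": 1984, "name": "J. Doe",
     "social_handle": "@jdoe"},
]

def triangulate(health, ads):
    """Link records that share the same quasi-identifiers."""
    matches = []
    for h in health:
        for a in ads:
            if (h["zip"], h["birth_year"]) == (a["zip"], a["birth_year"]):
                matches.append({**a, **h})  # a re-identified health record
    return matches

linked = triangulate(health_records, ad_profiles)
print(linked[0]["name"], "->", linked[0]["condition"])  # J. Doe -> hypertension
```

Neither data set alone violates anyone’s privacy; the join is what re-identifies the person, which is why each individual collection can stay on the right side of the law.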
This is what Dr. Camille Nebeker’s study grappled with back in 2013, when she looked at digitally enabled behavioral medicine. She relied on informed consent, willing participants, wearables, and data controls. The Independent Ethics Committee still deemed the study too risky. You might think that was out of concern for the privacy and safety of participants, but the committee was actually more concerned about bystanders: the people participants interacted with or lived around on a daily basis. The wearables included a camera, and nonconsenting people would be caught on those cameras. There was a list of other similar concerns, but this illustrates how the new data class demands new standards, which currently don’t exist.
“The questions became, ‘What should we do with the 30,000 images we capture from every person during the study?’ and ‘What do we do with the GPS data?’” Nebeker recounts. After the data has served its purpose, what other purposes might it serve? In response, Nebeker established the Connected and Open Research Ethics initiative at the University of California, San Diego, sponsored by the Robert Wood Johnson Foundation. It’s an attempt to address the absence of standards by bringing together ethics board members, researchers, stakeholders, and others to develop best practices for the use of pervasive sensing, mobile imaging, geolocation tracking, and social media in research.
Data collection, after all, cannot and should not be stopped but, rather, carefully furthered from this point forward. Many important findings have come from data collection at this scale, despite the egregious oversights and overreaches seen thus far. For example, Dr. Michelle Odlum and Sunmoo Yoon published a study only a couple of weeks ago finding that deficiencies in health information fueled public frustration and fear around Ebola specifically, sometimes hindering the meeting of health information needs relative to global priorities for Ebola. The study explored public response to Ebola over an eight-month span using natural language processing to analyze more than 150,000 tweets mentioning Ebola.
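The flavor of that analysis can be sketched in a few lines. This is a deliberately crude stand-in, not the method Odlum and Yoon actually used: the word lists and tweets below are invented, and the published study applied far richer natural language processing to its 150,000-plus tweets.

```python
import re
from collections import Counter

# Invented term lists standing in for real fear/frustration lexicons.
FEAR_TERMS = {"scared", "afraid", "panic", "terrified"}
FRUSTRATION_TERMS = {"confused", "why", "nobody", "unanswered"}

tweets = [  # invented examples
    "So scared of Ebola and nobody is explaining the risks",
    "Why are my Ebola questions still unanswered?",
    "CDC posted new Ebola guidance today",
]

def tally(texts):
    """Count how many texts contain fear- or frustration-related terms."""
    counts = Counter()
    for text in texts:
        words = set(re.findall(r"[a-z']+", text.lower()))
        if words & FEAR_TERMS:
            counts["fear"] += 1
        if words & FRUSTRATION_TERMS:
            counts["frustration"] += 1
    return counts

print(tally(tweets))  # frustration: 2, fear: 1
```

Scaled up to hundreds of thousands of tweets and tracked over months, counts like these are how researchers can spot a gap between what the public is asking and what health agencies are answering.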