October 15, 2017


Michael R. Kosorok, PhD
W.R. Kenan Jr. Distinguished Professor and chair of biostatistics
Professor of statistics and operations research
Co-principal investigator, Big Data to Knowledge training grant


The meaning of “big data” can be summarized this way: Data are much more complicated than they used to be.

They’re more complicated in several ways – in total amount (think of the billions of Facebook users whose data are now accessible online); in amount per person (with fitness trackers, we can gather vital statistics every few seconds for months, which yields an incredible amount of information for one individual); and in level of detail (brain imaging provides us with millions of pixels per image, all of which contain data ripe for analysis).

Big data also can move fast. There are many scenarios in which scientists need systems that can both gather data and continuously update the analysis of those data to create near-real-time monitoring. Imagine a program that screens emergency room data on a national level. As patients are admitted, the program monitors for any uptick in similar cases that might indicate an emerging trend. An abnormally high number of flu cases, for example, might lead researchers to identify a new influenza strain and activate an early response to it.
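To make the emergency room scenario concrete, here is a minimal sketch of how such a monitor might flag an uptick. It is an illustration only, not a description of any real surveillance system: the function name `flag_upticks`, the seven-day window, the three-standard-deviation threshold and the daily counts are all assumptions made up for this example.

```python
from collections import deque
from statistics import mean, stdev


def flag_upticks(daily_counts, window=7, threshold=3.0):
    """Return the indices of days whose count is unusually high.

    A day is flagged when its count exceeds the mean of the previous
    `window` days by more than `threshold` standard deviations.
    Both parameters are illustrative defaults, not validated settings.
    """
    history = deque(maxlen=window)  # trailing baseline of recent days
    flagged = []
    for day, count in enumerate(daily_counts):
        if len(history) == window:
            baseline = mean(history)
            # Use a floor of 1.0 so a perfectly flat baseline
            # does not flag every tiny fluctuation.
            spread = max(stdev(history), 1.0)
            if count > baseline + threshold * spread:
                flagged.append(day)
        history.append(count)
    return flagged


# Made-up daily counts of flu-like visits; day 10 is a simulated spike.
counts = [20, 22, 19, 21, 23, 20, 22, 21, 20, 22, 45, 24]
print(flag_upticks(counts))  # prints [10]
```

Because the baseline updates with every new day, the same loop could in principle run continuously as records arrive. A production system would also need to account for seasonality, reporting delays and differences between regions.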

Big data hold nearly infinite possibilities for public health.

Q: What are some of the most promising developments in big data?

A: mHealth (mobile health) is a new arena for public health centered on digital interactions and the collection of remote sensing data. Here at the Gillings School, I work with Dr. Beth Mayer-Davis, Cary C. Boshamer Distinguished Professor and chair of the Department of Nutrition, to develop a wearable device that can monitor patients with Type 1 diabetes and intervene immediately to maximize their health and well-being. This device conceivably will collect data on metrics such as blood glucose level, heart rate and sleep quality.

By storing and continuously analyzing these data in the cloud, the device could track progress on fitness goals, warn a patient of danger (e.g., if they are on the verge of hypoglycemia and need a snack) or suggest strategies for healthful living, such as prompting a walk in a nearby park, based on a user’s GPS location.

As co-principal investigator for a National Institutes of Health-funded grant called “Big Data to Knowledge,”* I also have a view of how big data are being used to solve various biomedical problems. Our training program gives doctoral students a fundamental grounding in big data – they don’t become advanced experts, but they learn how to approach the data so they can lead collaborations addressing issues such as the opioid epidemic or schizophrenia. This training program involves more than 20 departments across the University – an example of how the best big data work is interdisciplinary.

Big data are complicated, can move fast and hold nearly infinite possibilities for public health.


Ambitious projects require many types of experts. In past years, a researcher might deliver his or her study data to a single analyst. The analyst would return the results, and that was that. The kinds of complex questions we’re asking today require working closely with multiple experts over time to approach problems in creative ways.

One project might involve a data scientist, a computer scientist and a biostatistician (to build data collection and analysis programs), experts in a specific domain, such as nutrition (to design the study or intervention), experts in user interfaces (to ensure the final product works well for patients), and experts in policy and implementation science (to build political will for an intervention and make it happen).

We now have the capacity to collect staggering amounts of information. Our capacity to comprehend the data and apply them for the public good is limited only by our imagination.

Q: What should we keep in mind as we participate in the big data revolution?

A: We don’t use the data we already have particularly well. The trajectory of big data will be defined by how quickly we learn to design better data collection systems that give us the right information, as well as analytics programs that transform those raw data into something useful. In our biostatistics department, we recognize the tremendous potential for creating new artificial intelligence (AI) tools that will change the face of public health research.

Many new developments involve something called “deep learning,” a very impressive type of AI that analyzes data to solve incredibly difficult problems. For example, a program exists – developed by Andre Esteva, doctoral student at Stanford University, and colleagues – that can access photos of individuals online and identify with great accuracy whether they have a particular type of skin cancer.**

When creating programs such as this, which work with people’s medical information, we always have to keep in mind the ethics of data sharing. As researchers, we shouldn’t invade people’s privacy or make it easy for others to do so.

We also must be committed to scientific integrity. Anyone working with big data should understand statistical inference issues and avoid big-data hubris. The effects of bad design, bias and uncertainty can all be magnified when more data points are involved.
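One way to see why more data do not cure a flawed design is a small simulation. The sketch below is purely illustrative: the population (normal, with mean 50 and standard deviation 10), the cutoff of 45 and the helper names `biased_sample` and `summarize` are all assumptions invented for this example. It draws values from the population but, because of a hypothetical collection flaw, records only those above the cutoff.

```python
import random
from statistics import mean, stdev

TRUE_MEAN = 50.0  # hypothetical population mean, chosen for illustration


def biased_sample(n_draws, seed=0):
    """Draw from Normal(50, 10) but keep only values above 45.

    The cutoff stands in for a flawed collection design that
    systematically misses part of the population.
    """
    rng = random.Random(seed)
    draws = (rng.gauss(TRUE_MEAN, 10.0) for _ in range(n_draws))
    return [x for x in draws if x > 45.0]


def summarize(sample):
    """Return the sample mean and its estimated standard error."""
    return mean(sample), stdev(sample) / len(sample) ** 0.5


for n in (100, 100_000):
    estimate, std_err = summarize(biased_sample(n))
    print(f"draws={n:>7}  estimate={estimate:6.2f}  "
          f"std_err={std_err:5.3f}  error={estimate - TRUE_MEAN:5.2f}")
```

In this setup the standard error shrinks as the number of draws grows, but the estimate stays several points above the true mean. The larger data set therefore looks more precise while being just as wrong, which is one form the "big-data hubris" described above can take.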

As we continue improving the technical tools and study designs we use to work with big data, we must remember not to put everything into one box. Big data represent a continually evolving and fast-moving new area. If we can avoid constraining it too early, I think it will surprise us with the places it can take future research.

*Kosorok’s co-principal investigator on the Big Data to Knowledge grant is M. Gregory Forest, PhD, Grant Dahlstrom Distinguished Professor of Mathematics, director of the Carolina Center for Interdisciplinary Applied Mathematics and associate chair of the Department of Applied Physical Sciences at UNC-Chapel Hill.

**See Esteva et al. (2017). Dermatologist-level classification of skin cancer with deep neural networks. Nature, 542, 115-118.



Carolina Public Health is a publication of the University of North Carolina at Chapel Hill Gillings School of Global Public Health. To view previous issues, please visit sph.unc.edu/cph.