About Our Research

The primary research goal of the Department of Biostatistics is to develop new statistical methodology to address issues in public health and the biomedical sciences. Investigators might also apply existing methodologies to new application areas in innovative ways. Because of the dual focus on statistical methodology and its application in biomedical and health-related areas, virtually all faculty members in the department have research interests both in the development of statistical methodology and in the application of statistics in applied research. Applied research projects generally involve other faculty in the Gillings School of Global Public Health, investigators at the Medical School or even scientists at outside institutes such as the Research Triangle Institute, the National Institute of Statistical Sciences or the National Institute of Environmental Health Sciences.

Scientific discoveries made by the Gillings School’s Dr. Michael Kosorok, left, were put into practice by pediatric pulmonologist Dr. George Retsch-Bogart. The result was better hospital care and better health for children with cystic fibrosis.


Faculty Areas of Expertise and Interest


Bayesian Methodology

There are two fundamental paradigms of statistical inference: frequentist and Bayesian. In Bayesian inference, one expresses uncertainty about the parameters of a statistical model through a probability distribution, called the prior distribution. Then, after the data are collected, the prior distribution is updated using Bayes’ theorem, yielding the posterior distribution of the unknown parameter. Statistical inference about the unknown parameter is then carried out using the posterior distribution. There are many advantages of Bayesian methods for statistical inference, including (i) computational advantages in fitting complex models, (ii) the ability to make exact inference for any sample size without resorting to asymptotic calculations, (iii) the natural incorporation of prior information, and (iv) flexibility in model building and data analysis tools. Bayesian methodology is employed in a wide range of biomedical applications, including genomics, clinical trials, medical imaging, environmental health, infectious diseases, and cancer research.
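
As a minimal sketch of the prior-to-posterior update described above, the following Python example uses the conjugate Beta-binomial model, where the posterior has a closed form. The function names, the uniform Beta(1, 1) prior, and the counts are illustrative assumptions, not taken from any departmental project.

```python
from fractions import Fraction

def beta_binomial_update(a, b, successes, failures):
    """Posterior Beta parameters after binomial data under a Beta(a, b) prior.

    By Bayes' theorem, a Beta(a, b) prior combined with a binomial
    likelihood gives a Beta(a + successes, b + failures) posterior.
    """
    return a + successes, b + failures

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution, kept exact with Fraction."""
    return Fraction(a, a + b)

# Hypothetical data: 7 responders and 3 non-responders, uniform prior.
prior = (1, 1)
posterior = beta_binomial_update(*prior, successes=7, failures=3)

print("posterior parameters:", posterior)
print("prior mean:", beta_mean(*prior))
print("posterior mean:", beta_mean(*posterior))
```

With these hypothetical counts the posterior is Beta(8, 4), so the posterior mean moves from the prior value of 1/2 toward the observed response rate.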

Faculty: Ibrahim, Ivanova, Stewart, Zhu, Zou

Categorical Data Analysis

Categorical data analysis involves the statistical analysis of a response variable described through discrete categories. These include binary, ordinal, and nominal scales of measurement, as well as discrete counts and grouped time-to-event data structures. Analysis of categorical data is typically performed through contingency tables or via statistical modeling (e.g., logistic or Poisson regression).
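
To illustrate the contingency-table approach, here is a small Python sketch that computes the Pearson chi-square statistic and the odds ratio for a 2x2 table. The table values and function names are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 table of observed counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def odds_ratio(table):
    """Cross-product odds ratio for a 2x2 table (assumes no zero cells)."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)

# Hypothetical counts: rows are exposure groups, columns are outcome yes/no.
table = [[20, 10], [10, 20]]
chi2 = chi_square_2x2(table)
ratio = odds_ratio(table)
print(round(chi2, 3), ratio)
```

Here every expected count is 15, so the statistic is 4 x 25/15, roughly 6.67 on one degree of freedom, and the odds ratio is 4.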

Faculty: Preisser, Qaqish, Schwartz, Stewart, Zhou

Clinical Trials

A clinical trial is a medical research study in humans designed to test the efficacy and safety of an intervention – a treatment or preventive measure. “Phase I” trials represent the first use of a new treatment in humans, often healthy volunteers. “Phase II” trials are the first tests of how the treatment works in people with the disease or condition of interest relative to a comparison (control) group. The treatment given to the control group is either a placebo (an inert substance or some other form of sham treatment designed to be indistinguishable from the new treatment) or another active treatment. “Phase III” trials are large-scale trials of the efficacy and safety of the new treatment against a control treatment. Faculty in the department are working on statistical methodology for clinical trials as well as collaborating in the design, conduct and analysis of trials. Trials range in size from small Phase I-II trials in the Lineberger Comprehensive Cancer Center and the School of Medicine to large multi-center national and international trials coordinated by the Collaborative Studies Coordinating Center.

Faculty: Couper, Hudgens, Ibrahim, Ivanova, Kosorok, F.C. Lin, Monaco, Qaqish, Schwartz, Stewart, Zeng, Zhou

Empirical Processes

Empirical processes are functions computed from data. In biostatistics, such quantities arise frequently in many areas, including brain imaging, genomics, and clinical trials. Two common examples are the empirical cumulative distribution function and the Kaplan-Meier survival curve. Many statistical estimates can be shown to be solutions of estimating equations, and estimating equations are also empirical processes. The study of empirical processes consists of the development and understanding of foundational principles from probability, analysis and statistics that can apply to a broad array of related complex and high dimensional scientific settings. Empirical processes are especially useful in biostatistics for semiparametric modeling and inference.
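
The empirical cumulative distribution function mentioned above is perhaps the simplest function computed from data. The following Python sketch builds it for a small, made-up sample; the helper name `ecdf` is an assumption for illustration.

```python
from bisect import bisect_right

def ecdf(sample):
    """Return the empirical CDF of a sample as a callable.

    F(x) is the fraction of observations less than or equal to x.
    """
    ordered = sorted(sample)
    n = len(ordered)

    def F(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(ordered, x) / n

    return F

# Hypothetical measurements.
F = ecdf([3.1, 1.2, 2.5, 2.5, 4.0])
print(F(0.0), F(2.5), F(4.0))
```

Three of the five values are at or below 2.5, so F(2.5) is 0.6. The function is a step function that jumps at each observed value.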

Faculty: Kosorok, Zhou, Zhu

Generalized Linear Models

Generalized linear models provide a flexible generalization of linear models. The use of generalized linear models provides the flexibility to model responses of various types, including continuous responses (as in multiple linear regression), dichotomous responses (as in logistic regression), and counts (as in Poisson regression). More specifically, a generalized linear model relates the linear predictor to the mean of the response variable via a link function and allows the variance of each response to be a function of its mean value, thus relaxing the often inappropriate variance homogeneity assumption. Available software allows for the building of valid and precise generalized linear models involving multiple predictors and for the implementation of regression diagnostic procedures.
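
The roles of the link function and the mean-variance relationship can be sketched in a few lines of Python. The example below uses the canonical links for three common families; the `FAMILIES` table, the function name, and the coefficient values are illustrative assumptions, and the variance functions are shown up to a dispersion factor.

```python
import math

# Inverse link maps the linear predictor to the mean; the variance
# function gives the response variance as a function of that mean.
FAMILIES = {
    "gaussian": {"inverse_link": lambda eta: eta,
                 "variance": lambda mu: 1.0},
    "binomial": {"inverse_link": lambda eta: 1 / (1 + math.exp(-eta)),
                 "variance": lambda mu: mu * (1 - mu)},
    "poisson": {"inverse_link": math.exp,
                "variance": lambda mu: mu},
}

def glm_mean_and_variance(coefs, x, family):
    """Mean and variance function value for one covariate vector."""
    eta = sum(b * xi for b, xi in zip(coefs, x))  # linear predictor
    fam = FAMILIES[family]
    mu = fam["inverse_link"](eta)
    return mu, fam["variance"](mu)

# Hypothetical coefficients (intercept, slope) and covariates (1, x).
logistic = glm_mean_and_variance([0.0, 0.0], [1.0, 2.0], "binomial")
poisson = glm_mean_and_variance([0.0, 0.0], [1.0, 2.0], "poisson")
print(logistic, poisson)
```

Only the Gaussian family has a variance that does not depend on the mean, which is the homogeneity assumption the other families relax.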

Faculty: Ibrahim, Q. Li, Liu, Preisser, Qaqish, Sotres-Alvarez, Stewart, Truong, Zhou, Zhu

Longitudinal or Dependent Data

Correlated data arise in many public health settings, such as when multiple measurements are taken on a single subject or on subjects within a family. Longitudinal data are a special case of correlated data in which outcomes from each subject are measured over time. Methods for analysis of longitudinal data are a major research area in the department, with considerable overlap with many other research areas, including generalized linear models, clinical trials, survival analysis, Bayesian methods, survey sampling, statistical genetics and computational biology, missing data, nonparametric methods, and imaging.

Faculty: Cai, Ibrahim, Preisser, Qaqish, Sotres-Alvarez, Stewart, Tan, Truong, Zeng, Zhou, Zhu

Medical Imaging

Medical imaging research centers on improving the display and analysis of images collected with a wide array of methods, including traditional x-rays, CT, MRI, PET, and ultrasound, among others. Such images are used for diagnosis, treatment planning, guidance, and verification. The tremendous growth in the use and study of medical imaging continues. Similar growth in computing power interacts with advances in basic sciences and computing methods to drive these advances. Biostatisticians play key roles in the research. Collaborations with members of the departments of Radiology, Radiation Oncology, Neurosurgery, Psychiatry, Biomedical Engineering, and Computer Science involve study design, analysis, and sample size choice, as well as data reduction and representations. The research includes both large multicenter trials and small laboratory studies. The medical research focuses on improving image representation and processing, as well as improving display. Over seventy faculty, students and staff on the Carolina campus identify themselves as involved in the Medical Display and Analysis Group.

Faculty: Marron, Truong, Zeng, Zhu

Missing or Mismeasured Data

Missing data often arise in biomedical research when data are incomplete for one or more subjects. Specific examples include unavailability of covariate measurements, survey nonresponse, study subjects failing to report to a clinic for monthly evaluations, respondents refusing to answer certain items on a questionnaire, or loss of data. Intuitively, when the subjects with missing values differ systematically from those with complete data with respect to the outcome of interest, results from a traditional data analysis omitting the missing cases, called a complete case analysis, may no longer be valid. As a result, much research has been devoted to missing data, and many statistical methods have been developed for carrying out appropriate statistical inference in the presence of missing data. Four common methodologies for dealing with missing data are maximum likelihood (ML), multiple imputation (MI), fully Bayesian (FB) methods, and methods based on weighted estimating equations (WEE). This topic remains a very active area of research.
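
To make the contrast between a complete case analysis and a weighted approach concrete, the Python sketch below estimates a mean in two ways: by averaging the observed values, and by weighting each observed value by the inverse of its probability of being observed, which is the simplest form of a weighted estimating equation. The data and the assumption that these probabilities are known are purely illustrative; in practice they would usually have to be estimated.

```python
def complete_case_mean(observed):
    """Unweighted mean of the observed outcomes only."""
    values = [y for y, _ in observed]
    return sum(values) / len(values)

def ipw_mean(observed):
    """Inverse-probability-weighted mean of the observed outcomes.

    Each observed outcome is weighted by 1 / P(observed), so subjects
    who resemble those likely to be missing count for more.
    """
    weighted_sum = sum(y / p for y, p in observed)
    total_weight = sum(1 / p for _, p in observed)
    return weighted_sum / total_weight

# Hypothetical observed subjects: (outcome, probability of being observed).
# Subjects with the higher outcome are assumed to be observed half the time.
observed = [(10.0, 1.0), (10.0, 1.0), (20.0, 0.5)]

cc = complete_case_mean(observed)
ipw = ipw_mean(observed)
print(round(cc, 2), ipw)
```

In this toy setting the complete case mean is about 13.33, while the weighted mean is 15.0, because the under-observed group is given twice the weight.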

Faculty: Cai, Ibrahim, Preisser, Stewart, Zeng, Zhou

Semiparametric or Nonparametric Methodology

Semiparametric and nonparametric models provide much more flexible structures for modeling complex data than parametric models do. Nonparametric models leave data distributions completely unspecified, while semiparametric models parameterize only the components of interest. They are widely used in survival analysis, longitudinal data analysis, and economic data analysis. Estimation methods for these models include kernel estimation, spline smoothing, local polynomials, nonparametric maximum likelihood estimation, and many other techniques. Empirical process theory and semiparametric efficiency theory have become standard mathematical tools for statistical inference in these models.

Faculty: Bair, Cai, Hudgens, Kosorok, Q. Li, Liu, Truong, Zeng, Zhou, Zhu, Zou

Statistical Genetics and Computational Biology

Statistical genetics and computational biology are attracting increasing interest in the biostatistical community. Traditionally, the field of statistical genetics has dealt with the study of inherited genetic variation and its effects on disease risk, while computational biology has focused on problems such as extracting information from genetic sequence data and modeling physical, chemical, and evolutionary properties of DNA and proteins. However, recent advances in experimental techniques are expanding both of these fields, while at the same time producing a convergence and synthesis of statistical approaches to biological systems. These new technologies, which include high throughput microarrays for expression and genotyping, tiling arrays, and sequencing techniques, generate an enormous amount of data. These data, combined with an increasing knowledge of biological systems, present a variety of interesting and challenging scientific questions for biostatistical researchers. The Department of Biostatistics at UNC collaborates with other departments/schools at UNC to address various problems in genetics and computational biology. Research in the department has resulted in new statistical theory and methodology, as well as new algorithms and popular software. Specific interests include methods for DNA/protein sequence alignment and motif finding, analyzing protein-protein and protein-DNA interactions, studies on nucleosome location and regulation, analysis of microarray gene expression and genomic pathways, cancer genomics, analysis of genome-wide association datasets, and systems biology approaches to incorporating different types of biological data.

Faculty: Ibrahim, Q. Li, Lin, Liu, Truong, Zou

Survey Sampling

Survey sampling is relevant to public health research when one wishes to learn something (e.g., rates or determinants of knowledge about HIV-AIDS transmission) about a large well-defined population of humans (e.g., all current non-institutionalized adult residents of North Carolina). Good samples in this research setting are those where random selection methods are strategically used to capture the population’s diversity in a way that all of the study’s information needs can be met. Appropriate estimation methods must also be used to extract findings from the sample data. Recent research in this area has focused on ways to sample rare and elusive population subgroups (e.g., race-ethnic groups and the homeless) more effectively and on improving the statistical quality of sample-derived estimates, especially from samples obtained from more complex designs.

Faculty: Zeng, Zhou

Survival Analysis

Survival analysis is a popular data analysis approach for handling possibly censored time-to-event data. Censoring refers to the incomplete observation of the failure time. Censored time-to-event data arise often in biomedical research. The Kaplan-Meier estimator, the logrank test, and the Cox regression model are some commonly used tools in survival analysis.
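
As a sketch of how censoring enters the Kaplan-Meier estimator, the Python example below computes the product-limit estimate for a tiny, made-up data set. The function name and data are illustrative assumptions; subjects censored at the same time as an event are treated as still at risk at that time, which is the usual convention.

```python
from fractions import Fraction

def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival) pairs.

    events[i] is 1 if the failure was observed at times[i] and 0 if the
    observation was censored. Exact fractions avoid rounding noise.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = Fraction(1)
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = 0
        leaving = 0
        # Group all subjects sharing the same observed time.
        while i < len(data) and data[i][0] == t:
            failures += data[i][1]
            leaving += 1
            i += 1
        if failures > 0:
            survival *= Fraction(at_risk - failures, at_risk)
            curve.append((t, survival))
        # Failures and censored subjects both leave the risk set.
        at_risk -= leaving
    return curve

# Hypothetical follow-up times (months) and event indicators.
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
for t, s in curve:
    print(t, s)
```

The estimate drops only at observed failure times (2, 3, and 5 here), while censored subjects contribute by shrinking the risk set for later times.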

Faculty: Bair, Cai, Hudgens, Ibrahim, Kosorok, Lin, Monaco, Truong, Zeng, Zhou


AIDS, HIV and Other Infectious Diseases

The Department of Biostatistics participates in AIDS-related research by leading the Biostatistics Core of the Center for AIDS Research (CFAR) at the University of North Carolina at Chapel Hill. The Biostatistics Core facilitates, encourages, strengthens, and expands new AIDS-related research. The Core makes current knowledge of statistical science readily available to investigators participating in research related to HIV and AIDS. It identifies and resolves questions of statistical methodology, invents, refines, and publishes new statistical methods for AIDS research, and assists in design, data collection, data management and quality control in coordinated studies.

Faculty: Hudgens, F.C. Lin, Stewart, Truong, Zeng


Cancer

Biostatistical research in the cancer area is primarily conducted through the Biostatistics Core Facility in the Lineberger Comprehensive Cancer Center at UNC-Chapel Hill. The principal objective of this core facility is to provide statistical analytic collaboration and support to university cancer researchers. Center researchers’ statistical needs encompass research/protocol design and data analysis for clinical trials and for epidemiologic, cancer prevention and control, and basic science studies. Technical support (management, preparation, and analysis of data) is also critical to the smooth functioning of this core. The Facility is supporting over 50 institutional clinical trials. Research areas of particular emphasis are breast cancer mammography and treatment, lung cancer treatment, and prostate cancer screening and treatment.

Faculty: Bair, Cai, Ibrahim, Ivanova, D. Lin, Liu, Qaqish, Tan, Zeng, Zhou

Cardiovascular and Cerebrovascular Disease

The Collaborative Studies Coordinating Center (CSCC), a division within the Department of Biostatistics, has been involved with research in cardiovascular disease since its founding in 1971 under the name Lipids Research Clinics (LRC) Coordinating Center. The LRC project was supported by the National Heart, Lung, and Blood Institute (NHLBI) of the National Institutes of Health (NIH). The LRC was funded for 19 years, making it one of the longest running studies funded by NIH. The LRC study was actually a collection of studies, including the major clinical trial establishing that cholesterol reduction could prevent heart disease, an epidemiologic cohort study focusing on lipids and cardiovascular disease, and a family study on the genetic factors related to the lipids/cardiovascular disease association. In 1984, the “Lipids Coordinating Center” changed its name to CSCC to reflect the broad area of health research done at the Center. The CSCC is responsible for statistical, scientific, study management, and quality assurance roles for various health research studies, including active development of fieldwork and support for publication activities. The current cardiovascular research projects include the Atherosclerosis Risk in Communities Study (ARIC), a cardiovascular epidemiologic study; the Enhancing Recovery in Coronary Heart Disease Study (ENRICHD), a study of the effect of psychosocial interventions in post-heart attack patients; the Obesity Prevention in Native American Children (PATHWAYS) study, a school-based intervention to prevent obesity in Native American children; the Vitamin Intervention for Stroke Prevention (VISP) study, a randomized clinical trial to prevent recurrent stroke; and the Trial of Activity for Adolescent Girls (TAAG) study, a school-based intervention to prevent a falloff in physical activity as girls move into adolescence.

Faculty: Cai, Couper, Q. Li, F.C. Lin, Schwartz, Stewart


Dental Research

The Department of Biostatistics actively participates in dental research through collaboration with faculty members from the UNC Dental School. Biostatistics faculty collaborate on projects funded through the Comprehensive Center for Inflammatory Disorders (CCID), formed via a 5-year grant from the National Institute of Dental Research. Among the goals of the center are to identify innovative approaches to the prevention, diagnosis, and treatment of chronic inflammation, and to integrate basic research studies of inflammation with patient-based and population research on inflammatory diseases and disorders such as arthritis, cardiovascular disease, periodontal disease, asthma, inflammatory bowel disease, and pre-term labor and delivery. Further collaboration in dental research takes place through the Biostatistics Consulting Laboratory (BCL) of the Department of Biostatistics. The BCL provides assistance in study design, data analysis, interpretation of results, and publication.

Faculty: Cai, Couper, Preisser, Stewart, Zhu

Environmental and Occupational Exposure

The Environmental Biostatistics research and training program was established in the Department of Biostatistics in 1971. The training program has had continual financial support over a period of thirty years from the National Institute of Environmental Health Sciences (NIEHS), and it is currently the largest training program funded by NIEHS. This training program has produced many master’s, doctoral, and postdoctoral graduates who now occupy leadership positions in academia, government, and private industry. Faculty and students in this program conduct state-of-the-art biostatistical research relevant to important environmental health problems and provide high-level statistical consulting support for other researchers in the environmental health field. The Department of Biostatistics works closely with the Departments of Environmental Sciences and Engineering and Epidemiology in this research program. The Department also has collaborative research arrangements with researchers at the National Institute of Environmental Health Sciences and the Environmental Protection Agency, both located in nearby Research Triangle Park. Environmental health research concentrates on investigating the effects of environmental exposures on human health. The research program also incorporates molecular biology, especially as it relates to gene-environment interactions. Collaboration with the Department of Epidemiology expands the role of Environmental Biostatistics to include statistical issues in genetic epidemiology and reproductive epidemiology. Faculty and students in Environmental Biostatistics also conduct research on toxicology studies in collaboration with researchers in the Department of Environmental Sciences and Engineering.

Faculty: Cai, Preisser, Qaqish, Stewart, Truong, Zhou

Psychology and Mental Health

The Department of Biostatistics collaborates with the School of Medicine in conducting research in the area of psychiatry. This is done both by having joint faculty appointments in the Departments of Psychiatry and Biostatistics, and by having biostatistics faculty participate in psychiatry research. For example, Biostatistics department faculty members currently participate in a clinical drug study which is the largest research contract ever awarded by the National Institute of Mental Health. This study focuses on a new group of atypical anti-psychotic drugs, used to treat schizophrenia and Alzheimer’s disease.

Faculty: Cai, Lin, Schwartz, Stewart, Tan, Truong, Zhu, Zou