Faculty Areas of Expertise and Interest
Methodology
Bayesian Methodology
There are two fundamental paradigms of statistical inference: frequentist and Bayesian. In Bayesian inference, uncertainty about the parameters of a statistical model is expressed through a probability distribution, called the prior distribution. After the data are collected, the prior distribution is updated using Bayes’ theorem, yielding the posterior distribution of the unknown parameters. Statistical inference about the unknown parameters is then carried out using the posterior distribution. Bayesian methods offer many advantages for statistical inference, including (i) computational advantages in fitting complex models, (ii) the ability to make exact inference for any sample size without resorting to asymptotic calculations, (iii) a natural way to incorporate prior information, and (iv) flexibility in model building and data analysis tools. Bayesian methodology is employed in a wide range of biomedical applications, including genomics, clinical trials, medical imaging, environmental health, infectious diseases, and cancer research.
Categorical Data Analysis
Categorical data analysis involves the statistical analysis of a response variable described through discrete categories. These include binary, ordinal, and nominal scales of measurement, as well as discrete counts and grouped time-to-event data structures. Analysis of categorical data is typically performed through contingency tables or via statistical modeling (e.g., logistic or Poisson regression).
Clinical Trials
A clinical trial is a medical research study in humans designed to test the efficacy and safety of an intervention (a treatment or preventive measure). “Phase I” trials represent the first use of a new treatment in humans, often healthy volunteers.
“Phase II” trials are the first tests of how the treatment works in people with the disease or condition of interest relative to a comparison (control) group. The treatment given to the control group is either a placebo (an inert substance or some other form of sham treatment designed to be indistinguishable from the new treatment) or another active treatment. “Phase III” trials are large-scale trials of the efficacy and safety of the new treatment against a control treatment. Faculty in the department develop statistical methodology for clinical trials and collaborate in the design, conduct, and analysis of trials. Trials range in size from small Phase I-II trials in the Lineberger Comprehensive Cancer Center and the School of Medicine to large multi-center national and international trials coordinated by the Collaborative Studies Coordinating Center (CSCC).
Faculty: Couper, Hudgens, Ibrahim, Ivanova, Kosorok, LaVange, F.C. Lin, Monaco, Psioda, Qaqish, Schwartz, Zeng, Zhou
Data Science
Biostatistics is the core of data science and its application to human health. As massive, complex, and varied data sets are produced through biomedical research, the need for the skills that biostatisticians possess is more evident than ever. The Department of Biostatistics develops methods and software in the field of Data Science, often in collaboration with other units on campus, including the Departments of Mathematics, Epidemiology, Genetics, and others. Examples of the complex data sets that researchers in the Department of Biostatistics may encounter include cohort data gathered for inference of causal effects of treatments or behaviors, and genetic and multi-omics data gathered to understand the inner workings of cells and tissues.
Key aspects of Data Science research include scalability, flexible modeling, visualization and exploratory data analysis (EDA), assessment of technical or selection biases, implementation and distribution of methods as open source software, and computational reproducibility.
Empirical Processes
Empirical processes are functions computed from data. In biostatistics, such quantities arise frequently in many areas, including brain imaging, genomics, and clinical trials. Two common examples are the empirical cumulative distribution function and the Kaplan-Meier survival curve. Many statistical estimates can be shown to be solutions of estimating equations, and estimating equations are also empirical processes. The study of empirical processes consists of the development and understanding of foundational principles from probability, analysis, and statistics that can apply to a broad array of related complex and high-dimensional scientific settings. Empirical processes are especially useful in biostatistics for semiparametric modeling and inference.
Longitudinal or Dependent Data
Correlated data arise in many public health settings, such as when multiple measurements are taken on a single subject or on subjects within a family. Longitudinal data are a special case of correlated data in which outcomes from each subject are measured over time. Methods for analysis of longitudinal data are a major research area in the department, with considerable overlap with many other research areas, including generalized linear models, clinical trials, survival analysis, Bayesian methods, survey sampling, statistical genetics and computational biology, missing data, nonparametric methods, and imaging.
Faculty: Cai, Crandell, Garcia, Howard, Ibrahim, Preisser, Qaqish, Sotres-Alvarez, Tan, Truong, Zeng, Zhou, Zhu
Medical Imaging
Medical imaging research centers on improving the display and analysis of images collected with a wide array of methods, including traditional x-rays, CT, MRI, PET, and ultrasound, among others. Such images are used for diagnosis, treatment planning, guidance, and verification. The tremendous growth in the use and study of medical imaging continues. Similar growth in computing power interacts with advances in basic sciences and computing methods to drive this progress. Biostatisticians play key roles in the research. Collaborations with members of the Departments of Radiology, Radiation Oncology, Neurosurgery, Psychiatry, Biomedical Engineering, and Computer Science involve study design, analysis, and sample size choice, as well as data reduction and representation. The research includes both large multicenter trials and small laboratory studies. The medical research focuses on improving image representation and processing, as well as improving display. More than seventy faculty, students, and staff on the Carolina campus identify themselves as involved in the Medical Display and Analysis Group.
Missing or Mismeasured Data
Missing data arise in biomedical research when data are incomplete for one or more subjects. Specific examples include unavailability of covariate measurements, survey nonresponse, study subjects failing to report to a clinic for monthly evaluations, respondents refusing to answer certain items on a questionnaire, and loss of data. Intuitively, when the subjects with missing values differ systematically from those with complete data with respect to the outcome of interest, results from a traditional data analysis omitting the missing cases, called a complete case analysis, may no longer be valid.
As a result, much research has addressed missing data, and many statistical methods have been developed for carrying out appropriate statistical inference in its presence. Four common methodologies for dealing with missing data are maximum likelihood (ML), multiple imputation (MI), fully Bayesian (FB) methods, and methods based on weighted estimating equations (WEE). This topic remains a very active area of research.
Faculty: Cai, Crandell, Garcia, Ibrahim, Preisser, Zeng, Zhou
Precision Medicine and Causal Inference
Precision medicine and causal inference both seek to discover which treatments or interventions work best based on data collected from existing records, clinical trials and other randomized studies, and/or observational studies. Causal inference provides a framework for distinguishing correlative relationships from cause-and-effect relationships, including addressing confounding and other relevant inferential challenges. Precision medicine seeks to discover causally valid ways to learn how to treat each patient in a population, especially when that population is heterogeneous and diverse. The Department of Biostatistics is a world leader in both causal inference and precision medicine methodology and applications. Semiparametric models and machine learning tools, including deep learning and reinforcement learning, are often utilized in these exciting and rapidly evolving areas.
Semiparametric or Nonparametric Methodology
Semiparametric and nonparametric models provide much more flexible structures for modeling complex data than parametric models do. Nonparametric models leave the data distribution completely unspecified, while semiparametric models parameterize only the components of interest. They are widely used in areas such as survival analysis, longitudinal data analysis, and economic data analysis.
The estimation methods for these models include kernel estimation, spline smoothing, local polynomials, nonparametric maximum likelihood estimation, and many other techniques. Empirical process theory and semiparametric efficiency theory have become standard mathematical tools for statistical inference in these models.
Faculty: Cai, Hudgens, Kosorok, Q. Li, Liu, Truong, Zeng, Zhou, Zhu, Zou
Statistical Genetics and Computational Biology
Statistical Genetics and Computational Biology are key disciplines in the study of human health and disease, and often build upon foundational theory in Biostatistics. These disciplines concern data sets of varying scale, from examination of individual cells within a tissue or tumor, to massive biobank data sets with ‘omics data collected on hundreds of thousands to millions of individuals. These data, combined with an increasing knowledge of biological systems, present a variety of interesting and challenging scientific questions for biostatistical researchers. The Department of Biostatistics at UNC collaborates with other departments and schools at UNC to address various problems in Genetics and Computational Biology. Research in the department has resulted in new statistical theory and methodology, as well as new algorithms and popular software packages. Specific interests include methods for inference of single cell gene expression and epigenetic state, cancer genomics, integration of large ‘omics data sets from NIH programs such as TOPMed, personalized medicine, and identifying molecular mechanisms for loci identified by genome-wide association studies (GWAS).
Faculty: Ibrahim, Jiang, Q. Li, Y. Li, Lin, Liu, Love, Rashid, Truong, Wu, Zou
Survey Sampling
Survey sampling is relevant to public health research when one wishes to learn something (e.g., rates or determinants of knowledge about HIV/AIDS transmission) about a large well-defined population of humans (e.g., all current non-institutionalized adult residents of North Carolina).
Good samples in this research setting are those where random selection methods are strategically used to capture the population’s diversity in a way that meets all of the study’s information needs. Appropriate estimation methods must also be used to extract findings from the sample data. Recent research in this area has focused on ways to more effectively sample rare and elusive population subgroups (e.g., race-ethnic groups and the homeless) and on improving the statistical quality of sample-derived estimates, especially from samples obtained from more complex designs.
Survival Analysis
Survival analysis is a popular data analysis approach for handling possibly censored time-to-event data. Censoring refers to the incomplete observation of the failure time. Censored time-to-event data arise often in biomedical research. Kaplan-Meier estimation, the logrank test, and the Cox regression model are some commonly used tools in survival analysis.
Faculty: Cai, Hudgens, Ibrahim, Kosorok, Lin, Monaco, Truong, Zeng, Zhou
COVID, HIV and Other Infectious Diseases
The Department of Biostatistics participates in numerous COVID-19- and HIV-related research projects. For example, students and faculty in the department participate in the Biostatistics Core of the Center for AIDS Research (CFAR) at the University of North Carolina at Chapel Hill. The purpose of the CFAR Biostatistics Core is to accelerate successful HIV/AIDS research by direct provision of biostatistical support to HIV-related projects, and by arranging mutually beneficial collaborations between CFAR researchers and statistical scientists. The department’s CSCC also serves as the coordinating center for the Adolescent Medicine Trials Network for HIV/AIDS Interventions.
Since 2019, many of our faculty and students have pivoted to working on COVID-19-related research projects on various aspects of the pandemic, ranging from seroprevalence studies to the design and analysis of clinical trials for evaluation of vaccines and treatments for SARS-CoV-2.
Cancer
Biostatistics research in cancer is primarily conducted through the Biostatistics Core Facility in the Lineberger Comprehensive Cancer Center (LCCC) at UNC-Chapel Hill. The principal objective of this core facility is to provide statistical collaboration and biostatistical, data science, and machine learning support to university cancer researchers. LCCC researchers’ statistical needs encompass research and protocol design, data analysis for clinical trials, and methods for epidemiologic, genomic, cancer prevention and control, and basic science studies. Technical support, including the management, preparation, visualization, and reproducible analysis of data, is also critical to the smooth functioning of this core. The Facility is supporting over 50 institutional clinical trials. Biostatistics faculty members associated with the core also engage in the development of novel statistical methods and software to support cancer research efforts at the LCCC, including the development of innovative dose-finding designs, sequential continuous toxicity monitoring methods, and SMART designs. Significant efforts are also dedicated to the development of methods and software for genomic data analysis, such as bulk RNA-seq, DNA-seq, ChIP-seq, proteomic, and single cell data. Recent work from faculty has impacted clinical practice in a number of areas in cancer, including breast and pancreatic cancer.
Faculty: Cai, Ibrahim, Ivanova, Jiang, Kosorok, D. Lin, Liu, Love, Rashid, Qaqish, Tan, Wu, Zeng
Cardiovascular and Cerebrovascular Disease
The Collaborative Studies Coordinating Center (CSCC), a division within the Department of Biostatistics, has been involved with research in cardiovascular disease since its founding in 1971 under the name Lipids Research Clinics (LRC) Coordinating Center. The LRC project was supported by the National Heart, Lung, and Blood Institute (NHLBI) of the National Institutes of Health (NIH). The LRC was funded for 19 years, making it one of the longest-running studies funded by NIH. The LRC study was actually a collection of studies, including the major clinical trial establishing that cholesterol reduction could prevent heart disease, an epidemiologic cohort study focusing on lipids and cardiovascular disease, and a family study on the genetic factors related to the lipids/cardiovascular disease association. In 1984, the “Lipids Coordinating Center” changed its name to CSCC to reflect the broad area of health research done at the Center. The CSCC is responsible for statistical, scientific, study management, and quality assurance roles for various health research studies, including active development of fieldwork and support for publication activities.
The current cardiovascular research projects include the Atherosclerosis Risk in Communities Study (ARIC), a cardiovascular epidemiologic study; the Enhancing Recovery in Coronary Heart Disease Study (ENRICHD), a study of the effect of psychosocial interventions in post-heart attack patients; the Obesity Prevention in Native American Children (PATHWAYS) study, a school-based intervention to prevent obesity in Native American children; the Vitamin Intervention for Stroke Prevention (VISP) study, a randomized clinical trial to prevent recurrent stroke; and the Trial of Activity for Adolescent Girls (TAAG) study, a school-based intervention to prevent a falloff in physical activity as girls move into adolescence.
Dentistry
The Department of Biostatistics actively participates in dental research through collaboration with faculty members from the UNC Dental School. Biostatistics faculty collaborate on projects funded through the Comprehensive Center for Inflammatory Disorders (CCID), formed via a 5-year grant from the National Institute of Dental Research. Among the goals of the center are to identify innovative approaches to the prevention, diagnosis, and treatment of chronic inflammation, and to integrate basic research studies of inflammation with patient-based and population research on inflammatory diseases and disorders such as arthritis, cardiovascular disease, periodontal disease, asthma, inflammatory bowel disease, and pre-term labor and delivery. Further collaboration in dental research takes place through the Biostatistics Consulting Laboratory (BCL) of the Department of Biostatistics. The BCL provides assistance in study design, data analysis, interpretation of results, and publication.
Environmental and Occupational Exposure
The Environmental Biostatistics research and training program was established in the Department of Biostatistics in 1971.
The training program has had continual financial support over a period of thirty years from the National Institute of Environmental Health Sciences (NIEHS), and it is currently the largest training program funded by NIEHS. This training program has produced many master’s, doctoral, and postdoctoral graduates who now occupy leadership positions in academia, government, and private industry. Faculty and students in this program conduct state-of-the-art biostatistical research relevant to important environmental health problems and provide high-level statistical consulting support for other researchers in the environmental health field. The Department of Biostatistics works closely with the Departments of Environmental Sciences and Engineering and Epidemiology in this research program. The Department also has collaborative research arrangements with researchers at the National Institute of Environmental Health Sciences and the Environmental Protection Agency, both located in nearby Research Triangle Park. Environmental health research concentrates on investigating the effects of environmental exposures on human health. The research program also incorporates molecular biology, especially as it relates to gene-environment interactions. Collaboration with the Department of Epidemiology expands the role of Environmental Biostatistics to include statistical issues in genetic epidemiology and reproductive epidemiology. Faculty and students in Environmental Biostatistics also conduct research on toxicology studies in collaboration with researchers in the Department of Environmental Sciences and Engineering.
Neuroscience, Psychiatry, and Mental Health
The Department of Biostatistics collaborates with the School of Medicine in conducting research in the area of psychiatry. This is done both through joint faculty appointments in the Departments of Psychiatry and Biostatistics and through participation of biostatistics faculty in psychiatry research.
For example, Biostatistics faculty members currently participate in a clinical drug study funded by the largest research contract ever awarded by the National Institute of Mental Health. This study focuses on a new group of atypical antipsychotic drugs used to treat schizophrenia and Alzheimer’s disease.
Faculty: Cai, Garcia, Y. Li, D. Lin, Love, Schwartz, Tan, Truong, Zhu, Zou
Collaborative