The Bernard G. Greenberg
Distinguished Lecture Series

The Bernard G. Greenberg Distinguished Lecture Series honors the first chair of the UNC Biostatistics Department, Dr. Bernard G. Greenberg, who later served with distinction as dean of the School of Public Health from 1972 to 1982.


2024 Greenberg Lecture Series – May 20 and 21

Daniela M. Witten, PhD, Professor of Statistics and Biostatistics at the University of Washington, and the Dorothy Gilford Endowed Chair in Mathematical Statistics
2024 Greenberg Lecture Series (PDF).


Lecture #1: “Data Thinning and its Applications”
We propose data thinning, a new approach for splitting an observation from a known distributional family with unknown parameter(s) into two or more independent parts that sum to yield the original observation, and that follow the same distribution as the original observation, up to a (known) scaling of a parameter. This proposal is very general and can be applied to a broad class of distributions within the natural exponential family, including the Gaussian, Poisson, negative binomial, Gamma, and binomial distributions, among others. Furthermore, we generalize data thinning to enable splitting an observation into two or more parts that can be combined to yield the original observation using an operation other than addition; this enables the application of data thinning far beyond the natural exponential family. Data thinning has a number of applications to model selection, evaluation, and inference. For instance, cross-validation via data thinning provides an attractive alternative to the “usual” approach of cross-validation via sample splitting, especially in unsupervised settings in which the latter is not applicable. We will present an application of data thinning to single-cell RNA-sequencing data, in a setting where sample splitting is not applicable. This is joint work with Anna Neufeld (Fred Hutch), Ameer Dharamshi (University of Washington), Lucy Gao (University of British Columbia), and Jacob Bien (University of Southern California).
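As a rough illustration of the idea in the Poisson case (a minimal sketch, not the speakers' implementation): if X follows a Poisson distribution with mean lam, drawing X1 given X from a Binomial(X, eps) distribution and setting X2 = X - X1 yields independent Poisson parts with means eps*lam and (1 - eps)*lam that sum to X. The function names, the choice of lam = 5 and eps = 0.3, and the simulation size are assumptions made for this example.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def thin_poisson(x, eps, rng):
    """Split a count x into (x1, x2): x1 | x ~ Binomial(x, eps), x2 = x - x1."""
    x1 = sum(1 for _ in range(x) if rng.random() < eps)
    return x1, x - x1


def simulate_thinning(lam=5.0, eps=0.3, n=20000, seed=0):
    """Thin n Poisson draws; return the two sample means and the sample covariance."""
    rng = random.Random(seed)
    pairs = [thin_poisson(sample_poisson(lam, rng), eps, rng) for _ in range(n)]
    mean1 = sum(a for a, _ in pairs) / n
    mean2 = sum(b for _, b in pairs) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in pairs) / (n - 1)
    return mean1, mean2, cov


if __name__ == "__main__":
    mean1, mean2, cov = simulate_thinning()
    # Means should be close to eps*lam and (1-eps)*lam; covariance close to 0.
    print(mean1, mean2, cov)
```

In a cross-validation setting, one part could be used to fit a model and the other to evaluate it, since the two parts are independent.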


Lecture #2: “Selective Inference for Clustering”
In contemporary applications, it is common to collect very large data sets with the vaguely-defined goal of hypothesis generation. Once a dataset is used to generate a hypothesis, we might wish to test that hypothesis on the same set of data. However, this type of “double dipping” violates a cardinal rule of statistical hypothesis testing: namely, that we must decide what hypothesis to test before looking at the data. When this rule is violated, then standard statistical hypothesis tests (such as t-tests and z-tests) fail to control the selective Type 1 error — that is, the probability of rejecting the null hypothesis, provided that the null hypothesis holds, and given that we decided to test this null hypothesis. While double dipping is pervasive across many application areas, in this talk Dr. Witten will focus on the analysis of single-cell RNA-sequencing data, in which it is common to cluster a set of observations — corresponding to cells — and then to test for “statistical significance” of the resulting clusters. While of course a naive double-dipping approach to this task is not valid, she will show that we can apply the framework of conditional selective inference to conduct valid inference in this setting. In particular, she will consider settings in which the clusters are estimated via hierarchical or k-means clustering. This work was conducted in collaboration with UW PhD students Lucy Gao (Biostat PhD 2020) and Yiqun Chen (Biostat PhD 2022), as well as Jacob Bien (USC).
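The inflation caused by double dipping can be seen in a small simulation. The sketch below shows the problem only, not the conditional selective inference correction described in the talk. It draws one-dimensional data from a single N(0, 1) population (so there are no true clusters), splits the data with a simple 2-means procedure, and then applies a naive two-sample z-test with known variance to the estimated clusters. All function names and settings here are assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist, mean


def two_means_1d(values, iterations=50):
    """Lloyd's algorithm with k=2 in one dimension, started at the min and max."""
    low, high = min(values), max(values)
    for _ in range(iterations):
        cluster_a = [v for v in values if abs(v - low) <= abs(v - high)]
        cluster_b = [v for v in values if abs(v - low) > abs(v - high)]
        new_low, new_high = mean(cluster_a), mean(cluster_b)
        if (new_low, new_high) == (low, high):
            break
        low, high = new_low, new_high
    return cluster_a, cluster_b


def naive_rejection_rate(n_obs=40, n_sims=500, alpha=0.05, seed=0):
    """Share of null datasets in which the naive z-test 'finds' a cluster difference."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        data = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        a, b = two_means_1d(data)
        # Naive z-statistic: treats the clusters as if they were fixed in advance.
        z = (mean(b) - mean(a)) / math.sqrt(1 / len(a) + 1 / len(b))
        if abs(z) > critical:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    # A valid test would reject about 5% of the time; the naive test rejects far more often.
    print(naive_rejection_rate())
```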


Lecture #3: “Inference After F-screening in Linear Regression”
It is well-known that researchers tend to publish only positive findings. The consequence of this reality, known as the “file drawer problem”, is that the published literature is rife with “findings” for which the statistical evidence is vastly overstated. Dr. Witten will consider an idealized case of the file drawer problem, in which a researcher applies “F-screening” to the output of a multiple linear regression model: that is, they decide whether to publish the model’s output based on whether the overall F-test yields a p-value below a specified threshold, such as 0.05. It is clear that among the datasets that survive F-screening, the p-values for the individual regression coefficients will not follow a Uniform(0,1) distribution, even when the null hypothesis holds. In this talk, she will propose a solution to the F-screening problem using the conditional selective inference framework. In particular, she will show that we can conduct inference on the coefficients in a multiple linear regression model conditional on the fact that the model output survived F-screening. This will enable selective Type 1 error control. Remarkably, this correction for F-screening does not require access to the raw data used to fit the model, nor even to the sufficient statistics in the regression model: we can conduct the correction using only the regression output of a standard statistical software package, e.g., summary(lm(y~x)) in R. This is joint work with Olivia McGough (UW Stat PhD ongoing) and Dan Kessler (UW, soon to be faculty in UNC STOR and SDSS).
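The non-uniformity of p-values after screening can be illustrated with a simplified analog (a sketch of the problem, not the proposed correction). The example assumes two independent standard normal z-scores standing in for two coefficient estimates under the global null with known variance, and it replaces the overall F-test with a chi-square test on 2 degrees of freedom, whose p-value has the closed form exp(-s/2). These simplifications, and all names and settings, are assumptions made to keep the example self-contained.

```python
import math
import random
from statistics import NormalDist


def screened_pvalues(n_sims=40000, screen_level=0.05, seed=0):
    """Return first-coefficient p-values for null datasets that pass the overall screen."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    kept = []
    for _ in range(n_sims):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Overall test: z1^2 + z2^2 is chi-square with 2 df under the null,
        # and its upper-tail probability is exp(-s / 2).
        overall_p = math.exp(-(z1 * z1 + z2 * z2) / 2)
        if overall_p < screen_level:
            # Two-sided p-value for the first coefficient, ignoring the screening step.
            kept.append(2 * (1 - std_normal.cdf(abs(z1))))
    return kept


if __name__ == "__main__":
    pvals = screened_pvalues()
    share_below = sum(p < 0.05 for p in pvals) / len(pvals)
    # Uniform(0,1) p-values would fall below 0.05 about 5% of the time.
    print(len(pvals), share_below)
```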


2022 Lecture Videos

Lecture #1: "All the ways that Bayes can go wrong"

Probability theory is false. Weak priors give strong and implausible posteriors. If you could give me your subjective prior I wouldn't need Bayesian inference. The best predictive model averaging is non-Bayesian. There will always be a need to improve our models. Nonetheless, we still find Bayesian inference to be useful. How can we make the best use of Bayesian methods in light of all their flaws?

Lecture #2: "From sampling and causal inference to policy analysis: Interactions and the challenges of generalization"

The three central challenges of statistics are generalizing from sample to population, generalizing from control to treated group, and generalizing from observed data to underlying constructs of interest. These are associated with separate problems of sampling, causal inference, and measurement, but in real decision problems all three issues arise. We discuss the way in which varying treatment effects (interactions) bring sampling concerns into causal inference, along with the real challenges of applying this insight to real problems. We consider applications in medical studies, A/B testing, social science research, and policy analysis.
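A small worked example may help show why interactions make sampling matter for causal inference. The numbers below are invented for illustration: the treatment effect is assumed to be 2.0 in one stratum and 0.0 in another, and the study sample over-represents the first stratum relative to the population. The stratum labels and the helper function are likewise hypothetical.

```python
def average_effect(stratum_effects, stratum_shares):
    """Weighted average of stratum-level treatment effects."""
    total = sum(stratum_shares.values())
    weighted = sum(stratum_effects[s] * stratum_shares[s] for s in stratum_effects)
    return weighted / total


# Hypothetical stratum-level effects and composition (made-up numbers).
effects = {"younger": 2.0, "older": 0.0}
sample_shares = {"younger": 0.8, "older": 0.2}
population_shares = {"younger": 0.3, "older": 0.7}

sample_ate = average_effect(effects, sample_shares)
population_ate = average_effect(effects, population_shares)

if __name__ == "__main__":
    # The same stratum effects give different averages under different compositions.
    print(sample_ate, population_ate)
```

If the effect were constant across strata, the two averages would coincide and the composition of the sample would not matter for this estimand.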

Lecture #3: "Statistical workflow"

Statistical modeling has three steps: model building, inference, and model checking, followed by possible improvements to the model and new data that allow the cycle to continue. But we have recently become aware of many other steps of statistical workflow, including simulated-data experimentation, model exploration and understanding, and visualizing models in relation to each other. Tools such as data graphics, sensitivity analysis, and predictive model evaluation can be used within the context of a topology of models, so that data analysis is a process akin to scientific exploration. We discuss these ideas of dynamic workflow along with the seemingly opposed idea that statistics is the science of defaults. We need to expand our idea of what data analysis is, in order to make the best use of all the new techniques being developed in statistical modeling and computation.


Past Speakers

2022 – Andrew Gelman, PhD, Columbia University

Dr. Andrew Gelman is the winner of the 2022 Greenberg Distinguished Lecturer Award, and presented talks as part of the 2022 Bernard G. Greenberg Distinguished Lecture Series. Dr. Gelman is a professor of statistics and political science at Columbia University. He has received the Outstanding Statistical Application award three times from the American Statistical Association, the award for best article published in the American Political Science Review, and the Committee of Presidents of Statistical Societies award for outstanding contributions by a person under the age of 40. His books include Bayesian Data Analysis (with John Carlin, Hal Stern, David Dunson, Aki Vehtari, and Don Rubin), Teaching Statistics: A Bag of Tricks (with Deb Nolan), Data Analysis Using Regression and Multilevel/Hierarchical Models (with Jennifer Hill), Red State, Blue State, Rich State, Poor State: Why Americans Vote the Way They Do (with David Park, Boris Shor, and Jeronimo Cortina), A Quantitative Tour of the Social Sciences (co-edited with Jeronimo Cortina), and Regression and Other Stories (with Jennifer Hill and Aki Vehtari).

2021 – Dr. Xihong Lin, Harvard University

Professor Xihong Lin

Dr. Xihong Lin, winner of the 2021 Greenberg Distinguished Lecturer Award, presented talks as part of the 2021 Bernard G. Greenberg Distinguished Lecture Series. Lin is a Professor and former Chair of the Department of Biostatistics and Coordinating Director of the Program in Quantitative Genomics at the Harvard T. H. Chan School of Public Health, Professor in the Department of Statistics in the Faculty of Arts and Sciences of Harvard University, and Associate Member of the Broad Institute of Harvard and MIT.


2019 – Dr. Nicholas Jewell, University of California, Berkeley

Dr. Nicholas Jewell

Dr. Nicholas Jewell, winner of the 2019 Greenberg Distinguished Lecturer Award, presented talks as part of the 2019 Bernard G. Greenberg Distinguished Lecture Series. Jewell is a Professor of Biostatistics and Statistics at the University of California, Berkeley. He received his PhD in mathematics from the University of Edinburgh in 1976.


2018 – Dr. Jamie Robins, Harvard University

Dr. Jamie Robins

Dr. Jamie Robins, winner of the 2018 Greenberg Distinguished Lecturer Award, presented talks on May 14 and 15 as part of the 2018 Bernard G. Greenberg Distinguished Lecture Series. Robins is the Mitchell L. and Robin LaFoley Dong Professor of Epidemiology at Harvard University. He received his MD from the Washington University School of Medicine in 1976.


2017 – Dr. Robert E. Kass, Carnegie Mellon

Dr. Robert E. Kass

Dr. Robert E. Kass, winner of the 2017 Greenberg Distinguished Lecturer Award, presented talks on May 15 and 16 as part of the 2017 Bernard G. Greenberg Distinguished Lecture Series. Kass is the Maurice Falk Professor of Statistics and Computational Neuroscience at Carnegie Mellon University. He received his doctorate in statistics from the University of Chicago and has been on the faculty of the Department of Statistics at Carnegie Mellon since 1981.


2016 – Dr. James O. Berger, Duke University

Dr. James Berger

James O. Berger, PhD, winner of the 2016 Greenberg Distinguished Lecturer Award, presented three talks on May 12 and 13 as part of the 2016 Bernard G. Greenberg Distinguished Lecture Series. Berger’s lectures included “The Use of Rejection Odds and Rejection Ratios in Testing Hypotheses” [PDF], “The Progress on the Foundations of Bayesian-Frequentist Unification” [PDF], and “Bayesian Multiplicity Control” [PDF].


2015 – Dr. Susan A. Murphy, University of Michigan

Susan A. Murphy
Photo Courtesy of the John D. and Catherine T. MacArthur Foundation

Dr. Susan A. Murphy, winner of the 2015 Greenberg Distinguished Lecturer Award, presented talks on May 11 and 12 as part of the 2015 Bernard G. Greenberg Distinguished Lecture Series. Dr. Murphy is the H.E. Robbins Distinguished University Professor of Statistics and a professor of psychiatry at the University of Michigan. She received her doctorate in statistics from UNC-Chapel Hill and was named a John D. and Catherine T. MacArthur Foundation Fellow for her work in designing the Sequential Multiple Assignment Randomized Trial, or SMART.


2014 – Dr. Jianqing Fan, Princeton University

Dr. Jianqing Fan

Dr. Jianqing Fan, winner of the 2014 Greenberg Distinguished Lecturer Award, presented talks on May 28 and 29 as part of the 2014 Bernard G. Greenberg Distinguished Lecture Series. Fan is the Frederick L. Moore Professor of Finance and chair of the Department of Operations Research and Financial Engineering at Princeton University. View the presentation abstracts and slides.

2013 – Dr. Trevor Hastie, Stanford University

Dr. Trevor Hastie

Dr. Trevor Hastie, winner of the 2013 Greenberg Distinguished Lecturer Award, presented talks on May 8 and 9 as part of the 2013 Bernard G. Greenberg Distinguished Lecture Series. Hastie is a professor of statistics and professor of health research and policy at Stanford University. Hastie’s lectures included “Sparse Linear Models” [PDF], “Matrix Completion and Large Scale SVD Computation” [PDF], and “Graphical Model Selection” [PDF].


2012 – Dr. Robert John Tibshirani, Stanford University

Dr. Robert Tibshirani

Dr. Robert John Tibshirani, winner of the 2012 Greenberg Distinguished Lecturer Award, presented talks on June 6 and 7 as part of the 2012 Bernard Greenberg Distinguished Lecture Series. Tibshirani is a professor of public health sciences and statistics at Stanford University. Tibshirani’s lectures included “Finding consistent patterns: A nonparametric approach for identifying differential expression in RNA-Seq data” [PDF], “The lasso: some novel algorithms and applications” [PDF], and “Sparse hierarchical interactions” [PDF].

2011 – Dr. Roderick Little, University of Michigan

Dr. Roderick J.A. Little, PhD

Dr. Roderick Little, winner of the 2011 Greenberg Distinguished Lecturer Award, presented talks on May 12 and 13 as part of the 2011 Bernard Greenberg Distinguished Lecture Series. Little is the Richard D. Remington Collegiate Professor of Biostatistics at the University of Michigan. Little’s lectures included “Calibrated Bayes: Spanning the Divide Between Frequentist and Bayesian Inference” [PDF], “Some Methods for Handling Missing Values in Outcome Variables” [PDF], “Subsample Ignorable Likelihood Methods for Regression with Missing Values of Covariates – throwing data away can actually pay!” [PDF], and “Measurement Error as Missing Data: The Case of Epidemiologic Assays” [PDF].


2010 – Dr. Marvin Zelen, Harvard University

Dr. Marvin Zelen

Dr. Marvin Zelen, winner of the 2010 Greenberg Distinguished Lecturer Award, presented talks as part of the 2010 Bernard Greenberg Distinguished Lecture Series. Zelen is the Lemuel Shattuck Research Professor of Statistical Science in the Department of Biostatistics at Harvard University. View the presentation slides.


2009 – Niels Keiding, University of Copenhagen

Niels Keiding

Niels Keiding, winner of the 2009 Greenberg Distinguished Lecturer Award, presented talks on May 4 and 5 as part of the 2009 Bernard Greenberg Distinguished Lecture Series. Keiding is the director of the Danish Graduate School in Biostatistics at the University of Copenhagen. Keiding’s lectures included “Event history analysis and the cross-section” [PDF], “Time-to-pregnancy: classical designs” [PDF], “Time to pregnancy: current duration data” [PDF], and “Describing episodes of drug treatment from joint observation of a prescription registry and a cross-sectional survey” [PDF].
