Bernard G. Greenberg Distinguished Lecture Series
The Bernard G. Greenberg Distinguished Lecture Series honors the first chair of the UNC Biostatistics Department, Dr. Bernard G. Greenberg, who later served with distinction as dean of the School of Public Health from 1972 to 1982.
2024 Greenberg Lecture Series – May 20 and 21
Daniela M. Witten, PhD, Professor of Statistics and Biostatistics at the University of Washington, and the Dorothy Gilford Endowed Chair in Mathematical Statistics
2024 Greenberg Lecture Series (PDF).
Lecture #1: “Data Thinning and its Applications”
We propose data thinning, a new approach for splitting an observation from a known distributional family with unknown parameter(s) into two or more independent parts that sum to yield the original observation, and that follow the same distribution as the original observation, up to a (known) scaling of a parameter. This proposal is very general and can be applied to a broad class of distributions within the natural exponential family, including the Gaussian, Poisson, negative binomial, Gamma, and binomial distributions, among others. Furthermore, we generalize data thinning to enable splitting an observation into two or more parts that can be combined to yield the original observation using an operation other than addition; this enables the application of data thinning far beyond the natural exponential family. Data thinning has a number of applications to model selection, evaluation, and inference. For instance, cross-validation via data thinning provides an attractive alternative to the “usual” approach of cross-validation via sample splitting, especially in unsupervised settings in which the latter is not applicable. We will present an application of data thinning to single-cell RNA-sequencing data, in a setting where sample splitting is not applicable. This is joint work with Anna Neufeld (Fred Hutch), Ameer Dharamshi (University of Washington), Lucy Gao (University of British Columbia), and Jacob Bien (University of Southern California).
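The simplest instance of this proposal, Poisson thinning, can be sketched in a few lines of standard-library Python (an illustrative sketch, not the authors' implementation): an observation X ~ Poisson(lam) is split by assigning each of its X unit counts to the first part with probability eps, yielding independent parts X1 ~ Poisson(eps * lam) and X2 ~ Poisson((1 - eps) * lam) with X1 + X2 = X.

```python
import math
import random

def draw_poisson(lam, rng):
    """Sample from Poisson(lam) with Knuth's multiplicative method."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def thin_poisson(x, eps, rng):
    """Split a Poisson count x into two independent parts that sum to x.

    If x ~ Poisson(lam), then x1 ~ Poisson(eps * lam),
    x2 ~ Poisson((1 - eps) * lam), and x1 is independent of x2.
    """
    x1 = sum(1 for _ in range(x) if rng.random() < eps)
    return x1, x - x1

rng = random.Random(0)
lam, eps, n = 5.0, 0.5, 20000
originals = [draw_poisson(lam, rng) for _ in range(n)]
pairs = [thin_poisson(x, eps, rng) for x in originals]
mean1 = sum(a for a, _ in pairs) / n
mean2 = sum(b for _, b in pairs) / n
# Each part behaves as a Poisson draw with mean eps * lam = 2.5,
# and the two parts always sum back to the original observation.
```

The two thinned parts can then play the roles that a training set and a test set play under sample splitting, which is what makes the idea usable for cross-validation in unsupervised settings.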
Lecture #2: “Selective Inference for Clustering”
In contemporary applications, it is common to collect very large data sets with the vaguely-defined goal of hypothesis generation. Once a dataset is used to generate a hypothesis, we might wish to test that hypothesis on the same set of data. However, this type of “double dipping” violates a cardinal rule of statistical hypothesis testing: namely, that we must decide what hypothesis to test before looking at the data. When this rule is violated, then standard statistical hypothesis tests (such as t-tests and z-tests) fail to control the selective Type 1 error — that is, the probability of rejecting the null hypothesis, provided that the null hypothesis holds, and given that we decided to test this null hypothesis. While double dipping is pervasive across many application areas, in this talk Dr. Witten will focus on the analysis of single-cell RNA-sequencing data, in which it is common to cluster a set of observations — corresponding to cells — and then to test for “statistical significance” of the resulting clusters. While of course a naive double-dipping approach to this task is not valid, she will show that we can apply the framework of conditional selective inference to conduct valid inference in this setting. In particular, she will consider settings in which the clusters are estimated via hierarchical or k-means clustering. This work was conducted in collaboration with UW PhD students Lucy Gao (Biostat PhD 2020) and Yiqun Chen (Biostat PhD 2022), as well as Jacob Bien (USC).
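The double-dipping failure described above is easy to reproduce in simulation. The sketch below uses only the Python standard library; the one-dimensional 2-means and the normal-approximation z-test are simplified stand-ins for the clustering methods and tests discussed in the talk, not the talk's own procedure. It generates data from a single Gaussian, estimates two clusters, and then naively tests for a difference in cluster means: the test rejects even though no true clusters exist.

```python
import math
import random

def two_means_1d(xs, iters=25):
    """Lloyd's algorithm for 2-means on one-dimensional data."""
    c1, c2 = min(xs), max(xs)
    for _ in range(iters):
        g1 = [x for x in xs if abs(x - c1) <= abs(x - c2)]
        g2 = [x for x in xs if abs(x - c1) > abs(x - c2)]
        c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)
    return g1, g2

def z_test_pvalue(g1, g2):
    """Two-sided two-sample z-test (normal approximation to the t-test)."""
    m1, m2 = sum(g1) / len(g1), sum(g2) / len(g2)
    v1 = sum((x - m1) ** 2 for x in g1) / (len(g1) - 1)
    v2 = sum((x - m2) ** 2 for x in g2) / (len(g2) - 1)
    z = (m1 - m2) / math.sqrt(v1 / len(g1) + v2 / len(g2))
    return math.erfc(abs(z) / math.sqrt(2))

rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(500)]  # one population, no signal
group1, group2 = two_means_1d(data)
p_value = z_test_pvalue(group1, group2)
# p_value is essentially zero: the naive test "discovers" the separation
# that the clustering step itself manufactured from pure noise.
```

Conditional selective inference repairs this by computing the null distribution of the test statistic conditional on the clustering outcome, rather than ignoring the selection step.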
Lecture #3: “Inference After F-screening in Linear Regression”
It is well known that researchers tend to publish only positive findings. The consequence of this reality, known as the “file drawer problem”, is that the published literature is rife with “findings” for which the statistical evidence is vastly overstated. Dr. Witten will consider an idealized case of the file drawer problem, in which a researcher applies “F-screening” to the output of a multiple linear regression model: that is, they decide whether to publish the model’s output based on whether the overall F-test yields a p-value below a specified threshold, such as 0.05. It is clear that among the datasets that survive F-screening, the p-values for the individual regression coefficients will not follow a Uniform(0,1) distribution, even when the null hypothesis holds. In this talk, she will propose a solution to the F-screening problem using the conditional selective inference framework. In particular, she will show that we can conduct inference on the coefficients in a multiple linear regression model conditional on the fact that the model output survived F-screening. This will enable selective Type 1 error control. Remarkably, this correction for F-screening does not require access to the raw data used to fit the model, nor even to the sufficient statistics in the regression model: we can conduct the correction using only the regression output of a standard statistical software package, e.g., summary(lm(y~x)) in R. This is joint work with Olivia McGough (UW Stat PhD ongoing) and Dan Kessler (UW, soon to be faculty in UNC STOR and SDSS).
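The non-uniformity of post-screening p-values is easy to see in a small simulation. The sketch below is a standard-library Python illustration under simplifying assumptions, not the conditional-inference correction from the talk: it fits a two-predictor regression with no intercept to pure-noise data, and uses large-sample approximations (2F is approximately chi-squared with 2 degrees of freedom under the null, so the F-test p-value is exp(-F), and the coefficient t-statistics are treated as normal).

```python
import math
import random

def fit_two_predictors(x1, x2, y):
    """OLS with two predictors, no intercept.

    Returns the two coefficient p-values (normal approximation) and the
    overall F-test p-value (chi-squared(2) tail approximation, exp(-F)).
    """
    n = len(y)
    s11 = sum(a * a for a in x1)
    s22 = sum(a * a for a in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    t1 = sum(a * b for a, b in zip(x1, y))
    t2 = sum(a * b for a, b in zip(x2, y))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * t1 - s12 * t2) / det
    b2 = (s11 * t2 - s12 * t1) / det
    rss = sum((yi - b1 * a - b2 * b) ** 2 for a, b, yi in zip(x1, x2, y))
    sigma2 = rss / (n - 2)
    se1 = math.sqrt(sigma2 * s22 / det)
    se2 = math.sqrt(sigma2 * s11 / det)
    p1 = math.erfc(abs(b1 / se1) / math.sqrt(2))
    p2 = math.erfc(abs(b2 / se2) / math.sqrt(2))
    f_stat = (sum(yi * yi for yi in y) - rss) / 2 / sigma2
    return (p1, p2), math.exp(-f_stat)

rng = random.Random(2)
n, reps = 100, 2000
survivor_pvals = []
for _ in range(reps):
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [rng.gauss(0, 1) for _ in range(n)]
    y = [rng.gauss(0, 1) for _ in range(n)]  # null: no true effects
    (p1, p2), p_f = fit_two_predictors(x1, x2, y)
    if p_f < 0.05:  # only these datasets leave the file drawer
        survivor_pvals.extend([p1, p2])

mean_p = sum(survivor_pvals) / len(survivor_pvals)
# Unconditionally, null p-values average about 0.5; among the roughly 5%
# of datasets that survive F-screening they are sharply deflated.
```

The conditional approach in the talk replaces the ordinary null distribution of each coefficient with its distribution conditional on the F-test having crossed the threshold, which restores uniform null p-values among the survivors.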
2022 Lecture Videos
Probability theory is false. Weak priors give strong and implausible posteriors. If you could give me your subjective prior I wouldn't need Bayesian inference. The best predictive model averaging is non-Bayesian. There will always be a need to improve our models. Nonetheless, we still find Bayesian inference to be useful. How can we make the best use of Bayesian methods in light of all their flaws?
The three central challenges of statistics are generalizing from sample to population, generalizing from control group to treated group, and generalizing from observed data to the underlying constructs of interest. These correspond to the distinct problems of sampling, causal inference, and measurement, but in real decision problems all three arise. We discuss how varying treatment effects (interactions) bring sampling concerns into causal inference, along with the practical challenges of applying this insight to real problems. We consider applications in medical studies, A/B testing, social science research, and policy analysis.
Statistical modeling has three steps: model building, inference, and model checking, followed by possible improvements to the model and new data that allow the cycle to continue. But we have recently become aware of many other steps of statistical workflow, including simulated-data experimentation, model exploration and understanding, and visualizing models in relation to each other. Tools such as data graphics, sensitivity analysis, and predictive model evaluation can be used within the context of a topology of models, so that data analysis is a process akin to scientific exploration. We discuss these ideas of dynamic workflow along with the seemingly opposed idea that statistics is the science of defaults. We need to expand our idea of what data analysis is, in order to make the best use of all the new techniques being developed in statistical modeling and computation.
Past Speakers
2022 – Andrew Gelman, PhD, Columbia University
Dr. Andrew Gelman is the winner of the 2022 Greenberg Distinguished Lecturer Award, and presented talks as part of the 2022 Bernard G. Greenberg Distinguished Lecture Series. Dr. Gelman is a professor of statistics and political science at Columbia University. He has received the Outstanding Statistical Application award three times from the American Statistical Association, the award for best article published in the American Political Science Review, and the Council of Presidents of Statistical Societies award for outstanding contributions by a person under the age of 40. His books include Bayesian Data Analysis (with John Carlin, Hal Stern, David Dunson, Aki Vehtari, and Don Rubin), Teaching Statistics: A Bag of Tricks (with Deb Nolan), Data Analysis Using Regression and Multilevel/Hierarchical Models (with Jennifer Hill), Red State, Blue State, Rich State, Poor State: Why Americans Vote the Way They Do (with David Park, Boris Shor, and Jeronimo Cortina), A Quantitative Tour of the Social Sciences (co-edited with Jeronimo Cortina), and Regression and Other Stories (with Jennifer Hill and Aki Vehtari).
2021 – Dr. Xihong Lin, Harvard University
Dr. Xihong Lin, winner of the 2021 Greenberg Distinguished Lecturer Award, presented talks as part of the 2021 Bernard G. Greenberg Distinguished Lecture Series. Lin is Professor and former Chair of the Department of Biostatistics and Coordinating Director of the Program in Quantitative Genomics at the Harvard T. H. Chan School of Public Health, Professor in the Department of Statistics in the Faculty of Arts and Sciences of Harvard University, and Associate Member of the Broad Institute of MIT and Harvard.
2019 – Dr. Nicholas Jewell, University of California Berkeley
Dr. Nicholas Jewell, winner of the 2019 Greenberg Distinguished Lecturer Award, presented talks as part of the 2019 Bernard G. Greenberg Distinguished Lecture Series. Jewell is a Professor of Biostatistics and Statistics at the University of California, Berkeley. He received his PhD in mathematics from the University of Edinburgh in 1976.
2018 – Dr. Jamie Robins, Harvard University
Dr. Jamie Robins, winner of the 2018 Greenberg Distinguished Lecturer Award, presented talks on May 14 and 15 as part of the 2018 Bernard G. Greenberg Distinguished Lecture Series. Robins is the Mitchell L. and Robin LaFoley Dong Professor of Epidemiology at Harvard University. He received his MD from the Washington University School of Medicine in 1976.
2017 – Dr. Robert E. Kass, Carnegie Mellon
Dr. Robert E. Kass, winner of the 2017 Greenberg Distinguished Lecturer Award, presented talks on May 15 and 16 as part of the 2017 Bernard G. Greenberg Distinguished Lecture Series. Kass is the Maurice Falk Professor of Statistics and Computational Neuroscience at Carnegie Mellon University. He received his doctorate in statistics from the University of Chicago and has been on the faculty of the Department of Statistics at Carnegie Mellon since 1981.
2016 – Dr. James O. Berger, Duke University
James O. Berger, PhD, winner of the 2016 Greenberg Distinguished Lecturer Award, presented three talks on May 12 and 13 as part of the 2016 Bernard G. Greenberg Distinguished Lecture Series. Berger’s lectures included “The Use of Rejection Odds and Rejection Ratios in Testing Hypotheses” [PDF], “The Progress on the Foundations of Bayesian-Frequentist Unification” [PDF], and “Bayesian Multiplicity Control” [PDF].
2015 – Dr. Susan A. Murphy, University of Michigan
Dr. Susan A. Murphy, winner of the 2015 Greenberg Distinguished Lecturer Award, presented talks on May 11 and 12 as part of the 2015 Bernard G. Greenberg Distinguished Lecture Series. Dr. Murphy is the H.E. Robbins Distinguished University Professor of Statistics and Professor of Psychiatry at the University of Michigan. She received her doctorate in statistics from UNC-Chapel Hill and was named a John D. and Catherine T. MacArthur Foundation Fellow for her work in designing the Sequential Multiple Assignment Randomized Trial, or SMART.
2014 – Dr. Jianqing Fan, Princeton University
2013 – Dr. Trevor Hastie, Stanford University
Dr. Trevor Hastie, winner of the 2013 Greenberg Distinguished Lecturer Award, presented talks on May 8 and 9 as part of the 2013 Bernard G. Greenberg Distinguished Lecture Series. Hastie is a professor of statistics and of health research and policy at Stanford University. Hastie’s lectures included “Sparse Linear Models” [PDF], “Matrix Completion and Large Scale SVD Computation” [PDF], and “Graphical Model Selection” [PDF].