Spring 2021 Colloquia
All colloquia this semester will be held virtually, via Zoom, and are tentatively scheduled for 11:00am - 12:30pm on Fridays. We will also try to schedule meetings with each speaker for both faculty and students.
- 03/05/21: Xiang Zhou (Biostatistics, University of Michigan)
- 03/12/21: Zhaoran Wang (Industrial Engineering & Management Sciences, Northwestern University)
- 03/19/21: Chiung-Yu Huang (Biostatistics, UCSF)
- 04/02/21: Li Hsu (Public Health Sciences Division, Fred Hutchinson Cancer Research Center)
- 04/09/21: Wenbin Lu (Statistics, NCSU)
- 04/16/21: Qing Lu (Biostatistics, University of Florida)
Title: Statistical Analysis of Spatial Expression Pattern for Spatially Resolved Transcriptomic Studies
Abstract: Identifying genes that display spatial expression patterns in spatially resolved transcriptomic studies is an important first step towards characterizing the spatial transcriptomic landscape of complex tissues. Here, we developed a statistical method, SPARK, for identifying such spatially expressed genes in data generated from various spatially resolved transcriptomic techniques. SPARK directly models spatial count data through generalized linear spatial models. It relies on newly developed statistical formulas for hypothesis testing, providing effective type I error control and yielding high statistical power. With a computationally efficient algorithm based on penalized quasi-likelihood, SPARK is also scalable to data sets with tens of thousands of genes measured on tens of thousands of samples. In four published spatially resolved transcriptomic data sets, we show that SPARK can be up to ten times more powerful than existing methods, revealing new biology in the data that existing approaches would otherwise miss.
Title: The Scale Transformed Power Prior for Use with Historical Data from a Different Outcome Model
Abstract: We develop the scale transformed power prior for settings where historical and current data involve different data types, such as binary and continuous data, respectively. This situation arises often in clinical trials, for example, when historical data involve binary responses and the current data involve time-to-event or some other type of continuous or discrete outcome. The power prior proposed by Ibrahim and Chen (2000) does not address the issue of different data types. Herein, we develop a new type of power prior, which we call the scale transformed power prior (straPP). The straPP is constructed by transforming the power prior for the historical data by rescaling the parameter using a function of the Fisher information matrices for the historical and current data models, thereby shifting the scale of the parameter vector from that of the historical to that of the current data. Examples are presented to motivate the need for a scale transformation, and simulation studies are presented to illustrate the performance advantages of the straPP over the power prior and other informative and non-informative priors. A real dataset from a clinical trial undertaken to study a novel transitional care model for stroke survivors is used to illustrate the methodology.
Title: Statistical Learning for High-dimensional Tensor Data
Abstract: The analysis of tensor data has recently become an active research topic in statistics and data science. Many high-order datasets arising from a wide range of modern applications, such as genomics, material science, and neuroimaging analysis, require modeling with high-dimensional tensors. In addition, tensor methods provide unique perspectives and solutions to many high-dimensional problems where the observations are not necessarily tensors. High-dimensional tensor problems generally possess distinct characteristics that pose unprecedented challenges to the statistical community. There is a clear need to develop novel methods, algorithms, and theory to analyze high-dimensional tensor data.
In this talk, we discuss some recent advances in high-dimensional tensor data analysis through several fundamental topics and their applications in microscopy imaging and neuroimaging. We will also illustrate how we develop new statistically optimal methods, computationally efficient algorithms, and fundamental theories that exploit information from high-dimensional tensor data based on the modern theory of computation, non-convex optimization, applied linear algebra, and high-dimensional statistics.
Title: 2dFDR: A Two-Dimensional False Discovery Rate Control for Powerful Confounder Adjustment in Omics Association Studies
Abstract: One problem that plagues omics association studies is the loss of statistical power when adjusting for confounders and multiple testing. While there is a vast literature on multiple testing methodologies, methods that simultaneously take into account confounders and multiple testing are lacking. To fill this methodological gap, we develop a linear model-based two-dimensional false discovery rate control procedure (2dFDR) for powerful confounder adjustment under multiple testing. Through extensive simulation studies and a large-scale evaluation on real data, we demonstrate that 2dFDR is substantially more powerful than the traditional procedure while controlling for false positives. In the presence of strong confounding and weak signals, the power improvement can exceed 100%.
Title: Cultivating a Career as a Statistical Collaborator in the Pharmaceutical Industry
Abstract: A major element of professional success is to cultivate a culture of collaboration within one’s work and organization. The successful 21st-century statistician needs to develop and refine first-rate quantitative skills through dedication, habitual study, and regular practice. This presentation provides a historical perspective on the eclectic role and responsibilities of statisticians, specifically in the pharmaceutical industry. Recommendations are given on how statisticians there can be effective collaborators. Topics covered include the importance of finding a mentor, being open to and aware of professional opportunities, and developing a tolerance for change.
Title: Bidimensional Linked Matrix Decomposition for Pan-Omics Pan-Cancer Analysis
Abstract: Several recent methods address the integrative dimension reduction and decomposition of linked high‐content data matrices. Typically, these methods consider one dimension, rows or columns, that is shared among the matrices. This shared dimension may represent common features measured for different sample sets (horizontal integration) or a common sample set with features from different platforms (vertical integration). This is limiting for data that take the form of bidimensionally linked matrices, e.g., multiple molecular omics platforms measured for multiple sample cohorts, which are increasingly common in biomedical studies. We propose a flexible approach to the simultaneous factorization and decomposition of variation across bidimensionally linked matrices, BIDIFAC+. This decomposes variation into a series of low-rank components that may be shared across any number of row sets (e.g., omics platforms) or column sets (e.g., sample cohorts). Our objective function extends nuclear norm penalization, is motivated by random matrix theory, and can be shown to give the mode of a Bayesian posterior distribution. We apply the method to pan-omics pan-cancer data from The Cancer Genome Atlas (TCGA), integrating data from 4 different omics platforms and 29 different cancer types.
Title: Data Integration Via Analysis of Subspaces (DIVAS)
Abstract: A major challenge in the age of Big Data is the integration of disparate data types into a data analysis. That challenge is tackled here in the context of data blocks measured on a common set of experimental cases. This data structure motivates the simultaneous exploration of the joint and individual variation within each data block. DIVAS improves on earlier methods by using a novel random direction approach to statistical inference and by treating partially shared blocks. Its usefulness is illustrated using mortality, cancer, and neuroimaging data sets.