Colloquia | Department of Statistics

Spring 2023 Colloquia

Upcoming Colloquia:

Friday, February 10: Lijia Wang (University of Southern California) - 10:00 a.m. in 214 Duxbury Hall
Friday, February 24: Catherine Calder (University of Texas at Austin) - 11:00 a.m. via Zoom
Friday, March 10: Yi Li (University of Michigan) - 10:00 a.m. in 214 Duxbury Hall
Friday, March 24: Heping Zhang (Yale University) - 10:00 a.m. in 214 Duxbury Hall
Friday, March 31: Linglong Kong (University of Alberta) - 10:00 a.m. in 214 Duxbury Hall
Friday, April 7: Hua Liang (George Washington University) - 10:00 a.m. in 214 Duxbury Hall
Friday, April 14: Bo Li (University of Illinois Urbana-Champaign) - 10:00 a.m. via Zoom
Friday, April 21: Timothy D. Johnson (University of Michigan) - 10:00 a.m. in 214 Duxbury Hall

Previous Colloquia:

Friday, February 10: Lijia Wang (University of Southern California)

10:00 a.m. in 214 Duxbury Hall

Title: Local Perspectives in Networks: Partial Information in Latent Space Models and Local Community Detection

Abstract: In network analysis, we usually use the entire network to estimate various global properties of the network. However, in real networks, two problems arise: 1) people frequently make decisions based on their local views, and 2) the community of interest is a tiny part of the global network. We introduce partial information in latent space network models to address the first problem. Latent space models are powerful tools for network data modeling. Prior research commonly assumed that the entire network or a random fraction (noisy version) of the network could be observed. As individuals’ understanding of the global structure of social networks is limited to their local viewpoint, we consider the existing individual-centered partial information framework that uses knowledge depth to characterize an individual’s partial knowledge of the network. For the partial adjacency matrix arising from the framework, we propose a projected gradient descent method to approximate the parameters in the latent space models. We establish the statistical rates of convergence and further analyze the influence of neighborhood structure on an individual’s learning rate for the global latent positions. For the second problem, we introduce the local clustering technique involving adjusted personalized PageRank with conductance for community size selection and provide theoretical guarantees under the degree-corrected stochastic block model. By applying the procedure to the statistical citation network data, we can identify the most relevant community in statistics, given an external research topic.

Wednesday, February 8: Samuel Baugh (University of California, Los Angeles)

10:00 a.m. in 214 Duxbury Hall

Title: Bayesian Hierarchical Modeling for Inferring the Causal Relationship Between Human Activities and Climate Change Impacts

Abstract: While the impacts of heat waves, droughts, and floods have been increasing along with rising greenhouse gas concentrations, the complex structure of natural variability in the climate system makes it challenging to precisely quantify the extent to which human activities are responsible for observed changes. The statistical methods used by high-profile scientific bodies to address this connection have been observed in recent findings to underestimate the magnitude of variability, resulting in potentially misleading over-confidence. To address this issue, I propose a physically-informed basis function parameterization of the covariance structure within a regularized Bayesian selection method to avoid over-fitting the limited amount of data and to propagate the estimation uncertainty to the final inference. When evaluated on statistically and dynamically simulated data, this method achieves lower RMSE scores and better-calibrated posterior coverage rates than methods that rely on potentially uncertain principal components. Incorporating the physically-informed basis representation into a mixture model allows for the error in the dynamical climate simulations informing the natural variability component to be assessed and accounted for in the inference procedure. Motivated by the need for policymakers and the public at large to understand the extent of human responsibility for climate impacts at specific locations, ongoing funded work aims to leverage the global covariance structure to provide robust quantification of causal connections at fine spatial scales. Longer-term extensions include the use of deep learning techniques to understand more complex distributions and non-linear causal relationships within a Bayesian framework.

Monday, February 6: Biao Cai (University of Miami)

10:00 a.m. in Chemical Sciences Laboratories Auditorium (CSL 1003)

Title: Latent Network Structure Learning from High Dimensional Multivariate Point Processes

Abstract: Learning the latent network structure from large scale multivariate point process data is an important task in a wide range of scientific and business applications. For instance, we might wish to estimate the neuronal functional connectivity network based on spiking (or firing) times recorded from a collection of neurons. To characterize the complex processes underlying the observed point patterns, we propose a new and flexible class of non-stationary Hawkes processes that allow both excitatory and inhibitory effects. We estimate the latent network structure using a scalable sparse least squares estimation approach. Using a novel thinning representation, we establish concentration inequalities for the first and second order statistics of the proposed Hawkes process. Such theoretical results enable us to establish the non-asymptotic error bound and the selection consistency of the estimated parameters. Furthermore, we describe a penalized least squares based statistic for testing if the background intensity is constant in time. We apply our proposed method to a neurophysiological dataset that studies working memory.

Wednesday, February 1: Ying Ma (University of Michigan)

10:00 a.m. in 214 Duxbury Hall

Title: Statistical and Computational Methods for High-Dimensional Genomics Data

Abstract: Spatial transcriptomics technologies have enabled gene expression profiling on complex tissues with spatial localization information. The majority of these technologies, however, effectively measure the average gene expression from a mixture of cells of potentially heterogeneous cell types on each tissue location. Here, I develop a deconvolution method, CARD, that combines cell-type-specific expression information from single-cell RNA sequencing (scRNA-seq) with correlation in cell-type composition across tissue locations. Modeling spatial correlation allows us to borrow the cell-type composition information across locations, improving accuracy of deconvolution even with a mismatched scRNA-seq reference. CARD can also impute cell-type compositions and gene expression levels at unmeasured tissue locations to enable the construction of a refined spatial tissue map with a resolution arbitrarily higher than that measured in the original study and can perform deconvolution without a scRNA-seq reference. In a real data application on the human pancreatic ductal adenocarcinoma (PDAC) dataset, CARD identified multiple cell types and molecular markers with distinct spatial localization that define the progression, heterogeneity, and compartmentalization of pancreatic cancer. In addition, if time allows, I will also discuss my other methodological work on integrative differential expression and gene set enrichment analysis in scRNA-seq studies, integrative reference-informed tissue segmentation in SRT studies, and collaborative work on polygenic risk scores for common health-related exposure traits in the Michigan Genomics Initiative (MGI) cohort.

Monday, January 30: Ying Zhou (University of Toronto)

10:00 a.m. in Chemical Sciences Laboratories Auditorium (CSL 1003)

Title: The Promises of Parallel Outcomes

Abstract: A key challenge in causal inference from observational studies is the identification and estimation of causal effects in the presence of unmeasured confounding. In this talk, I will introduce a novel approach for causal inference that leverages information in multiple outcomes to deal with unmeasured confounding. The key assumption in this approach is conditional independence among multiple outcomes. In contrast to existing proposals in the literature, the roles of multiple outcomes in the key identification assumption are symmetric, hence the name parallel outcomes. I will show nonparametric identifiability with at least three parallel outcomes and provide parametric estimation tools under a set of linear structural equation models. The method is applied to a data set from the Alzheimer's Disease Neuroimaging Initiative to study the causal effects of tau protein level on regional brain atrophies.

Wednesday, January 25: Daiwei (David) Zhang (University of Michigan)

10:00 a.m. in 214 Duxbury Hall

Title: Inference of Causal Networks Using Bi-Directional Mendelian Randomization and Network Deconvolution with GWAS Summary Data

Abstract: Inferring causal relationships among potential risk factors and diseases from observational data is both important and challenging, e.g. due to hidden confounding. Emerging as a powerful tool, Mendelian randomization (MR) has been increasingly applied for causal inference with observational data by using genetic variants as instrumental variables (IVs), thanks to the recent availability of GWAS summary data. However, the current practice of MR has been largely restricted to investigating the total causal effect between two traits, while it would be more useful to infer the direct causal effect between any two of many traits (by accounting for indirect or mediating effects through other traits). In this talk, we will introduce a two-step framework for causal network inference. In the first step, we propose a robust bi-directional Mendelian Randomization method that accommodates overlapping samples in GWAS data to infer the graph of total causal effects. In the second step, we convert the total causal effect to the direct causal effect with a modified network deconvolution method. We will present an application of the proposed method to 17 large-scale GWAS summary datasets to infer the causal networks among 11 common cardiometabolic risk factors, 4 cardiometabolic diseases (coronary artery disease, stroke, type 2 diabetes, atrial fibrillation), Alzheimer's disease and asthma. If time allows, we will also touch on a multivariable MR method for direct causal effect inference.

Monday, January 23: Tian Gu (University of Michigan)

10:00 a.m. in Chemical Sciences Laboratories Auditorium (CSL 1003)

Title: Targeting underrepresented populations in precision medicine: Multi-source data integration via transfer learning

Abstract: The increasing numbers of large-scale biobanks and institutional data networks have brought unique opportunities to link patients’ genomics, electronic health records, and survey data for studying complex human diseases, especially to address the diminished model performance in minority and disadvantaged groups due to their low representation in biomedical studies. In this talk, I will introduce two transfer learning methods to improve statistical learning in underrepresented populations by integrating data from multiple biobanks, different ancestries, and related outcomes. These methods protect data privacy by learning from pre-trained models in external data sources without sharing patient-level data and account for potential data heterogeneity. We provide theoretical guarantees for the model performance and insights regarding when the external model can be helpful to the target model. We demonstrate the superiority of our methods compared to benchmark methods, with examples using data from the UK Biobank and the electronic Medical Records and Genomics (eMERGE) Network.

Friday, January 20: Rishabh Dudeja (Columbia University)

10:00 a.m. in 214 Duxbury Hall

Title: Universality in High-Dimensional Statistics

Abstract: It has been observed that the statistical properties of many high-dimensional regression problems empirically exhibit universality with respect to the underlying design matrices. Specifically, design matrices with very different constructions seem to lead to identical estimation performance if they share the same spectrum and have generic singular vectors. This general universality phenomenon appears in numerous applications: in random optimization problems arising in statistical physics, in statistical inference problems like sparse regression or compressed sensing, and in the performance of sketching algorithms in randomized numerical linear algebra. In the first part of this talk, I will show how these empirical observations of universality can be exploited to design and analyze information-theoretically optimal spectral estimators for the phase retrieval problem: a non-linear regression problem that arises in imaging applications like X-ray crystallography. In the second part of the talk, I will describe recent progress toward a mathematical understanding of this universality phenomenon. In the context of regularized linear regression with strongly convex penalties, I will describe nearly deterministic conditions on the design matrix under which this universality phenomenon occurs. I will show that these conditions can be easily verified for highly structured and practically relevant design matrices constructed with limited randomness, like randomly subsampled Hadamard transforms and signed incoherent tight frames. I will conclude this talk by describing other exciting applications where we can potentially exploit similar universality phenomena to design and analyze improved estimators under assumptions justified by practical considerations rather than mathematical convenience.

Wednesday, January 18: Zhaotong Lin (University of Michigan)

10:00 a.m. in 214 Duxbury Hall

Title: Inference of causal networks using bi-directional Mendelian randomization and network deconvolution with GWAS summary data

Abstract: Inferring causal relationships among potential risk factors and diseases from observational data is both important and challenging, e.g. due to hidden confounding. Emerging as a powerful tool, Mendelian randomization (MR) has been increasingly applied for causal inference with observational data by using genetic variants as instrumental variables (IVs), thanks to the recent availability of GWAS summary data. However, the current practice of MR has been largely restricted to investigating the total causal effect between two traits, while it would be more useful to infer the direct causal effect between any two of many traits (by accounting for indirect or mediating effects through other traits). In this talk, we will introduce a two-step framework for causal network inference. In the first step, we propose a robust bi-directional Mendelian Randomization method that accommodates overlapping samples in GWAS data to infer the graph of total causal effects. In the second step, we convert the total causal effect to the direct causal effect with a modified network deconvolution method. We will present an application of the proposed method to 17 large-scale GWAS summary datasets to infer the causal networks of both total and direct effects among 11 common cardiometabolic risk factors, 4 cardiometabolic diseases (coronary artery disease, stroke, type 2 diabetes, atrial fibrillation), Alzheimer's disease and asthma. If time allows, we will also touch on a multivariable MR method for direct causal effect inference.

Wednesday, January 11: Xiulin Xie (University of Florida)

10:00 a.m. in 214 Duxbury Hall

Title: Transparent Sequential Learning: A Powerful Tool for Monitoring Sequential Processes

Abstract: Sequential process monitoring has received considerable attention due to its broad applications, including the manufacturing industry, spatial-temporal disease surveillance, environmental monitoring, and many more. A major statistical tool for sequentially monitoring a process is the statistical process control (SPC) chart, whose main goal is to check whether the process has a significant distributional shift over time. However, traditional SPC charts were developed mainly for monitoring production lines in the manufacturing industry, under the assumptions that process observations at different observation times are independent and identically distributed with a parametric (e.g., normal) distribution when the process is stable. These assumptions are rarely valid in applications. In this talk, we introduce a new learning framework, called “Transparent Sequential Learning”, for monitoring sequential processes. The new method can properly accommodate the longitudinal pattern of the process under monitoring and serial correlation in the observed data. It is also not limited to parametric distributional families. These properties make it an effective and powerful tool for monitoring sequential processes.

Previous Colloquia

Fall 2022 Colloquia

Spring 2022 Colloquia

Fall 2021 Colloquia

Spring 2021 Colloquia

Fall 2020 Colloquia

Spring 2020 Colloquia

Fall 2019 Colloquia

Spring 2019 Colloquia

Fall 2018 Colloquia

Spring 2018 Colloquia

Fall 2017 Colloquia

Spring 2016 Colloquia Part II

Fall 2016 Colloquia

Spring 2016 Colloquia
