Spring 2019 Colloquia
- Dr. Joshua Cape - January 18th at 10am in 214 Duxbury Hall
- Dr. Zhengling Qi - January 25th at 10am in 214 Duxbury Hall
- Dr. Fei Gao - January 29th at 2pm in 214 Duxbury Hall
- Dr. Chao Huang - February 1st at 10am in 214 Duxbury Hall
- Dr. Andres Felipe Barrientos - February 5th at 2pm in 214 Duxbury Hall
- Dr. Jingshu Wang - February 8th at 10am in 214 Duxbury Hall
- Dr. Abhishek Chakrabortty - February 12th at 2pm in 214 Duxbury Hall
- Dr. Hai Shu - February 15th at 10am in 214 Duxbury Hall
- Dr. Taps Maiti - February 22nd at 10am in 214 Duxbury Hall
- Dr. Ian Dryden - March 8th at 10am in 214 Duxbury Hall
- Dr. Eric Chi - March 15th at 10am in 214 Duxbury Hall
- Dr. Ming Yuan - March 29th at 10am in 214 Duxbury Hall
204 Duxbury Hall, 2:00pm
Title: On Statistical Learning for Individualized Decision Making with Complex Data
Abstract: In this talk, I will present my research on individualized decision making with modern complex data. In precision medicine, individualizing the treatment decision rule can capture patients' heterogeneous responses to treatment. In finance, individualizing the investment decision rule can improve an individual's financial well-being. In a ride-sharing company, individualizing the order dispatching strategy can increase revenue and customer satisfaction. With the rapid development of new technology, modern datasets often consist of massive numbers of observations and high-dimensional covariates, and are characterized by some degree of heterogeneity.
The talk is divided into two parts. In the first part, I will focus on data heterogeneity and introduce a new maximin-projection learning method for recommending an overall individualized decision rule, based on observed data from different populations whose optimal individualized decisions are heterogeneous. In the second part, I will briefly summarize the statistical learning methods I have developed for individualized decision making with complex data and discuss my future research directions.
Title: Set-based Inference for Integrative Analysis of Genetic Compendiums
Abstract: The increasing popularity of biobanks and other genetic compendiums has introduced exciting opportunities to extract knowledge using datasets combining information from a variety of genetic, genomic, environmental, and clinical sources. To manage the large number of association tests that may be performed with such data, set-based inference strategies have emerged as a popular alternative to testing individual features. Set-based tests enjoy natural advantages including a decreased multiplicity burden and superior interpretations in certain settings. However, existing methods are often challenged to provide adequate power due to three issues in particular: sparse signals, weak effect sizes, and features exhibiting a diverse variety of correlation structures. Motivated by these challenges, we propose the Generalized Berk-Jones (GBJ) statistic, a set-based association test designed to detect rare and weak signals while explicitly accounting for arbitrary correlation patterns. Consistent with its formulation as a generalization of the Berk-Jones statistic, GBJ demonstrates improved power compared to other set-based tests over a variety of moderately sparse settings. We apply GBJ to perform inference on sets of genotypes and sets of phenotypes, and we also discuss strategies for situations where the global null is not the null hypothesis of interest.
Title: Learning High Dimensional Time Series Data
Abstract: High-dimensional temporally dependent data arise in a wide range of disciplines. Despite their prevalence, methods and theoretical tools for analyzing such data remain poorly investigated. My talk will mainly focus on three problems. The first part addresses prediction for high-dimensional linear processes. Then I will introduce a new framework for high-dimensional non-parametric additive Vector Autoregressive (VAR) models, under which methodology and computationally efficient algorithms are developed. Finally, I will present theoretical tools, namely optimal Bernstein-type inequalities for suprema of empirical processes with dependent data, with which we can also establish a statistical learning theory for dependent data.
Title: Statistical Analysis and Spectral Methods for Signal-Plus-Noise Matrix Models
Abstract: Estimating eigenvectors and principal subspaces is of fundamental importance for numerous problems in statistics, data science, and network analysis, including covariance matrix estimation, principal component analysis, and community detection. For each of these problems, we obtain foundational results that precisely quantify the local (e.g., entrywise) behavior of sample eigenvectors within the context of a unified signal-plus-noise matrix framework. Our methods and results collectively address eigenvector consistency and asymptotic normality, decompositions of high-dimensional matrices, Procrustes analysis, deterministic perturbation bounds, and real-data spectral clustering applications in connectomics.