Fall 2020 Colloquia
- 10/16/20: Will Fithian (Statistics, UC Berkeley); Time: 11:00 AM - 12:30 PM via Zoom
- 10/30/20: Nancy Reid (Statistics, University of Toronto): Hollander Lecture; Time: 11:00 AM - 12:30 PM via Zoom
- 11/06/20: Hongyu Zhao (Statistics and Data Science, Biostatistics, and Genetics, Yale University); Time: 11:00 AM - 12:30 PM via Zoom
- 11/13/20: Xihong Lin (Biostatistics, Harvard T.H. Chan School of Public Health); Time: 10:00 AM - 11:30 AM via Zoom
- 11/20/20: Daniel Schaid (Biostatistics, Mayo Clinic); Time: 10:00 AM - 11:30 AM via Zoom
Title: Nonparametric Estimation of Distributions and Diagnostic Accuracy Based on Group-Tested Results with Differential Misclassification
Abstract: This talk concerns the problem of estimating a continuous distribution in a diseased or nondiseased population when only group-based test results on the disease status are available. The problem is challenging in that individual disease statuses are not observed and test results are often subject to misclassification, with the further complication that the misclassification may be differential as the group size and the number of diseased individuals in the group vary. We propose a method to construct a nonparametric estimator of the distribution and obtain its asymptotic properties.
The performance of the distribution estimator is evaluated under various design considerations concerning group sizes and classification errors. The method is exemplified with data from the National Health and Nutrition Examination Survey (NHANES) study to estimate the distribution and diagnostic accuracy of C-reactive protein in blood samples in predicting chlamydia incidence.
Title: Integrative Methods for Biobank-Scale Studies
Abstract: Recent breakthroughs in cost-effective genotyping have allowed the creation of ultra-large biobanks that link genetic data of millions of patients with a multitude of phenotypic measurements (usually curated from electronic health records). The drastic increase in the number of individuals routinely analyzed in genomic studies has enabled novel statistical methods that employ fewer assumptions in estimating key parameters such as heritability explained by genomic variants. I will present methods showcasing how SNP-heritability can be estimated accurately and efficiently, both at genome-wide scale and at particular regions in the genome.
Title: Efficient Integration of EHR and Other Healthcare Datasets
Abstract: The growing availability and variety of healthcare data sources has provided unique opportunities for data integration and evidence synthesis, which can potentially accelerate knowledge discovery and enable better clinical decision making. However, many practical and technical challenges, such as data privacy, high dimensionality, and heterogeneity across different datasets, remain to be addressed. In this talk, I will introduce several methods for effective and efficient integration of electronic health records (EHR) and other healthcare datasets. Specifically, we develop communication-efficient distributed algorithms for jointly analyzing multiple datasets without the need to share patient-level data. Our algorithms do not require iterative communication across sites and are able to account for heterogeneity across different datasets. We provide theoretical guarantees for the performance of our algorithms, and examples of implementing the algorithms in real-world clinical research networks.
Title: PPA: Principal Parcellation Analysis for Human Brain Connectomes of Multiple Human Traits
Abstract: Human brain parcellation plays a fundamental role in neuroimaging. Standard practice parcellates the brain into Regions Of Interest (ROIs) based roughly on anatomical function. However, many different schemes are available involving different numbers and locations of ROIs, and choosing which scheme to use in practice is challenging. We propose a novel tractography-based Principal Parcellation Analysis (PPA), which clusters the fibers' ending points to redefine the parcellation and ultimately predict human traits. Specifically, our PPA eliminates the need to choose ROIs manually, reduces subjectivity, and leads to a substantially different representation of the connectome. We illustrate the proposed approach through applications to HCP data and show that PPA connectomes improve power in predicting a variety of human traits, while dramatically improving parsimony, compared to connectomes based on anatomical parcellation.
Title: Scalable and Consistent Estimation of Random Graph Models With Dependent Edge Variables and Parameter Vectors of Increasing Dimension Using the Pseudolikelihood
Abstract: An important question in statistical network analysis is how to construct models of dependent network data without sacrificing computational scalability and statistical guarantees. In this talk, we demonstrate that scalable estimation of random graph models with dependent edges and parameter vectors of increasing dimension is possible, using maximum pseudolikelihood estimators. On the statistical side, we establish the first consistency results and convergence rates for maximum pseudolikelihood estimators in scenarios where a single observation of dependent random variables is available and the number of parameters increases without bound. The main results make weak assumptions and may be of independent interest. These results help establish the first consistency results and convergence rates for maximum pseudolikelihood estimators of random graph models with dependent edges and parameter vectors of increasing dimension, under weak dependence and smoothness conditions. We showcase consistency results and convergence rates by using generalized β-models with dependent edges and parameter vectors of increasing dimension, in dense- and sparse-graph settings. The talk concludes with a discussion of potential future work and extensions. The primary results presented in this talk assume that the random graph is completely observed. We will discuss how the theoretical developments offer avenues to advance the challenging topic of subgraph-to-graph estimation and inference, which considers estimating a random graph model based only on an observed subgraph.
Title: Brain Connectivity Alteration Detection via Matrix-variate Differential Network Model
Abstract: Brain functional connectivity reveals the synchronization of brain systems through correlations in neurophysiological measures of brain activity. Growing evidence now suggests that the brain connectivity network experiences alterations in the presence of numerous neurological disorders; thus, differential brain network analysis may provide new insights into disease pathologies. The data from neurophysiological measurements are often multi-dimensional and in matrix form, posing a challenge in brain connectivity analysis. Existing graphical model estimation methods either assume a vector normal distribution that in essence requires the columns of the matrix data to be independent, or fail to address the estimation of differential networks across different populations. To tackle these issues, we propose an innovative Matrix-Variate Differential Network (MVDN) model. We exploit the D-trace loss function and a Lasso-type penalty to directly estimate the spatial differential partial correlation matrix, and use an ADMM algorithm for the optimization problem. Theoretical and simulation studies demonstrate that MVDN significantly outperforms other state-of-the-art methods in dynamic differential network analysis. We illustrate with a functional connectivity analysis of an Attention Deficit Hyperactivity Disorder (ADHD) dataset. The hub nodes and differential interaction patterns identified are consistent with existing experimental studies.