Dimension Reduction for Big Data Analysis

Department of Statistics Seminar

Dan Shen, Assistant Professor, Department of Mathematics and Statistics, University of South Florida
October 2, 2015 - 2:00pm

Abstract

High dimensionality has become a common feature of “big data” encountered in many diverse fields, such as imaging and genetic analysis, and it poses new challenges for statistical analysis. To cope with high dimensionality, dimension reduction is necessary.

I first introduce Multiscale Weighted PCA (MWPCA), a new variation of PCA for imaging analysis. MWPCA introduces two sets of novel weights, global and local spatial weights, which enable selective treatment of individual features and incorporate both class label information and the spatial pattern within imaging data. Simulation studies and real data analysis show that MWPCA outperforms several competing PCA methods.
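
The abstract does not spell out the MWPCA weighting scheme, so the following is only a minimal sketch of the general idea it builds on: PCA in which each feature is rescaled by a nonnegative weight before the decomposition. The function name `feature_weighted_pca` and the single per-feature weight vector are illustrative assumptions; MWPCA's global and local spatial weights are more elaborate and are not reproduced here.

```python
import numpy as np

def feature_weighted_pca(X, weights, k):
    """Illustrative feature-weighted PCA (a simplification, NOT the MWPCA algorithm).

    X       : (n_samples, n_features) data matrix
    weights : (n_features,) nonnegative per-feature weights (hypothetical input)
    k       : number of principal components to keep
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(weights, dtype=float)
    # Center each feature so the components describe variation about the mean.
    Xc = X - X.mean(axis=0)
    # Rescale each feature by its weight: a larger weight lets that feature
    # contribute more to the variance PCA maximizes; a zero weight removes it.
    Xw = Xc * w
    # Right singular vectors of the weighted, centered data are the principal
    # directions; squared singular values give the explained variation.
    _, s, Vt = np.linalg.svd(Xw, full_matrices=False)
    components = Vt[:k]                      # (k, n_features), orthonormal rows
    scores = Xw @ components.T               # (n_samples, k) projections
    explained_var = s[:k] ** 2 / (X.shape[0] - 1)
    return components, scores, explained_var

# Usage: zero weights on the last two features exclude them from the components.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
comps, scores, var = feature_weighted_pca(X, [1, 1, 1, 0, 0], k=2)
print(np.round(comps, 3))
```

With all weights equal to one, this reduces to ordinary PCA on the centered data; choosing the weights is where a method such as MWPCA adds its structure.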

Second, we develop statistical methods for analyzing tree-structured data objects. This work is motivated by the statistical challenges of analyzing a set of blood artery trees from a study of Magnetic Resonance Angiography (MRA) brain images of 98 human subjects. We develop an entirely new approach that uses the Dyck path representation, which builds a bridge between tree space (a non-Euclidean space) and curve space (a standard Euclidean space). This bridge makes it possible to exploit the power of functional data analysis to explore statistical properties of tree data sets.
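
As a rough illustration of how a tree becomes a curve, the sketch below computes the classical Dyck path (contour) of an ordered rooted tree: a depth-first walk records a step of +1 for each move down an edge and -1 for each move back up, giving a sequence of heights that starts and ends at 0 and never goes negative. The nested-list tree encoding and the name `dyck_path` are hypothetical; the sketch treats every edge as unit length and assumes the children of each node are already ordered, so it omits any branch-length or ordering conventions the talk's method may use.

```python
from typing import List

# Hypothetical encoding: a node is a list of its child nodes; a leaf is [].
Tree = list

def dyck_path(tree: Tree) -> List[int]:
    """Return the heights visited by a depth-first walk around the tree."""
    heights = [0]  # the walk starts at the root, height 0

    def walk(node: Tree, depth: int) -> None:
        for child in node:
            heights.append(depth + 1)  # move down the edge to the child: +1
            walk(child, depth + 1)
            heights.append(depth)      # move back up to the parent: -1

    walk(tree, 0)
    return heights

# Usage: a root with two leaf children.
print(dyck_path([[], []]))
```

For the root with two leaf children, the path is 0, 1, 0, 1, 0. A tree with m edges yields 2m + 1 heights, and reading the heights as a function of time turns each tree into a curve that functional data analysis tools can work with.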

NOTE: Coffee and cookies in the Lounge, Room 2726, after the talk.