Sanjeena Dang.
With the increasing size and complexity of biological datasets, there is a growing need for statistical models that can handle large, complex data. Modelling such datasets requires flexible statistical models that can capture the various characteristics of the data and the underlying data-generating mechanisms. My research has coalesced around developing efficient and scalable statistical models for clustering various types of biological data. Cluster analysis is an unsupervised approach (no group labels are known a priori for any observation) and is therefore more challenging than a supervised classification approach, in which some group labels for observations are known a priori. Model-based clustering identifies and learns underlying groups using flexible statistical models in data where no group information is known. In this talk, I will give an overview of cluster analysis, present statistical models my research group has developed for various data types, and discuss some challenges and what lies ahead.
Keywords: large data, flexible statistical models, data-generating mechanisms, cluster analysis, unsupervised approach
Author: Sanjeena Dang (Subedi), PhD
Canada Research Chair in Data Science and Analytics
Associate Professor, School of Mathematics and Statistics, Carleton University, Canada