Research Stories

Integrating Multidimensional Data for Clustering Analysis with Applications to Cancer Patient Data

Professor Seyoung Park of the Department of Statistics recently proposed a novel clustering method that integrates multidimensional data.

Statistics
Prof. PARK, SEYOUNG


Professor Seyoung Park of the Department of Statistics recently proposed a novel clustering method that integrates multidimensional data. Advances in high-throughput genomic technologies, coupled with large-scale study projects, have generated rich and diverse omics data resources for better understanding disease etiology and treatment responses. Clustering patients into subtypes with similar disease etiologies and/or treatment responses using multiple omics data types has the potential to yield more precise clusters than using a single omics data type. In practice, however, patient clustering is still mostly based on a single omics data type or on ad hoc integration of the clustering results from each data type, which can lose information.

By treating each omics data type as a different informative representation of the patients, this research proposes a novel multi-view spectral clustering framework that integrates different omics data types measured on the same subjects. The proposed method learns the weight of each data type, as well as a similarity measure between patients, via a non-convex optimization framework. When applied to data from The Cancer Genome Atlas (TCGA), the patient clusters inferred by the proposed method show more significant differences in survival times between clusters than the clusters inferred by existing clustering methods.
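As a rough illustration of the multi-view idea only (this is not the published algorithm), the Python sketch below builds one patient-by-patient similarity matrix per data type, combines them with per-view weights, and applies standard spectral clustering to the combined matrix. Several things here are assumptions made for the sketch: the Gaussian kernel, the median bandwidth heuristic, and the function names `gaussian_affinity`, `spectral_embedding`, `kmeans`, and `multiview_spectral_clustering` are all hypothetical, and the weights are supplied by the caller (uniform by default) rather than learned through the non-convex optimization described above.

```python
import numpy as np


def gaussian_affinity(X, sigma=None):
    """Gaussian-kernel similarity between the rows (patients) of one data view."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    if sigma is None:
        # Median heuristic for the bandwidth (an assumption of this sketch).
        sigma = np.sqrt(np.median(d2[d2 > 0]))
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W


def spectral_embedding(W, k):
    """Row-normalized top-k eigenvectors of D^{-1/2} W D^{-1/2}."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(S)  # eigenvalues come back in ascending order
    U = vecs[:, -k:]
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    return U / np.maximum(norms, 1e-12)


def kmeans(U, k, n_iter=100):
    """Plain Lloyd k-means with deterministic farthest-point initialization."""
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(d2, axis=1)
        new_centers = np.array([
            U[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def multiview_spectral_clustering(views, k, weights=None):
    """Cluster patients from several views using a weighted sum of affinities.

    `views` is a list of (n_patients, n_features) arrays with matching row order.
    `weights` are fixed here; the published method learns them from the data.
    """
    if weights is None:
        weights = np.full(len(views), 1.0 / len(views))
    W = sum(w * gaussian_affinity(X) for w, X in zip(weights, views))
    return kmeans(spectral_embedding(W, k), k)
```

Calling `multiview_spectral_clustering([expression, methylation], k=3)` on two hypothetical arrays with one row per patient would return one cluster label per patient. Raising the weight of a view makes its similarity structure dominate the combined matrix, which is the quantity the proposed method estimates from the data rather than fixing in advance.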

Professor Park said, “The main contribution of this research is to conduct clustering analysis using multiple high-dimensional data types by considering the heterogeneity of the different data types and learning the importance of each. We expect to apply the same idea to other statistical frameworks that use multiple high-dimensional data types.”

This research was published in the Journal of the American Statistical Association, one of the top journals in statistics.

※ Title: Integrating multidimensional data for clustering analysis with applications to cancer patient data

※ Source: https://doi.org/10.1080/01621459.2020.1730853

COPYRIGHT ⓒ 2017 SUNGKYUNKWAN UNIVERSITY ALL RIGHTS RESERVED.