Research Stories

AI-based tools developed for identifying virus-induced phosphorylation sites and 2OM sites in human RNA

- Identification of virus-induced phosphorylation sites using a meta-learning approach
- Identification of 2’-O-methylation (2OM) sites in human RNA using a hybrid deep learning framework

Department of Integrative Biotechnology
Prof. Balachandran Manavalan
Nhat Truong Pham (PhD student)



The CBBL team, led by Prof. Balachandran Manavalan at the Department of Integrative Biotechnology, developed two AI-based tools for identifying virus-induced phosphorylation sites and 2OM sites in human RNA. Both were published in the top-tier journal Briefings in Bioinformatics (Impact factor 9.5 & JCR = 3). The two methods are as follows:


1. Identification of virus-induced phosphorylation sites using a meta-learning approach


The global spread of the coronavirus (SARS-CoV-2) has posed significant challenges to global health. Phosphorylation is a common post-translational modification that affects many vital cellular functions and is closely associated with SARS-CoV-2 infection. Accurate identification of phosphorylation sites could provide deeper insight into the processes underlying SARS-CoV-2 infection and help alleviate the continuing COVID-19 crisis. However, currently available computational methods for predicting these sites lack accuracy and effectiveness.

My team, in collaboration with departmental colleagues including Prof. Young-Jun Jeon, Prof. Minkyung Song, and Prof. Sukchan Lee, developed the novel MeL-STPhos predictor using a meta-learning approach (Figure 1). Specifically, my two PhD students, Nhat Truong Pham and Le Thi Phan, created two cell-specific datasets and a generic one using data from Nature and Cell publications. For each dataset, we built a large pool of baseline models (~400) by exploring combinations of 29 feature descriptors and 14 different classifiers. The top-performing model for each descriptor was then selected, and these models were combined and re-trained to make the final prediction. Interestingly, the MeL-STPhos generic model can identify phosphorylation caused by other viruses, not only SARS-CoV-2. Additionally, one cell-specific model accurately detects threonine phosphorylation sites, which shows why multiple models are needed. MeL-STPhos significantly outperformed the best existing predictor on both datasets. We attribute this improvement to the systematic exploration of different feature descriptors and classifiers, combined with the meta-learning approach.

Figure 1. Overview of the MeL-STPhos framework. The computational framework includes four steps: dataset preparation, meta-learning approach, identification of optimal features and classifiers, and web server development.
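To make the meta-learning idea concrete, the following is a minimal sketch in plain Python, not the MeL-STPhos implementation. It assumes two illustrative descriptors (`composition`, `local_onehot`), two simple classifiers (`Logistic`, `Centroid`), and a made-up toy rule for positive windows (central S/T followed by P). All of these names and the data are hypothetical. For each descriptor, the best classifier is picked by out-of-fold accuracy, and its out-of-fold probabilities become the inputs of a re-trained meta-model.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Descriptor A (illustrative): fraction of each amino acid in the window."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]

def local_onehot(seq, flank=2):
    """Descriptor B (illustrative): one-hot encoding around the central site."""
    mid = len(seq) // 2
    return [float(r == a) for r in seq[mid - flank:mid + flank + 1] for a in AMINO_ACIDS]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))

class Logistic:
    """Logistic regression trained with plain stochastic gradient descent."""
    def __init__(self, lr=0.5, epochs=50):
        self.lr, self.epochs = lr, epochs
    def fit(self, X, y):
        self.w, self.b = [0.0] * len(X[0]), 0.0
        for _ in range(self.epochs):
            for x, t in zip(X, y):
                g = self.predict_proba([x])[0] - t
                self.w = [w - self.lr * g * xi for w, xi in zip(self.w, x)]
                self.b -= self.lr * g
        return self
    def predict_proba(self, X):
        return [sigmoid(self.b + sum(w * xi for w, xi in zip(self.w, x))) for x in X]

class Centroid:
    """Nearest-centroid scorer: compares squared distances to the two class means."""
    def fit(self, X, y):
        self.mu = [[sum(col) / len(col) for col in zip(*[x for x, t in zip(X, y) if t == c])]
                   for c in (0, 1)]
        return self
    def predict_proba(self, X):
        d2 = lambda x, m: sum((a - b) ** 2 for a, b in zip(x, m))
        return [sigmoid(d2(x, self.mu[0]) - d2(x, self.mu[1])) for x in X]

def out_of_fold(make_model, X, y, k=5):
    """Score every sample with a model that did not see it during training."""
    oof = [0.0] * len(X)
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        held = [i for i in range(len(X)) if i % k == fold]
        model = make_model().fit([X[i] for i in train], [y[i] for i in train])
        for i, p in zip(held, model.predict_proba([X[i] for i in held])):
            oof[i] = p
    return oof

def accuracy(probs, y):
    return sum((p >= 0.5) == (t == 1) for p, t in zip(probs, y)) / len(y)

def train_meta(seqs, y, descriptors, classifiers):
    """Keep the best classifier per descriptor, then re-train on their outputs."""
    chosen, columns = {}, []
    for dname, encode in descriptors.items():
        X = [encode(s) for s in seqs]
        runs = [(name, out_of_fold(make, X, y)) for name, make in classifiers.items()]
        best_name, best_oof = max(runs, key=lambda r: accuracy(r[1], y))
        chosen[dname] = best_name
        columns.append(best_oof)
    Z = [list(row) for row in zip(*columns)]  # one meta-feature per descriptor
    return chosen, Z, Logistic().fit(Z, y)

rng = random.Random(0)

def toy_window(positive):
    """Made-up 9-residue window; the S/T-then-P rule is a toy assumption."""
    res = [rng.choice(AMINO_ACIDS) for _ in range(9)]
    res[4] = rng.choice("ST")
    res[5] = "P" if positive else rng.choice(AMINO_ACIDS.replace("P", ""))
    return "".join(res)

labels = [i % 2 for i in range(100)]
windows = [toy_window(t == 1) for t in labels]
descriptors = {"composition": composition, "local_onehot": local_onehot}
classifiers = {"logistic": Logistic, "centroid": Centroid}
chosen, Z, meta = train_meta(windows, labels, descriptors, classifiers)
# Accuracy on the training meta-features only; a real study needs an independent test set.
meta_accuracy = accuracy(meta.predict_proba(Z), labels)
print(chosen, round(meta_accuracy, 2))
```

Using out-of-fold probabilities as meta-features is a common way to keep the second-stage model from simply memorising first-stage training fits. Whether MeL-STPhos uses exactly this scheme is not stated here.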



This research was supported by the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (HI23C0701), and by the National Research Foundation of Korea (NRF), funded by the Ministry of Science and ICT (2021R1A2C1014338, RS-2023-00217881 and 2021R1C1C1007833). The results were published in Briefings in Bioinformatics (https://doi.org/10.1093/bib/bbad433) on December 06, 2023.


2. Identification of 2’-O-methylation sites in human RNA using a hybrid deep learning framework


2’-O-methylation (2OM) is the most common post-transcriptional modification of RNA and plays a crucial role in RNA splicing, RNA stability, and innate immunity. Despite advances in high-throughput detection, the chemical stability of 2OM makes it difficult to detect and map in messenger RNA. Although a few bioinformatics tools have advanced this area, their accuracy still leaves room for improvement. My PhD student, Nhat Truong Pham, developed H2Opred, a novel hybrid deep learning approach (Figure 2) for accurately identifying 2OM sites. H2Opred incorporates both stacked 1D convolutional neural network (1D-CNN) blocks and stacked attention-based bidirectional gated recurrent unit (Bi-GRU-Att) blocks. The 1D-CNN blocks learn feature representations from 14 conventional descriptors, while the Bi-GRU-Att blocks learn feature representations from five natural language processing-based embeddings extracted from RNA sequences. H2Opred integrates these feature representations to make the final prediction. Moreover, the generic H2Opred model performed strongly on both the training and testing datasets, significantly outperforming the existing predictor and the four nucleotide-specific H2Opred models.



Figure 2. Overview of the H2Opred framework. The computational framework includes three steps: dataset preparation, feature extraction and model construction, and feature fusion and web server development.
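The two-branch design can be illustrated with a small forward-pass sketch in plain Python. This is not the H2Opred code. It assumes one convolution block and one Bi-GRU block (rather than stacked ones), a one-hot encoding standing in for the conventional descriptors, and a 3-mer embedding table standing in for the NLP-based embeddings. The weights are random and untrained, and all sizes and names (`FILTERS`, `HIDDEN`, `predict`, and so on) are hypothetical. The example only shows how the two feature representations are computed and fused.

```python
import math
import random

rng = random.Random(0)
BASES = "ACGU"
FILTERS, WIDTH, K, EMB, HIDDEN = 4, 5, 3, 6, 3  # toy sizes, chosen only for illustration

def rand_matrix(rows, cols):
    """Random (untrained) weights; a real model would learn these from data."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def one_hot(seq):
    """Conventional-style descriptor: one 4-dimensional indicator per nucleotide."""
    return [[float(base == b) for b in BASES] for base in seq]

def kmer_tokens(seq, k=K):
    """NLP-style view: overlapping k-mers as integer 'word' ids for an embedding table."""
    return [sum(BASES.index(b) * 4 ** i for i, b in enumerate(seq[j:j + k]))
            for j in range(len(seq) - k + 1)]

def cnn_branch(X, kernels):
    """1D convolution, ReLU, then global max pooling: one value per filter."""
    windows = [sum(X[i:i + WIDTH], []) for i in range(len(X) - WIDTH + 1)]
    return [max(max(0.0, sum(w * x for w, x in zip(kern, win))) for win in windows)
            for kern in kernels]

def gru_pass(inputs, P):
    """One GRU direction (biases omitted); returns the hidden state at each position."""
    h, states = [0.0] * HIDDEN, []
    for x in inputs:
        z = [sigmoid(a) for a in matvec(P["z"], x + h)]  # update gate
        r = [sigmoid(a) for a in matvec(P["r"], x + h)]  # reset gate
        n = [math.tanh(a) for a in matvec(P["n"], x + [ri * hi for ri, hi in zip(r, h)])]
        h = [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]
        states.append(h)
    return states

def attention_pool(states, v):
    """Softmax-weighted average of the states, scored against the vector v."""
    scores = [sum(a * b for a, b in zip(v, s)) for s in states]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * s[j] for w, s in zip(weights, states)) / total
            for j in range(len(states[0]))]

def bigru_att_branch(tokens, p):
    """Embed tokens, read them in both directions, then pool with attention."""
    E = [p["emb"][t] for t in tokens]
    fwd = gru_pass(E, p["fwd"])
    bwd = gru_pass(E[::-1], p["bwd"])[::-1]
    return attention_pool([f + b for f, b in zip(fwd, bwd)], p["att"])

def gru_params():
    return {gate: rand_matrix(HIDDEN, EMB + HIDDEN) for gate in "zrn"}

params = {
    "kernels": rand_matrix(FILTERS, WIDTH * 4),
    "emb": rand_matrix(4 ** K, EMB),
    "fwd": gru_params(),
    "bwd": gru_params(),
    "att": rand_matrix(1, 2 * HIDDEN)[0],
    "out": rand_matrix(1, FILTERS + 2 * HIDDEN)[0],
}

def predict(seq):
    """Fuse both branch outputs and map them to a score between 0 and 1."""
    fused = cnn_branch(one_hot(seq), params["kernels"]) + bigru_att_branch(kmer_tokens(seq), params)
    return sigmoid(sum(w * x for w, x in zip(params["out"], fused))), fused

score, fused = predict("GAUCCGUAACGGUUAGCAUGCAUCGGAUUCGA")  # made-up RNA fragment
print(len(fused), round(score, 3))
```

Because the weights are random, the printed score has no biological meaning. The point is the data flow: the convolution branch summarises local patterns, the recurrent branch with attention summarises sequence context, and the concatenated vector feeds a single output unit.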


This research was supported by the National Research Foundation of Korea (NRF), funded by the Ministry of Science and ICT (2021R1A2C1014338 and 2021R1I1A1A01056363), and by the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (HI23C0701). The results were published online in Briefings in Bioinformatics (https://doi.org/10.1093/bib/bbad476) on January 04, 2024.


These approaches are not limited to identifying 2OM or phosphorylation sites. They can also be applied to other research areas, including the identification of peptide therapeutic functions and Alzheimer's disease (AD) prediction using gene expression data.


