Deep learning methods for identifying nucleotide modifications from third-generation sequencing datasets

Project Details

Project Lead
Sarath Chandra Janga 
Project Manager
Sarath Chandra Janga 
Project Members
Sasank Vemuri, Raja Shekar Varma Kadumuri  
Institution
IUPUI, Department of Biohealth Informatics, IUPUI School of Informatics and Computing  
Discipline
Computer Science (401) 
Subdiscipline
26.0402 Molecular Biology 

Abstract

Innovations in epitranscriptomics have resulted in the identification of more than 160 RNA modifications to date. These developments, together with the recent discovery of writers, readers, and erasers of modifications occurring across a wide range of RNAs and tissue types, have led to a surge in integrative approaches for transcriptome-wide mapping of modifications and protein–RNA interaction profiles of epitranscriptome players. RNA modification maps and the crosstalk between them have begun to elucidate the role of modifications as signaling switches, supporting the notion of an epitranscriptomic code as a driver of the post-transcriptional fate of RNA. Emerging single-molecule sequencing technologies and the development of antibodies specific to various RNA modifications could enable charting of transcript-specific epitranscriptomic marks across cell types and their alterations in disease. My lab is developing computational methods to predict RNA modifications from third-generation sequencing data, and GPU resources are requested to support the development of these algorithms.

Intellectual Merit

This project would enable the development of methods for investigating the emerging concept of an epitranscriptomic code, which proposes that modifications on RNAs contribute to their spatiotemporal regulation in the cell. It would also provide a platform for uncovering RNA modifications at single-molecule resolution for the first time.

Broader Impacts

This project will provide training opportunities in deep learning methods for trainees ranging from undergraduate students to postdoctoral scientists. In addition, raw datasets from state-of-the-art sequencing technologies will be provided as part of the training. Tools and databases will be made publicly available.

Scale of Use

I plan to run a set of comparisons on two K80 or P100 GPUs. Each comparison will need about 5 days per month over the next 5 months.
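
As a rough illustration of the request above, the per-comparison GPU time can be estimated as follows. This is a minimal sketch that assumes the GPUs run continuously (24 hours) on each day of use, which the request does not state. The function name gpu_hours and its parameters are hypothetical, and the number of comparisons is left out because it is not given.

```python
def gpu_hours(gpus: int, days_per_month: int, months: int,
              hours_per_day: int = 24) -> int:
    """Estimate GPU-hours for one comparison.

    hours_per_day=24 is an assumption (continuous use on each day);
    it is not stated in the request.
    """
    return gpus * days_per_month * months * hours_per_day


# Figures taken from the request: 2 GPUs, 5 days per month, 5 months.
per_comparison = gpu_hours(gpus=2, days_per_month=5, months=5)
print(f"Estimated GPU-hours per comparison: {per_comparison}")
```

Under these assumptions, one comparison comes to 2 x 5 x 5 = 50 GPU-days, or 1200 GPU-hours. The total for the project would scale with the number of comparisons.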