ABSTRACT: Background
The detection of physiologically relevant protein isoforms encoded by the human genome is critical to biomedicine. Mass spectrometry (MS)-based proteomics is the preeminent method for protein detection, but isoform-resolved proteomic analysis relies on accurate reference databases that match the sample; neither a subset nor a superset database is ideal. Long-read RNA sequencing (e.g., PacBio or Oxford Nanopore) provides full-length transcripts which can be used to predict full-length protein isoforms.
Results
We describe here a long-read proteogenomics approach for integrating sample-matched long-read RNA-seq and MS-based proteomics data to enhance isoform characterization. We introduce a classification scheme for protein isoforms, discover novel protein isoforms, and present the first protein inference algorithm for the direct incorporation of long-read transcriptome data to enable detection of protein isoforms previously intractable to MS-based detection. We have released an open-source Nextflow pipeline that integrates long-read sequencing in a proteomic workflow for isoform-resolved analysis.
Conclusions
Our work suggests that the incorporation of long-read sequencing and proteomic data can facilitate improved characterization of human protein isoform diversity. Our first-generation pipeline provides a strong foundation for future development of long-read proteogenomics and its adoption for both basic and translational research.
SUBMITTER: Miller RM
PROVIDER: S-EPMC8892804 | biostudies-literature | 2022 Mar
REPOSITORIES: biostudies-literature
Miller Rachel M, Jordan Ben T, Mehlferber Madison M, Jeffery Erin D, Chatzipantsiou Christina, Kaur Simi, Millikin Robert J, Dai Yunxiang, Tiberi Simone, Castaldi Peter J, Shortreed Michael R, Luckey Chance John, Conesa Ana, Smith Lloyd M, Deslattes Mays Anne, Sheynkman Gloria M
Genome Biology, 2022-03-03, issue 1