Comparison of proteogenomic strategies for the generation of zebrafish fit-for-purpose protein databases
ABSTRACT: High-quality protein databases (DBs) are essential for optimal analysis of mass spectrometry (MS)-based proteomic data, since protein identifications rely entirely on the protein sequences present in these DBs. Generating custom protein DBs using proteogenomics is an effective way to cope with the absence, incompleteness, or inaccuracy of public resources, as well as with the specificities of the samples under study. In this work, we implemented a proteogenomic pipeline to build protein DBs for zebrafish using both short- and long-read RNA sequencing (RNA-Seq and Iso-Seq, respectively). We evaluated the impact of these sequencing technologies on the size and quality of the resulting DBs, as well as their influence on protein identification using MS-based proteomics. Specific protein DBs were produced for different zebrafish samples and tissues, i.e., larva, larval tail, muscle, brain, and liver. These DBs were compared to assess the relevance of using sample-specific protein DBs for proteomic analysis, and we determined that the current long-read Iso-Seq approach was more appropriate for this goal. Different DB curation strategies were evaluated to clean the DBs and reduce their size. Curation resulted in increased numbers of protein identifications. In summary, our study provides relevant observations and methodological recommendations for the generation of protein DBs using proteogenomics and their application to the proteomic analysis of zebrafish and other species with gaps in DB annotation.
ORGANISM(S): Danio rerio
PROVIDER: GSE217623 | GEO | 2025/11/04
REPOSITORIES: GEO