Proteomics

Dataset Information

Improvements to the rice genome annotation through large-scale analysis of RNA-Seq and proteomics data sets


ABSTRACT: We performed a proteogenomics meta-analysis of nine data sets deposited in ProteomeXchange (PXD000265, PXD000313, PXD000923, PXD001030, PXD001058, PXD002291, PXD002739, PXD002740 and PXD003156) together with 29 RNA-Seq data sets on rice (Oryza sativa). We created a search database comprising translated reads that had been mapped onto the rice genome, as well as officially annotated rice protein sequences. The RNA-Seq database was pre-processed to identify “novel transcripts”, for reads not mapping fully to an existing exon, and “novel junctions”, for reads mapped with a gap, implying a potential novel splice site that was not annotated in the official gene set. Confidently identified “novel peptides”, i.e. those mapping to a novel junction or novel transcript, were post-processed to ensure that there was no better explanation for the corresponding spectra, e.g. a peptide from a canonical gene with a modification or amino acid substitution. Data were exported from the pipeline in PSI mzIdentML 1.2 format, containing chromosomal coordinates, and further converted to PSI proBed format for genome visualisation. Novel peptides were searched against other plant databases using BLAST to see whether they had been predicted in genes from other species. A total of 1584 novel peptides were identified, mapping to ~700 genomic loci in which either new genes have been predicted (~100) or updates to existing gene models have been predicted (~600).
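
The read pre-processing step described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the names MappedRead, classify_read, exons and known_junctions are hypothetical, coordinates are assumed to be half-open intervals on a single chromosome and strand, and checking junctions before exon containment is an assumption made only to give each read a single label.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome (assumed)


@dataclass
class MappedRead:
    # Aligned blocks in genomic order; more than one block means the read
    # was mapped with a gap (a candidate splice junction).
    blocks: List[Interval]


def inside_any(block: Interval, exons: List[Interval]) -> bool:
    """True if the block lies entirely within one annotated exon."""
    return any(start <= block[0] and block[1] <= end for start, end in exons)


def classify_read(read: MappedRead,
                  exons: List[Interval],
                  known_junctions: Set[Interval]) -> str:
    """Label a read as 'novel junction', 'novel transcript' or 'annotated'."""
    # Each gap between consecutive blocks is a putative intron (donor, acceptor).
    gaps = [(read.blocks[i][1], read.blocks[i + 1][0])
            for i in range(len(read.blocks) - 1)]
    # A gap that is not an annotated junction implies a potential new splice site.
    if any(gap not in known_junctions for gap in gaps):
        return "novel junction"
    # A read that does not map fully to existing exons suggests a new transcript.
    if not all(inside_any(block, exons) for block in read.blocks):
        return "novel transcript"
    return "annotated"


# Toy annotation: two exons joined by one known junction.
exons = [(100, 200), (300, 400)]
known_junctions = {(200, 300)}
spliced_known = classify_read(MappedRead([(150, 200), (300, 350)]), exons, known_junctions)
spliced_new = classify_read(MappedRead([(150, 200), (320, 350)]), exons, known_junctions)
overhanging = classify_read(MappedRead([(180, 220)]), exons, known_junctions)
```

In a real analysis the exon and junction sets would come from the official gene annotation, and strand and chromosome would need to be tracked for every interval.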

INSTRUMENT(S): LTQ Orbitrap, LTQ Orbitrap Velos, TripleTOF 5600, Q Exactive

ORGANISM(S): Oryza sativa (rice)

TISSUE(S): Plant Cell, Callus, Flower, Embryo, Seedling, Egg, Semen, Sperm, Seed, Pollen, Endosperm, Meiotic Cell

SUBMITTER: Da Qi  

LAB HEAD: Andrew R Jones

PROVIDER: PXD008960 | Pride | 2018-10-30

REPOSITORIES: Pride

Publications

Improvements to the Rice Genome Annotation Through Large-Scale Analysis of RNA-Seq and Proteomics Data Sets.

Zhe Ren, Da Qi, Nina Pugh, Kai Li, Bo Wen, Ruo Zhou, Shaohang Xu, Siqi Liu, Andrew R Jones

Molecular & Cellular Proteomics : MCP, 2018-10-06, 1


Rice (Oryza sativa) is one of the world's most important crops. The genome has been available for over 10 years and has undergone several rounds of annotation. We created a comprehensive database of transcripts from 29 public RNA sequencing data sets, officially predicted genes from Ensembl Plants, and common contaminants in which to search for protein-level evidence. We re-analyzed nine publicly accessible rice proteomics data sets. In total, we identified 420K peptide spectrum matches ...

Similar Datasets

2014-04-15 | E-GEOD-37242 | biostudies-arrayexpress
2020-01-17 | PXD016150 | Pride
2008-08-21 | E-GEOD-11014 | biostudies-arrayexpress
2020-03-01 | E-MTAB-8353 | biostudies-arrayexpress
2015-05-18 | E-GEOD-57707 | biostudies-arrayexpress
2008-12-08 | E-GEOD-12317 | biostudies-arrayexpress
2012-01-12 | E-GEOD-30367 | biostudies-arrayexpress
2011-03-07 | E-GEOD-27699 | biostudies-arrayexpress
2013-08-07 | E-GEOD-43577 | biostudies-arrayexpress
2022-06-09 | PXD032314 | Pride