Dataset Information

De novo sequencing of DIA data

ABSTRACT: Testing datasets and pre-trained model for DeepNovo, a deep learning-based tool for de novo sequencing of DIA data.

INSTRUMENT(S): Q Exactive

ORGANISM(S): Homo sapiens (ncbitaxon:9606)

SUBMITTER: Ming Li

PROVIDER: MSV000082368 | MassIVE | Wed May 16 14:18:00 BST 2018

REPOSITORIES: MassIVE
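
The PROVIDER field above packs three values into one pipe-delimited string: the accession, the source repository, and a timestamp. As a minimal sketch, assuming that three-part layout holds for other records too (the function name `parse_provider` and the returned keys are hypothetical, not part of any OmicsDI API), it can be split into structured values like this:

```python
from datetime import datetime


def parse_provider(line):
    """Split a 'ACCESSION | REPOSITORY | TIMESTAMP' string into a dict.

    Assumes exactly three pipe-separated parts and a timestamp shaped like
    'Wed May 16 14:18:00 BST 2018' (English day/month names, zone as 5th token).
    """
    accession, repository, stamp = [part.strip() for part in line.split("|")]
    tokens = stamp.split()
    # Pull the time-zone abbreviation out instead of relying on %Z,
    # whose parsing behaviour depends on the platform.
    zone = tokens.pop(4)
    timestamp = datetime.strptime(" ".join(tokens), "%a %b %d %H:%M:%S %Y")
    return {
        "accession": accession,
        "repository": repository,
        "timestamp": timestamp,  # naive datetime; zone kept separately
        "zone": zone,
    }


record = parse_provider("MSV000082368 | MassIVE | Wed May 16 14:18:00 BST 2018")
print(record["accession"], record["repository"], record["timestamp"].date())
```

The zone abbreviation is kept as a plain string because abbreviations such as BST are ambiguous; converting to an aware datetime would need an explicit zone choice.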

Similar Datasets

Multienzyme deep learning models improve peptide de novo sequencing by mass spectrometry proteomics

Project description: Generating and analyzing overlapping peptides through multienzymatic digestion is an efficient procedure for de novo protein sequencing using bottom-up mass spectrometry (MS). Despite improved instrumentation and software, de novo MS data analysis remains challenging. In recent years, deep learning models have represented a performance breakthrough. Incorporating that technology into de novo protein sequencing workflows requires machine-learning models capable of handling highly diverse MS data. In this study, we analyzed the requirements for assembling such generalizable deep learning models by systematically varying the composition and size of the training set. We assessed the generated models' performance using two test sets composed of peptides originating from the multienzyme digestion of samples from various species. The peptide recall values on the test sets showed that the deep learning models generated from a collection of peptides with highly diverse N- and C-termini generalized 76% better than the termini-restricted ones. Moreover, expanding the training set by adding peptides from the multienzymatic digestion, with five proteases, of samples from several species led to a 2- to 3-fold generalizability gain. Furthermore, we tested the applicability of these multienzyme deep learning (MEM) models by fully de novo sequencing the heavy and light monomeric chains of five commercial antibodies (mAbs). MEMs extracted over 10,000 matching and overlapping peptides across mAb samples digested with six different proteases, achieving 100% sequence coverage for eight of the ten polypeptide chains. We anticipate that the MEMs' proven improvements to de novo analysis will positively impact several applications, such as the analysis of samples of high complexity or unknown nature and the field of peptidomics.

2023-01-16 | PXD037803 | Pride

Zero-Shot De Novo Peptide Sequencing with Open Post-Translational Modification Discovery

Project description: Proteins play essential roles in biology, yet identifying their precise sequences and modifications remains challenging. De novo peptide sequencing offers a powerful solution by directly inferring sequences from mass spectrometry data without relying on protein databases. Recent deep learning models have significantly advanced this task but remain trapped in a major dilemma: they require labeled training data to recognize post-translational modifications (PTMs), which is unavailable for most biologically relevant but rare or unknown modifications. We solve this long-standing problem by introducing RNovA, a transformer-based de novo sequencing algorithm enhanced with relative positional embeddings and a reinforcement-learning–style sequential decision framework. RNovA enables open PTM discovery in a zero-shot setting, without retraining or a predefined list of candidate residues, while maintaining state-of-the-art performance on standard benchmarks. Demonstrating this capability, we successfully identified peptides modified by kynurenine, an uncommon and biologically relevant PTM, in clinical samples from rheumatoid arthritis patients. RNovA overcomes key limitations of existing methods and provides a foundation for exploring previously inaccessible regions of the proteome, including peptides with unexpected or unannotated modifications. This capability is widely needed in immunology, biomarker discovery, and biomedical research.

2026-03-29 | PXD076296 | Pride

DeNovo Peptide Identification Deep Learning Test Set

Project description:A set of bottom-up proteomics data for testing the deep learning network trained with data in PXD010000

2022-09-25 | PXD010613 | Pride

DeepSTARR predicts enhancer activity from DNA sequence and enables the de novo design of synthetic enhancers

Project description:Enhancer sequences control gene expression and comprise binding sites (motifs) for different transcription factors (TFs). Despite extensive genetic and computational studies, the relationship between DNA sequence and regulatory activity is poorly understood and enhancer de novo design is considered impossible. Here we built a deep learning model, DeepSTARR, to quantitatively predict the activities of thousands of developmental and housekeeping enhancers directly from DNA sequence in Drosophila melanogaster S2 cells. The model learned relevant TF motifs and higher-order syntax rules, including functionally non-equivalent instances of the same TF motif that are determined by motif-flanking sequence and inter-motif distances. We validated these rules experimentally and demonstrated their conservation in human by testing more than 40,000 wildtype and mutant Drosophila and human enhancers. Finally, we designed and functionally validated synthetic enhancers with desired activities de novo. This SuperSeries is composed of the SubSeries listed below.

2022-02-24 | GSE183939 | GEO

DeepSTARR predicts enhancer activity from DNA sequence and enables the de novo design of synthetic enhancers [Human oligo UMI-STARR-seq]

2022-02-24 | GSE183938 | GEO

DeepSTARR predicts enhancer activity from DNA sequence and enables the de novo design of synthetic enhancers [Drosophila oligo UMI-STARR-seq]

2022-02-24 | GSE183937 | GEO

DeepSTARR predicts enhancer activity from DNA sequence and enables the de novo design of synthetic enhancers [Drosophila genome-wide UMI-STARR-seq]

2022-02-24 | GSE183936 | GEO

De novo peptide sequencing by deep learning

Project description:Including all training and testing datasets, pretrained models, and source code of DeepNovo.

2017-07-25 | MSV000081382 | MassIVE

A transformer model for de novo sequencing of data-independent acquisition mass spectrometry data

Project description:A core computational challenge in the analysis of mass spectrometry data is the de novo sequencing problem, in which the generating amino acid sequence is inferred directly from an observed fragmentation spectrum without the use of a sequence database. Recently, deep learning models have made significant advances in de novo sequencing by learning from massive datasets of high confidence labeled mass spectra. However, these methods are primarily designed for data-dependent acquisition (DDA) experiments. Over the past decade, the field of mass spectrometry has been moving toward using data-independent acquisition (DIA) protocols for the analysis of complex proteomic samples due to their superior specificity and reproducibility. Hence, we present a new de novo sequencing model called Cascadia, which uses a transformer architecture to handle the more complex data generated by DIA protocols. In comparisons with existing approaches for de novo sequencing of DIA data, Cascadia achieves improved performance across a range of instruments and experimental protocols. Additionally, we demonstrate Cascadia’s ability to accurately discover de novo coding variants and peptides from the variable region of antibodies.

2024-06-21 | PXD053291 | panorama

De novo assembly of siRNA immunity in wild plants

Project description: We describe an application of deep sequencing and de novo assembly of short RNA reads to investigate small interfering RNA (siRNA)-mediated immunity in leaf samples from eight tree taxa naturally occurring in Wytham Woods, Oxfordshire, UK. A BLAST search for homologues of the contigs in GenBank identified siRNA populations against a number of RNA viruses and Ty1-copia retrotransposons in these tree species. Small RNA sequencing and de novo assembly

2012-06-01 | E-GEOD-22079 | biostudies-arrayexpress
