Dataset Information

Homology-driven assembly of NOn-redundant protEin sequence sets (NOmESS) for mass spectrometry.

ABSTRACT:

To enable mass spectrometry (MS)-based proteomic studies in poorly characterized organisms, we developed a computational workflow for the homology-driven assembly of a non-redundant reference sequence dataset. In the automated pipeline, translated DNA sequences (e.g. ESTs, RNA deep-sequencing data) are aligned to the protein sequences of a closely related and fully sequenced organism. Representative sequences are derived from each cluster and joined, resulting in a non-redundant reference set representing the maximal available amino acid sequence information for each protein. Here we apply NOmESS to assemble a reference database for the widely used model organism Xenopus laevis and demonstrate its use in proteomic applications.
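The cluster-and-join idea described in the abstract can be illustrated with a small sketch. This is not the NOmESS implementation (which is written in C#); it is a toy Python model under stated assumptions: each translated sequence is assigned to the reference protein of its best-scoring hit, the longest fragments that do not overlap on the reference are kept, and those fragments are concatenated in reference order. The names Hit, cluster_by_best_hit, join_representatives and build_reference_set are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """One alignment of a translated sequence to a reference protein (hypothetical record)."""
    query_id: str     # translated EST or RNA-seq contig
    ref_id: str       # protein of the related, fully sequenced organism
    ref_start: int    # 1-based start of the alignment on the reference
    ref_end: int      # inclusive end of the alignment on the reference
    bitscore: float
    aligned_seq: str  # aligned part of the query (amino acids)


def cluster_by_best_hit(hits):
    """Assumption: a query belongs to the reference protein of its highest-scoring hit."""
    best = {}
    for hit in hits:
        current = best.get(hit.query_id)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query_id] = hit
    clusters = defaultdict(list)
    for hit in best.values():
        clusters[hit.ref_id].append(hit)
    return dict(clusters)


def join_representatives(cluster):
    """Assumption: greedily keep the longest fragments that do not overlap on the
    reference, then concatenate them in reference order (gaps are simply skipped)."""
    chosen = []
    by_length = sorted(cluster, key=lambda h: h.ref_end - h.ref_start, reverse=True)
    for hit in by_length:
        if all(hit.ref_end < c.ref_start or hit.ref_start > c.ref_end for c in chosen):
            chosen.append(hit)
    chosen.sort(key=lambda h: h.ref_start)
    return "".join(h.aligned_seq for h in chosen)


def build_reference_set(hits):
    """Return one joined sequence per reference protein that received hits."""
    return {ref_id: join_representatives(cluster)
            for ref_id, cluster in cluster_by_best_hit(hits).items()}


example_hits = [
    Hit("est1", "P1", 1, 5, 50.0, "MKTAY"),
    Hit("est2", "P1", 3, 5, 20.0, "TAY"),    # redundant: overlaps est1 on P1
    Hit("est3", "P1", 8, 11, 30.0, "LLQR"),
    Hit("est3", "P2", 1, 4, 10.0, "LLQR"),   # weaker hit of est3, ignored
    Hit("est4", "P2", 1, 3, 25.0, "MSE"),
]
print(build_reference_set(example_hits))  # {'P1': 'MKTAYLLQR', 'P2': 'MSE'}
```

In this toy input, est2 is dropped because it adds no residues beyond est1, which is the sense in which the resulting set is non-redundant. A real pipeline would also have to handle partial overlaps, sequence conflicts and frame selection, none of which are modelled here.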

Availability and implementation

NOmESS is written in C#. The source code and the executables can be downloaded from http://www.biochem.mpg.de/cox. Running NOmESS additionally requires BLASTp and cd-hit.
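Because BLASTp and cd-hit are external dependencies, it can help to confirm that both are on the PATH before starting a run. The following is a minimal sketch, not part of NOmESS; the executable names blastp and cd-hit are assumptions about a typical installation, and missing_tools is a hypothetical helper.

```python
import shutil

# Assumed executable names; adjust them if your installation uses different ones.
REQUIRED_TOOLS = ("blastp", "cd-hit")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing external tools:", ", ".join(missing))
    else:
        print("BLASTp and cd-hit were both found on the PATH.")
```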

Contact

cox@biochem.mpg.de

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Temu T

PROVIDER: S-EPMC4848398 | biostudies-literature | 2016 May

REPOSITORIES: biostudies-literature

Publications

Homology-driven assembly of NOn-redundant protEin sequence sets (NOmESS) for mass spectrometry.

Tikira Temu, Matthias Mann, Markus Räschle, Jürgen Cox

Bioinformatics (Oxford, England), 2016-01-06, issue 9

PMID: 26743511

Similar Datasets

Choosing non-redundant representative subsets of protein sequence data sets using submodular optimization. | S-EPMC5835207 | biostudies-literature

OWL--a non-redundant composite protein sequence database. | S-EPMC308323 | biostudies-other

High-Throughput Deconvolution of Native Protein Mass Spectrometry Imaging Data Sets for Mass Domain Analysis. | S-EPMC10515104 | biostudies-literature

Insights into virus capsid assembly from non-covalent mass spectrometry. | S-EPMC7168407 | biostudies-literature

Analyzing native membrane protein assembly in nanodiscs by combined non-covalent mass spectrometry and synthetic biology. | S-EPMC5291076 | biostudies-literature

Mass spectrometry captures structural intermediates in protein fiber self-assembly. | S-EPMC5530726 | biostudies-literature

Data-driven fingerprint nanoelectromechanical mass spectrometry. | S-EPMC11496504 | biostudies-literature

The annotation-enriched non-redundant patent sequence databases. | S-EPMC3568390 | biostudies-literature

ProteinExplorer: A Repository-Scale Resource for Exploration of Protein Detection in Public Mass Spectrometry Data Sets. | S-EPMC6709584 | biostudies-literature

Protein identification false discovery rates for very large proteomics data sets generated by tandem mass spectrometry. | S-EPMC2773710 | biostudies-literature