Project description:Site-specific phosphorylation events affect nearly all cellular processes, and correct phosphosite localization plays an important role in biological and medical studies. However, direct control of the false localization rate (FLR) remains challenging in phosphoproteomics. Here, we propose DeepFLR, a deep learning-based framework that uses spectrum prediction and the target-decoy method for FLR estimation. We demonstrate that the similarity between predicted and experimental phosphopeptide spectra is comparable to the measurement reproducibility. We further benchmark our method on four synthetic datasets and three real biological sample datasets, showcasing its capacity for sensitive phosphosite localization with accurate FLR estimation.
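The target-decoy idea mentioned above can be illustrated with a minimal sketch. This is not the DeepFLR implementation: the function name `estimate_flr`, the `(score, is_decoy)` input layout, and the `decoy_ratio` scaling factor are all hypothetical, and the estimate shown is the generic "decoys accepted divided by targets accepted" ratio at each score cutoff.

```python
from typing import List, Tuple


def estimate_flr(
    scored: List[Tuple[float, bool]], decoy_ratio: float = 1.0
) -> List[Tuple[float, float]]:
    """Estimate a false localization rate at every score cutoff.

    scored      -- (score, is_decoy) pairs; a higher score means a more
                   confident localization (assumed input layout).
    decoy_ratio -- hypothetical factor correcting for unequal numbers of
                   candidate target and decoy sites.
    Returns (cutoff_score, estimated_flr) pairs, best score first.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    targets = 0
    decoys = 0
    estimates = []
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Decoy hits approximate the number of wrong target localizations.
        flr = decoy_ratio * decoys / targets if targets else 1.0
        estimates.append((score, min(flr, 1.0)))
    return estimates


if __name__ == "__main__":
    example = [(0.9, False), (0.8, False), (0.7, True), (0.6, False)]
    for cutoff, flr in estimate_flr(example):
        print(f"score >= {cutoff:.1f}: estimated FLR = {flr:.3f}")
```

In practice one would pick the lowest cutoff whose estimated FLR stays below the desired level (for example 1%).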
Project description:Untargeted mass spectrometry metabolomics is an increasingly popular approach for characterizing complex mixtures. Recent studies have highlighted the impact of data pre-processing on the quality of metabolomics data analysis. The first step in data processing for untargeted metabolomics requires selecting the signal thresholds that determine which features (detected ions) are included in the dataset. Analysts face the challenge of knowing where to set these thresholds; setting them too high could mean missing relevant features, but setting them too low could result in a complex and unwieldy dataset. This study compared data interpretation for an example metabolomics dataset when intensity thresholds were set at a range of feature heights. The main observations were that low signal thresholds 1) improved the limit of detection, 2) increased the number of features detected with an associated isotope pattern and/or MS-MS fragmentation spectrum and 3) increased the number of in-source clusters and fragments detected for known analytes of interest. When the different intensity-threshold settings were applied to a set of 39 samples to discriminate the samples by principal component analysis (PCA), similar results were obtained with both low and high intensity thresholds. We conclude that the most information-rich datasets can be obtained by setting low intensity thresholds. However, in cases where only a qualitative comparison of samples with PCA is to be performed, it may be sufficient to set high thresholds and thereby reduce the complexity of the data processing and the amount of computational time required.
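The trade-off described above (a lower threshold keeps more features but yields a larger dataset) can be made concrete with a small sketch. The function name `count_features_by_threshold` and the example feature heights are hypothetical and are not taken from the study.

```python
from typing import Dict, List


def count_features_by_threshold(
    heights: List[float], thresholds: List[float]
) -> Dict[float, int]:
    """Count how many features survive each intensity (height) threshold.

    heights    -- peak heights of detected features (made-up values here).
    thresholds -- candidate minimum heights for keeping a feature.
    """
    return {t: sum(1 for h in heights if h >= t) for t in thresholds}


if __name__ == "__main__":
    # Illustrative feature heights in arbitrary intensity units.
    feature_heights = [5e3, 2e4, 8e4, 3e5, 1e6]
    counts = count_features_by_threshold(feature_heights, [1e4, 1e5])
    for threshold, kept in counts.items():
        print(f"threshold {threshold:.0e}: {kept} features retained")
```

Running the same count over a range of thresholds gives a quick view of how fast the feature table grows as the threshold is lowered.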
Project description:Gebauer2016 - Genome-scale model of
Caenorhabditis elegans metabolism (with bacteria)
This model is one of the two versions
of ElegCyc presented in the paper. It describes the metabolism of a
worm raised in a medium with bacteria.
This model is described in the article:
A Genome-Scale Database and
Reconstruction of Caenorhabditis elegans Metabolism.
Gebauer J, Gentsch C, Mansfeld J,
Schmeißer K, Waschina S, Brandes S, Klimmasch L, Zamboni N,
Zarse K, Schuster S, Ristow M, Schäuble S, Kaleta C.
Cell Syst 2016 May; 2(5): 312-322
Abstract:
We present a genome-scale model of Caenorhabditis elegans
metabolism along with the public database ElegCyc
(http://elegcyc.bioinf.uni-jena.de:1100), which represents a
reference for metabolic pathways in the worm and allows for the
visualization as well as analysis of omics datasets.
Our model reflects the metabolic peculiarities of
C. elegans that make it distinct from other higher
eukaryotes and mammals, including mice and humans. We
experimentally verify one of these peculiarities by showing
that the lifespan-extending effect of L-tryptophan
supplementation is dose dependent (hormetic). Finally, we show
the utility of our model for analyzing omics datasets through
predicting changes in amino acid concentrations after genetic
perturbations and analyzing metabolic changes during normal
aging as well as during two distinct, reactive oxygen
species (ROS)-related lifespan-extending treatments. Our
analyses reveal a notable similarity in metabolic adaptation
between distinct lifespan-extending interventions and point to
key pathways affecting lifespan in nematodes.
This model is hosted on
BioModels Database
and identified by:
MODEL1704200001.
To cite BioModels Database, please use:
BioModels Database:
An enhanced, curated and annotated resource for published
quantitative kinetic models.
To the extent possible under law, all copyright and related or
neighbouring rights to this encoded model have been dedicated to
the public domain worldwide. Please refer to
CC0
Public Domain Dedication for more information.
Project description:Gebauer2016 - Genome-scale model of
Caenorhabditis elegans metabolism (without bacteria)
This model is one of the two versions
of ElegCyc presented in the paper. It describes the metabolism of a
worm raised in a medium without bacteria.
This model is described in the article:
A Genome-Scale Database and
Reconstruction of Caenorhabditis elegans Metabolism.
Gebauer J, Gentsch C, Mansfeld J,
Schmeißer K, Waschina S, Brandes S, Klimmasch L, Zamboni N,
Zarse K, Schuster S, Ristow M, Schäuble S, Kaleta C.
Cell Syst 2016 May; 2(5): 312-322
Abstract:
We present a genome-scale model of Caenorhabditis elegans
metabolism along with the public database ElegCyc
(http://elegcyc.bioinf.uni-jena.de:1100), which represents a
reference for metabolic pathways in the worm and allows for the
visualization as well as analysis of omics datasets.
Our model reflects the metabolic peculiarities of
C. elegans that make it distinct from other higher
eukaryotes and mammals, including mice and humans. We
experimentally verify one of these peculiarities by showing
that the lifespan-extending effect of L-tryptophan
supplementation is dose dependent (hormetic). Finally, we show
the utility of our model for analyzing omics datasets through
predicting changes in amino acid concentrations after genetic
perturbations and analyzing metabolic changes during normal
aging as well as during two distinct, reactive oxygen
species (ROS)-related lifespan-extending treatments. Our
analyses reveal a notable similarity in metabolic adaptation
between distinct lifespan-extending interventions and point to
key pathways affecting lifespan in nematodes.
This model is hosted on
BioModels Database
and identified by:
MODEL1704200000.
To cite BioModels Database, please use:
BioModels Database:
An enhanced, curated and annotated resource for published
quantitative kinetic models.
To the extent possible under law, all copyright and related or
neighbouring rights to this encoded model have been dedicated to
the public domain worldwide. Please refer to
CC0
Public Domain Dedication for more information.
Project description:Single-cell analysis of the transcriptome deepens our understanding of an individual cell's contribution to its microenvironment. Using single-cell analysis to study complex biological processes requires state-of-the-art computational tools. Assessing similarity is central to bioinformatics algorithms that determine correlations between biological entities. Similarity can appear by chance, particularly for lowly expressed entities. This is especially relevant in single-cell RNA-seq (scRNA-seq), because the read counts obtained are lower than in bulk RNA-sequencing, and classic bioinformatics tools are therefore insufficient to obtain reproducible results. Recently, a Bayesian correlation scheme that assigns low correlation values to correlations arising from lowly expressed genes has been proposed to assess similarity for bulk RNA-seq and miRNA. This Bayesian method specifies a prior distribution that is then updated with empirical evidence. Our goal was to extend the properties of this Bayesian correlation scheme to scRNA-seq data. We assessed 3 ways to compute similarity. First, we computed the similarity of each pair of genes over all cells. Second, we identified specific cell populations and computed the correlation in those specific cells. Third, we computed the similarity of each pair of genes over all clusters by including the total mRNA expression in those cells. To study the effect of the number of cells on the method, we did not rely on simulated data; instead, we generated 4 scRNA-seq mouse liver cell libraries with a varying number of input cells. Results: We show that Bayesian correlations are more reproducible than Pearson correlations in all the scenarios studied. Compared to Pearson correlations, Bayesian correlations have a smaller dependence on the number of input cells. We demonstrate that the Bayesian correlation algorithm assigns high similarity values to genes with a biological relevance in a specific population.
Significance: Our results demonstrate that Bayesian correlation is a robust similarity measure for scRNA-seq datasets. The Bayesian method allows researchers to study similarity between pairs of genes without discarding lowly expressed entities and to minimize bias from spurious correlations. Taken together, our Bayesian correlation method significantly increases the reproducibility of scRNA-seq experiments.
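As an illustration of the first similarity strategy (each pair of genes compared over all cells), the sketch below computes the Pearson baseline that the Bayesian scheme is compared against. The Bayesian correlation itself is not reproduced here because its formula is not given in this description. The function names and the toy count matrix are hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length vectors (0.0 if one is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def pairwise_gene_correlation(
    counts: Dict[str, List[float]]
) -> Dict[Tuple[str, str], float]:
    """Correlate every pair of genes over all cells (one count per cell)."""
    return {
        (g1, g2): pearson(counts[g1], counts[g2])
        for g1, g2 in combinations(sorted(counts), 2)
    }


if __name__ == "__main__":
    # Toy counts over four cells; geneA and geneB each have a single read.
    toy = {
        "geneA": [0, 0, 1, 0],
        "geneB": [0, 0, 1, 0],
        "geneC": [120, 95, 140, 110],
    }
    for pair, r in pairwise_gene_correlation(toy).items():
        print(pair, round(r, 3))
```

In this toy example geneA and geneB are perfectly correlated on the strength of a single read each, which is the kind of chance similarity among lowly expressed genes that motivates down-weighting such pairs.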