Dataset Information

MSProGene: integrative proteogenomics beyond six-frames and single nucleotide polymorphisms.

ABSTRACT:

Ongoing advances in high-throughput technologies have facilitated accurate proteomic measurements and provide a wealth of information at the genomic and transcript levels. In proteogenomics, these multi-omics data are combined to analyze unannotated organisms and to allow more accurate sample-specific predictions. Existing analysis methods still depend mainly on six-frame translations or on reference protein databases that are extended by transcriptomic information or known single nucleotide polymorphisms (SNPs). However, six-frame translation introduces an artificial sixfold increase of the target database, and SNP integration requires a suitable database summarizing results from previous experiments. We overcome these limitations by introducing MSProGene, a new method for integrative proteogenomic analysis based on customized RNA-Seq-driven transcript databases. MSProGene is independent of existing reference databases and annotated SNPs, and it avoids large six-frame translated databases by constructing sample-specific transcripts. In addition, it creates a network combining RNA-Seq and peptide information that is optimized by a maximum-flow algorithm, which also allows it to resolve the ambiguity of shared peptides in protein inference. We applied MSProGene to three datasets and show that it facilitates database-independent, reliable, and accurate prediction at the gene and protein levels and additionally identifies novel genes.
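
To make the "sixfold increase" mentioned in the abstract concrete, the following minimal Python sketch translates one DNA sequence in all six reading frames (three offsets on each strand). It is not taken from MSProGene; the function names and frame labels (`+1` to `-3`) are illustrative assumptions, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. with N) become 'X'."""
    usable = len(seq) - len(seq) % 3
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


def six_frame_translations(dna):
    """Hypothetical helper: map frame labels '+1'..'-3' to protein strings."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translations("ATGGCCTAA").items():
        print(label, protein)
```

Every input sequence yields six candidate protein strings, of which at most one corresponds to the true coding frame. This is the database inflation that sample-specific transcript construction is meant to avoid.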

Availability and implementation

MSProGene is written in Java and Python. It is open source and available at http://sourceforge.net/projects/msprogene/.

SUBMITTER: Zickmann F

PROVIDER: S-EPMC4765881 | biostudies-literature | 2015 Jun

REPOSITORIES: biostudies-literature

Publications

MSProGene: integrative proteogenomics beyond six-frames and single nucleotide polymorphisms.

Franziska Zickmann, Bernhard Y. Renard

Bioinformatics (Oxford, England), 2015-06-01, issue 12

PMID: 26072472

Similar Datasets

Project description: Introduction: Increasing evidence suggests that RNA modification plays a significant role in the kidney and may be an ideal target for the treatment of kidney diseases. However, the specific mechanisms underlying RNA modifications in the pathogenesis of kidney disease remain unclear. Genome-wide association studies (GWAS) have identified numerous genetic loci involved in kidney function and RNA modifications. The identification and exploration of RNA modification-related single-nucleotide polymorphisms (RNAm-SNPs) associated with kidney function can help us to comprehensively understand the underlying mechanism of kidney disease and identify potential therapeutic targets. Methods: First, we examined the association of RNAm-SNPs with eGFR. Second, we performed expression quantitative trait locus (eQTL) and protein quantitative trait locus (pQTL) analyses to explore the functions of the identified RNAm-SNPs. Finally, we evaluated the causality between RNAm-SNP-associated gene expression and circulating proteins and kidney function using a Mendelian randomization (MR) analysis. Results: A total of 252 RNAm-SNPs related to m6A, m1A, A-to-I, m5C, m7G, and m5U were identified. All these factors were significantly associated with the eGFR. A total of 119 (47.22%) RNAm-SNPs showed cis-eQTL effects in blood cells, whereas 72 (28.57%) RNAm-SNPs showed cis-pQTL effects in plasma. 47 (18.65%) RNAm-SNPs exhibited cis-eQTL and cis-pQTL effects. In addition, we demonstrated a causal association between RNAm-SNP-associated gene expression, circulating protein levels, and eGFR decline. Some of the identified genes and proteins have been reported to be associated with kidney diseases, such as CDK10 and SDCCAG8. Conclusions: This study reveals an association between RNAm-SNPs and kidney function. These SNPs regulate gene expression and protein levels through RNA modifications, eventually leading to kidney dysfunction. Our study provides novel insights that connect the genetic risk of kidney disease to RNA modification and suggests potential therapeutic targets for the prevention and treatment of kidney disease.

Project description: Single-nucleotide polymorphisms (SNPs) provide an abundant source of DNA polymorphisms in a number of eukaryotic species. Information on the frequency, nature, and distribution of SNPs in plant genomes is limited. Thus, our objectives were (1) to determine SNP frequency in coding and noncoding soybean (Glycine max L. Merr.) DNA sequence amplified from genomic DNA using PCR primers designed to complete genes, cDNAs, and random genomic sequence; (2) to characterize haplotype variation in these sequences; and (3) to provide initial estimates of linkage disequilibrium (LD) in soybean. Approximately 28.7 kbp of coding sequence, 37.9 kbp of noncoding perigenic DNA, and 9.7 kbp of random noncoding genomic DNA were sequenced in each of 25 diverse soybean genotypes. Over the >76 kbp, mean nucleotide diversity expressed as Watterson's theta was 0.00097. Nucleotide diversity was 0.00053 and 0.00111 in coding and in noncoding perigenic DNA, respectively, lower than estimates in the autogamous model species Arabidopsis thaliana. Haplotype analysis of SNP-containing fragments revealed a deficiency of haplotypes vs. the number that would be anticipated at linkage equilibrium. In 49 fragments with three or more SNPs, five haplotypes were present in one fragment while four or fewer were present in the remaining 48, thereby supporting the suggestion of relatively limited genetic variation in cultivated soybean. Squared allele-frequency correlations (r^2) among haplotypes at 54 loci with two or more SNPs indicated low genome-wide LD. The low level of LD and the limited haplotype diversity suggested that the genome of any given soybean accession is a mosaic of three or four haplotypes. To facilitate SNP discovery and the development of a transcript map, subsets of four to six diverse genotypes, whose sequence analysis would permit the discovery of at least 75% of all SNPs present in the 25 genotypes as well as 90% of the common (frequency >0.10) SNPs, were identified.

Project description: Background: Ancestry informative markers (AIMs) are a type of genetic marker that is informative for tracing the ancestral ethnicity of individuals. Application of AIMs has gained substantial attention in population genetics, forensic sciences, and medical genetics. Single nucleotide polymorphisms (SNPs), the materials of AIMs, are useful for classifying individuals from distinct continental origins but cannot discriminate individuals with subtle genetic differences from closely related ancestral lineages. Proof-of-principle studies have shown that gene expression (GE) also is a heritable human variation that exhibits differential intensity distributions among ethnic groups. GE supplies ethnic information supplemental to SNPs; this motivated us to integrate SNP and GE markers to construct AIM panels with a reduced number of required markers and provide high accuracy in ancestry inference. Few studies in the literature have considered GE in this aspect, and none have integrated SNP and GE markers to aid classification of samples from closely related ethnic populations. Results: We integrated a forward variable selection procedure into flexible discriminant analysis to identify key SNP and/or GE markers with the highest cross-validation prediction accuracy. By analyzing genome-wide SNP and/or GE markers in 210 independent samples from four ethnic groups in the HapMap II Project, we found that average testing accuracies for a majority of classification analyses were quite high, except for SNP-only analyses that were performed to discern study samples containing individuals from two close Asian populations. The average testing accuracies ranged from 0.53 to 0.79 for SNP-only analyses and increased to around 0.90 when GE markers were integrated together with SNP markers for the classification of samples from closely related Asian populations. Compared to GE-only analyses, integrative analyses of SNP and GE markers showed comparable testing accuracies and a reduced number of selected markers in AIM panels. Conclusions: Integrative analysis of SNP and GE markers provides high-accuracy and/or cost-effective classification results for assigning samples from closely related or distantly related ancestral lineages to their original ancestral populations. User-friendly BIASLESS (Biomarkers Identification and Samples Subdivision) software was developed as an efficient tool for selecting key SNP and/or GE markers and then building models for sample subdivision. BIASLESS was programmed in R and R-GUI and is available online at http://www.stat.sinica.edu.tw/hsinchou/genetics/prediction/BIASLESS.htm.
