Software tool for researching annotations of proteins: open-source protein annotation software with data visualization.
ABSTRACT: In order that biological meaning may be derived and testable hypotheses may be built from proteomics experiments, assignments of proteins identified by mass spectrometry or other techniques must be supplemented with additional notation, such as information on known protein functions, protein-protein interactions, or biological pathway associations. Collecting, organizing, and interpreting this data often requires the input of experts in the biological field of study, in addition to the time-consuming search for and compilation of information from online protein databases. Furthermore, visualizing this bulk of information can be challenging due to the limited availability of easy-to-use and freely available tools for this process. In response to these constraints, we have undertaken the design of software to automate annotation and visualization of proteomics data in order to accelerate the pace of research. Here we present the Software Tool for Researching Annotations of Proteins (STRAP), a user-friendly, open-source C# application. STRAP automatically obtains gene ontology (GO) terms associated with proteins in a proteomics results ID list using the freely accessible UniProtKB and EBI GOA databases. Summarized in an easy-to-navigate tabular format, STRAP results include meta-information on the protein in addition to complementary GO terminology. Additionally, this information can be edited by the user so that in-house expertise on particular proteins may be integrated into the larger data set. STRAP provides a sortable tabular view for all terms, as well as graphical representations of GO-term association data in pie charts (biological process, cellular component, and molecular function) and bar charts (cross comparison of sample sets) to aid in the interpretation of large data sets and differential analyses experiments. Furthermore, proteins of interest may be exported as a unique FASTA-formatted file to allow for customizable re-searching of mass spectrometry data, and gene names corresponding to the proteins in the lists may be encoded in the Gaggle microformat for further characterization, including pathway analysis. STRAP, a tutorial, and the C# source code are freely available from http://cpctools.sourceforge.net.
Project description:The identification of protein post-translational modifications (PTMs) is an increasingly important component of proteomics and biomarker discovery, but very few tools exist for performing fast and easy characterization of global PTM changes and differential comparison of PTMs across groups of data obtained from liquid chromatography-tandem mass spectrometry experiments. STRAP PTM (Software Tool for Rapid Annotation of Proteins: Post-Translational Modification edition) is a program that was developed to facilitate the characterization of PTMs using spectral counting and a novel scoring algorithm to accelerate the identification of differential PTMs from complex data sets. The software facilitates multi-sample comparison by collating, scoring, and ranking PTMs and by summarizing data visually. The freely available software (beta release) installs on a PC and processes data in protXML format obtained from files parsed through the Trans-Proteomic Pipeline. The easy-to-use interface allows examination of results at protein, peptide, and PTM levels, and the overall design offers tremendous flexibility that provides proteomics insight beyond simple assignment and counting.
Project description:RNA-seq is a new tool to measure RNA transcript counts, using high-throughput sequencing at an extraordinary accuracy. It provides quantitative means to explore the transcriptome of an organism of interest. However, interpreting this extremely large data into biological knowledge is a problem, and biologist-friendly tools are lacking. In our lab, we developed Transcriptator, a web application based on a computational Python pipeline with a user-friendly Java interface. This pipeline uses the web services available for BLAST (Basis Local Search Alignment Tool), QuickGO and DAVID (Database for Annotation, Visualization and Integrated Discovery) tools. It offers a report on statistical analysis of functional and Gene Ontology (GO) annotation's enrichment. It helps users to identify enriched biological themes, particularly GO terms, pathways, domains, gene/proteins features and protein-protein interactions related informations. It clusters the transcripts based on functional annotations and generates a tabular report for functional and gene ontology annotations for each submitted transcript to the web server. The implementation of QuickGo web-services in our pipeline enable the users to carry out GO-Slim analysis, whereas the integration of PORTRAIT (Prediction of transcriptomic non coding RNA (ncRNA) by ab initio methods) helps to identify the non coding RNAs and their regulatory role in transcriptome. In summary, Transcriptator is a useful software for both NGS and array data. It helps the users to characterize the de-novo assembled reads, obtained from NGS experiments for non-referenced organisms, while it also performs the functional enrichment analysis of differentially expressed transcripts/genes for both RNA-seq and micro-array experiments. It generates easy to read tables and interactive charts for better understanding of the data. The pipeline is modular in nature, and provides an opportunity to add new plugins in the future. Web application is freely available at: http://www-labgtp.na.icar.cnr.it/Transcriptator.
Project description:We present Dragon TF Association Miner (DTFAM), a system for text-mining of PubMed documents for potential functional association of transcription factors (TFs) with terms from Gene Ontology (GO) and with diseases. DTFAM has been trained and tested in the selection of relevant documents on a manually curated dataset containing >3000 PubMed abstracts relevant to transcription control. On our test data the system achieves sensitivity of 80% with specificity of 82%. DTFAM provides comprehensive tabular and graphical reports linking terms to relevant sets of documents. These documents are color-coded for easier inspection. DTFAM complements the existing biological resources by collecting, assessing, extracting and presenting associations that can reveal some of the not so easily observable connections among the entities found which could explain the functions of TFs and help decipher parts of gene transcriptional regulatory networks. DTFAM summarizes information from a large volume of documents saving time and making analysis simpler for individual users. DTFAM is freely available for academic and non-profit users at http://research.i2r.a-star.edu.sg/DRAGON/TFAM/.
Project description:The increasing availability of multi-omic platforms poses new challenges to data analysis. Joint visualization of multi-omics data is instrumental in better understanding interconnections across molecular layers and in fully utilizing the multi-omic resources available to make biological discoveries. We present here PaintOmics 3, a web-based resource for the integrated visualization of multiple omic data types onto KEGG pathway diagrams. PaintOmics 3 combines server-end capabilities for data analysis with the potential of modern web resources for data visualization, providing researchers with a powerful framework for interactive exploration of their multi-omics information. Unlike other visualization tools, PaintOmics 3 covers a comprehensive pathway analysis workflow, including automatic feature name/identifier conversion, multi-layered feature matching, pathway enrichment, network analysis, interactive heatmaps, trend charts, and more. It accepts a wide variety of omic types, including transcriptomics, proteomics and metabolomics, as well as region-based approaches such as ATAC-seq or ChIP-seq data. The tool is freely available at www.paintomics.org.
Project description:Recently we introduced the concept of Suspension Trapping (STrap) for bottom-up proteomics sample processing that is based upon SDS-mediated protein extraction, swift detergent removal and rapid reactor-type protein digestion in a quartz depth filter trap. As the depth filter surface is made of silica, it is readily modifiable with various functional groups using the silane coupling chemistries. Thus, during the digest, peptides possessing specific features could be targeted for enrichment by the functionalized depth filter material while non-targeted peptides could be collected as an unbound distinct fraction after the digest. In the example presented here the quartz depth filter surface is functionalized with the pyridyldithiol group therefore enabling reversible in-situ capture of the cysteine-containing peptides generated during the STrap-based digest. The described C-STrap method retains all advantages of the original STrap methodology and provides robust foundation for the conception of the targeted in-situ peptide fractionation in the STrap format for bottom-up proteomics. The presented data support the method's use in qualitative and semi-quantitative proteomics experiments.
Project description:Gene Ontology (GO) analysis has become a commonly used approach for functional studies of large-scale genomic or transcriptomic data. Although there have been a lot of software with GO-related analysis functions, new tools are still needed to meet the requirements for data generated by newly developed technologies or for advanced analysis purpose. Here, we present a Gene Ontology Enrichment Analysis Software Toolkit (GOEAST), an easy-to-use web-based toolkit that identifies statistically overrepresented GO terms within given gene sets. Compared with available GO analysis tools, GOEAST has the following improved features: (i) GOEAST displays enriched GO terms in graphical format according to their relationships in the hierarchical tree of each GO category (biological process, molecular function and cellular component), therefore, provides better understanding of the correlations among enriched GO terms; (ii) GOEAST supports analysis for data from various sources (probe or probe set IDs of Affymetrix, Illumina, Agilent or customized microarrays, as well as different gene identifiers) and multiple species (about 60 prokaryote and eukaryote species); (iii) One unique feature of GOEAST is to allow cross comparison of the GO enrichment status of multiple experiments to identify functional correlations among them. GOEAST also provides rigorous statistical tests to enhance the reliability of analysis results. GOEAST is freely accessible at http://omicslab.genetics.ac.cn/GOEAST/
Project description:The majority of large-scale proteomics quantification methods yield long lists of quantified proteins that are often difficult to interpret and poorly reproduced. Computational approaches are required to analyze such intricate quantitative proteomics data sets. We propose a statistical approach to computationally identify protein sets (e.g., Gene Ontology (GO) terms) that are significantly enriched with abundant proteins with reproducible quantification measurements across a set of replicates. To this end, we developed PSEA-Quant, a protein set enrichment analysis algorithm for label-free and label-based protein quantification data sets. It offers an alternative approach to classic GO analyses, models protein annotation biases, and allows the analysis of samples originating from a single condition, unlike analogous approaches such as GSEA and PSEA. We demonstrate that PSEA-Quant produces results complementary to GO analyses. We also show that PSEA-Quant provides valuable information about the biological processes involved in cystic fibrosis using label-free protein quantification of a cell line expressing a CFTR mutant. Finally, PSEA-Quant highlights the differences in the mechanisms taking place in the human, rat, and mouse brain frontal cortices based on tandem mass tag quantification. Our approach, which is available online, will thus improve the analysis of proteomics quantification data sets by providing meaningful biological insights.
Project description:The identification of nearly all proteins in a biological system using data-dependent acquisition (DDA) tandem mass spectrometry has become routine for organisms with relatively small genomes such as bacteria and yeast. Still, the quantification of the identified proteins may be a complex process and often requires multiple different software packages. In this protocol, I describe a flexible strategy for the identification and label-free quantification of proteins from bottom-up proteomics experiments. This method can be used to quantify all the detectable proteins in any DDA dataset collected with high-resolution precursor scans and may be used to quantify proteome remodeling in response to drug treatment or a gene knockout. Notably, the method is statistically rigorous, uses the latest and fastest freely-available software, and the entire protocol can be completed in a few hours with a small number of data files from the analysis of yeast.
Project description:We describe and demonstrate the proteomics computational toolkit provided in the open-source msInspect software distribution. The toolkit includes modules written in Java and in the R statistical programming language to aid the rapid development of proteomics software applications. It contains tools for processing and manipulating standard MS data files, including signal processing of LC-MS data and parsing of MS/MS search results, as well as for modeling proteomics data structures, creating charts, and other common tasks. We present this toolkit's capability to rapidly develop new computational tools by presenting an example application, Qurate, a graphical tool for manually curating isotopically labeled peptide quantitative events.