GIANT: galaxy-based tool for interactive analysis of transcriptomic data
ABSTRACT: Transcriptomic analyses are broadly used in biomedical research, which calls for tools that allow biologists to be directly involved in data mining and interpretation. We present here GIANT, a Galaxy-based tool for Interactive ANalysis of Transcriptomic data, which consists of biologist-friendly tools dedicated to analyses of transcriptomic data from microarray or RNA-seq experiments. GIANT is organized into modules, allowing researchers to tailor their analyses by choosing the specific set of tool(s) needed to analyse any type of preprocessed transcriptomic data. It also includes a series of tools dedicated to the handling of raw Affymetrix microarray data. GIANT brings easy-to-use solutions to biologists for transcriptomic data mining and interpretation.
Project description:BACKGROUND:Intra-species genetic variation can be used to investigate population structure, selection, and gene flow in non-model vertebrates; and due to the plummeting costs for genome sequencing, it is now possible for small labs to obtain full-genome variation data from their species of interest. However, those labs may not have easy access to, and familiarity with, computational tools to analyze those data. RESULTS:We have created a suite of tools for the Galaxy web server aimed at handling nucleotide and amino-acid polymorphisms discovered by full-genome sequencing of several individuals of the same species, or using a SNP genotyping microarray. In addition to providing user-friendly tools, a main goal is to make published analyses reproducible. While most of the examples discussed in this paper deal with nuclear-genome diversity in non-human vertebrates, we also illustrate the application of the tools to fungal genomes, human biomedical data, and mitochondrial sequences. CONCLUSIONS:This project illustrates that a small group can design, implement, test, document, and distribute a Galaxy tool collection to meet the needs of a particular community of biologists.
Project description:Background:New generations of sequencing platforms coupled to numerous bioinformatics tools have led to rapid technological progress in metagenomics and metatranscriptomics to investigate complex microorganism communities. Nevertheless, a combination of different bioinformatic tools remains necessary to draw conclusions out of microbiota studies. Modular and user-friendly tools would greatly improve such studies. Findings:We therefore developed ASaiM, an open-source Galaxy-based framework dedicated to microbiota data analyses. ASaiM provides an extensive collection of tools to assemble, extract, explore, and visualize microbiota information from raw metataxonomic, metagenomic, or metatranscriptomic sequences. To guide the analyses, several customizable workflows are included and are supported by tutorials and Galaxy interactive tours, which guide users through the analyses step by step. ASaiM is implemented as a Galaxy Docker flavour. It is scalable to thousands of datasets but can also be used on a normal PC. The associated source code is available under Apache 2 license at https://github.com/ASaiM/framework and documentation can be found online (http://asaim.readthedocs.io). Conclusions:Based on the Galaxy framework, ASaiM offers a sophisticated environment with a variety of tools, workflows, documentation, and training to scientists working on complex microorganism communities. It makes the analysis and exploration of microbiota data easy, quick, transparent, reproducible, and shareable.
Project description:Next-generation sequencing is used on a daily basis to perform molecular analysis to determine subtypes of disease (e.g., in cancer) and to assist in the selection of the optimal treatment. Clinical bioinformatics handles the manipulation of the data generated by the sequencer, from the generation to the analysis and interpretation. Reproducibility and traceability are crucial issues in a clinical setting. We have designed an approach based on Docker container technology and Galaxy, the popular open-source software for bioinformatics analysis. Our solution simplifies the deployment of a small-size analytical platform and simplifies the process for the clinician. From the technical point of view, the tools embedded in the platform are isolated and versioned through Docker images. Alongside the Galaxy platform, we also introduce the AnalysisManager, a solution that allows single-click analysis for biologists and leverages standardized bioinformatics application programming interfaces. We added a Shiny/R interactive environment to ease the visualization of the outputs. The platform relies on containers and ensures data traceability by recording analytical actions and by associating inputs and outputs of the tools to the EDAM ontology through ReGaTe. The source code is freely available on GitHub at https://github.com/CARPEM/GalaxyDocker.
Project description:BACKGROUND: SRS (Sequence Retrieval System) has proven to be a valuable platform for storing, linking, and querying biological databases. Due to the availability of a broad range of different scientific databases in SRS, it has become a useful platform to incorporate and mine microarray data to facilitate the analyses of biological questions and non-hypothesis driven quests. Here we report various solutions and tools for integrating and mining annotated expression data in SRS. RESULTS: We devised an Auto-Upload Tool by which microarray data can be automatically imported into SRS. The dataset can be linked to other databases and user access can be set. The linkage comprehensiveness of microarray platforms to other platforms and biological databases was examined in a network of scientific databases. The stored microarray data can also be made accessible to external programs for further processing. For example, we built an interface to a program called Venn Mapper, which collects its microarray data from SRS, processes the data by creating Venn diagrams, and saves the data for interpretation. CONCLUSION: SRS is a useful database system to store, link and query various scientific datasets, including microarray data. The user-friendly Auto-Upload Tool makes SRS accessible to biologists for linking and mining user-owned databases.
Project description:Biomedical text mining promises to assist biologists in quickly navigating the combined knowledge in their domain. This would allow improved understanding of the complex interactions within biological systems and faster hypothesis generation. New biomedical research articles are published daily and text mining tools are only as good as the corpus from which they work. Many text mining tools are underused because their results are static and do not reflect the constantly expanding knowledge in the field. In order for biomedical text mining to become an indispensable tool used by researchers, this problem must be addressed. To this end, we present PubRunner, a framework for regularly running text mining tools on the latest publications. PubRunner is lightweight, simple to use, and can be integrated with an existing text mining tool. The workflow involves downloading the latest abstracts from PubMed, executing a user-defined tool, pushing the resulting data to a public FTP or Zenodo dataset, and publicizing the location of these results on the public PubRunner website. We illustrate the use of this tool by re-running the commonly used word2vec tool on the latest PubMed abstracts to generate up-to-date word vector representations for the biomedical domain. This shows a proof of concept that we hope will encourage text mining developers to build tools that truly will aid biologists in exploring the latest publications.
Project description:Lotus japonicus is a well-characterized model legume widely used in the study of plant-microbe interactions. However, datasets from various Lotus studies are poorly integrated and lack interoperability. We recognize the need for a comprehensive repository that allows thorough and dynamic exploration of Lotus genomic and transcriptomic data. Equally important are user-friendly in-browser tools designed for data visualization and interpretation. Here, we present Lotus Base, which opens to the research community a large, established LORE1 insertion mutant population containing more than 120,000 lines, and serves the end-user tightly integrated data from Lotus, such as the reference genome, annotated proteins, and expression profiling data. We report the integration of expression data from the L. japonicus gene expression atlas project, and the development of tools to cluster and export such data, allowing users to construct, visualize, and annotate co-expression gene networks. Lotus Base takes advantage of modern advances in browser technology to deliver powerful data interpretation for biologists. Its modular construction and publicly available application programming interface enable developers to tap into the wealth of integrated Lotus data. Lotus Base is freely accessible at: https://lotus.au.dk.
Project description:Novel approaches for the control of agriculturally damaging nematodes are sorely needed. Endoparasitic nematodes complete their life cycle within the root vascular cylinder, inducing specialized feeding cells: giant cells for root-knot nematodes and syncytia for cyst nematodes. Both nematodes hijack parts of the transduction cascades involved in developmental processes, or partially mimic the plant responses to other interactions with microorganisms, but molecular evidence of their differences and commonalities is still under investigation. Transcriptomics has been used to describe global expression profiles of their interaction with Arabidopsis, generating vast lists of differentially expressed genes. Although these results are available in public databases and publications, the information is scattered and difficult to handle. Here, we present a rapid, visual, user-friendly and easy-to-handle spreadsheet tool, called NEMATIC (NEMatode-Arabidopsis Transcriptomic Interaction Compendium; http://www.uclm.es/grupo/gbbmp/english/nematic.asp). It combines existing transcriptomic data for the interaction between Arabidopsis and plant-endoparasitic nematodes with data from different transcriptomic analyses regarding hormone and cell cycle regulation, development, different plant tissues, cell types and various biotic stresses. NEMATIC facilitates efficient in silico studies on plant-nematode biology, allowing rapid cross-comparisons with complex datasets and obtaining customized gene selections through sequential comparative and filtering steps. It includes gene functional classification and links to utilities from several databases. This data-mining spreadsheet will be valuable for understanding the molecular bases underlying feeding site formation by comparison with other plant systems, and for selecting genes as potential tools for biotechnological control of nematodes, as demonstrated in the experimentally confirmed examples provided.
Project description:BACKGROUND: There is an increasing need in transcriptome research for gene expression data and pattern warehouses. It is important to integrate into these warehouses both raw transcriptomic data and some properties encoded in these data, such as local patterns. DESCRIPTION: We have developed an application called SQUAT (SAGE Querying and Analysis Tools), which is available at: http://bsmc.insa-lyon.fr/squat/. This database gives access to both raw SAGE data and patterns mined from these data, for three species (human, mouse and chicken). It supports simple queries such as "In which biological situations is my favorite gene expressed?" as well as much more complex queries such as "What are the genes that are frequently co-over-expressed with my gene of interest in given biological situations?". Connections with external web databases enrich biological interpretations and enable sophisticated queries. To illustrate the power of SQUAT, we show and analyze the results of three different queries, one of which led to a biological hypothesis that was experimentally validated. CONCLUSION: SQUAT is a user-friendly information retrieval platform, which aims at bringing some of the state-of-the-art mining tools to biologists.
Project description:The field of cancer genomics has demonstrated the power of massively parallel sequencing techniques to inform on the genes and specific alterations that drive tumor onset and progression. Although large comprehensive sequence data sets continue to be made increasingly available, data analysis remains an ongoing challenge, particularly for laboratories lacking dedicated resources and bioinformatics expertise. To address this, we have produced a collection of Galaxy tools that represent many popular algorithms for detecting somatic genetic alterations from cancer genome and exome data. We developed new methods for parallelization of these tools within Galaxy to accelerate runtime and have demonstrated their usability and summarized their runtimes on multiple cloud service providers. Some tools represent extensions or refinement of existing toolkits to yield visualizations suited to cohort-wide cancer genomic analysis. For example, we present Oncocircos and Oncoprintplus, which generate data-rich summaries of exome-derived somatic mutation. Workflows that integrate these to achieve data integration and visualizations are demonstrated on a cohort of 96 diffuse large B-cell lymphomas and enabled the discovery of multiple candidate lymphoma-related genes. Our toolkit is available from our GitHub repository as Galaxy tool and dependency definitions and has been deployed using virtualization on multiple platforms including Docker.
Project description:Evolutionary analyses of biological data are becoming a prerequisite in many fields of biology. At a time of high-throughput data analysis, phylogenetics is often a necessary complementary tool for biologists to understand, compare and identify the functions of sequences. But available bioinformatics tools are frequently not easy for non-specialists to use. We developed PhyleasProg (http://phyleasprog.inra.fr), a user-friendly web server as a turnkey tool dedicated to evolutionary analyses. PhyleasProg can help biologists with little experience in evolutionary methodologies by analysing their data in a simple and robust way, using methods corresponding to robust standards. Via a very intuitive web interface, users only need to enter a list of Ensembl protein IDs and a list of species as inputs. After dynamic computations, users have access to phylogenetic trees, positive/purifying selection data (on site and branch-site models), with a display of these results on the protein sequence and on a 3D structure model, and the synteny environment of related genes. This connection between different domains of phylogenetics opens the way to new biological analyses for the discovery of the function and structure of proteins.