Project description:Metagenome sequencing has revolutionized functional microbiome analysis across diverse ecosystems, but is fraught with technical hurdles. We introduce MOSHPIT (https://moshpit.readthedocs.io), software built on the QIIME 2 framework (Q2F) that integrates best-in-class CAMI2-validated metagenome tools with robust provenance tracking and multiple user interfaces, enabling streamlined, reproducible metagenome analysis for all expertise levels. By building on Q2F, MOSHPIT enhances scalability, interoperability, and reproducibility in complex workflows, democratizing and accelerating discovery at the frontiers of metagenomics.
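As a concrete glimpse of the provenance tracking MOSHPIT inherits from Q2F, the following minimal Python sketch loads a previously saved QIIME 2 Artifact and inspects its identity and semantic type; the file name is a placeholder, and a working QIIME 2 installation is assumed.

```python
from qiime2 import Artifact

# Load a result saved as a QIIME 2 Artifact (.qza); "reads.qza" is a
# placeholder for any artifact produced during an analysis.
reads = Artifact.load("reads.qza")

# Every Artifact embeds its full provenance: the UUID uniquely
# identifies this exact result, and the semantic type constrains
# which downstream actions can accept it.
print(reads.uuid)
print(reads.type)
```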
Project description:An important hallmark of science is the transparency and reproducibility of scientific results. Over the last few years, internet-based technologies have emerged that allow the scientific process to be documented far more completely than traditional methods and analysis descriptions. Using these often freely available tools requires a suite of skills that is not necessarily part of a life sciences curriculum. However, funders, journals, and policy makers increasingly require researchers to ensure complete reproducibility of their methods and analyses. To close this gap, we designed an introductory course that guides students towards a reproducible science workflow. Here, we outline the course content and possible extensions, report the challenges we encountered, and discuss how to integrate such a course into existing curricula.
Project description:Reproducible research and open science practices have the potential to accelerate scientific progress by allowing others to reuse research outputs, and by promoting rigorous research that is more likely to yield trustworthy results. However, these practices are uncommon in many fields, so there is a clear need for training that helps and encourages researchers to integrate reproducible research and open science practices into their daily work. Here, we outline eleven strategies for making training in these practices the norm at research institutions. The strategies, which emerged from a virtual brainstorming event organized in collaboration with the German Reproducibility Network, are concentrated in three areas: (i) adapting research assessment criteria and program requirements; (ii) training; (iii) building communities. We provide a brief overview of each strategy, offer tips for implementation, and provide links to resources. We also highlight the importance of allocating resources and monitoring impact. Our goal is to encourage researchers (in their roles as scientists, supervisors, mentors, instructors, and members of curriculum, hiring, or evaluation committees) to think creatively about the many ways they can promote reproducible research and open science practices in their institutions.
Project description:When working with anaerobic bacteria, it is important to be able to perform parallel bioreactor growth experiments that are both controllable and reproducible, but the capital and consumables costs of commercially available systems are often prohibitively high. Hence, a three-vessel parallel bioreactor system was designed and constructed that supports batch and fed-batch processes, can also be set up for continuous culture, and costs a fraction of commercial systems. This system retains much of the functionality of its higher-priced commercial counterparts, including in-line monitoring of temperature, pH, and redox poise. To validate the performance of this system, Clostridium saccharoperbutylacetonicum was grown under conditions that promote ABE fermentation, an established industrial process used to produce the solvents acetone, butanol, and ethanol. Measurements of cell density, pH, and redox poise all confirmed reproducible culture conditions across the parallel vessels, and solvent quantitation via GC-MS verified consistent metabolic activity in the separate cultures. In the future, this system will be of interest to researchers who require high-performance parallel fermentation platforms but for whom commercial systems are not accessible.
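The abstract describes the monitoring hardware rather than its software, but the in-line logging can be sketched generically. The snippet below is a hypothetical Python sketch using pyserial, assuming a probe interface that streams comma-separated temperature/pH/redox readings (the port, baud rate, and line format are all invented, not the actual device protocol):

```python
import csv
import time

import serial  # pyserial

# Placeholder port for the probe interface of one vessel.
probe = serial.Serial("/dev/ttyUSB0", baudrate=9600, timeout=2)

with open("vessel1_log.csv", "w", newline="") as handle:
    writer = csv.writer(handle)
    writer.writerow(["unix_time", "temp_c", "ph", "redox_mv"])
    while True:
        # Assume the device emits lines such as "37.0,5.2,-310".
        line = probe.readline().decode("ascii", errors="replace").strip()
        if line:
            writer.writerow([time.time(), *line.split(",")])
            handle.flush()
        time.sleep(60)  # one reading per vessel per minute
```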
Project description:Mass spectrometry imaging (MSI) is increasingly used in biological and translational research because it can determine the spatial distribution of hundreds of analytes in a sample. Because MSI sits at the interface of proteomics/metabolomics and imaging, the acquired datasets are large and complex, and they are often analyzed with proprietary software or in-house scripts, which hinders reproducibility. Open-source software solutions that enable reproducible data analysis often require programming skills and are therefore not accessible to many MSI researchers. We have integrated 18 dedicated mass spectrometry imaging tools into the Galaxy framework to allow accessible, reproducible, and transparent data analysis. Our tools are based on Cardinal, MALDIquant, and scikit-image and enable all major MSI analysis steps, such as quality control, visualization, preprocessing, statistical analysis, and image co-registration. Furthermore, we created hands-on training material for use cases in proteomics and metabolomics. To demonstrate the utility of our tools, we re-analyzed a publicly available N-linked glycan imaging dataset. By providing the entire analysis history online, we highlight how the Galaxy framework fosters transparent and reproducible research. Galaxy has emerged as a powerful platform for MSI data analysis, combining ease of use and access with high levels of reproducibility and transparency.
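To make one of these steps concrete, here is a minimal Python sketch of translational image co-registration using scikit-image, one of the libraries the tools are built on; the arrays are synthetic stand-ins for real ion images, not data from the glycan study.

```python
import numpy as np
from scipy.ndimage import shift as nd_shift
from skimage.registration import phase_cross_correlation

# Synthetic stand-ins for two ion images of the same tissue section,
# where the second image is offset by a few pixels.
reference = np.zeros((64, 64))
reference[20:40, 20:40] = 1.0
moving = nd_shift(reference, (3, -2))

# Estimate the translation needed to register `moving` onto `reference`.
estimated_shift, error, _ = phase_cross_correlation(reference, moving)

# Apply the estimated shift to co-register the two images.
registered = nd_shift(moving, estimated_shift)
print(estimated_shift)  # approximately [-3.  2.]
```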
Project description:Considerable concern has been raised regarding research reproducibility both within and outside the scientific community. Several factors may contribute to a lack of reproducibility, including a failure to adequately employ statistical considerations during study design, bias in sample selection or subject recruitment, errors in developing data inclusion/exclusion criteria, and flawed statistical analysis. To address some of these issues, several publishers have developed checklists that authors must complete. Others have either enhanced statistical expertise on existing editorial boards or formed distinct statistics editorial boards. Although the U.S. Environmental Protection Agency, Office of Research and Development, already has a strong Quality Assurance Program, an initiative was undertaken to further strengthen the consideration of statistics and other factors in study design, and to ensure these same factors are evaluated during the review and approval of study protocols. To raise awareness of the importance of statistical issues and provide a forum for robust discussion, a Community of Practice for Statistics was formed in January 2014. In addition, three working groups were established to develop a series of questions and criteria to consider when designing or reviewing experimental, observational, or modeling-focused research. This article describes the process used to develop these study design guidance documents, their contents, how they are being employed by the Agency's research enterprise, and the expected benefits to Agency science. The process and guidance documents presented here may be of utility for any research enterprise interested in enhancing the reproducibility of its science.
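The abstract stays at the process level, but one concrete example of a statistical consideration during study design is an a priori power analysis to size an experiment. A minimal Python sketch using statsmodels (a library chosen here for illustration, not one named by the Agency), with assumed effect size, significance level, and power:

```python
from statsmodels.stats.power import TTestIndPower

# Assumed design parameters for a two-group comparison: a medium
# standardized effect size (0.5), a 5% type I error rate, and 80% power.
n_per_group = TTestIndPower().solve_power(
    effect_size=0.5, alpha=0.05, power=0.8
)
print(round(n_per_group))  # roughly 64 subjects per group
```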
Project description:Improvements in genome sequencing and assembly are enabling high-quality reference genomes for all species. However, the assembly process is still laborious, computationally and technically demanding, lacks standards for reproducibility, and is not readily scalable. Here we present the latest Vertebrate Genomes Project assembly pipeline and demonstrate that it delivers high-quality reference genomes at scale across a set of vertebrate species arising over the last ~500 million years. The pipeline is versatile and combines PacBio HiFi long reads and Hi-C-based haplotype phasing in a new graph-based paradigm. Standardized quality control is performed automatically to troubleshoot assembly issues and assess biological complexities. We make the pipeline freely accessible through Galaxy, accommodating researchers without local computational resources and enhancing reproducibility by democratizing the training and assembly process. We demonstrate the flexibility and reliability of the pipeline by assembling reference genomes for 51 vertebrate species from major taxonomic groups (fish, amphibians, reptiles, birds, and mammals).
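The standardized quality control is described only at a high level; one representative assembly QC metric is N50, sketched below in Python with invented contig lengths for illustration.

```python
def n50(contig_lengths):
    """Return the length L such that contigs of length >= L account
    for at least half of the total assembly length."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length

# Invented contig lengths (bp): the two longest (500 + 400 = 900)
# cover at least half of the 1500 bp total, so N50 is 400.
print(n50([100, 200, 300, 400, 500]))  # 400
```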
Project description:Galaxy (homepage: https://galaxyproject.org, main public server: https://usegalaxy.org) is a web-based scientific analysis platform used by tens of thousands of scientists across the world to analyze large biomedical datasets such as those found in genomics, proteomics, metabolomics and imaging. Started in 2005, Galaxy continues to focus on three key challenges of data-driven biomedical science: making analyses accessible to all researchers, ensuring analyses are completely reproducible, and making it simple to communicate analyses so that they can be reused and extended. During the last two years, the Galaxy team and the open-source community around Galaxy have made substantial improvements to Galaxy's core framework, user interface, tools, and training materials. Framework and user interface improvements now enable Galaxy to be used for analyzing tens of thousands of datasets, and >5500 tools are now available from the Galaxy ToolShed. The Galaxy community has led an effort to create numerous high-quality tutorials focused on common types of genomic analyses. The Galaxy developer and user communities continue to grow and be integral to Galaxy's development. The number of Galaxy public servers, developers contributing to the Galaxy framework and its tools, and users of the main Galaxy server have all increased substantially.
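Beyond the web interface, Galaxy analyses can also be driven programmatically. The sketch below uses BioBlend, the Python client for Galaxy's REST API (the client library is an addition here, not something the abstract describes), with a placeholder API key:

```python
from bioblend.galaxy import GalaxyInstance

# Connect to the main public server; the key is a placeholder for an
# API key generated in the user's account preferences.
gi = GalaxyInstance(url="https://usegalaxy.org", key="YOUR_API_KEY")

# Create a history to hold an analysis so every step stays recorded
# and can be shared or extended later.
history = gi.histories.create_history(name="demo-analysis")

# List a few of the thousands of tools installed on the server.
for tool in gi.tools.get_tools()[:5]:
    print(tool["id"], "-", tool["name"])
```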
Project description:Neuroimaging data analysis often requires purpose-built software, which can be challenging to install and may produce different results across computing environments. Beyond being a roadblock to neuroscientists, these issues of accessibility and portability can hamper the reproducibility of neuroimaging data analysis pipelines. Here, we introduce the Neurodesk platform, which harnesses software containers to support a comprehensive and growing suite of neuroimaging software (https://www.neurodesk.org/). Neurodesk includes a browser-accessible virtual desktop environment and a command line interface, mediating access to containerized neuroimaging software libraries on various computing platforms, including personal and high-performance computers, cloud computing and Jupyter Notebooks. This community-oriented, open-source platform enables a paradigm shift for neuroimaging data analysis, allowing for accessible, flexible, fully reproducible, and portable data analysis pipelines.
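Neurodesk's actual launch commands are not given in the abstract, but the container pattern it builds on can be illustrated generically. The following hypothetical Python wrapper (invented image name, command, and paths; not Neurodesk's real interface) runs a containerized tool against a mounted data directory, so the tool sees an identical environment on any host:

```python
import subprocess

def run_in_container(image: str, command: list[str], data_dir: str) -> None:
    """Run `command` inside container `image` with `data_dir` mounted
    at /data, isolating the tool from the host environment."""
    subprocess.run(
        ["docker", "run", "--rm",
         "-v", f"{data_dir}:/data",  # make host data visible in the container
         image, *command],
        check=True,
    )

# Invented example; the image and command are placeholders.
run_in_container(
    "example/neuro-tool:1.0",
    ["tool", "--input", "/data/scan.nii"],
    "/home/user/data",
)
```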