Watchdog - a workflow management system for the distributed analysis of large-scale experimental data.
ABSTRACT: BACKGROUND:The development of high-throughput experimental technologies, such as next-generation sequencing, have led to new challenges for handling, analyzing and integrating the resulting large and diverse datasets. Bioinformatical analysis of these data commonly requires a number of mutually dependent steps applied to numerous samples for multiple conditions and replicates. To support these analyses, a number of workflow management systems (WMSs) have been developed to allow automated execution of corresponding analysis workflows. Major advantages of WMSs are the easy reproducibility of results as well as the reusability of workflows or their components. RESULTS:In this article, we present Watchdog, a WMS for the automated analysis of large-scale experimental data. Main features include straightforward processing of replicate data, support for distributed computer systems, customizable error detection and manual intervention into workflow execution. Watchdog is implemented in Java and thus platform-independent and allows easy sharing of workflows and corresponding program modules. It provides a graphical user interface (GUI) for workflow construction using pre-defined modules as well as a helper script for creating new module definitions. Execution of workflows is possible using either the GUI or a command-line interface and a web-interface is provided for monitoring the execution status and intervening in case of errors. To illustrate its potentials on a real-life example, a comprehensive workflow and modules for the analysis of RNA-seq experiments were implemented and are provided with the software in addition to simple test examples. CONCLUSIONS:Watchdog is a powerful and flexible WMS for the analysis of large-scale high-throughput experiments. We believe it will greatly benefit both users with and without programming skills who want to develop and apply bioinformatical workflows with reasonable overhead. The software, example workflows and a comprehensive documentation are freely available at www.bio.ifi.lmu.de/watchdog.
Project description:BACKGROUND:Advances in high-throughput methods have brought new challenges for biological data analysis, often requiring many interdependent steps applied to a large number of samples. To address this challenge, workflow management systems, such as Watchdog, have been developed to support scientists in the (semi-)automated execution of large analysis workflows. IMPLEMENTATION:Here, we present Watchdog 2.0, which implements new developments for module creation, reusability, and documentation and for reproducibility of analyses and workflow execution. Developments include a graphical user interface for semi-automatic module creation from software help pages, sharing repositories for modules and workflows, and a standardized module documentation format. The latter allows generation of a customized reference book of public and user-specific modules. Furthermore, extensive logging of workflow execution, module and software versions, and explicit support for package managers and container virtualization now ensures reproducibility of results. A step-by-step analysis protocol generated from the log file may, e.g., serve as a draft of a manuscript methods section. Finally, 2 new execution modes were implemented. One allows resuming workflow execution after interruption or modification without rerunning successfully executed tasks not affected by changes. The second one allows detaching and reattaching to workflow execution on a local computer while tasks continue running on computer clusters. CONCLUSIONS:Watchdog 2.0 provides several new developments that we believe to be of benefit for large-scale bioinformatics analysis and that are not completely covered by other competing workflow management systems. The software itself, module and workflow repositories, and comprehensive documentation are freely available at https://www.bio.ifi.lmu.de/watchdog.
Project description:BACKGROUND: Heterogeneity in the features, input-output behaviour and user interface for available bioinformatics tools and services is still a bottleneck for both expert and non-expert users. Advancement in providing common interfaces over such tools and services are gaining interest among researchers. However, the lack of (meta-) information about input-output data and parameter prevents to provide automated and standardized solutions, which can assist users in setting the appropriate parameters. These limitations must be resolved especially in the workflow-based solution in order to ease the integration of software. FINDINGS: We report a Taverna Workbench plugin: the XworX BIFI (Beautiful Interfaces for Inputs) implemented as a solution for the aforementioned issues. BIFI provides a Graphical User Interface (GUI) definition language used to layout the user interface and to define parameter options for Taverna workflows. BIFI is also able to submit GUI Definition Files (GDF) directly or discover appropriate instances from a configured repository. In the absence of a GDF, BIFI generates a default interface. CONCLUSION: The Taverna Workbench is an open source software providing the ability to combine various services within a workflow. Nevertheless, users can supply input data to the workflow via a simple user interface providing only a text area to enter the input in text form. The workflow may contain meta-information in human readable form such as description text for the port and an example value. However, not all workflow ports are documented so well or have all the required information.BIFI uses custom user interface components for ports which give users feedback on the parameter data type or structure to be used for service execution and enables client-side data validations. Moreover, BIFI offers user interfaces that allow users to interactively construct workflow views and share them with the community, thus significantly increasing usability of heterogeneous, distributed service consumption.
Project description:<h4>Summary</h4>Large-scale data analysis in bioinformatics requires pipelined execution of multiple software. Generally each stage in a pipeline takes considerable computing resources and several workflow management systems (WMS), e.g. Snakemake, Nextflow, Common Workflow Language, Galaxy, etc. have been developed to ensure optimum execution of the stages across two invocations of the pipeline