BIFI: a Taverna plugin for a simplified and user-friendly workflow platform.
ABSTRACT: BACKGROUND: Heterogeneity in the features, input-output behaviour and user interfaces of available bioinformatics tools and services is still a bottleneck for both expert and non-expert users. Efforts to provide common interfaces over such tools and services are gaining interest among researchers. However, the lack of (meta-)information about input-output data and parameters prevents the provision of automated and standardized solutions that can assist users in setting the appropriate parameters. These limitations must be resolved, especially in workflow-based solutions, in order to ease the integration of software. FINDINGS: We report a Taverna Workbench plugin, the XworX BIFI (Beautiful Interfaces for Inputs), implemented as a solution for the aforementioned issues. BIFI provides a Graphical User Interface (GUI) definition language used to lay out the user interface and to define parameter options for Taverna workflows. BIFI is also able to submit GUI Definition Files (GDF) directly or discover appropriate instances from a configured repository. In the absence of a GDF, BIFI generates a default interface. CONCLUSION: The Taverna Workbench is open source software that provides the ability to combine various services within a workflow. Nevertheless, users supply input data to the workflow via a simple user interface that provides only a text area for entering the input in text form. The workflow may contain meta-information in human-readable form, such as a description text for the port and an example value. However, not all workflow ports are documented so well or have all the required information. BIFI uses custom user interface components for ports, which give users feedback on the parameter data type or structure to be used for service execution and enable client-side data validation.
Moreover, BIFI offers user interfaces that allow users to interactively construct workflow views and share them with the community, thus significantly increasing usability of heterogeneous, distributed service consumption.
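The fallback and validation behaviour described in the abstract above can be illustrated with a minimal Python sketch. It does not reproduce BIFI's actual GDF syntax or API: the dictionary layout, the field names (type, choices), the port names, and the functions build_interface and validate are all hypothetical, chosen only to show how a per-port definition might drive client-side checks and how a default free-text field could be used when no definition exists.

```python
# Hypothetical GUI definition for two workflow input ports. The field
# names are assumptions for illustration, not BIFI's real GDF format.
GDF = {
    "evalue": {"type": "float"},
    "database": {"type": "choice", "choices": ["uniprot", "pdb"]},
}

# Default used for ports without a definition: a free-text field.
DEFAULT_FIELD = {"type": "text"}


def build_interface(ports, gdf=None):
    """Return one field definition per port, falling back to free text."""
    gdf = gdf or {}
    return {port: gdf.get(port, DEFAULT_FIELD) for port in ports}


def validate(field, raw):
    """Check a raw input string against a field definition."""
    if field["type"] == "float":
        try:
            float(raw)
        except ValueError:
            return False
        return True
    if field["type"] == "choice":
        return raw in field["choices"]
    return True  # free text accepts any input


# The "sequence" port has no definition, so it gets the default field.
interface = build_interface(["evalue", "database", "sequence"], GDF)
```

In this sketch, a value is checked on the client before the workflow is run, so a malformed number or an unknown option can be rejected without invoking any remote service.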
Project description:BACKGROUND: Over the past decade the workflow system paradigm has evolved as an efficient and user-friendly approach for developing complex bioinformatics applications. Two popular workflow systems that have gained acceptance by the bioinformatics community are Taverna and Galaxy. Each system has a large user-base and supports an ever-growing repository of application workflows. However, workflows developed for one system cannot be imported and executed easily on the other. The lack of interoperability is due to differences in the models of computation, workflow languages, and architectures of both systems. This lack of interoperability limits sharing of workflows between the user communities and leads to duplication of development efforts. RESULTS: In this paper, we present Tavaxy, a stand-alone system for creating and executing workflows based on using an extensible set of re-usable workflow patterns. Tavaxy offers a set of new features that simplify and enhance the development of sequence analysis applications: It allows the integration of existing Taverna and Galaxy workflows in a single environment, and supports the use of cloud computing capabilities. The integration of existing Taverna and Galaxy workflows is supported seamlessly at both run-time and design-time levels, based on the concepts of hierarchical workflows and workflow patterns. The use of cloud computing in Tavaxy is flexible, where the users can either instantiate the whole system on the cloud, or delegate the execution of certain sub-workflows to the cloud infrastructure. CONCLUSIONS: Tavaxy reduces the workflow development cycle by introducing the use of workflow patterns to simplify workflow creation. It enables the re-use and integration of existing (sub-) workflows from Taverna and Galaxy, and allows the creation of hybrid workflows. 
Its additional features exploit recent advances in high performance cloud computing to cope with the increasing data size and complexity of analysis. The system can either be accessed through a cloud-enabled web interface or downloaded and installed to run within the user's local environment. All resources related to Tavaxy are available at http://www.tavaxy.org.
Project description:In the biological and medical domains, the use of web services has made data and computation functionality accessible in a unified manner, which has helped automate data pipelines that were previously run manually. Workflow technology is widely used in the orchestration of multiple services to facilitate in-silico research. Cancer Biomedical Informatics Grid (caBIG) is an information network enabling the sharing of cancer research related resources and caGrid is its underlying service-based computation infrastructure. CaBIG requires that services are composed and orchestrated in a given sequence to realize data pipelines, which are often called scientific workflows. CaGrid selected Taverna as its workflow execution system of choice due to its integration with web service technology and support for a wide range of web services, plug-in architecture to cater for easy integration of third party extensions, etc. The caGrid Workflow Toolkit (or the toolkit for short), an extension to the Taverna workflow system, is designed and implemented to ease building and running caGrid workflows. It provides users with support for various phases in using workflows: service discovery, composition and orchestration, data access, and secure service invocation, which have been identified by the caGrid community as challenging in a multi-institutional and cross-discipline domain. By extending the Taverna Workbench, the caGrid Workflow Toolkit provides a comprehensive solution to compose and coordinate services in caGrid, which would otherwise remain isolated and disconnected from each other. Using it, users can access more than 140 services and are offered a rich set of features including discovery of data and analytical services, query and transfer of data, security protections for service invocations, state management in service interactions, and sharing of workflows, experiences and best practices.
The proposed solution is general enough to be applicable and reusable within other service-computing infrastructures that leverage a similar technology stack.
Project description:Workbench and workflow systems such as Galaxy, Taverna, Chipster, or Common Workflow Language (CWL)-based frameworks facilitate access to bioinformatics tools in a user-friendly, scalable and reproducible way. Still, the integration of tools in such environments remains a cumbersome, time-consuming and error-prone process. A major consequence is that tool descriptions are often incomplete or outdated and lack important information, including parameters and metadata such as publications or links to documentation. ToolDog (Tool DescriptiOn Generator) facilitates the integration of tools - which have been registered in the ELIXIR tools registry (https://bio.tools) - into workbench environments by generating tool description templates. ToolDog includes two modules. The first module analyses the source code of the bioinformatics software with language-specific plugins, and generates a skeleton for a Galaxy XML or CWL tool description. The second module is dedicated to the enrichment of the generated tool description, using metadata provided by bio.tools. This last module can also be used on its own to complete or correct existing tool descriptions with missing metadata.
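The enrichment step of the second module can be pictured with a small Python sketch. This is not ToolDog's code or the bio.tools schema: the function enrich, the dictionary keys, the tool name and the example.org link are hypothetical placeholders, and the sketch only shows the general idea of filling missing or empty fields in a skeleton description from a registry entry without overwriting what is already present.

```python
def enrich(description, metadata):
    """Return a copy of description with missing or empty fields filled in.

    Fields that already hold a value are left untouched; only absent or
    empty ones are taken from the registry metadata.
    """
    enriched = dict(description)
    for key, value in metadata.items():
        if not enriched.get(key):
            enriched[key] = value
    return enriched


# Hypothetical skeleton, as a first analysis pass might produce it.
skeleton = {"name": "mytool", "inputs": ["reads.fastq"], "help": ""}

# Hypothetical registry entry; the keys and values are placeholders.
registry_entry = {
    "name": "MyTool",
    "help": "Aligns reads.",
    "documentation": "https://example.org/docs",
}

completed = enrich(skeleton, registry_entry)
```

Because the function works on any existing dictionary, the same call could be applied to a hand-written description, which mirrors the stated ability to use the enrichment module on its own.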
Project description:BACKGROUND: There is a need for software applications that provide users with a complete and extensible toolkit for chemo- and bioinformatics accessible from a single workbench. Commercial packages are expensive and closed source, hence they do not allow end users to modify algorithms and add custom functionality. Existing open source projects are more focused on providing a framework for integrating existing, separately installed bioinformatics packages, rather than providing user-friendly interfaces. No open source chemoinformatics workbench has previously been published, and no successful attempts have been made to integrate chemo- and bioinformatics into a single framework. RESULTS: Bioclipse is an advanced workbench for resources in chemo- and bioinformatics, such as molecules, proteins, sequences, spectra, and scripts. It provides 2D-editing, 3D-visualization, file format conversion, calculation of chemical properties, and much more; all fully integrated into a user-friendly desktop application. Editing supports standard functions such as cut and paste, drag and drop, and undo/redo. Bioclipse is written in Java and based on the Eclipse Rich Client Platform with a state-of-the-art plugin architecture. This gives Bioclipse an advantage over other systems as it can easily be extended with functionality in any desired direction. CONCLUSION: Bioclipse is a powerful workbench for bio- and chemoinformatics as well as an advanced integration platform. The rich functionality, intuitive user interface, and powerful plugin architecture make Bioclipse the most advanced and user-friendly open source workbench for chemo- and bioinformatics. Bioclipse is released under Eclipse Public License (EPL), an open source license which sets no constraints on external plugin licensing; it is totally open for both open source plugins as well as commercial ones. Bioclipse is freely available at http://www.bioclipse.net.
Project description:R is the statistical language commonly used by many life scientists in (omics) data analysis. At the same time, these complex analyses benefit from a workflow approach, such as used by the open source workflow management system Taverna. However, Taverna had limited support for R, because it supported just a few data types and only a single output. Also, there was no support for graphical output and persistent sessions. Altogether this made using R in Taverna impractical. We have developed an R plugin for Taverna: RShell, which provides R functionality within workflows designed in Taverna. In order to fully support the R language, our RShell plugin directly uses the R interpreter. The RShell plugin consists of a Taverna processor for R scripts and an RShell Session Manager that communicates with the R server. We made the RShell processor highly configurable allowing the user to define multiple inputs and outputs. Also, various data types are supported, such as strings, numeric data and images. To limit data transport between multiple RShell processors, the RShell plugin also supports persistent sessions. Here, we will describe the architecture of RShell and the new features that are introduced in version 1.2, i.e.: i) Support for R up to and including R version 2.9; ii) Support for persistent sessions to limit data transfer; iii) Support for vector graphics output through PDF; iv) Syntax highlighting of the R code; v) Improved usability through fewer port types. Our new RShell processor is backwards compatible with workflows that use older versions of the RShell processor. We demonstrate the value of the RShell processor by a use-case workflow that maps oligonucleotide probes designed with DNA sequence information from Vega onto the Ensembl genome assembly. Our RShell plugin enables Taverna users to employ R scripts within their workflows in a highly configurable way.
Project description:There has been a dramatic increase in the amount of quantitative data derived from the measurement of changes at different levels of biological complexity during the post-genomic era. However, there are a number of issues associated with the use of computational tools employed for the analysis of such data. For example, computational tools such as R and MATLAB require prior knowledge of their programming languages in order to implement statistical analyses on data. Combining two or more tools in an analysis may also be problematic since data may have to be manually copied and pasted between separate user interfaces for each tool. Furthermore, this transfer of data may require a reconciliation step in order for there to be interoperability between computational tools. Developments in the Taverna workflow system have enabled pipelines to be constructed and enacted for generic and ad hoc analyses of quantitative data. Here, we present an example of such a workflow involving the statistical identification of differentially-expressed genes from microarray data followed by the annotation of their relationships to cellular processes. This workflow makes use of customised maxdBrowse web services, a system that allows Taverna to query and retrieve gene expression data from the maxdLoad2 microarray database. These data are then analysed by R to identify differentially-expressed genes using the Taverna RShell processor which has been developed for invoking this tool when it has been deployed as a service using the RServe library. In addition, the workflow uses Beanshell scripts to reconcile mismatches of data between services as well as to implement a form of user interaction for selecting subsets of microarray data for analysis as part of the workflow execution.
A new plugin system in the Taverna software architecture is demonstrated by the use of renderers for displaying PDF files and CSV formatted data within the Taverna workbench. Taverna can be used by data analysis experts as a generic tool for composing ad hoc analyses of quantitative data by combining the use of scripts written in the R programming language with tools exposed as services in workflows. When these workflows are shared with colleagues and the wider scientific community, they provide an approach for other scientists wanting to use tools such as R without having to learn the corresponding programming language to analyse their own data.
Project description:BACKGROUND: The Docker project is providing a promising strategy for the development of virtualization systems in bioinformatics. However, implementation, management, and launching of Docker containers is not entirely trivial for users not fully familiarized with command line interfaces. This has prompted the development of graphical user interfaces to facilitate the interaction of inexperienced users with Docker environments. RESULTS: We describe the BioPortainer Workbench, an integrated Docker system that assists inexperienced users in interacting with a bioinformatics-dedicated Docker environment at 3 main levels: (i) infrastructure, (ii) platform, and (iii) application. CONCLUSIONS: The BioPortainer Workbench represents a pioneering effort in developing a comprehensive and easy-to-use Docker platform focused on bioinformatics, which may greatly assist in the dissemination of Docker virtualization technology in this complex field of research.
Project description:The Norwegian e-Infrastructure for Life Sciences (NeLS) has been developed by ELIXIR Norway to provide its users with a system enabling data storage, sharing, and analysis in a project-oriented fashion. The system is available through easy-to-use web interfaces, including the Galaxy workbench for data analysis and workflow execution. Users confident with a command-line interface and programming may also access it through Secure Shell (SSH) and application programming interfaces (APIs). NeLS has been in production since 2015, with training and support provided by the help desk of ELIXIR Norway. Through collaboration with NorSeq, the national consortium for high-throughput sequencing, an integrated service is offered so that sequencing data generated in a research project is provided to the involved researchers through NeLS. Sensitive data, such as individual genomic sequencing data, are handled using the TSD (Services for Sensitive Data) platform provided by Sigma2 and the University of Oslo. NeLS integrates national e-infrastructure storage and computing resources, and is also integrated with the SEEK platform in order to store large data files produced by experiments described in SEEK. In this article, we outline the architecture of NeLS and discuss possible directions for further development.
Project description:Since the completion of the Saccharomyces cerevisiae genome sequencing project in 1996, the genomes of over 80 fungal species have been sequenced or are currently being sequenced. Resulting data provide opportunities for studying and comparing fungal biology and evolution at the genome level. To support such studies, the Comparative Fungal Genomics Platform (CFGP; http://cfgp.snu.ac.kr), a web-based multifunctional informatics workbench, was developed. The CFGP comprises three layers, including the basal layer, middleware and the user interface. The data warehouse in the basal layer contains standardized genome sequences of 65 fungal species. The middleware processes queries via six analysis tools, including BLAST, ClustalW, InterProScan, SignalP 3.0, PSORT II and a newly developed tool named BLASTMatrix. The BLASTMatrix permits the identification and visualization of genes homologous to a query across multiple species. The Data-driven User Interface (DUI) of the CFGP was built on a new concept of pre-collecting data and post-executing analysis instead of the 'fill-in-the-form-and-press-SUBMIT' user interfaces utilized by most bioinformatics sites. A tool termed Favorite, which supports the management of encapsulated sequence data and provides a personalized data repository to users, is another novel feature in the DUI.
Project description:The effectiveness of clinical information systems in improving nursing and patient outcomes depends on human factors, including system usability, organizational workflow, and user satisfaction. The aim of this study was to examine to what extent residents, family members, and clinicians find a sensor data interface used to monitor elder activity levels usable and useful in an independent living setting. Three independent expert reviewers conducted an initial heuristic evaluation. Subsequently, 20 end users (5 residents, 5 family members, 5 registered nurses, and 5 physicians) participated in the evaluation. During the evaluation, each participant was asked to complete three scenarios taken from three residents. Morae recorder software was used to capture data during the user interactions. The heuristic evaluation resulted in 26 recommendations for interface improvement; these were classified under the headings content, aesthetic appeal, navigation, and architecture, which were derived from heuristic results. Total time for elderly residents to complete scenarios was much greater than for other users. Family members spent more time than clinicians but less time than residents did to complete scenarios. Elder residents and family members had difficulty interpreting clinical data and graphs, experienced information overload, and did not understand terminology. All users found the sensor data interface useful for identifying changing resident activities. Older adult users have special needs that should be addressed when designing clinical interfaces for them, especially information as important as health information. Evaluating human factors during user interactions with clinical information systems should be a requirement before implementation.