Platform for Automated Real-Time High Performance Analytics on Medical Image Data.
ABSTRACT: Biomedical data are quickly growing in volume and in variety, providing clinicians an opportunity for better clinical decision support. Here, we demonstrate a robust platform that uses software automation and high performance computing (HPC) resources to achieve real-time analytics of clinical data, specifically magnetic resonance imaging (MRI) data. We used the Agave application programming interface to facilitate communication, data transfer, and job control between an MRI scanner and an off-site HPC resource. In this use case, Agave executed the graphical pipeline tool GRAphical Pipeline Environment (GRAPE) to perform automated, real-time, quantitative analysis of MRI scans. Same-session image processing will open the door for adaptive scanning and real-time quality control, potentially accelerating the discovery of pathologies and minimizing patient callbacks. We envision this platform can be adapted to other medical instruments, HPC resources, and analytics tools.
Project description:PURPOSE:We present a platform, GRAphical Pipeline Environment (GRAPE), to facilitate the development of patient-adaptive magnetic resonance imaging (MRI) protocols. METHODS:GRAPE is an open-source project implemented in the Qt C++ framework to enable graphical creation, execution, and debugging of real-time image analysis algorithms integrated with the MRI scanner. The platform provides the tools and infrastructure to design new algorithms, and build and execute an array of image analysis routines, and provides a mechanism to include existing analysis libraries, all within a graphical environment. The application of GRAPE is demonstrated in multiple MRI applications, and the software is described in detail for both the user and the developer. RESULTS:GRAPE was successfully used to implement and execute three applications in MRI of the brain, performed on a 3.0-T MRI scanner: (i) a multi-parametric pipeline for segmenting the brain tissue and detecting lesions in multiple sclerosis (MS), (ii) patient-specific optimization of the 3D fluid-attenuated inversion recovery MRI scan parameters to enhance the contrast of brain lesions in MS, and (iii) an algebraic image method for combining two MR images for improved lesion contrast. CONCLUSIONS:GRAPE allows graphical development and execution of image analysis algorithms for inline, real-time, and adaptive MRI applications.
Project description:SUMMARY:Here we introduce a Fiji plugin utilizing the HPC-as-a-Service concept, significantly mitigating the challenges life scientists face when delegating complex data-intensive processing workflows to HPC clusters. We demonstrate on a common Selective Plane Illumination Microscopy image processing task that execution of a Fiji workflow on a remote supercomputer leads to improved turnaround time despite the data transfer overhead. The plugin allows the end users to conveniently transfer image data to remote HPC resources, manage pipeline jobs and visualize processed results directly from the Fiji graphical user interface. AVAILABILITY AND IMPLEMENTATION:The code is distributed free and open source under the MIT license. Source code: https://github.com/fiji-hpc/hpc-workflow-manager/, documentation: https://imagej.net/SPIM_Workflow_Manager_For_HPC. SUPPLEMENTARY INFORMATION:Supplementary data are available at Bioinformatics online.
Project description:Contemporary informatics and genomics research require efficient, flexible and robust management of large heterogeneous data, advanced computational tools, powerful visualization, reliable hardware infrastructure, interoperability of computational resources, and detailed data and analysis-protocol provenance. The Pipeline is a client-server distributed computational environment that facilitates the visual graphical construction, execution, monitoring, validation and dissemination of advanced data analysis protocols.This paper reports on the applications of the LONI Pipeline environment to address two informatics challenges - graphical management of diverse genomics tools, and the interoperability of informatics software. Specifically, this manuscript presents the concrete details of deploying general informatics suites and individual software tools to new hardware infrastructures, the design, validation and execution of new visual analysis protocols via the Pipeline graphical interface, and integration of diverse informatics tools via the Pipeline eXtensible Markup Language syntax. We demonstrate each of these processes using several established informatics packages (e.g., miBLAST, EMBOSS, mrFAST, GWASS, MAQ, SAMtools, Bowtie) for basic local sequence alignment and search, molecular biology data analysis, and genome-wide association studies. These examples demonstrate the power of the Pipeline graphical workflow environment to enable integration of bioinformatics resources which provide a well-defined syntax for dynamic specification of the input/output parameters and the run-time execution controls.The LONI Pipeline environment http://pipeline.loni.ucla.edu provides a flexible graphical infrastructure for efficient biomedical computing and distributed informatics research. The interactive Pipeline resource manager enables the utilization and interoperability of diverse types of informatics resources. The Pipeline client-server model provides computational power to a broad spectrum of informatics investigators--experienced developers and novice users, user with or without access to advanced computational-resources (e.g., Grid, data), as well as basic and translational scientists. The open development, validation and dissemination of computational networks (pipeline workflows) facilitates the sharing of knowledge, tools, protocols and best practices, and enables the unbiased validation and replication of scientific findings by the entire community.
Project description:In this review, we discuss the role of sensor analytics point solutions (SNAPS), a reduced complexity machine-assisted decision support tool. We summarize the approaches used for mobile phone-based chemical/biological sensors, including general hardware and software requirements for signal transduction and acquisition. We introduce SNAPS, part of a platform approach to converge sensor data and analytics. The platform is designed to consist of a portfolio of modular tools which may lend itself to dynamic composability by enabling context-specific selection of relevant units, resulting in case-based working modules. SNAPS is an element of this platform where data analytics, statistical characterization and algorithms may be delivered to the data either via embedded systems in devices, or sourced, in near real-time, from mist, fog or cloud computing resources. Convergence of the physical systems with the cyber components paves the path for SNAPS to progress to higher levels of artificial reasoning tools (ART) and emerge as data-informed decision support, as a service for general societal needs. Proof of concept examples of SNAPS are demonstrated both for quantitative data and qualitative data, each operated using a mobile device (smartphone or tablet) for data acquisition and analytics. We discuss the challenges and opportunities for SNAPS, centered around the value to users/stakeholders and the key performance indicators users may find helpful, for these types of machine-assisted tools.
Project description:BACKGROUND:Scientists have amassed a wealth of microbiome datasets, making it possible to study microbes in biotic and abiotic systems on a population or planetary scale; however, this potential has not been fully realized given that the tools, datasets, and computation are available in diverse repositories and locations. To address this challenge, we developed iMicrobe.us, a community-driven microbiome data marketplace and tool exchange for users to integrate their own data and tools with those from the broader community. FINDINGS:The iMicrobe platform brings together analysis tools and microbiome datasets by leveraging National Science Foundation-supported cyberinfrastructure and computing resources from CyVerse, Agave, and XSEDE. The primary purpose of iMicrobe is to provide users with a freely available, web-based platform to (1) maintain and share project data, metadata, and analysis products, (2) search for related public datasets, and (3) use and publish bioinformatics tools that run on highly scalable computing resources. Analysis tools are implemented in containers that encapsulate complex software dependencies and run on freely available XSEDE resources via the Agave API, which can retrieve datasets from the CyVerse Data Store or any web-accessible location (e.g., FTP, HTTP). CONCLUSIONS:iMicrobe promotes data integration, sharing, and community-driven tool development by making open source data and tools accessible to the research community in a web-based platform.
Project description:Nucleotide sequence data are being produced at an ever increasing rate. Clustering such sequences by similarity is often an essential first step in their analysis-intended to reduce redundancy, define gene families or suggest taxonomic units. Exact clustering algorithms, such as hierarchical clustering, scale relatively poorly in terms of run time and memory usage, yet they are desirable because heuristic shortcuts taken during clustering might have unintended consequences in later analysis steps.Here we present HPC-CLUST, a highly optimized software pipeline that can cluster large numbers of pre-aligned DNA sequences by running on distributed computing hardware. It allocates both memory and computing resources efficiently, and can process more than a million sequences in a few hours on a small cluster.Source code and binaries are freely available at http://meringlab.org/software/hpc-clust/; the pipeline is implemented in Cþþ and uses the Message Passing Interface (MPI) standard for distributed computing.
Project description:Background:Application of microarrays in omics technologies enables quantification of many biomolecules simultaneously. It is widely applied to observe the positive or negative effect on biomolecule activity in perturbed versus the steady state by quantitative comparison. Community resources, such as Bioconductor and CRAN, host tools based on R language that have become standard for high-throughput analytics. However, application of these tools is technically challenging for generic users and require specific computational skills. There is a need for intuitive and easy-to-use platform to process omics data, visualize, and interpret results. Results:We propose an integrated software solution, eUTOPIA, that implements a set of essential processing steps as a guided workflow presented to the user as an R Shiny application. Conclusions:eUTOPIA allows researchers to perform preprocessing and analysis of microarray data via a simple and intuitive graphical interface while using state of the art methods.
Project description:The theoretical foundations of Big Data Science are not fully developed, yet. This study proposes a new scalable framework for Big Data representation, high-throughput analytics (variable selection and noise reduction), and model-free inference. Specifically, we explore the core principles of distribution-free and model-agnostic methods for scientific inference based on Big Data sets. Compressive Big Data analytics (CBDA) iteratively generates random (sub)samples from a big and complex dataset. This subsampling with replacement is conducted on the feature and case levels and results in samples that are not necessarily consistent or congruent across iterations. The approach relies on an ensemble predictor where established model-based or model-free inference techniques are iteratively applied to preprocessed and harmonized samples. Repeating the subsampling and prediction steps many times, yields derived likelihoods, probabilities, or parameter estimates, which can be used to assess the algorithm reliability and accuracy of findings via bootstrapping methods, or to extract important features via controlled variable selection. CBDA provides a scalable algorithm for addressing some of the challenges associated with handling complex, incongruent, incomplete and multi-source data and analytics challenges. Albeit not fully developed yet, a CBDA mathematical framework will enable the study of the ergodic properties and the asymptotics of the specific statistical inference approaches via CBDA. We implemented the high-throughput CBDA method using pure R as well as via the graphical pipeline environment. To validate the technique, we used several simulated datasets as well as a real neuroimaging-genetics of Alzheimer's disease case-study. The CBDA approach may be customized to provide generic representation of complex multimodal datasets and to provide stable scientific inference for large, incomplete, and multisource datasets.
Project description:The volume, diversity and velocity of biomedical data are exponentially increasing providing petabytes of new neuroimaging and genetics data every year. At the same time, tens-of-thousands of computational algorithms are developed and reported in the literature along with thousands of software tools and services. Users demand intuitive, quick and platform-agnostic access to data, software tools, and infrastructure from millions of hardware devices. This explosion of information, scientific techniques, computational models, and technological advances leads to enormous challenges in data analysis, evidence-based biomedical inference and reproducibility of findings. The Pipeline workflow environment provides a crowd-based distributed solution for consistent management of these heterogeneous resources. The Pipeline allows multiple (local) clients and (remote) servers to connect, exchange protocols, control the execution, monitor the states of different tools or hardware, and share complete protocols as portable XML workflows. In this paper, we demonstrate several advanced computational neuroimaging and genetics case-studies, and end-to-end pipeline solutions. These are implemented as graphical workflow protocols in the context of analyzing imaging (sMRI, fMRI, DTI), phenotypic (demographic, clinical), and genetic (SNP) data.
Project description:Selective Plane Illumination Microscopy (SPIM) allows to image developing organisms in 3D at unprecedented temporal resolution over long periods of time. The resulting massive amounts of raw image data requires extensive processing interactively via dedicated graphical user interface (GUI) applications. The consecutive processing steps can be easily automated and the individual time points can be processed independently, which lends itself to trivial parallelization on a high performance computing (HPC) cluster. Here, we introduce an automated workflow for processing large multiview, multichannel, multiillumination time-lapse SPIM data on a single workstation or in parallel on a HPC cluster. The pipeline relies on snakemake to resolve dependencies among consecutive processing steps and can be easily adapted to any cluster environment for processing SPIM data in a fraction of the time required to collect it.The code is distributed free and open source under the MIT license http://opensource.org/licenses/MIT The source code can be downloaded from github: https://github.com/mpicbg-scicomp/snakemake-workflows Documentation can be found here: http://fiji.sc/Automated_workflow_for_parallel_Multiview_Reconstruction: email@example.comSupplementary data are available at Bioinformatics online.