Project description:Background: Papers on COVID-19 are being published at a high rate and cover many different topics. Innovative tools are needed to help researchers find patterns in this vast body of literature and identify subsets of interest automatically. Objective: We present a new online software resource with a user-friendly interface that allows users to query and interact with visual representations of the relationships between publications. Methods: We publicly released an application called PLATIPUS (Publication Literature Analysis and Text Interaction Platform for User Studies) that allows researchers to interact with literature supplied by COVIDScholar via a visual analytics platform. The tool offers standard filtering by authors, journals, and high-level categories, filtering on various research-specific details extracted via natural language processing, and dozens of customizable visualizations that update dynamically with a researcher's query. Results: PLATIPUS is available online and currently links to over 100,000 publications, a number that is still growing. This application has the potential to transform how COVID-19 researchers use the public literature in their research. Conclusions: The PLATIPUS application provides the end user with a variety of ways to search, filter, and visualize over 100,000 COVID-19 publications.
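The filtering described above can be sketched as follows. This is a minimal illustration, not PLATIPUS code: the `Publication` record, its fields, and the `filter_publications` and `journal_counts` helpers are hypothetical. Every filter that is set must match (a logical AND), and the aggregated counts stand in for the data behind one visualization that would be recomputed whenever the query changes.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class Publication:
    # Hypothetical record shape; the real COVIDScholar/PLATIPUS schema is not given.
    title: str
    authors: List[str]
    journal: str
    categories: List[str] = field(default_factory=list)


def filter_publications(
    pubs: Iterable[Publication],
    author: Optional[str] = None,
    journal: Optional[str] = None,
    category: Optional[str] = None,
) -> List[Publication]:
    """Keep publications that satisfy every filter that is set (None means 'any')."""
    result = []
    for p in pubs:
        if author is not None and author not in p.authors:
            continue
        if journal is not None and p.journal != journal:
            continue
        if category is not None and category not in p.categories:
            continue
        result.append(p)
    return result


def journal_counts(pubs: Iterable[Publication]) -> Counter:
    """Aggregate a filtered subset into per-journal counts, e.g. for a bar chart."""
    return Counter(p.journal for p in pubs)
```

Re-running `journal_counts` on the output of `filter_publications` after each query change is the simplest way to keep a chart in step with the current filters.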
Project description:Multimodal neuroimaging data of various brain disorders provides valuable information to understand brain function in health and disease. Various neuroimaging-based databases have been developed that mainly consist of volumetric magnetic resonance imaging (MRI) data. We present the comprehensive web-based neuroimaging platform "SWADESH" for hosting multi-disease, multimodal neuroimaging, and neuropsychological data along with analytical pipelines. This novel initiative includes neurochemical and magnetic susceptibility data for healthy and diseased conditions, acquired using MR spectroscopy (MRS) and quantitative susceptibility mapping (QSM) respectively. The SWADESH architecture also provides a neuroimaging database which includes MRI, MRS, functional MRI (fMRI), diffusion weighted imaging (DWI), QSM, neuropsychological data and associated data analysis pipelines. Our final objective is to provide a master database of major brain disease states (neurodegenerative, neuropsychiatric, neurodevelopmental, and others) and to identify characteristic features and biomarkers associated with such disorders.
Project description:VIrus Particle ExploreR data base (VIPERdb) (http://viperdb.scripps.edu) is a curated repository of virus capsid structures and a database of structure-derived data along with various virus-specific information. VIPERdb has been continuously improved for over 20 years and contains a number of virus structure analysis tools. The release of VIPERdb v3.0 contains new structure-based data analytics tools, such as Multiple Structure-based and Sequence Alignment (MSSA) to identify hot-spot residues within a selected group of structures, and an anomaly detection application to analyze and curate the structure-derived data within individual virus families. At the time of this writing, there are 931 virus structures from 62 different virus families in the database. Significantly, the new release also contains a standalone database called 'Virus World database' (VWdb) that comprises all the characterized viruses (∼181 000) known to date, gathered from ICTVdb and NCBI, and their capsid protein sequences, organized according to their virus taxonomy with links to known structures in VIPERdb and PDB. Moreover, the new release of VIPERdb includes a service-oriented data engine to handle all data access requests and provides an interface for future data analytics using machine learning applications.
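The abstract does not say which method VIPERdb's anomaly detection application uses. As one plausible illustration only, the sketch below flags outliers in a single structure-derived metric (for example, a capsid radius) within one virus family using a robust z-score based on the median absolute deviation (MAD). The function name, the choice of metric, and the 3.5 cutoff are assumptions.

```python
import statistics
from typing import Dict, List


def flag_anomalies(values: Dict[str, float], threshold: float = 3.5) -> List[str]:
    """Return IDs whose robust z-score exceeds `threshold` within one family.

    Robust z = 0.6745 * |x - median| / MAD. The 3.5 cutoff is a common rule of
    thumb for this score, assumed here rather than taken from VIPERdb.
    """
    xs = list(values.values())
    med = statistics.median(xs)
    mad = statistics.median(abs(x - med) for x in xs)
    if mad == 0:
        # At least half the values are identical, so the score is undefined; flag nothing.
        return []
    return [k for k, x in values.items() if 0.6745 * abs(x - med) / mad > threshold]
```

The median and MAD are used instead of the mean and standard deviation so that a single mis-curated entry does not inflate the spread and hide itself.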
Project description:Motivation: The BLAST software package for sequence comparison speeds up homology search by preprocessing a query sequence into a lookup table. Numerous research studies have suggested that preprocessing the database instead would give better performance. However, production use of sequence comparison methods that preprocess the database has been limited to programs such as BLAT and SSAHA, which are designed to find matches when query and database subsequences are highly similar. Results: We developed a new version of the MegaBLAST module of BLAST that performs the initial phase of finding short seeds for matches by searching a database index. We also developed a program, makembindex, that preprocesses the database into a data structure for rapid seed searching. We show that the new 'indexed MegaBLAST' is faster than the 'non-indexed' version for most practical uses. We also show that indexed MegaBLAST is faster than miBLAST, another implementation of BLAST nucleotide searching with a preprocessed database, for most of the 200 queries we tested. To deploy indexed MegaBLAST as part of NCBI's Web BLAST service, the storage of databases and the queueing mechanism were modified so that some machines are now dedicated to serving queries for a specific database. The response time for such Web queries is now faster than it was when each computer handled queries for multiple databases. Availability: The code for indexed MegaBLAST is part of the blastn program in the NCBI C++ toolkit. The preprocessor program makembindex is also in the toolkit. Indexed MegaBLAST has been used in production on NCBI's Web BLAST service to search one version of the human and mouse genomes since October 2007.
The Linux command-line executables for blastn and makembindex, documentation, and some query sets used to carry out the tests described below are available in the directory: ftp://ftp.ncbi.nlm.nih.gov/pub/agarwala/indexed_megablast Supplementary information: Supplementary data are available at Bioinformatics online.
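The difference between preprocessing the query and preprocessing the database can be illustrated with a toy k-mer index. In this sketch, under simplifying assumptions, the database is indexed once (the role makembindex plays) and each query is then scanned against the prebuilt index to collect seeds. The actual makembindex data structure, seed length, and extension phase are not described in the abstract, and `build_index` and `find_seeds` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Index = Dict[str, List[Tuple[str, int]]]


def build_index(db: Dict[str, str], k: int) -> Index:
    """Preprocess the database once: map every k-mer to its (sequence id, offset) hits."""
    index: Index = defaultdict(list)
    for seq_id, seq in db.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((seq_id, i))
    return index


def find_seeds(query: str, index: Index, k: int) -> List[Tuple[int, str, int]]:
    """Scan the query and look up each of its k-mers in the prebuilt database index.

    Returns (query offset, subject id, subject offset) seed hits that a later
    phase would extend into alignments.
    """
    seeds = []
    for q in range(len(query) - k + 1):
        # .get() avoids inserting empty entries into the defaultdict.
        for seq_id, s in index.get(query[q:q + k], []):
            seeds.append((q, seq_id, s))
    return seeds
```

Because the index is built once and reused, the per-query cost is only the lookups, which is the trade-off the abstract describes; a production index would use a far more compact encoding than a Python dictionary of strings.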
Project description:Large-scale immune monitoring experiments (such as clinical trials) are a promising direction for biomarker discovery and responder stratification in immunotherapy. Mass cytometry is one of the tools in the immune monitoring arsenal. We propose a standardized workflow for the acquisition and analysis of large-scale mass cytometry experiments. The workflow includes two-tiered barcoding, a broad lyophilized panel, and the incorporation of a fully automated, cloud-based analysis platform. We applied the workflow to a large antibody staining screen using the LEGENDScreen kit, resulting in single-cell data for 350 antibodies over 71 profiling subsets. The screen recapitulates many known trends in the immune system and reveals potential markers for delineating MAIT cells. Additionally, we examine the effect of fixation on staining intensity and identify several markers where fixation leads to either gain or loss of signal. The standardized workflow can be seamlessly integrated into existing trials. Finally, the antibody staining data set is available as an online resource for researchers who are designing mass cytometry experiments in suspension and tissue.
Project description:Genomic medicine attempts to build individualized strategies for diagnostic or therapeutic decision-making by utilizing patients' genomic information. Big Data analytics uncovers hidden patterns, unknown correlations, and other insights by examining large and diverse data sets. While integrating and manipulating diverse genomic data and comprehensive electronic health records (EHRs) on a Big Data infrastructure poses challenges, it also provides an opportunity to develop an efficient and effective approach to identifying clinically actionable genetic variants for individualized diagnosis and therapy. In this paper, we review the challenges of manipulating large-scale next-generation sequencing (NGS) data and diverse clinical data derived from EHRs for genomic medicine. We introduce possible solutions to the different challenges in manipulating, managing, and analyzing genomic and clinical data to implement genomic medicine. Finally, we present a practical Big Data toolset for identifying clinically actionable genetic variants using high-throughput NGS data and EHRs.
Project description:Medical data are among the most rewarding, yet most complicated, data to analyze. How can healthcare providers use modern data analytics tools and technologies to analyze complex data and create value from it? Data analytics promises to efficiently discover valuable patterns by analyzing large amounts of unstructured, heterogeneous, non-standard, and incomplete healthcare data. It not only forecasts but also supports decision making, and it is increasingly recognized as a breakthrough whose goal is to improve the quality of patient care and reduce healthcare costs. The aim of this study is to provide a comprehensive and structured overview of the extensive research on advances in data analytics methods for disease prevention. This review first introduces disease prevention and its challenges, followed by traditional prevention methodologies. We summarize state-of-the-art data analytics algorithms used for disease classification, clustering (unusually high incidence of a particular disease), anomaly detection (detection of disease), and association, along with their respective advantages, drawbacks, and guidelines for selecting a specific model, followed by a discussion of recent developments and successful applications of disease prevention methods. The article concludes with open research challenges and recommendations.
Project description:BACKGROUND: Mobile health (mHealth) apps for pediatric chronic conditions are growing in availability and challenge investigators to conduct rigorous evaluations that keep pace with mHealth innovation. Traditional research methods are poorly suited to operationalizing the agile, iterative trials required to generate evidence for and optimize these digitally mediated interventions. OBJECTIVE: We sought to contribute a resource to support the quantification, analysis, and visualization of analytic indicators of effective engagement with mHealth apps for chronic conditions. METHODS: We applied user-centered design methods to design and develop an Analytics Platform to Evaluate Effective Engagement (APEEE) with consumer mHealth apps for chronic conditions and implemented the platform to analyze both retrospective and prospective data generated from a smartphone-based pain self-management app called iCanCope for young people with chronic pain. RESULTS: Through APEEE, we were able to automate the process of defining, operationalizing, and evaluating effective engagement with iCanCope. Configuring the platform to integrate with the app was feasible and provided investigators with a resource to consolidate, analyze, and visualize engagement data generated by participants in real time. Preliminary efforts to evaluate APEEE showed that investigators perceived the platform to be an acceptable evaluative resource and were satisfied with its design, functionality, and performance. Investigators saw potential in APEEE to accelerate and augment evidence generation and expressed enthusiasm for adopting the platform to support their evaluative practice once fully implemented. CONCLUSIONS: Dynamic, real-time analytic platforms may provide investigators with a powerful means to characterize the breadth and depth of mHealth app engagement required to achieve intended health outcomes.
Successful implementation of APEEE into evaluative practice may contribute to the realization of effective and evidence-based mHealth care.
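The abstract does not define the engagement indicators APEEE computes. As a hypothetical example of operationalizing "effective engagement" from app event logs, the sketch below counts distinct active days per user and applies an assumed minimum-days threshold. The event format, function names, and threshold are illustrative, not iCanCope or APEEE definitions.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Set, Tuple

Event = Tuple[str, date]  # hypothetical (user id, day of an app event)


def active_days(events: Iterable[Event]) -> Dict[str, int]:
    """Count distinct days on which each user logged at least one app event."""
    days: Dict[str, Set[date]] = defaultdict(set)
    for user, day in events:
        days[user].add(day)
    return {user: len(d) for user, d in days.items()}


def effectively_engaged(events: Iterable[Event], min_days: int) -> Dict[str, bool]:
    """Flag users meeting an assumed rule: at least `min_days` distinct active days."""
    return {user: n >= min_days for user, n in active_days(events).items()}
```

Keeping the indicator (`active_days`) separate from the decision rule (`effectively_engaged`) lets investigators change the definition of effective engagement without recomputing the raw counts.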
Project description:Increasingly available microbial reference data allow the composition and function of previously uncharacterized microbial communities to be interpreted in detail via high-throughput sequencing analysis. However, efficient read classification methods are needed because the best database matches for short sequence reads are often shared among multiple reference sequences. Here, we take advantage of the fact that microbial sequences can be annotated relative to established tree structures, and we develop a highly scalable read classifier, PRROMenade, by enhancing the generalized Burrows-Wheeler transform with a labeling step that directly assigns reads to the corresponding lowest taxonomic unit in an annotation tree. PRROMenade solves the multi-matching problem while allowing fast variable-size sequence classification for phylogenetic or functional annotation. Our simulations with 5% added differences from the reference indicated an error rate of only 1.5% for PRROMenade functional classification. On metatranscriptomic data, PRROMenade highlighted biologically relevant functional pathways related to diet-induced changes in the human gut microbiome.
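The labeling rule, which assigns a read whose best matches are shared by several references to their lowest common ancestor in the annotation tree, can be sketched naively as below. PRROMenade builds this labeling into its generalized Burrows-Wheeler transform; this sketch instead walks a child-to-parent map at query time, so it illustrates only the assignment rule, not the indexing. The tree and the function names are hypothetical.

```python
from typing import Dict, Iterable, List


def path_to_root(node: str, parent: Dict[str, str]) -> List[str]:
    """Return [node, parent, grandparent, ..., root]; the root has no parent entry."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lowest_common_ancestor(labels: Iterable[str], parent: Dict[str, str]) -> str:
    """Assign a read matching several references to their lowest shared tree node."""
    labels = list(labels)
    if not labels:
        raise ValueError("a read needs at least one matching reference label")
    common = path_to_root(labels[0], parent)
    for label in labels[1:]:
        ancestors = set(path_to_root(label, parent))
        # Intersect while keeping deepest-first order, so common[0] is the lowest shared node.
        common = [n for n in common if n in ancestors]
    return common[0]
```

A read matching a single reference keeps that reference's own label, while a read shared across genera is pushed up to the first node they have in common.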
Project description:Purpose: In a learning health system (LHS), data gathered from clinical practice inform care and scientific investigation. We aimed to demonstrate how a novel data and analytics platform can enable an LHS at a regional cancer center by characterizing the care provided to breast cancer patients. Methods: Socioeconomic information, tumor characteristics, treatments, and outcomes were extracted from the platform and combined to characterize the patient population and their clinical course. Oncologists were asked to identify examples where clinical practice guidelines (CPGs) or policy changes had varying impacts on practice. These examples were evaluated by extracting the corresponding data. Results: Breast cancer patients (n = 5768) seen at the Juravinski Cancer Centre between January 2014 and June 2022 were included. The average age was 62.5 years. The most common histology was invasive ductal carcinoma (74.6%); 77% were estrogen receptor-positive and 15.5% were HER2/neu-positive. Breast-conserving surgery (BCS) was performed in 56%. For the 4294 patients who received systemic therapy, the initial indications were adjuvant (3096), neoadjuvant (828), and palliative (370). Metastases occurred in 531 patients, and 495 patients died. The lowest-income patients had a higher mortality rate. For the adoption of CPGs, uptake of adjuvant bisphosphonates was very low (8%), as predicted, compared with 64% for pertuzumab, a HER2-targeted agent, and 40.2% for CDK4/6 inhibitors in metastatic disease. During COVID-19, the provincial cancer agency issued a policy to shorten the duration of radiation after BCS; the average number of fractions delivered to the breast fell significantly, by five fractions. Conclusion: Our platform characterized the care and clinical course of breast cancer patients and measured practice changes in response to regulatory developments and policy changes. Establishing a data platform is important for an LHS.
The next step is for the data to feed back into and change practice, that is, to close the loop.
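Evaluating guideline uptake from extracted data reduces to counting treated patients among those eligible. The sketch below is a minimal illustration that assumes a hypothetical per-patient record format with a `treatments` list and an eligibility predicate supplied by the analyst. The platform's actual schema and eligibility criteria are not given in the abstract.

```python
from typing import Any, Callable, Dict, Iterable

# Hypothetical per-patient record, e.g. {"id": 1, "her2": True, "treatments": ["pertuzumab"]}.
Record = Dict[str, Any]


def guideline_uptake(
    records: Iterable[Record],
    is_eligible: Callable[[Record], bool],
    drug: str,
) -> float:
    """Fraction of eligible patients whose recorded treatments include `drug`."""
    eligible = [r for r in records if is_eligible(r)]
    if not eligible:
        raise ValueError("no eligible patients in the extract")
    treated = sum(1 for r in eligible if drug in r["treatments"])
    return treated / len(eligible)
```

For example, with three records where two satisfy a placeholder predicate such as `lambda r: r["her2"]` and one of those two lists the drug, the uptake is 0.5. The predicate here is only a stand-in, not a clinical eligibility rule.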