Project description:
Background: To advance research on malaria, the outputs of existing studies and the data that fed into them need to be made freely available, so that new studies can build on earlier work. These data and results also need to be available to groups developing public health policies based on up-to-date evidence. The Malaria Atlas Project (MAP) has collated and geopositioned over 50,000 parasite prevalence and vector occurrence survey records contributed by over 3,000 sources, including research groups, government agencies and non-governmental organizations worldwide. This paper describes the results of a project set up to release the data gathered, used and generated by MAP.
Methods: Requests for permission to release data online were sent to 236 groups that had contributed unpublished prevalence (parasite rate) surveys. An online explorer tool was developed so that users can visualize the spatial distribution of the vector and parasite survey data before downloading them. In addition, a consultation group was convened to advise on the mode and format of release for data generated by MAP's modelling work. New software was developed to produce a suite of publication-quality map images that can be downloaded from the internet for use in external publications.
Conclusion: More than 40,000 survey records can now be visualized on a set of dynamic maps and downloaded from the MAP website on a free and unrestricted basis. As new data are added and new permissions to release existing data are received, the volume of data available for download will increase. The modelled outputs of MAP's own analyses are also available online in a range of formats, including image files and GIS surface data, for use in advocacy, education and further research, and to help parameterize or validate other mathematical models.
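A prevalence (parasite rate) survey essentially records how many people were examined and how many tested positive. The sketch below shows one way a user might compute per-survey and pooled parasite rates from downloaded records. The record layout (`PrevalenceSurvey` and its field names) is hypothetical and is not MAP's actual download schema, and the values are toy numbers.

```python
from dataclasses import dataclass


@dataclass
class PrevalenceSurvey:
    # Hypothetical record layout; field names are illustrative, not MAP's schema.
    site: str
    latitude: float
    longitude: float
    examined: int
    positive: int


def parasite_rate(survey: PrevalenceSurvey) -> float:
    """Fraction of examined people who tested positive for parasites."""
    if survey.examined <= 0:
        raise ValueError("examined must be positive")
    if not 0 <= survey.positive <= survey.examined:
        raise ValueError("positive must lie between 0 and examined")
    return survey.positive / survey.examined


def pooled_parasite_rate(surveys) -> float:
    # Pool by summing counts rather than averaging per-survey rates,
    # so larger surveys carry proportionally more weight.
    examined = sum(s.examined for s in surveys)
    positive = sum(s.positive for s in surveys)
    return positive / examined


# Toy records, not real survey data.
surveys = [
    PrevalenceSurvey("site A", 0.0, 0.0, 200, 50),
    PrevalenceSurvey("site B", 0.0, 0.0, 100, 10),
]
for s in surveys:
    print(s.site, parasite_rate(s))
print("pooled", pooled_parasite_rate(surveys))
```

Pooling counts rather than averaging rates is a design choice for this sketch. A simple mean of the two rates (0.175) would give the 100-person survey as much weight as the 200-person one.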
Project description:
Summary: In translational research, efficient knowledge exchange between different fields of expertise is crucial. An open platform that can store a multitude of data types, such as clinical, pre-clinical or OMICS data, and that is combined with strong visual analytical capabilities will accelerate scientific progress by making data more accessible and hypothesis generation easier. The open data warehouse tranSMART can store a variety of data types and has a growing user community that includes both academic institutions and pharmaceutical companies. However, tranSMART currently lacks interactive and dynamic visual analytics and does not permit any post-processing interaction or exploration. For this reason, we developed SmartR, a plugin for tranSMART that not only equips the platform with several dynamic visual analytical workflows but also provides its own framework for adding new custom workflows. Modern web technologies such as D3.js and AngularJS were used to build a set of standard visualizations that were substantially enhanced with dynamic elements.
Availability and implementation: The source code is licensed under the Apache 2.0 License and is freely available on GitHub: https://github.com/transmart/SmartR.
Contact: reinhard.schneider@uni.lu.
Supplementary information: Supplementary data are available at Bioinformatics online.
Project description: Extensive, multifactorial data sharing is a crucial prerequisite for current and future (radiotherapy) research. However, the cost, time and effort required to achieve it are often a roadblock. We present an open-source data-sharing infrastructure between two radiotherapy departments that allows seamless exchange of de-identified, automatically translated clinical and biomedical treatment data.
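The text does not say how de-identification is performed. As a generic illustration only, and not this infrastructure's actual method, the sketch below drops direct identifiers and replaces the local patient ID with a keyed-hash pseudonym using HMAC-SHA256 from Python's standard library. The secret key, the set of identifier fields and the record fields are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret held only by the sending department; never shared.
SITE_SECRET = b"replace-with-a-securely-stored-key"

# Fields treated as direct identifiers in this sketch (an assumption).
DIRECT_IDENTIFIERS = {"name", "address", "birth_date"}


def pseudonymize_id(patient_id: str, secret: bytes = SITE_SECRET) -> str:
    """Map a local patient ID to a stable pseudonym using HMAC-SHA256."""
    digest = hmac.new(secret, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def deidentify(record: dict, secret: bytes = SITE_SECRET) -> dict:
    """Drop direct identifiers and replace the patient ID with a pseudonym."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["patient_id"] = pseudonymize_id(record["patient_id"], secret)
    return cleaned


# Toy record with hypothetical field names.
record = {
    "patient_id": "12345",
    "name": "Jane Doe",
    "birth_date": "1960-01-01",
    "prescribed_dose_gy": 60.0,
    "fractions": 30,
}
print(deidentify(record))
```

Because the hash is keyed and deterministic, the same patient always receives the same pseudonym. Successive exports can therefore be linked without revealing the local ID, provided the key never leaves the source site.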
Project description: Omics data sharing is crucial to the biological research community, and the last one to two decades have seen a large increase in collaborative analysis systems, databases and knowledge bases for omics and other systems biology data. We assessed the "FAIRness" of NASA's GeneLab Data Systems (GLDS), along with four similar systems in the research omics data domain, using 14 FAIRness metrics. Overall FAIRness scores ranged from 6 to 12 (out of 14), with a mean of 10.1 and a standard deviation of 2.4. The ranges of Pass, Partial Pass and Fail ratings were 29-79%, 0-21% and 7-50%, respectively. The systems we evaluated performed best in data findability and accessibility and worst in data interoperability. Reusability of metadata, in particular, was frequently not well supported. We relate our experiences implementing semantic integration of omics data from some of the assessed systems for federated querying and retrieval, given their shortcomings in data interoperability. Finally, we propose two new principles that developers of Big Data systems, in particular, should consider to maximize data accessibility.
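The text reports per-system scores out of 14 together with the proportions of Pass, Partial Pass and Fail ratings, but it does not state how a Partial Pass is scored. The sketch below tallies a score under an assumed weighting of Pass = 1, Partial Pass = 0.5 and Fail = 0. It also summarizes scores across systems, using the sample standard deviation because the text does not say which estimator was used. The ratings are toy values, not the study's data.

```python
from statistics import mean, stdev

# Assumed weights: the text does not specify how Partial Pass is scored.
WEIGHTS = {"pass": 1.0, "partial": 0.5, "fail": 0.0}


def fairness_score(ratings):
    """Sum of weighted ratings; the maximum equals the number of metrics."""
    return sum(WEIGHTS[r] for r in ratings)


def rating_shares(ratings):
    """Proportion of metrics that received each rating."""
    n = len(ratings)
    return {k: ratings.count(k) / n for k in WEIGHTS}


def summarize(systems):
    """Mean and sample standard deviation of scores across systems."""
    scores = [fairness_score(r) for r in systems]
    return mean(scores), stdev(scores)


# Toy ratings for two hypothetical systems over 14 metrics (not real data).
system_a = ["pass"] * 10 + ["partial"] * 2 + ["fail"] * 2
system_b = ["pass"] * 6 + ["partial"] * 2 + ["fail"] * 6
print(fairness_score(system_a))  # 11.0
print(rating_shares(system_a))
print(summarize([system_a, system_b]))
```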
Project description:
Objectives: The Epilepsy Learning Healthcare System (ELHS) was created in 2018 to achieve measurable improvements in outcomes for people with epilepsy. However, fragmentation of data systems has been a major barrier to reporting and participation. In this study, we aimed to test the feasibility of an open-source Data Integration (DI) method that connects real-life clinical data to national research and quality improvement (QI) systems.
Methods: The ELHS case report forms were programmed as EPIC SmartPhrases at Mass General Brigham (MGB) in December 2018, and subsequently as EPIC SmartForms in June 2021, to collect actionable, standardized, structured epilepsy data in the electronic health record (EHR) for subsequent import into the external national registry of the ELHS. Following the QI methodology of the Chronic Care Model, 39 providers (epileptologists and neurologists) incorporated the ELHS SmartPhrase into their clinical workflow, focusing on collecting the epilepsy diagnosis, seizure type according to the International League Against Epilepsy classification, seizure frequency, date of last seizure, medication adherence and side effects. The collected data were stored in the Enterprise Data Warehouse (EDW) without integration with external systems. We developed and validated a DI method that extracted the data from the EDW using Structured Query Language (SQL) and then preprocessed them using text mining. We used the ELHS data dictionary to match fields in the preprocessed notes and obtain the final structured dataset with seizure control information. For illustration, we describe the data curated from the care period of December 2018 to December 2021.
Results: The cohort comprised 1806 patients with a mean age of 43 years (SD 17.0), of whom 57% were female, 80% were white and 84% were non-Hispanic/Latino. Using our DI method, we automated the mining, preprocessing and export of the structured dataset into a local database that clinicians and quality improvers can access weekly. During the period of SmartPhrase implementation, providers logged 5168 clinic visits documenting each patient's seizure type and frequency. During this period, providers documented 59% of patients as having focal seizures, 35% as having generalized seizures and 6% as having another type. Of the cohort, 45% of patients had private insurance. The resulting structured dataset was bulk-uploaded via a web interface into the external national registry of the ELHS.
Conclusions: Structured data can feasibly be extracted from the text notes of epilepsy patients for weekly reporting to a national learning healthcare system.
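The matching step described above (text-mining the extracted notes and mapping their fields to the ELHS data dictionary) can be sketched as follows. This assumes the SmartPhrase renders into the note as simple "Label: value" lines. The labels, the registry field names in `DATA_DICTIONARY` and the sample note are all hypothetical, and the SQL extraction from the EDW is not shown.

```python
import re

# Hypothetical data dictionary: note label (lower-cased) -> registry field name.
DATA_DICTIONARY = {
    "epilepsy diagnosis": "diagnosis",
    "seizure type": "seizure_type",
    "seizure frequency": "seizure_frequency",
    "date of last seizure": "last_seizure_date",
    "medication adherence": "med_adherence",
    "side effects": "side_effects",
}

# A label (no colon), a colon, then the rest of the line as the value.
LINE_PATTERN = re.compile(r"^\s*([^:]+?)\s*:\s*(.*?)\s*$")


def note_to_record(note_text: str) -> dict:
    """Extract dictionary-matched 'Label: value' pairs from a free-text note."""
    record = {}
    for line in note_text.splitlines():
        match = LINE_PATTERN.match(line)
        if not match:
            continue  # skip lines that are not label/value pairs
        label, value = match.group(1).lower(), match.group(2)
        field = DATA_DICTIONARY.get(label)
        if field is not None and value:
            record[field] = value  # keep only fields defined in the dictionary
    return record


note = """Epilepsy diagnosis: Yes
Seizure type: Focal
Seizure frequency: 2 per month
Date of last seizure: 2021-03-14
Plan: continue current regimen"""
print(note_to_record(note))
```

Labels that are not in the dictionary, such as "Plan" here, are ignored. As a result, free-text sections of the note do not leak into the structured dataset.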
Project description: Genome projects and multiomics experiments generate huge volumes of data that must be stored, mined and transformed into useful knowledge. All of this information should be accessible and, ideally, browsable afterwards. Computational biologists have been addressing this scenario for more than a decade, implementing software and databases to meet the challenge. The relational database schema of GMOD (Generic Model Organism Database), known as Chado, is one of the few successful open-source initiatives in this area; it is widely adopted, and many software packages can connect to it. We have been developing an open-source software package named Machado, a genomics data integration framework implemented in Python, to enable research groups to both store and visualize genomics data. The framework relies on the Chado database schema and should therefore be intuitive for current Chado developers to adopt, or to run on top of existing databases. It provides several data-loading tools for genomics and transcriptomics data and for annotation results from tools such as BLAST, InterProScan, OrthoMCL and LSTrAP. An API connects to JBrowse, and a web visualization tool is implemented using Django views and templates. The Haystack library, integrated with the Elasticsearch engine, was used to implement a Google-like search, i.e., a single auto-complete search box that provides fast results and filters. Machado aims to be a modern object-relational framework that uses recent Python libraries to produce an effective open-source resource for genomics research.
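To make the point about running on an existing Chado database more concrete, the sketch below creates a heavily simplified subset of Chado's `cvterm`, `organism` and `feature` tables in an in-memory SQLite database. It then queries features by their controlled-vocabulary type. Real Chado deployments usually run on PostgreSQL and have many more columns and constraints. This is plain SQL via Python's `sqlite3` module, not Machado's own Django-based API, and the rows are toy data.

```python
import sqlite3

# A heavily simplified subset of Chado's cvterm/organism/feature tables;
# real Chado has many more columns and constraints.
SCHEMA = """
CREATE TABLE cvterm   (cvterm_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE organism (organism_id INTEGER PRIMARY KEY, genus TEXT, species TEXT);
CREATE TABLE feature  (
    feature_id  INTEGER PRIMARY KEY,
    organism_id INTEGER REFERENCES organism(organism_id),
    type_id     INTEGER REFERENCES cvterm(cvterm_id),
    uniquename  TEXT NOT NULL,
    seqlen      INTEGER
);
"""


def features_of_type(conn, type_name):
    """Return (uniquename, seqlen) for every feature typed with type_name."""
    query = """
        SELECT f.uniquename, f.seqlen
        FROM feature AS f
        JOIN cvterm AS t ON f.type_id = t.cvterm_id
        WHERE t.name = ?
        ORDER BY f.uniquename
    """
    return conn.execute(query, (type_name,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO cvterm VALUES (?, ?)", [(1, "gene"), (2, "mRNA")])
conn.execute("INSERT INTO organism VALUES (1, 'Example', 'organism')")
conn.executemany(
    "INSERT INTO feature VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 1, "gene0001", 1200), (2, 1, 2, "mRNA0001", 900), (3, 1, 1, "gene0002", 450)],
)
print(features_of_type(conn, "gene"))  # [('gene0001', 1200), ('gene0002', 450)]
```

Typing features through `cvterm` rather than through a fixed column is what makes the schema generic. New feature types are added as vocabulary rows, with no change to the schema.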
Project description:
Background: Research in systems biology requires software for a variety of purposes. Software is needed to store, retrieve, analyze and sometimes even collect the data obtained from system-level (often high-throughput) experiments. Software is also needed to implement the mathematical models and algorithms required for simulation and theoretical prediction at the system level.
Results: We introduce a free, easy-to-use, open-source, integrated software platform called the Systems Biology Research Tool (SBRT) to facilitate the computational aspects of systems biology. The SBRT currently implements 35 methods for analyzing stoichiometric networks and 16 methods from fields such as graph theory, geometry, algebra and combinatorics. New computational techniques can be added to the SBRT via process plug-ins, providing a high degree of evolvability and a unifying framework for software development in systems biology.
Conclusion: The Systems Biology Research Tool represents a technological advance for systems biology. This software can be used to make sophisticated computational techniques accessible to everyone (including those with no programming ability), to facilitate cooperation among researchers and to expedite progress in the field of systems biology.
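The plug-in idea can be illustrated with a minimal registry in which each analysis method registers itself under a name and is applied to a stoichiometric matrix. The registry, the decorator and both example methods are hypothetical and do not reproduce the SBRT's actual plug-in interface, which is not described here. The toy network is invented for illustration.

```python
from typing import Callable, Dict, List

Matrix = List[List[float]]  # rows = metabolites, columns = reactions

# Hypothetical registry; names are illustrative, not the SBRT's plug-in API.
PLUGINS: Dict[str, Callable[[Matrix], object]] = {}


def plugin(name: str):
    """Decorator that registers an analysis method under a unique name."""
    def register(func):
        if name in PLUGINS:
            raise ValueError(f"plug-in {name!r} already registered")
        PLUGINS[name] = func
        return func
    return register


@plugin("metabolite_connectivity")
def metabolite_connectivity(s: Matrix) -> List[int]:
    """Number of reactions in which each metabolite has a nonzero coefficient."""
    return [sum(1 for coeff in row if coeff != 0) for row in s]


@plugin("reaction_size")
def reaction_size(s: Matrix) -> List[int]:
    """Number of metabolites participating in each reaction."""
    return [sum(1 for row in s if row[j] != 0) for j in range(len(s[0]))]


def run(name: str, s: Matrix):
    """Look up a registered method by name and apply it to the matrix."""
    return PLUGINS[name](s)


# Toy network: A -> B (r1), B -> C (r2), A + C -> D (r3)
S = [
    [-1, 0, -1],  # A
    [1, -1, 0],   # B
    [0, 1, -1],   # C
    [0, 0, 1],    # D
]
print(run("metabolite_connectivity", S))  # [2, 2, 2, 1]
print(run("reaction_size", S))            # [2, 2, 3]
```

The host code only ever calls `run` with a method name. A new technique can therefore be added by registering another function, without touching the host code.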
Project description: Extruders are essential equipment for 3D-printing filament manufacturing, which is considered a clean technology because it generates less scrap and can reuse materials, extending their life cycle. Open-source extruders are less expensive than industrial extruders. However, they have little instrumentation, which limits process analysis and thus the development of new materials, screw designs and process control. This project therefore aims to develop a low-cost extruder with a high degree of instrumentation for in-situ process analysis. To achieve this, the equipment was built around an integrated circuit board designed for modularity, machine and peripheral control, process stability and data acquisition. To validate the equipment, processing was carried out at constant temperature with varying flow rate. The data acquired were the temperatures at different points along the barrel, the rotation speed of the extruder motor, and the current drawn by the motor and by the heating resistors. Thermal images of the components were captured during processing, validating the type of material used in the parts produced by additive manufacturing. The ABS filament produced was analyzed by flow and surface analysis using a confocal microscope. Higher flow rates yielded better filament surface quality.
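Because processing was run at constant temperature, one simple way to use the acquired barrel temperatures is to check how closely each zone held its setpoint. The sketch below summarizes each logged channel by its mean, population standard deviation and maximum deviation from the setpoint, and flags whether every sample stayed within a tolerance band. The zone names, setpoints, tolerance and readings are hypothetical toy values, not measurements from this extruder.

```python
from statistics import mean, pstdev


def channel_stability(samples, setpoint, tolerance):
    """Summarize one acquired channel against its setpoint.

    Returns the mean, population standard deviation, maximum absolute
    deviation from the setpoint, and whether every sample stayed within
    +/- tolerance of the setpoint.
    """
    max_dev = max(abs(x - setpoint) for x in samples)
    return {
        "mean": mean(samples),
        "std": pstdev(samples),
        "max_dev": max_dev,
        "stable": max_dev <= tolerance,
    }


# Toy barrel-zone temperatures in degrees C (not measured data).
log = {
    "zone1": [219.6, 220.1, 220.4, 219.8, 220.0],
    "zone2": [229.0, 231.2, 232.5, 228.1, 230.3],
}
setpoints = {"zone1": 220.0, "zone2": 230.0}
for zone, samples in log.items():
    print(zone, channel_stability(samples, setpoints[zone], tolerance=1.0))
```

The same function applies unchanged to other logged channels, such as motor current, if a reference value and tolerance are chosen for them.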
Project description: Maintaining data privacy and complying with regulatory requirements are especially important when omic data are shared between research centers, and the challenge is even greater when a multi-center effort is needed for collaborative omics studies. OmicSHIELD is introduced as an open-source tool that addresses these challenges by enabling privacy-protected federated analysis of sensitive omic data. To this end, multiple security mechanisms have been built into the software. The tool can manage a wide range of omic data analyses tailored to biomedical research, including genome- and epigenome-wide association studies and differential gene expression analyses. OmicSHIELD supports both meta- and mega-analysis, offering flexibility for different analysis designs. We present a series of use cases illustrating how the software addresses real-world analyses of omic data.
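In a meta-analysis design, each center shares only summary statistics, such as an effect estimate and its standard error, and these are combined centrally. In a mega-analysis, by contrast, the individual-level data are analyzed as a single dataset. The sketch below shows the standard inverse-variance fixed-effect pooling as a generic illustration of the meta-analysis step. It is not OmicSHIELD's own code or API, and the per-site values are toy numbers.

```python
import math


def fixed_effect_meta(estimates):
    """Inverse-variance fixed-effect meta-analysis.

    estimates: list of (beta, standard_error) pairs, one per site.
    Each site is weighted by 1 / se^2. Returns the pooled beta and its
    standard error, sqrt(1 / sum of weights).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


# Toy per-site summaries for one variant (not real results).
site_results = [(0.20, 0.10), (0.30, 0.20)]
beta, se = fixed_effect_meta(site_results)
print(round(beta, 3), round(se, 4))  # 0.22 0.0894
```

Here the weights are 100 and 25, so the more precise first site dominates. The pooled estimate is (100 x 0.20 + 25 x 0.30) / 125 = 0.22.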