Project description:Motivation: The Cancer Genome Atlas (TCGA) program has produced huge amounts of cancer genomics data, providing unprecedented opportunities for research. In 2014, we developed TCGA-Assembler, a software pipeline for retrieval and processing of public TCGA data. In 2016, TCGA data were transferred from the TCGA data portal to the Genomic Data Commons (GDC), which is supported by a different set of data storage and retrieval mechanisms. In addition, new proteomics data of TCGA samples have been generated by the Clinical Proteomic Tumor Analysis Consortium (CPTAC) program, and these data were not available for download through TCGA-Assembler. It is desirable to acquire and integrate data from both GDC and CPTAC. Results: We developed TCGA-Assembler 2 (TA2) to automatically download and integrate data from GDC and CPTAC. We made substantial improvements to the functionality of TA2 to enhance user experience and software performance. TA2, together with its previous version, has helped more than 2000 researchers from 64 countries to access and use TCGA and CPTAC data in their research. The availability of TA2 will continue to allow existing and new users to conduct reproducible research based on TCGA and CPTAC data. Availability and implementation: http://www.compgenome.org/TCGA-Assembler/ or https://github.com/compgenome365/TCGA-Assembler-2. Contact: zhuyitan@gmail.com or koaeraser@gmail.com. Supplementary information: Supplementary data are available at Bioinformatics online.
Project description:Introduction: NMR spectroscopy is a powerful technique for studying metabolism, either in metabolomics settings or through tracing with stable isotope-enriched metabolic precursors. MetaboLabPy (version 0.9.66) is a free and open-source software package used to process 1D- and 2D-NMR spectra. The software implements a complete workflow for NMR data pre-processing to prepare a series of 1D-NMR spectra for multivariate statistical data analysis. This includes a choice of algorithms for automated phase correction, segmental alignment, spectral scaling, variance stabilisation, export to various software platforms, and analysis of metabolic tracing data. The software has an integrated help system with tutorials that demonstrate standard workflows and explain the capabilities of MetaboLabPy. Materials and Methods: The software is implemented in Python and uses numerous Python libraries, such as numpy, scipy, and pandas. It is organised into three packages: metabolabpy, qtmetabolabpy, and metabolabpytools. The metabolabpy package contains classes to handle NMR data and all the numerical routines necessary to process and pre-process 1D NMR data and to perform multiplet analysis on 2D 1H,13C-HSQC NMR data. The qtmetabolabpy package contains routines related to the graphical user interface. Results: PySide6 is used to produce a modern and user-friendly graphical user interface. The metabolabpytools package contains routines that are not specific to handling NMR data, for example, routines to derive isotopomer distributions from the combination of NMR multiplet and GC-MS data. A deep-learning approach for the latter is currently under development. MetaboLabPy is available via the Python Package Index or via GitHub.
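The spectral scaling and variance stabilisation steps mentioned above can be illustrated with a minimal sketch. The abstract does not name the algorithms used, so this assumes two common choices: a simplified probabilistic quotient normalisation (PQN) and a generalised log (glog) transform. The function names `pqn_normalise` and `glog` and the parameter `lam` are hypothetical and are not part of the MetaboLabPy API.

```python
import numpy as np

def pqn_normalise(spectra):
    """Simplified PQN for an array of shape (n_spectra, n_points).

    Assumption: the median spectrum serves as the reference; points where
    the reference is not positive are ignored when computing quotients.
    """
    spectra = np.asarray(spectra, dtype=float)
    reference = np.median(spectra, axis=0)          # median spectrum
    mask = reference > 0
    quotients = spectra[:, mask] / reference[mask]  # point-wise ratios
    factors = np.median(quotients, axis=1)          # one dilution factor per spectrum
    return spectra / factors[:, None]

def glog(spectra, lam=1e-8):
    """Generalised log transform, log(y + sqrt(y^2 + lam)).

    Assumption: lam is a user-chosen stabilisation parameter.
    """
    spectra = np.asarray(spectra, dtype=float)
    return np.log(spectra + np.sqrt(spectra**2 + lam))

# Three spectra that differ only by a dilution factor.
base = np.array([1.0, 5.0, 2.0, 8.0])
spectra = np.vstack([base, 2 * base, 4 * base])
scaled = pqn_normalise(spectra)      # rows become identical after scaling
stabilised = glog(scaled, lam=1e-4)
```

After scaling, the three rows coincide, which is the intended effect when spectra differ only in overall concentration.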
Project description:Objectives: The main purpose of this publication is to help users of weather data for agronomic purposes (students, researchers, farmers, advisors, etc.), such as crop yield forecasting, to retrieve and process gridded weather data from different application programming interface (API) sources using R. Data description: This publication consists of a code tutorial developed in R that is part of the data-curation process of numerous research projects carried out by the Ciampitti Lab, Department of Agronomy, Kansas State University. We make use of three weather databases for which specific R libraries have been developed: (i) DAYMET (Thornton et al. in https://daymet.ornl.gov/ , 2019; https://github.com/bluegreen-labs/daymetr ), (ii) NASA-POWER (Sparks in J Open Source Softw 3:1035, 2018; https://github.com/ropensci/nasapower ), and (iii) Climate Hazards Group InfraRed Precipitation with Station Data (CHIRPS) (Funk et al. in Sci Data 2:150066, 2015; https://github.com/ropensci/chirps ). The databases offer different weather variables and vary in spatio-temporal coverage and resolution. The tutorial shows and explains how to retrieve weather data for multiple locations at once using latitude and longitude coordinates. Additionally, it shows how to create variables and summaries of agronomic interest, such as the Shannon Diversity Index (SDI) of precipitation, abundant and well-distributed rainfall (AWDR), growing degree days (GDD), crop heat units (CHU), extreme precipitation events (EPE), extreme temperature events (ETE), and reference evapotranspiration (ET0), among others.
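Two of the agronomic summaries listed above can be sketched briefly. The tutorial itself is written in R; the Python below is only an illustration and assumes the common textbook definitions: GDD as the sum of daily mean temperature above a base temperature, and the SDI of precipitation as the Shannon entropy of daily rainfall fractions divided by ln(n). The function names and the default `t_base` of 10 °C are hypothetical choices, not taken from the tutorial.

```python
import math

def growing_degree_days(tmax, tmin, t_base=10.0):
    """Sum of daily (mean temperature - t_base), floored at zero per day.

    Assumption: simple averaging method without an upper temperature cap.
    """
    return sum(max((hi + lo) / 2.0 - t_base, 0.0) for hi, lo in zip(tmax, tmin))

def shannon_diversity_index(precip):
    """SDI of daily precipitation: 1 = perfectly even, 0 = all rain on one day.

    Assumption: SDI = -sum(p_i * ln p_i) / ln(n), with p_i the fraction of
    total rainfall that fell on day i and n the number of days.
    """
    total = sum(precip)
    n = len(precip)
    if total <= 0 or n < 2:
        return 0.0
    entropy = -sum((p / total) * math.log(p / total) for p in precip if p > 0)
    return entropy / math.log(n)

# Two days with mean temperatures of 25 and 15 degrees C, base 10 degrees C.
gdd = growing_degree_days([30.0, 20.0], [20.0, 10.0])
# Rain spread evenly over four days versus concentrated on one day.
sdi_even = shannon_diversity_index([5.0, 5.0, 5.0, 5.0])
sdi_single = shannon_diversity_index([10.0, 0.0, 0.0, 0.0])
```

Under these assumed definitions, evenly distributed rainfall gives an SDI close to 1, while rainfall concentrated on a single day gives 0.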
Project description:Electrochemical sensors are major players in the race for improved molecular diagnostics due to their convenience, temporal resolution, manufacturing scalability, and their ability to support real-time measurements. This is evident in the ever-increasing number of health-related electrochemical sensing platforms, ranging from single-measurement point-of-care devices to wearable devices supporting immediate and continuous monitoring. In support of the need for such systems to rapidly process large data volumes, we describe here an open-source, easily customizable, multiplatform compatible program for the real-time control, processing, and visualization of electrochemical data. The software's architecture is modular and fully documented, allowing the easy customization of the code to support the processing of voltammetric (e.g., square-wave and cyclic) and chronoamperometric data. The program, which we have called Software for the Analysis and Continuous Monitoring of Electrochemical Systems (SACMES), also includes a graphical interface allowing the user to easily change analysis parameters (e.g., signal/noise processing, baseline correction) in real-time. To demonstrate the versatility of SACMES we use it here to analyze the real-time data output by (1) the electrochemical, aptamer-based measurement of a specific small-molecule target, (2) a monoclonal antibody-detecting DNA-scaffold sensor, and (3) the determination of the folding thermodynamics of an electrode-attached, redox-reporter-modified protein.
Project description:IRimage aims to increase the throughput, accuracy and reproducibility of results obtained from thermal images, especially those produced with affordable, consumer-oriented cameras. IRimage processes thermal images, extracting raw data and calculating temperature values with an open and fully documented algorithm, and makes these data available for further processing with image analysis software. It also allows reproducible measurements of the temperature of objects in a series of images, and produces visual outputs (images and videos) suitable for scientific reporting. IRimage is implemented in a scripting language of the scientific image analysis software ImageJ, which allows it to be used through a graphical user interface and makes its functionality easy to modify or expand. IRimage's results were consistent with those of standard software for 15 camera models of the most widely used brand. An example use case is also presented, in which IRimage was used to efficiently process hundreds of thermal images to reveal subtle differences in the daily pattern of leaf temperature of plants subjected to different soil water contents. IRimage's functionalities make it better suited for research purposes than many currently available alternatives, and could contribute to making affordable consumer-grade thermal cameras useful for reproducible research.
Project description:The datasets presented in this article are related to the research paper entitled "A Linear Classifier Approach for Identifying Security Requirements in Open Source Software Development" by Wang et al. (2018) [1]. This article describes requirements collected from three open-source software (OSS) projects, together with labels identifying the security requirements. The datasets are made available to support the development and evaluation of automated tools for analyzing security requirements.
Project description:Background: Research projects often involve observation, registration, and data processing starting from information obtained in field experiments. In many cases, these tasks are carried out by several persons in different places, times, and ways, adding different levels of complexity and error in data collecting. Furthermore, data processing can be time consuming, and input errors may produce unwanted results. Results: We have developed a novel, open source software called Phenobook, an easy, flexible, and intuitive tool to organize, collect, and save experimental data for further analyses. Phenobook was conceived to collect phenotypic observations in a user-friendly, cost-effective way. It consists of a web-based software for experiment design, data input and visualization, and exportation, combined with a mobile application for remote data collecting. We provide in this article a detailed description of the developed tool. Conclusion: Phenobook is a software tool that can be easily implemented in collaborative research and development projects involving data collecting and forward analyses. Adopting Phenobook is expected to improve the involved processes by minimizing input errors, resulting in higher quality and reliability of the research outcomes.
Project description:Comprehensive two-dimensional gas chromatography (GC×GC) is a powerful analytical tool for both nontargeted and targeted analyses. However, there is a need for more integrated workflows for processing and managing the resultant high-complexity datasets. End-to-end workflows for processing GC×GC data are challenging and often require multiple tools or software to process a single dataset. We describe a new approach, which uses an existing, underutilized interface within commercial software to integrate free and open-source/external scripts and tools, tailoring the workflow to the needs of the individual researcher within a single software environment. To demonstrate the concept, the interface was successfully used to complete a first-pass alignment on a large-scale GC×GC metabolomics dataset. The analysis was performed by interfacing bespoke and published external algorithms within a commercial software environment to automatically correct the variation in retention times captured by a routine reference standard. Variation in the first- and second-dimension retention times (¹tR and ²tR) was reduced on average from 8 and 16% CV before alignment to less than 1 and 2% CV after alignment, respectively. The interface enables automation and creation of new functions and increases the interconnectivity between chemometric tools, providing a window for integrating data-processing software with larger informatics-based data management platforms.
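The reported metric, percent coefficient of variation (CV) of retention times before and after alignment, can be illustrated with a small sketch. The alignment step here is a deliberately simple constant-offset correction against a reference standard, chosen only to show how the CV is computed; it is not the bespoke or published algorithm used in the study. The function names and all retention time values are hypothetical.

```python
import statistics

def percent_cv(values):
    """Percent coefficient of variation: 100 * sample SD / mean."""
    return 100.0 * statistics.stdev(values) / statistics.fmean(values)

def align_to_reference(peak_times, ref_times, ref_target):
    """Shift each run's peak by that run's reference-standard offset.

    Assumption: retention drift within a run is a constant offset, so
    subtracting (observed reference time - target time) corrects it.
    """
    return [t - (r - ref_target) for t, r in zip(peak_times, ref_times)]

# Hypothetical retention times (minutes) of one analyte across three runs,
# with the reference standard observed in the same runs.
analyte = [10.2, 9.8, 10.5]
reference = [5.2, 4.8, 5.5]

cv_before = percent_cv(analyte)
aligned = align_to_reference(analyte, reference, ref_target=5.0)
cv_after = percent_cv(aligned)
```

In this toy case the drift is identical for analyte and reference, so the CV after alignment is essentially zero; real data would leave residual variation.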
Project description:Despite the increased access to scientific publications and data as a result of open science initiatives, access to scientific tools remains limited. Uncrewed aerial vehicles (UAVs, or drones) can be a powerful tool for research in disciplines such as agriculture and environmental sciences, but their use in research is currently dominated by proprietary, closed source tools. The objective of this work was to collect, curate, organize and test a set of open source tools for aerial data capture for research purposes. The Open Science Drone Toolkit was built through a collaborative and iterative process by more than 100 people in five countries, and comprises an open-hardware autonomous drone and off-the-shelf hardware, open-source software, and guides and protocols that enable the user to perform all the necessary tasks to obtain aerial data. Data obtained with this toolkit over a wheat field was compared to data from satellite imagery and a commercial hand-held sensor, finding a high correlation for both instruments. Our results demonstrate the possibility of capturing research-grade aerial data using affordable, accessible, and customizable open source software and hardware, and using open workflows.
Project description:The thermal shift assay (TSA), also known as differential scanning fluorimetry (DSF), thermofluor, and Tm shift, is one of the most popular biophysical screening techniques used in fragment-based ligand discovery (FBLD) to detect protein-ligand interactions. By comparing the thermal stability of a target protein in the presence and absence of a ligand, potential binders can be identified. The technique is easy to set up, has low protein consumption, and can be run on most real-time polymerase chain reaction (PCR) instruments. While data analysis is straightforward in principle, it becomes cumbersome and time-consuming when the screens involve multiple 96- or 384-well plates. There are several approaches that aim to streamline this process, but most involve proprietary software, require programming knowledge, or are designed for specific instrument output files. We therefore developed an analysis workflow implemented in the Konstanz Information Miner (KNIME), a free and open-source data analytics platform, which greatly streamlined our data processing timeline for 384-well plates. The implementation is code-free and freely available to the community for improvement and customization to accommodate a wide range of instrument input files and workflows.
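The comparison of thermal stability with and without a ligand can be sketched for a single well. The workflow described above is code-free and built in KNIME; the Python below is only an illustration of one common way to estimate the melting temperature (Tm), namely as the temperature at which the first derivative of the fluorescence curve is largest. The function names and the synthetic sigmoidal melt curves are assumptions made for the example.

```python
import numpy as np

def melting_temperature(temps, fluorescence):
    """Estimate Tm as the temperature of the steepest fluorescence increase.

    Assumption: a single unfolding transition with rising fluorescence.
    """
    temps = np.asarray(temps, dtype=float)
    fluorescence = np.asarray(fluorescence, dtype=float)
    dfdt = np.gradient(fluorescence, temps)   # numerical first derivative
    return float(temps[np.argmax(dfdt)])

def thermal_shift(temps, f_protein, f_with_ligand):
    """Tm shift caused by the ligand; positive values suggest stabilisation."""
    return melting_temperature(temps, f_with_ligand) - melting_temperature(temps, f_protein)

def synthetic_melt_curve(temps, tm, slope=2.0):
    """Idealised sigmoidal melt curve, used here only to generate test data."""
    return 1.0 / (1.0 + np.exp((tm - temps) / slope))

temps = np.arange(25.0, 95.5, 0.5)
apo = synthetic_melt_curve(temps, tm=55.0)     # protein alone
bound = synthetic_melt_curve(temps, tm=58.0)   # protein plus ligand
delta_tm = thermal_shift(temps, apo, bound)
```

For a plate-based screen, the same calculation would be repeated for every well and the shifts compared against the protein-only controls.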