Project description:The increasing complexity of data and analysis methods has created an environment where scientists, who may not have formal training in software development, find themselves playing the impromptu role of software engineer. While several resources are available for introducing scientists to the basics of programming, researchers have little guidance on how to advance to the next level: developing robust, large-scale data analysis tools that are amenable to integration into workflow management systems, tools, and frameworks. Integration into such workflow systems imposes additional requirements on computational tools, such as adherence to standard conventions for robustness, data input, output, logging, and flow control. Here we provide a set of 10 guidelines to steer the creation of command-line computational tools that are usable, reliable, extensible, and in line with modern coding practices.
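A minimal sketch of the conventions such guidelines typically target (explicit argument parsing, logging to standard error so standard output stays machine-readable, and nonzero exit codes on failure) is shown below; the tool, its options, and its behavior are hypothetical illustrations, not the guidelines' prescribed implementation.

    #!/usr/bin/env python3
    # Hypothetical command-line skeleton illustrating workflow-friendly conventions:
    # explicit I/O, logging to stderr, and exit codes that signal success or failure.
    import argparse
    import logging
    import sys

    def main() -> int:
        parser = argparse.ArgumentParser(description="Count records in an input file.")
        parser.add_argument("input", help="path to the input file, or '-' for stdin")
        parser.add_argument("--verbose", action="store_true", help="enable debug logging")
        args = parser.parse_args()

        # Log to stderr so stdout remains reserved for machine-readable results.
        logging.basicConfig(stream=sys.stderr,
                            level=logging.DEBUG if args.verbose else logging.INFO,
                            format="%(levelname)s: %(message)s")

        try:
            handle = sys.stdin if args.input == "-" else open(args.input)
            with handle:
                count = sum(1 for _ in handle)
        except OSError as exc:
            logging.error("cannot read %s: %s", args.input, exc)
            return 1  # nonzero exit code tells the workflow engine the step failed

        print(count)  # result goes to stdout
        return 0

    if __name__ == "__main__":
        sys.exit(main())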
Project description:For more than thirty years, it has been claimed that a way to improve software developers' productivity and software quality is to focus on people and to provide incentives that make developers satisfied and happy. This claim has rarely been verified in software engineering research, which faces an additional challenge in comparison to more traditional engineering fields: software development is an intellectual activity dominated by often-neglected human factors (called human aspects in software engineering research). Among the many skills required for software development, developers must possess strong analytical problem-solving skills and creativity for the software construction process. According to psychology research, affective states (emotions and moods) deeply influence the cognitive processing abilities and performance of workers, including creativity and analytical problem solving. Nonetheless, little research has investigated the correlation between the affective states, creativity, and analytical problem-solving performance of programmers. This article echoes the call to employ psychological measurements in software engineering research. We report a study with 42 participants investigating the relationship between the affective states, creativity, and analytical problem-solving skills of software developers. The results support the claim that happy developers are indeed better problem solvers in terms of their analytical abilities. This study makes the following contributions: (1) providing a better understanding of the impact of affective states on the creativity and analytical problem-solving capacities of developers, (2) introducing and validating psychological measurements, theories, and concepts of affective states, creativity, and analytical problem-solving skills in empirical software engineering, and (3) raising the need to study the human factors of software engineering from a multidisciplinary viewpoint.
Project description:Many scientific disciplines rely on computational methods for data analysis, model generation, and prediction. Implementing these methods is often accomplished by researchers with domain expertise but without formal training in software engineering or computer science. This arrangement has led to underappreciation of the sustainability and maintainability of scientific software tools developed in academic environments. Some software tools have avoided this fate, including the scientific library Rosetta. We use this software and its community as a case study to show how modern software development can be accomplished successfully, irrespective of subject area. Rosetta is one of the largest software suites for macromolecular modeling, with 3.1 million lines of code and many state-of-the-art applications. Since the mid-1990s, the software has been developed collaboratively by the RosettaCommons, a community of academics from over 60 institutions worldwide with diverse backgrounds including chemistry, biology, physiology, physics, engineering, mathematics, and computer science. Developing this software suite has provided us with more than two decades of experience in how to effectively develop advanced scientific software in a global community with hundreds of contributors. Here we illustrate the functioning of this development community by addressing technical aspects (such as version control, testing, and maintenance), community-building strategies, diversity efforts, software dissemination, and user support. We demonstrate how modern computational research can thrive in a distributed collaborative community. The practices described here are independent of subject area and can be readily adopted by other software development communities.
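To make the testing aspect concrete, the sketch below shows the general shape of an automated regression test that pins a computed result to a stored reference value; it is an illustrative example only, and the function, data, and tolerance are hypothetical rather than part of Rosetta's actual test suite.

    # Illustrative regression test (not Rosetta code): comparing a computed result
    # against a reference recorded from a trusted earlier run catches silent changes
    # in behavior introduced by refactoring or dependency updates.
    import unittest

    def score_model(coordinates):
        # Placeholder for a scientific computation (e.g., an energy score).
        return sum(x * x for x in coordinates)

    class RegressionTest(unittest.TestCase):
        REFERENCE_SCORE = 14.0  # value recorded from a trusted earlier run

        def test_score_matches_reference(self):
            self.assertAlmostEqual(score_model([1.0, 2.0, 3.0]),
                                   self.REFERENCE_SCORE, places=6)

    if __name__ == "__main__":
        unittest.main()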
Project description:In scientific research, objectivity and unbiased data analysis are crucial for the validity and reproducibility of outcomes. This is particularly important for studies involving video or image categorization. A common approach to decreasing bias is to delegate data analysis to researchers unfamiliar with the experimental settings. However, this requires additional personnel and remains prone to cognitive biases. Here we describe the Video & Image Cutter & Randomizer (VICR) software (https://github.com/kkihnphd/VICR), designed for unbiased analysis by segmenting and then randomizing videos or still images. VICR allows a single researcher to conduct and analyze studies in a blinded manner, eliminating bias in analysis and streamlining the research process. We describe the features of the VICR software and demonstrate its capabilities using zebrafish behavior studies. To our knowledge, VICR is the only software for the randomization of video and image segments capable of eliminating bias in data analysis across a variety of research fields.
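As a rough illustration of the blinding workflow described above (cut the source material into segments, present the segments under coded names, and keep a key for unblinding only after scoring), consider the following minimal sketch; the file layout, function name, and coded-name scheme are hypothetical and do not reproduce VICR's actual implementation.

    # Minimal sketch of a segment-randomization blinding step (not VICR's code).
    # Segment files are copied under coded names and a key file records the mapping,
    # so the analyst scores 'sample_0001', 'sample_0002', ... without knowing groups.
    import csv
    import random
    import shutil
    from pathlib import Path

    def randomize_segments(segment_dir, blinded_dir, key_file):
        segments = sorted(Path(segment_dir).glob("*.*"))
        random.shuffle(segments)

        out_dir = Path(blinded_dir)
        out_dir.mkdir(parents=True, exist_ok=True)

        with open(key_file, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["coded_name", "original_file"])
            for index, segment in enumerate(segments, start=1):
                coded = f"sample_{index:04d}{segment.suffix}"
                shutil.copy2(segment, out_dir / coded)
                writer.writerow([coded, segment.name])  # key stays sealed until scoring ends

    if __name__ == "__main__":
        randomize_segments("segments", "blinded", "unblinding_key.csv")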
Project description:Progress in mass spectrometry lipidomics has led to a rapid proliferation of studies across biology and biomedicine. These generate extremely large raw datasets that require sophisticated solutions for automated data processing. To address this, numerous software tools have been developed and tailored for specific tasks. However, for researchers, deciding which approach best suits their application relies on ad hoc testing, which is inefficient and time-consuming. Here we first review the data processing pipeline, summarizing the scope of available tools. Next, to support researchers, LIPID MAPS provides an interactive online portal listing open-access tools with a graphical user interface. This guides users towards appropriate solutions within major areas of data processing, including (1) lipid-oriented databases, (2) mass spectrometry data repositories, (3) analysis of targeted lipidomics datasets, (4) lipid identification and (5) quantification from untargeted lipidomics datasets, (6) statistical analysis and visualization, and (7) data integration solutions. Detailed descriptions of functions and requirements are provided to guide customized data analysis workflows.
Project description:After years of using Graphics Processing Units (GPUs) to accelerate scientific applications in fields as varied as tomography, computer vision, climate modeling, digital forensics, geospatial databases, particle physics, radio astronomy, and localization microscopy, we have noticed a number of technical, socio-technical, and non-technical challenges that Research Software Engineers (RSEs) may run into. While some of these challenges, such as managing different programming languages within a project or dealing with different memory spaces, are common to all software projects involving GPUs, others are more typical of scientific software projects. Among these challenges are changing resolutions or scales, maintaining an application over time and keeping it sustainable, and evaluating both the obtained results and the achieved performance.
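To make the "different memory spaces" challenge concrete, the sketch below shows the explicit host-to-device and device-to-host copies that GPU code must manage; the use of CuPy here is an assumption made for illustration, not a library choice discussed in the text.

    # Illustration of separate host (CPU) and device (GPU) memory spaces using CuPy;
    # any GPU programming model requires the same kind of explicit data movement.
    import numpy as np
    import cupy as cp

    host_data = np.random.rand(1_000_000).astype(np.float32)  # resides in CPU memory

    device_data = cp.asarray(host_data)      # explicit copy: host -> device
    device_result = cp.sqrt(device_data)     # computation runs on the GPU

    host_result = cp.asnumpy(device_result)  # explicit copy: device -> host
    print(host_result[:5])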
Project description:Background: We address the goal of curating observations from published experiments in a generalizable form, reasoning over these observations to generate interpretations, and then querying this interpreted knowledge to supply the supporting evidence. We present web-application software as part of the 'BioScholar' project (R01-GM083871) that fully instantiates this process for a well-defined domain: using tract-tracing experiments to study the neural connectivity of the rat brain. Results: The main contribution of this work is to provide the first instantiation of a knowledge representation for experimental observations called 'Knowledge Engineering from Experimental Design' (KEfED), based on experimental variables and their interdependencies. The software has three parts: (a) the KEfED model editor, a design editor for creating KEfED models by drawing a flow diagram of an experimental protocol; (b) the KEfED data interface, a spreadsheet-like tool that permits users to enter experimental data pertaining to a specific model; and (c) a 'neural connection matrix' interface that presents neural connectivity as a table of ordinal connection strengths representing the interpretations of tract-tracing data. This tool also allows the user to view the experimental evidence pertaining to a specific connection. BioScholar is built in Flex 3.5. It uses Persevere (a noSQL database) as a flexible data store and PowerLoom® (a mature first-order logic reasoning system) to execute queries using spatial reasoning over the BAMS neuroanatomical ontology. Conclusions: We first introduce KEfED as a general approach and describe its possible role as a way of introducing structured reasoning into models of argumentation within new models of scientific publication. We then describe the design and implementation of our example application, the BioScholar software. This is presented as a possible biocuration interface and supplementary reasoning toolkit for a larger, more specialized bioinformatics system, the Brain Architecture Management System (BAMS).
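As a schematic and entirely hypothetical illustration of the kind of structure involved, the sketch below models observations described by experimental variables together with a connection matrix of ordinal strengths that can be queried for its supporting evidence; it is not the KEfED representation, BioScholar's data model, or its PowerLoom queries.

    # Hypothetical, simplified sketch of observations linked to an ordinal connection
    # matrix, with a query that returns the supporting evidence. This is not the
    # actual KEfED/BioScholar data model.
    from dataclasses import dataclass, field

    @dataclass
    class Observation:
        experiment_id: str        # which tract-tracing experiment the record comes from
        variables: dict           # experimental variables and their values
        source_region: str
        target_region: str
        strength: int             # ordinal connection strength, e.g. 0 (none) to 3 (strong)

    @dataclass
    class ConnectionMatrix:
        observations: list = field(default_factory=list)

        def add(self, obs):
            self.observations.append(obs)

        def strength(self, source, target):
            # Interpretation step: report the maximum ordinal strength observed.
            relevant = [o.strength for o in self.observations
                        if o.source_region == source and o.target_region == target]
            return max(relevant, default=0)

        def evidence(self, source, target):
            # Query step: return the observations supporting a given connection.
            return [o for o in self.observations
                    if o.source_region == source and o.target_region == target]

    matrix = ConnectionMatrix()
    matrix.add(Observation("exp-001", {"tracer": "PHA-L"}, "CA1", "subiculum", 3))
    matrix.add(Observation("exp-002", {"tracer": "BDA"}, "CA1", "subiculum", 2))
    print(matrix.strength("CA1", "subiculum"), len(matrix.evidence("CA1", "subiculum")))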
Project description:Science depends heavily on reliable and easy-to-use software packages, such as mathematical libraries or data analysis tools. Developing such packages requires substantial effort, which is too often not invested because of a lack of funding or recognition. To reduce the effort required to create sustainable software packages, we present a project skeleton that applies software engineering best practices from the start of a project and can also serve as a reference for existing projects.