Project description:Summary: Homology modelling, the technique of generating models of 3D protein structures based on experimental structures of related proteins, has become increasingly popular over the years. An abundance of tools for model generation and model evaluation is available from various research groups. We present HOMELETTE, an interface that provides unified programmatic access to these tools. This allows for the assembly of custom pipelines from pre- or self-implemented building blocks. Availability and implementation: HOMELETTE is implemented in Python and is compatible with version 3.6 and newer. It is distributed under the MIT license. Documentation and tutorials are available at Read the Docs (https://homelette.readthedocs.io/). The latest version of HOMELETTE is available on PyPI (https://pypi.org/project/homelette/) and GitHub (https://github.com/PhilippJunk/homelette). A full installation of the latest version of HOMELETTE with all dependencies is also available as a Docker container (https://hub.docker.com/r/philippjunk/homelette_template). Supplementary information: Supplementary data are available at Bioinformatics online.
Project description:Modelface, an interactive application for the Modeller software on the Windows platform, is presented. The application can run all steps of homology modeling, including PDB-to-FASTA generation, running Clustal, model building and loop refinement. Other Modeller modules, including energy calculation, energy minimization and single-point mutation of PDB structures, are also implemented in Modelface. The API is a simple batch-based application with no memory occupation and is free of charge for academic use. The application can also repair missing atom types in PDB structures, making it suitable for many molecular modeling studies such as docking and molecular dynamics simulation. Some successful instances of modeling studies using Modelface are also reported.
Project description:Purpose: Machine Learning Package for Cancer Diagnosis (MLCD) is the result of a National Institutes of Health/National Cancer Institute (NIH/NCI)-sponsored project for developing a unified software package from state-of-the-art breast cancer biopsy diagnosis and machine learning algorithms that can improve the quality of both clinical practice and ongoing research. Methods: Whole-slide images of 240 well-characterized breast biopsy cases, initially assembled under R01 CA140560, were used for developing the algorithms and training the machine learning models. This software package is based on the methodology developed and published under our recent NIH/NCI-sponsored research grant (R01 CA172343) for finding regions of interest (ROIs) in whole-slide breast biopsy images, for segmenting ROIs into histopathologic tissue types and for using this segmentation in classifiers that can suggest final diagnoses. Result: The package provides an ROI detector for whole-slide images and modules for semantic segmentation into tissue classes and diagnostic classification into 4 classes (benign, atypia, ductal carcinoma in situ, invasive cancer) of the ROIs. It is available through the GitHub repository under the Massachusetts Institute of Technology license and will later be distributed with the Pathology Image Informatics Platform system. A Web page provides instructions for use. Conclusion: Our tools have the potential to provide help to other cancer researchers and, ultimately, to practicing physicians and will motivate future research in this field. This article describes the methodology behind the software development and gives sample outputs to guide those interested in using this package.
Project description:Computational methods for protein structure modelling are routinely used to complement experimental structure determination and thus help to address a broad spectrum of scientific questions in biomedical research. The most accurate methods today are based on homology modelling, i.e. detecting a homologue of the desired target sequence that can be used as a template for modelling. Here we present a versatile open source homology modelling toolbox as a foundation for flexible and computationally efficient modelling workflows. ProMod3 is a fully scriptable software platform that can perform all steps required to generate a protein model by homology. Its modular design aims at fast prototyping of novel algorithms and implementing flexible modelling pipelines. Common modelling tasks, such as loop modelling, sidechain modelling or generating a full protein model by homology, are provided as production-ready pipelines, forming the starting point for users' own developments and enhancements. ProMod3 is the central software component of the widely used SWISS-MODEL web server.
Project description:BACKGROUND: MicroRNAs (miRNAs) are endogenous small RNAs that play a key role in post-transcriptional regulation of gene expression in animals and plants. The number of known miRNAs has increased rapidly over the years. The current release (version 14.0) of miRBase, the central online repository for miRNA annotation, comprises over 10,000 miRNA precursors from 115 different species. Furthermore, a large number of decentralized online resources are now available, each contributing important miRNA annotation and information. RESULTS: We have developed a software framework, designated here as miRMaid, with the goal of integrating miRNA data resources in a uniform web service interface that can be accessed and queried by researchers and, most importantly, by computers. miRMaid is built around data from miRBase and is designed to follow the official miRBase data releases. It exposes miRBase data as inter-connected web services. Third-party miRNA data resources can be modularly integrated as miRMaid plugins, or they can be loosely coupled with miRMaid as individual entities on the World Wide Web. miRMaid is available as a public web service but is also easily installed as a local application. The software framework is freely available under the LGPL open source license for academic and commercial use. CONCLUSION: miRMaid is an intuitive and modular software platform designed to unify miRBase and independent miRNA data resources. It enables miRNA researchers to computationally address complex questions involving the multitude of miRNA data resources. Furthermore, miRMaid constitutes a basic framework for further programming in which bioinformaticians interested in microRNAs can readily develop their own tools and data sources.
Project description:Surface plasmon resonance (SPR) is a powerful method for obtaining detailed molecular interaction parameters. Modern instrumentation with its increased throughput has enabled routine screening by SPR in hit-to-lead and lead optimization programs, and SPR has become a mainstream drug discovery technology. However, the processing and reporting of SPR data in drug discovery are typically performed manually, which is both time-consuming and tedious. Here, we present the workflow concept, design and experiences with a software module relying on a single, browser-based software platform for the processing, analysis, and reporting of SPR data. The efficiency of this concept lies in the immediate availability of end results: data are processed and analyzed upon loading the raw data file, allowing the user to immediately quality control the results. Once completed, the user can automatically report those results to data repositories for corporate access and quickly generate printed reports or documents. The software module has resulted in a very efficient and effective workflow through saved time and improved quality control. We discuss these benefits and show how this process defines a new benchmark in the drug discovery industry for the handling, interpretation, visualization, and sharing of SPR data.
Project description:Mosquito species belonging to the genus Aedes have attracted the interest of scientists and public health officers because of their capacity to transmit viruses that affect humans. Some of these species were brought outside their native range by means of trade and tourism and then colonised new regions thanks to a unique combination of eco-physiological traits. Considering mosquito physiological and behavioural traits to understand and predict their population dynamics is thus a crucial step in developing strategies to mitigate the local densities of invasive Aedes populations. Here, we synthesised the life cycle of four invasive Aedes species (Ae. aegypti, Ae. albopictus, Ae. japonicus and Ae. koreicus) in a single multi-scale stochastic modelling framework, which we coded in the R package dynamAedes. We designed a stage-based and time-discrete stochastic model driven by temperature, photo-period and inter-specific larval competition that can be applied to three different spatial scales: punctual, local and regional. These spatial scales consider different degrees of spatial complexity and data availability by accounting for both active and passive dispersal of mosquito species as well as for the heterogeneity of the input temperature data. Our overarching aim was to provide a flexible, open-source and user-friendly tool rooted in the most up-to-date knowledge of the species' biology, which could be applied to the management of invasive Aedes populations as well as to more theoretical ecological inquiries.
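The idea of a stage-based, time-discrete stochastic model driven by temperature can be illustrated with a minimal Python sketch. This is not the dynamAedes implementation or its API: the three stages, the linear temperature response, the survival probability and the fecundity value are all hypothetical placeholders, and photo-period, larval competition and dispersal are left out entirely.

```python
import random


def binomial(rng, n, p):
    """Draw from Binomial(n, p) by summing n Bernoulli trials."""
    return sum(1 for _ in range(n) if rng.random() < p)


def development_prob(temp_c):
    """Hypothetical daily development probability.

    Zero at or below 10 degrees C, rising linearly to a cap of 0.2 at 30 degrees C.
    """
    return max(0.0, min(0.2, 0.2 * (temp_c - 10.0) / 20.0))


def step(state, temp_c, rng, survival=0.9, eggs_per_adult=5):
    """Advance (eggs, juveniles, adults) by one day at temperature temp_c."""
    eggs, juveniles, adults = state
    p_dev = development_prob(temp_c)

    # Daily survival: each individual survives independently (assumed value).
    eggs = binomial(rng, eggs, survival)
    juveniles = binomial(rng, juveniles, survival)
    adults = binomial(rng, adults, survival)

    # Temperature-driven stage transitions among the survivors.
    hatched = binomial(rng, eggs, p_dev)
    emerged = binomial(rng, juveniles, p_dev)

    # Oviposition: each adult lays a fixed batch with probability p_dev (assumed).
    laid = eggs_per_adult * binomial(rng, adults, p_dev)

    return (eggs - hatched + laid, juveniles - emerged + hatched, adults + emerged)


def simulate(initial, temperatures, seed=0):
    """Return the list of daily states, starting with the initial state."""
    rng = random.Random(seed)
    trajectory = [initial]
    for temp_c in temperatures:
        trajectory.append(step(trajectory[-1], temp_c, rng))
    return trajectory


if __name__ == "__main__":
    # Hypothetical 30-day series at a constant 25 degrees C.
    for day, state in enumerate(simulate((100, 0, 0), [25.0] * 30)):
        print(day, state)
```

Because every transition is a random draw, repeated runs with different seeds give a distribution of trajectories rather than a single curve, which is the property that distinguishes a stochastic model of this kind from a deterministic one.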
Project description:We present a framework to assist the diagrammatic modelling of complex biological systems using the unified modelling language (UML). The framework comprises three levels of modelling, ranging in scope from the dynamics of individual model entities to system-level emergent properties. By way of an immunological case study of the mouse disease experimental autoimmune encephalomyelitis, we show how the framework can be used to produce models that capture and communicate the biological system, detailing how biological entities, interactions and behaviours lead to higher-level emergent properties observed in the real world. We demonstrate how the UML can be successfully applied within our framework, and provide a critique of UML's ability to capture concepts fundamental to immunology and biology more generally. We show how specialized, well-explained diagrams with less formal semantics can be used where no suitable UML formalism exists. We highlight UML's lack of expressive ability concerning cyclic feedbacks in cellular networks, and the compounding concurrency arising from huge numbers of stochastic, interacting agents. To compensate for this, we propose several additional relationships for expressing these concepts in UML's activity diagram. We also demonstrate the ambiguous nature of class diagrams when applied to complex biology, and question their utility in modelling such dynamic systems. Models created through our framework are non-executable, and expressly free of simulation implementation concerns. They are a valuable complement and precursor to simulation specifications and implementations, focusing purely on thoroughly exploring the biology, recording hypotheses and assumptions, and serve as a communication medium detailing exactly how a simulation relates to the real biology.
Project description:Human E1 is a key player in protein ubiquitination; however, no E1 structure is available. In this paper, we describe the derivation of a human E1 structure using molecular modelling based on the crystal structures of S. cerevisiae E1 and M. musculus E1. Key interactions between our E1 model and ubiquitin are also discussed.
Project description:With the advent of computer-aided drug design (CADD), traditional physical testing of thousands of molecules has now been replaced by target-focused drug discovery, where potentially bioactive molecules are predicted by computer software before their physical synthesis. However, despite being a significant breakthrough, CADD still faces various limitations and challenges. The increasing availability of data on small molecules has created a need to streamline the sourcing of data from different databases and automate the processing and cleaning of data into a form that can be used by multiple CADD software applications. Several standalone software packages are available to aid the drug designer, each with its own specific application, requiring specialized knowledge and expertise for optimal use. These applications require their own input and output files, making them a challenge for nonexpert users or multidisciplinary discovery teams. Here, we have developed a new software platform called DataPype, which wraps around these different software packages. It provides a unified automated workflow to search for hit compounds using specialist software. Additionally, multiple virtual screening packages can be used in a single workflow, and if different ways of looking at potential hit compounds all predict the same set of molecules, we have higher confidence that we should make or purchase and test the molecules. Importantly, DataPype can run on computer servers, speeding up the virtual screening for new compounds. Combining access to multiple CADD tools within one interface will enhance the early stage of drug discovery, increase usability, and enable the use of parallel computing.
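The consensus idea described above, keeping compounds that several virtual screening packages independently predict as hits, can be sketched in a few lines of Python. This is only an illustration and not DataPype's API: the function name, the `min_votes` parameter, the tool labels and the compound identifiers are all hypothetical.

```python
from collections import Counter


def consensus_hits(hit_lists, min_votes=None):
    """Return compounds predicted as hits by at least min_votes tools.

    hit_lists maps a tool label to an iterable of compound identifiers.
    If min_votes is None, a compound must be predicted by every tool.
    """
    if min_votes is None:
        min_votes = len(hit_lists)

    votes = Counter()
    for hits in hit_lists.values():
        # Use a set so a tool listing a compound twice still counts once.
        votes.update(set(hits))

    return sorted(cid for cid, n in votes.items() if n >= min_votes)


if __name__ == "__main__":
    # Hypothetical outputs from three screening tools.
    predictions = {
        "tool_a": ["cmpd-001", "cmpd-002", "cmpd-003"],
        "tool_b": ["cmpd-002", "cmpd-003", "cmpd-004"],
        "tool_c": ["cmpd-003", "cmpd-002", "cmpd-005"],
    }
    print(consensus_hits(predictions))               # unanimous hits
    print(consensus_hits(predictions, min_votes=2))  # majority hits
```

Requiring unanimity gives the shortest and most conservative list; lowering `min_votes` trades confidence for coverage.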