ABSTRACT: Next Generation Sequencing (NGS) technologies generate a large number of high-quality transcriptome datasets, enabling the investigation of molecular processes on a genomic and metagenomic scale. These transcriptomics studies aim to quantify and compare the molecular phenotypes of the biological processes at hand. Despite the vast increase of available transcriptome datasets, little is known about the evolutionary conservation of those characterized transcriptomes. The myTAI package implements exploratory analysis functions to infer transcriptome conservation patterns in any transcriptome dataset. Comprehensive documentation of myTAI functions and tutorial vignettes provide step-by-step instructions on how to use the package in an exploratory and computationally reproducible manner. The open source myTAI package is available at https://github.com/HajkD/myTAI and https://cran.r-project.org/web/packages/myTAI. Contact: firstname.lastname@example.org. Supplementary data are available at Bioinformatics online.
Project description: Retrieval and reproducible functional annotation of genomic data are crucial in biology. However, the current poor usability and transparency of retrieval methods hinder reproducibility. Here we present an open source R package, biomartr, which provides a comprehensive easy-to-use framework for automating data retrieval and functional annotation for meta-genomic approaches. The functions of biomartr achieve a high degree of clarity, transparency and reproducibility of analyses. The biomartr package implements straightforward functions for bulk retrieval of all genomic data or data for selected genomes, proteomes, coding sequences and annotation files present in databases hosted by the National Center for Biotechnology Information (NCBI) and European Bioinformatics Institute (EMBL-EBI). In addition, biomartr communicates with the BioMart database for functional annotation of retrieved sequences. Comprehensive documentation of biomartr functions and five tutorial vignettes provide step-by-step instructions on how to use the package in a reproducible manner. The open source biomartr package is available at https://github.com/HajkD/biomartr and https://cran.r-project.org/web/packages/biomartr/index.html. Contact: email@example.com. Supplementary data are available at Bioinformatics online.
Project description: The rapidly expanding microbiomics field is generating increasingly large datasets, characterizing the microbiota in diverse environments. Although classical numerical ecology methods provide a robust statistical framework for their analysis, software currently available is inadequate for large datasets and some computationally intensive tasks, like rarefaction and associated analysis. Here we present a software package for rarefaction analysis of large count matrices, as well as estimation and visualization of diversity, richness and evenness. Our software is designed for ease of use, operating at least 7x faster than existing solutions, despite requiring 10x less memory. C++ and R source code (GPL v.2) as well as binaries are available from https://github.com/hildebra/Rarefaction and from CRAN (https://cran.r-project.org/). Contact: firstname.lastname@example.org or email@example.com. Supplementary data are available at Bioinformatics online.
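As an illustration of what rarefaction involves, the following is a minimal Python sketch, not the package's C++ implementation. It subsamples one sample's taxon counts to a fixed depth without replacement and averages the observed richness over repeated draws. The function names (`rarefy`, `richness`, `mean_rarefied_richness`) and the example counts are hypothetical. Expanding every read into a list, as done here, is the kind of naive approach that becomes too slow and memory-hungry for large count matrices.

```python
import random
from collections import Counter


def rarefy(counts, depth, rng):
    """Draw `depth` reads without replacement from a vector of taxon counts."""
    total = sum(counts)
    if depth > total:
        raise ValueError("depth exceeds the number of reads in the sample")
    # One list entry per read, labelled with the index of its taxon.
    reads = [taxon for taxon, n in enumerate(counts) for _ in range(n)]
    drawn = Counter(rng.sample(reads, depth))
    return [drawn.get(taxon, 0) for taxon in range(len(counts))]


def richness(counts):
    """Number of taxa observed at least once."""
    return sum(1 for n in counts if n > 0)


def mean_rarefied_richness(counts, depth, repeats=100, seed=0):
    """Average richness over repeated rarefactions to the same depth."""
    rng = random.Random(seed)
    total = sum(richness(rarefy(counts, depth, rng)) for _ in range(repeats))
    return total / repeats


if __name__ == "__main__":
    sample = [50, 30, 15, 4, 1]  # hypothetical read counts for five taxa
    print(rarefy(sample, 20, random.Random(1)))
    print(mean_rarefied_richness(sample, 20))
```

Rarefying every sample to the same depth before comparing richness removes the effect of unequal sequencing effort, at the cost of discarding reads from deeper samples.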
Project description: We present a method of variable selection for the sparse generalized additive model. The method does not assume any specific functional form, and can select from a large number of candidates. It takes the form of incremental forward stagewise regression. Because no functional form is assumed, we devised an approach termed "roughening" to adjust the residuals in the iterations. In simulations, we show the new method is competitive against popular machine learning approaches. We also demonstrate its performance using some real datasets. The method is available as a part of the nlnet package on CRAN (https://cran.r-project.org/package=nlnet).
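For readers unfamiliar with incremental forward stagewise regression, here is a minimal Python sketch of the classic linear variant. It is not the method described above: it assumes a linear form and omits the "roughening" step, for which the abstract gives no details. The function name, step size and toy data are hypothetical, and predictor columns are assumed to be centred and on a comparable scale.

```python
def forward_stagewise(X, y, step=0.01, n_iter=1000, tol=1e-12):
    """Classic linear incremental forward stagewise regression.

    X is a list of rows, y a list of responses. At each iteration the
    predictor most correlated with the current residual has its
    coefficient nudged by a small fixed step.
    """
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    residual = list(y)
    for _ in range(n_iter):
        # Inner product of every predictor column with the residual.
        scores = [sum(X[i][j] * residual[i] for i in range(n)) for j in range(p)]
        j = max(range(p), key=lambda k: abs(scores[k]))
        if abs(scores[j]) < tol:
            break  # nothing left for any predictor to explain
        delta = step if scores[j] > 0 else -step
        coef[j] += delta
        for i in range(n):
            residual[i] -= delta * X[i][j]
    return coef


if __name__ == "__main__":
    # Toy data: the response depends on the first column only.
    X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
    y = [2, -2, 2, -2]
    print(forward_stagewise(X, y))
```

Because each update is small, irrelevant predictors tend to keep coefficients at or near zero, which is what makes the stagewise scheme usable for variable selection.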
Project description: Detecting periodicity in large scale data remains a challenge. While efforts have been made to identify best-of-breed algorithms, relatively little research has gone into integrating these methods in a generalizable method. Here, we present MetaCycle, an R package that incorporates ARSER, JTK_CYCLE and Lomb-Scargle to conveniently evaluate periodicity in time-series data. MetaCycle has two functions, meta2d and meta3d, designed to analyze two-dimensional and three-dimensional time-series datasets, respectively. The meta2d function implements N-version programming concepts using a suite of algorithms and integrating their results. Availability and implementation: MetaCycle package is available on the CRAN repository (https://cran.r-project.org/web/packages/MetaCycle/index.html) and GitHub (https://github.com/gangwug/MetaCycle). Contact: firstname.lastname@example.org. Supplementary information: Supplementary data are available at Bioinformatics online.
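To give a sense of one of the three algorithms named above, the following Python sketch evaluates the classic Lomb-Scargle periodogram at a handful of trial periods and picks the strongest. It is a textbook formulation, not MetaCycle's code, and it does not show how results from several algorithms are integrated. The function names and the synthetic 24-hour series are hypothetical, and the series is assumed to contain several distinct time points.

```python
import math


def lomb_scargle_power(times, values, period):
    """Classic (unnormalized) Lomb-Scargle power at one trial period."""
    omega = 2 * math.pi / period
    mean = sum(values) / len(values)
    centred = [v - mean for v in values]
    # Time offset tau that makes the sine and cosine terms orthogonal.
    tau = math.atan2(
        sum(math.sin(2 * omega * t) for t in times),
        sum(math.cos(2 * omega * t) for t in times),
    ) / (2 * omega)
    cos_terms = [math.cos(omega * (t - tau)) for t in times]
    sin_terms = [math.sin(omega * (t - tau)) for t in times]
    cos_part = sum(y * c for y, c in zip(centred, cos_terms)) ** 2
    cos_part /= sum(c * c for c in cos_terms)
    sin_part = sum(y * s for y, s in zip(centred, sin_terms)) ** 2
    sin_part /= sum(s * s for s in sin_terms)
    return 0.5 * (cos_part + sin_part)


def best_period(times, values, candidate_periods):
    """Trial period with the highest Lomb-Scargle power."""
    return max(candidate_periods,
               key=lambda p: lomb_scargle_power(times, values, p))


if __name__ == "__main__":
    # Synthetic series: 24 h rhythm sampled every 4 h for two days.
    times = list(range(0, 48, 4))
    values = [10 + 3 * math.cos(2 * math.pi * t / 24) for t in times]
    print(best_period(times, values, [20, 22, 24, 26, 28]))
```

Unlike a plain Fourier transform, this periodogram does not require evenly spaced samples, which is why it is often used on time courses with missing or irregular time points.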
Project description: The precision-recall plot is more informative than the ROC plot when evaluating classifiers on imbalanced datasets, but fast and accurate curve calculation tools for precision-recall plots are currently not available. We have developed Precrec, an R library that aims to overcome this limitation of the plot. Our tool provides fast and accurate precision-recall calculations together with multiple functionalities that work efficiently under different conditions. Precrec is licensed under GPL-3 and freely available from CRAN (https://cran.r-project.org/package=precrec). It is implemented in R with C++. Contact: email@example.com. Supplementary information: Supplementary data are available at Bioinformatics online.
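The quantities behind a precision-recall plot can be sketched in a few lines of Python. This is a plain threshold sweep for illustration only, not Precrec's algorithm, and it performs no interpolation between points. The function name and the example scores are hypothetical; labels are assumed to be 1 for positives and 0 for negatives.

```python
def precision_recall_points(scores, labels):
    """Return (recall, precision) pairs, one per distinct score threshold."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = []
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Items with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((tp / n_pos, tp / (tp + fp)))
    return points


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6]
    labels = [1, 0, 1, 0]
    for recall, precision in precision_recall_points(scores, labels):
        print(f"recall={recall:.2f} precision={precision:.2f}")
```

Precision depends on the ratio of positives to negatives among the predicted positives, which is why this plot reacts to class imbalance where the ROC plot does not.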
Project description: Background: We introduce BPG, a framework for generating publication-quality, highly customizable plots in the R statistical environment. Results: This open-source package includes multiple methods of displaying high-dimensional datasets and facilitates generation of complex multi-panel figures, making it suitable for complex datasets. A web-based interactive tool allows online figure customization, from which R code can be downloaded for integration with computational pipelines. Conclusion: BPG provides a new approach for linking interactive and scripted data visualization and is available at http://labs.oicr.on.ca/boutros-lab/software/bpg or via CRAN at https://cran.r-project.org/web/packages/BoutrosLab.plotting.general.
Project description: MOTIVATION: Time courses utilizing genome scale data are a common approach to identifying the biological pathways that are controlled by the circadian clock, an important regulator of organismal fitness. However, the methods used to detect circadian oscillations in these datasets are not able to accommodate changes in the amplitude of the oscillations over time, leading to an underestimation of the impact of the clock on biological systems. RESULTS: We have created a program to efficaciously identify oscillations in large-scale datasets, called the Extended Circadian Harmonic Oscillator application, or ECHO. ECHO utilizes an extended solution of the fixed amplitude oscillator that incorporates the amplitude change coefficient. Employing synthetic datasets, we determined that ECHO outperforms existing methods in detecting rhythms with decreasing oscillation amplitudes and in recovering phase shift. Rhythms with changing amplitudes identified from published biological datasets revealed distinct functions from those oscillations that were harmonic, suggesting purposeful biologic regulation to create this subtype of circadian rhythms. AVAILABILITY AND IMPLEMENTATION: ECHO's full interface is available at https://github.com/delosh653/ECHO. An R package for this functionality, echo.find, can be downloaded at https://CRAN.R-project.org/package=echo.find. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
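The abstract does not give the model equation, so the following Python sketch assumes a standard form for a harmonic oscillator whose amplitude changes exponentially over time: x(t) = A · exp(−γt/2) · cos(2πt/T + φ) + y. The function name and all parameter names are hypothetical. With γ = 0 the expression reduces to a fixed amplitude oscillator, a positive γ gives a damped rhythm, and a negative γ gives a growing one.

```python
import math


def extended_oscillator(t, amplitude, gamma, period, phase, offset):
    """Oscillator value at time t under the assumed model.

    gamma is the assumed amplitude change coefficient: 0 keeps the
    amplitude fixed, positive values damp it, negative values amplify it.
    """
    omega = 2 * math.pi / period
    envelope = amplitude * math.exp(-gamma * t / 2)
    return envelope * math.cos(omega * t + phase) + offset


if __name__ == "__main__":
    for gamma in (0.0, 0.1, -0.05):
        peaks = [extended_oscillator(t, 2.0, gamma, 24.0, 0.0, 5.0)
                 for t in (0, 24, 48)]
        print(gamma, [round(p, 3) for p in peaks])
```

Fitting such a function to each expression profile, for example by nonlinear least squares, would yield an estimate of the amplitude change alongside period and phase, which a fixed amplitude cosine fit cannot provide.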
Project description: Summary: Ontologies are widely used constructs for encoding and analyzing biomedical data, but the absence of simple and consistent tools has made exploratory and systematic analysis of such data unnecessarily difficult. Here we present three packages which aim to simplify such procedures. The ontologyIndex package enables arbitrary ontologies to be read into R, supports representation of ontological objects by native R types, and provides a parsimonious set of performant functions for querying ontologies. ontologySimilarity and ontologyPlot extend ontologyIndex with functionality for straightforward visualization and semantic similarity calculations, including statistical routines. Availability and Implementation: ontologyIndex, ontologyPlot and ontologySimilarity are all available on the Comprehensive R Archive Network website under https://cran.r-project.org/web/packages/. Contact: Daniel Greene firstname.lastname@example.org. Supplementary information: Supplementary data are available at Bioinformatics online.
Project description: Considerable attention has been given to the quantification of environmental effects on organisms. In natural conditions, environmental factors are continuously changing in a complex manner. To reveal the effects of such environmental variations on organisms, transcriptome data in field environments have been collected and analyzed. Nagano et al. proposed a model that describes the relationship between transcriptomic variation and environmental conditions and demonstrated the capability to predict transcriptome variation in rice plants. However, the computational cost of parameter optimization has prevented its wide application. We propose a new statistical model and efficient parameter optimization based on the previous study. We developed and released FIT, an R package that offers functions for parameter optimization and transcriptome prediction. The proposed method achieves comparable or better prediction performance within a shorter computational time than the previous method. The package will facilitate the study of the environmental effects on transcriptomic variation in field conditions. Freely available from CRAN (https://cran.r-project.org/web/packages/FIT/). Contact: email@example.com. Supplementary data are available at Bioinformatics online.
Project description: The integrative analysis of multiple high-throughput data sources that are available for a common sample set is an increasingly common goal in biomedical research. Joint and individual variation explained (JIVE) is a tool for exploratory dimension reduction that decomposes a multi-source dataset into three terms: a low-rank approximation capturing joint variation across sources, low-rank approximations for structured variation individual to each source and residual noise. JIVE has been used to explore multi-source data for a variety of application areas but its accessibility was previously limited. We introduce R.JIVE, an intuitive R package to perform JIVE and visualize the results. We discuss several improvements and extensions of the JIVE methodology that are included. We illustrate the package with an application to multi-source breast tumor data from The Cancer Genome Atlas. R.JIVE is available via the Comprehensive R Archive Network (CRAN) under the GPLv3 license: https://firstname.lastname@example.org. Supplementary data are available at Bioinformatics online.