Project description:Single-cell RNA sequencing is essential for investigating cellular heterogeneity and highlighting cell subpopulation-specific signatures. Single-cell sequencing applications have spread from conventional RNA sequencing to epigenomics, e.g., ATAC-seq. Many related algorithms and tools have been developed, but few computational workflows offer analysis flexibility while also achieving functional reproducibility (i.e., information about the data and the tools used is saved as metadata) and computational reproducibility (i.e., an exact image of the computational environment used to generate the data is stored) through a user-friendly environment. rCASC is a modular workflow providing an integrated analysis environment (from count generation to cell subpopulation identification) that exploits Docker containerization to achieve both functional and computational reproducibility in data analysis. rCASC provides preprocessing tools to remove low-quality cells and/or specific biases, e.g., cell cycle effects. Subpopulation discovery can be achieved using different clustering techniques based on different distance metrics. Cluster quality is then estimated through a new metric, the "cell stability score" (CSS), which describes the stability of a cell in a cluster under a perturbation induced by removing a random set of cells from the cell population. CSS provides better information on cluster robustness than the silhouette metric. Moreover, rCASC's tools can identify cluster-specific gene signatures. rCASC is a modular workflow with new features that could help researchers define cell subpopulations and detect subpopulation-specific markers. It uses Docker for ease of installation and to achieve computationally reproducible analysis. A Java GUI is provided to accommodate users without computational skills in R.
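The idea behind the cell stability score (remove a random set of cells, re-cluster, and check whether each cell keeps its cluster) can be illustrated with a small sketch. It is a simplified approximation under stated assumptions: the one-dimensional `gap_clusters` clusterer, the per-cell Jaccard agreement, and the parameters `n_perm` and `frac_removed` are hypothetical stand-ins, not rCASC's clustering methods or its exact CSS definition.

```python
import random
from typing import Callable, Dict

Labels = Dict[int, int]

def gap_clusters(values: Dict[int, float], gap: float) -> Labels:
    """Hypothetical toy clusterer: sort cells by a 1-D value and start a
    new cluster whenever two consecutive values differ by more than `gap`."""
    labels: Labels = {}
    cluster, prev = 0, None
    for cell, v in sorted(values.items(), key=lambda kv: kv[1]):
        if prev is not None and v - prev > gap:
            cluster += 1
        labels[cell] = cluster
        prev = v
    return labels

def cell_stability(values: Dict[int, float],
                   cluster_fn: Callable[[Dict[int, float]], Labels],
                   n_perm: int = 50, frac_removed: float = 0.1,
                   seed: int = 0) -> Dict[int, float]:
    """Simplified per-cell stability (an assumption, not rCASC's CSS):
    mean Jaccard overlap between a cell's reference cluster and its cluster
    after a random set of cells is removed and the rest are re-clustered."""
    rng = random.Random(seed)
    reference = cluster_fn(values)
    cells = list(values)
    n_remove = int(len(cells) * frac_removed)
    totals = {c: 0.0 for c in cells}
    counts = {c: 0 for c in cells}
    for _ in range(n_perm):
        removed = set(rng.sample(cells, n_remove))
        kept = {c: v for c, v in values.items() if c not in removed}
        perturbed = cluster_fn(kept)
        for c in kept:
            # Surviving members of c's reference cluster vs. c's new cluster.
            ref_mates = {m for m in kept if reference[m] == reference[c]}
            new_mates = {m for m in kept if perturbed[m] == perturbed[c]}
            totals[c] += len(ref_mates & new_mates) / len(ref_mates | new_mates)
            counts[c] += 1
    return {c: totals[c] / counts[c] if counts[c] else float("nan")
            for c in cells}

def clusterer(v: Dict[int, float]) -> Labels:
    return gap_clusters(v, gap=5.0)

# Two well-separated groups versus an evenly spaced chain of cells.
groups = {i: float(i) for i in range(10)}
groups.update({i: 100.0 + i for i in range(10, 20)})
chain = {i: 4.0 * i for i in range(10)}
print(min(cell_stability(groups, clusterer).values()))
print(min(cell_stability(chain, clusterer).values()))
```

In this toy setting every cell of the two well-separated groups scores 1.0, whereas the chain splits whenever an interior cell is removed, so its cells score below 1. A Jaccard overlap is used here so that both splitting and merging of clusters lower the score.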
Project description:The exponential rise of metagenomic sequencing is delivering massive amounts of functional environmental genomics data. However, this also creates a procedural bottleneck for ongoing re-analysis: as reference databases grow and methods improve, analyses need to be updated for consistency, which requires access to increasingly demanding bioinformatic and computational resources. Here, we present the KAUST Metagenomic Analysis Platform (KMAP), a new integrated, open, web-based tool for the comprehensive exploration of shotgun metagenomic data. We illustrate the capabilities of KMAP through the re-assembly of ~27,000 public metagenomic samples from ~450 studies spanning ~77 diverse habitats. In this pilot study, a small subset of these metagenomic assemblies is grouped into 36 new habitat-specific gene catalogs, all based on full-length (complete) genes. Extensive taxonomic and gene annotations are stored in Gene Information Tables (GITs), a simple, tractable data-integration format suited to command-line analysis or database management. The KMAP pilot study enables the exploration and comparison of microbial GITs across different habitats, covering over 275 million genes. KMAP data and analyses are available at https://www.cbrc.kaust.edu.sa/aamg/kmap.start .
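Because GITs are described as a simple tabular format, comparing habitats can be as simple as grouping rows. The sketch below assumes a tab-separated layout with hypothetical column names (`gene_id`, `habitat`, `annotation`) and made-up rows; the actual GIT schema is not specified here.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Set

def annotations_by_habitat(tsv_text: str) -> Dict[str, Set[str]]:
    """Group gene annotations by habitat from a tab-separated gene table.
    The column names 'gene_id', 'habitat' and 'annotation' are hypothetical."""
    by_habitat: Dict[str, Set[str]] = defaultdict(set)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        by_habitat[row["habitat"]].add(row["annotation"])
    return dict(by_habitat)

# Hypothetical three-gene table spanning two habitats.
example = (
    "gene_id\thabitat\tannotation\n"
    "g1\tsoil\tnitrogenase\n"
    "g2\tsoil\tcatalase\n"
    "g3\tmarine\tcatalase\n"
)
tables = annotations_by_habitat(example)
shared = tables["soil"] & tables["marine"]
print(sorted(shared))  # ['catalase']
```

Set intersections and differences then give the annotations shared by, or unique to, each habitat.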
Project description:Objective: Re-identification risk methods for biomedical data often assume a worst case, in which attackers know all identifiable features (eg, age and race) about a subject. Yet, worst-case adversarial modeling can overestimate risk and induce heavy editing of shared data. The objective of this study is to introduce a framework for assessing re-identification risk that considers the attacker's resources and capabilities. Materials and methods: We integrate 3 established risk measures (ie, prosecutor, journalist, and marketer risks) and compute re-identification probabilities for data subjects. This probability depends on an attacker's capabilities (eg, ability to obtain external identified resources) and on the subject's decision whether to reveal their participation in a dataset. We illustrate the framework through case studies using data from over 1 000 000 patients from Vanderbilt University Medical Center and show how re-identification risk changes when attackers are pragmatic and use 2 known resources for attack: (1) voter registration lists and (2) social media posts. Results: Our framework illustrates that the risk is substantially smaller in the pragmatic scenarios than in the worst case. Our experiments yield a median worst-case risk of 0.987 (where 0 is least risky and 1 is most risky); however, the median reduction in risk was 90.1% in the voter registration scenario and 100% in the social media posts scenario. Notably, these observations hold true for a wide range of adversarial capabilities. Conclusions: This research illustrates that re-identification risk is situationally dependent and that appropriate adversarial modeling may permit biomedical data sharing on a wider scale than is currently the case.
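The three established measures the framework integrates can be illustrated with their commonly used equivalence-class formulations: prosecutor risk 1/f_j and journalist risk 1/F_j for a record whose quasi-identifier class has size f_j in the shared dataset and F_j in the population an attacker links against, and marketer risk as the expected fraction of records re-identified. This sketch does not model the attacker capabilities or participation disclosure that the framework adds; the quasi-identifier values are hypothetical, and every sampled class is assumed to occur in the population.

```python
from collections import Counter
from typing import List, Tuple

# A record's quasi-identifiers, e.g. (age band, race); values are hypothetical.
QuasiID = Tuple[str, ...]

def prosecutor_risk(sample: List[QuasiID]) -> List[float]:
    """Per-record risk 1/f_j, with f_j the size of the record's equivalence
    class (records sharing its quasi-identifiers) in the shared dataset."""
    f = Counter(sample)
    return [1.0 / f[q] for q in sample]

def journalist_risk(sample: List[QuasiID], population: List[QuasiID]) -> List[float]:
    """Per-record risk 1/F_j, with F_j the class size in the population the
    attacker links against (assumed to contain every sampled class)."""
    F = Counter(population)
    return [1.0 / F[q] for q in sample]

def marketer_risk(sample: List[QuasiID], population: List[QuasiID]) -> float:
    """Expected fraction of records re-identified: (1/n) * sum_j f_j / F_j."""
    f, F = Counter(sample), Counter(population)
    return sum(f[q] / F[q] for q in f) / len(sample)

sample = [("30-39", "A"), ("30-39", "A"), ("40-49", "B")]
population = sample + [("30-39", "A"), ("40-49", "B"), ("40-49", "B")]
print(prosecutor_risk(sample))  # [0.5, 0.5, 1.0]
print(journalist_risk(sample, population))
print(marketer_risk(sample, population))
```

Linking against a larger population can only enlarge each class (F_j >= f_j), which is why journalist risk never exceeds prosecutor risk for the same record.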
Project description:Single-cell transcriptomics has recently surged in popularity, creating a need for data analysis pipelines that are reproducible, modular, and interoperable across different systems and institutions. To meet this demand, we introduce scAN1.0, a processing pipeline for analyzing 10X single-cell RNA sequencing data. scAN1.0 is built using Nextflow DSL2 and can be run on most computational systems. The modular design of Nextflow pipelines enables easy integration and evaluation of different blocks for specific analysis steps. We demonstrate the usefulness of scAN1.0 by using it to examine the impact of the mapping step on the analysis of two datasets: (i) 10X scRNA-seq data from a human pituitary gonadotroph tumor and (ii) murine 10X scRNA-seq data acquired from CD8 T cells during an immune response.
Project description:Microfluidic cultivation devices that facilitate O2 control enable unique studies of the complex interplay between environmental O2 availability and microbial physiology at the single-cell level. In such studies, microbial single-cell analysis based on time-lapse microscopy is typically used to resolve microbial behavior with spatiotemporal resolution. Time-lapse imaging yields large image-data stacks that can be efficiently analyzed with deep learning techniques, providing new insights into microbiology. This knowledge gain justifies the additional and often laborious microfluidic experiments. However, integrating on-chip O2 measurement and control into an already complex microfluidic cultivation, and developing image analysis tools, can be challenging. Here, we present a comprehensive experimental approach that allows spatiotemporal single-cell analysis of living microorganisms under controlled O2 availability. To this end, a gas-permeable polydimethylsiloxane microfluidic cultivation chip and a low-cost 3D-printed mini-incubator were used to control O2 availability inside microfluidic growth chambers during time-lapse microscopy. Dissolved O2 was monitored by imaging the fluorescence lifetime of the O2-sensitive dye RTDP using FLIM microscopy. The acquired image-data stacks from biological experiments, containing phase contrast and fluorescence intensity data, were analyzed using in-house developed and open-source image-analysis tools. The oxygen concentration could be dynamically controlled between 0% and 100%. The system was experimentally tested by culturing and analyzing an E. coli strain expressing green fluorescent protein as an indirect intracellular oxygen indicator. The presented system enables innovative microbiological research on microorganisms and microbial ecology with single-cell resolution.
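Lifetime-based O2 sensing with a quenchable luminescent dye is commonly described by the linear Stern-Volmer relation tau0 / tau = 1 + K_SV * [O2], where tau0 is the lifetime without O2. The sketch below assumes that relation and a two-point calibration (lifetimes at 0% and at a known reference O2 level). The description does not state which calibration model was used, and the lifetime values are hypothetical.

```python
from typing import List

def calibrate_ksv(tau0: float, tau_ref: float, o2_ref: float) -> float:
    """Stern-Volmer constant from the O2-free lifetime tau0 and the
    lifetime tau_ref measured at a known O2 level o2_ref (in %)."""
    return (tau0 / tau_ref - 1.0) / o2_ref

def o2_from_lifetime(tau: float, tau0: float, ksv: float) -> float:
    """Invert tau0 / tau = 1 + K_SV * [O2] for a single lifetime."""
    return (tau0 / tau - 1.0) / ksv

def o2_map(lifetimes: List[List[float]], tau0: float, ksv: float) -> List[List[float]]:
    """Convert a per-pixel FLIM lifetime image into an O2 map (same units as o2_ref)."""
    return [[o2_from_lifetime(t, tau0, ksv) for t in row] for row in lifetimes]

# Hypothetical calibration: 600 ns without O2, 400 ns at 100 %.
ksv = calibrate_ksv(tau0=600.0, tau_ref=400.0, o2_ref=100.0)
print(o2_map([[600.0, 480.0], [400.0, 450.0]], tau0=600.0, ksv=ksv))
```

Real sensor films often deviate from a single linear Stern-Volmer line, so a multi-point calibration may be needed in practice.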
Project description:Mitochondrial DNA (mtDNA) encodes the core subunits of OXPHOS, which is essential in nearly all eukaryotes. mtDNA is packed into distinct foci (nucleoids) inside mitochondria, and the number of mtDNA copies differs between cell types and is affected in several human diseases. Currently, common protocols estimate per-cell mtDNA copy numbers by sequencing or qPCR of bulk samples. However, this gives no insight into cell-to-cell heterogeneity and can mask phenotypic subpopulations. Here, we present mtFociCounter, a single-cell image analysis tool for reproducible quantification of nucleoids and other foci. mtFociCounter is lightweight, open-source freeware that overcomes current limitations to reproducible single-cell analysis of mitochondrial foci. We demonstrate its use by analysing 2165 single fibroblasts and observe large cell-to-cell heterogeneity in nucleoid numbers. In addition, mtFociCounter quantifies mitochondrial content; our results show a good correlation (R = 0.90) between nucleoid number and mitochondrial area, and we find that nucleoid density is less variable than nucleoid number in wild-type cells. Finally, we demonstrate that mtFociCounter readily detects differences in foci numbers upon sample treatment and can be applied to mitochondrial RNA granules and superresolution microscopy. mtFociCounter provides a versatile solution for reproducibly quantifying foci in single cells, and our results highlight the importance of accounting for cell-to-cell variance and mitochondrial context in mitochondrial foci analysis.
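Counting foci and relating them to mitochondrial area can be sketched with an intensity threshold and connected-component labelling. This is a minimal illustration, not mtFociCounter's implementation: the fixed threshold, 4-connectivity, the `min_size` filter and the tiny example image are assumptions.

```python
import math
from collections import deque
from typing import List

Image = List[List[float]]

def count_foci(image: Image, threshold: float, min_size: int = 1) -> int:
    """Count 4-connected groups of pixels brighter than `threshold`,
    ignoring groups smaller than `min_size` pixels."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    n_foci = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            # Flood-fill one focus breadth-first and measure its size.
            seen[r][c] = True
            queue, size = deque([(r, c)]), 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if size >= min_size:
                n_foci += 1
    return n_foci

def foci_density(n_foci: int, mito_area: float) -> float:
    """Foci per unit mitochondrial area (e.g. per pixel or per square micron)."""
    return n_foci / mito_area

def pearson_r(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation, e.g. between nucleoid counts and mitochondrial areas."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Tiny hypothetical image with three bright foci.
image = [
    [0, 9, 0, 0, 0],
    [0, 9, 0, 0, 8],
    [0, 0, 0, 0, 8],
    [7, 0, 0, 0, 0],
]
print(count_foci(image, threshold=5))  # 3
```

Dividing each cell's count by its mitochondrial area, as `foci_density` does, is what turns a nucleoid number into a nucleoid density, and `pearson_r` over many cells gives the kind of count-versus-area correlation reported above.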
Project description:Single-cell ATAC-seq (scATAC-seq) has become the most widely used method for profiling the open chromatin landscape of heterogeneous cell populations at single-cell resolution. Although numerous software tools and pipelines have been developed, an easy-to-use, scalable, reproducible, and comprehensive pipeline for scATAC-seq data analysis is still lacking. To fill this gap, we developed scATACpipe, a Nextflow pipeline for comprehensive analysis of scATAC-seq data, including extensive quality assessment, preprocessing, dimension reduction, clustering, peak calling, differential accessibility inference, integration with scRNA-seq data, transcription factor activity and footprinting analysis, co-accessibility inference, and cell trajectory prediction. scATACpipe enables users to perform end-to-end analysis of scATAC-seq data with three sub-workflow options for preprocessing, which leverage 10x Genomics Cell Ranger ATAC software, the ultra-fast Chromap procedures, or a set of custom scripts implementing current best practices for scATAC-seq data preprocessing. The pipeline extends the R package ArchR for downstream analysis, adding support for any eukaryotic species with an annotated reference genome. Importantly, scATACpipe generates an all-in-one HTML report for the entire analysis and outputs cluster-specific BAM, BED, and BigWig files for visualization in a genome browser. scATACpipe eliminates the need for users to chain different tools together and facilitates reproducible and comprehensive analyses of scATAC-seq data from raw reads to biological insights, with minimal changes to configuration settings across computing environments or species. By applying it to public datasets, we illustrated the utility, flexibility, versatility, and reliability of our pipeline and demonstrated that scATACpipe outperforms other workflows.
Project description:Spatial information from tissues is essential for a holistic overview of gene expression mechanisms. The sequencing-based Spatial Transcriptomics approach makes it possible to spatially barcode the whole transcriptome of tissue sections using microarray glass slides. However, manual preparation of high-quality tissue sequencing libraries is time-consuming and subject to technical variability. Here, we present an automated adaptation of the 10x Genomics Visium library construction on the widely used Agilent Bravo Liquid Handling Platform. Compared to manual Visium library preparation, our automated approach reduces hands-on time by over 80% and provides higher throughput and robustness. Our automated Visium library preparation protocol provides a new strategy to standardize spatially resolved transcriptomics analysis of tissues at scale.
Project description:Cells differ in intrinsic properties, such as their mechanical and electrical properties, which may serve as cell-specific markers. Here, we present a microfluidic chip configured with two opposing optical fibers and four 3D electrodes for measuring multiple physical parameters. The chip uses the optical fibers to capture and stretch a single cell and the 3D electrodes to rotate it. The mechanical and dielectric properties can then be extracted from the stretching deformation and the rotation spectrum, respectively. We provided proof of concept by testing five cell types (HeLa, A549, HepaRG, MCF7, and MCF10A) and determining five biophysical parameters: shear modulus, steady-state viscosity, and relaxation time from the stretching deformation, and area-specific membrane capacitance and cytoplasm conductivity from the rotation spectra. We showed the potential of the chip in cancer research by observing subtle changes in the cellular properties of A549 cells undergoing transforming growth factor beta 1 (TGF-β1)-induced epithelial-mesenchymal transition (EMT). The new chip provides a microfluidic platform for multiparameter characterization of single cells, which can play an important role in single-cell research.
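The description does not say which viscoelastic model is fitted to the stretching deformation. As an illustration only, the sketch below assumes a Kelvin-Voigt creep response eps(t) = (stress / G) * (1 - exp(-t / tau)) with viscosity eta = G * tau, a known applied stress and a known plateau strain. It ignores the geometric and optical-force factors that a real optical-stretcher analysis would need, and in this model tau is strictly a retardation time. All numbers are hypothetical.

```python
import math
from typing import List, Tuple

def fit_kelvin_voigt(times: List[float], strains: List[float],
                     stress: float, strain_inf: float) -> Tuple[float, float, float]:
    """Fit eps(t) = (stress / G) * (1 - exp(-t / tau)) to a creep curve.

    Returns (G, tau, eta) with eta = G * tau. The plateau strain
    strain_inf is assumed to be known (e.g. read from late frames)."""
    g = stress / strain_inf
    # Linearize: ln(1 - eps / eps_inf) = -t / tau, then fit a slope through 0.
    pts = [(t, math.log(1.0 - e / strain_inf))
           for t, e in zip(times, strains) if 0.0 < e < strain_inf]
    slope = sum(t * y for t, y in pts) / sum(t * t for t, _ in pts)
    tau = -1.0 / slope
    return g, tau, g * tau

# Synthetic creep curve (hypothetical): G = 100 Pa, tau = 0.5 s, stress = 10 Pa.
times = [0.1 * i for i in range(1, 21)]
strains = [0.1 * (1.0 - math.exp(-t / 0.5)) for t in times]
print(fit_kelvin_voigt(times, strains, stress=10.0, strain_inf=0.1))
```

On noise-free synthetic data the fit recovers G = 100, tau = 0.5 and eta = 50; with measured curves, points close to the plateau are noisy after linearization, so a direct nonlinear fit may be preferable.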
Project description:BACKGROUND: A lack of reproducibility has been repeatedly criticized in computational research. High-throughput sequencing (HTS) data analysis is a complex multi-step process. For most steps, a range of bioinformatic tools is available, and most tools have many parameters that need to be set. Due to this complexity, HTS data analysis is particularly prone to reproducibility and consistency issues. We have defined four criteria that, in our opinion, ensure a minimal degree of reproducible research for HTS data analysis. A number of workflow management systems are available to assist with complex multi-step data analyses. However, to the best of our knowledge, none of the currently available workflow management systems satisfies all four criteria for reproducible HTS analysis. RESULTS: Here we present uap, a workflow management system dedicated to robust, consistent, and reproducible HTS data analysis. uap is optimized for omics data but can easily be extended to other complex analyses. It is available under the GNU GPL v3 license at https://github.com/yigbt/uap. CONCLUSIONS: uap is a freely available tool that enables researchers to easily adhere to reproducible research principles for HTS data analyses.
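One general way a workflow manager can keep a multi-step HTS analysis consistent is to run steps in dependency order and to fingerprint each step's tool, version and parameters together with the fingerprints of its inputs. A changed setting then invalidates exactly the downstream results. The sketch below illustrates that idea with Python's standard `graphlib` and `hashlib`. The step names, tools, versions and parameters are hypothetical, and this is not uap's configuration format, API, or its four criteria.

```python
import hashlib
import json
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple

# Hypothetical HTS analysis: each step maps to the steps it depends on.
STEPS: Dict[str, Set[str]] = {
    "qc": set(),
    "trim": set(),
    "align": {"trim"},
    "count": {"align"},
    "report": {"qc", "count"},
}

# Hypothetical tool names, versions and parameters recorded for each step.
CONFIG: Dict[str, dict] = {
    "qc": {"tool": "qc-tool", "version": "1.0", "params": {}},
    "trim": {"tool": "trimmer", "version": "2.1", "params": {"min_length": 20}},
    "align": {"tool": "aligner", "version": "0.9", "params": {"max_mismatches": 2}},
    "count": {"tool": "counter", "version": "1.3", "params": {"strand": "reverse"}},
    "report": {"tool": "reporter", "version": "1.0", "params": {}},
}

def plan(steps: Dict[str, Set[str]], config: Dict[str, dict]) -> Tuple[List[str], Dict[str, str]]:
    """Order steps so dependencies run first, and fingerprint each step from
    its own configuration plus the fingerprints of its direct dependencies."""
    order = list(TopologicalSorter(steps).static_order())
    fingerprints: Dict[str, str] = {}
    for step in order:
        payload = {
            "step": step,
            "config": config[step],
            "upstream": [fingerprints[d] for d in sorted(steps[step])],
        }
        blob = json.dumps(payload, sort_keys=True).encode()
        fingerprints[step] = hashlib.sha256(blob).hexdigest()
    return order, fingerprints

order, fps = plan(STEPS, CONFIG)
print(order)
```

Changing `min_length` for `trim` changes the fingerprints of `trim`, `align`, `count` and `report`, while `qc` keeps its fingerprint, so only the affected steps would need to be re-run. Storing these fingerprints next to the outputs also records which tool versions and parameters produced each result.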