Project description:SummarySeveral methods have been developed in the past years to infer cell-cell communication networks from transcriptomic data based on ligand and receptor expression. Among them, ICELLNET is one of the few approaches to consider the multiple subunits of ligands and receptors complexes to infer and quantify cell communication. In here, we present a major update of ICELLNET. As compared to its original implementation, we (i) drastically expanded the ICELLNET ligand-receptor database from 380 to 1669 biologically curated interactions, (ii) integrated important families of communication molecules involved in immune crosstalk, cell adhesion, and Wnt pathway, (iii) optimized ICELLNET framework for single-cell RNA sequencing data analyses, (iv) provided new visualizations of cell-cell communication results to facilitate prioritization and biological interpretation. This update will broaden the use of ICELLNET by the scientific community in different biological fields.Availability and implementationICELLNET package is implemented in R. Source code, documentation and tutorials are available on GitHub (https://github.com/soumelis-lab/ICELLNET).
Project description:Single-cell omics technologies have revolutionized the study of gene regulation in complex tissues. A major computational challenge in analyzing these datasets is to project the large-scale and high-dimensional data into low-dimensional space while retaining the relative relationships between cells. This low dimension embedding is necessary to decompose cellular heterogeneity and reconstruct cell-type-specific gene regulatory programs. Traditional dimensionality reduction techniques, however, face challenges in computational efficiency and in comprehensively addressing cellular diversity across varied molecular modalities. Here we introduce a nonlinear dimensionality reduction algorithm, embodied in the Python package SnapATAC2, which not only achieves a more precise capture of single-cell omics data heterogeneities but also ensures efficient runtime and memory usage, scaling linearly with the number of cells. Our algorithm demonstrates exceptional performance, scalability and versatility across diverse single-cell omics datasets, including single-cell assay for transposase-accessible chromatin using sequencing, single-cell RNA sequencing, single-cell Hi-C and single-cell multi-omics datasets, underscoring its utility in advancing single-cell analysis.
Project description:Quantitative mass spectrometry data necessitates an analytical pipeline that captures the accuracy and comprehensiveness of the experiments. Currently, data analysis is often coupled to specific software packages, which restricts the analysis to a given workflow and precludes a more thorough characterization of the data by other complementary tools. To address this, we have developed PyQuant, a cross-platform mass spectrometry data quantification application that is compatible with existing frameworks and can be used as a stand-alone quantification tool. PyQuant supports most types of quantitative mass spectrometry data including SILAC, NeuCode, (15)N, (13)C, or (18)O and chemical methods such as iTRAQ or TMT and provides the option of adding custom labeling strategies. In addition, PyQuant can perform specialized analyses such as quantifying isotopically labeled samples where the label has been metabolized into other amino acids and targeted quantification of selected ions independent of spectral assignment. PyQuant is capable of quantifying search results from popular proteomic frameworks such as MaxQuant, Proteome Discoverer, and the Trans-Proteomic Pipeline in addition to several standalone search engines. We have found that PyQuant routinely quantifies a greater proportion of spectral assignments, with increases ranging from 25-45% in this study. Finally, PyQuant is capable of complementing spectral assignments between replicates to quantify ions missed because of lack of MS/MS fragmentation or that were omitted because of issues such as spectra quality or false discovery rates. This results in an increase of biologically useful data available for interpretation. In summary, PyQuant is a flexible mass spectrometry data quantification platform that is capable of interfacing with a variety of existing formats and is highly customizable, which permits easy configuration for custom analysis.
Project description:The scaling of single-cell data exploratory analysis with the rapidly growing diversity and quantity of single-cell omics datasets demands more interpretable and robust data representation that is generalizable across datasets. Here, we have developed a 'linearly interpretable' framework that combines the interpretability and transferability of linear methods with the representational power of non-linear methods. Within this framework we introduce a data representation and visualization method, GraphDR, and a structure discovery method, StructDR, that unifies cluster, trajectory and surface estimation and enables their confidence set inference.
Project description:Reference-based cell-type annotation can significantly reduce time and effort in single-cell analysis by transferring labels from a previously-annotated dataset to a new dataset. However, label transfer by end-to-end computational methods is challenging due to the entanglement of technical (e.g., from different sequencing batches or techniques) and biological (e.g., from different cellular microenvironments) variations, only the first of which must be removed. To address this issue, we propose Polyphony, an interactive transfer learning (ITL) framework, to complement biologists' knowledge with advanced computational methods. Polyphony is motivated and guided by domain experts' needs for a controllable, interactive, and algorithm-assisted annotation process, identified through interviews with seven biologists. We introduce anchors, i.e., analogous cell populations across datasets, as a paradigm to explain the computational process and collect user feedback for model improvement. We further design a set of visualizations and interactions to empower users to add, delete, or modify anchors, resulting in refined cell type annotations. The effectiveness of this approach is demonstrated through quantitative experiments, two hypothetical use cases, and interviews with two biologists. The results show that our anchor-based ITL method takes advantage of both human and machine intelligence in annotating massive single-cell datasets.
Project description:MotivationGene set enrichment (GSE) analysis allows for an interpretation of gene expression through pre-defined gene set databases and is a critical step in understanding different phenotypes. With the rapid development of single-cell RNA sequencing (scRNA-seq) technology, GSE analysis can be performed on fine-grained gene expression data to gain a nuanced understanding of phenotypes of interest. However, with the cellular heterogeneity in single-cell gene profiles, current statistical GSE analysis methods sometimes fail to identify enriched gene sets. Meanwhile, deep learning has gained traction in applications like clustering and trajectory inference in single-cell studies due to its prowess in capturing complex data patterns. However, its use in GSE analysis remains limited, due to interpretability challenges.ResultsIn this paper, we present DeepGSEA, an explainable deep gene set enrichment analysis approach which leverages the expressiveness of interpretable, prototype-based neural networks to provide an in-depth analysis of GSE. DeepGSEA learns the ability to capture GSE information through our designed classification tasks, and significance tests can be performed on each gene set, enabling the identification of enriched sets. The underlying distribution of a gene set learned by DeepGSEA can be explicitly visualized using the encoded cell and cellular prototype embeddings. We demonstrate the performance of DeepGSEA over commonly used GSE analysis methods by examining their sensitivity and specificity with four simulation studies. In addition, we test our model on three real scRNA-seq datasets and illustrate the interpretability of DeepGSEA by showing how its results can be explained.Availability and implementationhttps://github.com/Teddy-XiongGZ/DeepGSEA.
Project description:Recent advances in CRISPR-Cas9 engineering and single-cell assays have enabled the simultaneous measurement of single-cell transcriptomic and phylogenetic profiles. However, there are few computational tools enabling users to integrate and derive insight from a joint analysis of these two modalities. Here, we describe "PhyloVision": an open-source software for interactively exploring data from both modalities and for identifying and interpreting heritable gene modules whose concerted expression are associated with phylogenetic relationships. PhyloVision provides a feature-rich, interactive, and shareable web-based report for investigating these modules while also supporting several other data and meta-data exploration capabilities. We demonstrate the utility of PhyloVision using a published dataset of metastatic lung adenocarcinoma cells, whose phylogeny was resolved using a CRISPR-Cas9-based lineage-tracing system. Together, we anticipate that PhyloVision and the methods it implements will be a useful resource for scalable and intuitive data exploration for any assay that simultaneously measures cell state and lineage.
Project description:Deciphering cell-cell communication is a key step in understanding the physiology and pathology of multicellular systems. Recent advances in single-cell transcriptomics have contributed to unraveling the cellular composition of tissues and enabled the development of computational algorithms to predict cellular communication mediated by ligand-receptor interactions. Despite the existence of various tools capable of inferring cell-cell interactions from single-cell RNA sequencing data, the analysis and interpretation of the biological signals often require deep computational expertize. Here we present InterCellar, an interactive platform empowering lab-scientists to analyze and explore predicted cell-cell communication without requiring programming skills. InterCellar guides the biological interpretation through customized analysis steps, multiple visualization options, and the possibility to link biological pathways to ligand-receptor interactions. Alongside convenient data exploration features, InterCellar implements data-driven analyses including the possibility to compare cell-cell communication from multiple conditions. By analyzing COVID-19 and melanoma cell-cell interactions, we show that InterCellar resolves data-driven patterns of communication and highlights molecular signals through the integration of biological functions and pathways. We believe our user-friendly, interactive platform will help streamline the analysis of cell-cell communication and facilitate hypothesis generation in diverse biological systems.
Project description:Recently, efforts have been made toward the harmonization of transcriptomic data structures and workflows using the concept of data tidiness, to facilitate modularisation. We present tidybulk, a modular framework for bulk transcriptional analyses that introduces a tidy transcriptomic data structure paradigm and analysis grammar. Tidybulk covers a wide variety of analysis procedures and integrates a large ecosystem of publicly available analysis algorithms under a common framework. Tidybulk decreases coding burden, facilitates reproducibility, increases efficiency for expert users, lowers the learning curve for inexperienced users, and bridges transcriptional data analysis with the tidyverse. Tidybulk is available at R/Bioconductor bioconductor.org/packages/tidybulk .
Project description:BackgroundMany approaches have been developed to overcome technical noise in single cell RNA-sequencing (scRNAseq). As researchers dig deeper into data-looking for rare cell types, subtleties of cell states, and details of gene regulatory networks-there is a growing need for algorithms with controllable accuracy and fewer ad hoc parameters and thresholds. Impeding this goal is the fact that an appropriate null distribution for scRNAseq cannot simply be extracted from data in which ground truth about biological variation is unknown (i.e., usually).ResultsWe approach this problem analytically, assuming that scRNAseq data reflect only cell heterogeneity (what we seek to characterize), transcriptional noise (temporal fluctuations randomly distributed across cells), and sampling error (i.e., Poisson noise). We analyze scRNAseq data without normalization-a step that skews distributions, particularly for sparse data-and calculate p values associated with key statistics. We develop an improved method for selecting features for cell clustering and identifying gene-gene correlations, both positive and negative. Using simulated data, we show that this method, which we call BigSur (Basic Informatics and Gene Statistics from Unnormalized Reads), captures even weak yet significant correlation structures in scRNAseq data. Applying BigSur to data from a clonal human melanoma cell line, we identify thousands of correlations that, when clustered without supervision into gene communities, align with known cellular components and biological processes, and highlight potentially novel cell biological relationships.ConclusionsNew insights into functionally relevant gene regulatory networks can be obtained using a statistically grounded approach to the identification of gene-gene correlations.