Project description:Single-cell whole-genome sequencing (scWGS) enables the assessment of genome-level molecular differences between individual cells, with particular relevance to genetically diverse systems like solid tumors. The application of scWGS has been limited by a dearth of accessible platforms capable of producing high-throughput profiles. We present a technique that combines nucleosome disruption methods with the widely adopted 10× Genomics ATAC-seq workflow to produce scWGS profiles for high-throughput copy-number analysis without new equipment or custom reagents. We further demonstrate the use of commercially available indexed transposase complexes from ScaleBio for sample multiplexing, reducing per-sample preparation costs. Finally, we demonstrate that sequential indexed tagmentation with an intervening nucleosome disruption step allows for the generation of both ATAC and WGS data from the same cell, producing data comparable to the unimodal assays. Because these scWGS and scWGS+ATAC methods use only accessible commercial reagents, we anticipate that they can be broadly adopted by the research community.
Project description:Single-cell low-coverage whole-genome sequencing (SLWGS) has improved the reproducibility and accuracy of copy number variation (CNV) detection. The whole-genome amplification (WGA) method and the sequencing platform are critical factors for successful SLWGS (<0.1× coverage). In this study, we compared single-cell and multiple-cell sequencing data produced on the HiSeq2000 and Ion Proton platforms using two WGA kits, and then comprehensively evaluated GC bias, reproducibility, uniformity and CNV detection across the experimental combinations. Our analysis demonstrated that, on the HiSeq2000 platform, the PicoPLEX WGA Kit gave higher reproducibility and a lower sequencing error frequency but more GC bias than the GenomePlex Single Cell WGA Kit (WGA4 kit), independent of cell number. On the Ion Proton platform, by contrast, the WGA4 kit (for both single and multiple cells) had higher uniformity and less GC bias but lower reproducibility than the PicoPLEX WGA Kit. Moreover, on both sequencing platforms, the sensitivity and specificity of CNV detection with the two WGA kits differed depending on cell number. The results can help researchers who plan to use SLWGS on single or multiple cells to select appropriate experimental conditions for their applications.
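The abstract above compares kits on GC bias and uniformity without saying how those quantities were computed. As a minimal sketch, assuming reads have already been counted in fixed-size genomic bins, one common proxy for uniformity is the coefficient of variation of the bin counts, and one simple GC-bias indicator is the Pearson correlation between per-bin GC fraction and read count. The function names and the toy data are hypothetical and are not taken from the study.

```python
import math

def coefficient_of_variation(counts):
    """Uniformity proxy: standard deviation of bin counts divided by their mean."""
    mean = sum(counts) / len(counts)
    variance = sum((c - mean) ** 2 for c in counts) / len(counts)
    return math.sqrt(variance) / mean

def pearson(xs, ys):
    """Pearson correlation, used here as a simple GC-bias indicator."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical toy data: GC fraction and read count for six genomic bins.
gc_fraction = [0.35, 0.40, 0.45, 0.50, 0.55, 0.60]
read_counts = [80, 90, 100, 110, 120, 130]

cv = coefficient_of_variation(read_counts)    # lower means more uniform coverage
gc_bias = pearson(gc_fraction, read_counts)   # near 0 means little linear GC dependence
```

In this toy case the counts rise linearly with GC fraction, so the correlation is close to 1, which would indicate strong GC bias. Real comparisons would normally use many more bins and may fit a non-linear GC curve instead.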
Project description:Somatic mutations are the cause of cancer and have been implicated in other, noncancerous diseases and aging. While clonally expanded mutations can be studied by deep sequencing of bulk DNA, very few somatic mutations expand clonally, and most are unique to each cell. We describe a detailed protocol for single-cell whole-genome sequencing to discover and analyze somatic mutations in tissues and organs. The protocol comprises single-cell multiple displacement amplification (SCMDA), which ensures efficiency and high fidelity in amplification, and the SCcaller software tool, which calls single-nucleotide variations and small insertions and deletions from the sequencing data by filtering out amplification artifacts. With SCMDA and SCcaller at its core, this protocol describes a complete procedure for the comprehensive analysis of somatic mutations in a single cell, covering (1) single-cell or nucleus isolation, (2) single-cell or nucleus whole-genome amplification, (3) library preparation and sequencing, and (4) computational analyses, including alignment, variant calling, and mutation burden estimation. Methods are also provided for mutation annotation, hotspot discovery and signature analysis. The protocol takes 12-15 h from single-cell isolation to library preparation and 3-7 d of data processing. Compared with other single-cell amplification methods or single-molecule sequencing, it provides high genomic coverage, high accuracy in calling single-nucleotide variations and small insertions and deletions from the same single-cell genome, and fewer processing steps. SCMDA and SCcaller require basic experience in molecular biology and bioinformatics. The protocol can be utilized for studying mutagenesis and genome mosaicism in normal and diseased human and animal tissues under various conditions.
Project description:Single cell whole-genome sequencing (scWGS) is providing novel insights into the nature of genetic heterogeneity in normal and diseased cells. However, the whole-genome amplification process required for scWGS introduces biases into the resulting sequencing that can confound downstream analysis. Here, we present a statistical method, with an accompanying package PaSD-qc (Power Spectral Density-qc), that evaluates the properties and quality of single cell libraries. It uses a modified power spectral density to assess amplification uniformity, amplicon size distribution, autocovariance and inter-sample consistency as well as to identify chromosomes with aberrant read-density profiles due either to copy alterations or poor amplification. These metrics provide a standard way to compare the quality of single cell samples as well as yield information necessary to improve variant calling strategies. We demonstrate the usefulness of this tool in comparing the properties of scWGS protocols, identifying potential chromosomal copy number variation, determining chromosomal and subchromosomal regions of poor amplification, and selecting high-quality libraries from low-coverage data for deep sequencing. The software is available free and open-source at https://github.com/parklab/PaSDqc.
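To illustrate the idea of a power spectral density applied to read depth, here is a minimal sketch of a plain periodogram computed with a naive discrete Fourier transform. It is not the modified estimator that PaSD-qc uses and it does not call the package's API. The function name and the toy depth track are hypothetical.

```python
import cmath
import math

def periodogram(signal):
    """Plain periodogram: squared magnitude of the DFT of the mean-centered signal."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    power = []
    for k in range(n // 2 + 1):
        # Naive O(n^2) DFT coefficient at frequency index k.
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        power.append(abs(coeff) ** 2 / n)
    return power

# Hypothetical read-depth track: 64 bins with a fluctuation that repeats every 8 bins.
depth = [100 + 20 * math.sin(2 * math.pi * t / 8) for t in range(64)]
spectrum = periodogram(depth)
peak_index = max(range(len(spectrum)), key=spectrum.__getitem__)
peak_period = len(depth) / peak_index  # bins per cycle at the dominant frequency
```

Here the dominant frequency corresponds to a period of 8 bins, matching the fluctuation built into the toy track. In real amplified libraries, the shape of the spectrum across scales, rather than a single peak, is what would carry information about amplification uniformity.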
Project description:A good physical map is essential to guide sequence assembly in de novo whole-genome sequencing, especially when sequences are produced by high-throughput methods such as next-generation sequencing (NGS). Here we present a novel method, Feature sets-based Genome Mapping (FGM). With FGM, a physical map and draft whole-genome sequences can be generated, anchored and integrated using the same set of NGS sequences, independent of restriction digestion. A model of the method was created and its parameters were examined by simulation using the Arabidopsis genome sequence. In the simulations, when a BAC library of 4,096 clones (~4.8X genome coverage) was used to sequence the whole genome, ~90% of clones were successfully connected into physical contigs, and 91.58% of genome sequences were mapped and connected to chromosomes. The method was experimentally verified using the existing physical map and genome sequence of rice. Of 4,064 clones covering 115 Mb of sequence selected from ~3 tiles of 3 chromosomes of a rice draft physical map, 3,364 clones were reconstructed into physical contigs and 98 Mb of sequence was integrated into the 3 chromosomes. The physical map-integrated draft genome sequences can provide permanent frameworks for eventually obtaining high-quality reference sequences by targeted sequencing, gap filling and combining other sequences.
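The simulation results above are reported as percentages, while the rice results are reported as counts. The short calculation below converts the rice figures quoted in the abstract to fractions so the two can be compared. The variable names are illustrative only.

```python
# Figures quoted in the abstract for the rice verification experiment.
clones_selected = 4064
clones_in_contigs = 3364
sequence_selected_mb = 115
sequence_integrated_mb = 98

clone_fraction = clones_in_contigs / clones_selected
sequence_fraction = sequence_integrated_mb / sequence_selected_mb

print(f"Clones placed in physical contigs: {clone_fraction:.1%}")
print(f"Sequence integrated into chromosomes: {sequence_fraction:.1%}")
```

This gives roughly 83% of clones and 85% of sequence for rice, compared with ~90% and 91.58% in the Arabidopsis simulations.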
Project description:This dataset was collected from viable bone marrow cells obtained at diagnosis from nine patients with high hyperdiploid ALL and from one normal bone marrow sample. All samples were subjected to low-pass single-cell whole-genome sequencing with a median sequencing coverage of 0.02x. Single nuclei in G0/G1 phase were isolated using a fluorescence-activated cell sorting (FACS) cytometer. DNA libraries were constructed and the associated next-generation sequencing was carried out by the European Research Institute for the Biology of Ageing (ERIBA), University of Groningen, University Medical Center Groningen, Groningen, The Netherlands. Further details regarding DNA library construction are available in Bos et al., 2019 (https://link.springer.com/protocol/10.1007/978-1-4939-8931-7_15). The dataset has been used for copy number aberration analysis.
Project description:Background: Whole-Genome Bisulfite Sequencing (WGBS) is a Next Generation Sequencing (NGS) technique for measuring DNA methylation at base resolution. Continuing drops in sequencing costs are beginning to enable high-throughput surveys of DNA methylation in large samples of individuals and/or single cells. These surveys can easily generate hundreds or even thousands of WGBS datasets in a single study. The efficient pre-processing of these large amounts of data poses major computational challenges and creates unnecessary bottlenecks for downstream analysis and biological interpretation. Results: To offer an efficient analysis solution, we present MethylStar, a fast, stable and flexible pre-processing pipeline for WGBS data. MethylStar integrates well-established tools for read trimming, alignment and methylation state calling in a highly parallelized environment, manages computational resources and performs automatic error detection. MethylStar offers easy installation through a dockerized container with all preloaded dependencies and also features a user-friendly interface designed for experts/non-experts. Application of MethylStar to WGBS from Human, Maize and A. thaliana shows favorable performance in terms of speed and memory requirements compared with existing pipelines. Conclusions: MethylStar is a fast, stable and flexible pipeline for high-throughput pre-processing of bulk or single-cell WGBS data. Its easy installation and user-friendly interface should make it a useful resource for the wider epigenomics community. MethylStar is distributed under GPL-3.0 license and source code is publicly available for download from GitHub at https://github.com/jlab-code/MethylStar. Installation through a docker image is available from http://jlabdata.org/methylstar.tar.gz.
Project description:Objective: Comprehensive and reliable genome-wide variant analysis of a small number of cells has been challenging due to genome coverage bias, PCR over-cycling, and the requirement of expensive technologies. To comprehensively identify genome alterations in single colon crypts that reflect genome heterogeneity of stem cells, we developed a method to construct whole-genome sequencing libraries from single colon crypts without DNA extraction, whole-genome amplification, or increased PCR enrichment cycles. Results: We present post-alignment statistics of 81 single-crypt libraries (each crypt contains four- to eight-fold less DNA than conventional methods require) and 16 bulk-tissue libraries to demonstrate consistent success in obtaining reliable coverage of the human genome, both in depth (≥ 30X) and breadth (≥ 92% of the genome covered at ≥ 10X depth). These single-crypt libraries are of comparable quality to libraries generated with the conventional method using high-quality purified DNA in large quantities. Conceivably, our method can be applied to small biopsy samples from many tissues and can be combined with single-cell targeted sequencing to comprehensively profile cancer genomes and their evolution. The broad potential application of this method offers expanded possibilities in cost-effectively examining genome heterogeneity in small numbers of cells at high resolution.
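The two coverage figures quoted above, depth and breadth, can be made concrete with a minimal sketch. It assumes per-base depths are already available as a list of integers, for example parsed from an aligner's depth report. The function name, the default threshold argument and the toy values are hypothetical and do not reproduce the authors' pipeline.

```python
def coverage_summary(per_base_depth, min_depth=10):
    """Return (mean depth, fraction of positions covered at >= min_depth)."""
    n = len(per_base_depth)
    mean_depth = sum(per_base_depth) / n
    breadth = sum(1 for d in per_base_depth if d >= min_depth) / n
    return mean_depth, breadth

# Hypothetical toy depths for ten positions, not real data.
toy_depth = [32, 35, 28, 0, 41, 30, 9, 33, 36, 36]
mean_depth, breadth = coverage_summary(toy_depth)
```

Depth summarizes how many reads cover a position on average, while breadth reports what share of positions reach a usable threshold. A library can have a high mean depth and still have poor breadth if coverage is uneven, which is why the abstract reports both.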