ECNano: A Cost-Effective Workflow for Target Enrichment Sequencing and Accurate Variant Calling on 4,800 Clinically Significant Genes Using a Single MinION Flowcell
ABSTRACT: Target enrichment sequencing and variant calling on medical exome using ONT MinION
Project description: Background: The application of long-read sequencing using the Oxford Nanopore Technologies (ONT) MinION sequencer is becoming more diverse in the medical field. However, the high sequencing error rate of ONT and the limited throughput of a single MinION flowcell limit its applicability for accurate variant detection. Medical exome sequencing (MES) targets clinically significant exon regions, allowing rapid and comprehensive screening of pathogenic variants. By applying MES with MinION sequencing, the technology can achieve a more uniform capture of the target regions, shorter turnaround time, and lower sequencing cost per sample. Method: We introduced a cost-effective optimized workflow, ECNano, comprising a wet-lab protocol and bioinformatics analysis, for accurate variant detection at 4,800 clinically important genes and regions using a single MinION flowcell. The ECNano wet-lab protocol was optimized to perform long-read target enrichment and ONT library preparation to stably generate high-quality MES data with adequate coverage. The subsequent variant-calling workflow, Clair-ensemble, adopted a fast RNN-based variant caller, Clair, and was optimized for target enrichment data. To evaluate its performance and practicality, ECNano was tested on both reference DNA samples and patient samples. Results: ECNano achieved deep on-target depth of coverage (DoC), averaging > 100×, and > 98% uniformity using one MinION flowcell. For accurate ONT variant calling, the generated reads sufficiently covered 98.9% of pathogenic positions listed in ClinVar, with 98.96% having at least 30× DoC. ECNano obtained an average read length of 1000 bp. The long reads of ECNano also covered the adjacent splice sites well, with 98.5% of positions having ≥ 30× DoC. Clair-ensemble achieved > 99% recall and accuracy for SNV calling.
The whole workflow, from wet-lab protocol to variant detection, was completed within three days. Conclusion: We presented ECNano, an out-of-the-box workflow comprising (1) a wet-lab protocol for ONT target enrichment sequencing and (2) a downstream variant detection workflow, Clair-ensemble. The workflow is cost-effective, with a short turnaround time for high-accuracy variant calling in 4,800 clinically significant genes and regions using a single MinION flowcell. The long-read exon-capture data has potential for further development, promoting the application of long-read sequencing in personalized disease treatment and risk prediction.
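The depth-of-coverage figures quoted above (average DoC and the share of positions with at least 30× DoC) can be illustrated with a minimal sketch. It assumes per-position depths are already available as a list of integers; the function name `coverage_summary` and the example depths are hypothetical and are not taken from the ECNano workflow.

```python
from statistics import mean


def coverage_summary(depths, min_depth=30):
    """Return the mean depth and the fraction of positions with depth >= min_depth."""
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return {
        "mean_depth": mean(depths),
        "fraction_at_min_depth": covered / len(depths),
    }


# Made-up per-position depths for eight target positions (illustration only).
example_depths = [120, 95, 30, 12, 150, 0, 88, 101]
summary = coverage_summary(example_depths)
print(summary)
```

In practice the depths would come from a per-base depth report restricted to the target regions; the 30× cutoff mirrors the threshold stated in the abstract.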
Project description: Background: Whole genome sequencing (WGS) is becoming increasingly prevalent for molecular diagnosis, staging and prognosis because of its declining costs and the ability to detect nearly all genes associated with a patient's disease. The currently widely accepted variant calling pipeline, GATK, is limited in terms of its computational speed and efficiency, and cannot meet the growing analysis needs. Results: Here, we propose a fast and accurate DNASeq variant calling workflow that is composed purely of tools from the LUSH toolkit. The precision and recall measurements indicate that the LUSH and GATK pipelines are highly consistent, with precision and recall rates exceeding 99% on the 30x NA12878 dataset. In terms of processing speed, the LUSH pipeline outperforms the GATK pipeline, completing 30x WGS data analysis in just 1.6 h, which is approximately 17 times faster than GATK. Notably, the LUSH_HC tool completes the processing from BAM to VCF in just 12 min, which is around 76 times faster than GATK. Conclusion: These findings suggest that the LUSH pipeline is a highly promising alternative to the GATK pipeline for WGS data analysis, with the potential to significantly improve bedside analysis of acutely ill patients, large-scale cohort data analysis, and high-throughput variant calling in crop breeding programs. Furthermore, the LUSH pipeline is highly scalable and easily deployable, allowing it to be readily applied to various scenarios such as clinical diagnosis and genomic research.
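For readers unfamiliar with how precision and recall are derived for a variant call set, the following is a minimal sketch. It assumes variants are compared by exact match on (chromosome, position, reference allele, alternate allele); real benchmarking tools also normalize variant representation and handle genotypes, which this sketch does not. The function name and the example variants are hypothetical.

```python
def precision_recall(called, truth):
    """Compute precision and recall of a call set against a truth set.

    Variants are matched by exact tuple equality (an assumption of this sketch).
    """
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Made-up variants as (chrom, pos, ref, alt) tuples, for illustration only.
truth_set = [("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
             ("chr2", 75, "G", "A"), ("chr2", 900, "T", "C")]
call_set = [("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
            ("chr2", 75, "G", "A"), ("chr3", 10, "A", "C")]

precision, recall = precision_recall(call_set, truth_set)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here three of the four calls are in the truth set and three of the four true variants are recovered, so both values are 0.75.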
Project description:Genetic markers (DNA barcodes) are often used to support and confirm species identification. Barcode sequences can be generated in the field using portable systems based on the Oxford Nanopore Technologies (ONT) MinION sequencer. However, to achieve a broader application, current proof-of-principle workflows for on-site barcoding analysis must be standardized to ensure a reliable and robust performance under suboptimal field conditions without increasing costs. Here, we demonstrate the implementation of a new on-site workflow for DNA extraction, PCR-based barcoding, and the generation of consensus sequences. The portable laboratory features inexpensive instruments that can be carried as hand luggage and uses standard molecular biology protocols and reagents that tolerate adverse environmental conditions. Barcodes are sequenced using MinION technology and analyzed with ONTrack, an original de novo assembly pipeline that requires as few as 1000 reads per sample. ONTrack-derived consensus barcodes have a high accuracy, ranging from 99.8 to 100%, despite the presence of homopolymer runs. The ONTrack pipeline has a user-friendly interface and returns consensus sequences in minutes. The remarkable accuracy and low computational demand of the ONTrack pipeline, together with the inexpensive equipment and simple protocols, make the proposed workflow particularly suitable for tracking species under field conditions.
Project description:The spatial heterogeneity of gene expression has driven the development of diverse spatial transcriptomics technologies. Here, we introduce photocleavage and ligation sequencing (PCL-seq), a spatial indexing method based on a light-controlled DNA labeling strategy applied to tissue sections. PCL-seq employs photocleavable oligonucleotides and sequence adapters to construct transcriptional profiles of specific regions of interest (ROIs), designated via microscopically controlled photo-illumination. In frozen mouse embryos, PCL-seq generates spatially aligned gene expression matrices and achieves high-quality data outputs, detecting approximately 170,000 unique molecular identifiers (UMIs) and 8,600 genes (irradiation diameter=100 µm). Moreover, PCL-seq is compatible with formalin-fixed and paraffin-embedded (FFPE) tissues, successfully identifying thousands of differentially enriched transcripts in the digits and vertebrae of FFPE mouse embryo sections. Additionally, PCL-seq achieves subcellular resolution, as demonstrated by differential expression profiling between nuclear and cytoplasmic compartments. These features establish PCL-seq as an accessible and versatile workflow for spatial transcriptomic analyses in both frozen and FFPE tissues at subcellular resolution.
Project description: Importance: By employing a cost-effective approach for complete genome sequencing, the study has enabled the identification of novel enterovirus strains and shed light on the genetic exchange events during outbreaks. The success rate of genome sequencing and the scalability of the protocol demonstrate its practical utility for routine enterovirus surveillance. Moreover, the study's findings of recombinant strains of EVA71 and CVA2 contributing to epidemics in Malaysia and Taiwan emphasize the need for accurate detection and characterization of enteroviruses. The investigation of the whole genome and upstream ORF sequences has provided insights into the evolution and spread of enterovirus subgenogroups. These findings have important implications for the prevention, control, and surveillance of enteroviruses, ultimately contributing to the understanding and management of enterovirus-related illnesses.
Project description: Background: Accurate clinical structural variant (SV) calling is essential for cancer target identification and diagnosis but has been historically challenging due to the lack of ground truth for clinical specimens. Meanwhile, reduced clinical-testing cost is key to widespread clinical utility. Methods: We analyzed massive data from tumor samples of 476 patients and developed a computational framework for accurate and cost-effective detection of clinically relevant SVs. In addition, standard materials and classical experiments, including immunohistochemistry and/or fluorescence in situ hybridization, were used to validate the developed computational framework. Results: We systematically evaluated the common algorithms for SV detection and established an expert-reviewed SV call set of 1,303 tumor-specific SVs with high-evidence levels. Moreover, we developed a random-forest-based decision model to improve the true-positive rate of SV calls. To independently validate the tailored 'two-step' strategy, we utilized standard materials and classical experiments. The accuracy of the model was over 90% (92-99.78%) for all types of data. Conclusion: Our study provides a valuable resource and an actionable guide to improve cancer-specific SV detection accuracy and clinical applicability.
Project description:The prevalence of Plasmodium falciparum hrp2 (pfhrp2)-deleted parasites threatens the efficacy of the most used and sensitive malaria rapid diagnostic tests and highlights the need for continued surveillance for this gene deletion. While PCR methods are adequate for determining pfhrp2 presence or absence, they offer a limited view of its genetic diversity. Here, we present a portable sequencing method using the MinION. Pfhrp2 amplicons were generated from individual samples, barcoded, and pooled for sequencing. To overcome potential crosstalk between barcodes, we implemented a coverage-based threshold for pfhrp2 deletion confirmation. Amino acid repeat types were then counted and visualized with custom Python scripts following de novo assembly. We evaluated this assay using well-characterized reference strains and 152 field isolates with and without pfhrp2 deletions, of which 38 were also sequenced on the PacBio platform to provide a standard for comparison. Of 152 field samples, 93 surpassed the positivity threshold, and of those samples, 62/93 had a dominant pfhrp2 repeat type. PacBio-sequenced samples with a dominant repeat-type profile from the MinION sequencing data matched the PacBio profile. This field-deployable assay can be used alone for surveilling pfhrp2 diversity or as a sequencing-based addition to the World Health Organization's existing deletion surveillance protocol.
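The idea of a coverage-based threshold can be sketched as follows. The abstract does not state how the threshold was derived or its value, so this sketch simply assumes a fixed minimum read count per barcoded sample; the function name, labels, threshold, and read counts are all hypothetical.

```python
def classify_by_coverage(read_counts, min_reads):
    """Label each sample by whether its pfhrp2 read count reaches min_reads.

    A fixed read-count cutoff is an assumption of this sketch, not the
    published assay's rule.
    """
    if min_reads < 0:
        raise ValueError("min_reads must be non-negative")
    return {
        sample: ("above_threshold" if count >= min_reads else "below_threshold")
        for sample, count in read_counts.items()
    }


# Made-up read counts per barcoded sample (illustration only).
example_counts = {"sample_A": 1250, "sample_B": 40, "sample_C": 310}
calls = classify_by_coverage(example_counts, min_reads=100)
print(calls)
```

A cutoff of this kind keeps a handful of reads leaking from a neighbouring barcode from being read as evidence that the gene is present.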
Project description:Customizable endonucleases such as transcription activator-like effector nucleases (TALENs) and clustered regularly interspaced short palindromic repeats/CRISPR-associated protein 9 (CRISPR/Cas9) enable rapid generation of mutant strains at genomic loci of interest in animal models and cell lines. With the accelerated pace of generating mutant alleles, genotyping has become a rate-limiting step to understanding the effects of genetic perturbation. Unless mutated alleles result in distinct morphological phenotypes, mutant strains need to be genotyped using standard methods in molecular biology. Classic restriction fragment length polymorphism (RFLP) or sequencing is labor-intensive and expensive. Although simpler than RFLP, current versions of allele-specific PCR may still require post-polymerase chain reaction (PCR) handling such as sequencing, or they are more expensive if allele-specific fluorescent probes are used. Commercial genotyping solutions can take weeks from assay design to result, and are often more expensive than assembling reactions in-house. Key components of commercial assay systems are often proprietary, which limits further customization. Therefore, we developed a one-step open-source genotyping method based on quantitative PCR. The allele-specific qPCR (ASQ) does not require post-PCR processing and can genotype germline mutants through either threshold cycle (Ct) or end-point fluorescence reading. ASQ utilizes allele-specific primers, a locus-specific reverse primer, universal fluorescent probes and quenchers, and hot start DNA polymerase. Individual laboratories can further optimize this open-source system as we completely disclose the sequences, reagents, and thermal cycling protocol. We have tested the ASQ protocol to genotype alleles in five different genes. ASQ showed a 98-100% concordance in genotype scoring with RFLP or Sanger sequencing outcomes. 
ASQ is time-saving because a single qPCR without post-PCR handling suffices to score genotypes. ASQ is cost-effective because universal fluorescent probes negate the necessity of designing expensive probes for each locus.
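Genotype scoring from threshold cycle (Ct) values can be illustrated with a minimal sketch. It assumes one Ct value per allele-specific reaction (wild-type and mutant), treats a missing or late Ct as no amplification, and calls a heterozygote when the two Ct values are close. The function name and the cutoffs `het_window` and `max_ct` are hypothetical and are not taken from the ASQ protocol.

```python
def call_genotype(ct_wt, ct_mut, het_window=2.0, max_ct=35.0):
    """Call a genotype from wild-type and mutant allele Ct values.

    ct_wt / ct_mut: Ct for each allele-specific signal, or None if no signal.
    het_window and max_ct are illustrative cutoffs, not published values.
    """
    wt_amplified = ct_wt is not None and ct_wt <= max_ct
    mut_amplified = ct_mut is not None and ct_mut <= max_ct

    if not wt_amplified and not mut_amplified:
        return "no_call"
    if wt_amplified and not mut_amplified:
        return "wild_type"
    if mut_amplified and not wt_amplified:
        return "homozygous_mutant"

    # Both alleles amplified: similar Ct values suggest both alleles are present.
    delta_ct = ct_wt - ct_mut
    if abs(delta_ct) <= het_window:
        return "heterozygous"
    # A much earlier Ct for one allele suggests only that allele is truly present.
    return "homozygous_mutant" if delta_ct > 0 else "wild_type"


print(call_genotype(22.0, None))   # only the wild-type signal amplifies
print(call_genotype(23.0, 23.5))   # both signals amplify at similar cycles
print(call_genotype(30.0, 22.0))   # mutant signal amplifies much earlier
```

Any real cutoffs would need to be calibrated per locus against samples of known genotype, for example those confirmed by RFLP or Sanger sequencing.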
Project description: Background: Due to the frequent reassortment and zoonotic potential of influenza A viruses, rapid gain of sequence information is crucial. Alongside established next-generation sequencing protocols, the MinION sequencing device (Oxford Nanopore Technologies) has become a serious competitor for routine whole-genome sequencing. Here, we established a novel, rapid and high-throughput MinION multiplexing workflow based on a universal RT-PCR. Methods: Twelve representative influenza A virus samples of multiple subtypes were universally amplified in a one-step RT-PCR and subsequently sequenced on the MinION instrument in conjunction with a barcoding library preparation kit from the rapid family and the MinIT performing live base-calling. The identical PCR products were sequenced on an IonTorrent platform and, after final consensus assembly, all data were compared for validation. To prove the practicability of the MinION-MinIT method in human and veterinary diagnostics, we sequenced recent and historical influenza strains for further benchmarking. Results: The MinION-MinIT combination generated over two million reads for twelve samples in a six-hour sequencing run, of which 72% were classified as quality-screened, trimmed and mapped influenza reads and used to produce full genome sequences. Identities between the datasets of > 99.9% were achieved, with 100% coverage of all segments alongside a sufficient confidence and 4,492-fold mean depth. From RNA extraction to finished sequences, only 14 h were required. Conclusions: Overall, we developed and validated a novel and rapid multiplex workflow for influenza A virus sequencing. This protocol suits both clinical and academic settings, aiding in real-time diagnostics and passive surveillance.