Project description:The biomarker CA125, a peptide epitope located in several tandem repeats of the mucin MUC16, is the gold standard for monitoring regression and recurrence of high-grade serous ovarian cancer in response to therapy. However, the CA125 epitope and several structural features of the MUC16 molecule remain ill-defined. One central unresolved question is how many tandem repeats MUC16 contains and how many of these carry the CA125 epitope. Studies from the early 2000s assembled short DNA reads to estimate that MUC16 contained 63 repeats. Here, we conduct Nanopore long-read sequencing of MUC16 transcripts from three primary ovarian tumors and established cell lines (OVCAR3, OVCAR5, and Kuramochi) for a more exhaustive and accurate estimation and sequencing of the MUC16 tandem repeats. The consensus sequence derived from these six sources was confirmed by proteomics validation and agrees with recent additions to the NCBI database. We propose a model of MUC16 containing 19, not 63, tandem repeats. Additionally, we predict the structure of the tandem repeat domain using the deep-learning algorithm AlphaFold. The predicted structure displays an SEA domain and an unstructured linker region rich in proline, serine, and threonine residues in all 19 tandem repeats. Our studies pave the way for a detailed characterization of the CA125 epitope. Sequencing and modeling of the MUC16 tandem repeats, along with their glycoproteomic characterization currently underway in our laboratories, will help identify novel epitopes in the MUC16 molecule that improve on the sensitivity and clinical utility of the current CA125 assay.
Project description:Using long-read nanopore sequencing, we obtained chromosome-wide phased methylomes of the active and inactive X in mouse placenta and neural stem cells (NSCs), overcoming the limitations of short-read bisulfite sequencing in allelic resolution. We also conducted quantitative analysis of methylation properties such as symmetry and entropy, providing a more comprehensive view of epigenetic silencing in X chromosome inactivation. In addition, we resolved the allele-specific genetics and epigenetics of the structural macrosatellite Dxz4 and other repeats.
Project description:Clinically translatable large animal models have become indispensable for cardiovascular research, clinically relevant proof-of-concept studies, and novel diagnostic and therapeutic interventions. In particular, the pig has emerged as an essential cardiovascular disease model, because its heart, circulatory system, and blood supply are anatomically and functionally similar to those of humans. Unfortunately, molecular and omics-based studies in the pig are hampered by the incompleteness of the genome and the lack of diversity of the corresponding transcriptome annotation. Here, we employed Nanopore long-read sequencing and in-depth proteomics on top of Illumina RNA-seq to enhance the pig cardiac transcriptome annotation. We assembled 15,926 transcripts, stratified into coding and non-coding, and validated our results by complementary mass spectrometry. A manual review of several gene loci associated with cardiac function corroborated the utility of our enhanced annotation. All our data are available for download and are also provided as tracks for integration in genome browsers. We deem this resource highly valuable for molecular research in an increasingly relevant large animal model.
2020-10-29 | PXD018985 | Pride
Project description:Capture Sequencing of Tandem Repeats
Project description:We sequenced DNA from a bulk of Col x Ler F2 hybrid plants (WT and recq4) using Nanopore long-read sequencing and identified crossover sites with COmapper. For nanopore sequencing of gDNA from 1,000 pooled seedlings, 10-day-old seedlings were ground in liquid nitrogen using a mortar and pestle. The ground tissue was resuspended in four volumes of CTAB buffer (1% [w/v] CTAB, 50 mM Tris-HCl pH 8.0, 0.7 M NaCl, 10 mM EDTA) and incubated at 65°C for 30 min. Following chloroform extraction, isopropanol precipitation and removal of RNAs as above, the gDNA pellet was resuspended in 150 μl TE (10 mM Tris-HCl pH 8.0, 0.1 mM EDTA) buffer and gDNA was quantified using a Qubit dsDNA Broad Range assay kit (Thermo Fisher, Q32853). Nine micrograms of gDNA from pollen or seedlings was used to construct a nanopore long-read sequencing library using a Ligation Sequencing Kit V14 (Nanopore, SQK-LSK114). The libraries were sequenced using a PromethION platform (BGI, Hong Kong).
Project description:Microsatellites are short tandem repeats (STRs) of a motif of 1 to 6 nucleotides that are ubiquitous in almost all genomes and widely used in many biomedical applications. However, despite two decades of next-generation sequencing (NGS) development and a steady stream of new technologies coming to the market, accurately sequencing and genotyping STRs, particularly homopolymers, remains very challenging due to several technical limitations. In many cases this leads to erroneous allele calls and difficulty in identifying the genuine allele distribution in a sample. In the present study, we assessed several second- and third-generation sequencing approaches for their capability to correctly determine the length of microsatellites, using plasmids containing A/T homopolymers or AC/TG or AT/TA dinucleotide STRs of variable length. Standard PCR-free and PCR-containing, single Unique Molecular Index (UMI) and dual UMI ‘duplex sequencing’ protocols were evaluated using Illumina short-read sequencing, and two PCR-free protocols were evaluated using PacBio and nanopore long-read sequencing. Several bioinformatics algorithms were developed to correctly identify microsatellite alleles from sequencing data, including four and two modes for generating standard and combined consensus alleles, respectively. We provide a detailed analysis and comparison of these approaches and make several recommendations for the accurate determination of microsatellite allele length.