Copy number variant analysis using genome-wide mate-pair sequencing.
ABSTRACT: Copy number variation (CNV) is a common form of structural variation detected in human genomes, occurring as both constitutional and somatic events. Cytogenetic techniques like chromosomal microarray (CMA) are widely used in analyzing CNVs. However, CMA techniques cannot resolve the full nature of these structural variations (i.e. the orientation and location of associated breakpoint junctions) and must be combined with other cytogenetic techniques, such as karyotyping or FISH, to do so. This makes the development of a next-generation sequencing (NGS) approach capable of resolving both CNVs and breakpoint junctions desirable. Mate-pair sequencing (MPseq) is an NGS technology designed to find large structural rearrangements across the entire genome. Here we present an algorithm capable of performing copy number analysis from mate-pair sequencing data. The algorithm uses a step-wise procedure involving normalization, segmentation, and classification of the sequencing data. The segmentation technique combines both read depth and discordant mate-pair reads to increase the sensitivity and resolution of CNV calls. The method is particularly suited to MPseq, which is designed to detect breakpoint junctions at high resolution. This allows the classification step to accurately calculate copy number levels at the relatively low read depth of MPseq. We compare results for a series of hematological cancer samples that were tested with both CMA and MPseq. We demonstrate comparable sensitivity to the state-of-the-art CMA technology, with the benefit of improved breakpoint resolution. The algorithm provides a powerful analytical tool for the analysis of MPseq results in cancer.
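The three steps named in the abstract (normalization, segmentation, classification) can be illustrated with a minimal sketch. This is not the published algorithm: the function names, the median scaling, the use of precomputed breakpoint indices as a stand-in for discordant mate-pair evidence, and the 0.3 log2 threshold are all assumptions chosen for illustration. It also assumes every window has a positive read count.

```python
from math import log2
from statistics import median


def normalize(sample_counts, reference_counts):
    """Per-window log2 ratio of sample to reference after median scaling.

    Assumes all counts are positive (no zero-coverage windows).
    """
    s_med = median(sample_counts)
    r_med = median(reference_counts)
    return [
        log2((s / s_med) / (r / r_med))
        for s, r in zip(sample_counts, reference_counts)
    ]


def segment(log_ratios, breakpoints):
    """Split windows at breakpoint indices and average each segment.

    `breakpoints` is a hypothetical input: window indices where discordant
    mate pairs suggest a junction.
    """
    edges = [0] + sorted(breakpoints) + [len(log_ratios)]
    return [
        (start, end, sum(log_ratios[start:end]) / (end - start))
        for start, end in zip(edges, edges[1:])
        if end > start
    ]


def classify(mean_log_ratio, threshold=0.3):
    """Label a segment by its mean log2 ratio (threshold is illustrative)."""
    if mean_log_ratio > threshold:
        return "gain"
    if mean_log_ratio < -threshold:
        return "loss"
    return "neutral"


def call_cnvs(sample_counts, reference_counts, breakpoints):
    """Run the three steps and return (start, end, label) per segment."""
    ratios = normalize(sample_counts, reference_counts)
    return [(a, b, classify(m)) for a, b, m in segment(ratios, breakpoints)]


if __name__ == "__main__":
    sample = [100] * 4 + [50] * 2 + [100] * 4
    reference = [100] * 10
    print(call_cnvs(sample, reference, breakpoints=[4, 6]))
```

Placing segment edges at junction positions, rather than inferring them from depth alone, is one simple way to see why junction evidence can help at low read depth: the mean over a segment with known boundaries is less noisy than a change point estimated from the same sparse counts.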
Project description:Post-pubertal testicular germ-cell tumours (TGCTs) can present with a variety of distinct histologies which are nevertheless lineage related and often co-occurring. The exact lineage relationships and developmental pathways leading to the different histologies are debated. In order to investigate the relationship of histologic populations, mate-pair sequencing (MPseq) and exome sequencing (ExomeSeq) were conducted on different histological populations within the same tumour. Ten TGCTs with 1-3 histologic types/tumour were sequenced. Junctions of somatic chromosomal rearrangements were identified on a per genome basis, with germ cell neoplasia in situ possessing the least (median 1, range 0-4) and embryonal carcinoma the most (median 8.5, range 6-12). Copy number variation revealed gains and losses, including isochromosome 12p (i12p) (10/10 samples), and chromosomes 7, 8, and 21 gains (7/10 samples). Mapping of shared junctions within a tumour revealed lineage relationships, but only i12p was shared between patients. ExomeSeq from two cases demonstrated a high level of copy-neutral loss of heterozygosity. Parallel assessment of separate histologies within a single TGCT demonstrated cumulative and divergent changes, suggesting the importance of parallel sequencing for detection of relevant biomarkers.
Project description:Currently, only a few tools are capable of detecting genome-wide Copy Number Variations (CNVs) based on sequencing of multiple samples. Although aberrations in mate pair insertion sizes provide additional hints for CNV detection based on multiple samples, the majority of the current tools rely only on the depth of coverage. Here, we propose a new algorithm (MSeq-CNV) which allows detecting common CNVs across multiple samples. MSeq-CNV applies a mixture density for modeling aberrations in depth of coverage and abnormalities in the mate pair insertion sizes. Each component in this mixture density applies a Binomial distribution for modeling the number of mate pairs with aberration in the insertion size and also a Poisson distribution for emitting the read counts, in each genomic position. MSeq-CNV is applied to simulated data and also to real data of six HapMap individuals with high-coverage sequencing in the 1000 Genomes Project. These individuals include a CEU trio of European ancestry and a YRI trio of Nigerian ethnicity. Ancestry of these individuals is studied by clustering the identified CNVs. MSeq-CNV is also applied for detecting CNVs in two samples with low-coverage sequencing in the 1000 Genomes Project and six samples from the Simons Genome Diversity Project.
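To make the mixture description above concrete, the following sketch computes posterior component probabilities for a single genomic position, where each component multiplies a Poisson term for the read count by a Binomial term for the number of mate pairs with aberrant insert size. It is an illustration only, not the MSeq-CNV implementation: the function names, the `(weight, poisson_rate, aberrant_prob)` tuple layout, and all numeric parameters are assumptions, and the sketch does not model the multi-sample aspect of the method.

```python
from math import exp, lgamma, log


def poisson_logpmf(k, rate):
    """Log probability of observing k events under Poisson(rate), rate > 0."""
    return k * log(rate) - rate - lgamma(k + 1)


def binomial_logpmf(k, n, p):
    """Log probability of k successes in n trials, with 0 < p < 1."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + k * log(p) + (n - k) * log(1 - p)


def component_posteriors(read_count, n_pairs, n_aberrant, components):
    """Posterior probability of each mixture component at one position.

    `components` is a list of (weight, poisson_rate, aberrant_prob) tuples;
    this layout is hypothetical and chosen for illustration.
    """
    log_joint = [
        log(weight)
        + poisson_logpmf(read_count, rate)
        + binomial_logpmf(n_aberrant, n_pairs, p)
        for weight, rate, p in components
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_joint)
    unnormalized = [exp(value - peak) for value in log_joint]
    total = sum(unnormalized)
    return [value / total for value in unnormalized]


if __name__ == "__main__":
    # Illustrative parameters: a "normal" state and a "deletion-like" state.
    components = [(0.9, 30.0, 0.01), (0.1, 15.0, 0.3)]
    print(component_posteriors(14, 20, 7, components))
    print(component_posteriors(30, 20, 0, components))
```

With these illustrative parameters, a position showing both reduced depth and many aberrant mate pairs is assigned almost entirely to the second component, which is the intuition behind combining the two signals instead of relying on depth of coverage alone.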
Project description:Fluorescence in situ hybridization (FISH) is currently the gold-standard assay to detect recurrent genomic abnormalities of prognostic significance in multiple myeloma (MM). Since most translocations in MM involve a position effect with heterogeneous breakpoints, we hypothesize that FISH has the potential to miss translocations involving these regions. We evaluated 70 bone marrow samples from patients with plasma cell dyscrasia by FISH and whole-genome mate-pair sequencing (MPseq). Thirty cases (42.9%) displayed at least one instance of discordance between FISH and MPseq for each primary and secondary abnormality evaluated. Nine cases had abnormalities detected by FISH that went undetected by MPseq including 6 tetraploid clones and three cases with missed copy number abnormalities. In contrast, 19 cases had abnormalities detected by MPseq that went undetected by FISH. Seventeen were MYC rearrangements and two were 17p deletions. MPseq identified 36 MYC abnormalities and 17 (50.0% of MYC abnormal group with FISH results) displayed a false negative FISH result. MPseq identified 10 cases (14.3%) with IgL rearrangements, a recent marker of poor outcome, and 10% with abnormalities in genes associated with lenalidomide response or resistance. In summary, MPseq was superior in the characterization of rearrangement complexity and identification of secondary abnormalities demonstrating increased clinical value compared to FISH.
Project description:Clustered copy number variants (CNVs) as detected by chromosomal microarray analysis (CMA) are often reported as germline chromothripsis. However, such cases might need further investigations by massive parallel whole genome sequencing (WGS) in order to accurately define the underlying complex rearrangement, predict the occurrence mechanisms and identify additional complexities. Here, we utilized WGS to delineate the rearrangement structure of 21 clustered CNV carriers first investigated by CMA and identified a total of 83 breakpoint junctions (BPJs). The rearrangements were further sub-classified depending on the patterns observed: I) Cases with only deletions (n = 8) often had additional structural rearrangements, such as insertions and inversions typical to chromothripsis; II) cases with only duplications (n = 7) or III) combinations of deletions and duplications (n = 6) demonstrated mostly interspersed duplications and BPJs enriched with microhomology. In two cases the rearrangement mutational signatures indicated both a breakage-fusion-bridge cycle process and haltered formation of a ring chromosome. Finally, we observed two cases with Alu- and LINE-mediated rearrangements as well as two unrelated individuals with seemingly identical clustered CNVs on 2p25.3, possibly a rare European founder rearrangement. In conclusion, through detailed characterization of the derivative chromosomes we show that multiple mechanisms are likely involved in the formation of clustered CNVs and add further evidence for chromoanagenesis mechanisms in both "simple" and highly complex chromosomal rearrangements. Finally, WGS characterization adds positional information, important for a correct clinical interpretation and deciphering mechanisms involved in the formation of these rearrangements.
Project description:Recently, microarrays have replaced karyotyping as a first tier test in patients with idiopathic intellectual disability and/or multiple congenital abnormalities (ID/MCA) in many laboratories. Although in about 14-18% of such patients, DNA copy-number variants (CNVs) with clinical significance can be detected, microarrays have the disadvantage of missing balanced rearrangements, as well as providing no information about the genomic architecture of structural variants (SVs) like duplications and complex rearrangements. Such information could possibly lead to a better interpretation of the clinical significance of the SV. In this study, the clinical use of mate pair next-generation sequencing was evaluated for the detection and further characterization of structural variants within the genomes of 50 ID/MCA patients. Thirty of these patients carried a chromosomal aberration that was previously detected by array CGH or karyotyping and suspected to be pathogenic. In the remaining 20 patients no causal SVs were found and only benign aberrations were detected by conventional techniques. Combined cluster and coverage analysis of the mate pair data allowed precise breakpoint detection and further refinement of previously identified balanced and (complex) unbalanced aberrations, pinpointing the causal gene for some patients. We conclude that mate pair sequencing is a powerful technology that can provide rapid and unequivocal characterization of unbalanced and balanced SVs in patient genomes and can be essential for the clinical interpretation of some SVs.
Project description:Copy-number variants (CNVs) are a major source of genetic variation in human health and disease. Previous studies have implicated replication stress as a causative factor in CNV formation. However, existing data are technically limited in the quality of comparisons that can be made between human CNVs and experimentally induced variants. Here, we used two high-resolution strategies-single nucleotide polymorphism (SNP) arrays and mate-pair sequencing-to compare CNVs that occur constitutionally to those that arise following aphidicolin-induced DNA replication stress in the same human cells. Although the optimized methods provided complementary information, sequencing was more sensitive to small variants and provided superior structural descriptions. The majority of constitutional and all aphidicolin-induced CNVs appear to be formed via homology-independent mechanisms, while aphidicolin-induced CNVs were of a larger median size than constitutional events even when mate-pair data were considered. Aphidicolin thus appears to stimulate formation of CNVs that closely resemble human pathogenic CNVs and the subset of larger nonhomologous constitutional CNVs.
Project description:T-cell acute lymphoblastic leukemia (T-ALL) is an aggressive hematopoietic neoplasm involving the bone marrow and blood that accounts for ~15% of childhood and 25% of adult ALL. Whereas multiple, recurrent genetic abnormalities have been described in T-ALL, their clinical significance is unclear or controversial. Importantly, ABL1 rearrangements, most commonly described in BCR/ABL1-positive B-ALL and BCR-ABL1-like B-ALL, have been observed in T-ALL and may respond to tyrosine kinase inhibitor (TKI) therapy. We describe a newly diagnosed case of pediatric T-ALL with a fluorescence in situ hybridization abnormality suggesting a partial ABL1 deletion by a BCR/ABL1 dual-color dual-fusion probe but that demonstrated a normal result using an ABL1 break-apart probe. Mate-pair sequencing (MPseq), a next-generation sequencing (NGS)-based technology utilized to detect copy number and structural abnormalities with high resolution and precision throughout the genome, was performed and revealed a NUP214/ABL1 gene fusion that has been demonstrated to be sensitive to TKI therapy. This case demonstrates the power of MPseq to resolve chromosomal abnormalities unappreciable by traditional cytogenetic methodologies and highlights the clinical value of this novel NGS-based technology.
Project description:Genomic disorders are the clinical conditions manifested by submicroscopic genomic rearrangements including copy number variants (CNVs). The CNVs can be identified by array-based comparative genomic hybridization (aCGH), the most commonly used technology for molecular diagnostics of genomic disorders. However, clinical aCGH only informs CNVs in the probe-interrogated regions. Neither orientational information nor the resulting genomic rearrangement structure is provided, which is a key to uncovering mutational and pathogenic mechanisms underlying genomic disorders. Long-range polymerase chain reaction (PCR) is a traditional approach to obtain CNV breakpoint junction, but this method is inefficient when challenged by structural complexity such as often found at the PLP1 locus in association with Pelizaeus-Merzbacher disease (PMD). Here we introduced 'capture and single-molecule real-time sequencing' (cap-SMRT-seq) and newly developed 'asymmetry linker-mediated nested PCR walking' (ALN-walking) for CNV breakpoint sequencing in 49 subjects with PMD-associated CNVs. Remarkably, 29 (94%) of the 31 CNV breakpoint junctions unobtainable by conventional long-range PCR were resolved by cap-SMRT-seq and ALN-walking. Notably, unexpected CNV complexities, including inter-chromosomal rearrangements that cannot be resolved by aCGH, were revealed by efficient breakpoint sequencing. These sequence-based structures of PMD-associated CNVs further support the role of DNA replicative mechanisms in CNV mutagenesis, and facilitate genotype-phenotype correlation studies. Intriguingly, the lengths of gained segments by CNVs are strongly correlated with clinical severity in PMD, potentially reflecting the functional contribution of other dosage-sensitive genes besides PLP1. 
Our study provides new efficient experimental approaches (especially ALN-walking) for CNV breakpoint sequencing and highlights their importance in uncovering CNV mutagenesis and pathogenesis in genomic disorders.
Project description:Alu repetitive elements are known to be major contributors to genome instability by generating Alu-mediated copy-number variants (CNVs). Most of the reported Alu-mediated CNVs are simple deletions and duplications, and the mechanism underlying Alu-Alu-mediated rearrangement has been attributed to non-allelic homologous recombination (NAHR). Chromosome 17 at the p13.3 genomic region lacks extensive low-copy repeat architecture; however, it is highly enriched for Alu repetitive elements, with a fraction of 30% of total sequence annotated in the human reference genome, compared with the 10% genome-wide and 18% on chromosome 17. We conducted mechanistic studies of the 17p13.3 CNVs by performing high-density oligonucleotide array comparative genomic hybridization, specifically interrogating the 17p13.3 region with ~150 bp per probe density; CNV breakpoint junctions were mapped to nucleotide resolution by polymerase chain reaction and Sanger sequencing. Studied rearrangements include 5 interstitial deletions, 14 tandem duplications, 7 terminal deletions and 13 complex genomic rearrangements (CGRs). Within the 17p13.3 region, Alu-Alu-mediated rearrangements were identified in 80% of the interstitial deletions, 46% of the tandem duplications and 50% of the CGRs, indicating that this mechanism was a major contributor for formation of breakpoint junctions. Our studies suggest that Alu repetitive elements facilitate formation of non-recurrent CNVs, CGRs and other structural aberrations of chromosome 17 at p13.3. The common observation of Alu-mediated rearrangement in CGRs and breakpoint junction sequences analysis further demonstrates that this type of mechanism is unlikely attributed to NAHR, but rather may be due to a recombination-coupled DNA replicative repair process.
Project description:Chromosomal insertions are genomic rearrangements with a chromosome segment inserted into a non-homologous chromosome or a non-adjacent locus on the same chromosome or the other homologue, constituting ~2% of nonrecurrent copy-number gains. Little is known about the molecular mechanisms of their formation. We identified 16 individuals with complex insertions among 56,000 individuals tested at Baylor Genetics using clinical array comparative genomic hybridization (aCGH) and fluorescence in situ hybridization (FISH). Custom high-density aCGH was performed on 10 individuals with available DNA, and breakpoint junctions were fine-mapped at nucleotide resolution by long-range PCR and DNA sequencing in 6 individuals to glean insights into potential mechanisms of formation. We observed microhomologies and templated insertions at the breakpoint junctions, resembling the breakpoint junction signatures found in complex genomic rearrangements generated by replication-based mechanism(s) with iterative template switches. In addition, we analyzed 5 families with apparently balanced insertion in one parent detected by FISH analysis and found that 3 parents had additional small copy-number variants (CNVs) at one or both sides of the inserting fragments as well as at the inserted sites. We propose that replicative repair can result in interchromosomal complex insertions generated through chromothripsis-like chromoanasynthesis involving two or three chromosomes, and cause a significant fraction of apparently balanced insertions harboring small flanking CNVs.