FusionSeq: a Modular Framework for Finding Gene Fusions by Analyzing Paired-End RNA Sequencing Data
ABSTRACT: We have developed FusionSeq to identify fusion transcripts from paired-end RNA-sequencing data. FusionSeq includes filters that remove spurious candidate fusions arising from artifacts such as misalignments or random pairing of transcript fragments, and it ranks candidates according to several statistics. It also has a module to identify the exact sequences at breakpoint junctions. FusionSeq detected known and novel fusions in a specially sequenced calibration data set that included 8 cancers with and without known rearrangements.
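The basic idea behind filtering paired-end fusion candidates can be illustrated with a minimal Python sketch. It assumes read pairs have already been aligned and each mate annotated with a gene; the `ReadPair` type, the mapping-quality cutoff and the minimum-support threshold are hypothetical choices for illustration, not FusionSeq's actual filters or ranking statistics. Gene pairs are treated as unordered, so 5'/3' orientation is ignored.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    """Hypothetical pre-annotated paired-end alignment: the gene hit by each mate."""
    gene1: str
    gene2: str
    mapq1: int  # mapping quality of mate 1
    mapq2: int  # mapping quality of mate 2


def candidate_fusions(pairs, min_support=2, min_mapq=20):
    """Count inter-gene read pairs per gene pair and keep well-supported ones.

    Returns a list of ((gene_a, gene_b), count) sorted by descending count.
    Thresholds are illustrative assumptions, not FusionSeq defaults.
    """
    counts = Counter()
    for p in pairs:
        if p.gene1 == p.gene2:
            continue  # both mates in the same gene: not fusion evidence
        if min(p.mapq1, p.mapq2) < min_mapq:
            continue  # crude guard against misaligned mates
        counts[tuple(sorted((p.gene1, p.gene2)))] += 1
    # Requiring several supporting pairs crudely guards against random pairing.
    kept = [(genes, n) for genes, n in counts.items() if n >= min_support]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


pairs = [ReadPair("TMPRSS2", "ERG", 60, 60)] * 3 + [
    ReadPair("A", "B", 5, 60),   # dropped: low mapping quality
    ReadPair("A", "A", 60, 60),  # dropped: intragenic pair
    ReadPair("C", "D", 60, 60),  # dropped: only one supporting pair
]
print(candidate_fusions(pairs))  # [(('ERG', 'TMPRSS2'), 3)]
```

Ranking here uses only the raw count of supporting pairs; a real tool would combine several statistics and additional artifact filters.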
Project description: We report the design and implementation of a "breakpoint analysis" pipeline that discovers novel gene fusions through tell-tale transitions in transcript level or genomic DNA copy number occurring within genes. We use this method to prioritize candidate rearrangements from high-density array CGH datasets as well as exon-resolution expression microarrays. We mine both publicly available datasets and datasets generated in our laboratory. Several gene fusion candidates were chosen for further characterization, and the corresponding samples were profiled by paired-end RNA sequencing to determine the identity of each gene fusion. Using this approach, we report the discovery and characterization of novel gene fusions spanning multiple cancer subtypes, including angiosarcoma, pancreatic cancer, anaplastic astrocytoma, melanoma, breast cancer, and T-cell acute lymphoblastic leukemia. Taken together, this study provides a robust approach for gene fusion discovery, and our results highlight a more widespread role of fusion genes in cancer pathogenesis.
Breakpoint analysis for the discovery of novel gene fusions across human cancers
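A within-gene transition of the kind described in the breakpoint analysis above can be scored with a simple single-changepoint statistic. For example, a 3' fusion partner driven by a strongly expressed 5' partner may show elevated signal only in exons downstream of the breakpoint. This is a minimal sketch assuming exon-level, log-scale values ordered 5' to 3'; the statistic (absolute difference of segment means), the `min_seg` parameter and the function name are illustrative assumptions, since the pipeline's actual scoring method is not described here.

```python
from statistics import mean


def best_breakpoint(values, min_seg=2):
    """Find the single split that maximizes the jump in mean level within a gene.

    values: per-exon (or per-probe) log-scale measurements ordered 5' -> 3'.
    A split at index k separates values[:k] from values[k:]; each side must
    contain at least min_seg points. Returns (k, score), or (None, 0.0) if
    the gene is too short to split.
    """
    best_k, best_score = None, 0.0
    for k in range(min_seg, len(values) - min_seg + 1):
        score = abs(mean(values[k:]) - mean(values[:k]))
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score


# Toy exon-level profile: low 5' exons, high 3' exons (illustrative values).
profile = [1.0, 1.1, 0.9, 1.0, 3.0, 3.2, 2.9, 3.1]
print(best_breakpoint(profile))  # split at k=4, score of about 2.05
```

The same scan applies to array CGH copy-number values; only the interpretation of the jump (expression change versus copy-number transition) differs.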
Project description: Studies of fusion genes have mainly focused on the formation of fusions that result in the production of hybrid proteins or, alternatively, on promoter-switching events that put a gene under the control of aberrant signals. However, gene fusions may also disrupt the transcriptional control of genes that are encoded in introns downstream of the breakpoint. By ignoring structural constraints of the transcribed fusions, we highlight the importance of a largely unexplored function of fusion genes. Using breast cancer as an example, we show that miRNA host genes are specifically enriched among fusion genes and that many different, low-frequency 5' partners may deregulate the same miRNA irrespective of the coding potential of the fusion transcript. These results indicate that the concept of recurrence, defined by the rate of functionally important aberrations, needs to be revised to encompass convergent fusions that affect a miRNA independently of transcript structure and protein-coding potential. Overall design: Illumina paired-end RNA sequencing was performed on 1600 sequencing libraries (49 technical replicates, 1552 tumour samples) for fusion gene detection. miRNA sequencing was performed on a subset of the fusion detection samples, 191 sequencing libraries (5 technical replicates, 186 tumour samples), for miRNA expression estimation. This represents the miRNA sequencing component of the 191 libraries only. The authors state "due to Swedish law, the patient consent, and the risk that the sequencing data contains personally-identifiable information and hereditary mutations, we cannot deposit the short sequencing read data in a repository". Thus, this submission is incomplete.
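Enrichment of miRNA host genes among fusion genes, as reported in the breast cancer study above, is commonly assessed with a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test on the 2x2 table). The study's actual test is not stated here, so this is a minimal sketch of one standard choice; the function name and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(k, N, K, n):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background (universe)
    K: miRNA host genes in the background
    n: genes involved in fusions
    k: fusion genes that are also miRNA host genes
    """
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical counts for illustration only (not data from the study).
print(enrichment_p_value(k=8, N=20000, K=1000, n=50))
```

Exact integer arithmetic via `math.comb` avoids underflow in the binomial coefficients; only the final ratio is converted to a float.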
Project description: Unbalanced translocations are a relatively common type of copy number variation and are a major contributor to neurodevelopmental disorders. We analyzed the breakpoints of 57 unique unbalanced translocations to investigate how they form. Of these, 51 are simple unbalanced translocations between two different chromosome ends, and six rearrangements have more than three breakpoints involving two to five chromosomes. Sequencing 37 breakpoint junctions revealed that simple translocations have between zero and four base pairs (bp) of microhomology (n=26), short inserted sequences (n=8), or paralogous repeats (n=3) at the junctions, indicating that translocations do not arise primarily from non-allelic homologous recombination but instead form most often via non-homologous end joining or microhomology-mediated break-induced replication. Three simple translocations fuse genes that are predicted to produce in-frame transcripts of SIRPG-WWOX, SMOC2-PROX1, and PIEZO2-MTA1, which may lead to gain of function. Three complex translocations have inversions, insertions, and multiple breakpoint junctions between only two chromosomes. Whole-genome sequencing and fluorescence in situ hybridization analysis of two de novo translocations revealed at least 18 and 33 breakpoints involving five different chromosomes. Breakpoint sequencing of one inherited translocation involving four chromosomes uncovered multiple breakpoints with inversions and insertions. All of these breakpoint junctions had zero to four bp of microhomology, consistent with germline chromothripsis, and both de novo events occurred on paternal alleles. Breakpoint sequencing of our large collection of chromosome rearrangements offers a comprehensive analysis of the molecular mechanisms behind germline translocation formation.
High-resolution array CGH; two-color experiment, clinical patient vs. normal control gDNA; sex-mismatched.
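Microhomology lengths like the zero to four bp reported above can be measured by asking how far a breakpoint can slide along the junction while remaining consistent with both reference sequences. This is a minimal sketch, assuming the junction is `a_ref[:bp_a] + b_ref[bp_b:]` with both references on the same strand; the function name and argument layout are hypothetical, and junctions carrying inserted sequence are not handled.

```python
def microhomology_length(a_ref, bp_a, b_ref, bp_b):
    """Count bases at a junction that could derive from either partner.

    The junction joins a_ref[:bp_a] (retained from partner A) to
    b_ref[bp_b:] (retained from partner B). Shared bases are counted by
    sliding the breakpoint left (end of A's retained part matches end of
    B's lost part) and right (start of B's retained part matches start of
    A's lost part).
    """
    left = 0
    while (left < bp_a and left < bp_b
           and a_ref[bp_a - 1 - left] == b_ref[bp_b - 1 - left]):
        left += 1
    right = 0
    while (bp_a + right < len(a_ref) and bp_b + right < len(b_ref)
           and a_ref[bp_a + right] == b_ref[bp_b + right]):
        right += 1
    return left + right


# Junction "AAAACT" + "CCCC": the "CT" is present on both partners.
print(microhomology_length("AAAACTGGGG", 6, "TTTTCTCCCC", 6))  # 2
# Blunt junction: no shared bases.
print(microhomology_length("AAAAGGGG", 4, "TTTTCCCC", 4))  # 0
```

Because the exact breakpoint position within a microhomology is ambiguous, counting the full slidable window is more robust than reporting a single position.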
Project description: Somatic variants spontaneously appear during the vegetative multiplication of woody crops. The new white-berried grapevine cultivar Tempranillo Blanco (TB) originally appeared as a bud sport of the black-berried Tempranillo Tinto (TT) cultivar. To understand the origin of this variation, the TT and TB genomes were sequenced. Structural variation and genetic segregation analyses revealed that complex chromosome rearrangements consistent with chromothripsis, a catastrophic phenomenon recently described in human cancer, generated the variant genome of TB and deleted the functional allele of the color locus. Loss of heterozygosity and decreased copy number delimited alternating monosomic and disomic fragments in the distal arms of TB's linkage groups 2 and 5. The hemizygous fragments collectively extended over 8.1 Mb and comprised 313 annotated genes. Clustered breakpoints for the complex chromosome rearrangements disrupting linkage groups 2 and 5 were identified, and the junctions involved unbalanced inter- and intra-chromosomal translocations and one unbalanced inversion. Signatures of blunt fusions or microhomology-mediated end joining were detected at the breakpoint junction flanks. Segregation distortion in TB-derived selfed progeny indicated that the rearrangements are linked on a single copy of the affected chromosomes, which was barely transmitted. In addition to berry color loss, these dramatic changes have further viticultural consequences in TB associated with decreased sexual fitness. Our findings show that chromothripsis spontaneously arises during mitotic multiplication of grapevine, indicating that this phenomenon could contribute to clonal variation in woody crops and to the evolution of plant genomes. Grapevine GrapeGen GeneChips(R) were used for partial comparative genome hybridization between black-berried cultivars and their respective white-berried somatic variants.
Differences in copy number were used to estimate chromosome deletions associated with the loss of berry color. Overall design: Young grapevine leaves from each variant line were used for gDNA extraction. Fragmented DNA was amplified, labelled and hybridized to GrapeGen Affymetrix microarrays. Three pairs of white- and black-berried variants were compared with two biological replicates per variant.
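Estimating deletions from paired variant hybridizations, as described above, reduces at its simplest to a per-probe log2 ratio between the white-berried variant and its black-berried counterpart. This is a minimal sketch, assuming intensities are already background-corrected and normalized (a real Affymetrix workflow does this first) and that replicates are averaged; the -0.5 threshold and the function name are illustrative assumptions. A one-copy loss in a diploid genome gives an expected ratio near log2(1/2) = -1.

```python
from math import log2
from statistics import mean


def flag_deletions(white, black, threshold=-0.5):
    """Return indices of probes whose log2(white/black) ratio is below threshold.

    white, black: per-probe lists of replicate intensities for the white-berried
    variant and its black-berried counterpart (assumed already normalized).
    """
    flagged = []
    for i, (w, b) in enumerate(zip(white, black)):
        # Average replicates, then compare the two variants on a log2 scale.
        if log2(mean(w) / mean(b)) < threshold:
            flagged.append(i)
    return flagged


# Toy intensities with two replicates per variant (illustrative values only).
white = [[50, 54], [100, 98], [49, 51]]
black = [[100, 104], [100, 102], [100, 100]]
print(flag_deletions(white, black))  # [0, 2]
```

Flagged probes would then be mapped to genomic positions so that runs of adjacent low-ratio probes delimit candidate deleted segments.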
Project description: Next-generation sequencing technologies have enabled de novo gene fusion discovery that could reveal candidates with therapeutic significance in cancer. Here we present an open-source software package, ChimeraScan, for the discovery of chimeric transcription between two independent transcripts.
Three cancer cell lines with known gene fusions
Project description: We identified 16 individuals with complex insertions among 56,000 individuals tested at Baylor Genetics Laboratories using clinical array comparative genomic hybridization (aCGH) and fluorescence in situ hybridization (FISH). Custom high-density aCGH was performed on individuals with available DNA, and breakpoint junctions were fine-mapped at nucleotide resolution by long-range PCR and DNA sequencing to glean insights into potential mechanisms of formation. Overall design: Seven individuals with complex chromosomal insertions were subjected to custom high-density arrays in the targeted regions.
Project description: Mutations of TCF4, which encodes a basic helix-loop-helix transcription factor, cause Pitt-Hopkins syndrome (PTHS) via multiple genetic mechanisms. TCF4 is a complex locus that expresses multiple transcripts through alternative splicing and the use of multiple promoters. We report a three-generation family segregating mild intellectual disability with an apparently balanced chromosomal translocation t(14;18)(q23.3;q21.2) that we characterized as a complex unbalanced karyotype, 46,XY,der(14)del(14)(q23.3q23.3)t(14;18)(q23.3;q21.2)del(18)(q21.2q21.2)del(18)(q21.2q21.2)inv(18)(q21.2q21.2),der(18)t(14;18)(q23.3;q21.2), disrupting TCF4. Using whole-genome sequencing, transcriptome sequencing, qRT-PCR and nCounter analysis, we characterized the breakpoint junctions of the derivative chromosomes and gene expression at the TCF4 locus. Our analyses revealed that family members segregating mild intellectual disability with the complex chromosome aberration had normal expression of genes along chromosomes 14 and 18 and no marked changes in the expression of genes other than TCF4. Affected individuals had 12- to 33-fold higher TCF4 mRNA levels than unaffected controls or individuals with PTHS. Increased levels of TCF4 transcript variants originating distal to the translocation breakpoint, not the fusion transcript generated by the derivative chromosome, contributed to this increase. Although validation in additional patients is required, our findings suggest that the dysmorphic features and severe intellectual disability characteristic of PTHS are partially rescued by overexpression of short TCF4 transcripts encoding a nuclear localization signal, a transcription activation domain, and the basic helix-loop-helix domain.
Examination of TCF4 isoform expression: comparison between mutant and control skin fibroblast tissues
Project description: A novel oligonucleotide microarray design is described with which one can screen for all known oncogenic fusion transcripts in a single microarray hybridization. Measurements of chimeric transcript junctions are combined with exon-wise measurements of the individual fusion partners. Keywords: Nimblegen custom-design. Overall design: The pilot data included a design with 68,861 oligonucleotide probes covering all combinations of chimeric exon-exon junctions from 275 pairs of fusion genes, as well as sets of oligos internal to all the exons of the fusion partners. Proof of principle was demonstrated by detection of known fusion genes (such as TCF3:PBX1, ETV6:RUNX1, and TMPRSS2:ERG) in six positive controls consisting of leukemia cell lines and prostate cancer biopsies.
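The "all combinations of chimeric exon-exon junctions" design described above can be sketched as a Cartesian product over the exons of the two fusion partners. This is a minimal sketch under stated assumptions: exon sequences are given in transcript order, each probe is centred on the junction with `half` bases from each side (the 30-base default is an assumption, not the published probe length), and practical probe-design filters such as melting-temperature balancing or uniqueness checks are omitted. The function name is hypothetical.

```python
from itertools import product


def junction_probes(five_prime_exons, three_prime_exons, half=30):
    """Build one junction-spanning probe per (5' exon, 3' exon) combination.

    Each probe is the last `half` bases of an exon of the 5' partner joined to
    the first `half` bases of an exon of the 3' partner. Keys are 1-based
    (5' exon number, 3' exon number).
    """
    if half < 1:
        raise ValueError("half must be at least 1")  # e5[-0:] would return the whole exon
    probes = {}
    for (i, e5), (j, e3) in product(enumerate(five_prime_exons, 1),
                                    enumerate(three_prime_exons, 1)):
        probes[(i, j)] = e5[-half:] + e3[:half]
    return probes


# Toy exon sequences (not real gene sequences).
probes = junction_probes(["AAAACCCC", "GGGGTTTT"],
                         ["ACGTACGT", "TTTTAAAA", "CCCCGGGG"], half=3)
print(len(probes))     # 6
print(probes[(1, 1)])  # CCCACG
```

A gene pair with m exons in the 5' partner and n exons in the 3' partner contributes m × n junction probes, so the junction portion of such an array grows with the product of the partners' exon counts.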