Expression data from leukemic patients with complex structural variants
ABSTRACT: Structural variants can lead to an alteration of gene expression which may be associated with disease worsening. In our study we attempted to describe expression changes associated with the presence of extensive genomic rearrangements in chronic lymphocytic leukemia. We used microarrays for establishing an algorithm for identification of unique expression profiles associated with extensive genomic rearrangements. Overall design: Peripheral blood samples from leukemic patients were used for RNA extraction. Gene expression profiles were correlated to chromosomal abnormalities identified in the cohort.
INSTRUMENT(S): [HTA-2_0] Affymetrix Human Transcriptome Array 2.0 [transcript (gene) version]
Project description:Structural variants can lead to an alteration of gene expression which may be associated with disease worsening. In our study we attempted to describe expression changes associated with the presence of extensive genomic rearrangements in chronic lymphocytic leukemia. Overall design: We used Affymetrix microarrays to identify genomic rearrangements in chronic lymphocytic leukemia. Peripheral blood samples from 10 leukemic patients were used for genomic DNA extraction. Copy number analysis was performed and obtained data were correlated with expression profiles of these cases.
Project description:Structural variants can lead to an alteration of gene expression which may be associated with disease worsening. In our study we attempted to describe expression changes associated with the presence of extensive genomic rearrangements in chronic lymphocytic leukemia. Overall design: Peripheral blood samples from leukemic patients were used for RNA extraction. Gene expression profiles were correlated to chromosomal abnormalities identified in the cohort.
Project description:Background:Extensive genome rearrangements, known as chromothripsis, have been recently identified in several cancer types. Chromothripsis leads to complex structural variants (cSVs) causing aberrant gene expression and the formation of de novo fusion genes, which can trigger cancer development, or worsen its clinical course. The functional impact of cSVs can be studied at the RNA level using whole transcriptome sequencing (total RNA-Seq). It represents a powerful tool for discovering, profiling, and quantifying changes of gene expression in the overall genomic context. However, bioinformatic analysis of transcriptomic data, especially in cases with cSVs, is a complex and challenging task, and the development of proper bioinformatic tools for transcriptome studies is necessary. Methods:We designed a bioinformatic workflow for the analysis of total RNA-Seq data consisting of two separate parts (pipelines): The first pipeline incorporates a statistical solution for differential gene expression analysis in a biologically heterogeneous sample set. We utilized results from transcriptomic arrays which were carried out in parallel to increase the precision of the analysis. The second pipeline is used for the identification of de novo fusion genes. Special attention was given to the filtering of false positives (FPs), which was achieved through consensus fusion calling with several fusion gene callers. We applied the workflow to the data obtained from ten patients with chronic lymphocytic leukemia (CLL) to describe the consequences of their cSVs in detail. The fusion genes identified by our pipeline were correlated with genomic break-points detected by genomic arrays. Results:We set up a novel solution for differential gene expression analysis of individual samples and de novo fusion gene detection from total RNA-Seq data. 
The results of the differential gene expression analysis were concordant with results obtained by transcriptomic arrays, which demonstrates the analytical capabilities of our method. We also showed that the consensus fusion gene detection approach was able to identify true positives (TPs) efficiently. Detected coordinates of fusion gene junctions were in concordance with genomic breakpoints assessed using genomic arrays. Discussion:By applying our methods to real clinical samples, we proved that our approach for total RNA-Seq data analysis generates results consistent with other genomic analytical techniques. The data obtained by our analyses provided clues for the study of the biological consequences of cSVs with far-reaching implications for clinical outcome and management of cancer patients. The bioinformatic workflow is also widely applicable for addressing other research questions in different contexts, for which transcriptomic data are generated.
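The consensus fusion calling described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the caller names, gene names, the `consensus_fusions` helper, and the `min_callers` threshold are all hypothetical, and it assumes each caller's output has already been reduced to ordered (5' partner, 3' partner) gene pairs.

```python
from collections import defaultdict


def consensus_fusions(calls_by_caller, min_callers=2):
    """Return fusions reported by at least `min_callers` distinct callers.

    calls_by_caller maps a caller name to an iterable of
    (five_prime_gene, three_prime_gene) tuples. Both the input shape and
    the default threshold are assumptions made for this illustration.
    """
    support = defaultdict(set)
    for caller, fusions in calls_by_caller.items():
        for pair in fusions:
            # A set ignores repeated reports of one fusion by the same caller.
            support[tuple(pair)].add(caller)
    return {
        pair: sorted(callers)
        for pair, callers in support.items()
        if len(callers) >= min_callers
    }


# Hypothetical calls from three unnamed fusion callers.
example_calls = {
    "caller_a": [("GENE1", "GENE2"), ("GENE3", "GENE4")],
    "caller_b": [("GENE1", "GENE2")],
    "caller_c": [("GENE1", "GENE2"), ("GENE3", "GENE4"), ("GENE5", "GENE6")],
}

for pair, callers in consensus_fusions(example_calls, min_callers=2).items():
    print("--".join(pair), "supported by", ", ".join(callers))
```

Raising `min_callers` trades sensitivity for fewer false positives. A real workflow would probably also compare junction coordinates rather than gene names alone.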
Project description:The genomes of many epithelial tumors exhibit extensive chromosomal rearrangements. All classes of genome rearrangements can be identified using end sequencing profiling, which relies on paired-end sequencing of cloned tumor genomes. In the present study brain, breast, ovary, and prostate tumors, along with three breast cancer cell lines, were surveyed using end sequencing profiling, yielding the largest available collection of sequence-ready tumor genome breakpoints and providing evidence that some rearrangements may be recurrent. Sequencing and fluorescence in situ hybridization confirmed translocations and complex tumor genome structures that include co-amplification and packaging of disparate genomic loci with associated molecular heterogeneity. Comparison of the tumor genomes suggests recurrent rearrangements. Some are likely to be novel structural polymorphisms, whereas others may be bona fide somatic rearrangements. A recurrent fusion transcript in breast tumors and a constitutional fusion transcript resulting from a segmental duplication were identified. Analysis of end sequences for single nucleotide polymorphisms revealed candidate somatic mutations and an elevated rate of novel single nucleotide polymorphisms in an ovarian tumor. These results suggest that the genomes of many epithelial tumors may be far more dynamic and complex than was previously appreciated and that genomic fusions, including fusion transcripts and proteins, may be common, possibly yielding tumor-specific biomarkers and therapeutic targets.
Project description:Genomic disorders are defined as diseases caused by rearrangements of the genome incited by a genomic architecture that conveys instability. Y-chromosome related dysfunctions such as male infertility are frequently associated with gross DNA rearrangements resulting from its peculiar genomic architecture. The Y-chromosome has evolved into a highly specialized chromosome to perform male functions, mainly spermatogenesis. Direct and inverted repeats, some of them palindromes with highly identical nucleotide sequences that can form DNA cruciform structures, characterize the genomic structure of the Y-chromosome long arm. Some particular Y chromosome genomic deletions can cause spermatogenic failure likely because of removal of one or more transcriptional units with a potential role in spermatogenesis. We describe mechanisms underlying the formation of human genomic rearrangements on autosomes and review Y-chromosome deletions associated with male infertility.
Project description:POT1 and TPP1 are part of the shelterin complex and are essential for telomere length regulation and maintenance. Naturally occurring mutations of the telomeric POT1-TPP1 complex are implicated in familial glioma, melanoma and chronic lymphocytic leukaemia. Here we report the atomic structure of the interacting portion of the human telomeric POT1-TPP1 complex and suggest how several of these mutations contribute to malignant cancer. The POT1 C-terminus (POT1C) forms a bilobal structure consisting of an OB-fold and a Holliday junction resolvase domain. TPP1 consists of several loops and helices involved in extensive interactions with POT1C. Biochemical data show that several of the cancer-associated mutations partially disrupt the POT1-TPP1 complex, which affects its ability to bind telomeric DNA efficiently. A defective POT1-TPP1 complex leads to longer and fragile telomeres, which in turn promotes genomic instability and cancer.
Project description:The gram-negative anaerobic bacterium Porphyromonas gingivalis is a major causative agent of chronic periodontitis. Porphyromonas gingivalis strains have been classified into virulent and less-virulent strains by mouse subcutaneous soft tissue abscess model analysis. Here, we present the whole genome sequence of P. gingivalis ATCC 33277, which is classified as a less-virulent strain. We identified 2090 protein-coding sequences (CDSs), 4 RNA operons, and 53 tRNA genes in the ATCC 33277 genome. By genomic comparison with the virulent strain W83, we identified 461 ATCC 33277-specific and 415 W83-specific CDSs. Extensive genomic rearrangements were observed between the two strains: 175 regions in which genomic rearrangements have occurred were identified. Thirty-five of those genomic rearrangements were inversion or translocation and 140 were simple insertion, deletion, or replacement. Both strains contained large numbers of mobile elements, such as insertion sequences, miniature inverted-repeat transposable elements (MITEs), and conjugative transposons, which are frequently associated with genomic rearrangements. These findings indicate that the mobile genetic elements have been deeply involved in the extensive genome rearrangement of P. gingivalis and the occurrence of many of the strain-specific CDSs. We also describe here a very unique feature of MITE400, which we renamed MITEPgRS (MITE of P. gingivalis with Repeating Sequences).
Project description:BACKGROUND:We investigated the features of the genomic rearrangements in a cohort of 50 male individuals with proteolipid protein 1 (PLP1) copy number gain events who were ascertained with Pelizaeus-Merzbacher disease (PMD; MIM: 312080). We then compared our new data to previous structural variant mutagenesis studies involving the Xq22 region of the human genome. The aggregate data from 159 sequenced join-points (discontinuous sequences in the reference genome that are joined during the rearrangement process) were studied. Analysis of these data from 150 individuals enabled the spectrum and relative distribution of the underlying genomic mutational signatures to be delineated. METHODS:Genomic rearrangements in PMD individuals with PLP1 copy number gain events were investigated by high-density customized array or clinical chromosomal microarray analysis and breakpoint junction sequence analysis. RESULTS:High-density customized array showed that the majority of cases (33/50; ~66%) present with single duplications, although complex genomic rearrangements (CGRs) are also frequent (17/50; ~34%). Breakpoint mapping to nucleotide resolution revealed further previously unknown structural and sequence complexities, even in single duplications. Meta-analysis of all studied rearrangements that occur at the PLP1 locus showed that single duplications were found in ~54% of individuals and that, among all CGR cases, triplication flanked by duplications is the most frequent CGR array CGH pattern observed. Importantly, in ~32% of join-points, there is evidence for a mutational signature of microhomeology (highly similar yet imperfect sequence matches). CONCLUSIONS:These data reveal a high frequency of CGRs at the PLP1 locus and support the assertion that replication-based mechanisms are prominent contributors to the formation of CGRs at Xq22.
We propose that microhomeology can facilitate template switching, by stabilizing strand annealing of the primer using W-C base complementarity, and is a mutational signature for replicative repair.
Project description:Alu repetitive elements are known to be major contributors to genome instability by generating Alu-mediated copy-number variants (CNVs). Most of the reported Alu-mediated CNVs are simple deletions and duplications, and the mechanism underlying Alu-Alu-mediated rearrangement has been attributed to non-allelic homologous recombination (NAHR). Chromosome 17 at the p13.3 genomic region lacks extensive low-copy repeat architecture; however, it is highly enriched for Alu repetitive elements, with a fraction of 30% of total sequence annotated in the human reference genome, compared with the 10% genome-wide and 18% on chromosome 17. We conducted mechanistic studies of the 17p13.3 CNVs by performing high-density oligonucleotide array comparative genomic hybridization, specifically interrogating the 17p13.3 region with ~150 bp per probe density; CNV breakpoint junctions were mapped to nucleotide resolution by polymerase chain reaction and Sanger sequencing. Studied rearrangements include 5 interstitial deletions, 14 tandem duplications, 7 terminal deletions and 13 complex genomic rearrangements (CGRs). Within the 17p13.3 region, Alu-Alu-mediated rearrangements were identified in 80% of the interstitial deletions, 46% of the tandem duplications and 50% of the CGRs, indicating that this mechanism was a major contributor to the formation of breakpoint junctions. Our studies suggest that Alu repetitive elements facilitate formation of non-recurrent CNVs, CGRs and other structural aberrations of chromosome 17 at p13.3. The common observation of Alu-mediated rearrangement in CGRs, together with breakpoint junction sequence analysis, further demonstrates that this type of mechanism is unlikely to be attributable to NAHR, but rather may be due to a recombination-coupled DNA replicative repair process.
Project description:Structural variants (SVs), including small insertion and deletion variants (indels), are challenging to detect through standard alignment-based variant calling methods. Sequence assembly offers a powerful approach to identifying SVs, but is difficult to apply at scale genome-wide for SV detection due to its computational complexity and the difficulty of extracting SVs from assembly contigs. We describe SvABA, an efficient and accurate method for detecting SVs from short-read sequencing data using genome-wide local assembly with low memory and computing requirements. We evaluated SvABA's performance on the NA12878 human genome and in simulated and real cancer genomes. SvABA demonstrates superior sensitivity and specificity across a large spectrum of SVs and substantially improves detection performance for variants in the 20-300 bp range, compared with existing methods. SvABA also identifies complex somatic rearrangements with chains of short (<1000 bp) templated-sequence insertions copied from distant genomic regions. We applied SvABA to 344 cancer genomes from 11 cancer types and found that short templated-sequence insertions occur in ~4% of all somatic rearrangements. Finally, we demonstrate that SvABA can identify sites of viral integration and cancer driver alterations containing medium-sized (50-300 bp) SVs.