Project description:Mutations in protein-coding genes are well established as the basis for human cancer, yet it remains elusive how alterations within the non-coding genome, a substantial fraction of which contains cis-regulatory elements (CREs), contribute to cancer pathophysiology. Here we developed an integrative approach to systematically identify and characterize non-coding regulatory variants with functional consequences in human hematopoietic malignancies. Combining targeted resequencing of hematopoietic lineage-specific CREs with mutation discovery, we uncovered 1,837 recurrently mutated CREs containing leukemia-associated non-coding variants. Using enhanced CRISPR/dCas9-based CRE perturbation screening and functional analyses, we identified 218 variant-associated oncogenic or tumor-suppressive CREs in human leukemia. Non-coding variants at KRAS and PER2 enhancers reside in nuclear receptor (NR) binding regions and modulate transcriptional activities in response to NR signaling in situ in leukemia cells. NR binding sites frequently co-localize with non-coding variants across cancer types. Hence, recurrent non-coding variants connect enhancer dysregulation with nuclear receptor signaling in hematopoietic malignancies.
Project description:We have combined high-quality genome sequencing and RNA-sequencing data within a 17-individual, three-generation family. Using these data, we have contrasted cis-acting expression, allele-specific expression and splicing quantitative trait loci (collectively termed eQTLs) within the family to eQTLs discovered within a cell-type and ethnicity-matched population sample. We found that eQTLs that exhibit larger effects in the family than in the population are enriched for rare regulatory and splicing variants and are more likely to influence essential genes. In addition, we identify several large effect-size eQTLs within the family for genes involved in complex disease. Through analysis of eQTLs in a large family we also report the utility of non-coding genome annotation for predicting the effect of rare non-coding variants. We find that a combination of distance to the transcription start site, evolutionary constraint and epigenetic annotation is considerably more informative for predicting the consequence of rare non-coding variants than for common variants. In summary, through transcriptome analyses within a large family we are able to identify the contribution of rare non-coding variants to expression phenotypes and further demonstrate the predictive potential of diverse non-coding genome annotation for interpretation of the impact of rare non-coding variants. RNA-Sequencing of CEPH/UTAH family 1463
Project description:A number of genetic studies have identified rare protein-coding DNA variations associated with autism spectrum disorder (ASD), a neurodevelopmental disorder with significant genetic etiology and heterogeneity. In contrast, the contributions of functional, regulatory genetic variations that occur in the extensive non-protein-coding regions of the genome remain poorly understood. Here we developed a genome-wide analysis to identify rare single nucleotide variants (SNVs) that occur in non-coding regions and determined the regulatory function and evolutionary conservation of these variants. Using publicly available datasets and computational predictions, we identified SNVs within putative regulatory regions in promoters, transcription factor binding sites, microRNA genes and their target sites. Overall, we found that regulatory variants in the ASD cases were enriched in autism-risk genes and genes involved in fetal neurodevelopment. As with previously reported coding mutations, we found an enrichment of regulatory variants associated with dysregulation of neurodevelopmental and synaptic signaling pathways. Among these were rare inherited non-coding SNVs found in the mature sequence of a number of microRNAs predicted to affect the regulation of autism-risk genes. We show that a paternally inherited miR-873-5p variant with reduced NRXN2 binding affinity combines with a maternally inherited putative loss-of-function NRXN1 coding variant, likely increasing genetic liability in an idiopathic ASD case. Our analysis pipeline provides a new resource for identifying loss-of-function regulatory DNA variations that may contribute to the genetic etiology of complex disorders.
Project description:Mutations in protein-coding genes are well established as the basis for human cancer, yet it remains elusive how alterations within the non-coding genome, a substantial fraction of which contains cis-regulatory elements, contribute to cancer pathophysiology, largely because of a lack of high-throughput assays to assess their functional effects. Here we developed an integrative approach to systematically identify and characterize non-coding regulatory variants in human hematopoietic malignancies by combining targeted resequencing, mutation discovery, CRISPR-based enhancer-selective epigenome editing, and enhancer reporter assays. We identify 4,629 recurrent non-coding alterations and 939 mutation-associated pathogenic enhancers controlling proto-oncogenes or tumor suppressors. Enhancer variants at KRAS and PER2 co-localize with nuclear receptor (NR) binding sites and modulate transcriptional activities in response to NR signaling in leukemia cells. NR binding sites frequently associate with non-coding variants across cancer types. Hence, recurrent non-coding somatic variants connect enhancer dysregulation with nuclear receptor signaling in hematopoietic malignancies.
Project description:GenomePaint (https://proteinpaint.stjude.org/genomepaint) is a dynamic visualization platform for whole-genome, whole-exome, transcriptome, and epigenomic data, featuring a novel design that captures the inter-relatedness between DNA variations and RNA expression. Regulatory non-coding variants can be inspected and discovered along with coding variants, and their functional impact further explored by examining 3D genome and/or ChIP-seq data generated from cancer cell lines. Further, GenomePaint correlates mutation and expression patterns with patient outcomes, and can display external data such as adult cancer datasets and user-provided custom tracks. We used GenomePaint to analyze multi-omics data from 3,652 pediatric cancers representing 16 histotypes, and demonstrate the visualization features through examples, including two that led to new insights into oncogenic mechanisms in pediatric cancer. The first is the discovery of a new class of pathogenic recurrent variants that cause aberrant splicing, disrupting the RING domain of CREBBP, a driver gene frequently mutated in relapsed pediatric leukemia. The second is the cis-activation of the MYC oncogene in a subset of B-lineage acute lymphoblastic leukemia (B-ALL) via duplication of the NOTCH1-MYC enhancer (N-ME), previously discovered only in T-lineage ALL. The regulatory impact of N-ME enhancer amplification was initially confirmed by allelic imbalance in published gene expression and ChIP-seq data and verified by additional Capture-C and fluorescence in situ hybridization data generated by follow-up experiments. These examples demonstrate the power of GenomePaint in enabling not only data visualization but also integrative genomic analysis that can lead to novel biological insight for follow-up experimental validation.
Project description:About 45% of congenital heart disease (CHD) is caused by rare gene mutations. Non-coding mutations that perturb cis-regulatory elements (CREs) likely contribute to CHD among the remaining cases without clear etiology. However, identifying CHD-causing non-coding variants has been problematic. We combined human induced pluripotent stem cell-derived cardiomyocyte (iPSC-CM) differentiation and a lentivirus-mediated massively parallel reporter assay (lentiMPRA) to create a high-throughput platform to measure human cardiac enhancer activity. We tested 2451 candidate human cardiac enhancers, identified 1185 with measurable activity, and functionally dissected 123 of these by systematic tiling mutagenesis. We functionally evaluated 6761 non-coding de novo variants (ncDNVs) prioritized from the whole genome sequencing (WGS) of 749 CHD trios. 397 ncDNVs significantly affected cardiac CRE activity. Remarkably, 53% of these ncDNVs increased enhancer activity, often at regions with undetectable enhancer activity in the reference sequence. We introduced 10 of these DNVs associated with CHD genes into iPSCs and found that 4 altered expression of neighboring genes. Moreover, these 4 DNVs also altered cardiomyocyte differentiation, as assessed by single nucleus RNA sequencing. Using the MPRA data, we developed a regression model to prioritize future DNVs for functional testing and demonstrate that this model finds enrichment of DNVs in a second, independent WGS cohort. Taken together, we developed a scalable system to measure the impact of non-coding DNVs on CRE activity and deployed this platform to systematically assess the contribution of non-coding DNVs to CHD.