Causes and consequences of chromatin variation between inbred mice.
ABSTRACT: Variation at regulatory elements, identified through hypersensitivity to digestion by DNase I, is believed to contribute to variation in complex traits, but the extent and consequences of this variation are poorly characterized. Analysis of terminally differentiated erythroblasts in eight inbred strains of mice identified reproducible variation at approximately 6% of DNase I hypersensitive sites (DHS). Only 30% of such variable DHS contain a sequence variant predictive of site variation. Nevertheless, sequence variants within variable DHS are more likely to be associated with complex traits than those in non-variant DHS, and variants associated with complex traits preferentially occur in variable DHS. Changes at a small proportion (less than 10%) of variable DHS are associated with changes in nearby transcriptional activity. Our results show that whilst DNA sequence variation is not the major determinant of variation in open chromatin, where such variants exist they are likely to be causal for complex traits.
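The claim that trait-associated variants preferentially occur in variable DHS is an enrichment statement. The abstract does not say which test the authors used, so the following is only a minimal sketch of one common way to quantify such an enrichment, a one-sided hypergeometric tail probability. The function name `hypergeom_enrichment_p` and all counts are hypothetical and are not taken from the study.

```python
from math import comb


def hypergeom_enrichment_p(total, n_variable, n_trait, overlap):
    """One-sided hypergeometric P(X >= overlap).

    total      -- all variants located in any DHS (hypothetical count)
    n_variable -- variants located in variable DHS
    n_trait    -- variants associated with a complex trait
    overlap    -- trait-associated variants that lie in variable DHS
    """
    upper = min(n_variable, n_trait)
    denom = comb(total, n_trait)
    # Sum the probability of every overlap at least as large as observed.
    tail = sum(
        comb(n_variable, k) * comb(total - n_variable, n_trait - k)
        for k in range(overlap, upper + 1)
    )
    return tail / denom


if __name__ == "__main__":
    # Illustrative numbers only: 1000 variants, 60 in variable DHS,
    # 50 trait-associated, 10 of which fall in variable DHS.
    print(hypergeom_enrichment_p(1000, 60, 50, 10))
```

A small tail probability would indicate more overlap than expected if trait-associated variants were spread across DHS at random.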
Project description:The observation that the genetic variants identified in genome-wide association studies (GWAS) frequently lie in non-coding regions of the genome that contain cis-regulatory elements suggests that altered gene expression underlies the development of many complex traits. To assess the impact of non-coding genetic variation in immune-related diseases efficiently and comprehensively, we emulated the whole-exome sequencing paradigm and developed "Immunoseq", a custom capture panel for the known DNase I hypersensitive sites (DHSs) in immune cells. We performed Immunoseq in 30 healthy individuals for whom we had existing transcriptome data from T cells. We identified a large number of novel non-coding variants in these samples. Relying on allele-specific expression measurements, we also showed that our selected capture regions are enriched for functional variants that affect differential allelic gene expression. The results from a replication set of 180 samples confirmed our observations. We show that Immunoseq is a powerful approach to detect novel rare variants in regulatory regions. We also demonstrate that these novel variants have a potential functional role in immune cells.
Project description:DNase I hypersensitive sites (DHSs) are a hallmark of chromatin regions containing regulatory DNA such as enhancers and promoters; however, the factors affecting the establishment and maintenance of these sites are not fully understood. We now show that HMGN1 and HMGN2, nucleosome-binding proteins that are ubiquitously expressed in vertebrate cells, maintain the DHS landscape of mouse embryonic fibroblasts (MEFs) synergistically. Loss of one of these HMGN variants led to a compensatory increase in binding of the remaining variant. Genome-wide mapping of the DHSs in Hmgn1(-/-), Hmgn2(-/-), and Hmgn1(-/-)n2(-/-) MEFs reveals that loss of both HMGN variants, but not of a single variant, leads to significant remodeling of the DHS landscape, especially at enhancer regions marked by H3K4me1 and H3K27ac. Loss of HMGN variants affects the induced expression of stress-responsive genes in MEFs and the transcription profiles of several mouse tissues, and leads to altered phenotypes that are not seen in mice lacking only one variant. We conclude that the compensatory binding of HMGN variants to chromatin maintains the DHS landscape and transcription fidelity and is necessary to retain wild-type phenotypes. Our study provides insight into mechanisms that maintain regulatory sites in chromatin and into functional compensation among nucleosome-binding architectural proteins.
Project description:The spatial configuration of the chicken alpha-globin gene domain in erythroid and lymphoid cells was studied by using the Chromosome Conformation Capture (3C) approach. Real-time PCR with TaqMan probes was employed to estimate the frequencies of cross-linking of different restriction fragments within the domain. In differentiated cultured erythroblasts and in 10-day chick embryo erythrocytes expressing 'adult' alpha(A) and alpha(D) globin genes the following elements of the domain were found to form an 'active' chromatin hub: upstream Major Regulatory Element (MRE), -9 kb upstream DNase I hypersensitive site (DHS), -4 kb upstream CpG island, alpha(D) gene promoter and the downstream enhancer. The alpha(A) gene promoter was not present in the 'active' chromatin hub although the level of alpha(A) gene transcription exceeded that of the alpha(D) gene. Formation of the 'active' chromatin hub was preceded by the assembly of multiple incomplete hubs containing MRE in combination with either -9 kb DHS or other regulatory elements of the domain. These incomplete chromatin hubs were present in proliferating cultured erythroblasts which did not express globin genes. In lymphoid cells only the interaction between the alpha(D) promoter and the CpG island was detected.
Project description:DNase I hypersensitive sites (DHS) are abundant in regulatory elements, such as promoters, enhancers and transcription factor binding sites. Many studies have revealed that disease-associated variants are concentrated in DHS-related regions. However, few studies have examined the roles of DHS-related variants in lung cancer. In this study, we performed a large-scale case-control study with 20 871 lung cancer cases and 15 971 controls to evaluate the associations between regulatory genetic variants in DHS and lung cancer susceptibility. Expression quantitative trait loci (eQTL) analysis and pathway-enrichment analysis were performed to identify the possible target genes and pathways. In addition, we performed motif-based analysis to explore lung-cancer-related motifs using the sequence kernel association test. Two novel variants, rs186332 in 20q13.3 (C>T, odds ratio [OR] = 1.17, 95% confidence interval [95% CI]: 1.10-1.24, P = 8.45 × 10^-7) and rs4839323 in 1p13.2 (T>C, OR = 0.92, 95% CI: 0.89-0.95, P = 1.02 × 10^-6), showed significant association with lung cancer risk. The eQTL analysis suggested that these two SNPs might regulate the expression of MRGBP and SLC16A1, respectively. Moreover, the expression of both MRGBP and SLC16A1 was aberrantly elevated in lung tumor tissues. The motif-based analysis identified 10 motifs related to the risk of lung cancer (P < 1.71 × 10^-4). Our findings suggest that variants in DHS might modify lung cancer susceptibility by regulating the expression of surrounding genes. This study provides deeper insight into the roles of DHS-related genetic variants in lung cancer.
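The odds ratios and 95% confidence intervals quoted above follow the usual case-control reporting format. As a minimal sketch of how such figures are commonly derived from a 2x2 table of allele counts, the following computes an odds ratio with a Wald confidence interval. The function name `odds_ratio_wald_ci` and the counts are hypothetical, and the study itself may have used a regression model with covariates rather than this unadjusted calculation.

```python
import math


def odds_ratio_wald_ci(case_alt, case_ref, ctrl_alt, ctrl_ref, z=1.96):
    """Return (odds ratio, lower bound, upper bound) for a 2x2 table.

    case_alt / case_ref -- alternate / reference allele counts in cases
    ctrl_alt / ctrl_ref -- alternate / reference allele counts in controls
    z                   -- normal quantile; 1.96 gives an approximate 95% CI
    """
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    # Standard error of log(OR) under the Wald approximation.
    se_log_or = math.sqrt(
        1 / case_alt + 1 / case_ref + 1 / ctrl_alt + 1 / ctrl_ref
    )
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical allele counts, for illustration only.
    or_, lo, hi = odds_ratio_wald_ci(20, 80, 10, 90)
    print(f"OR = {or_:.2f}, 95% CI: {lo:.2f}-{hi:.2f}")
```

An interval that excludes 1, as for both SNPs reported above, indicates an association at roughly the 5% level before any correction for multiple testing.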
Project description:It has been challenging to determine the disease-causing variant(s) for most major histocompatibility complex (MHC)-associated diseases. However, it is becoming increasingly clear that regulatory variation is pervasive and a fundamentally important mechanism governing phenotypic diversity and disease susceptibility. We gathered DNase I data from 136 human cells to characterize the regulatory landscape of the MHC region, including 4867 DNase I hypersensitive sites (DHSs). We identified thousands of regulatory elements that have been gained or lost in the human or chimpanzee genomes since their evolutionary divergence. We compared alignments of the DHSs across six primates and found 149 DHSs with convincing evidence of positive and/or purifying selection. Of these DHSs, 24 evolved rapidly in the human lineage compared to neutral sequences. Using FIMO (Find Individual Motif Occurrences) and UCSC (University of California, Santa Cruz) ChIP-seq data, we identified 15 instances of transcription-factor-binding motif gains within these 24 DHSs, including USF, MYC, MAX, MAFK, STAT1 and PBX3, and observed 16 GWAS (genome-wide association study) SNPs associated with diseases. Combining eQTL and Hi-C data, our results indicated that five SNPs located in human-gained motifs affect the expression of the corresponding genes, two of which closely matched DHS target genes. In addition, one SNP, rs7756521, which reaches genome-wide significance, likely affects DDR expression and represents a causal genetic variant for HIV-1 control. These results indicate that species-specific motif gains or losses in rapidly evolving DHSs in primate genomes might play a role in adaptive evolution, and they provide new evidence for a potentially causal role of these GWAS SNPs.
Project description:Finnish samples have been extensively utilized in studying single-gene disorders, where the founder effect has clearly aided in discovery, and more recently in genome-wide association studies of complex traits, where the founder effect has had less obvious impacts. As the field starts to explore rare variants' contribution to polygenic traits, it is of great importance to characterize and confirm the Finnish founder effect in sequencing data and to assess its implications for rare-variant association studies. Here, we employ forward simulation, guided by empirical deep resequencing data, to model the genetic architecture of quantitative polygenic traits in both the general European and the Finnish populations simultaneously. We demonstrate that power of rare-variant association tests is higher in the Finnish population, especially when variants' phenotypic effects are tightly coupled with fitness effects and therefore reflect a greater contribution of rarer variants. SKAT-O, variable-threshold tests, and single-variant tests are more powerful than other rare-variant methods in the Finnish population across a range of genetic models. We also compare the relative power and efficiency of exome array genotyping to those of high-coverage exome sequencing. At a fixed cost, less expensive genotyping strategies have far greater power than sequencing; in a fixed number of samples, however, genotyping arrays miss a substantial portion of genetic signals detected in sequencing, even in the Finnish founder population. As genetic studies probe sequence variation at greater depth in more diverse populations, our simulation approach provides a framework for evaluating various study designs for gene discovery.
Project description:The relapsing fever agent Borrelia hermsii undergoes multiphasic antigenic variation through gene conversion of a unique expression site on a linear plasmid by an archived variable antigen gene. To further characterize this mechanism we assessed the repertoire and organization of archived variable antigen genes by sequencing approximately 85% of plasmids bearing these genes. Most archived genes shared with the expressed gene a region of ≤62 nucleotides (nt), the upstream homology sequence (UHS), that surrounded the start codon. The 59 archived variable antigen genes were arrayed in clusters with 13 repetitive, 214 nt long downstream homology sequence (DHS) elements distributed among them. A fourteenth DHS element was downstream of the expression locus. Informative nucleotide polymorphisms in UHS regions and DHS elements were applied to the analysis of the expression site of relapse serotypes from 60 infected mice in a prospective study. For most recombinations, the upstream crossover occurred in the UHS's second half, and the downstream crossover was in the DHS's second half. Usually the closest archival DHS element was used, but occasionally a more distant DHS was employed. The downstream extragenic crossover site in B. hermsii contrasts with the upstream extragenic crossover site for antigenic variation in African trypanosomes.
Project description:DNase I hypersensitive sites (DHSs) provide important information on the presence of transcriptional regulatory elements and the state of chromatin in mammalian cells. Conventional DNase sequencing (DNase-seq) for genome-wide DHSs profiling is limited by the requirement of millions of cells. Here we report an ultrasensitive strategy, called single-cell DNase sequencing (scDNase-seq) for detection of genome-wide DHSs in single cells. We show that DHS patterns at the single-cell level are highly reproducible among individual cells. Among different single cells, highly expressed gene promoters and enhancers associated with multiple active histone modifications display constitutive DHS whereas chromatin regions with fewer histone modifications exhibit high variation of DHS. Furthermore, the single-cell DHSs predict enhancers that regulate cell-specific gene expression programs and the cell-to-cell variations of DHS are predictive of gene expression. Finally, we apply scDNase-seq to pools of tumour cells and pools of normal cells, dissected from formalin-fixed paraffin-embedded tissue slides from patients with thyroid cancer, and detect thousands of tumour-specific DHSs. Many of these DHSs are associated with promoters and enhancers critically involved in cancer development. Analysis of the DHS sequences uncovers one mutation (chr18: 52417839G>C) in the tumour cells of a patient with follicular thyroid carcinoma, which affects the binding of the tumour suppressor protein p53 and correlates with decreased expression of its target gene TXNL1. In conclusion, scDNase-seq can reliably detect DHSs in single cells, greatly extending the range of applications of DHS analysis both for basic and for translational research, and may provide critical information for personalized medicine.
Project description:Genetic mapping on fully sequenced individuals is transforming understanding of the relationship between molecular variation and variation in complex traits. Here we report a combined sequence and genetic mapping analysis in outbred rats that maps 355 quantitative trait loci for 122 phenotypes. We identify 35 causal genes involved in 31 phenotypes, implicating new genes in models of anxiety, heart disease and multiple sclerosis. The relationship between sequence and genetic variation is unexpectedly complex: at approximately 40% of quantitative trait loci, a single sequence variant cannot account for the phenotypic effect. Using comparable sequence and mapping data from mice, we show that the extent and spatial pattern of variation in inbred rats differ substantially from those of inbred mice and that the genetic variants in orthologous genes rarely contribute to the same phenotype in both species.
Project description:BACKGROUND: An exceedingly large number of sequence variants have been discovered through whole genome sequencing in most populations, including cattle. Deciphering which of these affect complex traits is a major challenge. In this study we hypothesize that variants in some functional classes, such as splice site regions, coding regions, DNA methylated regions and long noncoding RNA, will explain more variance in complex traits than others. Two variance component approaches were used to test this hypothesis: the first determines whether variants in a functional class capture a greater proportion of the variance than expected by chance; the second uses the proportion of variance explained when variants in all annotations are fitted simultaneously. RESULTS: Our data set consisted of 28.3 million imputed whole genome sequence variants in 16,581 dairy cattle with records for 6 complex trait phenotypes, including production and fertility. We found that sequence variants in splice site regions and synonymous classes captured the greatest proportion of the variance, explaining up to 50% of the variance across all traits. We also found that sequence variants in target sites for DNA methylation (genomic regions that are found to be highly methylated in bovine placentas) captured a significant proportion of the variance. Per sequence variant, splice site variants explain the highest proportion of variance in this study. The proportion of variance captured by the missense predicted deleterious (from SIFT) and missense tolerated classes was relatively small. CONCLUSION: The results demonstrate that using functional annotations to filter whole genome sequence variants into more informative subsets could be useful for prioritizing the variants that are more likely to be associated with complex traits.
In addition to variants found in splice sites and protein-coding genes, regulatory variants and those found in DNA methylated regions explained considerable variation in milk production and fertility traits. In our analysis synonymous variants captured a significant proportion of the variance, which raises the possibility that synonymous mutations have some effect or, more likely, that these variants are misannotated; alternatively, the results may reflect imperfect imputation of the actual causative variants.