Project description:Long-range activation is an essential property of enhancers, yet the features determining long-range enhancer activities have not been systematically investigated due to a lack of high-throughput methods to measure long-range enhancer activities efficiently. To address this gap, we present a long-range massively parallel reporter assay (long-range MPRA), a genome-integrated assay allowing the measurement of hundreds of enhancers at multiple distances away from a promoter in parallel. The long-range MPRA features two independent landing pads, which allow for modular control over the genome-integrated promoter and enhancer libraries. We showcased the capability of long-range MPRA by testing over 300 K562 enhancers, as well as a set of enhancer combinations, at distances up to 100 kb. We found that enhancers’ long-range activities are primarily determined by their intrinsic strength, with strong enhancers retaining more activity over long distances, while weak enhancers rapidly lose activity. Additionally, we found that GATA1-bound enhancers are more resistant to distance-dependent loss of activity, suggesting that TF binding also modulates long-range function. Finally, testing long-range enhancer activities with three different promoters (HBE, HBG and GAPDH) revealed that long-range E-P interactions rely not only on enhancer properties but also on promoter responsiveness.
Project description:Enhancers are critical cis-regulatory elements controlling gene expression during cell development and differentiation. However, genome-wide enhancer characterization has been challenging due to the lack of a well-defined relationship between enhancers and genes. Here, we applied a massively parallel reporter assay in Arabidopsis to measure enhancer activities across the genome. We identified 4327 enhancers with various combinations of epigenetic modifications that are distinct from those of animal enhancers. Furthermore, we showed that enhancers differ from promoters in their preference for transcription factors. Although some enhancers are not conserved and overlap with transposable elements forming clusters, enhancers are generally conserved across a thousand Arabidopsis species, suggesting they are selected under evolutionary pressure and could play critical roles in the regulation of important genes. Moreover, comparative analysis reveals that enhancers identified by different strategies do not overlap, suggesting these methods are complementary in nature. In sum, our work provides an additional catalog of enhancers and lays the foundation for further investigation into enhancers’ functional mechanisms in plants.
Project description:We employ a massively parallel reporter assay (MPRA) to measure the ex vivo activities of hundreds of K562 and HepG2 enhancers with known transcription factor motif instances. For seven selected motifs that correspond to known or predicted activators and repressors in the two cell types, we make directed modifications of the bases corresponding to these motifs and observe the changes in enhancer activity. Reporter mRNA-seq from HepG2 and K562 cells transfected with a ~55,000-plex MPRA plasmid pool containing 5,418 mutated human enhancer sequences, each linked to 10 distinct 10-nt tags. The reporter mRNA tags facilitate quantitation of their abundances. The same tags were also sequenced from the transfected MPRA plasmid pool to facilitate normalization to plasmid copy numbers.
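The normalization described above (reporter mRNA tag counts divided by plasmid tag counts, then a comparison of a mutated enhancer against its unmodified counterpart) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the pseudocount of 1, the use of the median across tags, and the toy counts are all assumptions made here for the example.

```python
from math import log2
from statistics import median


def tag_ratios(mrna_counts, plasmid_counts, pseudocount=1.0):
    """Per-tag mRNA/plasmid ratios (hypothetical helper).

    Tags with no reads in the plasmid pool are skipped because they
    cannot be normalized to plasmid copy number.
    """
    ratios = []
    for tag, dna in plasmid_counts.items():
        if dna <= 0:
            continue
        rna = mrna_counts.get(tag, 0)
        ratios.append((rna + pseudocount) / (dna + pseudocount))
    return ratios


def enhancer_activity(mrna_counts, plasmid_counts):
    """Summarize one enhancer as the median of its per-tag ratios (an assumption)."""
    return median(tag_ratios(mrna_counts, plasmid_counts))


def log2_effect(mutant_activity, wildtype_activity):
    """log2 change in activity caused by the motif modification."""
    return log2(mutant_activity / wildtype_activity)


# Toy counts, invented for illustration only (real constructs carry 10 tags each).
wt_plasmid = {"tag1": 9, "tag2": 19, "tag3": 39}
wt_mrna = {"tag1": 39, "tag2": 79, "tag3": 159}
mut_plasmid = {"tag4": 9, "tag5": 19}
mut_mrna = {"tag4": 9, "tag5": 19}

wt_activity = enhancer_activity(wt_mrna, wt_plasmid)
mut_activity = enhancer_activity(mut_mrna, mut_plasmid)
effect = log2_effect(mut_activity, wt_activity)
```

A negative `effect` would indicate that the modification reduced activity, as expected when an activator motif is disrupted; a positive value would be expected for a disrupted repressor motif.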
Project description:We apply a massively parallel reporter assay (MPRA) that relies on mRNA and plasmid tag sequencing (Tag-Seq) to compare the regulatory activities of more than 27,000 distinct variants of two inducible enhancers in human cells: a synthetic cAMP-regulated enhancer and the virus-inducible interferon beta enhancer. The resulting data define accurate maps of functional transcription factor binding sites in both enhancers at single-nucleotide resolution and can be used to train quantitative sequence-activity models (QSAMs). Reporter Tag-Seq from HEK293 cells transfected with each of six MPRA plasmid pools, with and without stimulation (forskolin or Sendai virus). The reporter mRNAs contain unique 10-nucleotide tags that facilitate quantitation of their abundances. The same tags were also sequenced from each transfected plasmid pool to facilitate normalization to plasmid copy numbers. The reporter constructs were designed according to two different mutagenesis strategies: 'single-hit scanning' and 'multi-hit sampling'. The specific variants are included in the processed data files.
Project description:Recent genome-wide association studies have established that most complex disease-associated loci are found in noncoding regions where defining their function is nontrivial. In this study, we leverage a modular massively parallel reporter assay (MPRA) to uncover sequence features linked to context-specific regulatory activity. We screened enhancer activity across a panel of 198-bp fragments spanning over 10k type 2 diabetes- and metabolic trait-associated variants in the 832/13 rat insulinoma cell line, a relevant model of pancreatic beta cells. We explored these fragments’ context sensitivity by comparing their activities when placed up- or downstream of a reporter gene, and in combination with either a synthetic housekeeping promoter (SCP1) or a more biologically relevant promoter corresponding to the human insulin gene (INS). We identified clear effects of MPRA construct design on measured fragment enhancer activity. Specifically, a subset of fragments (n = 702/11,656) displayed positional bias, evenly distributed across up- and downstream preference. A separate set of fragments exhibited promoter bias (n = 698/11,656), mostly towards the cell-specific INS promoter (73.4%). To identify sequence features associated with promoter preference, we used Lasso regression with 562 genomic annotations and discovered that fragments with INS promoter-biased activity are enriched for HNF1 motifs. HNF1 family transcription factors are key regulators of glucose metabolism disrupted in maturity onset diabetes of the young (MODY), suggesting genetic convergence between rare coding variants that cause MODY and common T2D-associated regulatory variants. We designed a follow-up MPRA containing HNF1 motif-enriched fragments and observed several instances where deletion or mutation of HNF1 motifs disrupted the INS promoter-biased enhancer activity, specifically in the beta cell model but not in a skeletal muscle cell line, another diabetes-relevant cell type. 
Together, our study suggests that cell-specific regulatory activity is partially influenced by enhancer-promoter compatibility and indicates that careful attention should be paid when designing MPRA libraries to capture context-specific regulatory processes at disease-associated genetic signals.
Project description:We apply a massively parallel reporter assay (MPRA) that relies on mRNA and plasmid tag sequencing (Tag-Seq) to compare the regulatory activities of more than 27,000 distinct variants of two inducible enhancers in human cells: a synthetic cAMP-regulated enhancer and the virus-inducible interferon beta enhancer. The resulting data define accurate maps of functional transcription factor binding sites in both enhancers at single-nucleotide resolution and can be used to train quantitative sequence-activity models (QSAMs).
Project description:We employ a massively parallel reporter assay (MPRA) to measure the ex vivo activities of hundreds of K562 and HepG2 enhancers with known transcription factor motif instances. For seven selected motifs that correspond to known or predicted activators and repressors in the two cell types, we make directed modifications of the bases corresponding to these motifs and observe the changes in enhancer activity.
Project description:Background: Massively parallel reporter assays (MPRAs) are an experimental technology for measuring the activity of thousands of candidate regulatory sequences or their variants in parallel, where the activity of individual sequences is measured from pools of sequence-tagged reporter genes. Activity is derived from the ratio of transcribed RNA to input DNA counts of associated tag sequences in each reporter construct, so-called barcodes. Recently, tools specifically designed to analyze MPRA data were developed that attempt to model the count data, accounting for its inherent variation. Of these tools, MPRAnalyze and mpralm are most widely used. MPRAnalyze models barcode counts to estimate the transcription rate of each sequence. While it has increased statistical power and robustness against outliers compared to mpralm, it is slow and has a high false discovery rate. Mpralm, a tool built on the R package Limma, estimates log fold-changes between different sequences. As opposed to MPRAnalyze, it is fast and has a low false discovery rate but is susceptible to outliers and has less statistical power. Results: We propose BCalm, an MPRA analysis framework aimed at addressing the limitations of the existing tools. BCalm is an adaptation of mpralm, but models individual barcode counts instead of aggregating counts per sequence. Leaving out the aggregation step increases statistical power and improves robustness to outliers, while being fast and precise. We show the improved performance over existing methods on both simulated MPRA data and a lentiviral MPRA library of 166,508 target sequences, including 82,258 allelic variants. Further, BCalm adds functionality beyond the existing mpralm package, such as preparing count input files from MPRAsnakeflow, as well as an option to test for sequences with enhancing or repressing activity. Its built-in plotting functionalities allow for easy interpretation of the results. Conclusions: With BCalm, we provide a new tool for analyzing MPRA data which is robust and accurate on real MPRA datasets. The package is available at https://github.com/kircherlab/BCalm .
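The difference between aggregating counts per sequence and keeping individual barcode values can be illustrated with a small Python sketch. It only computes log2 RNA/DNA ratios; it is not the statistical model used by BCalm, mpralm, or MPRAnalyze, and the function names, the pseudocount of 1, and the toy counts are assumptions introduced for this example.

```python
from math import log2
from statistics import median


def aggregated_activity(barcodes, pseudocount=1.0):
    """log2 ratio after summing RNA and DNA counts over all barcodes of one sequence."""
    rna = sum(r for r, _ in barcodes)
    dna = sum(d for _, d in barcodes)
    return log2((rna + pseudocount) / (dna + pseudocount))


def per_barcode_activities(barcodes, pseudocount=1.0):
    """One log2 RNA/DNA ratio per barcode, with no aggregation step."""
    return [log2((r + pseudocount) / (d + pseudocount)) for r, d in barcodes]


# Toy (RNA, DNA) counts for three barcodes of one sequence, invented for
# illustration; the third barcode is a deliberate outlier.
barcodes = [(39, 9), (79, 19), (399, 9)]

pooled = aggregated_activity(barcodes)
individual = per_barcode_activities(barcodes)
typical = median(individual)
```

In this toy case the single outlier barcode pulls the pooled value well above the level shared by the other two barcodes, while the per-barcode values keep the outlier visible as one observation among several.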
Project description:We performed a massively parallel reporter assay (MPRA) of 2,034 genomic regions containing single nucleotide polymorphisms (4,587 regions tested with 96,328 barcodes) that are in high linkage disequilibrium with lead variants from an asthma GWAS (PMID: 31036433). Test sequences were transfected into 16HBE14o- Human Bronchial Epithelial Cells, and assessed for enhancer activity by comparing RNA counts to DNA input counts. The processed files contain the MPRA barcodes, read counts and activities.