Project description:Sea stars and sea urchins are model systems for interrogating the kinds of deep evolutionary changes that have restructured developmental gene regulatory networks (GRNs). Although cis-regulatory DNA evolution is likely the predominant mechanism of change, it was recently shown that Tbrain (Tbr), a T-box transcription factor, has evolved a changed preference for a low-affinity, secondary binding motif, while its primary, high-affinity motif is conserved. To date, however, no genome-wide comparisons have been performed to provide an unbiased assessment of GRN evolution between these taxa, and no study has attempted to determine the interplay between transcription factor binding motif evolution and GRN topology. This study measures genome-wide binding of Tbrain orthologs by ChIP-sequencing and associates each ortholog with putative target genes to assess global function. Targets of both factors are enriched for other regulatory genes, although the nonoverlapping sets of functional enrichments in the two datasets suggest substantially diverged functions. Low-affinity binding motifs are significantly less frequent in sea urchin than in sea star, but both motif types are associated with genes from a range of functional categories. Only a small fraction (∼10%) of genes are predicted to be orthologous targets. Collectively, these data indicate that Tbr has evolved significantly different developmental roles in these echinoderms, and that its targets, and the binding motifs in the associated cis-regulatory sequences, are dispersed throughout the GRN hierarchy rather than biased toward terminal processes or discrete functional blocks, which suggests extensive evolutionary tinkering.
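A figure such as "∼10% orthologous targets" depends on how the overlap is counted. The sketch below, a minimal illustration under stated assumptions, computes one plausible version: the fraction of species-A targets with a one-to-one ortholog that is also a target in species B. The function name, the toy gene IDs, and the choice of denominator are hypothetical, not the study's method.

```python
def orthologous_target_fraction(targets_a, targets_b, ortholog_map):
    """Fraction of species-A targets whose species-B ortholog is also a target.

    targets_a, targets_b: sets of gene IDs called as targets in each species.
    ortholog_map: dict mapping a species-A gene ID to its species-B ortholog.
    Assumption (hypothetical): only one-to-one orthologs are considered, and
    the denominator is the set of species-A targets that have an ortholog.
    """
    # Targets without a known ortholog cannot be compared across species.
    comparable = {g for g in targets_a if g in ortholog_map}
    if not comparable:
        return 0.0
    # Keep targets whose ortholog is also bound/assigned in species B.
    shared = {g for g in comparable if ortholog_map[g] in targets_b}
    return len(shared) / len(comparable)


# Toy example with made-up gene IDs.
targets_star = {"a1", "a2", "a3", "a4"}
targets_urchin = {"b1", "b9"}
orthologs = {"a1": "b1", "a2": "b2", "a3": "b3"}
print(orthologous_target_fraction(targets_star, targets_urchin, orthologs))
```

Using the union of both target sets, or all genes with orthologs, as the denominator would give a different number, so the definition should be stated alongside any reported percentage.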
Project description:Eukaryotic transcription factors (TFs) from the same structural family tend to bind similar DNA sequences, despite the ability of these TFs to execute distinct functions in vivo. The cell partly resolves this specificity paradox through combinatorial strategies and the use of low-affinity binding sites, which are better able to distinguish between similar TFs. However, because these sites have low affinity, it is challenging to understand how TFs recognize them in vivo. Here, we summarize recent findings and technological advancements that allow for the quantification and mechanistic interpretation of TF recognition across a wide range of affinities. We propose a model that integrates insights from the fields of genetics and cell biology to provide further conceptual understanding of TF binding specificity. We argue that in eukaryotes, target specificity is driven by an inhomogeneous 3D nuclear distribution of TFs and by variation in DNA binding affinity such that locally elevated TF concentration allows low-affinity binding sites to be functional.
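The concentration argument can be made concrete with the textbook single-site binding isotherm, θ = [TF] / ([TF] + K_d). The sketch below assumes non-cooperative binding at equilibrium and that free TF concentration approximates total TF; the K_d values and concentrations are made up for illustration.

```python
def fractional_occupancy(tf_conc_nM, kd_nM):
    """Equilibrium occupancy of a single site: theta = [TF] / ([TF] + Kd).

    Assumes simple non-cooperative binding and free TF ~ total TF.
    """
    return tf_conc_nM / (tf_conc_nM + kd_nM)


# Illustrative (made-up) affinities: high-affinity site Kd = 1 nM,
# low-affinity site Kd = 100 nM.
for conc in (1.0, 10.0, 100.0, 1000.0):
    high = fractional_occupancy(conc, 1.0)
    low = fractional_occupancy(conc, 100.0)
    print(f"[TF]={conc:7.1f} nM  high-affinity={high:.2f}  low-affinity={low:.2f}")
```

With these numbers, raising the local TF concentration 100-fold (from 1 nM to 100 nM) takes the low-affinity site from about 1% to 50% occupancy, the kind of effect attributed here to an inhomogeneous nuclear distribution of TFs.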
Project description:The position weight matrix (PWM) is the traditional motif model for representing transcription factor (TF) binding sites. It assumes that positions contribute independently to TF binding affinity, although this hypothesis does not fit the data perfectly, which partly explains why PWM hits are missing from a substantial fraction of ChIP-seq peaks. To study various modes of direct DNA binding by plant TFs, we compiled a benchmark collection of 111 ChIP-seq datasets for Arabidopsis thaliana and applied the traditional PWM along with two alternative motif models, BaMM and SiteGA, that account for dependencies between positions. Varying the stringency of the recognition thresholds suggested that hits of the PWM, BaMM, and SiteGA models are associated with sites of high/medium, any, and low affinity, respectively. At the medium recognition threshold, about 60% of ChIP-seq peaks contain PWM hits consisting of conserved core consensuses, while BaMM and SiteGA provide hits for an additional 15% of peaks in which a weaker core consensus is compensated by intra-motif dependencies. The presence of these dependencies in the motifs of the alternative models, and their absence in the traditional model, was confirmed with the dependency logo DepLogo, which visualizes position-wise partitioning of alignments of predicted sites. We illustrate detailed analysis of ChIP-seq profiles for the plant TFs CCA1, MYC2, and SEP3. Gene Ontology (GO) enrichment analysis revealed that, among the three motif models, SiteGA had the highest proportion of predicted genes with significantly enriched GO terms. Both alternative motif models extend the set of PWM-predicted sites more for MYC2 and SEP3, which have condition- or tissue-specific functions, than for CCA1, which has housekeeping functions.
Overall, the combined application of standard and alternative motif models helps detect various modes of direct TF-DNA interaction in the largest possible fraction of ChIP-seq loci.
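To see how intra-motif dependencies can rescue a site that a PWM cannot distinguish, compare a PWM with a first-order (adjacent-position) Markov model trained on the same sites. This is a simplified illustration, not BaMM (which uses higher-order Bayesian Markov models) or SiteGA; the toy training sites, the uniform background, and all function names are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"


def train_pwm(sites, pseudocount=0.5, background=0.25):
    """Zero-order model: independent per-position log-odds scores (a PWM)."""
    pwm = []
    for i in range(len(sites[0])):
        counts = Counter(s[i] for s in sites)
        total = len(sites) + 4 * pseudocount
        pwm.append({b: math.log((counts[b] + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def train_markov1(sites, pseudocount=0.5, background=0.25):
    """First-order inhomogeneous Markov model: P(base_i | base_{i-1}) per position."""
    start = train_pwm([s[:1] for s in sites], pseudocount, background)[0]
    transitions = []
    for i in range(1, len(sites[0])):
        pairs = Counter((s[i - 1], s[i]) for s in sites)
        table = {}
        for prev in BASES:
            row_total = sum(pairs[(prev, b)] for b in BASES) + 4 * pseudocount
            for b in BASES:
                p = (pairs[(prev, b)] + pseudocount) / row_total
                table[(prev, b)] = math.log(p / background)
        transitions.append(table)
    return start, transitions


def score_pwm(pwm, seq):
    return sum(col[b] for col, b in zip(pwm, seq))


def score_markov1(model, seq):
    start, transitions = model
    score = start[seq[0]]
    for i in range(1, len(seq)):
        score += transitions[i - 1][(seq[i - 1], seq[i])]
    return score


# Toy sites in which positions 2 and 3 co-vary (A with C, G with T).
sites = ["TACG", "TACG", "TGTG", "TGTG"]
pwm = train_pwm(sites)
markov = train_markov1(sites)
for candidate in ("TACG", "TATG"):
    print(candidate, round(score_pwm(pwm, candidate), 3),
          round(score_markov1(markov, candidate), 3))
```

The PWM scores TACG and TATG identically because each position is considered on its own, while the Markov model penalizes the A-T pairing that never occurs in the training sites.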
Project description:Transcription factors (TFs) alter gene expression in response to changes in the environment through sequence-specific interactions with DNA. These interactions are best portrayed as a landscape of TF binding affinities. Current methods to study sequence-specific binding preferences suffer from limited dynamic range, sequence bias, lack of specificity, and limited throughput. We have developed a microfluidic device for SELEX Affinity Landscape MAPping (SELMAP) of TF binding, which allows high-throughput measurement of 16 proteins in parallel. We used it to measure the relative affinities of full-length Pho4, AtERF2, and Btd proteins to millions of different DNA binding sites, and detected both high- and low-affinity interactions under equilibrium conditions, generating a comprehensive landscape of relative TF affinities for all possible DNA 6-mers, and even for 10-mers with increased sequencing depth. Low quantities of both the TFs and the DNA oligomers were sufficient for obtaining high-quality results, significantly reducing experimental costs. SELMAP allows in-depth screening of hundreds of TFs and provides a means for better understanding of the regulatory processes that govern gene expression.
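A common first step in analyzing SELEX-style sequencing data is to compare k-mer frequencies in the protein-bound library with those in the input library; the enrichment ratio serves as a rough proxy for relative affinity. The sketch below is a generic illustration assuming plain-string reads, not SELMAP's actual pipeline; it ignores reverse complements and multiple selection rounds, and the function names are hypothetical. CACGTG, the E-box bound by Pho4, appears only as a toy example.

```python
from collections import Counter
from itertools import product


def kmer_counts(reads, k):
    """Count every k-mer in the reads, skipping windows with non-ACGT characters."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def relative_enrichment(bound_reads, input_reads, k=6, pseudocount=1.0):
    """Ratio of k-mer frequency in the bound library to that in the input library."""
    bound = kmer_counts(bound_reads, k)
    background = kmer_counts(input_reads, k)
    n_bound = sum(bound.values())
    n_input = sum(background.values())
    n_kmers = 4 ** k
    scores = {}
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        # Pseudocounts keep unseen k-mers from producing zero or infinite ratios.
        f_bound = (bound[kmer] + pseudocount) / (n_bound + pseudocount * n_kmers)
        f_input = (background[kmer] + pseudocount) / (n_input + pseudocount * n_kmers)
        scores[kmer] = f_bound / f_input
    return scores


# Toy libraries: bound reads carry the E-box, input reads do not.
scores = relative_enrichment(["CACGTGAA"] * 10, ["ACGTACGTAC"] * 10)
print(sorted(scores, key=scores.get, reverse=True)[:3])
```

Enumerating all 4^k k-mers, including unseen ones, is what allows a complete landscape; at k = 10 that is about one million entries, which is why greater sequencing depth is needed for longer k-mers.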
Project description:Sequence-specific binding by transcription factors (TFs) interprets regulatory information encoded in the genome. Using recently published universal protein binding microarray (PBM) data on the in vitro DNA binding preferences of these proteins for all possible 8-base-pair sequences, we examined the evolutionary conservation of, and enrichment within putative regulatory regions for, the binding sequences of a diverse library of 104 nonredundant mouse TFs spanning 22 different DNA-binding domain structural classes. We found that not only high-affinity binding sites, but also numerous moderate- and low-affinity binding sites, are under negative selection in the mouse genome. These 8-mers occur preferentially in putative regulatory regions of the mouse genome, including CpG islands and non-exonic ultraconserved elements (UCEs). Many of the TFs whose PBM "bound" 8-mers are enriched within sets of tissue-specific UCEs are expressed in the same tissue(s) in which those UCEs drive gene expression. Phylogenetically conserved motif occurrences of various TFs were also enriched in the noncoding sequence surrounding numerous gene sets corresponding to Gene Ontology categories and tissue-specific gene expression clusters, suggesting involvement in transcriptional regulation of those genes. Altogether, our results indicate that many of the sequences bound by these proteins in vitro, including lower-affinity DNA sequences, are likely to be functionally important in vivo. This study not only provides an initial analysis of the potential regulatory associations of 104 mouse TFs, but also presents an approach for the functional analysis of TFs from any other metazoan genome as their DNA binding preferences are determined by PBMs or other technologies.
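Testing whether PBM "bound" 8-mers occur preferentially in a set of regulatory regions reduces to counting 8-mer occurrences in foreground versus background sequences. Because double-stranded DNA can be read on either strand, each 8-mer is collapsed with its reverse complement. This is a minimal sketch with hypothetical names, a hypothetical bound 8-mer, and a simple add-one frequency ratio, not the statistical test used in the study.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase ACGT string."""
    return seq[::-1].translate(COMPLEMENT)


def canonical(kmer):
    """Represent a k-mer and its reverse complement by the lexicographically smaller one."""
    return min(kmer, revcomp(kmer))


def count_hits(seqs, bound_kmers, k=8):
    """Return (windows matching a bound k-mer on either strand, total windows)."""
    bound = {canonical(x) for x in bound_kmers}
    hits = total = 0
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            total += 1
            if canonical(seq[i:i + k]) in bound:
                hits += 1
    return hits, total


def fold_enrichment(foreground, background, bound_kmers, k=8):
    """Add-one smoothed ratio of hit rates in foreground vs background windows."""
    fg_hits, fg_total = count_hits(foreground, bound_kmers, k)
    bg_hits, bg_total = count_hits(background, bound_kmers, k)
    return ((fg_hits + 1) / (fg_total + 1)) / ((bg_hits + 1) / (bg_total + 1))


# Toy data: a hypothetical bound 8-mer present in foreground only.
fg = ["TTCACGTGACTT"] * 5
bg = ["TTTTTTTTTTTT"] * 5
print(fold_enrichment(fg, bg, {"CACGTGAC"}))
```

A real analysis would use matched background regions (for example, with similar GC content) and a significance test, since raw ratios are sensitive to base composition.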
Project description:Gene expression is regulated in part by protein transcription factors that bind target regulatory DNA sequences. Predicting DNA binding sites and affinities from transcription factor sequence or structure is difficult; therefore, experimental data are required to link transcription factors to target sequences. We present a microfluidics-based approach for de novo discovery and quantitative biophysical characterization of DNA target sequences. We validated our technique by measuring sequence preferences for 28 Saccharomyces cerevisiae transcription factors with a variety of DNA-binding domains, including several that have proven difficult to study by other techniques. For each transcription factor, we measured relative binding affinities to oligonucleotides covering all possible 8-bp DNA sequences to create a comprehensive map of sequence preferences; for four transcription factors, we also determined absolute affinities. We expect that these data and future use of this technique will provide information essential for understanding transcription factor specificity, improving identification of regulatory sites and reconstructing regulatory interactions.
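Relative affinities become easier to interpret when expressed as free-energy differences, ΔΔG = -RT ln(K_rel), where K_rel is the association constant of a sequence relative to a reference sequence. One simple way to obtain absolute values, assuming the absolute K_d of a reference sequence is known, is to rescale the relative ones. The sketch below assumes 25 °C and single-site equilibrium binding; the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_relative_affinity(rel_affinity, temp_k=298.15):
    """Binding free-energy change (kcal/mol) relative to the reference sequence.

    rel_affinity = Ka(sequence) / Ka(reference); values < 1 mean weaker binding,
    which gives a positive ddG.
    """
    return -R_KCAL * temp_k * math.log(rel_affinity)


def absolute_kd(rel_affinity, kd_ref):
    """Rescale a relative affinity to an absolute Kd, given the reference Kd."""
    return kd_ref / rel_affinity


# A sequence bound 10-fold more weakly than the reference.
print(round(ddg_from_relative_affinity(0.1), 3))  # about 1.36 kcal/mol
print(absolute_kd(0.1, 5.0))                      # Kd in the same units as kd_ref
```

Working in ΔΔG puts every 8-mer on an additive energy scale, which is the form most biophysical models of TF specificity assume.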
Project description:Sequence-specific DNA-binding proteins, including transcription factors (TFs), are key determinants of gene regulation and chromatin architecture. TF profiling is commonly carried out by formaldehyde cross-linking and sonication followed by chromatin immunoprecipitation (X-ChIP). We describe a method to profile TF binding at high resolution without cross-linking. We begin with micrococcal nuclease-digested, non-cross-linked chromatin and then perform affinity purification of TFs followed by paired-end sequencing. The resulting profiles of occupied regions of genomes from affinity-purified naturally isolated chromatin (ORGANIC) for Saccharomyces cerevisiae Abf1 and Reb1 provide high-resolution maps that are accurate (as judged by the presence of known TF consensus motifs in identified binding sites), are not biased toward accessible chromatin, and do not require input normalization. We profiled Drosophila melanogaster GAGA factor and Pipsqueak to test ORGANIC performance on larger genomes. Our results suggest that ORGANIC profiling is a widely applicable high-resolution method for sensitive and specific profiling of direct protein-DNA interactions.
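With paired-end data, the midpoint of each sequenced fragment is a natural estimate of where the protein sat, and fragments protected by a bound TF are typically shorter than nucleosomal (~147 bp) fragments. The sketch below piles up fragment midpoints after a length filter; the (chrom, start, end) half-open coordinate convention, the function name, and the 120 bp cutoff are hypothetical choices, not ORGANIC's published parameters.

```python
from collections import Counter


def fragment_midpoint_pileup(fragments, min_len=0, max_len=120):
    """Count fragment midpoints per (chrom, position).

    fragments: iterable of (chrom, start, end) for properly paired reads,
    0-based half-open coordinates (an assumption). Fragments outside
    [min_len, max_len] are discarded; the 120 bp default is hypothetical.
    """
    pileup = Counter()
    for chrom, start, end in fragments:
        length = end - start
        if min_len <= length <= max_len:
            pileup[(chrom, (start + end) // 2)] += 1
    return pileup


# Two short fragments share a midpoint; the long one is filtered out.
frags = [("chrI", 100, 180), ("chrI", 102, 178), ("chrI", 500, 800)]
print(fragment_midpoint_pileup(frags))
```

Scoring midpoints rather than full fragment coverage sharpens peaks toward the protected footprint, which is one reason paired-end data support high-resolution maps.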
Project description:The position weight matrix (PWM) is widely used to represent the sequence specificity of transcription factors. In most cases, each element is defined as the log-likelihood ratio of a base appearing at a certain position, which is estimated from a finite number of known binding sites. To avoid bias due to this small sample size, a numeric value called a pseudocount is usually allocated to each position, and a fraction of it proportional to the background base composition is added to each element. So far, there has been no consensus on the optimal pseudocount value. In this study, we simulated the sampling process by artificially generating binding sites based on observed nucleotide frequencies in a public PWM database, and then compared the generated matrix with an added pseudocount value to the original frequency matrix using various measures. Although the results differed somewhat between measures, in many cases we could find an optimal pseudocount value for each matrix. These optimal values are independent of the sample size and are clearly correlated with the entropy of the original matrices, meaning that larger pseudocount values are preferable for less conserved binding sites. As a simple representative, we suggest a value of 0.8 for practical use.
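The pseudocount scheme described above can be written as w(b,i) = log2[((n(b,i) + p·q(b)) / (N + p)) / q(b)], where n(b,i) is the count of base b at position i, N the number of sites, q(b) the background frequency, and p the pseudocount. A minimal sketch with the suggested default p = 0.8 follows; the function name and the uniform background are assumptions.

```python
import math


def pwm_with_pseudocount(counts, pseudocount=0.8, background=None):
    """Build a log2-odds PWM from per-position base counts.

    counts: list of dicts {base: count}, one per motif position.
    The pseudocount p is split across bases in proportion to the background
    frequencies q(b), so each column still sums to probability 1.
    """
    if background is None:
        background = {b: 0.25 for b in "ACGT"}  # assumed uniform background
    pwm = []
    for column in counts:
        n = sum(column.get(b, 0) for b in "ACGT")
        pwm.append({
            b: math.log2((column.get(b, 0) + pseudocount * background[b])
                         / (n + pseudocount) / background[b])
            for b in "ACGT"
        })
    return pwm


# One fully conserved position observed in 10 sites.
pwm = pwm_with_pseudocount([{"A": 10}])
print({b: round(w, 3) for b, w in pwm[0].items()})
```

Without the pseudocount, an unobserved base would have probability zero and an undefined (-∞) log-odds score; with p = 0.8 it receives a finite penalty while the column probabilities still sum to one.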
Project description:Growth hormone exerts its biological effects through a sequential, hormone-induced receptor homodimerization mechanism. Using a mutagenesis-scanning analysis of 81 single and 32 pairwise double mutations, we show that the hormone's two spatially distal receptor binding sites (Site1 and Site2) are allosterically coupled. These allosteric effects are focused among relatively few residues centered on the interaction between Asp-116 of the hormone and Trp-169 of the receptor in Site2. A rearrangement of this interaction, triggered by mutations in Site1, produces a major conformational and energetic reorganization of Site2, surprisingly without a reduction in overall binding affinity. Additionally, the data suggest a change in the conformational dynamics of several groups in Site2 that appear to be important in defining the Site2 interaction. Changes in binding energy of the affected Site2 residues usually range in magnitude from 3- to 60-fold, but in one case are as large as 10^4-fold.
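Pairwise double mutants are commonly analyzed as double-mutant cycles: if two residues act independently, the free-energy cost of the double mutant equals the sum of the single-mutant costs, and any deviation is a coupling energy, ΔΔG_int = ΔΔG_12 - ΔΔG_1 - ΔΔG_2. The sketch below converts fold changes in K_d into ΔΔG at 25 °C; it is a generic illustration, not necessarily the exact analysis used in this study, and the function names are hypothetical.

```python
import math

RT_KCAL = 1.987204e-3 * 298.15  # RT in kcal/mol at 25 degrees C


def ddg(fold_change):
    """ddG (kcal/mol) from fold change = Kd(mutant) / Kd(wild type); >1 is weaker binding."""
    return RT_KCAL * math.log(fold_change)


def coupling_energy(fold_1, fold_2, fold_12):
    """Double-mutant-cycle coupling: ddG(double) - ddG(single 1) - ddG(single 2)."""
    return ddg(fold_12) - ddg(fold_1) - ddg(fold_2)


# Independent residues: effects multiply, so coupling is ~0.
print(round(coupling_energy(10, 10, 100), 6))
# Coupled residues: the double mutant is less disruptive than predicted.
print(round(coupling_energy(10, 10, 10), 3))
```

On this scale, the 3- to 60-fold changes mentioned above correspond to roughly 0.65 to 2.4 kcal/mol, and a 10^4-fold change to about 5.5 kcal/mol.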