biostudies-arrayexpress
001430
Raluca Gordan
Homo sapiens
https://www.ebi.ac.uk/biostudies/studies/E-GEOD-59845

Accurate predictions of the DNA binding specificities of transcription factors (TFs) are necessary for understanding gene regulatory mechanisms. Traditionally, predictive models are built based on nucleotide sequence features. Here, we employed three-dimensional DNA shape information obtained on a high-throughput basis to integrate intuitive DNA structural features into the modeling of TF binding specificities using support vector regression. We performed quantitative predictions of DNA binding specificities, using the DREAM5 dataset for 65 mouse TFs and genomic-context protein binding microarray data for three human basic helix-loop-helix TFs. DNA shape-augmented models compared favorably with sequence-based models for these predictions. Although both k-mer and DNA shape features encoded the interdependencies between nucleotide positions of the binding site, using DNA shape features reduced the dimensionality of the feature space compared to k-mer use. Finally, analyzing the weights of DNA shape-augmented models uncovered TF family-specific structural readout mechanisms that were not obvious from the nucleotide sequence.

Three genomic-context protein binding microarray (gcPBM) experiments of human transcription factors were performed. Briefly, the gcPBMs involved binding His-tagged transcription factors c-Myc, Max, and Mad1 (Mxd1) to double-stranded 180K Agilent microarrays in order to determine their binding specificity for putative DNA binding sites in native genomic context. The arrays represent three categories of 36-bp sequences: 1) bound probes, 2) unbound probes (or negative controls), and 3) test probes.
Bound probes corresponded to genomic regions bound in vivo by c-Myc, Max, or Mad2 (ChIP-seq P < 10^(-10) in HeLaS3 or K562 cells (ENCODE)) that contain at least two consecutive 8-mers with universal PBM E-score > 0.4 (Munteanu and Gordan, LNCS 2013). All putative binding sites occur at the same position within the probes on the array. “Unbound” probes corresponded to genomic regions with ChIP-seq P < 10^(-10) and a maximum 8-mer E-score < 0.2. We also designed test probes that contain, within constant flanking regions, all nnCACGTGnn 10-mers and 18 nnnCACGTGnnn 12-mers (where n = A, C, G, or T). Each DNA sequence represented on the array is present in 6 replicate spots. We report the gcPBM signal intensity for each spot. The PBM protocol is described in Berger et al., Nature Biotechnology 2006 (PMID 16998473).

Labeling - Proteins were tagged with an N-terminal His tag by cloning. Protein-bound arrays were incubated with Alexa-488-conjugated rabbit polyclonal antibody to His (Invitrogen).

Growth Protocol - Plasmids were transformed into the BL21 (DE3) expression strain of E. coli (New England BioLabs). Overnight bacterial culture (5 mL) was diluted into 1 L LB (10 g BACTO Tryptone (BD); 5 g BACTO Yeast Extract (BD); 10 g NaCl; pH 7.5) and grown to an optical density of 0.6 at 600 nm. Protein expression was then induced with 1 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) for 3 hours at 37°C. Cells were pelleted by centrifugation at 12,000 rpm for 10 min and stored at -20°C. Each pellet (from 1 L culture) was thawed in 15 ml lysis buffer (150 mM Tris-HCl, pH 8.0; 150 mM NaCl; 2 mM dithiothreitol (DTT); 1 tablet complete Mini, EDTA-free protease inhibitor cocktail (Roche, #11 836 170 001); 50 μM C4H10O6Zn (zinc acetate); 1 mg/ml chicken egg white lysozyme (Sigma, # L6876)) and subjected to two freeze-thaw cycles in dry ice/ethanol.
To digest DNA, 10 μL recombinant, RNase-free DNase I (Roche, #04 716 728 001, 10 U/μl), 30 μL 1 M MgCl2, and 2 mL 10% Triton X-100 were added to each 15 ml lysis solution, and the mixture was digested at room temperature until the solution was runny (~1 hr). Solutions were centrifuged at 12,000 rpm for 25 min at 4°C, and the supernatant was used for protein purification.

Scanning - Protein-bound microarrays were scanned to detect Alexa-488-conjugated antibody (488 nm ex, 522 nm em) using at least three different laser power settings to best capture a broad range of signal intensities and ensure signal intensities below saturation for all spots. Microarray TIF images were analyzed using GenePix Pro version 6.0 software (Molecular Devices); bad spots were manually flagged and removed, and data from multiple Alexa 488 scans of the same slide were combined using masliner (MicroArray LINEar Regression) software.

Hybridization - Double-stranded microarrays were first pre-moistened in PBS / 0.01% Triton X-100 for 5 min and blocked with PBS / 2% (wt/vol) nonfat dried milk (Sigma) for 1 h. Microarrays were then washed once with PBS / 0.1% (vol/vol) Tween-20 for 5 min and once with PBS / 0.01% Triton X-100 for 2 min. Proteins were diluted to 200 or 100 nM in a 175-μl protein binding reaction containing PBS / 2% (wt/vol) milk / 51.3 ng/μl salmon testes DNA (Sigma) / 0.2 μg/μl bovine serum albumin (New England Biolabs). Preincubated protein binding mixtures were applied to individual chambers of a four-chamber gasket cover slip in a steel hybridization chamber (Agilent), and the assembled microarrays were incubated for 1 h at room temperature. Microarrays were again washed once with PBS / 0.5% (vol/vol) Tween-20 for 3 min, and then once with PBS / 0.01% Triton X-100 for 2 min.
Alexa-488-conjugated rabbit polyclonal antibody to His (Invitrogen) was diluted to 50 μg/ml in PBS / 2% milk and applied to a single-chamber gasket cover slip (Agilent), and the assembled microarrays were again incubated for 1 h at 20°C. Finally, microarrays were washed twice with PBS / 0.05% (vol/vol) Tween-20 for 3 min each, and once in PBS for 2 min. After each hour-long incubation step, microarrays and cover slips were disassembled in a staining dish filled with 500 ml of the first wash solution. All washes were performed in Coplin jars on an orbital shaker at 125 r.p.m. Immediately following each series of washes, microarrays were rinsed in PBS (slowly removed over approximately 10 seconds) to ensure removal of detergent and uniform drying.

Nucleic Acid Extraction - Full-length open reading frames were cloned into the Gateway pDEST17 (N-terminal His-tag) expression vector by recombinational cloning from previously created pENTR clones. All proteins were produced by over-expression in E. coli BL21 (DE3) cells (New England BioLabs), and purified by FPLC (AKTAprime plus) using HisTrap FF affinity columns (GE Healthcare). Anti-His Western blots were performed to assess protein quality and concentration.

MIAME Score, Raw Data, Organization, Assays and Data, Processed Data, MAGE-TAB Files, Array Designs

Data Transformation - To correct for any possible non-uniformities in protein binding, we adjusted the Alexa 488 signals according to their positions on the microarray. We calculated the median normalized intensity of the 15 x 15 block centered on each spot and divided the spot's signal by the ratio of the median within the block to the median over the entire chamber.
ID_REF =
VALUE = Normalized signal intensity

Unknown, Transcriptomics, Genomics, Proteomics

Until now, it has been reasonably assumed that specific base-pair recognition is the only mechanism controlling the specificity of transcription factor (TF)-DNA binding.
Contrary to this assumption, here we show that nonspecific DNA sequences possessing certain repeat symmetries, when present outside of specific TF binding sites (TFBSs), statistically control TF-DNA binding preferences. We used high-throughput protein-DNA binding assays to measure the binding levels and free energies of binding for several human TFs to tens of thousands of short DNA sequences with varying repeat symmetries. Based on statistical mechanics modeling, we identify a new protein-DNA binding mechanism induced by DNA sequence symmetry in the absence of specific base-pair recognition, and experimentally demonstrate that this mechanism indeed governs protein-DNA binding preferences.

ChIP-chip by array
Homo sapiens
Protein-DNA binding in the absence of specific base-pair recognition.
Afek A, Schipper JL, Horton J, Gordân R, Lukatsky DB
Raluca Gordan
143
false
Quantitative modeling of transcription factor binding specificities using DNA shape

2014-11-04T00:00:00Z
2023-09-10T04:06:32.727Z
2022-03-14T13:03:25.291Z
E-GEOD-59845
GSE59845
25313048
EFO_0002760
10.1073/pnas.1410569111
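
Note on the Data Transformation step: the spatial adjustment described in this record (divide each spot's signal by the ratio of its 15 x 15 block median to the chamber-wide median) can be sketched roughly as below. This is a minimal illustration, not the submitters' code. The function name `spatially_normalize`, the representation of one chamber as a 2-D NumPy array, the use of NaN for manually flagged spots, and the truncation of blocks at the chamber edges are all assumptions that the record does not specify.

```python
import numpy as np

def spatially_normalize(signal, block=15):
    """Adjust spot intensities for position-dependent binding non-uniformity.

    signal : 2-D array of spot intensities for one chamber, laid out by
             row/column position; NaN marks flagged (removed) spots.
             (Hypothetical layout, assumed for this sketch.)
    block  : side length of the square neighbourhood centred on each spot.
    """
    signal = np.asarray(signal, dtype=float)
    half = block // 2
    chamber_median = np.nanmedian(signal)      # median over the entire chamber
    adjusted = np.full(signal.shape, np.nan)
    n_rows, n_cols = signal.shape

    for r in range(n_rows):
        for c in range(n_cols):
            if np.isnan(signal[r, c]):
                continue                       # leave flagged spots as NaN
            # Block centred on the spot; truncated at chamber edges (assumption).
            window = signal[max(0, r - half):r + half + 1,
                            max(0, c - half):c + half + 1]
            block_median = np.nanmedian(window)
            # Divide the spot's signal by (block median / chamber median).
            adjusted[r, c] = signal[r, c] / (block_median / chamber_median)
    return adjusted


if __name__ == "__main__":
    # Toy chamber: the right half is uniformly twice as bright as the left half.
    chamber = np.full((40, 40), 100.0)
    chamber[:, 20:] = 200.0
    out = spatially_normalize(chamber)
    print(out[0, 0], out[0, 39])
```

In the toy chamber, spots far from the boundary between the two halves end up on a common scale, which is the intended effect of the correction. Spots near the boundary are only partially corrected, because their blocks straddle both halves.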