Project description: L1 transposons occupy 17% of the human genome and are widely exapted for the regulation of human genes, particularly in breast cancer, where we have previously shown abundant cancer-specific transcription factor (TF) binding sites within the L1PA2 subfamily. In the current study, we performed a comprehensive analysis of TF binding activities in primate-specific L1 subfamilies and identified pervasive exaptation events amongst these evolutionarily related L1 transposons. By motif scanning, we predicted diverse and abundant TF binding potential within the L1 transposons. We confirmed substantial TF binding activity in the L1 subfamilies using TF binding sites consolidated from an extensive collection of publicly available ChIP-seq datasets. Young L1 subfamilies (L1HS, L1PA2 and L1PA3) contributed abundant TF binding sites in MCF7 cells, primarily via their 5' UTR. This is expected, as the L1 5' UTR hosts cis-regulatory elements that are crucial for L1 replication and mobilisation. Interestingly, the ancient L1 subfamilies, in which 5' truncation was common, displayed comparable TF binding capacity through their 3' ends, suggesting a previously unnoticed alternative exaptation mechanism in L1 transposons. Overall, primate-specific L1 transposons were extensively exapted for TF binding in MCF7 breast cancer cells and are likely prominent genetic players modulating breast cancer transcriptional regulation.
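Motif scanning of the kind used here to predict TF binding potential is commonly done by sliding a position weight matrix (PWM) along a sequence and keeping windows whose log-odds score passes a threshold. The sketch below assumes a toy 4-bp count matrix, a uniform 0.25 background, a pseudocount of 0.5, and an arbitrary threshold; none of these come from the study, which does not name its scanning tool or matrices here.

```python
import math

# Toy position frequency matrix (counts per position) for a hypothetical
# 4-bp motif; real scans would load matrices from a motif database.
TOY_PFM = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 0, "T": 1},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 1, "C": 0, "G": 0, "T": 9},
]

def to_log_odds(pfm, pseudocount=0.5, background=0.25):
    """Convert counts to log2-odds scores against a uniform background."""
    pwm = []
    for column in pfm:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2((column[base] + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pwm

def scan(sequence, pwm, threshold):
    """Return (start, score) for every window scoring at or above threshold."""
    width = len(pwm)
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(column[base] for column, base in zip(pwm, window))
        if score >= threshold:
            hits.append((start, round(score, 2)))
    return hits
```

For example, `scan("TTACGTTT", to_log_odds(TOY_PFM), threshold=5.0)` reports a single hit at position 2. A fuller scan would also score the reverse complement and would usually derive the threshold from a p-value rather than a fixed cutoff.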
Project description: De novo genes emerge from noncoding regions of genomes via a succession of mutations. Among other effects, such mutations activate transcription and create a new open reading frame (ORF). Although the mechanisms underlying ORF emergence are well documented, relatively little is known about the mechanisms enabling new transcription events. Yet, in many species a continuum between absent and very prominent transcription has been reported for essentially all regions of the genome. In this study, we searched for de novo transcripts by using newly assembled genomes and transcriptomes of seven inbred lines of Drosophila melanogaster, originating from six European populations and one African population. This setup allowed us to detect sample-specific de novo transcripts and compare them to their homologous nontranscribed regions in other samples, as well as to genic and intergenic control sequences. We studied the association of de novo emerged transcripts with transposable elements (TEs) and the enrichment of transcription factor motifs upstream of them, and compared them with regulatory elements. We found that de novo transcripts overlap with TEs more often than expected by chance. The emergence of new transcripts correlates with regions of high guanine-cytosine content and with TE expression. Moreover, upstream regions of de novo transcripts are highly enriched with regulatory motifs. Such motifs are more enriched in new transcripts overlapping with TEs, particularly DNA TEs, and are more conserved upstream of de novo transcripts than upstream of their 'nontranscribed homologs'. Overall, our study demonstrates that TE insertion is important for transcript emergence, partly by introducing new regulatory motifs from DNA TE families.
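The finding that de novo transcripts overlap TEs more often than expected by chance is the kind of claim a permutation test can check: count overlaps for the real transcripts, then for many random placements of intervals with the same lengths. A minimal sketch, assuming a single sequence of known length, half-open (start, end) coordinates, and uniform random placement as the null; the study's actual null model is not described here, and realistic versions usually restrict placement to comparable regions.

```python
import random

def overlaps(a, b):
    """Half-open intervals (start, end) overlap if each starts before the other ends."""
    return a[0] < b[1] and b[0] < a[1]

def count_overlapping(transcripts, tes):
    """Number of transcripts overlapping at least one TE."""
    return sum(any(overlaps(t, te) for te in tes) for t in transcripts)

def permutation_test(transcripts, tes, genome_length, n_perm=1000, seed=0):
    """Return (observed count, empirical one-sided p-value) for excess TE overlap."""
    rng = random.Random(seed)
    observed = count_overlapping(transcripts, tes)
    at_least = 0
    for _ in range(n_perm):
        shuffled = []
        for start, end in transcripts:
            length = end - start
            # place an interval of the same length uniformly at random
            new_start = rng.randrange(0, genome_length - length + 1)
            shuffled.append((new_start, new_start + length))
        if count_overlapping(shuffled, tes) >= observed:
            at_least += 1
    # add-one correction keeps the empirical p-value strictly positive
    return observed, (at_least + 1) / (n_perm + 1)
```

Fixing the seed makes the result reproducible; the add-one correction means the smallest reportable p-value is 1/(n_perm + 1).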
Project description: Concern about the reproducibility and reliability of biomedical research has been rising. An understudied issue is the prevalence of sample mislabeling, one impact of which would be invalid comparisons. We studied this issue in a corpus of human transcriptomics studies by comparing the provided annotations of sex to the expression levels of sex-specific genes. We identified apparently mislabeled samples in 46% of the datasets studied, yielding a 99%-confidence lower-bound estimate of 33% for all studies. In a separate analysis of a set of datasets concerning a single cohort of subjects, 2 of 4 had mislabeled samples, indicating laboratory mix-ups rather than data recording errors. The number of mixed-up samples per study was generally small; however, because our method can identify only a subset of potential mix-ups, our estimate of the breadth of the problem is conservative. Our findings emphasize the need for more stringent sample tracking, and that re-users of published data must be alert to the possibility of annotation and labelling errors.
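Comparing annotated sex with the expression of sex-specific genes can be sketched as a simple rule: high XIST without Y-linked expression suggests a female sample, the reverse suggests a male one, and anything else is left uncalled. The marker genes, the log2 threshold of 5.0, and the function names below are illustrative assumptions, not the study's published method. Leaving ambiguous samples uncalled keeps the check conservative in the same spirit as the estimate above: only clear disagreements are flagged.

```python
from typing import Optional

# Marker genes assumed for this sketch: XIST is expressed from the inactive X
# in female cells; RPS4Y1, DDX3Y and KDM5D are Y-linked genes.
FEMALE_MARKER = "XIST"
MALE_MARKERS = ("RPS4Y1", "DDX3Y", "KDM5D")

def predict_sex(expr: dict, threshold: float = 5.0) -> Optional[str]:
    """expr maps gene symbol -> log2 expression for one sample."""
    xist_high = expr.get(FEMALE_MARKER, 0.0) > threshold
    y_high = max(expr.get(gene, 0.0) for gene in MALE_MARKERS) > threshold
    if xist_high and not y_high:
        return "female"
    if y_high and not xist_high:
        return "male"
    return None  # both or neither high: too ambiguous to call

def flag_mislabeled(samples: dict, threshold: float = 5.0) -> list:
    """samples maps sample id -> (annotated sex, expression dict)."""
    flagged = []
    for sample_id, (annotated, expr) in samples.items():
        predicted = predict_sex(expr, threshold)
        # only confident calls that contradict the annotation are flagged
        if predicted is not None and predicted != annotated:
            flagged.append(sample_id)
    return flagged
```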
Project description: Professional screeners frequently verify photo IDs in industries such as professional security, bartending, and sales of age-restricted materials. Moreover, security screening is a vital tool for law enforcement in the search for missing or wanted persons. Nevertheless, previous research demonstrates that novice participants fail to spot fake IDs when they are rare (i.e., the low prevalence effect; LPE). To address whether this phenomenon also occurs with professional screeners, we conducted three experiments. Experiment 1 compared security professionals and non-professionals. Experiment 2 compared bar-security professionals, access-security professionals, and non-professionals. Finally, Experiment 3 added a newly created Professional Identity Training Questionnaire to determine whether and how aspects of professionals' employment predict ID-matching accuracy. Across all three experiments, all participants were susceptible to the LPE regardless of professional status. Neither length/type of professional experience nor length/type of training experience affected ID verification performance. We discuss task performance and survey responses with the aim of acknowledging and addressing this potential problem in real-world screening scenarios.
Project description: In many real-world settings, individuals rarely present another person's ID, and this rarity increases the likelihood that a screener will fail to detect one when it occurs. Three experiments examined how within-person variability (i.e., differences between two images of the same person) and feedback may influence criterion shifting, which is thought to be one of the sources of the low-prevalence effect (LPE). Participants made identity judgments of a target face and an ID under high, medium, or low mismatch prevalence. Feedback appeared after every trial, after error trials only, or after no trials. Experiment 1 used two controlled images taken on the same day. Experiment 2 used two controlled images taken at least 6 months apart. Experiment 3 used one controlled and one ambient image taken at least 1 year apart. Importantly, receiver operating characteristic curves revealed that feedback and greater within-person variability exacerbated the LPE by affecting both criterion and discriminability. These results carry implications for many real-world settings, such as border crossings and airports, where identity screening plays a major role in securing public safety.
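The abstract reports ROC analyses; a simpler, widely used summary of the same two quantities comes from equal-variance Gaussian signal detection theory, where discriminability is d' = z(H) - z(F) and criterion is c = -(z(H) + z(F))/2 for hit rate H and false-alarm rate F. The sketch below treats a mismatch (another person's ID) as the signal and applies a log-linear correction to the counts; both are assumptions for illustration, not the analysis used in these experiments.

```python
from statistics import NormalDist

def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion) under the equal-variance Gaussian model.

    'Signal' is a mismatch trial: a hit is a correctly detected mismatch,
    and a false alarm is a match trial wrongly called a mismatch.
    """
    # log-linear correction: avoids infinite z-scores when a rate is 0 or 1
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -(z(hit_rate) + z(fa_rate)) / 2
    return d_prime, criterion
```

A positive criterion indicates a bias toward responding "match", the conservative shift associated with the LPE. For instance, `sdt_measures(10, 10, 4, 176)` (20 mismatch and 180 match trials) yields a clearly positive criterion, whereas symmetric error counts yield a criterion of about zero.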
Project description: TFIIH is essential for both RNA polymerase II transcription and DNA repair, and mutations in TFIIH can result in human disease. Here, we determine the molecular architecture of human and yeast TFIIH by an integrative approach using chemical crosslinking/mass spectrometry (CXMS) data, biochemical analyses, and previously published electron microscopy maps. We identified four new conserved "topological regions" that function as hubs for TFIIH assembly and more than 35 conserved topological features within TFIIH, illuminating a network of interactions involved in TFIIH assembly and regulation of its activities. We show that one of these conserved regions, the p62/Tfb1 Anchor region, directly interacts with the DNA helicase subunit XPD/Rad3 in native TFIIH and is required for the integrity and function of TFIIH. We also reveal the structural basis for defects in patients with xeroderma pigmentosum and trichothiodystrophy, with mutations found at the interface between the p62 Anchor region and the XPD subunit.
Project description: Several studies have suggested that transcription factor (TF) binding to DNA may be impaired or enhanced by DNA methylation. We present MeDeMo, a toolbox for TF motif analysis that combines information about DNA methylation with models capturing intra-motif dependencies. In a large-scale study using ChIP-seq data for 335 TFs, we identify novel TFs that show binding behaviour associated with DNA methylation. Overall, we find that the presence of CpG methylation decreases the likelihood of binding for the majority of methylation-associated TFs. For a considerable subset of TFs, we show that intra-motif dependencies are pivotal for accurately modelling the impact of DNA methylation on TF binding. We illustrate that the novel methylation-aware TF binding models make it possible to predict differential ChIP-seq peaks and improve the genome-wide analysis of TF binding. Our work indicates that simplistic models that neglect the effect of DNA methylation on DNA binding may lead to systematic underperformance for methylation-associated TFs.
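One way to let a motif model see methylation is to extend the DNA alphabet with a symbol for methylated cytosine, so that a motif column can score C and methylated C differently. The sketch below assumes a hypothetical symbol "M", a toy position weight matrix, and independence between motif positions; it is not MeDeMo's implementation and, unlike the models described above, it ignores intra-motif dependencies.

```python
# Toy PWM over an extended alphabet: A, C, G, T plus the hypothetical 'M'
# (methylated C). Values are illustrative log-odds, chosen so that methylating
# the CpG cytosine at position 1 lowers the score (a methylation-sensitive TF).
TOY_PWM = [
    {"A": 1.5, "C": -1.0, "G": -1.0, "T": -1.0, "M": -1.0},
    {"A": -1.0, "C": 1.5, "G": -1.0, "T": -1.0, "M": -2.0},
    {"A": -1.0, "C": -1.0, "G": 1.5, "T": -1.0, "M": -1.0},
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.5, "M": -1.0},
]

def encode_methylation(sequence: str, methylated: list) -> str:
    """Replace each methylated cytosine (0-based position) with 'M'."""
    bases = list(sequence.upper())
    for pos in methylated:
        if bases[pos] != "C":
            raise ValueError(f"position {pos} is {bases[pos]!r}, not C")
        bases[pos] = "M"
    return "".join(bases)

def score(window: str, pwm: list) -> float:
    """Sum per-position scores; this assumes positions are independent."""
    if len(window) != len(pwm):
        raise ValueError("window and PWM lengths differ")
    return sum(column[base] for column, base in zip(pwm, window))
```

With these toy values, the unmethylated site `ACGT` scores 6.0, while `encode_methylation("ACGT", [1])` gives `AMGT`, which scores 2.5.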
Project description: Transcription factors are proteins that recognize specific DNA sequences and affect local transcriptional processes. They are the primary means by which all organisms control specific gene expression. Understanding which DNA sequences a particular transcription factor recognizes provides important clues to the set of genes that it regulates and, through this, to its potential biological functions. Insights may be gained through homology searches and genetic means. However, these approaches can be misleading, especially when comparing distantly related organisms or in cases of complicated transcriptional regulation. In this work, we used a biochemistry-based approach to determine the spectrum of DNA sequences specifically bound by the Thermus thermophilus HB8 TetR-family transcription factor TTHB023. The consensus sequence 5'-(a/c)Y(g/t)A(A/C)YGryCR(g/t)T(c/a)R(g/t)-3' was found to bind TTHB023 with nanomolar affinity. Analysis of the T. thermophilus HB8 genome mapped several TTHB023 consensus binding sites to the promoters of genes involved in fatty acid biosynthesis. Notably, some of these sites had not been identified previously through genetic approaches, possibly owing to their co-regulation by the Thermus thermophilus HB8 TetR-family transcriptional repressor TTHA0167. Our investigation provides additional evidence supporting the usefulness of a biochemistry-based approach for characterizing putative transcription factors, especially in the case of cooperative regulation.
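The TTHB023 consensus uses IUPAC codes (Y = C or T, R = A or G) and parenthesized alternatives, so mapping consensus sites in a genome can be sketched by translating it into a regular expression. A minimal sketch, assuming the upper/lowercase distinction in the published consensus can be ignored for exact matching; the function names are hypothetical and this is not the authors' pipeline. Because this consensus is its own reverse complement, a single-strand scan also finds sites lying on the opposite strand.

```python
import re

# IUPAC codes needed for this consensus (extend the table for other motifs).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]"}

TTHB023_CONSENSUS = "(a/c)Y(g/t)A(A/C)YGryCR(g/t)T(c/a)R(g/t)"

def consensus_to_regex(consensus: str) -> str:
    """Translate a consensus such as '(a/c)YA' into a regular expression."""
    parts = []
    s = consensus.upper()  # case is ignored in this sketch
    i = 0
    while i < len(s):
        if s[i] == "(":
            # '(x/y)' lists the alternatives allowed at one position
            j = s.index(")", i)
            parts.append("[" + "".join(s[i + 1:j].split("/")) + "]")
            i = j + 1
        else:
            parts.append(IUPAC[s[i]])
            i += 1
    return "".join(parts)

def find_sites(sequence: str, consensus: str) -> list:
    """Return (0-based start, site) for every match, including overlapping ones."""
    # a zero-width lookahead lets overlapping sites be reported
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]
```

For example, `find_sites("TTTTACGAACGATCGTTCGTGGGG", TTHB023_CONSENSUS)` returns `[(4, 'ACGAACGATCGTTCGT')]`.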
Project description: Structures of complete 10-subunit yeast TFIIH and of a nested set of subcomplexes, containing 5, 6, and 7 subunits, have been determined by electron microscopy (EM) and 3D reconstruction. Consistency among all the structures establishes the location of the "minimal core" subunits (Ssl1, Tfb1, Tfb2, Tfb4, and Tfb5), and additional densities can be specifically attributed to Rad3, Ssl2, and the TFIIK trimer. These results can be further interpreted by placing previous X-ray structures into the additional densities to give a preliminary picture of the RNA polymerase II preinitiation complex. In this picture, the key catalytic components of TFIIH, the Ssl2 ATPase/helicase and the Kin28 protein kinase, are in proximity to their respective targets, downstream promoter DNA and the RNA polymerase C-terminal domain.
Project description: In plants, post-transcriptional gene silencing (PTGS) is mediated by DICER-LIKE 1 (DCL1)-dependent microRNAs (miRNAs), which also trigger 21-nucleotide secondary short interfering RNAs (siRNAs) via RNA-DEPENDENT RNA POLYMERASE 6 (RDR6), DCL4 and ARGONAUTE 1 (AGO1), whereas transcriptional gene silencing (TGS) of transposons is mediated by 24-nucleotide heterochromatic (het)siRNAs, RDR2, DCL3 and AGO4 (ref. 4). Transposons can also give rise to abundant 21-nucleotide 'epigenetically activated' small interfering RNAs (easiRNAs) in DECREASED DNA METHYLATION 1 (ddm1) and DNA METHYLTRANSFERASE 1 (met1) mutants, as well as in the vegetative nucleus of pollen grains and in dedifferentiated plant cell cultures. Here we show that easiRNAs in Arabidopsis thaliana resemble secondary siRNAs, in that thousands of transposon transcripts are specifically targeted by more than 50 miRNAs for cleavage and processing by RDR6. Loss of RDR6, DCL4 or DCL1 in a ddm1 background results in loss of 21-nucleotide easiRNAs and severe infertility, but 24-nucleotide hetsiRNAs are partially restored, supporting an antagonistic relationship between PTGS and TGS. Thus, miRNA-directed easiRNA biogenesis is a latent mechanism that specifically targets transposon transcripts, but only when they are epigenetically reactivated during reprogramming of the germ line. This ancient recognition mechanism may have been retained both by transposons to evade long-term heterochromatic silencing and by their hosts for genome defence.