Dataset Information

Accelerated knowledge discovery from omics data by optimal experimental design

ABSTRACT: Accelerated knowledge discovery from omics data by optimal experimental design

PROVIDER: PRJNA604190 | ENA

REPOSITORIES: ENA

ACCESS DATA

Dataset's files

	File	Format
	SRR10994900.fastq.gz	Fastqsanger.gz
	SRR10994901.fastq.gz	Fastqsanger.gz
	SRR10994902.fastq.gz	Fastqsanger.gz
	SRR10994903.fastq.gz	Fastqsanger.gz
	SRR10994904.fastq.gz	Fastqsanger.gz

Showing files 1 - 5 of 135

Similar Datasets

Project description: BACKGROUND: Microarray comparative genomic hybridization (CGH) is currently one of the most powerful techniques to measure DNA copy number in large genomes. In humans, microarray CGH is widely used to assess copy number variants in healthy individuals and copy number aberrations associated with various diseases, syndromes and disease susceptibility. In model organisms such as Caenorhabditis elegans (C. elegans) the technique has been applied to detect mutations, primarily deletions, in strains of interest. Although various constraints on oligonucleotide properties have been suggested to minimize non-specific hybridization and improve the data quality, there have been few experimental validations for CGH experiments. For genomic regions where strict design filters would limit the coverage it would also be useful to quantify the expected loss in data quality associated with relaxed design criteria. RESULTS: We have quantified the effects of filtering various oligonucleotide properties by measuring the resolving power for detecting deletions in the human and C. elegans genomes using NimbleGen microarrays. Approximately twice as many oligonucleotides are typically required to be affected by a deletion in human DNA samples in order to achieve the same statistical confidence as one would observe for a deletion in C. elegans. Surprisingly, the ability to detect deletions strongly depends on the oligonucleotide 15-mer count, which is defined as the sum of the genomic frequency of all the constituent 15-mers within the oligonucleotide. A similarity level above 80% to non-target sequences over the length of the probe produces significant cross-hybridization. We recommend the use of a fairly large melting temperature window of up to 10 °C, the elimination of repeat sequences, the elimination of homopolymers longer than 5 nucleotides, and a threshold of -1 kcal/mol on the oligonucleotide self-folding energy. We observed very little difference in data quality when varying the oligonucleotide length between 50 and 70, and even when using an isothermal design strategy. CONCLUSIONS: We have determined experimentally the effects of varying several key oligonucleotide microarray design criteria for detection of deletions in C. elegans and humans with NimbleGen's CGH technology. Our oligonucleotide design recommendations should be applicable for CGH analysis in most species.
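
The 15-mer count defined in the description above (the sum of the genomic frequency of all constituent 15-mers within an oligonucleotide) and the homopolymer filter can be illustrated with a small sketch. This is not the authors' pipeline: the function names are hypothetical, the genome is treated as a single in-memory string, and only the forward strand is counted, which is an assumption made for brevity.

```python
from collections import Counter


def kmer_counts(genome: str, k: int = 15) -> Counter:
    """Count every overlapping k-mer in a genome string (forward strand only; an assumption)."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def oligo_kmer_count(oligo: str, genome_kmers: Counter, k: int = 15) -> int:
    """Sum the genomic frequency of all constituent k-mers of the oligo."""
    # Counter returns 0 for k-mers that never occur in the genome.
    return sum(genome_kmers[oligo[i:i + k]] for i in range(len(oligo) - k + 1))


def has_long_homopolymer(oligo: str, max_run: int = 5) -> bool:
    """Return True if any single-base run is longer than max_run nucleotides."""
    run = 1
    for prev, cur in zip(oligo, oligo[1:]):
        run = run + 1 if cur == prev else 1
        if run > max_run:
            return True
    return False


if __name__ == "__main__":
    # Toy data with k=4 so the numbers are easy to check by hand.
    toy_genome = "ACGTACGTAC"
    counts = kmer_counts(toy_genome, k=4)
    print(oligo_kmer_count("ACGTAC", counts, k=4))
    print(has_long_homopolymer("ACAAAAAAGT"))
```

For a real genome the k-mer table would normally be built with a dedicated k-mer counting tool rather than a Python `Counter`; the sketch only shows how the score is assembled from per-k-mer frequencies.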

Project description: Transcriptional enhancers act as docking stations for combinations of transcription factors and thereby regulate spatiotemporal activation of their target genes. A single enhancer, of a few hundred base pairs in length, can autonomously and independently of its location and orientation drive cell-type specific expression of a gene or transgene. It has been a long-standing goal in the field to decode the regulatory logic of an enhancer and to understand the details of how spatiotemporal gene expression is encoded in an enhancer sequence. Recently, deep learning models have yielded unprecedented insight into the enhancer code, and well-trained models are reaching a level of understanding that may be close to complete. As a consequence, we hypothesized that deep learning models can be used to guide the directed design of synthetic, cell type specific enhancers, and that this process would allow for a detailed tracing of all enhancer features at nucleotide-level resolution. Here we implemented and compared three different design strategies, each built on a deep learning model: (1) directed sequence evolution; (2) directed iterative motif implanting; and (3) generative design. We evaluated the function of fully synthetic enhancers to specifically target Kenyon cells or glial cells in the fruit fly brain using transgenic animals. We then exploited this concept further by creating “dual-code” enhancers that target two cell types, and minimal enhancers smaller than 50 base pairs that are fully functional. By examining the trajectories followed during state space searches towards functional enhancers, we could accurately define the enhancer code as the optimal strength, combination, and relative distance of TF activator motifs, and the absence of TF repressor motifs. Finally, we applied the same three strategies to successfully design human enhancers, finding highly similar design principles as in Drosophila. In conclusion, enhancer design guided by deep learning leads to better understanding of how enhancers work and shows that their code can be exploited to manipulate cell states.


OmicsDI is part of the ELIXIR infrastructure