Project description:We report genes induced under transient hypoxia in an ELT-2-dependent manner. Adult C. elegans were exposed to transient hypoxia and fed ELT-2 RNAi. Normoxia and empty-vector RNAi served as controls.
Project description:Although a standard genome-wide significance level has been accepted for the testing of association between common genetic variants and disease, the era of whole-genome sequencing (WGS) requires a new threshold. The allele frequency spectrum of sequence-identified variants is very different from common variants, and the identified rare genetic variation is usually jointly analyzed in a series of genomic windows or regions. In nearby or overlapping windows, these test statistics will be correlated, and the degree of correlation is likely to depend on the choice of window size, overlap, and the test statistic. Furthermore, multiple analyses may be performed using different windows or test statistics. Here we propose an empirical approach for estimating genome-wide significance thresholds for data arising from WGS studies, and we demonstrate that the empirical threshold can be efficiently estimated by extrapolating from calculations performed on a small genomic region. Because analysis of WGS may need to be repeated with different choices of test statistics or windows, this prediction approach makes it computationally feasible to estimate genome-wide significance thresholds for different analysis choices. Based on UK10K whole-genome sequence data, we derive genome-wide significance thresholds ranging between 2.5 × 10⁻⁸ and 8 × 10⁻⁸ for our analytic choices in window-based testing, and thresholds of 0.6 × 10⁻⁸ to 1.5 × 10⁻⁸ for a combined analytic strategy of testing common variants using single-SNP tests together with rare variants analyzed with our sliding-window test strategy.
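The idea of estimating a threshold on a small region and extrapolating it can be illustrated with a toy simulation. The sketch below is not the procedure used in the study. It assumes simulated independent null z-scores per variant, a simple sum statistic over overlapping windows, and a Šidák-style conversion to an effective number of tests that is scaled by the fraction of the genome the region covers. All function names and parameters are hypothetical.

```python
import math
import random


def null_min_p(n_variants, window, step, rng):
    """One null replicate: smallest window p-value across a toy region.

    Assumption: per-variant null z-scores are independent N(0, 1) draws and the
    window statistic is their scaled sum, so overlapping windows are correlated.
    """
    z = [rng.gauss(0.0, 1.0) for _ in range(n_variants)]
    min_p = 1.0
    for start in range(0, n_variants - window + 1, step):
        stat = sum(z[start:start + window]) / math.sqrt(window)
        p = math.erfc(abs(stat) / math.sqrt(2.0))  # two-sided normal p-value
        min_p = min(min_p, p)
    return min_p


def region_threshold(n_reps, n_variants, window, step, alpha=0.05, seed=0):
    """Empirical alpha-quantile of the min-p distribution over null replicates."""
    rng = random.Random(seed)
    mins = sorted(null_min_p(n_variants, window, step, rng) for _ in range(n_reps))
    return mins[int(alpha * n_reps)]


def extrapolate(p_region, region_fraction, alpha=0.05):
    """Scale a region-level threshold to the whole genome (illustrative only).

    Assumption: the effective number of independent tests grows linearly with
    genome length, and thresholds follow the Sidak relation
    alpha = 1 - (1 - p)^M.
    """
    m_eff_region = math.log(1.0 - alpha) / math.log(1.0 - p_region)
    m_eff_genome = m_eff_region / region_fraction
    return 1.0 - (1.0 - alpha) ** (1.0 / m_eff_genome)


if __name__ == "__main__":
    thr = region_threshold(n_reps=300, n_variants=200, window=20, step=10, seed=1)
    print("region-level threshold:", thr)
    print("extrapolated (region = 1% of genome):", extrapolate(thr, 0.01))
```

Changing `window` or `step` changes the correlation between neighbouring tests and therefore the estimated threshold, which is the dependence the abstract describes. Real analyses would replace the simulated z-scores with test statistics computed on permuted or simulated phenotypes.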
Project description:ELT-2 is the major transcription factor required for Caenorhabditis elegans intestinal development. ELT-2 expression begins in embryos to promote development and then persists after hatching through larval and adult stages. Though the sites of ELT-2 binding are characterized and the transcriptional changes that result from ELT-2 depletion are described, a major missing piece has been an intestine-specific transcriptome profile over developmental time. We generated this dataset by fluorescence-activated cell sorting (FACS) of intestine cells at distinct developmental stages. We analyzed this dataset in conjunction with previously conducted ELT-2 studies to evaluate ELT-2’s role in directing the intestinal regulatory network through development. We found that only 33% of intestine-enriched genes in the embryo were direct targets of ELT-2 but that number increased to 75% by the L3 stage. This suggests that additional transcription factors promote intestinal transcription, especially in the embryo. Furthermore, only half of ELT-2’s direct target genes were dependent on ELT-2 for their proper expression levels, and an equal proportion of those responded to elt-2 depletion with over-expression as with under-expression. That is, ELT-2 can either activate or repress direct target genes. Indeed, we observed that ELT-2 repressed its own promoter, suggesting new models for its autoregulation. Together, our results illustrate that ELT-2 impacts roughly 20–50% of intestine-specific genes, that ELT-2 both positively and negatively controls its direct targets, and that our current model of the intestinal regulatory network is incomplete, as the factors responsible for directing the expression of many intestinal genes remain unknown.
Project description:Identification of the molecular lesion in Caenorhabditis elegans mutants isolated through forward genetic screens usually involves time-consuming genetic mapping. We used Illumina deep sequencing technology to sequence a complete, mutant C. elegans genome and thus pinpointed a single-nucleotide mutation in the genome that affects a neuronal cell fate decision. This constitutes a proof-of-principle for using whole-genome sequencing to analyze C. elegans mutants.
Project description:Genomic rearrangements cause congenital disorders, cancer, and complex diseases in humans. Yet, they are still understudied in rare diseases because their detection is challenging, despite the advent of whole genome sequencing (WGS) technologies. Short-read (srWGS) and long-read WGS approaches are regularly compared, and the latter is commonly recommended in studies focusing on genomic rearrangements. However, srWGS is currently the most economical, accurate, and widely supported technology. In Caenorhabditis elegans (C. elegans), such variants, induced by various mutagenesis processes, have been used for decades to balance large genomic regions by preventing chromosomal crossover events and allowing the maintenance of lethal mutations. Interestingly, those chromosomal rearrangements have rarely been characterized on a molecular level. To evaluate the ability of srWGS to detect various types of complex genomic rearrangements, we sequenced three balancer strains using short-read Illumina technology. By experimentally validating the breakpoints uncovered by srWGS, we showed that, by combining several types of analyses, srWGS enables the detection of a reciprocal translocation (eT1), a free duplication (sDp3), a large deletion (sC4), and chromoanagenesis events. Thus, applying srWGS to decipher real complex genomic rearrangements in model organisms may help design efficient bioinformatics pipelines with systematic detection of complex rearrangements in human genomes.
Project description:RNA viruses are the etiological agents of many infectious diseases. Because RNA virus genome replication is error-prone, rapid, accurate, and economical determination of whole RNA viral genome sequences is in high demand. Next-generation sequencing (NGS) techniques can sequence whole viral genomes owing to their high throughput. However, NGS techniques impose a significant sample preparation burden: to obtain complete coverage of the viral genome, the genomic nucleic acid must be enriched by reverse transcription PCR using virus-specific primers or by concentration of viral particles. Furthermore, conventional NGS techniques cannot determine the 5' and 3' terminal sequences of the RNA viral genome, so the terminal sequences are determined one by one using rapid amplification of cDNA ends (RACE). Because some RNA viruses have segmented genomes, the burden of RACE-based determination is proportional to the number of segments. To date, only one study has attempted whole genome sequencing of multiple RNA viruses without the above-mentioned methods, but the accuracy of the generated sequences relative to the reference sequences was at most 97% and did not reach 100% because of low read depth. Hence, we established novel methods, named PCR-NGS and RCA-NGS, that were optimized for an NGS machine, the MinION. These methods do not require nucleic acid amplification with virus-specific PCR primers, physical viral particle enrichment, or RACE. They enable whole RNA viral genome sequencing by combining the following techniques: (1) removal of unwanted DNA and RNA other than the RNA viral genome by nuclease treatment; (2) determination of the terminal sequences of the viral genome by ligation of barcoded linkers; (3) amplification of the viral genomic cDNA using PCR specific to the ligated linker sequences or an isothermal DNA amplification technique, such as rolling circle amplification (RCA).
The established methods were evaluated using isolated RNA viruses with single-stranded, double-stranded, positive-stranded, negative-stranded, non-segmented, or multi-segmented genomes. All the viral genome sequences could be determined with 100% accuracy, with mean read depths greater than 2,500×, using at least one of the two methods. These methods should allow easy and economical determination of accurate RNA viral genome sequences.
Project description:To create a scientific resource of expression quantitative trait loci (eQTL), we conducted a genome-wide association study (GWAS) using genotypes obtained from whole genome sequencing (WGS) of DNA and gene expression levels from RNA sequencing (RNA-seq) of whole blood in 2,622 participants in the Framingham Heart Study. We identified 6,778,286 cis-eQTL variant-gene transcript (eGene) pairs at p < 5 × 10⁻⁸ (2,855,111 unique cis-eQTL variants and 15,982 unique eGenes) and 1,469,754 trans-eQTL variant-eGene pairs at p < 1 × 10⁻¹² (526,056 unique trans-eQTL variants and 7,233 unique eGenes). In addition, 442,379 cis-eQTL variants were associated with expression of 1,518 long non-protein coding RNAs (lncRNAs). Gene Ontology (GO) analyses revealed that the top GO terms for cis-eGenes are enriched for immune functions (FDR < 0.05). The cis-eQTL variants are enriched for SNPs reported to be associated with 815 traits in prior GWAS, including cardiovascular disease risk factors. As proof of concept, we used this eQTL resource in conjunction with genetic variants from public GWAS databases in causal inference testing (e.g., COVID-19 severity). After Bonferroni correction, Mendelian randomization analyses identified putative causal associations of 60 eGenes with systolic blood pressure, 13 genes with coronary artery disease, and seven genes with COVID-19 severity. This study created a comprehensive eQTL resource via BioData Catalyst that will be made available to the scientific community. This will advance understanding of the genetic architecture of gene expression underlying a wide range of diseases.
Project description:Background: Intragenic modifiers (in-phase, second-site variants) are known to have dramatic effects on clinical outcomes, affecting disease attributes such as severity or age of onset. However, despite their clinical importance, the focus of many genetic screens in model systems is on the discovery of extragenic variants, with many labs still relying upon more traditional methods to identify modifiers. However, traditional methods such as PCR and Sanger sequencing can be time-intensive and do not permit a thorough understanding of the intragenic modifier effects in the context of non-isogenic genomic backgrounds. Results: Here, we apply high throughput approaches to identify and understand intragenic modifiers using Caenorhabditis elegans. Specifically, we applied whole genome sequencing (WGS) to a mutagen-induced forward genetic screen to identify intragenic suppressors of a temperature-sensitive zyg-1(it25) allele in C. elegans. ZYG-1 is a polo kinase that is important for centriole function and cell divisions, and mutations that truncate its human orthologue, PLK4, have been associated with microcephaly. Combining WGS and CRISPR/Cas9, we rapidly identify intragenic modifiers, show that these variants are distributed non-randomly throughout zyg-1 and that genomic context plays an important role in phenotypic outcomes. Conclusions: Ultimately, our work shows that WGS facilitates high-throughput identification of intragenic modifiers in clinically relevant genes by reducing hands-on research time and overall costs and by allowing thorough understanding of the intragenic phenotypic effects in the context of different genetic backgrounds.