Project description:Next generation sequencing has aided characterization of genomic variation. While whole genome sequencing may capture all possible mutations, whole exome sequencing is more cost-effective and captures most phenotype-altering mutations. Initial strategies for exome enrichment utilized a hybridization-based capture approach. Recently, amplicon-based methods were designed to simplify preparation and utilize smaller DNA inputs. We appraised two hybridization capture-based and two amplicon-based whole exome sequencing methods, utilizing both Illumina and Ion Torrent sequencers, comparing on-target alignment, uniformity, and variant calling. While the amplicon methods had higher on-target rates, the hybridization capture-based approaches showed better uniformity. All methods identified many of the same single nucleotide variants, but each amplicon-based method missed variants detected by the other three methods and reported additional variants discordant with all three other technologies. Many of these potential false positives or negatives appear to result from limited coverage, low variant frequency, vicinity to read starts/ends, or the need for platform-specific variant calling algorithms. All methods demonstrated effective copy number variant calling when compared against a single nucleotide polymorphism array. This study illustrates some differences between various whole exome sequencing approaches, highlights the need for selecting appropriate variant calling based on capture method, and will aid laboratories in selecting their preferred approach.
| phs000938 | dbGaP
Project description:Variant calling for cpn60 barcode sequence-based microbiome profiling
Project description:The haplotype map constructed by the International HapMap Project is a valuable source for the studies of disease genes, population structure, and evolution. In the Project, haplotypes have been inferred from experimentally determined genotypes, and are fairly accurate for Caucasians and Africans since the inference was based on the genotypes of trios. However, the inference for the Asian populations was less accurate, because of the lack of familial information. Here we assessed how the error in the inference can affect downstream studies, especially the analysis of recent positive selection, by comparing the results of the analyses using the data of HapMap JPT and of definitive haplotypes (DHaplo-DB) determined by us from a collection of Japanese complete hydatidiform moles (CHM), each of which carries a genome derived from a single sperm. We found that the error in JPT was not uniform throughout the genome, and the statistics for recent positive selection were significantly affected. Keywords: Definitive haplotype determination using CHMs, which carry haploid genomes.
2009-05-12 | GSE12713 | GEO
Project description:variant calling of evolved yeast
Project description:Motivation: Detection of changes in DNA-protein interactions from ChIP-seq data is a crucial step in unraveling the regulatory networks behind biological processes. The simplest variation of this problem is the differential peak calling problem. Here one has to find genomic regions with ChIP-seq signal changes between two cellular conditions in the interaction of a protein with DNA. The great majority of peak calling methods can only analyse one ChIP-seq signal at a time and are unable to perform differential peak calling. Recently, a few approaches based on the combination of these peak callers with statistical tests for detecting differential digital expression have been proposed. However, these methods fail to detect detailed changes of protein-DNA interactions. Results: We propose ODIN, an HMM-based approach to detect and analyse differential peaks in pairs of ChIP-seq data. ODIN performs genomic signal processing, peak calling and p-value calculation in an integrated framework. We also propose an evaluation methodology to compare ODIN with competing methods. The evaluation method is based on the association of differential peaks with expression changes in the same cellular conditions. Our empirical study based on several ChIP-seq experiments from transcription factors, histone modifications and simulated data shows that ODIN outperforms the competing methods considered in most scenarios. H3K4me1 and PU.1 occupancy in MPP, CDP, cDC and pDC
Project description:Although most disease associations detected by GWAS are nongenic, very few have been mapped to causal regulatory variants. Here, we present a method for detecting regulatory QTLs that does not require genotyping or whole-genome sequencing. The method combines deep, long-read ChIP-seq with a new statistical test that simultaneously scores peak height correlation and allelic imbalance: the Genotype-independent Signal Correlation and Imbalance (G-SCI) test. We performed histone acetylation ChIP-seq on 57 human lymphoblastoid cell lines and used the resulting reads to call 500,066 SNPs de novo within regulatory elements. The G-SCI test annotated 8,764 of these as histone acetylation QTLs (haQTLs) - an order of magnitude larger than the set of candidates detected by expression QTL analysis. Lymphoblastoid haQTLs were highly predictive of autoimmune disease mechanisms. Our method therefore facilitates large-scale regulatory variant detection in any moderately-sized cohort for which functional profiling data can be generated, simplifying identification of causal variants within GWAS loci. We applied our method, named Regulatory Variant Ascertainment and chromatin Regression by sequencing (RegVAR-seq), to 57 cell lines from a single population group. We used the resulting sequence data for variant calling, and validated calls using an independent platform. We then identified histone acetylation QTLs (haQTLs) using a novel statistical test that requires no prior genotype information and combines peak height and allelic imbalance data across the 57 individuals. Transcription factor binding site analysis was used to independently support the functionality of haQTLs. Finally, we examined the association between haQTLs and SNPs associated with human phenotypes.
Project description:Improvement of variant calling in next-generation sequence data requires a comprehensive, genome-wide catalogue of high-confidence variants called in a set of genomes for use as a benchmark. We generated deep, whole-genome sequence data of seventeen individuals in a three-generation pedigree and called variants in each genome using a range of currently available algorithms. We used haplotype transmission information to create a phased "platinum" variant catalogue of 4.7 million single nucleotide variants (SNVs) plus 0.7 million small (1-50bp) insertions and deletions (indels) that are consistent with the pattern of inheritance in the parents and eleven children of this pedigree. Platinum genotypes are highly concordant with the current catalogue of the National Institute of Standards and Technology for both SNVs (>99.99%) and indels (99.92%), and add a validated truth catalogue that has 26% more SNVs and 45% more indels. Analysis of 334,652 SNVs that were consistent between informatics pipelines yet inconsistent with haplotype transmission ("non-platinum") revealed that the majority of these variants are de novo and cell-line mutations or reside within previously unidentified duplications and deletions. The reference materials from this study are a resource for objective assessment of the accuracy of variant calls throughout genomes.
Project description:The haplotype map constructed by the International HapMap Project is a valuable source for the studies of disease genes, population structure, and evolution. In the Project, haplotypes have been inferred from experimentally determined genotypes, and are fairly accurate for Caucasians and Africans since the inference was based on the genotypes of trios. However, the inference for the Asian populations was less accurate, because of the lack of familial information. Here we assessed how the error in the inference can affect downstream studies, especially the analysis of recent positive selection, by comparing the results of the analyses using the data of HapMap JPT and of definitive haplotypes (DHaplo-DB) determined by us from a collection of Japanese complete hydatidiform moles (CHM), each of which carries a genome derived from a single sperm. We found that the error in JPT was not uniform throughout the genome, and the statistics for recent positive selection were significantly affected. Keywords: Definitive haplotype determination using CHMs, which carry haploid genomes. 100 CHM samples collected throughout Japan were analyzed by Affymetrix Genechip Mapping 500K Set array.