Project description:BACKGROUND: Understanding the underlying genetic structure of human populations is of fundamental interest to both biological and social sciences. Advances in high-throughput genotyping technology have markedly improved our understanding of global patterns of human genetic variation. The most widely used methods for collecting variant information at the DNA level include whole genome sequencing, which remains costly, and the more economical array-based techniques, which simultaneously genotype a pre-selected set of variable DNA sites in the human genome. The largest publicly accessible set of human genomic sequence data available today originates from exome sequencing, which covers around 1.2% of the whole genome (approximately 30 million base pairs). RESULTS: To compare the effect of SNP selection strategies in population genetic analysis without bias, we subsampled the variants of the same highly curated 1000 Genomes dataset to mimic genome, exome sequencing and array data, eliminating the effect of the different chemistry and error profiles of these approaches. Next, we compared the exome dataset to the array-based dataset and to the gold standard whole genome dataset using the same population genetic analysis methods. CONCLUSIONS: Our results draw attention to some of the inherent problems that arise from using pre-selected SNP sets for population genetic analysis. Additionally, we demonstrate that exome sequencing provides a better alternative to array-based methods for population genetic analysis. In this study, we propose a strategy for unbiased variant collection from exome data and offer a bioinformatics protocol for proper data processing.
Project description:We present CANOES, an algorithm for the detection of rare copy number variants from exome sequencing data. CANOES models read counts using a negative binomial distribution and estimates the variance of the read counts using a regression-based approach based on selected reference samples in a given dataset. We test CANOES on a family-based exome sequencing dataset, and show that its sensitivity and specificity are comparable to those of XHMM. Moreover, the method is complementary to Gaussian approximation-based methods (e.g. XHMM or CoNIFER). When CANOES is used in combination with these methods, it is possible to produce high accuracy calls, as demonstrated by a much reduced and more realistic de novo rate in results from trio data.
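The read-count model described above can be illustrated with a minimal sketch. It is not the CANOES implementation: the state names, the scaling of the expected count by 0.5 and 1.5, and the functions `nb_logpmf` and `most_likely_state` are assumptions made for illustration, and the variance is simply passed in rather than estimated by regression on reference samples.

```python
import math

def nb_logpmf(count, mean, variance):
    """Log-probability of `count` under a negative binomial with this mean and variance."""
    if variance <= mean:
        raise ValueError("negative binomial requires variance > mean")
    size = mean * mean / (variance - mean)  # dispersion ("size") parameter r
    prob = mean / variance                  # success probability p = r / (r + mean)
    return (math.lgamma(count + size) - math.lgamma(count + 1) - math.lgamma(size)
            + size * math.log(prob) + count * math.log(1.0 - prob))

# Hypothetical copy-number states: expected read count scaled relative to two copies.
STATE_SCALE = {"deletion": 0.5, "normal": 1.0, "duplication": 1.5}

def most_likely_state(count, expected_mean, variance):
    """Return the state whose scaled mean gives the observed count the highest likelihood."""
    scores = {state: nb_logpmf(count, scale * expected_mean, variance)
              for state, scale in STATE_SCALE.items()}
    return max(scores, key=scores.get)

# Illustrative values only: expected count 100, variance 400 for one exon.
for observed in (50, 100, 150):
    print(observed, most_likely_state(observed, 100.0, 400.0))
```

A full caller would combine such per-exon likelihoods across neighbouring exons (for example in a hidden Markov model) rather than classifying each exon independently.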
Project description:Suicide is one of the leading causes of mortality worldwide; it causes the death of more than one million patients each year. Suicide is a complex, multifactorial phenotype, with environmental and genetic factors contributing to the risk of suicide. These factors generally first lead to mental disorders, such as depression, schizophrenia and bipolar disorder, which then become the direct cause of suicide. Here we present a high quality dataset (including processed BAM and VCF files) obtained from high-throughput whole-exome Illumina sequencing of 23 suicide victims - all of whom had suffered from major depressive disorder - and 21 control patients to a depth of at least 40-fold coverage in both cohorts. We identified ~130,000 variants per sample and altogether 442,270 unique variants in the cohort of 44 samples. To the best of our knowledge, this is the first whole-exome sequencing dataset from suicide victims. We expect that this dataset provides useful information for genomic studies of suicide and depression, and also for the analysis of the Hungarian population.
Project description:The genetics of late-onset Alzheimer's disease (AD) is complex due to the heterogeneous nature of the disorder. APOE*4 is the strongest genetic risk factor for AD. Genome-wide association studies have identified more than 30 additional loci, each having relatively small effect size. Known AD loci explain only about 30% of the genetic variance, and thus much of the genetic variance remains unexplained. To identify some of the missing heritability of AD, we analyzed whole-exome sequencing (WES) data focusing on non-APOE*4 carriers from two WES datasets: 720 cases and controls from the University of Pittsburgh and 7,252 cases and controls from the Alzheimer's Disease Sequencing Project. Following separate WES analyses in each dataset, we performed meta-analysis for overlapping markers present in both datasets. Among the four variants reaching the exome-wide significance threshold, three were from known AD loci: APOE/rs7412 (odds ratio (OR) = 0.40; p = 5.46E-24), TOMM40/rs157581 (OR = 1.49; p = 4.04E-07), and TREM2/rs75932628 (OR = 4.00; p = 1.15E-07). The fourth significant variant, rs199533, was from a novel locus on chromosome 17 in the NSF gene (OR = 0.78; p = 2.88E-07). NSF was also significant in the gene-based analysis (p = 1.20E-05). In the GTEx data, NSF/rs199533 is a cis-eQTL for multiple genes in the brain and blood, including NSF that is highly expressed across all brain tissues, including regions that typically show amyloid-β accumulation. Further characterization of genes that are affected by NSF/rs199533 may help to shed light on the roles of these genes in AD etiology.
Project description:Whole exome capture sequencing allows researchers to cost-effectively sequence the coding regions of the genome. Although the exome capture sequencing methods have become routine and well established, there is currently a lack of tools specialized for variant calling in this type of data. Using statistical models trained on validated whole-exome capture sequencing data, the Atlas2 Suite is an integrative variant analysis pipeline optimized for variant discovery on all three of the widely used next generation sequencing platforms (SOLiD, Illumina, and Roche 454). The suite employs logistic regression models in conjunction with user-adjustable cutoffs to accurately separate true SNPs and INDELs from sequencing and mapping errors with high sensitivity (96.7%). We have implemented the Atlas2 Suite and applied it to 92 whole exome samples from the 1000 Genomes Project. The Atlas2 Suite is available for download at http://sourceforge.net/projects/atlas2/. In addition to a command line version, the suite has been integrated into the Genboree Workbench, allowing biomedical scientists with minimal informatics expertise to remotely call, view, and further analyze variants through a simple web interface. The existing genomic databases displayed via the Genboree browser also streamline the process from variant discovery to functional genomics analysis, resulting in an off-the-shelf toolkit for the broader community.
Project description:We present an easy-to-use, open-source Optimised Exome analysis tool, OpEx (http://icr.ac.uk/opex), that accurately detects small-scale variation, including indels, to clinical standards. We evaluated OpEx performance with an experimentally validated dataset (the ICR142 NGS validation series), a large 1000 exome dataset (the ICR1000 UK exome series), and a clinical proband-parent trio dataset. The performance of OpEx for high-quality base substitutions and short indels in both small and large datasets is excellent, with overall sensitivity of 95%, specificity of 97% and low false detection rate (FDR) of 3%. Depending on the individual performance requirements, the OpEx output allows one to optimise the inevitable trade-offs between sensitivity and specificity. For example, in the clinical setting one could permit a higher FDR and lower specificity to maximise sensitivity. In contexts where experimental validation is not possible, minimising the FDR and improving specificity may be a preferable trade-off for slightly lower sensitivity. OpEx is simple to install and use; the whole pipeline is run from a single command. OpEx is therefore well suited to the increasing number of research and clinical laboratories undertaking exome sequencing, particularly those without in-house dedicated bioinformatics expertise.
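The three performance measures quoted above can be computed from validation counts as in the following sketch. It uses the common textbook definitions, which are an assumption here since the abstract does not state how OpEx defines them; the function name `call_metrics` and the example counts are hypothetical and are not the ICR142 results.

```python
def call_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and false detection rate from validation counts.

    tp/fn: true variant sites that were called / missed.
    fp/tn: non-variant sites that were called / correctly not called.
    """
    return {
        "sensitivity": tp / (tp + fn),  # fraction of true variants detected
        "specificity": tn / (tn + fp),  # fraction of non-variant sites left uncalled
        "fdr": fp / (fp + tp),          # fraction of calls that are false
    }

# Hypothetical counts, chosen only to show the arithmetic.
print(call_metrics(tp=95, fp=3, tn=97, fn=5))
```

Tightening a quality cutoff typically moves calls from `tp` to `fn` and from `fp` to `tn`, which is the trade-off between sensitivity and FDR that the description refers to.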
Project description:OBJECTIVE: Currently, there is a disconnect between finding a patient's relevant molecular profile and predicting actionable therapeutics. Here we develop and implement the Integrating Molecular Profiles with Actionable Therapeutics (IMPACT) analysis pipeline, linking variants detected from whole-exome sequencing (WES) to actionable therapeutics. METHODS AND MATERIALS: The IMPACT pipeline contains 4 analytical modules: detecting somatic variants, calling copy number alterations, predicting drugs against deleterious variants, and analyzing tumor heterogeneity. We tested the IMPACT pipeline on whole-exome sequencing data in The Cancer Genome Atlas (TCGA) lung adenocarcinoma samples with known EGFR mutations. We also used IMPACT to analyze melanoma patient tumor samples before treatment, after BRAF-inhibitor treatment, and after BRAF- and MEK-inhibitor treatment. RESULTS: IMPACT correctly identified known EGFR mutations in the TCGA lung adenocarcinoma samples. IMPACT linked these EGFR mutations to the appropriate Food and Drug Administration (FDA)-approved EGFR inhibitors. For the melanoma patient samples, we identified NRAS p.Q61K as an acquired resistance mutation to BRAF-inhibitor treatment. We also identified CDKN2A deletion as a novel acquired resistance mutation to BRAFi/MEKi inhibition. The IMPACT analysis pipeline links these somatic variants to actionable therapeutics. We observed the clonal dynamics in the tumor samples after various treatments. We showed that IMPACT not only helped in successful prioritization of clinically relevant variants but also linked these variations to possible targeted therapies. CONCLUSION: IMPACT provides a new bioinformatics strategy to delineate candidate somatic variants and actionable therapies. This approach can be applied to other patient tumor samples to discover effective drug targets for personalized medicine. IMPACT is publicly available at http://tanlab.ucdenver.edu/IMPACT.
Project description:Variants associated with Parkinson's disease (PD) generally have a small effect size and, therefore, large sample sizes or targeted analyses are required to detect significant associations in a whole exome sequencing (WES) study. Here, we used protein-protein interaction (PPI) information on 36 genes with established or suggested associations with PD to target the analysis of the WES data. We performed an association analysis on WES data from 439 Finnish PD subjects and 855 controls, and included a Finnish population cohort as the replication dataset with 60 PD subjects and 8214 controls. The single variant association (SVA) test in the discovery dataset yielded 11 candidate variants in seven genes, but the associations were not significant in the replication cohort after correction for multiple testing. A polygenic risk score using variants rs2230288 and rs2291312, however, was associated with PD with an odds ratio of 2.7 (95% confidence interval 1.4-5.2; p < 2.56e-03). Furthermore, an analysis of the PPI network revealed enriched clusters of biological processes among established and candidate genes, and these functional networks were visualized in the study. We identified novel candidate variants for PD using a gene prioritization based on PPI information, and described why these variants may be involved in the pathogenesis of PD.
Project description:Recently, vacuolar protein sorting 35 (VPS35) and eukaryotic translation initiation factor 4 gamma 1 (EIF4G1) have been identified as 2 causal Parkinson disease (PD) genes. We used whole exome sequencing for rapid, parallel analysis of variations in these 2 genes. We performed whole exome sequencing in 213 patients with PD and 272 control individuals. Those rare variants (RVs) with <5% frequency in the exome variant server database and our own control data were considered for analysis. We performed joint gene-based tests for association using RVASSOC and SKAT (Sequence Kernel Association Test) as well as single-variant test statistics. We identified 3 novel VPS35 variations that changed the coded amino acid (nonsynonymous) in 3 cases. Two variations were in multiplex families and neither segregated with PD. In EIF4G1, we identified 11 (9 nonsynonymous and 2 small indels) RVs including the reported pathogenic mutation p.R1205H, which segregated in all affected members of a large family, but also in 1 unaffected 86-year-old family member. Two additional RVs were found in isolated patients only. Whereas initial association studies suggested an association (p = 0.04) with all RVs in EIF4G1, subsequent testing in a second dataset for the driving variant (p.F1461) suggested no association between RVs in the gene and PD. We confirm that the specific EIF4G1 variation p.R1205H seems to be a strong PD risk factor, but is nonpenetrant in at least one 86-year-old. A few other select RVs in both genes could not be ruled out as causal. However, there was no evidence for an overall contribution of genetic variability in VPS35 or EIF4G1 to PD development in our dataset.
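The rare-variant selection step described above (frequency below 5% in both the reference database and the study's own controls) can be sketched as follows. The `Variant` record, its field names, the function `rare_variants` and the example frequencies are all hypothetical; the study's actual data format and tooling are not given in the description.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Variant:
    name: str             # hypothetical label, e.g. a protein-level change
    database_freq: float  # allele frequency in the reference database
    control_freq: float   # allele frequency in the study's own controls

def rare_variants(variants, max_freq=0.05):
    """Keep variants whose frequency is below `max_freq` in both sources."""
    return [v for v in variants
            if v.database_freq < max_freq and v.control_freq < max_freq]

# Made-up frequencies for illustration only.
candidates = [
    Variant("variant_A", database_freq=0.001, control_freq=0.000),
    Variant("variant_B", database_freq=0.120, control_freq=0.090),
    Variant("variant_C", database_freq=0.020, control_freq=0.060),
]
print([v.name for v in rare_variants(candidates)])
```

Requiring the threshold in both sources is the stricter reading of the sentence; a variant that is rare in the database but common in local controls, such as `variant_C` here, is excluded.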
Project description:Genetic testing for monogenic diabetes is important for patient care. Given the extensive genetic and clinical heterogeneity of diabetes, exome sequencing might provide additional diagnostic potential when standard Sanger sequencing-based diagnostics is inconclusive. The aim of the study was to examine the performance of exome sequencing for a molecular diagnosis of MODY in patients who have undergone conventional diagnostic sequencing of candidate genes with negative results. We performed exome enrichment followed by high-throughput sequencing in nine patients with suspected MODY. They were Sanger sequencing-negative for mutations in the HNF1A, HNF4A, GCK, HNF1B and INS genes. We excluded common, non-coding and synonymous gene variants, and performed in-depth analysis on filtered sequence variants in a pre-defined set of 111 genes implicated in glucose metabolism. On average, we obtained 45X median coverage of the entire targeted exome and found 199 rare coding variants per individual. We identified 0-4 rare non-synonymous and nonsense variants per individual in our a priori list of 111 candidate genes. Three of the variants were considered pathogenic (in ABCC8, HNF4A and PPARG, respectively), thus exome sequencing led to a genetic diagnosis in at least three of the nine patients. Approximately 91% of known heterozygous SNPs in the target exomes were detected, but we also found low coverage in some key diabetes genes using our current exome sequencing approach. Novel variants in the genes ARAP1, GLIS3, MADD, NOTCH2 and WFS1 need further investigation to reveal their possible role in diabetes. Our results demonstrate that exome sequencing can improve molecular diagnostics of MODY when used as a complement to Sanger sequencing. However, improvements will be needed, especially concerning coverage, before the full potential of exome sequencing can be realized.