Dataset Information

Fastq2vcf: a concise and transparent pipeline for whole-exome sequencing data analyses.

ABSTRACT:

Background

Whole-exome sequencing (WES) is a popular next-generation sequencing technology used by numerous laboratories with various levels of statistical and analytical expertise. Centralized databases, such as the Sequence Read Archive and the European Nucleotide Archive, allow data to be reanalyzed by independent labs to confirm results and derive additional insights. Access to new and shared data highlights the necessity for software that both lowers the statistical and analytical expertise required to generate results and promotes reproducible methodology among laboratories.

Findings

We have developed fastq2vcf, a pipeline that automates the genomic variant calling process using multiple callers. Fastq2vcf offers improved flexibility, efficiency, and reproducibility by seamlessly integrating several leading sequencing analysis tools. It outputs not only the annotated variant call set for each caller, but also the consensus variant call set shared by different callers. Furthermore, it can be customized and extended easily.
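The idea of a consensus call set can be illustrated with a minimal sketch. This is not fastq2vcf's own code: it assumes, for illustration only, that two variants are "shared" when their chromosome, position, reference allele, and alternate allele match exactly. The helper names `parse_vcf_lines` and `consensus` and the sample records are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

# A variant is identified here by (chrom, pos, ref, alt); this matching
# rule is an assumption for illustration, not the pipeline's documented rule.
Variant = Tuple[str, int, str, str]


def parse_vcf_lines(lines: Iterable[str]) -> Set[Variant]:
    """Collect variant keys from VCF-formatted text lines."""
    variants: Set[Variant] = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alts = line.split("\t")[:5]
        # Multi-allelic records list ALT alleles separated by commas.
        for alt in alts.split(","):
            variants.add((chrom, int(pos), ref, alt))
    return variants


def consensus(call_sets: Dict[str, Set[Variant]]) -> Set[Variant]:
    """Return the variants reported by every caller."""
    sets = list(call_sets.values())
    if not sets:
        return set()
    return set.intersection(*sets)


# Hypothetical output from two callers.
caller_a = [
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t100\t.\tA\tG",
    "chr1\t200\t.\tC\tT",
    "chr2\t50\t.\tG\tA,C",
]
caller_b = [
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t100\t.\tA\tG",
    "chr2\t50\t.\tG\tC",
    "chr3\t10\t.\tT\tA",
]

shared = consensus({
    "caller_a": parse_vcf_lines(caller_a),
    "caller_b": parse_vcf_lines(caller_b),
})
print(sorted(shared))
```

Exact-match intersection is the simplest possible definition; real call sets often need normalization (for example, left-aligning indels) before variants from different callers can be compared reliably.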

Conclusions

Our software tool automatically generates executable command lines for a variety of tools required for analyzing WES data. It is also highly configurable and provides users with complete control of the processing procedure, making it easy to submit and track jobs in both single workstation and parallelized computing environments. By using this pipeline, WES analysis can be easily reproduced.

SUBMITTER: Gao X

PROVIDER: S-EPMC4376134 | biostudies-literature | 2015 Mar

REPOSITORIES: biostudies-literature


Publications

Fastq2vcf: a concise and transparent pipeline for whole-exome sequencing data analyses.

Xiaoyi Gao, Jianpeng Xu, Joshua Starmer

BMC Research Notes, 2015-03-08


PMID: 25889517

Similar Datasets

Project description: Exome sequencing provides unprecedented insights into cancer biology and pharmacological response. Here we assess these two parameters for the NCI-60, which is among the richest genomic and pharmacological publicly available cancer cell line databases. Homozygous genetic variants that putatively affect protein function were identified in 1,199 genes (approximately 6% of all genes). Variants that are either enriched or depleted compared to non-cancerous genomes, and thus may be influential in cancer progression and differential drug response, were identified for 2,546 genes. Potential gene knockouts are made available. Assessment of cell line response to 19,940 compounds, including 110 FDA-approved drugs, reveals an ≈80-fold range in resistance versus sensitivity response across cell lines. 103,422 gene variants were significantly correlated with at least one compound (at p<0.0002). These include genes of known pharmacological importance such as IGF1R, BRAF, RAD52, MTOR, STAT2 and TSC2 as well as a large number of candidate genes such as NOM1, TLL2, and XDH. We introduce two new web-based CellMiner applications that enable exploration of variant-to-compound relationships for a broad range of researchers, especially those without bioinformatics support. The first tool, "Genetic variant versus drug visualization", provides a visualization of significant correlations between drug activity-gene variant combinations. Examples are given for the known vemurafenib-BRAF and novel ifosfamide-RAD52 pairings. The second, "Genetic variant summation", allows an assessment of cumulative genetic variations for up to 150 genes combined, and is designed to identify the variant burden for molecular pathways or functional groupings of genes. An example of its use is provided for the EGFR-ERBB2 pathway gene variant data and the identification of correlated EGFR, ERBB2, MTOR, BRAF, MEK and ERK inhibitors. The new tools are implemented as an updated web-based CellMiner version, for which the present publication serves as a compendium.

Project description: Background & aims: A long duration of inflammatory bowel disease (IBD) increases the risk for colorectal cancer. Mutation analysis of limited numbers of genes has indicated that colorectal tumors that develop in patients with IBD differ from those of patients without IBD. We performed whole-exome sequencing analyses to characterize the genetic landscape of these tumors. Methods: We collected colorectal tumor and non-neoplastic tissues from 31 patients with IBD and colorectal cancer (15 with ulcerative colitis, 14 with Crohn's disease, and 2 with indeterminate colitis) and performed whole-exome sequencing analyses of the microdissected tumor and matched nontumor tissues. We identified somatic alterations by comparing matched specimens. The prevalence of mutations in sporadic colorectal tumors was obtained from previously published exome-sequencing studies. Results: Two specimens had somatic mutations in the DNA proofreading or mismatch repair genes POLE, MLH1, and MSH6 and the tumor cells had a hypermutable phenotype. The remaining tumors had, on average, 71 alterations per sample. TP53 was the most commonly mutated gene, with prevalence similar to that of sporadic colorectal tumors (63% of cases). However, tumors from the patients with IBD had a different mutation spectrum. APC and KRAS were mutated at significantly lower rates in tumors from patients with IBD than in sporadic colorectal tumors (13% and 20% of cases, respectively). Several genes were mutated more frequently or uniquely in tumors from patients with IBD, including SOX9 and EP300 (which encode proteins in the WNT pathway), NRG1 (which encodes an ERBB ligand), and IL16 (which encodes a cytokine). Our study also revealed recurrent mutations in components of the Rho and Rac GTPase network, indicating a role for noncanonical WNT signaling in development of colorectal tumors in patients with IBD. Conclusions: Colorectal tumors that develop in patients with IBD have distinct genetic features from sporadic colorectal tumors. These findings could be used to develop disease-specific markers for diagnosis and treatment of patients with IBD and colorectal cancer.
