Transcriptome sequencing of human hepatocellular carcinoma
ABSTRACT: Deep high-throughput transcriptome sequencing (RNA-seq) performed on 3 pairs of matched tumor and adjacent non-tumorous (NT) tissues from HCC patients of Chinese origin generated 183.6 million reads that could be aligned. We discovered a number of differentially expressed genes and multiple types of somatic single nucleotide variations (SNVs) in expressed genes. After removal of erroneous alignments, high-quality reads were mapped to the human reference sequence (GRCh37/hg19) using three different software packages: TopHat, Burrows-Wheeler Aligner (BWA) and CLC Genomics Workbench (CLC). High-quality variants were identified using VarScan with the following parameters: minimum coverage depth of 10, variant frequency of more than 30% and base quality of more than 15. A total of 568, 545 and 494 potential somatic single nucleotide variants (SNVs), including 94, 89 and 101 coding somatic SNVs (cSNVs), were identified in the 3 tumor samples HCC448T, HCC473T and HCC510T, respectively. Validation analysis was carried out for 10 of the intersected cSNVs (all non-synonymous substitutions) within selected genes of interest, and the majority were confirmed. Examination of 3 paired human hepatocellular carcinoma and matched non-tumor tissues.
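The filtering and intersection steps described in the abstract can be sketched in Python as follows. This is a minimal illustration rather than the authors' pipeline: the `Call` record, its field names, and the helper functions are hypothetical, and only the three thresholds (depth of at least 10, variant frequency above 30%, base quality above 15) come from the text.

```python
from typing import NamedTuple


class Call(NamedTuple):
    """Hypothetical record for one candidate variant from one aligner."""
    chrom: str
    pos: int
    ref: str
    alt: str
    depth: int         # total reads covering the site
    alt_reads: int     # reads supporting the alternate allele
    base_qual: float   # average base quality of the supporting reads


def passes_filters(call, min_depth=10, min_freq=0.30, min_qual=15):
    """Apply the thresholds quoted in the abstract to a single call."""
    if call.depth < min_depth:
        return False
    frequency = call.alt_reads / call.depth
    return frequency > min_freq and call.base_qual > min_qual


def intersect_callsets(*callsets):
    """Keep variants that pass the filters in every call set (one per aligner)."""
    keys = [
        {(c.chrom, c.pos, c.ref, c.alt) for c in calls if passes_filters(c)}
        for calls in callsets
    ]
    return set.intersection(*keys) if keys else set()
```

Calling `intersect_callsets(tophat_calls, bwa_calls, clc_calls)` with three lists of `Call` records would return the variants supported by all three alignments, which is one plausible reading of "intersected cSNVs".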
Project description:Whole-genome methylation profiling at single-cytosine resolution is now feasible due to the advent of high-throughput sequencing techniques together with bisulfite treatment of the DNA. To obtain the methylation value of each individual cytosine, the bisulfite-treated sequence reads are first aligned to a reference genome, and the methylation levels are then profiled from the alignments. A huge effort has been made to align the reads quickly and correctly, and many different algorithms and programs have been created to do this. The second step is just as crucial and non-trivial, but much less attention has been paid to the final inference of the methylation states. Important error sources exist, such as sequencing errors, bisulfite failure, clonal reads, and single nucleotide variants. We developed MethylExtract, a user-friendly tool to: i) generate high-quality, whole-genome methylation maps and ii) detect sequence variation within the same sample preparation. The program is implemented in a single script and takes into account all major error sources. MethylExtract detects variation (SNVs - Single Nucleotide Variants) in a similar way to VarScan, a very sensitive method extensively used in SNV and genotype calling based on non-bisulfite-treated reads. The usefulness of MethylExtract is shown by means of extensive benchmarking based on artificial bisulfite-treated reads and a comparison to a recently published method, called Bis-SNP. MethylExtract is able to detect SNVs within high-throughput sequencing experiments of bisulfite-treated DNA at the same time as it generates high-quality methylation maps. This simultaneous detection of DNA methylation and sequence variation is crucial for many downstream analyses, for example when deciphering the impact of SNVs on differential methylation.
An exclusive feature of MethylExtract, in comparison with existing software, is the possibility to assess the bisulfite failure in a statistical way. The source code, tutorial and artificial bisulfite datasets are available at http://bioinfo2.ugr.es/MethylExtract/ and http://sourceforge.net/projects/methylextract/, and also permanently accessible from 10.5281/zenodo.7144.
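The second step described above, inferring a methylation value per cytosine from aligned reads, can be illustrated with a short Python sketch. It is not MethylExtract's implementation: the function names are hypothetical, and the binomial tail is only one assumed way of asking whether the observed methylated reads could be explained by a given bisulfite failure rate.

```python
from math import comb


def methylation_level(meth_reads, unmeth_reads):
    """Fraction of reads reporting a methylated cytosine at one position."""
    total = meth_reads + unmeth_reads
    if total == 0:
        return None  # no coverage, no estimate
    return meth_reads / total


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p).

    Assumed use: p is the bisulfite failure rate, k the methylated reads
    observed among n reads. A small value suggests the methylation signal
    is unlikely to be produced by conversion failure alone.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))
```

For a cytosine covered by 10 reads, 8 of them methylated, `methylation_level(8, 2)` gives the methylation value and `binomial_tail(8, 10, failure_rate)` gives the chance of seeing that many unconverted reads under an assumed failure rate.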
Project description:Linked-read sequencing greatly improves haplotype assembly over standard paired-end analysis. The detection of mosaic single-nucleotide variants benefits from haplotype assembly when the model is informed by the mapping between constituent reads and linked reads. Samovar evaluates haplotype-discordant reads identified through linked-read sequencing, thus enabling phasing and mosaic variant detection across the entire genome. Samovar trains a random forest model to score candidate sites using a dataset that considers read quality, phasing, and linked-read characteristics. Samovar calls mosaic single-nucleotide variants (SNVs) within a single sample with accuracy comparable to what previously required trios or matched tumor/normal pairs, and outperforms single-sample mosaic variant callers at minor allele frequencies of 5%-50% with at least 30X coverage. Samovar finds somatic variants in both tumor and normal whole-genome sequencing from 13 pediatric cancer cases that can be corroborated with high recall by whole-exome sequencing. Samovar is available open-source at https://github.com/cdarby/samovar under the MIT license.
Project description:MOTIVATION: With the advent of relatively affordable high-throughput technologies, DNA sequencing of cancers is now common practice in cancer research projects and will be increasingly used in clinical practice to inform diagnosis and treatment. Somatic (cancer-only) single nucleotide variants (SNVs) are the simplest class of mutation, yet their identification in DNA sequencing data is confounded by germline polymorphisms, tumour heterogeneity and sequencing and analysis errors. Four recently published algorithms for the detection of somatic SNV sites in matched cancer-normal sequencing datasets are VarScan, SomaticSniper, JointSNVMix and Strelka. In this analysis, we apply these four SNV calling algorithms to cancer-normal Illumina exome sequencing of a chronic myeloid leukaemia (CML) patient. The candidate SNV sites returned by each algorithm are filtered to remove likely false positives, then characterized and compared to investigate the strengths and weaknesses of each SNV calling algorithm. RESULTS: Comparing the candidate SNV sets returned by VarScan, SomaticSniper, JointSNVMix2 and Strelka revealed substantial differences with respect to the number and character of sites returned; the somatic probability scores assigned to the same sites; their susceptibility to various sources of noise; and their sensitivities to low-allelic-fraction candidates. AVAILABILITY: Data accession number SRA081939, code at http://code.google.com/p/snv-caller-review/ SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Project description:Somatic mosaicism refers to the existence of somatic mutations in a fraction of somatic cells in a single biological sample. Its importance has mainly been discussed in theory, although experimental work has started to emerge linking somatic mosaicism to disease diagnosis. Through novel statistical modeling of paired-end DNA-sequencing data using blood-derived DNA from healthy donors as well as DNA from tumor samples, we present an ultra-fast computational pipeline, LocHap, that searches for multiple single nucleotide variants (SNVs) that are scaffolded by the same reads. We refer to scaffolded SNVs as local haplotypes (LH). When an LH exhibits more than two genotypes, we call it a local haplotype variant (LHV). The presence of LHVs is considered evidence of somatic mosaicism because a genetically homogeneous cell population will not harbor LHVs. Applying LocHap to whole-genome and whole-exome sequence data in DNA from normal blood and tumor samples, we find widespread LHVs across the genome. Importantly, we find more LHVs in tumor samples than in normal samples, and more in older adults than in younger ones. We confirm the existence of LHVs and somatic mosaicism by validation studies in normal blood samples. LocHap is publicly available at http://www.compgenome.org/lochap.
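The definition of a local haplotype variant can be made concrete with a small Python sketch. It shows only the counting idea (more than two distinct haplotypes observed on reads spanning the same SNV positions) and is not LocHap's statistical model. The read representation, the function names, and the `min_support` cutoff are assumptions made for illustration.

```python
from collections import Counter


def local_haplotypes(reads, positions):
    """Count the allele combinations seen on reads spanning all positions.

    Each read is a dict mapping a genomic position to the observed base
    (a hypothetical, simplified representation of an alignment).
    """
    counts = Counter()
    for read in reads:
        if all(p in read for p in positions):
            counts[tuple(read[p] for p in positions)] += 1
    return counts


def is_local_haplotype_variant(reads, positions, min_support=2):
    """Flag an LHV: more than two haplotypes, each with enough read support.

    min_support is an assumed noise guard, not a value from the source.
    """
    counts = local_haplotypes(reads, positions)
    supported = [h for h, n in counts.items() if n >= min_support]
    return len(supported) > 2
```

A diploid, genetically homogeneous sample should yield at most two supported haplotypes at any pair of linked SNVs, so a third well-supported combination is what this check reports.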
Project description:BACKGROUND: Previous publications indicated that genetic predisposition might play an important role in the onset of osteonecrosis of the femoral head (ONFH) in systemic lupus erythematosus (SLE). Some gene loci such as complement C3d receptor 2 (CR2), nitric oxide synthase 3 (NOS3), collagen type II alpha 1 chain (COL2A1), protein tyrosine phosphatase non-receptor type 22 (PTPN22), and transient receptor potential cation channel subfamily V member 4 (TRPV4) were reported to be involved in this process. AIM: To investigate whether the risk of ONFH in SLE is associated with single nucleotide variations (SNVs) in these five genes. METHODS: SNVs in the CR2, NOS3, COL2A1, PTPN22, and TRPV4 genes were examined by using FastTarget and Illumina MiSeq sequencing technologies in 49 cases of SLE with ONFH. Burrows-Wheeler Aligner was used to align the sequencing reads to hg19, and the GATK and VarScan programs were used to perform SNV calling. PolyPhen-2, SIFT, and MutationTaster were used to assess the functional effects of non-synonymous SNVs. RESULTS: Six of the 49 patients were confirmed to have low-frequency SNVs, including one patient with SNVs in NOS3 (exon 6: c.814G>A: p.E272K and exon 7: c.814G>A: p.E272K), four in COL2A1 (rs41263847: exon 29: c.1913C>T: p.T638I, exon 28: c.1706C>T: p.T569I, and rs371445823: exon 8: c.580G>A: p.A194T, exon 7: c.373G>A: p.A125T), and one in CR2 (rs45573035: exon 2: c.200C>G: p.T67S). CONCLUSION: The onset of ONFH in SLE might be associated with the identified SNVs in NOS3, COL2A1, and CR2.
Project description:The aim of the present study was to identify potential key genes and single nucleotide variations (SNVs) in prostate cancer. RNA sequencing (RNA-seq) data, GSE22260, were downloaded from the Gene Expression Omnibus database, including 4 prostate cancer samples and 4 normal tissue samples. RNA-seq reads were processed using TopHat and differentially expressed genes (DEGs) were identified using the Cufflinks package. Gene Ontology enrichment analysis of DEGs was performed. Subsequently, Seqpos was used to identify the potential upstream regulatory elements of DEGs. SNVs were analyzed using the Genome Analysis Toolkit. In addition, the frequency and risk level of mutant genes were calculated using VarioWatch. A total of 150 upregulated and 211 downregulated DEGs were selected, and 25 upregulated and 17 downregulated potential upstream regulatory elements were identified, respectively. The SNV annotations of somatic mutations revealed that 65% were base transitions and 35% were base transversions. At frequencies ≥2, a total of 17 mutation sites were identified. The mutation site with the highest frequency was located in the folate hydrolase 1B (FOLH1B) gene. Furthermore, 20 high-risk mutant genes with high frequency were identified using VarioWatch, including ribosomal protein S4 Y-linked 2 (RPS4Y2), polycystin 1 transient receptor potential channel interacting (PKD1) and FOLH1B. In addition, kallikrein 1 (KLK1) and PKD1 are known tumor suppressor genes. The potential regulatory elements and high-frequency mutant genes (RPS4Y2, KLK1, PKD1 and FOLH1B) may have key functions in prostate cancer. The results of the present study may provide novel information for the understanding of prostate cancer development.
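The transition/transversion split reported above follows from a standard rule: a substitution within the purines (A, G) or within the pyrimidines (C, T) is a transition, and any purine-pyrimidine exchange is a transversion. A minimal Python sketch of that classification is given here; the function names are illustrative and no data from the study are reproduced.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Return 'transition' or 'transversion' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def transition_fraction(substitutions):
    """Fraction of (ref, alt) pairs that are transitions, or None if empty."""
    kinds = [classify_substitution(ref, alt) for ref, alt in substitutions]
    if not kinds:
        return None
    return kinds.count("transition") / len(kinds)
```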
Project description:Multiple endocrine neoplasia type 1 (MEN1) is a hereditary cancer syndrome caused by germline mutations of the MEN1 gene located in chromosome 11q13. In patients with MEN1, multicentric tumors develop in the involved organs; however, precise evaluation of genetic changes in these multicentric tumors has not been performed. In the present study, using whole-exome sequencing, we analyzed germline and somatic genetic changes in blood cells, two pancreatic endocrine tumors and one duodenal tumor obtained from a patient with MEN1 gastrinoma. We found that this patient possessed a novel germline mutation of the MEN1 gene [NM_137099.2:c.1505dupA (p.Lys502Lysfs); the localization was Chr11:64572134 on Assembly GRCh37], in which an adenine insertion in codon 502 of the MEN1 gene resulted in a frame shift and a premature stop codon. In terms of heterozygosity, the mutated allele was heterozygous in blood cells, hemizygous in the two pancreatic tumors and homozygous in the duodenal tumor. Immunohistochemical staining confirmed that only truncated menin protein accumulated in the nucleus of the tumor tissues. Further evaluation of tumor-specific somatic mutations in two pancreatic tumors did not detect single-nucleotide variations (SNVs) in 609 cancer-associated genes designated by the COSMIC cancer gene census, suggesting that the germline MEN1 mutation and resultant loss of heterozygosity played a major role in tumorigenesis. In the duodenal tumor, in addition to the germline MEN1 mutation, single-nucleotide variations in two cancer-associated genes were found. Further studies are required to clarify the role of these somatic single-nucleotide variations in the progression of MEN1 tumors.
Project description:The rapid development of next generation sequencing (NGS) technology provides a novel avenue for genomic exploration and research. Single nucleotide variants (SNVs) inferred from next generation sequencing are expected to reveal gene mutations in cancer. However, NGS has lower sequence coverage and poor SNV detection capability in the regulatory regions of the genome. Posterior probability-based methods are efficient for detection of SNVs in high-coverage regions or in sequencing data with high depth. However, for data with low sequencing depth, the efficiency of such algorithms remains poor and needs to be improved. A new tool, SNVHMM, based on a discrete hidden Markov model (HMM), was developed to infer the genotype for each position on the genome. We incorporated the mapping quality of each read and the corresponding base quality on the reads into the emission probability of the HMM. The context information of the whole observation, as well as its confidence, was fully utilized to infer the genotype for each position on the genome under study. Therefore, more power can be gained than with the Bayes-based methods, which is very useful for SNV detection in data with low sequencing depth. Moreover, our model was verified by testing against two sets each of lobular breast tumor and myelodysplastic syndromes (MDS) data. Compared with a recently published SNV calling algorithm, SNVMix2, our model substantially improved on the performance of SNVMix2 when the sequencing depth is low and also outperformed SNVMix2 when SNVMix2 is well trained by large datasets. SNVHMM can detect SNVs from NGS cancer data efficiently even if the sequencing depth is very low. The training data size can be very small for SNVHMM to work. SNVHMM incorporates the base quality and mapping quality of all observed bases and reads, and also provides the option for users to choose the confidence of the observation for SNV prediction.
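The idea of folding base quality and mapping quality into HMM emission probabilities can be sketched in Python as follows. This is an illustrative toy model, not SNVHMM itself: the three genotype states, the uniform start probabilities, the `stay` transition parameter, and the way the two Phred-scaled qualities are combined into one error rate are all assumptions made for the example.

```python
import math

STATES = ("hom_ref", "het", "hom_alt")
# Expected fraction of alternate-allele reads under each genotype.
ALT_FRACTION = {"hom_ref": 0.0, "het": 0.5, "hom_alt": 1.0}


def phred_to_prob(q):
    """Convert a Phred-scaled quality to an error probability."""
    return 10 ** (-q / 10)


def log_emission(state, observations):
    """Log-likelihood of one site's reads; each read is (is_alt, base_q, map_q)."""
    f = ALT_FRACTION[state]
    total = 0.0
    for is_alt, base_q, map_q in observations:
        # Assumed error model: the base is miscalled or the read is mismapped.
        e = 1 - (1 - phred_to_prob(base_q)) * (1 - phred_to_prob(map_q))
        e = min(max(e, 1e-9), 1 - 1e-9)  # keep logarithms finite
        p_alt = f * (1 - e) + (1 - f) * e
        total += math.log(p_alt if is_alt else 1 - p_alt)
    return total


def viterbi(sites, stay=0.98):
    """Most likely genotype path over consecutive sites (log-space Viterbi)."""
    if not sites:
        return []
    switch = (1 - stay) / (len(STATES) - 1)

    def log_t(a, b):
        return math.log(stay if a == b else switch)

    score = {s: math.log(1 / len(STATES)) + log_emission(s, sites[0]) for s in STATES}
    back = []
    for obs in sites[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + log_t(p, s))
            pointers[s] = prev
            new_score[s] = score[prev] + log_t(prev, s) + log_emission(s, obs)
        back.append(pointers)
        score = new_score
    state = max(STATES, key=score.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

Because low-quality reads contribute flatter emission terms, they pull less on the decoded genotype than high-quality reads, which is the intuition behind weighting observations by their confidence.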
Project description:Rare germ-line mutations in the coding regions of the human EPHA2 gene (EPHA2) have been associated with inherited forms of pediatric cataract, whereas frequent, non-coding, single nucleotide variants (SNVs) have been associated with age-related cataract. Here we sought to determine if germ-line EPHA2 coding SNVs were associated with age-related cataract in a case-control DNA panel (> 50 years) and if somatic EPHA2 coding SNVs were associated with lens aging and/or cataract in a post-mortem lens DNA panel (> 48 years). Micro-fluidic PCR amplification followed by targeted amplicon (exon) next-generation (deep) sequencing of EPHA2 (17 exons) afforded high read-depth coverage (1000x) for > 82% of reads in the cataract case-control panel (161 cases, 64 controls) and > 70% of reads in the post-mortem lens panel (35 clear lens pairs, 22 cataract lens pairs). Novel and reference (known) missense SNVs in EPHA2 that were predicted in silico to be functionally damaging were found in both cases and controls from the age-related cataract panel at variant allele frequencies (VAFs) consistent with germ-line transmission (VAF > 20%). Similarly, both novel and reference missense SNVs in EPHA2 were found in the post-mortem lens panel at VAFs consistent with a somatic origin (VAF > 3%). The majority of SNVs found in the cataract case-control panel and post-mortem lens panel were transitions, and many occurred at di-pyrimidine sites that are susceptible to ultraviolet (UV) radiation-induced mutation. These data suggest that novel germ-line (blood) and somatic (lens) coding SNVs in EPHA2 that are predicted to be functionally deleterious occur in adults over 50 years of age. However, both types of EPHA2 coding variants were present at comparable levels in individuals with or without age-related cataract, making simple genotype-phenotype correlations inconclusive.