ChimericSeq: An open-source, user-friendly interface for analyzing NGS data to identify and characterize viral-host chimeric sequences.
ABSTRACT: Identification of viral integration sites has been important in understanding the pathogenesis and progression of diseases associated with particular viral infections. The advent of next-generation sequencing (NGS) has enabled researchers to understand the impact that viral integration has on the host, such as tumorigenesis. Current computational methods to analyze NGS data of virus-host junction sites have been limited in terms of their accessibility to a broad user base. In this study, we developed a software application (named ChimericSeq), that is the first program of its kind to offer a graphical user interface, compatibility with both Windows and Mac operating systems, and optimized for effectively identifying and annotating virus-host chimeric reads within NGS data. In addition, ChimericSeq's pipeline implements custom filtering to remove artifacts and detect reads with quantitative analytical reporting to provide functional significance to discovered integration sites. The improved accessibility of ChimericSeq through a GUI interface in both Windows and Mac has potential to expand NGS analytical support to a broader spectrum of the scientific community.
Project description:Although next-generation sequencing (NGS) technology provides a comprehensive means with which to identify potential pathogens from clinical specimens, simple and user-friendly bioinformatics pipelines are expected to obtain the entire viral genome sequence, subsequently providing traceability, based on extensive molecular phylogenetic analyses. We have developed a web-based integrated NGS analysis tool for the viral genome (virus genome-targeted assembly pipeline: VirusTAP), which includes extensive sequence subtraction of host- or bacteria-related NGS reads prior to de novo assembly, leading to the prompt and accurate assembly of viral genome sequences from metagenomic NGS reads. The VirusTAP web site is at https://gph.niid.go.jp/cgi-bin/virustap/index.cgi/.
Project description:Recent advances in Next-Generation Sequencing (NGS) make comparative analyses of the composition and diversity of whole microbial communities possible at a far greater depth than ever before. This brings new challenges, such as an increased dependence on computation to process these huge datasets. The demand on system resources usually requires migrating from Windows to Linux-based operating systems and prior familiarity with command-line interfaces. To overcome this barrier, we developed a fully automated and easy-to-install package as well as a complete, easy-to-follow pipeline for microbial metataxonomic analysis operating in the Windows Subsystem for Linux (WSL)-Bioinformatics Through Windows (BTW). BTW combines several open-access tools for processing marker gene data, including 16S rRNA, bringing the user from raw sequencing reads to diversity-related conclusions. It includes data quality filtering, clustering, taxonomic assignment and further statistical analyses, directly in WSL, avoiding the prior need of migrating from Windows to Linux. BTW is expected to boost the use of NGS amplicon data by facilitating rapid access to a set of bioinformatics tools for Windows users. Moreover, several Linux command line tools became more reachable, which will enhance bioinformatics accessibility to a wider range of researchers and practitioners in the life sciences and medicine. BTW is available in GitHub (https://github.com/vpylro/BTW). The package is freely available for noncommercial users.
Project description:Viral integration into a host genome is defined by two chimeric junctions that join viral and host DNA. Recently, computational tools have been developed that utilize NGS data to detect chimeric junctions. These methods identify individual viral-host junctions but do not associate chimeric pairs as an integration event. Without knowing the chimeric boundaries of an integration, its genetic content cannot be determined.Summonchimera is a Perl program that associates chimera pairs to infer the complete viral genomic integration event to the nucleotide level within single or paired-end NGS data. SummonChimera integration prediction was verified on a set of single-end IonTorrent reads from a purified Salmonella bacterium with an integrated bacteriophage. Furthermore, SummonChimera predicted integrations from experimentally verified Hepatitis B Virus chimeras within a paired-end Whole Genome Sequencing hepatocellular carcinoma tumor database.SummonChimera identified all experimentally verified chimeras detected by current computational methods. Further, SummonChimera integration inference precisely predicted bacteriophage integration. The application of SummonChimera to cancer NGS accurately identifies deletion of host and viral sequence during integration. The precise nucleotide determination of an integration allows prediction of viral and cellular gene transcription patterns.
Project description:SUMMARY: Characterizing genetic diversity through genotyping short amplicons is central to evolutionary biology. Next-generation sequencing (NGS) technologies changed the scale at which these type of data are acquired. SESAME is a web application package that assists genotyping of multiplexed individuals for several markers based on NGS amplicon sequencing. It automatically assigns reads to loci and individuals, corrects reads if standard samples are available and provides an intuitive graphical user interface (GUI) for allele validation based on the sequences and associated decision-making tools. The aim of SESAME is to help allele identification among a large number of sequences. AVAILABILITY: SESAME and its documentation are freely available under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported Licence for Windows and Linux from http://www1.montpellier.inra.fr/CBGP/NGS/ or http://tinyurl.com/ngs-sesame.
Project description:Tn-seq is a high throughput technique for analysis of transposon mutant libraries. Tn-seq Explorer was developed as a convenient and easy-to-use package of tools for exploration of the Tn-seq data. In a typical application, the user will have obtained a collection of sequence reads adjacent to transposon insertions in a reference genome. The reads are first aligned to the reference genome using one of the tools available for this task. Tn-seq Explorer reads the alignment and the gene annotation, and provides the user with a set of tools to investigate the data and identify possibly essential or advantageous genes as those that contain significantly low counts of transposon insertions. Emphasis is placed on providing flexibility in selecting parameters and methodology most appropriate for each particular dataset. Tn-seq Explorer is written in Java as a menu-driven, stand-alone application. It was tested on Windows, Mac OS, and Linux operating systems. The source code is distributed under the terms of GNU General Public License. The program and the source code are available for download at http://www.cmbl.uga.edu/downloads/programs/Tn_seq_Explorer/ and https://github.com/sina-cb/Tn-seqExplorer.
Project description:BACKGROUND:Bioinformatics tools designed to identify lentiviral or retroviral vector insertion sites in the genome of host cells are used to address the safety and long-term efficacy of hematopoietic stem cell gene therapy applications and to study the clonal dynamics of hematopoietic reconstitution. The increasing number of gene therapy clinical trials combined with the increasing amount of Next Generation Sequencing data, aimed at identifying integration sites, require both highly accurate and efficient computational software able to correctly process "big data" in a reasonable computational time. RESULTS:Here we present VISPA2 (Vector Integration Site Parallel Analysis, version 2), the latest optimized computational pipeline for integration site identification and analysis with the following features: (1) the sequence analysis for the integration site processing is fully compliant with paired-end reads and includes a sequence quality filter before and after the alignment on the target genome; (2) an heuristic algorithm to reduce false positive integration sites at nucleotide level to reduce the impact of Polymerase Chain Reaction or trimming/alignment artifacts; (3) a classification and annotation module for integration sites; (4) a user friendly web interface as researcher front-end to perform integration site analyses without computational skills; (5) the time speedup of all steps through parallelization (Hadoop free). CONCLUSIONS:We tested VISPA2 performances using simulated and real datasets of lentiviral vector integration sites, previously obtained from patients enrolled in a hematopoietic stem cell gene therapy clinical trial and compared the results with other preexisting tools for integration site analysis. On the computational side, VISPA2 showed a?>?6-fold speedup and improved precision and recall metrics (1 and 0.97 respectively) compared to previously developed computational pipelines. These performances indicate that VISPA2 is a fast, reliable and user-friendly tool for integration site analysis, which allows gene therapy integration data to be handled in a cost and time effective fashion. Moreover, the web access of VISPA2 ( http://openserver.itb.cnr.it/vispa/ ) ensures accessibility and ease of usage to researches of a complex analytical tool. We released the source code of VISPA2 in a public repository ( https://bitbucket.org/andreacalabria/vispa2 ).
Project description:Clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated protein (Cas) system facilitates targeted genome editing in organisms. Despite high demand of this system, finding a reliable tool for the determination of specific target sites in large genomic data remained challenging. Here, we report SSFinder, a python script to perform high throughput detection of specific target sites in large nucleotide datasets. The SSFinder is a user-friendly tool, compatible with Windows, Mac OS, and Linux operating systems, and freely available online.
Project description:Fueled by widespread applications of high-throughput next generation sequencing (NGS) technologies and urgent need to counter threats of pathogenic viruses, large-scale studies were conducted recently to investigate virus integration in host genomes (for example, human tumor genomes) that may cause carcinogenesis or other diseases. A limiting factor in these studies, however, is rapid virus evolution and resulting polymorphisms, which prevent reads from aligning readily to commonly used virus reference genomes, and, accordingly, make virus integration sites difficult to detect. Another confounding factor is host genomic instability as a result of virus insertions. To tackle these challenges and improve our capability to identify cryptic virus-host fusions, we present a new approach that detects Virus intEgration sites through iterative Reference SEquence customization (VERSE). To the best of our knowledge, VERSE is the first approach to improve detection through customizing reference genomes. Using 19 human tumors and cancer cell lines as test data, we demonstrated that VERSE substantially enhanced the sensitivity of virus integration site detection. VERSE is implemented in the open source package VirusFinder 2 that is available at http://bioinfo.mc.vanderbilt.edu/VirusFinder/.
Project description:Next generation sequencing (NGS) technologies have revolutionized the genomics field and are becoming more commonplace for identification of human infectious diseases. However, due to the low abundance of viral nucleic acids (NAs) in relation to host, viral identification using direct NGS technologies often lacks sufficient sensitivity. Here, we describe an approach based on two complementary enrichment strategies that significantly improves the sensitivity of NGS-based virus identification. To start, we developed two sets of DNA probes to enrich virus NAs associated with respiratory diseases. The first set of probes spans the genomes, allowing for identification of known viruses and full genome sequencing, while the second set targets regions conserved among viral families or genera, providing the ability to detect both known and potentially novel members of those virus groups. Efficiency of enrichment was assessed by NGS testing reference virus and clinical samples with known infection. We show significant improvement in viral identification using enriched NGS compared to unenriched NGS. Without enrichment, we observed an average of 0.3% targeted viral reads per sample. However, after enrichment, 50%-99% of the reads per sample were the targeted viral reads for both the reference isolates and clinical specimens using both probe sets. Importantly, dramatic improvements on genome coverage were also observed following virus-specific probe enrichment. The methods described here provide improved sensitivity for virus identification by NGS, allowing for a more comprehensive analysis of disease etiology.
Project description:We report the development of a novel method called EpiVIA (https://github.com/VahediLab/epiVIA) for the joint profiling of the chromatin accessibility and lentiviral integration site analysis at the population and single-cell levels. We validate our technique in clonal cells with previously defined integration sites and further demonstrate the ability to measure lentiviral integration sites and chromatin accessibility of host and viral genomes at the single-cell resolution in CAR-T cells. We anticipate that EpiVIA will enable the single-cell deconstruction of gene regulation during CAR-T therapy, leading to the discovery of cellular factors associated with durable treatment. Overall design: Idenfification of integration sites in two clonal cells with ATAC-seq.