Dataset Information

GAAP: A Genome Assembly + Annotation Pipeline.


ABSTRACT: Genomic analysis begins with de novo assembly of short-read fragments to reconstruct full-length base sequences without relying on a reference genome sequence. Then, in the annotation step, gene locations are identified within the base sequences, and the structures and functions of these genes are determined. Recently, a wide range of powerful tools has been developed and published for whole-genome analysis, enabling even individual researchers in small laboratories to perform whole-genome analyses on their organisms of interest. However, these analytical tools are generally complex and use diverse algorithms, parameter-setting methods, and input formats; thus, it remains difficult for individual researchers to select, use, and combine these tools to obtain their final results. To resolve these issues, we have developed a genome analysis pipeline (GAAP) for semiautomated, iterative, and high-throughput analysis of whole-genome data. This pipeline is designed to perform read correction, de novo genome (transcriptome) assembly, gene prediction, and functional annotation using a range of proven tools and databases. We aim to assist non-IT researchers by describing each stage of analysis in detail and discussing current approaches. We also provide practical advice on how to access and use the bioinformatics tools and databases and how to implement the provided suggestions. Whole-genome analysis of Toxocara canis is used as a case study to show intermediate results at each stage, demonstrating the practicality of the proposed method.
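The four stages named in the abstract (read correction, de novo assembly, gene prediction, functional annotation) can be pictured as a chain in which each stage consumes the previous stage's output and every intermediate result is kept for inspection. The following is only a minimal sketch of that idea, not GAAP's actual implementation: the stage functions are hypothetical placeholders that build a label instead of invoking any real bioinformatics tool, and the names `STAGE_NAMES`, `placeholder_stage`, and `run_pipeline` are assumptions made for illustration.

```python
from typing import Callable, List, Tuple

# Stage order as listed in the abstract; the identifiers are illustrative.
STAGE_NAMES = [
    "read_correction",
    "de_novo_assembly",
    "gene_prediction",
    "functional_annotation",
]


def placeholder_stage(stage_name: str) -> Callable[[str], str]:
    """Return a hypothetical stage that only labels its input.

    A real pipeline would call an external tool here; this stand-in
    just records which stage processed which artifact.
    """
    def run(previous_artifact: str) -> str:
        return f"{stage_name}({previous_artifact})"
    return run


PIPELINE: List[Tuple[str, Callable[[str], str]]] = [
    (name, placeholder_stage(name)) for name in STAGE_NAMES
]


def run_pipeline(
    pipeline: List[Tuple[str, Callable[[str], str]]],
    raw_input: str,
) -> List[Tuple[str, str]]:
    """Run the stages in order and keep every intermediate result."""
    results = []
    current = raw_input
    for name, stage in pipeline:
        current = stage(current)          # output feeds the next stage
        results.append((name, current))   # retained for inspection
    return results


if __name__ == "__main__":
    for name, artifact in run_pipeline(PIPELINE, "reads"):
        print(f"{name}: {artifact}")
```

Keeping the per-stage results in a list mirrors the abstract's emphasis on showing intermediate results at each stage; a researcher could inspect any entry before deciding whether to rerun a stage with different parameters.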

SUBMITTER: Kong J 

PROVIDER: S-EPMC6617929 | biostudies-literature | 2019

REPOSITORIES: biostudies-literature

Publications

GAAP: A Genome Assembly + Annotation Pipeline.

Kong Jinhwa, Huh Sun, Won Jung-Im, Yoon Jeehee, Kim Baeksop, Kim Kiyong

BioMed Research International, 2019-06-26


Similar Datasets

S-EPMC8297458 | biostudies-literature
S-EPMC4828917 | biostudies-literature
S-EPMC5001611 | biostudies-literature
S-EPMC4735447 | biostudies-literature
S-EPMC5860143 | biostudies-literature
S-EPMC7520038 | biostudies-literature
S-EPMC4410580 | biostudies-literature
S-EPMC3280279 | biostudies-literature
S-EPMC6710283 | biostudies-literature
S-EPMC8329486 | biostudies-literature