Dataset Information

Evaluating the effect of reference genome divergence on the analysis of empirical RADseq datasets.


ABSTRACT: The advent of high-throughput sequencing (HTS) has made genomic-level analyses feasible for nonmodel organisms. A critical step of many HTS pipelines involves aligning reads to a reference genome to identify variants. Despite recent initiatives, only a fraction of species have publicly available reference genomes. Therefore, a common practice is to align reads to the genome of an organism related to the target species; however, this could affect read alignment and bias genotyping. In this study, I conducted an experiment using empirical RADseq datasets generated for two species of salmonids (Actinopterygii; Teleostei; Salmonidae) to evaluate these effects. There are currently reference genomes for six salmonids of varying phylogenetic distance. I aligned the RADseq data to all six genomes and identified variants with several different genotypers; the resulting variants were then fed into population genetic analyses. Increasing phylogenetic distance between target species and reference genome reduced both the proportion of reads that aligned successfully and mapping quality. The reference genome also influenced the number of SNPs that were generated and the depth at those SNPs, although the effect varied by genotyper. Inferences of population structure were mixed: increasing reference genome divergence reduced estimates of differentiation, but similar patterns of population relationships were found across scenarios. These findings reveal how the choice of reference genome can influence the output of bioinformatic pipelines. They also emphasize the need to identify best practices and guidelines for the burgeoning field of biodiversity genomics.

SUBMITTER: Bohling J 

PROVIDER: S-EPMC7391306 | biostudies-literature | 2020 Jul

REPOSITORIES: biostudies-literature

Publications

Evaluating the effect of reference genome divergence on the analysis of empirical RADseq datasets.

Bohling Justin J  

Ecology and evolution 20200628 14


Similar Datasets

| S-EPMC7469296 | biostudies-literature
| S-EPMC8578599 | biostudies-literature
| S-EPMC4158716 | biostudies-other
| S-EPMC5052017 | biostudies-literature