Dataset Information


The Challenges of Analysing Highly Diverse Picobirnavirus Sequence Data

ABSTRACT: The reliable identification and classification of infectious diseases is critical for understanding their biology and controlling their impact. Recent advances in sequencing technology have allowed insight into the remarkable diversity of the virosphere, of which a large component remains undiscovered. For these emerging or undescribed viruses, the process of classifying unknown sequences is heavily reliant on existing nucleotide sequence information in public databases. However, due to the enormous diversity of viruses, and past focus on the most prevalent and impactful virus types, databases are often incomplete. Picobirnaviridae is a dsRNA virus family with broad host and geographic range, but with relatively little sequence information in public databases. The family contains one genus, Picobirnavirus, which may be associated with gastric illness in humans and animals. Little further information is available due in part to difficulties in identification. Here, we investigate diversity both within the genus Picobirnavirus and among other dsRNA virus types using a combined phylogenetic and functional (protein structure homology-modelling) approach. Our results show that diversity within picobirnavirus exceeds that seen between many other dsRNA genera. Furthermore, we find that commonly used practices employed to classify picobirnavirus, such as analysis of short fragments and trimming of sequences, can influence phylogenetic conclusions. The degree of phylogenetic and functional divergence among picobirnavirus sequences in our study suggests an enormous undiscovered diversity, which contributes to the undescribed "viral dark matter" component of metagenomic studies.


PROVIDER: S-EPMC6316005 | BioStudies | 2018-01-01T00:00:00Z


REPOSITORIES: biostudies

Similar Datasets

2009-01-01 | S-EPMC2693148 | BioStudies
2020-01-01 | S-EPMC7102571 | BioStudies
2010-01-01 | S-EPMC3664192 | BioStudies
2011-01-01 | S-EPMC3077240 | BioStudies
2017-01-01 | S-EPMC5223137 | BioStudies
2012-01-01 | S-EPMC3372223 | BioStudies
| PRJNA788895 | ENA
2020-01-01 | S-EPMC7533066 | BioStudies
2012-01-01 | S-EPMC3407116 | BioStudies
2000-01-01 | S-EPMC92031 | BioStudies