
Dataset Information


Alignment-free genome tree inference by learning group-specific distance metrics.


ABSTRACT: Understanding the evolutionary relationships between organisms is vital for their in-depth study. Gene-based methods are often used to infer such relationships, but they are not without drawbacks. Because of the ever-increasing number of available genomes, one can now attempt to use genome-scale information. This opportunity also presents a challenge in terms of computational efficiency. Two fundamentally different approaches are commonly employed for sequence comparison: alignment-based and alignment-free methods. Alignment-free methods rely on the genome signature concept and provide a computationally efficient approach that is also applicable to nonhomologous sequences. The genome signature carries evolutionary signal, as it is more similar for closely related organisms than for distantly related ones. We used genome-scale sequence information to infer taxonomic distances between organisms without additional information such as gene annotations. We propose a method to improve genome tree inference by learning specific distance metrics over the genome signature for groups of organisms with similar phylogenetic, genomic, or ecological properties. Specifically, our method learns a Mahalanobis metric for a set of genomes, using a reference taxonomy to guide the learning process. Applying this method to more than a thousand prokaryotic genomes, we showed that better distance metrics could indeed be learned for most of the 18 groups of organisms tested here. Once a group-specific metric is available, it can be used to estimate the taxonomic distances for other sequenced organisms from the group. This study also presents a large-scale comparison between 10 methods: 9 alignment-free and 1 alignment-based.
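The central idea of the abstract, comparing genomes by a Mahalanobis distance over their genome signatures, can be illustrated with a minimal sketch. It assumes the genome signature is a relative k-mer frequency vector (tetranucleotides by default) and represents the metric through a factor matrix L with M = L^T L, which keeps M positive semidefinite. The helper names `genome_signature`, `mahalanobis`, and `identity` are hypothetical, and the sketch does not reproduce the authors' learning procedure, which fits the metric for a group of genomes using a reference taxonomy.

```python
from itertools import product
import math


def genome_signature(seq, k=4):
    """Hypothetical helper: relative k-mer frequency vector over ACGT.

    k-mers containing characters other than A, C, G, T are skipped.
    Entries follow the lexicographic order produced by itertools.product.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # ignore ambiguous bases such as N
            counts[j] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]


def mahalanobis(x, y, L):
    """Distance sqrt((x - y)^T M (x - y)) with M = L^T L.

    Computed as the Euclidean norm of L(x - y), which equals the
    Mahalanobis form without building M explicitly.
    """
    d = [a - b for a, b in zip(x, y)]
    projected = [sum(row[j] * d[j] for j in range(len(d))) for row in L]
    return math.sqrt(sum(p * p for p in projected))


def identity(n):
    """n x n identity matrix as nested lists."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
```

With L set to the identity matrix, the distance reduces to the ordinary Euclidean distance between signatures, for example `mahalanobis(genome_signature(a), genome_signature(b), identity(256))` for two sequences `a` and `b` with the default k = 4. Learning a group-specific metric amounts to choosing a different L so that signature distances better track the reference taxonomy for that group.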

SUBMITTER: Patil KR 

PROVIDER: S-EPMC3762195 | biostudies-literature | 2013

REPOSITORIES: biostudies-literature


Publications

Alignment-free genome tree inference by learning group-specific distance metrics.

Patil Kaustubh R, McHardy Alice C

Genome Biology and Evolution, 2013, issue 8



Similar Datasets

S-EPMC3478627 | biostudies-literature
S-EPMC6220562 | biostudies-literature
S-EPMC4265526 | biostudies-literature
S-EPMC3202569 | biostudies-literature
S-EPMC8204903 | biostudies-literature
S-EPMC3444837 | biostudies-literature
S-EPMC6501302 | biostudies-literature
S-EPMC7708075 | biostudies-literature
S-EPMC6374904 | biostudies-literature