Dataset Information

AncestralClust: clustering of divergent nucleotide sequences by ancestral sequence reconstruction using phylogenetic trees.

ABSTRACT:

Motivation

Clustering is a fundamental task in the analysis of nucleotide sequences. Despite the exponential increase in the size of sequence databases of homologous genes, few methods exist to cluster divergent sequences. Traditional clustering methods have mostly focused on optimizing high-speed clustering of highly similar sequences. We develop a phylogenetic clustering method that infers ancestral sequences for a set of initial clusters and then uses a greedy algorithm to cluster sequences.

Results

We describe a clustering program, AncestralClust, developed for clustering divergent sequences. We compare this method with other state-of-the-art clustering methods using datasets of homologous sequences from different species. We show that, in divergent datasets, AncestralClust has higher accuracy and more even cluster sizes than current popular methods.

Availability and implementation

AncestralClust is an Open Source program available at https://github.com/lpipes/ancestralclust.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Pipes L

PROVIDER: S-EPMC8756197 | biostudies-literature | 2022 Jan

REPOSITORIES: biostudies-literature

Publications

AncestralClust: clustering of divergent nucleotide sequences by ancestral sequence reconstruction using phylogenetic trees.

Pipes, Lenore; Nielsen, Rasmus

Bioinformatics (Oxford, England), 2022-01-01, issue 3

PMID: 34668516

Similar Datasets

TreeCluster: Clustering biological sequences using phylogenetic trees.
| S-EPMC6705769 | biostudies-literature

Robustness of ancestral sequence reconstruction to phylogenetic uncertainty.
| S-EPMC2922618 | biostudies-literature

Dianthus ancestral sequence reconstruction
| PRJEB54098 | ENA

Divergent Evolution of a Protein-Protein Interaction Revealed through Ancestral Sequence Reconstruction and Resurrection.
| S-EPMC7782867 | biostudies-literature

Evaluation of Ancestral Sequence Reconstruction Methods to Infer Nonstationary Patterns of Nucleotide Substitution.
| S-EPMC4512549 | biostudies-literature

Reconstruction of human phylogenetic trees using single-cell genome sequencing
| EGAS00001004824 | EGA

Bayesian inference of ancestral dates on bacterial phylogenetic trees.
| S-EPMC6294524 | biostudies-literature

Phylogenetic trees of closely related bacterial species and subspecies based on frequencies of short nucleotide sequences.
| S-EPMC10118083 | biostudies-literature

Alignment Modulates Ancestral Sequence Reconstruction Accuracy.
| S-EPMC5995191 | biostudies-literature

A Novel Approach to Clustering Genome Sequences Using Inter-nucleotide Covariance.
| S-EPMC6465635 | biostudies-literature