Dataset Information

K2 and K2*: efficient alignment-free sequence similarity measurement based on Kendall statistics.


ABSTRACT: Motivation: Alignment-free sequence comparison methods can compute the pairwise similarity between a huge number of sequences much faster than sequence-alignment based methods. Results: We propose a new non-parametric alignment-free sequence comparison method, called K2, based on the Kendall statistics. Compared with other state-of-the-art alignment-free comparison methods, K2 demonstrates competitive performance in generating phylogenetic trees, in evaluating functionally related regulatory sequences, and in computing the edit distance (similarity/dissimilarity) between sequences. Furthermore, the K2 approach is much faster than the other methods. An improved method, K2*, is also proposed, which is able to determine the appropriate algorithmic parameter (length) automatically, without first considering different values. Comparative analysis with state-of-the-art alignment-free sequence similarity methods demonstrates the superiority of the proposed approaches, especially with increasing sequence length or increasing dataset size. Availability and implementation: The K2 and K2* approaches are implemented in the R language as a package that is freely available for open access (http://community.wvu.edu/daadjeroh/projects/K2/K2_1.0.tar.gz). Contact: yueljiang@163.com. Supplementary information: Supplementary data are available at Bioinformatics online.
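
The abstract does not spell out how K2 is computed, so the following is not the K2 algorithm. It is only a minimal sketch of the general idea the abstract points to: summarise each sequence without alignment (here, as overlapping k-mer counts) and compare the summaries with a Kendall rank statistic. The function names (`kmer_counts`, `kendall_tau_b`, `kendall_similarity`), the use of k-mer counts, the DNA alphabet, and the default `k=3` are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_counts(seq, k, alphabet="ACGT"):
    """Count overlapping k-mers; return counts in a fixed lexicographic order.

    Assumption: sequences are DNA over the given alphabet. K-mers containing
    other characters are simply ignored by the lookup below.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(p)] for p in product(alphabet, repeat=k)]


def kendall_tau_b(x, y):
    """Kendall's tau-b (tie-corrected) via the O(n^2) pairwise definition."""
    concordant = discordant = ties_x_only = ties_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both vectors: contributes to neither term
            elif dx == 0:
                ties_x_only += 1
            elif dy == 0:
                ties_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denom = sqrt((untied + ties_x_only) * (untied + ties_y_only))
    return (concordant - discordant) / denom if denom else 0.0


def kendall_similarity(seq_a, seq_b, k=3):
    """Hypothetical helper: rank correlation of two k-mer count profiles."""
    return kendall_tau_b(kmer_counts(seq_a, k), kmer_counts(seq_b, k))


if __name__ == "__main__":
    a = "ACGTACGGTACCGTTAGC"
    b = "ACGTACGGTACCGATAGC"
    print(kendall_similarity(a, b, k=2))
```

The pairwise loop is quadratic in the number of distinct k-mers (4^k for DNA), which is fine for small k but would need a faster tau implementation for larger values. The package linked in the abstract is the reference for the actual K2 and K2* methods.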

SUBMITTER: Lin J 

PROVIDER: S-EPMC6355110 | biostudies-literature | 2018 May

REPOSITORIES: biostudies-literature

Publications

K2 and K2*: efficient alignment-free sequence similarity measurement based on Kendall statistics.

Lin J, Adjeroh DA, Jiang BH, Jiang Y

Bioinformatics (Oxford, England), 2018-05-01, issue 10


Similar Datasets

| S-EPMC2818754 | biostudies-literature
| S-EPMC3123933 | biostudies-literature
| S-EPMC6403383 | biostudies-literature
| S-EPMC3146591 | biostudies-literature
| S-EPMC4017329 | biostudies-literature
| S-EPMC1131888 | biostudies-literature
| S-EPMC6391537 | biostudies-literature
| S-EPMC4528633 | biostudies-literature