Dataset Information

Alignment-free method for DNA sequence clustering using Fuzzy integral similarity.


ABSTRACT: The growing volume of sequence data in private and public databases produced by next-generation sequencing poses new challenges because of the limitations of alignment-based methods for sequence comparison. There is therefore a strong need for faster sequence analysis algorithms. In this study, we developed an alignment-free algorithm for faster sequence analysis. The novelty of our approach is the combination of a fuzzy integral with a Markov chain for sequence analysis in an alignment-free model. The method estimates the parameters of a Markov chain from the frequencies of occurrence of all possible nucleotide pairs in each DNA sequence. These estimated Markov chain parameters are used to calculate the similarity between all pairwise combinations of DNA sequences with a fuzzy integral algorithm. The resulting matrix is used as input for the neighbor program in the PHYLIP package for phylogenetic tree construction. Our method was tested on eight benchmark datasets and on in-house generated datasets (18S rDNA sequences from 11 arbuscular mycorrhizal fungi (AMF) and 16S rDNA sequences of 40 bacterial isolates from plant interiors). The results indicate that the fuzzy integral algorithm is an efficient and feasible alignment-free method for sequence analysis on the genomic scale.
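
The parameter-estimation step described in the abstract can be illustrated with a minimal Python sketch. It counts overlapping adjacent nucleotide pairs in one DNA sequence and normalizes each row to obtain first-order Markov transition probabilities. The function name `transition_matrix`, the row normalization, and the handling of ambiguous bases and empty rows are assumptions made for illustration; the record does not specify these details, and the fuzzy integral similarity step is not reproduced here.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"


def transition_matrix(seq):
    """Estimate first-order Markov transition probabilities from adjacent-pair counts.

    Hypothetical helper: the name and normalization are illustrative assumptions,
    not the published implementation.
    """
    seq = seq.upper()
    # Count overlapping adjacent pairs, skipping pairs with a non-ACGT symbol (e.g. N).
    counts = Counter(
        (a, b)
        for a, b in zip(seq, seq[1:])
        if a in NUCLEOTIDES and b in NUCLEOTIDES
    )
    matrix = {}
    for a in NUCLEOTIDES:
        row_total = sum(counts[(a, b)] for b in NUCLEOTIDES)
        # Assumption: a row with no observed pairs is left as all zeros.
        matrix[a] = {
            b: (counts[(a, b)] / row_total if row_total else 0.0)
            for b in NUCLEOTIDES
        }
    return matrix


if __name__ == "__main__":
    for base, row in transition_matrix("AACA").items():
        print(base, row)
```

Each sequence is reduced to a fixed 4 x 4 table regardless of its length, which is what allows sequences to be compared pairwise without alignment.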

SUBMITTER: Saw AK 

PROVIDER: S-EPMC6403383 | biostudies-literature

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC6391537 | biostudies-literature
| S-EPMC7446192 | biostudies-literature
| S-EPMC8289385 | biostudies-literature
| S-EPMC3384675 | biostudies-literature
| S-EPMC1774579 | biostudies-literature
| S-EPMC6355110 | biostudies-literature
| S-EPMC2722654 | biostudies-literature