
Dataset Information


Assessing the efficiency of multiple sequence alignment programs.


ABSTRACT:

Background

Multiple sequence alignment (MSA) is an extremely useful tool for molecular and evolutionary biology, and several programs and algorithms are available for this purpose. Although previous studies have compared the alignment accuracy of different MSA programs, their computational time and memory usage have not been systematically evaluated. Given the unprecedented amount of data produced by next-generation deep sequencing platforms and the increasing demand for large-scale data analysis, it is imperative to use software efficiently. Therefore, the balance between alignment accuracy and computational cost has become a critical criterion for choosing the most suitable MSA program. We compared both the accuracy and the cost of nine popular MSA programs, namely CLUSTALW, CLUSTAL OMEGA, DIALIGN-TX, MAFFT, MUSCLE, POA, Probalign, Probcons and T-Coffee, against the benchmark alignment dataset BAliBASE, and discuss the relevance of some implementation choices embedded in each program's algorithm. Alignment accuracy was calculated with the two standard scoring functions provided by BAliBASE, the sum-of-pairs (SP) and total-column (TC) scores, and computational cost was determined by recording peak memory usage and execution time.
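The two BAliBASE scores can be illustrated with a minimal Python sketch. The sum-of-pairs score is the fraction of residue pairs aligned in the reference that the test alignment also aligns; the total-column score is the fraction of reference columns that the test alignment reproduces exactly. The sketch assumes both alignments are equal-length gapped strings with `-` as the gap character, holding the same sequences in the same order. Unlike the official BAliBASE scoring, which can restrict the calculation to annotated core blocks, it scores every column. All function names are hypothetical.

```python
from itertools import combinations

GAP = "-"


def column_residues(alignment):
    """Map each column to the set of (sequence index, residue index) it contains."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    next_residue = [0] * len(alignment)  # index of the next ungapped residue per row
    columns = []
    for col in range(len(alignment[0])):
        residues = set()
        for seq, row in enumerate(alignment):
            if row[col] != GAP:
                residues.add((seq, next_residue[seq]))
                next_residue[seq] += 1
        columns.append(frozenset(residues))
    return columns


def aligned_pairs(columns):
    """All residue pairs that share a column."""
    return {pair for col in columns for pair in combinations(sorted(col), 2)}


def ungapped(alignment):
    return [row.replace(GAP, "") for row in alignment]


def check_same_sequences(test, reference):
    if ungapped(test) != ungapped(reference):
        raise ValueError("test and reference must align the same sequences in the same order")


def sp_score(test, reference):
    """Sum-of-pairs: fraction of reference residue pairs also aligned in the test."""
    check_same_sequences(test, reference)
    ref_pairs = aligned_pairs(column_residues(reference))
    test_pairs = aligned_pairs(column_residues(test))
    return len(ref_pairs & test_pairs) / len(ref_pairs)


def tc_score(test, reference):
    """Total-column: fraction of reference columns (>= 2 residues) reproduced exactly."""
    check_same_sequences(test, reference)
    ref_cols = [c for c in column_residues(reference) if len(c) > 1]
    test_cols = set(column_residues(test))
    return sum(c in test_cols for c in ref_cols) / len(ref_cols)


reference = ["AC-GT", "ACAGT", "A--GT"]
test = ["ACG-T", "ACAGT", "A-G-T"]
print(sp_score(test, reference), tc_score(test, reference))  # 0.8 0.75
```

In this toy example, the test alignment misplaces one G, so it loses two of the ten reference pairs (SP = 0.8) and one of the four multi-residue reference columns (TC = 0.75). TC is therefore the stricter of the two measures.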

Results

Our results indicate that the consistency-based programs Probcons, T-Coffee, Probalign and MAFFT generally outperformed the other programs in accuracy. Whenever sequences with large N/C-terminal extensions were present in the BAliBASE suite, Probalign, MAFFT and also CLUSTAL OMEGA outperformed Probcons and T-Coffee. The drawback of these programs is that they are more memory-greedy and slower than POA, CLUSTALW, DIALIGN-TX and MUSCLE. CLUSTALW and MUSCLE were the fastest programs, with CLUSTALW being the least RAM-demanding program.
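The speed and memory comparisons rest on execution time and peak memory usage. For a single run of one program, both can be approximated with a small Python harness. This is a sketch under stated assumptions, not the authors' benchmarking setup. It is Unix-only, needs Python 3.9+ (for `os.waitstatus_to_exitcode`), and reports wall-clock time and the peak resident set size of the child process. `measure` is a hypothetical helper. In practice `argv` would be an aligner command line such as `["clustalo", "-i", "seqs.fasta", "-o", "aln.fasta"]`, where the file names are placeholders.

```python
import os
import sys
import time


def measure(argv):
    """Run one command; return (wall-clock seconds, peak RSS in MiB, exit code).

    Unix-only sketch: os.wait4 returns the resource usage of this specific child,
    so repeated calls do not mix up measurements from earlier runs.
    """
    start = time.perf_counter()
    pid = os.posix_spawnp(argv[0], argv, os.environ)
    _, status, usage = os.wait4(pid, 0)
    elapsed = time.perf_counter() - start
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    bytes_per_unit = 1 if sys.platform == "darwin" else 1024
    peak_mib = usage.ru_maxrss * bytes_per_unit / 2**20
    return elapsed, peak_mib, os.waitstatus_to_exitcode(status)


if __name__ == "__main__":
    # Stand-in command; replace with the aligner command line being benchmarked.
    seconds, mib, code = measure([sys.executable, "-c", "print('hello')"])
    print(f"{seconds:.3f} s, {mib:.1f} MiB peak, exit code {code}")
```

Waiting on the specific process ID with `os.wait4`, rather than querying `resource.getrusage(resource.RUSAGE_CHILDREN)`, keeps each run's peak memory separate. The latter reports a maximum accumulated over all children already waited for. Timing variance between runs is not addressed here; repeated runs would be needed for stable figures.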

Conclusions

Based on the results presented herein, all four programs, Probcons, T-Coffee, Probalign and MAFFT, are recommended when alignment accuracy is the priority. If multi-core computers are available, T-Coffee and recent versions of MAFFT can deliver faster, reliable alignments, which makes them especially suited for datasets larger than those in the BAliBASE suite. In fact, more programs should probably support parallelization on multi-core computers in the near future, which will certainly improve performance significantly.

SUBMITTER: Pais FS 

PROVIDER: S-EPMC4015676 | biostudies-literature | 2014 Mar

REPOSITORIES: biostudies-literature


Publications

Assessing the efficiency of multiple sequence alignment programs.

Fabiano Sviatopolk-Mirsky Pais, Patrícia de Cássia Ruy, Guilherme Oliveira, Roney Santos Coimbra

Algorithms for Molecular Biology: AMB, 2014-03-06, issue 1



Similar Datasets

| S-EPMC168907 | biostudies-literature
| S-EPMC1087786 | biostudies-literature
| S-EPMC1635699 | biostudies-literature
| S-EPMC3799466 | biostudies-literature
| S-EPMC10773980 | biostudies-literature
| S-EPMC8289385 | biostudies-literature
| S-EPMC4599319 | biostudies-literature
| S-EPMC6151001 | biostudies-literature
| S-EPMC6657586 | biostudies-literature
| S-EPMC3229529 | biostudies-literature