Dataset Information

Blast sampling for structural and functional analyses.

ABSTRACT:

Background

The post-genomic era is characterised by a torrent of biological information flooding the public databases. As a direct consequence, similarity searches starting with a single query sequence frequently identify hundreds, or even thousands, of potential homologues. This huge volume of data makes the subsequent structural, functional and evolutionary analyses very difficult. It is therefore essential to develop new strategies for efficient sampling of this large sequence space, in order to reduce the number of sequences to be processed. At the same time, it is important to retain the most pertinent sequences for structural and functional studies.

Results

An exhaustive analysis of a large-scale test set (284 protein families) was performed to compare the efficiency of four sampling methods aimed at selecting the most pertinent sequences. These four methods sample the proteins detected by BlastP searches and fall into two categories: two customisable methods, in which the user defines either the maximal number or the percentage of sequences to be selected, and two automatic methods, in which the number of sequences selected is determined by the program. We focused our analysis on the potential information content of the sampled sets of sequences, using multiple alignment of complete sequences as the main validation tool. The study considered two criteria: the total number of sequences returned by BlastP and their associated E-values. The subsequent analyses investigated how the sampling methods influence the E-value distributions, the sequence coverage, the final multiple alignment quality and the active site characterisation at various residue conservation thresholds, as a function of these criteria.
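The abstract does not say how each method chooses its sequences, so the following is only a minimal sketch of what the two customisable modes (a maximal number or a percentage of sequences) could look like. It assumes hits are ranked by E-value and then picked at evenly spaced ranks; that selection rule, the `Hit` type and the function names `sample_by_count` and `sample_by_percentage` are all hypothetical and are not taken from the paper.

```python
import math
from typing import List, Tuple

# Hypothetical representation of a BlastP hit: (sequence identifier, E-value).
Hit = Tuple[str, float]


def sample_by_count(hits: List[Hit], max_count: int) -> List[Hit]:
    """Keep at most max_count hits, evenly spaced across the E-value ranking.

    Assumption: even spacing over ranks is an illustrative rule only.
    """
    if max_count <= 0:
        raise ValueError("max_count must be positive")
    ranked = sorted(hits, key=lambda hit: hit[1])  # best (lowest) E-value first
    n = len(ranked)
    if n <= max_count:
        return ranked
    if max_count == 1:
        return ranked[:1]
    # Integer arithmetic keeps the indices strictly increasing and
    # guarantees that the first and last ranked hits are both included.
    return [ranked[(i * (n - 1)) // (max_count - 1)] for i in range(max_count)]


def sample_by_percentage(hits: List[Hit], percentage: float) -> List[Hit]:
    """Keep the given percentage of hits (rounded up, at least one)."""
    if not 0 < percentage <= 100:
        raise ValueError("percentage must be in (0, 100]")
    if not hits:
        return []
    target = max(1, math.ceil(len(hits) * percentage / 100))
    return sample_by_count(hits, target)
```

Spreading the selection over the whole ranking, rather than taking only the top hits, is one way to keep distant homologues in the sample; the automatic methods mentioned above would need a rule for choosing the count itself, which this sketch does not attempt.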

Conclusion

The comparative analysis of the four sampling methods allows us to propose a suitable sampling strategy that significantly reduces the number of homologous sequences required for alignment, while at the same time maintaining the relevant information concerning the active site residues.

SUBMITTER: Friedrich A

PROVIDER: S-EPMC1819393 | biostudies-literature | 2007 Feb

REPOSITORIES: biostudies-literature

Publications

Blast sampling for structural and functional analyses.

Friedrich A, Ripp R, Garnier N, Bettler E, Deléage G, Poch O, Moulinier L

BMC Bioinformatics, 2007-02-23

PMID: 17319945

Similar Datasets

Structural and Functional Analyses of Cone Snail Toxins. | S-EPMC6628382 | biostudies-literature

Structural and Functional Enrichment Analyses for Antimicrobial Peptides. | S-EPMC7699717 | biostudies-literature

The functional and structural changes in the basilar artery due to overpressure blast injury. | S-EPMC4671114 | biostudies-other

Functional and structural analyses of threonine dehydratase from Corynebacterium glutamicum. | S-EPMC207545 | biostudies-other

Developmental evaluation of atypical auditory sampling in dyslexia: Functional and structural evidence. | S-EPMC6869042 | biostudies-literature

Structural and functional analyses of human DDX41 DEAD domain. | S-EPMC5233616 | biostudies-literature

Structural and Functional Analyses of the FAM46C/Plk4 Complex. | S-EPMC7415566 | biostudies-literature

Enhanced sampling and overfitting analyses in structural refinement of nucleic acids into electron microscopy maps. | S-EPMC3690198 | biostudies-literature

Structural and Functional Analyses of Human ChaC2 in Glutathione Metabolism. | S-EPMC7022552 | biostudies-literature

Visualization of comparative genomic analyses by BLAST score ratio. | S-EPMC545078 | biostudies-literature