Dataset Information


HMM-ModE: implementation, benchmarking and validation with HMMER3.

ABSTRACT: BACKGROUND: HMM-ModE is a computational method that generates family-specific profile HMMs using negative training sequences. The method optimizes the discrimination threshold using 10-fold cross-validation and modifies the emission probabilities of profiles to reduce common fold-based signals shared with other sub-families. The protocol depends on the program HMMER for HMM profile building and sequence database searching. The recent release of HMMER3 has improved database search speed by several orders of magnitude, allowing large-scale deployment of the method in sequence annotation projects. We have rewritten our existing scripts, both for parsing the HMM profiles and for modifying emission probabilities, to upgrade HMM-ModE to HMMER3 and take advantage of its probabilistic inference and high computational speed. The method is benchmarked and tested on a GPCR dataset as an accurate and fast method for functional annotation.

RESULTS: The implementation of this method, which now works with HMMER3, is benchmarked against the earlier version of HMMER, showing that the effect of local-local alignments is marked only for profiles containing a large number of discontinuous match states. The method is tested on a gold standard set of families, and we report a significant reduction in the number of false positive hits compared with the default HMM profiles. When applied to GPCR sequences, the method improved classification accuracy compared with other methods used to classify the family at different levels of its classification hierarchy.

CONCLUSIONS: The present findings show that the new version of HMM-ModE is a highly specific method for differentiating between fold-specific (superfamily) and function-specific (family) signals, which helps in the functional annotation of protein sequences.
The use of modified profile HMMs of GPCR sequences provides a simple yet highly specific method for classifying the family, predicting sub-family-specific sequences with high accuracy even when sequences share common physicochemical characteristics across sub-families.
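The abstract states that HMM-ModE optimizes its discrimination threshold with 10-fold cross-validation. The Python sketch below illustrates that idea under stated assumptions; it is not the HMM-ModE code. It assumes that bit scores for positive (family) and negative (other sub-family) training sequences are already available, for example from an `hmmsearch` run, that the cut-off is chosen by maximizing the Matthews correlation coefficient (MCC), and that the final threshold is the mean of the per-fold optima. All function names are hypothetical, and the per-fold profile rebuilding that a real pipeline would perform is omitted.

```python
import math
import random


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def confusion(pos, neg, threshold):
    """Treat every score at or above the threshold as a predicted family member."""
    tp = sum(s >= threshold for s in pos)
    fp = sum(s >= threshold for s in neg)
    return tp, fp, len(neg) - fp, len(pos) - tp


def best_threshold(pos, neg):
    """Hypothetical criterion: pick the observed score that maximizes MCC."""
    candidates = sorted(set(pos) | set(neg))
    return max(candidates, key=lambda t: mcc(*confusion(pos, neg, t)))


def k_folds(items, k, rng):
    """Shuffle and split items into k roughly equal folds."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def cross_validated_threshold(pos, neg, k=10, seed=0):
    """Return (mean per-fold threshold, mean held-out MCC) over k folds."""
    rng = random.Random(seed)
    pos_folds, neg_folds = k_folds(pos, k, rng), k_folds(neg, k, rng)
    thresholds, test_mccs = [], []
    for i in range(k):
        # Train on every fold except i, then evaluate on the held-out fold i.
        train_pos = [s for j, f in enumerate(pos_folds) if j != i for s in f]
        train_neg = [s for j, f in enumerate(neg_folds) if j != i for s in f]
        t = best_threshold(train_pos, train_neg)
        thresholds.append(t)
        test_mccs.append(mcc(*confusion(pos_folds[i], neg_folds[i], t)))
    return sum(thresholds) / k, sum(test_mccs) / k


if __name__ == "__main__":
    # Synthetic scores only: family members score higher than other sub-families.
    rng = random.Random(1)
    family = [rng.gauss(60, 8) for _ in range(100)]
    others = [rng.gauss(30, 8) for _ in range(100)]
    threshold, held_out_mcc = cross_validated_threshold(family, others)
    print(f"threshold={threshold:.1f} held-out MCC={held_out_mcc:.2f}")
```

Averaging the per-fold thresholds is only one simple way to combine them; taking the median, or refitting on all training scores after cross-validation, would be equally plausible choices.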


PROVIDER: S-EPMC4236727 | BioStudies | 2014-01-01

REPOSITORIES: biostudies

Similar Datasets

1000-01-01 | S-EPMC5404901 | BioStudies
2015-01-01 | S-EPMC4521371 | BioStudies
2011-01-01 | S-EPMC3228556 | BioStudies
1000-01-01 | S-EPMC1950549 | BioStudies
2016-01-01 | S-EPMC5126834 | BioStudies
2011-01-01 | S-EPMC3125773 | BioStudies
2014-01-01 | S-EPMC4229909 | BioStudies
2012-01-01 | S-EPMC3338012 | BioStudies
2011-01-01 | S-EPMC3085309 | BioStudies
2016-01-01 | S-EPMC5119741 | BioStudies