ABSTRACT:
SUBMITTER: Petti S
PROVIDER: S-EPMC8929697 | biostudies-literature | 2022 Mar
REPOSITORIES: biostudies-literature

PLoS Computational Biology, 2022-03-07, 3
Biological sequence families contain many sequences that are very similar to each other because they are related by evolution, so the strategy for splitting data into separate training and test sets is a nontrivial choice in benchmarking sequence analysis methods. A random split is insufficient because it will yield test sequences that are closely related or even identical to training sequences. Adapting ideas from independent set graph algorithms, we describe two new methods for splitting seque ...
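The contrast between a random split and a similarity-aware split can be illustrated with a minimal Python sketch. It is not the authors' method, which the truncated abstract does not describe. The function names `identity` and `split_dissimilar`, the 0.5 threshold, the training fraction, and the toy identity measure (which assumes pre-aligned sequences of equal length) are all hypothetical choices made for illustration. The sketch picks training sequences at random, then admits a remaining sequence to the test set only if it has no "edge" (identity above the threshold) to any training sequence; everything else is discarded.

```python
import random


def identity(a, b):
    """Fraction of matching positions (toy measure, assumes pre-aligned equal-length sequences)."""
    if len(a) != len(b):
        raise ValueError("toy identity assumes pre-aligned sequences of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def split_dissimilar(seqs, threshold=0.5, train_fraction=0.5, seed=0):
    """Hypothetical similarity-aware split; returns (train, test, discarded).

    No test sequence has identity above `threshold` to any training sequence.
    """
    rng = random.Random(seed)
    order = list(seqs)
    rng.shuffle(order)

    # Training candidates are chosen at random (an assumption of this sketch).
    n_train = max(1, int(len(order) * train_fraction))
    train, rest = order[:n_train], order[n_train:]

    # Keep a remaining sequence for testing only if it is dissimilar to all of train.
    test = [s for s in rest if all(identity(s, t) <= threshold for t in train)]
    discarded = [s for s in rest if s not in test]
    return train, test, discarded


if __name__ == "__main__":
    toy = ["AAAAAAAA", "AAAAAAAC", "AAAAAACC", "GGGGGGGG", "GGGGGGGT", "TTTTCCCC"]
    train, test, discarded = split_dissimilar(toy)
    print("train:", train)
    print("test:", test)
    print("discarded:", discarded)
```

A plain random split would keep every remaining sequence as test data, including near-duplicates of training sequences. The filter here trades data for independence, which is why some sequences end up discarded.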