ABSTRACT:
SUBMITTER: Shimron E
PROVIDER: S-EPMC9060447 | biostudies-literature | 2022 Mar
REPOSITORIES: biostudies-literature
Efrat Shimron, Jonathan I. Tamir, Ke Wang, Michael Lustig
Proceedings of the National Academy of Sciences of the United States of America, 2022-03-21, issue 13
Significance: Public databases are an important resource for machine learning research, but their growing availability sometimes leads to "off-label" usage, where data published for one task are used for another. This work reveals that such off-label usage could lead to biased, overly optimistic results of machine-learning algorithms. The underlying cause is that public data are processed with hidden processing pipelines that alter the data features. Here we study three well-known algorithms devel ...