Dataset Information

Are open set classification methods effective on large-scale datasets?

ABSTRACT: Supervised classification methods often assume the train and test data distributions are the same and that all classes in the test set are present in the training set. However, deployed classifiers often require the ability to recognize inputs from outside the training set as unknowns. This problem has been studied under multiple paradigms including out-of-distribution detection and open set recognition. For convolutional neural networks, there have been two major approaches: 1) inference methods to separate knowns from unknowns and 2) feature space regularization strategies to improve model robustness to novel inputs. Up to this point, there has been little attention to exploring the relationship between the two approaches and directly comparing performance on large-scale datasets that have more than a few dozen categories. Using the ImageNet ILSVRC-2012 large-scale classification dataset, we identify novel combinations of regularization and specialized inference methods that perform best across multiple open set classification problems of increasing difficulty level. We find that input perturbation and temperature scaling yield significantly better performance on large-scale datasets than other inference methods tested, regardless of the feature space regularization strategy. Conversely, we find that improving performance with advanced regularization schemes during training yields better performance when baseline inference techniques are used; however, when advanced inference methods are used to detect open set classes, the utility of these cumbersome training paradigms is less evident.
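
The abstract names temperature scaling as one of the inference methods used to separate knowns from unknowns. As a rough illustration only, and not the authors' implementation, the idea can be sketched as dividing a classifier's logits by a temperature before the softmax and rejecting an input as unknown when the highest probability falls below a threshold. The function names, the default temperature, and the threshold below are hypothetical choices made for this sketch; input perturbation is left out because it requires gradients from a trained network.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits divided by a positive temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def classify_open_set(logits, temperature=1.0, threshold=0.5):
    """Return the predicted class index, or -1 for 'unknown'.

    The threshold is a hypothetical value; in practice it would be
    tuned on held-out data.
    """
    probs = softmax_with_temperature(logits, temperature)
    confidence = max(probs)
    if confidence < threshold:
        return -1  # reject: treated as an input from outside the training set
    return probs.index(confidence)
```

For example, `classify_open_set([10.0, 0.0, 0.0], threshold=0.9)` accepts the input as class 0, while a flat logit vector such as `[1.0, 1.0, 1.0]` is rejected because no class reaches the threshold. Raising the temperature flattens the distribution, so the threshold has to be chosen together with it.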

SUBMITTER: Roady R

PROVIDER: S-EPMC7473573 | biostudies-literature | 2020

REPOSITORIES: biostudies-literature

Publications

Are open set classification methods effective on large-scale datasets?

Roady R, Hayes TL, Kemker R, Gonzales A, Kanan C

PLoS One, 2020-09-04, issue 9

PMID: 32886692

Similar Datasets

Supervised machine learning for diagnostic classification from large-scale neuroimaging datasets.
| S-EPMC7198352 | biostudies-literature

Open set classification of sound event.
| S-EPMC10787752 | biostudies-literature

Classification of hyper-scale multimodal imaging datasets
| S-EPMC10718410 | biostudies-literature

Normalization of Large Scale Transcriptome Data Using Heuristic Methods
2023-04-12 | GSE189788 | GEO

Comprehensive comparison of large-scale tissue expression datasets.
| S-EPMC4493645 | biostudies-literature

Harnessing Large-Scale Herbarium Image Datasets Through Representation Learning.
| S-EPMC8794728 | biostudies-literature

SuperCellCyto: enabling efficient analysis of large scale cytometry datasets.
| S-EPMC11003185 | biostudies-literature

A large-scale crop protection bioassay data set.
| S-EPMC4493826 | biostudies-literature

Large-scale test data set for location problems.
| S-EPMC5988488 | biostudies-literature

Gaussian Embedding for Large-scale Gene Set Analysis.
| S-EPMC7505077 | biostudies-literature