Dataset Information

A megastudy on the predictability of personal information from facial images: Disentangling demographic and non-demographic signals.

ABSTRACT: While prior research has shown that facial images signal personal information, publications in this field tend to assess the predictability of a single variable or a small set of variables at a time, which is problematic. Reported prediction quality is hard to compare and generalize across studies due to different study conditions. Another issue is selection bias: researchers may choose to study variables intuitively expected to be predictable and underreport unpredictable variables (the 'file drawer' problem). Policy makers thus have an incomplete picture for a risk-benefit analysis of facial analysis technology. To address these limitations, we perform a megastudy, a survey-based study that reports the predictability of numerous personal attributes (349 binary variables) from 2646 distinct facial images of 969 individuals. Using deep learning, we find that 82/349 personal attributes (23%) are predictable better than random from facial image pixels. Adding facial images substantially boosts prediction quality versus a demographics-only benchmark model. Our unexpected finding of strong predictability of the iPhone versus Galaxy preference variable shows how testing many hypotheses simultaneously can facilitate knowledge discovery. Our proposed L1-regularized image decomposition method and other techniques point to smartphone camera artifacts, BMI, skin properties, and facial hair as top candidate non-demographic signals in facial images.

SUBMITTER: Tkachenko Y

PROVIDER: S-EPMC10687237 | biostudies-literature | 2023 Nov

REPOSITORIES: biostudies-literature

Publications

A megastudy on the predictability of personal information from facial images: Disentangling demographic and non-demographic signals.

Tkachenko Y, Jedidi K

Scientific Reports, 2023 Nov 29, issue 1

PMID: 38030632

Similar Datasets

Regularity and predictability of human mobility in personal space. | S-EPMC3937357 | biostudies-literature

Phylogenetic signals and predictability in plant-soil feedbacks. | S-EPMC7689780 | biostudies-literature

Personality, Attitudinal, and Demographic Predictors of Non-consensual Dissemination of Intimate Images. | S-EPMC9554400 | biostudies-literature

Facial recognition technology can expose political orientation from naturalistic facial images. | S-EPMC7801376 | biostudies-literature

Openness weighted association studies: leveraging personal genome information to prioritize non-coding variants. | S-EPMC8665759 | biostudies-literature

Disentangling the flow of signals between populations of neurons. | S-EPMC11442031 | biostudies-literature

Disentangling demographic effects of red deer on chamois population dynamics. | S-EPMC8216891 | biostudies-literature

Disentangling topographic contributions to near-field scanning microwave microscopy images. | S-EPMC6482031 | biostudies-literature

Signals of demographic expansion in Drosophila virilis. | S-EPMC2276204 | biostudies-literature

Hiding personal information reveals the worst. | S-EPMC4743808 | biostudies-literature