
Dataset Information


A megastudy on the predictability of personal information from facial images: Disentangling demographic and non-demographic signals.


ABSTRACT: While prior research has shown that facial images signal personal information, publications in this field tend to assess the predictability of a single variable or a small set of variables at a time, which is problematic. Reported prediction quality is hard to compare and generalize across studies due to different study conditions. Another issue is selection bias: researchers may choose to study variables intuitively expected to be predictable and underreport unpredictable variables (the 'file drawer' problem). Policy makers thus have an incomplete picture for a risk-benefit analysis of facial analysis technology. To address these limitations, we perform a megastudy, a survey-based study that reports the predictability of numerous personal attributes (349 binary variables) from 2646 distinct facial images of 969 individuals. Using deep learning, we find 82/349 personal attributes (23%) are predictable better than random from facial image pixels. Adding facial images substantially boosts prediction quality versus a demographics-only benchmark model. Our unexpected finding of strong predictability of the iPhone versus Galaxy preference variable shows how testing many hypotheses simultaneously can facilitate knowledge discovery. Our proposed L1-regularized image decomposition method and other techniques point to smartphone camera artifacts, BMI, skin properties, and facial hair as top candidate non-demographic signals in facial images.

SUBMITTER: Tkachenko Y 

PROVIDER: S-EPMC10687237 | biostudies-literature | 2023 Nov

REPOSITORIES: biostudies-literature


Publications

A megastudy on the predictability of personal information from facial images: Disentangling demographic and non-demographic signals.

Tkachenko Yegor, Jedidi Kamel

Scientific Reports, 2023 Nov 29, issue 1



Similar Datasets

| S-EPMC3937357 | biostudies-literature
| S-EPMC7689780 | biostudies-literature
| S-EPMC9554400 | biostudies-literature
| S-EPMC7801376 | biostudies-literature
| S-EPMC8665759 | biostudies-literature
| S-EPMC11442031 | biostudies-literature
| S-EPMC8216891 | biostudies-literature
| S-EPMC6482031 | biostudies-literature
| S-EPMC2276204 | biostudies-literature
| S-EPMC4743808 | biostudies-literature