
Dataset Information


Historical representations of social groups across 200 years of word embeddings from Google Books.


ABSTRACT: Using word embeddings from 850 billion words in English-language Google Books, we provide an extensive analysis of historical change and stability in social group representations (stereotypes) across a long timeframe (from 1800 to 1999), for a large number of social group targets (Black, White, Asian, Irish, Hispanic, Native American, Man, Woman, Old, Young, Fat, Thin, Rich, Poor), and their emergent, bottom-up associations with 14,000 words and a subset of 600 traits. The results provide a nuanced picture of change and persistence in stereotypes across 200 years. Change was observed in the top-associated words and traits: Whether analyzing the top 10 or 50 associates, at least 50% of top associates changed across successive decades. Despite this changing content of top-associated words, the average valence (positivity/negativity) of these top stereotypes was generally persistent. Ultimately, through advances in the availability of historical word embeddings, this study offers a comprehensive characterization of both change and persistence in social group representations as revealed through books of the English-speaking world from 1800 to 1999.
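The abstract summarizes three measurements: a group's top-k associated words, how many of those associates turn over between successive decades, and the average valence of the top associates. The record does not say how these were computed, so the following is only a minimal sketch under stated assumptions: association is taken as cosine similarity between an averaged group vector and each word vector, turnover is the share of one decade's top-k associates missing from the next decade's top-k, and valence comes from a hypothetical word-to-score lookup. All function names and the toy vectors are hypothetical illustrations, not the authors' code or data.

```python
import math

# ASSUMPTION: association = cosine similarity (not confirmed by the record).
def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def group_vector(embeddings, target_words):
    """Average the vectors of a group's target words (hypothetical helper)."""
    vecs = [embeddings[w] for w in target_words if w in embeddings]
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def top_associates(embeddings, target_words, k):
    """Return the k non-target words most cosine-similar to the group vector."""
    g = group_vector(embeddings, target_words)
    scores = {w: cosine(g, v) for w, v in embeddings.items()
              if w not in target_words}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# ASSUMPTION: turnover = share of decade t's top-k absent from decade t+1's top-k.
def turnover(prev_top, next_top):
    """Fraction of one decade's top associates not retained in the next decade."""
    return 1 - len(set(prev_top) & set(next_top)) / len(prev_top)

def mean_valence(words, valence):
    """Average valence over the words that have a (hypothetical) valence score."""
    rated = [valence[w] for w in words if w in valence]
    return sum(rated) / len(rated)

# Toy 2-D "embeddings" for one hypothetical decade; real embeddings have
# hundreds of dimensions and a vocabulary of many thousands of words.
toy = {"g1": [1.0, 0.0], "g2": [1.0, 0.0],
       "x": [1.0, 0.1], "y": [0.0, 1.0], "z": [0.7, 0.7]}
print(top_associates(toy, ["g1", "g2"], k=2))  # ['x', 'z']
```

Under this definition, "at least 50% of top associates changed" corresponds to `turnover(...) >= 0.5` for each pair of successive decades, while persistent valence would show up as similar `mean_valence` values across decades even when turnover is high.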

SUBMITTER: Charlesworth TES 

PROVIDER: S-EPMC9282454 | biostudies-literature | 2022 Jul

REPOSITORIES: biostudies-literature


Publications

Historical representations of social groups across 200 years of word embeddings from Google Books.

Charlesworth Tessa E S, Caliskan Aylin, Banaji Mahzarin R

Proceedings of the National Academy of Sciences of the United States of America, 2022 Jul 5, issue 28



Similar Datasets

| S-EPMC6874350 | biostudies-literature
| S-EPMC5910851 | biostudies-literature
| S-EPMC8800511 | biostudies-literature
| S-EPMC6122163 | biostudies-literature
| S-EPMC10901232 | biostudies-literature
| S-EPMC7861243 | biostudies-literature
| S-EPMC5031770 | biostudies-literature
| S-EPMC6510737 | biostudies-literature
| S-EPMC7309261 | biostudies-literature
| S-EPMC9563690 | biostudies-literature