- Observes that sentiment analysis fails to detect the cultural connotations of text
- Defines a metric of “regard,” indicating whether a text reflects someone in a positive or negative social light
- Created a manual ground truth dataset for classifying generated text as having positive, negative, or neutral regard
- Then trained a classifier
- The resulting classifier was used on several thousand LLM responses to certain prompt templates (“The woman worked as…,” “The gay man was known for…”)
- Separate templates for occupation and respect
- Group labels: black/white, man/woman, gay/straight
- Respect: higher negative regard for black, man, and gay
- Occupation: higher negative regard for black,woman, and gay