Papers
arxiv:2604.23600

Personality Shapes Gender Bias in Persona-Conditioned LLM Narratives Across English and Hindi: An Empirical Investigation

Published on Apr 26
· Submitted by Aman Chadha on Apr 28
Authors:

Abstract

Persona-conditioned large language models exhibit context-dependent gender bias that varies with personality trait frameworks and across languages.

AI-generated summary

Large Language Models (LLMs) are increasingly deployed in persona-driven applications such as education, customer service, and social platforms, where models are prompted to adopt specific personas when interacting with users. While persona conditioning can improve user experience and engagement, it also raises concerns about how personality cues may interact with gender biases and stereotypes. In this work, we present a controlled study of persona-conditioned story generation in English and Hindi, where each story portrays a working professional in India producing context-specific artifacts (e.g., lesson plans, reports, letters) under systematically varied persona gender, occupational role, and personality traits from the HEXACO and Dark Triad frameworks. Across 23,400 generated stories from six state-of-the-art LLMs, we find that personality traits are significantly associated with both the magnitude and direction of gender bias. In particular, Dark Triad personality traits are consistently associated with higher gender-stereotypical representations compared to socially desirable HEXACO traits, though these associations vary across models and languages. Our findings demonstrate that gender bias in LLMs is not static but context-dependent. This suggests that persona-conditioned systems used in real-world applications may introduce uneven representational harms, reinforcing gender stereotypes in generated educational, professional, or social content.

Community

Paper author · Paper submitter

This paper introduces a multilingual, persona-conditioned bias evaluation framework showing that personality traits act as systematic modulators of gender bias in LLM narrative generation, with Dark Triad traits amplifying stereotypes, HEXACO traits partially attenuating them, and these effects often exceeding the influence of explicit gender labels.

➡️ Key Highlights of the Personality-Conditioned Bias Modulation Framework:

🧪 Personality as a Fairness-Critical Control Variable:
Introduces a controlled 23,400-artifact multilingual benchmark spanning 6 model families (LLM, MoE, SSM, LRM, SLM), 50 occupations, 9 personality traits × 2 levels, 3 gender conditions, and English/Hindi, enabling systematic measurement of personality–gender interaction effects in generation bias. Novel finding: gender bias is conditional, not static. Dark Triad traits (especially Machiavellianism, Psychopathy) amplify stereotypical outputs, while Openness/Emotionality attenuate them.
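A factorial persona grid like the one described can be enumerated mechanically. The sketch below is illustrative only: the trait names follow HEXACO and the Dark Triad as named in the paper, but the occupation list is a hypothetical placeholder subset, not the paper's actual 50 occupations, and the condition counts are for this toy subset.

```python
from itertools import product

# Persona factors as described in the benchmark design.
GENDERS = ["male", "female", "unspecified"]
TRAITS = [
    # HEXACO
    "honesty-humility", "emotionality", "extraversion",
    "agreeableness", "conscientiousness", "openness",
    # Dark Triad
    "machiavellianism", "narcissism", "psychopathy",
]
LEVELS = ["high", "low"]
LANGS = ["en", "hi"]
OCCUPATIONS = ["teacher", "nurse", "engineer"]  # placeholder subset of 50

def build_conditions():
    """Enumerate every persona condition for a single model."""
    return [
        {"gender": g, "trait": t, "level": lv, "lang": lang, "occupation": occ}
        for g, t, lv, lang, occ in product(GENDERS, TRAITS, LEVELS, LANGS, OCCUPATIONS)
    ]

conds = build_conditions()
print(len(conds))  # 3 * 9 * 2 * 2 * 3 = 324 for this placeholder subset
```

Each dictionary would then be rendered into a story-generation prompt, giving a fully crossed design in which every factor can be varied while the others are held fixed.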

🧩 Centroid-Based Semantic Bias Metric for Persona-Conditioned Narratives:
Proposes a sentence-level stereotype centroid scoring framework using multilingual SBERT embeddings and a difference-of-cosines bias score against male/female stereotype centroids, aggregated via maximum salience over narrative sentences. This moves beyond benchmark-style classification by localizing where stereotypes emerge inside generated artifacts. Human validation (κ ≈ 0.66–0.69) supports the metric's alignment with perceived stereotyping.
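The scoring scheme described above can be sketched in a few lines. This is a minimal reconstruction from the summary, not the paper's code: it assumes sentence embeddings and stereotype centroids are precomputed vectors (in the paper, multilingual SBERT embeddings), and the function names are my own.

```python
import numpy as np

def _unit(v):
    """Normalize a vector to unit length so dot products equal cosines."""
    return v / np.linalg.norm(v)

def sentence_bias(sent_emb, male_centroid, female_centroid):
    """Difference-of-cosines bias score for one sentence embedding.
    Positive values indicate the sentence sits closer to the male
    stereotype centroid; negative values, closer to the female one."""
    s = _unit(np.asarray(sent_emb, dtype=float))
    return float(s @ _unit(male_centroid) - s @ _unit(female_centroid))

def narrative_bias(sentence_embs, male_centroid, female_centroid):
    """Aggregate over a narrative via maximum salience: return the
    score of the sentence with the largest absolute bias, sign intact."""
    scores = [sentence_bias(e, male_centroid, female_centroid)
              for e in sentence_embs]
    return max(scores, key=abs)
```

Max-salience aggregation matches the stated goal of localizing stereotypes: a single strongly stereotyped sentence dominates the narrative score instead of being averaged away by neutral sentences.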

🧠 Multilingual Personality–Gender Interaction Analysis:
Shows that personality coefficients can exceed explicit gender effects, reframing prompt persona design as an alignment problem rather than prompt styling. Reveals cross-linguistic asymmetry: Hindi exhibits stronger baseline male-stereotypical skew, while English shows stronger personality-driven modulation. Importantly, the directional pattern generalizes across architectures, suggesting persona-induced bias amplification may be a broad property of LLMs rather than model-specific behavior.
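The claim that personality coefficients can exceed explicit gender effects comes from comparing regression coefficients. The sketch below illustrates the shape of such an analysis on synthetic data, with the effect sizes invented to mirror the qualitative pattern; it is not the paper's model or data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Synthetic design: explicit gender label (1 = male persona) and a
# Dark Triad trait indicator (1 = trait conditioned at a high level).
gender = rng.integers(0, 2, n)
dark_triad = rng.integers(0, 2, n)

# Simulated bias scores where the trait effect (0.30) is larger than
# the explicit gender effect (0.10) -- invented values for illustration.
y = (0.10 * gender + 0.30 * dark_triad
     + 0.05 * gender * dark_triad + rng.normal(0, 0.2, n))

# OLS with an interaction term: y ~ gender + trait + gender:trait
X = np.column_stack([np.ones(n), gender, dark_triad, gender * dark_triad])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
for name, b in zip(["intercept", "gender", "dark_triad", "gender:dark_triad"], beta):
    print(f"{name:>18s}: {b:+.3f}")
```

With enough samples, the fitted trait coefficient recovers a larger value than the gender coefficient, which is the comparison underlying the "personality exceeds explicit gender" finding.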


Get this paper in your agent:

hf papers read 2604.23600
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
