Exploring the potential and limitations of large language models as virtual respondents for social science research

Authors

  • Zsófia Rakovics, Doctoral School of Sociology, ELTE Eötvös Loránd University, Budapest, Hungary; Research Center for Computational Social Science, Faculty of Social Sciences, ELTE Eötvös Loránd University, Budapest, Hungary; MTA–TK Lendület “Momentum” Digital Social Science Research Group for Social Stratification, HUN-REN Centre for Social Sciences, Budapest, Hungary
  • Márton Rakovics, Research Center for Computational Social Science, Faculty of Social Sciences, ELTE Eötvös Loránd University, Budapest, Hungary; Centre for Translational Medicine, Semmelweis University, Budapest, Hungary. ORCID: https://orcid.org/0000-0002-5830-4870

DOI:

https://doi.org/10.17356/ieejsp.v10i4.1326

Keywords:

computational social science, large language models, GPT, Llama, Mixtral

Abstract

Social and linguistic differences encoded in the textual content available on the internet reflect certain features of modern societies. For scientific research concerned with social differences mediated by language, the advent of large language models (LLMs) has opened new opportunities. LLMs could be used to extract information about different social groups and could serve as data sources by acting as virtual respondents that generate survey answers.

Using LLMs (GPT variants, Llama2, and Mixtral), we generated virtual answers to the politics- and democracy-related attitude questions of the European Social Survey (10th wave) and statistically compared the simulated responses with the real ones. We explored different prompting techniques and the effect of the type and richness of the contextual information provided to the models. Our results suggest that the tested LLMs generate highly realistic answers and are good at invoking the needed patterns from limited contextual information when a few relevant examples are provided, but struggle in a zero-shot setting.
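The workflow described above (prompting a model with respondent context and optional examples, then comparing simulated and real answer distributions) might be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not give the prompt wording, the context variables, or the statistical tests used, so the prompt template, the helper names (`build_prompt`, `answer_distribution`, `total_variation_distance`), the choice of total variation distance, and all data values here are hypothetical.

```python
from collections import Counter


def build_prompt(profile, question, examples=()):
    """Assemble a survey prompt; with no examples this is the zero-shot case.

    The template is hypothetical, not the wording used in the study.
    """
    lines = ["Answer the survey question as the following respondent:"]
    lines += [f"- {key}: {value}" for key, value in profile.items()]
    for example_question, example_answer in examples:  # few-shot examples
        lines.append(f"Q: {example_question}")
        lines.append(f"A: {example_answer}")
    lines.append(f"Q: {question}")
    lines.append("A:")
    return "\n".join(lines)


def answer_distribution(answers, scale):
    """Relative frequency of each scale point among the given answers."""
    counts = Counter(answers)
    return {point: counts.get(point, 0) / len(answers) for point in scale}


def total_variation_distance(p, q):
    """Half the L1 distance between two distributions on the same scale."""
    return 0.5 * sum(abs(p[point] - q[point]) for point in p)


if __name__ == "__main__":
    scale = range(0, 11)  # assumed 0-10 answer scale
    # Toy answers invented for illustration; not data from the study.
    real_answers = [0, 0, 5, 10]
    simulated_answers = [0, 5, 5, 10]

    prompt = build_prompt(
        {"age": 42, "country": "HU"},  # hypothetical context variables
        "How satisfied are you with the way democracy works in your country?",
        examples=[("How interested are you in politics?", "3")],
    )
    print(prompt)

    distance = total_variation_distance(
        answer_distribution(real_answers, scale),
        answer_distribution(simulated_answers, scale),
    )
    print(f"Total variation distance: {distance:.2f}")
```

A distance of 0 would mean the simulated and real answer distributions coincide on this question, and 1 would mean they do not overlap at all. A significance test on the two samples could be used in place of this descriptive measure.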

A critical methodological analysis is indispensable when considering the potential use of LLM-generated data for scientific research; exploring known biases and reflecting on the social reality not represented on the internet are essential.

Published

2025-02-17

How to Cite

[1]
Rakovics, Z. and Rakovics, M. 2025. Exploring the potential and limitations of large language models as virtual respondents for social science research. Intersections. East European Journal of Society and Politics. 10, 4 (Feb. 2025), 126–147. DOI:https://doi.org/10.17356/ieejsp.v10i4.1326.