Exploring the potential and limitations of large language models as virtual respondents for social science research
DOI:
https://doi.org/10.17356/ieejsp.v10i4.1326Keywords:
computational social science, large language models, GPT, Llama, MixtralAbstract
Social and linguistic differences encoded in various textual content available on the internet represent certain features of modern societies. For any scientific research which is interested in social differences mediated by language, the advent of large language models (LLMs) has brought new opportunities. LLMs could be used to extract information about different groups of society and utilized as data providers by acting as virtual respondents generating answers as such.
Using LLMs (GPT-variants, Llama2, and Mixtral), we generated virtual answers for politics and democracy related attitude questions of the European Social Survey (10th wave) and statistically compared the results of the simulated responses to the real ones. We explored different prompting techniques and the effect of different types and richness of contextual information provided to the models. Our results suggest that the tested LLMs generate highly realistic answers and are good at invoking the needed patterns from limited contextual information given to them if a couple of relevant examples are provided, but struggle in a zero-shot setting.
A critical methodological analysis is inevitable when considering the potential use of data generated by LLMs for scientific research, the exploration of known biases and reflection on social reality not represented on the internet are essential.
Downloads
Published
How to Cite
Issue
Section
License
Copyright Notice
Authors who publish with this journal agree to the following terms:
Authors retain copyright and grant the journal right of first publication, with the work three months after publication simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal. This acknowledgement is not automatic, it should be asked from the editors and can usually be obtained one year after its first publication in the journal.