Edited by Mary Waters, Harvard University, Cambridge, MA; received March 27, 2023; accepted June 2, 2023
July 18, 2023
120 (30) e2305016120
Abstract
Many NLP applications require manual text annotations for a variety of tasks, notably to train classifiers or evaluate the performance of unsupervised models. Depending on the size and complexity of the task, these annotations may be produced by crowd workers on platforms such as MTurk or by trained annotators, such as research assistants. Using four samples of tweets and news articles (n = 6,183), we show that ChatGPT outperforms crowd workers for several annotation tasks, including relevance, stance, topic, and frame detection. Across the four datasets, the zero-shot accuracy of ChatGPT exceeds that of crowd workers by about 25 percentage points on average, while ChatGPT's intercoder agreement exceeds that of both crowd workers and trained annotators for all tasks. Moreover, the per-annotation cost of ChatGPT is less than $0.003, about thirty times cheaper than MTurk. These results demonstrate the potential of large language models to drastically increase the efficiency of text classification.
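The zero-shot setup described in the abstract can be reproduced in a few lines. Below is a minimal, illustrative sketch assuming the OpenAI Python client (v1.x) and the gpt-3.5-turbo model; the prompt wording, label set, and temperature are hypothetical stand-ins, not the authors' exact materials.

```python
# Zero-shot annotation sketch. Assumptions (not from the article): the OpenAI
# Python client v1.x, the "gpt-3.5-turbo" model, and the example prompt below.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def annotate(text: str, task_instructions: str) -> str:
    """Request a single zero-shot label: no labeled examples in the prompt."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder; the study's exact model version may differ
        temperature=0.2,        # low temperature for more stable labels
        messages=[
            {"role": "system", "content": task_instructions},
            {"role": "user", "content": f"Text: {text}\nLabel:"},
        ],
    )
    return response.choices[0].message.content.strip()


# Hypothetical relevance task in the spirit of the paper's tweet annotations.
instructions = (
    "Classify the tweet. Answer with exactly one word: RELEVANT if it is "
    "about content moderation, otherwise IRRELEVANT."
)
tweet = "Platforms should explain why posts get removed."

# Annotating the same text twice and comparing labels is one simple way to
# probe the kind of intercoder agreement the abstract reports.
labels = [annotate(tweet, instructions) for _ in range(2)]
print(labels, "agree" if labels[0] == labels[1] else "disagree")
```

Scaling this loop over a labeled sample and comparing the model's output against gold labels yields the accuracy figures the abstract summarizes; the per-call token usage, multiplied by the model's price per token, yields the per-annotation cost.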
Data, Materials, and Software Availability
Replication materials are available at the Harvard Dataverse, https://doi.org/10.7910/DVN/PQYF6M (15). Some study data are available (only tweet IDs can be shared, not tweets themselves).
Acknowledgments
This project received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreement no. 883121). We thank Fabio Melliger, Paula Moser, and Sophie van IJzendoorn for excellent research assistance.
Author contributions
F.G., M.A., and M.K. designed research; performed research; analyzed data; and wrote the paper.
Competing interests
The authors declare no competing interest.
References
1. G. Emerson et al., Eds., Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022) (Association for Computational Linguistics, Seattle, 2022).
2. K. Benoit, D. Conway, B. E. Lauderdale, M. Laver, S. Mikhaylov, Crowd-sourced text analysis: Reproducible and agile production of political data. Am. Polit. Sci. Rev. 110, 278–295 (2016).
3. M. Chmielewski, S. C. Kucker, An MTurk crisis? Shifts in data quality and the impact on study results. Soc. Psychol. Personality Sci. 11, 464–473 (2020).
4. P. Y. Wu, J. A. Tucker, J. Nagler, S. Messing, Large language models can be used to estimate the ideologies of politicians in a zero-shot learning setting. arXiv [Preprint] (2023).
5. J. J. Nay, Large language models as corporate lobbyists. arXiv [Preprint] (2023).
6. M. Binz, E. Schulz, Using cognitive psychology to understand GPT-3. Proc. Natl. Acad. Sci. U.S.A. 120, e2218523120 (2023).
7. L. P. Argyle et al., Out of one, many: Using language models to simulate human samples. Polit. Anal. 1–15 (2023).
8. T. Kuzman, I. Mozetič, N. Ljubešić, ChatGPT: Beginning of an end of manual linguistic data annotation? Use case of automatic genre identification. arXiv [Preprint] (2023). http://arxiv.org/abs/2303.03953 (Accessed 13 March 2023).
9. F. Huang, H. Kwak, J. An, Is ChatGPT better than human annotators? Potential and limitations of ChatGPT in explaining implicit hate speech. arXiv [Preprint] (2023). http://arxiv.org/abs/2302.07736 (Accessed 13 March 2023).
10. M. Alizadeh et al., Content moderation as a political issue: The Twitter discourse around Trump's ban. J. Quant. Descr.: Digit. Media 2, 1–44 (2022).
11. P. S. Bayerl, K. I. Paul, What determines inter-coder agreement in manual annotations? A meta-analytic investigation. Comput. Linguist. 37, 699–725 (2011).
12. M. Desmond, E. Duesterwald, K. Brimijoin, M. Brachman, Q. Pan, "Semi-automated data labeling" in NeurIPS 2020 Competition and Demonstration Track (PMLR, 2021), pp. 156–169.
13. T. Kojima, S. S. Gu, M. Reid, Y. Matsuo, Y. Iwasawa, Large language models are zero-shot reasoners. arXiv [Preprint] (2022). http://arxiv.org/abs/2205.11916 (Accessed 13 March 2023).
14. D. Card, A. Boydstun, J. H. Gross, P. Resnik, N. A. Smith, "The media frames corpus: Annotations of frames across issues" in Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers) (2015), pp. 438–444.
15. F. Gilardi, M. Alizadeh, M. Kubli, Replication data for: ChatGPT outperforms crowd-workers for text-annotation tasks. Harvard Dataverse. https://doi.org/10.7910/DVN/PQYF6M. Deposited 16 June 2023.
Submission history
Received: March 27, 2023
Accepted: June 2, 2023
Published online: July 18, 2023
Published in issue: July 25, 2023
Keywords
- ChatGPT
- text classification
- large language models
- human annotations
- text as data
Authors
F. Gilardi, M. Alizadeh, and M. Kubli
Affiliation (all authors)
Department of Political Science, University of Zurich, Zurich 8050, Switzerland