- Title
- Cost effective annotation framework using zero-shot text classification
- Creator
- Kasthuriarachchy, Buddhika; Chetty, Madhu; Shatte, Adrian; Walls, Darren
- Date
- 2021
- Type
- Text; Conference paper
- Identifier
- http://researchonline.federation.edu.au/vital/access/HandleResolver/1959.17/180317
- Identifier
- vital:15733
- Identifier
- https://doi.org/10.1109/IJCNN52387.2021.9534335
- Identifier
- ISBN:9780738133669 (ISBN)
- Abstract
- Manual, high-quality annotation of social media data has enabled companies and researchers to develop improved implementations using natural language processing. However, human text annotation is expensive and time-consuming. Crowd-sourcing platforms such as Amazon's Mechanical Turk (MTurk) can be leveraged to create large training corpora for text classification tasks using social media data. Nevertheless, the quality of annotations can vary significantly, depending on the interpretations and motivations of the annotators completing the tasks. Further, the cost of labelling data through MTurk increases if target messages are small and have a significant amount of noise (e.g. promotional messages on Twitter). In this work, we propose a new annotation framework to create high-quality human-annotated datasets for text classification from social media data. We present a pre-annotation technique based on zero-shot text classification that reduces the adverse effects of the highly skewed distribution of data across target classes. The proposed framework significantly reduces cost and time while maintaining the quality of the annotations. Being generic, it can be applied to annotating text data from any discipline. Our experiment annotating Twitter data with the proposed framework shows a cost reduction of 80% with no compromise in quality. © 2021 IEEE.
- Publisher
- Institute of Electrical and Electronics Engineers Inc.
- Relation
- 2021 International Joint Conference on Neural Networks, IJCNN 2021 Vol. 2021-July
- Rights
- All metadata describing materials held in, or linked to, the repository is freely available under a CC0 licence
- Rights
- Copyright ©2021 IEEE
- Subject
- Annotation; Crowdsourcing; Framework; Zero-shot text classification
- Reviewed
- Funder
- This research is supported by Global Hosts Pty Ltd trading as SportsHosts, a Melbourne based company.