- Title
- Meaning-sensitive text data augmentation with intelligent masking
- Creator
- Kasthuriarachchy, Buddhika; Chetty, Madhu; Shatte, Adrian; Walls, Darren
- Date
- 2023
- Type
- Text; Journal article
- Identifier
- http://researchonline.federation.edu.au/vital/access/HandleResolver/1959.17/199042
- Identifier
- vital:19148
- Identifier
- https://doi.org/10.1145/3623403
- Identifier
- ISSN:2157-6904 (ISSN)
- Abstract
- With the recent popularity of applying large-scale deep neural network-based models for natural language processing (NLP), attention to develop methods for text data augmentation is at its peak, since the limited size of training data tends to significantly affect the accuracy of these models. To this end, we propose a novel text data augmentation technique called Intelligent Masking with Optimal Substitutions Text Data Augmentation (IMOSA). IMOSA, developed for labelled sentences, can identify the most favourable sentences and locate the appropriate word combinations in a particular sentence to replace and generate synthetic sentences with a meaning closer to the original sentence, while also significantly increasing the diversity of the dataset. We demonstrate that the proposed technique notably improves the performance of classifiers based on attention-based transformer models through the extensive experiments for five different text classification tasks which are performed under the low data regime in a context-aware NLP setting. The analysis clearly shows that IMOSA effectively generates more sentences using favourable original examples and completely ignores undesirable examples. Furthermore, the experiments carried out confirm IMOSA's ability to add diversity to the augmented dataset using multiple distinct masking patterns against the same original sentence, which remarkably adds variety to the training dataset. IMOSA consistently outperforms the two key masked language model-based text data augmentation techniques, and demonstrates a robust performance against the critical challenging NLP tasks. © 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.
- Publisher
- Association for Computing Machinery
- Relation
- ACM Transactions on Intelligent Systems and Technology Vol. 14, no. 6 (2023), p.
- Rights
- All metadata describing materials held in, or linked to, the repository is freely available under a CC0 licence
- Rights
- © 2023 Copyright held by the owner/author(s)
- Subject
- 4602 Artificial intelligence; 4611 Machine learning; text data augmentation; IMOSA; Masked language model
- Reviewed
- Funder
- This research is supported by Global Hosts Pty Ltd trading as SportsHosts, a Melbourne based company.