- Title
- Statistical compression-based models for text classification
- Creator
- Saikrishna, Vidya; Dowe, David; Ray, Sid
- Date
- 2017
- Type
- Text; Conference paper
- Identifier
- http://researchonline.federation.edu.au/vital/access/HandleResolver/1959.17/181536
- Identifier
- vital:15956
- Identifier
- https://doi.org/10.1109/Eco-friendly.2016.7893212
- Identifier
- ISBN:9781509043590 (ISBN)
- Abstract
- Text classification is the task of assigning predefined categories to text documents, and it is a common machine learning problem. Statistical text classification, which uses machine learning methods to learn classification rules, is known to be particularly successful at this task. In this research project we attempt to reinvent the text classification problem with a sound methodology based on a statistical data compression technique, the Minimum Message Length (MML) principle. To model the data sequence we use Probabilistic Finite State Automata (PFSAs). We propose two approaches to text classification using MML-PFSAs. We tested both approaches on the Enron spam dataset and recorded the results of our empirical evaluation in terms of the well-known classification measures, i.e. recall, precision, accuracy and error. The results indicate good classification accuracy that is comparable with that of state-of-the-art classifiers. © 2016 IEEE.
- Publisher
- Institute of Electrical and Electronics Engineers Inc.
- Relation
- 5th International Conference on Eco-Friendly Computing and Communication Systems, ICECCS 2016 p. 1-6
- Rights
- All metadata describing materials held in, or linked to, the repository is freely available under a CC0 licence
- Rights
- Copyright © 2016 IEEE
- Subject
- Minimum Message Length (MML); Probabilistic Finite State Automaton (PFSA); Spam Filtering
- Reviewed