- Title
- Multi-label classification on shorter featured dataset using optimization techniques
- Creator
- Banerjee, Arunava
- Date
- 2012
- Type
- Text; Thesis; Masters
- Identifier
- http://researchonline.federation.edu.au/vital/access/HandleResolver/1959.17/157500
- Identifier
- vital:11622
- Abstract
- Classification of objects based on inherent properties is a general problem area encountered in diverse fields of knowledge. In terms of text classification, the problem presented in this work is based on two particular criteria for documents, as given below: • Informativeness of feature sets: a feature set comprises the words in a document. The informativeness of a feature set is the presence of words that can be used to characterize a document in a corpus (database). • Multilabelness: documents can have content dealing with diverse topics. These criteria are not limited to documents, but can be generalized to other areas with little adaptation. In this thesis, the classification problem being investigated involves datasets containing a smaller number of features associated with a larger number of classes. The acronym SFML (Shorter Featured & Multi-Labeled) is used to denote these types of datasets. SFML-type datasets can be encountered in various areas, such as medicine, SMS services, and text classification, to name a few. In this thesis, the performance of various existing classification algorithms was tested on SFML datasets and the results compared. Further, a new classification algorithm based on optimization is proposed for these types of datasets. Applications to the Adverse Drug Reaction problem and the phishing profiling problem are considered. Classification results show that the proposed algorithm performs better than existing classification algorithms as the number of features decreases.; Master of Computing (By Research)
- Publisher
- Federation University Australia
- Rights
- Copyright Arunava Banerjee
- Rights
- Open Access
- Rights
- This metadata is freely available under a CC0 license
- Subject
- Multi-label classification; Shorter featured datasets; Optimization techniques
- Thesis Supervisor
- Mammadov, Musa