- Title
- Using corpus analysis to inform research into opinion detection in blogs
- Creator
- Osman, Deanna; Yearwood, John; Vamplew, Peter
- Date
- 2007
- Type
- Text; Conference paper
- Identifier
- http://researchonline.federation.edu.au/vital/access/HandleResolver/1959.17/33586
- Identifier
- vital:3617
- Identifier
- ISBN:1445-1336
- Abstract
- Opinion detection research relies on labeled documents for training data, with labels obtained either from assumptions based on the document's origin or from human assessors who categorise the documents. In recent years, blogs have become a source for opinion identification research (TREC Blog06). This study analyses the part-of-speech proportions and the words used within various corpora, determining key differences and similarities useful when preparing for opinion identification research. The resulting comparisons between the characteristics of the various corpora are detailed and discussed. In particular, opinion-bearing and non-opinion Blog06 documents were found to display a high level of similarity, indicating that blog documents assessed at the document level cannot be used as training data in opinion identification research.
- Publisher
- Gold Coast, Queensland, Victoria : Australian Computer Society
- Relation
- Paper presented at Sixth Australasian Data Mining Conference, AusDM 2007, Gold Coast, Queensland, Victoria : 3rd-4th December 2007 p. 65-75
- Rights
- Open Access
- Rights
- Copyright Australian Computer Society (uploading privileges were granted by permission of the Australian Computer Society Inc)
- Rights
- This metadata is freely available under a CC0 license
- Subject
- 0804 Data Format; Blogs; Web logs; Blog06; TREC; Opinion detection; Opinion identification