Opinion search in web logs
- Authors: Osman, Deanna , Yearwood, John
- Date: 2007
- Type: Text , Conference paper
- Relation: Paper presented at Eighteenth Australasian Database Conference, ADC 2007, Ballarat, Victoria : 29th January-2nd February 2007 p. 133-139
- Description: Web logs (blogs) are a fast-growing forum for people of all ages to express their feelings and opinions on topics of interest. The entries are often written in informal language without the structure found in newswire or published articles. One blog entry may contain many topics, and each topic may express an opinion or a fact. This research contrasts with earlier work on opinion detection, which has been carried out on more formally authored texts and on segments that are either whole documents or sentences. Whole web logs are divided into topics using a simple text segmentation approach. Similarity scores are used to determine where topic changes occur. The results are compared to human-evaluated topic changes and the most accurate algorithm is used in the remainder of the research. Words within each topic-block are allocated weightings depending on their opinion-bearing strength. Two approaches to using these weights, the sum and the maximum, are used to determine whether the topic-block is opinion-bearing or non-opinion-bearing. The topic-blocks classified as opinion-bearing are then rated by human evaluators as either opinion-bearing or non-opinion-bearing, giving a precision of 67% for approach A and 70% for approach B. These results are compared with two approaches on published text to identify the difference between web logs and published articles.
- Description: 2003004895
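The sum and maximum weighting approaches described in this abstract can be illustrated with a minimal sketch. The word weights, the thresholds and the function names below are hypothetical, chosen only for illustration; the abstract does not give the lexicon or the decision rule used in the paper, nor does it say which of the two is approach A.

```python
from typing import Dict, List

# Hypothetical opinion-strength weights (assumption: not the paper's lexicon).
OPINION_WEIGHTS: Dict[str, float] = {
    "love": 0.9,
    "terrible": 0.8,
    "good": 0.5,
    "think": 0.4,
}


def block_scores(words: List[str], weights: Dict[str, float]) -> Dict[str, float]:
    """Return the sum and the maximum of the weights of known words in a topic-block."""
    found = [weights[w.lower()] for w in words if w.lower() in weights]
    return {"sum": sum(found), "max": max(found, default=0.0)}


def is_opinion_bearing(
    words: List[str],
    weights: Dict[str, float],
    method: str = "sum",
    threshold: float = 1.0,  # hypothetical cut-off
) -> bool:
    """Label a topic-block opinion-bearing when the chosen score reaches the threshold."""
    return block_scores(words, weights)[method] >= threshold


block = "I love this book but the ending was terrible".split()
scores = block_scores(block, OPINION_WEIGHTS)
```

The sum rewards blocks with many mildly opinionated words, while the maximum reacts to a single strong word, so the two can disagree on long, weakly worded blocks.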
Using corpus analysis to inform research into opinion detection in blogs
- Authors: Osman, Deanna , Yearwood, John , Vamplew, Peter
- Date: 2007
- Type: Text , Conference paper
- Relation: Paper presented at Sixth Australasian Data Mining Conference, AusDM 2007, Gold Coast, Queensland : 3rd-4th December 2007 p. 65-75
- Description: Opinion detection research relies on labeled documents for training data, obtained either by assumptions based on the document's origin or by using human assessors to categorise the documents. In recent years, blogs have become a source for opinion identification research (TREC Blog06). This study analyses the part-of-speech proportions and the words used within various corpora, determining key differences and similarities useful when preparing for opinion identification research. The resulting comparisons between the characteristics of the various corpora are detailed and discussed. In particular, opinion-bearing and non-opinion Blog06 documents were found to display a high level of similarity, indicating that blog documents assessed at the document level cannot be used as training data in opinion identification research.
- Description: 2003004892
A web-based Narrative construction environment
- Authors: Yearwood, John , Stranieri, Andrew , Osman, Deanna
- Date: 2008
- Type: Text , Conference paper
- Relation: Paper presented at NILE 2008: 5th International Conference on Narrative and Interactive Learning Environments, Edinburgh, Scotland : 6th-8th August 2008 p. 78-81
- Description: This paper describes a web-based environment for constructing narrative from story snippets contributed by a community of interest. The underlying model uses an argument-based structure to infer the next event in the narrative sequence. The approach makes use of both events and higher-level story elements derived from Polti’s dramatic situations. The dramatic situations used are consistent with a theme, and events are generally constrained by the dramatic situation. The narrative generated is a function of the event history, the dramatic situations chosen and the plausible inferences about next events that are contributed by a community of interest in the theme. At this stage, a player’s actions are simulated using a random selection from a set and the implementation of a nonsense filter. Example outputs from the system are provided and discussed.
- Description: 2003006499
Weblogs for market research : Improving opinion detection using system fusion
- Authors: Osman, Deanna , Yearwood, John , Vamplew, Peter
- Date: 2008
- Type: Text , Conference paper
- Relation: Paper presented at International Conference on Service Systems and Service Management, 2008, Melbourne, Victoria : 30th June - 2nd July 2008 p. 1-6
- Description: Searching for opinions on a specific product or service within blogs is a new frontier for market researchers. This research investigates the use of system fusion methods to improve mean average precision (MAP) results achieved by the Text REtrieval Conference (TREC) Blog06 participants and reports the improved MAP results. It is hypothesized that diversity of the inputs is vital to maximising the MAP improvements. This is shown in the improvement in MAP values achieved by some of the participants' ranked lists. The growth in the number of blog authors who write valuable opinions about their life experiences has led to an unsolicited resource of opinions on products, politics and services. In 2006, TREC collected blogs and set its participants the task of detecting opinions on given topics, reporting the results using MAP.
- Description: 2003007757
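For readers unfamiliar with the MAP measure used in this record, the standard definition can be sketched as follows. The function names, document identifiers and topic labels are hypothetical; this is the textbook calculation, not the official TREC evaluation tool.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average of the precision values at each rank where a relevant document appears."""
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    # Relevant documents never retrieved contribute zero precision.
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Mean of the per-topic average precision over all judged topics."""
    if not qrels:
        return 0.0
    return sum(
        average_precision(runs.get(topic, []), relevant)
        for topic, relevant in qrels.items()
    ) / len(qrels)


# Hypothetical ranked lists and relevance judgements for two topics.
runs = {"t1": ["d1", "d2", "d3", "d4"], "t2": ["a", "b"]}
qrels = {"t1": {"d1", "d3"}, "t2": {"b"}}
map_score = mean_average_precision(runs, qrels)
```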
Weblogs for market research : Finding more relevant opinion documents using system fusion
- Authors: Osman, Deanna , Yearwood, John , Vamplew, Peter
- Date: 2009
- Type: Text , Journal article
- Relation: Online Information Review Vol. 33, no. 5 (2009), p. 873-888
- Full Text: false
- Description: Purpose - The purpose of this paper is to examine the usefulness of fusion as a means of improving the precision of automated opinion detection. Design/methodology/approach - Five system fusion methods are proposed and tested using runs submitted by the Text REtrieval Conference (TREC) Blog06 participants as input. The methods include a voting method, an inverse rank method (IRM), a linear-normalised score method and two weighted methods that use a weighted IRM score to rank the document. Findings - Mean average precision (MAP) is used as an indicator of the performance of the runs in this study. The best system fusion method achieves a 55.5 percent higher MAP result compared with the highest MAP result of any individual run submitted by the Blog06 participants. This equates to an increase in detection of 2,398 relevant opinion documents (21 percent). Practical implications - System fusion can be used to improve upon the results achieved by existing individual opinion detection systems. On the other hand, multiple opinion detection approaches can be combined into one system and fusion used to combine the results to build in diversity. Diversity within fusion inputs can increase the improvements achieved by fusion methods. The improved output from a diverse opinion detection system will then contain a higher number of relevant documents and reduce the incidence of high-ranking non-relevant documents and low-ranking relevant documents. Originality/value - The fusion methods proposed in this study demonstrate that simple fusion of opinion detection systems can improve performance.
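Two of the fusion methods named in this abstract, voting and the inverse rank method, can be sketched as below. The scoring rules are assumptions based on the usual meaning of those names (one vote per run that retrieves a document, and the sum of 1/rank across runs); the abstract does not give the paper's exact formulas, and the run data is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def voting_fusion(runs: List[List[str]]) -> List[str]:
    """Rank documents by the number of runs that retrieved them (assumed rule)."""
    votes: Dict[str, int] = defaultdict(int)
    for run in runs:
        for doc in set(run):  # each run votes at most once per document
            votes[doc] += 1
    # Ties are broken by document id so the output is deterministic.
    return sorted(votes, key=lambda d: (-votes[d], d))


def inverse_rank_fusion(runs: List[List[str]]) -> List[str]:
    """Rank documents by the sum of 1/rank over all runs (assumed rule)."""
    scores: Dict[str, float] = defaultdict(float)
    for run in runs:
        for rank, doc in enumerate(run, start=1):
            scores[doc] += 1.0 / rank
    return sorted(scores, key=lambda d: (-scores[d], d))


# Three hypothetical ranked lists for one topic.
runs = [["a", "b", "c"], ["b", "a", "d"], ["b", "c", "a"]]
fused_votes = voting_fusion(runs)
fused_irm = inverse_rank_fusion(runs)
```

Here the voting method cannot separate "a" from "b" because all three runs retrieve both, whereas the inverse rank score places "b" first because two runs rank it at the top.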
Automated opinion detection : Implications of the level of agreement between human raters
- Authors: Osman, Deanna , Yearwood, John , Vamplew, Peter
- Date: 2010
- Type: Text , Journal article
- Relation: Information Processing and Management Vol. 46, no. 3 (2010), p. 331-342
- Full Text: false
- Description: The ability to agree with the TREC Blog06 opinion assessments was measured for seven human assessors and compared with the submitted results of the Blog06 participants. The assessors achieved a fair level of agreement between their assessments, although the range between the assessors was large. It is recommended that multiple assessors are used to assess opinion data, or a pre-test of assessors is completed to remove the most dissenting assessors from a pool of assessors prior to the assessment process. The possibility of inconsistent assessments in a corpus also raises concerns about training data for an automated opinion detection system (AODS), so a further recommendation is that AODS training data be assembled from a variety of sources. This paper establishes an aspirational value for an AODS by determining the level of agreement achievable by human assessors when assessing the existence of an opinion on a given topic. Knowing the level of agreement amongst humans is important because it sets an upper bound on the expected performance of AODS. While the AODSs surveyed achieved satisfactory results, none achieved a result close to the upper bound. © 2009 Elsevier Ltd. All rights reserved.