Two-step comprehensive open domain text annotation with frame semantics
- Authors: Ofoghi, Bahadorreza , Yearwood, John , Ma, Liping
- Date: 2007
- Type: Text , Conference paper
- Relation: Paper presented at Australasian Language Technology Workshop 2007, Melbourne Zoo, Melbourne, Victoria : 10th-11th December 2007 p. 83-91
- Full Text:
- Description: With shallow semantic parsing tasks receiving more attention in many natural language applications, there is a need for labelled corpora for learning the specific tags under consideration. In this paper, we discuss a two-step semantic class and semantic role assignment based on the FrameNet elements over a subset of the AQUAINT collection with a reasonable coverage over the semantic frames in FrameNet. The quality of the annotation task is examined through inter-annotator agreement. The methodology described in this work for measuring inter-annotator agreement can be adapted for similar tasks. Some central aspects of the task are also detailed in this paper.
- Description: 2003005522
Understanding victims of identity theft: A grounded theory approach
- Authors: Turville, Kylie , Firmin, Sally , Yearwood, John , Miller, Charlynn
- Date: 2010
- Type: Text , Conference paper
- Relation: 5th International Conference on Qualitative Research in IT and IT in Qualitative Research, QualIT 2010
- Full Text:
- Reviewed:
- Description: Being a victim of identity theft can be a devastating and life-changing event. Once the victim discovers the misuse they need to begin the process of recovery. For the "lucky" victims this may take only a couple of phone calls and a small amount of time; however, some victims may experience difficulties for many years. In order to recover, victims of crime require support and assistance; however, within Australia this support is sadly lacking. In order to identify the issues currently faced by victims of identity theft as they work through the recovery process, a Grounded Theory methodology was identified as most appropriate. This paper provides a brief overview of the history of the research project; a brief introduction of grounded theory with a focus on preconceived ideas and their implications; and a description of the research project currently being undertaken. A discussion of some issues experienced when using grounded theory within an IT department with very little experience of qualitative research will be provided, along with some preliminary results.
- Description: E1
Understanding victims of identity theft: Preliminary insights
- Authors: Turville, Kylie , Yearwood, John , Miller, Charlynn
- Date: 2010
- Type: Text , Conference proceedings
- Full Text:
- Description: Identity theft is not a new crime; however, changes in society and the way that business is conducted have made it an easier, more attractive and more lucrative crime. When a victim discovers the misuse of their identity they must then begin the process of recovery, including fixing any issues that may have been created by the misuse. For some victims this may only take a small amount of time and effort; others, however, may continue to experience issues for many years after the initial moment of discovery. To date, little research has been conducted within Australia or internationally regarding what a victim experiences as they work through the recovery process. This paper presents a summary of the identity theft domain with an emphasis on research conducted within Australia, and identifies a number of issues regarding research in this area. The paper also provides an overview of the research project currently being undertaken by the authors in obtaining an understanding of what victims of identity theft experience during the recovery process, particularly their experiences when dealing with organizations. Finally, it reports on some of the preliminary work that has already been conducted for the research project. © 2010 IEEE.
Unsupervised and supervised data classification via nonsmooth and global optimisation
- Authors: Bagirov, Adil , Rubinov, Alex , Sukhorukova, Nadezda , Yearwood, John
- Date: 2003
- Type: Text , Journal article
- Relation: Top Vol. 11, no. 1 (2003), p. 1-92
- Full Text:
- Reviewed:
- Description: We examine various methods for data clustering and data classification that are based on the minimization of the so-called cluster function and its modifications. These functions are nonsmooth and nonconvex. We use Discrete Gradient methods for their local minimization. We also consider a combination of this method with the cutting angle method for global minimization. We present and discuss results of numerical experiments.
- Description: C1
- Description: 2003000421
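The abstract above does not spell out the cluster function. As a minimal sketch, assuming the commonly used form in which each data point contributes its squared Euclidean distance to the nearest cluster centre, the function could be evaluated as below. The name `cluster_function` and the choice of squared Euclidean distance are assumptions for illustration, not taken from the paper. The inner `min` over centres is what makes the function nonsmooth.

```python
from typing import Sequence

Point = Sequence[float]


def cluster_function(centers: Sequence[Point], data: Sequence[Point]) -> float:
    """Average, over all data points, of the squared Euclidean distance
    from the point to its nearest centre (assumed form, see lead-in)."""
    if not centers or not data:
        raise ValueError("centers and data must both be non-empty")
    total = 0.0
    for a in data:
        # min over centres: the source of nonsmoothness in the objective
        total += min(
            sum((c_k - a_k) ** 2 for c_k, a_k in zip(c, a)) for c in centers
        )
    return total / len(data)


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
    print(cluster_function([(0.0, 1.0), (10.0, 1.0)], points))
    print(cluster_function([(5.0, 1.0)], points))
```

Clustering then amounts to minimizing this value over the centre coordinates; the sketch only evaluates the objective and does not implement the Discrete Gradient or cutting angle methods.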
Unsupervised color textured image segmentation using cluster ensembles and MRF model
- Authors: Islam, Mofakharul , Yearwood, John , Vamplew, Peter
- Date: 2008
- Type: Text , Book chapter
- Relation: Advances in computer and information sciences and engineering p. 323-328
- Full Text: false
- Reviewed:
- Description: We propose a novel, robust approach to unsupervised color image content understanding that segments a color image into its constituent parts automatically. The aim of this work is to produce precise segmentation of color images using color and texture information along with neighborhood relationships among image pixels, which will provide more accuracy in segmentation. Here, unsupervised means automatic discovery of classes or clusters in images rather than generating the class or cluster descriptions from training image sets. As a whole, in this particular work, the problem we want to investigate is to implement a robust unsupervised SVFM model based color medical image segmentation tool using Cluster Ensembles and MRF model along with wavelet transforms for increasing the content sensitivity of the segmentation model. In addition, Cluster Ensemble has been utilized for introducing a robust technique for finding the number of components in an image automatically. The experimental results reveal that the proposed tool is able to find the accurate number of objects or components in a color image and is ultimately capable of producing more accurate and faithful segmentation. A statistical model based approach has been developed to compute the maximum a posteriori (MAP) estimate used to identify the different objects/components in a color image. The approach utilizes a Markov Random Field model to capture the relationships among the neighboring pixels and integrate that information into the Expectation Maximization (EM) model fitting MAP algorithm. The algorithm simultaneously calculates the model parameters and segments the pixels iteratively in an interleaved manner. It converges to a solution where the model parameters and pixel labels are stabilized within a specified criterion. Finally, we have compared our results with another well-known segmentation approach.
Unsupervised segmentation of Industrial Images using Markov Random Field Model
- Authors: Islam, Mofakharul , Yearwood, John , Vamplew, Peter
- Date: 2009
- Type: Text , Book chapter
- Relation: Technological Developments in Education and Automation p. 369-374
- Full Text: false
- Reviewed:
- Description: We propose a novel approach to investigate and implement unsupervised image content understanding and segmentation of color industrial images like medical imaging, forensic imaging, security and surveillance imaging, biotechnical imaging, biometrics, mineral and mining imaging, material science imaging, and many more. In this particular work, our focus will be on medical images only. The aim is to develop a computer aided diagnosis (CAD) system based on a newly developed Multidimensional Spatially Variant Finite Mixture Model (MSVFMM) using Markov Random Fields (MRF) Model. Unsupervised means automatic discovery of classes or clusters in images rather than generating the class or cluster descriptions from training image sets. The aim of this work is to produce precise segmentation of color medical images on the basis of subtle color and texture variation. Finer segmentation of images has tremendous potential in medical imaging where subtle information related to color and texture is required to analyze the image accurately. In this particular work, we have used CIE-Luv and Daubechies wavelet transforms as color and texture descriptors respectively. Using the combined effect of a CIE-Luv color model and Daubechies transforms, we can segment color medical images precisely in a meaningful manner. The evaluation of the results is done through comparison of the segmentation quality with another similar alternative approach and it is found that the proposed approach is capable of producing more faithful segmentation.
Using association and overlapping time window approach to detect drug reaction signals
- Authors: Ivkovic, Sasha , Saunders, Gary , Ghosh, Ranadhir , Yearwood, John
- Date: 2006
- Type: Text , Conference paper
- Relation: Paper presented at CIMCA 2005 International Conference on Computational Intelligence for Modelling Control & Automation jointly with IAWTIC 2005 International Conference on Intelligent Agents, Web Technologies & Internet Commerce, Vienna, Austria : 28th November, 2005 p. 1045-1053
- Full Text:
- Reviewed:
- Description: The problem with detecting adverse drug reactions (ADRs) from drugs is that they may not be obvious until long after the drugs are widely prescribed. Part of the problem is that these events are rare. This work describes an approach to signal detection of ADRs based on association rules (AR) in Australian drug safety data. This work was carried out using the Australian Adverse Drug Reactions Advisory Committee (ADRAC) database, which contains a hundred and thirty seven thousand records collected in the 1972-2001 period. Many signal detection methods have been developed for drug safety data, most of which use a classical statistical approach. Some of these stratify the data using an ontology for reactions, but the application of drug ontologies to ADR signal detection methods has not been reported. We propose a novel approach for detecting various signal levels by using an overlapped windowing approach. The overlapping windows help to detect smooth transitions of a signal. We use association rules for measuring significant change over time for different hierarchical levels of drugs (using the Anatomical-Therapeutic-Chemical (ATC) system of drug classification ontology) and their reactions based on the System Organ Classes (SOC) ontology. Using association rules and their strength for different levels in the drug and reaction hierarchy helps in the detection of signals at particular levels in higher order using a bottom-up approach. The results of a preliminary investigation of ADRAC data using our method demonstrate that this approach could produce a powerful and robust ADR signal detection method.
- Description: E1
- Description: 2003001838
Using corpus analysis to inform research into opinion detection in blogs
- Authors: Osman, Deanna , Yearwood, John , Vamplew, Peter
- Date: 2007
- Type: Text , Conference paper
- Relation: Paper presented at Sixth Australasian Data Mining Conference, AusDM 2007, Gold Coast, Queensland : 3rd-4th December 2007 p. 65-75
- Full Text:
- Description: Opinion detection research relies on labeled documents for training data, either by assumptions based on the document's origin or by using human assessors to categorise the documents. In recent years, blogs have become a source for opinion identification research (TREC Blog06). This study analyses the part-of-speech proportion and the words used within various corpora, determining key differences and similarities useful when preparing for opinion identification research. The resulting comparisons between the characteristics of the various corpora are detailed and discussed. In particular, opinion-bearing and non-opinion Blog06 documents were found to display a high level of similarity, indicating that blog documents assessed at the document level cannot be used as training data in opinion identification research.
- Description: 2003004892
Using global optimization to improve classification for medical diagnosis and prognosis
- Authors: Bagirov, Adil , Rubinov, Alex , Yearwood, John
- Date: 2001
- Type: Text , Journal article
- Relation: Topics in health information management Vol. 22, no. 1 (2001), p. 65-74
- Full Text: false
- Description: Global optimization-based techniques are studied in order to increase the accuracy of medical diagnosis and prognosis with data from various databases. First, we discuss feature selection, the problem of determining the most informative features for classification in the databases under consideration. Then, we apply a technique based on convex and global optimization for classification in these databases. The third application of this technique is a method that calculates centers of clusters to predict when breast cancer is likely to recur in patients from whom cancer has been removed. The technique achieves high accuracy with these databases. Better classifiers will lead to improved assistance in making medical diagnostic and prognostic decisions.
- Description: 2003003662
Using links to aid web classification
- Authors: Xie, Wei , Mammadov, Musa , Yearwood, John
- Date: 2007
- Type: Text , Conference paper
- Relation: Paper presented at 6th IEEE/ACIS International Conference on Computer and Information Science, ICIS 2007, Melbourne, Victoria : 11th-13th July 2007 p. 981-986
- Full Text:
- Description: In this paper, we present a new approach to using link information to improve the accuracy and efficiency of web classification. Unlike other approaches, we use only the mappings between linked documents and their own class or classes. In this case, we only need to add a few features, called linked-class features, to the datasets. We apply SVM and BoosTexter for classification. We show that the classification accuracy can be improved based on mixtures of ordinary word features and out-linked-class features. We analyze and discuss the reason for this improvement.
- Description: 2003005438
Using psycholinguistic features for profiling first language of authors
- Authors: Torney, Rosemary , Vamplew, Peter , Yearwood, John
- Date: 2012
- Type: Text , Journal article
- Relation: Journal of the American Society for Information Science and Technology Vol. 63, no. 6 (2012), p. 1256-1269
- Full Text: false
- Reviewed:
- Description: This study empirically evaluates the effectiveness of different feature types for the classification of the first language of an author. In particular, it examines the utility of psycholinguistic features, extracted by the Linguistic Inquiry and Word Count (LIWC) tool, that have not previously been applied to the task of author profiling. As LIWC is a tool that has been developed in the psycholinguistic field rather than the computational linguistics field, it was hypothesized that it would be effective, both as a single type feature set because of its psycholinguistic basis, and in combination with other feature sets, because it should be sufficiently different to add insight rather than redundancy. It was found that LIWC features were competitive with previously used feature types in identifying the first language of an author, and that combined feature sets including LIWC features consistently showed better accuracy rates and average F measures than were achieved by the same feature sets without the LIWC features. As a secondary issue, this study also examined how effectively first language classification scaled up to a larger number of possible languages. It was found that the classification scheme scaled up effectively to the entire 16 language collection from the International Corpus of Learner English, when compared with results achieved on just 5 languages in previous research. © 2012 ASIS&T.
Visual grouping of association rules by clustering conditional probabilities for categorical data
- Authors: Ivkovic, Sasha , Ghosh, Ranadhir , Yearwood, John
- Date: 2005
- Type: Text , Book chapter
- Relation: Business Applications and Computational Intelligence p. 248-266
- Full Text: false
- Reviewed:
- Description: We demonstrate the use of a visual data-mining tool for non-technical domain experts within organizations to facilitate the extraction of meaningful information and knowledge from in-house databases. The tool is mainly based on the basic notion of grouping association rules. Association rules are useful in discovering items that are frequently found together. However, in many applications, rules with lower frequencies are often interesting for the user. Grouping of association rules is one way to overcome the rare item problem. However, some groups of association rules are too large for ease of understanding. In this chapter we propose a method for clustering categorical data based on the conditional probabilities of association rules for data sets with large numbers of attributes. We argue that the proposed method provides non-technical users with a better understanding of discovered patterns in the data set.
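The conditional probability of an association rule A -> B, usually called its confidence, is the fraction of records containing A that also contain B. A minimal sketch of how support and confidence might be computed over a list of transactions follows; the function names and the toy data are hypothetical and are not taken from the chapter, which does not describe its implementation here.

```python
from typing import Iterable, Sequence, Set


def support(transactions: Sequence[Set[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        raise ValueError("transactions must be non-empty")
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(
    transactions: Sequence[Set[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> float:
    """Estimate P(consequent | antecedent) for the rule antecedent -> consequent."""
    a = frozenset(antecedent)
    s_a = support(transactions, a)
    if s_a == 0:
        return 0.0  # rule never fires; report zero confidence by convention
    return support(transactions, a | frozenset(consequent)) / s_a


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    baskets = [
        {"bread", "milk"},
        {"bread", "butter"},
        {"bread", "milk", "butter"},
        {"milk"},
    ]
    print(support(baskets, {"bread"}))
    print(confidence(baskets, {"bread"}, {"milk"}))
```

A rule with low support can still have high confidence, which is why rare but interesting rules are lost when only a support threshold is applied.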
Visual tools for analysing evolution, emergence, and error in data streams
- Authors: Hart, Sol , Yearwood, John , Bagirov, Adil
- Date: 2007
- Type: Text , Conference paper
- Relation: Paper presented at 6th IEEE/ACIS International Conference on Computer and Information Science, ICIS 2007, Melbourne, Victoria : 11th-13th July 2007 p. 987-992
- Full Text:
- Description: The relatively new field of stream mining has necessitated the development of robust drift-aware algorithms that provide accurate, real-time data handling capabilities. Tools are needed to assess and diagnose important trends and investigate drift evolution parameters. In this paper, we present two novel visualisation techniques, Pixie and Luna graphs, which incorporate salient group statistics coupled with intuitive visual representations of multidimensional groupings over time. Through the representations presented here, spatial interactions between temporal divisions can be diagnosed and overall distribution patterns identified. They provide a means of evaluating, in a non-constrained capacity, commonly constrained evolutionary problems.
- Description: 2003005432
Visualizing association rules for feedback within the legal system
- Authors: Ivkovic, Sasha , Yearwood, John , Stranieri, Andrew
- Date: 2003
- Type: Text , Conference paper
- Relation: Paper presented at the 9th International Conference on Artificial Intelligence and Law, Edinburgh, Scotland : 24th - 28th June, 2003
- Full Text: false
- Reviewed:
- Description: Knowledge discovery from databases (KDD) exercises in law have typically attempted to derive knowledge about decision making processes in the legal domain automatically from datasets. This is made difficult in that real data that represents aspects of a decision process in law is commonly stored as text and rarely stored in structured databases. The central claim advanced here is that KDD processes can be usefully applied to existing datasets of client and demographic data in order to provide feedback for the effective operation of organizations within the legal system. However, the cost of data mining suites and the scarcity of specialized personnel for these tools militate against their use. In this study data mining with Association Rules (AR) has been performed on a dataset of over 380,000 records from a legal aid agency. Methods to visualise patterns in order to suggest and test plausible hypotheses from the data have been developed. The tool, called WebAssociate, is entirely web based. Domain experts using the tool report favorable responses.
- Description: E1
- Description: 2003000495
Water allocation argument tree (WAAT): A tool for facilitating public participation in water allocation decisions
- Authors: Graymore, Michelle , Stranieri, Andrew , McRae-Williams, Pamela , Mays, Heather , Lehmann, La Vergne , Thoms, Gavin , Yearwood, John
- Date: 2012
- Type: Text , Book
- Full Text: false
- Reviewed:
Web-based decision support for structured reasoning in health
- Authors: Stranieri, Andrew , Yearwood, John , Gervasoni, Susan , Garner, Susan , Deans, Cecil , Johnstone, Alistair
- Date: 2004
- Type: Text , Conference paper
- Relation: Paper presented at Health Informatics Conference 2004 - Let's make a difference with health ICT, Brisbane, Queensland : 25th July, 2004
- Full Text: false
- Reviewed:
- Description: Decision making processes in the health care setting are often complex, demanding of professionals and rapidly changing. Furthermore, there is increasing pressure for professionals to make reasoning transparent and consistent. Decision support technologies have not made a substantial contribution to these issues to date, largely because knowledge is difficult to elicit and maintain and existing development tools are very sophisticated yet complex. In this study a method for representing complex and discretionary reasoning used successfully in law was applied to the task of modelling decision making processes that critical care nurses deploy in responding to a low oxygen alarm. The approach, based on decision and argument tree diagrams, enables the rapid development of small scale, yet useful web based decision support systems.
- Description: E1
- Description: 2003000831
Weblogs for market research : Finding more relevant opinion documents using system fusion
- Authors: Osman, Deanna , Yearwood, John , Vamplew, Peter
- Date: 2009
- Type: Text , Journal article
- Relation: Online Information Review Vol. 33, no. 5 (2009), p. 873-888
- Full Text: false
- Reviewed:
- Description: Purpose - The purpose of this paper is to examine the usefulness of fusion as a means of improving the precision of automated opinion detection. Design/methodology/approach - Five system fusion methods are proposed and tested using runs submitted by the Text REtrieval Conference (TREC) Blog06 participants as input. The methods include a voting method, an inverse rank method (IRM), a linear-normalised score method and two weighted methods that use a weighted IRM score to rank the document. Findings - Mean average precision (MAP) is used as an indicator of the performance of the runs in this study. The best system fusion method achieves a 55.5 percent higher MAP result compared with the highest MAP result of any individual run submitted by the Blog06 participants. This equates to an increase in detection of 2,398 relevant opinion documents (21 percent). Practical implications - System fusion can be used to improve upon the results achieved by existing individual opinion detection systems. On the other hand, multiple opinion detection approaches can be combined into one system and fusion used to combine the results to build in diversity. Diversity within fusion inputs can increase the improvements achieved by fusion methods. The improved output from a diverse opinion detection system will then contain a higher number of relevant documents and reduce the incidence of high-ranking non-relevant documents and low-ranking relevant documents. Originality/value - The fusion methods proposed in this study demonstrate that simple fusion of opinion detection systems can improve performance.
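The abstract names a voting method and an inverse rank method (IRM) but does not give their formulas. As a rough sketch, assuming the usual reading in which a document's fused score is the sum of 1/rank over the runs that retrieved it, and a vote is one count per run that retrieved it, the two could look as follows. The function names, the tie-breaking rule and the toy runs are assumptions for illustration and may differ from the paper's definitions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def inverse_rank_fusion(runs: Sequence[Sequence[str]]) -> List[str]:
    """Fuse ranked lists by summing 1/rank for each document across runs."""
    scores: Dict[str, float] = defaultdict(float)
    for run in runs:
        for rank, doc in enumerate(run, start=1):
            scores[doc] += 1.0 / rank
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def voting_fusion(runs: Sequence[Sequence[str]]) -> List[str]:
    """Rank documents by the number of runs that retrieved them."""
    votes: Dict[str, int] = defaultdict(int)
    for run in runs:
        for doc in set(run):  # one vote per run, even if a doc is repeated
            votes[doc] += 1
    return sorted(votes, key=lambda d: (-votes[d], d))


if __name__ == "__main__":
    # Three hypothetical runs, each a list of document ids, best first.
    example_runs = [["a", "b", "c"], ["b", "a", "d"], ["b", "c", "a"]]
    print(inverse_rank_fusion(example_runs))
    print(voting_fusion(example_runs))
```

Voting ignores rank positions entirely, while the inverse rank score rewards documents that several runs place near the top; the weighted variants mentioned in the abstract would additionally scale each run's contribution.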
Weblogs for market research : Improving opinion detection using system fusion
- Authors: Osman, Deanna , Yearwood, John , Vamplew, Peter
- Date: 2008
- Type: Text , Conference paper
- Relation: Paper presented at International Conference on Service Systems and Service Management, 2008, Melbourne, Victoria : 30th June - 2nd July 2008 p. 1-6
- Full Text:
- Description: Searching for opinions on a specific product or service within blogs is a new frontier for market researchers. This research investigates the use of system fusion methods to improve mean average precision (MAP) results achieved by the Text REtrieval Conference (TREC) Blog06 participants and reports the improved MAP results. It is hypothesized that diversity of the inputs is vital to maximising the MAP improvements. This is shown in the improvement in MAP values achieved by some of the participants' ranked lists. The growth in the number of blog authors who write valuable opinions about their life experiences has led to an unsolicited resource of opinions on products, politics and services. In 2006, TREC collected blogs and set its participants the task of detecting opinions on given topics, reporting the results using MAP.
- Description: 2003007757
Workload coverage through nonsmooth optimization
- Authors: Sukhorukova, Nadezda , Ugon, Julien , Yearwood, John
- Date: 2009
- Type: Text , Journal article
- Relation: Optimization Methods and Software Vol. 24, no. 2 (2009), p. 285-298
- Full Text: false
- Reviewed:
- Description: In this paper, workload coverage is the problem of identifying a pattern of days worked and days off, along with the number of hours worked on each work day. This pattern must satisfy certain work-related constraints and fit best to a predefined workload. In our study, we formulate the problem of workload coverage as an optimization problem. We propose a number of models which take into consideration various staffing constraints. For each of these models, our study aims to find a compromise between an accurate workload coverage and the ability to solve the corresponding optimization problems in a reasonable time. Numerical experiments on each model are carried out and the results are presented. Interestingly, the nonlinear programming approaches are found to be competitive with linear programming ones. © 2009 Taylor & Francis.