A technique for ranking friendship closeness in social networking services
- Sun, Zhaohao, Yearwood, John, Firmin, Sally
- Authors: Sun, Zhaohao , Yearwood, John , Firmin, Sally
- Date: 2013
- Type: Text , Conference paper
- Relation: 24th Australasian Conference on Information Systems (ACIS) p. 1-9
- Full Text:
- Reviewed:
- Description: The concepts of friend and friendship are critical to both theoretical and empirical studies of social relations, social media and social networks. Measuring the closeness among friends is a big issue for developing online social networking services (SNS) such as Facebook. This paper will address this issue by proposing a technique for ranking friendship closeness in SNS. The technique consists of an algorithm for ranking need-driven friendship closeness and an algorithm for behaviour-based friendship closeness in online social networking sites. The former is based on Maslow’s hierarchy of needs, while the latter is based on behaviours of users on Facebook and TOPSIS. Examples provided illustrate the viability of the proposed algorithms. The research in this paper shows that ranking friendship closeness will facilitate understanding of needs and behaviours of friends and of friendships in SNS. The proposed approach will facilitate research and development of social media, social commerce, social networks, and SN
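The behaviour-based ranking in this record relies on TOPSIS. The following is a minimal sketch of generic TOPSIS with benefit-only criteria, not the paper's algorithm: the friend names, behaviour counts (comments, likes, messages) and weights are hypothetical, since the abstract does not specify them.

```python
import math

def topsis_rank(scores, weights):
    """Rank alternatives (rows of `scores`) by TOPSIS closeness.

    Assumes every criterion is a benefit (larger is better).
    Returns (row indices from closest to least close, closeness values).
    """
    n_criteria = len(weights)
    # Vector-normalise each column; fall back to 1.0 for an all-zero column.
    norms = [math.sqrt(sum(row[j] ** 2 for row in scores)) or 1.0
             for j in range(n_criteria)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_criteria)]
                for row in scores]
    # Ideal and anti-ideal points taken column by column.
    ideal = [max(row[j] for row in weighted) for j in range(n_criteria)]
    anti = [min(row[j] for row in weighted) for j in range(n_criteria)]
    closeness = []
    for row in weighted:
        d_best = math.dist(row, ideal)
        d_worst = math.dist(row, anti)
        total = d_best + d_worst
        closeness.append(d_worst / total if total else 0.0)
    order = sorted(range(len(scores)), key=lambda i: closeness[i], reverse=True)
    return order, closeness

# Hypothetical behaviour counts per friend: [comments, likes, messages].
friends = ["Ann", "Bob", "Cho"]
behaviour = [[12, 30, 5], [3, 8, 1], [7, 20, 9]]
order, closeness = topsis_rank(behaviour, weights=[0.4, 0.2, 0.4])
print([friends[i] for i in order])
```

A friend who scores lowest on every criterion coincides with the anti-ideal point and so receives closeness 0.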
A new supervised term ranking method for text categorization
- Mammadov, Musa, Yearwood, John, Zhao, Lei
- Authors: Mammadov, Musa , Yearwood, John , Zhao, Lei
- Date: 2010
- Type: Text , Conference paper
- Relation: Paper presented at 23rd Australasian Joint Conference on Artificial Intelligence, AI 2010 Vol. 6464 LNAI, p. 102-111
- Full Text:
- Reviewed:
- Description: In text categorization, different supervised term weighting methods have been applied to improve classification performance by weighting terms with respect to different categories, for example, Information Gain, χ2 statistic, and Odds Ratio. From the literature there are three term ranking methods to summarize term weights of different categories for multi-class text categorization. They are Summation, Average, and Maximum methods. In this paper we present a new term ranking method to summarize term weights, i.e. Maximum Gap. Using two different methods of information gain and χ2 statistic, we set up controlled experiments for different term ranking methods. The Reuters-21578 text corpus is used as the dataset. Two popular classification algorithms, SVM and BoosTexter, are adopted to evaluate the performance of different term ranking methods. Experimental results show that the new term ranking method performs better. © 2010 Springer-Verlag.
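The three baseline summarization methods named in this abstract (Summation, Average, Maximum) can be sketched as follows. The per-category weights are hypothetical, and the Maximum Gap method is deliberately left out because the abstract does not define it.

```python
def summarize_term_weights(category_weights, method="max"):
    """Collapse one term's per-category weights into a single ranking score."""
    if method == "sum":
        return sum(category_weights)
    if method == "avg":
        return sum(category_weights) / len(category_weights)
    if method == "max":
        return max(category_weights)
    raise ValueError(f"unknown method: {method}")

def rank_terms(weights_by_term, method="max"):
    """Return the terms sorted from highest to lowest summarized weight."""
    return sorted(
        weights_by_term,
        key=lambda term: summarize_term_weights(weights_by_term[term], method),
        reverse=True,
    )

# Hypothetical per-category weights (e.g. chi-square scores) for three terms.
weights = {
    "oil": [0.9, 0.0, 0.0],   # strongly tied to one category
    "the": [0.4, 0.4, 0.4],   # moderately weighted everywhere
    "bank": [0.5, 0.3, 0.0],
}
print(rank_terms(weights, "sum"))
print(rank_terms(weights, "max"))
```

The toy data is chosen so that the rankings differ: Summation favours a term that is moderately weighted in every category, while Maximum favours a term that is strongly tied to a single category.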
Consensus clustering and supervised classification for profiling phishing emails in internet commerce security
- Dazeley, Richard, Yearwood, John, Kang, Byeongho, Kelarev, Andrei
- Authors: Dazeley, Richard , Yearwood, John , Kang, Byeongho , Kelarev, Andrei
- Date: 2010
- Type: Text , Conference paper
- Relation: Paper presented at 11th International Workshop on Knowledge Management and Acquisition for Smart Systems and Services, PKAW 2010 Vol. 6232 LNAI, p. 235-246
- Full Text:
- Reviewed:
- Description: This article investigates internet commerce security applications of a novel combined method, which uses unsupervised consensus clustering algorithms in combination with supervised classification methods. First, a variety of independent clustering algorithms are applied to a randomized sample of data. Second, several consensus functions and sophisticated algorithms are used to combine these independent clusterings into one final consensus clustering. Third, the consensus clustering of the randomized sample is used as a training set to train several fast supervised classification algorithms. Finally, these fast classification algorithms are used to classify the whole large data set. One of the advantages of this approach is in its ability to facilitate the inclusion of contributions from domain experts in order to adjust the training set created by consensus clustering. We apply this approach to profiling phishing emails selected from a very large data set supplied by the industry partners of the Centre for Informatics and Applied Optimization. Our experiments compare the performance of several classification algorithms incorporated in this scheme. © 2010 Springer-Verlag Berlin Heidelberg.
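The four-step scheme in this abstract (cluster a sample, build a consensus, train a classifier on it, classify the full data set) can be sketched with simple stand-ins. The co-association majority rule and the nearest-centroid classifier below are assumptions chosen for brevity, not the consensus functions or classifiers used in the paper, and the sample points are hypothetical.

```python
from itertools import combinations

def consensus_labels(clusterings, threshold=0.5):
    """Merge items that share a cluster in more than `threshold` of the base clusterings."""
    n = len(clusterings[0])
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        agree = sum(labels[i] == labels[j] for labels in clusterings) / len(clusterings)
        if agree > threshold:
            parent[find(i)] = find(j)
    roots = {}
    # Relabel the merged groups as 0, 1, 2, ... in order of first appearance.
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]

def train_centroids(points, labels):
    """Compute one mean vector per consensus cluster."""
    sums, counts = {}, {}
    for point, label in zip(points, labels):
        acc = sums.setdefault(label, [0.0] * len(point))
        for k, value in enumerate(point):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def classify(point, centroids):
    """Assign a point to the cluster with the nearest centroid (squared Euclidean)."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(point, centroids[label])),
    )

# Hypothetical sample and three base clusterings that partly disagree.
sample = [(0, 0), (0, 1), (5, 5), (6, 5)]
base = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]]
consensus = consensus_labels(base)
centroids = train_centroids(sample, consensus)
print(consensus, classify((1, 1), centroids), classify((4, 6), centroids))
```

Because the consensus labels form an ordinary training set, a domain expert could correct individual labels before `train_centroids` is called, which is the adjustment point the abstract highlights.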
Exploring novel features and decision rules to identify cardiovascular autonomic neuropathy using a hybrid of wrapper-filter based feature selection
- Huda, Shamsul, Jelinek, Herbert, Ray, Biplob, Stranieri, Andrew, Yearwood, John
- Authors: Huda, Shamsul , Jelinek, Herbert , Ray, Biplob , Stranieri, Andrew , Yearwood, John
- Date: 2010
- Type: Text , Conference paper
- Relation: Paper presented at the 2010 6th International Conference on Intelligent Sensors, Sensor Networks and Information Processing, ISSNIP 2010 p. 297-302
- Full Text:
- Reviewed:
- Description: Cardiovascular autonomic neuropathy (CAN) is one of the important causes of mortality among diabetes patients. Statistics show that more than 22% of people with type 2 diabetes mellitus suffer from CAN, which in turn leads to cardiovascular disease (heart attack, stroke). Therefore early detection of CAN could reduce mortality. The traditional method for detection of CAN uses Ewing's algorithm, in which five noninvasive cardiovascular tests are used. Clinicians often find it difficult to collect data for the Ewing battery from patients because of onerous test conditions. In this paper, we propose a hybrid wrapper-filter approach to find novel features from patients' ECG records and then generate decision rules for the new features for easier detection of CAN. In the proposed feature selection, a hybrid of filter (Maximum Relevance, MR) and wrapper (Artificial Neural Net Input Gain Measurement Approximation, ANNIGMA) approaches (MR-ANNIGMA) is used. The combined heuristics in the hybrid MR-ANNIGMA take advantage of the complementary properties of both the filter and wrapper heuristics and can find significant features. The selected feature set is used to generate a new set of rules for detection of CAN. Experiments on real patient records show that the proposed method finds a smaller set of features for detection of CAN than the traditional method; these features are clinically significant and could lead to an easier way to diagnose CAN. © 2010 IEEE.
From convex to nonconvex: A loss function analysis for binary classification
- Zhao, Lei, Mammadov, Musa, Yearwood, John
- Authors: Zhao, Lei , Mammadov, Musa , Yearwood, John
- Date: 2010
- Type: Text , Conference paper
- Relation: Paper presented at 10th IEEE International Conference on Data Mining Workshops, ICDMW 2010 p. 1281-1288
- Full Text:
- Reviewed:
- Description: Problems of data classification can be studied in the framework of regularization theory as ill-posed problems. In this framework, loss functions play an important role in the application of regularization theory to classification. In this paper, we review some important convex loss functions, including hinge loss, square loss, modified square loss, exponential loss, logistic regression loss, as well as some non-convex loss functions, such as sigmoid loss, ø-loss, ramp loss, normalized sigmoid loss, and the loss function of a 2-layer neural network. Based on the analysis of these loss functions, we propose a new differentiable non-convex loss function, called the smoothed 0-1 loss function, which is a natural approximation of the 0-1 loss function. To compare the performance of different loss functions, we propose two binary classification algorithms, one for convex loss functions, the other for non-convex loss functions. A set of experiments is run on several binary data sets from the UCI repository. The results show that the proposed smoothed 0-1 loss function is robust, especially for noisy data sets with many outliers. © 2010 IEEE.
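Several of the losses reviewed in this abstract can be written as functions of the margin m = y·f(x). The sketch below uses common textbook forms, which are an assumption and may differ in scaling or parametrisation from the paper's. The proposed smoothed 0-1 loss is omitted because the abstract does not give its formula.

```python
import math

# Each loss takes the margin m = y * f(x), with label y in {-1, +1}.

def zero_one_loss(m):
    return 1.0 if m <= 0 else 0.0

def hinge_loss(m):                 # convex
    return max(0.0, 1.0 - m)

def square_loss(m):                # convex
    return (1.0 - m) ** 2

def modified_square_loss(m):       # convex
    return max(0.0, 1.0 - m) ** 2

def exponential_loss(m):           # convex
    return math.exp(-m)

def logistic_loss(m):              # convex
    return math.log1p(math.exp(-m))

def sigmoid_loss(m):               # non-convex, bounded in (0, 1)
    return 1.0 / (1.0 + math.exp(m))

def ramp_loss(m):                  # non-convex: hinge loss clipped at 1
    return min(1.0, hinge_loss(m))

# A badly misclassified outlier (large negative margin) dominates the convex
# losses but contributes at most 1 to the bounded non-convex ones.
outlier_margin = -10.0
for loss in (hinge_loss, exponential_loss, sigmoid_loss, ramp_loss):
    print(loss.__name__, loss(outlier_margin))
```

The bounded non-convex losses stay close to the 0-1 loss on outliers, which is consistent with the robustness to noisy data that the abstract reports.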
Profiling phishing emails based on hyperlink information
- Yearwood, John, Mammadov, Musa, Banerjee, Arunava
- Authors: Yearwood, John , Mammadov, Musa , Banerjee, Arunava
- Date: 2010
- Type: Text , Conference paper
- Relation: Paper presented at 2010 International Conference on Advances in Social Network Analysis and Mining, ASONAM 2010, Odense : 9th-11th August 2010 p. 120-127
- Full Text:
- Description: In this paper, a novel method for profiling phishing activity from an analysis of phishing emails is proposed. Profiling is useful in determining the activity of an individual or a particular group of phishers. Work in the area of phishing is usually aimed at detection of phishing emails. In this paper, we concentrate on profiling as distinct from detection of phishing emails. We formulate the profiling problem as a multi-label classification problem using the hyperlinks in the phishing emails as features and structural properties of emails along with whois (i.e. DNS) information on hyperlinks as profile classes. Further, we generate profiles based on classifier predictions. Thus, classes become elements of profiles. We employ a boosting algorithm (AdaBoost) as well as SVM to generate multi-label class predictions on three different datasets created from hyperlink information in phishing emails. These predictions are further utilized to generate complete profiles of these emails. Results show that profiling can be done with quite high accuracy using hyperlink information. © 2010 Crown Copyright.
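As an illustration of using hyperlinks as features, the sketch below extracts a few simple properties from each link in an email body. The specific features (`is_ip_host`, `host_dots`, `path_depth`, `uses_https`) and the sample email are hypothetical; the abstract does not list the features the authors actually used.

```python
import re
from urllib.parse import urlsplit

# Deliberately simple href matcher; a full HTML parser would be more robust.
HREF = re.compile(r'href=["\'](.*?)["\']', re.IGNORECASE)
IPV4 = re.compile(r"\d{1,3}(\.\d{1,3}){3}")

def hyperlink_features(email_html):
    """Return one feature dictionary per hyperlink found in the email body."""
    features = []
    for url in HREF.findall(email_html):
        parts = urlsplit(url)
        host = parts.hostname or ""
        features.append({
            "host": host,
            "is_ip_host": bool(IPV4.fullmatch(host)),  # raw IP instead of a domain
            "host_dots": host.count("."),
            "path_depth": parts.path.count("/"),
            "uses_https": parts.scheme == "https",
        })
    return features

# Hypothetical email fragment with two links.
email = (
    '<a href="http://192.168.0.1/login/verify">Your bank</a> '
    '<a href="https://secure.example.com/">Help</a>'
)
for row in hyperlink_features(email):
    print(row)
```

Feature dictionaries of this kind would then be vectorised and passed to a multi-label classifier such as the AdaBoost and SVM models named in the abstract.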
Smokers' characteristics and cluster based quitting rule discovery model for enhancement of government's tobacco control systems
- Huda, Shamsul, Yearwood, John, Borland, Ron
- Authors: Huda, Shamsul , Yearwood, John , Borland, Ron
- Date: 2010
- Type: Text , Conference paper
- Relation: Proceedings of the 14th Pacific Asia Conference on Information Systems (PACIS 2010)
- Full Text:
- Reviewed:
- Description: Discovery of cluster characteristics and interesting rules describing smokers' clusters and the behavioural patterns of smokers' quitting intentions is an important task in the development of effective tobacco control systems. In this paper, we attempt to determine the characteristics of smokers' clusters and simplified rules for predicting smokers' quitting behaviour that can provide feedback to build scientific evidence-based adaptive tobacco control systems. Standard clustering algorithms group the data based on their inherent pattern. "From abstract"
Understanding victims of identity theft: A grounded theory approach
- Turville, Kylie, Firmin, Sally, Yearwood, John, Miller, Charlynn
- Authors: Turville, Kylie , Firmin, Sally , Yearwood, John , Miller, Charlynn
- Date: 2010
- Type: Text , Conference paper
- Relation: 5th International Conference on Qualitative Research in IT and IT in Qualitative Research, QualIT 2010
- Full Text:
- Reviewed:
- Description: Being a victim of identity theft can be a devastating and life-changing event. Once the victim discovers the misuse they need to begin the process of recovery. For the "lucky" victims this may take only a couple of phone calls and a small amount of time; however, some victims may experience difficulties for many years. In order to recover, victims of crime require support and assistance; however, within Australia this support is sadly lacking. In order to identify the issues currently faced by victims of identity theft as they work through the recovery process, a Grounded Theory methodology was identified as most appropriate. This paper provides a brief overview of the history of the research project; a brief introduction of grounded theory with a focus on preconceived ideas and their implications; and a description of the research project currently being undertaken. A discussion of some issues experienced when using grounded theory within an IT department with very little experience of qualitative research will be provided, along with some preliminary results.
A classification algorithm that derives weighted sum scores for insight into disease
- Quinn, Anthony, Stranieri, Andrew, Yearwood, John, Hafen, Gaudenz
- Authors: Quinn, Anthony , Stranieri, Andrew , Yearwood, John , Hafen, Gaudenz
- Date: 2009
- Type: Text , Conference paper
- Relation: Paper presented at Third Australasian Workshop on Health Informatics and Knowledge Management (HIKM 2009), Wellington, New Zealand : Vol. 97, p. 13-17
- Full Text:
- Description: Data mining is often performed with datasets associated with diseases in order to increase insights that can ultimately lead to improved prevention or treatment. Classification algorithms can achieve high levels of predictive accuracy but have limited application for facilitating the insight that leads to deeper understanding of aspects of the disease. This is because the representation of knowledge that arises from classification algorithms is too opaque, too complex or too sparse to facilitate insight. Clustering, association and visualisation approaches enable greater scope for clinicians to be engaged in a way that leads to insight; however, predictive accuracy is compromised or non-existent. This research investigates the practical applications of Automated Weighted Sum (AWSum), a classification algorithm that provides accuracy comparable to other techniques whilst providing some insight into the data. This is achieved by calculating a weight for each feature value that represents its influence on the class value. Clinicians are very familiar with weighted sum scoring scales so the internal representation is intuitive and easily understood. This paper presents results from the use of the AWSum approach with data from patients suffering from Cystic Fibrosis.
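The idea of one weight per feature value, summed into a score, can be sketched as below. The weight formula (the difference between the two classes' conditional frequencies for a value) and the decision threshold of zero are assumptions made for illustration; the abstract does not state how AWSum computes its weights. The records are toy data, not Cystic Fibrosis patient data.

```python
from collections import defaultdict

def fit_value_weights(rows, labels, positive):
    """Weight each (feature, value) pair in [-1, 1] by how strongly it favours `positive`."""
    counts = defaultdict(lambda: [0, 0])  # (feature, value) -> [positive, other]
    for row, label in zip(rows, labels):
        for feature, value in row.items():
            counts[(feature, value)][0 if label == positive else 1] += 1
    # Assumed weight: P(positive | value) - P(other | value).
    return {key: (pos - neg) / (pos + neg) for key, (pos, neg) in counts.items()}

def weighted_sum_score(row, weights):
    """Sum the weights of the feature values present; unseen values contribute 0."""
    return sum(weights.get((feature, value), 0.0) for feature, value in row.items())

# Hypothetical toy records with two categorical features.
rows = [
    {"cough": "yes", "bmi": "low"},
    {"cough": "yes", "bmi": "normal"},
    {"cough": "no", "bmi": "normal"},
    {"cough": "no", "bmi": "low"},
]
labels = ["severe", "severe", "mild", "mild"]
weights = fit_value_weights(rows, labels, positive="severe")
score = weighted_sum_score({"cough": "yes", "bmi": "low"}, weights)
print(weights, score, "severe" if score > 0 else "mild")
```

The weight table is itself the readable output: a value with weight near +1 or -1 points strongly to one class, while a weight near 0 carries little information.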
Applying clustering and ensemble clustering approaches to phishing profiling
- Webb, Dean, Yearwood, John, Vamplew, Peter, Ma, Liping, Ofoghi, Bahadorreza, Kelarev, Andrei
- Authors: Webb, Dean , Yearwood, John , Vamplew, Peter , Ma, Liping , Ofoghi, Bahadorreza , Kelarev, Andrei
- Date: 2009
- Type: Text , Conference paper
- Relation: Paper presented at Eighth Australasian Data Mining Conference, AusDM 2009, University of Melbourne, Melbourne, Victoria : 1st–4th December 2009
- Full Text:
Establishing phishing provenance using orthographic features
- Liping, Ma, Yearwood, John, Watters, Paul
- Authors: Liping, Ma , Yearwood, John , Watters, Paul
- Date: 2009
- Type: Text , Conference paper
- Relation: Paper presented at 2009 eCrime Researchers Summit, eCRIME '09, Tacoma, Washington : 20th-21st October 2009
- Full Text:
- Description: After phishing message detection, determining the provenance of phishing messages and websites is the second step to tracing cybercriminals. In this paper, we present a novel method to cluster phishing emails automatically using orthographic features. In particular, we develop an algorithm to cluster documents and remove redundant features at the same time. After collecting all the possible features based on observation, we adapt the modified global k-means method repeatedly, and generate the objective function values over a range of tolerance values across different subsets of features. Finally, we identify the appropriate clusters based on studying the distribution of the objective function values. Experimental evaluation of a large number of computations demonstrates that our clustering and feature selection techniques are highly effective and achieve reliable results.
Experimental investigation of three machine learning algorithms for ITS dataset
- Yearwood, John, Kang, Byeongho, Kelarev, Andrei
- Authors: Yearwood, John , Kang, Byeongho , Kelarev, Andrei
- Date: 2009
- Type: Text , Conference paper
- Relation: Paper presented at First International Conference, FGIT 2009, Future Generation Information Technology, Jeju Island, Korea : 10th-12th December 2009 Vol. 5899, p. 308-316
- Full Text:
- Description: The present article is devoted to an experimental investigation of the performance of three machine learning algorithms for the ITS dataset in their ability to achieve agreement with classes previously published in the biological literature. The ITS dataset consists of nuclear ribosomal DNA sequences, where rather sophisticated alignment scores have to be used as a measure of distance. These scores do not form a Minkowski metric and the sequences cannot be regarded as points in a finite dimensional space. This is why it is necessary to develop novel machine learning approaches to the analysis of datasets of this sort. This paper introduces a k-committees classifier and compares it with the discrete k-means and Nearest Neighbour classifiers. It turns out that all three machine learning algorithms are efficient and can be used to automate future biologically significant classifications for datasets of this kind. A simplified version of a synthetic dataset, where the k-committees classifier outperforms k-means and Nearest Neighbour classifiers, is also presented.
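A Nearest Neighbour classifier needs only pairwise dissimilarities, so it still works when sequences cannot be embedded as points in a vector space. The sketch below uses plain edit distance as a stand-in for the alignment scores mentioned in the abstract. The sequences and class labels are hypothetical, and the k-committees classifier is not shown because the abstract does not describe it.

```python
def edit_distance(a, b):
    """Levenshtein distance, used here as a placeholder for an alignment score."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # substitution or match
            ))
        prev = cur
    return prev[-1]

def nearest_neighbour(query, labelled, dissimilarity=edit_distance):
    """Return the label of the training sequence least dissimilar to `query`."""
    _, label = min(labelled, key=lambda pair: dissimilarity(query, pair[0]))
    return label

# Hypothetical labelled sequences; real ITS data would use alignment scores.
training = [("ACGTACGT", "A"), ("ACGTTCGT", "A"), ("TTGGCCAA", "B")]
print(nearest_neighbour("ACGAACGT", training))
print(nearest_neighbour("TTGGCCAT", training))
```

Any other scoring function with the same two-argument signature can be passed as `dissimilarity`, including one that does not satisfy the metric axioms.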
- Description: 2003007844
From lexical entailment to recognizing textual entailment using linguistic resources
- Ofoghi, Bahadorreza, Yearwood, John
- Authors: Ofoghi, Bahadorreza , Yearwood, John
- Date: 2009
- Type: Text , Conference paper
- Relation: Paper presented at Australasian Language Technology Association Workshop 2009, Sydney, New South Wales : 3rd-4th December 2009 p. 119–123
- Full Text:
- Description: In this paper, we introduce our Recognizing Textual Entailment (RTE) system developed on the basis of Lexical Entailment between two text excerpts, namely the hypothesis and the text. To extract atomic parts of hypotheses and texts, we carry out syntactic parsing on the sentences. We then utilize WordNet and FrameNet lexical resources for estimating lexical coverage of the text on the hypothesis. We report the results of our RTE runs on the Text Analysis Conference RTE datasets. Using a failure analysis process, we also show that the main difficulty of our RTE system relates to the underlying difficulty of syntactic analysis of sentences.
- Description: 2003007910
A web-based Narrative construction environment
- Yearwood, John, Stranieri, Andrew, Osman, Deanna
- Authors: Yearwood, John , Stranieri, Andrew , Osman, Deanna
- Date: 2008
- Type: Text , Conference paper
- Relation: Paper presented at NILE 2008: 5th International Conference on Narrative and Interactive Learning Environments, Edinburgh, Scotland : 6th-8th August 2008 p. 78-81
- Full Text:
- Description: This paper describes a web-based environment for constructing narrative from story snippets contributed by a community of interest. The underlying model uses an argument based structure to infer the next event in the narrative sequence. The approach makes use of both events and higher level story elements derived from Polti’s dramatic situations. Dramatic situations used are consistent with a theme, and events are generally constrained by the dramatic situation. The narrative generated is a function of the event history, the dramatic situations chosen and the plausible inferences about next events that are contributed by a community of interest in the theme. At this stage, a player’s actions are simulated using a random selection from a set and the implementation of a nonsense filter. Example outputs from the system are provided and discussed.
- Description: 2003006499
AWSum - applying data mining in a health care scenario
- Quinn, Anthony, Jelinek, Herbert, Stranieri, Andrew, Yearwood, John
- Authors: Quinn, Anthony , Jelinek, Herbert , Stranieri, Andrew , Yearwood, John
- Date: 2008
- Type: Text , Conference paper
- Relation: Paper presented at International Conference on Intelligent Sensors, Sensor Networks and Information Processing, ISSNIP 2008, Sydney, New South Wales : 15th-18th December 2008 p. 291-296
- Full Text:
- Description: This paper investigates the application of a new data mining algorithm called Automated Weighted Sum (AWSum) to diabetes screening data to explore its use in providing researchers with new insight into the disease and secondarily to explore the potential the algorithm has for the generation of prognostic models for clinical use. There are many data mining classifiers that produce high levels of predictive accuracy but their application to health research and clinical applications is limited because they are complex, produce results that are difficult to interpret and are difficult to integrate with current knowledge and practices. This is because most focus on accuracy at the expense of informing the user as to the influences that lead to their classification results. By providing this information on influences a researcher can be pointed to new potentially interesting avenues for investigation. AWSum measures influence by calculating a weight for each feature value that represents its influence on a class value relative to other class values. The results produced, although on limited data, indicated the approach has potential uses for research and has some characteristics that may be useful in the future development of prognostic models.
- Description: 2003006660
Dramatic level analysis for interactive narrative
- Macfadyen, Alyx, Stranieri, Andrew, Yearwood, John
- Authors: Macfadyen, Alyx , Stranieri, Andrew , Yearwood, John
- Date: 2008
- Type: Text , Conference paper
- Relation: Paper presented at NILE 2008: 5th International Conference on Narrative and Interactive Learning Environments, Edinburgh, Scotland : 6th-8th August 2008 p. 17-22
- Full Text:
- Description: In interactive 3D narratives, a user’s narrative emerges through interactions with the system and embodied agencies (characters) mediated through the 3D environment. We present a methodology that identifies and measures four factors in interactive narrative where agency is present. We describe a technique for measuring drama, agency and engagement and compare the centrality of a designed interactive narrative with the emergent participatory narrative. This methodology has application as an analytic device for any interactive narrative where agency is fundamental. The adoption of the FrameNet semantic resource and the interpretation of interaction in narrative situate this work in the domain of 3D interactive narratives, mixed and augmented realities and polymorphic narratives that cross forms of media.
- Description: 2003006540
New traceability codes and identification algorithm for tracing pirates
- Wu, Xinwen, Watters, Paul, Yearwood, John
- Authors: Wu, Xinwen , Watters, Paul , Yearwood, John
- Date: 2008
- Type: Text , Conference paper
- Relation: Paper presented at 2008 International Symposium on Parallel and Distributed Processing with Applications, ISPA 2008, Sydney, New South Wales : 10th-12th December 2008 p. 719-724
- Full Text:
- Description: With the increasing popularity of digital products, there is a strong desire to protect the rights of owners against illegal redistribution. Traditional encryption schemes alone do not provide a comprehensive solution to digital rights management, since they do not prevent users who are authorized to use a digital product for their own use from transferring the cleartext content to unauthorized users. However, traceability schemes can be used to trace the illegitimate redistributors effectively. Two types of traceability schemes have been proposed in the literature - traceability codes (TA codes), and codes with the identifiable parent properties (IPP codes). TA codes are special IPP codes, and many TA codes implement an efficient identification algorithm which can determine at least one redistributor. However, many IPP codes are not TA codes, in which case, no efficient identification algorithms are available. In this paper, we generalize the definition of TA codes to derive a new family of traceability codes that is much larger than the family of traditional TA codes. By using existing decoding algorithms with respect to the Lee distance, an efficient identification algorithm is proposed for generalized TA codes. Furthermore, we show that the identification algorithm of generalized TA codes can find more redistributors than those of traditional TA codes.
- Description: 2003006288
Toward computer mediated elicitation of a community's core values for sustainable decision making
- Stranieri, Andrew, Yearwood, John, Afshar, Faye
- Authors: Stranieri, Andrew , Yearwood, John , Afshar, Faye
- Date: 2008
- Type: Text , Conference paper
- Relation: Paper presented at 11th Annual Australian Conference on Knowledge Management and Intelligent Decision Support ACKMIDS 2008 p. 1-14
- Full Text:
- Reviewed:
Weblogs for market research : Improving opinion detection using system fusion
- Osman, Deanna, Yearwood, John, Vamplew, Peter
- Authors: Osman, Deanna , Yearwood, John , Vamplew, Peter
- Date: 2008
- Type: Text , Conference paper
- Relation: Paper presented at International Conference on Service Systems and Service Management, 2008, Melbourne, Victoria : 30th June - 2nd July 2008 p. 1-6
- Full Text:
- Description: Searching for opinions on a specific product or service within blogs is a new frontier for market researchers. This research investigates the use of system fusion methods to improve mean average precision (MAP) results achieved by the Text REtrieval Conference (TREC) Blog06 participants and reports the improved MAP results. It is hypothesized that diversity of the inputs is vital to maximising the MAP improvements. This is shown in the improvement in MAP values achieved by some of the participants' ranked lists. The growth in the number of blog authors who write valuable opinions about their life experiences has led to an unsolicited resource of opinions on products, politics and services. In 2006, TREC collected blogs and set a task of detecting opinions on given topics to their participants, reporting the results using MAP.
- Description: 2003007757
A fully automated CAD system using multi-category feature selection with restricted recombination
- Ghosh, Ranadhir, Ghosh, Moumita, Yearwood, John, Mukherjee, Subhasis
- Authors: Ghosh, Ranadhir , Ghosh, Moumita , Yearwood, John , Mukherjee, Subhasis
- Date: 2007
- Type: Text , Conference paper
- Relation: Paper presented at 6th IEEE/ACIS International Conference on Computer and Information Science, ICIS 2007, Melbourne, Victoria : 11th-13th July 2007 p. 106-111
- Full Text:
- Description: In pattern recognition problems, features play an important role in classification results. Which features are used, and how many, matters greatly for the classification process. Most real-life classification problems use different categories of features. It is desirable to find the optimal combination of features that improves the performance of the classifier. Different selection frameworks exist for selecting features. Most do not incorporate the impact of one category of features on another. Even if they do, they produce conflict between the categories. In this paper we propose a restricted crossover selection framework which incorporates the impact of different categories on each other, and restricts the search within the category while searching in the global region of the search space. The results obtained by the proposed framework are promising.
- Description: 2003005429