Analytics service oriented architecture for enterprise information systems
- Authors: Sun, Zhaohao , Strang, Kenneth , Yearwood, John
- Date: 2014
- Type: Text , Conference paper
- Relation: 16th International Conference on Information Integration and Web-based Applications & Services
- Full Text: false
- Reviewed:
- Description: Big data analytics and business analytics are disruptive technologies and innovative solutions for enterprise development. However, what is the relationship between big data analytics and business analytics? What is the relationship between business analytics and enterprise information systems (EIS)? How can business analytics enhance the development of EIS? These remain major issues for EIS development. This paper addresses these three issues by proposing an ontology of business analytics, presenting an analytics service-oriented architecture (ASOA), and applying ASOA to EIS; our analysis of survey data showed that the proposed ASOA can enhance the development of EIS. This paper also discusses the interrelationship between data analysis and business analytics, and between data analytics and big data analytics. The proposed approaches will facilitate research and development of EIS, business analytics, big data analytics, and business intelligence.
A technique for ranking friendship closeness in social networking services
- Authors: Sun, Zhaohao , Yearwood, John , Firmin, Sally
- Date: 2013
- Type: Text , Conference paper
- Relation: 24th Australasian Conference on Information Systems (ACIS) p. 1-9
- Full Text:
- Reviewed:
- Description: The concepts of friend and friendship are critical to both theoretical and empirical studies of social relations, social media and social networks. Measuring the closeness among friends is a major issue for developing online social networking services (SNS) such as Facebook. This paper addresses this issue by proposing a technique for ranking friendship closeness in SNS. The technique consists of an algorithm for ranking need-driven friendship closeness and an algorithm for ranking behaviour-based friendship closeness on online social networking sites. The former is based on Maslow's hierarchy of needs, while the latter is based on the behaviours of users on Facebook and on TOPSIS. Examples illustrate the viability of the proposed algorithms. The research in this paper shows that ranking friendship closeness will facilitate understanding of the needs and behaviours of friends and of friendships in SNS. The proposed approach will facilitate research and development of social media, social commerce, social networks, and SNS.
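The behaviour-based algorithm above relies on TOPSIS. A minimal Python sketch of standard TOPSIS follows, assuming each friend is described by numeric interaction counts; the friend names, criteria and weights are hypothetical and are not taken from the paper. Columns are vector-normalised and weighted, and each friend is scored by relative closeness to the ideal and anti-ideal profiles.

```python
import math


def topsis(matrix, weights, benefit):
    """Return TOPSIS closeness scores in [0, 1] for each row of `matrix`.

    matrix  -- one row per alternative (here: one per friend)
    weights -- one weight per criterion (column)
    benefit -- True if a larger value is better for that criterion
    """
    n_crit = len(weights)
    # Vector-normalise each column (guarding against an all-zero column) and weight it.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0 for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    # Ideal profile takes the best value per column, anti-ideal the worst.
    cols = list(zip(*v))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in v:
        d_plus, d_minus = math.dist(row, ideal), math.dist(row, anti)
        total = d_plus + d_minus
        # Relative closeness to the ideal; 0 if ideal and anti-ideal coincide.
        scores.append(d_minus / total if total else 0.0)
    return scores


# Hypothetical monthly likes, comments and messages exchanged with each friend.
friends = {"alice": [12, 5, 30], "bob": [3, 1, 2], "carol": [8, 9, 15]}
scores = topsis(list(friends.values()), [0.3, 0.3, 0.4], [True, True, True])
ranking = [name for name, _ in sorted(zip(friends, scores), key=lambda p: p[1], reverse=True)]
print(ranking)  # ['alice', 'carol', 'bob']
```

A friend who is worst on every criterion coincides with the anti-ideal profile and therefore scores 0.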
A novel approach to optimal pump scheduling in water distribution systems
- Authors: Bagirov, Adil , Barton, Andrew , Mala-Jetmarova, Helena , Al Nuaimat, Alia , Ahmed, S. T. , Sultanova, Nargiz , Yearwood, John
- Date: 2012
- Type: Text , Conference paper
- Relation: 14th Water Distribution Systems Analysis Conference 2012, WDSA 2012 Vol. 1; Adelaide, Australia; 24th-27th September; p. 618-631
- Relation: http://purl.org/au-research/grants/arc/LP0990908
- Full Text: false
- Reviewed:
- Description: The operation of a water distribution system is a complex task that involves scheduling pumps, regulating the water levels of storages, and providing water of satisfactory quality to customers at the required flow and pressure. Pump scheduling is one of the most important operational tasks because pumping accounts for the major part of the system's operating costs. In this paper, a novel approach to modeling pump scheduling to minimize pump energy consumption is introduced, which uses pumps' start/end run times as continuous variables. This differs from other approaches, which typically use a binary integer variable for each hour, a formulation considered very impractical from an operational perspective. The problem is formulated as a nonlinear programming problem, and a new algorithm is developed for its solution. This algorithm combines grid search with the Hooke-Jeeves pattern search method. The performance of the algorithm is evaluated on test problems from the literature using the hydraulic simulation model EPANet.
- Description: E1
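The Hooke-Jeeves pattern search named above can be sketched in a few lines of Python. This is the generic method applied to a hypothetical two-variable toy objective, not the authors' combined grid-search algorithm; in the paper's setting the variables would be pump start/end times and the objective would come from a hydraulic simulation, neither of which is reproduced here.

```python
def explore(f, point, f_point, step):
    """Exploratory move: try +/- step along each coordinate, keeping improvements."""
    x, fx = list(point), f_point
    for i in range(len(x)):
        for delta in (step, -step):
            trial = list(x)
            trial[i] += delta
            f_trial = f(trial)
            if f_trial < fx:
                x, fx = trial, f_trial
                break
    return x, fx


def hooke_jeeves(f, x0, step=1.0, shrink=0.5, tol=1e-6, max_iter=100_000):
    """Minimise f from x0 with Hooke-Jeeves pattern search (no derivatives needed)."""
    base = list(x0)
    f_base = f(base)
    iters = 0
    while step > tol and iters < max_iter:
        iters += 1
        x, fx = explore(f, base, f_base, step)
        if fx < f_base:
            # Pattern moves: jump further along the improving direction while it pays off.
            while fx < f_base and iters < max_iter:
                iters += 1
                pattern = [2 * xi - bi for xi, bi in zip(x, base)]
                base, f_base = x, fx
                x, fx = explore(f, pattern, f(pattern), step)
        else:
            step *= shrink  # no improvement around base: refine the mesh
    return base, f_base


# Toy stand-in objective with its minimum at (1, -2); a real pump schedule would
# instead evaluate energy cost through a hydraulic simulation.
best, value = hooke_jeeves(lambda v: (v[0] - 1) ** 2 + (v[1] + 2) ** 2, [0.0, 0.0])
print(best, value)
```

Because the method only compares objective values, it suits objectives that come from a black-box simulator, at the cost of many function evaluations.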
Improving classifications for cardiac autonomic neuropathy using multi-level ensemble classifiers and feature selection based on random forest
- Authors: Kelarev, Andrei , Stranieri, Andrew , Abawajy, Jemal , Yearwood, John , Jelinek, Herbert
- Date: 2012
- Type: Text , Conference paper
- Relation: Tenth Australasian Data Mining Conference Vol. 134, p. 93-101
- Full Text: false
- Reviewed:
- Description: This paper is devoted to an empirical investigation of novel multi-level ensemble meta classifiers for detecting and monitoring the progression of cardiac autonomic neuropathy (CAN) in diabetes patients. Our experiments relied on an extensive database and concentrated on ensembles of ensembles, or multi-level meta classifiers, for classifying CAN progression. First, we carried out a thorough investigation comparing the performance of various base classifiers for several known sets of the most essential features in this database and determined that Random Forest significantly and consistently outperforms all other base classifiers in this new application. Second, we used the feature selection and ranking implemented in Random Forest. It identified a new set of features that turned out to be better than all other sets previously considered for this large, well-known database. Random Forest remained the best classifier for the new set of features too. Third, we investigated meta classifiers and new multi-level meta classifiers based on Random Forest, which improved its performance. The results show that the novel multi-level meta classifiers achieved further improvement, with outcomes significantly better than those previously published in the literature for cardiac autonomic neuropathy.
A reasoning framework for decision making in water allocation: a tree for water
- Authors: Graymore, Michelle , Mays, Heather , Stranieri, Andrew , Lehmann, La Vergne , McRae-Williams, Pamela , Thoms, Gavin , Yearwood, John
- Date: 2011
- Type: Text , Conference paper
- Relation: Paper presented at International Conference on Integrated Water Management 2011
- Full Text: false
- Reviewed:
An application of novel clustering technique for information security
- Authors: Beliakov, Gleb , Yearwood, John , Kelarev, Andrei
- Date: 2011
- Type: Text , Conference paper
- Relation: Applications and Techniques in Information Security Workshop p. 5-11
- Full Text: false
- Reviewed:
- Description: This article presents experimental results on a new application of a novel clustering technique recently introduced by the authors. Our aim is to facilitate the application of robust and stable consensus functions in information security, where it is often necessary to process large data sets and monitor outcomes in real time, as is required, for example, in intrusion detection. Here we concentrate on the particular case of profiling phishing websites. First, we apply several independent clustering algorithms to a randomized sample of data to obtain independent initial clusterings. The silhouette index is used to determine the number of clusters. Second, we use a consensus function to combine these independent clusterings into one consensus clustering. Feature ranking is used to select a subset of features for the consensus function. Third, we train fast supervised classification algorithms on the resulting consensus clustering to enable them to process the whole large data set as well as new data. The precision and recall of the classifiers at the final stage of this scheme are critical to the effectiveness of the whole procedure. We investigated various combinations of three consensus functions, Cluster-Based Graph Formulation (CBGF), Hybrid Bipartite Graph Formulation (HBGF), and Instance-Based Graph Formulation (IBGF), with a variety of supervised classification algorithms. The best precision and recall were obtained by combining the HBGF consensus function with the SMO classifier using a polynomial kernel.
- Description: 2003009195
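The silhouette index mentioned above scores a clustering by comparing, for each point, its mean distance a to the rest of its own cluster with its mean distance b to the nearest other cluster, s = (b - a) / max(a, b). A minimal from-scratch Python sketch follows, assuming Euclidean distance; the toy points and the two candidate labelings are hypothetical, and the number of clusters is taken from the labeling with the highest mean score.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette s(i) = (b - a) / max(a, b); needs at least two clusters."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not own:
            continue  # singleton cluster: s(i) = 0 by convention
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, l in zip(points, labels) if l == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        denom = max(a, b)
        total += (b - a) / denom if denom else 0.0
    return total / len(points)


# Hypothetical candidate clusterings of the same toy points, keyed by number of clusters.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
candidates = {2: [0, 0, 0, 1, 1, 1], 3: [0, 1, 2, 0, 1, 2]}
best_k = max(candidates, key=lambda k: mean_silhouette(points, candidates[k]))
print(best_k)  # 2
```

Scores near 1 indicate compact, well-separated clusters; negative scores indicate points that sit closer to another cluster than to their own.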
Detection of child exploiting chats from a mixed chat dataset as a text classification task
- Authors: Yearwood, John , Miah, Md Waliur Rahman , Kulkarni, Siddhivinayak
- Date: 2011
- Type: Text , Conference paper
- Relation: Proceedings of Australasian Language Technology Association Workshop
- Full Text: false
- Reviewed:
- Description: There is a rapidly growing body of work on the use of Embodied Conversational Agents (ECA) to convey complex contextual relationships through verbal and non-verbal communication, in domains ranging from military C2 to e-learning. In these applications the subject matter expert is often naive about the technical requirements of ECAs. ENGAGE (the Extensible Natural Gesture Animation Generation Engine) is designed to automatically generate appropriate and 'realistic' animation for ECAs based on the content provided to them. It employs syntactic analysis of the surface text and uses predefined behaviours for the ECA. We discuss the design of this system, its current applications and plans for its future development.
A new supervised term ranking method for text categorization
- Authors: Mammadov, Musa , Yearwood, John , Zhao, Lei
- Date: 2010
- Type: Text , Conference paper
- Relation: Paper presented at 23rd Australasian Joint Conference on Artificial Intelligence, AI 2010 Vol. 6464 LNAI, p. 102-111
- Full Text:
- Reviewed:
- Description: In text categorization, different supervised term weighting methods, for example Information Gain, the χ² statistic, and Odds Ratio, have been applied to improve classification performance by weighting terms with respect to different categories. The literature offers three term ranking methods for summarizing the term weights of different categories in multi-class text categorization: Summation, Average, and Maximum. In this paper we present a new term ranking method for summarizing term weights, namely Maximum Gap. Using two term weighting methods, information gain and the χ² statistic, we set up controlled experiments comparing the term ranking methods. The Reuters-21578 text corpus is used as the dataset. Two popular classification algorithms, SVM and BoosTexter, are adopted to evaluate the performance of the different term ranking methods. Experimental results show that the new term ranking method performs better. © 2010 Springer-Verlag.
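For reference, here is a minimal Python sketch of χ² term weighting and the three existing summarization methods. The contingency-table formula is the standard one. The `max_gap` entry is only a hypothetical reading (the gap between the best and second-best category scores), since the abstract does not define Maximum Gap. The average here is unweighted rather than weighted by category priors, and the counts are made up.

```python
def chi_square(a, b, c, d):
    """Chi-square score of one term for one category from a 2x2 contingency table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category without the term
    d: documents outside the category without the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def summarise(per_category):
    """Collapse the per-category weights of one term into single ranking scores."""
    ranked = sorted(per_category, reverse=True)
    return {
        "summation": sum(per_category),
        "average": sum(per_category) / len(per_category),  # unweighted mean (assumption)
        "maximum": ranked[0],
        # Hypothetical reading of Maximum Gap, not the paper's definition:
        # the margin between the best and second-best category weights.
        "max_gap": ranked[0] - ranked[1] if len(ranked) > 1 else ranked[0],
    }


# Hypothetical counts (a, b, c, d) for one term across three categories.
tables = [(40, 10, 60, 890), (5, 45, 95, 855), (5, 45, 145, 805)]
weights = [chi_square(*t) for t in tables]
print(summarise(weights))
```

Terms would then be sorted by the chosen summary score and the top-ranked ones kept as features.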
Consensus clustering and supervised classification for profiling phishing emails in internet commerce security
- Authors: Dazeley, Richard , Yearwood, John , Kang, Byeongho , Kelarev, Andrei
- Date: 2010
- Type: Text , Conference paper
- Relation: Paper presented at 11th International Workshop on Knowledge Management and Acquisition for Smart Systems and Services, PKAW 2010 Vol. 6232 LNAI, p. 235-246
- Full Text:
- Reviewed:
- Description: This article investigates internet commerce security applications of a novel combined method, which uses unsupervised consensus clustering algorithms in combination with supervised classification methods. First, a variety of independent clustering algorithms are applied to a randomized sample of data. Second, several consensus functions and sophisticated algorithms are used to combine these independent clusterings into one final consensus clustering. Third, the consensus clustering of the randomized sample is used as a training set to train several fast supervised classification algorithms. Finally, these fast classification algorithms are used to classify the whole large data set. One advantage of this approach is its ability to facilitate the inclusion of contributions from domain experts in order to adjust the training set created by consensus clustering. We apply this approach to profiling phishing emails selected from a very large data set supplied by the industry partners of the Centre for Informatics and Applied Optimization. Our experiments compare the performance of several classification algorithms incorporated in this scheme. © 2010 Springer-Verlag Berlin Heidelberg.
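The consensus step described above can be illustrated in Python with a co-association (evidence accumulation) consensus. This is a simpler substitute for illustration, not one of the consensus functions the authors used. Items that share a cluster in more than a threshold fraction of the input clusterings are linked, and connected components become the consensus labels, which could then serve as training labels for a fast classifier. The three input clusterings are hypothetical.

```python
def co_association(clusterings):
    """Fraction of input clusterings in which each pair of items shares a cluster."""
    n, m = len(clusterings[0]), len(clusterings)
    matrix = [[0.0] * n for _ in range(n)]
    for labels in clusterings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += 1 / m
    return matrix


def consensus(clusterings, threshold=0.5):
    """Link pairs co-clustered more often than `threshold`; components become labels."""
    matrix = co_association(clusterings)
    n = len(matrix)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:  # depth-first search over the thresholded co-association graph
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


# Three hypothetical clusterings of six emails; label values are arbitrary per clustering.
runs = [[0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0], [0, 0, 1, 1, 2, 2]]
print(consensus(runs))  # [0, 0, 0, 1, 1, 1]
```

Comparing pairs rather than raw labels makes the consensus independent of how each input clustering happens to number its clusters.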
Exploring novel features and decision rules to identify cardiovascular autonomic neuropathy using a hybrid of wrapper-filter based feature selection
- Authors: Huda, Shamsul , Jelinek, Herbert , Ray, Biplob , Stranieri, Andrew , Yearwood, John
- Date: 2010
- Type: Text , Conference paper
- Relation: Paper presented at the 2010 6th International Conference on Intelligent Sensors, Sensor Networks and Information Processing, ISSNIP 2010 p. 297-302
- Full Text:
- Reviewed:
- Description: Cardiovascular autonomic neuropathy (CAN) is one of the important causes of mortality among diabetes patients. Statistics show that more than 22% of people with type 2 diabetes mellitus suffer from CAN, which in turn leads to cardiovascular disease (heart attack, stroke). Early detection of CAN could therefore reduce mortality. The traditional method for detecting CAN uses Ewing's algorithm, in which five noninvasive cardiovascular tests are used. Clinicians often find it difficult to collect Ewing battery data from patients because of the onerous test conditions. In this paper, we propose a hybrid wrapper-filter approach to find novel features in patients' ECG records and then generate decision rules for the new features for easier detection of CAN. The proposed feature selection uses a hybrid (MR-ANNIGMA) of a filter approach (Maximum Relevance, MR) and a wrapper approach (Artificial Neural Net Input Gain Measurement Approximation, ANNIGMA). The combined heuristics in the hybrid MR-ANNIGMA take advantage of the complementary properties of both the filter and wrapper heuristics and can find significant features. The selected feature set is used to generate a new set of rules for detecting CAN. Experiments on real patient records show that the proposed method finds a smaller set of clinically significant features for detecting CAN than the traditional method, which could lead to an easier way to diagnose CAN. © 2010 IEEE.
From convex to nonconvex: A loss function analysis for binary classification
- Authors: Zhao, Lei , Mammadov, Musa , Yearwood, John
- Date: 2010
- Type: Text , Conference paper
- Relation: Paper presented at 10th IEEE International Conference on Data Mining Workshops, ICDMW 2010 p. 1281-1288
- Full Text:
- Reviewed:
- Description: Problems of data classification can be studied as ill-posed problems in the framework of regularization theory, where loss functions play an important role. In this paper, we review some important convex loss functions, including hinge loss, square loss, modified square loss, exponential loss and logistic regression loss, as well as some non-convex loss functions, such as sigmoid loss, ψ-loss, ramp loss, normalized sigmoid loss, and the loss function of a 2-layer neural network. Based on the analysis of these loss functions, we propose a new differentiable non-convex loss function, called the smoothed 0-1 loss function, which is a natural approximation of the 0-1 loss function. To compare the performance of different loss functions, we propose two binary classification algorithms, one for convex loss functions and the other for non-convex loss functions. Experiments are conducted on several binary data sets from the UCI repository. The results show that the proposed smoothed 0-1 loss function is robust, especially on noisy data sets with many outliers. © 2010 IEEE.
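The contrast between convex and non-convex losses can be seen by writing each loss as a function of the margin z = y·f(x). A minimal Python sketch of several losses named above follows, using their common textbook forms. The paper's smoothed 0-1 loss is not defined in the abstract and is therefore not included. The convex losses grow without bound as z becomes very negative (an outlier on the wrong side of the boundary), while the bounded non-convex losses cap the penalty at 1.

```python
import math

# Each loss is written as a function of the margin z = y * f(x), with labels y in {-1, +1}.


def zero_one(z):
    return 1.0 if z <= 0 else 0.0


def hinge(z):
    return max(0.0, 1.0 - z)


def square(z):
    return (1.0 - z) ** 2


def modified_square(z):
    return max(0.0, 1.0 - z) ** 2


def exponential(z):
    return math.exp(-z)


def logistic(z):
    # log(1 + exp(-z)) in a form that avoids overflow for large |z|
    return max(0.0, -z) + math.log1p(math.exp(-abs(z)))


def sigmoid(z):
    # 1 / (1 + exp(z)), evaluated stably for either sign of z
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def ramp(z):
    return min(1.0, max(0.0, 1.0 - z))


# A badly misclassified outlier (z = -10): convex losses explode, bounded ones cap at 1.
for loss in (zero_one, hinge, square, modified_square, exponential, logistic, sigmoid, ramp):
    print(f"{loss.__name__:>15}: {loss(-10.0):.4f}")
```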
Learning parse-free event-based features for textual entailment recognition
- Authors: Ofoghi, Bahadorreza , Yearwood, John
- Date: 2010
- Type: Text , Conference paper
- Relation: Paper presented at 23rd Australasian Joint Conference on Artificial Intelligence, AI 2010 Vol. 6464 LNAI, p. 184-193
- Full Text: false
- Reviewed:
- Description: We propose new parse-free event-based features to be used in conjunction with lexical, syntactic, and semantic features of texts and hypotheses for Machine Learning-based Recognizing Textual Entailment. Our new similarity features are extracted without using shallow semantic parsers, yet lexical and compositional semantics are still taken into account. Our experimental results demonstrate that these features can improve the effectiveness of identifying entailment and no-entailment relationships. © 2010 Springer-Verlag.
Profiling phishing emails based on hyperlink information
- Authors: Yearwood, John , Mammadov, Musa , Banerjee, Arunava
- Date: 2010
- Type: Text , Conference paper
- Relation: Paper presented at 2010 International Conference on Advances in Social Network Analysis and Mining, ASONAM 2010, Odense : 9th-11th August 2010 p. 120-127
- Full Text:
- Description: In this paper, a novel method for profiling phishing activity from an analysis of phishing emails is proposed. Profiling is useful in determining the activity of an individual or a particular group of phishers. Work in the area of phishing is usually aimed at detecting phishing emails; in this paper, we concentrate on profiling as distinct from detection. We formulate the profiling problem as a multi-label classification problem, using the hyperlinks in the phishing emails as features and structural properties of the emails along with WHOIS (i.e. DNS) information on the hyperlinks as profile classes. We then generate profiles based on the classifier predictions, so that classes become elements of profiles. We employ a boosting algorithm (AdaBoost) as well as SVM to generate multi-label class predictions on three different datasets created from hyperlink information in phishing emails. These predictions are further used to generate complete profiles of these emails. Results show that profiling can be done with quite high accuracy using hyperlink information. © 2010 Crown Copyright.
Smokers' characteristics and cluster based quitting rule discovery model for enhancement of government's tobacco control systems
- Authors: Huda, Shamsul , Yearwood, John , Borland, Ron
- Date: 2010
- Type: Text , Conference paper
- Relation: Proceedings of the 14th Pacific Asia Conference on Information Systems (PACIS 2010)
- Full Text:
- Reviewed:
- Description: Discovery of cluster characteristics and of interesting rules describing smokers' clusters and the behavioural patterns of smokers' quitting intentions is an important task in the development of effective tobacco control systems. In this paper, we attempt to determine the characteristics of smokers' clusters and simplified rules for predicting smokers' quitting behaviour that can provide feedback for building scientific, evidence-based, adaptive tobacco control systems. Standard clustering algorithms group the data based on their inherent patterns. "From abstract"
Understanding victims of identity theft: A grounded theory approach
- Authors: Turville, Kylie , Firmin, Sally , Yearwood, John , Miller, Charlynn
- Date: 2010
- Type: Text , Conference paper
- Relation: 5th International Conference on Qualitative Research in IT and IT in Qualitative Research, QualIT 2010
- Full Text:
- Reviewed:
- Description: Being a victim of identity theft can be a devastating and life-changing event. Once victims discover the misuse, they need to begin the process of recovery. For the "lucky" victims this may take only a couple of phone calls and a small amount of time; however, some victims may experience difficulties for many years. To recover, victims of crime require support and assistance; however, within Australia this support is sadly lacking. To identify the issues currently faced by victims of identity theft as they work through the recovery process, a Grounded Theory methodology was identified as the most appropriate. This paper provides a brief overview of the history of the research project; a brief introduction to grounded theory, with a focus on preconceived ideas and their implications; and a description of the research project currently being undertaken. A discussion of some issues experienced when using grounded theory within an IT department with very little experience of qualitative research is also provided, along with some preliminary results.
- Description: E1
A classification algorithm that derives weighted sum scores for insight into disease
- Authors: Quinn, Anthony , Stranieri, Andrew , Yearwood, John , Hafen, Gaudenz
- Date: 2009
- Type: Text , Conference paper
- Relation: Paper presented at Third Australasian Workshop on Health Informatics and Knowledge Management (HIKM 2009), Wellington, New Zealand : Vol. 97, p. 13-17
- Full Text:
- Description: Data mining is often performed on datasets associated with diseases in order to gain insights that can ultimately lead to improved prevention or treatment. Classification algorithms can achieve high levels of predictive accuracy but have limited application for facilitating the insight that leads to a deeper understanding of aspects of the disease. This is because the representation of knowledge that arises from classification algorithms is too opaque, too complex or too sparse to facilitate insight. Clustering, association and visualisation approaches give clinicians greater scope to engage in a way that leads to insight; however, predictive accuracy is compromised or non-existent. This research investigates practical applications of Automated Weighted Sum (AWSum), a classification algorithm that provides accuracy comparable to other techniques whilst providing some insight into the data. This is achieved by calculating a weight for each feature value that represents its influence on the class value. Clinicians are very familiar with weighted sum scoring scales, so the internal representation is intuitive and easily understood. This paper presents results from the use of the AWSum approach with data from patients suffering from Cystic Fibrosis.
Applying clustering and ensemble clustering approaches to phishing profiling
- Authors: Webb, Dean , Yearwood, John , Vamplew, Peter , Ma, Liping , Ofoghi, Bahadorreza , Kelarev, Andrei
- Date: 2009
- Type: Text , Conference paper
- Relation: Paper presented at Eighth Australasian Data Mining Conference, AusDM 2009, University of Melbourne, Melbourne, Victoria : 1st–4th December 2009
- Full Text:
- Description: 2003007911
Can shallow semantic class information help answer passage retrieval?
- Authors: Ofoghi, Bahadorreza , Yearwood, John
- Date: 2009
- Type: Text , Conference paper
- Relation: Paper presented at 22nd Australasian Joint Conference, AI 2009: Advances in Artificial Intelligence, Melbourne, Victoria : 1st-4th December 2009 p. 587–596
- Full Text: false
- Description: In this paper, the effect of using semantic class overlap evidence in enhancing the passage retrieval effectiveness of question answering (QA) systems is tested. The semantic class overlap between questions and passages is measured by evoking FrameNet semantic frames using a shallow term-lookup procedure. We use the semantic class overlap evidence in two ways: i) fusing passage scores obtained from a baseline retrieval system with those obtained from the analysis of semantic class overlap (fusion-based approach), and ii) revising the passage scoring function of the baseline system by incorporating semantic class overlap evidence (revision-based approach). Our experiments with the TREC 2004 and 2006 datasets show that the revision-based approach significantly improves the passage retrieval effectiveness of the baseline system.
- Description: 2003007254
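The fusion-based approach described above can be sketched as a linear interpolation of normalised scores. In this Python sketch the overlap measure (the share of the question's frames also evoked by the passage), the min-max normalisation and the weight `lam` are all assumptions for illustration. The frame sets and baseline scores are hypothetical rather than output from FrameNet or a real retrieval system.

```python
def overlap(question_frames, passage_frames):
    """Share of the question's semantic classes (frames) also evoked by the passage."""
    if not question_frames:
        return 0.0
    return len(question_frames & passage_frames) / len(question_frames)


def min_max(scores):
    """Rescale scores to [0, 1]; all-equal scores map to 0."""
    lo, hi = min(scores), max(scores)
    return [0.0 if hi == lo else (s - lo) / (hi - lo) for s in scores]


def fuse(baseline, overlaps, lam=0.7):
    """Linear fusion: lam * baseline + (1 - lam) * overlap, after min-max normalisation."""
    return [lam * b + (1 - lam) * o for b, o in zip(min_max(baseline), min_max(overlaps))]


# Hypothetical frames evoked by a question and by three retrieved passages.
question = {"Arrest", "Being_located"}
passages = [{"Arrest"}, {"Arrest", "Being_located"}, set()]
baseline = [12.0, 11.5, 3.0]  # hypothetical scores from a baseline retrieval system
fused = fuse(baseline, [overlap(question, p) for p in passages])
print(sorted(range(3), key=lambda i: fused[i], reverse=True))  # [1, 0, 2]
```

Here the passage that evokes both question frames overtakes the baseline's top passage. The revision-based alternative would instead place the overlap term inside the baseline's own scoring function.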
Establishing phishing provenance using orthographic features
- Authors: Ma, Liping , Yearwood, John , Watters, Paul
- Date: 2009
- Type: Text , Conference paper
- Relation: Paper presented at 2009 eCrime Researchers Summit, eCRIME '09, Tacoma, Washington : 20th-21st October 2009
- Full Text:
- Description: After phishing message detection, determining the provenance of phishing messages and websites is the second step in tracing cybercriminals. In this paper, we present a novel method to cluster phishing emails automatically using orthographic features. In particular, we develop an algorithm that clusters documents and removes redundant features at the same time. After collecting all possible features based on observation, we apply the modified global k-means method repeatedly and generate the objective function values over a range of tolerance values across different subsets of features. Finally, we identify the appropriate clusters by studying the distribution of the objective function values. Experimental evaluation over a large number of computations demonstrates that our clustering and feature selection techniques are highly effective and achieve reliable results.
- Description: 2003007842
Experimental investigation of three machine learning algorithms for ITS dataset
- Authors: Yearwood, John , Kang, Byeongho , Kelarev, Andrei
- Date: 2009
- Type: Text , Conference paper
- Relation: Paper presented at First International Conference, FGIT 2009, Future Generation Information Technology, Jeju Island, Korea : 10th-12th December 2009 Vol. 5899, p. 308-316
- Full Text:
- Description: This article is devoted to an experimental investigation of the performance of three machine learning algorithms on the ITS dataset, in terms of their ability to agree with classes previously published in the biological literature. The ITS dataset consists of nuclear ribosomal DNA sequences, for which rather sophisticated alignment scores have to be used as a measure of distance. These scores do not form a Minkowski metric, and the sequences cannot be regarded as points in a finite-dimensional space. It is therefore necessary to develop novel machine learning approaches to analyse datasets of this sort. This paper introduces a k-committees classifier and compares it with the discrete k-means and Nearest Neighbour classifiers. It turns out that all three machine learning algorithms are efficient and can be used to automate future biologically significant classifications for datasets of this kind. A simplified version of a synthetic dataset, on which the k-committees classifier outperforms the k-means and Nearest Neighbour classifiers, is also presented.
- Description: 2003007844
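Nearest Neighbour classification needs only pairwise dissimilarities, not coordinates in a vector space, which is why it applies to sequence data like this. A minimal Python sketch follows. It uses Levenshtein edit distance as a hypothetical stand-in for the alignment scores used in the paper (unlike those scores, edit distance is a metric). The k-committees and discrete k-means classifiers are not reproduced, and the sequences and labels are made up.

```python
def edit_distance(s, t):
    """Levenshtein distance via dynamic programming, keeping one row at a time."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (cs != ct)))  # substitution (free if equal)
        prev = cur
    return prev[-1]


def nearest_neighbour(query, training, dist=edit_distance):
    """Return the label of the training sequence closest to `query` under `dist`."""
    _, label = min(training, key=lambda item: dist(query, item[0]))
    return label


# Hypothetical labelled sequences (not real ITS data).
training = [("ACGTTAGC", "clade A"), ("ACGTTAGG", "clade A"), ("TTGACCGA", "clade B")]
print(nearest_neighbour("ACGATAGC", training))  # clade A
```

Any pairwise scoring function can be passed as `dist`, so an alignment-based dissimilarity could replace edit distance without changing the classifier.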