Performance evaluation of multi-tier ensemble classifiers for phishing websites
- Authors: Abawajy, Jemal , Beliakov, Gleb , Kelarev, Andrei , Yearwood, John
- Date: 2012
- Type: Text , Conference proceedings
- Full Text:
- Description: This article is devoted to large multi-tier ensemble classifiers generated as ensembles of ensembles and applied to phishing websites. Our new ensemble construction is a special case of the general and productive multi-tier approach well known in information security. Many efficient multi-tier classifiers have been considered in the literature. Our new contribution is in generating new large systems as ensembles of ensembles by linking a top-tier ensemble to another middle-tier ensemble instead of a base classifier, so that the top-tier ensemble can generate the whole system. This automatic generation covers many large ensemble classifiers in two tiers simultaneously and automatically combines them into one unified hierarchical system, so that one ensemble is an integral part of another. This new construction makes it easy to set up and run such large systems. The present article investigates the performance of these new multi-tier ensembles, using the detection of phishing websites as an example. We carried out systematic experiments evaluating several essential ensemble techniques as well as more recent approaches, and studied their performance as parts of multi-level ensembles with three tiers. The results presented here demonstrate that the new three-tier ensemble classifiers performed better than the base classifiers and standard ensembles included in the system. This application to the classification of phishing websites shows that the new method of combining diverse ensemble techniques into a unified hierarchical three-tier ensemble can be applied to increase the performance of classifiers in situations where data can be processed on a large computer.
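The ensemble-of-ensembles idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' system: both the top and middle tiers are plain bagging with majority voting, the base classifier is a decision stump, labels are 0/1, and the names `Stump`, `Bagging` and `three_tier` are hypothetical. What it shows is that a top-tier ensemble only needs a factory for its members, so handing it a factory that builds a middle-tier ensemble instead of a base classifier is enough to generate the whole three-tier system.

```python
import itertools
import random
from collections import Counter
from typing import Callable, Sequence


class Stump:
    """Base classifier (bottom tier): a one-feature threshold rule for 0/1 labels."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "Stump":
        best_err = len(y) + 1
        for j in range(len(X[0])):
            for t in sorted({row[j] for row in X}):
                for pol in (0, 1):
                    # Predict `pol` above the threshold and the other label at or below it.
                    err = sum((pol if row[j] > t else 1 - pol) != label
                              for row, label in zip(X, y))
                    if err < best_err:
                        best_err, self.j, self.t, self.pol = err, j, t, pol
        return self

    def predict_one(self, x: Sequence[float]) -> int:
        return self.pol if x[self.j] > self.t else 1 - self.pol


class Bagging:
    """Bagging with majority vote; its members may themselves be ensembles."""

    def __init__(self, make_member: Callable[[], object], n_members: int, seed: int):
        self.make_member = make_member
        self.n_members = n_members
        self.rng = random.Random(seed)

    def fit(self, X, y) -> "Bagging":
        n = len(X)
        self.members = []
        for _ in range(self.n_members):
            idx = [self.rng.randrange(n) for _ in range(n)]  # bootstrap sample
            member = self.make_member()
            member.fit([X[i] for i in idx], [y[i] for i in idx])
            self.members.append(member)
        return self

    def predict_one(self, x) -> int:
        votes = Counter(m.predict_one(x) for m in self.members)
        return votes.most_common(1)[0][0]


def three_tier(n_top: int = 7, n_middle: int = 5) -> Bagging:
    """Top-tier bagging whose members are middle-tier bagging ensembles of stumps."""
    seeds = itertools.count(1)  # a distinct seed for every middle-tier ensemble
    return Bagging(lambda: Bagging(Stump, n_middle, next(seeds)), n_top, seed=0)


# Usage on a toy data set where the label depends on the first feature only.
X = [[float(i), float((7 * i) % 10)] for i in range(40)]
y = [1 if i >= 20 else 0 for i in range(40)]
model = three_tier().fit(X, y)
```

Because `Bagging` never inspects its members beyond `fit` and `predict_one`, the same class serves as both tiers; swapping either tier for another ensemble technique would only require another class with those two methods.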
A data mining application of the incidence semirings
- Authors: Abawajy, Jemal , Kelarev, Andrei , Yearwood, John , Turville, Christopher
- Date: 2013
- Type: Text , Journal article
- Relation: Houston Journal of Mathematics Vol. 39, no. 4 (2013), p. 1083-1093
- Relation: http://purl.org/au-research/grants/arc/LP0990908
- Full Text: false
- Reviewed:
- Description: This paper is devoted to a combinatorial problem for incidence semirings, which can be viewed as sets of polynomials over graphs, where the edges are the unknowns and the coefficients are taken from a semiring. The construction of incidence rings is well known and has many useful applications. The present article is devoted to a novel application of the more general incidence semirings. Recent research on data mining has motivated the investigation of the sets of centroids that have the largest weights in semiring constructions. These sets are valuable for the design of centroid-based classification systems, or classifiers, as well as for the design of multiple classifiers combining several individual classifiers. Our article gives a complete description of all sets of centroids with the largest weight in incidence semirings.
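To make the incidence-semiring construction concrete, the sketch below stores an element as a map from edges to semiring coefficients and multiplies two elements by summing over intermediate vertices. It assumes the edge relation is reflexive and transitive (as in the classical incidence algebra of a preorder) and uses the max-plus semiring as an example; the names `Semiring`, `MAX_PLUS` and `incidence_product` are illustrative, and the article's notion of centroid weight is not modelled.

```python
import math
from typing import Callable, Dict, Hashable, NamedTuple, Set, Tuple

Edge = Tuple[Hashable, Hashable]


class Semiring(NamedTuple):
    add: Callable[[float, float], float]
    mul: Callable[[float, float], float]
    zero: float


# Max-plus ("tropical") semiring: addition is max, multiplication is +, zero is -inf.
MAX_PLUS = Semiring(add=max, mul=lambda a, b: a + b, zero=-math.inf)


def incidence_product(f: Dict[Edge, float], g: Dict[Edge, float],
                      edges: Set[Edge], sr: Semiring) -> Dict[Edge, float]:
    """(f*g)(u, w) = sum over v of f(u, v) * g(v, w), with + and * taken in `sr`.

    Assumes the edge relation is reflexive and transitive, so every product of
    composable edges is again an edge. Missing keys are the semiring zero.
    """
    vertices = {u for u, _ in edges} | {w for _, w in edges}
    result = {}
    for (u, w) in edges:
        acc = sr.zero
        for v in vertices:
            if (u, v) in edges and (v, w) in edges:
                term = sr.mul(f.get((u, v), sr.zero), g.get((v, w), sr.zero))
                acc = sr.add(acc, term)
        if acc != sr.zero:
            result[(u, w)] = acc
    return result


# Preorder 1 <= 2 <= 3 on three vertices (reflexive and transitive).
E = {(1, 1), (2, 2), (3, 3), (1, 2), (2, 3), (1, 3)}
f = {(1, 1): 0.0, (1, 2): 1.0}
g = {(2, 3): 2.0, (1, 3): 2.0}
print(incidence_product(f, g, E, MAX_PLUS))  # {(1, 3): 3.0}
```

Here the coefficient of the edge (1, 3) is max(0 + 2, 1 + 2) = 3, one term for each path of length two from 1 to 3.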
A Tool for Assisting Group Decision-Making for Consensus Outcomes in Organizations
- Authors: Afshar, Faye , Yearwood, John , Stranieri, Andrew
- Date: 2006
- Type: Text , Book chapter
- Relation: E-Supply Chain Technologies and Management p. 316-343
- Full Text: false
- Reviewed:
Performance evaluation of multivariate non-normal process using metaheuristic approaches
- Authors: Ahmad, S. , Abdollahian, Mali , Bhatti, M.I. , Huda, Shamsul , Yearwood, John
- Date: 2014
- Type: Text , Journal article
- Relation: Journal of Applied Statistical Science Vol. 20, no. 3 (2014), p. 299-315
- Full Text: false
- Reviewed:
- Description: Multivariate process performance indices generally rely on the assumption that the process follows a normal distribution, but in practice the data are often non-normal with correlated characteristics. This paper proposes two metaheuristic-based approaches to fit a Burr distribution to such data: a single-candidate approach using a Simulated Annealing (SA) technique and a population-based approach using a constraint-based Evolutionary Algorithm (EA). The fitted Burr distribution is then used to estimate the proportion of non-conforming items (PNC), which in turn is used to fit an appropriate Burr distribution to the individual geometric distance variables. The empirical performance of the proposed methods has been evaluated on a real industrial data set using the PNC criterion. Experimental results demonstrate that the new approaches perform better than existing ones.
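A minimal sketch of the single-candidate (simulated annealing) approach, assuming a univariate Burr Type XII model with CDF F(x) = 1 - (1 + x^c)^(-k) for x > 0, maximum-likelihood fitting, and a one-sided upper specification limit. The multivariate geometric-distance construction and the EA variant are not shown; the cooling schedule, step size and all function names are assumptions chosen for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple


def _log1p_pow(x: float, c: float) -> float:
    """log(1 + x**c) for x > 0, computed without overflowing x**c."""
    z = c * math.log(x)
    return z + math.log1p(math.exp(-z)) if z > 0.0 else math.log1p(math.exp(z))


def sample_burr(c: float, k: float, n: int, seed: int = 0) -> List[float]:
    """Draw Burr XII variates by inverting F(x) = 1 - (1 + x**c) ** (-k)."""
    rng = random.Random(seed)
    xs: List[float] = []
    while len(xs) < n:
        u = rng.random()
        x = ((1.0 - u) ** (-1.0 / k) - 1.0) ** (1.0 / c)
        if x > 0.0:  # guard against the measure-zero draw u == 0
            xs.append(x)
    return xs


def burr_nll(log_c: float, log_k: float, data: Sequence[float]) -> float:
    """Negative log-likelihood of Burr XII with c = exp(log_c), k = exp(log_k)."""
    c, k = math.exp(log_c), math.exp(log_k)
    total = len(data) * (log_c + log_k)  # the log c + log k term of every observation
    for x in data:
        total += (c - 1.0) * math.log(x) - (k + 1.0) * _log1p_pow(x, c)
    return -total


def fit_burr_sa(data: Sequence[float], n_iter: int = 2000, t_start: float = 1.0,
                t_end: float = 1e-3, step: float = 0.1, seed: int = 0) -> Tuple[float, float]:
    """Simulated annealing over (log c, log k); returns the best (c, k) visited."""
    rng = random.Random(seed)
    cur = (0.0, 0.0)  # start from c = k = 1
    cur_f = burr_nll(*cur, data)
    best, best_f = cur, cur_f
    for i in range(n_iter):
        temp = t_start * (t_end / t_start) ** (i / (n_iter - 1))  # geometric cooling
        # Gaussian proposal, clamped to [-5, 5] in log space.
        cand = tuple(min(5.0, max(-5.0, p + rng.gauss(0.0, step))) for p in cur)
        cand_f = burr_nll(*cand, data)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if cand_f <= cur_f or rng.random() < math.exp((cur_f - cand_f) / temp):
            cur, cur_f = cand, cand_f
            if cur_f < best_f:
                best, best_f = cur, cur_f
    return math.exp(best[0]), math.exp(best[1])


def pnc_upper(c: float, k: float, usl: float) -> float:
    """Proportion non-conforming above an upper specification limit: 1 - F(usl)."""
    return math.exp(-k * _log1p_pow(usl, c))
```

For example, `fit_burr_sa(sample_burr(2.0, 3.0, 500, seed=1))` should return shape estimates whose implied PNC at a given limit is close to that of the generating model; a population-based EA would replace only the search loop, not the likelihood or the PNC formula.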
An argumentation-based multi-agent system for e-tourism dialogue
- Authors: Avery, John , Yearwood, John , Stranieri, Andrew
- Date: 2001
- Type: Text , Conference paper
- Relation: Paper presented at Hybrid Information Systems, First International Workshop on Hybrid Intelligent Systems, Adelaide : 11th - 12th December, 2001 p. 497-512
- Full Text: false
- Reviewed:
- Description: E1
- Description: 2003000112
Managing ontology evolution : Capturing the semantics of change
- Authors: Avery, John , Yearwood, John
- Date: 2004
- Type: Text , Conference paper
- Relation: Paper presented at the Tenth Australian World Wide Web Conference, Gold Coast, Queensland : 4th July, 2004
- Full Text: false
- Reviewed:
- Description: E1
- Description: 2003000844
DOWL : A dynamic ontology language
- Authors: Avery, John , Yearwood, John
- Date: 2003
- Type: Text , Conference paper
- Relation: Paper presented at IADIS International Conference WWW/Internet 2003, Algarve, Portugal : 5th August, 2003
- Full Text:
- Reviewed:
- Description: Ontologies in a web setting, particularly those used in a group context (such as a virtual community), need to be flexible and open to changes that reflect the evolution of knowledge. OWL, the ontology language of the Semantic Web, provides very little support for describing evolutionary changes in an ontology. We propose a dynamic web ontology language (dOWL), an extension to OWL consisting of a set of elements that can be used to model these evolutionary changes in an ontology.
- Description: E1
- Description: 2003000552
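One way to picture a language of change elements is to treat each evolutionary change as a first-class record and replay a log of such records to obtain any later version of the ontology. The element names `AddClass`, `RemoveClass` and `RenameClass` below are hypothetical stand-ins, not dOWL's actual vocabulary, and the ontology is reduced to class names plus subclass axioms.

```python
from dataclasses import dataclass
from functools import reduce
from typing import FrozenSet, Iterable, Tuple, Union


@dataclass(frozen=True)
class Ontology:
    classes: FrozenSet[str]
    subclass_of: FrozenSet[Tuple[str, str]]  # (subclass, superclass) axioms


@dataclass(frozen=True)
class AddClass:
    name: str


@dataclass(frozen=True)
class RemoveClass:
    name: str


@dataclass(frozen=True)
class RenameClass:
    old: str
    new: str


Change = Union[AddClass, RemoveClass, RenameClass]


def apply_change(onto: Ontology, change: Change) -> Ontology:
    """Return the new ontology version produced by one change element."""
    if isinstance(change, AddClass):
        return Ontology(onto.classes | {change.name}, onto.subclass_of)
    if isinstance(change, RemoveClass):
        # Removing a class also drops every axiom that mentions it.
        return Ontology(onto.classes - {change.name},
                        frozenset(a for a in onto.subclass_of if change.name not in a))
    if isinstance(change, RenameClass):
        def rename(c: str) -> str:
            return change.new if c == change.old else c
        return Ontology(frozenset(rename(c) for c in onto.classes),
                        frozenset((rename(s), rename(t)) for s, t in onto.subclass_of))
    raise TypeError(f"unknown change: {change!r}")


def replay(onto: Ontology, log: Iterable[Change]) -> Ontology:
    """Rebuild a later version of the ontology from an earlier one and a change log."""
    return reduce(apply_change, log, onto)


v1 = Ontology(frozenset({"Person", "Student"}), frozenset({("Student", "Person")}))
log = [AddClass("Tourist"), RenameClass("Student", "Learner"), RemoveClass("Tourist")]
v2 = replay(v1, log)
```

Keeping the log rather than only the latest version means every intermediate state stays recoverable, which is the kind of evolutionary record a plain OWL document does not capture.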
A formal description of ontology change in OWL
- Authors: Avery, John , Yearwood, John
- Date: 2005
- Type: Text , Conference paper
- Relation: Paper presented at the Third International Conference on Information Technology and Applications, ICITA 2005, Sydney : 4th - 7th July, 2005
- Full Text:
- Reviewed:
- Description: There are three main activities involved in managing ontology change: first, identifying changes; second, describing the identified changes; and finally, describing and handling the ramifications of those changes. In previous work we presented a language (DOWL) for describing ontology change; in this paper we demonstrate how changes described in this language can be represented in the RDF abstract syntax, which enables us to describe the ramifications of a change formally. This formalism can provide the basis for an automated ontology change management system.
- Description: E1
- Description: 2003001448
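The benefit of expressing a change in the RDF abstract syntax is that the change description and the ontology are both sets of triples, so the ramifications of a change can be found with an ordinary query over triples. In the sketch below the `ex:` namespace, `ex:ClassDeletion` and `ex:target` are made-up names rather than DOWL terms, and a "ramification" is simplified to every statement that mentions the deleted resource.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def describe_class_deletion(change_id: str, target: str) -> Set[Triple]:
    """Describe a change as RDF-style triples ('ex:' is a hypothetical namespace)."""
    return {
        (change_id, "rdf:type", "ex:ClassDeletion"),
        (change_id, "ex:target", target),
    }


def ramifications(ontology: Set[Triple], change: Set[Triple]) -> Set[Triple]:
    """Statements in the ontology whose subject or object is targeted by the change."""
    targets = {o for s, p, o in change if p == "ex:target"}
    return {t for t in ontology if t[0] in targets or t[2] in targets}


onto = {
    ("ex:Student", "rdfs:subClassOf", "ex:Person"),
    ("ex:alice", "rdf:type", "ex:Student"),
    ("ex:Person", "rdf:type", "owl:Class"),
}
change = describe_class_deletion("ex:change1", "ex:Student")
affected = ramifications(onto, change)
```

Deleting `ex:Student` here affects the subclass axiom and the membership of `ex:alice`, while the statement about `ex:Person` is untouched.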
New algorithms for multi-class cancer diagnosis using tumor gene expression signatures
- Authors: Bagirov, Adil , Ferguson, Brent , Ivkovic, Sasha , Saunders, Gary , Yearwood, John
- Date: 2003
- Type: Text , Journal article
- Relation: Bioinformatics Vol. 19, no. 14 (2003), p. 1800-1807
- Full Text:
- Reviewed:
- Description: Motivation: The increasing use of DNA microarray-based tumor gene expression profiles for cancer diagnosis requires mathematical methods with high accuracy for solving clustering, feature selection and classification problems of gene expression data. Results: New algorithms are developed for solving clustering, feature selection and classification problems of gene expression data. The clustering algorithm is based on optimization techniques and allows the calculation of clusters step-by-step. This approach allows us to find as many clusters as a data set contains with respect to some tolerance. Feature selection is crucial for a gene expression database. Our feature selection algorithm is based on calculating overlaps of different genes. The database used contains over 16,000 genes, and this number is considerably reduced by feature selection. We propose a classification algorithm where each tissue sample is considered as the center of a cluster which is a ball. The results of numerical experiments confirm that the classification algorithm in combination with the feature selection algorithm performs slightly better than the published results for multi-class classifiers based on support vector machines for this data set.
- Description: C1
- Description: 2003000439
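The sketch below illustrates the two ideas in the abstract under explicit assumptions, since the abstract does not give the exact rules: a gene's overlap score is taken here as the mean intersection-over-union of its per-class value ranges (0 means the gene separates the classes), and each training sample is the centre of a ball whose radius reaches the nearest sample of another class, with a new sample assigned to the ball that contains it most deeply (smallest distance-to-radius ratio). The function names are illustrative.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Ball = Tuple[Sequence[float], int, float]  # (centre, label, radius)


def overlap_score(X: Sequence[Sequence[float]], y: Sequence[int], j: int) -> float:
    """Mean overlap of the per-class value ranges of feature j (0 = fully separating)."""
    ranges: Dict[int, Tuple[float, float]] = {}
    for row, label in zip(X, y):
        lo, hi = ranges.get(label, (row[j], row[j]))
        ranges[label] = (min(lo, row[j]), max(hi, row[j]))
    scores = []
    for (a_lo, a_hi), (b_lo, b_hi) in combinations(ranges.values(), 2):
        inter = max(0.0, min(a_hi, b_hi) - max(a_lo, b_lo))
        union = max(a_hi, b_hi) - min(a_lo, b_lo)
        scores.append(inter / union if union > 0 else 1.0)
    return sum(scores) / len(scores) if scores else 1.0


def fit_balls(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[Ball]:
    """One ball per training sample; its radius reaches the nearest other-class sample."""
    balls = []
    for xi, yi in zip(X, y):
        r = min(math.dist(xi, xj) for xj, yj in zip(X, y) if yj != yi)
        balls.append((xi, yi, max(r, 1e-12)))  # avoid a zero radius for duplicates
    return balls


def predict(balls: List[Ball], x: Sequence[float]) -> int:
    """Label of the ball with the smallest distance-to-radius ratio."""
    _, label, _ = min(balls, key=lambda b: math.dist(b[0], x) / b[2])
    return label


X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
y = [0, 0, 1, 1]
balls = fit_balls(X, y)
```

In practice the genes would first be ranked by `overlap_score`, and only the least-overlapping ones kept as coordinates for the balls.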
A global optimization approach to classification
- Authors: Bagirov, Adil , Rubinov, Alex , Yearwood, John
- Date: 2002
- Type: Text , Journal article
- Relation: Optimization and Engineering Vol. 3, no. 2 (2002), p. 129-155
- Full Text: false
- Reviewed:
- Description: This paper presents a hybrid algorithm for finding the absolute extreme point of a multimodal scalar function of many variables. The algorithm is suitable when the objective function is expensive to compute, the computation can be affected by noise, and/or partial derivatives cannot be calculated. The method is a genetic modification of a previous algorithm based on Price's method. All information about the behavior of the objective function collected in previous iterations is used to choose new evaluation points. The genetic part of the algorithm is very effective at escaping from local attractors and assures convergence in probability to the global optimum. The proposed algorithm has been tested on a large set of multimodal test problems, outperforming both the modified Price's algorithm and the classical genetic approach.
- Description: C1
- Description: 2003000061
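The algorithm described above builds on Price's controlled random search. The sketch below shows only that base method in its simplest form: reflect a randomly chosen point through the centroid of other randomly chosen points, and let the trial point replace the worst point if it is better. The genetic modification is not implemented, and the population size and iteration count are assumptions.

```python
import random
from typing import Callable, List, Sequence, Tuple


def price_crs(f: Callable[[List[float]], float],
              bounds: Sequence[Tuple[float, float]],
              n_pop: int = 0, n_iter: int = 5000, seed: int = 0) -> Tuple[List[float], float]:
    """Price's controlled random search (basic variant) for box-constrained minimisation."""
    rng = random.Random(seed)
    dim = len(bounds)
    n_pop = n_pop or 10 * (dim + 1)  # a common population-size rule of thumb
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_pop)]
    vals = [f(p) for p in pop]
    for _ in range(n_iter):
        worst = max(range(n_pop), key=vals.__getitem__)
        # Pick dim + 1 distinct points; reflect the last through the centroid of the others.
        idx = rng.sample(range(n_pop), dim + 1)
        centroid = [sum(pop[i][d] for i in idx[:dim]) / dim for d in range(dim)]
        trial = [2.0 * centroid[d] - pop[idx[dim]][d] for d in range(dim)]
        if all(lo <= t <= hi for t, (lo, hi) in zip(trial, bounds)):
            f_trial = f(trial)
            if f_trial < vals[worst]:  # the trial point replaces the worst point
                pop[worst], vals[worst] = trial, f_trial
    best = min(range(n_pop), key=vals.__getitem__)
    return pop[best], vals[best]


def shifted_sphere(x: List[float]) -> float:
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


x_best, f_best = price_crs(shifted_sphere, [(-5.0, 5.0), (-5.0, 5.0)])
```

Only function values are used, which is why this family of methods suits noisy objectives and objectives without usable partial derivatives.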
A global optimisation approach to classification in medical diagnosis and prognosis
- Authors: Bagirov, Adil , Rubinov, Alex , Yearwood, John , Stranieri, Andrew
- Date: 2001
- Type: Text , Conference paper
- Relation: Paper presented at 34th Hawaii International Conference on System Sciences, HICSS-34, Maui, Hawaii, USA : 3rd-6th January 2001
- Full Text:
- Description: In this paper, global optimisation-based techniques are studied in order to increase the accuracy of medical diagnosis and prognosis using FNA image data from the Wisconsin Diagnostic and Prognostic Breast Cancer databases. First, we discuss the problem of determining the most informative features for classifying cancerous cases in these databases. Then we apply a technique based on convex and global optimisation to breast cancer diagnosis; it separates benign from malignant cases and allows patients to be diagnosed with very high accuracy. The third application of this technique is a method that calculates cluster centres to predict when breast cancer is likely to recur in patients whose cancer has been removed. The technique achieves higher accuracy on these databases than reported elsewhere in the literature.
- Description: 2003003950
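As a hedged illustration of centre-based prediction, the sketch computes one centre per class as the point minimising the sum of Euclidean distances to that class's samples (a convex problem, solved here with Weiszfeld's iteration) and predicts the class of the nearest centre. This substitutes a standard technique for the authors' optimisation method; the function names are illustrative.

```python
import math
from typing import Dict, List, Sequence


def geometric_median(points: Sequence[Sequence[float]], n_iter: int = 200,
                     eps: float = 1e-12) -> List[float]:
    """Weiszfeld iterations for the point minimising the sum of Euclidean distances."""
    dim = len(points[0])
    z = [sum(p[d] for p in points) / len(points) for d in range(dim)]  # start at the mean
    for _ in range(n_iter):
        num, den = [0.0] * dim, 0.0
        for p in points:
            dist = math.dist(p, z)
            if dist < eps:
                continue  # simple safeguard: skip a point that coincides with z
            w = 1.0 / dist
            den += w
            for d in range(dim):
                num[d] += w * p[d]
        if den == 0.0:
            break
        z = [v / den for v in num]
    return z


def class_centres(X: Sequence[Sequence[float]], y: Sequence[int]) -> Dict[int, List[float]]:
    return {label: geometric_median([x for x, t in zip(X, y) if t == label])
            for label in set(y)}


def predict(centres: Dict[int, List[float]], x: Sequence[float]) -> int:
    """Class whose centre is nearest to x."""
    return min(centres, key=lambda label: math.dist(centres[label], x))


X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0],
     [10.0, 10.0], [11.0, 10.0], [10.0, 11.0], [11.0, 11.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
centres = class_centres(X, y)
```

Using the sum of distances rather than squared distances makes each centre less sensitive to a few atypical cases, at the cost of an iterative rather than closed-form solution.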
Optimization of feed forward MLPs using the discrete gradient method
- Authors: Bagirov, Adil , Yearwood, John , Ghosh, Ranadhir
- Date: 2004
- Type: Text , Conference paper
- Relation: Paper presented at CIMCA 2004: International Conference on Computational Intelligence for Modelling, Control & Automation, Gold Coast, Queensland : 12th July, 2004
- Full Text: false
- Reviewed:
- Description: E1
- Description: 2003000845
An algorithm for clustering based on non-smooth optimization techniques
- Authors: Bagirov, Adil , Rubinov, Alex , Sukhorukova, Nadezda , Yearwood, John
- Date: 2003
- Type: Text , Journal article
- Relation: International Transactions in Operational Research Vol. 10, no. 6 (2003), p. 611-617
- Full Text: false
- Reviewed:
- Description: The problem of cluster analysis is formulated as a problem of non-smooth, non-convex optimization, and an algorithm for solving the cluster analysis problem based on non-smooth optimization techniques is developed. We discuss applications of this algorithm to large databases. Results of numerical experiments are presented to demonstrate the effectiveness of this algorithm.
- Description: C1
- Description: 2003000422
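For reference, one common way to write such a formulation (the minimum sum-of-squares case) is shown below, assuming data points a^1, ..., a^m in R^n and k cluster centres x^1, ..., x^k; this notation is an assumption, not taken from the abstract.

```latex
\min_{x^1,\dots,x^k \in \mathbb{R}^n} f(x^1,\dots,x^k),
\qquad
f(x^1,\dots,x^k) = \frac{1}{m} \sum_{i=1}^{m} \min_{j=1,\dots,k} \lVert x^j - a^i \rVert^2
```

Each inner term is a pointwise minimum of smooth functions, so f is not differentiable at centre configurations where some data point is equidistant from two nearest centres; and because relabelling or moving centres between groups of points yields different local minima, f is nonconvex.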
A new nonsmooth optimization algorithm for minimum sum-of-squares clustering problems
- Authors: Bagirov, Adil , Yearwood, John
- Date: 2006
- Type: Text , Journal article
- Relation: European Journal of Operational Research Vol. 170, no. 2 (2006), p. 578-596
- Full Text: false
- Reviewed:
- Description: The minimum sum-of-squares clustering problem is formulated as a problem of nonsmooth, nonconvex optimization, and an algorithm for solving the former problem based on nonsmooth optimization techniques is developed. The issue of applying this algorithm to large data sets is discussed. Results of numerical experiments are presented that demonstrate the effectiveness of the proposed algorithm. © 2004 Elsevier B.V. All rights reserved.
- Description: C1
- Description: 2003001520
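A small sketch of minimising the minimum sum-of-squares objective directly, under assumptions: the objective is evaluated on the flattened vector of centres, clusters are added one at a time starting from the data point that most reduces the objective, and a simple compass (coordinate pattern) search stands in for the authors' nonsmooth solver. It needs only function values, so the kinks in the objective do not break it, but it is a local method; the function names are illustrative.

```python
from typing import Callable, List, Sequence, Tuple


def mssc(flat: Sequence[float], data: Sequence[Sequence[float]], dim: int) -> float:
    """Mean squared distance from each point to its nearest centre (nonsmooth in the centres)."""
    centres = [flat[j:j + dim] for j in range(0, len(flat), dim)]
    return sum(min(sum((a - c) ** 2 for a, c in zip(p, ctr)) for ctr in centres)
               for p in data) / len(data)


def compass_search(f: Callable[[List[float]], float], x0: List[float],
                   step: float = 1.0, tol: float = 1e-6) -> Tuple[List[float], float]:
    """Derivative-free local search: try +/- step per coordinate, halve step on failure."""
    x, fx = list(x0), f(x0)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for s in (step, -step):
                y = x[:]
                y[i] += s
                fy = f(y)
                if fy < fx:
                    x, fx, improved = y, fy, True
                    break
        if not improved:
            step /= 2.0
    return x, fx


def incremental_mssc(data: Sequence[Sequence[float]], k: int):
    """Add centres one at a time, refining all of them after each addition."""
    dim = len(data[0])

    def objective(c: List[float]) -> float:
        return mssc(c, data, dim)

    flat = [sum(p[d] for p in data) / len(data) for d in range(dim)]  # k = 1: the mean
    for _ in range(1, k):
        # Start from the data point that, appended as a new centre, lowers f the most.
        start = min((flat + list(p) for p in data), key=objective)
        flat, _ = compass_search(objective, start)
    return [flat[j:j + dim] for j in range(0, len(flat), dim)], objective(flat)


data = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
        [10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0]]
centres, f_value = incremental_mssc(data, 2)
```

Seeding each new centre from the previous solution is what makes the step-by-step construction cheaper than restarting a k-centre search from scratch for every k.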
Using global optimization to improve classification for medical diagnosis and prognosis
- Authors: Bagirov, Adil , Rubinov, Alex , Yearwood, John
- Date: 2001
- Type: Text , Journal article
- Relation: Topics in Health Information Management Vol. 22, no. 1 (2001), p. 65-74
- Full Text: false
- Description: Global optimization-based techniques are studied in order to increase the accuracy of medical diagnosis and prognosis with data from various databases. First, we discuss feature selection, the problem of determining the most informative features for classification in the databases under consideration. Then, we apply a technique based on convex and global optimization for classification in these databases. The third application of this technique is a method that calculates centers of clusters to predict when breast cancer is likely to recur in patients whose cancer has been removed. The technique achieves high accuracy with these databases. Better classifiers will lead to improved assistance in making medical diagnostic and prognostic decisions.
- Description: 2003003662
Unsupervised and supervised data classification via nonsmooth and global optimisation
- Authors: Bagirov, Adil , Rubinov, Alex , Sukhorukova, Nadezda , Yearwood, John
- Date: 2003
- Type: Text , Journal article
- Relation: TOP Vol. 11, no. 1 (2003), p. 1-92
- Full Text:
- Reviewed:
- Description: We examine various methods for data clustering and data classification that are based on the minimization of the so-called cluster function and its modifications. These functions are nonsmooth and nonconvex. We use Discrete Gradient methods for their local minimization. We also consider a combination of this method with the cutting angle method for global minimization. We present and discuss results of numerical experiments.
- Description: C1
- Description: 2003000421
A novel approach to optimal pump scheduling in water distribution systems
- Authors: Bagirov, Adil , Barton, Andrew , Mala-Jetmarova, Helena , Al Nuaimat, Alia , Ahmed, S. T. , Sultanova, Nargiz , Yearwood, John
- Date: 2012
- Type: Text , Conference paper
- Relation: 14th Water Distribution Systems Analysis Conference 2012, WDSA 2012 Vol. 1; Adelaide, Australia; 24th-27th September; p. 618-631
- Relation: http://purl.org/au-research/grants/arc/LP0990908
- Full Text: false
- Reviewed:
- Description: The operation of a water distribution system is a complex task which involves scheduling of pumps, regulating water levels of storages, and providing satisfactory water quality to customers at required flow and pressure. Pump scheduling is one of the most important tasks in operating a water distribution system, as it accounts for the major part of its operating costs. In this paper, a novel approach to modeling pump scheduling to minimize pump energy consumption is introduced, which uses the pump start/end run times as continuous variables. This differs from other approaches, which typically use binary integer variables for each hour, a formulation considered very impractical from an operational perspective. The problem is formulated as a nonlinear programming problem, and a new algorithm is developed for its solution. This algorithm combines grid search with the Hooke-Jeeves pattern search method. The performance of the algorithm is evaluated on test problems from the literature using the hydraulic simulation model EPANET.
- Description: E1
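The combination of a coarse grid search with Hooke-Jeeves pattern search over continuous start/end times can be sketched on a toy single-pump problem. Everything about the model is hypothetical: a two-level hourly tariff (`TARIFF`), constant pump power and flow, a daily volume target enforced by a quadratic penalty, and no hydraulic simulation (EPANET is not involved).

```python
from typing import Callable, List, Sequence, Tuple

TARIFF = [0.1] * 7 + [0.3] * 17  # hypothetical price per kWh for each hour of the day


def tariff_integral(tariff: Sequence[float], start: float, end: float) -> float:
    """Integral of the piecewise-constant hourly tariff over [start, end] (in hours)."""
    return sum(price * max(0.0, min(end, h + 1) - max(start, h))
               for h, price in enumerate(tariff))


def pumping_cost(x: Sequence[float], tariff=TARIFF, power=50.0, flow=100.0,
                 demand=450.0, penalty=1e3) -> float:
    """Toy cost of one run [start, end]: energy cost plus a penalty on unmet volume."""
    start, end = (min(max(t, 0.0), 24.0) for t in x)  # clamp times to the day
    shortfall = max(0.0, demand - flow * max(0.0, end - start))
    return power * tariff_integral(tariff, start, end) + penalty * shortfall ** 2


def grid_start(f: Callable[[List[float]], float], n: int = 25) -> List[float]:
    """Coarse grid search over (start, end) to choose the starting point."""
    grid = [[24.0 * i / (n - 1), 24.0 * j / (n - 1)] for i in range(n) for j in range(n)]
    return min(grid, key=f)


def hooke_jeeves(f, x0, step=1.0, shrink=0.5, tol=1e-4) -> Tuple[List[float], float]:
    def explore(base, f_base, h):
        # Exploratory move: try +h / -h on each coordinate in turn, keep any improvement.
        x, fx = list(base), f_base
        for i in range(len(x)):
            for d in (h, -h):
                y = x[:]
                y[i] += d
                fy = f(y)
                if fy < fx:
                    x, fx = y, fy
                    break
        return x, fx

    x, fx = list(x0), f(x0)
    h = step
    while h > tol:
        y, fy = explore(x, fx, h)
        if fy < fx:
            # Pattern moves: jump along the successful direction while it keeps paying off.
            while True:
                pattern = [2.0 * yi - xi for yi, xi in zip(y, x)]
                x, fx = y, fy
                z, fz = explore(pattern, f(pattern), h)
                if fz < fx:
                    y, fy = z, fz
                else:
                    break
        else:
            h *= shrink  # no improvement at this step size: refine the mesh
    return x, fx


x0 = grid_start(pumping_cost)
x_best, cost = hooke_jeeves(pumping_cost, x0)
```

With a one-hour grid the best grid point is a five-hour run costing 25; the pattern search then shortens it to the 4.5 hours actually needed, at cost 22.5, which the hourly grid cannot represent.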
An algorithm for minimization of pumping costs in water distribution systems using a novel approach to pump scheduling
- Authors: Bagirov, Adil , Barton, Andrew , Mala-Jetmarova, Helena , Al Nuaimat, Alia , Ahmed, S. T. , Sultanova, Nargiz , Yearwood, John
- Date: 2013
- Type: Text , Journal article
- Relation: Mathematical and Computer Modelling Vol. 57, no. 3-4 (2013), p. 873-886
- Relation: http://purl.org/au-research/grants/arc/LP0990908
- Full Text: false
- Reviewed:
- Description: The operation of a water distribution system is a complex task which involves scheduling of pumps, regulating water levels of storages, and providing satisfactory water quality to customers at required flow and pressure. Pump scheduling is one of the most important tasks in operating a water distribution system, as it accounts for the major part of its operating costs. In this paper, a novel approach to modeling explicit pump scheduling to minimize pump energy consumption is introduced, which uses the pump start/end run times as continuous variables and binary integer variables to describe the pump status at the beginning of the scheduling period. This differs from other approaches, which typically use binary integer variables for each hour, a formulation considered very impractical from an operational perspective. The problem is formulated as a mixed integer nonlinear programming problem, and a new algorithm is developed for its solution. This algorithm combines grid search with the Hooke-Jeeves pattern search method. The performance of the algorithm is evaluated on test problems from the literature using the hydraulic simulation model EPANET. © 2012 Elsevier Ltd.
- Description: 2003010583
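The encoding used in this version can be illustrated by decoding a binary initial status plus continuous switch times into the fraction of each hour the pump runs, which is all an hourly tariff needs to price a schedule. The sketch assumes a single pump whose status toggles at every switch time; the names `hourly_run_fractions` and `schedule_cost` and the 24-hour horizon are illustrative.

```python
from typing import List, Sequence


def hourly_run_fractions(initial_on: bool, switch_times: Sequence[float],
                         horizon: int = 24) -> List[float]:
    """Fraction of each hour the pump runs, given its status at t = 0 and its switch times."""
    fractions = [0.0] * horizon
    times = [0.0] + sorted(switch_times) + [float(horizon)]
    on = initial_on
    for a, b in zip(times, times[1:]):
        if on:
            for h in range(horizon):
                fractions[h] += max(0.0, min(b, h + 1) - max(a, h))
        on = not on  # each switch time toggles the status
    return fractions


def schedule_cost(initial_on: bool, switch_times: Sequence[float],
                  tariff: Sequence[float], power: float) -> float:
    """Energy cost of the decoded schedule under an hourly tariff."""
    fractions = hourly_run_fractions(initial_on, switch_times, len(tariff))
    return power * sum(f * p for f, p in zip(fractions, tariff))
```

For instance, a pump that is on at midnight, switches off at 2.5 h and back on at 20 h runs for 6.5 hours in total; only one binary value is needed per pump, however many hours the horizon has.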
Derivative-free optimization and neural networks for robust regression
- Authors: Beliakov, Gleb , Kelarev, Andrei , Yearwood, John
- Date: 2012
- Type: Text , Journal article
- Relation: Optimization Vol. 61, no. 12 (2012), p. 1467-1490
- Full Text:
- Reviewed:
- Description: Large outliers break down linear and nonlinear regression models. Robust regression methods allow one to filter out the outliers when building a model. By replacing the traditional least squares criterion with the least trimmed squares (LTS) criterion, in which half of the data are treated as potential outliers, one can fit accurate regression models to strongly contaminated data. High-breakdown methods are well established in linear regression, but have only recently started to be applied to nonlinear regression. In this work, we examine the problem of fitting artificial neural networks (ANNs) to contaminated data using the LTS criterion. We introduce a penalized LTS criterion that prevents unnecessary removal of valid data. Training ANNs leads to a challenging non-smooth global optimization problem. We compare the efficiency of several derivative-free optimization methods in solving it, and show that our approach identifies the outliers correctly when ANNs are used for nonlinear regression. © 2012 Copyright Taylor and Francis Group, LLC.
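The LTS criterion itself is easy to state in code: sum only the h smallest squared residuals. The sketch below fits a straight line rather than a neural network, and uses random two-point starts followed by concentration steps (the "C-steps" of the FAST-LTS algorithm) instead of the derivative-free methods compared in the paper; the penalized variant is not modelled, and the function names are illustrative.

```python
import random
from typing import Sequence, Tuple


def ols_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares intercept and slope (assumes the xs are not all equal)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def lts_objective(a: float, b: float, xs, ys, h: int) -> float:
    """LTS criterion: sum of the h smallest squared residuals of y = a + b x."""
    return sum(sorted((y - a - b * x) ** 2 for x, y in zip(xs, ys))[:h])


def fit_lts_line(xs, ys, n_starts: int = 100, n_csteps: int = 10,
                 seed: int = 0) -> Tuple[float, float]:
    rng = random.Random(seed)
    n = len(xs)
    h = n // 2 + 1  # roughly half of the data may be treated as outliers
    best = None
    for _ in range(n_starts):
        i, j = rng.sample(range(n), 2)
        if xs[i] == xs[j]:
            continue  # a vertical pair does not define a line y = a + b x
        b = (ys[j] - ys[i]) / (xs[j] - xs[i])
        a = ys[i] - b * xs[i]
        for _ in range(n_csteps):
            # C-step: refit OLS to the h points with the smallest current residuals.
            res = [(ys[k] - a - b * xs[k]) ** 2 for k in range(n)]
            keep = sorted(range(n), key=res.__getitem__)[:h]
            a, b = ols_line([xs[k] for k in keep], [ys[k] for k in keep])
        obj = lts_objective(a, b, xs, ys, h)
        if best is None or obj < best[0]:
            best = (obj, a, b)
    return best[1], best[2]


xs = [float(x) for x in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
for i in range(6):
    ys[i] = 100.0  # 30% gross outliers
a_hat, b_hat = fit_lts_line(xs, ys)
```

Ordinary least squares on the same data would be pulled strongly towards the outliers; the LTS fit recovers the line through the clean points because the outliers never enter the h retained residuals.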
An application of novel clustering technique for information security
- Authors: Beliakov, Gleb , Yearwood, John , Kelarev, Andrei
- Date: 2011
- Type: Text , Conference paper
- Relation: Applications and Techniques in Information Security Workshop p. 5-11
- Full Text: false
- Reviewed:
- Description: This article presents experimental results devoted to a new application of the novel clustering technique recently introduced by the authors. Our aim is to facilitate the application of robust and stable consensus functions in information security, where it is often necessary to process large data sets and monitor outcomes in real time, as required, for example, in intrusion detection. Here we concentrate on the particular case of profiling phishing websites. First, we apply several independent clustering algorithms to a randomized sample of the data to obtain independent initial clusterings. The silhouette index is used to determine the number of clusters. Second, we use a consensus function to combine these independent clusterings into one consensus clustering. Feature ranking is used to select a subset of features for the consensus function. Third, we train fast supervised classification algorithms on the resulting consensus clustering so that they can process the whole large data set as well as new data. The precision and recall of the classifiers at the final stage of this scheme are critical to the effectiveness of the whole procedure. We investigated various combinations of three consensus functions, Cluster-Based Graph Formulation (CBGF), Hybrid Bipartite Graph Formulation (HBGF), and Instance-Based Graph Formulation (IBGF), with a variety of supervised classification algorithms. The best precision and recall were obtained by combining the HBGF consensus function with the SMO classifier using a polynomial kernel.
- Description: 2003009195
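A minimal consensus step can be sketched with the instance-based view: count how often each pair of instances is clustered together, then link the pairs that agree in more than half of the clusterings. Real IBGF implementations partition the weighted instance graph with a graph-partitioning algorithm; the thresholded connected components used here are a simplified substitute, and the 0.5 threshold and function names are assumptions.

```python
from itertools import combinations
from typing import List, Sequence


def co_association(clusterings: Sequence[Sequence[int]]) -> List[List[float]]:
    """Fraction of clusterings that put each pair of instances in the same cluster."""
    n = len(clusterings[0])
    sim = [[0.0] * n for _ in range(n)]
    for labels in clusterings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    sim[i][j] += 1.0 / len(clusterings)
    return sim


def consensus(clusterings: Sequence[Sequence[int]], threshold: float = 0.5) -> List[int]:
    """Connected components of the graph linking pairs with similarity above threshold."""
    sim = co_association(clusterings)
    n = len(sim)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if sim[i][j] > threshold:
            parent[find(i)] = find(j)
    roots: dict = {}
    # Relabel components as 0, 1, 2, ... in order of first appearance.
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


clusterings = [[0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0], [0, 0, 1, 1, 1, 1]]
labels = consensus(clusterings)
```

Label values differ between the input clusterings (the second one swaps 0 and 1), which is why consensus works on pairwise agreement rather than on raw labels; the resulting labels can then serve as training targets for the supervised stage.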