Optimization based clustering algorithms for authorship analysis of phishing emails
- Authors: Seifollahi, Sattar , Bagirov, Adil , Layton, Robert , Gondal, Iqbal
- Date: 2017
- Type: Text , Journal article
- Relation: Neural Processing Letters Vol. 46, no. 2 (2017), p. 411-425
- Relation: http://purl.org/au-research/grants/arc/DP140103213
- Full Text: false
- Reviewed:
- Description: Phishing has given attackers the power to masquerade as legitimate users of organizations, such as banks, to scam money and private information from victims. Phishing is so widespread that combating the attacks could overwhelm the victim organization. It is important to group phishing attacks in order to formulate effective defence mechanisms. In this paper, we use clustering methods to analyze and characterize phishing emails and perform their relative attribution. Emails are first tokenized to a bag-of-words space and then transformed to a numeric vector space using frequencies of words in documents. The WordNet vocabulary is used to take the effects of similar words into account and to reduce sparsity. The word similarity measure is combined with the term frequencies to introduce a novel text transformation into numeric features. To improve accuracy, we apply inverse document frequency weighting, which gives higher weights to features used by fewer authors. The k-means algorithm and three recently introduced optimization-based algorithms, MS-MGKM, INCA and DCClust, are applied for clustering purposes. The optimization-based algorithms indicate the existence of well-separated clusters in the phishing emails dataset. © 2017, Springer Science+Business Media New York.
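The transformation from tokenized emails to weighted numeric vectors described in this abstract can be illustrated with a minimal sketch. It assumes whitespace tokenization and the standard weighting of raw term frequency times log(N / document frequency), computed over documents rather than authors; the WordNet similarity step is omitted, and the function names and sample emails are hypothetical, not taken from the paper.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for a real tokenizer."""
    return text.lower().split()

def tfidf_vectors(documents):
    """Return (vocabulary, vectors) using term frequency times log(N / df)."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each word.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append(
            [counts[w] * math.log(n_docs / doc_freq[w]) for w in vocabulary]
        )
    return vocabulary, vectors

# Hypothetical sample emails, for illustration only.
emails = [
    "verify your account now",
    "your account is suspended",
    "claim your prize now",
]
vocab, vectors = tfidf_vectors(emails)
```

In this sketch a word that occurs in every email, such as "your", receives weight zero, while a word unique to one email receives the largest weight, which mirrors the idea of favouring features used by fewer sources.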
A heuristic algorithm for solving the minimum sum-of-squares clustering problems
- Authors: Ordin, Burak , Bagirov, Adil
- Date: 2015
- Type: Text , Journal article
- Relation: Journal of Global Optimization Vol. 61, no. 2 (2015), p. 341-361
- Relation: http://purl.org/au-research/grants/arc/DP140103213
- Full Text: false
- Reviewed:
- Description: Clustering is an important task in data mining. It can be formulated as a global optimization problem which is challenging for existing global optimization techniques even in medium-size data sets. Various heuristics were developed to solve the clustering problem. The global k-means and modified global k-means are among the most efficient heuristics for solving the minimum sum-of-squares clustering problem. However, these algorithms are not always accurate in finding global or near-global solutions to the clustering problem. In this paper, we introduce a new algorithm to improve the accuracy of the modified global k-means algorithm in finding global solutions. We use an auxiliary cluster problem to generate a set of initial points and apply the k-means algorithm starting from these points to find the global solution to the clustering problems. Numerical results on 16 real-world data sets clearly demonstrate the superiority of the proposed algorithm over the global and modified global k-means algorithms in finding global solutions to clustering problems.
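The idea of running k-means from several candidate starting points and keeping the solution with the smallest sum-of-squares objective can be sketched as follows. This is only an illustration under stated assumptions: the starting sets are supplied by hand rather than generated by the paper's auxiliary cluster problem, and all function names and data are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sum_of_squares(points, centers):
    """Objective: each point contributes its squared distance to the nearest center."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points)

def k_means(points, centers, max_iter=100):
    """Plain Lloyd iterations from the given starting centers."""
    centers = [list(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)),
                key=lambda j: squared_distance(p, centers[j]),
            )
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            [sum(coord) / len(cluster) for coord in zip(*cluster)]
            if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers

def best_of_starts(points, starting_sets):
    """Run k-means from each candidate start and keep the lowest objective."""
    results = [k_means(points, start) for start in starting_sets]
    return min(results, key=lambda c: sum_of_squares(points, c))

# Hypothetical data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
starts = [
    [(0.0, 0.0), (0.0, 1.0)],   # poor start: both centers in the left group
    [(0.0, 0.5), (10.0, 0.5)],  # good start: one center per group
]
centers = best_of_starts(points, starts)
```

With this data the first start converges to a local solution that splits the points by height, while the second start separates the two groups; comparing objective values selects the second, which shows why the choice of initial points matters for reaching a global or near-global solution.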
Nonsmooth nonconvex optimization approach to clusterwise linear regression problems
- Authors: Bagirov, Adil , Ugon, Julien , Mirzayeva, Hijran
- Date: 2013
- Type: Text , Journal article
- Relation: European Journal of Operational Research Vol. 229, no. 1 (2013), p. 132-142
- Full Text: false
- Reviewed:
- Description: Clusterwise regression consists of finding a number of regression functions, each approximating a subset of the data. In this paper, a new approach for solving clusterwise linear regression problems is proposed based on a nonsmooth nonconvex formulation. We present an algorithm for minimizing this nonsmooth nonconvex function. This algorithm incrementally divides the whole data set into groups which can be easily approximated by one linear regression function. A special procedure is introduced to generate a good starting point for solving global optimization problems at each iteration of the incremental algorithm. Such an approach allows one to find a global or near-global solution to the problem when the data sets are sufficiently dense. The algorithm is compared with the multistart Späth algorithm on several publicly available data sets for regression analysis. © 2013 Elsevier B.V. All rights reserved.
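One common way to write a clusterwise linear regression objective is as a sum, over data points, of the smallest squared residual among the candidate linear functions; the inner minimum is what makes such a function nonsmooth and nonconvex. The sketch below only evaluates an objective of this form. It is an assumption that this matches the formulation used in the paper, and the function name and data are hypothetical.

```python
def clusterwise_error(data, models):
    """Sum over points of the smallest squared residual among the linear models.

    data:   list of (x, y) pairs, where x is a tuple of features
    models: list of (weights, bias) pairs, one per regression function
    """
    total = 0.0
    for x, y in data:
        residuals = [
            (sum(w * xi for w, xi in zip(weights, x)) + bias - y) ** 2
            for weights, bias in models
        ]
        # Each point is charged only to the model that fits it best.
        total += min(residuals)
    return total

# Hypothetical data lying exactly on two lines: y = x and y = 5 - x.
data = [
    ((0.0,), 0.0), ((1.0,), 1.0), ((2.0,), 2.0),
    ((0.0,), 5.0), ((1.0,), 4.0), ((2.0,), 3.0),
]
two_lines = [((1.0,), 0.0), ((-1.0,), 5.0)]
one_line = [((0.0,), 2.5)]

error_two = clusterwise_error(data, two_lines)
error_one = clusterwise_error(data, one_line)
```

Here two regression functions reproduce the data exactly, while a single constant function leaves a large residual, which illustrates why the data are divided into groups that one linear function each can approximate.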
Application of optimisation-based data mining techniques to tobacco control dataset
- Authors: Dzalilov, Zari , Zhang, J , Bagirov, Adil , Mammadov, Musa
- Date: 2010
- Type: Text , Journal article
- Relation: International Journal of Lean Thinking Vol. 1, no. 1 (2010), p. 27-41
- Full Text: false
- Reviewed:
- Description: Tobacco smoking is one of the leading causes of death around the world. Consequently, control of tobacco use is an important global public health issue. Tobacco control may be aided by development of theoretical and methodological frameworks for describing and understanding complex tobacco control systems. Linear regression and logistic regression are currently very popular statistical techniques for modeling and analyzing complex data in tobacco control systems. However, in tobacco markets, numerous interrelated factors nontrivially interact with tobacco control policies, such that policies and control outcomes are nonlinearly related.
Cluster analysis of a tobacco control data set
- Authors: Dzalilov, Zari , Bagirov, Adil
- Date: 2010
- Type: Text , Journal article
- Relation: International Journal of Lean Thinking Vol. 1, no. 2 (2010), p.
- Full Text: false
- Reviewed:
- Description: Development of theoretical and methodological frameworks in data analysis is fundamental for modeling complex tobacco control systems. Following this idea, a new optimization-based approach was introduced in the paper through two distinct methods: the modified linear least squares fit and a heuristic algorithm for feature selection. Optimization-based methods have the potential to detect nonlinearity, and therefore to be more effective tools for analysing complex data sets. In this study we evaluate the modified global k-means clustering algorithm by applying it to a massive set of real-time tobacco control survey data. Cluster analysis identified fixed and stable clusters in the studied data. These clusters correspond to groups of smokers with similar behaviour, and their identification may allow us to give recommendations on the modification of existing tobacco control systems and on the design of future data acquisition surveys.
A multidimensional descent method for global optimization
- Authors: Bagirov, Adil , Rubinov, Alex , Zhang, Jiapu
- Date: 2009
- Type: Text , Journal article
- Relation: Optimization Vol. 58, no. 5 (2009), p. 611-625
- Full Text: false
- Reviewed:
- Description: This article presents a new multidimensional descent method for solving global optimization problems with box constraints. This is a hybrid method in which a local search method is used for local descent and a global search is used for further multidimensional search on the subsets formed by intersecting the cones generated by the local search method with the feasible region. The discrete gradient method is used for local search and the cutting angle method is used for global search. Two- and three-dimensional cones are used for the global search. Such an approach allows one, as a rule, to escape local minimizers which are not global ones. The proposed method is a local optimization method with strong global search properties. We present results of numerical experiments using both smooth and non-smooth global optimization test problems. These results demonstrate that the proposed algorithm allows one to find a global or a near-global minimizer.
Local optimization method with global multidimensional search
- Authors: Bagirov, Adil , Rubinov, Alex , Zhang, Jiapu
- Date: 2005
- Type: Text , Journal article
- Relation: Journal of Global Optimization Vol. 32, no. 2 (2005), p. 161-179
- Full Text:
- Reviewed:
- Description: This paper presents a new method for solving global optimization problems. We use a local technique based on the notion of discrete gradients to find a cone of descent directions, and then use a global cutting angle algorithm to find the global minimum within the intersection of the cone and the feasible region. We present results of numerical experiments with well-known test problems and with the so-called cluster function. These results confirm that the proposed algorithm allows one to find a global minimizer, or at least a deep local minimizer, of a function with a huge number of shallow local minima. © Springer 2005.
A global optimization approach to classification
- Authors: Bagirov, Adil , Rubinov, Alex , Yearwood, John
- Date: 2002
- Type: Text , Journal article
- Relation: Optimization and Engineering Vol. 9, no. 7 (2002), p. 129-155
- Full Text: false
- Reviewed:
- Description: This paper presents a hybrid algorithm for finding the absolute extreme point of a multimodal scalar function of many variables. The algorithm is suitable when the objective function is expensive to compute, the computation can be affected by noise and/or partial derivatives cannot be calculated. The method used is a genetic modification of a previous algorithm based on Price's method. All information about the behavior of the objective function collected in previous iterations is used to choose new evaluation points. The genetic part of the algorithm is very effective at escaping from local attractors of the algorithm and assures convergence in probability to the global optimum. The proposed algorithm has been tested on a large set of multimodal test problems, outperforming both the modified Price's algorithm and the classical genetic approach.