An L-2-Boosting Algorithm for Estimation of a Regression Function
- Authors: Bagirov, Adil , Clausen, Conny , Kohler, Michael
- Date: 2010
- Type: Text , Journal article
- Relation: IEEE Transactions on Information Theory Vol. 56, no. 3 (2010), p. 1417-1429
- Full Text:
- Reviewed:
- Description: An L-2-boosting algorithm for estimation of a regression function from random design is presented, which consists of fitting repeatedly a function from a fixed nonlinear function space to the residuals of the data by least squares and by defining the estimate as a linear combination of the resulting least squares estimates. Splitting of the sample is used to decide after how many iterations of smoothing of the residuals the algorithm terminates. The rate of convergence of the algorithm is analyzed in case of an unbounded response variable. The method is used to fit a sum of maxima of minima of linear functions to a given data set, and is compared with other nonparametric regression estimates using simulated data.
Constrained self organizing maps for data clusters visualization
- Authors: Mohebi, Ehsan , Bagirov, Adil
- Date: 2016
- Type: Text , Journal article
- Relation: Neural Processing Letters Vol. 43, no. 3 (2016), p. 849-869
- Full Text: false
- Reviewed:
- Description: High dimensional data visualization is one of the main tasks in the field of data mining and pattern recognition. The self organizing maps (SOM) is one of the topology visualizing tool that contains a set of neurons that gradually adapt to input data space by competitive learning and form clusters. The topology preservation of the SOM strongly depends on the learning process. Due to this limitation one cannot guarantee the convergence of the SOM in data sets with clusters of arbitrary shape. In this paper, we introduce Constrained SOM (CSOM), the new version of the SOM by modifying the learning algorithm. The idea is to introduce an adaptive constraint parameter to the learning process to improve the topology preservation and mapping quality of the basic SOM. The computational complexity of the CSOM is less than those with the SOM. The proposed algorithm is compared with similar topology preservation algorithms and the numerical results on eight small to large real-world data sets demonstrate the efficiency of the proposed algorithm. © 2015, Springer Science+Business Media New York.
An incremental piecewise linear classifier based on polyhedral conic separation
- Authors: Ozturk, Gurkan , Bagirov, Adil , Kasimbeyli, Refail
- Date: 2015
- Type: Text , Journal article
- Relation: Machine Learning Vol. 101, no. 1-3 (2015), p. 397-413
- Relation: http://purl.org/au-research/grants/arc/DP140103213
- Full Text: false
- Reviewed:
- Description: In this paper, a piecewise linear classifier based on polyhedral conic separation is developed. This classifier builds nonlinear boundaries between classes using polyhedral conic functions. Since the number of polyhedral conic functions separating classes is not known a priori, an incremental approach is proposed to build separating functions. These functions are found by minimizing an error function which is nonsmooth and nonconvex. A special procedure is proposed to generate starting points to minimize the error function and this procedure is based on the incremental approach. The discrete gradient method, which is a derivative-free method for nonsmooth optimization, is applied to minimize the error function starting from those points. The proposed classifier is applied to solve classification problems on 12 publicly available data sets and compared with some mainstream and piecewise linear classifiers. © 2014, The Author(s).
Modified self-organising maps with a new topology and initialisation algorithm
- Authors: Mohebi, Ehsan , Bagirov, Adil
- Date: 2015
- Type: Text , Journal article
- Relation: Journal of Experimental and Theoretical Artificial Intelligence Vol. 27, no. 3 (2015), p. 351-372
- Full Text: false
- Reviewed:
- Description: Mapping quality of the self-organising maps (SOMs) is sensitive to the map topology and initialisation of neurons. In this article, in order to improve the convergence of the SOM, an algorithm based on split and merge of clusters to initialise neurons is introduced. The initialisation algorithm speeds up the learning process in large high-dimensional data sets. We also develop a topology based on this initialisation to optimise the vector quantisation error and topology preservation of the SOMs. Such an approach allows to find more accurate data visualisation and consequently clustering problem. The numerical results on eight small-to-large real-world data sets are reported to demonstrate the performance of the proposed algorithm in the sense of vector quantisation, topology preservation and CPU time requirement. © 2014 Taylor & Francis.
Optimization based clustering algorithms for authorship analysis of phishing emails
- Authors: Seifollahi, Sattar , Bagirov, Adil , Layton, Robert , Gondal, Iqbal
- Date: 2017
- Type: Text , Journal article
- Relation: Neural Processing Letters Vol. 46, no. 2 (2017), p. 411-425
- Relation: http://purl.org/au-research/grants/arc/DP140103213
- Full Text: false
- Reviewed:
- Description: Phishing has given attackers power to masquerade as legitimate users of organizations, such as banks, to scam money and private information from victims. Phishing is so widespread that combating the phishing attacks could overwhelm the victim organization. It is important to group the phishing attacks to formulate effective defence mechanism. In this paper, we use clustering methods to analyze and characterize phishing emails and perform their relative attribution. Emails are first tokenized to a bag-of-word space and, then, transformed to a numeric vector space using frequencies of words in documents. Wordnet vocabulary is used to take effects of similar words into account and to reduce sparsity. The word similarity measure is combined with the term frequencies to introduce a novel text transformation into numeric features. To improve the accuracy, we apply inverse document frequency weighting, which gives higher weights to features used by fewer authors. The k-means and recently introduced three optimization based algorithms: MS-MGKM, INCA and DCClust are applied for clustering purposes. The optimization based algorithms indicate the existence of well separated clusters in the phishing emails dataset. © 2017, Springer Science+Business Media New York.