SMGKM : an efficient incremental algorithm for clustering document collections
- Authors: Bagirov, Adil , Seifollahi, Sattar , Piccardi, Massimo , Zare Borzeshi, Ehsan , Kruger, Bernie
- Date: 2023
- Type: Text , Conference paper
- Relation: 19th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2018, Hanoi, Vietnam, 18-24 March 2018, Computational Linguistics and Intelligent Text Processing Vol. 13397 LNCS, p. 314-328
- Full Text: false
- Reviewed:
- Description: Given a large unlabeled document collection, the aim of this paper is to develop an accurate and efficient algorithm for solving the clustering problem over this collection. Document collections typically contain tens or hundreds of thousands of documents, with thousands or tens of thousands of features (i.e., distinct words). Most existing clustering algorithms struggle to find accurate solutions on such large data sets. The proposed algorithm overcomes this difficulty with an incremental approach, increasing the number of clusters progressively from an initial value of one to a set value. At each iteration, the new candidate cluster is initialized using a partitioning approach which is guaranteed to minimize the objective function. Experiments were carried out on six diverse datasets with different evaluation criteria, showing that the proposed algorithm outperformed comparable state-of-the-art clustering algorithms in all cases. © 2023, Springer Nature Switzerland AG.
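The incremental idea in this abstract (grow the number of clusters from one up to a set value, refining after each addition) can be illustrated with a minimal sketch. This is not the SMGKM algorithm itself: the seeding rule below (start the new cluster at the point farthest from its nearest existing center) is an assumption swapped in for the paper's partitioning-based initialization, and the function names `kmeans` and `incremental_kmeans` are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def kmeans(points, centers, iters=100):
    """Plain Lloyd iterations started from the given centers."""
    centers = list(centers)
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            groups[j].append(p)
        # Keep the old center if a cluster happens to become empty.
        new_centers = [mean(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def incremental_kmeans(points, k):
    """Grow the solution from 1 to k clusters, refining after each addition."""
    centers = [mean(points)]  # one cluster: the centroid of all data
    while len(centers) < k:
        # ASSUMPTION: seed the new cluster at the point farthest from its
        # nearest current center (a stand-in for the paper's initialization).
        seed = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers = kmeans(points, centers + [seed])
    return centers


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
result = incremental_kmeans(data, 2)
```

Each k-cluster problem starts from the solution of the (k-1)-cluster problem, which is the property the abstract relies on; only the choice of the new starting center differs between incremental methods.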
Solving a system of nonlinear integral equations by an RBF network
- Authors: Golbabai, A. , Mammadov, Musa , Seifollahi, Sattar
- Date: 2009
- Type: Text , Journal article
- Relation: Computers & Mathematics with Applications Vol. 57, no. 10 (2009), p. 1651-1658
- Full Text: false
- Reviewed:
- Description: In this paper, a novel learning strategy for radial basis function networks (RBFN) is proposed. By adjusting the parameters of the hidden layer, including the RBF centers and widths, the weights of the output layer are adapted by local optimization methods. A new local optimization algorithm based on a combination of the gradient and Newton methods is introduced. The efficiency of some local optimization methods to update the weights of RBFN is studied in solving systems of nonlinear integral equations. © 2009 Elsevier Ltd. All rights reserved.
Novel weighting in single hidden layer feedforward neural networks for data classification
- Authors: Seifollahi, Sattar , Yearwood, John , Ofoghi, Bahadorreza
- Date: 2012
- Type: Text , Journal article
- Relation: Computers and Mathematics with Applications Vol. 64, no. 2 (2012), p. 128-136
- Full Text: false
- Reviewed:
- Description: We propose a binary classifier based on the single hidden layer feedforward neural network (SLFN) using radial basis functions (RBFs) and sigmoid functions in the hidden layer. We use a modified attribute-class correlation measure to determine the weights of attributes in the networks. Moreover, we propose new weights, called influence weights, for use in the weights connecting the input layer and the hidden layer nodes (hidden weights) of the network with sigmoid hidden nodes. These weights are calculated as the sum of conditional probabilities of attribute values given class labels. Our learning procedure for the networks is based on extreme learning machines, in which the parameters of the hidden nodes are first calculated and then the weights connecting the hidden nodes and output nodes (output weights) are found. The results of the networks with the proposed weights on some benchmark data sets show improvements over those of the conventional networks. © 2012 Elsevier Ltd. All rights reserved.
A simulated annealing-based maximum-margin clustering algorithm
- Authors: Seifollahi, Sattar , Bagirov, Adil , Borzeshi, Ehsan , Piccardi, Massimo
- Date: 2019
- Type: Text , Journal article
- Relation: Computational Intelligence Vol. 35, no. 1 (2019), p. 23-41
- Full Text:
- Reviewed:
- Description: Maximum-margin clustering is an extension of the support vector machine (SVM) to clustering. It partitions a set of unlabeled data into multiple groups by finding hyperplanes with the largest margins. Although existing algorithms have shown promising results, there is no guarantee of convergence of these algorithms to global solutions due to the nonconvexity of the optimization problem. In this paper, we propose a simulated annealing-based algorithm that is able to mitigate the issue of local minima in the maximum-margin clustering problem. The novelty of our algorithm is twofold: (i) it comprises a comprehensive cluster modification scheme based on simulated annealing, and (ii) it introduces a new approach based on the combination of k-means++ and SVM at each step of the annealing process. More precisely, k-means++ is initially applied to extract subsets of the data points. Then, an unsupervised SVM is applied to improve the clustering results. Experimental results on various benchmark data sets (of up to over a million points) give evidence that the proposed algorithm is more effective at solving the clustering problem than a number of popular clustering algorithms.
Optimization based clustering algorithms for authorship analysis of phishing emails
- Authors: Seifollahi, Sattar , Bagirov, Adil , Layton, Robert , Gondal, Iqbal
- Date: 2017
- Type: Text , Journal article
- Relation: Neural Processing Letters Vol. 46, no. 2 (2017), p. 411-425
- Relation: http://purl.org/au-research/grants/arc/DP140103213
- Full Text: false
- Reviewed:
- Description: Phishing has given attackers the power to masquerade as legitimate users of organizations, such as banks, to scam money and private information from victims. Phishing is so widespread that combating phishing attacks could overwhelm the victim organization. It is important to group phishing attacks in order to formulate effective defence mechanisms. In this paper, we use clustering methods to analyze and characterize phishing emails and perform their relative attribution. Emails are first tokenized to a bag-of-words space and then transformed to a numeric vector space using frequencies of words in documents. The WordNet vocabulary is used to take the effects of similar words into account and to reduce sparsity. The word similarity measure is combined with the term frequencies to introduce a novel text transformation into numeric features. To improve the accuracy, we apply inverse document frequency weighting, which gives higher weights to features used by fewer authors. The k-means algorithm and three recently introduced optimization-based algorithms (MS-MGKM, INCA and DCClust) are applied for clustering purposes. The optimization-based algorithms indicate the existence of well-separated clusters in the phishing emails dataset. © 2017, Springer Science+Business Media New York.
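The bag-of-words and inverse document frequency steps described in this abstract can be sketched as follows. This is a generic TF-IDF illustration under stated assumptions, not the paper's transformation: it omits the WordNet similarity step, tokenizes by whitespace, counts documents rather than authors in the IDF term, and the function name `tfidf` and the sample emails are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, vectors) with term frequency times log(N / df)."""
    # ASSUMPTION: lowercase whitespace tokenization stands in for the
    # paper's tokenizer.
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each word.
    df = Counter(w for toks in tokenized for w in set(toks))
    # Rarer words receive a larger weight.
    idf = {w: math.log(n_docs / df[w]) for w in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append([tf[w] * idf[w] for w in vocab])
    return vocab, vectors


emails = [
    "verify your account now",
    "verify your password",
    "account suspended",
]
vocab, vectors = tfidf(emails)
```

In this sample, "now" appears in one email and "verify" in two, so "now" receives the larger weight in the first vector. The resulting vectors are the kind of numeric input a clustering algorithm such as k-means would consume.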
Globally convergent algorithms for solving unconstrained optimization problems
- Authors: Taheri, Sona , Mammadov, Musa , Seifollahi, Sattar
- Date: 2013
- Type: Text , Journal article
- Relation: Optimization Vol. , no. (2013), p. 1-15
- Full Text:
- Reviewed:
- Description: New algorithms for solving unconstrained optimization problems are presented based on the idea of combining two types of descent directions: the anti-gradient direction and either the Newton or quasi-Newton direction. The use of the latter directions allows one to improve the convergence rate. Global and superlinear convergence properties of these algorithms are established. Numerical experiments using some unconstrained test problems are reported. The proposed algorithms are also compared experimentally with some existing similar methods. This comparison demonstrates the efficiency of the proposed combined methods.
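One common way to combine the anti-gradient and Newton directions is sketched below for a one-dimensional function. This is an illustrative fallback rule with Armijo backtracking, not the scheme from the paper, whose combination rule is not given in this abstract. The function name `combined_descent`, the switching test, and all constants are assumptions.

```python
def combined_descent(f, grad, hess, x0, tol=1e-8, max_iter=200):
    """Minimize a smooth 1-D function using Newton steps where the curvature
    is positive and anti-gradient steps elsewhere."""
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        if abs(g) < tol:
            break
        h = hess(x)
        # ASSUMPTION: take the Newton direction only when the second
        # derivative is positive; otherwise fall back to the anti-gradient,
        # which is always a descent direction.
        d = -g / h if h > 1e-12 else -g
        # Armijo backtracking: halve the step until f decreases sufficiently.
        t = 1.0
        while t > 1e-12 and f(x + t * d) > f(x) + 1e-4 * t * g * d:
            t *= 0.5
        x = x + t * d
    return x


# Test function with negative curvature at the starting point.
f = lambda x: x**4 - 2 * x**2
grad = lambda x: 4 * x**3 - 4 * x
hess = lambda x: 12 * x**2 - 4

x_min = combined_descent(f, grad, hess, x0=0.1)
```

From the starting point 0.1 the second derivative is negative, so the first steps use the anti-gradient; once the iterate reaches the region of positive curvature the Newton direction takes over, which is where the faster local convergence comes from.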
Attribute weighted Naive Bayes classifier using a local optimization
- Authors: Taheri, Sona , Yearwood, John , Mammadov, Musa , Seifollahi, Sattar
- Date: 2013
- Type: Text , Journal article
- Relation: Neural Computing & Applications Vol. 24, no. 5 (2013), p. 995-1002
- Full Text:
- Reviewed:
- Description: The Naive Bayes classifier is a popular classification technique for data mining and machine learning. It has been shown to be very effective on a variety of data classification problems. However, the strong assumption that all attributes are conditionally independent given the class is often violated in real-world applications. Numerous methods have been proposed to improve the performance of the Naive Bayes classifier by alleviating the attribute independence assumption. However, violation of the independence assumption can increase the expected error. An alternative is to assign weights to the attributes. In this paper, we propose a novel attribute weighted Naive Bayes classifier that applies weights to the conditional probabilities. An objective function is modeled, based on the structure of the Naive Bayes classifier and the attribute weights. The optimal weights are determined by a local optimization method using the quasisecant method. In the proposed approach, the Naive Bayes classifier is taken as a starting point. We report the results of numerical experiments on several real-world data sets in binary classification, which show the efficiency of the proposed method.
A novel hybrid neural learning algorithm using simulated annealing and quasisecant method
- Authors: Yearwood, John , Bagirov, Adil , Seifollahi, Sattar
- Date: 2011
- Type: Text , Conference proceedings
- Full Text: false
- Description: In this paper, we propose a hybrid learning algorithm for single hidden layer feedforward neural networks (SLFNs) for data classification. The proposed hybrid algorithm is a two-phase learning algorithm based on the quasisecant and simulated annealing methods. First, the weights between the hidden layer and the output layer nodes (output layer weights) are adjusted by the quasisecant algorithm. Then simulated annealing is applied for global attribute weighting. The weights between the input layer and the hidden layer nodes are fixed in advance and are not included in the learning process. The proposed two-phase learning of the network is a novel idea and differs from existing approaches. Numerical results on some benchmark data sets are also reported, and these results are promising. © 2011, Australian Computer Society, Inc.