A novel optimization approach towards improving separability of clusters
- Authors: Bagirov, Adil , Hoseini-Monjezi, Najmeh , Taheri, Sona
- Date: 2023
- Type: Text , Journal article
- Relation: Computers and Operations Research Vol. 152, no. (2023), p.
- Relation: http://purl.org/au-research/grants/arc/DP190100580
- Full Text: false
- Reviewed:
- Description: The objective functions in optimization models of the sum-of-squares clustering problem reflect intra-cluster similarity and inter-cluster dissimilarities and in general, optimal values of these functions can be considered as appropriate measures for compactness of clusters. However, the use of the objective function alone may not lead to the finding of separable clusters. To address this shortcoming in existing models for clustering, we develop a new optimization model where the objective function is represented as a sum of two terms reflecting the compactness and separability of clusters. Based on this model we develop a two-phase incremental clustering algorithm. In the first phase, the clustering function is minimized to find compact clusters and in the second phase, a new model is applied to improve the separability of clusters. The Davies–Bouldin cluster validity index is applied as an additional measure to compare the compactness of clusters and silhouette coefficients are used to estimate the separability of clusters. The performance of the proposed algorithm is demonstrated and compared with that of four other algorithms using synthetic and real-world data sets. Numerical results clearly show that in comparison with other algorithms the new algorithm is able to find clusters with better separability and similar compactness. © 2022
Finding compact and well-separated clusters : clustering using silhouette coefficients
- Authors: Bagirov, Adil , Aliguliyev, Ramiz , Sultanova, Nargiz
- Date: 2023
- Type: Text , Journal article
- Relation: Pattern Recognition Vol. 135, no. (2023), p.
- Relation: http://purl.org/au-research/grants/arc/DP190100580
- Full Text: false
- Reviewed:
- Description: Finding compact and well-separated clusters in data sets is a challenging task. Most clustering algorithms try to minimize certain clustering objective functions. These functions usually reflect the intra-cluster similarity and inter-cluster dissimilarity. However, the use of such functions alone may not lead to the finding of well-separated and, in some cases, compact clusters. Therefore additional measures, called cluster validity indices, are used to estimate the true number of well-separated and compact clusters. Some of these indices are well-suited to be included into the optimization model of the clustering problem. Silhouette coefficients are among such indices. In this paper, a new optimization model of the clustering problem is developed where the clustering function is used as an objective and silhouette coefficients are used to formulate constraints. Then an algorithm, called CLUSCO (CLustering Using Silhouette COefficients), is designed to construct clusters incrementally. Three schemes are discussed to reduce the computational complexity of the algorithm. Its performance is evaluated using fourteen real-world data sets and compared with that of three state-of-the-art clustering algorithms. Results show that the CLUSCO is able to compute compact clusters which are significantly better separable in comparison with those obtained by other algorithms. © 2022 Elsevier Ltd
Methods and applications of clusterwise linear regression : a survey and comparison
- Authors: Long, Qiang , Bagirov, Adil , Taheri, Sona , Sultanova, Nargiz , Wu, Xue
- Date: 2023
- Type: Text , Journal article
- Relation: ACM Transactions on Knowledge Discovery from Data Vol. 17, no. 3 (2023), p.
- Relation: https://purl.org/au-research/grants/arc/DP190100580
- Full Text: false
- Reviewed:
- Description: Clusterwise linear regression (CLR) is a well-known technique for approximating a data using more than one linear function. It is based on the combination of clustering and multiple linear regression methods. This article provides a comprehensive survey and comparative assessments of CLR including model formulations, description of algorithms, and their performance on small to large-scale synthetic and real-world datasets. Some applications of the CLR algorithms and possible future research directions are also discussed. © 2023 Association for Computing Machinery.
Nonsmooth optimization-based model and algorithm for semisupervised clustering
- Authors: Bagirov, Adil , Taheri, Sona , Bai, Fusheng , Zheng, Fangying
- Date: 2023
- Type: Text , Journal article
- Relation: IEEE Transactions on Neural Networks and Learning Systems Vol. 34, no. 9 (2023), p. 5517-5530
- Relation: http://purl.org/au-research/grants/arc/DP190100580
- Full Text: false
- Reviewed:
- Description: Using a nonconvex nonsmooth optimization approach, we introduce a model for semisupervised clustering (SSC) with pairwise constraints. In this model, the objective function is represented as a sum of three terms: the first term reflects the clustering error for unlabeled data points, the second term expresses the error for data points with must-link (ML) constraints, and the third term represents the error for data points with cannot-link (CL) constraints. This function is nonconvex and nonsmooth. To find its optimal solutions, we introduce an adaptive SSC (A-SSC) algorithm. This algorithm is based on the combination of the nonsmooth optimization method and an incremental approach, which involves the auxiliary SSC problem. The algorithm constructs clusters incrementally starting from one cluster and gradually adding one cluster center at each iteration. The solutions to the auxiliary SSC problem are utilized as starting points for solving the nonconvex SSC problem. The discrete gradient method (DGM) of nonsmooth optimization is applied to solve the underlying nonsmooth optimization problems. This method does not require subgradient evaluations and uses only function values. The performance of the A-SSC algorithm is evaluated and compared with four benchmarking SSC algorithms on one synthetic and 12 real-world datasets. Results demonstrate that the proposed algorithm outperforms the other four algorithms in identifying compact and well-separated clusters while satisfying most constraints. © 2021 IEEE.
SMGKM : an efficient incremental algorithm for clustering document collections
- Authors: Bagirov, Adil , Seifollahi, Sattar , Piccardi, Massimo , Zare Borzeshi, Ehsan , Kruger, Bernie
- Date: 2023
- Type: Text , Conference paper
- Relation: 19th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2018, Hanoi, Vietnam, 18-24 March 2018, Computational Linguistics and Intelligent Text Processing Vol. 13397 LNCS, p. 314-328
- Full Text: false
- Reviewed:
- Description: Given a large unlabeled document collection, the aim of this paper is to develop an accurate and efficient algorithm for solving the clustering problem over this collection. Document collections typically contain tens or hundreds of thousands of documents, with thousands or tens of thousands of features (i.e., distinct words). Most existing clustering algorithms struggle to find accurate solutions on such large data sets. The proposed algorithm overcomes this difficulty by an incremental approach, incrementing the number of clusters progressively from an initial value of one to a set value. At each iteration, the new candidate cluster is initialized using a partitioning approach which is guaranteed to minimize the objective function. Experiments have been carried out over six, diverse datasets and with different evaluation criteria, showing that the proposed algorithm has outperformed comparable state-of-the-art clustering algorithms in all cases. © 2023, Springer Nature Switzerland AG.
Limited Memory Bundle Method for Clusterwise Linear Regression
- Authors: Karmitsa, Napsu , Bagirov, Adil , Taheri, Sona , Joki, Kaisa
- Date: 2022
- Type: Text , Book chapter
- Relation: Intelligent Systems, Control and Automation: Science and Engineering p. 109-122
- Relation: http://purl.org/au-research/grants/arc/DP190100580
- Full Text: false
- Reviewed:
- Description: A clusterwise linear regression problem consists of finding a number of linear functions each approximating a subset of the given data. In this paper, the limited memory bundle method is modified and combined with the incremental approach to solve this problem using its nonsmooth optimization formulation. The main contribution of the proposed method is to obtain a fast solution time for large-scale clusterwise linear regression problems. The proposed algorithm is tested on small and large real-world data sets and compared with other algorithms for clusterwise linear regression. Numerical results demonstrate that the proposed algorithm is especially efficient in data sets with large numbers of data points and input variables. © 2022, Springer Nature Switzerland AG.
Missing value imputation via clusterwise linear regression
- Authors: Karmitsa, Napsu , Taheri, Sona , Bagirov, Adil , Makinen, Pauliina
- Date: 2022
- Type: Text , Journal article
- Relation: IEEE transactions on knowledge and data engineering Vol. 34, no. 4 (2020), p. 1889-1901
- Full Text: false
- Reviewed:
- Description:
In this paper a new method of preprocessing incomplete data is introduced. The method is based on clusterwise linear regression and it combines two well-known approaches for missing value imputation: linear regression and clustering. The idea is to approximate missing values using only those data points that are somewhat similar to the incomplete data point. A similar idea is used also in clustering based imputation methods. Nevertheless, here the linear regression approach is used within each cluster to accurately predict the missing values, and this is done simultaneously to clustering. The proposed method is tested using some synthetic and real-world data sets and compared with other algorithms for missing value imputations. Numerical results demonstrate that the proposed method produces the most accurate imputations in MCAR and MAR data sets with a clear structure and the percentages of missing data no more than 25%
Robust piecewise linear L 1-regression via nonsmooth DC optimization
- Authors: Bagirov, Adil , Taheri, Sona , Karmitsa, Napsu , Sultanova, Nargiz , Asadi, Soodabeh
- Date: 2022
- Type: Text , Journal article
- Relation: Optimization Methods and Software Vol. 37, no. 4 (2022), p. 1289-1309
- Relation: http://purl.org/au-research/grants/arc/DP190100580
- Full Text: false
- Reviewed:
- Description: Piecewise linear (Formula presented.) -regression problem is formulated as an unconstrained difference of convex (DC) optimization problem and an algorithm for solving this problem is developed. Auxiliary problems are introduced to design an adaptive approach to generate a suitable piecewise linear regression model and starting points for solving the underlying DC optimization problems. The performance of the proposed algorithm as both approximation and prediction tool is evaluated using synthetic and real-world data sets containing outliers. It is also compared with mainstream machine learning regression algorithms using various performance measures. Results demonstrate that the new algorithm is robust to outliers and in general, provides better predictions than the other alternative regression algorithms for most data sets used in the numerical experiments. © 2020 Informa UK Limited, trading as Taylor & Francis Group.
An augmented subgradient method for minimizing nonsmooth DC functions
- Authors: Bagirov, Adil , Hoseini Monjezi, Najmeh , Taheri, Sona
- Date: 2021
- Type: Text , Journal article
- Relation: Computational Optimization and Applications Vol. 80, no. 2 (2021), p. 411-438
- Relation: http://purl.org/au-research/grants/arc/DP190100580
- Full Text: false
- Reviewed:
- Description: A method, called an augmented subgradient method, is developed to solve unconstrained nonsmooth difference of convex (DC) optimization problems. At each iteration of this method search directions are found by using several subgradients of the first DC component and one subgradient of the second DC component of the objective function. The developed method applies an Armijo-type line search procedure to find the next iteration point. It is proved that the sequence of points generated by the method converges to a critical point of the unconstrained DC optimization problem. The performance of the method is demonstrated using academic test problems with nonsmooth DC objective functions and its performance is compared with that of two general nonsmooth optimization solvers and five solvers specifically designed for unconstrained DC optimization. Computational results show that the developed method is efficient and robust for solving nonsmooth DC optimization problems. © 2021, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
Analysis of water quantity and quality trade-offs to inform selective harvesting of inflows in complex water resource systems
- Authors: Dey, Sayani , Barton, Andrew , Kandra, Harpreet , Bagirov, Adil , Wilson, Kym
- Date: 2021
- Type: Text , Journal article
- Relation: Water Resources Management Vol. 35, no. 12 (2021), p. 4149-4165
- Full Text: false
- Reviewed:
- Description: Challenges faced by water resource systems are multi-faceted. The problem can be even more pronounced in a dry continent like Australia where the water resources can often be afflicted by high salinity and turbidity. Therefore, modern water resource systems require to appropriately manage both water quality and quantity. This study aims to illustrate the trade-offs between water quantity and quality in a reservoir, based on decisions to harvest different inflow sources. Taylors Lake of the Grampians reservoir system in Western Victoria, Australia was chosen as the case study for this research as it is sufficiently complex and includes many of the contemporary water resources challenges seen around the world. Different operational scenarios were analysed which included increasingly stringent water quality criteria before the water was harvested or otherwise allowed to by-pass the storage. The study suggests that selective harvesting of water can be an option to improve the overall and long-term water quality within a reservoir, but stringent water quality measures can lead to an associated loss of overall water quantity. This research study provides useful insight to water planners and stakeholders in similar catchment settings around the world, to identify water harvesting regimes with competing water quality constraints. © 2021, The Author(s), under exclusive licence to Springer Nature B.V. Correction to: Analysis of Water Quantity and Quality Trade‑Offs to Inform Selective Harvesting of Inflows in Complex Water Resource Systems (Water Resources Management, (2021), 35, 12, (4149-4165), 10.1007/s11269-021-02936-x)
Incremental DC optimization algorithm for large-scale clusterwise linear regression
- Authors: Bagirov, Adil , Taheri, Sona , Cimen, Emre
- Date: 2021
- Type: Text , Journal article
- Relation: Journal of Computational and Applied Mathematics Vol. 389, no. (2021), p. 1-17
- Relation: https://purl.org/au-research/grants/arc/DP190100580
- Full Text: false
- Reviewed:
- Description: The objective function in the nonsmooth optimization model of the clusterwise linear regression (CLR) problem with the squared regression error is represented as a difference of two convex functions. Then using the difference of convex algorithm (DCA) approach the CLR problem is replaced by the sequence of smooth unconstrained optimization subproblems. A new algorithm based on the DCA and the incremental approach is designed to solve the CLR problem. We apply the Quasi-Newton method to solve the subproblems. The proposed algorithm is evaluated using several synthetic and real-world data sets for regression and compared with other algorithms for CLR. Results demonstrate that the DCA based algorithm is efficient for solving CLR problems with the large number of data points and in particular, outperforms other algorithms when the number of input variables is small. © 2020 Elsevier B.V.
Multi-objective optimisation to manage trade-offs in water quality and quantity of complex water resource system
- Authors: Dey, Sayani , Barton, Andrew , Bagirov, Adil , Kandra, Harpreet , Wilson, Kym
- Date: 2021
- Type: Text , Conference paper
- Relation: Hydrology and Water Resources Symposium 2021, HWRS 2021: Digital Water: Hydrology and Water Resources Symposium 2021, Virtual online, 31 August-1 September 2021, HWRS 2021: Digital Water: Hydrology and Water Resources Symposium 2021 p. 465-480
- Full Text: false
- Reviewed:
- Description: Water of adequate quality and quantity is the key to health and integrity of the environment and fundamental to good water supply. Achieving water quality and quantity objectives can conflict and has become more complicated with challenges like, climate change, growing populations and changed land uses. Therefore, a multi-objective optimisation strategy is required for achieving optimal water quality and quantity outcomes from a water resources system. This study uses a multi-objective optimisation approach to illustrate the trade-offs occurring when water quantity and quality in a reservoir system are optimised. Taylors Lake, part of the Grampians Reservoir System in Western Victoria, Australia was chosen as the case study for this research as it is quite complex and includes many contemporary water resources challenges seen around the world, such as high turbidity and salinity. The objective functions are set in a way to maximise the water quantity available for supply, while minimising the deviation of quality parameters from the accepted limits. The water system is modelled using eWater Source® modelling platform, while optimisation is undertaken using NSGA-II optimisation technique. Daily time step data over a ten-year period was used in this work. Various optimisation runs were performed with different population sizes and generations to seek out the best trade-off curve. The optimisation results indicate trade-offs between salinity, turbidity, and quantity. Key findings for this case study show that through optimisation, stored water never exceeded 19,000 ML even though the storage capacity was 27,000 ML indicating a significant loss of water to improve quality, or alternatively, a potential asset re-design opportunity.
Subgradient smoothing method for nonsmooth nonconvex optimization
- Authors: Bagirov, Adil , Sultanova, Nargiz , Taheri, Sona , Ozturk, Gurkan
- Date: 2021
- Type: Text , Conference paper
- Relation: 5th International Conference on Numerical Analysis and Optimization: Theory, Methods, Applications and Technology Transfer, NAOV, Muscan, 6-9 January 2020 Vol. 354, p. 57-79
- Full Text: false
- Reviewed:
- Description: In this chapter an unconstrained nonsmooth nonconvex optimization problem is considered and a method for solving this problem is developed. In this method the subproblem for finding search directions is reduced to the unconstrained minimization of a smooth function. This is achieved by using subgradients computed in some neighborhood of a current iteration point and by formulating the search direction finding problem to the minimization of the convex piecewise linear function over the unit ball. The hyperbolic smoothing technique is applied to approximate the minimization problem by a sequence of smooth problems. The convergence of the proposed method is studied and its performance is evaluated using a set of nonsmooth optimization academic test problems. In addition, the method is implemented in GAMS and numerical results using different solvers from GAMS are reported. The proposed method is compared with a number of nonsmooth optimization methods. © 2021, The Author(s), under exclusive license to Springer Nature Switzerland AG.
An incremental nonsmooth optimization algorithm for clustering using L1 and L∞ norms
- Authors: Ordin, Burak , Bagirov, Adil , Mohebi, Ehsam
- Date: 2020
- Type: Text , Journal article
- Relation: Journal of Industrial and Management Optimization Vol. 16, no. 6 (2020), p. 2757-2779
- Relation: http://purl.org/au-research/grants/arc/DP190100580
- Full Text: false
- Reviewed:
- Description: An algorithm is developed for solving clustering problems with the similarity measure defined using the L1and L∞ norms. It is based on an incremental approach and applies nonsmooth optimization methods to find cluster centers. Computational results on 12 data sets are reported and the proposed algorithm is compared with the X-means algorithm. ©
Bundle methods for nonsmooth DC optimization
- Authors: Joki, Kaisa , Bagirov, Adil
- Date: 2020
- Type: Text , Book chapter
- Relation: Numerical Nonsmooth Optimization: State of the Art Algorithms Chapter 8 p. 263-296
- Full Text: false
- Reviewed:
- Description: This chapter is devoted to algorithms for solving nonsmooth unconstrained difference of convex optimization problems. Different types of stationarity conditions are discussed and the relationship between sets of different stationary points (critical, Clarke stationary and inf-stationary) is established. Bundle methods are developed based on a nonconvex piecewise linear model of the objective function and the convergence of these methods is studied. Numerical results are presented to demonstrate the performance of the methods. © Springer Nature Switzerland AG 2020.
Discrete gradient methods
- Authors: Bagirov, Adil , Taheri, Sona , Karmitsa, Napsu
- Date: 2020
- Type: Text , Book chapter
- Relation: Numerical Nonsmooth Optimization: State of the Art Algorithms p. 621-654
- Full Text: false
- Reviewed:
- Description: In this chapter, the notion of a discrete gradient is introduced and it is shown that the discrete gradients can be used to approximate subdifferentials of a broad class of nonsmooth functions. Two methods based on such approximations, more specifically, the discrete gradient method (DGM) and its limited memory version (LDGB), are described. These methods are semi derivative-free methods for solving nonsmooth and, in general, nonconvex optimization problems. The performance of the methods is demonstrated using some academic test problems. © Springer Nature Switzerland AG 2020.
Final words
- Authors: Bagirov, Adil , Gaudioso, Manlio , Karmitsa, Napsu , Mäkelä, Marko , Taheri, Sona
- Date: 2020
- Type: Text , Book chapter
- Relation: Numerical Nonsmooth Optimization: State of the Art Algorithms p. 693-694
- Full Text: false
- Reviewed:
Introduction
- Authors: Bagirov, Adil , Gaudioso, Manlio , Karmitsa, Napsu , Mäkelä, Marko , Taheri, Sona
- Date: 2020
- Type: Text , Book chapter
- Relation: Numerical Nonsmooth Optimization: State of the Art Algorithms p. 1-16
- Full Text: false
- Reviewed:
- Description: Nonsmooth optimization (NSO) is among the most challenging tasks in the field of mathematical programming. It addresses optimization problems where objective and/or constraint functions have discontinuous gradients. NSO problems arise in many real life applications. Moreover, some smooth optimization techniques like different decomposition methods, dual formulations and exact penalty methods may lead us to solve NSO problems being either smaller in dimension or simpler in structure. In addition, some optimization problems may be analytically smooth but numerically nonsmooth. This is the case, for instance, with noisy input data and so-called stiff problems, which are numerically unstable and behave like nonsmooth problems. © Springer Nature Switzerland AG 2020.
New gene selection algorithm using hypeboxes to improve performance of classifiers
- Authors: Bagirov, Adil , Mardaneh, Karim
- Date: 2020
- Type: Text , Journal article
- Relation: International Journal of Bioinformatics Research and Applications Vol. 16, no. 3 (2020), p. 269-289
- Full Text: false
- Reviewed:
- Description: The use of DNA microarray technology allows to measure the expression levels of thousands of genes in one single experiment which makes possible to apply classification techniques to classify tumours. However, the large number of genes and relatively small number of tumours in gene expression datasets may (and in some cases significantly) diminish the accuracy of many classifiers. Therefore, efficient gene selection algorithms are required to identify most informative genes or groups of genes to improve the performance of classifiers. In this paper, a new gene selection algorithm is developed using marginal hyberboxes of genes or groups of genes for each tumour type. Informative genes are defined using overlaps between hyberboxes. The results on six gene expression datasets demonstrate that the proposed algorithm is able to considerably reduce the number of genes and significantly improve the performance of classifiers. © 2020 Inderscience Enterprises Ltd.
Prediction of gold-bearing localised occurrences from limited exploration data
- Authors: Grigoryev, Igor , Bagirov, Adil , Tuck, Michael
- Date: 2020
- Type: Text , Journal article
- Relation: International Journal of Computational Science and Engineering Vol. 21, no. 4 (2020), p. 503-512
- Full Text: false
- Reviewed:
- Description: Inaccurate drill-core assay interpretation in the exploration stage presents challenges to long-term profit of gold mining operations. Predicting the gold distribution within a deposit as precisely as possible is one of the most important aspects of the methodologies employed to avoid problems associated with financial expectations. The prediction of the variability of gold using a very limited number of drill-core samples is a very challenging problem. This is often intractable using traditional statistical tools where with less than complete spatial information certain assumptions are made about gold distribution and mineralisation. The decision-support predictive modelling methodology based on the unsupervised machine learning technique, presented in this paper avoids some of the restrictive limitations of traditional methods. It identifies promising exploration targets missed during exploration and recovers hidden spatial and physical characteristics of the explored deposit using information directly from drill hole database. Copyright © 2020 Inderscience Enterprises Ltd.