Incremental DC optimization algorithm for large-scale clusterwise linear regression
- Authors: Bagirov, Adil , Taheri, Sona , Cimen, Emre
- Date: 2021
- Type: Text , Journal article
- Relation: Journal of Computational and Applied Mathematics Vol. 389, no. (2021), p. 1-17
- Relation: https://purl.org/au-research/grants/arc/DP190100580
- Full Text: false
- Reviewed:
- Description: The objective function in the nonsmooth optimization model of the clusterwise linear regression (CLR) problem with the squared regression error is represented as a difference of two convex functions. Then, using the difference of convex algorithm (DCA) approach, the CLR problem is replaced by a sequence of smooth unconstrained optimization subproblems. A new algorithm based on the DCA and the incremental approach is designed to solve the CLR problem. We apply the quasi-Newton method to solve the subproblems. The proposed algorithm is evaluated using several synthetic and real-world data sets for regression and compared with other algorithms for CLR. Results demonstrate that the DCA-based algorithm is efficient for solving CLR problems with a large number of data points and, in particular, outperforms other algorithms when the number of input variables is small. © 2020 Elsevier B.V.
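A worked sketch of how the squared-error CLR objective can be written as a difference of two convex functions, as the abstract above describes. The notation (data points $(a^i, b_i)$, coefficients $x^j$, intercepts $y_j$, the names $h_{ij}$, $f_1$, $f_2$) is assumed here for illustration and is not taken from the article itself.

```latex
% Data: (a^i, b_i), i = 1,...,m, with a^i in R^n; k linear functions (x^j, y_j).
\[
  h_{ij}(x, y) = \bigl(\langle x^j, a^i \rangle + y_j - b_i\bigr)^2 ,
  \qquad
  f(x, y) = \sum_{i=1}^{m} \min_{j=1,\dots,k} h_{ij}(x, y).
\]
% The minimum of k numbers equals their sum minus the largest sum of k-1 of them:
\[
  \min_{j} h_{ij} = \sum_{j=1}^{k} h_{ij} \;-\; \max_{j} \sum_{t \neq j} h_{it}.
\]
% Hence f = f_1 - f_2 with both parts convex:
\[
  f_1(x, y) = \sum_{i=1}^{m} \sum_{j=1}^{k} h_{ij}(x, y),
  \qquad
  f_2(x, y) = \sum_{i=1}^{m} \max_{j} \sum_{t \neq j} h_{it}(x, y).
\]
% A DCA step linearizes f_2 at the current point z_k = (x, y)_k:
\[
  z_{k+1} \in \arg\min_{z} \; f_1(z) - \langle \xi_k, z \rangle ,
  \qquad \xi_k \in \partial f_2(z_k).
\]
```

Since $f_1$ is a sum of convex quadratics, each subproblem of this form is smooth and unconstrained, which is consistent with the use of a quasi-Newton solver mentioned in the abstract.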
Nonsmooth nonconvex optimization approach to clusterwise linear regression problems
- Authors: Bagirov, Adil , Ugon, Julien , Mirzayeva, Hijran
- Date: 2013
- Type: Text , Journal article
- Relation: European Journal of Operational Research Vol. 229, no. 1 (2013), p. 132-142
- Full Text: false
- Reviewed:
- Description: Clusterwise regression consists of finding a number of regression functions each approximating a subset of the data. In this paper, a new approach for solving clusterwise linear regression problems is proposed based on a nonsmooth nonconvex formulation. We present an algorithm for minimizing this nonsmooth nonconvex function. This algorithm incrementally divides the whole data set into groups which can be easily approximated by one linear regression function. A special procedure is introduced to generate a good starting point for solving global optimization problems at each iteration of the incremental algorithm. Such an approach allows one to find a global or near-global solution to the problem when the data sets are sufficiently dense. The algorithm is compared with the multistart Späth algorithm on several publicly available data sets for regression analysis. © 2013 Elsevier B.V. All rights reserved.
- Description: 2003011018
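A minimal sketch of the clusterwise linear regression objective described in the record above, restricted to one input variable so that it needs only the Python standard library. The function names (`clr_objective`, `assign`, `fit_line`, `alternate`) are hypothetical. The alternating assign-and-refit loop is a simple local heuristic in the spirit of Späth-type methods; it is an assumption for illustration, not the incremental algorithm proposed in the paper.

```python
def clr_objective(points, lines):
    """Sum over points of the smallest squared error among the k lines.

    points: list of (x, y) pairs; lines: list of (slope, intercept) pairs.
    """
    return sum(min((w * x + c - y) ** 2 for w, c in lines) for x, y in points)


def assign(points, lines):
    """Index of the best-fitting line for each point (ties go to the first)."""
    return [
        min(range(len(lines)),
            key=lambda j: (lines[j][0] * x + lines[j][1] - y) ** 2)
        for x, y in points
    ]


def fit_line(pts):
    """Ordinary least squares for y = w*x + c on a list of (x, y) pairs."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    if sxx == 0.0:
        # All x equal: fall back to a horizontal line through the mean.
        return 0.0, my
    w = sum((x - mx) * (y - my) for x, y in pts) / sxx
    return w, my - w * mx


def alternate(points, lines, max_iter=50):
    """Alternate between assigning points and refitting each line.

    Illustrative local heuristic only; the result depends on the starting
    lines, which is why the abstract stresses good starting points.
    """
    for _ in range(max_iter):
        labels = assign(points, lines)
        new_lines = []
        for j, old in enumerate(lines):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the previous line if a cluster becomes empty.
            new_lines.append(fit_line(members) if members else old)
        if new_lines == lines:
            break
        lines = new_lines
    return lines, clr_objective(points, lines)
```

Calling `alternate(points, starting_lines)` returns the fitted lines and the final objective value. Because the objective is nonconvex, different starting lines can lead to different local solutions.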
Prediction of monthly rainfall in Victoria, Australia : Clusterwise linear regression approach
- Authors: Bagirov, Adil , Mahmood, Arshad , Barton, Andrew
- Date: 2017
- Type: Text , Journal article
- Relation: Atmospheric Research Vol. 188, no. (2017), p. 20-29
- Relation: http://purl.org/au-research/grants/arc/DP140103213
- Full Text: false
- Reviewed:
- Description: This paper develops the Clusterwise Linear Regression (CLR) technique for prediction of monthly rainfall. The CLR is a combination of clustering and regression techniques. It is formulated as an optimization problem and an incremental algorithm is designed to solve it. The algorithm is applied to predict monthly rainfall in Victoria, Australia using rainfall data with five input meteorological variables over the period of 1889–2014 from eight geographically diverse weather stations. The prediction performance of the CLR method is evaluated by comparing observed and predicted rainfall values using four measures of forecast accuracy. The proposed method is also compared with the CLR using the maximum likelihood framework by the expectation-maximization algorithm, multiple linear regression, artificial neural networks and the support vector machines for regression models using computational results. The results demonstrate that the proposed algorithm outperforms other methods in most locations. © 2017 Elsevier B.V.
Rainfall prediction in Australia : Clusterwise linear regression approach
- Authors: Mahmood, Arshad
- Date: 2017
- Type: Text , Thesis , PhD
- Full Text:
- Description: Accurate rainfall prediction is a challenging task because of the complex physical processes involved. This complexity is compounded in Australia as the climate can be highly variable. Accurate rainfall prediction is immensely beneficial for making informed policy, planning and management decisions, and can assist with the most sustainable operation of water resource systems. Short-term prediction of rainfall is provided by meteorological services; however, the intermediate to long-term prediction of rainfall remains challenging and contains much uncertainty. Many prediction approaches have been proposed in the literature, including statistical and computational intelligence approaches. However, finding a method to model the complex physical process of rainfall, especially in Australia where the climate is highly variable, is still a major challenge. The aims of this study are to: (a) develop an optimization based clusterwise linear regression method, (b) develop new prediction methods based on clusterwise linear regression, (c) assess the influence of geographic regions on the performance of prediction models in predicting monthly and weekly rainfall in Australia, (d) determine the combined influence of meteorological variables on rainfall prediction in Australia, and (e) carry out a comparative analysis of new and existing prediction techniques using Australian rainfall data. In this study, rainfall data with five input meteorological variables from 24 geographically diverse weather stations in Australia, over the period January 1970 to December 2014, have been taken from the Scientific Information for Land Owners (SILO). We also consider the climate zones when selecting weather stations, because Australia experiences a variety of climates due to its size. The data was divided into training and testing periods for evaluation purposes.
In this study, optimization based clusterwise linear regression is modified and new prediction methods are developed for rainfall prediction. The proposed method is applied to predict monthly and weekly rainfall. The prediction performance of the clusterwise linear regression method was evaluated by comparing observed and predicted rainfall values using the performance measures: root mean squared error, the mean absolute error, the mean absolute scaled error and the Nash-Sutcliffe coefficient of efficiency. The proposed method is also compared with the clusterwise linear regression based on the maximum likelihood estimation, linear support vector machines for regression, support vector machines for regression with radial basis kernel function, multiple linear regression, artificial neural networks with and without a hidden layer and k-nearest neighbours methods using computational results. Initially, to determine the appropriate input variables to be used in the investigation, we assessed all combinations of meteorological variables. The results confirm that single meteorological variables alone are unable to predict rainfall accurately. The prediction performance of all selected models was improved by adding the input variables in most locations. To assess the influence of geographic regions on the performance of prediction models and to compare the prediction performance of models, we trained models with the best combination of input variables and predicted monthly and weekly rainfall over the test periods. The results of this analysis confirm that the prediction performance of all selected models varied considerably with geographic regions for both weekly and monthly rainfall predictions. It is found that models have the lowest prediction error in the desert climate zone and highest in subtropical and tropical zones.
The results also demonstrate that the proposed algorithm is capable of finding the patterns and trends of the observations for monthly and weekly rainfall predictions in all geographic regions. In desert, tropical and subtropical climate zones, the proposed method outperforms other methods in most locations for both monthly and weekly rainfall predictions. In temperate and grassland zones the prediction performance of the proposed model is better in some locations while in the remaining locations it is slightly lower than the other models.
- Description: Doctor of Philosophy
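The four performance measures named in the thesis abstract above can be sketched as follows. The function names are hypothetical, and the scaling used for the mean absolute scaled error (the in-sample error of a one-step naive forecast on the training series) is the common textbook convention, assumed here; the thesis may use a different variant, for example a seasonal one.

```python
import math


def rmse(obs, pred):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def mase(obs, pred, train):
    """Mean absolute scaled error.

    Assumption: the scale is the mean absolute one-step difference of the
    training series (naive forecast), which must not be constant.
    """
    scale = sum(abs(train[t] - train[t - 1])
                for t in range(1, len(train))) / (len(train) - 1)
    return mae(obs, pred) / scale


def nse(obs, pred):
    """Nash-Sutcliffe coefficient of efficiency (1.0 is a perfect fit)."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den
```

Lower values are better for the first three measures, while a Nash-Sutcliffe value closer to 1 indicates a better fit. A value of 0 means the model predicts no better than the mean of the observations.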
Nonsmooth optimization algorithm for solving clusterwise linear regression problems
- Authors: Bagirov, Adil , Ugon, Julien , Mirzayeva, Hijran
- Date: 2015
- Type: Text , Journal article
- Relation: Journal of Optimization Theory and Applications Vol. 164, no. 3 (2015), p. 755-780
- Relation: http://purl.org/au-research/grants/arc/DP140103213
- Full Text: false
- Reviewed:
- Description: Clusterwise linear regression consists of finding a number of linear regression functions each approximating a subset of the data. In this paper, the clusterwise linear regression problem is formulated as a nonsmooth nonconvex optimization problem and an algorithm based on an incremental approach and on the discrete gradient method of nonsmooth optimization is designed to solve it. This algorithm incrementally divides the whole dataset into groups which can be easily approximated by one linear regression function. A special procedure is introduced to generate good starting points for solving global optimization problems at each iteration of the incremental algorithm. The algorithm is compared with the multi-start Späth and the incremental algorithms on several publicly available datasets for regression analysis.
Prediction of gold-bearing localised occurrences from limited exploration data
- Authors: Grigoryev, Igor , Bagirov, Adil , Tuck, Michael
- Date: 2020
- Type: Text , Journal article
- Relation: International Journal of Computational Science and Engineering Vol. 21, no. 4 (2020), p. 503-512
- Full Text: false
- Reviewed:
- Description: Inaccurate drill-core assay interpretation in the exploration stage presents challenges to long-term profit of gold mining operations. Predicting the gold distribution within a deposit as precisely as possible is one of the most important aspects of the methodologies employed to avoid problems associated with financial expectations. The prediction of the variability of gold using a very limited number of drill-core samples is a very challenging problem. This is often intractable using traditional statistical tools where, with less than complete spatial information, certain assumptions are made about gold distribution and mineralisation. The decision-support predictive modelling methodology presented in this paper, based on an unsupervised machine learning technique, avoids some of the restrictive limitations of traditional methods. It identifies promising exploration targets missed during exploration and recovers hidden spatial and physical characteristics of the explored deposit using information directly from the drill hole database. Copyright © 2020 Inderscience Enterprises Ltd.
Clusterwise support vector linear regression
- Authors: Joki, Kaisa , Bagirov, Adil , Karmitsa, Napsu , Mäkelä, Marko , Taheri, Sona
- Date: 2020
- Type: Text , Journal article
- Relation: European Journal of Operational Research Vol. 287, no. 1 (2020), p. 19-35
- Full Text:
- Reviewed:
- Description: In clusterwise linear regression (CLR), the aim is to simultaneously partition data into a given number of clusters and to find regression coefficients for each cluster. In this paper, we propose a novel approach to model and solve the CLR problem. The main idea is to utilize the support vector machine (SVM) approach to model the CLR problem by using the SVM for regression to approximate each cluster. This new formulation of the CLR problem is represented as an unconstrained nonsmooth optimization problem, where we minimize a difference of two convex (DC) functions. To solve this problem, a method based on the combination of the incremental algorithm and the double bundle method for DC optimization is designed. Numerical experiments are performed to validate the reliability of the new formulation for CLR and the efficiency of the proposed method. The results show that the SVM approach is suitable for solving CLR problems, especially when there are outliers in the data. © 2020 Elsevier B.V.
- Description: Funding details: Academy of Finland, 289500, 294002, 319274 Funding details: Turun Yliopisto Funding details: Australian Research Council, ARC, (Project no. DP190100580 ).
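A small illustration of why an SVM-style loss may be less sensitive to outliers than the squared error in a clusterwise setting, as the abstract above suggests. The epsilon-insensitive loss is the standard one from support vector regression. The helper names (`eps_insensitive`, `squared`, `clr_loss`) and the one-variable setup are assumptions for illustration, and the regularization term that SVM formulations normally include is omitted, so this is not the objective used in the article.

```python
def eps_insensitive(residual, eps=0.5):
    """Epsilon-insensitive loss: zero inside the tube, linear outside it."""
    return max(0.0, abs(residual) - eps)


def squared(residual):
    """Squared error, which grows quadratically with the residual."""
    return residual ** 2


def clr_loss(points, lines, point_loss):
    """Clusterwise loss: each point is charged its best line's loss.

    points: list of (x, y) pairs; lines: list of (slope, intercept) pairs;
    point_loss: function mapping a residual to a nonnegative loss.
    """
    return sum(
        min(point_loss(w * x + c - y) for w, c in lines)
        for x, y in points
    )
```

For points lying on `y = x` plus a single outlier at `(3, 30)`, the line `(1.0, 0.0)` gives a squared-error loss of 729 but an epsilon-insensitive loss of only 26.5, so the outlier pulls far less on the fit under the second loss.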
Methods and applications of clusterwise linear regression : a survey and comparison
- Authors: Long, Qiang , Bagirov, Adil , Taheri, Sona , Sultanova, Nargiz , Wu, Xue
- Date: 2023
- Type: Text , Journal article
- Relation: ACM Transactions on Knowledge Discovery from Data Vol. 17, no. 3 (2023), p.
- Relation: http://purl.org/au-research/grants/arc/DP190100580
- Full Text: false
- Reviewed:
- Description: Clusterwise linear regression (CLR) is a well-known technique for approximating data using more than one linear function. It is based on the combination of clustering and multiple linear regression methods. This article provides a comprehensive survey and comparative assessments of CLR including model formulations, description of algorithms, and their performance on small to large-scale synthetic and real-world datasets. Some applications of the CLR algorithms and possible future research directions are also discussed. © 2023 Association for Computing Machinery.