SMGKM : an efficient incremental algorithm for clustering document collections
- Authors: Bagirov, Adil , Seifollahi, Sattar , Piccardi, Massimo , Zare Borzeshi, Ehsan , Kruger, Bernie
- Date: 2023
- Type: Text , Conference paper
- Relation: 19th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2018, Hanoi, Vietnam, 18-24 March 2018, Computational Linguistics and Intelligent Text Processing Vol. 13397 LNCS, p. 314-328
- Full Text: false
- Reviewed:
- Description: Given a large unlabeled document collection, the aim of this paper is to develop an accurate and efficient algorithm for clustering the collection. Document collections typically contain tens or hundreds of thousands of documents, with thousands or tens of thousands of features (i.e., distinct words). Most existing clustering algorithms struggle to find accurate solutions on such large data sets. The proposed algorithm overcomes this difficulty with an incremental approach, increasing the number of clusters progressively from an initial value of one to a set value. At each iteration, the new candidate cluster is initialized using a partitioning approach which is guaranteed to minimize the objective function. Experiments were carried out over six diverse datasets with different evaluation criteria, showing that the proposed algorithm outperformed comparable state-of-the-art clustering algorithms in all cases. © 2023, Springer Nature Switzerland AG.
Multi-objective optimisation to manage trade-offs in water quality and quantity of complex water resource system
- Authors: Dey, Sayani , Barton, Andrew , Bagirov, Adil , Kandra, Harpreet , Wilson, Kym
- Date: 2021
- Type: Text , Conference paper
- Relation: Hydrology and Water Resources Symposium 2021, HWRS 2021: Digital Water: Hydrology and Water Resources Symposium 2021, Virtual online, 31 August-1 September 2021, HWRS 2021: Digital Water: Hydrology and Water Resources Symposium 2021 p. 465-480
- Full Text: false
- Reviewed:
- Description: Water of adequate quality and quantity is key to the health and integrity of the environment and fundamental to good water supply. Water quality and quantity objectives can conflict, and achieving them has become more complicated with challenges such as climate change, growing populations and changed land uses. Therefore, a multi-objective optimisation strategy is required for achieving optimal water quality and quantity outcomes from a water resources system. This study uses a multi-objective optimisation approach to illustrate the trade-offs occurring when water quantity and quality in a reservoir system are optimised. Taylors Lake, part of the Grampians Reservoir System in Western Victoria, Australia, was chosen as the case study for this research as it is quite complex and includes many contemporary water resources challenges seen around the world, such as high turbidity and salinity. The objective functions are set to maximise the water quantity available for supply, while minimising the deviation of quality parameters from the accepted limits. The water system is modelled using the eWater Source® modelling platform, while optimisation is undertaken using the NSGA-II optimisation technique. Daily time step data over a ten-year period was used in this work. Various optimisation runs were performed with different population sizes and generations to seek out the best trade-off curve. The optimisation results indicate trade-offs between salinity, turbidity, and quantity. Key findings for this case study show that, through optimisation, stored water never exceeded 19,000 ML even though the storage capacity was 27,000 ML, indicating a significant loss of water to improve quality or, alternatively, a potential asset re-design opportunity.
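The trade-off curve mentioned in this abstract is a set of non-dominated solutions. The following is a minimal Python sketch of the dominance test that NSGA-II-style methods rely on; it is not the study's implementation. The function names and the sample numbers are hypothetical, and both objectives are written as quantities to minimise (negated stored volume, quality deviation).

```python
def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and better somewhere.

    All objectives are assumed to be minimised.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(solutions):
    """Keep only the solutions that no other solution dominates."""
    return [s for s in solutions if not any(dominates(t, s) for t in solutions)]


# Made-up candidates: (-stored volume in GL, deviation from quality limits).
candidates = [(-19, 2.0), (-15, 1.0), (-10, 1.5), (-19, 3.0), (-12, 0.5)]
front = pareto_front(candidates)
print(front)
```

Storing more water (a more negative first entry) comes at the cost of a larger quality deviation along the front, which is the kind of trade-off the abstract reports.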
Subgradient smoothing method for nonsmooth nonconvex optimization
- Authors: Bagirov, Adil , Sultanova, N. , Taheri, S. , Ozturk, G.
- Date: 2021
- Type: Text , Conference paper
- Relation: 5th International Conference on Numerical Analysis and Optimization: Theory, Methods, Applications and Technology Transfer, NAOV, Muscat, 6-9 January 2020 Vol. 354, p. 57-79
- Full Text: false
- Reviewed:
- Description: In this chapter an unconstrained nonsmooth nonconvex optimization problem is considered and a method for solving this problem is developed. In this method the subproblem for finding search directions is reduced to the unconstrained minimization of a smooth function. This is achieved by using subgradients computed in some neighborhood of the current iteration point and by formulating the search direction finding problem as the minimization of a convex piecewise linear function over the unit ball. The hyperbolic smoothing technique is applied to approximate the minimization problem by a sequence of smooth problems. The convergence of the proposed method is studied and its performance is evaluated using a set of nonsmooth optimization academic test problems. In addition, the method is implemented in GAMS and numerical results using different solvers from GAMS are reported. The proposed method is compared with a number of nonsmooth optimization methods. © 2021, The Author(s), under exclusive license to Springer Nature Switzerland AG.
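As a hedged illustration of the smoothing idea, the standard hyperbolic smoothing of a maximum of linear functions can be written as follows. The chapter's exact formulation is not given in this record, so the symbols $\xi_i$ (subgradients), $d$ (direction), $t$ (auxiliary variable) and $\tau$ (smoothing parameter) are assumptions.

```latex
% A maximum of linear functions rewritten with an auxiliary variable t:
\[
\max_{i=1,\dots,m} \langle \xi_i, d \rangle
  = \min_{t \in \mathbb{R}}
    \Big\{ t + \sum_{i=1}^{m} \max\big(0,\ \langle \xi_i, d \rangle - t \big) \Big\}.
\]
% Each nonsmooth term max(0, s) is replaced by a smooth hyperbolic approximation:
\[
\phi_\tau(s) = \frac{s + \sqrt{s^2 + \tau^2}}{2},
\qquad
0 < \phi_\tau(s) - \max(0, s) \le \frac{\tau}{2},
\quad \tau > 0 .
\]
```

Letting $\tau \to 0$ gives a sequence of smooth problems whose objective values approach those of the piecewise linear one, with the gap per term bounded by $\tau/2$.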
Partial undersampling of imbalanced data for cyber threats detection
- Authors: Moniruzzaman, Md , Bagirov, Adil , Gondal, Iqbal
- Date: 2020
- Type: Text , Conference proceedings , Conference paper
- Relation: 2020 Australasian Computer Science Week Multiconference, ACSW 2020
- Full Text:
- Reviewed:
- Description: Real-time detection of cyber threats is a challenging task in cyber security. With the advancement of technology and ease of access to the internet, more and more individuals and organizations are becoming targets of various cyber attacks such as malware, ransomware and spyware. The aim of these attacks is to steal money or valuable information from the victims. Signature-based detection methods fail to keep up with constantly evolving new threats. Machine learning based detection has drawn more attention from researchers due to its capability of detecting new and modified attacks based on the behaviour of previous attacks. The number of malicious activities in a certain domain is significantly lower than the number of normal activities. Therefore, cyber threat detection data sets are imbalanced. In this paper, we propose a partial undersampling method to deal with imbalanced data for detecting cyber threats. © 2020 ACM.
A comparative study of unsupervised classification algorithms in multi-sized data sets
- Authors: Quddus, Syed , Bagirov, Adil
- Date: 2019
- Type: Text , Conference paper
- Relation: 2nd Artificial Intelligence and Cloud Computing Conference, AICCC 2019, Kobe, 21-23 December 2019 p. 26-32
- Full Text: false
- Reviewed:
- Description: The ability to mine and extract useful information automatically from large data sets has been a common concern for organizations for the last few decades. Data on the internet is growing continuously, and consequently the capacity to collect and store very large data sets is increasing significantly. Existing clustering algorithms are not always efficient and accurate in solving clustering problems for large data sets, and the development of accurate and fast data classification algorithms for very large scale data sets is still a challenge. In this paper, we present an overview of various algorithms and approaches recently used for clustering large data sets and e-documents. We also carry out a comparative study, implemented in C++, of the performance of the following algorithms: the global kmeans algorithm (GKM), the multi-start modified global kmeans algorithm (MS-MGKM), the multi-start kmeans algorithm (MS-KM), the difference of convex clustering algorithm (DCA), and the clustering algorithm based on the difference of convex representation of the cluster function and non-smooth optimization (DC-L2). © 2019 ACM.
Multi-source cyber-attacks detection using machine learning
- Authors: Taheri, Sona , Gondal, Iqbal , Bagirov, Adil , Harkness, Greg , Brown, Simon , Chi, Chihung
- Date: 2019
- Type: Text , Conference proceedings , Conference paper
- Relation: 2019 IEEE International Conference on Industrial Technology, ICIT 2019; Melbourne, Australia; 13th-15th February 2019 Vol. 2019-February, p. 1167-1172
- Full Text:
- Reviewed:
- Description: The Internet of Things (IoT) has significantly increased the number of devices connected to the Internet, ranging from sensors to multi-source data information. As the IoT continues to evolve with new technologies, the number of threats and attacks against IoT devices is on the increase. Analyzing and detecting these attacks originating from different sources requires machine learning models. These models provide proactive solutions for detecting attacks and their sources. In this paper, we propose to apply a supervised machine learning classification technique to identify cyber-attacks from each source. More precisely, we apply the incremental piecewise linear classifier, which constructs a boundary between sources/classes incrementally, starting with one hyperplane and adding more hyperplanes at each iteration. The algorithm terminates when no further significant improvement of the separation of sources/classes is possible. The construction and usage of piecewise linear boundaries allows us to avoid any possible overfitting. We apply the incremental piecewise linear classifier to a multi-source real-world cyber security data set to identify cyber-attacks and their sources.
- Description: Proceedings of the IEEE International Conference on Industrial Technology
A server side solution for detecting webInject : A machine learning approach
- Authors: Moniruzzaman, Md , Bagirov, Adil , Gondal, Iqbal , Brown, Simon
- Date: 2018
- Type: Text , Conference proceedings , Conference paper
- Relation: 22nd Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2018; Melbourne, Australia; 3rd June 2018; published in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 11154 LNAI, p. 162-167
- Full Text: false
- Reviewed:
- Description: With the advancement of client-side, on-the-fly web content generation techniques, it becomes easier for attackers to modify the content of a website dynamically and gain access to valuable information. The majority of online attacks are now carried out using WebInject. End users are not always skilled enough to differentiate between injected content and the actual content of a webpage. Some of the existing solutions are designed for the client side, and all users have to install them on their systems, which is a challenging task. In addition, various platforms and tools are used by individuals, so different solutions need to be designed. Existing server-side solutions often focus on sanitizing and filtering the inputs, and fail to detect obfuscated and hidden scripts. In this paper, we propose a server-side solution using a machine learning approach to detect WebInject in banking websites. Unlike other techniques, our method collects features of a Document Object Model (DOM) and classifies it with the help of a pre-trained model.
A new modification of Kohonen neural network for VQ and clustering problems
- Authors: Mohebi, Ehsan , Bagirov, Adil
- Date: 2013
- Type: Text , Conference paper
- Relation: Proceedings of the 11-th Australasian Data Mining Conference (AusDM'13) Vol. 146, p. 81-88
- Full Text: false
- Reviewed:
- Description: Vector Quantization (VQ) and Clustering are significantly important in the field of data mining and pattern recognition. The Self Organizing Map (SOM) has been widely used for clustering and topology visualization. The topology of the SOM and its initial neurons play an important role in the convergence of the Kohonen neural network. In this paper, in order to improve the convergence of the SOM, we introduce an algorithm based on the splitting and merging of clusters to initialize neurons. We also introduce a topology based on this initialization to optimize the vector quantization error. Such an approach allows one to find a global or near-global solution to the vector quantization problem and consequently to the clustering problem. Numerical results on 4 small to large real-world data sets are reported to demonstrate the performance of the proposed algorithm.
Pumping costs and water quality in the battlefield of optimal operation of water distribution networks
- Authors: Mala-Jetmarova, Helena , Bagirov, Adil , Barton, Andrew
- Date: 2013
- Type: Text , Conference paper
- Relation: Proceedings of the 35th IAHR World Congress
- Full Text: false
- Reviewed:
A novel approach to optimal pump scheduling in water distribution systems
- Authors: Bagirov, Adil , Barton, Andrew , Mala-Jetmarova, Helena , Al Nuaimat, Alia , Ahmed, S. T. , Sultanova, Nargiz , Yearwood, John
- Date: 2012
- Type: Text , Conference paper
- Relation: 14th Water Distribution Systems Analysis Conference 2012, WDSA 2012 Vol. 1; Adelaide, Australia; 24th-27th September; p. 618-631
- Relation: http://purl.org/au-research/grants/arc/LP0990908
- Full Text: false
- Reviewed:
- Description: The operation of a water distribution system is a complex task which involves scheduling of pumps, regulating water levels of storages, and providing satisfactory water quality to customers at required flow and pressure. Pump scheduling is one of the most important tasks of the operation of a water distribution system as it represents the major part of its operating costs. In this paper, a novel approach for modeling pump scheduling to minimize energy consumption by pumps is introduced, which uses the pumps' start/end run times as continuous variables. This is different from other approaches, where binary integer variables for each hour are typically used, which is considered very impractical from an operational perspective. The problem is formulated as a nonlinear programming problem and a new algorithm is developed for its solution. This algorithm is based on the combination of the grid search with the Hooke-Jeeves pattern search method. The performance of the algorithm is evaluated using literature test problems applying the hydraulic simulation model EPANet.
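The combination of a grid search with Hooke-Jeeves pattern search named in this abstract can be sketched generically as below. This is a minimal Python illustration under stated assumptions, not the paper's algorithm: the function names are hypothetical, and a simple quadratic stands in for the hydraulic-simulation cost. The coarse grid picks a starting point, then derivative-free exploratory and pattern moves refine it.

```python
import itertools


def grid_search(f, bounds, n=5):
    """Evaluate f on an n-per-axis grid over the box `bounds`; return the best point."""
    axes = [[lo + (hi - lo) * i / (n - 1) for i in range(n)] for lo, hi in bounds]
    return min((list(p) for p in itertools.product(*axes)), key=f)


def explore(f, x, step):
    """Exploratory move: try +step then -step on each coordinate, keeping improvements."""
    x, fx = list(x), f(x)
    for i in range(len(x)):
        for delta in (step, -step):
            trial = x[:]
            trial[i] += delta
            f_trial = f(trial)
            if f_trial < fx:
                x, fx = trial, f_trial
                break
    return x, fx


def hooke_jeeves(f, x0, step=1.0, tol=1e-6, shrink=0.5, max_iter=10000):
    """Hooke-Jeeves pattern search; assumes f is bounded below."""
    base, f_base = list(x0), f(x0)
    for _ in range(max_iter):
        if step < tol:
            break
        x, fx = explore(f, base, step)
        if fx < f_base:
            # Pattern moves: keep jumping along the improving direction.
            while fx < f_base:
                pattern = [2 * xi - bi for xi, bi in zip(x, base)]
                base, f_base = x, fx
                x, fx = explore(f, pattern, step)
        else:
            step *= shrink  # no improvement at this scale: refine the mesh
    return base, f_base


def cost(x):
    """Stand-in objective (hypothetical), with its minimum at (1.3, -0.7)."""
    return (x[0] - 1.3) ** 2 + (x[1] + 0.7) ** 2


start = grid_search(cost, [(-5.0, 5.0), (-5.0, 5.0)])
best, best_cost = hooke_jeeves(cost, start)
print(start, best, best_cost)
```

Because the method only compares objective values, it can be wrapped around a black-box simulator; the grid stage reduces the risk of starting the local search in a poor region.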
Application of optimisation-based data mining techniques to medical data sets: A comparative analysis
- Authors: Dzalilov, Zari , Bagirov, Adil , Mammadov, Musa
- Date: 2012
- Type: Text , Conference paper
- Relation: IMMM 2012: The Second International Conference on Advances in Information Mining and Management p. 41-46
- Full Text: false
- Reviewed:
- Description: Computational methods have become an important tool in the analysis of medical data sets. In this paper, we apply three optimisation-based data mining methods to the following data sets: (i) a cystic fibrosis data set and (ii) a tobacco control data set. The three algorithms used in the analysis of these data sets are the modified linear least square fit, an optimization based heuristic algorithm for feature selection and an optimization based clustering algorithm. All these methods explore the relationship between features and classes, with the aim of determining the contribution of specific features to the class outcome. However, the three algorithms are based on completely different approaches. We apply these methods to solve feature selection and classification problems. We also present a comparative analysis of the algorithms using computational results. The results obtained confirm that these algorithms may be effectively applied to the analysis of other (bio)medical data sets.
Comparison of metaheuristic algorithms for pump operation optimization
- Authors: Bagirov, Adil , Ahmed, S. T. , Barton, Andrew , Mala-Jetmarova, Helena , Al Nuaimat, Alia , Sultanova, Nargiz
- Date: 2012
- Type: Text , Conference paper
- Relation: 14th Water Distribution Systems Analysis Conference 2012, WDSA 2012 Vol. 2; Adelaide, Australia; 24th-27th September 2012; p. 886-896
- Relation: http://purl.org/au-research/grants/arc/LP0990908
- Full Text: false
- Reviewed:
- Description: Pumping cost constitutes the main part of the overall operating cost of water distribution systems. There are different optimization formulations of the pumping cost minimization problem, including those applying continuous and integer programming approaches. To date, mainly metaheuristics have been applied to solve this problem. However, a comprehensive comparison of those metaheuristics has not been done. Such a comparison is important to identify the strengths and weaknesses of different algorithms, which are reflected in their performance. In this paper, we present a methodology for comparative analysis of widely used metaheuristics for solving the pumping cost minimization problem. This methodology includes the following comparison criteria: (a) the "optimal solution" obtained; (b) the efficiency; and (c) the robustness. The algorithms applied are particle swarm optimization, artificial bee colony and firefly algorithms. These algorithms were applied to one test problem available in the literature. The results obtained demonstrate that the artificial bee colony is the most robust and the firefly is the most efficient and accurate algorithm for this test problem. Funding: ARC
Framework for multi-objective optimisation of the operation of water distribution networks including water quality
- Authors: Mala-Jetmarova, Helena , Bagirov, Adil , Barton, Andrew
- Date: 2012
- Type: Text , Conference paper
- Relation: 10th International conference on Hydroinformatics
- Full Text: false
- Reviewed:
Minimization of pumping costs in water distribution systems using explicit and implicit pump scheduling
- Authors: Barton, Andrew , Mala-Jetmarova, Helena , Nuamat, Alia Mari Al , Bagirov, Adil , Sultanova, Nargiz , Ahmed, Shams
- Date: 2012
- Type: Text , Conference paper
- Relation: 34th Hydrology and Water Resources Symposium, HWRS 2012; Sydney, Australia; 19th-22nd November 2012; p. 1298-1305
- Relation: http://purl.org/au-research/grants/arc/LP0990908
- Full Text: false
- Reviewed:
- Description: The operation of a water distribution system is a complex task which involves scheduling of pumps, regulating water levels of storages, and providing satisfactory water quality to customers at required flow and pressure. Pump scheduling is one of the most important tasks of the operation of a water distribution system as it represents the major part of its operating costs. In this paper, a novel approach for modeling pump scheduling to minimize energy consumption by pumps is introduced, which uses the pumps' start/end run times. We distinguish two types of pumps: one is operated based on the water level in a storage and the other is operated based on downstream pressure. For the first type of pumps both explicit and implicit pump scheduling can be used, whereas the second type of pumps can be optimized only using implicit pump scheduling. The problem is formulated as an optimization problem and an algorithm is developed for its solution. The performance of the algorithm is evaluated using a literature test problem applying the hydraulic simulation model EPANet.
Adaption to water shortage through the implementation of a unique pipeline system in Victoria, Australia
- Authors: Mala-Jetmarova, Helena , Barton, Andrew , Bagirov, Adil , McRae-Williams, Pamela , Caris, Rob , Jackson, Peter
- Date: 2010
- Type: Conference paper
- Relation: Paper presented at Hydropredict' 2010, 2nd International Interdisciplinary Conference on Predictions for Hydrology, Ecology, and Water Resources Management
- Full Text:
- Reviewed:
- Description: Water resource development has played a crucial role in the Grampians, Wimmera and Mallee regions of Australia, with the main source of surface water located in several reservoirs in the Grampians mountain ranges. Historically, water was delivered by gravity through a vast 19 500 km earthen channel system from the reservoirs to the townships and farms. As a result of the severe and protracted drought experienced in the region over the past 13 years and the projected drying climate, there have been fundamental changes made to the management of water in order to better cope with water scarcity. The primary strategic effort to sustainably manage water resources was the removal of the unsustainable transport of water via the open channels, which resulted in very high losses through seepage and evaporation. This inefficient system has been replaced by a pressurised pipeline, the largest geographical water infrastructure project of its type in Australia, spreading across an area of approximately 20 000 km². To manage the change in water balance as a result of the pipeline and drying climate, the region's water corporations and environmental agencies have designed a scheme for water allocations intended to sustain local communities, allow for regional development and improve environmental conditions. This paper describes the unique pipeline system recently completed, provides a brief summary of water sharing arrangements and introduces the research program currently underway to optimise the performance of the pipeline system.
A new modified global k-means algorithm for clustering large data sets
- Authors: Bagirov, Adil , Ugon, Julien , Webb, Dean
- Date: 2009
- Type: Text , Conference paper
- Relation: Paper presented at XIIIth International Conference : Applied Stochastic Models and Data Analysis, ASMDA 2009, Vilnius, Lithuania : 30th June - 3rd July 2009 p. 1-5
- Full Text: false
- Description: The k-means algorithm and its variations are known to be fast clustering algorithms. However, they are sensitive to the choice of starting points and inefficient for solving clustering problems in large data sets. Recently, in order to resolve difficulties with the choice of starting points, incremental approaches have been developed. The modified global k-means algorithm is based on such an approach. It iteratively adds one cluster center at a time. Numerical experiments show that this algorithm considerably improves on the k-means algorithm. However, this algorithm is not suitable for clustering very large data sets. In this paper, a new version of the modified global k-means algorithm is proposed. We introduce an auxiliary cluster function to generate a set of starting points spanning different parts of the data set. We exploit information gathered in previous iterations of the incremental algorithm to reduce its complexity.
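The incremental idea of adding one cluster center at a time can be sketched as follows. This is a minimal Python illustration, not the algorithm proposed in the paper: the auxiliary cluster function is not specified in this record, so the `gain` heuristic used here to pick each new starting point (the data point that would most reduce the current clustering error) is an assumption, and all function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(pts) for col in zip(*pts))


def kmeans(points, centers, max_iter=100):
    """Plain k-means iterations from the given starting centers."""
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        updated = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    return centers


def incremental_kmeans(points, k):
    """Grow the solution from 1 to k centers, one center per step."""
    centers = [centroid(points)]  # the 1-cluster solution is the overall centroid
    while len(centers) < k:
        # Current squared distance of each point to its nearest center.
        d = [min(sq_dist(p, c) for c in centers) for p in points]

        def gain(y):
            # Assumed heuristic: error decrease if y were added as a center.
            return sum(max(0.0, di - sq_dist(y, p)) for p, di in zip(points, d))

        start = max(points, key=gain)
        centers = kmeans(points, centers + [start])
    return centers


# Three well-separated toy groups of four points each.
pts = [(x + dx, y + dy) for x, y in [(0, 0), (10, 10), (0, 10)]
       for dx in (0, 1) for dy in (0, 1)]
result = sorted(incremental_kmeans(pts, 3))
print(result)
```

Scoring every data point as a candidate costs a quadratic number of distance evaluations per added center, which illustrates why the abstract describes such incremental schemes as hard to scale to very large data sets.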
An incremental approach for the construction of a piecewise linear classifier
- Authors: Bagirov, Adil , Ugon, Julien , Webb, Dean
- Date: 2009
- Type: Text , Conference paper
- Relation: Paper presented at XIIIth International Conference : Applied Stochastic Models and Data Analysis, ASMDA 2009, Vilnius, Lithuania : 30th June - 3rd July 2009 p. 507–511
- Relation: https://purl.org/au-research/grants/arc/DP0666061
- Full Text: false
- Description: In this paper the problem of finding piecewise linear boundaries between sets is considered and is applied to solving supervised data classification problems. An algorithm for the computation of piecewise linear boundaries, consisting of two main steps, is proposed. In the first step, sets are approximated by hyperboxes to find so-called “indeterminate” regions between sets. In the second step, sets are separated inside these “indeterminate” regions by piecewise linear functions. These functions are computed incrementally, starting with a linear function. Results of numerical experiments are reported. These results demonstrate that the new algorithm requires a reasonable training time and produces consistently good test set accuracy on most data sets compared with mainstream classifiers.
Optimisation of operations of a water distribution system for reduced power usage
- Authors: Bagirov, Adil , Ugon, Julien , Barton, Andrew , Briggs, Steven
- Date: 2008
- Type: Text , Conference paper
- Relation: Paper presented at 9th National Conference on Hydraulics in Water Engineering: Hydraulics 2008, Darwin, Northern Territory : 22nd-26th September 2008
- Full Text: false
- Description: There are many improvements to operation that can be made to a water distribution system once it has been constructed and placed in ground. Pipes and associated storages and pumps are typically designed to meet average peak daily demands, offer some capacity for growth, and also allow for some deterioration of performance over time. However, the 'as constructed' performance of the pipeline is invariably different to what was designed on paper, and this is particularly so for anything other than design flows, such as during times of water restrictions when there are significantly reduced flows. Because of this, there remain significant benefits to owners and operators from the adaptive and global optimisation of such systems. The present paper uses the Ouyen subsystem of the Northern Mallee Pipeline, in Victoria, as a case study for the development of an optimisation model. This has been done with the intent of using this model to reduce costs and provide better service to customers on this system. The Ouyen subsystem consists of 1600 km of trunk and distribution pipeline servicing an area of 456,000 ha. The system includes 2 fixed speed pumps diverting water from the Murray River at Liparoo into two 150 ML balancing storages at Ouyen, 4 variable speed pumps feeding water from the balancing storages into the pipeline system, 2 variable speed pressure booster pumps and 5 town balancing storages. When considering all these components of the system, power consumption becomes an important part of the overall operation. The present paper considers a global optimisation model to minimise power consumption while maintaining reasonable performance of the system. The main components of the model are described, including the network structure and the cost functions associated with the system. The final model presents the cost functions associated with the pump scheduling, including descriptions of the penalties associated with maintaining appropriate storage levels and pressure bounds within the water distribution network.
A nonsmooth optimization approach to sensor network localization
- Authors: Bagirov, Adil , Lai, Daniel , Palaniswami, M.
- Date: 2007
- Type: Text , Conference paper
- Relation: Paper presented at 3rd International Conference on Intelligent Sensors, Sensor Networks and Information, ISSNIP 2007, Melbourne, Victoria : 3rd-6th December 2007 p. 727-732
- Relation: http://purl.org/au-research/grants/arc/DP0666061
- Full Text:
- Description: In this paper the problem of localization of a wireless sensor network is formulated as an unconstrained nonsmooth optimization problem. We minimize a distance objective function which incorporates unknown sensor nodes and nodes with known positions (anchors), in contrast to popular semidefinite programming (SDP) methods which use artificial objective functions. We study the main properties of the objective function in this problem and design an algorithm for its minimization. Our algorithm is a derivative-free discrete gradient method that allows one to find a near-global solution. The algorithm can handle a large number of sensors in the network. This paper contains the theory of our proposed formulation and algorithm, while experimental results are included in later work.
Visual tools for analysing evolution, emergence, and error in data streams
- Authors: Hart, Sol , Yearwood, John , Bagirov, Adil
- Date: 2007
- Type: Text , Conference paper
- Relation: Paper presented at 6th IEEE/ACIS International Conference on Computer and Information Science, ICIS 2007, Melbourne, Victoria : 11th-13th July 2007 p. 987-992
- Full Text:
- Description: The relatively new field of stream mining has necessitated the development of robust drift-aware algorithms that provide accurate, real-time data handling capabilities. Tools are needed to assess and diagnose important trends and investigate drift evolution parameters. In this paper, we present two novel visualisation techniques, Pixie and Luna graphs, which incorporate salient group statistics coupled with intuitive visual representations of multidimensional groupings over time. Through the representations presented here, spatial interactions between temporal divisions can be diagnosed and overall distribution patterns identified. They provide a means of evaluating, in a non-constrained capacity, commonly constrained evolutionary problems.