Visual tools for analysing evolution, emergence, and error in data streams
- Authors: Hart, Sol , Yearwood, John , Bagirov, Adil
- Date: 2007
- Type: Text , Conference paper
- Relation: Paper presented at 6th IEEE/ACIS International Conference on Computer and Information Science, ICIS 2007, Melbourne, Victoria : 11th-13th July 2007 p. 987-992
- Full Text:
- Description: The relatively new field of stream mining has necessitated the development of robust drift-aware algorithms that provide accurate, real-time data handling capabilities. Tools are needed to assess and diagnose important trends and to investigate drift evolution parameters. In this paper, we present two novel visualisation techniques, Pixie and Luna graphs, which couple salient group statistics with intuitive visual representations of multidimensional groupings over time. These representations allow spatial interactions between temporal divisions to be diagnosed and overall distribution patterns to be identified. They provide a means of evaluating, in a non-constrained capacity, commonly constrained evolutionary problems.
Hybridization of neural learning algorithms using evolutionary and discrete gradient approaches
- Authors: Ghosh, Ranadhir , Yearwood, John , Ghosh, Moumita , Bagirov, Adil
- Date: 2005
- Type: Text , Journal article
- Relation: Journal of Computer Science Vol. 1, no. 3 (2005), p. 387-394
- Full Text: false
- Reviewed:
- Description: In this study we investigated a hybrid model based on the Discrete Gradient method and an evolutionary strategy for determining the weights in a feed-forward artificial neural network, and we discuss different variants of such hybrid models. The Discrete Gradient method has the advantage of being able to jump over many local minima and find very deep local minima. However, earlier research has shown that a good starting point for the Discrete Gradient method can improve the quality of the solution point. Evolutionary algorithms are well suited to global optimisation problems, but they suffer from long training times and are often unsuitable for real-world applications. In optimisation problems such as weight optimisation for ANNs in real-world applications, the dimensions are large and time complexity is critical, so a hybrid model can be a suitable option. In this study we propose different fusion strategies for hybrid models that combine the evolutionary strategy with the Discrete Gradient method to obtain an optimal solution much more quickly. Three fusion strategies are discussed: a linear hybrid model, an iterative hybrid model and a restricted local search hybrid model. Comparative results for the different hybrid models are provided on a range of standard datasets.
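The "linear hybrid model" described above (global evolutionary search followed by local refinement) can be illustrated with a minimal sketch. Several assumptions apply: the toy XOR task, the 2-2-1 network, and all function names are hypothetical; the global stage is a plain (1+λ) evolution strategy; and a simple derivative-free compass search stands in for the Discrete Gradient method, which is considerably more involved. This is not the authors' implementation.

```python
import math
import random

# Toy task (assumption): XOR with a 2-2-1 sigmoid network, i.e. 9 weights.
DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
N_WEIGHTS = 9


def sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def loss(w):
    """Mean squared error of the network with weight vector w on DATA."""
    total = 0.0
    for (x1, x2), y in DATA:
        h1 = sigmoid(w[0] * x1 + w[1] * x2 + w[4])
        h2 = sigmoid(w[2] * x1 + w[3] * x2 + w[5])
        out = sigmoid(w[6] * h1 + w[7] * h2 + w[8])
        total += (out - y) ** 2
    return total / len(DATA)


def evolution_strategy(rng, generations=200, offspring=10, sigma=0.5):
    """Global stage: a plain (1+lambda)-ES with Gaussian mutation."""
    parent = [rng.uniform(-1.0, 1.0) for _ in range(N_WEIGHTS)]
    parent_loss = loss(parent)
    for _ in range(generations):
        children = [[w + rng.gauss(0.0, sigma) for w in parent] for _ in range(offspring)]
        best = min(children, key=loss)
        best_loss = loss(best)
        if best_loss <= parent_loss:  # elitist: never accept a worse parent
            parent, parent_loss = best, best_loss
    return parent, parent_loss


def compass_search(w, step=0.5, min_step=1e-4, max_iter=2000):
    """Local stage: derivative-free compass search, standing in for Discrete Gradient."""
    w, current = list(w), loss(w)
    for _ in range(max_iter):
        if step <= min_step:
            break
        improved = False
        for i in range(N_WEIGHTS):
            for delta in (step, -step):
                trial = list(w)
                trial[i] += delta
                trial_loss = loss(trial)
                if trial_loss < current:
                    w, current, improved = trial, trial_loss, True
                    break
        if not improved:
            step /= 2.0  # shrink the pattern when no coordinate move helps
    return w, current


def linear_hybrid(seed=0):
    """Linear fusion: run the ES once, then hand its best point to the local search."""
    rng = random.Random(seed)
    w_es, loss_es = evolution_strategy(rng)
    _, loss_final = compass_search(w_es)
    return loss_es, loss_final


if __name__ == "__main__":
    es_loss, hybrid_loss = linear_hybrid()
    print(f"ES stage MSE: {es_loss:.4f}  after local search: {hybrid_loss:.4f}")
```

Because the local stage only accepts strictly improving moves, the hybrid can never end with a worse loss than the evolutionary stage alone. This reflects the motivation given in the abstract: the ES supplies a good starting point for the local method.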
Global optimization in the summarization of text documents
- Authors: Alyguliev, R. M. , Bagirov, Adil
- Date: 2005
- Type: Text , Journal article
- Relation: Automatic Control and Computer Sciences Vol. 39, no. 6 (2005), p. 42-47
- Full Text: false
- Reviewed:
- Description: To ensure minimal redundancy in a document summary and the greatest possible coverage of its content, the article proposes a summarization method based on the clustering of sentences. Clustering the sentences reduces to determining cluster centroids, which mathematically amounts to solving a global optimization problem. Determining the number of clusters is one of the most difficult problems in the clustering procedure, so an algorithm for the stepwise determination of the number of clusters is also proposed. © 2006 by Allerton Press, Inc.
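The general idea of clustering-based extractive summarization can be sketched as follows: cluster sentence vectors, then keep the sentence nearest each centroid. The sketch rests on several assumptions. It uses bag-of-words term-frequency vectors and plain Lloyd's k-means, which is a local method swapped in for the global optimization formulation in the article. The number of clusters k is fixed rather than determined stepwise, and all function names are hypothetical.

```python
import random
import re
from collections import Counter


def tokenize(sentence):
    return re.findall(r"[a-z]+", sentence.lower())


def vectorize(sentences):
    """Bag-of-words term-frequency vectors over a shared vocabulary."""
    vocab = sorted({t for s in sentences for t in tokenize(s)})
    index = {t: i for i, t in enumerate(vocab)}
    vectors = []
    for s in sentences:
        v = [0.0] * len(vocab)
        for term, count in Counter(tokenize(s)).items():
            v[index[term]] = float(count)
        vectors.append(v)
    return vectors


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, iters=50):
    """Lloyd's k-means (a local stand-in for the global optimization step)."""
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[j].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def summarize(sentences, k, seed=0):
    """Pick the sentence nearest each centroid, returned in original order."""
    vectors = vectorize(sentences)
    centroids = kmeans(vectors, k, random.Random(seed))
    chosen = {min(range(len(vectors)), key=lambda i: sq_dist(vectors[i], c)) for c in centroids}
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    doc = ["The cat sat on the mat.", "A cat lay on the mat.",
           "Stock prices rose sharply today.", "Markets saw stock prices rise."]
    print(summarize(doc, k=2))
```

Choosing one representative per cluster is what limits redundancy. Covering every cluster is what spreads the summary over the document's content.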
Cyberattack triage using incremental clustering for intrusion detection systems
- Authors: Taheri, Sona , Bagirov, Adil , Gondal, Iqbal , Brown, Simon
- Date: 2020
- Type: Text , Journal article
- Relation: International Journal of Information Security Vol. 19, no. 5 (2020), p. 597-607
- Relation: http://purl.org/au-research/grants/arc/DP190100580
- Full Text:
- Reviewed:
- Description: Intrusion detection systems (IDSs) are devices or software applications that monitor networks or systems for malicious activities and signal alerts/alarms when such activity is discovered. However, an IDS may generate many false alerts, which affect its accuracy. In this paper, we develop a cyberattack triage algorithm to detect these alerts (so-called outliers). The proposed algorithm is designed using clustering, optimization and distance-based approaches. An optimization-based incremental clustering algorithm is proposed to find clusters of different types of cyberattacks. Using a special procedure, the set of clusters is divided into two subsets: normal and stable clusters. Outliers are then found among the stable clusters using the average distance between centroids of normal clusters. The proposed algorithm is evaluated on the well-known IDS data sets Knowledge Discovery and Data mining Cup 1999 and UNSW-NB15, and compared with several existing algorithms. Results show that the proposed algorithm has high detection accuracy and a very low false negative rate. © 2019, Springer-Verlag GmbH Germany, part of Springer Nature.
- Description: This research was conducted in Internet Commerce Security Laboratory (ICSL) funded by Westpac Banking Corporation Australia. In addition, the research by Dr. Sona Taheri and A/Prof. Adil Bagirov was supported by the Australian Government through the Australian Research Council’s Discovery Projects funding scheme (DP190100580).
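Incremental clustering, which grows the number of clusters one at a time, can be illustrated with a minimal sketch. It assumes a scheme in the spirit of global k-means: each new centre is seeded at the data point that most reduces the sum-of-squared-distances objective, and then all centres are refined with Lloyd iterations. This is a simplification, not the paper's optimization-based algorithm. The normal/stable split and the outlier rule are not reconstructed here, because the abstract does not specify them. Function names are hypothetical.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p, centroids):
    return min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))


def objective(points, centroids):
    """Cluster function: sum of squared distances to the nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def lloyd(points, centroids, iters=50):
    """Refine a set of centroids with standard k-means iterations."""
    centroids = [list(c) for c in centroids]
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the previous centroid if a group is empty
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


def incremental_clustering(points, k_max):
    """Grow from 1 to k_max centres, seeding each new centre at the best data point."""
    centroids = [[sum(col) / len(points) for col in zip(*points)]]
    for _ in range(1, k_max):
        seed = min(points, key=lambda p: objective(points, centroids + [list(p)]))
        centroids = lloyd(points, centroids + [list(seed)])
    return centroids


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    print(incremental_clustering(pts, 2))
```

Seeding from the best candidate point, rather than at random, makes the result deterministic and reduces sensitivity to initialization. The price is one objective evaluation per candidate point at each step.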
Missing value imputation via clusterwise linear regression
- Authors: Karmitsa, Napsu , Taheri, Sona , Bagirov, Adil , Makinen, Pauliina
- Date: 2022
- Type: Text , Journal article
- Relation: IEEE Transactions on Knowledge and Data Engineering Vol. 34, no. 4 (2022), p. 1889-1901
- Full Text: false
- Reviewed:
- Description: In this paper a new method of preprocessing incomplete data is introduced. The method is based on clusterwise linear regression and combines two well-known approaches for missing value imputation: linear regression and clustering. The idea is to approximate missing values using only those data points that are similar to the incomplete data point. A similar idea is also used in clustering-based imputation methods. Here, however, linear regression is applied within each cluster to predict the missing values accurately, and this is done simultaneously with clustering. The proposed method is tested on synthetic and real-world data sets and compared with other missing value imputation algorithms. Numerical results demonstrate that the proposed method produces the most accurate imputations for missing completely at random (MCAR) and missing at random (MAR) data sets that have a clear structure and no more than 25% missing data.
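The combination of clustering and per-cluster regression can be illustrated with a minimal sketch for a single predictor x and a target y. Several assumptions apply. Complete pairs are clustered by alternately fitting an ordinary least-squares line per cluster and reassigning each point to the line with the smallest squared residual. The initial clusters are contiguous chunks of the x-sorted data. An incomplete point (x known, y missing) is assigned to the cluster whose mean x is nearest, and that cluster's line supplies the imputed value. The assignment rule and all names are hypothetical, not the paper's procedure.

```python
import math


def fit_line(pairs):
    """Ordinary least squares y = a + b*x for one cluster."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    b = sxy / sxx if sxx > 0 else 0.0  # degenerate cluster: horizontal line
    return my - b * mx, b


def clusterwise_regression(pairs, k, iters=20):
    # Assumed initialisation: sort by x and cut into k contiguous chunks.
    ordered = sorted(pairs)
    size = math.ceil(len(ordered) / k)
    clusters = [ordered[i:i + size] for i in range(0, len(ordered), size)]
    for _ in range(iters):
        lines = [fit_line(c) for c in clusters]
        clusters = [[] for _ in lines]
        for x, y in ordered:
            # Reassign each point to the line with the smallest squared residual.
            j = min(range(len(lines)), key=lambda i: (y - lines[i][0] - lines[i][1] * x) ** 2)
            clusters[j].append((x, y))
        clusters = [c for c in clusters if c]  # drop clusters that emptied
    return clusters, [fit_line(c) for c in clusters]


def impute(x, clusters, lines):
    """Hypothetical rule: pick the cluster with the nearest mean x, then use its line."""
    means = [sum(px for px, _ in c) / len(c) for c in clusters]
    j = min(range(len(clusters)), key=lambda i: abs(x - means[i]))
    a, b = lines[j]
    return a + b * x


if __name__ == "__main__":
    data = [(0, 0), (1, 2), (3, 6), (4, 8), (10, 30), (11, 29), (13, 27), (14, 26)]
    clusters, lines = clusterwise_regression(data, k=2)
    print(impute(2.0, clusters, lines), impute(12.0, clusters, lines))
```

A single global regression line would fit this two-regime data poorly. Fitting one line per cluster lets each missing value be predicted only from similar points, which is the idea the abstract describes.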
New gene selection algorithm using hyperboxes to improve performance of classifiers
- Authors: Bagirov, Adil , Mardaneh, Karim
- Date: 2020
- Type: Text , Journal article
- Relation: International Journal of Bioinformatics Research and Applications Vol. 16, no. 3 (2020), p. 269-289
- Full Text: false
- Reviewed:
- Description: DNA microarray technology allows the expression levels of thousands of genes to be measured in a single experiment, which makes it possible to apply classification techniques to classify tumours. However, the large number of genes and relatively small number of tumours in gene expression datasets may diminish (in some cases significantly) the accuracy of many classifiers. Therefore, efficient gene selection algorithms are required to identify the most informative genes or groups of genes and so improve the performance of classifiers. In this paper, a new gene selection algorithm is developed using marginal hyperboxes of genes or groups of genes for each tumour type. Informative genes are defined using overlaps between hyperboxes. Results on six gene expression datasets demonstrate that the proposed algorithm is able to considerably reduce the number of genes and significantly improve the performance of classifiers. © 2020 Inderscience Enterprises Ltd.
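Single-gene selection by marginal hyperboxes can be illustrated with a minimal sketch. The approach assumed here works as follows. For each gene, take the [min, max] interval of its expression within each tumour class; this is a one-dimensional marginal hyperbox. Sum the pairwise overlaps between the class intervals and normalise by the gene's overall range. Keep genes whose overlap score does not exceed a threshold. The overlap measure, the threshold, and the function names are assumptions for illustration. The paper also considers groups of genes, which this sketch omits.

```python
import itertools


def marginal_boxes(samples, labels, gene):
    """Per-class [min, max] interval of one gene's expression (a 1-D marginal hyperbox)."""
    boxes = {}
    for row, cls in zip(samples, labels):
        lo, hi = boxes.get(cls, (row[gene], row[gene]))
        boxes[cls] = (min(lo, row[gene]), max(hi, row[gene]))
    return boxes


def overlap_score(boxes):
    """Total pairwise overlap length, normalised by the gene's overall range (assumed measure)."""
    intervals = list(boxes.values())
    span = max(h for _, h in intervals) - min(l for l, _ in intervals)
    if span == 0:
        return 1.0  # constant gene: treated as completely uninformative
    total = 0.0
    for (l1, h1), (l2, h2) in itertools.combinations(intervals, 2):
        total += max(0.0, min(h1, h2) - max(l1, l2))
    return total / span


def select_genes(samples, labels, max_overlap=0.0):
    """Indices of genes whose class boxes overlap by at most max_overlap."""
    n_genes = len(samples[0])
    scores = {g: overlap_score(marginal_boxes(samples, labels, g)) for g in range(n_genes)}
    return sorted(g for g, s in scores.items() if s <= max_overlap)


if __name__ == "__main__":
    X = [[1.0, 5.0, 3.0], [1.5, 6.0, 3.0], [4.0, 5.5, 3.0], [4.5, 6.5, 3.0]]
    y = ["A", "A", "B", "B"]
    print(select_genes(X, y))
```

In this example, gene 0 separates the two classes perfectly (disjoint intervals), gene 1's class intervals overlap by a third of its range, and gene 2 is constant. Only gene 0 is therefore retained at a zero-overlap threshold.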