Grouping points by shared subspaces for effective subspace clustering
- Authors: Zhu, Ye , Ting, Kaiming , Carman, Mark
- Date: 2018
- Type: Text , Journal article
- Relation: Pattern Recognition Vol. 83, no. (2018), p. 230-244
- Full Text: false
- Reviewed:
- Description: Clusters may exist in different subspaces of a multidimensional dataset. Traditional full-space clustering algorithms have difficulty in identifying these clusters. Various subspace clustering algorithms have used different subspace search strategies. They require clustering to assess whether cluster(s) exist in a subspace. In addition, all of them perform clustering by measuring similarity between points in the given feature space. As a result, the subspace selection and clustering processes are tightly coupled. In this paper, we propose a new subspace clustering framework named CSSub (Clustering by Shared Subspaces). It enables neighbouring core points to be clustered based on the number of subspaces they share. It explicitly splits candidate subspace selection and clustering into two separate processes, enabling different types of cluster definitions to be employed easily. Through extensive experiments on synthetic and real-world datasets, we demonstrate that CSSub discovers non-redundant subspace clusters with arbitrary shapes in noisy data; and it significantly outperforms existing state-of-the-art subspace clustering algorithms.
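The shared-subspace idea described in this abstract can be made concrete with a small sketch. This is an illustrative toy, not the authors' CSSub implementation: the candidate subspaces, core-point sets and threshold below are all invented, whereas CSSub derives core points per candidate subspace from the data before the clustering step.

```python
from itertools import combinations

def shared_subspaces(a, b, subspace_cores):
    """Number of candidate subspaces in which both points are core points."""
    return sum(1 for cores in subspace_cores.values() if a in cores and b in cores)

def cluster_by_shared_subspaces(points, subspace_cores, min_shared):
    """Link two points when they share at least min_shared subspaces, then
    return the connected components as clusters (simple union-find)."""
    parent = {p: p for p in points}
    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p
    for a, b in combinations(points, 2):
        if shared_subspaces(a, b, subspace_cores) >= min_shared:
            parent[find(a)] = find(b)
    groups = {}
    for p in points:
        groups.setdefault(find(p), set()).add(p)
    return list(groups.values())

# Invented core-point sets for three candidate subspaces.
cores = {("f1",): {"p1", "p2"},
         ("f2",): {"p1", "p2", "p3"},
         ("f1", "f2"): {"p1", "p2"}}
clusters = cluster_by_shared_subspaces(["p1", "p2", "p3"], cores, min_shared=2)
print(clusters)
```

Here "p1" and "p2" are core points in all three subspaces, so they cluster together; "p3" shares only one subspace with each of them and stays separate.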
Implementing an ANN model optimized by genetic algorithm for estimating cohesion of limestone samples
- Authors: Khandelwal, Manoj , Marto, Aminaton , Fatemi, Seyed , Ghoroqi, Mahyar , Armaghani, Danial , Singh, Trilok , Tabrizi, Omid
- Date: 2018
- Type: Text , Journal article
- Relation: Engineering with Computers Vol. 34, no. 2 (2018), p. 307-317
- Full Text: false
- Reviewed:
- Description: Shear strength parameters such as cohesion are among the most significant rock parameters and can be utilized in the initial design of some geotechnical engineering applications. In this study, evaluation and prediction of rock material cohesion is presented using different approaches, i.e., simple and multiple regression, an artificial neural network (ANN) and a genetic algorithm (GA)-ANN hybrid. For this purpose, a database was prepared comprising three model inputs, i.e., p-wave velocity, uniaxial compressive strength and Brazilian tensile strength, and one output, the cohesion of limestone samples. A meaningful relationship was found for all of the model inputs, with suitable performance capacity for prediction of rock cohesion. Additionally, a high level of accuracy (coefficient of determination, R2, of 0.925) was observed in developing the multiple regression equation. To obtain higher performance capacity, a series of ANN and GA-ANN models were built. As a result, the hybrid GA-ANN network provides higher performance for prediction of rock cohesion compared to the ANN technique: GA-ANN model results (R2 = 0.976 and 0.967 for train and test) were better than the ANN model results (R2 = 0.949 and 0.948 for train and test). Therefore, this technique is introduced as a new approach for estimating the cohesion of limestone samples. © 2017, Springer-Verlag London Ltd., part of Springer Nature.
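The simple-regression baseline in this entry amounts to ordinary least squares plus the coefficient of determination. The sketch below is not the authors' model: the input values (tensile strength vs. cohesion, in MPa) are invented, and the real study used three inputs and ANN/GA-ANN models on top of this baseline.

```python
def simple_linear_regression(xs, ys):
    """Ordinary least squares for y = a + b*x (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x
    return a, b

def r_squared(xs, ys, a, b):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Invented readings: Brazilian tensile strength (MPa) vs cohesion (MPa).
bts = [2.1, 3.4, 4.0, 5.2, 6.3]
cohesion = [10.5, 14.8, 16.9, 21.0, 25.1]
a, b = simple_linear_regression(bts, cohesion)
print(round(a, 2), round(b, 2), round(r_squared(bts, cohesion, a, b), 3))
```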
Isolation-based anomaly detection using nearest-neighbor ensembles
- Authors: Bandaragoda, Tharindu , Ting, Kaiming , Albrecht, David , Liu, Fei , Zhu, Ye , Wells, Jonathan
- Date: 2018
- Type: Text , Journal article
- Relation: Computational Intelligence Vol. 34, no. 4 (2018), p. 968-998
- Full Text: false
- Reviewed:
- Description: The first successful isolation-based anomaly detector, i.e., iForest, uses trees as a means to perform isolation. Although it has been shown to have advantages over existing anomaly detectors, we have identified four weaknesses: its inability to detect local anomalies, anomalies with a high percentage of irrelevant attributes, anomalies that are masked by axis-parallel clusters, and anomalies in multimodal data sets. To overcome these weaknesses, this paper shows that an alternative isolation mechanism is required and thus presents iNNE, i.e., isolation using Nearest Neighbor Ensemble. Although relying on nearest neighbors, iNNE runs significantly faster than existing nearest neighbor–based methods such as the local outlier factor, especially in data sets having thousands of dimensions or millions of instances. This is because the proposed method has linear time complexity and constant space complexity. © 2018 Wiley Periodicals, Inc.
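The isolation mechanism this abstract describes can be sketched as follows. This is a simplified rendering, not the published iNNE algorithm: the subsample size, ensemble size, scoring details and data are all invented, but the core idea is faithful — each ensemble member builds one hypersphere per subsample point (radius = nearest-neighbour distance within the subsample), and a query falling outside every hypersphere is maximally isolated.

```python
import random

def dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def inne_scores(data, queries, psi=8, t=50, seed=0):
    """Simplified iNNE-style isolation scores in [0, 1]; higher = more anomalous."""
    rng = random.Random(seed)
    scores = [0.0] * len(queries)
    for _ in range(t):
        sample = rng.sample(data, psi)
        # Radius of each hypersphere: NN distance within the subsample.
        radius = {i: min(dist(c, o) for j, o in enumerate(sample) if j != i)
                  for i, c in enumerate(sample)}
        for qi, q in enumerate(queries):
            covering = [i for i, c in enumerate(sample) if dist(q, c) <= radius[i]]
            if not covering:
                scores[qi] += 1.0  # isolated by this ensemble member
            else:
                i = min(covering, key=radius.get)  # smallest covering sphere
                nn = min((j for j in radius if j != i),
                         key=lambda j: dist(sample[i], sample[j]))
                scores[qi] += max(0.0, 1.0 - radius[nn] / radius[i])
    return [s / t for s in scores]

# Invented data: one dense cluster in the unit square, plus a distant query.
pts = [(random.Random(i).random(), random.Random(i + 999).random())
       for i in range(60)]
s_norm, s_anom = inne_scores(pts, [(0.5, 0.5), (10.0, 10.0)])
print(round(s_norm, 2), round(s_anom, 2))
```

The distant query falls outside every hypersphere in every ensemble member, so it receives the maximum score.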
Local contrast as an effective means to robust clustering against varying densities
- Authors: Chen, Bo , Ting, Kaiming , Washio, Takashi , Zhu, Ye
- Date: 2018
- Type: Text , Journal article
- Relation: Machine Learning Vol. 107, no. 8-10 (2018), p. 1621-1645
- Full Text:
- Reviewed:
- Description: Most density-based clustering methods have difficulties detecting clusters of hugely different densities in a dataset. A recent density-based clustering CFSFDP appears to have mitigated the issue. However, through formalising the condition under which it fails, we reveal that CFSFDP still has the same issue. To address this issue, we propose a new measure called Local Contrast, as an alternative to density, to find cluster centers and detect clusters. We then apply Local Contrast to CFSFDP, and create a new clustering method called LC-CFSFDP which is robust in the presence of varying densities. Our empirical evaluation shows that LC-CFSFDP outperforms CFSFDP and three other state-of-the-art variants of CFSFDP. © 2018, The Author(s).
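Local Contrast, as described here, replaces raw density with a rank-like quantity. The sketch below is a hedged approximation, not the paper's exact definition: the density estimator (inverse k-th NN distance), k and the toy data are all invented for illustration.

```python
def dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def knn(data, i, k):
    """Indices of the k nearest neighbours of data[i] (excluding itself)."""
    order = sorted(range(len(data)), key=lambda j: dist(data[i], data[j]))
    return [j for j in order if j != i][:k]

def local_contrast(data, k):
    """LC(x): the number of x's k nearest neighbours whose density is lower
    than x's.  Density is crudely estimated as 1 / (k-th NN distance)."""
    density = [1.0 / (dist(data[i], data[knn(data, i, k)[-1]]) + 1e-12)
               for i in range(len(data))]
    return [sum(1 for j in knn(data, i, k) if density[j] < density[i])
            for i in range(len(data))]

# Invented data: a small cluster centred on index 0, plus a far-away point.
data = [(0.0, 0.0), (0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1), (3.0, 3.0)]
print(local_contrast(data, 3))
```

The cluster centre attains the maximum LC of k (all its neighbours are less dense), while the isolated point gets 0 — unlike raw density, LC is bounded by k regardless of how dense or sparse the cluster is, which is what makes it robust to varying densities.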
A comparison of bidding strategies for online auctions using fuzzy reasoning and negotiation decision functions
- Authors: Kaur, Preetinder , Goyal, Madhu , Lu, Jie
- Date: 2017
- Type: Text , Journal article
- Relation: IEEE Transactions on Fuzzy Systems Vol. 25, no. 2 (2017), p. 425-438
- Full Text:
- Reviewed:
- Description: Bidders often feel challenged when looking for the best bidding strategies to excel in the competitive environment of multiple, simultaneous online auctions for the same or similar items. Bidders face complicated decisions: which auction to participate in, whether to bid early or late, and how much to bid. In this paper, we present the design of bidding strategies that forecast the bid amounts for buyers at a particular moment in time based on their bidding behavior and their valuation of an auctioned item. The agent develops a comprehensive methodology for final price estimation, which designs bidding strategies to address buyers' different bidding behaviors using two approaches: the Mamdani method with regression analysis, and negotiation decision functions. The experimental results show that agents who follow fuzzy reasoning with a regression approach outperform other existing agents in most settings in terms of their success rate and expected utility.
A count data model for heart rate variability forecasting and premature ventricular contraction detection
- Authors: Allami, Ragheed , Stranieri, Andrew , Balasubramanian, Venki , Jelinek, Herbert
- Date: 2017
- Type: Text , Journal article
- Relation: Signal Image and Video Processing Vol. 11, no. 8 (2017), p. 1427-1435
- Full Text:
- Reviewed:
- Description: Heart rate variability (HRV) measures including the standard deviation of inter-beat variations (SDNN) require at least 5 min of ECG recordings to accurately measure HRV. In this paper, we predict, using counts data derived from a 3-min ECG recording, the 5-min SDNN and also detect premature ventricular contraction (PVC) beats with a high degree of accuracy. The approach uses counts data combined with a Poisson-generated function that requires minimal computational resources and is well suited to remote patient monitoring with wearable sensors that have limited power, storage and processing capacity. The ease of use and accuracy of the algorithm provide opportunity for accurate assessment of HRV and reduce the time taken to review patients in real time. The PVC beat detection is implemented using the same count data model together with knowledge-based rules derived from clinical knowledge.
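The two quantities this abstract turns on — SDNN and the counts summary — are both simple to compute. The sketch below is illustrative only: the RR series and bin width are invented, and the paper's Poisson-based forecast of 5-min SDNN from 3-min counts is not reproduced here.

```python
import statistics

def sdnn(rr_ms):
    """SDNN: standard deviation of the inter-beat (RR) intervals in ms."""
    return statistics.stdev(rr_ms)

def rr_counts(rr_ms, bin_ms=50):
    """Counts data: number of RR intervals per bin -- the kind of compact
    summary a low-power wearable can keep instead of the raw ECG."""
    counts = {}
    for rr in rr_ms:
        b = int(rr // bin_ms) * bin_ms
        counts[b] = counts.get(b, 0) + 1
    return counts

# Invented 10-beat RR series (ms).
rr = [812, 790, 845, 801, 778, 830, 815, 799, 822, 808]
print(round(sdnn(rr), 1), rr_counts(rr))
```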
A logical approach to experience-based reasoning
- Authors: Sun, Zhaohao
- Date: 2017
- Type: Text , Journal article , Review
- Relation: New Mathematics and Natural Computation Vol. 13, no. 1 (2017), p. 21-40
- Full Text:
- Reviewed:
- Description: Experience-based reasoning (EBR) is a paradigm used in almost every human activity as a part of human reasoning. However, EBR has not been seriously studied from a logical viewpoint. This paper will attempt to fill this gap by providing a unified logical approach to EBR. More specifically, this paper first examines EBR and inference rules. Then it proposes eight different rules of inference for EBR, which cover all possible EBRs from a logical viewpoint. These eight different rules of inference constitute the fundamentals for all EBR paradigms, and therefore will be the theoretical foundation for EBR. The proposed approach will facilitate research and development of EBR, human reasoning, and common sense reasoning. © 2017 World Scientific Publishing Company.
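The abstract's eight rules of inference are not spelled out here, but two of the classical rules they generalize — modus ponens and modus tollens — can be sketched as simple knowledge-base operations. The predicate names and knowledge base below are invented, and this encoding is only a toy illustration of rule-based inference, not the paper's logical framework.

```python
def modus_ponens(kb, rule):
    """From P and the rule P -> Q, conclude Q (or None if P is not known)."""
    p, q = rule
    return q if p in kb else None

def modus_tollens(kb, rule):
    """From not-Q and the rule P -> Q, conclude not-P."""
    p, q = rule
    return ("not", p) if ("not", q) in kb else None

# Invented knowledge base: it is raining, and the ground is not dry.
kb = {"rain", ("not", "ground_dry")}
print(modus_ponens(kb, ("rain", "wet_ground")))
print(modus_tollens(kb, ("sunny", "ground_dry")))
```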
A mathematical foundation of big data
- Authors: Sun, Zhaohao , Wang, Paul
- Date: 2017
- Type: Text , Journal article
- Relation: New Mathematics and Natural Computation Vol. 13, no. 2 (2017), p. 83-99
- Full Text: false
- Reviewed:
- Description: The recent evolution of research on big data has brought exciting aspirations to mathematicians, computer scientists and business professionals alike. However, the lack of a sound mathematical foundation presents a real challenge amidst the swarm of big data marketing activities. This paper proposes a possible mathematical theory as a foundation for big data research. Specifically, we propose treating the adjective "big" as a mathematical operator; moreover, the concept of "big" logically and naturally fits the notion of a "linguistic variable" as studied in the fuzzy logic research community for decades. Adopting such a mathematical model can be regarded as an abstraction of the technologies, systems and tools for data management and processing that transform data into big data. In addition, the concept of the infinity of big data is grounded in calculus and set theory, while the concept of the relativity of big data, as we find, rests on the operations of fuzzy subset theory. We hope the proposed approach can facilitate and open up more opportunities for big data research and development in big data analytics, business analytics, big data intelligence, big data computing and big data science. © 2017 World Scientific Publishing Company.
Data-dependent dissimilarity measure : An effective alternative to geometric distance measures
- Authors: Aryal, Sunil , Ting, Kaiming , Washio, Takashi , Haffari, Gholamreza
- Date: 2017
- Type: Text , Journal article
- Relation: Knowledge and Information Systems Vol. 53, no. 2 (2017), p. 479-506
- Full Text: false
- Reviewed:
- Description: Nearest neighbor search is a core process in many data mining algorithms. Finding reliable closest matches of a test instance is still a challenging task as the effectiveness of many general-purpose distance measures such as ℓp-norm decreases as the number of dimensions increases. Their performances vary significantly in different data distributions. This is mainly because they compute the distance between two instances solely based on their geometric positions in the feature space, and data distribution has no influence on the distance measure. This paper presents a simple data-dependent general-purpose dissimilarity measure called 'mp-dissimilarity'. Rather than relying on geometric distance, it measures the dissimilarity between two instances as a probability mass in a region that encloses the two instances in every dimension. It deems two instances in a sparse region to be more similar than two instances of equal inter-point geometric distance in a dense region. Our empirical results in k-NN classification and content-based multimedia information retrieval tasks show that the proposed mp-dissimilarity measure produces better task-specific performance than existing widely used general-purpose distance measures such as ℓp-norm and cosine distance across a wide range of moderate- to high-dimensional data sets with continuous only, discrete only, and mixed attributes.
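The data-dependent idea in this abstract can be sketched directly: replace per-dimension geometric gaps with per-dimension probability mass. This is a simplified variant assuming a power-mean combination of per-dimension masses; the paper's exact region definition and combination may differ, and the data below is invented.

```python
def mp_dissimilarity(x, y, data, p=2):
    """Data-dependent dissimilarity: per-dimension probability mass of the
    region enclosing x and y, combined with a power mean of order p."""
    n, d = len(data), len(x)
    total = 0.0
    for i in range(d):
        lo, hi = min(x[i], y[i]), max(x[i], y[i])
        mass = sum(1 for z in data if lo <= z[i] <= hi) / n  # data mass, not length
        total += mass ** p
    return (total / d) ** (1.0 / p)

# Two pairs with the SAME geometric gap, in regions of different density.
data = [(v, 0.0) for v in [0.0, 0.1, 0.2, 0.3, 0.4, 5.0, 9.0]]
dense_pair = ((0.0, 0.0), (0.4, 0.0))   # gap of 0.4 in a dense region
sparse_pair = ((5.0, 0.0), (5.4, 0.0))  # gap of 0.4 in a sparse region
print(mp_dissimilarity(*dense_pair, data) > mp_dissimilarity(*sparse_pair, data))
```

As the abstract says: equal geometric distance, but the pair in the sparse region is deemed more similar (lower dissimilarity) because less data mass falls between them.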
Defying the gravity of learning curve : A characteristic of nearest neighbour anomaly detectors
- Authors: Ting, Kaiming , Washio, Takashi , Wells, Jonathan , Aryal, Sunil
- Date: 2017
- Type: Text , Journal article
- Relation: Machine Learning Vol. 106, no. 1 (2017), p. 55-91
- Full Text: false
- Reviewed:
- Description: Conventional wisdom in machine learning says that all algorithms are expected to follow the trajectory of a learning curve which is often colloquially referred to as ‘more data the better’. We call this ‘the gravity of learning curve’, and it is assumed that no learning algorithms are ‘gravity-defiant’. Contrary to the conventional wisdom, this paper provides the theoretical analysis and the empirical evidence that nearest neighbour anomaly detectors are gravity-defiant algorithms.
Optimization based clustering algorithms for authorship analysis of phishing emails
- Authors: Seifollahi, Sattar , Bagirov, Adil , Layton, Robert , Gondal, Iqbal
- Date: 2017
- Type: Text , Journal article
- Relation: Neural Processing Letters Vol. 46, no. 2 (2017), p. 411-425
- Relation: http://purl.org/au-research/grants/arc/DP140103213
- Full Text: false
- Reviewed:
- Description: Phishing has given attackers the power to masquerade as legitimate users of organizations, such as banks, to scam money and private information from victims. Phishing is so widespread that combating the attacks could overwhelm the victim organization, so it is important to group phishing attacks in order to formulate an effective defence mechanism. In this paper, we use clustering methods to analyze and characterize phishing emails and perform their relative attribution. Emails are first tokenized to a bag-of-words space and then transformed to a numeric vector space using frequencies of words in documents. The WordNet vocabulary is used to take the effects of similar words into account and to reduce sparsity. The word similarity measure is combined with the term frequencies to introduce a novel text transformation into numeric features. To improve accuracy, we apply inverse document frequency weighting, which gives higher weights to features used by fewer authors. The k-means algorithm and three recently introduced optimization-based algorithms, MS-MGKM, INCA and DCClust, are applied for clustering. The optimization-based algorithms indicate the existence of well-separated clusters in the phishing emails dataset. © 2017, Springer Science+Business Media New York.
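The tf-idf weighting step this abstract relies on is standard and easy to sketch. The toy documents below are invented, and this omits the paper's WordNet-based similarity transformation and the clustering algorithms themselves.

```python
import math

def tf_idf(docs):
    """Term-frequency vectors reweighted by inverse document frequency,
    so terms used in fewer documents (by fewer authors) get higher weight."""
    n = len(docs)
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    vectors = []
    for doc in docs:
        tf = {}
        for term in doc:
            tf[term] = tf.get(term, 0) + 1
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

# Invented tokenized phishing emails.
docs = [["urgent", "account", "verify"],
        ["urgent", "click", "prize"],
        ["account", "verify", "urgent"]]
vecs = tf_idf(docs)
print(vecs[1])
```

A term that appears in every document ("urgent") gets weight zero, while a term unique to one author's emails ("prize") keeps full weight — exactly the discriminative signal authorship clustering needs.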
A data mining approach for machine fault diagnosis based on associated frequency patterns
- Authors: Rashid, Md. Mamunur , Amar, Muhammad , Gondal, Iqbal , Kamruzzaman, Joarder
- Date: 2016
- Type: Text , Journal article
- Relation: Applied Intelligence Vol. 45, no. 3 (2016), p. 638-651
- Full Text: false
- Reviewed:
- Description: Bearings play a crucial role in rotational machines and their failure is one of the foremost causes of breakdowns in rotary machinery. Their functionality is directly relevant to the operational performance, service life and efficiency of these machines. Therefore, bearing fault identification is very significant. The accuracy of fault or anomaly detection by the current techniques is not adequate. We propose a data mining-based framework for fault identification and anomaly detection from machine vibration data. In this framework, to capture the useful knowledge from the vibration data stream (VDS), we first pre-process the data using Fast Fourier Transform (FFT) to extract the frequency signature and then build a compact tree called SAFP-tree (sliding window associated frequency pattern tree), and propose a mining algorithm called SAFP. Our SAFP algorithm can mine associated frequency patterns (i.e., fault frequency signatures) in the current window of VDS and use them to identify faults in the bearing data. Finally, SAFP is further enhanced to SAFP-AD for anomaly detection by determining the normal behavior measure (NBM) from the extracted frequency patterns. The results show that our technique is very efficient in identifying faults and detecting anomalies over VDS and can be used for remote machine health diagnosis. © 2016, Springer Science+Business Media New York.
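The FFT preprocessing step this abstract describes — turning a vibration window into a set of salient frequency "items" for pattern mining — can be sketched with a naive DFT. The window, bin index and threshold below are invented, and the SAFP-tree mining stage itself is not reproduced.

```python
import cmath
import math

def dft_magnitudes(signal):
    """Naive DFT magnitude spectrum (a stand-in for the FFT on a short window)."""
    n = len(signal)
    return [abs(sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(n)]

def frequency_signature(signal, threshold):
    """Frequency bins whose magnitude exceeds a threshold -- the 'items' a
    pattern-mining step would track across sliding windows."""
    mags = dft_magnitudes(signal)
    return {k for k in range(len(mags) // 2 + 1) if mags[k] >= threshold}

# Invented vibration window: a DC offset plus a strong component in bin 2.
n = 16
window = [1.0 + 2.0 * math.cos(2 * math.pi * 2 * t / n) for t in range(n)]
print(frequency_signature(window, threshold=8.0))
```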
A generic ensemble approach to estimate multidimensional likelihood in Bayesian classifier learning
- Authors: Aryal, Sunil , Ting, Kaiming
- Date: 2016
- Type: Text , Journal article
- Relation: Computational Intelligence Vol. 32, no. 3 (2016), p. 458-479
- Full Text: false
- Reviewed:
- Description: In Bayesian classifier learning, estimating the joint probability distribution P(x, y) or the likelihood P(x|y) directly from training data is considered to be difficult, especially in large multidimensional data sets. To circumvent this difficulty, existing Bayesian classifiers such as Naive Bayes, BayesNet, and ADE have focused on estimating simplified surrogates of P(x, y) from different forms of one-dimensional likelihoods. Contrary to the perceived difficulty in multidimensional likelihood estimation, we present a simple generic ensemble approach to estimate multidimensional likelihood directly from data. The idea is to aggregate P(x|y) estimates obtained from random subsamples of data. This article presents two ways to estimate multidimensional likelihoods using the proposed generic approach and introduces two new Bayesian classifiers, ENNBayes and MassBayes, that estimate P(x|y) using a nearest-neighbor density estimation and a probability estimation through feature space partitioning, respectively. Unlike the existing Bayesian classifiers, ENNBayes and MassBayes have constant training time and space complexities and they scale better than existing Bayesian classifiers in very large data sets. Our empirical evaluation shows that ENNBayes and MassBayes yield better predictive accuracy than the existing Bayesian classifiers in benchmark data sets.
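The aggregation idea — averaging P(x|y) estimates over random subsamples — can be sketched with a crude nearest-neighbour density estimator. This is not the ENNBayes algorithm itself: the density estimator, subsample size, ensemble size and training data below are all invented placeholders.

```python
import random

def dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def nn_density(x, sample, eps=1e-9):
    """Crude nearest-neighbour density estimate from one subsample."""
    return 1.0 / (min(dist(x, s) for s in sample) + eps)

def ensemble_likelihood(x, class_points, psi=4, t=25, seed=0):
    """Aggregate a simple NN-density estimate of p(x | y) over t random
    subsamples of the training points belonging to class y."""
    rng = random.Random(seed)
    return sum(nn_density(x, rng.sample(class_points, min(psi, len(class_points))))
               for _ in range(t)) / t

# Invented two-class training data; x sits inside class A's region.
class_a = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.1, 0.1), (0.3, 0.1)]
class_b = [(5.0, 5.0), (5.2, 5.0), (5.0, 5.2), (5.1, 5.1), (5.3, 5.1)]
x = (0.1, 0.0)
print(ensemble_likelihood(x, class_a) > ensemble_likelihood(x, class_b))
```

A Bayesian classifier would then pick the class maximizing the prior times this aggregated likelihood.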
A novel depth edge prioritization based coding technique to boost-UP HEVC performance
- Authors: Podder, Pallab , Paul, Manoranjan , Murshed, Manzur
- Date: 2016
- Type: Text , Conference paper
- Relation: 2016 IEEE International Conference on Multimedia & Expo Workshops (ICMEW)
- Full Text: false
- Reviewed:
- Description: In addition to texture, multiview video employs depth coding for the reconstruction of 3D video and free-viewpoint video. Relying on texture-depth correlations, a number of methods in the literature reuse the texture motion vector for the corresponding depth coding, reducing encoding time by avoiding the costly motion estimation process. However, the texture similarity metric is not always equivalent to the corresponding depth similarity metric, especially at edges. Since these approaches cannot explicitly detect and encode acute edge motions of depth objects, they cannot reach similar or improved rate-distortion (RD) performance against the High Efficiency Video Coding (HEVC) reference test model (HM). To detect and model motion more accurately, the proposed technique exploits an extra Pattern Mode comprising a group of pattern templates (GPTs) with different rectangular and non-rectangular object shapes and edges compared to the existing HEVC block partitioning modes. Moreover, the proposed Pattern Mode only encodes the motion areas and skips the background areas. The experimental results show that the proposed technique saves 30% encoding time and improves average Bjontegaard delta peak signal-to-noise ratio (BD-PSNR) by 0.1 dB compared to the HM.
An algorithm for clustering using L1-norm based on hyperbolic smoothing technique
- Authors: Bagirov, Adil , Mohebi, Ehsan
- Date: 2016
- Type: Text , Journal article
- Relation: Computational Intelligence Vol. 32, no. 3 (2016), p. 439-457
- Relation: http://purl.org/au-research/grants/arc/DP140103213
- Full Text: false
- Reviewed:
- Description: Cluster analysis deals with the problem of organization of a collection of objects into clusters based on a similarity measure, which can be defined using various distance functions. The use of different similarity measures allows one to find different cluster structures in a data set. In this article, an algorithm is developed to solve clustering problems where the similarity measure is defined using the L1-norm. The algorithm is designed using the nonsmooth optimization approach to the clustering problem. Smoothing techniques are applied to smooth both the clustering function and the L1-norm. The algorithm computes clusters sequentially and finds global or near global solutions to the clustering problem. Results of numerical experiments using 12 real-world data sets are reported, and the proposed algorithm is compared with two other clustering algorithms. ©2015 Wiley Periodicals, Inc.
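The core smoothing trick in this entry is replacing the nonsmooth absolute value with a hyperbolic approximation, which makes the L1 clustering objective differentiable. The sketch below shows only that trick; the clustering algorithm built on top of it, and the parameter value, are not reproduced here.

```python
def smooth_abs(x, tau):
    """Hyperbolic smoothing of |x|: sqrt(x^2 + tau^2) is smooth everywhere
    and approaches |x| as the smoothing parameter tau -> 0."""
    return (x * x + tau * tau) ** 0.5

def smooth_l1(p, q, tau):
    """Smoothed L1 distance between two points, usable inside a
    gradient-based clustering objective."""
    return sum(smooth_abs(a - b, tau) for a, b in zip(p, q))

print(smooth_abs(-3.0, 0.0))           # exact |x| when tau = 0
print(smooth_l1((0, 0), (3, 4), 1e-6)) # close to |3| + |4| = 7 for tiny tau
```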
Commentary : A decomposition of the outlier detection problem into a set of supervised learning problems
- Authors: Zhu, Ye , Ting, Kaiming
- Date: 2016
- Type: Text , Journal article
- Relation: Machine Learning Vol. 105, no. 2 (2016), p. 301-304
- Full Text: false
- Reviewed:
- Description: This article discusses the material in relation to iForest (Liu et al. in ACM Trans Knowl Discov Data 6(1):3, 2012) reported in a recent Machine Learning Journal paper by Paulheim and Meusel (Mach Learn 100(2–3):509–531, 2015). It presents an empirical comparison result of iForest using the default parameter settings suggested by its creator (Liu et al. 2012) and iForest using the settings employed by Paulheim and Meusel (2015). This comparison has an impact on the conclusion made by Paulheim and Meusel (2015). © 2016, The Author(s).
Constrained self organizing maps for data clusters visualization
- Authors: Mohebi, Ehsan , Bagirov, Adil
- Date: 2016
- Type: Text , Journal article
- Relation: Neural Processing Letters Vol. 43, no. 3 (2016), p. 849-869
- Full Text: false
- Reviewed:
- Description: High-dimensional data visualization is one of the main tasks in the field of data mining and pattern recognition. The self-organizing map (SOM) is a topology-visualizing tool containing a set of neurons that gradually adapt to the input data space by competitive learning and form clusters. The topology preservation of the SOM strongly depends on the learning process; due to this limitation, one cannot guarantee the convergence of the SOM on data sets with clusters of arbitrary shape. In this paper, we introduce the Constrained SOM (CSOM), a new version of the SOM with a modified learning algorithm. The idea is to introduce an adaptive constraint parameter into the learning process to improve the topology preservation and mapping quality of the basic SOM. The computational complexity of the CSOM is less than that of the SOM. The proposed algorithm is compared with similar topology-preserving algorithms, and the numerical results on eight small to large real-world data sets demonstrate the efficiency of the proposed algorithm. © 2015, Springer Science+Business Media New York.
Density-ratio based clustering for discovering clusters with varying densities
- Authors: Zhu, Ye , Ting, Kaiming , Carman, Mark
- Date: 2016
- Type: Text , Journal article
- Relation: Pattern Recognition Vol. 60, no. (2016), p. 983-997
- Full Text: false
- Reviewed:
- Description: Density-based clustering algorithms are able to identify clusters of arbitrary shapes and sizes in a dataset which contains noise. It is well-known that most of these algorithms, which use a global density threshold, have difficulty identifying all clusters in a dataset having clusters of greatly varying densities. This paper identifies and analyses the condition under which density-based clustering algorithms fail in this scenario. It proposes a density-ratio based method to overcome this weakness, and reveals that it can be implemented in two approaches. One approach is to modify a density-based clustering algorithm to do density-ratio based clustering by using its density estimator to compute density-ratio. The other approach involves rescaling the given dataset only. An existing density-based clustering algorithm, which is applied to the rescaled dataset, can find all clusters with varying densities that would otherwise be impossible had the same algorithm been applied to the unscaled dataset. We provide an empirical evaluation using DBSCAN, OPTICS and SNN to show the effectiveness of these two approaches. © 2016 Elsevier Ltd
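The density-ratio idea can be illustrated with a minimal sketch: compare the density in a small neighbourhood to the density in a larger one. This is not the paper's estimator or its ReScale procedure — the radii and 1-D data below are invented — but it shows why a single global threshold works on ratios when it fails on raw densities.

```python
def dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def density(p, data, radius):
    """Fraction of the data set within `radius` of p."""
    return sum(1 for q in data if dist(p, q) <= radius) / len(data)

def density_ratio(p, data, eps, eta):
    """Ratio of local density (radius eps) to the density of a larger
    neighbourhood (radius eta > eps)."""
    d_large = density(p, data, eta)
    return density(p, data, eps) / d_large if d_large else 0.0

# Invented 1-D data: a dense cluster near 0 and a sparse cluster near 12.
data = [(v,) for v in [0.0, 0.05, 0.1, 0.15, 0.2, 10.0, 11.0, 12.0, 13.0, 14.0]]
dense_core, sparse_core = (0.1,), (12.0,)
# Raw densities differ a lot, so one global density threshold misses the
# sparse cluster's core -- but both cores score a high density-ratio.
print(density(dense_core, data, 1.0), density(sparse_core, data, 1.0))
print(density_ratio(dense_core, data, 1.0, 5.0),
      density_ratio(sparse_core, data, 1.0, 5.0))
```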
Enhancing SIFT-based image registration performance by building and selecting highly discriminating descriptors
- Authors: Lv, Guohua , Teng, Shyh , Lu, Guojun
- Date: 2016
- Type: Text , Journal article
- Relation: Pattern Recognition Letters Vol. 84, no. (2016), p. 156-162
- Full Text: false
- Reviewed:
- Description: In this paper we will investigate the gradient utilization in building SIFT (Scale Invariant Feature Transform)-like descriptors for image registration. There are generally two types of gradient information, i.e. gradient magnitude and gradient occurrence, which can be used for building SIFT-like descriptors. We will provide a theoretical analysis on the effectiveness of each of the two types of gradient information when used individually. Based on our analysis, we will propose a novel technique which systematically uses both types of gradient information together for image registration. Moreover, we will propose a strategy to select keypoint matches with a higher discrimination. The proposed technique can be used for both mono-modal and multi-modal image registration. Our experimental results show that the proposed technique improves registration accuracy over existing SIFT-like descriptors. © 2016 Elsevier B.V.
Exploring the existence and potential underpinnings of dog-human and horse-human attachment bonds
- Authors: Payne, Elyssa , DeAraugo, Jodi , Bennett, Pauleen , McGreevy, Paul
- Date: 2016
- Type: Text , Journal article , Review
- Relation: Behavioural Processes Vol. 125, no. (2016), p. 114-121
- Full Text: false
- Reviewed:
- Description: This article reviews evidence for the existence of attachment bonds directed toward humans in dog-human and horse-human dyads. It explores each species' alignment with the four features of a typical attachment bond: separation-related distress, safe haven, secure base and proximity seeking. While dog-human dyads show evidence of each of these, there is limited alignment for horse-human dyads. These differences are discussed in the light of the different selection paths of domestic dogs and horses as well as the different contexts in which the two species interact with humans. The role of emotional intelligence in humans as a potential mediator for human-animal relationships, attachment or otherwise, is also examined. Finally, future studies, which may clarify the interplay between attachment, human-animal relationships and emotional intelligence, are proposed. Such avenues of research may help us explore the concepts of trust and bonding that are often said to occur at the dog-human and horse-human interface. © 2015.