A CAD system using clustering and novel feature extraction technique
- Authors: Ghosh, Ranadhir , Ghosh, Moumita , Yearwood, John
- Date: 2005
- Type: Text , Conference paper
- Relation: Paper presented at CISTM 2005, Gurgaon, India : 24th - 26th July, 2005
- Full Text: false
- Reviewed:
- Description: Many previous efforts have applied a variety of ANN classifier-modelling techniques to recognition in breast cancer detection. Most previous work concentrated on classifying damaged areas with the help of doctors’ suggestions: doctors mark the suspicious areas in the mammogram, and the classifier extracts only those marked areas and tries to classify them. An intelligent automatic diagnosis system can be very helpful to radiologists in diagnosing breast cancer. In this research we apply a local search, gradient-free clustering algorithm to find the suspicious or damaged areas, and we compare our results with the doctors’ markings. It has also been observed that, beyond a certain point, the inclusion of additional features leads to worse rather than better performance. Moreover, the choice of features to represent the patterns affects several aspects of pattern recognition problems, such as accuracy, required learning time and the necessary number of samples. A common problem with multi-category feature classification is conflict between the categories: no feasible solution is simultaneously optimal for all categories. To find an optimal solution, the search space can be divided by individual category into sub-regions, and the results are finally merged through a decision support system. Combining feature selection with the classifier has been a major challenge for researchers. Employing a similar technique at both levels often worsens their performance. Some preliminary studies have revealed that a traditional canonical GA is a good choice for the feature selection module but underperforms in the classifier-level module. An evolutionary algorithm at the classifier level provides a much better solution for this purpose. 
In this paper we propose a hybrid canonical based feature extraction technique with a combination of evolutionary algorithm based classifier using a feed forward MLP model.
- Description: E1
- Description: 2003001369
A framework for clustering and dynamic maintenance of xml documents
- Authors: Al-Shammari, Ahmed , Liu, Chengfei , Naseriparsa, Mehdi , Vo, Bao , Anwar, Tarique , Zhou, Rui
- Date: 2017
- Type: Text , Conference paper
- Relation: 13th International Conference on Advanced Data Mining and Applications, ADMA 2017 Vol. 10604 LNAI, p. 399-412
- Full Text: false
- Reviewed:
- Description: Web data clustering has been widely studied in the data mining communities. However, dynamic maintenance of web data clusters is still a challenging task. In this paper, we propose a novel framework called XClusterMaint which serves for both clustering and maintenance of XML documents. For clustering, we take both structure and content into account and propose an efficient solution for grouping the documents based on the combination of structure and content similarity. For maintenance, we propose an incremental approach for maintaining the existing clusters dynamically when we receive new incoming XML documents. Since the dynamic maintenance of the clusters is computationally expensive, we also propose an improved approach which uses a lazy maintenance scheme to improve the performance of cluster maintenance. The experimental results on real datasets verify the efficiency of the proposed clustering and maintenance model. © Springer International Publishing AG 2017.
A general stochastic clustering method for automatic cluster discovery
- Authors: Tan, Swee , Ting, Kaiming , Teng, Shyh
- Date: 2011
- Type: Text , Journal article
- Relation: Pattern Recognition Vol. 44, no. 10-11 (2011), p. 2786-2799
- Full Text: false
- Reviewed:
- Description: Finding clusters in data is a challenging problem. Given a dataset, we usually do not know the number of natural clusters hidden in the dataset. The problem is exacerbated when there is little or no additional information except the data itself. This paper proposes a general stochastic clustering method that is a simplification of the nature-inspired ant-based clustering approach. It begins with a basic solution and then performs stochastic search to incrementally improve the solution until the underlying clusters emerge, resulting in automatic cluster discovery in datasets. This method differs from several recent methods in that it does not require users to input the number of clusters and it makes no explicit assumption about the underlying distribution of a dataset. Our experimental results show that the proposed method performs better than several existing methods in terms of clustering accuracy and efficiency in the majority of the datasets used in this study. Our theoretical analysis shows that the proposed method has linear time and space complexities, and our empirical study shows that it can accurately and efficiently discover clusters in large datasets on which many existing methods fail to run.
A hybrid clustering algorithm using two level of abstraction
- Authors: Ghosh, Ranadhir , Mammadov, Musa , Ghosh, Moumita , Yearwood, John
- Date: 2005
- Type: Text , Conference paper
- Relation: Paper presented at Fuzzy Logic, Soft Computing, and Computational Intelligence, 11th International Fuzzy Systems Association World Congress, Beijing, China : 28th - 31st July, 2005
- Full Text: false
- Reviewed:
- Description: E1
- Description: 2003001360
A new modification of Kohonen neural network for VQ and clustering problems
- Authors: Mohebi, Ehsan , Bagirov, Adil
- Date: 2013
- Type: Text , Conference paper
- Relation: Proceedings of the 11-th Australasian Data Mining Conference (AusDM'13) Vol. 146, p. 81-88
- Full Text: false
- Reviewed:
- Description: Vector Quantization (VQ) and clustering are significantly important in the field of data mining and pattern recognition. The Self Organizing Map (SOM) has been widely used for clustering and topology visualization. The topology of the SOM and its initial neurons play an important role in the convergence of the Kohonen neural network. In this paper, in order to improve the convergence of the SOM, we introduce an algorithm based on the splitting and merging of clusters to initialize neurons. We also introduce a topology based on this initialization to optimize the vector quantization error. Such an approach allows one to find a global or near-global solution to the vector quantization problem and, consequently, the clustering problem. Numerical results on 4 small to large real-world data sets are reported to demonstrate the performance of the proposed algorithm.
A new modified global k-means algorithm for clustering large data sets
- Authors: Bagirov, Adil , Ugon, Julien , Webb, Dean
- Date: 2009
- Type: Text , Conference paper
- Relation: Paper presented at XIIIth International Conference : Applied Stochastic Models and Data Analysis, ASMDA 2009, Vilnius, Lithuania : 30th June - 3rd July 2009 p. 1-5
- Full Text: false
- Description: The k-means algorithm and its variations are known to be fast clustering algorithms. However, they are sensitive to the choice of starting points and inefficient for solving clustering problems in large data sets. Recently, in order to resolve difficulties with the choice of starting points, incremental approaches have been developed. The modified global k-means algorithm is based on such an approach. It iteratively adds one cluster center at a time. Numerical experiments show that this algorithm considerably improves on the k-means algorithm. However, this algorithm is not suitable for clustering very large data sets. In this paper, a new version of the modified global k-means algorithm is proposed. We introduce an auxiliary cluster function to generate a set of starting points spanning different parts of the data set. We exploit information gathered in previous iterations of the incremental algorithm to reduce its complexity.
- Description: 2003007558
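The incremental idea described in this abstract (start with one center, then add one center at a time and refine) can be illustrated with a minimal sketch. This is not the authors' algorithm: the starting-point rule below is the well-known "largest guaranteed reduction" heuristic from the fast global k-means family, used here as a stand-in for the auxiliary cluster function, and all function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def lloyd(points, centers, iters=100):
    """Plain k-means refinement starting from the given centers."""
    centers = list(centers)
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            groups[j].append(p)
        # An empty group keeps its previous center.
        new = [mean(g) if g else centers[j] for j, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return centers


def incremental_kmeans(points, k):
    """Add one center at a time, refining with k-means after each addition."""
    centers = [mean(points)]  # the 1-clustering solution is the data centroid
    while len(centers) < k:
        # Distance of each point to its nearest existing center.
        nearest = [min(sq_dist(p, c) for c in centers) for p in points]

        def gain(candidate):
            # Guaranteed decrease of the objective if a center is placed here.
            return sum(max(d - sq_dist(candidate, p), 0.0)
                       for p, d in zip(points, nearest))

        start = max(points, key=gain)  # assumption: candidates are data points
        centers = lloyd(points, centers + [start])
    return centers
```

Evaluating every data point as a candidate costs quadratic time per added center, which is the kind of cost the abstract says makes such methods unsuitable for very large data sets.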
A new optimal power flow approach for wind energy integrated power systems
- Authors: Rahmani, Shima , Amjady, Nima
- Date: 2017
- Type: Text , Journal article
- Relation: Energy Vol. 134, no. (2017), p. 349-359
- Full Text: false
- Reviewed:
- Description: Penetration of wind generation into power systems in recent years has greatly affected optimal power flow (OPF) because of the uncertain behavior of this new energy resource. In this research work, firstly, a novel scenario generation approach is proposed to model wind power (WP) uncertainty. The proposed scenario generation approach includes construction of the probability density function (PDF) of the WP forecast error, segmentation of the PDF by an efficient clustering approach to obtain both the optimal number and the optimal arrangement of the clusters, and generation of WP scenarios from the optimized clusters through a roulette wheel mechanism. Secondly, this paper presents a new OPF framework based on DC network modeling for wind generation integrated power systems. Thirdly, a new out-of-sample analysis is presented to evaluate the long-run performance of the proposed OPF approach under various realizations of uncertain WPs. Finally, the performance of the proposed method for solving the WP-integrated OPF problem is extensively illustrated on the IEEE 30-bus and the IEEE 118-bus test systems and compared with the performance of the deterministic method and the Weibull PDF method. These comparisons illustrate the better performance of the proposed method, with reasonable computation times. • A new scenario generation approach is presented. • A new wind power integrated optimal power model is proposed. • A new out-of-sample analysis is presented. • The effectiveness of the proposed model is extensively illustrated.
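The roulette wheel step mentioned in this abstract can be sketched as follows. This is a generic illustration of roulette wheel (fitness-proportionate) sampling, not the paper's implementation: the cluster representatives, their probabilities, and the function names are assumptions made for the example.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def roulette_wheel(probabilities, rng):
    """Return an index drawn with probability proportional to its weight."""
    cumulative = list(accumulate(probabilities))
    spin = rng.random() * cumulative[-1]  # point on the wheel in [0, total)
    # First slot whose cumulative weight exceeds the spin; the min() guards
    # against floating-point round-off at the upper end of the wheel.
    return min(bisect_right(cumulative, spin), len(probabilities) - 1)


def generate_scenarios(cluster_values, probabilities, count, seed=0):
    """Draw `count` scenarios, each the representative value of one cluster.

    cluster_values: hypothetical forecast-error representative per cluster.
    probabilities:  hypothetical probability mass of each cluster.
    """
    rng = random.Random(seed)
    return [cluster_values[roulette_wheel(probabilities, rng)]
            for _ in range(count)]
```

For example, `generate_scenarios([-0.1, 0.0, 0.1], [0.25, 0.5, 0.25], 100)` would draw 100 forecast-error scenarios, with the middle cluster selected about half of the time.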
A new perceptual dissimilarity measure for image retrieval and clustering
- Authors: Shojanazeri, Hamid
- Date: 2018
- Type: Text , Thesis , PhD
- Full Text:
- Description: Image retrieval and clustering are two important tools for analysing and organising images. The dissimilarity measure is central to both, and the performance of image retrieval and clustering algorithms depends on its effectiveness. ‘Minkowski’ distance, or more specifically ‘Euclidean’ distance, is the most widely used dissimilarity measure in image retrieval and clustering. Euclidean distance depends only on the geometric position of two data instances in the feature space and completely ignores the data distribution. However, data distribution has an effect on human perception. Psychologists have argued that two data instances in a dense area are perceptually more dissimilar than the same two instances in a sparser area. Based on this idea, a dissimilarity measure called ‘mp’ has been proposed to address Euclidean distance’s limitation of ignoring the data distribution. Here, mp relies on data distribution to calculate the dissimilarity between two instances: higher data mass between two data instances implies higher dissimilarity, and vice versa. mp relies only on data distribution and completely ignores geometric distance in its calculations. When aggregating the dissimilarities between two instances over all the dimensions of the feature space, both Euclidean distance and mp give the same priority to all the dimensions. As a result, the final dissimilarity between two data instances may be determined by a few dimensions of the feature vectors with relatively much higher values, and the derived dissimilarity may not align well with human perception. This thesis is motivated by the need to address the limitations of Minkowski distance measures, and by the importance of a dissimilarity measure that considers both geometric distance and the perceptual effect of data distribution when measuring dissimilarity between images. 
It studies the performance of mp for image retrieval. It investigates a new dissimilarity measure that combines both Euclidean distance and data distribution. In addition, it studies the performance of such a dissimilarity measure for image retrieval and clustering. Our performance study of mp for image retrieval shows that relying only on data distribution to measure dissimilarity leads to situations where mp’s measurement is contrary to human perception. This thesis introduces a new dissimilarity measure called the perceptual dissimilarity measure (PDM). PDM considers the perceptual effect of data distribution in combination with Euclidean distance. PDM has two variants, PDM1 and PDM2. PDM1 focuses on improving mp by weighting it using Euclidean distance in situations where mp may not retrieve accurate results. PDM2 considers the effect of data distribution on the perceived dissimilarity measured by Euclidean distance, and proposes a weighting system for Euclidean distance using a logarithmic transform of data mass. The proposed PDM variants have been used as alternatives to Euclidean distance and mp to improve the accuracy of image retrieval. Our results show that PDM2 consistently performed the best, compared to Euclidean distance, mp and PDM1. PDM1’s performance was not consistent: it performed better than mp in all the experiments, but it could not outperform Euclidean distance in some cases. Following the promising results of PDM2 in image retrieval, we studied its performance for image clustering. k-means is the most widely used clustering algorithm in scientific and industrial applications. k-medoids is the closest clustering algorithm to k-means. Unlike k-means, which works only with Euclidean distance, k-medoids allows an arbitrary dissimilarity measure to be chosen. We used Euclidean distance, mp and PDM2 as the dissimilarity measure in k-medoids and compared the results with k-means. 
Our clustering results show that PDM2 performed the best overall. This confirms our retrieval results and identifies PDM2 as a suitable dissimilarity measure for image retrieval and clustering.
- Description: Doctor of Philosophy
A new solar power prediction method based on feature clustering and hybrid-classification-regression forecasting
- Authors: Nejati, Maryam , Amjady, Nima
- Date: 2022
- Type: Text , Journal article
- Relation: IEEE transactions on sustainable energy Vol. 13, no. 2 (2022), p. 1188-1198
- Full Text: false
- Reviewed:
- Description: Solar generation systems are growing globally in both scale and number, which highlights the increasing importance of solar power forecasting. In this paper, a day-ahead solar power prediction method is proposed including 1) a novel feature selecting/clustering approach based on relevancy and redundancy criteria and 2) an innovative hybrid-classification-regression forecasting engine. The proposed feature selecting/clustering approach filters out irrelevant features and partitions relevant features into two separate subsets to decrease the redundancy of features. Each of these two subsets is separately used to train one forecasting engine, and the final solar power prediction of the proposed method is obtained by a relevancy-based combination of these two forecasts. The proposed forecasting engine classifies the historical data based on the learnability of its constituent regression models and assigns each class of training samples to one regression model. Each regression model predicts the outputs of the test samples that belong to its class. The effectiveness of the proposed solar power prediction method is illustrated by testing on two real-world solar farms.
An application of novel clustering technique for information security
- Authors: Beliakov, Gleb , Yearwood, John , Kelarev, Andrei
- Date: 2011
- Type: Text , Conference paper
- Relation: Applications and Techniques in Information Security Workshop p. 5-11
- Full Text: false
- Reviewed:
- Description: This article presents experimental results devoted to a new application of the novel clustering technique introduced by the authors recently. Our aim is to facilitate the application of robust and stable consensus functions in information security, where it is often necessary to process large data sets and monitor outcomes in real time, as is required, for example, for intrusion detection. Here we concentrate on the particular case of application to profiling of phishing websites. First, we apply several independent clustering algorithms to a randomized sample of data to obtain independent initial clusterings. The silhouette index is used to determine the number of clusters. Second, we use a consensus function to combine these independent clusterings into one consensus clustering. Feature ranking is used to select a subset of features for the consensus function. Third, we train fast supervised classification algorithms on the resulting consensus clustering in order to enable them to process the whole large data set as well as new data. The precision and recall of classifiers at the final stage of this scheme are critical for the effectiveness of the whole procedure. We investigated various combinations of three consensus functions, Cluster-Based Graph Formulation (CBGF), Hybrid Bipartite Graph Formulation (HBGF), and Instance-Based Graph Formulation (IBGF), and a variety of supervised classification algorithms. The best precision and recall have been obtained by the combination of the HBGF consensus function and the SMO classifier with the polynomial kernel.
- Description: 2003009195
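The abstract above uses the silhouette index to determine the number of clusters. A minimal sketch of the standard silhouette computation is given below, assuming Euclidean distance and points given as numeric tuples; the function name is hypothetical and nothing here is taken from the authors' code. In use, one would compute the index for clusterings with different numbers of clusters and keep the number that scores highest.

```python
from math import dist  # Euclidean distance, Python 3.8+


def silhouette_index(points, labels):
    """Mean silhouette width over all points (range -1 to 1, higher is better)."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette index needs at least two clusters")

    total = 0.0
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # by convention a singleton cluster contributes 0
        # a: mean distance to the other members of the point's own cluster
        # (the point itself adds a distance of 0 to the sum).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != label)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)
```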
An experiment in task decomposition and ensembling for a modular artificial neural network
- Authors: Ferguson, Brent , Ghosh, Ranadhir , Yearwood, John
- Date: 2004
- Type: Text , Conference paper
- Relation: Paper presented at Innovations in Applied Artificial Intelligence: 17th International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, Ottawa, Canada : 17th May, 2004
- Full Text:
- Reviewed:
- Description: Modular neural networks have the potential to overcome common scalability and interference problems experienced by fully connected neural networks when applied to large databases. In this paper we trial an approach to constructing modular ANNs for a very large problem from CEDAR for the classification of handwritten characters. In our approach, we apply progressive task decomposition methods based upon clustering and regression techniques to find modules. We then test methods for combining the modules into ensembles and compare their structural characteristics and classification performance with those of an ANN having a fully connected topology. The results reveal improvements to classification rates as well as network topologies for this problem.
- Description: E1
- Description: 2003000852
An overview of geospatial methods used in unintentional injury epidemiology
- Authors: Singh, Himalaya , Fortington, Lauren , Thompson, Helen , Finch, Caroline
- Date: 2016
- Type: Text , Journal article
- Relation: Injury Epidemiology Vol. 3, no. 32 (2016), p. 1-12
- Relation: http://purl.org/au-research/grants/nhmrc/1058737
- Full Text:
- Reviewed:
- Description: BACKGROUND: Injuries are a leading cause of death and disability around the world. Injury incidence is often associated with socio-economic and physical environmental factors. The application of geospatial methods has been recognised as important to gain greater understanding of the complex nature of injury and the associated diverse range of geographically-diverse risk factors. Therefore, the aim of this paper is to provide an overview of geospatial methods applied in unintentional injury epidemiological studies. METHODS: Nine electronic databases were searched for papers published in 2000-2015, inclusive. Included were papers reporting unintentional injuries using geospatial methods for one or more categories of spatial epidemiological methods (mapping; clustering/cluster detection; and ecological analysis). Results describe the included injury cause categories, types of data and details relating to the applied geospatial methods. RESULTS: From over 6,000 articles, 67 studies met all inclusion criteria. The major categories of injury data reported with geospatial methods were road traffic (n = 36), falls (n = 11), burns (n = 9), drowning (n = 4), and others (n = 7). Grouped by categories, mapping was the most frequently used method, with 62 (93%) studies applying this approach independently or in conjunction with other geospatial methods. Clustering/cluster detection methods were less common, applied in 27 (40%) studies. Three studies (4%) applied spatial regression methods (one study using a conditional autoregressive model and two studies using geographically weighted regression) to examine the relationship between injury incidence (drowning, road deaths) with aggregated data in relation to explanatory factors (socio-economic and environmental). CONCLUSION: The number of studies using geospatial methods to investigate unintentional injuries has increased over recent years. 
While the majority of studies have focused on road traffic injuries, other injury cause categories, particularly falls and burns, have also demonstrated the application of these methods. Geospatial investigations of injury have largely been limited to mapping of data to visualise spatial structures. Use of more sophisticated approaches will help to understand a broader range of spatial risk factors, which remain under-explored when using traditional epidemiological approaches.
Application of rank correlation, clustering and classification in information security
- Authors: Beliakov, Gleb , Yearwood, John , Kelarev, Andrei
- Date: 2012
- Type: Text , Journal article
- Relation: Journal of Networks Vol. 7, no. 6 (2012), p. 935-945
- Full Text:
- Reviewed:
- Description: This article is devoted to an experimental investigation of a novel application of a clustering technique introduced by the authors recently in order to use robust and stable consensus functions in information security, where it is often necessary to process large data sets and monitor outcomes in real time, as is required, for example, for intrusion detection. Here we concentrate on a particular case of application to profiling of phishing websites. First, we apply several independent clustering algorithms to a randomized sample of data to obtain independent initial clusterings. The silhouette index is used to determine the number of clusters. Second, rank correlation is used to select a subset of features for dimensionality reduction. We investigate the effectiveness of the Pearson Linear Correlation Coefficient, the Spearman Rank Correlation Coefficient and the Goodman-Kruskal Correlation Coefficient in this application. Third, we use a consensus function to combine independent initial clusterings into one consensus clustering. Fourth, we train fast supervised classification algorithms on the resulting consensus clustering in order to enable them to process the whole large data set as well as new data. The precision and recall of classifiers at the final stage of this scheme are critical for the effectiveness of the whole procedure. We investigated various combinations of several correlation coefficients, consensus functions, and a variety of supervised classification algorithms. © 2012 Academy Publisher.
- Description: 2003010277
Applications of functional data analysis : A systematic review
- Authors: Ullah, Shahid , Finch, Caroline
- Date: 2013
- Type: Text , Journal article
- Relation: BMC Medical Research Methodology Vol. 13, no. 43 (2013), p. 1-12
- Relation: http://purl.org/au-research/grants/nhmrc/565900
- Full Text:
- Reviewed:
- Description: BACKGROUND: Functional data analysis (FDA) is increasingly being used to better analyze, model and predict time series data. Key aspects of FDA include the choice of smoothing technique, data reduction, adjustment for clustering, functional linear modeling and forecasting methods. METHODS: A systematic review using 11 electronic databases was conducted to identify FDA application studies published in the peer-reviewed literature during 1995–2010. Papers reporting methodological considerations only were excluded, as were non-English articles. RESULTS: In total, 84 FDA application articles were identified; 75.0% of the reviewed articles have been published since 2005. Applications of FDA have appeared in a large number of publications across various fields of science; the largest share relates to biomedical applications (21.4%). Overall, 72 studies (85.7%) provided information about the type of smoothing techniques used, with B-spline smoothing (29.8%) being the most popular. Functional principal component analysis (FPCA) for extracting information from functional data was reported in 51 (60.7%) studies. One-quarter (25.0%) of the published studies used functional linear models to describe relationships between explanatory and outcome variables and only 8.3% used FDA for forecasting time series data. CONCLUSIONS: Despite its clear benefits for analyzing time series data, full appreciation of the key features and value of FDA has been limited to date, though the applications show its relevance to many public health and biomedical problems. Wider application of FDA to all studies involving correlated measurements should allow better modeling of, and predictions from, such data in the future, especially as FDA makes no a priori assumptions about age and time effects.
Constrained self organizing maps for data clusters visualization
- Authors: Mohebi, Ehsan , Bagirov, Adil
- Date: 2016
- Type: Text , Journal article
- Relation: Neural Processing Letters Vol. 43, no. 3 (2016), p. 849-869
- Full Text: false
- Reviewed:
- Description: High dimensional data visualization is one of the main tasks in the field of data mining and pattern recognition. The self organizing map (SOM) is a topology visualizing tool that contains a set of neurons that gradually adapt to the input data space by competitive learning and form clusters. The topology preservation of the SOM strongly depends on the learning process. Due to this limitation, one cannot guarantee the convergence of the SOM on data sets with clusters of arbitrary shape. In this paper, we introduce the Constrained SOM (CSOM), a new version of the SOM obtained by modifying the learning algorithm. The idea is to introduce an adaptive constraint parameter to the learning process to improve the topology preservation and mapping quality of the basic SOM. The computational complexity of the CSOM is lower than that of the SOM. The proposed algorithm is compared with similar topology preservation algorithms, and the numerical results on eight small to large real-world data sets demonstrate the efficiency of the proposed algorithm. © 2015, Springer Science+Business Media New York.
Data sharing in secure multimedia wireless sensor networks
- Authors: Usman, Muhammad , Jan, Mian Ahmad , Xiangjian, He , Nanda, Priyadarsi
- Date: 2016
- Type: Text , Conference proceedings
- Relation: 2016 IEEE Trustcom/BigDataSE/ISPA; Tianjin, China; 23-26 August 2016 p. 590-597
- Full Text: false
- Reviewed:
- Description: The use of Multimedia Wireless Sensor Networks (MWSNs) is becoming common nowadays with the rapid growth in communication facilities. Like any other WSNs, these networks face various challenges in providing security, trust and privacy for user data. Provisioning of these services becomes an uphill task, especially when dealing with real-time streaming data. These networks operate with resource-constrained sensor nodes for days, months and even years, depending on the nature of the application. The resource-constrained nature of these networks makes it difficult for the nodes to handle real-time data in mission-critical applications such as military surveillance, forest fire monitoring, health-care and industrial automation. For a secured MWSN, the transmission and processing of streaming data need to be explored deeply. Conventional data authentication schemes are not suitable for MWSNs due to the limitations imposed on sensor nodes in terms of battery power, computation, available bandwidth and storage. In this paper, we propose a novel quality-driven clustering-based technique for authenticating streaming data in MWSNs. Nodes with maximum energy are selected as Cluster Heads (CHs). The CHs collect data from member nodes and forward it to the Base Station (BS), thus preventing member nodes with low energy from dying soon and increasing the life span of the underlying network. The proposed approach not only authenticates the streaming data but also maintains the quality of the transmitted data. The proposed data authentication scheme coupled with an Error Concealment technique provides energy-efficient and distortion-free real-time data streaming. The proposed scheme is compared with an unsupervised resources scenario. The simulation results demonstrate better network lifetime along with a 21.34 dB gain in Peak Signal-to-Noise Ratio (PSNR) of received video data streams.
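The cluster head rule stated in this abstract (nodes with maximum energy become Cluster Heads) can be sketched in a few lines. The data layout, the node and cluster identifiers, and the tie-breaking rule are assumptions made for this illustration; the abstract does not describe how clusters are formed or how ties are resolved.

```python
def select_cluster_heads(clusters):
    """Pick the node with the highest residual energy in each cluster.

    clusters: hypothetical mapping of cluster id -> {node id: residual energy}.
    Ties are broken by the smallest node id so the result is deterministic
    (an assumption of this sketch).
    """
    heads = {}
    for cluster_id, nodes in clusters.items():
        # Sort key: highest energy first, then smallest node id.
        heads[cluster_id] = min(nodes, key=lambda n: (-nodes[n], n))
    return heads


if __name__ == "__main__":
    network = {
        "A": {"n1": 0.40, "n2": 0.90, "n3": 0.55},
        "B": {"n4": 0.70, "n5": 0.70},
    }
    print(select_cluster_heads(network))
```

Because the heads carry the forwarding load to the Base Station, a scheme along these lines would typically re-run the selection as energy levels change, although the abstract does not say whether the authors do so.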
Dynamic Bayesian network modeling of cyanobacterial biological processes via gene clustering
- Authors: Nguyen, Vinh , Chetty, Madhu , Coppel, Ross , Wangikar, Pramod
- Date: 2011
- Type: Text , Conference paper
- Relation: 18th International Conference on Neural Information Processing, ICONIP 2011; Shanghai; China; 13th-17th November 2011; published in Lecture Notes in Computer Science series, Vol. 7062 (1), p. 97-106
- Full Text: false
- Reviewed:
- Description: Cyanobacteria are photosynthetic organisms that are credited with both the creation and replenishment of the oxygen-rich atmosphere, and are also responsible for more than half of the primary production on earth. Despite their crucial evolutionary and environmental roles, the study of these organisms has lagged behind other model organisms. This paper presents preliminary results on our ongoing research to unravel the biological interactions occurring within cyanobacteria. We develop an analysis framework that leverages recently developed bioinformatics and machine learning tools, such as genome-wide sequence matching based annotation, gene ontology analysis, cluster analysis and dynamic Bayesian network. Together, these tools allow us to overcome the lack of knowledge of less well-studied organisms, and reveal interesting relationships among their biological processes. Experiments on the Cyanothece bacterium demonstrate the practicability and usefulness of our approach. © 2011 Springer-Verlag.
- Description: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2011, Vol.7062 (1), pp.97-106
Establishing phishing provenance using orthographic features
- Authors: Liping, Ma , Yearwood, John , Watters, Paul
- Date: 2009
- Type: Text , Conference paper
- Relation: Paper presented at 2009 eCrime Researchers Summit, eCRIME '09, Tacoma, Washington : 20th-21st October 2009
- Full Text:
- Description: After phishing message detection, determining the provenance of phishing messages and websites is the second step in tracing cybercriminals. In this paper, we present a novel method to cluster phishing emails automatically using orthographic features. In particular, we develop an algorithm to cluster documents and remove redundant features at the same time. After collecting all the possible features based on observation, we apply an adapted modified global k-means method repeatedly, and generate the objective function values over a range of tolerance values across different subsets of features. Finally, we identify the appropriate clusters by studying the distribution of the objective function values. Experimental evaluation over a large number of computations demonstrates that our clustering and feature selection techniques are highly effective and achieve reliable results.
- Description: 2003007842
Frequency decomposition based gene clustering
- Authors: Rahman, Md Abdur , Chetty, Madhu , Bulach, Dieter , Wangikar, Pramod
- Date: 2015
- Type: Text , Conference paper
- Relation: 22nd International Conference on Neural Information Processing, ICONIP 2015; Istanbul, Turkey; 9th-12th November 2015 Vol. 9490, p. 170-181
- Full Text: false
- Reviewed:
- Description: Gene expressions have been commonly applied to understand the inherent underlying mechanism of known biological processes. Although microarray gene expressions usually appear aperiodic, with proper signal processing techniques their periodic components can be easily obtained. Thus, if expressions of interconnected (regulatory and regulated) genes are decomposed, at least one common frequency component will appear in these genes. Exploiting this novel concept, we propose a frequency decomposition approach for gene clustering to better understand the gene interconnection topology. This method, based on the Hilbert Huang Transform (HHT), enables us to segregate every periodic component of the gene expressions. Next, a multilevel clustering is performed based on these frequency components. Unlike existing clustering algorithms, the proposed method assimilates meaningful knowledge of the gene interaction topology. The information related to underlying gene interactions is vital and can prove useful in many existing evolutionary optimisation algorithms for genetic network reconstruction. We validate the entire approach by its application to a 15-gene synthetic network. © Springer International Publishing Switzerland 2015.
Functional specialisation and socio-economic factors in population change : A clustering study in non-metropolitan Australia
- Authors: Mardaneh, Karim
- Date: 2015
- Type: Text , Journal article
- Relation: Urban Studies Vol. 53, no. 8 (2015), p. 1591-1616
- Full Text: false
- Reviewed:
- Description: Although research has examined population growth and decline using functional specialisation, little attention has been paid to the possible combined effects of functional specialisation and socio-economic factors on population change. Using the Australian Bureau of Statistics Census Data 2001–2006 for statistical local areas, this study presents an investigation of the role of both functional specialisation and socio-economic factors in population change in non-metropolitan areas under the sustenance framework. The uniqueness of the study is twofold. Conceptually it develops a framework to compare the combined role of functional specialisation and socio-economic factors on population change; and, empirically it uses data mining (cluster analysis) techniques to investigate the extent of this combined role. The results show the significance of both functional specialisation and socio-economic factors. Policy implications of the study indicate the need to examine regional development and population change in relation to functional specialisation and socio-economic factors and their impact on viability of non-metropolitan areas. © Urban Studies Journal Limited 2015.