Integrating online social networks with e-Commerce: A CBR approach
- Sun, Zhaohao, Firmin, Sally, Yearwood, John
- Authors: Sun, Zhaohao , Firmin, Sally , Yearwood, John
- Date: 2012
- Type: Text , Conference proceedings
- Full Text:
- Description: Integrating online social networks (OSN) with e-commerce is a part of Enterprise 2.0 and social media and is of significance for the development of e-commerce and online social networking services. However, how to integrate online social networks, including Facebook, with e-commerce is still a big issue for companies. Case-based reasoning (CBR) has a number of successful applications in e-commerce and web services. This article examines how to integrate OSN with e-commerce, how to integrate CBR with e-commerce and how to integrate CBR with OSN. It also proposes a CBR architecture for integrating online social networks with e-commerce using CBR as an intelligent intermediary. One of the research findings indicates that the principle of CBR is a useful marketing strategy for integrating e-commerce and OSN. The approach proposed in this research will facilitate the development of e-commerce, Enterprise 3.0 and online social networking services. Sun, Firmin, & Yearwood © 2012.
- Description: 2003010901
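The retrieve step at the heart of the CBR cycle (retrieve, reuse, revise, retain) is easy to picture. Below is a minimal Python sketch of weighted nearest-neighbour case retrieval; the attribute names, weights and case structure are invented for illustration and do not reproduce the paper's architecture.

```python
# Minimal sketch of the CBR "retrieve" step, assuming cases are flat
# attribute dictionaries; field names and weights are hypothetical.

def similarity(case_a, case_b, weights):
    """Weighted fraction of matching attributes between two cases."""
    total = sum(weights.values())
    score = sum(w for attr, w in weights.items()
                if case_a.get(attr) == case_b.get(attr))
    return score / total

def retrieve(case_base, query, weights, k=3):
    """Return the k most similar past cases to the query case."""
    ranked = sorted(case_base,
                    key=lambda c: similarity(c, query, weights),
                    reverse=True)
    return ranked[:k]

# Hypothetical usage: match an OSN profile against past purchase cases.
case_base = [
    {"age_group": "18-25", "interest": "gaming", "bought": "console"},
    {"age_group": "26-35", "interest": "fitness", "bought": "tracker"},
]
query = {"age_group": "18-25", "interest": "gaming"}
weights = {"age_group": 1.0, "interest": 2.0}
print(retrieve(case_base, query, weights, k=1))
```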
MapReduce neural network framework for efficient content based image retrieval from large datasets in the cloud
- Venkatraman, Sitalakshmi, Kulkarni, Siddhivinayak
- Authors: Venkatraman, Sitalakshmi , Kulkarni, Siddhivinayak
- Date: 2012
- Type: Text , Conference proceedings
- Full Text:
- Description: Recently, content-based image retrieval (CBIR) has gained active research focus due to wide applications such as crime prevention, medicine, historical research and digital libraries. With the digital explosion, image collections in databases at distributed locations over the Internet pose a challenge in retrieving images relevant to user queries efficiently and accurately. It becomes increasingly important to develop new CBIR techniques that are effective and scalable for real-time processing of very large image collections. To address this, the paper proposes a novel MapReduce neural network framework for CBIR from large data collections in a cloud environment. We adopt natural language queries that use a fuzzy approach to classify the colour images based on their content and apply Map and Reduce functions that can operate in cloud clusters to arrive at accurate results in real time. Preliminary experimental results for classifying and retrieving images from large data sets were sufficiently convincing to warrant further experimental evaluation. © 2012 IEEE.
- Description: 2003010699
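To make the Map and Reduce roles concrete, here is a hedged Python sketch in which each Map call scores one image against the query and Reduce keeps the best match. The colour-histogram features and the intersection similarity are assumptions; the paper's fuzzy natural-language query front end is not reproduced.

```python
# Illustrative Map and Reduce functions for CBIR, assuming images are
# pre-encoded as colour histograms (lists of floats); the fuzzy natural
# language front end from the paper is out of scope here.
from functools import reduce

def map_phase(image_id, histogram, query_histogram):
    """Map: emit (image_id, similarity-to-query) for one image."""
    # L1 histogram intersection as a stand-in similarity measure.
    sim = sum(min(a, b) for a, b in zip(histogram, query_histogram))
    return (image_id, sim)

def reduce_phase(best, pair):
    """Reduce: keep the highest-scoring image seen so far."""
    return pair if pair[1] > best[1] else best

# Hypothetical usage over a toy "cluster partition" of two images.
query = [0.2, 0.5, 0.3]
images = {"img1": [0.1, 0.6, 0.3], "img2": [0.9, 0.05, 0.05]}
mapped = [map_phase(i, h, query) for i, h in images.items()]
print(reduce(reduce_phase, mapped, ("none", 0.0)))
```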
Performance evaluation of multi-tier ensemble classifiers for phishing websites
- Abawajy, Jemal, Beliakov, Gleb, Kelarev, Andrei, Yearwood, John
- Authors: Abawajy, Jemal , Beliakov, Gleb , Kelarev, Andrei , Yearwood, John
- Date: 2012
- Type: Text , Conference proceedings
- Full Text:
- Description: This article is devoted to large multi-tier ensemble classifiers generated as ensembles of ensembles and applied to phishing websites. Our new ensemble construction is a special case of the general and productive multi-tier approach well known in information security. Many efficient multi-tier classifiers have been considered in the literature. Our new contribution is in generating new large systems as ensembles of ensembles by linking a top-tier ensemble to another middle-tier ensemble instead of a base classifier, so that the top-tier ensemble can generate the whole system. This automatic generation capability includes many large ensemble classifiers in two tiers simultaneously and automatically combines them into one hierarchical unified system, so that one ensemble is an integral part of another. This new construction makes it easy to set up and run such large systems. The present article concentrates on the investigation of the performance of these new multi-tier ensembles for the example of detection of phishing websites. We carried out systematic experiments evaluating several essential ensemble techniques as well as more recent approaches, studying their performance as parts of multi-level ensembles with three tiers. The results presented here demonstrate that the new three-tier ensemble classifiers performed better than the base classifiers and standard ensembles included in the system. This example of application to the classification of phishing websites shows that the new method of combining diverse ensemble techniques into a unified hierarchical three-tier ensemble can be applied to increase the performance of classifiers in situations where data can be processed on a large computer.
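A two-tier majority vote conveys the "ensemble of ensembles" shape, though only approximately: the paper links actual ensemble techniques tier to tier, while this Python sketch uses toy threshold rules as base classifiers.

```python
# A minimal two-tier "ensemble of ensembles" by majority vote; only an
# approximation of the paper's construction, which links real ensemble
# techniques across tiers.
from collections import Counter

def vote(predictions):
    """Majority label (ties resolve by first-seen label in this toy)."""
    return Counter(predictions).most_common(1)[0][0]

def middle_tier(classifiers, x):
    """Each middle-tier ensemble votes over its base classifiers."""
    return vote([clf(x) for clf in classifiers])

def top_tier(ensembles, x):
    """Top tier votes over the middle-tier ensembles' outputs."""
    return vote([middle_tier(ens, x) for ens in ensembles])

# Hypothetical base classifiers: trivial threshold rules on a feature.
ensembles = [
    [lambda x: x > 0.3, lambda x: x > 0.5],   # middle-tier ensemble 1
    [lambda x: x > 0.7, lambda x: x > 0.4],   # middle-tier ensemble 2
]
print(top_tier(ensembles, 0.6))  # True -> "phishing" in this toy setup
```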
Prudent fraud detection in internet banking
- Maruatona, Omaru, Vamplew, Peter, Dazeley, Richard
- Authors: Maruatona, Omaru , Vamplew, Peter , Dazeley, Richard
- Date: 2012
- Type: Text , Conference proceedings
- Full Text:
- Description: Most commercial fraud detection components of Internet banking systems use some kind of hybrid setup, usually comprising a rule base and an artificial neural network. Such rule bases have been criticised for a lack of innovation in their approach to knowledge acquisition and maintenance. Furthermore, the systems are brittle; they have no way of knowing when a previously unseen set of fraud patterns is beyond their current knowledge. This limitation may have far-reaching consequences in an online banking system. This paper presents a viable alternative to brittleness in knowledge-based systems: a potential milestone in the rapid detection of unique and novel fraud patterns in Internet banking. The experiments conducted with real online banking transaction log files suggest that prudence-based fraud detection may be a worthy alternative in online banking. © 2012 IEEE.
- Description: 2003010883
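The prudence idea can be illustrated independently of any particular rule-base formalism: alongside its rules, the system records what it has seen, and it warns rather than classifies when a case falls outside that envelope. Everything below (field names, ranges, rules) is hypothetical.

```python
# Sketch of a prudence check around a rule base: the system flags a
# case as "beyond current knowledge" when its attributes fall outside
# the ranges observed while the rules were built.

RULES = [
    (lambda t: t["amount"] > 5000 and t["new_payee"], "fraud"),
    (lambda t: True, "legitimate"),               # default rule
]
KNOWN_RANGES = {"amount": (1.0, 10000.0)}         # from training data

def classify_with_prudence(txn):
    for attr, (lo, hi) in KNOWN_RANGES.items():
        if not lo <= txn[attr] <= hi:
            return "warn: outside known patterns"  # prudence fires
    for condition, label in RULES:
        if condition(txn):
            return label

print(classify_with_prudence({"amount": 50000.0, "new_payee": True}))
print(classify_with_prudence({"amount": 6000.0, "new_payee": True}))
```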
State transition algorithm for traveling salesman problem
- Yang, Chunhua, Tang, Xiaolin, Zhou, Xiaojun, Gui, Weihua
- Authors: Yang, Chunhua , Tang, Xiaolin , Zhou, Xiaojun , Gui, Weihua
- Date: 2012
- Type: Text , Conference proceedings
- Full Text:
- Description: A discrete version of the state transition algorithm is proposed to solve the traveling salesman problem. Three special operators for discrete optimization problems, named the swap, shift and symmetry transformations, are presented. Convergence analysis and the time complexity of the algorithm are also considered. To keep the algorithm simple and efficient, no parameter adjustment is required in the current version. Experiments are carried out to test the performance of the strategy, and comparisons with simulated annealing and ant colony optimization demonstrate the effectiveness of the proposed algorithm. The results also show that the discrete state transition algorithm consumes much less time and has better search ability than its counterparts, which indicates that the state transition algorithm has strong adaptability. © 2012 Chinese Association of Automation.
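The three operators are standard permutation moves, so a Python sketch is straightforward; the exact parameterisation in the paper may differ.

```python
# The swap, shift and symmetry transformations as commonly defined for
# permutation problems such as the TSP tour representation.
import random

def swap(tour, i, j):
    """Exchange the cities at positions i and j."""
    t = tour[:]
    t[i], t[j] = t[j], t[i]
    return t

def shift(tour, i, j):
    """Remove the city at position i and reinsert it at position j."""
    t = tour[:]
    t.insert(j, t.pop(i))
    return t

def symmetry(tour, i, j):
    """Reverse the segment between positions i and j (inclusive)."""
    t = tour[:]
    t[i:j + 1] = reversed(t[i:j + 1])
    return t

tour = [0, 1, 2, 3, 4]
i, j = sorted(random.sample(range(len(tour)), 2))
print(swap(tour, i, j), shift(tour, i, j), symmetry(tour, i, j))
```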
Structure learning of Bayesian networks using a new unrestricted dependency algorithm
- Taheri, Sona, Mammadov, Musa
- Authors: Taheri, Sona , Mammadov, Musa
- Date: 2012
- Type: Text , Conference proceedings
- Full Text:
- Description: Bayesian Networks have received extensive attention in data mining due to their efficiency and reasonable predictive accuracy. A Bayesian Network is a directed acyclic graph in which each node represents a variable and each arc a probabilistic dependency between two variables. Constructing a Bayesian Network from data is a learning process divided into two steps: structure learning and parameter learning. In many domains, the structure is not known a priori and must be inferred from data. This paper presents an iterative unrestricted dependency algorithm for learning the structure of Bayesian Networks for binary classification problems. Numerical experiments are conducted on several real-world data sets, where continuous features are discretized by applying two different methods. The performance of the proposed algorithm is compared with the Naive Bayes, the Tree Augmented Naive Bayes, and the k
Towards an implementation of information flow security using semantic web technologies
- Ureche, Oana, Layton, Robert, Watters, Paul
- Authors: Ureche, Oana , Layton, Robert , Watters, Paul
- Date: 2012
- Type: Text , Conference proceedings
- Full Text:
- Description: Controlling the flow of sensitive data has been widely acknowledged as a critical aspect of securing web information systems. A common limitation of previous approaches to implementing information flow control is their proposal of new scripting languages. This makes them infeasible to apply to existing systems written in traditional programming languages, as these systems would need to be redeveloped in the proposed scripting language. This paper proposes a methodology that offers a common interlingua, through the use of Semantic Web technologies, for securing web information systems independently of their programming language. © 2012 IEEE.
- Description: 2003011056
Unsupervised authorship analysis of phishing webpages
- Layton, Robert, Watters, Paul, Dazeley, Richard
- Authors: Layton, Robert , Watters, Paul , Dazeley, Richard
- Date: 2012
- Type: Text , Conference proceedings
- Full Text:
- Description: Authorship analysis of phishing websites enables the investigation of phishing attacks beyond basic analysis. In authorship analysis, salient features from documents are used to determine properties of the author, such as which of a set of candidate authors wrote a given document. In unsupervised authorship analysis, the aim is to group documents such that all documents by one author are grouped together. Applying this to cyber-attacks shows the size and scope of attacks from specific groups, which in turn allows investigators to focus their attention on specific attacking groups rather than trying to profile multiple independent attackers. In this paper, we analyse phishing websites using the current state-of-the-art unsupervised authorship analysis method, called NUANCE. The results indicate that the application produces clusters which correlate strongly with authorship, evaluated using expert knowledge and external information, while also showing an improvement over a previous approach with known flaws. © 2012 IEEE.
- Description: 2003010678
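NUANCE itself is not reproduced here, but character n-gram clustering, the kind of low-level stylistic signal unsupervised authorship methods rely on, can be sketched with off-the-shelf tools; the toy pages below stand in for scraped phishing content.

```python
# Character n-gram clustering as a stand-in for NUANCE (whose internals
# are not reproduced here): pages by one author should share low-level
# character patterns and land in the same cluster.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

pages = [
    "Dear valued customer, verify you account imediately!",
    "Dear valued customer, verify you password imediately!",
    "Security notice: your online banking session has expired.",
    "Security notice: your online payment session has expired.",
]
X = TfidfVectorizer(analyzer="char", ngram_range=(3, 3)).fit_transform(pages)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # pages sharing an "author" style should share a label
```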
Demand for apprenticeships and traineeships: What are the implications for the future?
- Authors: Smith, Erica , Bush, Tony
- Date: 2011
- Type: Text , Conference proceedings
- Full Text:
- Description: While the Australian apprenticeship and traineeship system is currently strong, the overall strength belies some areas of weakness. One of these is the uneven nature of demand from applicants for positions as apprentices and trainees, which means that some industries, occupations and employers struggle to find enough applicants while others are over-subscribed. In particular, apprenticeships or traineeships in some occupations, and companies offering positions within those occupations, find it difficult to attract applicants of a suitable calibre. This paper reports on a research project undertaken during 2010 with 21 employers of apprentices and trainees. The different recruitment strategies and outcomes of the companies are described, and the possible reasons for companies' apparent success or failure in attracting suitable applicants are discussed. Some suggestions for future policy and practice at company, regional and national level are offered.
Feature selection using misclassification counts
- Bagirov, Adil, Yatsko, Andrew, Stranieri, Andrew
- Authors: Bagirov, Adil , Yatsko, Andrew , Stranieri, Andrew
- Date: 2011
- Type: Conference proceedings , Unpublished work
- Relation: Proceedings of the 9th Australasian Data Mining Conference (AusDM 2011), 51-62. Conferences in Research and Practice in Information Technology (CRPIT), Vol. 121.
- Full Text:
- Description: Dimensionality reduction of the problem space, through detection and removal of variables that contribute little or nothing to classification, can relieve the computational load and the data acquisition effort, given that all data attributes are accessed each time around. The approach to feature selection in this paper is based on the concept of coherent accumulation of data about class centers with respect to the coordinates of informative features. Ranking is done on the degree to which different variables exhibit random characteristics. The results are verified using the Nearest Neighbor classifier. This also helps to address feature irrelevance and redundancy, which ranking does not immediately decide. Additionally, feature ranking methods from different independent sources are called in for direct comparison.
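One plausible reading of ranking by misclassification counts can be sketched directly: score each feature by how often a nearest-class-mean rule using that feature alone misclassifies, then rank features in ascending order of those counts. This is an illustrative reconstruction, not the paper's exact procedure.

```python
# Rank features by per-feature misclassification counts of a
# nearest-class-mean rule; lower counts suggest more informative
# features. Illustrative reconstruction only.
import numpy as np

def misclassification_counts(X, y):
    counts = []
    classes = np.unique(y)
    for f in range(X.shape[1]):
        means = {c: X[y == c, f].mean() for c in classes}
        pred = [min(means, key=lambda c: abs(v - means[c]))
                for v in X[:, f]]
        counts.append(int(np.sum(np.asarray(pred) != y)))
    return counts  # rank features ascending by these counts

rng = np.random.default_rng(0)
y = np.array([0] * 50 + [1] * 50)
X = np.column_stack([y + rng.normal(0, 0.3, 100),   # informative
                     rng.normal(0, 1.0, 100)])      # random
print(misclassification_counts(X, y))
```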
Industry type and business size on economic growth: Comparing Australia's Regional and Metropolitan areas
- Authors: Mardaneh, Karim
- Date: 2011
- Type: Text , Conference proceedings
- Relation: 56th Annual ICSB World Conference; Back to the Future - Changes in Perspectives of Global Entrepreneurship and Innovation,Stockholm, Sweden, 15-18 June, 2011
- Full Text:
- Reviewed:
- Description: While the main body of literature regarding small-to-medium enterprises is focused on formation and growth, there is insufficient research about the role of both (a) firm size and (b) location in economic growth. The role of firm size and industrial structure in economic growth has been examined by some researchers: Pagano (2003) and Pagano and Schivardi (2000) identified a positive association between average firm size and growth, and Carree and Thurik (1999) found evidence that a low number of large firms in an industry could lead to higher value-added growth. The current study investigates the impact of industry structure, and of the businesses operating within these industries, on economic growth. This paper uses the k-means clustering algorithm to cluster Statistical Local Areas, and regression analysis is utilised to identify drivers of economic growth. Preliminary results suggest that size of business may act as a driver of economic growth, but the impact could vary based on location.
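The two-stage method (cluster areas, then regress growth within clusters) can be sketched as follows; the data, the meaning of the two feature columns and the cluster count are invented for illustration.

```python
# Hedged sketch of the two-stage method: k-means over area attributes,
# then a per-cluster regression of growth on those attributes.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
features = rng.normal(size=(60, 2))   # e.g. avg firm size, industry mix
growth = 0.5 * features[:, 0] + rng.normal(0, 0.1, 60)

areas = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(features)
for c in range(3):
    model = LinearRegression().fit(features[areas == c], growth[areas == c])
    print(c, model.coef_)  # per-cluster drivers of growth
```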
Performance improvement of vertical handoff algorithms for QoS support over heterogeneous wireless networks
- Sharna, Shusmita, Murshed, Manzur
- Authors: Sharna, Shusmita , Murshed, Manzur
- Date: 2011
- Type: Text , Conference proceedings
- Relation: Proceedings of the Thirty-Fourth Australasian Computer Science Conference (ACSC 2011), 17th-20th January, Perth, 2011 p. 17-24
- Full Text:
- Reviewed:
- Description: During the vertical handoff procedure, the handoff decision is the most important step affecting the normal working of communication. An incorrect handoff decision, or selection of a non-optimal network, can result in undesirable effects such as higher costs, poor service experience, degraded quality of service and even the breaking off of current communication. The objective of this paper is to determine the conditions under which vertical handoff should be performed in heterogeneous wireless networks. We present a comprehensive analysis of different vertical handoff decision algorithms. To evaluate trade-offs between their performance and efficiency, we propose two improved vertical handoff decision algorithms based on the Markov Decision Process, referred to as MDP_SAW and MDP_TOPSIS. The proposed mechanism assists the terminal in selecting the top candidate network and offers better available bandwidth, so that user satisfaction is effectively maximized. In addition, our proposed method avoids unbeneficial handoffs in wireless overlay networks.
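A toy value iteration shows the MDP framing: the state is the network currently in use, an action selects the next network, and a handoff cost discourages unbeneficial switches. The rewards, cost and discount below are invented, and the SAW and TOPSIS scoring layers are not modelled.

```python
# Toy value iteration for network selection in the spirit of an
# MDP-based vertical handoff policy; all numbers are illustrative.
NETWORKS = ["wlan", "cellular"]          # states = currently used network
REWARD = {"wlan": 1.0, "cellular": 0.6}  # per-step QoS reward
HANDOFF_COST, GAMMA = 0.3, 0.9

def value_iteration(iters=100):
    V = {s: 0.0 for s in NETWORKS}
    for _ in range(iters):
        for s in NETWORKS:
            V[s] = max(REWARD[a] - (HANDOFF_COST if a != s else 0.0)
                       + GAMMA * V[a] for a in NETWORKS)
    return V

V = value_iteration()
# Greedy policy: stay unless switching pays off despite the cost.
policy = {s: max(NETWORKS, key=lambda a: REWARD[a]
                 - (HANDOFF_COST if a != s else 0.0) + GAMMA * V[a])
          for s in NETWORKS}
print(V, policy)
```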
Zero-day malware detection based on supervised learning algorithms of API call signatures
- Alazab, Mamoun, Venkatraman, Sitalakshmi, Watters, Paul, Alazab, Moutaz
- Authors: Alazab, Mamoun , Venkatraman, Sitalakshmi , Watters, Paul , Alazab, Moutaz
- Date: 2011
- Type: Text , Conference proceedings
- Full Text:
- Description: Zero-day or unknown malware are created using code obfuscation techniques that can modify the parent code to produce offspring copies which have the same functionality but different signatures. Current techniques reported in the literature lack the capability of detecting zero-day malware with the required accuracy and efficiency. In this paper, we have proposed and evaluated a novel method of employing several data mining techniques to detect and classify zero-day malware with high levels of accuracy and efficiency based on the frequency of Windows API calls. This paper describes the methodology employed for the collection of large data sets to train the classifiers, and analyses the performance results of the various data mining algorithms adopted for the study using a fully automated tool developed in this research to conduct the various experimental investigations and evaluations. Through the performance results of these algorithms from our experimental analysis, we are able to evaluate and discuss the advantages of one data mining algorithm over another for accurately detecting zero-day malware. The data mining framework employed in this research learns by analysing the behaviour of existing malicious and benign codes in large datasets. We have employed robust classifiers, namely the Naïve Bayes (NB) algorithm, the k-Nearest Neighbor (kNN) algorithm, the Sequential Minimal Optimization (SMO) algorithm with four different kernels (SMO - Normalized PolyKernel, SMO - PolyKernel, SMO - Puk, and SMO - Radial Basis Function (RBF)), the Backpropagation Neural Networks algorithm, and the J48 decision tree, and have evaluated their performance. Overall, the automated data mining system implemented for this study has achieved a high true positive (TP) rate of more than 98.5% and a low false positive (FP) rate of less than 0.025, which has not been achieved in the literature so far. This is much higher than the required commercial acceptance level, indicating that our novel technique is a major leap forward in detecting zero-day malware. This paper also offers future directions for researchers in exploring different aspects of obfuscation that are affecting the IT world today. © 2011, Australian Computer Society, Inc.
- Description: 2003009506
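The feature side of the method is simple to sketch: each binary's API call trace becomes a frequency vector over a fixed vocabulary, and any supervised classifier can then be trained on those vectors. The kNN classifier below is one of the families the paper evaluates; the traces and labels are toy data.

```python
# Turn API call traces into frequency vectors, then train a classifier.
from collections import Counter
from sklearn.neighbors import KNeighborsClassifier

API_VOCAB = ["CreateFileW", "WriteProcessMemory", "RegSetValueExW"]

def to_vector(api_calls):
    """Frequency vector of API calls over a fixed vocabulary."""
    counts = Counter(api_calls)
    return [counts[a] for a in API_VOCAB]

traces = [
    ["WriteProcessMemory", "RegSetValueExW", "WriteProcessMemory"],  # malware
    ["CreateFileW", "CreateFileW"],                                  # benign
    ["WriteProcessMemory", "RegSetValueExW"],                        # malware
    ["CreateFileW"],                                                 # benign
]
labels = [1, 0, 1, 0]
clf = KNeighborsClassifier(n_neighbors=1).fit(
    [to_vector(t) for t in traces], labels)
print(clf.predict([to_vector(["WriteProcessMemory", "CreateFileW"])]))
```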
Adaptive clustering with feature ranking for DDoS attacks detection
- Zi, Lifang, Yearwood, John, Wu, Xin
- Authors: Zi, Lifang , Yearwood, John , Wu, Xin
- Date: 2010
- Type: Text , Conference proceedings
- Full Text:
- Description: Distributed Denial of Service (DDoS) attacks pose an increasing threat to the current Internet. The detection of such attacks plays an important role in maintaining the security of networks. In this paper, we propose a novel adaptive clustering method combined with feature ranking for DDoS attack detection. First, based on the analysis of network traffic, preliminary variables are selected. Second, the Modified Global K-means algorithm (MGKM) is used as the basic incremental clustering algorithm to identify the cluster structure of the target data. Third, the linear correlation coefficient is used for feature ranking. Lastly, the feature ranking result is used to inform and recalculate the clusters. This adaptive process can make worthwhile adjustments to the working feature vector according to different patterns of DDoS attacks, and can improve the quality of the clusters and the effectiveness of the clustering algorithm. The experimental results demonstrate that our method is effective and adaptive in detecting the separate phases of DDoS attacks. © 2010 IEEE.
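One round of the cluster / rank / recluster loop can be sketched as follows, with plain k-means standing in for MGKM's incremental seeding and the Pearson correlation coefficient as the ranking signal; the synthetic traffic features are for illustration only.

```python
# One adaptive round: cluster, rank features by |correlation| with the
# cluster labels, reweight, recluster. MGKM is simplified to k-means.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(3, 1, (50, 3))])
X[:, 2] = rng.normal(size=100)            # an uninformative feature

labels = KMeans(n_clusters=2, n_init=10, random_state=2).fit_predict(X)
scores = [abs(np.corrcoef(X[:, f], labels)[0, 1]) for f in range(X.shape[1])]
weights = np.array(scores) / max(scores)
# Recluster on reweighted features so noisy dimensions matter less.
labels2 = KMeans(n_clusters=2, n_init=10, random_state=2).fit_predict(X * weights)
print(scores, np.bincount(labels2))
```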
An application of consensus clustering for DDoS attacks detection
- Zi, Lifang, Yearwood, John, Kelarev, Andrei
- Authors: Zi, Lifang , Yearwood, John , Kelarev, Andrei
- Date: 2010
- Type: Text , Conference proceedings
- Full Text:
- Description: The detection of Distributed Denial of Service (DDoS) attacks is very important for maintaining the security of networks and the Internet. This paper introduces a novel iterative consensus process based on the Hybrid Bipartite Graph Formulation (HBGF) consensus function for DDoS attack detection. First, features are extracted during the feature extraction process based on the analysis of network traffic. Second, several clustering algorithms are applied in combination with the silhouette index to obtain a collection of independent initial clusterings. Third, the HBGF consensus function and silhouette index are used to find an appropriate consensus clustering of the initial clusterings. Fourth, this new consensus clustering is added to the pool of initial clusterings, replacing the clustering with the worst silhouette index. Fifth, the process continues iteratively until the silhouette index of the resulting consensus clusterings stabilizes. This iterative consensus clustering process can improve the quality of the clusters. The experimental results demonstrate that our iterative consensus process is effective and can be used in practice for detecting the separate phases of DDoS attacks.
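The iterative scheme can be approximated with a co-association consensus function in place of HBGF (a deliberate simplification) plus the silhouette index; one consensus round looks like this.

```python
# Consensus over several initial clusterings via a co-association
# matrix, a simpler consensus function used here in place of HBGF,
# with the silhouette index to assess the result.
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.5, (40, 2)), rng.normal(4, 0.5, (40, 2))])

initial = [KMeans(n_clusters=2, n_init=5, random_state=s).fit_predict(X)
           for s in range(3)]
# Co-association: fraction of clusterings placing two points together.
co = np.mean([np.equal.outer(l, l).astype(float) for l in initial], axis=0)
consensus = AgglomerativeClustering(
    n_clusters=2, metric="precomputed",  # "affinity" on older scikit-learn
    linkage="average").fit_predict(1.0 - co)
print(silhouette_score(X, consensus))
```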
Automatic sleep stage identification: difficulties and possible solutions
- Sukhorukova, Nadezda, Stranieri, Andrew, Ofoghi, Bahadorreza, Vamplew, Peter, Saleem, Muhammad Saad, Ma, Liping, Ugon, Adrien, Ugon, Julien, Muecke, Nial, Amiel, Hélène, Philippe, Carole, Bani-Mustafa, Ahmed, Huda, Shamsul, Bertoli, Marcello, Levy, P, Ganascia, J.G
- Authors: Sukhorukova, Nadezda , Stranieri, Andrew , Ofoghi, Bahadorreza , Vamplew, Peter , Saleem, Muhammad Saad , Ma, Liping , Ugon, Adrien , Ugon, Julien , Muecke, Nial , Amiel, Hélène , Philippe, Carole , Bani-Mustafa, Ahmed , Huda, Shamsul , Bertoli, Marcello , Levy, P , Ganascia, J.G
- Date: 2010
- Type: Text , Conference proceedings
- Full Text:
- Description: The diagnosis of many sleep disorders is a labour-intensive task that involves the specialised interpretation of numerous signals, including brain wave, breath and heart rate, captured in overnight polysomnogram sessions. The automation of diagnosis is challenging for data mining algorithms because the data sets are extremely large and noisy, the signals are complex, and specialists' analyses vary. This work reports on the adaptation of approaches from four fields (neural networks, mathematical optimisation, financial forecasting and frequency domain analysis) to the problem of automatically determining a patient's stage of sleep. Results, though preliminary, are promising and indicate that combined approaches may prove more fruitful than reliance on a single approach.
Cluster based rule discovery model for enhancement of government's tobacco control strategy
- Huda, Shamsul, Yearwood, John, Borland, Ron
- Authors: Huda, Shamsul , Yearwood, John , Borland, Ron
- Date: 2010
- Type: Text , Conference proceedings
- Full Text:
- Description: Discovery of interesting rules describing the behavioural patterns of smokers' quitting intentions is an important task in the determination of an effective tobacco control strategy. In this paper, we investigate a compact and simplified rule discovery process for predicting smokers' quitting behaviour that can provide feedback to build a scientific, evidence-based adaptive tobacco control policy. Standard decision tree (SDT) based rule discovery depends on decision boundaries in the feature space which are orthogonal to the axis of the feature of a particular decision node. This may limit the ability of SDT to learn intermediate concepts for high-dimensional large datasets such as tobacco control. In this paper, we propose a cluster based rule discovery model (CRDM) for the generation of more compact and simplified rules for the enhancement of tobacco control policy. The cluster-based approach builds conceptual groups from which a set of decision trees (a decision forest) is constructed. Experimental results on the tobacco control data set show that decision rules from the decision forest constructed by CRDM are simpler and can predict smokers' quitting intention more accurately than a single decision tree. © 2010 IEEE.
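The CRDM idea in miniature: cluster the data into conceptual groups, then grow one small decision tree per group so the forest's rules stay local and simple. The synthetic data and tree depth below are assumptions, not the paper's settings.

```python
# Cluster-then-tree decision forest: one shallow tree per k-means group.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 4))                 # e.g. survey responses
y = (X[:, 0] + X[:, 1] > 0).astype(int)       # e.g. intends to quit

groups = KMeans(n_clusters=3, n_init=10, random_state=4).fit_predict(X)
forest = {}
for g in range(3):
    tree = DecisionTreeClassifier(max_depth=2, random_state=4)
    forest[g] = tree.fit(X[groups == g], y[groups == g])
    print(f"cluster {g} rules:\n{export_text(forest[g])}")
```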
GOM: New Genetic Optimizing Model for broadcasting tree in MANET
- Elaiwat, Said, Alazab, Ammar, Venkatraman, Sitalakshmi, Alazab, Mamoun
- Authors: Elaiwat, Said , Alazab, Ammar , Venkatraman, Sitalakshmi , Alazab, Mamoun
- Date: 2010
- Type: Text , Conference proceedings
- Full Text:
- Description: Data broadcasting in a mobile ad-hoc network (MANET) is the main method of information dissemination in many applications, in particular for sending critical information to all hosts. Finding an optimal broadcast tree in such networks is a challenging task due to the broadcast storm problem. The aim of this work is to propose a new genetic model using a fitness function with the primary goal of finding an optimal broadcast tree. Our new method, called the Genetic Optimisation Model (GOM), alleviates the broadcast storm problem to a great extent, as the experimental simulations result in efficient broadcast trees with minimal flooding and minimal hops. The results of this model also show that it has the ability to give different optimal solutions according to the nature of the network. © 2010 IEEE.
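A plausible fitness function for candidate broadcast trees rewards few forwarding nodes (less flooding) and shallow depth (fewer hops); the weighting below is hypothetical, not the paper's formula, and the genetic operators around it are omitted.

```python
# Hypothetical broadcast-tree fitness: penalise forwarding nodes and
# tree depth. A genetic algorithm would minimise this over trees.

def fitness(tree, root):
    """tree: dict mapping node -> list of children."""
    forwarders = sum(1 for kids in tree.values() if kids)
    def depth(n):
        return 1 + max((depth(c) for c in tree.get(n, [])), default=0)
    w_flood, w_hops = 0.6, 0.4  # invented weights; lower score is better
    return w_flood * forwarders + w_hops * depth(root)

tree = {"A": ["B", "C"], "B": ["D", "E"], "C": [], "D": [], "E": []}
print(fitness(tree, "A"))  # 2 forwarders, depth 3
```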
Hybrid wrapper-filter approaches for input feature selection using maximum relevance and Artificial Neural Network Input Gain Measurement Approximation (ANNIGMA)
- Huda, Shamsul, Yearwood, John, Stranieri, Andrew
- Authors: Huda, Shamsul , Yearwood, John , Stranieri, Andrew
- Date: 2010
- Type: Text , Conference proceedings
- Full Text:
- Description: Feature selection is an important research problem in machine learning and data mining applications. This paper proposes a hybrid wrapper and filter feature selection algorithm that introduces the filter's feature ranking score in the wrapper stage to speed up the search process and thereby find a more compact feature subset. The approach hybridizes a Mutual Information (MI) based Maximum Relevance (MR) filter ranking heuristic with an Artificial Neural Network (ANN) based wrapper approach, where Artificial Neural Network Input Gain Measurement Approximation (ANNIGMA) has been combined with MR (MR-ANNIGMA) to guide the search process in the wrapper. The novelty of our approach is that we use a hybrid of wrapper and filter methods that combines the filter's ranking score with the wrapper-heuristic's score to take advantage of both filter and wrapper heuristics. Performance of the proposed MR-ANNIGMA has been verified using benchmark data sets and compared to both independent filter and wrapper based approaches. Experimental results show that MR-ANNIGMA achieves more compact feature sets and higher accuracies than either the filter or wrapper approach alone. © 2010 IEEE.
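The hybrid scoring idea can be sketched by blending a normalised filter score (mutual information with the class) with a wrapper-side gain score; the ANNIGMA gains below are placeholders, since computing them requires a trained ANN, and the 0.5/0.5 blend is an assumption.

```python
# Blend a filter score (MI with the class) with wrapper-side gains to
# order the wrapper's search; ANNIGMA gains are faked for illustration.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(5)
y = rng.integers(0, 2, 200)
X = np.column_stack([y + rng.normal(0, 0.5, 200),
                     rng.normal(size=200), rng.normal(size=200)])

mi = mutual_info_classif(X, y, random_state=5)        # filter: MR score
annigma = np.array([0.9, 0.2, 0.1])                   # placeholder gains
hybrid = 0.5 * mi / mi.max() + 0.5 * annigma / annigma.max()
print(np.argsort(hybrid)[::-1])  # search features in this order
```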
Improving Naive Bayes classifier using conditional probabilities
- Taheri, Sona, Mammadov, Musa, Bagirov, Adil
- Authors: Taheri, Sona , Mammadov, Musa , Bagirov, Adil
- Date: 2010
- Type: Text , Conference proceedings
- Full Text:
- Description: The Naive Bayes classifier is the simplest among Bayesian Network classifiers. It has been shown to be very efficient on a variety of data classification problems. However, the strong assumption that all features are conditionally independent given the class is often violated in many real-world applications. Therefore, improvement of the Naive Bayes classifier by alleviating the feature independence assumption has attracted much attention. In this paper, we develop a new version of the Naive Bayes classifier without assuming independence of features. The proposed algorithm approximates the interactions between features by using conditional probabilities. We present results of numerical experiments on several real-world data sets, where continuous features are discretized by applying two different methods. These results demonstrate that the proposed algorithm significantly improves the performance of the Naive Bayes classifier, yet at the same time maintains its robustness. © 2011, Australian Computer Society, Inc.
- Description: 2003009505
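For contrast, here is the baseline the paper starts from: Naive Bayes on discretized features. The pairwise conditional-probability correction the paper proposes is indicated in a comment but not reproduced; the synthetic data and bin count are assumptions.

```python
# Baseline: discretize continuous features, then fit categorical NB.
import numpy as np
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import KBinsDiscretizer

rng = np.random.default_rng(6)
y = rng.integers(0, 2, 300)
X = np.column_stack([y + rng.normal(0, 0.7, 300),
                     y + rng.normal(0, 0.7, 300)])   # correlated features

Xd = KBinsDiscretizer(n_bins=4, encode="ordinal",
                      strategy="uniform").fit_transform(X)
# Standard NB multiplies P(x_i | c); the proposed variant adjusts these
# factors using pairwise conditionals P(x_i | x_j, c) between features.
clf = CategoricalNB().fit(Xd, y)
print(clf.score(Xd, y))
```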