How much material on BitTorrent is infringing content? A case study
- Authors: Watters, Paul , Layton, Robert , Dazeley, Richard
- Date: 2011
- Type: Text , Journal article
- Relation: Information Security Technical Report Vol. 16, no. 2 (2011), p. 79-87
- Full Text: false
- Reviewed:
- Description: BitTorrent is a widely used protocol for peer-to-peer (P2P) file sharing, including material which is often suspected to be infringing content. However, little systematic research has been undertaken to measure the true extent of illegal file sharing. In this paper, we propose a new methodology for measuring the extent of infringing content. Our initial results indicate that at least 89.9% of files shared contain infringing content, with a replication study on another sample finding 97%. We discuss the limitations of the approach in this case study, including sampling biases, and outline proposals to further verify the results. The implications of the work vis-à-vis the management of piracy at the network level are discussed. © 2011 Published by Elsevier Ltd. All rights reserved.
Online knowledge validation with prudence analysis in a document management application
- Authors: Dazeley, Richard , Park, Sung Sik , Kang, Byeongho
- Date: 2011
- Type: Text , Journal article
- Relation: Expert Systems with Applications (2011)
- Full Text: false
- Reviewed:
- Description: Prudence analysis (PA) is a relatively new, practical and highly innovative approach to solving the problem of brittleness in knowledge based system (KBS) development. PA is essentially an online validation approach where, as each situation or case is presented to the KBS for inferencing, the result is simultaneously validated. Therefore, instead of the system simply providing a conclusion, it also provides a warning when the validation fails. Previous studies have shown that a modification to multiple classification ripple-down rules (MCRDR), referred to as rated MCRDR (RM), has been able to achieve strong and flexible results in simulated domains with artificial data sets. This paper presents a study into the effectiveness of RM in an eHealth document monitoring and classification domain using human expertise. Additionally, this paper investigates what effect PA has when the KBS developer relies entirely on the warnings for maintenance. Results indicate that the system is surprisingly robust even when warning accuracy is allowed to drop quite low. This study of a previously little-explored area provides a strong indication of the potential for future knowledge based system development. © 2011 Elsevier Ltd. All rights reserved.
Detection of CAN by ensemble classifiers based on Ripple Down rules
- Authors: Kelarev, Andrei , Dazeley, Richard , Stranieri, Andrew , Yearwood, John , Jelinek, Herbert
- Date: 2012
- Type: Text , Book chapter
- Relation: Knowledge Management and Acquisition for Intelligent Systems p. 147-159
- Full Text: false
- Reviewed:
- Description: It is well known that classification models produced by Ripple Down Rules are easier to maintain and update. They are compact and can provide an explanation of their reasoning, making them easy to understand for medical practitioners. This article is devoted to an empirical investigation and comparison of several ensemble methods based on Ripple Down Rules in a novel application for the detection of cardiovascular autonomic neuropathy (CAN) from an extensive data set collected by the Diabetes Complications Screening Research Initiative at Charles Sturt University. Our experiments included essential ensemble methods, several more recent state-of-the-art techniques, and a novel consensus function based on graph partitioning. The results show that our novel application of Ripple Down Rules in ensemble classifiers for the detection of CAN achieved better performance parameters compared with the outcomes obtained previously in the literature.
Prudent fraud detection in internet banking
- Authors: Maruatona, Omaru , Vamplew, Peter , Dazeley, Richard
- Date: 2012
- Type: Text , Conference proceedings
- Full Text:
- Description: Most commercial fraud detection components of Internet banking systems use some kind of hybrid setup, usually comprising a rule base and an artificial neural network. Such rule bases have been criticised for a lack of innovation in their approach to knowledge acquisition and maintenance. Furthermore, the systems are brittle; they have no way of knowing when a previously unseen set of fraud patterns is beyond their current knowledge. This limitation may have far-reaching consequences in an online banking system. This paper presents a viable alternative to brittleness in knowledge based systems: a potential milestone in the rapid detection of unique and novel fraud patterns in Internet banking. The experiments conducted with real online banking transaction log files suggest that prudence-based fraud detection may be a worthy alternative in online banking. © 2012 IEEE.
Recentred local profiles for authorship attribution
- Authors: Layton, Robert , Watters, Paul , Dazeley, Richard
- Date: 2012
- Type: Text , Journal article
- Relation: Natural Language Engineering Vol. 18, no. 3 (2012), p. 293-312
- Full Text:
- Reviewed:
- Description: Authorship attribution methods aim to determine the author of a document, by using information gathered from a set of documents with known authors. One method of performing this task is to create profiles containing distinctive features known to be used by each author. In this paper, a new method of creating an author or document profile is presented that detects features considered distinctive compared to normal language usage. This recentring approach creates more accurate profiles than previous methods, as demonstrated empirically using a known corpus of authorship problems. This method, named recentred local profiles, determines authorship more accurately than other methods in the literature, using a simple 'best matching author' approach to classification. The proposed method is shown to be more stable than related methods as parameter values change. Using a weighted voting scheme, recentred local profiles is shown to outperform other methods in authorship attribution, with an overall accuracy of 69.9% on the ad-hoc authorship attribution competition corpus, representing a significant improvement over related methods. Copyright © Cambridge University Press 2011.
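The profile-based attribution described above can be illustrated with a small sketch. This is not the paper's exact recentred-local-profiles formula; it is a hypothetical simplification in which an author's character n-gram frequencies are recentred by subtracting background (corpus-wide) usage, and a document is attributed to the author whose recentred profile it most resembles. All function names here are illustrative, not from the paper.

```python
from collections import Counter

def ngram_freqs(text, n=3):
    """Relative frequencies of character n-grams in a text."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()}

def recentred_profile(text, background, n=3):
    """Deviation of a text's n-gram frequencies from background language usage."""
    freqs = ngram_freqs(text, n)
    keys = set(freqs) | set(background)
    return {g: freqs.get(g, 0.0) - background.get(g, 0.0) for g in keys}

def best_matching_author(doc, author_texts, n=3):
    """Attribute doc to the author whose recentred profile it most resembles."""
    background = ngram_freqs(" ".join(author_texts.values()), n)
    doc_profile = recentred_profile(doc, background, n)

    def similarity(profile):
        shared = set(profile) & set(doc_profile)
        return sum(profile[g] * doc_profile[g] for g in shared)

    return max(
        author_texts,
        key=lambda a: similarity(recentred_profile(author_texts[a], background, n)),
    )
```

Features the document shares (positively) with an author's deviations from the background raise the similarity, which loosely mirrors the idea of matching on distinctive rather than merely frequent n-grams.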
RM and RDM, a preliminary evaluation of two prudent RDR Techniques
- Authors: Maruatona, Omaru , Vamplew, Peter , Dazeley, Richard
- Date: 2012
- Type: Text , Book chapter
- Relation: Knowledge Management and Acquisition for Intelligent Systems: 12th Pacific Rim Knowledge Acquisition Workshop p. 188-194
- Full Text: false
- Reviewed:
- Description: Rated Multiple Classification Ripple Down Rules (RM) and Ripple Down Models (RDM) are two of the successful prudent RDR approaches published. To date, there has not been a published, dedicated comparison of the two. This paper presents a systematic preliminary evaluation and analysis of the two techniques. The tests and results reported in this paper are the first phase of direct evaluations of RM and RDM against each other.
Unsupervised authorship analysis of phishing webpages
- Authors: Layton, Robert , Watters, Paul , Dazeley, Richard
- Date: 2012
- Type: Text , Conference proceedings
- Full Text:
- Description: Authorship analysis on phishing websites enables the investigation of phishing attacks, beyond basic analysis. In authorship analysis, salient features from documents are used to determine properties about the author, such as which of a set of candidate authors wrote a given document. In unsupervised authorship analysis, the aim is to group documents such that all documents by one author are grouped together. Applying this to cyber-attacks shows the size and scope of attacks from specific groups. This in turn allows investigators to focus their attention on specific attacking groups rather than trying to profile multiple independent attackers. In this paper, we analyse phishing websites using the current state of the art unsupervised authorship analysis method, called NUANCE. The results indicate that the application produces clusters which correlate strongly to authorship, evaluated using expert knowledge and external information as well as showing an improvement over a previous approach with known flaws. © 2012 IEEE.
A survey of multi-objective sequential decision-making
- Authors: Roijers, Diederik , Vamplew, Peter , Whiteson, Shimon , Dazeley, Richard
- Date: 2013
- Type: Text , Journal article
- Relation: Journal of Artificial Intelligence Research Vol. 48 (2013), p. 67-113
- Full Text:
- Reviewed:
- Description: Sequential decision-making problems with multiple objectives arise naturally in practice and pose unique challenges for research in decision-theoretic planning and learning, which has largely focused on single-objective settings. This article surveys algorithms designed for sequential decision-making problems with multiple objectives. Though there is a growing body of literature on this subject, little of it makes explicit under what circumstances special methods are needed to solve multi-objective problems. Therefore, we identify three distinct scenarios in which converting such a problem to a single-objective one is impossible, infeasible, or undesirable. Furthermore, we propose a taxonomy that classifies multi-objective methods according to the applicable scenario, the nature of the scalarization function (which projects multi-objective values to scalar ones), and the type of policies considered. We show how these factors determine the nature of an optimal solution, which can be a single policy, a convex hull, or a Pareto front. Using this taxonomy, we survey the literature on multi-objective methods for planning and learning. Finally, we discuss key applications of such methods and outline opportunities for future work. © 2013 AI Access Foundation.
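Two of the concepts central to the survey above, Pareto dominance and linear scalarization (a scalarization function projecting multi-objective values to scalars), can be sketched concretely. This is a minimal illustrative sketch, not code from the survey; the function names are assumptions.

```python
import numpy as np

def dominates(u, v):
    """True if value vector u Pareto-dominates v (>= everywhere, > somewhere)."""
    u, v = np.asarray(u), np.asarray(v)
    return bool(np.all(u >= v) and np.any(u > v))

def pareto_front(values):
    """Filter a set of value vectors down to the non-dominated ones."""
    return [u for u in values if not any(dominates(v, u) for v in values)]

def linear_scalarise(values, weights):
    """Project multi-objective values to scalars and return the best index."""
    scores = np.asarray(values) @ np.asarray(weights)
    return int(np.argmax(scores))
```

Linear scalarization can only recover solutions on the convex hull of the Pareto front, which is one reason the survey distinguishes between settings where a single policy, a convex hull, or a full Pareto front is the appropriate solution concept.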
Authorship analysis of aliases: Does topic influence accuracy?
- Authors: Layton, Robert , Watters, Paul , Dazeley, Richard
- Date: 2013
- Type: Text , Journal article
- Relation: Natural Language Engineering, Online first (2013)
- Full Text:
- Reviewed:
- Description: Aliases play an important role in online environments by facilitating anonymity, but also can be used to hide the identity of cybercriminals. Previous studies have investigated this alias matching problem in an attempt to identify whether two aliases are shared by an author, which can assist with identifying users. Those studies create their training data by randomly splitting the documents associated with an alias into two sub-aliases. Models have been built that can regularly achieve over 90% accuracy for recovering the linkage between these ‘random sub-aliases’. In this paper, random sub-alias generation is shown to enable these high accuracies, and thus does not adequately model the real-world problem. In contrast, creating sub-aliases using topic-based splitting drastically reduces the accuracy of all authorship methods tested. We then present a methodology that can be performed on non-topic controlled datasets, to produce topic-based sub-aliases that are more difficult to match. Finally, we present an experimental comparison between many authorship methods to see which methods better match aliases under these conditions, finding that local n-gram methods perform better than others.
Automated unsupervised authorship analysis using evidence accumulation clustering
- Authors: Layton, Robert , Watters, Paul , Dazeley, Richard
- Date: 2013
- Type: Text , Journal article
- Relation: Natural Language Engineering Vol. 19, no. 1 (2013), p. 95-120
- Full Text:
- Reviewed:
- Description: Authorship Analysis aims to extract information about the authorship of documents from features within those documents. Typically, this is performed as a classification task with the aim of identifying the author of a document, given a set of documents of known authorship. Alternatively, unsupervised methods have been developed primarily as visualisation tools to assist the manual discovery of clusters of authorship within a corpus by analysts. However, there is a need in many fields for more sophisticated unsupervised methods to automate the discovery, profiling and organisation of related information through clustering of documents by authorship. An automated and unsupervised methodology for clustering documents by authorship is proposed in this paper. The methodology is named NUANCE, for n-gram Unsupervised Automated Natural Cluster Ensemble. Testing indicates that the derived clusters have a strong correlation to the true authorship of unseen documents. © 2011 Cambridge University Press.
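A core building block of cluster ensembles like the one described above is evidence accumulation: combining many base clusterings into a co-association matrix that records how often each pair of documents lands in the same cluster. The sketch below shows only that generic step, under the assumption of label-vector partitions; NUANCE's full pipeline is more involved.

```python
import numpy as np

def coassociation_matrix(partitions, n):
    """Fraction of partitions in which each pair of items shares a cluster.

    partitions: list of label vectors, one per base clustering of n items.
    """
    C = np.zeros((n, n))
    for labels in partitions:
        labels = np.asarray(labels)
        C += (labels[:, None] == labels[None, :]).astype(float)  # pairwise agreement
    return C / len(partitions)
```

The resulting matrix can then be treated as a similarity matrix and fed to any standard clustering algorithm to extract a consensus partition.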
Evaluating authorship distance methods using the positive Silhouette coefficient
- Authors: Layton, Robert , Watters, Paul , Dazeley, Richard
- Date: 2013
- Type: Text , Journal article
- Relation: Natural Language Engineering Vol. 19, no. 4 (2013), p. 517-535
- Full Text:
- Reviewed:
- Description: Unsupervised Authorship Analysis (UAA) aims to cluster documents by authorship without knowing the authorship of any documents. An important factor in UAA is the method for calculating the distance between documents. This choice of the authorship distance method is considered more critical to the end result than the choice of cluster analysis algorithm. One method for measuring the correlation between a distance metric and a labelling (such as class values or clusters) is the Silhouette Coefficient (SC). The SC can be leveraged by measuring the correlation between the authorship distance method and the true authorship, evaluating the quality of the distance method. However, we show that the SC can be severely affected by outliers. To address this issue, we introduce the Positive Silhouette Coefficient (PSC), given as the proportion of instances with a positive SC value. This measure is not easily altered by outliers and is therefore more robust. A large number of authorship distance methods are then compared using the PSC, and the findings are presented. This research provides an insight into the efficacy of methods for UAA and presents a framework for testing authorship distance methods.
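The Positive Silhouette Coefficient described above has a direct implementation: compute each instance's silhouette value s(i) = (b - a) / max(a, b), where a is the mean intra-cluster distance and b the mean distance to the nearest other cluster, then take the proportion of instances with s(i) > 0. The sketch below uses Euclidean distances as an assumption; the paper's authorship distance methods would replace them.

```python
import numpy as np

def silhouette_samples(X, labels):
    """Per-sample silhouette values s(i) = (b - a) / max(a, b)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself from a(i)
        a = D[i, same].mean()
        b = min(D[i, labels == c].mean() for c in set(labels.tolist()) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s

def positive_silhouette_coefficient(X, labels):
    """Proportion of samples with a positive silhouette value."""
    return float(np.mean(silhouette_samples(X, labels) > 0))
```

Because the PSC only counts signs rather than averaging magnitudes, a single extreme outlier shifts it by at most 1/n, which is the robustness property the paper motivates.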
Local n-grams for author identification: Notebook for PAN at CLEF 2013 (CEUR Workshop Proceedings)
- Authors: Layton, Robert , Watters, Paul , Dazeley, Richard
- Date: 2013
- Type: Text , Conference proceedings
- Full Text:
- Description: Our approach to the author identification task combines existing authorship attribution methods based on local n-grams (LNG) in a weighted ensemble. This approach came third in this year's competition, using a relatively simple scheme of weighting by training set accuracy. LNG models create profiles, consisting of a list of character n-grams that best represent a particular author's writing. The use of a weighted ensemble improved the accuracy of the method without reducing the speed of the algorithm; the submitted solution was not only near the top of the leaderboard in terms of accuracy, but was also one of the faster algorithms submitted.
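The weighting scheme described above, each base method's vote weighted by its training-set accuracy, reduces to a few lines. This is a generic sketch of weighted voting, not the notebook's code; the function name is an assumption.

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Combine per-model author predictions, each weighted by that model's
    training-set accuracy, and return the highest-scoring author."""
    scores = defaultdict(float)
    for pred, w in zip(predictions, weights):
        scores[pred] += w
    return max(scores, key=scores.get)
```

For example, two weaker models agreeing can outvote one stronger model, which is the usual behaviour expected of an accuracy-weighted ensemble.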
Energy-efficient priority-based routing scheme for the healthcare wireless sensor networks
- Authors: Saeed, Ather , Stranieri, Andrew , Dazeley, Richard
- Date: 2014
- Type: Text , Conference paper
- Relation: 9th WSEAS International Conference on Remote Sensing, Budapest, 10/12/2013, p. 19-27
- Full Text: false
- Reviewed:
- Description: In time-critical and data-intensive applications, efficient acquisition of sensitive datasets is a challenge because of the network congestion, void regions and node failures that commonly occur in wireless sensor networks (WSN) while monitoring the wellbeing of patients with serious medical conditions. The sensor devices attached to such patients are used for monitoring the vital signs of those with serious heart problems, Parkinson's disease, epilepsy and high blood pressure. This paper focuses on the reliable acquisition of datasets and provides a fault-tolerant priority-based routing scheme with Dynamic Jumping (FTMPR-DJ) for the energy-efficient acquisition and dissemination of datasets. A new fault-tolerant scheme is proposed that significantly minimizes data loss and network congestion, supported by extensive experiments showing the effectiveness of the proposed routing scheme.
Real-time self-stabilizing scheme for the localization of faults in wireless sensor networks
- Authors: Saeed, Ather , Stranieri, Andrew , Dazeley, Richard
- Date: 2014
- Type: Text , Book chapter
- Relation: Recent advances in Image, Audio and Signal Processing p. 233-242
- Full Text: false
- Reviewed:
- Description: Reliable acquisition of data from massively dense wireless sensor networks (WSN) is a challenge due to the unpredictable behaviour of nodes responsible for collecting and disseminating datasets of interest. Therefore, accurate sensing of events from nodes depends on several microscopic and macroscopic factors, such as the distance of a node from the sink, radio signal strength and the connectedness of the network for routing datasets to the nearest sink. Several clustering schemes have been proposed for routing datasets, where the major focus was on finding the next cluster-head with maximum energy for routing data. Such schemes are not suitable for the real-time dissemination of datasets because electing the next cluster-head is a computationally intensive process. A new energy-efficient self-stabilizing sliding rectangle protocol (ESSRP) is proposed in this paper for ensuring reliability and the connectedness of regions, minimizing data loss and prolonging network life. The proposed scheme not only looks at the energy balance of a particular cluster but also ensures fault localization and tolerance by providing self-stabilization to the network in the event of node or link failures, using Green's Theorem. The WSN rectangular regions should be oriented counter-clockwise, piecewise regular and continuously differentiable so that faults can be efficiently localized, identified and rectified in a particular region.
Coarse Q-Learning: Addressing the convergence problem when quantizing continuous state variables
- Authors: Dazeley, Richard , Vamplew, Peter , Bignold, Adam
- Date: 2015
- Type: Text , Conference paper
- Relation: 2nd Multidisciplinary Conference on Reinforcement Learning and Decision Making
- Full Text: false
- Reviewed:
- Description: Value-based approaches to reinforcement learning (RL) maintain a value function that measures the long-term utility of a state or state-action pair. A long-standing issue in RL is how to create a finite representation in a continuous, and therefore infinite, state environment. The common approach is to use function approximators such as tile coding, or memory- or instance-based methods. These provide some balance between generalisation, resolution, and storage, but converge slowly in multidimensional state environments. Another approach, quantizing state into lookup tables, has been commonly regarded as highly problematic due to large memory requirements and poor generalisation. In particular, attempting to reduce memory requirements and increase generalisation by using coarser quantization forms a non-Markovian system that does not converge. This paper investigates the problems in using quantized lookup tables and presents an extension to the Q-Learning algorithm, referred to as Coarse Q-Learning (CQL), which resolves these issues. The presented algorithm will be shown to drastically reduce the memory requirements and increase generalisation by simulating the Markov property. In particular, this algorithm means the size of the input space is determined by the granularity required by the policy being learnt, rather than by the inadequacies of the learning algorithm or the nature of the state-reward dynamics of the environment. Importantly, the method presented addresses the problem represented by the curse of dimensionality.
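The baseline that the paper improves upon, plain tabular Q-learning over a quantized continuous state, can be sketched as follows. This shows only the standard quantize-then-update pattern whose convergence problems the paper analyses, not the CQL extension itself; all names and parameters are illustrative.

```python
from collections import defaultdict

def quantize(state, low, high, bins):
    """Map a continuous state vector onto a tuple of bin indices (a table key)."""
    return tuple(
        min(int((s - lo) / (hi - lo) * bins), bins - 1)
        for s, lo, hi in zip(state, low, high)
    )

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99, actions=(0, 1)):
    """One tabular Q-learning backup on quantized states:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
```

With a coarse `bins` setting, distinct underlying states collapse into one table entry, producing the non-Markovian behaviour described in the abstract; the table size grows as bins^dimensions, which is the memory pressure that motivates coarser quantization in the first place.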
Reinforcement learning of pareto-optimal multiobjective policies using steering
- Authors: Vamplew, Peter , Issabekov, Rustam , Dazeley, Richard , Foale, Cameron
- Date: 2015
- Type: Text , Conference paper
- Relation: 28th Australasian Joint Conference on Artificial Intelligence, AI 2015; Canberra, ACT; 30th November-4th December 2015 Vol. 9457, p. 596-608
- Full Text: false
- Reviewed:
- Description: There has been little research into multiobjective reinforcement learning (MORL) algorithms using stochastic or non-stationary policies, even though such policies may Pareto-dominate deterministic stationary policies. One approach is steering, which forms a non-stationary combination of deterministic stationary base policies. This paper presents two new steering algorithms designed for the task of learning Pareto-optimal policies. The first algorithm (w-steering) is a direct adaptation of previous approaches to steering, and therefore requires prior knowledge of recurrent states which are guaranteed to be revisited. The second algorithm (Q-steering) eliminates this requirement. Empirical results show that both algorithms perform well when given knowledge of recurrent states, but that Q-steering provides substantial performance improvements over w-steering when this knowledge is not available. © Springer International Publishing Switzerland 2015.
Softmax exploration strategies for multiobjective reinforcement learning
- Authors: Vamplew, Peter , Dazeley, Richard , Foale, Cameron
- Date: 2017
- Type: Text , Journal article
- Relation: Neurocomputing Vol. 263 (2017), p. 74-86
- Full Text:
- Reviewed:
- Description: Despite growing interest over recent years in applying reinforcement learning to multiobjective problems, there has been little research into the applicability and effectiveness of exploration strategies within the multiobjective context. This work considers several widely-used approaches to exploration from the single-objective reinforcement learning literature, and examines their incorporation into multiobjective Q-learning. In particular this paper proposes two novel approaches which extend the softmax operator to work with vector-valued rewards. The performance of these exploration strategies is evaluated across a set of benchmark environments. Issues arising from the multiobjective formulation of these benchmarks which impact on the performance of the exploration strategies are identified. It is shown that of the techniques considered, the combination of the novel softmax–epsilon exploration with optimistic initialisation provides the most effective trade-off between exploration and exploitation.
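One simple way to apply a softmax operator to vector-valued Q-values, as discussed above, is to scalarise each action's value vector first and then take softmax probabilities over the resulting scalars. This is a generic sketch of that idea under a linear scalarisation assumption; the paper's two novel softmax extensions (including softmax-epsilon) are more specific than this.

```python
import numpy as np

def scalarised_softmax(q_vectors, weights, temperature=1.0):
    """Softmax action probabilities over linearly scalarised vector Q-values.

    q_vectors: one value vector per action (n_actions x n_objectives).
    weights:   objective weights for the linear scalarisation.
    """
    utilities = np.asarray(q_vectors) @ np.asarray(weights)  # one scalar per action
    z = (utilities - utilities.max()) / temperature          # stabilised logits
    p = np.exp(z)
    return p / p.sum()
```

Lower temperatures concentrate probability on the greedy action, higher temperatures flatten the distribution, giving the exploration/exploitation trade-off the paper evaluates.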
Steering approaches to Pareto-optimal multiobjective reinforcement learning
- Authors: Vamplew, Peter , Issabekov, Rustam , Dazeley, Richard , Foale, Cameron , Berry, Adam , Moore, Tim , Creighton, Douglas
- Date: 2017
- Type: Text , Journal article
- Relation: Neurocomputing Vol. 263 (2017), p. 26-38
- Full Text:
- Reviewed:
- Description: For reinforcement learning tasks with multiple objectives, it may be advantageous to learn stochastic or non-stationary policies. This paper investigates two novel algorithms for learning non-stationary policies which produce Pareto-optimal behaviour (w-steering and Q-steering), by extending prior work based on the concept of geometric steering. Empirical results demonstrate that both new algorithms offer substantial performance improvements over stationary deterministic policies, while Q-steering significantly outperforms w-steering when the agent has no information about recurrent states within the environment. It is further demonstrated that Q-steering can be used interactively by providing a human decision-maker with a visualisation of the Pareto front and allowing them to adjust the agent’s target point during learning. To demonstrate broader applicability, the use of Q-steering in combination with function approximation is also illustrated on a task involving control of local battery storage for a residential solar power system.
Human-aligned artificial intelligence is a multiobjective problem
- Authors: Vamplew, Peter , Dazeley, Richard , Foale, Cameron , Firmin, Sally , Mummery, Jane
- Date: 2018
- Type: Text , Journal article
- Relation: Ethics and Information Technology Vol. 20, no. 1 (2018), p. 27-40
- Full Text:
- Reviewed:
- Description: As the capabilities of artificial intelligence (AI) systems improve, it becomes important to constrain their actions to ensure their behaviour remains beneficial to humanity. A variety of ethical, legal and safety-based frameworks have been proposed as a basis for designing these constraints. Despite their variations, these frameworks share the common characteristic that decision-making must consider multiple potentially conflicting factors. We demonstrate that these alignment frameworks can be represented as utility functions, but that the widely used Maximum Expected Utility (MEU) paradigm provides insufficient support for such multiobjective decision-making. We show that a Multiobjective Maximum Expected Utility paradigm based on the combination of vector utilities and non-linear action–selection can overcome many of the issues which limit MEU’s effectiveness in implementing aligned AI. We examine existing approaches to multiobjective AI, and identify how these can contribute to the development of human-aligned intelligent agents. © 2017, Springer Science+Business Media B.V.
Non-functional regression: A new challenge for neural networks
- Authors: Vamplew, Peter , Dazeley, Richard , Foale, Cameron , Choudhury, Tanveer
- Date: 2018
- Type: Text , Journal article
- Relation: Neurocomputing Vol. 314 (2018), p. 326-335
- Full Text:
- Reviewed:
- Description: This work identifies an important, previously unaddressed issue for regression based on neural networks – learning to accurately approximate problems where the output is not a function of the input (i.e. where the number of outputs required varies across input space). Such non-functional regression problems arise in a number of applications, and cannot be adequately handled by existing neural network algorithms. To demonstrate the benefits possible from directly addressing non-functional regression, this paper proposes the first neural algorithm to do so – an extension of the Resource Allocating Network (RAN) which adds additional output neurons to the network structure during training. This new algorithm, called the Resource Allocating Network with Varying Output Cardinality (RANVOC), is demonstrated to be capable of learning to perform non-functional regression, on both artificially constructed data and also on the real-world task of specifying parameter settings for a plasma-spray process. Importantly, RANVOC is shown to outperform not just the original RAN algorithm, but also the best possible error rates achievable by any functional form of regression.