Machine learning-based modelling for museum visitations prediction
- Authors: Yap, Norman , Gong, Mingwei , Naha, Ranesh , Mahanti, Aniket
- Date: 2020
- Type: Text , Conference proceedings
- Relation: 2020 International Symposium on Networks, Computers and Communications (ISNCC); Montreal, Canada; 20-22nd October, 2020, p.1-7
- Full Text: false
- Reviewed:
- Description: Cultural venues like museums increasingly seek to harness the value of data analytics to make data-driven decisions related to exhibition duration, marketing campaigns, resource planning, and revenue optimization. One key priority is the need to understand the factors that influence visitor attendance. Using data collected from a large museum, we investigated whether the weather has a significant impact on visitor attendance or whether other factors are more important. We applied the Cross Industry Standard Process for Data Mining (CRISP-DM) methodology to perform the research, and built four different types of regression models using R and its machine learning packages to model visitor attendance. The models were trained and evaluated. Predictions of visitor attendance were then generated from each of the four models and forecast accuracy was measured. The extreme gradient boost model was the best model, with the highest average forecast accuracy of 93% and the lowest forecast variability when benchmarked against the actual visitor attendance from the test data set. The weather was not found to be as significant in predicting visitor trends and numbers to the museum as factors like time of the day, day of the week and school holidays. However, it was still measured to have a slight impact, as excluding weather variables resulted in a model with a poorer fit. Weather can potentially have a more marked impact on cultural attractions in more extreme weather environments and at outdoor venues.
Categorical features transformation with compact one-hot encoder for fraud detection in distributed environment
- Authors: Ul Haq, Ikram , Gondal, Iqbal , Vamplew, Peter , Brown, Simon
- Date: 2019
- Type: Text , Conference proceedings , Conference paper
- Relation: 2019 16th Australasian Conference on Data Mining, AusDM 2018; Bathurst, NSW; 28 November 2018 through 30 November 2018 Vol. 996, p. 69-80
- Full Text: false
- Reviewed:
- Description: Fraud detection for online banking is an important research area, but one of the challenges is the heterogeneous nature of transaction data, i.e. a combination of numeric as well as mixed attributes. Usually, numeric data gives better performance for classification, regression and clustering algorithms. However, many machine learning problems have categorical, or nominal, features rather than numeric features only. In addition, some machine learning platforms such as Apache Spark accept numeric data only. One-hot Encoding (OHE) is a widely used approach for transforming categorical features to numerical features in traditional data mining tasks. The one-hot approach has some challenges as well: the sparseness of the transformed data, and the fact that the distinct values of an attribute are not always known in advance. Besides model accuracy, compactness of machine learning models is equally important due to growing memory and storage needs. This paper presents an innovative technique to transform categorical features to numeric features by compacting sparse data even if not all the distinct values are known. The transformed data can be used for the development of fraud detection systems. The accuracy of the results has been validated on synthetic and real bank fraud data and a publicly available anomaly detection (KDD-99) dataset on a multi-node data cluster. © Springer Nature Singapore Pte Ltd. 2019.
Discovering regularities from traditional Chinese medicine prescriptions via bipartite embedding model
- Authors: Ruan, Chunyang , Ma, Jiangang , Wang, Ye , Zhang, Yanchun , Yang, Yun
- Date: 2019
- Type: Text , Conference proceedings
- Relation: International Joint Conferences on Artificial Intelligence (IJCAI-49); Macao, China; 10th-16th August 2019; published in Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19) p. 3346-3352
- Full Text: false
- Reviewed:
- Description: Regularities analysis for prescriptions is a significant task for traditional Chinese medicine (TCM), both in the inheritance of clinical experience and in the improvement of clinical quality. Recently, many methods have been proposed for regularities discovery, but this task is challenging due to the quantity, sparsity and free-style nature of prescriptions. In this paper, we address the specific problem of regularities discovery and propose a graph embedding based framework for regularities discovery in massive prescriptions. We model this task as relation prediction, in which the correlations between two herbs, or between a herb and a symptom, are incorporated to characterize the different relationships. Specifically, we first establish a heterogeneous network with herbs and symptoms as its nodes. We then develop a bipartite embedding model termed HS2Vec to detect regularities, which explores the multiple herb-herb and herb-symptom relations based on the heterogeneous network. Experiments on four real-world datasets demonstrate that the proposed framework is very effective for regularities discovery.
Multi-source cyber-attacks detection using machine learning
- Authors: Taheri, Sona , Gondal, Iqbal , Bagirov, Adil , Harkness, Greg , Brown, Simon , Chi, Chihung
- Date: 2019
- Type: Text , Conference proceedings , Conference paper
- Relation: 2019 IEEE International Conference on Industrial Technology, ICIT 2019; Melbourne, Australia; 13th-15th February 2019 Vol. 2019-February, p. 1167-1172
- Full Text:
- Reviewed:
- Description: The Internet of Things (IoT) has significantly increased the number of devices connected to the Internet, ranging from sensors to multi-source data information. As the IoT continues to evolve with new technologies, the number of threats and attacks against IoT devices is on the increase. Analyzing and detecting these attacks originating from different sources requires machine learning models. These models provide proactive solutions for detecting attacks and their sources. In this paper, we propose to apply a supervised machine learning classification technique to identify cyber-attacks from each source. More precisely, we apply the incremental piecewise linear classifier, which constructs a boundary between sources/classes incrementally, starting with one hyperplane and adding more hyperplanes at each iteration. The algorithm terminates when no further significant improvement in the separation of sources/classes is possible. The construction and usage of piecewise linear boundaries allows us to avoid any possible overfitting. We apply the incremental piecewise linear classifier to a multi-source real-world cyber security data set to identify cyber-attacks and their sources.
- Description: Proceedings of the IEEE International Conference on Industrial Technology
A server side solution for detecting WebInject: A machine learning approach
- Authors: Moniruzzaman, Md , Bagirov, Adil , Gondal, Iqbal , Brown, Simon
- Date: 2018
- Type: Text , Conference proceedings , Conference paper
- Relation: 22nd Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2018; Melbourne, Australia; 3rd June 2018; published in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 11154 LNAI, p. 162-167
- Full Text: false
- Reviewed:
- Description: With the advancement of client-side on-the-fly web content generation techniques, it has become easier for attackers to modify the content of a website dynamically and gain access to valuable information. The majority of online attacks are now carried out via WebInject. End users are not always skilled enough to differentiate between injected content and the actual content of a webpage. Some of the existing solutions are designed for the client side, and all users have to install them on their systems, which is a challenging task. In addition, individuals use various platforms and tools, so different solutions need to be designed. Existing server-side solutions often focus on sanitizing and filtering the inputs, and so fail to detect obfuscated and hidden scripts. In this paper, we propose a server-side solution using a machine learning approach to detect WebInject in banking websites. Unlike other techniques, our method collects features of a Document Object Model (DOM) and classifies it with the help of a pre-trained model.
Deep user modeling for content-based event recommendation in event-based social networks
- Authors: Wang, Zhibo , Zhang, Yongquan , Chen, Honglong , Li, Zhetao , Xia, Feng
- Date: 2018
- Type: Text , Conference proceedings
- Relation: p. 1304-1312
- Full Text: false
- Reviewed:
- Description: Event-based social networks (EBSNs) are newly emerging social platforms for users to publish events online and attract others to attend events offline. The content information of events plays an important role in event recommendation. However, the content-based approaches in existing event recommender systems cannot fully represent the preference of each user for events, since most of them focus on exploiting the content information from the events' perspective, and the bag-of-words model they commonly use captures only word frequency while ignoring word order and sentence structure. In this paper, we shift the focus from the events' perspective to the users' perspective, and propose a Deep User Modeling framework for Event Recommendation (DUMER) to characterize the preference of users by exploiting the contextual information of events that users have attended. Specifically, we utilize a convolutional neural network (CNN) with word embedding to deeply capture the contextual information of the events a user is interested in and build a user latent model for each user. We then incorporate the user latent model into a probabilistic matrix factorization (PMF) model to enhance the recommendation accuracy. We conduct experiments on a real-world dataset crawled from a typical EBSN, Meetup.com, and the experimental results show that DUMER outperforms the compared benchmarks.
Secure passive keyless entry and start system using machine learning
- Authors: Ahmad, Usman , Song, Hong , Bilal, Awais , Alazab, Mamoun , Jolfaei, Alireza
- Date: 2018
- Type: Text , Conference proceedings
- Relation: 11th International Conference on Security, Privacy and Anonymity in Computation, Communication, and Storage, SpaCCS 2018; Melbourne, Australia; 11th-13th December 2018; published in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 11342 LNCS, p. 304-313
- Full Text: false
- Reviewed:
- Description: Despite the benefits of the passive keyless entry and start (PKES) system in improving locking and starting capabilities, it is vulnerable to relay attacks even though the communication is protected using strong cryptographic techniques. In this paper, we propose a data-intensive solution based on machine learning to mitigate relay attacks on PKES systems. The main contribution of the paper, beyond the novelty of the solution in using machine learning, is in (1) the use of a set of security features that accurately profiles the PKES system, (2) identifying abnormalities in regular PKES behavior, and (3) proposing a countermeasure that guarantees a desired probability of detection with a fixed false alarm rate by trading off training time and accuracy. We evaluated our method on the last three months of logs from a PKES system using Decision Tree, SVM, KNN and ANN classifiers, and provide a comparative analysis of the relay attack detection results. Our proposed framework combines the accuracy of supervised learning on known classes with the adaptability of the k-fold cross-validation technique for identifying malicious and suspicious activities. Our test results confirm the effectiveness of the proposed solution in distinguishing relayed messages from legitimate transactions.
- Description: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)