A new solar power prediction method based on feature clustering and hybrid-classification-regression forecasting
- Authors: Nejati, Maryam, Amjady, Nima
- Date: 2022
- Type: Text, Journal article
- Relation: IEEE Transactions on Sustainable Energy Vol. 13, no. 2 (2022), p. 1188-1198
- Full Text: false
- Reviewed:
- Description: Solar generation systems are globally expanding in scale and number, which highlights the increasing importance of solar power forecasting. In this paper, a day-ahead solar power prediction method is proposed, including 1) a novel feature selecting/clustering approach based on relevancy and redundancy criteria and 2) an innovative hybrid-classification-regression forecasting engine. The proposed feature selecting/clustering approach filters out irrelevant features and partitions relevant features into two separate subsets to decrease the redundancy of features. Each of these two subsets is used to train a separate forecasting engine, and the final solar power prediction of the proposed method is obtained by a relevancy-based combination of these two forecasts. The proposed forecasting engine classifies the historical data based on the learnability of its constituent regression models and assigns each class of training samples to one regression model. Each regression model predicts the outputs of the test samples that belong to its class. The effectiveness of the proposed solar power prediction method is illustrated by testing on two real-world solar farms.
Subgraph adaptive structure-aware graph contrastive learning
- Authors: Chen, Zhikui, Peng, Yin, Yu, Shuo, Cao, Chen, Xia, Feng
- Date: 2022
- Type: Text, Journal article
- Relation: Mathematics (Basel) Vol. 10, no. 17 (2022), p. 3047
- Full Text:
- Reviewed:
- Description: Graph contrastive learning (GCL) has received increasing attention and been widely applied to numerous graph learning tasks such as node classification and link prediction. Although GCL methods have achieved great success and even outperformed supervised methods in some tasks, most of them depend on node-level comparison, ignoring the rich semantic information contained in graph topology, especially for social networks. However, a higher-level comparison requires subgraph construction and encoding, which remain unsolved. To address this problem, we propose a subgraph adaptive structure-aware graph contrastive learning method (PASCAL) in this work, which is a subgraph-level GCL method. In PASCAL, we construct subgraphs by merging all motifs that contain the target node. Then we encode them on the basis of motif number distribution to capture the rich information hidden in subgraphs. By incorporating motif information, PASCAL can capture richer semantic information hidden in local structures compared with other GCL methods. Extensive experiments on six benchmark datasets show that PASCAL outperforms state-of-the-art graph contrastive learning and supervised methods in most cases.
Imbalanced data classification and its application in cyber security
- Authors: Moniruzzaman, Md
- Date: 2020
- Type: Text, Thesis, PhD
- Full Text:
- Description: Cyber security, also known as information technology security or simply as information security, aims to protect government organizations, companies and individuals by defending their computers, servers, electronic systems, networks, and data from malicious attacks. With the advancement of client-side, on-the-fly web content generation techniques, it becomes easier for attackers to modify the content of a website dynamically and gain access to valuable information. The impact of cybercrime on the global economy is greater than ever, and it is growing day by day. Among the various types of cybercrime, financial attacks are widespread and the financial sector is among the most targeted. Both corporations and individuals lose a huge amount of money each year. The majority of financial attacks are carried out by banking malware and web-based attacks. End users are not always skilled enough to differentiate between injected and actual content of a webpage. Designing a real-time security system for ensuring a safe browsing experience is a challenging task. Some existing solutions are designed for the client side and require every user to install them, which is difficult to achieve in practice. In addition, organizations and individuals use various platforms and tools, so different solutions need to be designed for each. Existing server-side solutions often focus on sanitizing and filtering inputs, and thus fail to detect obfuscated and hidden scripts. A browsing-protection system must operate in real time, as any significant delay hampers user experience; finding the most optimized and efficient solution is therefore very important. Ease of installation and integration with existing systems is also critical: an efficient solution that is difficult to integrate may not be feasible for practical use.
Unsupervised and supervised data classification techniques have been widely applied to design algorithms for solving cyber security problems. The performance of these algorithms varies depending on the type of cyber security problem and the size of the dataset. To date, existing algorithms do not achieve high accuracy in detecting malware activities. Datasets in cyber security, especially those from the financial sector, are predominantly imbalanced, as the number of malware activities is significantly smaller than the number of normal activities. This means that classifiers for imbalanced datasets can be used to develop supervised data classification algorithms to detect malware activities. Development of classifiers for imbalanced datasets has been a subject of research over the last decade. Most of these classifiers are based on oversampling and undersampling techniques and are not efficient in many situations, as such techniques are applied globally. In this thesis, we develop two new algorithms for solving supervised data classification problems on imbalanced datasets and then apply them to solve malware detection problems. The first algorithm is designed using piecewise linear classifiers, by formulating this problem as an optimization problem and applying the penalty function method. More specifically, we add more penalty to the objective function for misclassified points from minority classes. The second method is based on the combination of supervised and unsupervised (clustering) algorithms. Such an approach allows one to identify areas in the input space where minority classes are located and to apply local oversampling or undersampling. This approach leads to the design of more efficient and accurate classifiers. The proposed algorithms are tested on real-world datasets. Results clearly demonstrate the superiority of the newly introduced algorithms. We then apply these algorithms to design classifiers to detect malware.
- Description: Doctor of Philosophy
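The minority-penalty idea behind the first algorithm can be illustrated with a small sketch: a plain logistic loss whose per-sample weights penalise misclassified minority points more heavily. This is only a toy stand-in; the thesis itself uses piecewise linear classifiers and a penalty function method, and the weight value and data below are illustrative.

```python
import numpy as np

def weighted_logistic_fit(X, y, minority_weight=5.0, lr=0.1, epochs=500):
    """Fit a linear classifier whose loss penalises errors on the
    minority class (label 1) more heavily via per-sample weights.
    A hedged sketch of the 'extra penalty for minority misclassification'
    idea, not the thesis's piecewise linear formulation."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    # minority samples contribute more to the objective
    sample_w = np.where(y == 1, minority_weight, 1.0)
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = sample_w * (p - y)          # weighted logistic gradient
        w -= lr * (X.T @ grad) / n
        b -= lr * grad.mean()
    return w, b

# Tiny imbalanced toy set: 20 majority points near 0, 3 minority near 2.5
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(2.5, 0.3, (3, 2))])
y = np.array([0] * 20 + [1] * 3)
w, b = weighted_logistic_fit(X, y)
pred = ((X @ w + b) > 0).astype(int)
print("minority recall:", (pred[y == 1] == 1).mean())
```

With the extra weight, the decision boundary is pushed away from the minority cluster, so minority recall improves at a small cost in majority accuracy.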
Partial undersampling of imbalanced data for cyber threats detection
- Authors: Moniruzzaman, Md, Bagirov, Adil, Gondal, Iqbal
- Date: 2020
- Type: Text, Conference proceedings, Conference paper
- Relation: 2020 Australasian Computer Science Week Multiconference, ACSW 2020
- Full Text:
- Reviewed:
- Description: Real-time detection of cyber threats is a challenging task in cyber security. With the advancement of technology and ease of access to the internet, more and more individuals and organizations are becoming the targets of various cyber attacks such as malware, ransomware, and spyware. The aim of these attacks is to steal money or valuable information from the victims. Signature-based detection methods fail to keep up with constantly evolving new threats. Machine learning based detection has drawn increasing attention from researchers due to its capability of detecting new and modified attacks based on previous attacks' behaviour. The number of malicious activities in a given domain is significantly low compared to the number of normal activities; cyber threat detection datasets are therefore imbalanced. In this paper, we propose a partial undersampling method to deal with imbalanced data for detecting cyber threats.
- Description: E1
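A rough sketch of what partial (local rather than global) undersampling can look like: keep every majority point near the minority class, since those define the decision boundary, and thin out only the remote ones. The nearest-minority-distance criterion and the keep fractions below are our own illustrative choices, not necessarily the paper's exact procedure.

```python
import numpy as np

def partial_undersample(X, y, keep_near=0.5, keep_far=0.2, seed=0):
    """Undersample only part of the majority class (label 0): retain all
    majority points closest to the minority class and keep only a random
    fraction of the rest. A hedged sketch of local undersampling."""
    rng = np.random.default_rng(seed)
    maj, mino = X[y == 0], X[y == 1]
    # distance from each majority point to its nearest minority point
    d = np.linalg.norm(maj[:, None, :] - mino[None, :, :], axis=2).min(axis=1)
    order = np.argsort(d)
    n_near = int(len(maj) * keep_near)
    near_idx = order[:n_near]                     # boundary region: keep all
    far_idx = order[n_near:]                      # remote region: thin out
    far_keep = rng.choice(far_idx, size=int(len(far_idx) * keep_far),
                          replace=False)
    keep = np.concatenate([near_idx, far_keep])
    X_new = np.vstack([maj[keep], mino])
    y_new = np.concatenate([np.zeros(len(keep), int), np.ones(len(mino), int)])
    return X_new, y_new

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 0.5, (5, 2))])
y = np.array([0] * 100 + [1] * 5)
Xr, yr = partial_undersample(X, y)
print(np.bincount(yr))  # majority count reduced, minority untouched
```

Here the 100:5 imbalance is reduced to 60:5 while all boundary-region majority points survive, which is the advantage of local over global undersampling.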
THCluster: herb supplements categorization for precision traditional Chinese medicine
- Authors: Ruan, Chunyang, Wang, Ye, Zhang, Yanchun, Ma, Jiangang, Chen, Huijuan, Aickelin, Uwe, Zhu, Shanfeng, Zhang, Ting
- Date: 2020
- Type: Text, Conference proceedings
- Relation: 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM);Kansas City, MO, USA; 13-16 Nov. 2017 p. 417-424
- Full Text: false
- Reviewed:
- Description: There has been a continuing demand for traditional and complementary medicine worldwide. A fundamental and important topic in Traditional Chinese Medicine (TCM) is to optimize prescriptions and to detect herb regularities from TCM data. In this paper, we propose a novel clustering model to solve this general problem of herb categorization, a pivotal task in prescription optimization and herb regularity detection. The model utilizes the random walk method, Bayesian rules and Expectation Maximization (EM) models to perform an effective clustering analysis on a heterogeneous information network. We performed extensive experiments on real-world datasets and compared our method with other algorithms and with experts. Experimental results have demonstrated the effectiveness of the proposed model for discovering useful categorizations of herbs and their potential clinical manifestations.
Impact of node deployment and routing for protection of critical infrastructures
- Authors: Subhan, Fazli, Noreen, Madiha, Imran, Muhammad, Tariq, Moeenuddin, Khan, Asfandyar, Shoaib, Muhammad
- Date: 2019
- Type: Text, Journal article
- Relation: IEEE Access Vol. 7, no. (2019), p. 11502-11514
- Full Text:
- Reviewed:
- Description: Recently, linear wireless sensor networks (LWSNs) have been eliciting increasing attention because of their suitability for applications such as the protection of critical infrastructures. Most of these applications require LWSNs to remain operational for long periods. However, the non-replenishable, limited battery power of sensor nodes does not allow them to meet these expectations. Therefore, a short network lifetime is one of the most prominent barriers to large-scale deployment of LWSNs. Unlike most existing studies, in this paper, we analyze the impact of node placement and clustering on LWSN network lifetime. First, we categorize and classify existing node placement and clustering schemes for LWSNs and introduce various topologies for disparate applications. Then, we highlight the peculiarities of LWSN applications and discuss their unique characteristics. Several application domains of LWSNs are described. We present three node placement strategies (i.e., linear sequential, linear parallel, and grid) and various deployment methods such as random, uniform, decreasing distance, and triangular. Extensive simulation experiments are conducted to analyze the performance of three state-of-the-art routing protocols in the context of node deployment strategies and methods. The experimental results demonstrate that the node deployment strategies and methods significantly affect LWSN lifetime.
Machine learning in mental health: a scoping review of methods and applications
- Authors: Shatte, Adrian, Hutchinson, Delyse, Teague, Samantha
- Date: 2019
- Type: Text, Journal article
- Relation: Psychological Medicine Vol. 49, no. 9 (2019), p. 1426-1448
- Full Text: false
- Reviewed:
- Description: This paper aims to synthesise the literature on machine learning (ML) and big data applications for mental health, highlighting current research and applications in practice. We employed a scoping review methodology to rapidly map the field of ML in mental health. Eight health and information technology research databases were searched for papers covering this domain. Articles were assessed by two reviewers, and data were extracted on each article's mental health application, ML technique, data type, and study results. Articles were then synthesised via narrative review. Three hundred papers focusing on the application of ML to mental health were identified. Four main application domains emerged in the literature: (i) detection and diagnosis; (ii) prognosis, treatment and support; (iii) public health; and (iv) research and clinical administration. The most common mental health conditions addressed included depression, schizophrenia, and Alzheimer's disease. ML techniques used included support vector machines, decision trees, neural networks, latent Dirichlet allocation, and clustering. Overall, the application of ML to mental health has demonstrated a range of benefits across the areas of diagnosis, treatment and support, research, and clinical administration. With the majority of identified studies focusing on the detection and diagnosis of mental health conditions, it is evident that there is significant room for the application of ML to other areas of psychology and mental health. The challenges of using ML techniques are discussed, as well as opportunities to improve and advance the field.
A new perceptual dissimilarity measure for image retrieval and clustering
- Authors: Shojanazeri, Hamid
- Date: 2018
- Type: Text, Thesis, PhD
- Full Text:
- Description: Image retrieval and clustering are two important tools for analysing and organising images. A dissimilarity measure is central to both, and the performance of image retrieval and clustering algorithms depends on its effectiveness. Minkowski distance, or more specifically Euclidean distance, is the most widely used dissimilarity measure in image retrieval and clustering. Euclidean distance depends only on the geometric position of two data instances in the feature space and completely ignores the data distribution. However, data distribution has an effect on human perception: psychologists have argued that two data instances in a dense area are more perceptually dissimilar than the same two instances in a sparser area. Based on this idea, a dissimilarity measure called mp has been proposed to address Euclidean distance's limitation of ignoring the data distribution. mp relies on data distribution to calculate the dissimilarity between two instances: higher data mass between two data instances implies higher dissimilarity, and vice versa. mp relies only on data distribution and completely ignores geometric distance in its calculations. When aggregating dissimilarities between two instances over all the dimensions of the feature space, both Euclidean distance and mp give the same priority to all dimensions. This may result in the final dissimilarity between two data instances being determined by a few dimensions of the feature vectors with much higher values, so the dissimilarity derived may not align well with human perception. The need to address the limitations of Minkowski distance measures, along with the importance of a dissimilarity measure that considers both geometric distance and the perceptual effect of data distribution in measuring dissimilarity between images, motivated this thesis.
This thesis studies the performance of mp for image retrieval and investigates a new dissimilarity measure that combines Euclidean distance and data distribution; it then studies the performance of such a measure for image retrieval and clustering. Our performance study of mp for image retrieval shows that relying only on data distribution to measure dissimilarity leads to situations where mp's measurement is contrary to human perception. This thesis introduces a new dissimilarity measure called the perceptual dissimilarity measure (PDM). PDM considers the perceptual effect of data distribution in combination with Euclidean distance. PDM has two variants, PDM1 and PDM2. PDM1 focuses on improving mp by weighting it with Euclidean distance in situations where mp may not retrieve accurate results. PDM2 considers the effect of data distribution on the perceived dissimilarity measured by Euclidean distance, weighting Euclidean distance using a logarithmic transform of data mass. The proposed PDM variants have been used as alternatives to Euclidean distance and mp to improve accuracy in image retrieval. Our results show that PDM2 consistently performed the best compared to Euclidean distance, mp and PDM1. PDM1's performance was not consistent: although it performed better than mp in all the experiments, it could not outperform Euclidean distance in some cases. Following the promising results of PDM2 in image retrieval, we studied its performance for image clustering. k-means is the most widely used clustering algorithm in scientific and industrial applications, and k-medoids is the closest clustering algorithm to k-means. Unlike k-means, which works only with Euclidean distance, k-medoids allows an arbitrary dissimilarity measure. We used Euclidean distance, mp and PDM2 as the dissimilarity measure in k-medoids and compared the results with k-means.
Our clustering results show that PDM2 performed the best overall. This confirms our retrieval results and identifies PDM2 as a suitable dissimilarity measure for image retrieval and clustering.
- Description: Doctor of Philosophy
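The core idea behind PDM2, scaling Euclidean distance by a logarithmic transform of the data mass between two instances, can be sketched as follows. Counting mass inside the pair's axis-aligned bounding box and the exact form of the log weighting are our illustrative assumptions, not the thesis's precise definitions.

```python
import numpy as np

def pdm2_like(a, b, data):
    """Dissimilarity that scales Euclidean distance by a logarithmic
    transform of the data mass between two points. 'Mass' is counted
    inside the pair's axis-aligned bounding box; this is a hypothetical
    reading of PDM2's weighting, sketched from the abstract only.
    Note: zero mass yields zero dissimilarity in this toy version."""
    lo, hi = np.minimum(a, b), np.maximum(a, b)
    mass = np.sum(np.all((data >= lo) & (data <= hi), axis=1))
    return np.linalg.norm(a - b) * np.log2(1 + mass)

rng = np.random.default_rng(0)
dense = rng.uniform(0, 1, (200, 2))    # crowded region
sparse = rng.uniform(5, 6, (5, 2))     # sparse region
data = np.vstack([dense, sparse])
# Two pairs with identical geometric length, one in each region
pair_d = (np.array([0.1, 0.1]), np.array([0.9, 0.9]))
pair_s = (np.array([5.1, 5.1]), np.array([5.9, 5.9]))
print(pdm2_like(*pair_d, data) > pdm2_like(*pair_s, data))
```

Because far more points lie between the dense pair, its dissimilarity is amplified even though both pairs are the same Euclidean distance apart, mirroring the perceptual argument in the abstract.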
Influence of clustering on the opinion formation dynamics in online social networks
- Authors: Das, Rajkumar, Kamruzzaman, Joarder, Karmakar, Gour
- Date: 2018
- Type: Text, Conference proceedings
- Relation: 25th International Conference on Neural Information Processing, ICONIP 2018; Siem Reap; Cambodia; 13th-16th December 2018; published in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 11306 LNCS, p. 144-155
- Full Text: false
- Reviewed:
- Description: With the advent of Online Social Networks (OSNs), opinion formation dynamics continuously evolves, mainly because of the widespread use of OSNs as a platform for social interaction and our growing instant exposure to others' opinions. When presented with neighbours' opinions in OSNs, the natural clustering ability of human agents enables them to perceive the grouping of opinions formed in the neighbourhood. A group with similar opinions exerts a stronger influence on an agent than its individual members. Distance-based opinion formation models only consider the influence of neighbours who are within a confidence bound threshold in the opinion space. However, a bigger group formed outside this distance threshold can exert stronger influence than a group within the bound, especially when that group contains influential or popular agents such as leaders. To the authors' knowledge, the proposed model is the first to consider the impact of the clustering capability of agents and to incorporate the influence of opinion clusters (groups) formed outside the confidence bound. Simulation results show that our model can capture several characteristics of real-world opinion dynamics.
Neighbourhood contrast : A better means to detect clusters than density
- Authors: Chen, Bo, Ting, Kaiming
- Date: 2018
- Type: Text, Conference paper
- Relation: 22nd Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2018; Melbourne, Australia; 3rd-6th June 2018; published in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 10939 LNAI, p. 401-412
- Full Text: false
- Reviewed:
- Description: Most density-based clustering algorithms suffer from large density variations among clusters. This paper proposes a new measure called Neighbourhood Contrast (NC) as a better alternative to density in detecting clusters. The proposed NC admits all local density maxima, regardless of their densities, to have similar NC values. Due to this unique property, NC is a better means to detect clusters in a dataset with large density variations among clusters. We provide two applications of NC. First, replacing density with NC in the current state-of-the-art clustering procedure DP leads to significantly improved clustering performance. Second, we devise a new clustering algorithm called Neighbourhood Contrast Clustering (NCC) which does not require density or distance calculations, and therefore has a linear time complexity in terms of dataset size. Our empirical evaluation shows that both NC-based methods outperform density-based methods including the current state-of-the-art.
VANET–LTE based heterogeneous vehicular clustering for driving assistance and route planning applications
- Authors: Ahmad, Iftikhar, Noor, Rafidah, Ahmedy, Ismail, Shah, Syed, Imran, Muhammad
- Date: 2018
- Type: Text, Journal article
- Relation: Computer Networks Vol. 145, no. (2018), p. 128-140
- Full Text: false
- Reviewed:
- Description: The Internet of vehicles incorporates multiple access networks and technologies to connect vehicles on roads. These vehicles usually require individual long-term evolution (LTE) connections to send/receive data to/from a remote server to make smart decisions regarding route planning and driving. An increasing number of vehicles on the roads may not only overwhelm LTE network usage but also incur added cost. Clustering helps minimize LTE usage, but the high speed of vehicles renders connections unstable and unreliable not only among vehicles but also between vehicles and the LTE network. Moreover, non-cooperative behavior among vehicles within a cluster is a bottleneck in sharing costly data acquired from the Internet. To address these issues, we propose a novel destination- and interest-aware clustering (DIAC) mechanism. DIAC primarily incorporates a strategic game-theoretic algorithm and a self-location calculation algorithm. The former allows vehicles to participate/cooperate and enforces a fair-use policy among the cluster members (CMs), whereas the latter enables CMs to calculate their location coordinates in the absence of a global positioning system under an urban topography. DIAC strives to reduce the frequency of link failures not only among vehicles but also between each vehicle and the 3G/LTE network. The mechanism also considers vehicle mobility and LTE link quality and exploits common interests among vehicles in the cluster formation phase. The performance of the DIAC mechanism is validated through extensive simulations, whose results demonstrate that it is superior to similar existing approaches.
A framework for clustering and dynamic maintenance of XML documents
- Authors: Al-Shammari, Ahmed, Liu, Chengfei, Naseriparsa, Mehdi, Vo, Bao, Anwar, Tarique, Zhou, Rui
- Date: 2017
- Type: Text, Conference paper
- Relation: 13th International Conference on Advanced Data Mining and Applications, ADMA 2017 Vol. 10604 LNAI, p. 399-412
- Full Text: false
- Reviewed:
- Description: Web data clustering has been widely studied in the data mining community. However, dynamic maintenance of web data clusters is still a challenging task. In this paper, we propose a novel framework called XClusterMaint which serves for both clustering and maintenance of XML documents. For clustering, we take both structure and content into account and propose an efficient solution for grouping the documents based on a combination of structure and content similarity. For maintenance, we propose an incremental approach for maintaining the existing clusters dynamically as new XML documents arrive. Since dynamic maintenance of the clusters is computationally expensive, we also propose an improved approach which uses a lazy maintenance scheme to improve the performance of cluster maintenance. The experimental results on real datasets verify the efficiency of the proposed clustering and maintenance model.
A new optimal power flow approach for wind energy integrated power systems
- Authors: Rahmani, Shima, Amjady, Nima
- Date: 2017
- Type: Text, Journal article
- Relation: Energy Vol. 134, no. (2017), p. 349-359
- Full Text: false
- Reviewed:
- Description: Penetration of wind generation into power systems in recent years has greatly affected optimal power flow (OPF) because of the uncertain behavior of this new energy resource. In this research work, at first, a novel scenario generation approach is proposed to model wind power (WP) uncertainty. The proposed scenario generation approach includes construction of the probability density function (PDF) pertaining to WP forecast error, segmentation of the PDF by an efficient clustering approach to obtain both the optimal number and the optimal arrangement of the clusters, and generation of WP scenarios from the optimized clusters through a roulette wheel mechanism. Secondly, this paper presents a new OPF framework based on DC network modeling for wind generation integrated power systems. Thirdly, a new out-of-sample analysis is presented to evaluate the long-run performance of the proposed OPF approach under various realizations of uncertain WP. Finally, the performance of the proposed method for solving the WP-integrated OPF problem is extensively illustrated on the IEEE 30-bus and the IEEE 118-bus test systems and compared with the performance of the deterministic method and the Weibull PDF method. These comparisons illustrate the better performance of the proposed method, which also has reasonable computation times. Highlights: a new scenario generation approach is presented; a new wind power integrated OPF model is proposed; a new out-of-sample analysis is presented; the effectiveness of the proposed model is extensively illustrated.
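The roulette wheel mechanism named in the abstract can be sketched in a few lines: each PDF cluster occupies a slice of [0, 1] proportional to its probability, and a uniform draw selects the slice it lands in. The cluster centres and probabilities below are illustrative values, not taken from the paper.

```python
import numpy as np

def roulette_scenarios(centers, probs, n, seed=0):
    """Draw forecast-error scenarios via a roulette wheel over PDF
    clusters: cumulative probabilities define the wheel's slice
    boundaries, and each uniform 'spin' picks the slice it falls in.
    A sketch of the mechanism; centres/probabilities are assumptions."""
    rng = np.random.default_rng(seed)
    cum = np.cumsum(probs)            # wheel slice boundaries in [0, 1]
    spins = rng.uniform(0, 1, n)
    idx = np.searchsorted(cum, spins) # slice index for each spin
    return np.asarray(centers)[idx]

centers = [-0.2, -0.05, 0.0, 0.05, 0.2]  # illustrative error cluster centres
probs = [0.1, 0.2, 0.4, 0.2, 0.1]        # cluster probabilities (sum to 1)
sc = roulette_scenarios(centers, probs, 10000)
print(round(float(sc.mean()), 3))  # close to 0 for this symmetric error PDF
```

Clusters with higher probability occupy wider slices of the wheel and are drawn proportionally more often, so the empirical scenario frequencies approximate the clustered PDF.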
An overview of geospatial methods used in unintentional injury epidemiology
- Authors: Singh, Himalaya, Fortington, Lauren, Thompson, Helen, Finch, Caroline
- Date: 2016
- Type: Text, Journal article
- Relation: Injury Epidemiology Vol. 3, no. 32 (2016), p. 1-12
- Relation: http://purl.org/au-research/grants/nhmrc/1058737
- Full Text:
- Reviewed:
- Description: BACKGROUND: Injuries are a leading cause of death and disability around the world. Injury incidence is often associated with socio-economic and physical environmental factors. The application of geospatial methods has been recognised as important to gain greater understanding of the complex nature of injury and the associated range of geographically varying risk factors. Therefore, the aim of this paper is to provide an overview of geospatial methods applied in unintentional injury epidemiological studies. METHODS: Nine electronic databases were searched for papers published in 2000-2015, inclusive. Papers were included if they reported unintentional injuries using one or more categories of spatial epidemiological methods (mapping; clustering/cluster detection; and ecological analysis). Results describe the included injury cause categories, types of data and details relating to the applied geospatial methods. RESULTS: From over 6,000 articles, 67 studies met all inclusion criteria. The major categories of injury data reported with geospatial methods were road traffic (n = 36), falls (n = 11), burns (n = 9), drowning (n = 4), and others (n = 7). Grouped by categories, mapping was the most frequently used method, with 62 (93%) studies applying this approach independently or in conjunction with other geospatial methods. Clustering/cluster detection methods were less common, applied in 27 (40%) studies. Three studies (4%) applied spatial regression methods (one study using a conditional autoregressive model and two studies using geographically weighted regression) to examine the relationship between injury incidence (drowning, road deaths) with aggregated data in relation to explanatory factors (socio-economic and environmental). CONCLUSION: The number of studies using geospatial methods to investigate unintentional injuries has increased over recent years.
While the majority of studies have focused on road traffic injuries, other injury cause categories, particularly falls and burns, have also demonstrated the application of these methods. Geospatial investigations of injury have largely been limited to mapping of data to visualise spatial structures. Use of more sophisticated approaches will help to understand a broader range of spatial risk factors, which remain under-explored when using traditional epidemiological approaches.
Constrained self organizing maps for data clusters visualization
- Authors: Mohebi, Ehsan, Bagirov, Adil
- Date: 2016
- Type: Text, Journal article
- Relation: Neural Processing Letters Vol. 43, no. 3 (2016), p. 849-869
- Full Text: false
- Reviewed:
- Description: High dimensional data visualization is one of the main tasks in the field of data mining and pattern recognition. The self organizing map (SOM) is a topology-visualizing tool that contains a set of neurons which gradually adapt to the input data space by competitive learning and form clusters. The topology preservation of the SOM strongly depends on the learning process; due to this limitation, one cannot guarantee the convergence of the SOM on datasets with clusters of arbitrary shape. In this paper, we introduce the Constrained SOM (CSOM), a new version of the SOM with a modified learning algorithm. The idea is to introduce an adaptive constraint parameter into the learning process to improve the topology preservation and mapping quality of the basic SOM. The computational complexity of the CSOM is less than that of the SOM. The proposed algorithm is compared with similar topology preservation algorithms, and numerical results on eight small to large real-world datasets demonstrate its efficiency.
Data sharing in secure multimedia wireless sensor networks
- Authors: Usman, Muhammad, Jan, Mian Ahmad, Xiangjian, He, Nanda, Priyadarsi
- Date: 2016
- Type: Text, Conference proceedings
- Relation: 2016 IEEE Trustcom/BigDataSE/ISPA;Tianjin, China; 23-26 August 2016 p. 590-597
- Full Text: false
- Reviewed:
- Description: The use of Multimedia Wireless Sensor Networks (MWSNs) is becoming common nowadays with the rapid growth in communication facilities. Similar to any other WSNs, these networks face various challenges in providing security, trust and privacy for user data. Provisioning of these services becomes an uphill task, especially while dealing with real-time streaming data. These networks operate with resource-constrained sensor nodes for days, months and even years, depending on the nature of the application. The resource-constrained nature of these networks makes it difficult for the nodes to tackle real-time data in mission-critical applications such as military surveillance, forest fire monitoring, health care and industrial automation. For a secure MWSN, the transmission and processing of streaming data need to be explored deeply. Conventional data authentication schemes are not suitable for MWSNs due to the limitations imposed on sensor nodes in terms of battery power, computation, available bandwidth and storage. In this paper, we propose a novel quality-driven clustering-based technique for authenticating streaming data in MWSNs. Nodes with maximum energy are selected as Cluster Heads (CHs). The CHs collect data from member nodes and forward it to the Base Station (BS), thus preventing member nodes with low energy from dying soon and increasing the lifespan of the underlying network. The proposed approach not only authenticates the streaming data but also maintains the quality of the transmitted data. The proposed data authentication scheme, coupled with an error concealment technique, provides energy-efficient and distortion-free real-time data streaming. The proposed scheme is compared with an unsupervised resources scenario. The simulation results demonstrate better network lifetime along with a 21.34 dB gain in Peak Signal-to-Noise Ratio (PSNR) of received video data streams.
Frequency decomposition based gene clustering
- Authors: Rahman, Md Abdur , Chetty, Madhu , Bulach, Dieter , Wangikar, Pramod
- Date: 2015
- Type: Text , Conference paper
- Relation: 22nd International Conference on Neural Information Processing, ICONIP 2015; Istanbul, Turkey; 9th-12th November 2015 Vol. 9490, p. 170-181
- Full Text: false
- Reviewed:
- Description: Gene expression data are commonly used to understand the underlying mechanisms of known biological processes. Although microarray gene expressions usually appear aperiodic, their periodic components can be obtained with proper signal processing techniques. Thus, if the expressions of interconnected (regulatory and regulated) genes are decomposed, at least one common frequency component will appear across these genes. Exploiting this novel concept, we propose a frequency decomposition approach to gene clustering that better reveals the gene interconnection topology. The method, based on the Hilbert-Huang Transform (HHT), enables us to segregate every periodic component of the gene expressions. Next, a multilevel clustering is performed on these frequency components. Unlike existing clustering algorithms, the proposed method assimilates meaningful knowledge of the gene interaction topology. Information about underlying gene interactions is vital and can prove useful in many existing evolutionary optimisation algorithms for genetic network reconstruction. We validate the approach by applying it to a 15-gene synthetic network. © Springer International Publishing Switzerland 2015.
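The core idea — genes that interact share a periodic component, so decompose each profile and group genes by shared frequencies — can be sketched as below. Note the paper uses the Hilbert-Huang Transform; this sketch substitutes a plain FFT as a stand-in, and the synthetic profiles and component count are assumptions for illustration.

```python
import numpy as np

def dominant_frequencies(expr, n_components=2):
    # Strongest periodic components of one expression profile
    # (FFT stand-in; the paper decomposes with HHT instead).
    spectrum = np.abs(np.fft.rfft(expr - expr.mean()))
    return set(np.argsort(spectrum)[-n_components:])

def cluster_by_shared_frequency(profiles):
    # Group genes that share at least one dominant frequency component.
    freqs = {g: dominant_frequencies(x) for g, x in profiles.items()}
    clusters = []
    for gene, f in freqs.items():
        for c in clusters:
            if any(f & freqs[m] for m in c):
                c.append(gene)
                break
        else:
            clusters.append([gene])
    return clusters

t = np.linspace(0, 4 * np.pi, 64)
rng = np.random.default_rng(1)
profiles = {
    "geneA": np.sin(2 * t) + 0.05 * rng.standard_normal(64),
    "geneB": np.sin(2 * t + 0.5) + 0.05 * rng.standard_normal(64),  # same period as geneA
    "geneC": np.sin(7 * t) + 0.05 * rng.standard_normal(64),        # different period
}
clusters = cluster_by_shared_frequency(profiles)
```

Here geneA and geneB share a dominant component and land in one cluster, while geneC forms its own — the kind of interconnection grouping the abstract describes.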
Functional specialisation and socio-economic factors in population change : A clustering study in non-metropolitan Australia
- Authors: Mardaneh, Karim
- Date: 2015
- Type: Text , Journal article
- Relation: Urban Studies Vol. 53, no. 8 (2015), p. 1591-1616
- Full Text: false
- Reviewed:
- Description: Although research has examined population growth and decline through functional specialisation, little attention has been paid to the possible combined effects of functional specialisation and socio-economic factors on population change. Using the Australian Bureau of Statistics Census Data 2001–2006 for statistical local areas, this study investigates the role of both functional specialisation and socio-economic factors in population change in non-metropolitan areas under the sustenance framework. The uniqueness of the study is twofold: conceptually, it develops a framework to compare the combined role of functional specialisation and socio-economic factors in population change; empirically, it uses data mining (cluster analysis) techniques to investigate the extent of this combined role. The results show the significance of both functional specialisation and socio-economic factors. The policy implications indicate the need to examine regional development and population change in relation to functional specialisation and socio-economic factors and their impact on the viability of non-metropolitan areas. © Urban Studies Journal Limited 2015.
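The cluster analysis the study applies to regional data can be illustrated with a plain k-means on standardised features. Everything below is a sketch under assumptions: the variables, synthetic regions, and cluster count are invented for illustration and are not the study's actual data or specification.

```python
import numpy as np

def kmeans(X, k, iters=50):
    # Greedy farthest-point initialisation, then standard Lloyd iterations.
    centroids = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(d)])
    centroids = np.array(centroids)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centroids) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(0)
    return labels, centroids

# Hypothetical regions: columns = [manufacturing share, services share,
# median income, population change %] -- illustrative values only.
rng = np.random.default_rng(2)
growing = rng.normal([0.1, 0.6, 0.5, 4.0], 0.1, size=(30, 4))
declining = rng.normal([0.4, 0.3, -0.5, -2.0], 0.1, size=(30, 4))
X = np.vstack([growing, declining])
X = (X - X.mean(0)) / X.std(0)   # standardise before clustering
labels, _ = kmeans(X, 2)
```

Standardising first matters because the functional-specialisation shares and the socio-economic variables sit on very different scales; without it, one variable would dominate the distance metric.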
REPLOT : REtrieving Profile Links on Twitter for malicious campaign discovery
- Authors: Perez, Charles , Birregah, Babiga , Layton, Robert , Lemercier, Marc , Watters, Paul
- Date: 2015
- Type: Text , Journal article
- Relation: AI Communications Vol. 29, no. 1 (2015), p. 107-122
- Full Text:
- Reviewed:
- Description: Social networking sites are increasingly subject to malicious activities such as self-propagating worms, confidence scams and drive-by-download malware. The high number of users, combined with the presence of sensitive data such as personal or professional information, presents an unprecedented opportunity for attackers, who are moving away from earlier platforms of attack, such as email, towards social networking websites. In this paper, we present a full-stack methodology for identifying campaigns of malicious profiles on social networking sites, composed of maliciousness classification, campaign discovery and attack profiling. The methodology, named REPLOT for REtrieving Profile Links On Twitter, contains three major phases. First, profiles are analysed to determine whether they are more likely to be malicious or benign. Second, connections between suspected malicious profiles are retrieved using a late data fusion approach, consisting of temporal and authorship-analysis-based models, to discover campaigns. Third, the discovered campaigns are analysed to investigate the attacks. We apply this methodology to a real-world dataset with a view to understanding the links between malicious profiles, their attack methods and their connections. Our analysis identifies a cluster of linked profiles focused on propagating malicious links, and profiles two other major clusters of attacking campaigns. © 2016 - IOS Press and the authors. All rights reserved.
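The second phase — late data fusion of a temporal model and an authorship model to link profiles into campaigns — can be sketched as follows. The fusion weight, threshold, scores and profile names are all illustrative assumptions, not REPLOT's actual values or models.

```python
def fuse_and_link(temporal, authorship, w=0.5, threshold=0.7):
    # Late fusion: each model scores every profile pair independently;
    # the scores are combined afterwards and strong pairs become links.
    fused = {p: w * temporal[p] + (1 - w) * authorship[p] for p in temporal}
    return {p for p, s in fused.items() if s >= threshold}

def campaigns(profiles, links):
    # Connected components of the link graph = candidate campaigns
    # (union-find with path halving).
    parent = {p: p for p in profiles}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for a, b in links:
        parent[find(a)] = find(b)
    groups = {}
    for p in profiles:
        groups.setdefault(find(p), []).append(p)
    return sorted(groups.values(), key=len, reverse=True)

profiles = ["p1", "p2", "p3", "p4"]
temporal   = {("p1","p2"): 0.9, ("p1","p3"): 0.8, ("p2","p3"): 0.85,
              ("p1","p4"): 0.1, ("p2","p4"): 0.2, ("p3","p4"): 0.15}
authorship = {("p1","p2"): 0.8, ("p1","p3"): 0.75, ("p2","p3"): 0.9,
              ("p1","p4"): 0.3, ("p2","p4"): 0.1, ("p3","p4"): 0.2}
links = fuse_and_link(temporal, authorship)
groups = campaigns(profiles, links)
```

Fusing *after* each model has scored the pairs (rather than merging features beforehand) is what makes this "late" fusion: either evidence source can be swapped out without retraining the other.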
LiNearN : A new approach to nearest neighbour density estimator
- Authors: Wells, Jonathan , Ting, Kaiming , Washio, Takashi
- Date: 2014
- Type: Text , Journal article
- Relation: Pattern Recognition Vol. 47, no. 8 (2014), p. 2702-2720
- Full Text: false
- Reviewed:
- Description: Despite their widespread use, nearest neighbour density estimators have two fundamental limitations: O(n²) time complexity and O(n) space complexity. Both constrain nearest neighbour density estimators to small data sets. Recent progress using indexing schemes has improved this only to near-linear time complexity. We propose a new approach, called LiNearN for Linear time Nearest Neighbour algorithm, that yields, as far as we know, the first nearest neighbour density estimator with O(n) time complexity and constant space complexity. This is achieved without any indexing scheme, because LiNearN uses a subsampling approach in which the subsample sizes are significantly smaller than the data size. Like existing density estimators, the new density estimator has, as our asymptotic analysis reveals, a parameter to trade off between bias and variance. We show that algorithms based on the new nearest neighbour density estimator can easily scale up to data sets with millions of instances in anomaly detection and clustering tasks. Highlights:
• Rejects the premise that a NN algorithm must find the NN for every instance.
• The first NN density estimator with O(n) time complexity and O(1) space complexity.
• These complexities are achieved without using any indexing scheme.
• Our asymptotic analysis reveals a trade-off between bias and variance.
• Easily scales up to large data sets in anomaly detection and clustering tasks.
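The subsampling idea behind this estimator can be sketched in a few lines: instead of searching all n points for a query's nearest neighbour, search only small fixed-size subsamples. This is a simplified illustration of the principle, not the authors' exact estimator; the subsample size, repeat count and density formula are assumptions.

```python
import numpy as np

def subsample_nn_density(X, queries, psi=16, t=10, seed=0):
    # For each query, average the nearest-neighbour distance over t random
    # subsamples of fixed size psi. Cost per query is O(psi * t) with O(psi)
    # working memory -- no index structure is ever built.
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    dens = np.zeros(len(queries))
    for i, q in enumerate(queries):
        r = 0.0
        for _ in range(t):
            sub = X[rng.choice(len(X), psi, replace=False)]
            r += np.sqrt(((sub - q) ** 2).sum(1)).min()
        r /= t
        # Density shrinks as the NN-ball volume (~ r^d) grows.
        dens[i] = 1.0 / (psi * r ** d + 1e-12)
    return dens

rng = np.random.default_rng(3)
dense_cloud = rng.normal(0, 0.3, size=(500, 2))   # high-density region
sparse_cloud = rng.uniform(5, 15, size=(50, 2))   # low-density region
X = np.vstack([dense_cloud, sparse_cloud])
dens = subsample_nn_density(X, np.array([[0.0, 0.0], [10.0, 10.0]]))
```

The subsample size psi plays the role of the bias-variance parameter the abstract mentions: larger subsamples give tighter nearest-neighbour distances (less bias) at more cost per query.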