Motifs in big networks : methods and applications
- Yu, Shuo, Xu, Jin, Zhang, Chen, Xia, Feng, Almakhadmeh, Zafer, Tolba, Amr
- Authors: Yu, Shuo , Xu, Jin , Zhang, Chen , Xia, Feng , Almakhadmeh, Zafer , Tolba, Amr
- Date: 2019
- Type: Text , Journal article
- Relation: IEEE Access Vol. 7, no. (2019), p. 183322-183338
- Full Text:
- Reviewed:
- Description: Motifs have been recognized as basic network blocks and are found to be quite powerful in modeling certain patterns. Generally speaking, local characteristics of big networks could be reflected in network motifs. Over the years, motifs have attracted a lot of attention from researchers. However, most current literature reviews on motifs generally focus on the field of biological science. In contrast, here we try to present a comprehensive survey on motifs in the context of big networks. We introduce the definition of motifs and other related concepts. Big networks with motif-based structures are analyzed. Specifically, we respectively analyze four kinds of networks, including biological networks, social networks, academic networks, and infrastructure networks. We then examine methods for motif discovery, motif counting, and motif clustering. The applications of motifs in different areas have also been reviewed. Finally, some challenges and open issues in this direction are discussed. © 2013 IEEE.
- Authors: Yu, Shuo , Xu, Jin , Zhang, Chen , Xia, Feng , Almakhadmeh, Zafer , Tolba, Amr
- Date: 2019
- Type: Text , Journal article
- Relation: IEEE Access Vol. 7, no. (2019), p. 183322-183338
- Full Text:
- Reviewed:
- Description: Motifs have been recognized as basic network blocks and are found to be quite powerful in modeling certain patterns. Generally speaking, local characteristics of big networks could be reflected in network motifs. Over the years, motifs have attracted a lot of attention from researchers. However, most current literature reviews on motifs generally focus on the field of biological science. In contrast, here we try to present a comprehensive survey on motifs in the context of big networks. We introduce the definition of motifs and other related concepts. Big networks with motif-based structures are analyzed. Specifically, we respectively analyze four kinds of networks, including biological networks, social networks, academic networks, and infrastructure networks. We then examine methods for motif discovery, motif counting, and motif clustering. The applications of motifs in different areas have also been reviewed. Finally, some challenges and open issues in this direction are discussed. © 2013 IEEE.
CRI : measuring city infection risk amid COVID-19
- Liu, Mingliang, Yu, Shuo, Chu, Xinbei, Xia, Feng
- Authors: Liu, Mingliang , Yu, Shuo , Chu, Xinbei , Xia, Feng
- Date: 2020
- Type: Text , Conference paper
- Relation: 2020 IEEE Asia-Pacific Conference on Computer Science and Data Engineering, CSDE 2020
- Full Text: false
- Reviewed:
- Description: The outbreak of COVID-19 has brought incalculable economy and life losses. Accurately assessing the risk of a certain city can help formulate effective measures to prevent and control COVID-19 in time. It will be of great significance for us to measure city risk in infection amid epidemics. City risk in infection is related to many factors. To address this problem, this paper proposes city risk index (CRI) to measure city risk in infection, considering the following four perspectives: Economy (i.e., GDP and FCI), technology (i.e., education and innovation), population, and geographical position (i.e., latitude and longitude). The experimental results show that CRI can be effectively employed to measure city risk in infection amid COVID-19 as well as other similar epidemics. The proposed CRI can be used to guide policymakers for better emergency management policies making when coping with COVID-19. © 2020 IEEE.
Detecting outlier patterns with query-based artificially generated searching conditions
- Yu, Shuo, Xia, Feng, Sun, Yuchen, Tang, Tao, Yan, Xiaoran, Lee, Ivan
- Authors: Yu, Shuo , Xia, Feng , Sun, Yuchen , Tang, Tao , Yan, Xiaoran , Lee, Ivan
- Date: 2021
- Type: Text , Journal article
- Relation: IEEE Transactions on Computational Social Systems Vol. 8, no. 1 (2021), p. 134-147
- Full Text:
- Reviewed:
- Description: In the age of social computing, finding interesting network patterns or motifs is significant and critical for various areas, such as decision intelligence, intrusion detection, medical diagnosis, social network analysis, fake news identification, and national security. However, subgraph matching remains a computationally challenging problem, let alone identifying special motifs among them. This is especially the case in large heterogeneous real-world networks. In this article, we propose an efficient solution for discovering and ranking human behavior patterns based on network motifs by exploring a user's query in an intelligent way. Our method takes advantage of the semantics provided by a user's query, which in turn provides the mathematical constraint that is crucial for faster detection. We propose an approach to generate query conditions based on the user's query. In particular, we use meta paths between the nodes to define target patterns as well as their similarities, leading to efficient motif discovery and ranking at the same time. The proposed method is examined in a real-world academic network using different similarity measures between the nodes. The experiment result demonstrates that our method can identify interesting motifs and is robust to the choice of similarity measures. © 2014 IEEE.
- Authors: Yu, Shuo , Xia, Feng , Sun, Yuchen , Tang, Tao , Yan, Xiaoran , Lee, Ivan
- Date: 2021
- Type: Text , Journal article
- Relation: IEEE Transactions on Computational Social Systems Vol. 8, no. 1 (2021), p. 134-147
- Full Text:
- Reviewed:
- Description: In the age of social computing, finding interesting network patterns or motifs is significant and critical for various areas, such as decision intelligence, intrusion detection, medical diagnosis, social network analysis, fake news identification, and national security. However, subgraph matching remains a computationally challenging problem, let alone identifying special motifs among them. This is especially the case in large heterogeneous real-world networks. In this article, we propose an efficient solution for discovering and ranking human behavior patterns based on network motifs by exploring a user's query in an intelligent way. Our method takes advantage of the semantics provided by a user's query, which in turn provides the mathematical constraint that is crucial for faster detection. We propose an approach to generate query conditions based on the user's query. In particular, we use meta paths between the nodes to define target patterns as well as their similarities, leading to efficient motif discovery and ranking at the same time. The proposed method is examined in a real-world academic network using different similarity measures between the nodes. The experiment result demonstrates that our method can identify interesting motifs and is robust to the choice of similarity measures. © 2014 IEEE.
API : an index for quantifying a scholar's academic potential
- Ren, Jing, Wang, Lei, Wang, Kailai, Yu, Shuo, Hou, Mingliang, Lee, Ivan, Kong, Xiangje, Xia, Feng
- Authors: Ren, Jing , Wang, Lei , Wang, Kailai , Yu, Shuo , Hou, Mingliang , Lee, Ivan , Kong, Xiangje , Xia, Feng
- Date: 2019
- Type: Text , Journal article
- Relation: IEEE Access Vol. 7, no. (2019), p. 178675-178684
- Full Text:
- Reviewed:
- Description: In the context of big scholarly data, various metrics and indicators have been widely applied to evaluate the impact of scholars from different perspectives, such as publication counts, citations, ${h}$-index, and their variants. However, these indicators have limited capacity in characterizing prospective impacts or achievements of scholars. To solve this problem, we propose the Academic Potential Index (API) to quantify scholar's academic potential. Furthermore, an algorithm is devised to calculate the value of API. It should be noted that API is a dynamic index throughout scholar's academic career. By applying API to rank scholars, we can identify scholars who show their academic potentials during the early academic careers. With extensive experiments conducted based on the Microsoft Academic Graph dataset, it can be found that the proposed index evaluates scholars' academic potentials effectively and captures the variation tendency of their academic impacts. Besides, we also apply this index to identify rising stars in academia. Experimental results show that the proposed API can achieve superior performance in identifying potential scholars compared with three baseline methods. © 2019 IEEE.
- Authors: Ren, Jing , Wang, Lei , Wang, Kailai , Yu, Shuo , Hou, Mingliang , Lee, Ivan , Kong, Xiangje , Xia, Feng
- Date: 2019
- Type: Text , Journal article
- Relation: IEEE Access Vol. 7, no. (2019), p. 178675-178684
- Full Text:
- Reviewed:
- Description: In the context of big scholarly data, various metrics and indicators have been widely applied to evaluate the impact of scholars from different perspectives, such as publication counts, citations, ${h}$-index, and their variants. However, these indicators have limited capacity in characterizing prospective impacts or achievements of scholars. To solve this problem, we propose the Academic Potential Index (API) to quantify scholar's academic potential. Furthermore, an algorithm is devised to calculate the value of API. It should be noted that API is a dynamic index throughout scholar's academic career. By applying API to rank scholars, we can identify scholars who show their academic potentials during the early academic careers. With extensive experiments conducted based on the Microsoft Academic Graph dataset, it can be found that the proposed index evaluates scholars' academic potentials effectively and captures the variation tendency of their academic impacts. Besides, we also apply this index to identify rising stars in academia. Experimental results show that the proposed API can achieve superior performance in identifying potential scholars compared with three baseline methods. © 2019 IEEE.
Graph Force Learning
- Sun, Ke, Liu, Jiaying, Yu, Shuo, Xu, Bo, Xia, Feng
- Authors: Sun, Ke , Liu, Jiaying , Yu, Shuo , Xu, Bo , Xia, Feng
- Date: 2020
- Type: Text , Conference paper
- Relation: 8th IEEE International Conference on Big Data, Big Data 2020 p. 2987-2994
- Full Text:
- Reviewed:
- Description: Features representation leverages the great power in network analysis tasks. However, most features are discrete which poses tremendous challenges to effective use. Recently, increasing attention has been paid on network feature learning, which could map discrete features to continued space. Unfortunately, current studies fail to fully preserve the structural information in the feature space due to random negative sampling strategy during training. To tackle this problem, we study the problem of feature learning and novelty propose a force-based graph learning model named GForce inspired by the spring-electrical model. GForce assumes that nodes are in attractive forces and repulsive forces, thus leading to the same representation with the original structural information in feature learning. Comprehensive experiments on three benchmark datasets demonstrate the effectiveness of the proposed framework. Furthermore, GForce opens up opportunities to use physics models to model node interaction for graph learning. © 2020 IEEE.
- Authors: Sun, Ke , Liu, Jiaying , Yu, Shuo , Xu, Bo , Xia, Feng
- Date: 2020
- Type: Text , Conference paper
- Relation: 8th IEEE International Conference on Big Data, Big Data 2020 p. 2987-2994
- Full Text:
- Reviewed:
- Description: Features representation leverages the great power in network analysis tasks. However, most features are discrete which poses tremendous challenges to effective use. Recently, increasing attention has been paid on network feature learning, which could map discrete features to continued space. Unfortunately, current studies fail to fully preserve the structural information in the feature space due to random negative sampling strategy during training. To tackle this problem, we study the problem of feature learning and novelty propose a force-based graph learning model named GForce inspired by the spring-electrical model. GForce assumes that nodes are in attractive forces and repulsive forces, thus leading to the same representation with the original structural information in feature learning. Comprehensive experiments on three benchmark datasets demonstrate the effectiveness of the proposed framework. Furthermore, GForce opens up opportunities to use physics models to model node interaction for graph learning. © 2020 IEEE.
OFFER: A Motif Dimensional Framework for Network Representation Learning
- Yu, Shuo, Xia, Feng, Xu, Jin, Chen, Zhikui, Lee, Ivan
- Authors: Yu, Shuo , Xia, Feng , Xu, Jin , Chen, Zhikui , Lee, Ivan
- Date: 2020
- Type: Text , Conference proceedings
- Relation: 29th ACM International Conference on Information and Knowledge Management, CIKM 2020 p. 3349-3352
- Full Text:
- Reviewed:
- Description: Aiming at better representing multivariate relationships, this paper investigates a motif dimensional framework for higher-order graph learning. The graph learning effectiveness can be improved through OFFER. The proposed framework mainly aims at accelerating and improving higher-order graph learning results. We apply the acceleration procedure from the dimensional of network motifs. Specifically, the refined degree for nodes and edges are conducted in two stages: (1) employ motif degree of nodes to refine the adjacency matrix of the network; and (2) employ motif degree of edges to refine the transition probability matrix in the learning process. In order to assess the efficiency of the proposed framework, four popular network representation algorithms are modified and examined. By evaluating the performance of OFFER, both link prediction results and clustering results demonstrate that the graph representation learning algorithms enhanced with OFFER consistently outperform the original algorithms with higher efficiency. © 2020 ACM.
- Authors: Yu, Shuo , Xia, Feng , Xu, Jin , Chen, Zhikui , Lee, Ivan
- Date: 2020
- Type: Text , Conference proceedings
- Relation: 29th ACM International Conference on Information and Knowledge Management, CIKM 2020 p. 3349-3352
- Full Text:
- Reviewed:
- Description: Aiming at better representing multivariate relationships, this paper investigates a motif dimensional framework for higher-order graph learning. The graph learning effectiveness can be improved through OFFER. The proposed framework mainly aims at accelerating and improving higher-order graph learning results. We apply the acceleration procedure from the dimensional of network motifs. Specifically, the refined degree for nodes and edges are conducted in two stages: (1) employ motif degree of nodes to refine the adjacency matrix of the network; and (2) employ motif degree of edges to refine the transition probability matrix in the learning process. In order to assess the efficiency of the proposed framework, four popular network representation algorithms are modified and examined. By evaluating the performance of OFFER, both link prediction results and clustering results demonstrate that the graph representation learning algorithms enhanced with OFFER consistently outperform the original algorithms with higher efficiency. © 2020 ACM.
Detection of four-node motif in complex networks
- Ning, Zhaolong, Liu, Lei, Yu, Shuo, Xia, Feng
- Authors: Ning, Zhaolong , Liu, Lei , Yu, Shuo , Xia, Feng
- Date: 2018
- Type: Text , Conference proceedings
- Relation: Complex Networks & Their Applications VI; Lyon, France; November 29th-1st December, 2017 p. 453-462
- Full Text: false
- Reviewed:
- Description: Complex network analysis has gained research interests in a wide range of fields. Network motif, which is one of the most popular network properties, is a statistically significant network subgraph. In this paper, we propose a fast methodology, called Four-node Motif Detection Algorithm (FMDA), to extract four-node motifs in complex networks. Specifically, we employ a two-way spectral clustering method to cut big networks into small sub-graphs, and then identify motifs by recognition algorithm to reduce the computational complexity. After that, we use three isomorphic four-node motifs to analyze network structure by American Physical Society (APS) data set.
Predictive representation learning in motif-based graph networks
- Zhang, Kaiyuan, Yu, Shuo, Wan, Liangtian, Li, Jianxin, Xia, Feng
- Authors: Zhang, Kaiyuan , Yu, Shuo , Wan, Liangtian , Li, Jianxin , Xia, Feng
- Date: 2019
- Type: Text , Book chapter
- Relation: AI 2019: Advances in Artificial Intelligence Chapter 15 p. 177-188
- Full Text: false
- Reviewed:
- Description: Link prediction is an important task for analyzing social networks which also has other applications such as bioinformatics and e-commerce. Network representation learning (NRL), which can significantly enhance the performance for link prediction, has attracted much attention in recent years. However, the existing NRL methods mainly focus on observed network structures without considering hidden prediction knowledge in the representation space. Meanwhile, some random walk based NRL methods are dissatisfactory to learn link knowledge in dense networks with large scales. In this paper, we propose a predictive representation learning (PRL) model, which unifies node representations and motif-based structures, to improve prediction ability of NRL. We firstly enhance node representations based on motif-biased random walks and then employ L2-SVM to learn motif-connected node-pairs. By jointly optimizing two objectives of existent and nonexistent edges representations, we preserve more information of nodes in representation space based on supervised learning. To evaluate the performance of our proposed model, we implement experiments on 5 real data sets. Simulation results illustrate that our proposed model achieves better link prediction performance compared with other state-of-the-arts methods.
Team recognition in big scholarly data: exploring collaboration intensity
- Yu, Shuo, Xia, Feng, Zhang, Kaiyuan, Ning, Zhaolong, Zhong, Jiaofei, Liu, Chengfei
- Authors: Yu, Shuo , Xia, Feng , Zhang, Kaiyuan , Ning, Zhaolong , Zhong, Jiaofei , Liu, Chengfei
- Type: Text , Conference proceedings
- Relation: p. 925-932
- Full Text: false
- Reviewed:
- Description: The scale of scholarly data has been expanded due to the fact that scientific productions are increasing rapidly and new scholars affiliate academia incessantly. Scholars are shifting their research patterns from individual research to academic teamwork due to the complexity of scientific issues. In order to achieve higher reputations and better performance, academic teams with leaders are constructed to speed up knowledge sharing and problem solving. It is significant to explore team-based issues with the increasing interests of information exploration in big scholarly data. However, existing academic team definitions are commonly not quantitative, which makes it difficult to identify real academic teams. In this work, we propose a collaboration relationship evaluation index called Collaboration Intensity Index (CII), which is a two-way and quantitative index to evaluate collaboration intensity between two scholars in the network. Then, we construct a new type of co-author network with edges weighted by CII, which differs from the original co-author networks. This network reflects the newly scientific research patterns inside or outside academic teams. Furthermore, we propose TRAC (Team Recognition Algorithm based on CII) to identify academic teams from large co-author networks. Finally, we use DBLP data set, which contains 1,250,440 scholars and 1,575,949 published papers, to identify teams by TRAC. Comparing with fast unfolding algorithm and real team data, the effectiveness of our method can be demonstrated.
Big networks : a survey
- Bedru, Hayat, Yu, Shuo, Xiao, Xinru, Zhang, Da, Xia, Feng
- Authors: Bedru, Hayat , Yu, Shuo , Xiao, Xinru , Zhang, Da , Xia, Feng
- Date: 2020
- Type: Text , Journal article , Review
- Relation: Computer Science Review Vol. 37, no. (2020), p.
- Full Text:
- Reviewed:
- Description: A network is a typical expressive form of representing complex systems in terms of vertices and links, in which the pattern of interactions amongst components of the network is intricate. The network can be static that does not change over time or dynamic that evolves through time. The complication of network analysis is different under the new circumstance of network size explosive increasing. In this paper, we introduce a new network science concept called a big network. A big networks is generally in large-scale with a complicated and higher-order inner structure. This paper proposes a guideline framework that gives an insight into the major topics in the area of network science from the viewpoint of a big network. We first introduce the structural characteristics of big networks from three levels, which are micro-level, meso-level, and macro-level. We then discuss some state-of-the-art advanced topics of big network analysis. Big network models and related approaches, including ranking methods, partition approaches, as well as network embedding algorithms are systematically introduced. Some typical applications in big networks are then reviewed, such as community detection, link prediction, recommendation, etc. Moreover, we also pinpoint some critical open issues that need to be investigated further. © 2020 Elsevier Inc.
- Authors: Bedru, Hayat , Yu, Shuo , Xiao, Xinru , Zhang, Da , Xia, Feng
- Date: 2020
- Type: Text , Journal article , Review
- Relation: Computer Science Review Vol. 37, no. (2020), p.
- Full Text:
- Reviewed:
- Description: A network is a typical expressive form of representing complex systems in terms of vertices and links, in which the pattern of interactions amongst components of the network is intricate. The network can be static that does not change over time or dynamic that evolves through time. The complication of network analysis is different under the new circumstance of network size explosive increasing. In this paper, we introduce a new network science concept called a big network. A big networks is generally in large-scale with a complicated and higher-order inner structure. This paper proposes a guideline framework that gives an insight into the major topics in the area of network science from the viewpoint of a big network. We first introduce the structural characteristics of big networks from three levels, which are micro-level, meso-level, and macro-level. We then discuss some state-of-the-art advanced topics of big network analysis. Big network models and related approaches, including ranking methods, partition approaches, as well as network embedding algorithms are systematically introduced. Some typical applications in big networks are then reviewed, such as community detection, link prediction, recommendation, etc. Moreover, we also pinpoint some critical open issues that need to be investigated further. © 2020 Elsevier Inc.
Motif discovery in networks : a survey
- Yu, Shuo, Feng, Yufan, Zhang, Da, Bedru, Hayat, Xu, Bo, Xia, Feng
- Authors: Yu, Shuo , Feng, Yufan , Zhang, Da , Bedru, Hayat , Xu, Bo , Xia, Feng
- Date: 2020
- Type: Text , Journal article
- Relation: Computer Science Review Vol. 37, no. (2020), p.
- Full Text: false
- Reviewed:
- Description: Motifs are regarded as network blocks because motifs can be used to present fundamental patterns in networks. Motif discovery is well applied in various scientific problems, including subgraph mining and graph isomorphism tasks. This paper analyzes and summarizes current motif discovery algorithms in the field of network science with both efficiency and accuracy perspectives. In this paper, we present motif discovery algorithms, including MFinder, FanMod, Grochow, MODA, Kavosh, G-tries, QuateXelero, color-coding approaches, and GPU-based approaches. Based on that, we discuss the real-world applications of the algorithms mentioned above under different scenarios. Since motif discovery algorithms are diffusely demanded in many applications, several challenges may be firstly handled, including high computational complexity, higher order motif discovery, same motif detection, discovering heterogeneous sizes of motifs, as well as motif discovery results visualization. This work sheds light on current research progress and future research orientations. © 2020 Elsevier Inc.
Understanding serendipity in science : a survey
- Yu, Shuo, Bedru, Hayat, Xinbei, Chu, Yuyuan, Yuan, Xia, Feng
- Authors: Yu, Shuo , Bedru, Hayat , Xinbei, Chu , Yuyuan, Yuan , Xia, Feng
- Date: 2021
- Type: Text , Journal article
- Relation: Data Analysis and Knowledge Discovery Vol. 5, no. 1 (2021), p. 16-35
- Full Text: false
- Reviewed:
- Description: [Objective] This paper summarizes the components and definitions of serendipity, reviews representative supporting technologies and applications of serendipity in science, and discusses challenges and future directions in this field. [Coverage] We searched relevant keywords such as“serendipity”,“novelty”and “diversity”in research repositories such as Microsoft Academic and Google Scholar. A total of 102 well-selected references are finally cited. [Methods] We reviewed serendipitous discoveries in various scenarios, and discussed the concept of serendipity in the context of science. Relevant tools and applications are categorized. [Results] The tools that support serendipity are conducive to scientific research. However, there is no uniform definition of serendipity, thus making it difficult to measure serendipity in science. [Limitations] The factors affecting serendipity in science are complex, and yet to be explored. [Conclusions] Serendipity is one of the indispensable factors for scientific advances. However, many challenges are facing the exploration of serendipity in science, such as lack of metrics and difficulty to control. © 2021, Chinese Academy of Sciences. All rights reserved.
How to optimize an academic team when the outlier member is leaving?
- Yu, Shuo, Liu, Jiaying, Wei, Haoran, Xia, Feng, Tong, Hanghang
- Authors: Yu, Shuo , Liu, Jiaying , Wei, Haoran , Xia, Feng , Tong, Hanghang
- Date: 2021
- Type: Text , Journal article
- Relation: IEEE Intelligent Systems Vol. 36, no. 3 (May-Jun 2021), p. 23-30
- Full Text:
- Reviewed:
- Description: An academic team is a highly cohesive collaboration group of scholars, which has been recognized as an effective way to improve scientific output in terms of both quality and quantity. However, the high staff turnover brings about a series of problems that may have negative influences on team performance. To address this challenge, we first detect the tendency of the member who may potentially leave. Here, the outlierness is defined with respect to familiarity, which is quantified by using collaboration intensity. It is assumed that if a team member has a higher familiarity with scholars outside the team, then this member might probably leave the team. To minimize the influence caused by the leaving of such an outlier member, we propose an optimization solution to find a proper candidate who can replace the outlier member. Based on random walk with graph kernel, our solution involves familiarity matching, skill matching, as well as structure matching. The proposed approach proves to be effective and outperforms existing methods when applied to computer science academic teams.
- Authors: Yu, Shuo , Liu, Jiaying , Wei, Haoran , Xia, Feng , Tong, Hanghang
- Date: 2021
- Type: Text , Journal article
- Relation: IEEE Intelligent Systems Vol. 36, no. 3 (May-Jun 2021), p. 23-30
- Full Text:
- Reviewed:
- Description: An academic team is a highly cohesive collaboration group of scholars, which has been recognized as an effective way to improve scientific output in terms of both quality and quantity. However, the high staff turnover brings about a series of problems that may have negative influences on team performance. To address this challenge, we first detect the tendency of the member who may potentially leave. Here, the outlierness is defined with respect to familiarity, which is quantified by using collaboration intensity. It is assumed that if a team member has a higher familiarity with scholars outside the team, then this member might probably leave the team. To minimize the influence caused by the leaving of such an outlier member, we propose an optimization solution to find a proper candidate who can replace the outlier member. Based on random walk with graph kernel, our solution involves familiarity matching, skill matching, as well as structure matching. The proposed approach proves to be effective and outperforms existing methods when applied to computer science academic teams.
CHIEF : clustering With higher-order motifs in big networks
- Xia, Feng, Yu, Shuo, Liu, Chengfei, Li, Jianxin, Lee, Ivan
- Authors: Xia, Feng , Yu, Shuo , Liu, Chengfei , Li, Jianxin , Lee, Ivan
- Date: 2022
- Type: Text , Journal article
- Relation: IEEE Transactions on Network Science and Engineering Vol. 9, no. 3 (2022), p. 990-1005
- Full Text:
- Reviewed:
- Description: Clustering network vertices is an enabler of various applications such as social computing and Internet of Things. However, challenges arise for clustering when networks increase in scale. This paper proposes CHIEF (Clustering with HIgher-ordEr motiFs), a solution which consists of two motif clustering techniques: standard acceleration CHIEF-ST and approximate acceleration CHIEF-AP. Both algorithms firstly find the maximal $k$-edge-connected subgraphs within the target networks to lower the network scale by optimizing the network structure with maximal $k$-edge-connected subgraphs, and then use heterogeneous four-node motifs clustering in higher-order dense networks. For CHIEF-ST, we illustrate that all target motifs will be kept after this procedure when the minimum node degree of the target motif is equal or greater than $k$. For CHIEF-AP, we prove that the eigenvalues of the adjacency matrix and the Laplacian matrix are relatively stable after this step. CHIEF offers an improved efficiency of motif clustering for big networks, and it verifies higher-order motif significance. Experiments on real and synthetic networks demonstrate that the proposed solutions outperform baseline approaches in large network analysis, and higher-order motifs outperform traditional triangle motifs in clustering. © 2022 IEEE Computer Society. All rights reserved.
- Authors: Xia, Feng , Yu, Shuo , Liu, Chengfei , Li, Jianxin , Lee, Ivan
- Date: 2022
- Type: Text , Journal article
- Relation: IEEE Transactions on Network Science and Engineering Vol. 9, no. 3 (2022), p. 990-1005
- Full Text:
- Reviewed:
- Description: Clustering network vertices is an enabler of various applications such as social computing and Internet of Things. However, challenges arise for clustering when networks increase in scale. This paper proposes CHIEF (Clustering with HIgher-ordEr motiFs), a solution which consists of two motif clustering techniques: standard acceleration CHIEF-ST and approximate acceleration CHIEF-AP. Both algorithms firstly find the maximal $k$-edge-connected subgraphs within the target networks to lower the network scale by optimizing the network structure with maximal $k$-edge-connected subgraphs, and then use heterogeneous four-node motifs clustering in higher-order dense networks. For CHIEF-ST, we illustrate that all target motifs will be kept after this procedure when the minimum node degree of the target motif is equal or greater than $k$. For CHIEF-AP, we prove that the eigenvalues of the adjacency matrix and the Laplacian matrix are relatively stable after this step. CHIEF offers an improved efficiency of motif clustering for big networks, and it verifies higher-order motif significance. Experiments on real and synthetic networks demonstrate that the proposed solutions outperform baseline approaches in large network analysis, and higher-order motifs outperform traditional triangle motifs in clustering. © 2022 IEEE Computer Society. All rights reserved.
Higher-order structure based anomaly detection on attributed networks
- Yuan, Xu, Zhou, Na, Yu, Shuo, Huang, Huafei, Chen, Zhikui, Xia, Feng
- Authors: Yuan, Xu , Zhou, Na , Yu, Shuo , Huang, Huafei , Chen, Zhikui , Xia, Feng
- Date: 2021
- Type: Text , Conference paper
- Relation: 2021 IEEE International Conference on Big Data, Big Data 2021, virtual online, 15-18 December 2021, Proceedings - 2021 IEEE International Conference on Big Data, Big Data 2021 p. 2691-2700
- Full Text:
- Reviewed:
- Description: Anomaly detection (such as telecom fraud detection and medical image detection) has attracted the increasing attention of people. The complex interaction between multiple entities widely exists in the network, which can reflect specific human behavior patterns. Such patterns can be modeled by higher-order network structures, thus benefiting anomaly detection on attributed networks. However, due to the lack of an effective mechanism in most existing graph learning methods, these complex interaction patterns fail to be applied in detecting anomalies, hindering the progress of anomaly detection to some extent. In order to address the aforementioned issue, we present a higher-order structure based anomaly detection (GUIDE) method. We exploit attribute autoencoder and structure autoencoder to reconstruct node attributes and higher-order structures, respectively. Moreover, we design a graph attention layer to evaluate the significance of neighbors to nodes through their higher-order structure differences. Finally, we leverage node attribute and higher-order structure reconstruction errors to find anomalies. Extensive experiments on five real-world datasets (i.e., ACM, Citation, Cora, DBLP, and Pubmed) are implemented to verify the effectiveness of GUIDE. Experimental results in terms of ROC-AUC, PR-AUC, and Recall@K show that GUIDE significantly outperforms the state-of-art methods. © 2021 IEEE.
- Authors: Yuan, Xu , Zhou, Na , Yu, Shuo , Huang, Huafei , Chen, Zhikui , Xia, Feng
- Date: 2021
- Type: Text , Conference paper
- Relation: 2021 IEEE International Conference on Big Data, Big Data 2021, virtual online, 15-18 December 2021, Proceedings - 2021 IEEE International Conference on Big Data, Big Data 2021 p. 2691-2700
- Full Text:
- Reviewed:
- Description: Anomaly detection (such as telecom fraud detection and medical image detection) has attracted the increasing attention of people. The complex interaction between multiple entities widely exists in the network, which can reflect specific human behavior patterns. Such patterns can be modeled by higher-order network structures, thus benefiting anomaly detection on attributed networks. However, due to the lack of an effective mechanism in most existing graph learning methods, these complex interaction patterns fail to be applied in detecting anomalies, hindering the progress of anomaly detection to some extent. In order to address the aforementioned issue, we present a higher-order structure based anomaly detection (GUIDE) method. We exploit attribute autoencoder and structure autoencoder to reconstruct node attributes and higher-order structures, respectively. Moreover, we design a graph attention layer to evaluate the significance of neighbors to nodes through their higher-order structure differences. Finally, we leverage node attribute and higher-order structure reconstruction errors to find anomalies. Extensive experiments on five real-world datasets (i.e., ACM, Citation, Cora, DBLP, and Pubmed) are implemented to verify the effectiveness of GUIDE. Experimental results in terms of ROC-AUC, PR-AUC, and Recall@K show that GUIDE significantly outperforms the state-of-art methods. © 2021 IEEE.
Familiarity-based collaborative team recognition in academic social networks
- Yu, Shuo, Xia, Feng, Zhang, Chen, Wei, Haoran, Keogh, Kathleen, Chen, Honglong
- Authors: Yu, Shuo , Xia, Feng , Zhang, Chen , Wei, Haoran , Keogh, Kathleen , Chen, Honglong
- Date: 2022
- Type: Text , Journal article
- Relation: IEEE Transactions on Computational Social Systems Vol. 9, no. 5 (2022), p. 1432-1445
- Full Text:
- Reviewed:
- Description: Collaborative teamwork is key to major scientific discoveries. However, the prevalence of collaboration among researchers makes team recognition increasingly challenging. Previous studies have demonstrated that people are more likely to collaborate with individuals they are familiar with. In this work, we employ the definition of familiarity and then propose faMiliarity-based cOllaborative Team recOgnition (MOTO) algorithm to recognize collaborative teams. MOTO calculates the shortest distance matrix within the global collaboration network and the local density of each node. Central team members are initially recognized based on local density. Then, MOTO recognizes the remaining team members by using the familiarity metric and shortest distance matrix. Extensive experiments have been conducted upon a large-scale dataset. The experimental results show that compared with baseline methods, MOTO can recognize the largest number of teams. The teams recognized by the MOTO possess more cohesive team structures and lower team communication costs compared with other methods. MOTO utilizes familiarity in team recognition to identify cohesive academic teams. The recognized teams are in line with real-world collaborative teamwork patterns. Based on team recognition using MOTO, the research team structure and performance are further analyzed for given time periods. The number of teams that consist of members from different institutions increases gradually. Such teams are found to perform better in comparison with those whose members are from the same institution. © 2014 IEEE.
- Authors: Yu, Shuo , Xia, Feng , Zhang, Chen , Wei, Haoran , Keogh, Kathleen , Chen, Honglong
- Date: 2022
- Type: Text , Journal article
- Relation: IEEE Transactions on Computational Social Systems Vol. 9, no. 5 (2022), p. 1432-1445
- Full Text:
- Reviewed:
- Description: Collaborative teamwork is key to major scientific discoveries. However, the prevalence of collaboration among researchers makes team recognition increasingly challenging. Previous studies have demonstrated that people are more likely to collaborate with individuals they are familiar with. In this work, we employ the definition of familiarity and then propose faMiliarity-based cOllaborative Team recOgnition (MOTO) algorithm to recognize collaborative teams. MOTO calculates the shortest distance matrix within the global collaboration network and the local density of each node. Central team members are initially recognized based on local density. Then, MOTO recognizes the remaining team members by using the familiarity metric and shortest distance matrix. Extensive experiments have been conducted upon a large-scale dataset. The experimental results show that compared with baseline methods, MOTO can recognize the largest number of teams. The teams recognized by the MOTO possess more cohesive team structures and lower team communication costs compared with other methods. MOTO utilizes familiarity in team recognition to identify cohesive academic teams. The recognized teams are in line with real-world collaborative teamwork patterns. Based on team recognition using MOTO, the research team structure and performance are further analyzed for given time periods. The number of teams that consist of members from different institutions increases gradually. Such teams are found to perform better in comparison with those whose members are from the same institution. © 2014 IEEE.
Graph learning : a survey
- Xia, Feng, Sun, Ke, Yu, Shuo, Aziz, Abdul, Wan, Liangtian, Pan, Shirui, Liu, Huan
- Authors: Xia, Feng , Sun, Ke , Yu, Shuo , Aziz, Abdul , Wan, Liangtian , Pan, Shirui , Liu, Huan
- Date: 2021
- Type: Text , Journal article , Review
- Relation: IEEE Transactions on Artificial Intelligence Vol. 2, no. 2 (2021), p. 109-127
- Full Text:
- Reviewed:
- Description: Graphs are widely used as a popular representation of the network structure of connected data. Graph data can be found in a broad spectrum of application domains such as social systems, ecosystems, biological networks, knowledge graphs, and information systems. With the continuous penetration of artificial intelligence technologies, graph learning (i.e., machine learning on graphs) is gaining attention from both researchers and practitioners. Graph learning proves effective for many tasks, such as classification, link prediction, and matching. Generally, graph learning methods extract relevant features of graphs by taking advantage of machine learning algorithms. In this survey, we present a comprehensive overview on the state-of-the-art of graph learning. Special attention is paid to four categories of existing graph learning methods, including graph signal processing, matrix factorization, random walk, and deep learning. Major models and algorithms under these categories are reviewed, respectively. We examine graph learning applications in areas such as text, images, science, knowledge graphs, and combinatorial optimization. In addition, we discuss several promising research directions in this field. Impact Statement—Real-world intelligent systems generally rely on machine learning algorithms handling data of various types. Despite their ubiquity, graph data have imposed unprecedented challenges to machine learning due to their inherent complexity. Unlike text, audio and images, graph data are embedded in an irregular domain, making some essential operations of existing machine learning algorithms inapplicable. Many graph learning models and algorithms have been developed to tackle these challenges. This article presents a systematic review of the state-of-the-art graph learning approaches as well as their potential applications. The article serves multiple purposes. First, it acts as a quick reference to graph learning for researchers and practitioners in different areas such as social computing, information retrieval, computer vision, bioinformatics, economics, and e-commence. Second, it presents insights into open areas of research in the field. Third, it aims to stimulate new research ideas and more interests in graph learning. © IEEE Transactions on Artificial Intelligence 2020.
- Authors: Xia, Feng , Sun, Ke , Yu, Shuo , Aziz, Abdul , Wan, Liangtian , Pan, Shirui , Liu, Huan
- Date: 2021
- Type: Text , Journal article , Review
- Relation: IEEE Transactions on Artificial Intelligence Vol. 2, no. 2 (2021), p. 109-127
- Full Text:
- Reviewed:
- Description: Graphs are widely used as a popular representation of the network structure of connected data. Graph data can be found in a broad spectrum of application domains such as social systems, ecosystems, biological networks, knowledge graphs, and information systems. With the continuous penetration of artificial intelligence technologies, graph learning (i.e., machine learning on graphs) is gaining attention from both researchers and practitioners. Graph learning proves effective for many tasks, such as classification, link prediction, and matching. Generally, graph learning methods extract relevant features of graphs by taking advantage of machine learning algorithms. In this survey, we present a comprehensive overview on the state-of-the-art of graph learning. Special attention is paid to four categories of existing graph learning methods, including graph signal processing, matrix factorization, random walk, and deep learning. Major models and algorithms under these categories are reviewed, respectively. We examine graph learning applications in areas such as text, images, science, knowledge graphs, and combinatorial optimization. In addition, we discuss several promising research directions in this field. Impact Statement—Real-world intelligent systems generally rely on machine learning algorithms handling data of various types. Despite their ubiquity, graph data have imposed unprecedented challenges to machine learning due to their inherent complexity. Unlike text, audio and images, graph data are embedded in an irregular domain, making some essential operations of existing machine learning algorithms inapplicable. Many graph learning models and algorithms have been developed to tackle these challenges. This article presents a systematic review of the state-of-the-art graph learning approaches as well as their potential applications. The article serves multiple purposes. First, it acts as a quick reference to graph learning for researchers and practitioners in different areas such as social computing, information retrieval, computer vision, bioinformatics, economics, and e-commence. Second, it presents insights into open areas of research in the field. Third, it aims to stimulate new research ideas and more interests in graph learning. © IEEE Transactions on Artificial Intelligence 2020.
Subgraph adaptive structure-aware graph contrastive learning
- Chen, Zhikui, Peng, Yin, Yu, Shuo, Cao, Chen, Xia, Feng
- Authors: Chen, Zhikui , Peng, Yin , Yu, Shuo , Cao, Chen , Xia, Feng
- Date: 2022
- Type: Text , Journal article
- Relation: Mathematics (Basel) Vol. 10, no. 17 (2022), p. 3047
- Full Text:
- Reviewed:
- Description: Graph contrastive learning (GCL) has been subject to more attention and been widely applied to numerous graph learning tasks such as node classification and link prediction. Although it has achieved great success and even performed better than supervised methods in some tasks, most of them depend on node-level comparison, while ignoring the rich semantic information contained in graph topology, especially for social networks. However, a higher-level comparison requires subgraph construction and encoding, which remain unsolved. To address this problem, we propose a subgraph adaptive structure-aware graph contrastive learning method (PASCAL) in this work, which is a subgraph-level GCL method. In PASCAL, we construct subgraphs by merging all motifs that contain the target node. Then we encode them on the basis of motif number distribution to capture the rich information hidden in subgraphs. By incorporating motif information, PASCAL can capture richer semantic information hidden in local structures compared with other GCL methods. Extensive experiments on six benchmark datasets show that PASCAL outperforms state-of-art graph contrastive learning and supervised methods in most cases.
- Authors: Chen, Zhikui , Peng, Yin , Yu, Shuo , Cao, Chen , Xia, Feng
- Date: 2022
- Type: Text , Journal article
- Relation: Mathematics (Basel) Vol. 10, no. 17 (2022), p. 3047
- Full Text:
- Reviewed:
- Description: Graph contrastive learning (GCL) has been subject to more attention and been widely applied to numerous graph learning tasks such as node classification and link prediction. Although it has achieved great success and even performed better than supervised methods in some tasks, most of them depend on node-level comparison, while ignoring the rich semantic information contained in graph topology, especially for social networks. However, a higher-level comparison requires subgraph construction and encoding, which remain unsolved. To address this problem, we propose a subgraph adaptive structure-aware graph contrastive learning method (PASCAL) in this work, which is a subgraph-level GCL method. In PASCAL, we construct subgraphs by merging all motifs that contain the target node. Then we encode them on the basis of motif number distribution to capture the rich information hidden in subgraphs. By incorporating motif information, PASCAL can capture richer semantic information hidden in local structures compared with other GCL methods. Extensive experiments on six benchmark datasets show that PASCAL outperforms state-of-art graph contrastive learning and supervised methods in most cases.
Fairness-aware predictive graph learning in social networks
- Wang, Lei, Yu, Shuo, Febrinanto, Falih, Alqahtani, Fayez, El-Tobely, Tarek
- Authors: Wang, Lei , Yu, Shuo , Febrinanto, Falih , Alqahtani, Fayez , El-Tobely, Tarek
- Date: 2022
- Type: Text , Journal article
- Relation: Mathematics Vol. 10, no. 15 (2022), p.
- Full Text:
- Reviewed:
- Description: Predictive graph learning approaches have been bringing significant advantages in many real-life applications, such as social networks, recommender systems, and other social-related downstream tasks. For those applications, learning models should be able to produce a great prediction result to maximize the usability of their application. However, the paradigm of current graph learning methods generally neglects the differences in link strength, leading to discriminative predictive results, resulting in different performance between tasks. Based on that problem, a fairness-aware predictive learning model is needed to balance the link strength differences and not only consider how to formulate it. To address this problem, we first formally define two biases (i.e., Preference and Favoritism) that widely exist in previous representation learning models. Then, we employ modularity maximization to distinguish strong and weak links from the quantitative perspective. Eventually, we propose a novel predictive learning framework entitled ACE that first implements the link strength differentiated learning process and then integrates it with a dual propagation process. The effectiveness and fairness of our proposed ACE have been verified on four real-world social networks. Compared to nine different state-of-the-art methods, ACE and its variants show better performance. The ACE framework can better reconstruct networks, thus also providing a high possibility of resolving misinformation in graph-structured data. © 2022 by the authors.
- Authors: Wang, Lei , Yu, Shuo , Febrinanto, Falih , Alqahtani, Fayez , El-Tobely, Tarek
- Date: 2022
- Type: Text , Journal article
- Relation: Mathematics Vol. 10, no. 15 (2022), p.
- Full Text:
- Reviewed:
- Description: Predictive graph learning approaches have been bringing significant advantages in many real-life applications, such as social networks, recommender systems, and other social-related downstream tasks. For those applications, learning models should be able to produce a great prediction result to maximize the usability of their application. However, the paradigm of current graph learning methods generally neglects the differences in link strength, leading to discriminative predictive results, resulting in different performance between tasks. Based on that problem, a fairness-aware predictive learning model is needed to balance the link strength differences and not only consider how to formulate it. To address this problem, we first formally define two biases (i.e., Preference and Favoritism) that widely exist in previous representation learning models. Then, we employ modularity maximization to distinguish strong and weak links from the quantitative perspective. Eventually, we propose a novel predictive learning framework entitled ACE that first implements the link strength differentiated learning process and then integrates it with a dual propagation process. The effectiveness and fairness of our proposed ACE have been verified on four real-world social networks. Compared to nine different state-of-the-art methods, ACE and its variants show better performance. The ACE framework can better reconstruct networks, thus also providing a high possibility of resolving misinformation in graph-structured data. © 2022 by the authors.
Relational structure-aware knowledge graph representation in complex space
- Sun, Ke, Yu, Shuo, Peng, Ciyuan, Wang, Yueru, Alfarraj, Osama, Tolba, Amr, Xia, Feng
- Authors: Sun, Ke , Yu, Shuo , Peng, Ciyuan , Wang, Yueru , Alfarraj, Osama , Tolba, Amr , Xia, Feng
- Date: 2022
- Type: Text , Journal article
- Relation: Mathematics Vol. 10, no. 11 (2022), p.
- Full Text:
- Reviewed:
- Description: Relations in knowledge graphs have rich relational structures and various binary relational patterns. Various relation modelling strategies are proposed for embedding knowledge graphs, but they fail to fully capture both features of relations, rich relational structures and various binary relational patterns. To address the problem of insufficient embedding due to the complexity of the relations, we propose a novel knowledge graph representation model in complex space, namely MARS, to exploit complex relations to embed knowledge graphs. MARS takes the mechanisms of complex numbers and message-passing and then embeds triplets into relation-specific complex hyperplanes. Thus, MARS can well preserve various relation patterns, as well as structural information in knowledge graphs. In addition, we find that the scores generated from the score function approximate a Gaussian distribution. The scores in the tail cannot effectively represent triplets. To address this particular issue and improve the precision of embeddings, we use the standard deviation to limit the dispersion of the score distribution, resulting in more accurate embeddings of triplets. Comprehensive experiments on multiple benchmarks demonstrate that our model significantly outperforms existing state-of-the-art models for link prediction and triple classification. © 2022 by the authors. Licensee MDPI, Basel, Switzerland.
- Authors: Sun, Ke , Yu, Shuo , Peng, Ciyuan , Wang, Yueru , Alfarraj, Osama , Tolba, Amr , Xia, Feng
- Date: 2022
- Type: Text , Journal article
- Relation: Mathematics Vol. 10, no. 11 (2022), p.
- Full Text:
- Reviewed:
- Description: Relations in knowledge graphs have rich relational structures and various binary relational patterns. Various relation modelling strategies are proposed for embedding knowledge graphs, but they fail to fully capture both features of relations, rich relational structures and various binary relational patterns. To address the problem of insufficient embedding due to the complexity of the relations, we propose a novel knowledge graph representation model in complex space, namely MARS, to exploit complex relations to embed knowledge graphs. MARS takes the mechanisms of complex numbers and message-passing and then embeds triplets into relation-specific complex hyperplanes. Thus, MARS can well preserve various relation patterns, as well as structural information in knowledge graphs. In addition, we find that the scores generated from the score function approximate a Gaussian distribution. The scores in the tail cannot effectively represent triplets. To address this particular issue and improve the precision of embeddings, we use the standard deviation to limit the dispersion of the score distribution, resulting in more accurate embeddings of triplets. Comprehensive experiments on multiple benchmarks demonstrate that our model significantly outperforms existing state-of-the-art models for link prediction and triple classification. © 2022 by the authors. Licensee MDPI, Basel, Switzerland.