List of Titles

Showing items 1 - 20 of 26

Your selections:

Simple supervised dissimilarity measure : bolstering iForest-induced similarity with class information without learning

- Wells, Jonathan, Aryal, Sunil, Ting, Kai

Authors: Wells, Jonathan , Aryal, Sunil , Ting, Kai
Date: 2020
Type: Text , Journal article
Relation: Knowledge and Information Systems Vol. 62, no. 8 (2020), p. 3203-3216
Full Text: false
Reviewed:
Description: Existing distance metric learning methods require optimisation to learn a feature space to transform data—this makes them computationally expensive in large datasets. In classification tasks, they make use of class information to learn an appropriate feature space. In this paper, we present a simple supervised dissimilarity measure which does not require learning or optimisation. It uses class information to measure dissimilarity of two data instances in the input space directly. It is a supervised version of an existing data-dependent dissimilarity measure called me. Our empirical results in k-NN and LVQ classification tasks show that the proposed simple supervised dissimilarity measure generally produces predictive accuracy better than or at least as good as existing state-of-the-art supervised and unsupervised dissimilarity measures. © 2020, Springer-Verlag London Ltd., part of Springer Nature.

Using meta-regression data mining to improve predictions of performance based on heart rate dynamics for Australian football

- Jelinek, Herbert, Kelarev, Andrei, Robinson, Dean, Stranieri, Andrew, Cornforth, David

Authors: Jelinek, Herbert , Kelarev, Andrei , Robinson, Dean , Stranieri, Andrew , Cornforth, David
Date: 2014
Type: Text , Journal article
Relation: Applied Soft Computing Vol. 14, no. PART A (2014), p. 81-87
Full Text: false
Reviewed:
Description: This work investigates the effectiveness of using computer-based machine learning regression algorithms and meta-regression methods to predict performance data for Australian football players based on parameters collected during daily physiological tests. Three experiments are described. The first uses all available data with a variety of regression techniques. The second uses a subset of features selected from the available data using the Random Forest method. The third used meta-regression with the selected feature subset. Our experiments demonstrate that feature selection and meta-regression methods improve the accuracy of predictions for match performance of Australian football players based on daily data of medical tests, compared to regression methods alone. Meta-regression methods and feature selection were able to obtain performance prediction outcomes with significant correlation coefficients. The best results were obtained by the additive regression based on isotonic regression for a set of most influential features selected by Random Forest. This model was able to predict athlete performance data with a correlation coefficient of 0.86 (p < 0.05). Â© 2013 Published by Elsevier B.V. All rights reserved.
Description: C1

Clustered memetic algorithm with local heuristics for ab initio protein structure prediction

- Islam, M. D., Chetty, Madhu

Authors: Islam, M. D. , Chetty, Madhu
Date: 2013
Type: Text , Journal article
Relation: IEEE Transactions on Evolutionary Computation Vol. 17, no. 4 (2013), p. 558-576
Full Text: false
Reviewed:
Description: Low-resolution protein models are often used within a hierarchical framework for structure prediction. However, even with these simplified but realistic protein models, the search for the optimal solution remains NP complete. The complexity is further compounded by the multimodal nature of the search space. In this paper, we propose a systematic design of an evolutionary search technique, namely the memetic algorithm (MA), to effectively search the vast search space by exploiting the domain-specific knowledge and taking cognizance of the multimodal nature of the search space. The proposed MA achieves this by incorporating various novel features: 1) a modified fitness function includes two additional terms to account for the hydrophobic and polar nature of the residues; 2) a systematic (rather than random) generation of population automatically prevents an occurrence of invalid conformations; 3) a generalized nonisomorphic encoding scheme implicitly eliminates generation of twins (similar conformations) in the population; 4) the identification of a meme (protein substructures) during optimization from different basins of attraction - a process that is equivalent to implicit applications of threading principles; 5) a clustering of the population corresponds to basins of attraction that allows evolution to overcome the complexity of multimodal search space, thereby avoiding search getting trapped in a local optimum; and 6) a 2-stage framework gathers domain knowledge (i.e., substructures or memes) from different basins of attraction for a combined execution in the second stage. Experiments conducted with different lattice models using known benchmark protein sequences and comparisons carried out with recently reported approaches in this journal show that the proposed algorithm has robustness, speed, accuracy, and superior performance. The approach is generic and can easily be extended for applications to other classes of problems.

An algorithm for clustering using L1-norm based on hyperbolic smoothing technique

- Bagirov, Adil, Mohebi, Ehsan

Authors: Bagirov, Adil , Mohebi, Ehsan
Date: 2016
Type: Text , Journal article
Relation: Computational Intelligence Vol. 32, no. 3 (2016), p. 439-457
Relation: http://purl.org/au-research/grants/arc/DP140103213
Full Text: false
Reviewed:
Description: Cluster analysis deals with the problem of organization of a collection of objects into clusters based on a similarity measure, which can be defined using various distance functions. The use of different similarity measures allows one to find different cluster structures in a data set. In this article, an algorithm is developed to solve clustering problems where the similarity measure is defined using the L1-norm. The algorithm is designed using the nonsmooth optimization approach to the clustering problem. Smoothing techniques are applied to smooth both the clustering function and the L1-norm. The algorithm computes clusters sequentially and finds global or near global solutions to the clustering problem. Results of numerical experiments using 12 real-world data sets are reported, and the proposed algorithm is compared with two other clustering algorithms. ©2015 Wiley Periodicals, Inc.

Density-ratio based clustering for discovering clusters with varying densities

- Zhu, Ye, Ting, Kaiming, Carman, Mark

Authors: Zhu, Ye , Ting, Kaiming , Carman, Mark
Date: 2016
Type: Text , Journal article
Relation: Pattern Recognition Vol. 60, no. (2016), p. 983-997
Full Text: false
Reviewed:
Description: Density-based clustering algorithms are able to identify clusters of arbitrary shapes and sizes in a dataset which contains noise. It is well-known that most of these algorithms, which use a global density threshold, have difficulty identifying all clusters in a dataset having clusters of greatly varying densities. This paper identifies and analyses the condition under which density-based clustering algorithms fail in this scenario. It proposes a density-ratio based method to overcome this weakness, and reveals that it can be implemented in two approaches. One approach is to modify a density-based clustering algorithm to do density-ratio based clustering by using its density estimator to compute density-ratio. The other approach involves rescaling the given dataset only. An existing density-based clustering algorithm, which is applied to the rescaled dataset, can find all clusters with varying densities that would otherwise impossible had the same algorithm been applied to the unscaled dataset. We provide an empirical evaluation using DBSCAN, OPTICS and SNN to show the effectiveness of these two approaches. © 2016 Elsevier Ltd

A generic ensemble approach to estimate multidimensional likelihood in Bayesian classifier learning

- Aryal, Sunil, Ting, Kaiming

Authors: Aryal, Sunil , Ting, Kaiming
Date: 2016
Type: Text , Journal article
Relation: Computational Intelligence Vol. 32, no. 3 (2016), p. 458-479
Full Text: false
Reviewed:
Description: In Bayesian classifier learning, estimating the joint probability distribution (,) or the likelihood (|) directly from training data is considered to be difficult, especially in large multidimensional data sets. To circumvent this difficulty, existing Bayesian classifiers such as Naive Bayes, BayesNet, and ADE have focused on estimating simplified surrogates of (,) from different forms of one‐dimensional likelihoods. Contrary to the perceived difficulty in multidimensional likelihood estimation, we present a simple generic ensemble approach to estimate multidimensional likelihood directly from data. The idea is to aggregate (|) estimated from a random subsample of data . This article presents two ways to estimate multidimensional likelihoods using the proposed generic approach and introduces two new Bayesian classifiers called and that estimate (|) using a nearest‐neighbor density estimation and a probability estimation through feature space partitioning, respectively. Unlike the existing Bayesian classifiers, ENNBayes and MassBayes have constant training time and space complexities and they scale better than existing Bayesian classifiers in very large data sets. Our empirical evaluation shows that ENNBayes and MassBayes yield better predictive accuracy than the existing Bayesian classifiers in benchmark data sets.

Clustering in large data sets with the limited memory bundle method

- Karmitsa, Napsu, Bagirov, Adil, Taheri, Sona

Authors: Karmitsa, Napsu , Bagirov, Adil , Taheri, Sona
Date: 2018
Type: Text , Journal article
Relation: Pattern Recognition Vol. 83, no. (2018), p. 245-259
Relation: http://purl.org/au-research/grants/arc/DP140103213
Full Text: false
Reviewed:
Description: The aim of this paper is to design an algorithm based on nonsmooth optimization techniques to solve the minimum sum-of-squares clustering problems in very large data sets. First, the clustering problem is formulated as a nonsmooth optimization problem. Then the limited memory bundle method [Haarala et al., 2007] is modified and combined with an incremental approach to design a new clustering algorithm. The algorithm is evaluated using real world data sets with both the large number of attributes and the large number of data points. It is also compared with some other optimization based clustering algorithms. The numerical results demonstrate the efficiency of the proposed algorithm for clustering in very large data sets.

LiNearN : A new approach to nearest neighbour density estimator

- Wells, Jonathan, Ting, Kaiming, Washio, Takashi

Authors: Wells, Jonathan , Ting, Kaiming , Washio, Takashi
Date: 2014
Type: Text , Journal article
Relation: Pattern Recognition Vol. 47, no. 8 (2014), p. 2702-2720
Full Text: false
Reviewed:
Description: Despite their wide spread use, nearest neighbour density estimators have two fundamental limitations: O(n2) time complexity and O(n) space complexity. Both limitations constrain nearest neighbour density estimators to small data sets only. Recent progress using indexing schemes has improved to near linear time complexity only.We propose a new approach, called LiNearN for Linear time Nearest Neighbour algorithm, that yields the first nearest neighbour density estimator having O(n) time complexity and constant space complexity, as far as we know. This is achieved without using any indexing scheme because LiNearN uses a subsampling approach for which the subsample values are significantly less than the data size. Like existing density estimators, our asymptotic analysis reveals that the new density estimator has a parameter to trade off between bias and variance. We show that algorithms based on the new nearest neighbour density estimator can easily scale up to data sets with millions of instances in anomaly detection and clustering tasks. Highlights•Reject the premise that a NN algorithm must find the NN for every instance.•The first NN density estimator that has O(n) time complexity and O(1) space complexity.•These complexities are achieved without using any indexing scheme.•Our asymptotic analysis reveals that it trades off between bias and variance.•Easily scales up to large data sets in anomaly detection and clustering tasks.

Commentary : A decomposition of the outlier detection problem into a set of supervised learning problems

- Zhu, Ye, Ting, Kaiming

Authors: Zhu, Ye , Ting, Kaiming
Date: 2016
Type: Text , Journal article
Relation: Machine Learning Vol. 105, no. 2 (2016), p. 301-304
Full Text: false
Reviewed:
Description: This article discusses the material in relation to iForest (Liu et al. in ACM Trans Knowl Discov Data 6(1):3, 2012) reported in a recent Machine Learning Journal paper by Paulheim and Meusel (Mach Learn 100(2–3):509–531, 2015). It presents an empirical comparison result of iForest using the default parameter settings suggested by its creator (Liu et al. 2012) and iForest using the settings employed by Paulheim and Meusel (2015). This comparison has an impact on the conclusion made by Paulheim and Meusel (2015). © 2016, The Author(s).

COREG : A corner based registration technique for multimodal images

- Lv, Guohua, Teng, Shyh, Lu, Guojun

Authors: Lv, Guohua , Teng, Shyh , Lu, Guojun
Date: 2018
Type: Text , Journal article
Relation: Multimedia Tools and Applications Vol. 77, no. 10 (2018), p. 12607-12634
Full Text: false
Reviewed:
Description: This paper presents a COrner based REGistration technique for multimodal images (referred to as COREG). The proposed technique focuses on addressing large content and scale differences in multimodal images. Unlike traditional multimodal image registration techniques that rely on intensities or gradients for feature representation, we propose to use contour-based corners. First, curvature similarity between corners are for the first time explored for the purpose of multimodal image registration. Second, a novel local descriptor called Distribution of Edge Pixels Along Contour (DEPAC) is proposed to represent the edges in the neighborhood of corners. Third, a simple yet effective way of estimating scale difference is proposed by making use of geometric relationships between corner triplets from the reference and target images. Using a set of benchmark multimodal images and multimodal microscopic images, we will demonstrate that our proposed technique outperforms a state-of-the-art multimodal image registration technique. © 2017, Springer Science+Business Media, LLC.

A detector of structural similarity for multi-modal microscopic image registration

- Lv, Guohua, Teng, Shyh, Lu, Guojun

Authors: Lv, Guohua , Teng, Shyh , Lu, Guojun
Date: 2018
Type: Text , Journal article
Relation: Multimedia Tools and Applications Vol. 77, no. 6 (2018), p. 7675-7701
Full Text: false
Reviewed:
Description: This paper presents a Detector of Structural Similarity (DSS) to minimize the visual differences between brightfield and confocal microscopic images. The context of this work is that it is very challenging to effectively register such images due to a low structural similarity in image contents. To address this issue, DSS aims to maximize the structural similarity by utilizing the intensity relationships among red-green-blue (RGB) channels in images. Technically, DSS can be combined with any multi-modal image registration technique in registering brightfield and confocal microscopic images. Our experimental results show that DSS significantly increases the visual similarity in such images, thereby improving the registration performance of an existing state-of-the-art multi-modal image registration technique by up to approximately 27%. © 2017, Springer Science+Business Media New York.

Grouping points by shared subspaces for effective subspace clustering

- Zhu, Ye, Ting, Kaiming, Carman, Mark

Authors: Zhu, Ye , Ting, Kaiming , Carman, Mark
Date: 2018
Type: Text , Journal article
Relation: Pattern Recognition Vol. 83, no. (2018), p. 230-244
Full Text: false
Reviewed:
Description: Clusters may exist in different subspaces of a multidimensional dataset. Traditional full-space clustering algorithms have difficulty in identifying these clusters. Various subspace clustering algorithms have used different subspace search strategies. They require clustering to assess whether cluster(s) exist in a subspace. In addition, all of them perform clustering by measuring similarity between points in the given feature space. As a result, the subspace selection and clustering processes are tightly coupled. In this paper, we propose a new subspace clustering framework named CSSub (Clustering by Shared Subspaces). It enables neighbouring core points to be clustered based on the number of subspaces they share. It explicitly splits candidate subspace selection and clustering into two separate processes, enabling different types of cluster definitions to be employed easily. Through extensive experiments on synthetic and real-world datasets, we demonstrate that CSSub discovers non-redundant subspace clusters with arbitrary shapes in noisy data; and it significantly outperforms existing state-of-the-art subspace clustering algorithms.

Isolation-based anomaly detection using nearest-neighbor ensembles

- Bandaragoda, Tharindu, Ting, Kaiming, Albrecht, David, Liu, Fei, Zhu, Ye, Wells, Jonathan

Authors: Bandaragoda, Tharindu , Ting, Kaiming , Albrecht, David , Liu, Fei , Zhu, Ye , Wells, Jonathan
Date: 2018
Type: Text , Journal article
Relation: Computational Intelligence Vol. 34, no. 4 (2018), p. 968-998
Full Text: false
Reviewed:
Description: The first successful isolation-based anomaly detector, ie, iForest, uses trees as a means to perform isolation. Although it has been shown to have advantages over existing anomaly detectors, we have identified 4 weaknesses, ie, its inability to detect local anomalies, anomalies with a high percentage of irrelevant attributes, anomalies that are masked by axis-parallel clusters, and anomalies in multimodal data sets. To overcome these weaknesses, this paper shows that an alternative isolation mechanism is required and thus presents iNNE or isolation using Nearest Neighbor Ensemble. Although relying on nearest neighbors, iNNE runs significantly faster than the existing nearest neighbor–based methods such as the local outlier factor, especially in data sets having thousands of dimensions or millions of instances. This is because the proposed method has linear time complexity and constant space complexity. © 2018 Wiley Periodicals, Inc.

Quick View

Local contrast as an effective means to robust clustering against varying densities

- Chen, Bo, Ting, Kaiming, Washio, Takashi, Zhu, Ye

Authors: Chen, Bo , Ting, Kaiming , Washio, Takashi , Zhu, Ye
Date: 2018
Type: Text , Journal article
Relation: Machine Learning Vol. 107, no. 8-10 (2018), p. 1621-1645
Full Text:
Reviewed:
Description: Most density-based clustering methods have difficulties detecting clusters of hugely different densities in a dataset. A recent density-based clustering CFSFDP appears to have mitigated the issue. However, through formalising the condition under which it fails, we reveal that CFSFDP still has the same issue. To address this issue, we propose a new measure called Local Contrast, as an alternative to density, to find cluster centers and detect clusters. We then apply Local Contrast to CFSFDP, and create a new clustering method called LC-CFSFDP which is robust in the presence of varying densities. Our empirical evaluation shows that LC-CFSFDP outperforms CFSFDP and three other state-of-the-art variants of CFSFDP. © 2018, The Author(s).

Quick View

A survey on context awareness in big data analytics for business applications

- Dinh, Loan, Karmakar, Gour, Kamruzzaman, Joarder

Authors: Dinh, Loan , Karmakar, Gour , Kamruzzaman, Joarder
Date: 2020
Type: Text , Journal article
Relation: Knowledge and Information Systems Vol. 62, no. 9 (2020), p. 3387-3415
Full Text:
Reviewed:
Description: The concept of context awareness has been in existence since the 1990s. Though initially applied exclusively in computer science, over time it has increasingly been adopted by many different application domains such as business, health and military. Contexts change continuously because of objective reasons, such as economic situation, political matter and social issues. The adoption of big data analytics by businesses is facilitating such change at an even faster rate in much complicated ways. The potential benefits of embedding contextual information into an application are already evidenced by the improved outcomes of the existing context-aware methods in those applications. Since big data is growing very rapidly, context awareness in big data analytics has become more important and timely because of its proven efficiency in big data understanding and preparation, contributing to extracting the more and accurate value of big data. Many surveys have been published on context-based methods such as context modelling and reasoning, workflow adaptations, computational intelligence techniques and mobile ubiquitous systems. However, to our knowledge, no survey of context-aware methods on big data analytics for business applications supported by enterprise level software has been published to date. To bridge this research gap, in this paper first, we present a definition of context, its modelling and evaluation techniques, and highlight the importance of contextual information for big data analytics. Second, the works in three key business application areas that are context-aware and/or exploit big data analytics have been thoroughly reviewed. Finally, the paper concludes by highlighting a number of contemporary research challenges, including issues concerning modelling, managing and applying business contexts to big data analytics. © 2020, Springer-Verlag London Ltd., part of Springer Nature.

Quick View

Scholar2vec : vector representation of scholars for lifetime collaborator prediction

- Wang, Wei, Xia, Feng, Wu, Jian, Gong, Zhiguo, Tong, Hanghang, Davison, Brian

Authors: Wang, Wei , Xia, Feng , Wu, Jian , Gong, Zhiguo , Tong, Hanghang , Davison, Brian
Date: 2021
Type: Text , Journal article
Relation: ACM Transactions on Knowledge Discovery from Data Vol. 15, no. 3 (2021), p.
Full Text:
Reviewed:
Description: While scientific collaboration is critical for a scholar, some collaborators can be more significant than others, e.g., lifetime collaborators. It has been shown that lifetime collaborators are more influential on a scholar's academic performance. However, little research has been done on investigating predicting such special relationships in academic networks. To this end, we propose Scholar2vec, a novel neural network embedding for representing scholar profiles. First, our approach creates scholars' research interest vector from textual information, such as demographics, research, and influence. After bridging research interests with a collaboration network, vector representations of scholars can be gained with graph learning. Meanwhile, since scholars are occupied with various attributes, we propose to incorporate four types of scholar attributes for learning scholar vectors. Finally, the early-stage similarity sequence based on Scholar2vec is used to predict lifetime collaborators with machine learning methods. Extensive experiments on two real-world datasets show that Scholar2vec outperforms state-of-the-art methods in lifetime collaborator prediction. Our work presents a new way to measure the similarity between two scholars by vector representation, which tackles the knowledge between network embedding and academic relationship mining. © 2021 Association for Computing Machinery.

Quick View

Deep Reinforcement Learning for Vehicular Edge Computing: An Intelligent Offloading System

- Ning, Zhaolong, Dong, Peiran, Wang, Xiaojie, Rodrigues, Joel, Xia, Feng

Authors: Ning, Zhaolong , Dong, Peiran , Wang, Xiaojie , Rodrigues, Joel , Xia, Feng
Date: 2019
Type: Text , Journal article
Relation: ACM Transactions on Intelligent Systems and Technology Vol. 10, no. 6 (Dec 2019), p. 24
Full Text:
Reviewed:
Description: The development of smart vehicles brings drivers and passengers a comfortable and safe environment. Various emerging applications are promising to enrich users' traveling experiences and daily life. However, how to execute computing-intensive applications on resource-constrained vehicles still faces huge challenges. In this article, we construct an intelligent offloading system for vehicular edge computing by leveraging deep reinforcement learning. First, both the communication and computation states are modelled by finite Markov chains. Moreover, the task scheduling and resource allocation strategy is formulated as a joint optimization problem to maximize users' Quality of Experience (QoE). Due to its complexity, the original problem is further divided into two sub-optimization problems. A two-sided matching scheme and a deep reinforcement learning approach are developed to schedule offloading requests and allocate network resources, respectively. Performance evaluations illustrate the effectiveness and superiority of our constructed system.

Quick View

Integrated generalized zero-shot learning for fine-grained classification

- Shermin, Tasfia, Teng, Shyh, Sohel, Ferdous, Murshed, Manzur, Lu, Guojun

Authors: Shermin, Tasfia , Teng, Shyh , Sohel, Ferdous , Murshed, Manzur , Lu, Guojun
Date: 2022
Type: Text , Journal article
Relation: Pattern Recognition Vol. 122, no. (2022), p.
Full Text:
Reviewed:
Description: Embedding learning (EL) and feature synthesizing (FS) are two of the popular categories of fine-grained GZSL methods. EL or FS using global features cannot discriminate fine details in the absence of local features. On the other hand, EL or FS methods exploiting local features either neglect direct attribute guidance or global information. Consequently, neither method performs well. In this paper, we propose to explore global and direct attribute-supervised local visual features for both EL and FS categories in an integrated manner for fine-grained GZSL. The proposed integrated network has an EL sub-network and a FS sub-network. Consequently, the proposed integrated network can be tested in two ways. We propose a novel two-step dense attention mechanism to discover attribute-guided local visual features. We introduce new mutual learning between the sub-networks to exploit mutually beneficial information for optimization. Moreover, we propose to compute source-target class similarity based on mutual information and transfer-learn the target classes to reduce bias towards the source domain during testing. We demonstrate that our proposed method outperforms contemporary methods on benchmark datasets. © 2021 Elsevier Ltd

Quick View

Tracing the Pace of COVID-19 research : topic modeling and evolution

- Liu, Jiaying, Nie, Hansong, Li, Shihao, Ren, Jing, Xia, Feng

Authors: Liu, Jiaying , Nie, Hansong , Li, Shihao , Ren, Jing , Xia, Feng
Date: 2021
Type: Text , Journal article
Relation: Big Data Research Vol. 25, no. (2021), p.
Full Text:
Reviewed:
Description: COVID-19 has been spreading rapidly around the world. With the growing attention on the deadly pandemic, discussions and research on COVID-19 are rapidly increasing to exchange latest findings with the hope to accelerate the pace of finding a cure. As a branch of information technology, artificial intelligence (AI) has greatly expedited the development of human society. In this paper, we investigate and visualize the on-going advancements of early scientific research on COVID-19 from the perspective of AI. By adopting the Latent Dirichlet Allocation (LDA) model, this paper allocates the research articles into 50 key research topics pertinent to COVID-19 according to their abstracts. We present an overview of early studies of the COVID-19 crisis at different scales including referencing/citation behavior, topic variation and their inner interactions. We also identify innovative papers that are regarded as the cornerstones in the development of COVID-19 research. The results unveil the focus of scientific research, thereby giving deep insights into how the academic society contributes to combating the COVID-19 pandemic. © 2021 Elsevier Inc. **Please note that there are multiple authors for this article therefore only the name of the first 5 including Federation University Australia affiliate “Jing Ren and Feng Xia" is provided in this record**
Description: COVID-19 has been spreading rapidly around the world. With the growing attention on the deadly pandemic, discussions and research on COVID-19 are rapidly increasing to exchange latest findings with the hope to accelerate the pace of finding a cure. As a branch of information technology, artificial intelligence (AI) has greatly expedited the development of human society. In this paper, we investigate and visualize the on-going advancements of early scientific research on COVID-19 from the perspective of AI. By adopting the Latent Dirichlet Allocation (LDA) model, this paper allocates the research articles into 50 key research topics pertinent to COVID-19 according to their abstracts. We present an overview of early studies of the COVID-19 crisis at different scales including referencing/citation behavior, topic variation and their inner interactions. We also identify innovative papers that are regarded as the cornerstones in the development of COVID-19 research. The results unveil the focus of scientific research, thereby giving deep insights into how the academic society contributes to combating the COVID-19 pandemic. © 2021 Elsevier Inc.

Quick View

The gene of scientific success

- Kong, Xiangjie, Zhang, Jun, Zhang, Da, Bu, Yi, Ding, Ying, Xia, Feng

Authors: Kong, Xiangjie , Zhang, Jun , Zhang, Da , Bu, Yi , Ding, Ying , Xia, Feng
Date: 2020
Type: Text , Journal article
Relation: ACM Transactions on Knowledge Discovery from Data Vol. 14, no. 4 (2020), p.
Full Text:
Reviewed:
Description: This article elaborates how to identify and evaluate causal factors to improve scientific impact. Currently, analyzing scientific impact can be beneficial to various academic activities including funding application, mentor recommendation, discovering potential cooperators, and the like. It is universally acknowledged that high-impact scholars often have more opportunities to receive awards as an encouragement for their hard work. Therefore, scholars spend great efforts in making scientific achievements and improving scientific impact during their academic life. However, what are the determinate factors that control scholars' academic success? The answer to this question can help scholars conduct their research more efficiently. Under this consideration, our article presents and analyzes the causal factors that are crucial for scholars' academic success. We first propose five major factors including article-centered factors, author-centered factors, venue-centered factors, institution-centered factors, and temporal factors. Then, we apply recent advanced machine learning algorithms and jackknife method to assess the importance of each causal factor. Our empirical results show that author-centered and article-centered factors have the highest relevancy to scholars' future success in the computer science area. Additionally, we discover an interesting phenomenon that the h-index of scholars within the same institution or university are actually very close to each other. © 2020 ACM.

1
2

‹ › ×