A highway-centric labeling approach for answering distance queries on large sparse graphs
- Authors: Jin, Ruoming , Ruan, Ning , Xiang, Yang , Lee, Victor
- Date: 2012
- Type: Text , Conference paper
- Relation: ACM SIGMOD International Conference on Management of Data, 2012 p. 445-456
- Full Text: false
- Reviewed:
- Description: The distance query, which asks the length of the shortest path from a vertex u to another vertex v, has applications ranging from link analysis, semantic web and other ontology processing, to social network operations. Here, we propose a novel labeling scheme, referred to as Highway-Centric Labeling, for answering distance queries in a large sparse graph. It empowers the distance labeling with a highway structure and leverages a novel bipartite set cover framework/algorithm. Highway-centric labeling provides better labeling size than the state-of-the-art 2-hop labeling, theoretically and empirically. It also offers both exact distance and approximate distance with bounded accuracy. A detailed experimental evaluation on both synthetic and real datasets demonstrates that highway-centric labeling can outperform the state-of-the-art distance computation approaches in terms of both index size and query time. © 2012 ACM.
- Description: Proceedings of the ACM SIGMOD International Conference on Management of Data
Effective and efficient itemset pattern summarization: regression-based approaches
- Authors: Jin, Ruoming , Abu-ata, Muad , Xiang, Yang , Ruan, Ning
- Date: 2008
- Type: Text , Conference paper
- Relation: KDD '08 Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
- Full Text: false
- Reviewed:
- Description: In this paper, we propose a set of novel regression-based approaches to effectively and efficiently summarize frequent itemset patterns. Specifically, we show that the problem of minimizing the restoration error for a set of itemsets based on a probabilistic model corresponds to a non-linear regression problem. We show that under certain conditions, we can transform the non-linear regression problem to a linear regression problem. We propose two new methods, k-regression and tree-regression, to partition the entire collection of frequent itemsets in order to minimize the restoration error. The K-regression approach, employing a K-means type clustering method, guarantees that the total restoration error achieves a local minimum. The treeregression approach employs a decision-tree type of top-down partition process. In addition, we discuss alternatives to estimate the frequency for the collection of itemsets being covered by the k representative itemsets. The experimental evaluation on both real and synthetic datasets demonstrates that our approaches significantly improve the summarization performance in terms of both accuracy (restoration error), and computational cost.
Efficiently answering reachability queries on very large directed graphs
- Authors: Jin, Ruoming , Xiang, Yang , Ruan, Ning , Wang, Haixun
- Date: 2008
- Type: Text , Conference paper
- Relation: SIGMOD '08 Proceedings of the 2008 ACM SIGMOD international conference on Management of data, p. 595-608
- Full Text: false
- Reviewed:
- Description: Efficiently processing queries against very large graphs is an important research topic largely driven by emerging real world applications, as diverse as XML databases, GIS, web mining, social network analysis, ontologies, and bioinformatics. In particular, graph reachability has attracted a lot of research attention as reachability queries are not only common on graph databases, but they also serve as fundamental operations for many other graph queries. The main idea behind answering reachability queries in graphs is to build indices based on reachability labels. Essentially, each vertex in the graph is assigned with certain labels such that the reachability between any two vertices can be determined by their labels. Several approaches have been proposed for building these reachability labels; among them are interval labeling (tree cover) and 2-hop labeling. However, due to the large number of vertices in many real world graphs (some graphs can easily contain millions of vertices), the computational cost and (index) size of the labels using existing methods would prove too expensive to be practical. In this paper, we introduce a novel graph structure, referred to as path-tree, to help labeling very large graphs. The path-tree cover is a spanning subgraph of G in a tree shape. We demonstrate both analytically and empirically the effectiveness of our new approaches.
Path-tree: An efficient reachability indexing scheme for large directed graphs
- Authors: Jin, Ruoming , Ruan, Ning , Xiang, Yang , Wang, Haixun
- Date: 2011
- Type: Text , Journal article
- Relation: ACM Transactions on Database Systems Vol. 36, no. 1 (2011), p.
- Full Text: false
- Reviewed: