A highway-centric labeling approach for answering distance queries on large sparse graphs
- Authors: Jin, Ruoming , Ruan, Ning , Xiang, Yang , Lee, Victor
- Date: 2012
- Type: Text , Conference paper
- Relation: ACM SIGMOD International Conference on Management of Data, 2012 p. 445-456
- Full Text: false
- Reviewed:
- Description: The distance query, which asks the length of the shortest path from a vertex u to another vertex v, has applications ranging from link analysis, semantic web and other ontology processing, to social network operations. Here, we propose a novel labeling scheme, referred to as Highway-Centric Labeling, for answering distance queries in a large sparse graph. It empowers the distance labeling with a highway structure and leverages a novel bipartite set cover framework/algorithm. Highway-centric labeling provides better labeling size than the state-of-the-art 2-hop labeling, theoretically and empirically. It also offers both exact distance and approximate distance with bounded accuracy. A detailed experimental evaluation on both synthetic and real datasets demonstrates that highway-centric labeling can outperform the state-of-the-art distance computation approaches in terms of both index size and query time. © 2012 ACM.
- Description: Proceedings of the ACM SIGMOD International Conference on Management of Data
SCARAB: scaling reachability computation on large graphs
- Authors: Jin, Ruoming , Ruan, Ning , Dey, Saikat , Xu, Jeffrey Yu
- Date: 2012
- Type: Text , Conference paper
- Relation: ACM SIGMOD International Conference on Management of Data, 2012 p. 169-180
- Full Text: false
- Reviewed:
- Description: Most of the existing reachability indices perform well on small- to medium- size graphs, but reach a scalability bottleneck around one million vertices/edges. As graphs become increasingly large, scalability is quickly becoming the major research challenge for the reachability computation today. Can we construct indices which scale to graphs with tens of millions of vertices and edges? Can the existing reachability indices which perform well on moderate-size graphs be scaled to very large graphs? In this paper, we propose SCARAB (standing for SCAlable ReachABility), a unified reachability computation framework: it not only can scale the existing state-of-the-art reachability indices, which otherwise could only be constructed and work on moderate size graphs, but also can help speed up the online query answering approaches. Our experimental results demonstrate that SCARAB can perform on graphs with millions of vertices/edges and is also much faster then GRAIL, the state-of-the-art scalability index approach.
Distance preserving graph simplification
- Authors: Ruan, Ning , Jin, Ruoming , Yan, Huang
- Date: 2011
- Type: Text , Conference paper
- Relation: ICDM '11 Proceedings of the 2011 IEEE 11th International Conference on Data Mining p. 1200-1205
- Full Text: false
- Reviewed:
Path-tree: An efficient reachability indexing scheme for large directed graphs
- Authors: Jin, Ruoming , Ruan, Ning , Xiang, Yang , Wang, Haixun
- Date: 2011
- Type: Text , Journal article
- Relation: ACM Transactions on Database Systems Vol. 36, no. 1 (2011), p.
- Full Text: false
- Reviewed:
Computing label-constraint reachability in graph databases
- Authors: Jin, Ruoming , Hong, Hui , Wang, Haixun , Ruan, Ning , Yang, Xiang
- Date: 2010
- Type: Text , Conference paper
- Relation: SIGMOD '10: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data,
- Full Text: false
- Reviewed:
- Description: Our world today is generating huge amounts of graph data such as social networks, biological networks, and the semantic web. Many of these real-world graphs are edge-labeled graphs, i.e., each edge has a label that denotes the relationship between the two vertices connected by the edge. A fundamental research problem on these labeled graphs is how to handle the label-constraint reachability query: Can vertex u reach vertex v through a path whose edge labels are constrained by a set of labels? In this work, we introduce a novel tree-based index framework which utilizes the directed maximal weighted spanning tree algorithm and sampling techniques to maximally compress the generalized transitive closure for the labeled graphs. An extensive experimental evaluation on both real and synthetic datasets demonstrates the efficiency of our approach in answering label-constraint reachability queries.
3-HOP: a high-compression indexing scheme for reachability query
- Authors: Jin, Ruoming , Yang, Xiang , Ruan, Ning , Fuhry, David
- Date: 2009
- Type: Text , Conference paper
- Relation: SIGMOD '09 Proceedings of the 2009 ACM SIGMOD International Conference on Management of data p. 813-826
- Full Text: false
- Reviewed:
- Description: Reachability queries on large directed graphs have attracted much attention recently. The existing work either uses spanning structures, such as chains or trees, to compress the complete transitive closure, or utilizes the 2-hop strategy to describe the reachability. Almost all of these approaches work well for very sparse graphs. However, the challenging problem is that as the ratio of the number of edges to the number of vertices increases, the size of the compressed transitive closure grows very large. In this paper, we propose a new 3-hop indexing scheme for directed graphs with higher density. The basic idea of 3-hop indexing is to use chain structures in combination with hops to minimize the number of structures that must be indexed. Technically, our goal is to find a 3-hop scheme over dense DAGs (directed acyclic graphs) with minimum index size. We develop an efficient algorithm to discover a transitive closure contour, which yields near optimal index size. Empirical studies show that our 3-hop scheme has much smaller index size than state-of-the-art reachability query schemes such as 2-hop and path-tree when DAGs are not very sparse, while our query time is close to path-tree, which is considered to be one of the best reachability query schemes.
A sparsification approach for temporal graphical model decomposition
- Authors: Ruan, Ning , Jin, Ruoming , Lee, Victor , Huang, K
- Date: 2009
- Type: Text , Conference paper
- Full Text: false
- Reviewed:
Effective and efficient itemset pattern summarization: regression-based approaches
- Authors: Jin, Ruoming , Abu-ata, Muad , Xiang, Yang , Ruan, Ning
- Date: 2008
- Type: Text , Conference paper
- Relation: KDD '08 Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
- Full Text: false
- Reviewed:
- Description: In this paper, we propose a set of novel regression-based approaches to effectively and efficiently summarize frequent itemset patterns. Specifically, we show that the problem of minimizing the restoration error for a set of itemsets based on a probabilistic model corresponds to a non-linear regression problem. We show that under certain conditions, we can transform the non-linear regression problem to a linear regression problem. We propose two new methods, k-regression and tree-regression, to partition the entire collection of frequent itemsets in order to minimize the restoration error. The K-regression approach, employing a K-means type clustering method, guarantees that the total restoration error achieves a local minimum. The treeregression approach employs a decision-tree type of top-down partition process. In addition, we discuss alternatives to estimate the frequency for the collection of itemsets being covered by the k representative itemsets. The experimental evaluation on both real and synthetic datasets demonstrates that our approaches significantly improve the summarization performance in terms of both accuracy (restoration error), and computational cost.
Efficiently answering reachability queries on very large directed graphs
- Authors: Jin, Ruoming , Xiang, Yang , Ruan, Ning , Wang, Haixun
- Date: 2008
- Type: Text , Conference paper
- Relation: SIGMOD '08 Proceedings of the 2008 ACM SIGMOD international conference on Management of data, p. 595-608
- Full Text: false
- Reviewed:
- Description: Efficiently processing queries against very large graphs is an important research topic largely driven by emerging real world applications, as diverse as XML databases, GIS, web mining, social network analysis, ontologies, and bioinformatics. In particular, graph reachability has attracted a lot of research attention as reachability queries are not only common on graph databases, but they also serve as fundamental operations for many other graph queries. The main idea behind answering reachability queries in graphs is to build indices based on reachability labels. Essentially, each vertex in the graph is assigned with certain labels such that the reachability between any two vertices can be determined by their labels. Several approaches have been proposed for building these reachability labels; among them are interval labeling (tree cover) and 2-hop labeling. However, due to the large number of vertices in many real world graphs (some graphs can easily contain millions of vertices), the computational cost and (index) size of the labels using existing methods would prove too expensive to be practical. In this paper, we introduce a novel graph structure, referred to as path-tree, to help labeling very large graphs. The path-tree cover is a spanning subgraph of G in a tree shape. We demonstrate both analytically and empirically the effectiveness of our new approaches.