Workload-aware incremental repartitioning of shared-nothing distributed databases for scalable OLTP applications
- Authors: Kamal, Joarder; Murshed, Manzur; Buyya, Rajkumar
- Date: 2016
- Type: Text, Journal article
- Relation: Future Generation Computer Systems Vol. 56, no. March (2016), p. 421-436
- Full Text: false
- Reviewed:
- Description: On-line Transaction Processing (OLTP) applications often rely on shared-nothing distributed databases that can sustain rapid growth in data volume. Distributed transactions (DTs) that involve data tuples from multiple geo-distributed servers can adversely impact the performance of such databases, especially when the transactions are short-lived and require immediate responses. The k-way min-cut graph clustering based database repartitioning algorithms can be used to reduce the number of DTs with an acceptable level of load balancing. In Web applications, where the DT profile changes over time due to dynamically varying workload patterns, frequent database repartitioning is needed to keep up with the change. This paper addresses this emerging challenge by introducing incremental repartitioning. In each repartitioning cycle, the DT profile is learnt online and the k-way min-cut clustering algorithm is applied on a special sub-graph representing all DTs as well as those non-DTs that have at least one tuple in a DT. The latter ensures that the min-cut algorithm minimally reintroduces new DTs from the non-DTs while maximally transforming existing DTs into non-DTs in the new partitioning. Potential load imbalance risk is mitigated by applying the graph clustering algorithm on the finer logical partitions instead of the servers and relying on random one-to-one cluster-to-partition mapping that naturally balances out loads. Inter-server data migration due to repartitioning is kept in check with two special mappings favouring the current partition of majority tuples in a cluster -- the many-to-one version minimising data migrations alone and the one-to-one version reducing data migration without affecting load balancing. A distributed data lookup process, inspired by the roaming protocol in mobile networks, is introduced to efficiently handle data migration without affecting scalability.
The effectiveness of the proposed framework is comprehensively evaluated on realistic TPC-C workloads using the graph, hypergraph, and compressed hypergraph representations found in the literature. To compare the performance of any incremental repartitioning framework without any bias from the external min-cut algorithm due to graph size variations, a transaction generation model is developed that can maintain a target number of unique transactions in any arbitrary observation window, irrespective of the new transaction arrival rate. The overall impact of DTs at any instant is estimated from the exponential moving average of the recurrence periods of unique transactions to avoid transient fluctuations. The effectiveness and adaptability of the proposed incremental repartitioning framework for transactional workloads have been established with extensive simulations on both range-partitioned and consistent-hash-partitioned databases. © 2015 Elsevier B.V.
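The exponential moving average of recurrence periods mentioned in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation; the function names, the smoothing factor `alpha`, and the example transaction stream are all assumptions chosen for clarity.

```python
def update_ema(prev_ema, observed_period, alpha=0.3):
    """Exponentially weighted moving average of a transaction's
    recurrence period; alpha weights the newest observation.
    (alpha=0.3 is an illustrative choice, not from the paper.)"""
    if prev_ema is None:  # first observed period for this transaction
        return observed_period
    return alpha * observed_period + (1 - alpha) * prev_ema

# Track when each unique transaction was last seen and its EMA period.
last_seen = {}
ema_period = {}

def observe(txn_id, now):
    """Record an arrival of txn_id at time `now` and update its EMA."""
    if txn_id in last_seen:
        period = now - last_seen[txn_id]
        ema_period[txn_id] = update_ema(ema_period.get(txn_id), period)
    last_seen[txn_id] = now

# Hypothetical stream: T1 recurs at times 0, 2, 3; T2 at times 1, 4.
for t, txn in enumerate(["T1", "T2", "T1", "T1", "T2"]):
    observe(txn, t)
# T1: periods 2 then 1 -> EMA = 0.3*1 + 0.7*2 = 1.7; T2: period 3 -> EMA = 3
```

Smoothing each transaction's recurrence period this way damps transient fluctuations, so a burst of one-off transactions does not dominate the estimated DT impact for the next repartitioning cycle.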
Distributed database management systems : Architectural design choices for the cloud
- Authors: Kamal, Joarder; Murshed, Manzur
- Date: 2014
- Type: Text, Book chapter
- Relation: Cloud Computing : Challenges, Limitations and R&D Solutions (Computer Communications and Networks series) Chapter 2 p. 23-50
- Full Text: false
- Reviewed:
- Description: Cloud computing has changed the way we exploit software and systems. Two decades of practice in architecting solutions and services over the Internet have been transformed within the past few years. End users now rely more on paying for what they use instead of purchasing a full-phase license. System owners are also racing for business profits by deploying their services in the Cloud, thus maximising global outreach and minimising overall management costs. However, deploying and scaling Cloud applications regionally and globally is highly challenging. In this context, distributed data management systems in the Cloud promise rapid elasticity and horizontal scalability so that Cloud applications can sustain enormous growth in data volume, velocity, and value. Distributed data replication and rapid partitioning are the two fundamental hammers to nail down these challenges. While replication ensures database read scalability and geo-reachability, data partitioning favours database write scalability and system-level load balance. System architects and administrators often face difficulties in managing a multi-tenant distributed database system at Cloud scale as the underlying workload characteristics change frequently. In this chapter, the inherent challenges of such phenomena are discussed in detail alongside their historical backgrounds. Finally, potential ways to overcome such architectural barriers are presented in light of recent research and development in this area.
Progressive data stream mining and transaction classification for workload-aware incremental database repartitioning
- Authors: Kamal, Joarder; Murshed, Manzur; Gaber, Mohamed
- Date: 2014
- Type: Text, Conference proceedings
- Relation: IEEE/ACM International Symposium on Big Data Computing, BDC 2014; London, United Kingdom; 8th-11th December 2014; p. 8-15
- Full Text:
- Reviewed:
- Description: Minimising the impact of distributed transactions (DTs) in a shared-nothing distributed database is extremely challenging for transactional workloads. With dynamic workloads and rapid growth in data volume, the underlying database requires incremental repartitioning to maintain an acceptable level of DTs and data load balance with minimum physical data migrations. In a workload-aware repartitioning scheme, the transactional workload is modelled as a graph or hypergraph; subsequently performing k-way min-cut clustering, which guarantees minimum edge cuts, can significantly reduce the impact of DTs by mapping the workload clusters into logical database partitions. However, without exploring the inherent workload characteristics, the overall processing and computing times for large-scale workload networks grow in polynomial order. In this paper, a workload-aware incremental database repartitioning technique is proposed which effectively exploits proactive transaction classification and workload stream mining techniques. Workload batches are modelled as graph, hypergraph, and compressed hypergraph representations and then repartitioned to produce a fresh tuple-to-partition data migration plan for every incremental cycle. Experimental studies in a simulated TPC-C environment demonstrate that the proposed model can be effectively adopted in managing rapid data growth and dynamic workloads, thus progressively reducing the overall processing time required to operate over the workload networks.
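The compressed hypergraph representation mentioned above can be illustrated with a short sketch: tuples are hashed to a fixed number of virtual vertices, and each transaction becomes a (weighted) hyperedge over those virtual vertices, shrinking the workload network the min-cut algorithm must process. The modulo-hash compression, `NUM_VIRTUAL`, and the example workload are illustrative assumptions, not the paper's exact scheme.

```python
NUM_VIRTUAL = 8  # number of virtual vertices (illustrative choice)

def virtual_vertex(tuple_id):
    """Compress a tuple id onto a virtual vertex by modulo hashing."""
    return tuple_id % NUM_VIRTUAL

def build_compressed_hypergraph(transactions):
    """Each transaction's tuple set becomes a hyperedge over virtual
    vertices; identical hyperedges accumulate weight, so the compressed
    network stays small even as the raw workload grows."""
    hyperedges = {}
    for tuples in transactions:
        edge = frozenset(virtual_vertex(t) for t in tuples)
        hyperedges[edge] = hyperedges.get(edge, 0) + 1
    return hyperedges

# Hypothetical batch: the first transaction collapses to one virtual
# vertex, and the last two collapse to the same weighted hyperedge.
workload = [[1, 9, 17], [2, 3], [10, 11]]
H = build_compressed_hypergraph(workload)
# H has 2 hyperedges: {1} with weight 1, and {2, 3} with weight 2
```

A k-way min-cut partitioner is then run on the compressed network, and its cluster assignment of virtual vertices induces a tuple-to-partition migration plan for the cycle.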
Workload-aware incremental repartitioning of shared-nothing distributed databases for scalable cloud applications
- Authors: Kamal, Joarder; Murshed, Manzur; Buyya, Rajkumar
- Date: 2014
- Type: Text, Conference paper
- Relation: 2014 IEEE/ACM 7th International Conference on Utility and Cloud Computing (UCC) p. 213-222
- Full Text: false
- Reviewed:
- Description: Cloud applications often rely on shared-nothing distributed databases that can sustain rapid growth in data volume. Distributed transactions (DTs) that involve data tuples from multiple geo-distributed servers can adversely impact the performance of such databases, especially when the transactions are short-lived and require immediate responses. The k-way min-cut graph clustering algorithm has been found effective in reducing the number of DTs with an acceptable level of load balancing. The benefits of such a static partitioning scheme, however, are short-lived in Cloud applications with dynamically varying workload patterns where the DT profile changes over time. This paper addresses this emerging challenge by introducing incremental repartitioning. In each repartitioning cycle, the DT profile is learnt online and the k-way min-cut clustering algorithm is applied on a special sub-graph representing all DTs as well as those non-DTs that have at least one tuple in a DT. The latter ensures that the min-cut algorithm minimally reintroduces new DTs from the non-DTs while maximally transforming existing DTs into non-DTs in the new partitioning. Potential load imbalance risk is mitigated by applying the graph clustering algorithm on the finer logical partitions instead of the servers and relying on random one-to-one cluster-to-partition mapping that naturally balances out loads. Inter-server data migration due to repartitioning is kept in check with two special mappings favouring the current partition of majority tuples in a cluster -- the many-to-one version minimising data migrations alone and the one-to-one version reducing data migration without affecting load balancing. A distributed data lookup process, inspired by the roaming protocol in mobile networks, is introduced to efficiently handle data migration without affecting scalability.
The effectiveness of the proposed framework is comprehensively evaluated on realistic TPC-C workloads using the graph, hypergraph, and compressed hypergraph representations found in the literature. Simulation results convincingly support incremental repartitioning over static partitioning.
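The many-to-one cluster-to-partition mapping described in the abstract can be sketched simply: each workload cluster is sent to the partition that already holds the majority of its tuples, so only minority tuples migrate. This is a minimal sketch under assumed data structures (dicts mapping tuple ids to partitions and clusters to tuple lists), not the paper's exact algorithm; the one-to-one variant would additionally constrain the assignment to be a bijection to preserve load balance.

```python
from collections import Counter

def many_to_one_mapping(clusters, current_partition):
    """Map each cluster to the partition holding the majority of its
    tuples, minimising inter-server data migration. Several clusters
    may map to the same partition (hence many-to-one)."""
    mapping = {}
    for cluster_id, tuples in clusters.items():
        counts = Counter(current_partition[t] for t in tuples)
        mapping[cluster_id] = counts.most_common(1)[0][0]
    return mapping

# Hypothetical state: tuples 1-3 sit on partition 0, tuples 4-6 on 1.
current = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1}
clusters = {"A": [1, 2, 4], "B": [3, 5, 6]}
plan = many_to_one_mapping(clusters, current)
# Cluster A stays on partition 0 and B on partition 1, so only the
# minority tuples (4 into partition 0, 3 into partition 1) migrate.
```

Under this mapping, the migration cost per cluster is the count of its non-majority tuples, which is the quantity the many-to-one strategy minimises at the possible expense of load balance.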