- Title
- Batch clustering algorithm for big data sets
- Creator
- Alguliyev, Rasim; Aliguliyev, Ramiz; Bagirov, Adil; Karimov, Rafael
- Date
- 2017
- Type
- Text; Conference proceedings
- Identifier
- http://researchonline.federation.edu.au/vital/access/HandleResolver/1959.17/164197
- Identifier
- vital:12990
- Identifier
- https://doi.org/10.1109/ICAICT.2016.7991657
- Identifier
- ISBN:9781509018406 (ISBN)
- Abstract
- The vast spread of computing technologies has led to an abundance of large data sets. Today, tech companies like Google, Facebook, Twitter and Amazon handle big data sets and log terabytes, if not petabytes, of data per day. Thus, there is a need to find similarities and define groupings among the elements of these big data sets. One way to find these similarities is data clustering. Currently, there exist several data clustering algorithms, which differ in their application area and efficiency. Increases in computational power and algorithmic improvements have reduced the time needed to cluster big data sets. However, big data sets often cannot be processed whole due to hardware and computational restrictions. In this paper, the classic k-means clustering algorithm is compared with the proposed batch clustering (BC) algorithm in terms of required computation time and objective function value. The BC algorithm is designed to cluster large data sets in batches while maintaining efficiency and quality. Several experiments confirm that the batch clustering algorithm for big data sets uses computational power and data storage more efficiently and produces better clustering than the k-means algorithm. The experiments are conducted on a data set of two million two-dimensional data points. © 2016 IEEE.
- Publisher
- Institute of Electrical and Electronics Engineers Inc.
- Relation
- 10th IEEE International Conference on Application of Information and Communication Technologies, AICT 2016; Baku, Azerbaijan; 12th-14th October 2016 p. 1-4
- Rights
- Copyright © 2016 IEEE.
- Rights
- This metadata is freely available under a CC0 license
- Subject
- Batch Clustering; Big Data; Big Data Clustering; Clustering Algorithms; k-means