Farthest Neighbor Approach for Finding Initial Centroids in K-Means
N.Sandhya, K. Anuradha, V. Sowmya, Ch. Vidyadhari
Pages - 1 - 13     |    Revised - 10-08-2014     |    Published - 15-09-2014
Volume - 5   Issue - 1    |    Publication Date - September 2014
KEYWORDS
Text Clustering, Partitional Approach, Initial Centroids, Similarity Measures, Cluster Accuracy.
ABSTRACT
Text document clustering is gaining popularity in the knowledge discovery field as a way to navigate, browse and organize large amounts of textual information by grouping it into a small number of meaningful clusters. Text mining is a semi-automated process of extracting knowledge from voluminous unstructured data, and clustering is a widely studied data mining problem in the text domain. Clustering is an unsupervised learning method that aims to find groups of similar objects in the data with respect to some predefined criterion. Partitioning-based clustering algorithms traditionally choose their initial centroids at random, yet both the accuracy of the clusters and the efficiency of these algorithms depend on the initial centroids chosen. In this work we propose a variant method in which the initial centroids are chosen by using farthest neighbors. In the experiments, the k-means algorithm is applied with initial centroids chosen in this way. Our experimental results show that the accuracy of the clusters and the efficiency of the k-means algorithm improve compared with the traditional random choice of initial centroids.
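The abstract does not spell out the exact selection procedure, so the following is only a minimal sketch of one common reading of "farthest neighbors": start from the two points that are farthest apart, then repeatedly add the point whose distance to its nearest already-chosen centroid is largest. The function names `farthest_neighbor_centroids` and `kmeans`, the use of Euclidean distance, and the dense pairwise distance matrix are all assumptions made for illustration; the paper's own method and similarity measure may differ.

```python
import numpy as np

def farthest_neighbor_centroids(X, k):
    """Choose k initial centroids by a farthest-first rule (assumed reading)."""
    X = np.asarray(X, dtype=float)
    if not 1 <= k <= len(X):
        raise ValueError("k must be between 1 and the number of points")
    # Pairwise Euclidean distances (assumption: the paper may use another measure).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Seed with the two mutually farthest points.
    i, j = np.unravel_index(np.argmax(d), d.shape)
    chosen = [int(i)] if k == 1 else [int(i), int(j)]
    while len(chosen) < k:
        # Distance from every point to its nearest chosen centroid.
        nearest = d[:, chosen].min(axis=1)
        # The next centroid is the point farthest from all chosen ones.
        chosen.append(int(np.argmax(nearest)))
    return X[chosen]

def kmeans(X, centroids, max_iter=100):
    """Plain Lloyd-style k-means starting from the given centroids."""
    X = np.asarray(X, dtype=float)
    centroids = np.array(centroids, dtype=float)
    for _ in range(max_iter):
        # Assign each point to its nearest centroid.
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Recompute each centroid as the mean of its members.
        updated = centroids.copy()
        for c in range(len(centroids)):
            members = X[labels == c]
            if len(members):
                updated[c] = members.mean(axis=0)
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels, centroids
```

A call such as `kmeans(X, farthest_neighbor_centroids(X, 3))` runs k-means from the spread-out seeds. Because the seeds are deterministic, repeated runs on the same data give the same clustering, unlike random initialization. For document vectors, the pairwise distance step would likely be replaced by whichever similarity measure the experiments use.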
Professor N.Sandhya
VNRVJIET - India
sandhyanadela@gmail.com
Professor K. Anuradha
Professor/CSE, Gokaraju Rangaraju Institute of Engineering and Technology, Hyderabad 500 090 - India
Associate Professor V. Sowmya
Associate Prof./CSE, Gokaraju Rangaraju Institute of Engineering and Technology, Hyderabad 500 090 - India
Associate Professor Ch. Vidyadhari
Asst. Prof./CSE, Gokaraju Rangaraju Institute of Engineering and Technology, Hyderabad 500 090 - India