Document Topic Generation in Text Mining by using Cluster Analysis with EROCK
Rizwan Ahmad, Aasia Khanum
Pages - 176 - 182     |    Revised - 30-04-2010     |    Published - 10-06-2010
Volume - 4   Issue - 2    |    Publication Date - May 2010
Keywords: Text Mining, Cluster Analysis, Document Similarity
Clustering is a useful technique in textual data mining. Cluster analysis divides objects into meaningful groups based on the similarity between them. The World Wide Web (WWW) returns copious material in response to any user-provided query, and manually extracting the information actually needed from this material is tedious. This paper proposes a scheme that addresses this problem with the help of cluster analysis. In particular, the ROCK algorithm is studied with some modifications. ROCK generates better clusters than other clustering algorithms for data with categorical attributes. We present an enhanced version of ROCK, called Enhanced ROCK (EROCK), with an improved similarity measure and better storage efficiency. An evaluation of the proposed algorithm on standard text documents shows improved performance.
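To make the idea of similarity-based grouping concrete, the following is a minimal sketch of the neighbor and link notions used by the original ROCK algorithm, not the EROCK similarity measure, which the abstract does not specify. It assumes documents are represented as sets of terms, uses Jaccard similarity, and does not count a document as its own neighbor. The sample documents, the threshold `theta`, and all function names are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity between two term sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def neighbors(docs, theta):
    """Map each document index to the indices whose similarity is >= theta.

    Assumption: a document is not treated as its own neighbor.
    """
    nbrs = {i: set() for i in range(len(docs))}
    for i, j in combinations(range(len(docs)), 2):
        if jaccard(docs[i], docs[j]) >= theta:
            nbrs[i].add(j)
            nbrs[j].add(i)
    return nbrs

def links(docs, theta):
    """Number of common neighbors for every pair of documents (ROCK's 'links')."""
    nbrs = neighbors(docs, theta)
    return {
        (i, j): len(nbrs[i] & nbrs[j])
        for i, j in combinations(range(len(docs)), 2)
    }

# Hypothetical toy corpus: three text-mining documents and two finance documents.
docs = [
    {"cluster", "text", "mining"},
    {"cluster", "text", "document"},
    {"text", "mining", "document"},
    {"stock", "market", "price"},
    {"stock", "market", "trade"},
]

link_counts = links(docs, theta=0.4)
for pair, count in sorted(link_counts.items()):
    print(pair, count)
```

In this toy corpus, pairs drawn from the three text-mining documents each share one common neighbor, while pairs that mix the two topics share none. Under the self-exclusion assumption the two finance documents are neighbors of each other but have no third document in common, so their link count is zero. This shows why link counts depend on the neighborhood structure as a whole and not only on pairwise similarity.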
Mr. Rizwan Ahmad
- Pakistan
Mr. Aasia Khanum
- Pakistan