Survey: Work implemented in Data Mining and Clustering Algorithm
Author(s):
Prof. Dushyant Chawda, LDRP-ITR; Prof. Pratik Modi, LDRP-ITR; Monil Khamar, LDRP-ITR
Keywords:
Cosine Similarity, Document Clustering, TF-IDF, Topics, Web Data Mining
Abstract
Rapid progress in many research fields and information technologies has led to a growing number of published research papers. As a result, researchers spend a great deal of time finding papers relevant to their field of specialization. In this paper, we therefore propose a document classification approach that clusters the text of research papers into meaningful categories, each covering a related scientific field. The approach is built on the core focus and scope of the target categories, each of which includes many topics. For each category, we extract word tokens from the topics that belong to it. The frequency of these word tokens in a document determines the document's weight, which is computed with the term frequency-inverse document frequency (TF-IDF) statistic. The proposed approach uses the title, abstract, and keywords of each paper, together with the category topics, to perform classification. Each document is then assigned to the primary category whose weight vector has the highest cosine similarity with the document's weight vector.
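The classification step described in the abstract (TF-IDF weighting of category topics and of each paper's title, abstract, and keywords, then picking the category with the highest cosine similarity) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the tokenizer, the smoothed IDF variant, computing IDF over one shared corpus of category profiles plus papers, and the example categories and papers are all hypothetical choices made for the sketch.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and keep runs of letters/digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(token_lists):
    """Return one sparse TF-IDF vector (term -> weight dict) per token list.

    TF is the term count divided by the token-list length; IDF uses the
    smoothed form log((1 + N) / (1 + df)) + 1, which is an assumed variant.
    """
    n = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))  # document frequency counts each term once per list
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            term: (c / total) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, c in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def classify(papers, categories):
    """Assign each paper to the category with the highest cosine similarity.

    papers: list of (title, abstract, keywords) strings (hypothetical format).
    categories: dict of category name -> list of topic strings (hypothetical).
    Ties, including all-zero scores, go to the first category listed.
    """
    names = list(categories)
    cat_tokens = [tokenize(" ".join(categories[name])) for name in names]
    doc_tokens = [tokenize(" ".join(p)) for p in papers]
    # Weight category profiles and papers in one shared TF-IDF space.
    vectors = tfidf_vectors(cat_tokens + doc_tokens)
    cat_vecs, doc_vecs = vectors[:len(names)], vectors[len(names):]
    labels = []
    for dv in doc_vecs:
        scores = [cosine(dv, cv) for cv in cat_vecs]
        labels.append(names[scores.index(max(scores))])
    return labels


# Hypothetical categories and papers, for illustration only.
categories = {
    "data_mining": ["clustering", "association rules", "classification"],
    "networks": ["routing protocols", "wireless sensor networks"],
}
papers = [
    ("Document clustering with TF-IDF", "We study clustering of text",
     "clustering, classification"),
    ("Energy-aware routing", "A routing scheme for sensor networks",
     "wireless, routing"),
]
print(classify(papers, categories))  # ['data_mining', 'networks']
```

Building one vocabulary over both the category profiles and the papers keeps every vector in the same term space, so the cosine scores are directly comparable across categories; a production system would more likely use a library vectorizer and stop-word removal.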
Other Details
Paper ID: LDRPTCP029
Published in: Conference 12: LDRP TECON23
Publication Date: 23/12/2023
Page(s): 147-149
|