Survey: Work implemented in Data Mining and Clustering Algorithm

Author(s):

Prof. Dushyant Chawda, LDRP-ITR; Prof. Pratik Modi, LDRP-ITR; Monil Khamar, LDRP-ITR

Keywords:

Cosine Similarity, Document Clustering, TF-IDF, Topics, Web Data Mining

Abstract

Rapid progress in numerous research fields and information technologies has led to an increase in the publication of research papers. As a result, researchers spend a lot of time finding research papers that are close to their field of specialization. In this paper, we propose a document classification approach that clusters the text documents of research papers into meaningful categories, each of which contains papers from a similar scientific field. The approach is based on the focus and scope of the target categories, where each category includes many topics. We extract word tokens from the topics of each category separately. The frequency of word tokens in a document determines the weight of the document, which is calculated using the term frequency-inverse document frequency (TF-IDF) statistic. The approach uses the title, abstract, and keywords of each paper, together with the category topics, to perform the classification. Documents are then classified and clustered into the primary categories based on the highest cosine similarity between the category weights and the document weights.
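
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state the tokenizer, the exact TF-IDF variant, or any preprocessing, so the smoothed IDF formula, the regular-expression tokenizer, the function names (tokenize, tfidf_vectors, cosine, classify), and the sample categories and documents are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase alphabetic word tokens only.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(texts):
    """Return one {term: weight} dict per text.

    Assumed variant: relative term frequency times a smoothed IDF,
    log((1 + N) / (1 + df)) + 1, so no term gets a zero weight.
    """
    token_lists = [tokenize(t) for t in texts]
    n = len(token_lists)
    # Document frequency: number of texts that contain each term.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            term: (count / total) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(categories, documents):
    """Assign each document to the category with the highest cosine similarity.

    categories: {category name: text listing the category's topics}
    documents:  list of strings (title + abstract + keywords, concatenated)
    """
    names = list(categories)
    texts = [categories[name] for name in names] + list(documents)
    vectors = tfidf_vectors(texts)
    cat_vecs = vectors[:len(names)]
    doc_vecs = vectors[len(names):]
    labels = []
    for doc_vec in doc_vecs:
        scores = [cosine(cat_vec, doc_vec) for cat_vec in cat_vecs]
        labels.append(names[scores.index(max(scores))])
    return labels


# Hypothetical category topics and paper texts, for illustration only.
categories = {
    "data mining": "clustering classification association rules text mining web mining",
    "networks": "routing wireless protocols network security sensor networks",
}
documents = [
    "Document clustering of web text using cosine similarity",
    "A secure routing protocol for wireless sensor networks",
]
labels = classify(categories, documents)
```

In this sketch the category topic lists and the papers are weighted over one shared vocabulary, so that category vectors and document vectors are directly comparable. A document that shares no term with any category scores zero everywhere and falls back to the first category; a real system would likely need a similarity threshold for that case.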

Other Details

Paper ID: LDRPTCP029
Published in: Conference 12 : LDRP TECON23
Publication Date: 23/12/2023
Page(s): 147-149
