HTPI: Hadoop Text Processing Interface |
Author(s): |
| Disha Kangar, Kurukshetra University, Kurukshetra; Dr. Kanwal Garg, Kurukshetra University, Kurukshetra |
Keywords: |
| Document Similarity, Hadoop, Information Retrieval, Jaccard Coefficient, Map-Reduce, Skipping, Stemming. |
Abstract |
|
Text mining is regarded as one of the supporting pillars of Information Retrieval. This paper is devoted to text mining, with a primary focus on mining academic papers. The authors propose an architecture named HTPI, a framework built in Java, using Eclipse, on top of Apache Hadoop. The problem considered is reference metamorphosis: re-ordering the entries in the references section of a scientific paper according to the similarity score computed between each referenced paper and the paper whose reference list is being re-ordered. The techniques used include stemming, skipping, and similarity calculation using the Jaccard coefficient. |
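As a rough illustration of the similarity step named in the abstract, the Jaccard coefficient of two token sets A and B is |A ∩ B| / |A ∪ B|. The sketch below is not the authors' HTPI code: the class name `JaccardSketch`, the `tokens` helper, and its simple lowercase-and-split tokenization are assumptions made for illustration, and stemming, skipping, and the Map-Reduce layer are left out.

```java
import java.util.HashSet;
import java.util.Set;

public class JaccardSketch {

    // Hypothetical tokenizer (an assumption, not from the paper):
    // lowercases the text and splits on runs of non-alphanumeric characters.
    static Set<String> tokens(String text) {
        Set<String> result = new HashSet<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                result.add(t);
            }
        }
        return result;
    }

    // Jaccard coefficient: size of the intersection divided by size of the union.
    // Two empty sets are given a score of 0.0 here by convention.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    public static void main(String[] args) {
        Set<String> paper = tokens("text mining with hadoop");
        Set<String> reference = tokens("hadoop text processing");
        System.out.println(jaccard(paper, reference));
    }
}
```

Under these assumptions, each referenced paper would receive one such score against the citing paper, and the reference list could then be sorted by score in descending order.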
Other Details |
|
Paper ID: IJSRDV2I4032; Published in: Volume 2, Issue 4; Publication Date: 01/07/2014; Page(s): 53-55 |
