A Review on Novel Approach for Text Compression

Author(s):

Pallavi P. Surwade, MET BHUJBAL COLLEGE, NASHIK; V. B. More, MET BHUJBAL COLLEGE, NASHIK

Keywords:

NCD, Text Compression

Abstract

Textual data sets are generally represented using different models, but not every model captures or preserves the structure of the text. The vector space model, also known as the 'bag of words' model, underlies most text mining methods for representing textual documents, yet it cannot maintain the text structure as it is. Compression distances are among the most widely used techniques for knowledge discovery and data mining, and they are also used to improve performance metrics. A compression distance measures the similarity between two documents, and most of this work is based on compression-driven clustering. The Normalized Compression Distance (NCD) can be applied in any application area and is computed from the lengths of compressed data files. NCD captures the structural similarity between text documents as well as between XML documents. In the document retrieval process, different documents are stored as separate entities. A distortion technique may destroy the text structure; the distortion technique considered here removes non-relevant words while maintaining the text structure. The results show that, with a compressor that allows the size of the left context (in symbols) to be chosen, this choice helps to determine the nature of the data sets. Clustering accuracy can be improved by applying a specific word removal technique. This distortion technique consists of removing the most frequent words of the language while preserving the previous text structure.
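
The two ideas in the abstract can be illustrated with a minimal sketch. It uses the standard definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the length of the compressed input. Several details are assumptions made for illustration and are not taken from the paper: `zlib` stands in for the compressor (the abstract does not name one), the helper names `compressed_size`, `ncd`, and `distort` are hypothetical, and replacing each removed word with asterisks of the same length is only one possible way to keep the text structure intact.

```python
import re
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (assumed stand-in compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance computed from compressed lengths only."""
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # compressed length of the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


def distort(text: str, frequent_words: set) -> str:
    """Mask every listed word with asterisks of equal length.

    Assumption: masking (rather than deleting) is how the word positions
    and lengths, i.e. the text structure, are preserved in this sketch.
    """
    def mask(match):
        word = match.group(0)
        return "*" * len(word) if word.lower() in frequent_words else word

    return re.sub(r"[A-Za-z]+", mask, text)


if __name__ == "__main__":
    doc_a = b"the quick brown fox jumps over the lazy dog " * 20
    doc_b = bytes(range(256)) * 4
    print("NCD(a, a):", ncd(doc_a, doc_a))
    print("NCD(a, b):", ncd(doc_a, doc_b))
    print(distort("The cat and the dog", {"the", "and"}))
```

A smaller NCD value indicates more similar documents, so a document compared with itself should score lower than when compared with unrelated data. The word list passed to `distort` is a placeholder; a real experiment would draw it from a frequency list for the language.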

Other Details

Paper ID: IJSRDV4I100274
Published in: Volume: 4, Issue: 10
Publication Date: 01/01/2017
Page(s): 774-777