The document proposes a novel approach for document and feature reduction in text categorization using prototypes and rough sets. It introduces a prototype-based algorithm to reduce documents while preserving classification accuracy. A rough set-based method is also presented to select a subset of relevant features. The methods are evaluated on benchmark datasets and are shown to improve both classification performance and computational efficiency compared to baseline methods.
1. A novel approach based on prototypes and rough sets for document and feature reductions in text categorization. Shing-Hua Ho and Jung-Hsien Chiang. Reporter: CHE-MIN LIAO, 2007/8/27
10. Document reduction based on prototype concept
Step 05: Determine the index of the closest prototype to each document d_v as I_v = arg min(D_vt)
Step 06: If I_v = z, ∀ d_v ∈ G_z Then go to Step 11 End If
Step 07: If s(P_{I_v}) ≠ s(P_z), ∀ c_v ∈ G_z Then
    Set Z = Z + 1 and split G_z into two subgroups G_a and G_b
    Update their means: P_a = mean(G_a) and P_b = mean(G_b)
    If s(P_a) = s(P_b) Then go to Step 04 End If
End If
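Steps 05 to 07 can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions because the slide does not state them: Euclidean distance stands in for D_vt, s(P) is modelled as a class label stored per prototype, the Step 07 condition is read as "at least one document is closest to a prototype of a different class", and the split rule (documents still closest to P_z versus the rest) is hypothetical. The names `distance`, `closest_prototype`, and `examine_group` are illustrative only.

```python
import math


def distance(a, b):
    # Euclidean distance; the measure behind D_vt is an assumption here.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors):
    # Component-wise mean of a non-empty list of equal-length vectors.
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def closest_prototype(doc, prototypes):
    # Step 05: I_v = index t of the prototype with the smallest distance D_vt.
    return min(range(len(prototypes)), key=lambda t: distance(doc, prototypes[t]))


def examine_group(z, group, prototypes, proto_labels):
    """Apply Steps 05-07 to the documents of group G_z.

    Returns ("stable", None) when every document is closest to P_z,
    ("split", (P_a, P_b)) when the group is split, and ("keep", None)
    when neither case applies.
    """
    nearest = [closest_prototype(doc, prototypes) for doc in group]

    # Step 06: every document already sits closest to its own prototype.
    if all(i == z for i in nearest):
        return "stable", None

    # Step 07 (assumed reading): some document is closest to a prototype
    # whose class label differs from that of P_z.
    if any(proto_labels[i] != proto_labels[z] for i in nearest):
        # Hypothetical split rule, not taken from the slide.
        g_a = [doc for doc, i in zip(group, nearest) if i == z]
        g_b = [doc for doc, i in zip(group, nearest) if i != z]
        if g_a and g_b:
            # The caller would set Z = Z + 1 and store the two new means.
            return "split", (mean(g_a), mean(g_b))

    return "keep", None


prototypes = [[0.0, 0.0], [10.0, 0.0]]
proto_labels = ["A", "B"]
group_0 = [[0.0, 1.0], [1.0, 0.0], [9.0, 0.0]]
status, new_means = examine_group(0, group_0, prototypes, proto_labels)
```

In this toy example the third document lies closer to the class "B" prototype, so the group is split and two new means are returned. The check s(P_a) = s(P_b) and the jump back to Step 04 are left to the surrounding loop, which the slide does not show.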
18. Feature reduction based on rough sets
The rough set-based feature selection algorithm produces exclusive clusters and requires the desired number of clusters to be determined. Theoretically, the suitable maximum number of clusters is estimated as […], where N is the number of features.