IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
__________________________________________________________________________________________
Volume: 03 Special Issue: 03 | May-2014 | NCRIET-2014, Available @ http://www.ijret.org 321
CLASSIFICATION OF TEXT DATA USING FEATURE CLUSTERING
ALGORITHM
Avinash Guru¹, Asma Parveen²
¹ MTech 4th sem, Department of Computer Science and Engineering, KBN College of Engineering, Gulbarga, Karnataka, India
² HOD, Department of Computer Science and Engineering, KBN College of Engineering, Gulbarga, Karnataka, India
Abstract
Feature clustering is a powerful method for reducing the dimensionality of feature vectors in text classification. Clustering, in
general, means grouping similar objects or data together. In this paper, we propose a feature clustering algorithm for classifying
text data. The document set contains a number of words, and these words are grouped into clusters based on their similarity. Words
that are similar to each other are placed in the same cluster, and words that are not similar are placed in different clusters. Each
cluster is characterized by a membership function with a statistical mean and deviation. Once all the words have been fed in, the
clusters are formed automatically. Each cluster then yields an extracted feature, which is a weighted combination of the words in
that cluster. With this algorithm, the derived membership functions closely match and properly describe the real distribution of the
training data. Previously, the user had to specify the extracted features in advance; this is no longer required because the clusters
are formed automatically, so trial and error can be avoided. The experimental results show that our method runs faster and
obtains better extracted features than other methods.
Keywords: Feature clustering, feature extraction, feature reduction, text classification.
-----------------------------------------------------------------------***-----------------------------------------------------------------------
1. INTRODUCTION
In text classification, the dimensionality of the feature
vector is generally huge, and such high-dimensional data is
difficult to classify. Feature reduction approaches are applied
to ease this difficulty. There are two major approaches to
feature reduction: feature selection and feature extraction.
This dissertation contributes to the subject area of data
clustering, and also to the application of clustering to image
analysis. Data clustering acts as an intelligent tool, a method
that allows the user to handle large volumes of data
effectively. The basic function of clustering is to transform
data of any origin into a more compact form, one that
accurately represents the original data. The compact
representation should allow the user to deal with and use the
original volume of data more effectively. The accuracy of the
clustering is vital, because it would be counterproductive if
the compact form of the data did not accurately represent the
original data. One of our main contributions is addressing the
accuracy of an established fuzzy clustering algorithm.
Typically, a set of numeric observations, or features, is
collected for each object. The collected feature sets are
aggregated into a list, which then acts as the input to a chosen
computational clustering algorithm. This algorithm then
provides a description of the grouping structure that it has
discovered within the objects.
1.1 Fundamental Concepts of Clustering
Clustering, in general, means combining similar objects or
data into a group. Based on a similarity test, we classify the
data into different clusters. Words that are similar are
grouped in one cluster, and words that are different are
grouped in another cluster. The computing revolution of the
sixties and seventies gave momentum to this new field
because, for the first time, computers enabled the processing
of large amounts of data and took on the burden of the very
large amounts of computation generally involved. If translated
to modern formalisms, Linnaeus's quotation is very relevant to
the clustering problem. Linnaeus uses the term natural
distinction; this is the much sought-after goal of clustering:
finding an "intrinsic classification" or an "inherent
structure" in data. The better we are at finding an inherent
structure in data, the more knowledge we possess about it. The
larger the volume of data and the more numerous the objects,
the more necessary it is to develop better clustering methods.
1.2 Contributions
• We studied and investigated the FCM algorithm (Fuzzy c-
Means Clustering Algorithm) thoroughly and identified
its main strengths and weaknesses.
• We developed a systematic method for analyzing FCM's
classification accuracy when it is used to cluster data sets
that contain clusters of very different sizes and
populations.
• We proposed a new algorithm, based on FCM, which
performs far more accurately than FCM on data sets like
those described above. We also investigated performance
properties of our new algorithm.
• The feature clustering algorithm is used to reduce the
dimensionality of the features in text classification.
• By applying this algorithm, the derived membership
functions closely match and properly describe the real
distribution of the training data.
2. EXISTING SYSTEM
The existing system uses the bottleneck approach, which
provides divisive information-theoretic feature clustering. In
this system, there is a set of original words present in the
document. Each time a new cluster is to be formed, the words
have to be compared with the original words. A cluster is
formed only when the words match; otherwise, no cluster is
formed. The system therefore works by trial and error, which
is one of its major disadvantages.
3. PROPOSED SYSTEM
We propose a feature clustering algorithm, which is mainly
used to reduce the number of features in text classification.
The words in the feature vector of a document set are
represented as distributions, and processed one after another.
Words that are similar to each other are grouped into the same
cluster. Each cluster is characterized by a membership
function with statistical mean and deviation. If a word is not
similar to any existing cluster, a new cluster is created for this
word.
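As an illustration of a membership function defined by a mean and a deviation, here is a minimal sketch. The paper does not give the exact formula in this section, so the Gaussian-style product form below, and the names `membership`, `center`, and `spread`, are assumptions made for illustration only.

```python
import math


def membership(x, mean, dev):
    """Gaussian-style membership of pattern x in a cluster.

    Assumed form (not stated in the text): the product over all
    dimensions of exp(-((x_i - mean_i) / dev_i) ** 2).
    """
    return math.prod(
        math.exp(-((xi - mi) / si) ** 2) for xi, mi, si in zip(x, mean, dev)
    )


# Hypothetical cluster with a mean and a deviation in each dimension.
center = [0.8, 0.2]
spread = [0.5, 0.5]

print(membership([0.8, 0.2], center, spread))  # 1.0 at the cluster mean
print(membership([0.3, 0.7], center, spread))  # about 0.135, i.e. exp(-2)
```

Under this assumed form, a word pattern that coincides with the cluster mean has membership 1, and the value decays toward 0 as the pattern moves away from the mean relative to the deviation.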
4. SYSTEM ARCHITECTURE
Fig: Architecture Diagram
4.1 Preprocessing
In this module we construct the word patterns of the training
document set. We read the document set, remove the stop
words, and perform stemming. We then obtain the feature
vector from the training documents and construct the word
patterns.
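The preprocessing steps can be sketched as follows. This is a hedged illustration, not the authors' implementation: the stop-word list is a toy list, `stem` is a crude suffix stripper standing in for a real stemmer, and the definition of a word pattern as the distribution of a word's occurrences over the class labels is an assumption based on the statement that words are represented as distributions.

```python
from collections import Counter, defaultdict

STOP_WORDS = {"the", "a", "is", "of", "and", "in", "to"}  # illustrative list only


def stem(word):
    """Crude suffix stripping; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase the text, drop stop words, and stem the remaining tokens."""
    return [stem(t) for t in text.lower().split() if t not in STOP_WORDS]


def word_patterns(docs):
    """Map each word to its distribution over class labels (assumed definition)."""
    counts = defaultdict(Counter)
    for text, label in docs:
        for token in preprocess(text):
            counts[token][label] += 1
    classes = sorted({label for _, label in docs})
    return {
        word: [c[k] / sum(c.values()) for k in classes]
        for word, c in counts.items()
    }


# Hypothetical labelled training documents.
docs = [
    ("the striker scored in the match", "sport"),
    ("the match ended", "sport"),
    ("markets ended lower", "finance"),
]
patterns = word_patterns(docs)
print(patterns["match"])  # [0.0, 1.0], classes ordered as (finance, sport)
print(patterns["end"])    # [0.5, 0.5]
```

Each word ends up as a short numeric vector, which is the form the clustering step operates on.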
4.2 Self-Constructing Clustering
In this module, we use the self-constructing clustering
algorithm. We read each word pattern in turn and compare
its similarity with the existing clusters. If the word matches
an existing cluster, it is grouped into that cluster; if it
does not match any existing cluster, a new cluster is created
for it.
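A one-pass version of this procedure might look like the sketch below. It is an illustration under stated assumptions rather than the paper's exact algorithm: the Gaussian-style `membership` function, the initial deviation `sigma0`, the default threshold value, and the way the mean and deviation are recomputed from the stored members are all hypothetical choices.

```python
import math


def membership(x, mean, dev):
    """Assumed Gaussian-style similarity between a pattern and a cluster."""
    return math.prod(
        math.exp(-((a - m) / s) ** 2) for a, m, s in zip(x, mean, dev)
    )


def self_constructing_clusters(patterns, rho=0.5, sigma0=0.25):
    """Process patterns one after another, creating clusters on demand.

    rho is the similarity threshold; sigma0 is an assumed initial
    deviation that also keeps deviations away from zero.
    """
    clusters = []  # each cluster: {"members": [...], "mean": [...], "dev": [...]}
    for x in patterns:
        scores = [membership(x, c["mean"], c["dev"]) for c in clusters]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is None or scores[best] < rho:
            # Not similar enough to any existing cluster: start a new one.
            clusters.append(
                {"members": [x], "mean": list(x), "dev": [sigma0] * len(x)}
            )
            continue
        # Similar enough: join the best cluster and update its statistics.
        c = clusters[best]
        c["members"].append(x)
        n = len(c["members"])
        columns = list(zip(*c["members"]))
        c["mean"] = [sum(col) / n for col in columns]
        c["dev"] = [
            math.sqrt(sum((v - m) ** 2 for v in col) / n) + sigma0
            for col, m in zip(columns, c["mean"])
        ]
    return clusters


# Four hypothetical two-dimensional word patterns.
clusters = self_constructing_clusters(
    [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
)
print(len(clusters))                          # 2
print([len(c["members"]) for c in clusters])  # [2, 2]
```

The number of clusters is not fixed beforehand; it depends on the data and on the threshold, which is why no trial-and-error choice of the cluster count is needed.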
4.3 Feature Extraction
In the feature extraction module, the extracted features are
computed in one of three ways: hard weighting, soft weighting,
or mixed weighting. In the hard-weighting approach, the data is
divided into crisp clusters, so each word belongs to exactly
one cluster. The degree of membership is either 0 or 1, and
this hard clustering method can lead to a local optimum. In the
soft-weighting approach, each word is allowed to contribute to
all new extracted features, with the degrees depending on the
values of the membership functions. The mixed-weighting
approach is a combination of the hard-weighting approach and
the soft-weighting approach.
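The three weighting schemes can be illustrated with a small sketch. Here `mu[i][j]` is a hypothetical membership degree of word i in cluster j. The text does not say how the mixed scheme combines the other two, so the convex combination controlled by `gamma` is an assumption made for illustration.

```python
def hard_weights(mu):
    """Each word contributes only to the cluster where its membership is largest."""
    rows = []
    for row in mu:
        best = row.index(max(row))
        rows.append([1.0 if j == best else 0.0 for j in range(len(row))])
    return rows


def soft_weights(mu):
    """Each word contributes to every cluster in proportion to its membership."""
    return [list(row) for row in mu]


def mixed_weights(mu, gamma=0.5):
    """Assumed combination: gamma * hard weight + (1 - gamma) * soft weight."""
    hard = hard_weights(mu)
    return [
        [gamma * h + (1 - gamma) * s for h, s in zip(hard_row, soft_row)]
        for hard_row, soft_row in zip(hard, mu)
    ]


# Hypothetical memberships of three words in two clusters.
mu = [[0.75, 0.25], [0.125, 0.875], [0.5, 0.25]]

print(hard_weights(mu))           # [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
print(mixed_weights(mu, 0.5)[2])  # [0.75, 0.125]
```

With `gamma` set to 1 the mixed scheme reduces to hard weighting, and with `gamma` set to 0 it reduces to soft weighting.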
4.4 Text Classification
Given a set D of training documents, text classification
proceeds as follows. Take the training document set and specify
the similarity threshold ρ. Assume that k clusters are obtained
for the words in the feature vector W. Then find the weighting
matrix T and convert D to D'. Finally, classify the text using
Weka, a collection of machine learning algorithms for data
mining tasks.
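The conversion of D to D' can be read as a matrix product, D' = D T, where each row of D holds the word counts of one document and T has one row per word and one column per cluster. The sketch below assumes this reading; the function name `reduce_documents` and the sample values are hypothetical, and the classification step in Weka is not shown.

```python
def reduce_documents(D, T):
    """Return D' = D T: one row per document, one column per extracted feature."""
    n_words = len(T)
    k = len(T[0])
    return [
        [sum(doc[i] * T[i][j] for i in range(n_words)) for j in range(k)]
        for doc in D
    ]


# Two hypothetical documents over a three-word vocabulary.
D = [[2, 0, 1], [0, 3, 1]]
# Hard weighting matrix: words 0 and 2 belong to cluster 0, word 1 to cluster 1.
T = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]

print(reduce_documents(D, T))  # [[3.0, 0.0], [1.0, 3.0]]
```

Each document is now described by k values instead of one value per word, and this reduced representation is what would be passed to the classifier.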
5. CONCLUSIONS
In this work, we have presented a feature clustering algorithm.
With this algorithm, each cluster is used as an extracted
feature, which reduces the dimensionality of the data.

Contenu connexe

Tendances

Volume 2-issue-6-1969-1973
Volume 2-issue-6-1969-1973Volume 2-issue-6-1969-1973
Volume 2-issue-6-1969-1973Editor IJARCET
 
Ijartes v1-i2-006
Ijartes v1-i2-006Ijartes v1-i2-006
Ijartes v1-i2-006IJARTES
 
An Approach to Mixed Dataset Clustering and Validation with ART-2 Artificial ...
An Approach to Mixed Dataset Clustering and Validation with ART-2 Artificial ...An Approach to Mixed Dataset Clustering and Validation with ART-2 Artificial ...
An Approach to Mixed Dataset Clustering and Validation with ART-2 Artificial ...Happiest Minds Technologies
 
Extensive Analysis on Generation and Consensus Mechanisms of Clustering Ensem...
Extensive Analysis on Generation and Consensus Mechanisms of Clustering Ensem...Extensive Analysis on Generation and Consensus Mechanisms of Clustering Ensem...
Extensive Analysis on Generation and Consensus Mechanisms of Clustering Ensem...IJECEIAES
 
Textual Data Partitioning with Relationship and Discriminative Analysis
Textual Data Partitioning with Relationship and Discriminative AnalysisTextual Data Partitioning with Relationship and Discriminative Analysis
Textual Data Partitioning with Relationship and Discriminative AnalysisEditor IJMTER
 
Column store decision tree classification of unseen attribute set
Column store decision tree classification of unseen attribute setColumn store decision tree classification of unseen attribute set
Column store decision tree classification of unseen attribute setijma
 
A Combined Approach for Feature Subset Selection and Size Reduction for High ...
A Combined Approach for Feature Subset Selection and Size Reduction for High ...A Combined Approach for Feature Subset Selection and Size Reduction for High ...
A Combined Approach for Feature Subset Selection and Size Reduction for High ...IJERA Editor
 
Seeds Affinity Propagation Based on Text Clustering
Seeds Affinity Propagation Based on Text ClusteringSeeds Affinity Propagation Based on Text Clustering
Seeds Affinity Propagation Based on Text ClusteringIJRES Journal
 
IRJET- Text Document Clustering using K-Means Algorithm
IRJET-  	  Text Document Clustering using K-Means Algorithm IRJET-  	  Text Document Clustering using K-Means Algorithm
IRJET- Text Document Clustering using K-Means Algorithm IRJET Journal
 
Privacy Preserving MFI Based Similarity Measure For Hierarchical Document Clu...
Privacy Preserving MFI Based Similarity Measure For Hierarchical Document Clu...Privacy Preserving MFI Based Similarity Measure For Hierarchical Document Clu...
Privacy Preserving MFI Based Similarity Measure For Hierarchical Document Clu...IJORCS
 
Cancer data partitioning with data structure and difficulty independent clust...
Cancer data partitioning with data structure and difficulty independent clust...Cancer data partitioning with data structure and difficulty independent clust...
Cancer data partitioning with data structure and difficulty independent clust...IRJET Journal
 
Feature selection, optimization and clustering strategies of text documents
Feature selection, optimization and clustering strategies of text documentsFeature selection, optimization and clustering strategies of text documents
Feature selection, optimization and clustering strategies of text documentsIJECEIAES
 
Paper id 26201478
Paper id 26201478Paper id 26201478
Paper id 26201478IJRAT
 
Enhanced Clustering Algorithm for Processing Online Data
Enhanced Clustering Algorithm for Processing Online DataEnhanced Clustering Algorithm for Processing Online Data
Enhanced Clustering Algorithm for Processing Online DataIOSR Journals
 
A Competent and Empirical Model of Distributed Clustering
A Competent and Empirical Model of Distributed ClusteringA Competent and Empirical Model of Distributed Clustering
A Competent and Empirical Model of Distributed ClusteringIRJET Journal
 

Tendances (19)

Volume 2-issue-6-1969-1973
Volume 2-issue-6-1969-1973Volume 2-issue-6-1969-1973
Volume 2-issue-6-1969-1973
 
Ijartes v1-i2-006
Ijartes v1-i2-006Ijartes v1-i2-006
Ijartes v1-i2-006
 
An Approach to Mixed Dataset Clustering and Validation with ART-2 Artificial ...
An Approach to Mixed Dataset Clustering and Validation with ART-2 Artificial ...An Approach to Mixed Dataset Clustering and Validation with ART-2 Artificial ...
An Approach to Mixed Dataset Clustering and Validation with ART-2 Artificial ...
 
Extensive Analysis on Generation and Consensus Mechanisms of Clustering Ensem...
Extensive Analysis on Generation and Consensus Mechanisms of Clustering Ensem...Extensive Analysis on Generation and Consensus Mechanisms of Clustering Ensem...
Extensive Analysis on Generation and Consensus Mechanisms of Clustering Ensem...
 
Lx3520322036
Lx3520322036Lx3520322036
Lx3520322036
 
Textual Data Partitioning with Relationship and Discriminative Analysis
Textual Data Partitioning with Relationship and Discriminative AnalysisTextual Data Partitioning with Relationship and Discriminative Analysis
Textual Data Partitioning with Relationship and Discriminative Analysis
 
Column store decision tree classification of unseen attribute set
Column store decision tree classification of unseen attribute setColumn store decision tree classification of unseen attribute set
Column store decision tree classification of unseen attribute set
 
A Combined Approach for Feature Subset Selection and Size Reduction for High ...
A Combined Approach for Feature Subset Selection and Size Reduction for High ...A Combined Approach for Feature Subset Selection and Size Reduction for High ...
A Combined Approach for Feature Subset Selection and Size Reduction for High ...
 
Seeds Affinity Propagation Based on Text Clustering
Seeds Affinity Propagation Based on Text ClusteringSeeds Affinity Propagation Based on Text Clustering
Seeds Affinity Propagation Based on Text Clustering
 
IRJET- Text Document Clustering using K-Means Algorithm
IRJET-  	  Text Document Clustering using K-Means Algorithm IRJET-  	  Text Document Clustering using K-Means Algorithm
IRJET- Text Document Clustering using K-Means Algorithm
 
A4 elanjceziyan
A4 elanjceziyanA4 elanjceziyan
A4 elanjceziyan
 
Privacy Preserving MFI Based Similarity Measure For Hierarchical Document Clu...
Privacy Preserving MFI Based Similarity Measure For Hierarchical Document Clu...Privacy Preserving MFI Based Similarity Measure For Hierarchical Document Clu...
Privacy Preserving MFI Based Similarity Measure For Hierarchical Document Clu...
 
Cancer data partitioning with data structure and difficulty independent clust...
Cancer data partitioning with data structure and difficulty independent clust...Cancer data partitioning with data structure and difficulty independent clust...
Cancer data partitioning with data structure and difficulty independent clust...
 
Feature selection, optimization and clustering strategies of text documents
Feature selection, optimization and clustering strategies of text documentsFeature selection, optimization and clustering strategies of text documents
Feature selection, optimization and clustering strategies of text documents
 
I6 mala3 sowmya
I6 mala3 sowmyaI6 mala3 sowmya
I6 mala3 sowmya
 
Paper id 26201478
Paper id 26201478Paper id 26201478
Paper id 26201478
 
Enhanced Clustering Algorithm for Processing Online Data
Enhanced Clustering Algorithm for Processing Online DataEnhanced Clustering Algorithm for Processing Online Data
Enhanced Clustering Algorithm for Processing Online Data
 
A Competent and Empirical Model of Distributed Clustering
A Competent and Empirical Model of Distributed ClusteringA Competent and Empirical Model of Distributed Clustering
A Competent and Empirical Model of Distributed Clustering
 
A new link based approach for categorical data clustering
A new link based approach for categorical data clusteringA new link based approach for categorical data clustering
A new link based approach for categorical data clustering
 

En vedette

Elevating forensic investigation system for file clustering
Elevating forensic investigation system for file clusteringElevating forensic investigation system for file clustering
Elevating forensic investigation system for file clusteringeSAT Publishing House
 
Compressive strength variability of brown coal fly ash geopolymer concrete
Compressive strength variability of brown coal fly ash geopolymer concreteCompressive strength variability of brown coal fly ash geopolymer concrete
Compressive strength variability of brown coal fly ash geopolymer concreteeSAT Publishing House
 
Lab view study of electrical power distribution system
Lab view study of electrical power distribution systemLab view study of electrical power distribution system
Lab view study of electrical power distribution systemeSAT Publishing House
 
Economical placement of shear walls in a moment resisting frame for earthquak...
Economical placement of shear walls in a moment resisting frame for earthquak...Economical placement of shear walls in a moment resisting frame for earthquak...
Economical placement of shear walls in a moment resisting frame for earthquak...eSAT Publishing House
 
Effect of fly ash on the rheological and filtration
Effect of fly ash on the rheological and filtrationEffect of fly ash on the rheological and filtration
Effect of fly ash on the rheological and filtrationeSAT Publishing House
 
A novel way of verifiable redistribution of the secret in a multiuser environ...
A novel way of verifiable redistribution of the secret in a multiuser environ...A novel way of verifiable redistribution of the secret in a multiuser environ...
A novel way of verifiable redistribution of the secret in a multiuser environ...eSAT Publishing House
 
A comparative flow analysis of naca 6409 and naca 4412 aerofoil
A comparative flow analysis of naca 6409 and naca 4412 aerofoilA comparative flow analysis of naca 6409 and naca 4412 aerofoil
A comparative flow analysis of naca 6409 and naca 4412 aerofoileSAT Publishing House
 
Assessment of the leachability and mechanical stability of mud from a zinc pl...
Assessment of the leachability and mechanical stability of mud from a zinc pl...Assessment of the leachability and mechanical stability of mud from a zinc pl...
Assessment of the leachability and mechanical stability of mud from a zinc pl...eSAT Publishing House
 
Comparative study of one and two diode model of solar photovoltaic cell
Comparative study of one and two diode model of solar photovoltaic cellComparative study of one and two diode model of solar photovoltaic cell
Comparative study of one and two diode model of solar photovoltaic celleSAT Publishing House
 
Usability guidelines for usable user interface
Usability guidelines for usable user interfaceUsability guidelines for usable user interface
Usability guidelines for usable user interfaceeSAT Publishing House
 
Study of surface roughness for discontinuous
Study of surface roughness for discontinuousStudy of surface roughness for discontinuous
Study of surface roughness for discontinuouseSAT Publishing House
 
Performance and emission characteristics of al2 o3
Performance and emission characteristics of al2 o3Performance and emission characteristics of al2 o3
Performance and emission characteristics of al2 o3eSAT Publishing House
 
Conceptual design of laser assisted fixture for bending operation
Conceptual design of laser assisted fixture for bending operationConceptual design of laser assisted fixture for bending operation
Conceptual design of laser assisted fixture for bending operationeSAT Publishing House
 
Power system stability improvement under three
Power system stability improvement under threePower system stability improvement under three
Power system stability improvement under threeeSAT Publishing House
 
An iterative unsymmetrical trimmed midpoint median filter for removal of high...
An iterative unsymmetrical trimmed midpoint median filter for removal of high...An iterative unsymmetrical trimmed midpoint median filter for removal of high...
An iterative unsymmetrical trimmed midpoint median filter for removal of high...eSAT Publishing House
 
A novel scheme for reliable multipath routing
A novel scheme for reliable multipath routingA novel scheme for reliable multipath routing
A novel scheme for reliable multipath routingeSAT Publishing House
 
Application of ibearugbulem’s model for optimizing granite concrete mix
Application of ibearugbulem’s model for optimizing granite concrete mixApplication of ibearugbulem’s model for optimizing granite concrete mix
Application of ibearugbulem’s model for optimizing granite concrete mixeSAT Publishing House
 
A review of pre combustion co2 capture in igcc
A review of pre combustion co2 capture in igccA review of pre combustion co2 capture in igcc
A review of pre combustion co2 capture in igcceSAT Publishing House
 
A survey on optimal route queries for road networks
A survey on optimal route queries for road networksA survey on optimal route queries for road networks
A survey on optimal route queries for road networkseSAT Publishing House
 
Multi objective genetic algorithm for regression
Multi objective genetic algorithm for regressionMulti objective genetic algorithm for regression
Multi objective genetic algorithm for regressioneSAT Publishing House
 

En vedette (20)

Elevating forensic investigation system for file clustering
Elevating forensic investigation system for file clusteringElevating forensic investigation system for file clustering
Elevating forensic investigation system for file clustering
 
Compressive strength variability of brown coal fly ash geopolymer concrete
Compressive strength variability of brown coal fly ash geopolymer concreteCompressive strength variability of brown coal fly ash geopolymer concrete
Compressive strength variability of brown coal fly ash geopolymer concrete
 
Lab view study of electrical power distribution system
Lab view study of electrical power distribution systemLab view study of electrical power distribution system
Lab view study of electrical power distribution system
 
Economical placement of shear walls in a moment resisting frame for earthquak...
Economical placement of shear walls in a moment resisting frame for earthquak...Economical placement of shear walls in a moment resisting frame for earthquak...
Economical placement of shear walls in a moment resisting frame for earthquak...
 
Effect of fly ash on the rheological and filtration
Effect of fly ash on the rheological and filtrationEffect of fly ash on the rheological and filtration
Effect of fly ash on the rheological and filtration
 
A novel way of verifiable redistribution of the secret in a multiuser environ...
A novel way of verifiable redistribution of the secret in a multiuser environ...A novel way of verifiable redistribution of the secret in a multiuser environ...
A novel way of verifiable redistribution of the secret in a multiuser environ...
 
A comparative flow analysis of naca 6409 and naca 4412 aerofoil
A comparative flow analysis of naca 6409 and naca 4412 aerofoilA comparative flow analysis of naca 6409 and naca 4412 aerofoil
A comparative flow analysis of naca 6409 and naca 4412 aerofoil
 
Assessment of the leachability and mechanical stability of mud from a zinc pl...
Assessment of the leachability and mechanical stability of mud from a zinc pl...Assessment of the leachability and mechanical stability of mud from a zinc pl...
Assessment of the leachability and mechanical stability of mud from a zinc pl...
 
Comparative study of one and two diode model of solar photovoltaic cell
Comparative study of one and two diode model of solar photovoltaic cellComparative study of one and two diode model of solar photovoltaic cell
Comparative study of one and two diode model of solar photovoltaic cell
 
Usability guidelines for usable user interface
Usability guidelines for usable user interfaceUsability guidelines for usable user interface
Usability guidelines for usable user interface
 
Study of surface roughness for discontinuous
Study of surface roughness for discontinuousStudy of surface roughness for discontinuous
Study of surface roughness for discontinuous
 
Performance and emission characteristics of al2 o3
Performance and emission characteristics of al2 o3Performance and emission characteristics of al2 o3
Performance and emission characteristics of al2 o3
 
Conceptual design of laser assisted fixture for bending operation
Conceptual design of laser assisted fixture for bending operationConceptual design of laser assisted fixture for bending operation
Conceptual design of laser assisted fixture for bending operation
 
Power system stability improvement under three
Power system stability improvement under threePower system stability improvement under three
Power system stability improvement under three
 
An iterative unsymmetrical trimmed midpoint median filter for removal of high...
An iterative unsymmetrical trimmed midpoint median filter for removal of high...An iterative unsymmetrical trimmed midpoint median filter for removal of high...
An iterative unsymmetrical trimmed midpoint median filter for removal of high...
 
A novel scheme for reliable multipath routing
A novel scheme for reliable multipath routingA novel scheme for reliable multipath routing
A novel scheme for reliable multipath routing
 
Application of ibearugbulem’s model for optimizing granite concrete mix
Application of ibearugbulem’s model for optimizing granite concrete mixApplication of ibearugbulem’s model for optimizing granite concrete mix
Application of ibearugbulem’s model for optimizing granite concrete mix
 
A review of pre combustion co2 capture in igcc
A review of pre combustion co2 capture in igccA review of pre combustion co2 capture in igcc
A review of pre combustion co2 capture in igcc
 
A survey on optimal route queries for road networks
A survey on optimal route queries for road networksA survey on optimal route queries for road networks
A survey on optimal route queries for road networks
 
Multi objective genetic algorithm for regression
Multi objective genetic algorithm for regressionMulti objective genetic algorithm for regression
Multi objective genetic algorithm for regression
 

Similaire à Classification of text data using feature clustering algorithm

Scaling Down Dimensions and Feature Extraction in Document Repository Classif...
Scaling Down Dimensions and Feature Extraction in Document Repository Classif...Scaling Down Dimensions and Feature Extraction in Document Repository Classif...
Scaling Down Dimensions and Feature Extraction in Document Repository Classif...ijdmtaiir
 
Improved Text Mining for Bulk Data Using Deep Learning Approach
Improved Text Mining for Bulk Data Using Deep Learning Approach Improved Text Mining for Bulk Data Using Deep Learning Approach
Improved Text Mining for Bulk Data Using Deep Learning Approach IJCSIS Research Publications
 
Bs31267274
Bs31267274Bs31267274
Bs31267274IJMER
 
11.software modules clustering an effective approach for reusability
11.software modules clustering an effective approach for  reusability11.software modules clustering an effective approach for  reusability
11.software modules clustering an effective approach for reusabilityAlexander Decker
 
Volume 2-issue-6-1969-1973
Volume 2-issue-6-1969-1973Volume 2-issue-6-1969-1973
Volume 2-issue-6-1969-1973Editor IJARCET
 
Clustering Algorithm with a Novel Similarity Measure
Classification of text data using feature clustering algorithm

IJRET: International Journal of Research in Engineering and Technology, eISSN: 2319-1163 | pISSN: 2321-7308
Volume: 03, Special Issue: 03 | May-2014 | NCRIET-2014, Available @ http://www.ijret.org

CLASSIFICATION OF TEXT DATA USING FEATURE CLUSTERING ALGORITHM

Avinash Guru¹, Asma Parveen²
¹ MTech 4th sem, Department of Computer Science and Engineering, KBN College of Engineering, Gulbarga, Karnataka, India
² HOD, Department of Computer Science and Engineering, KBN College of Engineering, Gulbarga, Karnataka, India

Abstract
Feature clustering is a powerful method for reducing the dimensionality of feature vectors in text classification. Clustering, in general, means collecting similar objects or data into groups. In this paper, we propose a feature clustering algorithm for classifying text data. The document set contains a number of words; these words are grouped into clusters based on their similarity. Words that are similar to each other are grouped into the same cluster, and words that are not similar are grouped into another cluster. Each cluster is characterized by a membership function with a statistical mean and deviation. Once all the words have been fed in, the clusters are formed automatically. The extracted feature corresponding to each cluster is then a weighted combination of the words. With this algorithm, the derived membership functions closely match, and properly describe, the real distribution of the training data. Previously, the user had to specify the extracted features in advance; this is no longer required because the clusters are formed automatically, so trial and error can be avoided. The experimental results show that our method can run faster and obtain better extracted features than other methods.

Keywords: Feature clustering, feature extraction, feature reduction, text classification.
1. INTRODUCTION

In text classification, the dimensionality of the feature vector is generally huge, and it is difficult to classify such high-dimensional data. To reduce this difficulty, feature reduction is applied. There are two major approaches to feature reduction: feature selection and feature extraction.

This dissertation contributes to the subject area of data clustering, and also to the application of clustering to image analysis. Data clustering acts as an intelligent tool, a method that allows the user to handle large volumes of data effectively. The basic function of clustering is to transform data of any origin into a more compact form, one that accurately represents the original data. The compact representation should allow the user to deal with and use the original volume of data more effectively. The accuracy of the clustering is vital, because it would be counterproductive if the compact form of the data did not accurately represent the original data. One of our main contributions is addressing the accuracy of an established fuzzy clustering algorithm.

Typically, a set of numeric observations, or features, is collected for each object. The collected feature sets are aggregated into a list, which then acts as the input to a chosen computational clustering algorithm. This algorithm then provides a description of the grouping structure that it has discovered within the objects.

1.1 Fundamental Concepts of Clustering

Clustering, in general, means combining similar objects or data into a group. Based on a similarity test, we classify the data into different clusters. Words that are similar are grouped into one cluster, and words that are different are grouped into another cluster.
The computing revolution of the sixties and seventies gave momentum to this new field because, for the first time, computers enabled the processing of large amounts of data and took on the burden of the very large amounts of computation generally involved. If translated to modern formalisms, Linnaeus's quotation is very relevant to the clustering problem. Linnaeus uses the term natural distinction; this is the much sought-after goal of clustering: finding an "intrinsic classification" or an "inherent structure" in data. The better we are at finding an inherent structure in data, the more knowledge we possess about it. The larger the volume of data and the more numerous the objects, the more necessary it is to develop better clustering methods.

1.2 Contributions

• We studied and investigated the FCM algorithm (Fuzzy c-Means Clustering Algorithm) thoroughly and identified its main strengths and weaknesses.
• We developed a systematic method for analyzing FCM's classification accuracy when it is used to cluster data sets that contain clusters of very different sizes and populations.
• We proposed a new algorithm, based on FCM, which performs far more accurately than FCM on data sets like those described above. We also investigated the performance properties of our new algorithm.
• The feature clustering algorithm is used to reduce the dimensionality of the features in text classification.
• With this algorithm, the derived membership functions closely match the real distribution of the training data.

2. EXISTING SYSTEM

The existing system uses the bottleneck approach, which provides divisive information-theoretic feature clustering. In this system, a set of original words is present in the document. Each time we want to form a new cluster, we have to compare the words with the original words. A cluster is formed only when the words match; otherwise no cluster is formed. The system therefore works by trial and error, which is one of its major disadvantages.

3. PROPOSED SYSTEM

We propose a feature clustering algorithm, which is mainly used to reduce the number of features in text classification. The words in the feature vector of a document set are represented as distributions and processed one after another. Words that are similar to each other are grouped into the same cluster. Each cluster is characterized by a membership function with a statistical mean and deviation. If a word is not similar to any existing cluster, a new cluster is created for this word.

4. SYSTEM ARCHITECTURE

Fig: Architecture Diagram

4.1 Preprocessing

In this module we construct the word patterns of the training document set. We read the document set, remove the stop words, and perform stemming.
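The stop-word removal and stemming steps can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the stop-word list `STOP_WORDS`, the suffix list `SUFFIXES`, and the functions `naive_stem` and `preprocess` are hypothetical names introduced here, and the crude suffix stripping merely stands in for a proper stemmer such as Porter's.

```python
import re

# Hypothetical, deliberately tiny stop-word list (an assumption for illustration);
# a real system would use a much fuller list.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}

# Naive suffix stripping, standing in for a real stemming algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize, drop stop words, and stem the remaining tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]
```

For example, `preprocess("The words are grouped into clusters")` yields `["word", "group", "into", "cluster"]`; the resulting stems are the words that make up the feature vector.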
We then obtain the feature vector from the training documents and construct the word patterns.

4.2 Self-Constructing Clustering

In this module, we use the self-constructing clustering algorithm. We read each word pattern in turn and compare its similarity with the existing clusters. If the word is similar to an existing cluster, it is grouped into that cluster; if it is not similar to any existing cluster, a new cluster is created for it.

4.3 Feature Extraction

In the feature extraction module, the extracted features are computed in one of three ways: hard weighting, soft weighting, or mixed weighting. In the hard-weighting approach, the data is divided into crisp clusters, so that each word belongs to exactly one cluster. The degree of membership is either 0 or 1, and this hard clustering method can lead to a local optimum. In the soft-weighting approach, each word is allowed to contribute to all new extracted features, with the degrees depending on the values of the membership functions. The mixed-weighting approach is a combination of the hard-weighting approach and the soft-weighting approach.

4.4 Text Classification

Given a set D of training documents, text classification can be done as follows. We take the training document set and specify the similarity threshold ρ. Assume that k clusters are obtained for the words in the feature vector W. We then find the weighting matrix T and convert D to D′. Finally, we classify the text using Weka, a collection of machine learning algorithms for data mining tasks.

5. CONCLUSIONS

In this work, we have presented a feature clustering algorithm. With this algorithm, each cluster is used as an extracted feature, which reduces the dimensionality of the data.

REFERENCES

[1]. J. Yan, B. Zhang, N. Liu, S. Yan, Q. Cheng, W. Fan, Q. Yang, W. Xi, and Z. Chen, “Effective and Efficient Dimensionality Reduction for Large-Scale and Streaming Data Preprocessing,” IEEE Trans. Knowledge and Data Eng., vol. 18, no. 3, pp. 320-333, Mar. 2006.
[2]. G. Tsoumakas, I. Katakis, and I. Vlahavas, “Mining Multi-Label Data,” Data Mining and Knowledge Discovery Handbook, O. Maimon and L. Rokach, eds., second ed., Springer, 2009.
[3]. H. Kim, P. Howland, and H. Park, “Dimension Reduction in Text Classification with Support Vector Machines,” J. Machine Learning Research, vol. 6, pp. 37-53, 2005.
[4]. F. Sebastiani, “Machine Learning in Automated Text Categorization,” ACM Computing Surveys, vol. 34, no. 1, pp. 1-47, 2002.
[5]. R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval. Addison Wesley Longman, 1999.
[6]. E.F. Combarro, E. Montañés, I. Díaz, J. Ranilla, and R. Mones, “Introducing a Family of Linear Measures for Feature Selection in Text Categorization,” IEEE Trans. Knowledge and Data Eng., vol. 17, no. 9, pp. 1223-1232, Sept. 2005.
[7]. D. Koller and M. Sahami, “Toward Optimal Feature Selection,” Proc. 13th Int’l Conf. Machine Learning, pp. 284-292, 1996.
[8]. R. Kohavi and G. John, “Wrappers for Feature Subset Selection,” Artificial Intelligence, vol. 97, no. 1-2, pp. 273-324, 1997.
[9]. I.S. Dhillon, S. Mallela, and R. Kumar, “A Divisive Information-Theoretic Feature Clustering Algorithm for Text Classification,” J. Machine Learning Research, vol. 3, pp. 1265-1287, 2003.
[10]. D. Ienco and R. Meo, “Exploration and Reduction of the Feature Space by Hierarchical Clustering,” Proc. SIAM Conf. Data Mining, pp. 577-587, 2008.
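As an illustration of Sections 4.2 to 4.4, the following Python sketch walks through self-constructing clustering, the three weighting schemes, and the conversion of D to D′ by the weighting matrix T. It is a sketch under stated assumptions rather than the implementation used in this work. The text only says that each cluster has a membership function with a mean and a deviation; the Gaussian-shaped form used here, the initial deviation `sigma0`, the mixing factor `gamma`, and all function names are assumptions introduced for the example. The final classification step with Weka is not shown.

```python
import math


def membership(x, mean, dev):
    """Similarity of word pattern x to a cluster.

    ASSUMPTION: a product of one-dimensional Gaussian-shaped functions
    built from the cluster's mean and deviation.
    """
    return math.prod(
        math.exp(-((xq - mq) / sq) ** 2) for xq, mq, sq in zip(x, mean, dev)
    )


def self_constructing_clustering(patterns, rho, sigma0=0.5):
    """Process word patterns one after another (Section 4.2)."""
    clusters = []
    for x in patterns:
        scores = [membership(x, c["mean"], c["dev"]) for c in clusters]
        if not scores or max(scores) < rho:
            # Not similar to any existing cluster: create a new one.
            clusters.append(
                {"members": [x], "mean": list(x), "dev": [sigma0] * len(x)}
            )
            continue
        # Similar enough: join the best-matching cluster, update statistics.
        c = clusters[scores.index(max(scores))]
        c["members"].append(x)
        n = len(c["members"])
        columns = list(zip(*c["members"]))
        c["mean"] = [sum(col) / n for col in columns]
        # ASSUMPTION: sigma0 is added so that a deviation is never zero.
        c["dev"] = [
            math.sqrt(sum((v - m) ** 2 for v in col) / n) + sigma0
            for col, m in zip(columns, c["mean"])
        ]
    return clusters


def weighting_matrix(patterns, clusters, mode="hard", gamma=0.5):
    """Build T (one row per word, one column per cluster), Section 4.3."""
    T = []
    for x in patterns:
        soft = [membership(x, c["mean"], c["dev"]) for c in clusters]
        best = soft.index(max(soft))
        hard = [1.0 if j == best else 0.0 for j in range(len(clusters))]
        if mode == "hard":
            row = hard
        elif mode == "soft":
            row = soft
        elif mode == "mixed":
            # ASSUMPTION: a convex combination controlled by gamma.
            row = [gamma * h + (1 - gamma) * s for h, s in zip(hard, soft)]
        else:
            raise ValueError(f"unknown mode: {mode}")
        T.append(row)
    return T


def reduce_documents(D, T):
    """Convert D (documents x words) to D' = D T (documents x clusters)."""
    k = len(T[0])
    return [[sum(d[i] * T[i][j] for i in range(len(T))) for j in range(k)]
            for d in D]


# Toy data: four word patterns over two classes, two documents of word counts.
patterns = [[0.9, 0.1], [0.85, 0.15], [0.1, 0.9], [0.2, 0.8]]
clusters = self_constructing_clustering(patterns, rho=0.5)
T = weighting_matrix(patterns, clusters, mode="hard")
D = [[2, 1, 0, 0], [0, 0, 3, 1]]
print(len(clusters), reduce_documents(D, T))
```

With these toy inputs the four words fall into two clusters, so each four-dimensional document vector is reduced to two extracted features. Passing `mode="soft"` or `mode="mixed"` lets every word contribute to every extracted feature, in line with the description in Section 4.3.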