International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print),
ISSN 0976 – 6375(Online) Volume 1, Number 1, May - June (2010), pp. 158-165
© IAEME, http://www.iaeme.com/ijcet.html


                PATENT DATA CLUSTERING: A MEASURING UNIT FOR INNOVATORS
                                        M.Pratheeban
                                       Research Scholar
                          Anna University of Technology Coimbatore
                           E-mail id: pratheeban_mca@yahoo.co.in

                                    Dr. S. Balasubramanian
                                      Former Director- IPR
                          Anna University of Technology Coimbatore
                         E-Mail id: s_balasubramanian@rediffmail.com

ABSTRACT
        As software applications increase in volume, grouping an application into
smaller, more manageable components is often proposed as a means of assisting software
maintenance activities. One of the thrust areas in software development is Patent Data
Clustering. The key challenge in Patent Data Clustering is how to cluster patent data so
as to improve searching for patents in repositories. In this paper, we propose a new
clustering algorithm that improves the clustering of patent data.
INTRODUCTION
        Patent Data Clustering is a method for grouping patent-related data. Clustering of
patent data documents (such as titles, abstracts and claims) has been used to bring out
the importance of patents for researchers. Clustering analysis is an unsupervised process
that divides a set of objects into homogeneous groups. It measures perceived intrinsic
characteristics or similarities among patents. Patent clustering speeds up sifting through
large sets of patent data, which helps people identify competitive and technology trends.
The need for academic researchers to retrieve patents is increasing, because applying for
patents is now considered an important research activity [6].
PATENT INFORMATION
        Patents are an important source of scientific and technical information. For
anyone planning to apply for a patent, a search is crucial to identify the existence of prior
art, which affects the patentability of an invention. For researchers, patents can be
important as they are often the only published information on specific topics, and can
provide insight into research directions. Patents are also used by marketing and
competitive intelligence professionals to find out about work being done by others.
PATENT DATABASE
        Patent databases may provide the following information. Patent data may relate to
unexamined and examined patent applications, and includes:
    •   Titles and abstract in English (if the patent is in another language)
    •   Inventor’s name
    •   Patent assignee
    •   Patent publication data
    •   Images
    •   Full text (sometimes this is available through a separate database, or must be
        ordered)
    •   International Patent Classification (IPC) codes.
        The IPC is used by over 70 patent authorities to classify and index the subject
matter of published patent specifications. It is presumably based on literary warrant, and
its sections range from the very broad to the specific [2].
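To illustrate how an IPC symbol runs from the very broad to the specific, the following minimal Python sketch splits a symbol into its hierarchical levels. It assumes the common written form `H04L 29/06` (section letter, two-digit class, subclass letter, main group, subgroup); the helper `parse_ipc` is hypothetical, and the sketch does not validate symbols against the official WIPO scheme [2].

```python
import re
from typing import NamedTuple

# Hypothetical helper: assumes symbols written like "H04L 29/06".
IPC_PATTERN = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")


class IPCLevels(NamedTuple):
    section: str     # broadest level, e.g. "H"
    ipc_class: str   # e.g. "H04"
    subclass: str    # e.g. "H04L"
    main_group: str  # e.g. "H04L 29/00"
    subgroup: str    # most specific level, e.g. "H04L 29/06"


def parse_ipc(symbol: str) -> IPCLevels:
    """Split an IPC symbol into its levels, from broad to specific."""
    match = IPC_PATTERN.match(symbol.strip())
    if match is None:
        raise ValueError(f"unrecognised IPC symbol: {symbol!r}")
    section, cls, sub, group, subgroup = match.groups()
    subclass = f"{section}{cls}{sub}"
    return IPCLevels(
        section=section,
        ipc_class=f"{section}{cls}",
        subclass=subclass,
        main_group=f"{subclass} {group}/00",  # main groups end in /00
        subgroup=f"{subclass} {group}/{subgroup}",
    )


if __name__ == "__main__":
    for level in parse_ipc("H04L 29/06"):
        print(level)
```

Grouping patents by any one of these levels (for example the subclass) gives a coarse, classification-based partition that a similarity-based clustering can then refine.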
PATENT ASSESSMENT AND TECHNOLOGY AREA ASSESSMENT
        Currently, high-quality valuation of patents and patent applications, and the
assessment of technology areas with respect to their potential to give rise to patent
applications, are done mainly manually, which is very costly and time-consuming. We are
developing techniques that use statistical and semantic information from patents, as well
as user-based data for market aspects, to predict patent value.
MINING PATENT
        A clear and effective IP strategy critically incorporates a clear and effective
strategy for managing an organization's patent portfolio [7]. This means analyzing all
patents that can directly revolutionize business and technology development practice.
Patent mining is a deliberate, core function for any IP-centric business: it secures
technology development and provides a foundation that helps administrators make
planning decisions regarding technology development.
        Today, patent management applications and robust search engines allow internal
IP managers to quickly pull together organized sets of patents from within their own
portfolios, from the portfolios of specific competitors, and patents citing relevant
technical or industry terms. Companies once interested only in understanding the patents
within their own portfolios are now interested in knowing about the patents held by
competitors [8].
BASICS OF CLUSTERING
        Clustering is a division of data into groups of similar objects. Each group, called
a cluster, consists of objects that are similar to one another and dissimilar to objects
of other groups [1]. Clustering groups a set of data in a way that maximizes the similarity
within clusters and minimizes the similarity between different clusters. The discovered
clusters can help explain the characteristics of the underlying data distribution and serve
as the foundation for other data mining and analysis techniques [5]. The quality of a
clustering method is also measured by its ability to discover some or all of the hidden
patterns. The quality of a clustering result depends on both the similarity measure
used by the method and its implementation [3].
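The goal of high similarity within clusters and low similarity between clusters can be made concrete once a similarity measure is fixed. The Python sketch below assumes patents are represented as bag-of-words term-count vectors and compared with cosine similarity; both choices and the helper names are illustrative assumptions, not the similarity measure of this paper.

```python
import math
from collections import Counter
from itertools import combinations


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts (illustrative: no stemming or stop-word removal)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def intra_similarity(cluster: list) -> float:
    """Average pairwise similarity inside one cluster (higher is better)."""
    pairs = list(combinations(cluster, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs) if pairs else 1.0


def inter_similarity(c1: list, c2: list) -> float:
    """Average similarity between members of two clusters (lower is better)."""
    return sum(cosine(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))


if __name__ == "__main__":
    wireless = [vectorize("wireless signal antenna"), vectorize("antenna for wireless signal")]
    pharma = [vectorize("drug compound tablet"), vectorize("tablet containing drug compound")]
    print(round(intra_similarity(wireless), 3), round(inter_similarity(wireless, pharma), 3))
```

A good clustering of these toy "patent titles" keeps the two wireless titles together, where intra-cluster similarity is high and similarity to the pharmaceutical group is zero.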
CLUSTERING ALGORITHMS
        Most existing clustering algorithms find clusters that fit some static model.
Although effective in some cases, these algorithms can break down (that is, cluster the
data incorrectly) if the user does not select appropriate static-model parameters, or
when the model cannot adequately capture the clusters' characteristics. Most of
these algorithms break down when the data contains clusters of diverse shapes, densities,
and sizes [5]. Cluster analysis is the organization of a collection of patterns into clusters
based on similarity [4].
LIMITATIONS OF TRADITIONAL CLUSTERING ALGORITHMS
        Partition-based clustering techniques such as K-Means and CLARANS attempt to
break a data set into K clusters such that the partition optimizes a given criterion. These
algorithms assume that clusters are hyper-ellipsoidal and of similar sizes. They cannot find
clusters that vary in size or have concave shapes [9]. DBScan (Density-Based Spatial
Clustering of Applications with Noise), a well-known spatial clustering algorithm, can
find clusters of arbitrary shapes. DBScan defines a cluster to be a maximal set of
density-connected points, which means that every core point in a cluster must have at
least a minimum number of points (MinPts) within a given radius (Eps) [10].
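The Eps/MinPts definition can be illustrated with a compact DBScan sketch in Python. It is a straightforward O(n²) version for small point sets, assumes Euclidean distance, and counts a point as part of its own Eps-neighbourhood; label -1 marks noise. It is meant to show the core-point and density-reachability logic, not to be an efficient implementation of [10].

```python
import math
from collections import deque

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return a cluster label per point; NOISE (-1) marks points in no cluster."""
    labels = [None] * len(points)  # None = not yet visited
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        queue = deque(neighbours)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is also a core point: keep expanding
        cluster_id += 1
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan(pts, eps=1.5, min_pts=3))  # → [0, 0, 0, 1, 1, 1, -1]
```

Changing `eps` or `min_pts` changes which points count as core points, which is why the method works best when a single, uniform cluster density can be chosen beforehand.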
        DBScan assumes that all points within genuine clusters can be reached from one
another by traversing a path of density connected points and points across different
clusters cannot. DBScan can find arbitrarily shaped clusters if the cluster density can be
determined beforehand and the cluster density is uniform [10]. Hierarchical clustering
algorithms produce a nested sequence of clusters with a single, all-inclusive cluster at the
top and single-point clusters at the bottom.
        Agglomerative hierarchical algorithms start with each data point as a separate
cluster. Each step of the algorithm involves merging two clusters that are the most
similar. After each merger, the total number of clusters decreases by one. Users can
repeat these steps until they obtain the desired number of clusters or the distance between
the two closest clusters goes above a certain threshold. A limitation of most hierarchical
algorithms is that they do not revisit (intermediate) clusters once they are constructed in
order to improve them [1].
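The merge loop described above can be sketched in Python as follows. The sketch assumes Euclidean points and single-link distance (the closest pair of points across two clusters); other linkages may merge different pairs. It stops at a desired number of clusters or when the two closest clusters are farther apart than a threshold, and, like most hierarchical methods, never revisits a merge.

```python
import math
from itertools import combinations


def single_link(c1, c2):
    """Distance between clusters = distance of their closest pair of points (assumed linkage)."""
    return min(math.dist(p, q) for p in c1 for q in c2)


def agglomerative(points, n_clusters=1, max_distance=math.inf):
    """Merge the two closest clusters until n_clusters remain or the
    closest pair of clusters is farther apart than max_distance."""
    clusters = [[p] for p in points]  # each point starts as its own cluster
    n_clusters = max(1, n_clusters)
    while len(clusters) > n_clusters:
        (i, j), d = min(
            (((a, b), single_link(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > max_distance:
            break  # the closest clusters are too far apart to merge
        clusters[i].extend(clusters[j])  # merge j into i: the count drops by one
        del clusters[j]  # j > i, so index i stays valid
    return clusters


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
    print(agglomerative(pts, max_distance=2))
    print(agglomerative(pts, n_clusters=2))
```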
        In agglomerative hierarchical clustering, provision can be made for relocating
objects that may have been 'incorrectly' grouped at an early stage. The result should be
examined closely to ensure it makes sense. Using different distance metrics to measure
distances between clusters may generate different results, so performing multiple
experiments and comparing the results is recommended to support the veracity of the
original results [11].
        The many variations of agglomerative hierarchical algorithms primarily differ in
how they update the similarity between existing and merged clusters. In some
hierarchical methods, each cluster is represented by a centroid or a medoid (a data point
that is the closest to the center of the cluster), and the similarity between two clusters is
measured by the similarity between their centroids or medoids. Both of these schemes fail
for data in which points in a given cluster are closer to the center of another cluster than
to the center of their own cluster.
        ROCK, a recently developed algorithm that operates on a derived similarity graph,
scales the aggregate interconnectivity with respect to a user-specified interconnectivity
model. However, the major limitation of all such schemes is that they assume a static,
user-supplied interconnectivity model. Such models are inflexible and can easily lead to
incorrect merging decisions when the model under- or overestimates the interconnectivity
of the data set. Although some schemes allow the connectivity to vary for different
problem domains, it is still the same for all clusters irrespective of their densities and
shapes [12].
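ROCK's interconnectivity is built on links: two items are neighbours if their similarity reaches a threshold θ, and the link count of a pair is the number of neighbours they share [12]. The Python sketch below computes only these link counts for categorical records represented as sets, using Jaccard similarity. It omits ROCK's goodness measure, sampling and merging, excludes each record from its own neighbour set (an assumption of this sketch), and the function names are hypothetical.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two categorical records."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def neighbours(records, theta):
    """For each record, the indices of *other* records with similarity >= theta."""
    nbrs = [set() for _ in records]
    for i, j in combinations(range(len(records)), 2):
        if jaccard(records[i], records[j]) >= theta:
            nbrs[i].add(j)
            nbrs[j].add(i)
    return nbrs


def links(records, theta):
    """link(i, j) = number of common neighbours of records i and j."""
    nbrs = neighbours(records, theta)
    return {
        (i, j): len(nbrs[i] & nbrs[j])
        for i, j in combinations(range(len(records)), 2)
    }


if __name__ == "__main__":
    recs = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "b", "e"}, {"x", "y"}]
    print(links(recs, theta=0.5))
```

Because θ is fixed by the user for the whole data set, the same notion of "neighbour" is applied to every cluster, which is the static-model limitation described above.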
        CURE measures the similarity between two clusters by the similarity of the
closest pair of points belonging to different clusters. Unlike centroid/medoid-based
methods, CURE can find clusters of arbitrary shapes and sizes, as it represents each
cluster via multiple representative points. Shrinking the representative points toward the
centroid allows CURE to avoid some of the problems associated with noise and outliers.
However, these techniques fail to account for special characteristics of individual
clusters. They can make incorrect merging decisions when the underlying data does not
follow the assumed model or when noise is present. In some algorithms, the similarity
between two clusters is captured by the aggregate of the similarities among pairs of items
belonging to different clusters [13].
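CURE's use of multiple representative points can be sketched as below: pick up to c well-scattered points per cluster (each new point is the one farthest from those already chosen, starting with the point farthest from the centroid), shrink them toward the centroid by a fraction alpha, and measure cluster distance as the closest pair of representatives [13]. The parameter names `c` and `alpha` and the Euclidean data are assumptions of this sketch; it is not the full CURE algorithm (no sampling, partitioning, or heap-based merging).

```python
import math


def centroid(cluster):
    """Mean point of a cluster of equal-length coordinate tuples."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def representatives(cluster, c=4, alpha=0.3):
    """Pick up to c well-scattered points and shrink them toward the centroid by alpha."""
    mean = centroid(cluster)
    chosen, remaining = [], list(cluster)
    for _ in range(min(c, len(cluster))):
        # First pick: farthest from the centroid; later picks: farthest from those already chosen.
        far = max(
            remaining,
            key=lambda p: min((math.dist(p, q) for q in chosen), default=math.dist(p, mean)),
        )
        chosen.append(far)
        remaining.remove(far)
    return [tuple(x + alpha * (m - x) for x, m in zip(p, mean)) for p in chosen]


def cure_distance(c1, c2, c=4, alpha=0.3):
    """Cluster distance = closest pair of (shrunken) representative points."""
    r1, r2 = representatives(c1, c, alpha), representatives(c2, c, alpha)
    return min(math.dist(p, q) for p in r1 for q in r2)


if __name__ == "__main__":
    left = [(0, 0), (2, 0), (0, 2), (2, 2)]
    right = [(10, 0), (12, 0), (10, 2), (12, 2)]
    print(cure_distance(left, right, c=4, alpha=0.5))
```

Shrinking toward the centroid dampens the influence of outlying points, while keeping several representatives lets elongated or non-spherical clusters be compared more faithfully than with a single centroid.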
        Existing algorithms use a static model of the clusters and do not use information
about the nature of individual clusters as they are merged. Furthermore, one set of
schemes ignores the information about the aggregate interconnectivity of items in two
clusters. The other set of schemes ignores information about the closeness of two clusters
as defined by the similarity of the closest items across two clusters. By only considering
either interconnectivity or closeness, these algorithms can easily select and merge the
wrong pair of clusters.
USAGE OF ALGORITHMS
        The most standard approach to document classification in recent years has been to
apply machine learning, such as support vector machines or Naïve Bayes. However, this
approach is not easy to apply to the patent mining task, because the number of classes is
large and it incurs a high calculation cost [6]. We therefore propose a new algorithm
rather than using machine learning algorithms.
OUR APPROACH
        We propose a new dynamic algorithm that takes both interlink and nearness into
account when identifying the most similar pair of clusters. Thus, it does not depend on a
static, user-supplied model and can automatically adapt to the internal characteristics of
the merged clusters. In the above algorithm, we replaced Chameleon with a suitable
k-medoids method, which may give better interlink results than interlink computed using
k-means. From various comparisons, we came to know that the average time taken by the
K-Means algorithm is greater than the time taken by the K-Medoids algorithm for the same
set of data, and that K-Means is efficient for smaller data sets while K-Medoids seems to
perform better for large data sets [14].
To compute the interlinks among patents:
    1. Collect the patent data related to a particular field or to all fields, and randomly
        choose k objects from the data set to be the initial cluster medoids.

    2. For each pair of non-selected object h and selected object i, calculate the total
        swapping cost Tih.

    3. For each pair of i and h, if Tih < 0, replace i by h. Then assign each
        non-selected object to the most similar representative object.

    4. Repeat steps 2 and 3 until no change happens.
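The four steps above follow the classic PAM (Partitioning Around Medoids) form of k-medoids. The Python sketch below assumes a precomputed symmetric distance matrix between patents (for example 1 minus the cosine similarity of their term vectors, an illustrative choice) and computes the swap cost Tih as the change in total distance from every object to its nearest medoid. For simplicity it applies the single best improving swap per pass, and the random seed is a parameter; both are assumptions of the sketch, not details given in the paper.

```python
import random


def total_cost(dist, medoids):
    """Sum over all objects of the distance to their nearest medoid."""
    return sum(min(dist[o][m] for m in medoids) for o in range(len(dist)))


def assign(dist, medoids):
    """Assign every object to its most similar (closest) medoid."""
    return [min(medoids, key=lambda m: dist[o][m]) for o in range(len(dist))]


def k_medoids(dist, k, seed=0):
    n = len(dist)
    rng = random.Random(seed)
    medoids = rng.sample(range(n), k)  # step 1: random initial medoids
    cost = total_cost(dist, medoids)
    while True:
        best_delta, best_swap = 0.0, None
        for idx, i in enumerate(medoids):  # selected object i
            for h in range(n):  # non-selected object h
                if h in medoids:
                    continue
                candidate = medoids[:idx] + [h] + medoids[idx + 1:]
                delta = total_cost(dist, candidate) - cost  # step 2: swap cost T_ih
                if delta < best_delta:
                    best_delta, best_swap = delta, (idx, h)
        if best_swap is None:  # step 4: stop when no swap lowers the cost
            break
        idx, h = best_swap  # step 3: replace i by h because T_ih < 0
        medoids[idx] = h
        cost = total_cost(dist, medoids)
    return medoids, assign(dist, medoids)


if __name__ == "__main__":
    values = [1, 2, 3, 10, 11, 12]  # toy 1-D "patents"
    dist = [[abs(a - b) for b in values] for a in values]
    print(k_medoids(dist, k=2))
```

Because medoids are actual data objects, only pairwise distances are needed, so the same loop works for any patent similarity measure that can be turned into a distance.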
        The absolute nearness of two clusters is normalized by the internal nearness of the
clusters. During the calculation of nearness, the algorithm finds the genuine clusters by
repeatedly combining these sub-clusters.
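Normalizing absolute interlink and nearness by the clusters' internal values corresponds to the relative interconnectivity (RI) and relative closeness (RC) of Chameleon [5]. Following that paper, EC_{C_i,C_j} is the set of edges connecting C_i and C_j in the k-nearest-neighbour graph, EC_{C_i} is the min-cut bisector of C_i, |·| of an edge set is the sum of its edge weights, and S̄ denotes the average weight of the corresponding edges. Chameleon merges the pair that maximizes RI · RC^α for a user-chosen α; how the k-medoids variant described here computes these quantities is not specified in this paper.

```latex
\mathrm{RI}(C_i, C_j) = \frac{\left|EC_{\{C_i, C_j\}}\right|}
                             {\tfrac{1}{2}\left(\left|EC_{C_i}\right| + \left|EC_{C_j}\right|\right)}

\mathrm{RC}(C_i, C_j) = \frac{\bar{S}_{EC_{\{C_i, C_j\}}}}
  {\frac{|C_i|}{|C_i| + |C_j|}\,\bar{S}_{EC_{C_i}} + \frac{|C_j|}{|C_i| + |C_j|}\,\bar{S}_{EC_{C_j}}}

(C_i, C_j)^{*} = \arg\max_{(C_i, C_j)} \; \mathrm{RI}(C_i, C_j)\cdot \mathrm{RC}(C_i, C_j)^{\alpha}
```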
CONCLUSION
        The methodology of dynamic modeling of clusters in agglomerative hierarchical
methods is applicable to all types of data as long as a similarity measure is available. Even
though we chose to model the data using k-medoids in this paper, it is entirely possible to use
other algorithms suitable for patent mining domains. Our future work includes the
practical implementation of this algorithm for better results in patent mining.
REFERENCES
      [1] Pavel Berkhin, “Survey of Clustering Data Mining Techniques”, Accrue
            Software, Inc., http://www.ee.ucr.edu/~barth/EE242/clustering_survey.pdf.
      [2] http://www.wipo.int/classifications/ipc/en/
      [3] Dr. Osmar R. Zaïane, “Principles of Knowledge Discovery in Databases”,
            University of Alberta, CMPUT690.
      [4] Cheng-Fa Tsai, Han-Chang Wu, Chun-Wei Tsai, “A New Data Clustering
            Approach for Data Mining in Large Database”, International Symposium on
            Parallel Architectures, Algorithms and Networks (ISPAN'02).
      [5] George Karypis, Eui-Hong (Sam) Han, Vipin Kumar, “Chameleon:
            Hierarchical Clustering Using Dynamic Modeling”,
            http://www-leibniz.imag.fr/Apprentissage/Depot/Selection/karypis99.pdf.
      [6] Hidetsugu Nanba, “Hiroshima City University at NTCIR-7 Patent Mining
            Task”, Proceedings of NTCIR-7 Workshop Meeting, December 16–19, 2008,
            Tokyo, Japan.
      [7] Bob Stembridge, Breda Corish, “Patent data mining and effective patent
            portfolio management”, Intellectual Asset Management, October/November
            2004.
      [8] Edward Khan, “Patent mining in a changing world of technology and product
            development”, Intellectual Asset Management, July/August 2003.
      [9] Raymond T. Ng, Jiawei Han, “Efficient and Effective Clustering Methods for
            Spatial Data Mining”, Proceedings of the 20th VLDB Conference, Santiago,
            Chile, 1994.
      [10] Martin Ester, Hans-Peter Kriegel, Jorg Sander, Xiaowei Xu, “A Density-Based
            Algorithm for Discovering Clusters in Large Spatial Databases with Noise”,
            Proceedings of the 2nd International Conference on Knowledge Discovery and
            Data Mining (KDD-96).
      [11] http://www.improvedoutcomes.Com/docs/WebSiteDocs/Clustering/Agglomerative_Hierarchical_Clustering_Overview.htm
      [12] S. Guha, R. Rastogi, and K. Shim, “ROCK: A Robust Clustering Algorithm
            for Categorical Attributes,” Proc. 15th Int'l Conf. Data Eng., IEEE CS Press,
            Los Alamitos, Calif., 1999, pp. 512-521.
      [13] S. Guha, R. Rastogi, and K. Shim, “CURE: An Efficient Clustering Algorithm
            for Large Databases,” Proc. ACM SIGMOD Int'l Conf. Management of Data,
            ACM Press, New York, 1998, pp. 73-84.
      [14] T. Velmurugan and T. Santhanam, “Computational Complexity between
            K-Means and K-Medoids Clustering Algorithms for Normal and Uniform
            Distributions of Data Points”, Journal of Computer Science 6 (3): 363-368,
            2010, ISSN 1549-3636, Science Publications.
Fault detection of imbalanced data using incremental clusteringIRJET Journal
 

Similaire à Patent data clustering a measuring unit for innovators (20)

Entity resolution for hierarchical data using attributes value comparison ove...
Entity resolution for hierarchical data using attributes value comparison ove...Entity resolution for hierarchical data using attributes value comparison ove...
Entity resolution for hierarchical data using attributes value comparison ove...
 
How Partitioning Clustering Technique For Implementing...
How Partitioning Clustering Technique For Implementing...How Partitioning Clustering Technique For Implementing...
How Partitioning Clustering Technique For Implementing...
 
A frame work for clustering time evolving data
A frame work for clustering time evolving dataA frame work for clustering time evolving data
A frame work for clustering time evolving data
 
Coordination issues of multi agent systems in distributed data mining
Coordination issues of multi agent systems in distributed data miningCoordination issues of multi agent systems in distributed data mining
Coordination issues of multi agent systems in distributed data mining
 
Parametric comparison based on split criterion on classification algorithm
Parametric comparison based on split criterion on classification algorithmParametric comparison based on split criterion on classification algorithm
Parametric comparison based on split criterion on classification algorithm
 
Performance analysis of data mining algorithms with neural network
Performance analysis of data mining algorithms with neural networkPerformance analysis of data mining algorithms with neural network
Performance analysis of data mining algorithms with neural network
 
50120130405016 2
50120130405016 250120130405016 2
50120130405016 2
 
A machine learning model for predicting innovation effort of firms
A machine learning model for predicting innovation effort of  firmsA machine learning model for predicting innovation effort of  firms
A machine learning model for predicting innovation effort of firms
 
IRJET- Towards Efficient Framework for Semantic Query Search Engine in Large-...
IRJET- Towards Efficient Framework for Semantic Query Search Engine in Large-...IRJET- Towards Efficient Framework for Semantic Query Search Engine in Large-...
IRJET- Towards Efficient Framework for Semantic Query Search Engine in Large-...
 
A Survey on Improve Efficiency And Scability vertical mining using Agriculter...
A Survey on Improve Efficiency And Scability vertical mining using Agriculter...A Survey on Improve Efficiency And Scability vertical mining using Agriculter...
A Survey on Improve Efficiency And Scability vertical mining using Agriculter...
 
IRJET - BOT Virtual Guide
IRJET -  	  BOT Virtual GuideIRJET -  	  BOT Virtual Guide
IRJET - BOT Virtual Guide
 
Ijetcas14 409
Ijetcas14 409Ijetcas14 409
Ijetcas14 409
 
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...
 
Variance rover system web analytics tool using data
Variance rover system web analytics tool using dataVariance rover system web analytics tool using data
Variance rover system web analytics tool using data
 
Variance rover system
Variance rover systemVariance rover system
Variance rover system
 
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
 
Efficient Privacy Preserving Clustering Based Multi Keyword Search
Efficient Privacy Preserving Clustering Based Multi Keyword Search        Efficient Privacy Preserving Clustering Based Multi Keyword Search
Efficient Privacy Preserving Clustering Based Multi Keyword Search
 
Multikeyword Hunt on Progressive Graphs
Multikeyword Hunt on Progressive GraphsMultikeyword Hunt on Progressive Graphs
Multikeyword Hunt on Progressive Graphs
 
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
 
Fault detection of imbalanced data using incremental clustering
Fault detection of imbalanced data using incremental clusteringFault detection of imbalanced data using incremental clustering
Fault detection of imbalanced data using incremental clustering
 

Plus de iaemedu

Tech transfer making it as a risk free approach in pharmaceutical and biotech in
Tech transfer making it as a risk free approach in pharmaceutical and biotech inTech transfer making it as a risk free approach in pharmaceutical and biotech in
Tech transfer making it as a risk free approach in pharmaceutical and biotech iniaemedu
 
Effect of scenario environment on the performance of mane ts routing
Effect of scenario environment on the performance of mane ts routingEffect of scenario environment on the performance of mane ts routing
Effect of scenario environment on the performance of mane ts routingiaemedu
 
Adaptive job scheduling with load balancing for workflow application
Adaptive job scheduling with load balancing for workflow applicationAdaptive job scheduling with load balancing for workflow application
Adaptive job scheduling with load balancing for workflow applicationiaemedu
 
Survey on transaction reordering
Survey on transaction reorderingSurvey on transaction reordering
Survey on transaction reorderingiaemedu
 
Semantic web services and its challenges
Semantic web services and its challengesSemantic web services and its challenges
Semantic web services and its challengesiaemedu
 
Website based patent information searching mechanism
Website based patent information searching mechanismWebsite based patent information searching mechanism
Website based patent information searching mechanismiaemedu
 
Revisiting the experiment on detecting of replay and message modification
Revisiting the experiment on detecting of replay and message modificationRevisiting the experiment on detecting of replay and message modification
Revisiting the experiment on detecting of replay and message modificationiaemedu
 
Prediction of customer behavior using cma
Prediction of customer behavior using cmaPrediction of customer behavior using cma
Prediction of customer behavior using cmaiaemedu
 
Performance analysis of manet routing protocol in presence
Performance analysis of manet routing protocol in presencePerformance analysis of manet routing protocol in presence
Performance analysis of manet routing protocol in presenceiaemedu
 
Performance measurement of different requirements engineering
Performance measurement of different requirements engineeringPerformance measurement of different requirements engineering
Performance measurement of different requirements engineeringiaemedu
 
Mobile safety systems for automobiles
Mobile safety systems for automobilesMobile safety systems for automobiles
Mobile safety systems for automobilesiaemedu
 
Efficient text compression using special character replacement
Efficient text compression using special character replacementEfficient text compression using special character replacement
Efficient text compression using special character replacementiaemedu
 
Agile programming a new approach
Agile programming a new approachAgile programming a new approach
Agile programming a new approachiaemedu
 
Adaptive load balancing techniques in global scale grid environment
Adaptive load balancing techniques in global scale grid environmentAdaptive load balancing techniques in global scale grid environment
Adaptive load balancing techniques in global scale grid environmentiaemedu
 
A survey on the performance of job scheduling in workflow application
A survey on the performance of job scheduling in workflow applicationA survey on the performance of job scheduling in workflow application
A survey on the performance of job scheduling in workflow applicationiaemedu
 
A survey of mitigating routing misbehavior in mobile ad hoc networks
A survey of mitigating routing misbehavior in mobile ad hoc networksA survey of mitigating routing misbehavior in mobile ad hoc networks
A survey of mitigating routing misbehavior in mobile ad hoc networksiaemedu
 
A novel approach for satellite imagery storage by classify
A novel approach for satellite imagery storage by classifyA novel approach for satellite imagery storage by classify
A novel approach for satellite imagery storage by classifyiaemedu
 
A self recovery approach using halftone images for medical imagery
A self recovery approach using halftone images for medical imageryA self recovery approach using halftone images for medical imagery
A self recovery approach using halftone images for medical imageryiaemedu
 
A comprehensive study of non blocking joining technique
A comprehensive study of non blocking joining techniqueA comprehensive study of non blocking joining technique
A comprehensive study of non blocking joining techniqueiaemedu
 
A comparative study on multicast routing using dijkstra’s
A comparative study on multicast routing using dijkstra’sA comparative study on multicast routing using dijkstra’s
A comparative study on multicast routing using dijkstra’siaemedu
 

Plus de iaemedu (20)

Tech transfer making it as a risk free approach in pharmaceutical and biotech in
Tech transfer making it as a risk free approach in pharmaceutical and biotech inTech transfer making it as a risk free approach in pharmaceutical and biotech in
Tech transfer making it as a risk free approach in pharmaceutical and biotech in
 
Effect of scenario environment on the performance of mane ts routing
Effect of scenario environment on the performance of mane ts routingEffect of scenario environment on the performance of mane ts routing
Effect of scenario environment on the performance of mane ts routing
 
Adaptive job scheduling with load balancing for workflow application
Adaptive job scheduling with load balancing for workflow applicationAdaptive job scheduling with load balancing for workflow application
Adaptive job scheduling with load balancing for workflow application
 
Survey on transaction reordering
Survey on transaction reorderingSurvey on transaction reordering
Survey on transaction reordering
 
Semantic web services and its challenges
Semantic web services and its challengesSemantic web services and its challenges
Semantic web services and its challenges
 
Website based patent information searching mechanism
Website based patent information searching mechanismWebsite based patent information searching mechanism
Website based patent information searching mechanism
 
Revisiting the experiment on detecting of replay and message modification
Revisiting the experiment on detecting of replay and message modificationRevisiting the experiment on detecting of replay and message modification
Revisiting the experiment on detecting of replay and message modification
 
Prediction of customer behavior using cma
Prediction of customer behavior using cmaPrediction of customer behavior using cma
Prediction of customer behavior using cma
 
Performance analysis of manet routing protocol in presence
Performance analysis of manet routing protocol in presencePerformance analysis of manet routing protocol in presence
Performance analysis of manet routing protocol in presence
 
Performance measurement of different requirements engineering
Performance measurement of different requirements engineeringPerformance measurement of different requirements engineering
Performance measurement of different requirements engineering
 
Mobile safety systems for automobiles
Mobile safety systems for automobilesMobile safety systems for automobiles
Mobile safety systems for automobiles
 
Efficient text compression using special character replacement
Efficient text compression using special character replacementEfficient text compression using special character replacement
Efficient text compression using special character replacement
 
Agile programming a new approach
Agile programming a new approachAgile programming a new approach
Agile programming a new approach
 
Adaptive load balancing techniques in global scale grid environment
Adaptive load balancing techniques in global scale grid environmentAdaptive load balancing techniques in global scale grid environment
Adaptive load balancing techniques in global scale grid environment
 
A survey on the performance of job scheduling in workflow application
A survey on the performance of job scheduling in workflow applicationA survey on the performance of job scheduling in workflow application
A survey on the performance of job scheduling in workflow application
 
A survey of mitigating routing misbehavior in mobile ad hoc networks
A survey of mitigating routing misbehavior in mobile ad hoc networksA survey of mitigating routing misbehavior in mobile ad hoc networks
A survey of mitigating routing misbehavior in mobile ad hoc networks
 
A novel approach for satellite imagery storage by classify
A novel approach for satellite imagery storage by classifyA novel approach for satellite imagery storage by classify
A novel approach for satellite imagery storage by classify
 
A self recovery approach using halftone images for medical imagery
A self recovery approach using halftone images for medical imageryA self recovery approach using halftone images for medical imagery
A self recovery approach using halftone images for medical imagery
 
A comprehensive study of non blocking joining technique
A comprehensive study of non blocking joining techniqueA comprehensive study of non blocking joining technique
A comprehensive study of non blocking joining technique
 
A comparative study on multicast routing using dijkstra’s
A comparative study on multicast routing using dijkstra’sA comparative study on multicast routing using dijkstra’s
A comparative study on multicast routing using dijkstra’s
 

Patent data clustering a measuring unit for innovators

International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print), ISSN 0976 – 6375(Online), Volume 1, Number 1, May - June (2010), pp. 158-165, © IAEME, http://www.iaeme.com/ijcet.html

PATENT DATA CLUSTERING: A MEASURING UNIT FOR INNOVATORS

M.Pratheeban
Research Scholar
Anna University of Technology Coimbatore
E-mail id: pratheeban_mca@yahoo.co.in

Dr. S. Balasubramanian
Former Director- IPR
Anna University of Technology Coimbatore
E-Mail id: s_balasubramanian@rediffmail.com

ABSTRACT
As software applications grow in volume, splitting an application into smaller, more manageable components is often proposed as a way to support software maintenance. Patent Data Clustering is one of the active areas in software development. Its key challenge is how to cluster patent data so that searching the patent data held in repositories improves. In this paper, we propose a new clustering algorithm that provides improved clustering facilities for patent data.

INTRODUCTION
Patent Data Clustering is a method for grouping patent-related data. Researchers have clustered patent documents (their titles, abstracts and claims) to highlight which patents matter. Clustering analysis is an unsupervised process that divides a set of objects into homogeneous groups. It works from measured or perceived intrinsic characteristics of, or similarities among, patents. Patent clustering speeds up sifting through large sets of patent data, which helps people identify competitive and technology trends. Academic researchers increasingly need to retrieve patents, because applying for patents is now considered an important research activity [6].
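Clustering patent titles, abstracts or claims first requires a similarity measure over their text. The paper does not fix one. The minimal sketch below therefore assumes TF-IDF term weighting with cosine similarity, a common choice for documents. The function names `tokenize`, `tfidf_vectors` and `cosine` are hypothetical, as are the sample abstracts. Only the Python standard library is used.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    token_lists = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        # Weight = raw term count * log(N / document frequency).
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero length."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


abstracts = [
    "A battery electrode with a lithium coating",
    "Lithium battery cell with improved electrode",
    "Method for compressing video frames",
]
vecs = tfidf_vectors(abstracts)
```

Documents that share weighted terms score above zero, and documents with no terms in common score 0.0. A term that occurs in every document gets an IDF of zero and so contributes nothing.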
PATENT INFORMATION
Patents are an important source of scientific and technical information. For anyone planning to apply for a patent, a search is crucial to establish whether prior art exists, since prior art affects the patentability of an invention. Patents matter to researchers because they are often the only published information on specific topics and can give insight into research directions. Marketing and competitive intelligence professionals also use patents to find out about work being done by others.

PATENT DATABASE
Information that may be provided in patent databases
Patent data may relate to unexamined and examined patent applications, and includes:
• Title and abstract in English (if the patent is in another language)
• Inventor's name
• Patent assignee
• Patent publication data
• Images
• Full text (sometimes available only through a separate database, or it must be ordered)
• International Patent Classification (IPC) codes
The IPC is used by over 70 patent authorities to classify and index the subject matter of published patent specifications. It is presumably based on literary warrant, and its sections range from the very broad to the specific [2].

PATENT ASSESSMENT AND TECHNOLOGY AREA ASSESSMENT
At present, high-quality valuation of patents and patent applications is done mainly by hand, which is very costly and time consuming. The same holds for assessing technology areas by their potential to give rise to patent applications. We are developing techniques that combine statistical and semantic information from patents with user-based data on market aspects in order to forecast the value of a patent.
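The IPC's broad-to-specific hierarchy (section, class, subclass, group) gives a ready-made coarse grouping of patent records before any clustering. The sketch below assumes a hypothetical `PatentRecord` type that holds a few of the fields listed above. It also assumes IPC symbols written in the usual form, such as `G06F 16/35`. It parses each symbol into its levels and groups records at a chosen level. This illustrates the classification structure only and is not part of the authors' method.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field

# IPC symbol such as "G06F 16/35": section letter A-H, two-digit class,
# subclass letter, then main group "/" subgroup.
IPC_PATTERN = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")


@dataclass
class PatentRecord:
    """Hypothetical record holding a few of the database fields listed above."""
    title: str
    abstract: str
    assignee: str
    ipc_codes: list = field(default_factory=list)


def ipc_levels(symbol):
    """Split an IPC symbol into (section, class, subclass, group) keys."""
    match = IPC_PATTERN.match(symbol.strip())
    if match is None:
        raise ValueError(f"not an IPC symbol: {symbol!r}")
    section, cls, subclass, main_group, subgroup = match.groups()
    return (
        section,                                              # e.g. "G"
        section + cls,                                        # e.g. "G06"
        section + cls + subclass,                             # e.g. "G06F"
        f"{section}{cls}{subclass} {main_group}/{subgroup}",  # e.g. "G06F 16/35"
    )


def group_by_ipc(records, level):
    """Group patent titles by IPC level (0=section, 1=class, 2=subclass, 3=group)."""
    groups = defaultdict(list)
    for record in records:
        # A patent may carry several IPC codes, so it can land in several groups,
        # but it is listed at most once per group.
        keys = {ipc_levels(code)[level] for code in record.ipc_codes}
        for key in sorted(keys):
            groups[key].append(record.title)
    return dict(groups)


records = [
    PatentRecord("Battery electrode", "...", "X", ["H01M 4/02"]),
    PatentRecord("Document search", "...", "Y", ["G06F 16/35", "G06F 16/31"]),
    PatentRecord("Video codec", "...", "Z", ["H04N 19/00"]),
]
```

Grouping at level 0 puts the battery and video patents together under section H. Grouping at level 2 separates them into H01M and H04N, which mirrors the IPC's move from broad to specific.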
MINING PATENT
A clear and effective IP strategy critically incorporates a clear and effective strategy for managing an organization's patent portfolio [7]. This means analyzing every patent that could directly revolutionize business and technology development practice. Patent mining is a deliberate, core function for any IP-centric business to secure technology development. It also gives administrators a foundation for planning decisions about technology development. Today, patent management applications and robust search engines allow internal IP managers to quickly pull together organized sets of patents. These can come from their own portfolios, from those of specific competitors, and from patents citing relevant technical or industry terms. Companies once interested only in the patents within their own portfolios now also want to know about the patents held by competitors [8].

BASICS OF CLUSTERING
Clustering is a division of data into groups of similar objects. Each group, called a cluster, consists of objects that are similar to one another and dissimilar to objects of other groups [1]. Clustering groups a set of data so as to maximize the similarity within clusters and minimize the similarity between different clusters. The discovered clusters can help explain the characteristics of the underlying data distribution and serve as the foundation for other data mining and analysis techniques [5]. The quality of a clustering method is also measured by its ability to discover some or all of the hidden patterns. The quality of a clustering result also depends on both the similarity measure used by the method and its implementation [3].
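The stated goal of high within-cluster similarity and low between-cluster similarity can be checked numerically. The minimal sketch below assumes points are numeric feature tuples. It also assumes the similarity 1 / (1 + Euclidean distance), which is an illustrative choice and not a measure defined in the paper. The function names are hypothetical.

```python
import math
from itertools import combinations


def similarity(p, q):
    """Illustrative similarity: 1 for identical points, tending to 0 with distance."""
    return 1.0 / (1.0 + math.dist(p, q))


def _mean(values):
    """Average of an iterable; fails loudly if there is nothing to average."""
    values = list(values)
    if not values:
        raise ValueError("no point pairs to average")
    return sum(values) / len(values)


def within_similarity(clusters):
    """Mean similarity over all pairs of points that share a cluster."""
    return _mean(similarity(p, q) for c in clusters for p, q in combinations(c, 2))


def between_similarity(clusters):
    """Mean similarity over all pairs of points in different clusters."""
    return _mean(similarity(p, q)
                 for a, b in combinations(clusters, 2) for p in a for q in b)


good = [[(0, 0), (0, 1)], [(10, 10), (10, 11)]]   # nearby points grouped together
bad = [[(0, 0), (10, 10)], [(0, 1), (10, 11)]]    # nearby points split apart
```

For `good`, within-cluster similarity is well above between-cluster similarity. For `bad`, the relation flips, which signals a poor grouping under the assumed measure.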
CLUSTERING ALGORITHMS
Most existing clustering algorithms find clusters that fit some static model. Although effective in some cases, these algorithms can break down, that is, cluster the data incorrectly. This happens when the user does not select appropriate static-model parameters, or when the model cannot adequately capture the clusters' characteristics. Most of these algorithms break down when the data contains clusters of diverse shapes, densities and sizes [5]. Cluster analysis is the organization of a collection of patterns into clusters based on similarity [4].

LIMITATIONS OF TRADITIONAL CLUSTERING ALGORITHMS
Partition-based clustering techniques such as K-Means and CLARANS attempt to break a data set into K clusters such that the partition optimizes a given criterion. These algorithms assume that clusters are hyper-ellipsoidal and of similar sizes, so they cannot find clusters that vary in size or have concave shapes [9].
DBSCAN (Density-Based Spatial Clustering of Applications with Noise), a well-known spatial clustering algorithm, can find clusters of arbitrary shapes. DBSCAN defines a cluster as a maximal set of density-connected points. Every core point in a cluster must have at least a minimum number of points (MinPts) within a given radius (Eps) [10]. DBSCAN assumes that all points within a genuine cluster can reach one another along a path of density-connected points, while points in different clusters cannot. DBSCAN can find arbitrarily shaped clusters if the cluster density can be determined beforehand and is uniform [10].
Hierarchical clustering algorithms produce a nested sequence of clusters, with a single, all-inclusive cluster at the top and single-point clusters at the bottom. Agglomerative hierarchical algorithms start with each data point as a separate cluster. Each step of the algorithm merges the two most similar clusters, so after each merger the total number of clusters decreases by one. Users can repeat these steps until they obtain the desired number of clusters, or until the distance between the two closest clusters exceeds a certain threshold.
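The Eps/MinPts definition above translates fairly directly into code. The following is a minimal, unoptimized DBSCAN sketch written for illustration rather than taken from [10]. It assumes Euclidean distance and an O(n²) neighbor search. A point's Eps-neighborhood includes the point itself, and the names `region_query` and `dbscan` are ours.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return a cluster label (0, 1, ...) for each point, or NOISE (-1)."""
    labels = [None] * len(points)          # None means "not yet visited"
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE              # may later turn out to be a border point
            continue
        labels[i] = cluster_id             # i is a core point: start a new cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id     # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
        cluster_id += 1
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

With `eps=1.5` and `min_pts=3`, the two dense groups become clusters 0 and 1 and the isolated point at (50, 50) is labeled noise. The example also shows the weakness noted above: a single global Eps suits only data whose clusters share a similar, uniform density.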
A further limitation is that most hierarchical algorithms do not revisit (intermediate) clusters once they are constructed, so those clusters are never improved [1]. In agglomerative hierarchical clustering, provision can be made for relocating objects that may have been 'incorrectly' grouped at an early stage. The result should be examined closely to ensure it makes sense. Different distance metrics for measuring distances between clusters may generate different results. Performing multiple experiments and comparing the results is therefore recommended to support the veracity of the original results [11].
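The merge loop described above, with both stopping rules (a desired number of clusters or a distance threshold), can be sketched as follows. This is a naive illustration assuming Euclidean point distances. The `linkage` argument is where the choice of inter-cluster distance metric enters, and the function names are hypothetical.

```python
import math
from itertools import combinations


def single_link(a, b):
    """Inter-cluster distance: the closest pair of points across two clusters."""
    return min(math.dist(p, q) for p in a for q in b)


def complete_link(a, b):
    """Inter-cluster distance: the farthest pair of points across two clusters."""
    return max(math.dist(p, q) for p in a for q in b)


def agglomerative(points, k=1, max_distance=math.inf, linkage=single_link):
    """Merge the two closest clusters until k remain or the closest pair of
    clusters is farther apart than max_distance."""
    clusters = [[p] for p in points]            # start: one cluster per point
    while len(clusters) > k:
        # Find the pair of clusters with the smallest linkage distance.
        (i, j), d = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > max_distance:                    # threshold stopping rule
            break
        clusters[i] = clusters[i] + clusters[j]  # merge j into i (i < j) ...
        del clusters[j]                          # ... so the count drops by one
    return clusters


pts = [(0, 0), (0, 1), (5, 0), (5, 1), (20, 0)]
```

Once two clusters are merged they are never split again, which is the "no revisiting" limitation mentioned above. Swapping `single_link` for `complete_link` changes which pairs look closest and can change the final grouping.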
The many variations of agglomerative hierarchical algorithms differ primarily in how they update the similarity between existing and merged clusters. In some hierarchical methods, each cluster is represented by a centroid or by a medoid (the data point closest to the center of the cluster). The similarity between two clusters is then measured by the similarity between their centroids or medoids. Both of these schemes fail for data in which points in a given cluster are closer to the center of another cluster than to the center of their own cluster. ROCK, a recently developed algorithm that operates on a derived similarity graph, scales the aggregate interconnectivity with respect to a user-specified interconnectivity model. However, the major limitation of all such schemes is that they assume a static, user-supplied interconnectivity model. Such models are inflexible and can easily lead to incorrect merging decisions when the model under- or overestimates the interconnectivity of the data set. Some schemes allow the connectivity to vary for different problem domains, but within a domain it is still the same for all clusters, irrespective of their densities and shapes [12].
CURE measures the similarity between two clusters by the similarity of the closest pair of points belonging to different clusters. Unlike centroid/medoid-based methods, CURE can find clusters of arbitrary shapes and sizes, because it represents each cluster by multiple representative points. Shrinking the representative points toward the centroid helps CURE avoid some of the problems associated with noise and outliers. However, these techniques fail to account for the special characteristics of individual clusters. They can make incorrect merging decisions when the underlying data does not follow the assumed model or when noise is present.
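CURE's representative points can be illustrated in a few lines. The simplified sketch below omits CURE's sampling, partitioning and heap-based merging. It assumes Euclidean distance and a farthest-point heuristic for picking c well-scattered points. Each chosen point is then moved a fraction alpha of the way toward the centroid. The parameter names `c` and `alpha` and all function names are our illustrative choices.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def representatives(points, c=4, alpha=0.5):
    """Choose up to c well-scattered points, then shrink them toward the centroid."""
    center = centroid(points)
    chosen = []                                   # indices of scattered points
    for _ in range(min(c, len(points))):
        # First pick: the point farthest from the centroid. Later picks: the point
        # whose nearest already-chosen point is farthest away.
        ref = [points[k] for k in chosen] or [center]
        best = max((k for k in range(len(points)) if k not in chosen),
                   key=lambda k: min(math.dist(points[k], r) for r in ref))
        chosen.append(best)
    # Move each scattered point a fraction alpha of the way toward the centroid.
    return [tuple(x + alpha * (m - x) for x, m in zip(points[k], center))
            for k in chosen]


def cure_distance(reps_a, reps_b):
    """Distance between two clusters: the closest pair of their representatives."""
    return min(math.dist(p, q) for p in reps_a for q in reps_b)


square = [(0, 0), (0, 2), (2, 0), (2, 2), (1, 1)]
shifted = [(x + 10, y) for x, y in square]
reps_a = representatives(square, c=4, alpha=0.5)
reps_b = representatives(shifted, c=4, alpha=0.5)
```

With `alpha=0` the scattered points are used as-is. As `alpha` approaches 1, every representative collapses onto the centroid and the scheme behaves like a centroid-based method. Intermediate values damp the influence of outlying points.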
In some algorithms, the similarity between two clusters is captured by the aggregate of the similarities among pairs of items belonging to different clusters [13]. Existing algorithms use a static model of the clusters and do not use information about the nature of individual clusters as they are merged. Furthermore, one set of schemes ignores the aggregate interconnectivity of items in two clusters. The other set ignores the closeness of two clusters, defined by the similarity of the closest items across them. Because they consider only interconnectivity or only closeness, these algorithms can easily select and merge the wrong pair of clusters.

USAGE OF ALGORITHMS
In recent years, the standard approach to document classification has been machine learning, such as support vector machines or Naïve Bayes. However, this approach is not easy to apply to the patent mining task, because the number of classes is large and the calculation cost is therefore high [6]. We therefore propose a new algorithm rather than a machine learning algorithm.

OUR APPROACH
We propose a new dynamic algorithm that uses both interlink (interconnectivity) and nearness (closeness) to identify the most similar pair of clusters. Thus, it does not depend on a static, user-supplied model and can automatically adapt to the internal characteristics of the merged clusters. In the above algorithm, we replaced Chameleon with a suitable k-medoids method, which may give better interlink results than k-means. Various comparisons show that, on the same data, K-Means takes more time on average than K-Medoids. K-Means is efficient for smaller data sets, while K-Medoids seems to perform better for large data sets [14].
For interlinks of patents:
1. Collect the patent data related to a particular field or to all fields, then randomly choose k objects from the data set to be the cluster medoids at the initial state.
2. For each pair of non-selected object h and selected object i, calculate the total swapping cost Tih.
3. For each pair of i and h, if Tih < 0, replace i by h. Then assign each non-selected object to the most similar representative object.
4. Repeat steps 2 and 3 until no change happens.
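Steps 1 to 4 above follow the PAM (Partitioning Around Medoids) form of k-medoids. A minimal sketch follows, assuming Euclidean distance over numeric feature vectors. It also assumes Tih is defined as the total cost after swapping medoid i for object h, minus the cost before the swap. As step 3 describes, any swap with Tih < 0 is accepted as soon as it is found. Classic PAM instead picks the single best swap in each pass. The function names and the `seed` parameter are hypothetical.

```python
import math
import random


def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def k_medoids(points, k, seed=0):
    """PAM-style k-medoids. Returns (medoid indices, cluster label per point)."""
    rng = random.Random(seed)
    # Step 1: randomly choose k objects as the initial medoids.
    medoids = rng.sample(range(len(points)), k)
    cost = total_cost(points, medoids)
    changed = True
    while changed:                        # Step 4: repeat until no change happens.
        changed = False
        for pos in range(k):              # i = medoids[pos] is a selected object
            for h in range(len(points)):
                if h in medoids:
                    continue              # h must be a non-selected object
                candidate = medoids.copy()
                candidate[pos] = h
                new_cost = total_cost(points, candidate)
                t_ih = new_cost - cost    # Step 2: total swapping cost T_ih.
                if t_ih < 0:              # Step 3: if T_ih < 0, replace i by h.
                    medoids, cost, changed = candidate, new_cost, True
    # Assign each object to its most similar (nearest) medoid.
    labels = [min(range(k), key=lambda c: math.dist(p, points[medoids[c]]))
              for p in points]
    return medoids, labels


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, labels = k_medoids(data, k=2)
```

Each accepted swap strictly lowers the total cost, so the loop terminates. Because medoids are actual data objects, the method only needs pairwise distances, which makes it usable with text similarities as well as coordinates.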
The absolute nearness of two clusters is normalized by the internal nearness of the clusters. Using this nearness measure, the algorithm finds the genuine clusters by repeatedly combining these sub-clusters.

CONCLUSION
The methodology of dynamic modeling of clusters in agglomerative hierarchical methods applies to all types of data, provided a similarity measure is available. Although we chose to model the data using k-medoids in this paper, other algorithms suited to patent mining domains could equally be used. Our future research includes a practical implementation of this algorithm to obtain better results in patent mining.

REFERENCES
[1] Pavel Berkhin, “Survey of Clustering Data Mining Techniques”, Accrue Software, Inc. http://www.ee.ucr.edu/~barth/EE242/clustering_survey.pdf
[2] http://www.wipo.int/classifications/ipc/en/
[3] Dr. Osmar R. Zaïane, “Principles of Knowledge Discovery in Databases”, University of Alberta, CMPUT690.
[4] Cheng-Fa Tsai, Han-Chang Wu, Chun-Wei Tsai, “A New Data Clustering Approach for Data Mining in Large Database”, International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN'02).
[5] George Karypis, Eui-Hong (Sam) Han, Vipin Kumar, “Chameleon: Hierarchical Clustering Using Dynamic Modeling”. http://www-leibniz.imag.fr/Apprentissage/Depot/Selection/karypis99.pdf
[6] Hidetsugu Nanba, “Hiroshima City University at NTCIR-7 Patent Mining Task”, Proceedings of NTCIR-7 Workshop Meeting, December 16–19, 2008, Tokyo, Japan.
[7] Bob Stembridge, Breda Corish, “Patent data mining and effective patent portfolio management”, Intellectual Asset Management, October/November 2004.
[8] Edward Khan, “Patent mining in a changing world of technology and product development”, Intellectual Asset Management, July/August 2003.
[9] Raymond T. Ng, Jiawei Han, “Efficient and Effective Clustering Methods for Spatial Data Mining”, Proceedings of the 20th VLDB Conference, Santiago, Chile, 1994.
[10] Martin Ester, Hans-Peter Kriegel, Jorg Sander, Xiaowei Xu, “A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise”, Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96).
[11] http://www.improvedoutcomes.com/docs/WebSiteDocs/Clustering/Agglomerative_Hierarchical_Clustering_Overview.htm
[12] S. Guha, R. Rastogi, and K. Shim, “ROCK: A Robust Clustering Algorithm for Categorical Attributes,” Proc. 15th Int'l Conf. Data Eng., IEEE CS Press, Los Alamitos, Calif., 1999, pp. 512-521.
[13] S. Guha, R. Rastogi, and K. Shim, “CURE: An Efficient Clustering Algorithm for Large Databases,” Proc. ACM SIGMOD Int'l Conf. Management of Data, ACM Press, New York, 1998, pp. 73-84.
[14] T. Velmurugan and T. Santhanam, “Computational Complexity between K-Means and K-Medoids Clustering Algorithms for Normal and Uniform Distributions of Data Points”, Journal of Computer Science 6 (3): 363-368, 2010, ISSN 1549-3636, Science Publications.