Similarity and Distance Measures for Hierarchical Taxonomies

Overview of Robert C. McNamee's paper "Can't See the Forest for the Leaves: Similarity and Distance Measures for Hierarchical Taxonomies with a Patent Classification Example"



  1. 2. Introduction
       • The world around us is filled with complex phenomena, which we describe with hierarchical categorization systems (taxonomies).
       • Researchers often conceptualize phenomena in their full complexity but underutilize the taxonomies that describe them.
       • Why?
           ◦ Under-appreciation of taxonomical data
           ◦ De facto acceptance of methods that underutilize taxonomies
       • So what? Researchers may:
           ◦ Risk drawing false conclusions
           ◦ Avoid questions that cannot be analyzed with current methods
           ◦ Under-develop new taxonomies
           ◦ Eventually, conceive of underlying phenomena in less complex ways
  2. 3. Linnaean Taxonomy Example
  3. 4. Phenomenon: Technology Space
       • All Inclusive:
           ◦ “Technology Space is the universal set of all possible technological ideas” (Olsson & Frey, 2002, p. 71)
       • Fully-Connected & Continuous:
           ◦ The “technology set is always coherent and is not made up of islands in idea space.” (Olsson & Frey, 2002, p. 72)
       • Distance
           ◦ Any two technologies within the totality of technology space can be considered to be somewhat similar or distant from one another.
           ◦ “For instance the two ideas ‘steel’ and ‘the Bessemer process’ are more closely related than the ideas ‘the Bessemer process’ and ‘the spinning wheel’” (Olsson & Frey, 2002, p. 71)
  4. 5. Patent Classification Taxonomy
       • http://uspto.gov/go/classification/selectnumwithtitle.htm
       • http://www.uspto.gov/web/offices/opc/documents/classescombined.pdf
  5. 6. USCL Hierarchy (Class and above)
  6. 7. USCL Class 704 (Subclass Level)
  7. 8. Traditional Distance Measures
       • Jaffe’s Distance Measure (1986)
           ◦ Used at various levels to combine the patents associated with a country, firm, subsidiary, or inventor
           ◦ A specific level of technological aggregation is chosen, such as Jaffe’s category, Jaffe’s subcategory, USPTO class, USPTO subclass, IPC 1 digit, IPC 3 digit, IPC 4 digit, etc.
           ◦ The uncentered correlation (cosine similarity) is calculated and subtracted from 1 to create a distance measure.
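The calculation described on this slide (uncentered correlation, subtracted from 1) can be sketched as follows. This is a minimal illustration, not the paper's code: the function name `cosine_distance` and the two firms' patent counts are hypothetical.

```python
import math
from collections import Counter


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two classification count vectors."""
    # Only classifications present in both vectors contribute to the dot product.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("each entity needs at least one classification")
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical patent counts per USPTO class for two firms (illustrative only).
firm_a = Counter({"704": 3, "706": 1})
firm_b = Counter({"704": 1, "382": 2})

print(round(cosine_distance(firm_a, firm_b), 4))
```

Two entities with no classification in common at the chosen level get a distance of exactly 1, regardless of how close their classifications sit in the hierarchy.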
  8. 9. Limitations of Traditional Measures
       • Independence
           ◦ Assume that groupings at the chosen level of technological aggregation are unrelated / independent
       • Trade-off: Go higher in the hierarchy
           ◦ Over-emphasize the similarities while ignoring differences at lower levels
       • [Figure: IPC codes A21B, A21C, B60F, and A01M compared with = and ≠ signs]
       • Within Field / Industry
           ◦ Within field / industry measures are largely homogeneous at higher levels of technological aggregation.
       • Patent Level
           ◦ Patent level measures reduce to a 0-1 dummy.
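A small sketch of the independence and patent-level limitations, using the IPC codes that appear on this slide. It assumes each patent has a single primary classification and that the aggregation level is chosen by prefix length (1 = section, 3 = class, 4 = subclass); the helper name `primary_class_similarity` is hypothetical.

```python
def primary_class_similarity(code_a, code_b, level):
    """Traditional patent-level similarity: 1 if the codes match at the chosen level, else 0."""
    return 1 if code_a[:level] == code_b[:level] else 0


# At the 4-digit (subclass) level, a sibling subclass looks as distant as an unrelated one.
print(primary_class_similarity("A21B", "A21C", 4))  # sibling subclasses under A21
print(primary_class_similarity("A21B", "B60F", 4))  # different section entirely

# Moving up the hierarchy erases real differences instead.
print(primary_class_similarity("A21B", "A21C", 3))  # same class A21
print(primary_class_similarity("A21B", "A01M", 1))  # same section A
```

With one classification per patent the cosine similarity of two one-hot vectors can only be 0 or 1, which is the "0-1 dummy" the slide refers to.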
  9. 10. Taxonomically Appropriate Measure
       • Just two modifications/extensions necessary:
           ◦ Use “Class and Subclass Array Expansion” (to include all super-ordinate classifications implicitly included with each classification)
           ◦ Use IDF weighting of each classification (to take into account the actual distribution of invention across technological space)
       • Terms of the measure (the formula and its symbols did not survive extraction):
           ◦ Number of times classification i is assigned to entity A
           ◦ Number of times classification i is assigned to entity B
           ◦ Frequency of patents classified within the subtree subsumed by the parent of classification i
           ◦ Frequency of patents classified within the subtree subsumed by classification i
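The slide's formula is not available in this transcript, so the following is only one plausible form that uses the four quantities the slide defines. All notation here is assumed rather than taken from the paper: a_i and b_i are the counts of classification i for entities A and B after array expansion, f_i and f_{p(i)} are the patent frequencies in the subtrees under classification i and under its parent, and w_i is an IDF-style weight. Whether the paper applies the weight in exactly this way is not confirmed by the source.

```latex
% Assumed notation, not the paper's: a_i, b_i, f_i, f_{p(i)}, w_i
w_i = -\log\frac{f_i}{f_{p(i)}}, \qquad
\mathrm{sim}(A,B) =
  \frac{\sum_i w_i\, a_i\, b_i}
       {\sqrt{\sum_i w_i\, a_i^{2}}\;\sqrt{\sum_i w_i\, b_i^{2}}}, \qquad
d(A,B) = 1 - \mathrm{sim}(A,B)
```

Under this reading, a classification that holds nearly all of its parent's patents gets a weight near zero, while a rare branch contributes strongly when two entities share it.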
  10. 11. Class & Subclass Array Expansion
       • “Subclasses inherit all the properties of their parent Subclass. This means that every Subclass title is interpreted to include the title of its parent Subclass; its definition is interpreted to include the definition of its parent Subclass; etc” (USPTO Overview of the Classification System, 2007, p. 9)

       Dimension / Level | Classification (Dimension Name) | Description
       1 | G1-02 | COMMUNICATIONS, RADIANT ENERGY, WEAPONS, ELECTRICAL, AND COMPUTER ARTS
       2 | G1-02/G2-05 | … / CALCULATORS, COMPUTERS, OR DATA PROCESSING SYSTEMS
       3 | G1-02/G2-05/704 | … / DATA PROCESSING: SPEECH SIGNAL PROCESSING, LINGUISTICS, LANGUAGE TRANSLATION, AND AUDIO COMPRESSION-DECOMPRESSION
       4 | G1-02/G2-05/704/200 | … / SPEECH SIGNAL PROCESSING
       5 | G1-02/G2-05/704/200/231 | … / Recognition
       6 | G1-02/G2-05/704/200/231/232 | … / Neural network
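The expansion on this slide can be sketched as a walk up a parent map. The parent links below follow the slide's own example path; the `class/subclass` identifier format (for example `704/232`), the `PARENT` dictionary, and the function names are assumptions made for illustration, and a real run would load the full classification schedule instead.

```python
from collections import Counter

# Parent links copied from the slide's example path (a tiny excerpt, not the full schedule).
PARENT = {
    "704/232": "704/231",  # Neural network -> Recognition
    "704/231": "704/200",  # Recognition -> Speech signal processing
    "704/200": "704",
    "704": "G2-05",
    "G2-05": "G1-02",
    "G1-02": None,         # top of the hierarchy
}


def expand(classification, parent=PARENT):
    """Return the classification plus all super-ordinate classifications, root first."""
    path = []
    node = classification
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def expand_counts(classifications, parent=PARENT):
    """Count every classification and ancestor implied by a list of assigned classifications."""
    counts = Counter()
    for c in classifications:
        counts.update(expand(c, parent))
    return counts


print(expand("704/232"))
print(expand_counts(["704/232", "704/231"]))
```

After expansion, two patents classified in sibling subclasses share every ancestor dimension, so their vectors overlap even though their leaf classifications differ.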
  11. 12. IDF Weighting
       • Distribution of Technology Space
           ◦ “In information theory (Cover & Thomas, 1991) the information contained in a statement is measured by the negative logarithm of the probability of that statement” (Lin 1998, p. 297)
       • Intuitive logic:
           ◦ Frequent concepts or dimensions are less informative than rare ones (Elkan 2005; Robertson 2004; Aizawa 2003)
           ◦ Also known as Inverse Document Frequency (IDF) weighting
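The quoted definition (information as the negative logarithm of a probability) can be illustrated with a few lines of Python. The patent counts are invented for the example, the natural logarithm is an assumption since the slide does not state a base, and `information_content` is a hypothetical helper name.

```python
import math


def information_content(count, total):
    """Negative log of the probability that a patent carries a given classification."""
    if not 0 < count <= total:
        raise ValueError("count must be positive and no larger than total")
    return -math.log(count / total)


# Hypothetical frequencies: a very common classification versus a rare one.
total_patents = 10_000
frequent = information_content(5_000, total_patents)
rare = information_content(10, total_patents)

print(round(frequent, 3), round(rare, 3))
```

The rare classification receives the larger weight, matching the slide's intuition that frequent dimensions are less informative than rare ones.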
  12. 13. Patent Example [figure]
  13. 14. Dataset #1: Traditional Methods
       • Class 704: 8,713 patents, 1976 – 2008, up to 17 classes per patent
       • 48,150 patent-patent citation dyads (within technology field flows)
       • [Figure: histograms of similarity frequency within samples; panels for Primary Only and All Classifications at the Class Level and the Class-Subclass Level; leftmost bin is similarity = 0, rightmost bin is similarity = 1]
  14. 15. Dataset #1: Taxonomical Method
       • [Figure panels: Primary Only, All Classifications]
  15. 16. Dataset #1: Traditional vs. Taxonomical
       • [Figure panels: Subclass Level, Class Level]
  16. 17. Dataset #2: Traditional Methods
       • NBER Patent Dataset (Patents 1980 – 2000)
       • “Organizations” created based on Assignee ID
       • Random 1% sample selected
       • 718 Organizations with 12,993 patents (primary classifications only)
       • All pairwise comparisons (257,403 unique dyads)
       • [Figure panels: Class Level, Jaffe Subcategory Level, Jaffe Category Level]
  17. 18. Dataset #2: Taxonomical Method vs. Class vs. Subcategory vs. Category
       • 135,000+ Class Level
       • 100,000+ Subcategory Level
       • 40,000+ Category Level
       • Significant Effects:
       • Traditional = zero similarity
       • Taxonomical = non-zero similarity
  18. 19. Conclusions
       • Actual utility when applied to specific research questions.
       • Some evidence of the worth of these methods:
           ◦ Necessary: tests questions where traditional methods are meaningless
           ◦ Flexible: applies to all levels of analysis
           ◦ Reasonable: evidence of the relationship to traditional methods
           ◦ Valuable: greater variation and continuity; more meaningful with respect to the underlying theoretical phenomenon
