Text Matching to Measure Patent Similarity
Sam Arts
Faculty of Business and Economics
KU Leuven
sam.arts@kuleuven.be
Bruno Cassiman
IESE Business School, KU Leuven
bcassiman@iese.edu
Juan Carlos Gomez
University of Guanajuato
jc.gomez@ugto.mx
OECD Blue Sky Conference 2016
The United States Patent Classification System (USPC)
• Prior and current research relies on the US patent classification (USPC)
– To identify similar patents (counterfactual controls)
– e.g., Jaffe, Trajtenberg, and Henderson, 1993; Almeida, 1996; Agrawal, Cockburn, and Rosell, 2010
– To measure similarity between patents and between patent portfolios
– e.g., Argyres, 1996; Ahuja, 2000; Rosenkopf and Almeida, 2003; Makri, Hitt, and Lane, 2010
• Problems with USPC
– Too broad
– Changes over time (patents are reclassified)
– Manually assigned
– e.g., Thompson and Fox-Kean, 2005; Belenzon and Schankerman, 2013; …
The United States Patent Classification System (USPC)
• Unclear what the bias is
– Type I: false positive (dissimilar patents, same USPC)
– Type II: false negative (similar patents, different USPC)
• No alternatives to USPC, only variations of it
– Using subclasses instead of classes
– e.g., Thompson and Fox-Kean, 2005
– Using all classes instead of only the primary class
– e.g., Benner and Waldfogel, 2008
• Unclear how these variations affect Type I or Type II bias
Text-based measure of similarity
• Titles and abstracts of all US utility patents granted between 1976 and 2013 (4.4 million)
• Concatenate title and abstract, convert to lowercase, and remove stop words (SMART list, >600 words), words of fewer than 2 characters, numbers, and words that appear only once
• Represent each patent as a collection of unique keywords
• 526,561 distinct keywords; on average 37 per patent
• Drop patents with fewer than 10 keywords (0.3% of sample)
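The preprocessing steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the authors' Java implementation: the stop-word set is a tiny placeholder for the SMART list, the tokenizer regex is an assumption, and "words that appear only once" is read here as words occurring in only one patent of the corpus.

```python
import re
from collections import Counter

# Placeholder stop words; the slides use the SMART list (>600 words), not reproduced here.
STOP_WORDS = {"the", "an", "and", "or", "of", "to", "for", "in", "is", "by",
              "with", "may", "be", "which", "can", "as", "on", "from", "so"}

def tokenize(title, abstract):
    """Concatenate title and abstract, lowercase, and split into candidate words."""
    text = f"{title} {abstract}".lower()
    # Letters/digits, allowing internal hyphens such as "non-complementary" (assumption).
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text)

def keep(word):
    """Word-level filters: at least 2 characters, not a stop word, not a number."""
    return (len(word) >= 2
            and word not in STOP_WORDS
            and not word.replace("-", "").isdigit())

def keyword_sets(patents, min_keywords=10):
    """patents: dict id -> (title, abstract). Returns dict id -> set of unique keywords."""
    candidates = {pid: {w for w in tokenize(t, a) if keep(w)}
                  for pid, (t, a) in patents.items()}
    # Assumption: "appears only once" = appears in a single patent across the corpus.
    doc_freq = Counter(w for words in candidates.values() for w in words)
    result = {}
    for pid, words in candidates.items():
        kept = {w for w in words if doc_freq[w] > 1}
        if len(kept) >= min_keywords:  # drop patents with too few keywords
            result[pid] = kept
    return result
```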
Text matching (instead of USPC)
• Simple Jaccard index
– Ranges from 0 to 1
• For each of the 4.4 million patents, select the closest text-matched patent within the same year (cf. Jaffe, Trajtenberg, and Henderson, 1993)
– Minimum Jaccard of 0.05 (0.5% of patents dropped)
– More patents are dropped when matching on USPC
• Average Jaccard 0.24
– About 14 common keywords for 2 patents with 37 keywords each
• As a baseline, select a distant text-matched patent within the same year (Jaccard = 0, closest filing date)
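The Jaccard index of two keyword sets A and B is |A ∩ B| / |A ∪ B|. For two patents with 37 keywords each that share 14, this gives 14 / (37 + 37 - 14) = 14/60 ≈ 0.23, in line with the rounded average above. The matching rules might be sketched as follows. The `Patent` record is hypothetical, the single `year` field is an assumption (the slides do not say whether it is the grant or the filing year), and the brute-force scan over candidates is for illustration only; a corpus of 4.4 million patents would need an inverted index or similar.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Patent:
    pid: str
    year: int              # matching year (grant vs. filing year is an assumption)
    filing_date: date
    keywords: frozenset    # unique keywords from title + abstract

def jaccard(a, b):
    """|A & B| / |A | B|, in [0, 1]; 0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def same_year_pool(focal, candidates):
    return [p for p in candidates if p.pid != focal.pid and p.year == focal.year]

def closest_match(focal, candidates, min_jaccard=0.05):
    """Most text-similar patent in the same year, or None if below the minimum Jaccard."""
    pool = same_year_pool(focal, candidates)
    if not pool:
        return None
    best = max(pool, key=lambda p: jaccard(focal.keywords, p.keywords))
    return best if jaccard(focal.keywords, best.keywords) >= min_jaccard else None

def baseline_match(focal, candidates):
    """Distant match: same year, Jaccard = 0, closest filing date."""
    pool = [p for p in same_year_pool(focal, candidates)
            if jaccard(focal.keywords, p.keywords) == 0.0]
    return min(pool, key=lambda p: abs((p.filing_date - focal.filing_date).days),
               default=None)
```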
Validation: closest text-matched patents in the same year
Patent pairs with a higher Jaccard index are more likely to belong to the same patent family (DOCDB), to share inventors, to share assignees, and to cite each other
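One way to tabulate this validation is to group text-matched pairs by Jaccard index and compute, within each group, the share of pairs that are related (same DOCDB family, shared inventor or assignee, or a citation link). The sketch below assumes the relatedness flag has already been computed for each pair and uses hypothetical bin edges of width 0.25; it is not necessarily the authors' exact procedure.

```python
from collections import defaultdict

def share_related_by_bin(pairs, edges=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """pairs: iterable of (jaccard, related) with related a bool.
    Returns {lower bin edge: share of related pairs}; Jaccard = 1.0 falls in the top bin."""
    counts = defaultdict(lambda: [0, 0])  # lower edge -> [related, total]
    for j, related in pairs:
        # Largest lower edge not exceeding j (the final edge only closes the top bin).
        lower = max(e for e in edges[:-1] if e <= j)
        counts[lower][0] += int(related)
        counts[lower][1] += 1
    return {lo: rel / tot for lo, (rel, tot) in sorted(counts.items())}
```

If the text-based measure is valid, the share of related pairs should rise from the lowest bin to the highest.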
Validation: expert assessment
• 5 independent R&D scientists
– Semiconductor devices, chemical engineering, power plants, genetics, and optical inspection systems
• For each expert
– Randomly select 10 baseline patents
– For each baseline patent, select one random patent with a Jaccard index in each of the ranges
– 0.00
– 0.05-0.25
– 0.25-0.50
– 0.50-0.75
– 0.75 and above
– Randomize the order and ask the expert to rate similarity on a scale of 1-7
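The sampling design for the expert survey can be sketched as below. The boundary handling (inclusive lower bound, exclusive upper bound, exactly 0 for the first range, 1.0 included in the last) and the `candidates` data layout are assumptions; the slides only list the ranges.

```python
import random

# Jaccard ranges from the slide; boundary handling is an assumption.
BINS = [(0.0, 0.0), (0.05, 0.25), (0.25, 0.50), (0.50, 0.75), (0.75, 1.0)]

def in_bin(j, lo, hi):
    if lo == hi:             # the "0.00" range: no shared keywords at all
        return j == lo
    if hi == 1.0:            # "0.75 and above" includes identical keyword sets
        return lo <= j <= hi
    return lo <= j < hi

def build_questionnaire(patent_ids, candidates, n_baseline=10, seed=0):
    """candidates: baseline id -> list of (other patent id, Jaccard with the baseline).
    Picks n_baseline random baselines, one random partner per Jaccard range,
    and shuffles the resulting pairs before they are rated on a 1-7 scale."""
    rng = random.Random(seed)
    pairs = []
    for b in rng.sample(patent_ids, n_baseline):
        for lo, hi in BINS:
            options = [c for c, j in candidates[b] if in_bin(j, lo, hi)]
            if options:
                pairs.append((b, rng.choice(options)))
    rng.shuffle(pairs)
    return pairs
```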
Estimate bias related to USPC
• For each of the 4.4 million patents, select three USPC-matched patents (one per matching method)
• Three common ways of matching, on approximate filing date and …
– Primary class
– e.g., Jaffe et al., 1993
– No match for 2% of patents
– Primary class and subclass (nested)
– e.g., Almeida, 1996
– No match for 20% of patents
– All classes and subclasses
– Jaccard overlap in subclasses
– e.g., Agrawal et al., 2010
– No match for 4% of patents
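The three USPC matching rules can be sketched as follows. This is an illustration under stated assumptions: the `UspcPatent` record and its subclass code format (e.g. "435/6") are hypothetical, "approximate filing date" is modeled as choosing the candidate with the closest filing date, and for the all-classes rule the candidate with the highest Jaccard overlap in subclasses wins, with filing-date distance as a tie-breaker.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class UspcPatent:
    pid: str
    filing_date: date
    primary_class: str          # e.g. "435" (hypothetical encoding)
    primary_subclass: str       # e.g. "435/6"
    all_subclasses: frozenset   # every class/subclass code assigned

def _days(a, b):
    return abs((a.filing_date - b.filing_date).days)

def match_primary_class(focal, pool):
    """Same primary class, closest filing date."""
    same = [p for p in pool if p.pid != focal.pid and p.primary_class == focal.primary_class]
    return min(same, key=lambda p: _days(focal, p), default=None)

def match_primary_subclass(focal, pool):
    """Same primary class and primary subclass (nested), closest filing date."""
    same = [p for p in pool if p.pid != focal.pid
            and p.primary_class == focal.primary_class
            and p.primary_subclass == focal.primary_subclass]
    return min(same, key=lambda p: _days(focal, p), default=None)

def match_all_subclasses(focal, pool):
    """Highest Jaccard overlap across all assigned subclasses."""
    def overlap(p):
        union = focal.all_subclasses | p.all_subclasses
        return len(focal.all_subclasses & p.all_subclasses) / len(union) if union else 0.0
    others = [p for p in pool if p.pid != focal.pid and overlap(p) > 0]
    return min(others, key=lambda p: (-overlap(p), _days(focal, p)), default=None)
```

A `None` result corresponds to a patent with no match under that rule.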
Type I error – false positive matches
• Dissimilar patents, same USPC
• Low text similarity (Jaccard) of USPC-matched pairs
– Primary class: 0.054
– Primary class and subclass (nested): 0.092
– All classes and subclasses: 0.097
• Lower bound: % of USPC-matched pairs with Jaccard = 0
– Primary class: 12%
– Primary class and subclass (nested): 4.3%
– All classes and subclasses: 4.0%
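Both Type I statistics follow from the keyword sets of the USPC-matched pairs. A pair with Jaccard = 0 shares no keyword at all and is almost certainly a false positive; pairs with small but nonzero overlap may also be false positives, which is why the share with Jaccard = 0 is only a lower bound. A minimal sketch, assuming pairs are given as two keyword sets and reading the "low similarity" figures as mean Jaccard (an assumption):

```python
from statistics import mean

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def type_one_summary(matched_pairs):
    """matched_pairs: list of (keywords_focal, keywords_match) for USPC-matched pairs.
    Returns (mean Jaccard, share of pairs with Jaccard == 0)."""
    scores = [jaccard(a, b) for a, b in matched_pairs]
    zero_share = sum(s == 0.0 for s in scores) / len(scores)
    return mean(scores), zero_share
```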
Type II error – false negative matches
• Similar patents, different USPC
• Lower bound: % of pairs with a Jaccard index of 1 that have different USPC
– Primary class: 22.4%
– Primary class and subclass (nested): 52.3%
– All classes and subclasses: 20.0%
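The Type II lower bound reverses the logic: start from pairs whose keyword sets are identical (Jaccard = 1), which are clearly similar, and count how often a USPC rule places them in different classes. In this sketch, `same_uspc` is a hypothetical predicate standing in for whichever of the three matching rules is evaluated.

```python
def type_two_lower_bound(text_pairs, same_uspc):
    """text_pairs: iterable of (pid_a, pid_b, jaccard).
    same_uspc(pid_a, pid_b) -> True if the pair shares USPC under the chosen rule.
    Returns the share of Jaccard == 1 pairs with different USPC, or None if there are none."""
    identical = [(a, b) for a, b, j in text_pairs if j == 1.0]
    if not identical:
        return None
    return sum(not same_uspc(a, b) for a, b in identical) / len(identical)
```

For the primary-class rule, `same_uspc` could be `lambda a, b: primary_class[a] == primary_class[b]`, where `primary_class` is a hypothetical lookup table.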
Validation: superiority of text matching over USPC
Text-matched patents are more likely than USPC-matched patents to belong to the same patent family (DOCDB), to share inventors, to share assignees, and to cite each other
Conclusions
• Text mining
– To measure patent similarity and select counterfactual control patents
– Outperforms USPC: fine-grained, does not rely on human classification, no changes over time
– To measure similarity between portfolios by aggregating keywords at the portfolio level
• Bias related to USPC
– Matching on primary subclass instead of class reduces Type I but increases Type II errors
– Matching on all subclasses instead of only the primary one reduces both Type I and Type II errors
– An unexpectedly large share of Type I and particularly Type II errors remains
• Code and data publicly available
– Java standard libraries; CSV files with cleaned keywords and the 200 closest matches
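Portfolio similarity obtained by aggregating keywords at the portfolio level might look like this in its simplest form. Taking the union of each portfolio's keyword sets and then the Jaccard index is an assumption; the slides do not specify the aggregation, and the function names are hypothetical.

```python
def portfolio_keywords(patent_keywords, portfolio):
    """Union of the keyword sets of all patents in a portfolio (assumed aggregation)."""
    aggregated = set()
    for pid in portfolio:
        aggregated |= patent_keywords[pid]
    return aggregated

def portfolio_similarity(patent_keywords, portfolio_a, portfolio_b):
    """Jaccard index between two aggregated portfolio keyword sets."""
    a = portfolio_keywords(patent_keywords, portfolio_a)
    b = portfolio_keywords(patent_keywords, portfolio_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```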
Text mining
• Develop a new text-based measure of patent similarity
• Validate the new measure
– Same patent family, assignee, or inventors; citing each other
– Expert assessments
• Estimate bias related to USPC
• Validate superiority over USPC
– Same patent family, assignee, or inventors; citing each other
– Expert assessments
Text-based measure of similarity
• Title + abstract: Process for amplifying, detecting, and/or-cloning nucleic acid
sequences, The present invention is directed to a process for amplifying and
detecting any target nucleic acid sequence contained in a nucleic acid or mixture
thereof. The process comprises treating separate complementary strands of the
nucleic acid with a molar excess of two oligonucleotide primers, extending the
primers to form complementary primer extension products which act as
templates for synthesizing the desired nucleic acid sequence, and detecting the
sequence so amplified. The steps of the reaction may be carried out stepwise or
simultaneously and can be repeated as often as desired. In addition, a specific
nucleic acid sequence may be cloned into a vector by using primers to amplify
the sequence, which contain restriction sites on their non-complementary ends,
and a nucleic acid fragment may be prepared from an existing shorter fragment
using the amplification process.
• 52 unique keywords: acid act addition amplification amplified amplify
amplifying carried cloned complementary comprises contained desired
detecting directed ends excess existing extending extension form fragment
invention mixture molar non-complementary nucleic oligonucleotide prepared
present primer primers process products reaction repeated restriction separate
sequence sequencesthe shorter simultaneously sites specific steps stepwise
strands synthesizing target templates treating vector

 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 

Arts - Text matching to measure patent similarity

  • 1. Text Matching to Measure Patent Similarity
    Sam Arts, Faculty of Business and Economics, KU Leuven (sam.arts@kuleuven.be)
    Bruno Cassiman, IESE Business School, KU Leuven (bcassiman@iese.edu)
    Juan Carlos Gomez, University of Guanajuato (jc.gomez@ugto.mx)
    OECD Blue Sky Conference 2016
  • 2. The United States Patent Classification System (USPC)
    • Prior and current research relies on patent classification (USPC)
      – To identify similar patents (counterfactual control), e.g., Jaffe, Trajtenberg, and Henderson, 1993; Almeida, 1996; Agrawal, Cockburn, and Rosell, 2010
      – To measure similarity between patents and patent portfolios, e.g., Argyres, 1996; Ahuja, 2000; Rosenkopf and Almeida, 2003; Makri, Hitt, and Lane, 2010
    • Limitations of USPC
      – Too broad
      – Changes over time (patents are reclassified)
      – Manually assigned
      – e.g., Thompson and Fox-Kean, 2005; Belenzon and Schankerman, 2013; …
  • 3. The United States Patent Classification System (USPC)
    • Unclear what the resulting bias is
      – Type I: false positives (dissimilar patents, same USPC)
      – Type II: false negatives (similar patents, different USPC)
    • No alternatives to USPC, only variations on it
      – Using subclasses instead of classes, e.g., Thompson and Fox-Kean, 2005
      – Using all classes instead of only the primary class, e.g., Benner and Waldfogel, 2008
    • Unclear how these variations affect Type I and Type II bias
  • 4. Text-based measure of similarity
    • Titles and abstracts of all US utility patents granted between 1976 and 2013 (4.4 million)
    • Concatenate title and abstract, convert to lowercase, and remove stop words (SMART system list, >600 words), words with fewer than 2 characters, numbers, and words that appear only once
    • Each patent is represented as a set of unique keywords
    • 526,561 distinct keywords; on average 37 per patent
    • Drop patents with fewer than 10 keywords (0.3% of the sample)
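The preprocessing steps on slide 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' Java code: the stop-word list is a tiny hypothetical stand-in for the SMART list, the tokenizer regex is assumed, and "appear only once" is read here as appearing once in the whole corpus.

```python
import re
from collections import Counter

# Hypothetical stand-in for the SMART stop-word list (>600 words) on the slide.
STOP_WORDS = frozenset({
    "a", "an", "and", "are", "as", "be", "by", "can", "for", "from", "in",
    "is", "it", "its", "may", "of", "on", "or", "the", "to", "which", "with",
})

# Assumed tokenizer: runs of letters/digits, optionally joined by hyphens,
# so that "non-complementary" stays a single token.
TOKEN_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def candidate_words(title: str, abstract: str) -> list[str]:
    """Concatenate title and abstract, lowercase, then drop stop words,
    words with fewer than 2 characters, and pure numbers."""
    text = f"{title} {abstract}".lower()
    return [
        w for w in TOKEN_RE.findall(text)
        if len(w) >= 2
        and w not in STOP_WORDS
        and not w.replace("-", "").isdigit()
    ]


def build_keyword_sets(
    patents: dict[str, tuple[str, str]], min_keywords: int = 10
) -> dict[str, frozenset[str]]:
    """Map patent id -> set of unique keywords, removing words that occur
    only once in the corpus (assumed reading) and patents left with fewer
    than `min_keywords` keywords."""
    words = {pid: candidate_words(t, a) for pid, (t, a) in patents.items()}
    corpus_counts = Counter(w for ws in words.values() for w in ws)
    keyword_sets = {}
    for pid, ws in words.items():
        keywords = frozenset(w for w in ws if corpus_counts[w] > 1)
        if len(keywords) >= min_keywords:
            keyword_sets[pid] = keywords
    return keyword_sets


# Toy usage with two made-up records; min_keywords is lowered for the demo.
sample = {
    "p1": ("Nucleic acid amplification",
           "A process for amplifying a nucleic acid sequence with 2 primers."),
    "p2": ("Primer extension process",
           "Extending primers to amplify a target nucleic acid sequence."),
}
keywords = build_keyword_sets(sample, min_keywords=3)
```

With the slide's threshold of 10 keywords, both toy records would be dropped; the lower threshold only keeps the two-record demo non-empty.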
  • 5. Text matching (instead of USPC)
    • Simple Jaccard index: number of shared keywords divided by the number of distinct keywords in the two patents combined (range 0 to 1)
    • For each of the 4.4 million patents, select the closest text-matched patent within the same year (cf. Jaffe, Trajtenberg, and Henderson, 1993)
      – Minimum Jaccard of 0.05 (0.5% of patents dropped)
      – More patents are dropped when matching on USPC!
    • Average Jaccard of 0.24
      – Roughly 14 common keywords for two patents with 37 keywords each
    • As a baseline, select a distant text-matched patent within the same year (Jaccard = 0, closest filing date)
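A brute-force Python sketch of the Jaccard index and the two matching rules on slide 5: the closest text match in the same year with a 0.05 floor, and a Jaccard = 0 baseline with the closest filing date. The `Patent` record and its fields are hypothetical; the slides do not say whether "year" means grant or filing year, and an all-pairs scan like this would not scale to 4.4 million patents without an index.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Patent:  # hypothetical record layout
    pid: str
    year: int                 # matching year (grant vs. filing not specified)
    filing_day: int           # filing date as a day ordinal
    keywords: frozenset[str]


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """|A ∩ B| / |A ∪ B|; 0.0 for two empty sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def closest_text_match(
    focal: Patent, pool: list[Patent], min_jaccard: float = 0.05
) -> Optional[tuple[Patent, float]]:
    """Most similar other patent from the same year, or None below the floor."""
    best, best_score = None, -1.0
    for cand in pool:
        if cand.pid == focal.pid or cand.year != focal.year:
            continue
        score = jaccard(focal.keywords, cand.keywords)
        if score > best_score:
            best, best_score = cand, score
    if best is None or best_score < min_jaccard:
        return None
    return best, best_score


def distant_baseline(focal: Patent, pool: list[Patent]) -> Optional[Patent]:
    """Same-year patent sharing no keywords (Jaccard = 0), closest filing date."""
    disjoint = [
        c for c in pool
        if c.pid != focal.pid and c.year == focal.year
        and not (focal.keywords & c.keywords)
    ]
    return min(disjoint, key=lambda c: abs(c.filing_day - focal.filing_day),
               default=None)


patents = [
    Patent("a", 2000, 10, frozenset({"x", "y", "z"})),
    Patent("b", 2000, 12, frozenset({"x", "y", "w"})),
    Patent("c", 2000, 30, frozenset({"q", "r"})),
    Patent("d", 2001, 11, frozenset({"x", "y", "z"})),
    Patent("e", 2000, 11, frozenset({"s"})),
]
match = closest_text_match(patents[0], patents)   # (patent "b", 0.5)
baseline = distant_baseline(patents[0], patents)  # patent "e"
```

Patent "d" has identical keywords to "a" but is excluded because it belongs to a different year.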
  • 6. Validation: closest text-matched patents in the same year
    Patent pairs with a larger Jaccard index are more likely to belong to the same patent family (DOCDB), to share inventor(s) or assignee(s), and to cite each other
  • 7. Validation: expert assessment
    • 5 independent R&D scientists
      – Semiconductor devices, chemical engineering, power plants, genetics, and optical inspection systems
    • For each expert
      – Randomly select 10 baseline patents
      – For each baseline patent, select one random patent with a Jaccard index of 0.00, 0.05-0.25, 0.25-0.50, 0.50-0.75, and 0.75 onwards
      – Randomize the order and ask the expert to rate similarity on a scale from 1 to 7
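The sampling design for the expert ratings can be sketched as below. The slide lists the Jaccard bins but not how their edges are treated, so half-open bins (with "0.75 onwards" including 1.0) are an assumption here, as are the function names.

```python
import random

# Jaccard bins from slide 7; edge handling is an assumption.
BINS = [(0.0, 0.0), (0.05, 0.25), (0.25, 0.50), (0.50, 0.75), (0.75, 1.0)]


def in_bin(score: float, lo: float, hi: float) -> bool:
    if lo == hi:                      # the exact 0.00 bin
        return score == lo
    if hi == 1.0:                     # "0.75 onwards" includes identical sets
        return lo <= score <= hi
    return lo <= score < hi


def sample_rating_set(scored: list[tuple[str, float]],
                      rng: random.Random) -> list[str]:
    """scored: (patent id, Jaccard vs. one baseline patent) pairs.
    Pick one random patent per bin, then shuffle so the expert
    cannot infer the bin from the presentation order."""
    picks = []
    for lo, hi in BINS:
        pool = [pid for pid, s in scored if in_bin(s, lo, hi)]
        if pool:                      # skip bins with no candidate
            picks.append(rng.choice(pool))
    rng.shuffle(picks)
    return picks


scored = [("p0", 0.0), ("p1", 0.1), ("p2", 0.3),
          ("p3", 0.6), ("p4", 0.9), ("p5", 0.02)]
order = sample_rating_set(scored, random.Random(42))
```

Patent "p5" (Jaccard 0.02) falls between the 0.00 bin and the 0.05-0.25 bin, so it is never shown. Drawing the 10 baseline patents per expert could use `rng.sample(patent_ids, 10)`.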
  • 9. Estimate bias related to USPC
    • For each of the 4.4 million patents, select three USPC-matched patents
    • Three common ways of matching (approximate filing date and …)
      – Primary class, e.g., Jaffe et al. 1993 (no match for 2% of patents)
      – Primary class and subclass (nested), e.g., Almeida 1996 (no match for 20% of patents)
      – All classes and subclasses, using the Jaccard overlap in subclasses, e.g., Agrawal et al. 2010 (no match for 4% of patents)
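The three USPC matching rules on slide 9 might look like the following sketch. The record layout and the "class/subclass" string format are hypothetical, "approximate filing date" is implemented as the closest filing date, and the tie-breaking in the all-subclasses rule (highest subclass Jaccard first, then closest filing date) is an assumption rather than the authors' procedure. Any further matching criteria behind the slide's "…" are not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UspcPatent:  # hypothetical record layout
    pid: str
    filing_day: int             # filing date as a day ordinal
    primary: str                # primary "class/subclass", e.g. "435/6"
    all_codes: frozenset[str]   # every assigned "class/subclass"


def uspc_class(code: str) -> str:
    return code.split("/")[0]


def _closest_date(focal: UspcPatent,
                  pool: list[UspcPatent]) -> Optional[UspcPatent]:
    return min(pool, key=lambda c: abs(c.filing_day - focal.filing_day),
               default=None)


def match_primary_class(focal: UspcPatent,
                        pool: list[UspcPatent]) -> Optional[UspcPatent]:
    """Same primary class (cf. Jaffe et al. 1993)."""
    same = [c for c in pool if c.pid != focal.pid
            and uspc_class(c.primary) == uspc_class(focal.primary)]
    return _closest_date(focal, same)


def match_primary_subclass(focal: UspcPatent,
                           pool: list[UspcPatent]) -> Optional[UspcPatent]:
    """Same primary class and subclass, nested (cf. Almeida 1996)."""
    same = [c for c in pool if c.pid != focal.pid and c.primary == focal.primary]
    return _closest_date(focal, same)


def match_all_subclasses(focal: UspcPatent,
                         pool: list[UspcPatent]) -> Optional[UspcPatent]:
    """Highest Jaccard overlap over all subclasses (cf. Agrawal et al. 2010)."""
    def overlap(c: UspcPatent) -> float:
        union = focal.all_codes | c.all_codes
        return len(focal.all_codes & c.all_codes) / len(union) if union else 0.0

    scored = [(overlap(c), c) for c in pool if c.pid != focal.pid]
    scored = [(s, c) for s, c in scored if s > 0]
    if not scored:
        return None
    # Prefer higher overlap, then the closer filing date (assumed tie-break).
    return max(scored,
               key=lambda sc: (sc[0], -abs(sc[1].filing_day - focal.filing_day)))[1]


focal = UspcPatent("f", 100, "435/6", frozenset({"435/6", "536/24"}))
pool = [
    focal,
    UspcPatent("a", 105, "435/91", frozenset({"435/91"})),
    UspcPatent("b", 150, "435/6", frozenset({"435/6"})),
    UspcPatent("c", 101, "536/24", frozenset({"536/24", "435/6"})),
    UspcPatent("d", 99, "424/1", frozenset({"424/1"})),
]
```

On this toy pool the three rules pick three different patents ("a", "b", and "c"), which illustrates why the choice of rule matters for the bias estimates.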
  • 10. Type I error: false positive matches
    • Dissimilar patents, same USPC
    • Low text similarity (average Jaccard)
      – Primary class: 0.054
      – Primary class and subclass (nested): 0.092
      – All classes and subclasses: 0.097
    • Lower bound: share of USPC matches with Jaccard = 0
      – Primary class: 12%
      – Primary class and subclass (nested): 4.3%
      – All classes and subclasses: 4.0%
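Given the keyword sets of each focal patent and its USPC match, the two Type I statistics on slide 10 (average Jaccard and the share of matches with Jaccard = 0) reduce to a few lines. A minimal sketch; the pair list is made-up toy data.

```python
from statistics import mean


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def type_i_summary(
    pairs: list[tuple[frozenset[str], frozenset[str]]]
) -> tuple[float, float]:
    """Return (average Jaccard, share of pairs with Jaccard = 0) over
    (focal keywords, USPC-matched keywords) pairs. The zero share is only a
    lower bound on Type I error: a low but non-zero overlap can also be a
    false positive."""
    scores = [jaccard(a, b) for a, b in pairs]
    return mean(scores), sum(s == 0.0 for s in scores) / len(scores)


pairs = [
    (frozenset({"laser", "lens"}), frozenset({"laser", "lens"})),
    (frozenset({"laser"}), frozenset({"valve"})),
    (frozenset({"laser", "lens"}), frozenset({"lens", "mirror"})),
    (frozenset({"gene"}), frozenset({"pump"})),
]
avg_jaccard, zero_share = type_i_summary(pairs)  # ≈ 0.333, 0.5
```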
  • 11. Type II error: false negative matches
    • Similar patents, different USPC
    • Lower bound: share of patent pairs with a Jaccard index of 1 that have a different USPC
      – Primary class: 22.4%
      – Primary class and subclass (nested): 52.3%
      – All classes and subclasses: 20.0%
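The Type II lower bound on slide 11 is the share of patent pairs with identical keyword sets (Jaccard = 1) whose USPC assignment differs under a given matching rule. In this sketch the `Rec` record and the `same_primary_class` predicate are hypothetical; the other two rules would plug in as different predicates.

```python
from typing import Callable, NamedTuple, Optional


class Rec(NamedTuple):  # hypothetical record layout
    keywords: frozenset[str]
    primary_class: str


def type_ii_lower_bound(
    pairs: list[tuple[Rec, Rec]], same_uspc: Callable[[Rec, Rec], bool]
) -> Optional[float]:
    """Share of Jaccard-1 pairs (identical, non-empty keyword sets) that the
    USPC rule would NOT treat as matches; None if there are no such pairs."""
    identical = [(a, b) for a, b in pairs
                 if a.keywords and a.keywords == b.keywords]
    if not identical:
        return None
    return sum(not same_uspc(a, b) for a, b in identical) / len(identical)


def same_primary_class(a: Rec, b: Rec) -> bool:
    return a.primary_class == b.primary_class


r1 = Rec(frozenset({"primer", "nucleic"}), "435")
pairs = [
    (r1, Rec(frozenset({"primer", "nucleic"}), "536")),  # identical text, other class
    (r1, Rec(frozenset({"primer", "nucleic"}), "435")),  # identical text, same class
    (r1, Rec(frozenset({"primer"}), "999")),             # Jaccard < 1, ignored
]
share = type_ii_lower_bound(pairs, same_primary_class)  # 0.5
```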
  • 12. Validation: superiority of text matching over USPC
    Text-matched patents are more likely than USPC-matched patents to belong to the same patent family (DOCDB), to share inventor(s) or assignee(s), and to cite each other
  • 14. Conclusions
    • Text mining
      – To measure patent similarity and select counterfactual control patents
      – Outperforms USPC
        • Fine-grained
        • Does not rely on human classification
        • No changes over time
      – To measure similarity between portfolios, aggregate keywords at the portfolio level
    • Bias related to USPC
      – Matching on primary subclass instead of class reduces Type I but increases Type II errors
      – Matching on all subclasses instead of only the primary one reduces both Type I and Type II errors
      – An unexpectedly large share of Type I and particularly Type II errors remains
    • Code and data publicly available
      – Java standard libraries; CSV files with the cleaned keywords and the 200 closest matches
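The conclusion that portfolio similarity can be measured by aggregating keywords at the portfolio level could be sketched as follows. Taking the union of each portfolio's patent keyword sets and comparing the unions with the same Jaccard index is an assumption; the slide does not specify how keywords are aggregated.

```python
from functools import reduce


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def portfolio_keywords(patent_keywords: list[frozenset[str]]) -> frozenset[str]:
    """Aggregate a portfolio as the union of its patents' keyword sets (assumed)."""
    return reduce(frozenset.union, patent_keywords, frozenset())


firm_a = [frozenset({"laser", "lens"}), frozenset({"lens", "mirror"})]
firm_b = [frozenset({"mirror", "sensor"})]
similarity = jaccard(portfolio_keywords(firm_a), portfolio_keywords(firm_b))  # 0.25
```

A union ignores how many patents in a portfolio use a keyword; a count-weighted aggregation would be a different design choice.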
  • 15. Text mining
    • Develop a new measure of patent similarity based on text
    • Validate the new measure
      – Same patent family, assignee, inventors; citing each other
      – Expert assessments
    • Estimate bias related to USPC
    • Validate superiority over USPC
      – Same patent family, assignee, inventors; citing each other
      – Expert assessments
  • 17. Text-based measure of similarity
    • Title + abstract: Process for amplifying, detecting, and/or-cloning nucleic acid sequences, The present invention is directed to a process for amplifying and detecting any target nucleic acid sequence contained in a nucleic acid or mixture thereof. The process comprises treating separate complementary strands of the nucleic acid with a molar excess of two oligonucleotide primers, extending the primers to form complementary primer extension products which act as templates for synthesizing the desired nucleic acid sequence, and detecting the sequence so amplified. The steps of the reaction may be carried out stepwise or simultaneously and can be repeated as often as desired. In addition, a specific nucleic acid sequence may be cloned into a vector by using primers to amplify the sequence, which contain restriction sites on their non-complementary ends, and a nucleic acid fragment may be prepared from an existing shorter fragment using the amplification process
    • 52 unique keywords: acid act addition amplification amplified amplify amplifying carried cloned complementary comprises contained desired detecting directed ends excess existing extending extension form fragment invention mixture molar non-complementary nucleic oligonucleotide prepared present primer primers process products reaction repeated restriction separate sequence sequencesthe shorter simultaneously sites specific steps stepwise strands synthesizing target templates treating vector

Speaker notes

  1. On average 3 common keywords between two average patents with 37 keywords each; 6 keywords; 7 keywords; text-matched (Jaccard 0.24): 14 keywords
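The conversion between an average Jaccard index and a number of common keywords used in these notes follows from the definition of the index, assuming two patents that each have n keywords and share c of them:

```latex
J = \frac{|A \cap B|}{|A \cup B|} = \frac{c}{2n - c}
\quad\Longrightarrow\quad
c = \frac{2nJ}{1 + J}

% Text-matched pairs: n = 37, J = 0.24
c = \frac{2 \cdot 37 \cdot 0.24}{1.24} = \frac{17.76}{1.24} \approx 14.3
```

This reproduces the figure of roughly 14 common keywords for text-matched pairs.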