Taxonomies for Text Analytics and Auto-indexing
Heather Hedden
Presentation given at Text Analytics World Conference, Boston, MA, October 2012
1.
Taxonomies for Text Analytics and Auto-Indexing
Heather Hedden
Hedden Information Management
Text Analytics World, Boston, MA
October 4, 2012
2.
Introduction: Text Analytics and Taxonomies
• Text analytics can be used to index content without the use of taxonomies/controlled vocabularies.
• Text analytics can be used to index content with taxonomies/controlled vocabularies for better results.
• Text analytics can generate terms from text to be used:
  1. As a source to manually build taxonomies
  2. To auto-categorize/classify content against existing taxonomies
© 2012 Hedden Information Management
3.
Outline
• Taxonomy Introduction: Definitions & Types
• Taxonomy Introduction: Purposes & Benefits
• Synonyms for Terms
• Auto-Indexing and Auto-Categorization
• Taxonomies for Auto-Categorization
• Taxonomy Resources
5.
Definitions & Types
Broad designations (essentially the same meaning, used interchangeably):
• Controlled Vocabularies (CV)
• Knowledge Organization Systems
• Taxonomies
Specific types (different meanings):
• Term Lists/Pick lists
• Synonym Rings
• Authority Files
• Taxonomies
  • Hierarchical
  • Faceted
• Thesauri
• Ontologies
6.
Definitions & Types
Broad Designations: Controlled vocabulary, knowledge organization system, taxonomy
• An authoritative, restricted list of terms (words or phrases)
• Each term for a single unambiguous concept (synonyms/nonpreferred terms, as cross-references, may be included)
• Policies (control) for who, when, and how new terms can be added
• May or may not have structured relationships between terms
• To support indexing/tagging/metadata management of content to facilitate content management and retrieval
7.
Definitions & Types
Specific types:
• Term Lists/Pick lists
• Synonym Rings
• Authority Files
• Taxonomies
• Thesauri
• Ontologies
8.
Definitions & Types: Specific Types
Term List
• A simple list of terms
• Lacking synonyms, usually short enough for browsing
• Often displayed in drop-down scroll boxes
9.
Definitions & Types: Specific Types
Synonym Ring
• A controlled vocabulary with synonyms or near-synonyms for each concept
• No designated "preferred" term: all terms are equal and point to each other, as in a ring.
• The table of terms is not displayed to the user
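A synonym ring can be pictured as a set of equivalent terms plus a lookup from every member to its ring. The Python sketch below is only an illustration: the ring contents (including "counsel" and "autos") and the `expand_query` helper are assumptions, not part of the presentation.

```python
# Hypothetical synonym rings: every term in a ring is equal; none is "preferred".
RINGS = [
    {"lawyers", "attorneys", "counsel"},
    {"cars", "automobiles", "autos"},
]

# Index each term to its ring so that any member retrieves all the others.
TERM_TO_RING = {term: ring for ring in RINGS for term in ring}


def expand_query(term: str) -> set[str]:
    """Return the term plus every equivalent term in its ring (case-insensitive)."""
    key = term.strip().lower()
    # Terms outside every ring expand only to themselves.
    return set(TERM_TO_RING.get(key, {key}))
```

A search for any one member, for example `expand_query("Attorneys")`, would then retrieve content matching any term in the same ring, while the ring itself stays hidden from the user.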
10.
Definitions & Types: Specific Types
Taxonomy
• A controlled vocabulary with internal structure: terms are grouped or have hierarchical relationships.
• Emphasizes categories and classification for end-user display.
• May or may not have synonyms.
• Hierarchical: all terms have broader/narrower relationships to each other to form one big hierarchy
• Faceted: terms are grouped by attribute/aspect and are used in combination for indexing and search
11.
Definitions & Types: Specific Types
Hierarchical Taxonomy example (UK's IPSV):

Top Level Headings
Business and industry
Economics and finance
Education and skills
Employment, jobs and careers
Environment
Government, politics and public administration
Health, well-being and care
Housing
Information and communication
International affairs and defence
Leisure and culture
Life in the community
People and organisations
Public order, justice and rights
Science, technology and innovation
Transport and infrastructure

Leisure and culture
. Arts and entertainment venues
. . Museums and galleries
. Children's activities
. Culture and creativity
. . Architecture
. . Crafts
. . Heritage
. . Literature
. . Music
. . Performing arts
. . Visual arts
. Entertainment and events
. Gambling and lotteries
. Hobbies and interests
. Parks and gardens
. Sports and recreation
. . Team sports
. . . Cricket
. . . Football
. . . Rugby
. . Water sports
. . Winter sports
. Sports and recreation facilities
. Tourism
. . Passports and visas
. Young people's activities
12.
Definitions & Types: Specific Types
Faceted Taxonomy examples
14.
Purposes & Benefits
1. Controlled vocabulary aspect: Brings together different wordings (synonyms) for the same concept and disambiguates terms
   • Helps people search for information by different names
   • Helps people retrieve matching concepts, not just words
2. Taxonomy or thesaurus structure aspect: Organizes information into a logical structure
   • Helps people browse or navigate for information
15.
Purposes & Benefits
Helps people search for information by different names
• There are multiple ways to describe the same thing.
• A controlled vocabulary gathers synonyms, acronyms, variant spellings, etc.
• Without a controlled vocabulary, keyword searches would miss some relevant documents, due to:
  • Use of different words (e.g. Attorneys instead of Lawyers)
  • Use of different phrases (e.g. Deceptive acts or practices instead of Unfair practices)
  • Users not knowing the spelling of unusual names (e.g. Condoleezza Rice)
16.
Purposes & Benefits
17.
Purposes & Benefits
Helps people retrieve matching concepts, not just words
• A single term may have multiple meanings.
• Controlled vocabulary terms can be clarified/disambiguated.
• Without a controlled vocabulary, too many irrelevant documents would be retrieved.
• A search restricted to the controlled vocabulary retrieves concepts, not just words.
• Excludes documents with mere text-string matches (e.g. monitors for computers, not the verb "observes")
19.
Synonyms for Terms
• Supports search in most controlled vocabulary types: synonym rings, authority files, thesauri, (some taxonomies)
• Anticipating both:
  • varied user search string entries
  • varied forms in the text for the same content
• For both manual and automated indexing
• A concept may have any number of synonyms, but a synonym can point to only one preferred term
• Varied synonym sources:
  • Search analytics records
  • Interviews and use cases
  • Legacy print indexes
  • Obvious patterns (acronyms, phrase inversions, etc.)
20.
Synonyms for Terms
Not all are "synonyms." Types include:
• synonyms: Cars USE Automobiles
• near-synonyms: Junior high USE Middle school
• variant spellings: Defence USE Defense
• lexical variants: Hair loss USE Baldness
• foreign language proper nouns: Luftwaffe USE German Air Force
• acronyms/spelled out forms: UN USE United Nations
• scientific/technical names: Neoplasms USE Cancer
• phrase variations (in print): Buses, school USE School buses
• antonyms: Misbehavior USE Behavior
• narrower terms: Alcoholism USE Substance abuse
Also called "variant terms," "equivalence" terms, "non-preferred terms"
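The USE references above, together with the constraint that a synonym can point to only one preferred term, can be sketched as a simple lookup table. This is a minimal Python illustration; the `add_use` and `resolve` helpers are hypothetical names, and only the term pairs come from the slide.

```python
# Nonpreferred term (lowercased) -> preferred term.
USE = {}


def add_use(nonpreferred: str, preferred: str) -> None:
    """Register a USE reference; a variant may point to only one preferred term."""
    key = nonpreferred.lower()
    if key in USE and USE[key] != preferred:
        raise ValueError(f"{nonpreferred!r} already points to {USE[key]!r}")
    USE[key] = preferred


# Pairs taken from the slide's examples.
for variant, preferred in [
    ("Cars", "Automobiles"),
    ("Junior high", "Middle school"),
    ("Defence", "Defense"),
    ("Luftwaffe", "German Air Force"),
    ("UN", "United Nations"),
]:
    add_use(variant, preferred)


def resolve(term: str) -> str:
    """Map a search string to its preferred term, or return it unchanged."""
    return USE.get(term.lower(), term)
```

Here `resolve("cars")` returns `"Automobiles"`, and trying to register "UN" against a second preferred term raises an error. A preferred term may still collect any number of variants.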
22.
Auto-Indexing and Auto-Categorization
Choosing human vs. automated indexing:
Human indexing
• Manageable number of docs
• Higher accuracy in indexing
• May include non-text files
• Invest in people
• Low-tech: can build your own indexing UI
• Internal control
Automated indexing
• Very large number of docs
• Greater speed in indexing
• Text files only
• Invest in technology
• High-tech: must purchase auto-indexing software
• Software vendor relationship
23.
Auto-Indexing and Auto-Categorization
Automated Indexing Technologies
• Entity extraction
• Text analytics and text mining, based on NLP
• Auto-categorization
24.
Auto-Indexing and Auto-Categorization
Choosing auto-indexing methods:
Information extraction/text analytics
• For varied and undifferentiated document types
• For unstructured content
• For varied subject areas
• Terms may or may not be displayed
• Not necessarily with taxonomy
Auto-categorization
• For consistent doc types/formats
• For structured or pre-tagged content
• For limited/focused subject
• Displays categories to user
• Leverages a taxonomy
Combine both text analytics and auto-categorization:
1. Text analytics to extract concepts from unstructured varied content
2. Auto-categorization to apply benefits of a taxonomy/controlled vocabulary
25.
Auto-Indexing and Auto-Categorization
auto-categorization = auto-classification = automated subject indexing
Auto-categorization makes use of the controlled vocabulary matched with extracted terms.
Primary auto-categorization technologies:
1. Machine-learning and training documents
2. Rules-based categorization
26.
Auto-Indexing and Auto-Categorization
Machine-learning based auto-categorization:
• Automatically indexes based on previous examples
• Complex mathematical algorithms are created
• Taxonomist must then provide multiple representative sample documents for each CV term to "train" the system.
• Best if pre-indexed records exist (i.e. converting from human to automated indexing); then hundreds of varied documents can be used for each term.
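To make the idea of "training" concrete, the following is a toy Python sketch of categorization from sample documents. It uses a small multinomial naive Bayes model purely as a stand-in; the presentation does not name an algorithm, commercial products use their own methods, and the class name, training sentences, and CV terms here are all illustrative assumptions.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class TinyCategorizer:
    """Multinomial naive Bayes over word counts (a stand-in, not a vendor algorithm)."""

    def __init__(self) -> None:
        self.word_counts = defaultdict(Counter)  # CV term -> word frequencies
        self.doc_counts = Counter()              # CV term -> number of training docs
        self.vocab = set()

    def train(self, cv_term: str, document: str) -> None:
        """Add one sample document for a controlled vocabulary term."""
        words = tokenize(document)
        self.word_counts[cv_term].update(words)
        self.doc_counts[cv_term] += 1
        self.vocab.update(words)

    def categorize(self, document: str) -> str:
        """Return the CV term whose training documents best match the text."""
        words = tokenize(document)
        total_docs = sum(self.doc_counts.values())
        best_term, best_score = None, -math.inf
        for term, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # Log prior plus Laplace-smoothed log likelihood of each word.
            score = math.log(self.doc_counts[term] / total_docs)
            for word in words:
                score += math.log((counts[word] + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_term, best_score = term, score
        return best_term


# Hypothetical training documents, two per CV term.
model = TinyCategorizer()
model.train("U.S. President", "The president met the administration at the White House")
model.train("U.S. President", "President George Bush signed the bill")
model.train("Shrubs", "Prune each bush and shrub in the garden every spring")
model.train("Shrubs", "A flowering bush needs water and sun")
```

With only two samples per term the model is fragile, which is consistent with the slide's point that many varied documents per term work best.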
27.
Auto-Indexing and Auto-Categorization
Rules-based auto-categorization
• Taxonomist must write rules for each CV term
• Like advanced Boolean searching or regular expressions
Example (Data Harmony): Bush
  IF (INITIAL CAPS AND (MENTIONS "president*" OR WITH "administration*" OR AROUND "white house" OR NEAR "george"))
    USE U.S. President
  ELSE
    USE Shrubs
  ENDIF
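The logic of the example rule can be approximated in Python with regular expressions. This is a rough sketch, not Data Harmony's rule engine: the exact semantics of MENTIONS, WITH, AROUND, and NEAR are not given in the slides, so the sketch treats every one of them as "occurs anywhere in the same text", and the function name `categorize_bush` is hypothetical.

```python
import re
from typing import Optional

# Context cues from the example rule; "*" truncation becomes a regex prefix match.
CONTEXT = re.compile(
    r"\b(president\w*|administration\w*|white house|george)\b", re.IGNORECASE
)


def categorize_bush(text: str) -> Optional[str]:
    """Return the CV term for the first occurrence of "bush", or None if absent."""
    match = re.search(r"\bbush\b", text, re.IGNORECASE)
    if match is None:
        return None
    initial_caps = match.group(0)[0].isupper()
    # Simplification: every proximity operator is treated as "anywhere in the text".
    if initial_caps and CONTEXT.search(text):
        return "U.S. President"
    return "Shrubs"
```

A real rules engine would apply distance windows for the proximity operators, so this sketch will over-match on long documents where the cue words appear far from the term.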
29.
Taxonomies for Auto-Categorization
Whichever method of auto-indexing is used, it affects controlled vocabulary creation:
• Continual update work is needed (new training documents or new rules) for each new term created.
• Feeding training documents is easier for non-information professionals than writing rules
30.
Taxonomies for Auto-Categorization
Taxonomies designed for auto-categorization:
• Need more, varied synonym/variant terms
• Need variant terms of different parts of speech
• Cannot have subtle differences between preferred terms
• Avoid creating many action-terms
• Taxonomy needs to be more content-tailored, content-based
31.
Taxonomies for Auto-Categorization
Synonym/variant term differences:
For human-indexing
• Presidential candidates
• Candidates, presidential
• Candidate for president
• Presidential hopeful
For auto-categorization
• Presidential candidate
• Presidential candidacy
• Candidacy for president
• Running for president
• Campaigning for president
• Presidential nominee
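The reason auto-categorization needs more variants, in different grammatical forms, is that software matches the phrases as they occur in running text. A minimal Python sketch of such matching follows; it uses variant phrases listed on the slide, while the `VARIANTS` table layout and the `auto_categorize` function are illustrative assumptions.

```python
import re

# Variant phrases from the slide, all pointing to one preferred term.
VARIANTS = {
    "Presidential candidates": [
        "presidential candidate",
        "presidential candidacy",
        "candidate for president",
        "candidacy for president",
        "presidential hopeful",
        "running for president",
        "campaigning for president",
        "presidential nominee",
    ],
}


def auto_categorize(text: str) -> list[str]:
    """Return preferred terms with a variant in the text (prefix match, so plurals hit)."""
    found = []
    for preferred, variants in VARIANTS.items():
        pattern = r"\b(?:" + "|".join(map(re.escape, variants)) + r")"
        if re.search(pattern, text, re.IGNORECASE):
            found.append(preferred)
    return found
```

A print-style inversion such as "Candidates, presidential" helps a human scanning an alphabetical list but rarely appears in running text, which is why it is left out of the matching table here.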
33.
Taxonomy Resources
• ANSI/NISO Z39.19 (2005) Guidelines for Construction, Format, and Management of Monolingual Controlled Vocabularies. Bethesda, MD: NISO Press. www.niso.org
• Hedden, Heather. (2010) The Accidental Taxonomist. Medford, NJ: Information Today Inc. www.accidental-taxonomist.com
• American Society for Indexing: Taxonomies and Controlled Vocabularies Special Interest Group. www.taxonomies-sig.org
• Special Libraries Association (SLA): Taxonomy Division. http://wiki.sla.org/display/SLATAX
• Taxonomy Community of Practice discussion group. http://finance.groups.yahoo.com/group/TaxoCoP
• "Taxonomies and Controlled Vocabularies," Simmons College Graduate School of Library and Information Science Continuing Education Program, 5 weeks. $250. November 2012, January 2013. http://alanis.simmons.edu/ceweb/byinstructor.php#9
34.
Questions/Contact
Heather Hedden
Hedden Information Management
Carlisle, MA
heather@hedden.net
978-467-5195
www.hedden-information.com
www.linkedin.com/in/hedden
twitter.com/hhedden
accidental-taxonomist.blogspot.com