SciTech Strategies, Inc.




Found in Space: Creating and Visualizing IEEE Abstract Space for Publication Output
Kevin W. Boyack
Marjorie M.K. Hlava
Feb 26, 2010
Agenda
          Work in progress presentation
          Introduction
           »   Science mapping background
           »   Questions with visual answers
          Mapping IEEE thesaurus space
           »   Expanding thesaurus space to include adjacencies
          Overlay data on thesaurus space
           »   Compare databases
           »   Compare journals
           »   Trends
          Summary

SciTech Strategies    Better Maps   Better Decisions   Well Formed Data • Semantic Enrichment • Access Innovations • Data Harmony
Science mapping
          30-40 year tradition of science mapping
           »   Well-established methodologies
           »   Current computing power and data availability enable large
               scale mapping and analysis
          Science maps can be, and have been, created using
           »   Articles
           »   Journals
           »   Authors
           »   Terms
          Maps used for communication, strategy, planning,
           evaluation …


Questions with visual answers
          From a society / publisher perspective
           »   Which topical areas form our core? periphery?
           »   Where is the coverage dense? thin?
           »   Which topical areas are most active? least active?
           »   Which topical areas seem to be emerging? declining?
           »   Which topical areas are interrelated? isolated?
           »   What are the overlaps between journals / segments?
           »   Where are the potential expansion points?
          From a thesaurus perspective
           »   What terms are too broadly defined?
           »   How do actual topical relationships differ from the thesaurus
               structure?


Preparing the data
          Index 1.2 million IEEE Xplore records
           »   Using the IEEE Thesaurus
           »   Using MeSH (Medical Subject Headings)
           »   Using the DTIC Thesaurus
          Normalize and enrich the XML as needed
          Create an XML / SQL database
          Look for outliers
          Massage the data for images




Mapping IEEE thesaurus space
          Simple map – process
           »       Obtain IEEE thesaurus
           »       Index IEEE content (assign thesaurus terms to documents)
           »       Calculate relationships between thesaurus terms
           »       Map thesaurus terms based on relationships
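
The "calculate relationships between thesaurus terms" step can be sketched as follows. This is a minimal illustration, not the method used in the presentation: it assumes relationships are derived from term co-occurrence within documents and normalized with a cosine index, which is one common choice in science mapping. The toy documents and the function names `term_cooccurrence` and `cosine_similarity` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def term_cooccurrence(doc_terms):
    """Count term occurrences and pairwise co-occurrences across documents."""
    occurrences = Counter()
    cooccurrences = Counter()
    for terms in doc_terms:
        unique = sorted(set(terms))  # each term counts once per document
        occurrences.update(unique)
        cooccurrences.update(combinations(unique, 2))
    return occurrences, cooccurrences


def cosine_similarity(pair, occurrences, cooccurrences):
    """Co-occurrence count divided by the geometric mean of the two term counts."""
    a, b = sorted(pair)
    return cooccurrences[(a, b)] / sqrt(occurrences[a] * occurrences[b])


# Hypothetical toy index: each document is the list of thesaurus terms assigned to it.
docs = [
    ["antennas", "radar"],
    ["antennas", "radar", "signal processing"],
    ["signal processing", "filters"],
    ["radar", "signal processing"],
]
occ, cooc = term_cooccurrence(docs)

# Weighted edge list that a layout or clustering step could take as input.
edges = {pair: cosine_similarity(pair, occ, cooc) for pair in cooc}
```

The resulting `edges` dictionary is the kind of term-by-term relationship table that a mapping step would consume; at the scale of 1.2M documents and 6k terms a sparse matrix library would likely be a better fit than plain dictionaries.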




[Figure: IEEE, 1.2M documents by 6k terms; IEEE, 6k terms by 6k terms; TERM MAP]

Mapping IEEE thesaurus space
          We are more interested in an expanded map that
           includes adjacencies to the IEEE data
           »   Expanded term set shows adjacent white space; opportunities
               for expansion
           »   Similar process to that for the simple map, except …
           »   We need additional terms to add to the map
          Criteria for additional terms
           »   Low occurrence rate in IEEE documents
           »   Linkage to terms in IEEE documents
           »   Similar level of detail to current IEEE thesaurus terms
          Where do we find these terms? How can we add them?
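
The first two criteria for additional terms could be applied as a simple filter, sketched below. The thresholds `max_ieee_docs` and `min_links`, the function name, and the sample terms and counts are all hypothetical; the third criterion (similar level of detail to the IEEE thesaurus) is not modeled here and would probably need editorial review.

```python
def select_adjacent_terms(ieee_doc_counts, linked_ieee_terms,
                          max_ieee_docs=50, min_links=2):
    """Keep candidate terms that are rare in IEEE documents but linked to IEEE terms."""
    selected = []
    for term, links in linked_ieee_terms.items():
        rare_in_ieee = ieee_doc_counts.get(term, 0) <= max_ieee_docs
        well_linked = len(links) >= min_links
        if rare_in_ieee and well_linked:
            selected.append(term)
    return sorted(selected)


# Hypothetical candidates from an external vocabulary, with made-up counts:
# how many IEEE documents carry the term, and which IEEE terms it co-occurs with.
ieee_doc_counts = {"Electrocardiography": 12, "Algorithms": 90000, "Pancreas": 0}
linked_ieee_terms = {
    "Electrocardiography": {"biomedical signal processing", "electrodes"},
    "Algorithms": {"signal processing", "optimization"},
    "Pancreas": set(),
}
adjacent = select_adjacent_terms(ieee_doc_counts, linked_ieee_terms)
```

In this toy data only "Electrocardiography" passes: "Algorithms" is already common in IEEE documents, and "Pancreas" has no linkage to IEEE terms.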


Defining expanded term space
                                              0. Desired result




[Figure: 6k terms; IEEE, 1.2M documents]




Defining expanded term space
                                   1. Limit IEEE thesaurus




Defining expanded term space
                                   2. Select related corpora



[Figure: 475k patents, 14k DTIC; 2k terms; IEEE, 1.2M documents; 24k MeSH, PubMed, 525k docs]



Defining expanded term space
                                   3. Identify related terms




[Figure: 2k terms; IEEE, 1.2M documents]




Defining expanded term space
                                   4. Resulting term set




[Figure: 2k terms; IEEE, 1.2M documents]




Clustering of terms (loose clustering)




Clustering of terms (tight clustering)




Remove non-linked MeSH




Cluster the term clusters




Linearize the term cluster order




IEEE corpus distribution over topics




USPTO corpus distribution over topics




PubMed corpus distribution over topics




Summary
          Term space can be mapped effectively
          The mapped space can be used to show distributions
           and trends that give answers to questions
           »   Database distribution comparisons
           »   Journal / segment distribution comparisons (overlaps)
           »   Journal / segment trending
           »   Identify groups of terms that need trimming (rule base changes)
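
A database distribution comparison of the kind summarized above might be computed as in the sketch below. It assumes each document has already been assigned to one or more term clusters and that the clusters have a fixed (linearized) order; the cluster names, the sample assignments, and the overlap measure (sum of per-cluster minimum shares) are illustrative assumptions, not taken from the presentation.

```python
from collections import Counter


def corpus_distribution(doc_clusters, cluster_order):
    """Share of a corpus's cluster assignments falling in each cluster, in map order."""
    counts = Counter(c for clusters in doc_clusters for c in clusters)
    total = sum(counts[c] for c in cluster_order)
    if total == 0:
        return [0.0 for _ in cluster_order]
    return [counts[c] / total for c in cluster_order]


# Hypothetical linearized cluster order and per-document cluster assignments.
cluster_order = ["power", "signal processing", "biomedical"]
ieee = [["power"], ["signal processing"], ["signal processing", "biomedical"], ["power"]]
pubmed = [["biomedical"], ["biomedical"], ["signal processing", "biomedical"], ["biomedical"]]

ieee_share = corpus_distribution(ieee, cluster_order)
pubmed_share = corpus_distribution(pubmed, cluster_order)

# One simple overlap measure: sum of the smaller share in each cluster (0 = disjoint, 1 = identical).
overlap = sum(min(a, b) for a, b in zip(ieee_share, pubmed_share))
```

Plotting `ieee_share` and `pubmed_share` against the same cluster order gives side-by-side profiles of the kind the corpus distribution slides show, and the same function could be applied per journal or per year for overlap and trend comparisons.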




Radial thesaurus structure
Ordered by division
IEEE Transactions on Magnetics
           Purple – Magnetics heading
           Orange – all other




[Figure legend]
           Division I
           Division II
           Division III
           Division IV
           Division V
           Division VI
           Division VII
           Division IX
           Division X
           Multiple


 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 

Dernier (20)

How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 

Found in Space: Creating and Visualizing IEEE Abstract Space for Publication Output

  • 1. SciTech Strategies, Inc. Found in Space: Creating and Visualizing IEEE Abstract Space for Publication Output. Kevin W. Boyack, Marjorie M.K. Hlava. Feb 26, 2010
  • 2. Agenda
      - Work in progress presentation
      - Introduction
          » Science mapping background
          » Questions with visual answers
      - Mapping IEEE thesaurus space
          » Expanding thesaurus space to include adjacencies
      - Overlay data on thesaurus space
          » Compare databases
          » Compare journals
          » Trends
      - Summary
  • 3. Science mapping
      - 30-40 year tradition of science mapping
          » Well-established methodologies
          » Current computing power and data availability enable large-scale mapping and analysis
      - Science maps can be, and have been, created using
          » Articles
          » Journals
          » Authors
          » Terms
      - Maps used for communication, strategy, planning, evaluation …
  • 4. (image only)
  • 5. (image only)
  • 6. Questions with visual answers
      - From a society / publisher perspective
          » Which topical areas form our core? periphery?
          » Where is the coverage dense? thin?
          » Which topical areas are most active? least active?
          » Which topical areas seem to be emerging? declining?
          » Which topical areas are interrelated? isolated?
          » What are the overlaps between journals / segments?
          » Where are the potential expansion points?
      - From a thesaurus perspective
          » What terms are too broadly defined?
          » How do actual topical relationships differ from the thesaurus structure?
  • 7. Preparing the data
      - Index 1.2 million IEEE Xplore records
          » Using the IEEE Thesaurus
          » Using MeSH (Medical Subject Headings)
          » Using the DTIC Thesaurus
      - Normalize and enrich the XML as needed
      - Create an XML / SQL database
      - Look for outliers
      - Massage for images
  • 8. Mapping IEEE thesaurus space
      - Simple map: process
          » Obtain IEEE thesaurus
          » Index IEEE content (assign thesaurus terms to documents)
          » Calculate relationships between thesaurus terms
          » Map thesaurus terms based on relationships
      - (Diagram labels: IEEE, 1.2M documents, 6k terms, term map)
  • 9. Mapping IEEE thesaurus space
      - We are more interested in an expanded map that includes adjacencies to the IEEE data
          » Expanded term set shows adjacent white space; opportunities for expansion
          » Similar process to that for simple map except …
          » We need additional terms to add
      - Criteria for additional terms
          » Low occurrence rate in IEEE documents
          » Linkage to terms in IEEE documents
          » Similar level of detail to current IEEE thesaurus terms
      - Where do we find these terms? How can we add them?
  • 10. Defining expanded term space: 0. Desired result (diagram labels: 6k terms; IEEE 1.2M documents)
  • 11. Defining expanded term space: 1. Limit IEEE thesaurus
  • 12. Defining expanded term space: 2. Select related corpora (diagram labels: 475k patents; 14k DTIC; 2k terms; IEEE 1.2M documents; 24k MeSH; PubMed 525k docs)
  • 13. Defining expanded term space: 3. Identify related terms (diagram labels: 2k terms; IEEE 1.2M documents)
  • 14. Defining expanded term space: 3. Identify related terms (diagram labels: 2k terms; IEEE 1.2M documents)
  • 15. Defining expanded term space: 4. Resulting term set (diagram labels: 2k terms; IEEE 1.2M documents)
  • 16. Clustering of terms (loose clustering)
  • 17. Clustering of terms (tight clustering)
  • 18. Remove non-linked MeSH
  • 19. Cluster the term clusters
  • 20. Linearize the term cluster order
  • 21. IEEE corpus distribution over topics
  • 22. USPTO corpus distribution over topics
  • 23. PubMed corpus distribution over topics
  • 24. (image only)
  • 25. Summary
      - Term space can be mapped effectively
      - The mapped space can be used to show distributions and trends that give answers to questions
          » Database distribution comparisons
          » Journal / segment distribution comparisons (overlaps)
          » Journal / segment trending
          » Identify groups of terms that need trimming (rule base changes)
  • 26. Radial thesaurus structure, ordered by division. SciTech Strategies, Inc.
  • 27. IEEE Transactions on Magnetics. Legend: purple = Magnetics heading; orange = all other
  • 28. Legend: Division I, Division II, Division III, Division IV, Division V, Division VI, Division VII, Division IX, Division X, Multiple
  • 29. (image only)
  • 30. (image only)
  • 31. Legend: Division I, Division II, Division III, Division IV, Division V, Division VI, Division VII, Division IX, Division X, Multiple
  • 32. Legend: Division I, Division II, Division III, Division IV, Division V, Division VI, Division VII, Division IX, Division X, Multiple
  • 33. Legend: Division I, Division II, Division III, Division IV, Division V, Division VI, Division VII, Division IX, Division X, Multiple
  • 34. Legend: Division I, Division II, Division III, Division IV, Division V, Division VI, Division VII, Division IX, Division X, Multiple
  • 35. Legend: Division I, Division II, Division III, Division IV, Division V, Division VI, Division VII, Division IX, Division X, Multiple
  • 36. (image only)
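The "calculate relationships between thesaurus terms" step on slide 8 is not specified in the slides. One common way to do it, shown here only as a minimal sketch, is to count how often two terms are assigned to the same document and normalize by each term's frequency (a cosine measure). The function name `term_similarities`, the choice of cosine normalization, and the toy documents are assumptions for illustration, not the authors' method.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def term_similarities(doc_terms):
    """Cosine-normalized co-occurrence for every pair of terms.

    doc_terms: iterable of term lists, one list per indexed document.
    Returns {(term_a, term_b): similarity}, with term_a < term_b.
    """
    term_counts = Counter()   # number of documents indexed with each term
    pair_counts = Counter()   # number of documents indexed with both terms
    for terms in doc_terms:
        unique = sorted(set(terms))          # ignore repeats within a document
        term_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return {
        (a, b): n / sqrt(term_counts[a] * term_counts[b])
        for (a, b), n in pair_counts.items()
    }


# Toy data (hypothetical): three documents with assigned thesaurus terms.
docs = [
    ["Magnetics", "Coils"],
    ["Magnetics", "Coils", "Information theory"],
    ["Information theory", "Codes"],
]
sims = term_similarities(docs)
for pair, value in sorted(sims.items()):
    print(pair, round(value, 3))
```

The resulting pair weights could then feed any graph layout or clustering routine to place the terms on a map; that step is left out here because the slides do not name the tool used.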

Editor's notes

  1. This one uses the division labels from the IEEE web site to show the data distribution. Purple is IEEE, red is MeSH, blue is DTIC.
  2. Blob plot, 1998 IEEE terms only. Node size is relative to the number of documents indexing the thesaurus branch below the given term. Colored by IEEE division. Yellow is Division VI (mostly governance and general science/engineering), which is cross-cutting.
  3. IEEE Transactions on Information Theory
  4. IEEE Transactions on Magnetics
  5. IEEE only – term clusters linearized
  6. Purple: IEEE Transactions on Magnetics. Blue: IEEE Transactions on Information Theory.
  7. IEEE only. Circular plot showing all IEEE output. IEEE term clusters from linear plot ordered around circle starting at dot (top in linear) and going counterclockwise.
  8. Purple: IEEE Transactions on Magnetics. Blue: IEEE Transactions on Information Theory.
  9. Purple: IEEE Transactions on Magnetics. Blue: IEEE Transactions on Information Theory.
  10. IEEE + DTIC (blue) + MeSH (red). Labels indicate positions of key terms and IEEE division numbers.
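Note 7 describes taking the linearized cluster order and wrapping it around a circle, counterclockwise from a starting dot. A minimal sketch of that placement follows. The function name `circular_positions`, the equal angular spacing, the unit radius, and the default start at the top of the circle (90 degrees) are all assumptions; the notes do not say where the dot sits or how clusters are spaced.

```python
from math import cos, sin, radians


def circular_positions(ordered_clusters, start_deg=90.0):
    """Place clusters on a unit circle in their linear order.

    The first cluster sits at start_deg; later clusters follow
    counterclockwise (increasing angle, y axis pointing up).
    Spacing is equal between neighbors (an assumption).
    """
    n = len(ordered_clusters)
    positions = {}
    for i, name in enumerate(ordered_clusters):
        angle = radians(start_deg + 360.0 * i / n)
        positions[name] = (cos(angle), sin(angle))
    return positions


# Hypothetical cluster labels, listed top to bottom as in the linear plot.
layout = circular_positions(["cluster A", "cluster B", "cluster C", "cluster D"])
for name, (x, y) in layout.items():
    print(name, round(x, 3), round(y, 3))
```

With four clusters, the first lands at the top, the second at the left, the third at the bottom, and the fourth at the right, which is the counterclockwise order the note describes.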