Found in Space: Creating and Visualizing IEEE Abstract Space for Publication Output
1. SciTech Strategies, Inc.
Found in Space: Creating and Visualizing IEEE Abstract Space for Publication Output
Kevin W. Boyack
Marjorie M.K. Hlava
Feb 26, 2010
2. Agenda
Work in progress presentation
Introduction
» Science mapping background
» Questions with visual answers
Mapping IEEE thesaurus space
» Expanding thesaurus space to include adjacencies
Overlay data on thesaurus space
» Compare databases
» Compare journals
» Trends
Summary
SciTech Strategies Better Maps Better Decisions Well Formed Data • Semantic Enrichment • Access Innovations • Data Harmony
3. Science mapping
30-40 year tradition of science mapping
» Well-established methodologies
» Current computing power and data availability enable large-scale mapping and analysis
Science maps can be, and have been, created using
» Articles
» Journals
» Authors
» Terms
Maps used for communication, strategy, planning,
evaluation …
6. Questions with visual answers
From a society / publisher perspective
» Which topical areas form our core? periphery?
» Where is the coverage dense? thin?
» Which topical areas are most active? least active?
» Which topical areas seem to be emerging? declining?
» Which topical areas are interrelated? isolated?
» What are the overlaps between journals / segments?
» Where are the potential expansion points?
From a thesaurus perspective
» What terms are too broadly defined?
» How do actual topical relationships differ from the thesaurus
structure?
7. Preparing the data
Index 1.2 million IEEE Xplore records
» Using the IEEE Thesaurus
» Using MeSH (Medical Subject Headings)
» Using the DTIC Thesaurus
Normalize and enrich the XML as needed
Create an XML / SQL Database
Look for outliers
Massage the data for image generation
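A minimal sketch of the normalize-and-load steps above, assuming a simple record layout. The XML element names (`record`, `term`), the `doc_term` table, and the `normalize` rule are hypothetical and chosen only for illustration; the slides do not describe the actual schema or tooling.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample records; real Xplore XML will look different.
SAMPLE_XML = """
<records>
  <record id="r1"><title>Coil design</title><term>Coils</term><term> magnetic  fields </term></record>
  <record id="r2"><title>Channel capacity</title><term>Entropy</term></record>
</records>
"""


def normalize(term):
    """Collapse whitespace and lowercase so variant spellings index together."""
    return " ".join(term.split()).lower()


def load_records(xml_text, conn):
    """Parse records and store one (doc_id, term) row per assigned term."""
    conn.execute("CREATE TABLE IF NOT EXISTS doc_term (doc_id TEXT, term TEXT)")
    root = ET.fromstring(xml_text)
    rows = [
        (rec.get("id"), normalize(term.text or ""))
        for rec in root.iter("record")
        for term in rec.iter("term")
    ]
    conn.executemany("INSERT INTO doc_term VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
n_rows = load_records(SAMPLE_XML, conn)
# Per-term document counts are a quick way to spot outliers.
counts = dict(conn.execute("SELECT term, COUNT(*) FROM doc_term GROUP BY term"))
```

Keeping one row per document-term assignment makes later steps (term frequencies, co-occurrence counts) plain SQL aggregations.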
8. Mapping IEEE thesaurus space
Simple map – process
» Obtain IEEE thesaurus
» Index IEEE content (assign thesaurus terms to documents)
» Calculate relationships between thesaurus terms
» Map thesaurus terms based on relationships
[Diagram: IEEE corpus, 1.2M documents indexed with 6k terms; 6k × 6k term relationships; term map]
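A minimal sketch of the "calculate relationships between thesaurus terms" step, assuming each indexed document is reduced to the set of thesaurus terms assigned to it. The slides do not say which similarity measure was used; cosine over co-occurrence counts, the function name `term_similarities`, and the sample terms are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def term_similarities(doc_terms):
    """Return {(term_a, term_b): cosine similarity} from per-document term sets."""
    term_count = Counter()   # number of documents indexed with each term
    pair_count = Counter()   # number of documents indexed with both terms
    for terms in doc_terms:
        unique = sorted(set(terms))
        term_count.update(unique)
        pair_count.update(combinations(unique, 2))
    return {
        (a, b): n / sqrt(term_count[a] * term_count[b])
        for (a, b), n in pair_count.items()
    }


# Toy input: three documents with hypothetical term assignments.
docs = [
    {"magnetic fields", "coils"},
    {"magnetic fields", "coils", "superconductors"},
    {"coding theory", "entropy"},
]
sims = term_similarities(docs)
```

The resulting pair weights are the kind of input a layout or clustering routine would need to place related terms near each other on the map.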
9. Mapping IEEE thesaurus space
We are more interested in an expanded map that
includes adjacencies to the IEEE data
» Expanded term set shows adjacent white space; opportunities
for expansion
» Similar process to that for simple map except …
» We need additional terms to add
Criteria for additional terms
» Low occurrence rate in IEEE documents
» Linkage to terms in IEEE documents
» Similar level of detail to current IEEE thesaurus terms
Where do we find these terms? How can we add them?
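The first two criteria above can be sketched as a simple filter. This is an illustration only: the thresholds, the field names (`ieee_rate`, `linked_ieee_terms`), and the sample values are assumptions, and the third criterion (similar level of detail) is not modeled here.

```python
def select_adjacent_terms(candidates, max_ieee_rate=0.001, min_links=1):
    """Keep candidate terms that are rare in IEEE documents but linked to IEEE terms.

    candidates maps a term to a dict with two illustrative fields:
      "ieee_rate": fraction of IEEE documents the term would index
      "linked_ieee_terms": IEEE thesaurus terms it co-occurs with elsewhere
    """
    return sorted(
        term
        for term, info in candidates.items()
        if info["ieee_rate"] <= max_ieee_rate
        and len(info["linked_ieee_terms"]) >= min_links
    )


# Toy candidates with made-up values, for illustration only.
candidates = {
    "Magnetic Resonance Imaging": {"ieee_rate": 0.0004, "linked_ieee_terms": {"magnetic fields"}},
    "Signal processing": {"ieee_rate": 0.0800, "linked_ieee_terms": {"filters"}},
    "Folklore": {"ieee_rate": 0.0000, "linked_ieee_terms": set()},
}
adjacent = select_adjacent_terms(candidates)
```

In this toy case the common term is rejected as already core to IEEE, and the unlinked term is rejected as unrelated, leaving only the rare but linked candidate.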
10. Defining expanded term space
0. Desired result
[Diagram: IEEE, 6k terms, 1.2M documents]
11. Defining expanded term space
1. Limit IEEE thesaurus
12. Defining expanded term space
2. Select related corpora
[Diagram: IEEE (2k terms, 1.2M documents); 475k patents; DTIC (14k); MeSH (24k) with PubMed (525k docs)]
13. Defining expanded term space
3. Identify related terms
[Diagram: IEEE, 2k terms, 1.2M documents]
14. Defining expanded term space
3. Identify related terms
[Diagram: IEEE, 2k terms, 1.2M documents]
15. Defining expanded term space
4. Resulting term set
[Diagram: IEEE, 2k terms, 1.2M documents]
16. Clustering of terms (loose clustering)
17. Clustering of terms (tight clustering)
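The slides do not describe the clustering algorithm. One possible reading of loose versus tight clustering is a lower versus higher similarity threshold, sketched here with connected components over term links. The function name `cluster_terms`, the thresholds, and the similarity values are assumptions for illustration.

```python
def cluster_terms(similarities, threshold):
    """Group terms into connected components over links with similarity >= threshold."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), sim in similarities.items():
        find(a)
        find(b)  # register both terms even if the link is too weak
        if sim >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for term in parent:
        groups.setdefault(find(term), set()).add(term)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical term-to-term similarities.
sims = {
    ("coils", "magnetic fields"): 0.9,
    ("coils", "superconductors"): 0.4,
    ("coding theory", "entropy"): 0.8,
}
loose = cluster_terms(sims, threshold=0.3)
tight = cluster_terms(sims, threshold=0.6)
```

With the lower threshold the weak link keeps "superconductors" in the magnetics group; with the higher threshold it splits off into its own cluster.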
18. Remove non-linked MeSH
19. Cluster the term clusters
20. Linearize the term cluster order
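How the cluster order was linearized is not stated in the slides. As one simple possibility, a greedy nearest-neighbor walk places similar clusters next to each other. The function `linearize`, the cluster names, and the similarity values below are hypothetical.

```python
def linearize(clusters, similarity, start):
    """Order clusters greedily: always step to the most similar unvisited cluster."""
    order = [start]
    remaining = set(clusters) - {start}
    while remaining:
        current = order[-1]
        # Ties are broken alphabetically so the result is deterministic.
        nxt = max(sorted(remaining), key=lambda c: similarity(current, c))
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Hypothetical cluster-to-cluster similarities (symmetric).
pair_sim = {
    frozenset({"magnetics", "power"}): 0.7,
    frozenset({"power", "signal"}): 0.5,
    frozenset({"signal", "information theory"}): 0.8,
    frozenset({"magnetics", "signal"}): 0.2,
}


def sim(a, b):
    """Look up a symmetric similarity; unlisted pairs count as unrelated."""
    return pair_sim.get(frozenset({a, b}), 0.0)


order = linearize(["magnetics", "power", "signal", "information theory"], sim, "magnetics")
```

A greedy walk is not guaranteed to give the best overall ordering, but it is easy to inspect and is enough to lay clusters along a line or around a circle.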
21. IEEE corpus distribution over topics
22. USPTO corpus distribution over topics
23. PubMed corpus distribution over topics
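The corpus distributions on the preceding slides can be thought of as the share of each corpus that falls in each term cluster. A minimal sketch, assuming documents are sets of assigned terms and each term belongs to one cluster; the function name, cluster labels, and toy documents are illustrative and not taken from the actual data.

```python
from collections import Counter


def topic_distribution(doc_terms, term_to_cluster):
    """Return {cluster: share of term assignments} for one corpus."""
    counts = Counter(
        term_to_cluster[t]
        for terms in doc_terms
        for t in terms
        if t in term_to_cluster
    )
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()} if total else {}


# Hypothetical term-to-cluster assignment and toy corpora.
term_to_cluster = {
    "coils": "magnetics",
    "magnetic fields": "magnetics",
    "entropy": "information theory",
    "Magnetic Resonance Imaging": "imaging",
}
ieee_docs = [{"coils", "magnetic fields"}, {"entropy"}, {"coils"}]
pubmed_docs = [{"Magnetic Resonance Imaging"}, {"Magnetic Resonance Imaging", "magnetic fields"}]

ieee_dist = topic_distribution(ieee_docs, term_to_cluster)
pubmed_dist = topic_distribution(pubmed_docs, term_to_cluster)
```

Computing the same distribution per corpus, per journal, or per year gives comparable profiles that can be overlaid on one map.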
25. Summary
Term space can be mapped effectively
The mapped space can be used to show distributions
and trends that give answers to questions
» Database distribution comparisons
» Journal / segment distribution comparisons (overlaps)
» Journal / segment trending
» Identify groups of terms that need trimming (rule base changes)
27. IEEE Transactions on Magnetics
Purple – Magnetics heading
Orange – all other
28. [Map legend: Division I, II, III, IV, V, VI, VII, IX, X, Multiple]
31. [Map legend: Division I, II, III, IV, V, VI, VII, IX, X, Multiple]
32. [Map legend: Division I, II, III, IV, V, VI, VII, IX, X, Multiple]
33. [Map legend: Division I, II, III, IV, V, VI, VII, IX, X, Multiple]
34. [Map legend: Division I, II, III, IV, V, VI, VII, IX, X, Multiple]
35. [Map legend: Division I, II, III, IV, V, VI, VII, IX, X, Multiple]
Editor's notes
This one uses the division labels from the IEEE web site to show the data distribution. Purple is IEEE, red is MeSH, blue is DTIC.
Blob plot – 1998 IEEE terms only – size of node relative to number of documents indexing the thesaurus branch below the given term. Colored by IEEE division. Yellow is Division VI – mostly governance and general science/engineering – cross-cutting.
IEEE Transactions on Information Theory
IEEE Transactions on Magnetics
IEEE only – term clusters linearized
Purple – IEEE Transactions on Magnetics; Blue – IEEE Transactions on Information Theory
IEEE only. Circular plot showing all IEEE output. IEEE term clusters from linear plot ordered around circle starting at dot (top in linear) and going counterclockwise.
Purple – IEEE Transactions on Magnetics; Blue – IEEE Transactions on Information Theory
Purple – IEEE Transactions on Magnetics; Blue – IEEE Transactions on Information Theory
IEEE + DTIC (blue) + MeSH (red). Labels indicate positions of key terms and IEEE division numbers.