Visualization for Data Analysis: A New Way to Look at Content
1. Marjorie M.K. Hlava.
President
Access Innovations, Inc.
mhlava@accessinn.com
505-998-0800
March 28th
Virtual Lunch Webinar: Visualization for Data Analysis: A New Way to Look at Content
2. A picture is worth…
thousand words
As librarians we normally look at data in lists, citations, and other
text-based presentations. Increasingly, however, this data can be analyzed,
manipulated, and presented as visual displays. Maps of science, places and
spaces, increased amounts of storage and computing power have made
working with digital assets possible. Presenting the data in new and visual
ways allows us to see trends, changes in research directions, coverage,
demographic trends, data overlap, and the white spaces where data does
not exist on a topic, so that knowledge gaps are exposed. This talk will cover
how the data is prepared and options for visual display content…
100 words x 10 = a thousand words
3. Why take a visual look?
• As librarians we normally look at data in lists, citations, and other
text-based presentations.
• Increasingly, however, this data can be analyzed, manipulated, and
presented as visual displays.
• Maps of science, places and spaces, increased amounts of storage
and computing power have made working with digital assets
possible.
• Presenting the data in new and visual ways allows us to see trends,
changes in research directions, coverage, demographic trends, data
overlap, and the white spaces where data does not exist on a topic,
so that knowledge gaps are exposed.
4. Visualization of data
• Needs
− Measurement
− Metrics
− Numbers
• Is richer with
− Linking
− Semantic enrichment
− Classification
• Shows
− Adjacency
− Relationships
− Trends
− Co-occurrence
− Conceptual distance
− Distribution
• Supports
− Forecasting
− Trend analysis
− Segmentation
5. Man’s attention to
visual display to convey
knowledge is ancient
Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony 5
6. The art in maps
is a
longstanding
tradition
9. Superimposing data is now common
A mashup example
Traffic Injury Map
UK Data Archive
US National Highway Safety Administration
Google Maps base
Accident categories include children, automobile, bicycle, etc.
Data: time, place, type
Source:
JISC TechWatch: Data Mash-ups September 2010
10. The most popular APIs for mashups
a) July 2009 b) October 2009
Source: JISC TechWatch: Data Mash-ups September 2010
Programmable web data
11. Radio4 website
Data source
MapTube
Credit Crunch Mood Map
User Website questionnaire
Crowdsourced visualization and mapping
Early responses Final Credit Crunch Mood Map
Source: JISC TechWatch: Data Mash-ups September 2010
12. Mash up of bird flight migrations and
weather patterns
http://www.youtube.com/watch?v=uPff1t4pXiI&feature=youtu.be
14. The NoiseTube application uses geolocation of SMS-like messages
(such as Twitter) with GPS sensing on mobile devices
Source: JISC TechWatch: Data Mash-ups September 2010
Programmable web data
15. Changes in our lifetime!
It's only the beginning
Source: JISC TechWatch: Data Mash-ups September 2010
Programmable web data
16. Fine, so there are nice visual maps.
What about information from databases and libraries?
17. Start with data – like this XML file
18. Index or tag using subject terms from
thesaurus or taxonomy
date, category, taxonomy term, frequency
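The tagging step on slides 17 and 18 can be sketched roughly as follows. This is only an illustration, not the Data Harmony pipeline: the XML element names (`record`, `date`, `category`, `term`) and the sample records are hypothetical, and the terms are assumed to have been assigned already. The sketch tallies how often each taxonomy term occurs per date and category, giving the four fields listed on the slide.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample data: element names and values are assumptions.
SAMPLE_XML = """
<records>
  <record>
    <date>2009</date>
    <category>journal</category>
    <term>Robotics</term>
    <term>Sensors</term>
  </record>
  <record>
    <date>2009</date>
    <category>proceedings</category>
    <term>Robotics</term>
  </record>
  <record>
    <date>2010</date>
    <category>journal</category>
    <term>Robotics</term>
  </record>
</records>
"""


def term_frequencies(xml_text):
    """Count (date, category, taxonomy term) triples across all records."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for record in root.iter("record"):
        date = record.findtext("date", default="unknown")
        category = record.findtext("category", default="unknown")
        for term in record.findall("term"):
            counts[(date, category, term.text.strip())] += 1
    return counts


rows = term_frequencies(SAMPLE_XML)
for (date, category, term), frequency in sorted(rows.items()):
    print(date, category, term, frequency)
```

Each printed row is one candidate data point for a map: a term, where and when it occurred, and how often.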
19. Many views of one set of data
20. Load to a visualization program
Like Prefuse
21. Or Pajek
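Loading data into Pajek usually means writing a plain-text network file. The sketch below writes the basic Pajek `.net` layout (a `*Vertices` section with 1-based ids and quoted labels, then an `*Edges` section with weights). The term pairs and weights are hypothetical, and `to_pajek` is an illustrative helper name, not part of Pajek.

```python
def to_pajek(edges):
    """Return Pajek .net text for weighted, undirected edges.

    `edges` maps (term_a, term_b) pairs to a co-occurrence weight.
    """
    labels = sorted({term for pair in edges for term in pair})
    # Pajek vertex ids start at 1.
    index = {label: i for i, label in enumerate(labels, start=1)}
    lines = [f"*Vertices {len(labels)}"]
    lines += [f'{index[label]} "{label}"' for label in labels]
    lines.append("*Edges")
    for (a, b), weight in sorted(edges.items()):
        lines.append(f"{index[a]} {index[b]} {weight}")
    return "\n".join(lines) + "\n"


# Hypothetical co-occurrence counts between taxonomy terms.
sample_edges = {("Robotics", "Sensors"): 5, ("Control", "Sensors"): 1}
pajek_text = to_pajek(sample_edges)
print(pajek_text)
```

Saving `pajek_text` to a file with a `.net` extension should give something Pajek can open as a network.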
26. National Information Center for Educational Media (NICEM)
Albuquerque’s own
» Sandia-developed VxInsight
» Access Innovations NICEM
Same data, three views
Primary and secondary education in the US
Shows the US Valley of Science
Little science taught in elementary years
29. Requirements for Visualization
From a society / publisher perspective
» Which topical areas form our core? periphery?
» Where is the coverage dense? thin?
» Which topical areas are most active? least active?
» Which topical areas seem to be emerging?
declining?
» Which topical areas are interrelated? isolated?
» What are the overlaps between journals / segments?
» Where are the potential expansion points?
From a thesaurus perspective
» What terms are too broadly defined?
» How do actual topical relationships differ from the
thesaurus structure?
30. Using visualization to show
From a society / publisher perspective
» Identify Core, Boundary and Cross Border
» Provides Indicators
Activity
Growth
Relatedness
Centrality
» Locates Journal domains
From a thesaurus perspective
» Identifies terms that are too broadly defined
» Potential Improvements in thesaurus structure using topic
structures
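The slide names the indicators but does not define them. As a rough sketch under simple assumed definitions: activity is the number of documents tagged with a term in the current year, growth is the relative change from the previous year, and centrality is the number of distinct terms it co-occurs with. Relatedness is left out of this sketch. The documents below are hypothetical.

```python
from collections import defaultdict

# Hypothetical indexed documents: (publication year, taxonomy terms).
docs = [
    (2009, {"Robotics", "Sensors"}),
    (2009, {"Robotics"}),
    (2010, {"Robotics", "Sensors"}),
    (2010, {"Robotics", "Control"}),
    (2010, {"Robotics"}),
]


def indicators(docs, previous_year, current_year):
    """Return {term: (activity, growth, centrality)} under the assumed definitions."""
    per_year = defaultdict(lambda: defaultdict(int))
    neighbours = defaultdict(set)
    for year, terms in docs:
        for term in terms:
            per_year[term][year] += 1
            neighbours[term] |= terms - {term}
    result = {}
    for term, counts in per_year.items():
        activity = counts[current_year]
        before = counts[previous_year]
        # Growth is undefined when the term did not occur in the previous year.
        growth = (activity - before) / before if before else None
        result[term] = (activity, growth, len(neighbours[term]))
    return result


scores = indicators(docs, 2009, 2010)
for term in sorted(scores):
    print(term, scores[term])
```

A term with no occurrences in the previous year gets `None` for growth, which is one way to flag a possibly emerging topic.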
31. Case Study:
Mapping IEEE thesaurus space
We are interested in an expanded map that
includes adjacencies to the IEEE data
» Expanded term set shows adjacent white space;
opportunities for expansion
Overlaps and edges of the science
» We need comparison data
Learn the directions in the field
» Low occurrence rate in IEEE documents?
» Linkage to terms in IEEE documents?
Where do we find these terms? How can we
add them?
32. The process
Built a rule base to auto-index IEEE content
» “90% accuracy out of the box on journal data”*
» “80% out of the box on proceedings data”*
The overlapping data sets
» Auto indexed 1.2 million Xplore records
» 10 years of US Patent data
» 10 years of Medline
Term sets used
» IEEE thesaurus terms rule base
» Medical Subject Headings (MeSH) (and simple rule base)
» Defense Technical Information Center (DTIC) Thesaurus (and simple rule base)
» Similar level of detail to current IEEE thesaurus terms
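To make "rule base" and "accuracy" concrete, here is a toy sketch. It is not the Data Harmony rule language: the rules are hypothetical phrase matches, and accuracy follows the definition in the editor's notes, the share of suggested terms that a trained indexer would also have chosen.

```python
# Hypothetical rule base: thesaurus term -> phrases that trigger it.
RULES = {
    "Robotics": ["robot", "manipulator"],
    "Sensors": ["sensor"],
    "Antennas": ["antenna"],
}


def auto_index(text, rules=RULES):
    """Suggest every thesaurus term whose trigger phrase occurs in the text."""
    lowered = text.lower()
    return {
        term
        for term, phrases in rules.items()
        if any(phrase in lowered for phrase in phrases)
    }


def accuracy(suggested, human):
    """Share of suggested terms that a trained indexer also assigned."""
    if not suggested:
        return 0.0
    return len(suggested & human) / len(suggested)


abstract = "A robot arm with tactile sensors and a patch antenna."
suggested = auto_index(abstract)
human_terms = {"Robotics", "Sensors"}  # hypothetical indexer judgement
score = accuracy(suggested, human_terms)
print(sorted(suggested), round(score, 2))
```

Here the rule base suggests three terms and the indexer accepts two of them, so the score for this one abstract is two thirds.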
33. Defining expanded term space
1. The data: select related corpus
Diagram labels: IEEE (2k terms, 1.2M documents); 475k patents; PubMed (525k docs); 14k DTIC; 24k MeSH
34. Defining expanded term space
2. Identify related terms
Use the IEEE Thesaurus to index the three collections
Diagram label: IEEE, 2k terms, 1.2M documents
35. Defining expanded term space
2. Identify related terms
Use MeSH and DTIC to also index the three collections
Diagram label: IEEE, 2k terms, 1.2M documents
36. Defining expanded term space
3. Resulting term set
The co-indexed items from the three collections
Diagram label: IEEE, 2k terms, 1.2M documents
37. Defining expanded term space
4. Term:Term Matrix
Where do the articles and their indexing intersect?
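A term:term matrix can be built by counting, for every pair of terms, how many documents carry both. The sketch below uses hypothetical documents, with thesaurus prefixes (`IEEE:`, `MeSH:`, `DTIC:`) added only for readability. It shows the general co-occurrence technique, not the exact matrix used in the study.

```python
from collections import Counter
from itertools import combinations

# Hypothetical co-indexed documents: each set mixes terms from several thesauri.
indexed_docs = [
    {"IEEE:Robotics", "MeSH:Prostheses", "DTIC:Actuators"},
    {"IEEE:Robotics", "DTIC:Actuators"},
    {"IEEE:Sensors", "MeSH:Prostheses"},
]


def term_term_matrix(docs):
    """Return (terms, matrix) where matrix[i][j] counts documents sharing both terms."""
    terms = sorted(set().union(*docs))
    position = {term: i for i, term in enumerate(terms)}
    pair_counts = Counter()
    for doc in docs:
        for a, b in combinations(sorted(doc), 2):
            pair_counts[(a, b)] += 1
    matrix = [[0] * len(terms) for _ in terms]
    for (a, b), n in pair_counts.items():
        matrix[position[a]][position[b]] = n
        matrix[position[b]][position[a]] = n  # keep the matrix symmetric
    return terms, matrix


terms, matrix = term_term_matrix(indexed_docs)
for term, row in zip(terms, matrix):
    print(f"{term:18s}", row)
```

Non-zero cells between an IEEE term and a MeSH or DTIC term are the intersections the slide asks about.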
38. Visualization Strategies
Visualization
Matrix Software
39. All data up-posted to the top level
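Up-posting means crediting each document count to the broader terms above the term that was actually assigned. A minimal sketch, assuming a hypothetical broader-term table and rolling everything up to the top level of the hierarchy:

```python
from collections import Counter

# Hypothetical broader-term links: term -> its parent (top terms have no entry).
BROADER = {
    "Mobile robots": "Robotics",
    "Robotics": "Control systems",
    "Tactile sensors": "Sensors",
}


def top_term(term, broader=BROADER):
    """Follow broader-term links until a top-level term is reached."""
    seen = set()  # guards against accidental cycles in the table
    while term in broader and term not in seen:
        seen.add(term)
        term = broader[term]
    return term


def up_post(term_counts, broader=BROADER):
    """Add each term's count to its top-level ancestor."""
    totals = Counter()
    for term, count in term_counts.items():
        totals[top_term(term, broader)] += count
    return totals


counts = {"Mobile robots": 4, "Robotics": 2, "Tactile sensors": 3, "Sensors": 1}
rolled_up = up_post(counts)
print(dict(rolled_up))
```

Rolling up this way trades detail for a map with far fewer nodes, which is often easier to read at the portfolio level.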
40. Many map options
Previous Experience IEEE Experience
41. IEEE Portfolio
Map of the IEEE societies and councils. Node labels are abbreviated on the slide, for example: Antennas Propag Soc, Solid St Circuits Soc, Info Theory Soc, Commun Soc, Instr Measur Soc, Power & Energy Soc.
42. Radial Visualization
43. Subsidiary radials
Journal of Instrumentation
Radial diagram node labels, abbreviated as on the slide: Compon, Packag …; Dielectr El Insul Soc; Instr Measur Soc; Ultrason, Ferro …; Electromag Compat Soc; Prod Saf Engng Soc; Council Supercond; Magnetics Soc; Sensors Council; Antennas Propag Soc; Nanotech Council; Oceanic Engng Soc; Geosci Rem Sens Soc; Nucl Plasma Sci Soc
44. The research team
Access Innovations / Data Harmony
» Founded in 1978
» Data enrichment and normalization
» Suite of Semantic Enrichment tools
SciTechStrategies
» Understanding data through visualization
IEEE Indexing & Abstracting Group
45. Use a Thesaurus to Label Maps
Map labeled with thesaurus terms, including: Construction, Packaging, Welding, Aircraft, Turbines, Pumps, Radiology, Agriculture, Pharma, Printing, Telecom, Semiconductors, Optics, Lasers, Textiles, Chemicals, Coatings.
46. Questions Answered
Is there a way, using our own information, to forecast our
direction?
Where is the industry headed? What about by technology
sector?
Does our coverage match our mission and vision?
Can we become smarter about our data and potential
markets using our collection in new ways?
Are the societies publishing and talking about what their
charter indicates they cover?
What are the trends – are topics emerging/cooling?
Can we use technology and our own data to explore these
questions while enhancing our data?
47. Conference Strategy
49. We looked at
Visualization of data
Finding the metrics
» Measurement
» Numbers
» Terms as indicators
How to enrich with
» Linking
» Semantic enrichment
» Classification
Ways to show
» Adjacency
» Relationships
» Trends
» Co-occurrence
» Conceptual distance
Maps supporting
» Forecasting
» Trend analysis
» Segmentation
» Distribution
50. Effective maps require
Contextual data
Detailed data
Classification methods
At least two directions in the matrix
A little art for fun
51. Changing the way we
interact with reality
Acrossair’s Augmented reality application – just point your phone at it
Source: JISC TechWatch: Data Mash-ups September 2010
Programmable web data
52. It just takes a little imagination
Thank you
Marjorie M.K. Hlava
President, Access Innovations
505-998-0800
mhlava@accessinn.com
Editor's notes
Access Innovations and its software brand Data Harmony are known for the high caliber of their data: clean, well formed, and accurately semantically enriched. They updated the IEEE thesaurus in 2005 and built a rule base for indexing at the same time. At the launch of the project, the terms applied to IEEE journal content were 90% accurate, meaning that 90% of the suggested terms are what well-trained indexers would choose from a controlled vocabulary; on the more difficult proceedings data the figure was 80%. The rule base has improved since then, and the IEEE production team now only needs to spot-check about 10% of the documents to ensure that a high standard of indexing is maintained. This has allowed IEEE to process many more documents with the same team and has made the process more fun. Because the Data Harmony software removes many of the clerical aspects of indexing, the indexers have time to think about the content, the thesaurus terms, what should be added, and what other information can be collected to keep enriching the files. The accuracy is high enough that we simply indexed the entire contents of the Xplore database, back to the earliest records, in a single overnight process. Then, to explore the edges of science, we also indexed the 1.2 million records using the Medical Subject Headings and the Defense Technical Information Center thesauri, with similar accuracy results.