Pingar researcher Dr Anna Divoli's presentation given at Text Analytics World 2012 in Boston. Content includes discussion of taxonomies and big data.
How taxonomies and facets bring end users closer to big data
1. How taxonomies and facets bring end-users closer to big data
Anna Divoli
@annadivoli
Boston Oct 2012
2. Taxonomies
• τάξις/τάξη + νομία (arrangement/class + method/rule/law)
• hierarchical classification
• formal nomenclature
• varied dimensions
• evaluation/measures/metrics
• types: manually constructed, social, auto-generated
• purposes: auto-indexing, search facilitation, navigation, knowledge management, organization, etc.
• it is OK to change classification systems to adjust to new knowledge, not just to add new concepts
• the data have become “big” and available but not accessible
• many “end users”
3. Types of User Studies
Specialized domain studies:
1. Facets (HCIR): Biomedical Scientists
Anna Divoli and Alyona Medelyan
Search interface feature evaluation in biosciences, HCIR 2011, Google, Mountain View, CA
2. Expert needs (media group)
Studies of preferred UI features:
3. Existing popular systems (EuroHCIR)
Matthew Pike, Max L. Wilson, Anna Divoli and Alyona Medelyan
CUES: Cognitive Usability Evaluation System, EuroHCIR 2012, Nijmegen, Netherlands
4. Mock ups of specific features (survey)
4. Our studies
1. Facets (HCIR): Biomedical Scientists
Anna Divoli and Alyona Medelyan
Search interface feature evaluation in biosciences, HCIR 2011, Google, Mountain View, CA
5. Facets: the favorite feature in search systems
Anna Divoli and Alyona Medelyan, Search interface feature evaluation in
biosciences, HCIR 2011, Google, Mountain View, CA, USA
6. Facets (in search systems)
(Screenshot: facets shown for the example query “animal models huntington disease”)
7. Bio-Facets
(Screenshots: the most liked and least liked facet designs for the example query “animal models huntington disease”)
8. Facets as search features for biomedical scientists: Findings
• Faceted search is the most important stand-alone feature in a search interface for bioscientists.
• A small number of query-oriented facets presented as checkboxes works best.
• Overly simple aesthetics, although not desirable, do not hurt the overall UI score.
• Complex aesthetics turn users away from the systems.
• Bioscientists prefer tools that help them narrow their search, not expand it.
• For generic search: document-based facets. For domain-specific search: query-based facets.
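To make the "narrow, not expand" finding concrete, here is a minimal sketch of checkbox-style facet filtering, assuming a common convention: checked values within one facet are OR-ed, different facets are AND-ed, and facet counts are recomputed over the narrowed result set. The documents, the facet names (`organism`, `pub_type`) and the helpers `apply_checked` and `facet_counts` are hypothetical illustrations, not part of any system tested in the study.

```python
from collections import Counter

# Hypothetical bioscience search results; "organism" and "pub_type" are illustrative facets.
DOCS = [
    {"id": 1, "organism": "mouse", "pub_type": "review"},
    {"id": 2, "organism": "mouse", "pub_type": "article"},
    {"id": 3, "organism": "rat", "pub_type": "article"},
    {"id": 4, "organism": "fly", "pub_type": "review"},
]


def apply_checked(docs, checked):
    """Keep docs matching at least one checked value in every facet that has checks.

    Values within one facet are OR-ed; different facets are AND-ed, so each
    additional facet with a checked box can only shrink (narrow) the results.
    Facets with no checked values are ignored.
    """
    return [
        d for d in docs
        if all(d[facet] in values for facet, values in checked.items() if values)
    ]


def facet_counts(docs, facet):
    """Count how many of the given docs carry each value of a facet."""
    return Counter(d[facet] for d in docs)


if __name__ == "__main__":
    hits = apply_checked(DOCS, {"organism": {"mouse", "rat"}})
    print([d["id"] for d in hits])  # [1, 2, 3]
    print(sorted(facet_counts(hits, "pub_type").items()))  # [('article', 2), ('review', 1)]
```

Because facets are combined with AND, ticking a box in a second facet never returns more documents than before, which matches the preference for tools that narrow a search.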
9. Facets as search feature: likes & dislikes
(Figure: positive (+) and negative (-) participant comments on search features, namely search expansions, faceted refinement, vertical list, related searches and results preview, as implemented in systems including Semedico, PubMed, Solr, Go and Bing. Likes: useful categories, quick paper access, a “reviews” category, simplicity, colors, relevance. Dislikes: slow functionality, too complex/busy, too many colors, limited functionality, poor design, nothing special, not scientific, too small, poor context, too many symbols, colorless.)
10. Our studies
2. Expert needs (media group)
11. Case Study: Media Group
They have a system/“taxonomy” in place that nobody maintains or uses…
~10,000 articles/week, ~5 million in their archives
~21 years, 10,000 authors
A handful of top-level categories
Main reasons/uses:
- Advertising
- Packaging stories and selling them
- Readers finding stories & related stories
- Journalists finding related stories
12. Expert content needs (Case Study: Media Group)
Ideally, update the taxonomy daily/weekly
Must be dynamic & handle new cases/concepts
Deep nesting is OK
If multiple inheritance is used, need to disambiguate which parent a particular article belongs to
Be able to edit (verify in case of anomalies caused by automation, & move nodes around)
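The media group's requirements (deep nesting, multiple inheritance with a way to say where an item belongs, and editing that can move nodes) can be sketched as a small taxonomy structure. This is a minimal illustration under our own assumptions, not Pingar's implementation: the `Taxonomy` class and its methods are hypothetical, and "disambiguation" is modelled here as storing one primary parent per concept, which is only one possible approach.

```python
from collections import defaultdict


class Taxonomy:
    """A concept hierarchy allowing multiple parents (a DAG rather than a strict tree)."""

    def __init__(self):
        self.parents = defaultdict(set)  # concept -> set of parent concepts
        self.primary = {}                # concept -> parent used to disambiguate placement

    def _is_ancestor(self, candidate, node):
        """True if `candidate` is `node` itself or one of its ancestors."""
        stack, seen = [node], set()
        while stack:
            current = stack.pop()
            if current == candidate:
                return True
            if current not in seen:
                seen.add(current)
                stack.extend(self.parents.get(current, ()))
        return False

    def add(self, concept, parent=None, primary=False):
        """Attach a concept under a parent; its first parent becomes primary by default."""
        self.parents[concept]  # register the concept even if it has no parent (a root)
        if parent is None:
            return
        if self._is_ancestor(concept, parent):
            raise ValueError(f"{parent!r} as parent of {concept!r} would create a cycle")
        self.parents[concept].add(parent)
        if primary or concept not in self.primary:
            self.primary[concept] = parent

    def move(self, concept, old_parent, new_parent):
        """Editorial fix: re-attach a concept from one parent to another."""
        if self._is_ancestor(concept, new_parent):
            raise ValueError(f"moving {concept!r} under {new_parent!r} would create a cycle")
        was_primary = self.primary.get(concept) == old_parent
        self.parents[concept].discard(old_parent)
        self.parents[concept].add(new_parent)
        if was_primary or concept not in self.primary:
            self.primary[concept] = new_parent

    def primary_path(self, concept):
        """Root-to-concept path following primary parents (any nesting depth)."""
        path = [concept]
        while path[-1] in self.primary:
            path.append(self.primary[path[-1]])
        return list(reversed(path))


tax = Taxonomy()
tax.add("News")
tax.add("Sport", "News")
tax.add("Business", "News")
tax.add("Sponsorship", "Sport")
tax.add("Sponsorship", "Business")  # second parent: multiple inheritance
print(tax.primary_path("Sponsorship"))  # ['News', 'Sport', 'Sponsorship']
tax.move("Sponsorship", "Sport", "News")
print(tax.primary_path("Sponsorship"))  # ['News', 'Sponsorship']
```

The cycle check runs before any change is made, so a rejected edit leaves the taxonomy untouched; this matters when automatically generated structure is being corrected by hand.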
13. Our studies
3. Existing popular systems (EuroHCIR)
Matthew Pike, Max L. Wilson, Anna Divoli and Alyona Medelyan
CUES: Cognitive Usability Evaluation System, EuroHCIR 2012, Nijmegen, Netherlands
18. Exploring UI features: Systems tested (Yippy, Carrot, MeSH, ESD)
(Figure: screenshots of the tested interfaces, labelled A to F)
19. Exploring UI features (Yippy, Carrot, MeSH, ESD): likes & dislikes
• Menu highlighting
• Hierarchical folder layout
• Expand hierarchy with “+” and “–”
• Dual view (tree on left, results on right)
• Ability to change visualisations of taxonomy
• Search function is important
• Familiar interface with folders
• Too simple or too much text; color would be nice
• Lots of scrolling
• Dots in the Carrot circle are confusing
• Double-clicking on the foam tree is unintuitive
• Taxonomies that are too broad
20. Our studies
4. Mock ups of specific features (survey)
21. Taxonomy UI preferences (ongoing survey): the 51 participants
Age: 25 or younger 27.3%; 26-40 60.0%; 41-60 12.7%; 61 or older 0%
How comfortable are you with computers? Somewhat 5.5%; Very 47.3%; Second nature 47.3%
Highest level of education: High School 3.6%; College/University 52.7%; Graduate School 43.6%
Do you have experience using taxonomies? No 30.9%; Yes, but very little 47.3%; Yes 21.8%
bit.ly/pingar_taxonomies
22. Concept sorting
by popularity (A) 44.2%
alphabetically (B) 42.3%
no preference 13.5%
23. Displaying counts
A 42.3%
B 51.9%
no preference 5.8%
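The two design choices surveyed on slides 22 and 23, ordering concepts by popularity or alphabetically and showing counts or not, can be sketched as follows. The exact layouts of options A and B are not recoverable here, so the "Name (n)" label format, the tie-breaking rule and the helper names `order_concepts` and `render` are assumptions for illustration only.

```python
def order_concepts(counts, by="popularity"):
    """Return (concept, count) pairs in the order a taxonomy UI might list them.

    by="popularity": most frequent first, ties broken alphabetically (assumed rule).
    by="alphabetical": case-insensitive A-Z order.
    """
    if by == "popularity":
        return sorted(counts.items(), key=lambda item: (-item[1], item[0].lower()))
    if by == "alphabetical":
        return sorted(counts.items(), key=lambda item: item[0].lower())
    raise ValueError(f"unknown ordering: {by!r}")


def render(pairs, show_counts=True):
    """Format one label per concept, optionally followed by its document count."""
    return [f"{name} ({n})" if show_counts else name for name, n in pairs]


# Hypothetical document counts per concept.
counts = {"Rat": 40, "Mouse": 120, "Fly": 40}
print(render(order_concepts(counts)))  # ['Mouse (120)', 'Fly (40)', 'Rat (40)']
print(render(order_concepts(counts, "alphabetical"), show_counts=False))  # ['Fly', 'Mouse', 'Rat']
```

Popularity ordering needs a tie-breaker to stay stable between page loads; falling back to alphabetical order is one simple choice.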
24. Using labels
in frames (A) 72.5%
with labels (B) 23.5%
no preference 3.9%
25. Plus/minus signs or arrows
A 47.1%
B 37.3%
no preference 15.7%
26. Search results display
A 13.7%
B 11.8%
C 70.6%
no preference 3.9%
27. Search functionality
partial 74.5%
hidden 64.7%
no preference 2.0%
28. Where we stand
Our team works on automatically generated taxonomies, but we have realized that they need to be customizable to specific needs
29. Taxonomy
“Taxonomy is described sometimes as a science and
sometimes as an art, but really it’s a battleground.”
Bill Bryson, A Short History of Nearly Everything
30. T echnology
A rt
a X iomatic
phil O sophy
desig N
l O gic
hu M anities
lingu I stics
E thnology
S cience
31. Summary
• There is a place for manually, socially and automatically
generated taxonomies (as well as hybrids).
• Text is “big” and, in many fields, dynamic.
• “End-users” (not Information Management experts) need
access to “big text”.
• Auto-generated taxonomies with manual editing facilities are now possible & make sense.
• Domain-specific background knowledge is vital for the quality and detail each solution requires.
• User-friendly systems are very important for end users.
32. Acknowledgements
Alyona Medelyan (Pingar)
Max L. Wilson (Swansea/Nottingham)
Matthew Pike (Swansea/Pingar)
Pingar Brains
pingar.com
All 65+ anonymous study participants!
Editor's Notes
Based on our current knowledge, experience and the results of our user studies, this is the direction our research team is taking.