How taxonomies and facets bring end users closer to big data

Anna Divoli
@annadivoli

Boston Oct 2012

Taxonomies
• τάξις/τάξη + νομία (arrangement/class + method/rule/law)
• hierarchical classification
• formal nomenclature
• varied dimensions
• evaluation/measures/metrics
• types: manually constructed, social, auto-generated
• purposes: auto-indexing, search facilitation, navigation, knowledge management, organization, …
• it is OK to change the classification system itself to adjust to new knowledge, not just to add new concepts
• the data have become “big” and available, but not accessible
• many “end users”

User Studies Types

Specialized domain studies:
1. Facets (HCIR): Biomedical Scientists
Anna Divoli and Alyona Medelyan
Search interface feature evaluation in biosciences, HCIR 2011, Google, Mountain View, CA

2. Expert needs (media group)

UI preferred features studies:
3. Existing popular systems (EuroHCIR)
Matthew Pike, Max L. Wilson, Anna Divoli and Alyona Medelyan
CUES: Cognitive Usability Evaluation System, EuroHCIR 2012, Nijmegen, Netherlands

4. Mock ups of specific features (survey)


Our studies

1. Facets (HCIR): Biomedical Scientists
Anna Divoli and Alyona Medelyan
Search interface feature evaluation in biosciences, HCIR 2011, Google, Mountain View, CA


Facets – favorite feature for search systems

Anna Divoli and Alyona Medelyan, Search interface feature evaluation in
biosciences, HCIR 2011, Google, Mountain View, CA, USA


Facets (in search systems)
Example query: animal models huntington disease

Bio-Facets
[Figure panels: Most liked, Least liked. Example query: animal models huntington disease]

Facets as search features for biomedical scientists: Findings

• Faceted search is the most important standalone feature in a search interface for bioscientists.
• A few query-oriented facets, presented as checkboxes, work best.
• Overly simple aesthetics, although not desirable, do not hurt the overall UI score.
• Complex aesthetics turn users away from the systems.
• Bioscientists prefer tools that help them narrow their search, not expand it.
• For generic search: doc-based facets. For domain-specific search: query-based facets.
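
The narrowing behaviour of checkbox facets can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the mini-corpus, the facet names (`organism`, `type`) and the helper functions are hypothetical and do not come from the study or from any of the evaluated systems. Ticking values within one facet is treated as OR, while different facets combine with AND, so every extra facet narrows the result set.

```python
from collections import Counter

# Hypothetical mini-corpus: each document lists its values per facet.
DOCS = [
    {"id": 1, "facets": {"organism": {"mouse"}, "type": {"review"}}},
    {"id": 2, "facets": {"organism": {"mouse", "rat"}, "type": {"article"}}},
    {"id": 3, "facets": {"organism": {"fly"}, "type": {"article"}}},
]


def facet_counts(docs, facet):
    """Count how many documents carry each value of one facet."""
    counts = Counter()
    for doc in docs:
        counts.update(doc["facets"].get(facet, ()))
    return counts


def narrow(docs, selected):
    """Keep documents that match every facet with ticked checkboxes.

    `selected` maps a facet name to the set of ticked values.
    Facets with no ticked values are ignored.
    """
    active = {facet: values for facet, values in selected.items() if values}
    return [
        doc
        for doc in docs
        if all(doc["facets"].get(facet, set()) & values
               for facet, values in active.items())
    ]


if __name__ == "__main__":
    print(facet_counts(DOCS, "organism"))
    hits = narrow(DOCS, {"organism": {"mouse"}, "type": {"review"}})
    print([doc["id"] for doc in hits])
```

Recomputing `facet_counts` on the narrowed list gives the counts a user would see next to each remaining checkbox.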


Facets as search feature: likes & dislikes

Likes:
• Useful categories
• Simple
• Vertical list

Dislikes:
• Too complex/busy
• Too many colors
• Poor design
• Limited functionality
• Too many symbols
• Not special / colorless

[Figure: per-system participant comments, marked + (positive) or - (negative); labels on the slide include Semedico, PubMed, Solr, “Go” and Bing, alongside the features search expansions, faceted refinement, related searches and results preview]

Our studies

2. Expert needs (media group)


Case Study: Media Group

They have a system/“taxonomy” in place that nobody maintains or uses…

~ 10,000 articles / week, ~5 million in their archives
~ 21 years, 10,000 authors
Handful of top categories

Main reasons/uses:
- Advertisement
- Packaging stories and selling them
- Readers finding stories & related stories
- Journalists finding related stories


Expert content needs - Case Study: Media Group

• Ideally, update the taxonomy daily/weekly
• Must be dynamic and handle new cases/concepts
• Deep nesting is OK
• With multiple inheritance, need to disambiguate where a particular article belongs
• Must be editable (to verify automated results in case of anomalies, and to move nodes around)
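
One way to picture these requirements is a taxonomy in which a concept may have several parents, with the first parent treated as the primary one for disambiguation, and in which an editor can move nodes. The sketch below is hypothetical: the `Taxonomy` class, its method names and the example concepts are assumptions made for illustration, not the media group's system or any Pingar API.

```python
class Taxonomy:
    """Minimal editable taxonomy with multiple inheritance (illustrative only)."""

    def __init__(self):
        # node -> list of parents; the first entry is the primary parent
        self.parents = {}

    def _reaches(self, start, target):
        """Return True if `target` is `start` or one of its ancestors."""
        stack, seen = [start], set()
        while stack:
            current = stack.pop()
            if current == target:
                return True
            if current not in seen:
                seen.add(current)
                stack.extend(self.parents.get(current, []))
        return False

    def add(self, node, parent=None):
        """Add a node, optionally linking it under a parent."""
        self.parents.setdefault(node, [])
        if parent is None:
            return
        self.parents.setdefault(parent, [])
        if self._reaches(parent, node):
            raise ValueError("link would create a cycle")
        if parent not in self.parents[node]:
            self.parents[node].append(parent)

    def move(self, node, old_parent, new_parent):
        """Re-attach `node` from `old_parent` to `new_parent`."""
        links = self.parents[node]
        if old_parent not in links:
            raise ValueError("node is not under old_parent")
        self.parents.setdefault(new_parent, [])
        if self._reaches(new_parent, node):
            raise ValueError("move would create a cycle")
        if new_parent in links:
            links.remove(old_parent)
        else:
            links[links.index(old_parent)] = new_parent

    def primary_path(self, node):
        """Path from the root to `node`, following primary parents only."""
        path = [node]
        while self.parents[path[-1]]:
            path.append(self.parents[path[-1]][0])
        return path[::-1]


if __name__ == "__main__":
    t = Taxonomy()
    t.add("News")
    t.add("Sport", "News")
    t.add("Business", "News")
    t.add("Football", "Sport")
    t.add("Club finances", "Business")   # primary parent
    t.add("Club finances", "Football")   # secondary parent
    print(t.primary_path("Club finances"))
    t.move("Club finances", "Business", "Sport")
    print(t.primary_path("Club finances"))
```

Keeping an ordered parent list is just one possible disambiguation rule; a real system might instead store an explicit "home" category per article.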


Our studies

3. Existing popular systems (EuroHCIR)
Matthew Pike, Max L. Wilson, Anna Divoli and Alyona Medelyan
CUES: Cognitive Usability Evaluation System, EuroHCIR 2012, Nijmegen, Netherlands


Exploring UI features - Systems Tested: Yippy, Carrot, MeSH, ESD


Exploring UI features (Yippy, Carrot, MeSH, ESD): likes & dislikes

Likes:
• Menu highlighting
• Hierarchical folder layout
• Expand hierarchy with “+” and “–”
• Dual view (tree on left, results on right)
• Ability to change visualisations of taxonomy
• Search function is important
• Familiar interface with folders

Dislikes:
• Too simple or too much writing; color would be nice
• Lots of scrolling
• Dots in the Carrot circle are confusing
• Double-click on the foam tree is unintuitive
• Taxonomies that are too broad
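
As a rough illustration of the folder-style hierarchy with “+” and “-” expanders mentioned in the likes, the following Python sketch renders a nested taxonomy as indented text lines. It is not taken from any of the tested systems: the `render_tree` function and the example concepts are hypothetical (only “Huntington disease” echoes the example query used earlier in the deck).

```python
def render_tree(tree, expanded, depth=0):
    """Render a nested dict as indented lines with +/- expanders.

    `tree` maps a concept name to a dict of its children;
    `expanded` is the set of concept names the user has opened.
    """
    lines = []
    for name in sorted(tree):
        children = tree[name]
        if not children:
            marker = " "          # leaf: nothing to expand
        elif name in expanded:
            marker = "-"          # open node: offer collapse
        else:
            marker = "+"          # closed node: offer expand
        lines.append("  " * depth + marker + " " + name)
        if children and name in expanded:
            lines.extend(render_tree(children, expanded, depth + 1))
    return lines


if __name__ == "__main__":
    taxonomy = {
        "Diseases": {"Huntington disease": {}, "Parkinson disease": {}},
        "Organisms": {"Mouse": {}},
    }
    print("\n".join(render_tree(taxonomy, expanded={"Diseases"})))
```

Because only expanded nodes are descended into, collapsed branches cost nothing to render, which helps with the “lots of scrolling” complaint on broad taxonomies.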

Our studies

4. Mock ups of specific features (survey)


Taxonomy UI preferences (ongoing survey):
The (51) participants

Age:
25 or younger 27.3%
26-40 60.0%
41-60 12.7%
61 or older 0%

How comfortable are you with computers?
Somewhat 5.5%
Very 47.3%
Second nature 47.3%

Highest level of education:
High School 3.6%
College/University 52.7%
Graduate School 43.6%

Do you have experience using taxonomies?
No 30.9%
Yes, but very little 47.3%
Yes 21.8%

bit.ly/pingar_taxonomies


Concept sorting
popularity (A) 44.2%
alphabetically (B) 42.3%
no preference 13.5%

Displaying Counts
A 42.3%
B 51.9%
no preference 5.8%
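
The two survey questions above concern how concepts are ordered and whether counts are displayed next to them. The mock-ups shown to participants are not reproduced here, so the Python sketch below only illustrates the general options; the `render_concepts` function, its parameters and the example counts are hypothetical.

```python
def render_concepts(counts, order="popularity", show_counts=True):
    """Return display labels for taxonomy concepts.

    `counts` maps a concept name to its number of matching documents.
    `order` is "popularity" (highest count first) or "alphabetical".
    """
    if order == "popularity":
        # Ties are broken alphabetically so the output is stable.
        items = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0].lower()))
    elif order == "alphabetical":
        items = sorted(counts.items(), key=lambda kv: kv[0].lower())
    else:
        raise ValueError("order must be 'popularity' or 'alphabetical'")
    return [f"{name} ({n})" if show_counts else name for name, n in items]


if __name__ == "__main__":
    example = {"rat": 12, "Drosophila": 5, "mouse": 30}
    print(render_concepts(example, order="popularity"))
    print(render_concepts(example, order="alphabetical", show_counts=False))
```

Since preferences were split almost evenly between the two orderings, exposing `order` as a user setting rather than fixing it seems a reasonable design choice.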

Using Labels
in frames (A) 72.5%
with labels (B) 23.5%
no preference 3.9%

Plus/minus signs or arrows
A 47.1%
B 37.3%
no preference 15.7%

Search Results Display
A 13.7%
B 11.8%
C 70.6%
no preference 3.9%

Search Functionality
partial 74.5%
hidden 64.7%
no preference 2.0%

Where we stand
Our team works on automatically generated taxonomies, but we realized that specific needs call for customization.

Taxonomy

“Taxonomy is described sometimes as a science and
sometimes as an art, but really it’s a battleground.”

Bill Bryson, A Short History of Nearly Everything


Technology
Art
aXiomatic
philOsophy
desigN
lOgic
huManities
linguIstics
Ethnology
Science

Summary

• There is a place for manually, socially and automatically generated taxonomies (as well as hybrids).
• Text is “big” and, in many fields, dynamic.
• “End users” (not Information Management experts) need access to “big text”.
• Auto-generated taxonomies with manual editing facilities are now possible and make sense.
• Domain-specific background knowledge is vital for the quality and detail required per solution.
• User-friendly systems are very important for end users.

Acknowledgements

Alyona Medelyan (Pingar)
Max L. Wilson (Swansea/Nottingham)
Matthew Pike (Swansea/Pingar)

Pingar Brains
pingar.com
All 65+ anonymous study participants!
