Classifying R&D: Why and How Organizations Develop Taxonomies for Research Fields: Jeff Alexander and Patrick Lambe

Classifying R&D
Why and How Organizations Develop
Taxonomies for Research Fields
Jeff Alexander and Patrick Lambe
CASRAI ReConnect15

Agenda
1. Introductions and challenges/burning
questions on deploying R&D taxonomies
2. Use cases for R&D taxonomies
3. Taxonomy best practices
4. Case History: the NSF NCSES experience
5. Implications for you
6. Working with the framework

Introductions
• How do you work with R&D taxonomies now?
• What challenges do you face?

A Sample of R&D Taxonomies
• Field of Science & Engineering (U.S. Office of
Management & Budget)
• Field of Research & Development (OECD
Frascati Manual—now revised!)
• Australia-New Zealand Standard Research
Classification

FOSE (U.S. OMB)
• Standard for R&D statistical collection in the U.S.
• Based on academic disciplines
• Not updated since 1977!
Field (Code)
Physical Sciences
  Astronomy (11)
  Chemistry (12)
  Physics (13)
  Physical sciences, not elsewhere classified (NEC) [1] (19)
Mathematics (21)
Environmental Sciences (Terrestrial and Extraterrestrial)
  Atmospheric sciences (31)
  Geological sciences (32)
  Oceanography (33)
  Environmental sciences, NEC [1] (39)
Engineering
  Aeronautical (41)
  Astronautical (42)
  Chemical (43)
  Civil (44)
  Electrical (45)
  Mechanical (46)
  Metallurgy and materials (47)
  Engineering, NEC [1] (49)
Life Sciences
  Biological (51)
  Clinical medical (52)
  Other medical (53)
  Life sciences, NEC [1] (59)
Psychology
  Biological aspects (61)
  Social aspects (62)
  Psychological sciences, NEC [1] (69)
Social Sciences
  Anthropology (71)
  Economics (72)
  History (73)
  Linguistics (74)
  Political science (75)
  Sociology (76)
  Social sciences, NEC [1] (79)
Other Sciences, NEC [2] (99)
[1] To be used for multidisciplinary projects within the primary field and for single-discipline projects for which a separate discipline code has not been assigned.
[2] To be used for multidisciplinary and interdisciplinary projects which cannot be classified within a primary field.
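The code list above can be read as a small lookup structure. The following is a minimal Python sketch, assuming (from the table only, not from any OMB specification) that the first digit of each code identifies the broad field and that a trailing 9 marks a "not elsewhere classified" bucket. The names `FOSE_FIELDS`, `BROAD_FIELDS`, `broad_field`, and `is_nec` are illustrative.

```python
# Codes transcribed from the FOSE table above.
FOSE_FIELDS = {
    11: "Astronomy", 12: "Chemistry", 13: "Physics",
    19: "Physical sciences, NEC",
    21: "Mathematics",
    31: "Atmospheric sciences", 32: "Geological sciences",
    33: "Oceanography", 39: "Environmental sciences, NEC",
    41: "Aeronautical", 42: "Astronautical", 43: "Chemical", 44: "Civil",
    45: "Electrical", 46: "Mechanical", 47: "Metallurgy and materials",
    49: "Engineering, NEC",
    51: "Biological", 52: "Clinical medical", 53: "Other medical",
    59: "Life sciences, NEC",
    61: "Biological aspects", 62: "Social aspects",
    69: "Psychological sciences, NEC",
    71: "Anthropology", 72: "Economics", 73: "History", 74: "Linguistics",
    75: "Political science", 76: "Sociology", 79: "Social sciences, NEC",
    99: "Other Sciences, NEC",
}

# Assumption: the tens digit encodes the broad field (observed in the table).
BROAD_FIELDS = {
    1: "Physical Sciences", 2: "Mathematics", 3: "Environmental Sciences",
    4: "Engineering", 5: "Life Sciences", 6: "Psychology",
    7: "Social Sciences", 9: "Other Sciences, NEC",
}


def broad_field(code: int) -> str:
    """Return the broad field implied by the first digit of a FOSE code."""
    return BROAD_FIELDS[code // 10]


def is_nec(code: int) -> bool:
    """Return True for the catch-all 'not elsewhere classified' codes."""
    return code % 10 == 9
```

With this sketch, `broad_field(45)` gives "Engineering", and `is_nec(59)` flags the life sciences catch-all. The flat two-level shape also shows why interdisciplinary work tends to land in the NEC codes.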

FORD (OECD)
• Allows broad comparisons between countries
• Combines disciplines and some economic priorities
• Includes humanities & arts

ANZSRC
• Describes all research as a vector of field of research, type of activity, & socio-economic objective
• Complex to develop & implement
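One way to picture the "vector" idea is as a record with one slot per dimension. The sketch below is a hedged Python illustration: the class name `ResearchVector` and all sample values are placeholders chosen for readability, not actual ANZSRC codes or labels.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ResearchVector:
    """One research activity described along three independent dimensions."""
    field_of_research: str
    type_of_activity: str
    socio_economic_objective: str


# Placeholder values for illustration only (not real ANZSRC entries).
projects = [
    ResearchVector("Oceanography", "Basic research", "Environment"),
    ResearchVector("Oceanography", "Applied research", "Defence"),
    ResearchVector("Civil engineering", "Applied research", "Environment"),
]

# The same projects can be tallied along any one dimension.
by_objective = Counter(p.socio_economic_objective for p in projects)
by_field = Counter(p.field_of_research for p in projects)
```

Because each dimension is recorded separately, the same portfolio can be summarized by objective or by field without reclassifying anything. The cost is that every project needs three classification decisions, which is part of why the scheme is complex to implement.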

Use Cases for R&D taxonomies
Task: R&D statistical data
• Users: Policy analysts, Researchers
• Needs: Benchmarking, Trend analysis
• Challenges: Comparability, Consistency
Task: Workforce planning
• Users: Managers, Policy analysts, Social scientists
• Needs: Training needs, Collaborations
• Challenges: Linking to education & occupations, Interdisciplinarity
Task: Research data discovery
• Users: Researchers, Managers
• Needs: Visibility, Domain-based search
• Challenges: Consistency, Interoperability, Descriptive metadata
Task: Portfolio analysis & mapping
• Users: Managers, R&D executives, Policy analysts
• Needs: Planning/foresight, Granularity, Gaps/redundancies
• Challenges: Consistency, Emergence, Interdisciplinarity

Findings from Use Cases
• Classifications will not be adopted if they have
no operational value to the organization
• Selection of the test use case is very important
• Value comes from linking classifications to
broader goals or outcomes

Taxonomy Best Practices
• Characteristics of well-functioning taxonomies
• Taxonomies as worldviews - salience
• The role of facets in modern taxonomy
development
• Taxonomies, ontologies, and knowledge
graphs—the issue of salience
• Taxonomies need to be tested – consultation
is not testing

Pursuing Science
Three essentials for the
conduct of science:
• Consistent ways of describing
science
• Conspectus across scientific
domains
• Continuity between science
memory, science activity, and
new knowledge creation
“The ability to achieve innovation in a competitive global information society
hinges on the capability to swiftly and reliably find, understand, share, and
apply complex information from widely distributed sources for discovery,
progress, and productivity.” (Interagency Working Group on Digital Data,
2009).

Taxonomies & Conduct of Science
Taxonomies:
• Standardize language
• Structure shows connections and relationships
• Enable sensemaking

Taxonomies in Science
Linnaeus
• Rules for lexical stabilization enabled
coordination
• Meaningful structure prefigured
evolutionary theory
Mendeleev
• Sensemaking structure - periodic
table prompted hypotheses about
gaps and relationships between
elements, supported new discoveries

Orders of Complexity in Taxonomies of Science
1. Controlled vocabularies
2. Taxonomies
3. Ontologies
4. Mechanisms for collecting emerging language, e.g. folksonomies, topic maps
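As a rough illustration of the fourth order, the Python sketch below surfaces free-text tags that recur across items but are missing from a controlled vocabulary. The function name `emerging_terms`, the `min_count` threshold, and the sample tags are all assumptions made for this example. It is one simple way to harvest emerging language, not a method described in the talk.

```python
from collections import Counter


def emerging_terms(tagged_items, controlled_vocabulary, min_count=2):
    """List (tag, count) pairs for recurring tags absent from the vocabulary.

    tagged_items: iterable of tag lists, one list per item (e.g. per award).
    Tags are compared case-insensitively and counted once per item.
    """
    vocabulary = {term.strip().lower() for term in controlled_vocabulary}
    counts = Counter()
    for tags in tagged_items:
        counts.update({tag.strip().lower() for tag in tags})
    return [
        (tag, n)
        for tag, n in counts.most_common()
        if tag not in vocabulary and n >= min_count
    ]


# Hypothetical user-supplied tags on four items.
sample_tags = [
    ["Physics", "quantum sensing"],
    ["CRISPR", "Chemistry"],
    ["crispr", "Quantum sensing", "Physics"],
    ["CRISPR", "one-off tag"],
]
candidates = emerging_terms(sample_tags, ["Physics", "Chemistry"])
```

Here `candidates` holds the two recurring tags that the controlled vocabulary lacks. A taxonomy manager could review such a list periodically as input to vocabulary updates.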

Relevance vs Salience
Relevance
Retrieved results meet the
need expressed in the query.
Retrospective, against defined
needs.
Supports finding.
Salience
Having the quality of standing
out as especially significant.
Has continuing utility, meets as-
yet unexpressed needs.
Supports discovery.

Facets
Carolus Linnaeus vs Georges-Louis Leclerc, Comte de Buffon

Facets in R&D
• Disciplines
• Technologies
• Types of R&D
• Social and Economic
Objectives
• Career/Job roles
• Research fields
• Phenomena
• Methods
• Theories
• Facets are orthogonal; each expresses one attribute of the target phenomena
• Facets express salient attributes of the landscape
– different audiences have different saliences
• Used collectively they create composite characterizations of the target phenomena
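The composite characterization described above can be sketched as a record that carries a set of values under each facet, with queries that combine facets freely. This is a minimal Python illustration under stated assumptions: the facet keys, award names, and values in `PORTFOLIO` are hypothetical, and `matches` and `find` are names introduced for the example.

```python
# Hypothetical portfolio: each record lists its values under several facets.
PORTFOLIO = {
    "award-A": {
        "discipline": {"Chemistry"},
        "technology": {"Batteries"},
        "type_of_rd": {"Applied"},
    },
    "award-B": {
        "discipline": {"Chemistry", "Physics"},
        "technology": {"Sensors"},
        "type_of_rd": {"Basic"},
    },
    "award-C": {
        "discipline": {"Economics"},
        "technology": set(),
        "type_of_rd": {"Applied"},
    },
}


def matches(record, **wanted):
    """True if the record carries every requested facet value."""
    return all(
        value in record.get(facet, set()) for facet, value in wanted.items()
    )


def find(portfolio, **wanted):
    """Return the sorted names of records matching all requested facets."""
    return sorted(
        name for name, record in portfolio.items() if matches(record, **wanted)
    )
```

For example, `find(PORTFOLIO, discipline="Chemistry", type_of_rd="Applied")` narrows by two facets at once. Because the facets are independent, a new one (say, socio-economic objective) can be added to the records without restructuring the existing ones.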

Taxonomy vs Ontology vs Knowledge Graph
Taxonomy
• BT/NT/RT relationships
• Hierarchical
• Visual navigation
• Human friendly
• Built for known need
Thesaurus
• BT/NT/RT/USE/UF relationships
• Dictionary navigation
Ontology
• Maps unlimited types of
relationship between
concepts
• Difficult to build and
maintain - especially for
fluid, imprecise
environments
• Machine friendly not human
friendly
• Built for multiple potential
needs
Knowledge Graph
• Maps salient entities and
relationships
• Can map to content as
well as concepts
• Highly scalable, flexible,
evidence-based, built for
known need
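For readers less familiar with the abbreviations: BT, NT, and RT stand for broader, narrower, and related term, while USE/UF ("use" / "used for") link a non-preferred term to its preferred form. The Python sketch below shows one minimal way to hold these relationships, assuming each term has at most one broader term. The class `Thesaurus` and its sample entries ("Marine science" as a non-preferred synonym, the RT link) are hypothetical.

```python
from collections import defaultdict


class Thesaurus:
    """Minimal store for BT/NT/RT and USE/UF relationships."""

    def __init__(self):
        self.broader = {}                  # term -> its broader term (BT)
        self.narrower = defaultdict(set)   # term -> its narrower terms (NT)
        self.related = defaultdict(set)    # term -> related terms (RT)
        self.use = {}                      # non-preferred -> preferred (USE)

    def add_bt(self, term, broader_term):
        # Recording a BT also records the reciprocal NT.
        self.broader[term] = broader_term
        self.narrower[broader_term].add(term)

    def add_rt(self, a, b):
        # RT is symmetric.
        self.related[a].add(b)
        self.related[b].add(a)

    def add_use(self, non_preferred, preferred):
        self.use[non_preferred] = preferred

    def preferred(self, term):
        """Resolve a USE reference; preferred terms map to themselves."""
        return self.use.get(term, term)

    def used_for(self, preferred):
        """UF: the non-preferred terms that point at this preferred term."""
        return sorted(n for n, p in self.use.items() if p == preferred)

    def ancestors(self, term):
        """Walk BT links upward from the preferred form of the term."""
        term = self.preferred(term)
        path = []
        while term in self.broader:
            term = self.broader[term]
            path.append(term)
        return path


t = Thesaurus()
t.add_bt("Oceanography", "Environmental Sciences")
t.add_rt("Oceanography", "Atmospheric sciences")
t.add_use("Marine science", "Oceanography")
```

A plain taxonomy needs only the `broader` and `narrower` maps. The thesaurus adds the `use` map for dictionary-style lookup. An ontology would replace these few fixed relationship types with an open-ended set, which is where the build and maintenance cost noted above comes from.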

Evidence-Based?
• Taxonomies are supposed to serve objectives
• Success or failure needs testing
• Taxonomies are cognitive representations
• Taxonomies need to be cognitively tested

The NSF NCSES Experience
• The initial problem statement
• Phase 1: Taxonomy Policy Development
• Phase 2: Taxonomy Management Capacity
• Phase 3: Scoping the CORDA

NCSES Background
NCSES is the Federal Statistical Agency
tasked with data collection & analysis
regarding the U.S. science & engineering
enterprise
NCSES administers 12 (and increasing)
major national surveys of science &
engineering activities
New mandate to not only measure S&E
activities but also connect to policy goals
& impact (e.g., national competitiveness)
Initial interest in revising the FOSE
H. R. 5116—26
SEC. 505. NATIONAL CENTER FOR SCIENCE AND ENGINEERING
STATISTICS.
(a) ESTABLISHMENT .—There is established within the
Foundation a National Center for Science and Engineering
Statistics that shall serve as a central Federal clearinghouse
for the collection, interpretation, analysis, and dissemination
of objective data on science, engineering, technology, and
research and development.
(b) DUTIES .—In carrying out subsection (a) of this section, the
Director, acting through the Center shall—
(1) collect, acquire, analyze, report, and disseminate
statistical data related to the science and engineering
enterprise in the United States and other nations that is
relevant and useful to practitioners, researchers,
policymakers, and the public, including statistical data
on—
(A) research and development trends;
(B) the science and engineering workforce;
(C) United States competitiveness in science,
engineering, technology, and research and
development;
and
(D) the condition and progress of United States
STEM education;
(2) support research using the data it collects, and on
methodologies in areas related to the work of the Center;
and
(3) support the education and training of researchers in
the use of large-scale, nationally representative data sets.

NCSES Background
• Is the FOSE taxonomy the
problem?
– Multiple attempts to address
– National Research Council directed
more attention at taxonomy revision
• Interviews showed strong
resistance to a single standard

Scoping the CORDA
(Classification of R&D Activities)
• Identify measurement questions (lit review)
– Map to existing related taxonomies
– Interviews to ID use cases
• Expert workshops on interdisciplinarity &
measurement approaches
– Identify & evaluate candidate facets
– Develop measurement framework
• Experiment with text analytics
– Topic co-clustering vs. topic modeling

CORDA Constraints & Requirements
• Emerging fields
• Interdisciplinary topics
• Heterogeneous operational needs
• Reducing burden
• External taxonomies (esp. international)
• Output & outcome assessment
• Level of analysis
• Accountability & transparency
• Massively heterogeneous data landscape
• Enabling sense-making

CORDA: Potential Facets
• Discipline (Dept. of Education)
• Technology (USPTO/IPO)
• Research typology (BAD framework)
• Science applications (broad questions)
• Socio-economic objective (SEO) (OECD,
others)
• Fields of research (various)

Machine-Generated Classification Experiment
Classification by Scientific Discipline
• External taxonomy: Classification of Instructional Programs (CIP) (NCES)
• Validation term set: NSF funding organization
• Data set with validation: 278,000 awards
• Key caveats:
– Combine awarding division & program to derive discipline
– CIP is an instructional classification, not a research classification
Classification by Socio-Economic Objective (SEO)
• External taxonomy: Nomenclature for Analysis & Comparison of Scientific Programmes & Budgets (OECD) + Australia-New Zealand Standard Research Classification SEO facet
• Validation term set: Field of application (subset)
• Data set with validation: 143,536 awards
• Key caveats:
– Field of application terms are NOT standardized, and usage is inconsistent across awards
– SEO termsets were very sparse
Topic Modeling (NIH example)
http://www.nature.com/nmeth/journal/v8/n6/full/nmeth.1619.html

CORDA “Toolkit”
• Set of relevant facets
– Field of research (discipline-based)
– Research typology
– Technology
– Socio-economic activity
• Language models & documentation for
machine-based classification
• Usage guidance (management)
• Set of data standards

Lessons Learned
• Toolkit model is better than a single mandated standard
– Policy/processes (policy & management guidelines)
– Infrastructure (TMS, machine learning tools)
– Culture (change behaviors & expectations)
• Harmonization is not normalization
– We moved NCSES towards harmonizing without normalizing everything
– Empowered users to be more deliberate in taxonomy choices, and to
collaborate across stovepipes
– Accommodated specialized needs of specific audiences/users
• Describe linkages, relationships, and differences
– Moving from defined facets to knowledge graphs
– Use graphs to trace salient pathways
– Interoperability/mapping is preferable to enforcing uniformity
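The preference for mapping over uniformity can be illustrated with a crosswalk: each group keeps its own scheme, and a shared table records how codes correspond. The Python sketch below assumes a many-to-many mapping from FOSE-style two-digit codes to a hypothetical local scheme. The `CROSSWALK` entries and target labels are invented for the example.

```python
# Hypothetical crosswalk: source code -> set of labels in a local scheme.
CROSSWALK = {
    "31": {"Earth & climate"},
    "32": {"Earth & climate"},
    "33": {"Earth & climate", "Marine"},
    "45": {"Engineering"},
}


def translate(codes, crosswalk):
    """Map source codes to target labels and report codes with no mapping.

    Returns (sorted target labels, unmapped source codes in input order).
    """
    mapped = set()
    unmapped = []
    for code in codes:
        targets = crosswalk.get(code)
        if targets:
            mapped |= targets
        else:
            unmapped.append(code)
    return sorted(mapped), unmapped


labels, missing = translate(["33", "45", "99"], CROSSWALK)
```

Returning the unmapped codes explicitly keeps the differences between schemes visible instead of forcing every item into the target taxonomy, in keeping with harmonizing without normalizing.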

Implications
• Taxonomy as product vs taxonomy as process
• Standardization of process and infrastructure, not
just standardization of terms
• Level of analysis: what are you classifying and for
what purpose?
• Standards: what are we standardizing?
Standardization as normalization or
harmonization?
• Can we derive “best practices” for successful R&D
taxonomy development?

Discussion
• How could these elements apply to your
challenges?
– Inventory
– Prioritization/focus/purpose
– Policy/guidelines
– Faceted approach
– Taxonomy management/sharing systems

Thank You!
Patrick Lambe
Partner
Straits Knowledge
Tel: +65 62210383
plambe@straitsknowledge.com
Jeffrey Alexander
Associate Director
Center for Science, Technology
& Economic Development
SRI International
Tel +1 703 247 8621
jeffrey.alexander@sri.com
