2. Why do we need to ‘govern’ a taxonomy?
✤ Technologies change
✤ R&D advances
✤ Term meanings evolve
✤ Processes are refined
✤ People come and go
3. What are you governing?
✤ The labels for a concept
✤ “PI” can mean many things; here, PI = Package Insert
✤ The definitions for a concept
✤ “Healthcare professional” means what, exactly?
✤ The relationships among concepts
✤ Drug x is indicated for Breast Cancer, NSCLC, HRPC, etc.
✤ The data elements required for metrics and analysis
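A minimal sketch of what such governed concepts might look like as records (the field names and the "Drug X" entry are illustrative, not a specific standard; the labels, definitions and relationships are the three aspects from this slide):

```python
# Minimal sketch of governed concept records (hypothetical structure).
# Each record carries the three governed aspects: labels, a definition,
# and relationships among concepts.

concepts = {
    "package-insert": {
        "pref_label": "Package Insert",
        "alt_labels": ["PI"],             # "PI" alone is ambiguous
        "definition": "The approved labeling document for a drug.",
        "relationships": {},
    },
    "drug-x": {
        "pref_label": "Drug X",           # illustrative product
        "alt_labels": [],
        "definition": "An illustrative oncology product.",
        # the relationships among concepts, per the slide:
        "relationships": {"indicated_for": ["Breast Cancer", "NSCLC", "HRPC"]},
    },
}

def labels(concept_id):
    """All labels for a concept, preferred label first."""
    c = concepts[concept_id]
    return [c["pref_label"]] + c["alt_labels"]

print(labels("package-insert"))  # ['Package Insert', 'PI']
```

Governing the model then means deciding who may change each of these fields, and how.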
4. What are you governing?
✤ The files/MIME-types that represent these models
✤ Excel? XML? RDBMS? Text?
✤ The place of publication / persistence
✤ The choice of format for the models
✤ RDF/S
✤ OWL
✤ SKOS
5. Who should be responsible?
✤ A classification expert (taxonomist or ontologist)
✤ A content management expert
✤ Subject matter experts (SMEs)
✤ Representation from the business groups and IT/IS
6. What is the workflow?
1. Business & Technology Inputs
2. Environmental Scan
3. Design
4. Develop
5. Test
6. Improve
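The six steps form a cycle, not a one-way pipeline; a trivial loop sketch (illustrative only) makes that explicit:

```python
# The six-step governance cycle as a simple loop (illustrative only).
STEPS = ["Business & Technology Inputs", "Environmental Scan",
         "Design", "Develop", "Test", "Improve"]

def governance_cycle(rounds=1):
    """Walk the cycle `rounds` times; after Improve, start again."""
    history = []
    for _ in range(rounds):
        history.extend(STEPS)
    return history

print(governance_cycle(2)[6])  # prints 'Business & Technology Inputs'
```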
7. Business & Technology Inputs
✤ What is the problem to be solved?
✤ What questions are you trying to answer?
✤ What technology platforms are available/allowed?
✤ What kinds of integration are required?
✤ What human bandwidth, capital and ad hoc outlays are available?
8. Environmental Scan
✤ New products or processes?
✤ New markets or audiences?
✤ New competitors, partners or business units?
✤ New research or technologies to model?
✤ Changes to other taxonomies, metadata schema or databases?
✤ Requests for changes or additions from users?
✤ Analysis results that warrant a change?
9. Design
✤ Consider the data elements required to answer the questions at hand.
✤ Does the existing schema design still meet needs? Or are new
elements needed?
✤ Are changes needed to existing data elements?
✤ Are new terms needed for the taxonomies?
10. Develop
✤ Use the “CRUD” methodology to keep the machine files up to date
✤ Create - add new terms, with appropriate labels, definitions and
relationships
✤ Read - use the terms, test them in QA and real-world applications
✤ Update - refine any terms, labels, definitions or relationships that
require changes
✤ Delete - terms, labels, definitions or relationships that are incorrect
or dilute the value of the model
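The four CRUD operations can be sketched against a tiny in-memory term store (hypothetical helpers; a real system would sit on a metadata management platform):

```python
# Sketch of the CRUD cycle over a tiny in-memory term store
# (hypothetical helpers, not a real metadata platform API).

terms = {}

def create(term, definition, relationships=()):
    # Create: add a new term with its definition and relationships
    terms[term] = {"definition": definition,
                   "relationships": list(relationships)}

def read(term):
    # Read: use the term, e.g. in QA or a real-world application
    return terms.get(term)

def update(term, definition=None):
    # Update: refine a definition that requires changes
    if definition is not None:
        terms[term]["definition"] = definition

def delete(term):
    # Delete: drop terms that are incorrect or dilute the model's value
    terms.pop(term, None)

create("Healthcare professional", "A provider of clinical care.")
update("Healthcare professional",
       definition="A person licensed to deliver clinical care.")
assert read("Healthcare professional")["definition"].startswith("A person")
delete("Healthcare professional")
assert read("Healthcare professional") is None
```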
11. Test
✤ Test changes before committing them to production!
✤ Use a data set identical to the production set
✤ See how new relationships impact models, reasoning, inferencing
✤ Ask your normal questions, scan for ‘inappropriate’ answers.
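One way to scan for such answers is a simple regression diff: run the usual questions against a copy of the production data under the changed model and compare with today's answers. A sketch (the query helper, questions and drug names are stand-ins):

```python
# Sketch of regression-testing a model change (hypothetical stand-ins):
# run the usual questions against a copy of the production data under
# the changed model, then diff against today's production answers.

def run_queries(model, queries):
    # Stand-in for querying the real store / reasoner with a given model
    return {q: model.get(q, []) for q in queries}

production_model = {"What is indicated for NSCLC?": ["Drug X"]}
candidate_model  = {"What is indicated for NSCLC?": ["Drug X", "Drug Y"]}

queries = ["What is indicated for NSCLC?"]
before = run_queries(production_model, queries)
after = run_queries(candidate_model, queries)

# Every changed answer is a candidate 'inappropriate' result to review
changed = {q for q in queries if before[q] != after[q]}
print(sorted(changed))
```

Changed answers are not necessarily wrong, but each one deserves a human look before the change is committed.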
12. Improve
✤ Fix anything found in testing that needs fixing
✤ Consider the next challenge - the inputs and environmental scan
START AGAIN!
13. When does governance occur?
✤ Ideally, from the beginning of the project
✤ Typically, from the publication of version 1
✤ Regular meetings are critical
✤ Weekly to begin; depending on scale, meetings can generally move to monthly as the process and models mature
✤ New and updated systems come online
✤ New products or processes are live
✤ M&A activity occurs
14. Metadata Management System
[Architecture diagram: content is created in CMS/DAM systems governed by permissions, and classified by a tagging layer - entity extraction, rules-based coding, and inference/reasoning, performed by human, machine/NLP or hybrid workflows. After CMS/DAM QA and processing, assets are stored in a repository with its own permissions and a rules-driven query layer, then packaged by content vertical and delivered via multimedia, mobile, knowledge base (Kbase) and print channels.]
15. Recommendations
✤ Source a classification expert
✤ Identify internal content process and subject matter experts
✤ Identify a metadata management system as a mid-term need, begin planning
for requirements gathering, documenting and RFI/Q development
✤ Select an open source toolset for use until a formal metadata system can be
stood up and integrated
✤ e.g. TopBraid Composer (Free Edition), Knoodl, Protégé
✤ Schedule metadata governance meetings; bi-weekly to start, then reduce to monthly as the process is learned
Content - text, images, audio, video - is created in the appropriate system. Those asset management systems are fed metadata schema and taxonomies as needed to classify the assets.
Then the content objects are sent to the processing platform, which also uses information from the metadata management system to tag, entity-extract, rules-based code and infer/reason on the objects to improve the classification.
The assets are vetted via a Quality Assurance process and then stored in the repository.
Business rules which take into account the delivery channel and content vertical (e.g. package insert via print vs. case study via the web) use the appropriate queries and permissions to retrieve content from the repository and deliver it.