5. HOW DO WE NAVIGATE TODAY’S INFORMATION
CHALLENGES?
6. CONFRONTING TODAY’S INFORMATION MANAGEMENT CHALLENGES
90% of the data and information we have
today was created just in the past two
years.
Most organizations are built to organize and
manage data and information by type and
department or business function.
Over 80% of the content and information
we work with is unstructured.
AI is set to be the key source of
transformation, disruption, and competitive
advantage in today’s fast-changing economy,
contributing to 45% of total economic gains
by 2030.
7. KNOWLEDGE ORGANIZATION CONTINUUM
FOLKSONOMY
Free-text tags.
CONTROLLED LIST
List of pre-defined terms.
Improves consistency.
TAXONOMY
Pre-defined terms & synonyms.
Hierarchical relationships.
Improves consistency.
Allows for parent/child content
relationships.
ONTOLOGY
Scope notes.
Predefined classes & properties.
Expanded relationship types.
Increased expressiveness.
Semantics. Inference.
KNOWLEDGE GRAPH
Capture related data. Integration of
structured and unstructured
information. Linked data store.
Architecture and data models to
enable machine learning (ML) and
other AI capabilities. Drive efficient
and intelligent data and information
management solutions.
9. BUSINESS TAXONOMY
tax·on·o·my (tăk-sŏn′ə-mē)
n. pl. tax·on·o·mies
1. The classification of organisms in an
ordered system that indicates natural
relationships.
2. The science, laws, or principles of
classification; systematics.
3. Division into ordered groups or categories:
"Scholars have been laboring to develop a
taxonomy of young killers" (Aric Press).
EK’s Definition of Taxonomy
Controlled vocabularies used to describe or characterize
explicit concepts of information, for purposes of capture,
management, and presentation.
10. BUSINESS ONTOLOGY
A defined data model that describes structured
and unstructured information through:
• entities,
• their properties,
• and the way they relate to one another.
• Ontology is about things, not strings.
• Ontologies model your domain in a machine
and human understandable format.
• Ontologies provide context.
• Effective ontologies require a deep
understanding of the knowledge domain.
11. ONTOLOGY VS. KNOWLEDGE GRAPH
[Figure labels: ontology, individual]
https://enterprise-knowledge.com/whats-the-difference-between-an-ontology-and-a-knowledge-graph/
12. GRAPH DATABASE
▪ Consists of triples
▪ concept → relationship → concept
▪ A linked data store that organizes structured
and unstructured information through:
▪ entities,
▪ their properties,
▪ and relationships.
▪ Also known as:
▪ Linked Data Store (LD Store)
▪ Triple Store
▪ “Knowledge Graph”
Subject      Predicate    Object
Project A    hasTitle     Title A
Person B     isPMOn       Project A
Document C   isAbout      Topic D
Document C   isAbout      Topic F
Person B     isExpertIn   Topic D
…            …            …
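The subject, predicate, object rows above can be sketched in a few lines of Python. This is only an illustration of the triple idea, not how any particular triple store is implemented; the `Triple` type and the `match` helper are hypothetical names introduced here, and predicate casing is normalized to lowerCamelCase.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


# The example rows from the slide (predicate casing normalized).
TRIPLES = [
    Triple("Project A", "hasTitle", "Title A"),
    Triple("Person B", "isPMOn", "Project A"),
    Triple("Document C", "isAbout", "Topic D"),
    Triple("Document C", "isAbout", "Topic F"),
    Triple("Person B", "isExpertIn", "Topic D"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that match every field that is not None.

    None acts as a wildcard, similar in spirit to a variable in a
    graph query pattern.
    """
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.object == obj)
    ]


# What is Document C about?
topics_of_c = [t.object for t in match(TRIPLES, subject="Document C", predicate="isAbout")]

# Who is an expert in Topic D?
experts_on_d = [t.subject for t in match(TRIPLES, predicate="isExpertIn", obj="Topic D")]
```

A production graph database would answer the same kind of pattern with a query language rather than a list scan; the point here is only that every fact has the same three-part shape.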
13. KNOWLEDGE GRAPH
• A knowledge graph is a specialized graph, or network, of the
things we want to describe and how they are related.
• It is a semantic model, since we want to capture and generate
meaning with the model.
“The application of graph processing and graph DBMSs will grow at 100
percent annually through 2022 to continuously accelerate data preparation
and enable more complex and adaptive data science.”
– Gartner’s Top 10 Data and Analytics Technology Trends for 2019
Google’s knowledge graph is a popular
use case
14. HOW IT ALL FITS TOGETHER
[Diagram components: Content & Data Sources, Business Taxonomy,
Business Ontology, Triple Store, Enterprise Knowledge Graph]
Subject      Predicate    Object
Person B     isPMOn       Project A
Document C   isAbout      Topic D
Document C   isAbout      Topic F
Person B     isExpertIn   Topic D
Graph nodes: Person B, Project A, Document C, Person F, Topic D, Topic E
16. GETTING STARTED
[Same diagram as slide 14: Content & Data Sources, Business Taxonomy,
Business Ontology, Triple Store, Enterprise Knowledge Graph]
17. CONNECTING CONTENT
▪ Ontology provides the relationships
▪ Taxonomy provides the values
▪ Enables data analysis on all content
[Diagram labels: Taxonomy, Ontology, Content, Tag]
18. AUTO-TAGGING
Bess Schrader is a
consultant in the data and
information management
practice at Enterprise
Knowledge, a consulting firm
delivering knowledge and
information management
solutions in an agile manner.
Schrader focuses on semantic
technologies, including
taxonomies, ontologies, and
knowledge graphs.
[Diagram labels: Taxonomy, Content, Tag]
19. AUTO-TAGGING - CONNECTING TAGS
Bess Schrader Bio
Bess Schrader is a
consultant in the data and
information management
practice at Enterprise
Knowledge, a consulting firm
delivering knowledge and
information management
solutions in an agile manner.
Schrader focuses on semantic
technologies, including
taxonomies, ontologies, and
knowledge graphs.
Subject             Predicate   Object
Bess Schrader Bio   hasTag      Bess Schrader
Bess Schrader Bio   hasTag      Enterprise Knowledge, LLC
Bess Schrader Bio   hasTag      Knowledge management
Bess Schrader Bio   hasTag      Knowledge graphs
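As a rough illustration of how tags like these could be produced, the sketch below matches taxonomy labels against the bio text and emits one hasTag triple per concept found. It is a minimal dictionary lookup, not the extraction tool used in the project; the alternative labels in `TAXONOMY` and the `auto_tag` function are assumptions made up for this example.

```python
import re

# Hypothetical mini-taxonomy: preferred label -> alternative labels (synonyms).
TAXONOMY = {
    "Bess Schrader": ["Schrader"],
    "Enterprise Knowledge, LLC": ["Enterprise Knowledge"],
    "Knowledge management": ["knowledge and information management"],
    "Knowledge graphs": ["knowledge graph"],
}

BIO = (
    "Bess Schrader is a consultant in the data and information management "
    "practice at Enterprise Knowledge, a consulting firm delivering knowledge "
    "and information management solutions in an agile manner. Schrader focuses "
    "on semantic technologies, including taxonomies, ontologies, and "
    "knowledge graphs."
)


def auto_tag(doc_id, text, taxonomy):
    """Emit (doc_id, 'hasTag', preferred_label) for every concept whose
    preferred or alternative label appears in the text as a whole phrase."""
    lowered = text.lower()
    triples = []
    for preferred, synonyms in taxonomy.items():
        labels = [preferred] + synonyms
        found = any(
            re.search(r"\b" + re.escape(label.lower()) + r"\b", lowered)
            for label in labels
        )
        if found:
            triples.append((doc_id, "hasTag", preferred))
    return triples


bio_tags = auto_tag("Bess Schrader Bio", BIO, TAXONOMY)
```

Note that the tag stored is always the preferred label, even when the text only contains a synonym. That is what keeps tagging consistent across documents.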
20. AUTO-TAGGING - CONNECTING TAGS
Bess Schrader Bio
Bess Schrader is a
consultant in the data and
information management
practice at Enterprise
Knowledge, a consulting firm
delivering knowledge and
information management
solutions in an agile manner.
Schrader focuses on semantic
technologies, including
taxonomies, ontologies, and
knowledge graphs.
Ontology (diagram):
Document   aboutPerson         Person
Document   aboutOrganization   Organization
Document   aboutTopic          Topic
Subject             Predicate           Object
Bess Schrader Bio   aboutPerson         Bess Schrader
Bess Schrader Bio   aboutOrganization   Enterprise Knowledge, LLC
Bess Schrader Bio   aboutTopic          Knowledge management
Bess Schrader Bio   aboutTopic          Knowledge graphs
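The step from generic hasTag triples to the typed predicates above can be sketched as a lookup: the taxonomy says which class each tag belongs to, and the ontology says which relationship connects a Document to that class. The dictionaries and the `refine` function below are hypothetical names for illustration only.

```python
# Hypothetical class assignment for each tag (which taxonomy branch it lives in).
TAG_CLASS = {
    "Bess Schrader": "Person",
    "Enterprise Knowledge, LLC": "Organization",
    "Knowledge management": "Topic",
    "Knowledge graphs": "Topic",
}

# Relationships from the ontology diagram: Document -> class.
PREDICATE_FOR_CLASS = {
    "Person": "aboutPerson",
    "Organization": "aboutOrganization",
    "Topic": "aboutTopic",
}


def refine(triples):
    """Replace 'hasTag' with the ontology relationship for the tag's class.

    Tags whose class is unknown keep the generic 'hasTag' predicate.
    """
    refined = []
    for subject, predicate, obj in triples:
        if predicate == "hasTag":
            tag_class = TAG_CLASS.get(obj)
            predicate = PREDICATE_FOR_CLASS.get(tag_class, "hasTag")
        refined.append((subject, predicate, obj))
    return refined


generic = [
    ("Bess Schrader Bio", "hasTag", "Bess Schrader"),
    ("Bess Schrader Bio", "hasTag", "Enterprise Knowledge, LLC"),
    ("Bess Schrader Bio", "hasTag", "Knowledge management"),
    ("Bess Schrader Bio", "hasTag", "Knowledge graphs"),
]
typed = refine(generic)
```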
23. INTER-AMERICAN DEVELOPMENT BANK
Works to improve lives in Latin America and the Caribbean through
financial and technical support
Provides loans, grants, and technical assistance; conducts extensive
research.
Helps to improve health and education, and advance infrastructure.
Aims to achieve development in a sustainable, climate-friendly way.
25. THE GOAL
Implement a recommendation system that can automatically extract entities
and concepts from content to create semantic data and make it accessible in
a Knowledge Graph
Deliver relevant content to stakeholders in a timely and proactive manner (no
search interface)
Create a single store for a diverse set of content (multiple sources) to inform
a large, dispersed user base
“We want knowledge to reach out to people!”
29. DOCUMENT FINGERPRINT
Handles a wide range of formats and types:
Word, PDF, web content, datasets
Converts individual IDB knowledge assets into a collection of tags
Example fingerprint: Argentina, Biotechnology, Entrepreneurship,
Health, Innovation, Scholarship, Tariff, Wage
Stores tags in the Knowledge Graph
30. RECOMMENDING CONTENT
Use same process to create “fingerprints” for input
Look for similar sets of tags in the Knowledge Graph
Leverage Cosine Similarity, along with PoolParty scores using TF/IDF
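A minimal sketch of the comparison step, assuming each fingerprint is stored as a mapping from tag to extraction score. The document IDs and scores below are invented for illustration, and `recommend` is a hypothetical helper, not the project's actual code.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse tag -> score vectors."""
    dot = sum(a[tag] * b[tag] for tag in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(query, fingerprints, top_n=3):
    """Rank stored fingerprints by similarity to the query fingerprint."""
    scored = [(doc_id, cosine_similarity(query, fp)) for doc_id, fp in fingerprints.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Invented fingerprints (tag -> score) for three stored documents.
FINGERPRINTS = {
    "doc-1": {"Innovation": 30.0, "Argentina": 20.0, "Health": 4.0},
    "doc-2": {"Tariff": 10.0, "Wage": 5.0},
    "doc-3": {"Argentina": 15.0, "Brazil": 29.0},
}

# Fingerprint of the input content, created by the same tagging process.
query = {"Innovation": 35.0, "Argentina": 21.0}
ranking = recommend(query, FINGERPRINTS)
```

Cosine similarity compares the direction of the two score vectors rather than their length, so a long document with large raw scores does not automatically outrank a short one with the same tag profile.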
31. POOLPARTY EXTRACTOR
Provides “auto-tagging” functionality using a PoolParty thesaurus, with text
mining that mixes NLP techniques and statistics-based heuristics
Uses an indexed data structure of the thesaurus, the “Extraction Model”, to do
fast matching over data
Returns a score, where a higher score means that the concept is more
relevant to the processed text
Calculates the score using:
Term frequency, where a higher number of occurrences leads to a higher score
Text position, where occurrences at the beginning of the text produce a higher score
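To make the two scoring factors concrete, here is a toy score that rewards both frequency and early position. This is not PoolParty's formula, which the slides do not give; the `toy_score` function and its `position_weight` parameter are assumptions chosen only to show the qualitative behavior.

```python
import re


def toy_score(text, label, position_weight=1.0):
    """Toy relevance score for one concept label in one text.

    Each occurrence contributes 1.0 (term frequency) plus a bonus that
    shrinks linearly from position_weight at the start of the text to
    nearly zero at the end (text position).
    """
    lowered = text.lower()
    length = max(len(lowered), 1)
    pattern = r"\b" + re.escape(label.lower()) + r"\b"
    score = 0.0
    for occurrence in re.finditer(pattern, lowered):
        score += 1.0 + position_weight * (1.0 - occurrence.start() / length)
    return score


text_a = "Innovation drives growth. Tariff policy matters. Innovation again."
text_b = "Tariff first, innovation later."

# More occurrences lead to a higher score.
innovation_a = toy_score(text_a, "Innovation")
tariff_a = toy_score(text_a, "Tariff")

# With one occurrence each, the earlier term scores higher.
tariff_b = toy_score(text_b, "Tariff")
innovation_b = toy_score(text_b, "Innovation")
```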
33. OUR LESSONS
CREATE ORGANIC TAXONOMY
Continue to manicure
USE APPLICABLE SECTIONS OF TAXONOMY
Determine which sections of taxonomy fit best
LEVERAGE EXTRACTION SCORES
Glean info from the scores returned from extraction process
LEVERAGE CORPUS
Improve extraction by teaching your system about your documents
DEVELOP REPEATABLE INGESTION PROCESS
Allows for frequent updates to Knowledge Graph
34. ORGANIC TAXONOMY
Treat your taxonomy like a growing thing (organic).
Pay attention to common words in your taxonomy
Handle acronyms appropriately
Constantly revisit and improve, adding and removing terms
36. USE APPLICABLE SECTIONS
Focus on “about-ness”!
Topics: “Secondary Education”, “Natural Resource”, “Digital Economy”
Country: “Argentina”, “Colombia”, “Jamaica”
Sector: “Agricultural Health and Food Safety”, “Reform or Modernization
of the State”, “Science, Technology and Innovation Policy and Institution”
(tend to be too lengthy and specific)
Document Type: “Books”, “Policy Briefs”, “Working Papers”
(describe the document, not the content)
37. LEVERAGE EXTRACTION SCORES
Provide insight about the relevance of tags
Allow establishing thresholds for using tags in evaluation
Tag                            Score
Argentina                      21.0
Brazil                         29.0
Board of Executive Directors   2.0
Biotechnology                  2.0
Entrepreneurship               2.0
Health                         4.0
Innovation                     35.0
Research and Development       9.0
Scholarship                    100.0
Structural Unemployment        3.0
Tariff                         2.0
Wage                           1.0
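Applying a threshold to the scores on this slide could look like the sketch below. The cutoff of 20.0 is an arbitrary value chosen for illustration; the slides do not state which threshold was used.

```python
# Extraction scores from the slide.
SCORES = {
    "Argentina": 21.0,
    "Brazil": 29.0,
    "Board of Executive Directors": 2.0,
    "Biotechnology": 2.0,
    "Entrepreneurship": 2.0,
    "Health": 4.0,
    "Innovation": 35.0,
    "Research and Development": 9.0,
    "Scholarship": 100.0,
    "Structural Unemployment": 3.0,
    "Tariff": 2.0,
    "Wage": 1.0,
}


def tags_above(scores, threshold):
    """Return tags scoring at or above the threshold, highest score first."""
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return [tag for tag, score in ranked if score >= threshold]


# Hypothetical cutoff: keep only the strongly relevant tags.
strong_tags = tags_above(SCORES, threshold=20.0)
```

With this cutoff, the twelve extracted tags shrink to the four that dominate the document, which is the set worth using when comparing fingerprints.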
38. USE CORPUS
Corpus: a collection of documents related to your project's domain, used to
improve entity extraction by providing improved scoring of terms and concepts
Considers more variables than just term frequency and position in the
document
Allows the extraction process to understand your content better
Leverages co-occurrences (which terms appear frequently with other
terms)
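The co-occurrence idea can be illustrated with a simple pair count over a corpus of already-tagged documents. This only shows what "terms that appear frequently with other terms" means; it is not how the extraction tool uses a corpus internally, and the sample documents are invented.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(tagged_docs):
    """Count, for every pair of terms, how many documents contain both."""
    counts = Counter()
    for tags in tagged_docs:
        # Sort so each pair is always stored in the same (alphabetical) order.
        for pair in combinations(sorted(set(tags)), 2):
            counts[pair] += 1
    return counts


# Invented corpus: each inner list is the set of terms found in one document.
CORPUS_TAGS = [
    ["Innovation", "Entrepreneurship", "Argentina"],
    ["Innovation", "Entrepreneurship"],
    ["Tariff", "Argentina"],
]

pair_counts = cooccurrence_counts(CORPUS_TAGS)
most_common_pair = pair_counts.most_common(1)[0]
```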
39. REPEATABLE INGESTION PROCESS
Create an individual process for each data source (allows more granular
control)
Develop the ability to ingest the full set of source content as well as the
delta (only items that have changed recently)
Know your metrics for reindexing content, and take time to make reindexing very efficient
Supports an “organic” taxonomy, as this provides a means to update content
after changes
The challenge: how do you systematically manage and update your Knowledge Graph?
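One way to support both a full load and a delta load is to select items by their last-modified time, as in the sketch below. The `SourceItem` type, its fields, and `select_for_ingestion` are hypothetical; a real source would expose its own change-tracking mechanism.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class SourceItem:
    item_id: str
    modified: datetime  # when the item last changed in the source system


def select_for_ingestion(
    items: Iterable[SourceItem], last_run: Optional[datetime] = None
) -> List[SourceItem]:
    """Pick the items to (re)ingest.

    With no last_run, return everything (full load). Otherwise return only
    items modified after the previous run (delta load).
    """
    if last_run is None:
        return list(items)
    return [item for item in items if item.modified > last_run]


ITEMS = [
    SourceItem("working-paper-1", datetime(2019, 1, 10, tzinfo=timezone.utc)),
    SourceItem("policy-brief-2", datetime(2019, 3, 5, tzinfo=timezone.utc)),
    SourceItem("book-3", datetime(2019, 3, 20, tzinfo=timezone.utc)),
]

full_load = select_for_ingestion(ITEMS)
delta_load = select_for_ingestion(ITEMS, last_run=datetime(2019, 3, 1, tzinfo=timezone.utc))
```

Running one such process per data source keeps the last-run marker, and any source-specific handling, separate for each source.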
40. WE’LL BE ANSWERING QUESTIONS NOW
THANKS FOR LISTENING
Q & A SESSION
Please feel free to reach out with any
questions. Thank you for your time!