KM SHOWCASE 2020 - "Lessons Learned Building a Knowledge Graph" - Chris Marino

LESSONS LEARNED BUILDING A KNOWLEDGE GRAPH
INTER-AMERICAN DEVELOPMENT BANK

HI!
REBECCA WYATT
PRACTICE LEAD
TECHNOLOGY SOLUTIONS
ENTERPRISE KNOWLEDGE
UX RESEARCH
PRODUCT OWNERSHIP
CHANGE MANAGEMENT & TRAINING
CONTENT STRATEGY
SEARCH UX
@rebeccachilali | @EKConsulting

HELLO!
CHRIS MARINO
SENIOR CONSULTANT
SOLUTIONS ARCHITECT
ENTERPRISE KNOWLEDGE
SEMANTIC TECHNOLOGIES
ENTERPRISE SEARCH
KM STRATEGY
CONTENT MANAGEMENT
KNOWLEDGE GRAPHS
@EnterpriseFind | @EKConsulting

HOW DO WE NAVIGATE TODAY’S INFORMATION CHALLENGES?

CONFRONTING TODAY’S INFORMATION MANAGEMENT CHALLENGES
90% of the data and information we have today was created in just the past two years.
Most organizations are built to organize and manage data and information by type and department or business function.
Over 80% of the content and information we work with is unstructured.
AI is set to be the key source of transformation, disruption, and competitive advantage in today’s fast-changing economy, contributing to 45% of total economic gains by 2030.

KNOWLEDGE ORGANIZATION CONTINUUM
FOLKSONOMY
Free-text tags.
CONTROLLED LIST
List of pre-defined terms.
Improves consistency.
TAXONOMY
Pre-defined terms & synonyms.
Hierarchical relationships.
Improves consistency.
Allows for parent/child content
relationships.
ONTOLOGY
Scope notes.
Predefined classes & properties.
Expanded relationship types.
Increased expressiveness.
Semantics. Inference.
KNOWLEDGE GRAPH
Captures related data. Integrates structured and unstructured information in a linked data store. Provides the architecture and data models to enable machine learning (ML) and other AI capabilities. Drives efficient and intelligent data and information management solutions.

THE BASICS:
TAXONOMIES, ONTOLOGIES,
AND KNOWLEDGE GRAPHS

BUSINESS TAXONOMY
tax·on·o·my (tăk-sŏn′ə-mē)
n. pl. tax·on·o·mies
1. The classification of organisms in an
ordered system that indicates natural
relationships.
2. The science, laws, or principles of
classification; systematics.
3. Division into ordered groups or categories:
"Scholars have been laboring to develop a
taxonomy of young killers" (Aric Press).
EK’s Definition of Taxonomy
Controlled vocabularies used to describe or characterize
explicit concepts of information, for purposes of capture,
management, and presentation.

BUSINESS ONTOLOGY
A defined data model that describes structured
and unstructured information through:
• entities,
• their properties,
• and the way they relate to one another.
• Ontology is about things, not strings.
• Ontologies model your domain in a machine- and human-understandable format.
• Ontologies provide context.
• Effective ontologies require a deep
understanding of the knowledge domain.

ONTOLOGY VS. KNOWLEDGE GRAPH
[Diagram: ontology vs. individual]
https://enterprise-knowledge.com/whats-the-difference-between-an-ontology-and-a-knowledge-graph/

GRAPH DATABASE
▪ Consists of triples
▪ concept → relationship → concept
▪ A linked data store that organizes structured
and unstructured information through:
▪ entities,
▪ their properties,
▪ and relationships.
▪ Also known as:
▪ Linked Data Store (LD Store)
▪ Triple Store
▪ “Knowledge Graph”
Subject | Predicate | Object
Project A | hasTitle | Title A
Person B | isPMOn | Project A
Document C | isAbout | Topic D
Document C | isAbout | Topic F
Person B | isExpertIn | Topic D
… | … | …
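To make the triple idea concrete, the sketch below loads the example rows into a toy in-memory store, answers wildcard pattern queries, and then chains two lookups to find experts on the topics a document is about. It is a minimal illustration under stated assumptions: the `TripleStore` class and its `match` method are hypothetical names, and a production LD store would instead use RDF with a query language such as SPARQL.

```python
from typing import Optional

# A triple is (subject, predicate, object), matching the table columns.
Triple = tuple[str, str, str]


class TripleStore:
    """Toy in-memory triple store (hypothetical); not a real LD store."""

    def __init__(self) -> None:
        self.triples: set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self.triples.add((subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> list[Triple]:
        """Return all triples that fit the pattern; None acts as a wildcard."""
        return sorted(
            t
            for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


store = TripleStore()
for row in [
    ("Project A", "hasTitle", "Title A"),
    ("Person B", "isPMOn", "Project A"),
    ("Document C", "isAbout", "Topic D"),
    ("Document C", "isAbout", "Topic F"),
    ("Person B", "isExpertIn", "Topic D"),
]:
    store.add(*row)

# Follow two hops: which people are experts in a topic that Document C is about?
topics = [o for _, _, o in store.match(subject="Document C", predicate="isAbout")]
experts = sorted({s for t in topics for s, _, _ in store.match(predicate="isExpertIn", obj=t)})
print(experts)  # ['Person B']
```

The two-hop query is the kind of question a graph answers naturally: the link between Person B and Document C is never stated directly, only through the shared topic.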

KNOWLEDGE GRAPH
• A knowledge graph is a specialized graph, or network, of the things we want to describe and how they are related.
• It is a semantic model, since we want to capture and generate meaning with the model.
“The application of graph processing and graph DBMSs will grow at 100
percent annually through 2022 to continuously accelerate data preparation
and enable more complex and adaptive data science.”
– Gartner’s Top 10 Data and Analytics Technology Trends for 2019
• Google’s Knowledge Graph is a popular use case.

HOW IT ALL FITS TOGETHER
[Diagram: Content & Data Sources, Business Taxonomy, Business Ontology, Triple Store, and Enterprise Knowledge Graph, with example nodes Person B, Project A, Document C, Person F, Topic D, and Topic E]

GETTING STARTED WITH KNOWLEDGE GRAPHS

GETTING STARTED
[Diagram: Content & Data Sources, Business Taxonomy, Business Ontology, Triple Store, and Enterprise Knowledge Graph, with example nodes Person B, Project A, Document C, Person F, Topic D, and Topic E]

CONNECTING CONTENT
▪ Ontology provides the relationships
▪ Taxonomy provides the values
▪ Enables data analysis across all content
[Diagram: Taxonomy, Content, Ontology, Tag]

AUTO-TAGGING
“Bess Schrader is a consultant in the data and information management practice at Enterprise Knowledge, a consulting firm delivering knowledge and solutions in an agile manner. Schrader focuses on semantic technologies, including taxonomies, ontologies, and knowledge graphs.”
[Diagram: Taxonomy, Content, Tag]
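The auto-tagging step can be sketched as simple label matching against a taxonomy. This minimal Python example assumes a tiny hypothetical taxonomy (`TAXONOMY`) in which each concept has a preferred label plus synonyms, and tags the bio text by case-insensitive whole-word matching. Commercial extractors such as PoolParty layer NLP and scoring on top of this; the sketch does not attempt that.

```python
import re

# Hypothetical taxonomy: preferred label -> labels and synonyms to search for.
TAXONOMY: dict[str, list[str]] = {
    "Bess Schrader": ["Bess Schrader", "Schrader"],
    "Enterprise Knowledge, LLC": ["Enterprise Knowledge"],
    "Knowledge graphs": ["knowledge graph", "knowledge graphs"],
    "Ontologies": ["ontology", "ontologies"],
    "Taxonomies": ["taxonomy", "taxonomies"],
}


def auto_tag(text: str, taxonomy: dict[str, list[str]]) -> list[str]:
    """Return the preferred label of every concept with a label found in text."""
    tags = []
    for preferred, labels in taxonomy.items():
        for label in labels:
            # \b anchors the match to whole words; matching ignores case.
            if re.search(rf"\b{re.escape(label)}\b", text, flags=re.IGNORECASE):
                tags.append(preferred)
                break  # one matching label is enough to tag the concept
    return tags


bio = (
    "Bess Schrader is a consultant in the data and information management "
    "practice at Enterprise Knowledge, a consulting firm delivering knowledge "
    "and solutions in an agile manner. Schrader focuses on semantic "
    "technologies, including taxonomies, ontologies, and knowledge graphs."
)
print(auto_tag(bio, TAXONOMY))
```

Because the tag is the concept's preferred label rather than the matched string, "Enterprise Knowledge" in the text becomes the controlled value "Enterprise Knowledge, LLC".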

AUTO-TAGGING - CONNECTING TAGS
Bess Schrader Bio
“Bess Schrader is a … Schrader focuses on semantic … knowledge graphs.”
Subject | Predicate | Object
Bess Schrader Bio | hasTag | Bess Schrader
Bess Schrader Bio | hasTag | Enterprise Knowledge, LLC
Bess Schrader Bio | hasTag | Knowledge management
Bess Schrader Bio | hasTag | Knowledge graphs

AUTO-TAGGING - CONNECTING TAGS
Bess Schrader Bio
“Bess Schrader is a … Schrader focuses on semantic … knowledge graphs.”
Ontology: Document → aboutPerson → Person; Document → aboutOrganization → Organization; Document → aboutTopic → Topic
Subject | Predicate | Object
Bess Schrader Bio | aboutPerson | Bess Schrader
Bess Schrader Bio | aboutOrganization | Enterprise Knowledge, LLC
Bess Schrader Bio | aboutTopic | Knowledge management
Bess Schrader Bio | aboutTopic | Knowledge graphs
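Connecting tags through the ontology amounts to choosing a more specific predicate based on what kind of thing each tag is. The sketch below assumes a hypothetical `TAG_CLASS` lookup (each concept's ontology class) and a hypothetical `PREDICATE_FOR_CLASS` table mirroring the aboutPerson, aboutOrganization, and aboutTopic relationships; tags with no known class fall back to the generic hasTag predicate.

```python
# Hypothetical lookup: the ontology class of each taxonomy concept.
TAG_CLASS = {
    "Bess Schrader": "Person",
    "Enterprise Knowledge, LLC": "Organization",
    "Knowledge management": "Topic",
    "Knowledge graphs": "Topic",
}

# Hypothetical ontology: the predicate that links a Document to each class.
PREDICATE_FOR_CLASS = {
    "Person": "aboutPerson",
    "Organization": "aboutOrganization",
    "Topic": "aboutTopic",
}


def connect_tags(document: str, tags: list[str]) -> list[tuple[str, str, str]]:
    """Turn flat tags into typed triples, falling back to a generic hasTag."""
    triples = []
    for tag in tags:
        predicate = "hasTag"  # used when the tag's class is unknown
        if tag in TAG_CLASS:
            predicate = PREDICATE_FOR_CLASS[TAG_CLASS[tag]]
        triples.append((document, predicate, tag))
    return triples


for triple in connect_tags(
    "Bess Schrader Bio",
    ["Bess Schrader", "Enterprise Knowledge, LLC", "Knowledge graphs", "Unclassified tag"],
):
    print(triple)
```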

CASE STUDY:
ADVANCED KNOWLEDGE GRAPH IMPLEMENTATIONS

THE INTER-AMERICAN DEVELOPMENT BANK

INTER-AMERICAN DEVELOPMENT BANK
▪ Works to improve lives in Latin America and the Caribbean through financial and technical support.
▪ Provides loans, grants, and technical assistance; conducts extensive research.
▪ Helps to improve health and education, and advance infrastructure.
▪ Aims to achieve development in a sustainable, climate-friendly way.

THE GOAL
▪ Implement a recommendation system that can automatically extract entities and concepts from content to create semantic data and make it accessible in a Knowledge Graph
▪ Deliver relevant content to stakeholders in a timely and proactive manner (no search interface)
▪ Create a single store of a diverse set of content (from multiple sources) to inform a large, dispersed user base
“We want knowledge to reach out to people!”

DOCUMENT FINGERPRINT
▪ Handles a wide range of formats and types:
  ▪ Word
  ▪ PDF
  ▪ Web content
  ▪ Datasets
▪ Converts individual IDB knowledge assets into a collection of tags
▪ Stores the tags in the Knowledge Graph
Example fingerprint: Argentina, Biotechnology, Entrepreneurship, Health, Innovation, Scholarship, Tariff, Wage

RECOMMENDING CONTENT
▪ Use the same process to create a “fingerprint” for the input
▪ Look for similar sets of tags in the Knowledge Graph
▪ Leverage cosine similarity, along with PoolParty scores based on TF-IDF
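One way to read the recommendation steps is to treat each fingerprint as a sparse vector of tag scores and rank stored documents by cosine similarity to the input's fingerprint. This is a sketch under assumptions: the document IDs and scores are made up, the extraction scores are used directly as vector weights, and it is not the IDB implementation or the way PoolParty combines its TF-IDF scores.

```python
import math

Fingerprint = dict[str, float]  # tag -> extraction score


def cosine_similarity(a: Fingerprint, b: Fingerprint) -> float:
    """Cosine of the angle between two sparse tag-score vectors."""
    dot = sum(weight * b[tag] for tag, weight in a.items() if tag in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(query: Fingerprint, stored: dict[str, Fingerprint], top_n: int = 3) -> list[str]:
    """Rank stored fingerprints by similarity to the query fingerprint."""
    ranked = sorted(
        stored, key=lambda doc_id: cosine_similarity(query, stored[doc_id]), reverse=True
    )
    return ranked[:top_n]


# Hypothetical fingerprints already stored in the knowledge graph.
stored = {
    "doc-1": {"Argentina": 21.0, "Innovation": 35.0, "Scholarship": 100.0},
    "doc-2": {"Jamaica": 10.0, "Tariff": 5.0},
    "doc-3": {"Innovation": 20.0, "Scholarship": 40.0},
}
query = {"Scholarship": 50.0, "Innovation": 10.0}
print(recommend(query, stored, top_n=2))  # ['doc-1', 'doc-3']
```

Cosine similarity ignores the overall magnitude of the scores, so a long document with many high scores does not automatically outrank a short one with the same tag profile.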

POOLPARTY EXTRACTOR
▪ Provides “auto-tagging” functionality using a PoolParty thesaurus, with text mining based on a mix of NLP techniques and statistics-based heuristics
▪ Uses an indexed data structure of the thesaurus, the “Extraction Model”, for fast matching over data
▪ Returns a score for each concept; a higher score means the concept is more relevant to the processed text
▪ Calculates the score using:
  ▪ Term frequency: more occurrences lead to a higher score
  ▪ Text position: occurrences at the beginning of the text produce a higher score
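The slides name the two scoring factors but not PoolParty's actual formula. Purely as a hypothetical illustration, `position_weighted_score` below counts each occurrence of a label and gives occurrences near the start of the text more weight (linearly from 2.0 at the first word down to 1.0 at the last). The weighting scheme is an assumption chosen for readability.

```python
import re


def position_weighted_score(text: str, label: str) -> float:
    """Hypothetical score: every occurrence counts, earlier ones count more.

    Not PoolParty's formula; it only mirrors the two factors on the slide.
    """
    words = re.findall(r"\w+", text.lower())
    target = re.findall(r"\w+", label.lower())
    if not target:
        return 0.0
    n = len(target)
    last = max(len(words) - 1, 1)  # avoid dividing by zero for one-word texts
    score = 0.0
    for i in range(len(words) - n + 1):
        if words[i : i + n] == target:
            # Weight falls linearly from 2.0 (first word) to 1.0 (last word).
            score += 2.0 - i / last
    return score


early = position_weighted_score("Tariff reform and wage growth in the region", "tariff")
late = position_weighted_score("Wage growth in the region after tariff reform", "tariff")
print(early, late)
```

Both texts mention "tariff" once, but the first scores higher because the mention comes first; repeated mentions add up, so frequency also raises the score.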

OUR LESSONS
▪ CREATE ORGANIC TAXONOMY: Continue to manicure the taxonomy.
▪ USE APPLICABLE SECTIONS OF TAXONOMY: Determine which sections of the taxonomy fit best.
▪ LEVERAGE EXTRACTION SCORES: Glean information from the scores returned by the extraction process.
▪ LEVERAGE CORPUS: Improve extraction by teaching your system about your documents.
▪ DEVELOP REPEATABLE INGESTION PROCESS: Allows for frequent updates to the Knowledge Graph.

ORGANIC TAXONOMY
▪ Treat your taxonomy like a growing (organic) thing
▪ Pay attention to common words in your taxonomy
▪ Handle acronyms appropriately
▪ Constantly revisit and improve it, adding and removing terms

COMMON TERMS IN TAXONOMY
Topic
Bank
Economy
Loan
Money
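Common words such as these can end up tagged on almost every document, which adds noise rather than meaning. The slides do not prescribe a detection method; one assumed approach, sketched below, is to flag taxonomy terms whose document frequency exceeds a chosen share of the corpus (`max_share`, a hypothetical parameter) so a taxonomist can review them. The sample snippets are invented.

```python
import re


def overly_common_terms(terms: list[str], documents: list[str], max_share: float = 0.5) -> list[str]:
    """Flag taxonomy terms found in more than max_share of the documents."""
    if not documents:
        return []
    flagged = []
    for term in terms:
        pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
        hits = sum(1 for doc in documents if pattern.search(doc))
        if hits / len(documents) > max_share:
            flagged.append(term)
    return flagged


# Hypothetical document snippets.
docs = [
    "The bank approved a loan to support the digital economy.",
    "A loan for secondary education in Jamaica.",
    "Bank research on tariffs and wages in Argentina.",
    "The bank funds innovation in biotechnology.",
]
print(overly_common_terms(["Bank", "Loan", "Economy", "Biotechnology"], docs))  # ['Bank']
```

Flagged terms are candidates for review (for example, narrowing their synonyms or excluding them from auto-tagging), not terms to delete automatically.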

USE APPLICABLE SECTIONS
• Topics: “Secondary Education”, “Natural Resource”, “Digital Economy”. Focus on “about-ness”!
• Country: “Argentina”, “Colombia”, “Jamaica”
• Sector: “Agricultural Health and Food Safety”, “Reform or Modernization of the State”, “Science, Technology and Innovation Policy and Institution”. Tend to be too lengthy and specific.
• Document Type: “Books”, “Policy Briefs”, “Working Papers”. Describe the document, not the content.
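Restricting extraction to the applicable sections can be as simple as filtering extracted concepts by the taxonomy section they come from. The sketch assumes, based on the notes above, that Topics and Country are kept while Sector and Document Type are excluded; the `APPLICABLE_SCHEMES` set and the sample extraction output are hypothetical.

```python
# Assumption: only the Topics and Country sections are used for content tags.
APPLICABLE_SCHEMES: frozenset[str] = frozenset({"Topics", "Country"})


def applicable_tags(
    extracted: list[tuple[str, str]], schemes: frozenset[str] = APPLICABLE_SCHEMES
) -> list[str]:
    """Keep concepts whose taxonomy section (concept scheme) is applicable."""
    return [label for label, scheme in extracted if scheme in schemes]


# Hypothetical extraction output: (concept label, taxonomy section).
extracted = [
    ("Digital Economy", "Topics"),
    ("Argentina", "Country"),
    ("Reform or Modernization of the State", "Sector"),
    ("Working Papers", "Document Type"),
]
print(applicable_tags(extracted))  # ['Digital Economy', 'Argentina']
```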

LEVERAGE EXTRACTION SCORES
▪ Provide insight about the relevance of tags
▪ Allow establishing thresholds for using tags in evaluation
Extracted tags without scores: Argentina, Brazil, Board of Executive Directors, Biotechnology, Entrepreneurship, Health, Innovation, Research and Development, Scholarship, Structural Unemployment, Tariff, Wage
The same tags with their extraction scores:
Argentina (21.0)
Brazil (29.0)
Board of Executive Directors (2.0)
Biotechnology (2.0)
Entrepreneurship (2.0)
Health (4.0)
Innovation (35.0)
Research and Development (9.0)
Scholarship (100.0)
Structural Unemployment (3.0)
Tariff (2.0)
Wage (1.0)
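Applying a score threshold to the example scores above shows how extraction scores can trim a noisy tag list. The cut-off of 5.0 is an arbitrary, hypothetical value for illustration; a real threshold would need to be tuned against your own content.

```python
# Scores from the example above.
SCORES = {
    "Argentina": 21.0, "Brazil": 29.0, "Board of Executive Directors": 2.0,
    "Biotechnology": 2.0, "Entrepreneurship": 2.0, "Health": 4.0,
    "Innovation": 35.0, "Research and Development": 9.0, "Scholarship": 100.0,
    "Structural Unemployment": 3.0, "Tariff": 2.0, "Wage": 1.0,
}


def tags_above(scores: dict[str, float], threshold: float) -> list[str]:
    """Keep tags scoring at or above the threshold, highest score first."""
    kept = [tag for tag, score in scores.items() if score >= threshold]
    return sorted(kept, key=lambda tag: scores[tag], reverse=True)


print(tags_above(SCORES, threshold=5.0))
# ['Scholarship', 'Innovation', 'Brazil', 'Argentina', 'Research and Development']
```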

USE CORPUS
Corpus: a collection of documents related to your project's domain, used to improve entity extraction by providing better scoring of terms and concepts.
▪ Considers more variables than just term frequency and position in the document
▪ Allows the extraction process to understand your content better
▪ Leverages co-occurrences (which terms frequently appear together with other terms)
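Co-occurrence, in its simplest form, is a count of how often two tags appear on the same document across the corpus. The sketch below shows only that counting idea over hypothetical tagged documents; how PoolParty actually uses corpus statistics internally is not described in the slides and is not reproduced here.

```python
from collections import Counter
from itertools import combinations


def co_occurrences(tagged_docs: list[list[str]]) -> Counter:
    """Count how many documents each unordered pair of tags shares."""
    counts: Counter = Counter()
    for tags in tagged_docs:
        # sorted(set(...)) gives each pair one canonical order and ignores repeats
        for pair in combinations(sorted(set(tags)), 2):
            counts[pair] += 1
    return counts


# Hypothetical tags extracted from a small corpus.
corpus_tags = [
    ["Innovation", "Entrepreneurship", "Argentina"],
    ["Innovation", "Entrepreneurship"],
    ["Tariff", "Wage"],
]
print(co_occurrences(corpus_tags).most_common(1))
# [(('Entrepreneurship', 'Innovation'), 2)]
```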

REPEATABLE INGESTION PROCESS
▪ Create an individual process for each data source (allows more granular control)
▪ Develop the ability to ingest the full set of source content as well as the delta (only items that have changed recently)
▪ Know your metrics for reindexing content, and take the time to make the process efficient
▪ Supports the “organic” taxonomy, since it provides a means to update content after taxonomy changes
The challenge: how do you systematically manage and update your Knowledge Graph?
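A repeatable, per-source process with both full and delta loads might be planned as below. This is a hypothetical sketch: `SourceItem`, `select_items`, and `plan_ingestion` are illustrative names, and it assumes each source reports a reliable last-modified timestamp. Sources that have never run get a full load; the others only pick up items changed since their last run.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class SourceItem:
    item_id: str
    modified: datetime  # last-modified timestamp reported by the source


def select_items(items: list[SourceItem], last_run: Optional[datetime]) -> list[str]:
    """Full load on the first run; afterwards only the delta since last_run."""
    if last_run is None:
        return [item.item_id for item in items]
    return [item.item_id for item in items if item.modified > last_run]


def plan_ingestion(
    sources: dict[str, list[SourceItem]], last_runs: dict[str, datetime]
) -> dict[str, list[str]]:
    """Plan each source independently so each can run on its own schedule."""
    return {name: select_items(items, last_runs.get(name)) for name, items in sources.items()}


# Hypothetical sources and timestamps.
items = [
    SourceItem("doc-1", datetime(2020, 1, 10, tzinfo=timezone.utc)),
    SourceItem("doc-2", datetime(2020, 3, 1, tzinfo=timezone.utc)),
]
cutoff = datetime(2020, 2, 1, tzinfo=timezone.utc)
print(plan_ingestion({"publications": items, "website": items}, {"publications": cutoff}))
# {'publications': ['doc-2'], 'website': ['doc-1', 'doc-2']}
```

A taxonomy change would typically call for a full reload of affected sources (passing no last-run time), which is where knowing your reindexing metrics pays off.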

WE’LL BE ANSWERING QUESTIONS NOW
Q&A
THANKS FOR LISTENING
Please feel free to reach out with any questions. Thank you for your time!
