PyData Conference: Knowledge-infused Learning for Healthcare

Knowledge-infused Learning in Healthcare
Manas Gaur
https://manasgaur.github.io
mgaur@email.sc.edu
Artificial Intelligence Institute
Sincere thanks to the Noun Project for making their icons freely available. They have been used in this presentation.

Outline
● Motivation for Knowledge-infused Learning (K-IL)
○ Definition
○ How do we use Knowledge Graphs
○ Mathematical background
○ Types of K-IL
○ Models, Evaluation, and Applications
● Knowledge-infused Learning for Healthcare
○ Challenges
○ Web-based Intervention (Reddit → DSM-5)

Definition: Knowledge Graphs
A Knowledge Graph (KG) is structured knowledge in a graphical representation.
It enables enhanced semantic applications such as search, browsing, personalization, recommendation, advertisement, and summarization.
Problems: Data Sparsity and Ambiguity
Different forms:
● Ontology: knowledge graph after human curation of entities and relations
● Knowledge Base: flattened graph
● Lexicon: small, application-specific flattened graph
Examples of General Purpose Knowledge Graphs*
1. DBpedia
2. Yago
3. Freebase
4. ConceptNet
5. Knowledge Vault
6. NELL
7. Wikidata
Examples of Healthcare-specific Knowledge Graphs
1. SNOMED-CT
2. Unified Medical Language System (UMLS)
3. DataMed
4. International Classification of Diseases (ICD-10)
5. Rx-NORM
6. DrugBank
7. Drug Abuse Ontology
8. Medical Dictionary for Regulatory Activities
http://www-sop.inria.fr/members/Freddy.Lecue/presentation/ISWC2019-FreddyLecue-Thales-OnTheRoleOfKnowledgeGraphsInExplainableAI.pdf
https://datamed.org/APIDoc.php
Cameron, Delroy, Gary A. Smith, Raminta Daniulaityte, Amit P. Sheth, Drashti Dave, Lu Chen, Gaurish Anand,
Robert Carlson, Kera Z. Watkins, and Russel Falck. "PREDOSE: a semantic web platform for drug abuse
epidemiology using social media." Journal of biomedical informatics, 2013.

Examples
● Commonsense Reasoning Graph
● Drug Abuse Ontology
● Event Ontology
● Crisis Ontology

How do we use Knowledge Graphs?
Personalization: taking into account contextual factors such as the user's health history, physical characteristics, environmental factors, activity, and lifestyle.
A chatbot with contextualized (asthma) knowledge is potentially more personalized and engaging.
[Figure panels: Without Contextualized Personalization; With Contextualized Personalization]

Assessing Mental Health Impact of COVID using News Articles
https://theconversation.com/were-measuring-online-conversation-to-track-the-social-and-mental-health-issues-surfacing-during-the-coronavirus-pandemic-135417
Multilingual KG
http://conceptnet.io/
GDelt Database
https://www.gdeltproject.org/

Analyzing Gender-based Violence (GBV) in Mental Health COVID-19 Twitter Conversation
Data: 4 weeks of mental health tweets, from March 14 to April 04
Pipeline components:
● GBV lexicon from tweets on bullying, abuse, domestic violence, etc.
● Mapping words to categories for expansion of the lexicon
● Generic knowledge graph of Wikipedia: aligning the lexicon words and new entities with respect to DBpedia categories
● Enriched lexicon for gathering the abstract meaning of GBV in tweets
● Semantic proximity: calculating cosine similarity between two vectors (GBV and tweets) and setting an empirical threshold on semantic proximity
● Maximum A Posteriori (MAP) estimation
● GBV Index: GBV estimation for 14 days
Purohit, Hemant, Tanvi Banerjee, Andrew Hampton, Valerie L. Shalin, Nayanesh Bhandutia, and Amit P. Sheth. "Gender-based violence in 140 characters or fewer: A# BigData case study of Twitter." arXiv
preprint arXiv:1503.02086 (2015).
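The semantic-proximity step (cosine similarity between a GBV vector and tweet vectors, with an empirical threshold) can be sketched roughly as follows. This is a minimal illustration only: the function names, the toy three-dimensional vectors, and the threshold of 0.5 are assumptions made for the example and are not taken from the talk.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def flag_gbv_tweets(gbv_vector, tweet_vectors, threshold=0.5):
    """Return indices of tweets whose similarity to the GBV vector meets the threshold.

    The threshold is hypothetical; the slides only say it is set empirically.
    """
    return [
        i for i, tweet in enumerate(tweet_vectors)
        if cosine_similarity(gbv_vector, tweet) >= threshold
    ]

# Toy vectors standing in for lexicon and tweet embeddings (assumed values).
gbv_vector = [1.0, 0.0, 1.0]
tweet_vectors = [[2.0, 0.0, 2.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
flagged = flag_gbv_tweets(gbv_vector, tweet_vectors, threshold=0.5)
```

In practice the vectors would come from an embedding model over the enriched lexicon and the tweets, and the threshold would be tuned on labeled examples.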

Definition: Knowledge-infused Learning (K-IL)
K-IL: “The exploitation of domain knowledge and application semantics
to enhance existing deep learning methods by infusing relevant conceptual
information into a statistical, data-driven computational approach
(Neuro-Symbolic AI).”
A. Sheth, M. Gaur, U. Kursuncu and R. Wickramarachchi, "Shades of Knowledge-Infused Learning for Enhancing Deep Learning," in IEEE
Internet Computing, vol. 23, no. 6, pp. 54-63, 1 Nov.-Dec. 2019, doi: 10.1109/MIC.2019.2960071.

Valiant, Leslie G. "Robust logics." Artificial Intelligence 117.2 (2000): 231-253.
K-IL: Probably Approximately Correct Learning

How do you know that a training set has good domain coverage?
Robust Classifier → Low Generalization Error
Consistent Classifier → Low Training Error
Confidence: more certainty (lower δ) requires more samples.
Complexity: a more complex hypothesis space (larger |H|) requires more samples.
PAC Learning
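The dependence of sample size on confidence (δ) and hypothesis-space size (|H|) can be made concrete with the standard PAC bound for a consistent learner over a finite hypothesis class, m ≥ (1/ε)(ln|H| + ln(1/δ)). The sketch below assumes that textbook setting; the slides do not state which bound was shown, and the function name is made up for illustration.

```python
import math

def pac_sample_bound(hypothesis_count, epsilon, delta):
    """Samples sufficient for a consistent learner over a finite class H.

    Uses m >= (ln|H| + ln(1/delta)) / epsilon, where epsilon is the
    allowed generalization error and delta the allowed failure probability.
    """
    if hypothesis_count < 1:
        raise ValueError("hypothesis_count must be at least 1")
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil((math.log(hypothesis_count) + math.log(1.0 / delta)) / epsilon)

baseline = pac_sample_bound(1000, epsilon=0.1, delta=0.05)
more_confident = pac_sample_bound(1000, epsilon=0.1, delta=0.01)
larger_class = pac_sample_bound(10**6, epsilon=0.1, delta=0.05)
```

Lowering δ or enlarging |H| both increase the bound, which matches the two statements on the slide.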

Challenge:
Existing ML Models:
Infusion:
True Data
Distribution
Hypothesis Data
Distribution
Definition: Knowledge Infusion

Types of K-IL
[Figure: Shallow Infusion. Components: dataset, enrich, tacit knowledge, deep learning model, hypothesis testing or similarity-based verification.]
[Figure: Semi-Deep Infusion. Components: dataset, deep learning model, tacit knowledge, self-aware or external knowledge, similarity-based verification.]

K-IL: Shallow Infusion
Sheth, Amit, Manas Gaur, Ugur Kurşuncu, and Ruwan Wickramarachchi. "Shades of knowledge-infused learning for enhancing deep learning."
IEEE Internet Computing 23, no. 6 (2019): 54-63.

K-IL Semi-Deep Infusion
Sheth, Amit, Manas Gaur, Ugur Kurşuncu, and Ruwan Wickramarachchi. "Shades of knowledge-infused learning for enhancing deep learning."
IEEE Internet Computing 23, no. 6 (2019): 54-63.

Types of K-IL
Deep Infusion (Vision)

K-IL : Models
Long Short Term Memory
Variants:
1. Knowledge base at each LSTM cell [1].
2. K-IL layer:
a. 1D Convolutional Neural Network for mixing
b. Graph Convolutional Neural Network -- when the hierarchical structure of the KG is important and needs to be preserved in the representation.
c. Simple Multi-layer Perceptron [2]
[1] Yang, Bishan, and Tom Mitchell. Leveraging knowledge bases in lstms for improving machine reading. arXiv preprint arXiv:1902.09091 (2019).
[2] Kursuncu, Ugur, Manas Gaur, and Amit Sheth. Knowledge Infused Learning (K-IL): Towards Deep Incorporation of Knowledge in Deep Learning.
arXiv preprint arXiv:1912.00512 (2019).
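As a rough illustration of variant 2c, a K-IL layer built from a simple perceptron layer could concatenate a hidden state with a knowledge embedding and pass the result through one dense layer. This is a sketch under stated assumptions, not the architecture of [2]: the concatenation, the tanh activation, and all names and weights below are hypothetical.

```python
import math

def kil_mlp_layer(hidden, knowledge, weights, bias):
    """One dense layer with tanh over the concatenation [hidden ; knowledge].

    hidden    : hidden-state vector from the sequence model (list of floats)
    knowledge : knowledge-graph embedding for the same input (list of floats)
    weights   : one row per output unit, each of length len(hidden) + len(knowledge)
    bias      : one value per output unit
    """
    x = list(hidden) + list(knowledge)
    output = []
    for row, b in zip(weights, bias):
        if len(row) != len(x):
            raise ValueError("weight row length must match the concatenated input")
        output.append(math.tanh(sum(w * xi for w, xi in zip(row, x)) + b))
    return output

# Toy values (assumed): a 2-d hidden state and a 1-d knowledge embedding.
hidden_state = [0.5, -0.2]
kg_embedding = [1.0]
weights = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
bias = [0.0, 0.0]
mixed = kil_mlp_layer(hidden_state, kg_embedding, weights, bias)
```

In a trained model the weights would be learned jointly with the rest of the network rather than fixed by hand.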

K-IL : Models
Generative Adversarial Network*
*Chang, Che-Han, Chun-Hsien Yu, Szu-Ying Chen, and Edward Y. Chang. "KG-GAN: Knowledge-Guided Generative Adversarial Networks."
arXiv preprint arXiv:1905.12261 (2019).
[Figure: KG-GAN architecture. Components: seen category data, unseen category data, generators G1 and G2 (inputs Z1 and Z2) with parameter sharing, real data, fake data from G1 and G2, discriminator D (real or fake), embedding regression network, semantic embedding of unseen category, predictions and losses for G1 and G2, objective function.]

K-IL : Objective Functions and Evaluation
Kullback Leibler Divergence
● Measures the information loss during the learning phase between latent/hidden states and KGs
● KG Embeddings: TransE, HolE, etc.
● Models: Variational Autoencoders, LSTMs, GANs, Siamese Neural Networks
● Frameworks: Zero Shot Learning, One Shot Learning, Transfer Learning, Parameter Sharing
● Other Variants: Jensen Divergence, Regularization, Integer Linear Programming
Kosheleva, Olga, and Vladik Kreinovich. "Why deep learning methods use KL divergence instead of least squares: a possible pedagogical explanation." Математические структуры и
моделирование 2 (46) (2018).
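To make the KL divergence objective concrete, the sketch below computes KL(p || q) between two discrete distributions obtained by applying softmax to score vectors. Treating the KG-derived scores as the reference p and the model's latent scores as q is an assumption made for illustration; the slides do not fix the direction or say how the distributions are formed.

```python
import math

def softmax(scores):
    """Turn a list of real-valued scores into a probability distribution."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) = sum_i p_i * log(p_i / q_i) for discrete distributions.

    Terms with p_i == 0 contribute nothing; q_i is floored at eps to avoid
    division by zero.
    """
    return sum(
        pi * math.log(pi / max(qi, eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )

# Toy scores (assumed values) for a KG embedding and a hidden state.
kg_distribution = softmax([2.0, 1.0, 0.1])
latent_distribution = softmax([0.5, 1.5, 0.2])
information_loss = kl_divergence(kg_distribution, latent_distribution)
```

A value of zero means the two distributions coincide; larger values indicate that the latent representation diverges further from the knowledge-derived one.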
Evaluation: Before and After Knowledge-infusion
Methods (apart from Precision, Recall, F1-score):
● Fréchet Inception Distance: measure of similarity between two datasets (KG & training data)
● Statistical Significance Hypothesis Testing
● Word and Concept Features
● t-SNE Visualization of Clusters
● Area under perturbation curve: Feature Ranking
● Human-centric evaluation: Crowdsourcing, User Satisfaction, Mental Model, Trust Assessment, Correctability
http://www-sop.inria.fr/members/Freddy.Lecue/presentation/ISWC2019-FreddyLecue-Thales-OnTheRoleOfKnowledgeGraphsInExplainableAI.pdf

Utility of K-IL: Applications
Summarization of Clinical Diagnostic Interviews
Faruqui, Manaal, Jesse Dodge, Sujay K. Jauhar, Chris Dyer, Eduard Hovy, and Noah A. Smith. "Retrofitting word vectors to semantic lexicons." arXiv preprint arXiv:1411.4166 (2014).
Gaur, Manas, et al. "'Let Me Tell You About Your Mental Health!' Contextualized Classification of Reddit Posts to DSM-5 for Web-based Intervention." 27th ACM CIKM, 2018.

● BERT: Statistical
● Abstractive Summarization using Integer Linear Programming (ILP): Statistical + Constraints
● Abstractive Summarization using ILP and PHQ-9: Statistical + Constraints + Knowledge
Summarization of Clinical Diagnostic Interviews

SuicideWatch Subreddit
(93K Users)
NYC CDRN EHR (123K patients) Data specific to
Mental Health
Medical Knowledge Bases
Association between Social Media and EHR in Suicide-related
Communications
We identified self-harm, depressive feelings, and suicide ideation as latent topics expressed in both Reddit and EHR data.
Neither source provided evidence of mentions or expressions of impulsivity, family violence, or drug abuse.

K-IL for Healthcare
Challenges
Classification of Reddit posts to DSM-5 (Reddit → DSM-5)

K-IL: 2 Challenges in Healthcare
Contextualization
User-level, across different sources (forums, subreddits) where the user has posted
Post S5: "I dont think I have thought about it every day of my entire life. I have for a good portion of it, however, my boyfriend may be able to determine whether I’m worth his time"
Predicted label (S5 alone): Suicide Indication
Post S8: "Having a plan for my own suicide has been a long time relief for me as well. I more often than not wish I were dead."
Predicted label (S5 with S8): Suicide Ideation

K-IL: 2 Challenges in Healthcare
Contextualization and Abstraction
User’s original post: "I have found myself mired in a similar situation as your boyfriend - addicted to the internet. It sounds like he its hurting a lot and needs your help in changing his habits"
Clinically abstracted post: "I have found myself mired in a similar situation as your boyfriend - Drug abuse to the internet. It sounds Hyperactive behavior he its Depressed mood a lot and needs your help in changing his habits."
SW
SSH
SLH
BPD
DPR
ADD
SCZ
BPR
The transient posting of potentially suicidal users in other subreddits requires careful consideration to appropriately predict their suicidality. Hence, we analyze their content by harnessing their network and bringing in their content if it overlaps with that of other users within SW. We found Stop Self Harm (SSH) > Self Harm (SLH) > Bipolar (BPR) > Borderline Personality Disorder (BPD) > Schizophrenia (SCZ) > Depression (DPR) > Addiction (ADD) > Anxiety (ANX) to be the most active subreddits for suicidal users. After aggregating their content, we perform MedNorm using lexicons to generate clinically abstracted content for effective assessment.
DSM-5 SNOMED-CT
ICD-10 DataMed
Drug Abuse
Ontology
TwADR AskaPatient
Mental Health and Drug Abuse
Knowledge Base
Clinically
Abstracted
User’s Posts
ANX
1.0
0.66
0.40
0.40
0.44
0.39
0.30
0.34
SW: SuicideWatch subreddit
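The lexicon-based abstraction of posts can be sketched as a phrase-to-concept substitution. The sketch below is only an approximation of the idea: the two-entry lexicon mirrors substitutions visible in the example post (addicted → Drug abuse, hurting → Depressed mood), while the function name, the matching rules, and the sample sentence are assumptions. The actual MedNorm procedure and lexicons are not specified on the slide.

```python
import re

def abstract_post(text, lexicon):
    """Replace lexicon phrases in text with their clinical concept labels.

    Matching is case-insensitive, on word boundaries, and done in a single
    pass so that inserted concept labels are never re-matched. Longer
    phrases are tried first.
    """
    if not lexicon:
        return text
    lowered = {phrase.lower(): concept for phrase, concept in lexicon.items()}
    phrases = sorted(lowered, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(p) for p in phrases) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda match: lowered[match.group(0).lower()], text)

# Illustrative lexicon entries only; real lexicons are far larger.
toy_lexicon = {"addicted": "Drug abuse", "hurting": "Depressed mood"}
post = "I am addicted to the internet and he is hurting a lot."
abstracted = abstract_post(post, toy_lexicon)
```

A plain substitution like this ignores context, so a production pipeline would likely need disambiguation before mapping a phrase to a clinical concept.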

K-IL: Social Media Data to EHR Data
TwADR
AskaPatient
Drug Abuse
Ontology
DSM-5 Lexicon
Suicide Risk
Severity Lexicon
Treatment
Information
Observation and
Drug-related
Information
Mental Health Condition
Suicide Risk Levels
Ideation
Behavior
Attempt

Reddit → DSM-5
Task
I know you want me to say no and that it is a
part of me blah blah blah. But I can't. Honestly,
not having bipolar disorder would be a huge
blessing. I would be so much happier and
could control my life better. I wouldn't have
frantic, scattered thoughts and depression. I
would be normal, happy, and less dramatic.
Bipolar Subreddit
DSM-5: Depressive Disorder
BiPolar
Depression
Disorder
Subreddits DSM-5
Chapter
BiPolarReddit
BiPolarSOS
Depression
Addiction
Substance use &
Addictive Disorder
Crippling Alcoholism
Opiates Recovery
Opiates
Self-Harm
Stop Self-Harm

Reddit, 2005-2016: 550K users, 8 million conversations, 15 mental health subreddits
Conversation structure: Subreddit, Main Post, Comment, Reply

Reddit → DSM-5
Mapping
Input: <Reddit Post> <Subreddit Label>
Output: <Reddit Post> <DSM-5 Label>
● Medical Knowledge Bases: DSM-5 Lexicon, Drug Abuse Ontology (DAO)
● N-grams (n=1, 2, 3)
● LDA
● LDA over bi-grams
● Normalized Hit Score
SEDO
Semantic Encoding and Decoding Optimization (SEDO) is a procedure that modulates the embedding (vector) of a word.
[Figure: SEDO workflow. Inputs: Reddit with DSM-5 labels, word embedding model, medical knowledge bases, domain experts, DSM-5 Lexicon, DAO, linguistic features. Matrices: correlation matrix (Q) over word vectors; correlation matrix (P) over DSM-5 Lexicon or DAO; cross-correlation matrix (Z) between word vectors and DSM-5 Lexicon or DAO; DSM-5 vocabulary matrix. SEDO optimizes P, Q, and Z. Outputs: word-modulated word embeddings, DSM-5 classification.]
Reddit → DSM-5
Workflow

[Figure: R: Reddit word embedding model (12808 words, 300-dimension embedding). D: DSM-5-DAO lexicon (20 DSM-5 categories, 300-dimension embedding). W: obtained from a solvable Sylvester equation.]
Reddit → DSM-5
Semantic Encoding and Decoding Optimization
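For reference, a Sylvester equation in an unknown matrix W has the generic form below. The slide does not show which coefficient matrices SEDO builds from R and D, so A, B, and C here are placeholders, and the dimensions assume that W relates the 12808 words to the 20 DSM-5 categories.

```latex
% Generic Sylvester equation; A, B, C are placeholders, not the SEDO-specific matrices.
\[
  A W + W B = C,
  \qquad A \in \mathbb{R}^{m \times m},\;
         B \in \mathbb{R}^{n \times n},\;
         C, W \in \mathbb{R}^{m \times n}
\]
% A unique solution W exists if and only if A and -B share no eigenvalue:
\[
  \sigma(A) \cap \sigma(-B) = \emptyset
\]
% Assumed dimensions for this slide: m = 12808 (words), n = 20 (DSM-5 categories).
```

The "solvable" qualifier on the slide presumably refers to this uniqueness condition being met.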

Reddit → DSM-5
Encoding DSM-5 to Reddit embedding space
Decoding Reddit to DSM-5 embedding space
Semantic Encoding and Decoding Optimization

Domain-specific knowledge lowers false alarm rates.
Reddit → DSM-5
Outcome

Resources
TwADR and AskaPatient Lexicon: https://zenodo.org/record/55013#.XsYEH8YpBQI
Ref: Limsopatham, Nut, and Nigel Collier. "Normalising medical concepts in social media texts by learning semantic representation." Association for Computational Linguistics, 2016.
Suicide-Risk Severity Lexicon: https://bit.ly/SRS_lexicon
Ref: Gaur, Manas, Amanuel Alambo, Joy Prakash Sain, Ugur Kurşuncu, Krishnaprasad Thirunarayan, Ramakanth Kavuluru, Amit Sheth, Randy Welton, and Jyotishman Pathak. "Knowledge-aware assessment of severity of suicide risk for early intervention." In The World Wide Web Conference, 2019.
DSM-5 and Drug Abuse Ontology Lexicon: https://bit.ly/DSM5_DAO
Ref: Gaur, Manas, Ugur Kurşuncu, Amanuel Alambo, Amit Sheth, Raminta Daniulaityte, Krishnaprasad Thirunarayan, and Jyotishman Pathak. "'Let Me Tell You About Your Mental Health!' Contextualized Classification of Reddit Posts to DSM-5 for Web-based Intervention." In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 2018.
Suicide Risk Severity Dataset (Reddit): https://zenodo.org/record/2667859#.XsYH7MYpBQI
Ref: Gaur, Manas, Amanuel Alambo, Joy Prakash Sain, Ugur Kurşuncu, Krishnaprasad Thirunarayan, Ramakanth Kavuluru, Amit Sheth, Randy Welton, and Jyotishman Pathak. "Knowledge-aware assessment of severity of suicide risk for early intervention." In The World Wide Web Conference, 2019.

K-IL: Where are we? And What’s happening?
● Current research on fusing background knowledge
and deep learning focuses on:
○ Shallow Infusion
○ Semi-Deep Infusion
● Explainable AI in healthcare falls short in its use of medical knowledge graphs
● In Intelligent Virtual Assistants:
○ User engagement is a huge challenge
○ Requires Personalized Health Knowledge
Graph
○ Motivational Interviewing: Open Question, Reflective Listening, and Summary
● K-IL Healthcare + X
○ Autonomous Driving Vehicle [1]
○ Cyber Social Threats [2]
○ Disaster Resilience System
○ Personal Finance
https://www.gartner.com/en/research/methodologies/gartner-hype-cycle
[1] Wickramarachchi, Ruwan, Cory Henson, and Amit Sheth. "An evaluation of knowledge graph embeddings
for autonomous driving data: Experience and practice." arXiv preprint arXiv:2003.00344 (2020).
[2] Kursuncu, Ugur, Manas Gaur, Carlos Castillo, Amanuel Alambo, Krishnaprasad Thirunarayan, Valerie Shalin,
Dilshod Achilov, I. Budak Arpinar, and Amit Sheth. "Modeling Islamist Extremist Communications on Social Media
using Contextual Dimensions: Religion, Ideology, and Hate." Proceedings of the ACM on Human-Computer
Interaction, CSCW (2019)

Acknowledgement
Dr. Amit P. Sheth
Dr. Thirunarayan Krishnaprasad
Dr. Valerie L. Shalin
Dr. Jyotishman Pathak
Dr. Ugur Kurşuncu

http://kiml2020.aiisc.ai/ http://aiisc.ai/
