A quick review of a subset of Kno.e.sis’ research on knowledge-enhanced learning, with applications to personal and public health, wellbeing and social good.
1. A quick review of Kno.e.sis’ research with an emphasis on
knowledge-enhanced learning
with impact on health and social good
Why? What? How?
Feb 2019 Snapshot
2. Dynamic Knowledge Graph - Temporal Information Retrieval
Searching the web for event-relevant information often returns incorrect, incomplete, or
stale results. Inferring temporal information associated with events and related
assertions can significantly improve the quality of Q/A on the Web. Hence, there is a
need to identify and maintain temporally changing information in order to analyze the complex
temporal dynamics and interactions of entities during a series of evolving events.
WHY ?
Swati Padhee
swati@knoesis.org
[Figure: examples from politics (Kamala Harris: U.S. Senator, U.S. Presidential Candidate) and sports (FIFA World Cup: host country and champion, with France and Russia as example values, followed by the next edition after 4 years), drawn from Wikipedia as a semi-structured KG source and from an unstructured real-time knowledge source.]
Complex Temporal QA/Conversations:
Who is the Champion of FIFA World Cup
in Jan 2019?
Temporal Knowledge Validity: Who will be
the President of United States in May
2021?
APPLICATIONS
We rely on reasoning over unstructured and structured Knowledge
Graphs (KGs). However, most traditional KGs capture static multi-relational
data. Effectively capturing the temporal dependencies across
knowledge sources can help improve the understanding of the complex
temporal dynamics of entities and their evolution over time.
WHAT ?
We define two problems:
(1) Automatically extracting and predicting patterns for a class of recurrent
periodic events (e.g., FIFA World Cup).
(2) Inferring temporal knowledge for non-periodic events (e.g., disasters)
from real-time multimodal data to create an evolving Dynamic Knowledge
Graph.
We rely on combining text mining approaches with machine learning
using knowledge from: (1) hierarchical and non-hierarchical relationships
in KGs, (2) unstructured textual event-specific information, and
(3) semi-structured collaborative KGs.
HOW ?
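As a minimal sketch of problem (1) above, the snippet below infers the period of a recurrent event from its past occurrence years and extrapolates the next one. The function name `predict_next_occurrence` and the most-common-gap heuristic are illustrative assumptions, not the method used in this work.

```python
from collections import Counter


def predict_next_occurrence(years):
    """Guess the next occurrence of a periodic event from past years.

    Assumption: the period is the most common gap between consecutive
    occurrences (a simple heuristic chosen for illustration only).
    """
    ordered = sorted(set(years))
    if len(ordered) < 2:
        raise ValueError("need at least two past occurrences")
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    period, _ = Counter(gaps).most_common(1)[0]
    return ordered[-1] + period, period


# Past FIFA World Cup years used as the example series.
world_cups = [2002, 2006, 2010, 2014, 2018]
next_year, period = predict_next_occurrence(world_cups)
print(next_year, period)  # 2022 4
```

A real system would also have to extract the occurrence dates from text or a KG before any such extrapolation; that step is not shown here.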
3. Personalized Healthcare Knowledge Graph (PHKG)
➢ AI will provide additional patient support (e.g., for
continuous/remote monitoring and consultation).
➢ Existing health applications enable data
visualization, but humans must still interpret the data.
➢ Humans are overwhelmed by patient-generated
health data and voluminous search results.
➢ Continuously convert diverse patient health
data into health-related concepts
(abstractions) as PHKGs.
➢ Interpret health data to build smarter health
applications (e.g., recommendations) and
conversational systems (e.g., chatbots).
➢ Encode domain expertise and patient data so that
machines can process them using Semantic Web
technologies (RDF, RDFS, OWL, SPARQL).
➢ Use knowledge-graph-enhanced learning to
interpret health data within the patient’s health
context and PHKG (derive abstractions through
personalized and contextualized data processing).
WHY
WHAT
HOW
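To illustrate what "converting health data into abstractions" can look like, here is a small sketch that maps a numeric reading to a qualitative level with range rules and stores the result as subject-predicate-object triples. The thresholds, predicate names, and identifiers are hypothetical; a real PHKG would use RDF/OWL vocabularies and clinically validated ranges.

```python
# Hypothetical ranges (low inclusive, high exclusive); not clinical values.
POLLEN_LEVELS = [(0.0, 2.5, "Low"), (2.5, 7.3, "Medium"), (7.3, 12.1, "High")]


def abstract_reading(value, levels):
    """Map a numeric reading to a qualitative concept using range rules."""
    for low, high, label in levels:
        if low <= value < high:
            return label
    return "Unknown"


def add_observation(graph, obs_id, patient, factor, value, levels):
    """Store the raw value and its abstraction as (s, p, o) triples."""
    graph.add((obs_id, "aboutPatient", patient))
    graph.add((obs_id, "hasFactor", factor))
    graph.add((obs_id, "hasValue", value))
    graph.add((obs_id, "hasLevel", abstract_reading(value, levels)))


# A plain set of tuples stands in for a triple store in this sketch.
phkg = set()
add_observation(phkg, "obs1", "patient42", "pollen", 9.8, POLLEN_LEVELS)

# Query: which observations were abstracted as "High"?
high = [s for (s, p, o) in phkg if p == "hasLevel" and o == "High"]
print(high)  # ['obs1']
```

Keeping the raw value next to the abstraction lets an application explain why a level was assigned.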
Amelie Gyrard
amelie@knoesis.org
kHealth project
Amelie Gyrard, Manas Gaur, Saeedeh Shekarpour, Krishnaprasad Thirunarayan, Amit Sheth.
Personalized Health Knowledge Graph. Contextualized Knowledge Graph (CKG) Workshop, International Semantic Web Conference (ISWC) 2018
4. WHAT ?
Model automatic medical code assignment as an Extreme
Multi-label Classification (XMC) problem and provide auxiliary
supervision through external knowledge to achieve precise and reliable
code assignment.
Medical Codes Prediction
by infusing knowledge with Deep Learning
WHY ?
Assigning medical codes to clinical notes is an important step in
record keeping and structuring in healthcare systems and in insurance
claims. Due to (i) the challenges in understanding medical language,
(ii) the extensive use of medical jargon and drug and procedure names in
clinical notes, and (iii) the huge label space, manual annotation is
labor- and time-intensive, error-prone, and difficult.
HOW ?
We use the widely cited MIMIC-III clinical notes dataset with
~59k hospital stays of ~48k patients. Our objective is to use
state-of-the-art Deep Learning frameworks coupled with external medical
knowledge such as UMLS and SNOMED to predict all the relevant ICD codes
from a huge label space of ~9k codes.
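In a multi-label setting, each note can receive several codes at once. The sketch below shows the usual decision step (an independent sigmoid score per label, then a threshold) and a micro-averaged F1 score. The label names and scores are placeholders, not real ICD codes or model outputs, and the 0.5 threshold is an assumption.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict_codes(logits, threshold=0.5):
    """Return every code whose sigmoid score reaches the threshold."""
    return {code for code, z in logits.items() if sigmoid(z) >= threshold}


def micro_f1(gold_sets, pred_sets):
    """Micro-averaged F1 over a list of (gold, predicted) label sets."""
    tp = sum(len(g & p) for g, p in zip(gold_sets, pred_sets))
    fp = sum(len(p - g) for g, p in zip(gold_sets, pred_sets))
    fn = sum(len(g - p) for g, p in zip(gold_sets, pred_sets))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical per-label scores for one clinical note.
logits = {"CODE_A": 2.1, "CODE_B": -0.7, "CODE_C": 0.3}
pred = predict_codes(logits)
gold = {"CODE_A", "CODE_B"}
print(sorted(pred), micro_f1([gold], [pred]))  # ['CODE_A', 'CODE_C'] 0.5
```

With thousands of labels, most of which are rare, micro-averaging is dominated by frequent codes, so macro-averaged or top-k metrics are often reported alongside it.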
Ruwan Wickramarachchi
ruwan@knoesis.org
5. SMART Chatbots for Enhanced Health
Using Multisensory Sensing & Semantic-Cognitive-Perceptual
Computing for Augmented Personalized Health
Understanding and managing health is both complex and costly, and we
have relied on clinicians for most health-related decision making throughout
the last few decades of modern medicine. Additionally, capturing
different modalities and making sense of health data requires
advances in contemporary knowledge-based processing approaches.
Smart chatbots provide a superior way to collect data and interact with
intelligent/AI-based systems, empowering patients to better manage their
health. Augmented Personalized Health provides increasingly sophisticated
and comprehensive options for health management, from self-monitoring,
self-appraisal, self-management, and intervention to prediction, without
overburdening clinicians.
AI-based intelligent (semantic/cognitive/perceptual) computing converts
diverse and continuously collected health data (esp. PGHD, IoT/sensors,
environmental, clinical) and medical knowledge into highly personalized and
contextual abstractions that enable deeper insights into health conditions
and/or timely actions leading to improved health outcomes. Chatbots
utilizing conversational AI (e.g., reinforcement learning) provide a superior
human-computer interface.
WHAT
WHY
HOW
http://bit.ly/Humanlike-Chatbots http://bit.ly/Smart-Chatbots
Joey Yip
joey@knoesis.org
6. Causal Inference Analysis in Pediatric Asthma Patients
➢ Each pediatric asthma patient is DIFFERENT, so an understanding of their
personalized symptoms and treatment is needed
➢ LIMITED DATA due to episodic visits
➢ TIME CONSTRAINTS during clinical visits: significant information-seeking time is
required on every clinical visit
➢ Comprehending clinical notes that contain only text is difficult
Continuous monitoring of a pediatric asthma patient’s health signals (such as sleep
pattern, daily activity, symptoms, potential environmental triggers and medication
compliance) using sensors yields personalized information that the clinician can
incorporate into the care protocol for timely intervention, better management of the
disease, and better patient adherence to the care protocol.
Development of a personalized causal inference analysis using
probabilistic graphical models to abstract actionable information
from the vast amount of multimodal health signals, to understand the
relationships among asthma symptoms, potential triggers,
medication compliance, and adherence to the care protocol, and to
suggest precautionary measures that avoid future exposure to triggers
that worsen disease control.
Utkarshani Jaimini
utkarshani@knoesis.org
WHY
WHAT
HOW
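To give a feel for reasoning with a probabilistic graphical model, the sketch below defines a three-node network (trigger exposure and medication both influence symptoms) and computes the probability of trigger exposure given an observed symptom by enumeration. All probabilities and the network structure are hypothetical, and this is plain probabilistic inference for illustration, not the project's personalized causal analysis.

```python
# Hypothetical parameters, for illustration only.
P_TRIGGER = 0.3   # P(high trigger exposure)
P_MED = 0.8       # P(medication taken)
P_SYMPTOM = {     # P(symptom | trigger, medication)
    (True, True): 0.30,
    (True, False): 0.70,
    (False, True): 0.05,
    (False, False): 0.20,
}


def joint(trigger, med, symptom):
    """Joint probability of one full assignment of the three variables."""
    p = P_TRIGGER if trigger else 1 - P_TRIGGER
    p *= P_MED if med else 1 - P_MED
    p_sym = P_SYMPTOM[(trigger, med)]
    return p * (p_sym if symptom else 1 - p_sym)


def posterior_trigger(symptom, med=None):
    """P(trigger=True | symptom[, med]) by enumerating the joint."""
    meds = [True, False] if med is None else [med]
    num = sum(joint(True, m, symptom) for m in meds)
    den = sum(joint(t, m, symptom) for t in (True, False) for m in meds)
    return num / den


# Observing a symptom raises the belief in trigger exposure above the prior.
print(round(posterior_trigger(True), 3))             # 0.671
print(round(posterior_trigger(True, med=False), 3))  # 0.6
```

In a personalized setting, the conditional probability table would be estimated per patient from the continuously monitored signals rather than fixed by hand.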
7. Fusing Visual, Textual and Connectivity
Clues for Studying Mental Health
Clinical depression is one of the most common mental illnesses; it affects
350 million people and has a $42 billion annual cost. Only a third of those who
suffer from depression receive treatment. Traditional survey-based
methods via phone or online questionnaires suffer from
under-representation, sampling bias, and temporal gaps.
Social media platforms are a valuable resource for learning about users’
feelings and behaviors that reflect their mental health as they
experience ups and downs. Emotional state inferred from visual/textual content
and users’ desire to socialize and connect with others can serve as a proxy for their online
persona. By gleaning social signals from user-generated content in social
media, we can emulate traditional observational cohort studies conducted
through online questionnaires.
1) How can textual and visual content in social media be harnessed to reliably
capture a user's clinical depression symptoms over time?
2) Does the choice of profile picture reveal psychological traits of a
depressed online persona? Are profile pictures reliable enough to represent
demographic information such as age and gender?
3) How do we generalize them to infer population-level attitudes towards care?
4) How well can geographical information gleaned from depressed
individuals over social media serve as the basis for effective
community-level management of depression by studying patterns of access, utilization,
and location of mental health services?
WHAT
HOW
Example posts: "Thinking about hanging myself ... I just don't want to wake up tomorrow morning." / "I feel like a failure"
Amir Yazdavar
yazdavar@gmail.com
WHY
8. Gaur, Manas, Ugur Kursuncu, Amanuel Alambo, Amit Sheth, Raminta Daniulaityte, Krishnaprasad Thirunarayan, and Jyotishman Pathak. "Let Me Tell You About Your
Mental Health!: Contextualized Classification of Reddit Posts to DSM-5 for Web-based Intervention." In Proceedings of the 27th ACM CIKM 2018.
Classification of Reddit Content to DSM-5 for Web-based Intervention
● Precise diagnosis of a mental health condition can enable effective
remedial actions.
● However, no existing framework associates a user's condition with
a diagnostic category in a way that assists a Mental Health Professional in giving
precise advice and improving quality of care.
● The use of historical information and personalization of care can
yield economic benefits and improved quality of care by reducing
the number of visits to the mental health clinic.
[Figure: Patient, Clinician, EMR, Insight, DSM-5 & Drug Abuse Ontology, Improved Healthcare. Caption: Provide clinicians insights on their patients]
● To address mental health concerns, we developed a novel framework that complements the existing healthcare
system.
● We leverage 8 million conversations from 600K users on Reddit, spanning 11 years (2005-2016), and the DSM-5 knowledge
hierarchy together with SNOMED-CT, ICD-10, and UMLS to develop an AI solution that matches a user's content to
a suitable Mental Health specialization.
Why
What
9. Outcomes & Insights
● To operationalize the goal, we propose an
approach to map the content to more rigorously
defined DSM-5 categories using Coherence-LDA and
Semantic NLP.
● To reduce the false alarm rate, we developed a novel
Zero-Shot Learning inspired Semantic Encoding and
Decoding Optimization (SEDO) approach that generates
semantic word vectors using a medical knowledge
hierarchy.
● SEDO improves the classification by decreasing the false
alarm rate by 92% without changing the classifier.
● The results of our framework were evaluated by
a practicing clinical psychiatrist, reaching an
agreement of 84%.
● As a broader impact:
○ Our approach is unsupervised.
○ It relies on a domain-specific knowledge hierarchy
to generate semantic word vectors.
How
Manas Gaur
manas@knoesis.org
10. Reason and Effect relationship extraction
between Cannabis use and Depression
Why: With increasing efforts to legalize cannabis use in the US,
the relationship between cannabis use and depression remains ambiguous:
it is unclear whether cannabis use is a reason for depression or a
subsequent effect. Automatically identifying this relationship
in big social data would enable public health analysts to
gain insights into these relationships and their prevalence in public
health.
What: Automatically extract the reason and effect relationships
between cannabis and depression by building a deep learning framework
enhanced with domain-specific knowledge.
How: Employ the Drug Abuse Ontology (DAO) to enhance representations of
the entities in tweets. We propose a top-down technique that uses the
DAO to semantically augment the deep learning framework (CNN) in
the form of entity position-aware attention.
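The idea behind position-aware attention is that tokens near the two entity mentions matter most for deciding the relation between them. The sketch below computes relative positions and turns a hand-written distance penalty into softmax weights. The scoring rule, the `decay` parameter, and the example sentence are assumptions for illustration; a trained model would learn its scores from token and position embeddings.

```python
import math


def relative_positions(n_tokens, entity_index):
    """Signed distance of every token to one entity mention."""
    return [i - entity_index for i in range(n_tokens)]


def position_attention(n_tokens, e1, e2, decay=0.5):
    """Softmax weights that favor tokens close to both entities.

    Assumption: score = -decay * (|distance to e1| + |distance to e2|).
    """
    scores = [-decay * (abs(i - e1) + abs(i - e2)) for i in range(n_tokens)]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for stability
    total = sum(exps)
    return [x / total for x in exps]


tokens = ["honestly", "weed", "helps", "my", "depression",
          "a", "lot", "these", "days"]
e1, e2 = tokens.index("weed"), tokens.index("depression")

print(relative_positions(len(tokens), e1))
weights = position_attention(len(tokens), e1, e2)
for token, weight in zip(tokens, weights):
    print(f"{token:12s}{weight:.3f}")
```

Tokens between the two mentions ("helps", "my") receive the largest weights, while trailing words such as "these days" are down-weighted.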
Usha Lokala
usha@knoesis.org
11. User Modeling in Marijuana-related Communications
● Public opinion on marijuana-related legalization efforts in the U.S. needs to be
assessed based only on the personal views of individuals who actually vote in
elections, since media and retail accounts on social media represent institutional
views that influence personal views.
● Hence, properly separating Personal accounts from Media and Retail accounts
in marijuana communications is necessary for accurately measuring public
opinion and the influence of media and retail accounts over personal accounts
through social media.
WHY
● We model users based on characteristics of their profile (people), content and
network interactions, which we call views, by incorporating multimodal
data through Compositional Multiview Embeddings for coherent and unified
representations.
● These user types show distinct characteristics in volume, network
interactions, the use of domain-specific concepts and different modalities in
content. As each of these facets of information contributes differently to the
meaning, their contribution to the representation of a user will differ as well.
WHAT
Ugur Kursuncu
ugur@knoesis.org
12. User Modeling in Marijuana-related Communications
● Generate coherent and unified representations of users
through Compositional Multiview Embeddings based on the
views of People, Content, Network (PCN).
● Domain-specific embedding models are developed for each
view based on user descriptions (people), tweets (content) and
network interactions (network) from a corpus of ~1M users.
● Multimodal data is incorporated in each view by translating the
data in each form (Text, Image, Emoji, Network Interactions) into
a uniform representation through state-of-the-art techniques
such as EmojiNet.
● For each view, the multimodal elements are as follows:
○ People: user description, emoji, profile pictures.
○ Content: text, emoji.
○ Network: interactions with other users (retweets and
mentions).
HOW
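One simple way to picture combining the People, Content, and Network views into a single user representation is to normalize each view's vector and concatenate them. This is only a sketch under that assumption; the slides do not specify the composition function, and the toy vectors below are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def compose_views(people, content, network):
    """Concatenate L2-normalized view vectors into one representation.

    Assumption: plain concatenation; normalizing first keeps any single
    view from dominating purely because of its scale.
    """
    combined = []
    for view in (people, content, network):
        combined.extend(l2_normalize(view))
    return combined


# Toy view embeddings for one user.
user = compose_views([3.0, 4.0], [0.0, 2.0], [1.0, 0.0, 0.0])
print(user)  # [0.6, 0.8, 0.0, 1.0, 1.0, 0.0, 0.0]
```

The resulting vector could then be passed to any classifier that separates Personal, Retail, and Media accounts.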
[Figure: pipeline from Multimodal Data Incorporation (including emoji such as 🏈😉😸🍔🎈🎨) to Composition of Multiview Embeddings to Generate Representation of users based on PCN, with user types Personal, Retail, Media.]
13. eDarkTrends: Trend Analysis of
Opioid on Cryptomarkets
Why: As opioid overdose death rates have been climbing in the US in recent
years, the Dark Web has become an essential venue where individuals can find
illicit synthetic opioid products. Hence, monitoring the products on these
platforms will provide analysts with invaluable insights into their prevalence
in society with respect to overdose incidents.
What: We developed a semi-automated system, eDarkTrends, to collect and
process data about illicit synthetic opioids supplied on cryptomarkets. We
perform trend analysis for the US through information extraction: price, purity,
dosage, quantity, and drug combinations. We also identify new illicit
synthetic opioid substances and product forms soon after they appear on
cryptomarkets.
How: The Drug Abuse Ontology (DAO) is utilized for information extraction, as the
DAO contains informal terms used on cryptomarkets.
Domain-specific named-entity recognition is performed. Machine learning models
coupled with the DAO are employed to identify new entities (e.g., opioids), with
validation by domain experts. We perform correlation analysis for trend
estimation.
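As a small illustration of the correlation step, the sketch below computes the Pearson correlation coefficient between two time series. The monthly numbers are made up for the example and do not come from eDarkTrends; which series the project actually correlates is not stated in the slides.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical monthly counts: product listings vs. reported incidents.
listings = [10, 12, 15, 19, 24]
incidents = [3, 4, 4, 6, 7]
print(round(pearson(listings, incidents), 3))  # 0.976
```

A high coefficient only indicates that the two series move together; it does not by itself show that one drives the other.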
Usha Lokala
usha@knoesis.org
14. Manas Gaur, Amanuel Alambo, Joy Prakash Sain, Ugur Kurşuncu, Krishnaprasad Thirunarayan, Ramakanth Kavuluru, Amit Sheth, Randon S. Welton and Jyotishman Pathak.
"Knowledge-aware Assessment of Severity of Suicide Risk for Early Intervention". The Web Conference 2019. San Francisco, California.
Knowledge-aware Assessment of Severity of Suicide Risk for Early Intervention
Knoesis wiki for Modeling Social Behavior for Healthcare
Utilization in Depression
● Mental illness such as depression is a significant
risk factor for suicidal ideation and behaviors, including
suicide attempts.
● A quantitative assessment of suicide risk is needed for informed,
timely clinical decision-making and early intervention.
● Prior research involving surveys and questionnaires (e.g.,
PHQ-9) for suicide risk prediction failed to provide a
quantitative assessment of risk that informed timely
clinical decision-making for intervention.
● Our interdisciplinary study concerns the use of Reddit as
an unobtrusive data source for gleaning information
about suicidal tendencies afflicting depressed users
using C-SSRS.
*C-SSRS: Columbia Suicide Severity Rating Scale
Progression of users through severity levels of suicide
risk
Why
15. Outcomes & Insights
● Initially, the framework performs medical entity
normalization using a suicide risk severity lexicon
to abstract the content into clinically
understandable language.
● Through the convolution operation in the neural
network, the model identifies discriminative
features useful for ordinal classification
into one of five labels.
● The deep learning framework was able to
reduce the probability of misclassification by
12.5%.
● Importantly, our approach distinguishes people
who are supportive from those who show
different levels of suicide risk severity.
How
● Develop a knowledge-aware deep
learning framework with a perceived risk
measure for predicting the severity of
an individual's suicide risk.
● Build a domain expert (Mental Health
Professional)-curated suicide risk
severity lexicon with the following suicide risk
severity levels as classes: supportive,
indicator, ideation, behavior, attempt.
● Demonstrate the efficiency of the
framework on a gold standard dataset of
500 Redditors obtained from practicing
clinical psychiatrists.
What
Manas Gaur
manas@knoesis.org
16. Assessing Severity of Health States
in Social Media Posts
Around 80% of Internet users in the US explore health-related
topics in online health communities [Pew]: 63% look for
information about a specific medical problem, and 47% look for
information about a medical treatment or procedure. Understanding the degree of
severity on health forums can (i) assist human moderators in
providing timely responses and interventions, and (ii) support
pharmacovigilance studies for identifying adverse drug reactions.
We develop an advanced understanding of the severity of patients’
health state from social media posts based on medical condition
(e.g., exist, recover, deteriorate) and the outcome of treatment or
medication (e.g., effective, ineffective, and serious adverse effect).
Our multifaceted framework leverages several aspects of Natural
Language Understanding (NLU) for making an inference.
A multifaceted framework based on a deep neural network, utilizing the
textual content as well as various NLU features (e.g., sentiment,
emotions, etc.), assesses a user's health on certain aspects
(medical condition & medication) to enable timely intervention.
WHAT
WHY
HOW
Joy Prakash Sain
joy@knoesis.org
17. Why?
Obesity is on the rise worldwide. Focusing on reducing excess calorie consumption and
making informed decisions about food choices and physical activity can help
attain a healthier weight and reduce the risk of chronic illness (cf. the Dietary
Guidelines for Americans).
What?
Monitoring an individual's diet and cumulative calorie intake through food images and
recommending meals can help them make informed decisions about their
meals. Also, tracking and assessing their food patterns and weight trends can help
them maintain a healthier weight in the long run.
How?
We built a system that is trained to recognize food images collected from open sources
such as Instagram, Google Images, Pinterest, Getty Images, etc. Once a food is recognized,
its volume can be estimated based on user input (automatically, in the future) and nutrition
information can be obtained from comprehensive knowledge bases. AI techniques
support meal recommendations specific to user preferences and context.
Current Research Problem: We are currently working on infusing knowledge to enhance image
classification. Existing systems face limitations because classification is done using
low-level features such as pixels, which leads to overlap among classes. That overlap can be
resolved with external knowledge.
Nutrition Management Information System
Nutrition Information
Management System
Revathy Venkataramanan
revathy@knoesis.org
18. Translational Research - Detecting Sample Mislabeling
(winning precisionFDA challenge!)
The use of high-throughput molecular profiling methods is becoming increasingly
common in genetic studies to understand disease and enhance our ability to
achieve the promise of precision medicine. The effectiveness of these methods depends
critically on the accurate labeling of the samples and could be seriously weakened by
sample mislabeling. However, sample mislabeling is hard to notice because it occurs
at the data entry level, where the data is treated as ground truth and used for
downstream analysis.
Why
Multi-omics data is multimodal, consisting of genomic, epigenomic,
transcriptomic and proteomic data. The different modalities correlate with each other
and are highly parallel in nature. These data contain variation among different groups
of patients with different clinical attributes. Here, we exploit the coherence of the
information inferred from the different modalities to identify potentially
mislabeled samples.
What
We developed a computational algorithm to model the relationship between clinical
attributes, protein profiles and mRNA profiles. The model is applied to detect
mislabeled samples and correct their labels. Accurate detection of mislabeling increases
assurance that patients are getting the right analysis and prevents the irreversible
consequences of giving patients the wrong treatment.
How
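The cross-modality consistency idea can be sketched as follows: predict a clinical attribute independently from each modality and flag a sample when both modalities agree with each other but disagree with the recorded label. The single-marker threshold rule, the thresholds, and the sample values below are hypothetical stand-ins for the learned model described above.

```python
def predict_attribute(value, threshold):
    """Toy predictor: attribute 'A' if the marker level reaches the threshold."""
    return "A" if value >= threshold else "B"


def flag_mislabeled(samples, rna_threshold, protein_threshold):
    """Return {sample_id: suggested_label} for likely mislabeled samples.

    A sample is flagged only when the mRNA-based and protein-based
    predictions agree with each other and contradict the recorded label.
    """
    flagged = {}
    for sample in samples:
        from_rna = predict_attribute(sample["rna"], rna_threshold)
        from_protein = predict_attribute(sample["protein"], protein_threshold)
        if from_rna == from_protein and from_rna != sample["label"]:
            flagged[sample["id"]] = from_rna
    return flagged


# Hypothetical marker levels for four samples.
samples = [
    {"id": "s1", "label": "A", "rna": 8.2, "protein": 5.1},
    {"id": "s2", "label": "B", "rna": 7.9, "protein": 4.8},  # both say A
    {"id": "s3", "label": "B", "rna": 1.1, "protein": 0.7},
    {"id": "s4", "label": "A", "rna": 6.5, "protein": 0.9},  # modalities disagree
]
print(flag_mislabeled(samples, rna_threshold=5.0, protein_threshold=3.0))
# {'s2': 'A'}
```

Sample s4 is left alone because the two modalities disagree with each other, which could instead indicate a swap within one modality.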
[Figure: clinical attributes compared with transcriptomic and proteomic profiles for healthy and patient samples.]
SoonJye Kho
soonjye@knoesis.org
www.soonjye.com
19. Context-Aware Harassment Detection on Social Media
As social media permeates our daily life, there has been a sharp rise in the
use of social media to humiliate, bully, and threaten others, leading to
consequences ranging from emotional distress and depression to suicide.
Identifying such instances of harassment is challenging due to the
nuanced nature of human communication.
WHY
Analyze social media data to understand and identify the phenomenon of
online harassment.
WHAT
Employ syntactic, semantic and contextual cues (freeing harassment detection
from keyword-based approaches) with machine learning and deep learning
techniques on Twitter data to introduce methods that help to better
identify instances of online harassment.
HOW
Thilini Wijesiriwardene
thilini@knoesis.org
20. Enhancing crowd wisdom using diversity
measures computed from social media
The predictive analytics market is expected to grow
from USD 4.56 Billion in 2017 to USD 12.41 Billion by
2022. Crowd selection is the most fundamental task
in the prediction market.
We propose diverse crowd selection strategies that can select
unbiased groups of individuals using process data,
without relying on performance or outcome data. The
proposed strategies assembled crowds that could
accurately predict geopolitical and sports event
outcomes.
WHAT
WHY
HOW
Shreyansh Bhatt
shreyansh@knoesis.org
We propose the use of social media data to infer
diversity and select a diverse (unbiased) group of
individuals. The proposed data-driven diversity
measure characterizes a user with word2vec and
selects a diverse crowd using clustering. An
enhanced diversity measure uses a domain-specific
knowledge graph for diverse crowd selection.
HOW
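The sketch below shows one way to pick a diverse group once each user is represented as a vector. It uses greedy farthest-point (max-min) selection on cosine distance as a simple stand-in for the clustering step mentioned above; the user vectors are toy values rather than real word2vec embeddings, and all vectors are assumed to be non-zero.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def select_diverse(users, k):
    """Greedy max-min selection of k users from {name: vector}.

    Starts from the alphabetically first user, then repeatedly adds the
    user whose nearest already-chosen member is farthest away.
    """
    names = sorted(users)
    chosen = [names[0]]
    while len(chosen) < min(k, len(names)):
        candidates = [n for n in names if n not in chosen]
        best = max(
            candidates,
            key=lambda n: min(cosine_distance(users[n], users[c]) for c in chosen),
        )
        chosen.append(best)
    return chosen


users = {
    "u1": [1.0, 0.0],
    "u2": [0.9, 0.1],   # nearly identical to u1
    "u3": [0.0, 1.0],
    "u4": [-1.0, 0.0],
}
print(select_diverse(users, 3))  # ['u1', 'u4', 'u3']
```

The near-duplicate user u2 is skipped, which is the behavior a diversity measure is meant to encourage.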
21. In 2017 alone, the USA suffered more than 16 weather-related
natural disasters, which resulted in 452 fatalities and
a record $306 billion in economic cost. This highlights the dire
need for better capabilities to manage (plan, prepare and
respond) when such disasters strike.
Multimodal Data Aggregation and Integration
for Disaster Coordination and Response
Our multi-dimensional cross-modal aggregation and
inference methods integrate imagery, sensory, and textual
data. We preserve low-level details to provide situational
awareness for individuals, first responders, and humanitarian
organizations, including the kind of help that is available or required for
individuals' needs and the flooded areas around them.
We integrate the output of our semantic, syntactic and
pragmatic information extraction techniques with other
hazard models such as flood mapping. To achieve that, we
cross-reference and integrate different knowledge sources (e.g.,
ontologies, gazetteers, and other resources) with streams of text,
images and sensor (drone, satellite) data in real time.
WHAT
WHY
HOW
Contact: Hussein Al-Olimat (hussein@knoesis.org) http://wiki.knoesis.org/index.php/Social_and_Physical_Sensing_Enabled_Decision_Support
22. More projects and specifics at http://knoesis.org (library, projects, …)
Slides: https://www.slideshare.net/knoesis
General: Kno.e.sis on FB: https://facebook.com/kno.e.sis
LinkedIn: https://www.linkedin.com/company/1054055/admin/
This subset of Kno.e.sis research is primarily
directed by these faculty members
Dr. Amit P. Sheth Dr. Krishnaprasad Thirunarayan Dr. Valerie Shalin
Editor's notes
As usual, I'm wondering what the technical challenge is. A lack of precise (formal)? And why don't we have that already? Or do we, but it is spread out across resources? If it is spread out, what are the integration challenges?
Societal Challenges (SC) and Technical Challenges (TC) for building a PHKG are investigated as follows:
SC1: What recommendations can be suggested by health applications to assist patients?
SC2: How can personalized health coach applications help clinicians?
SC3: Web sites such as airnow.gov and pollen.com visualize environmental factor datasets according to their level (e.g., low or high), which is mainly used by humans to understand the current environmental condition. How can machines automatically obtain and interpret this information? We need to provide the range (e.g., a value between X and Y is considered HIGH) so that the machine can understand the data, either with rule-based reasoning (used in this paper) or machine learning techniques (considered as future work).
SC4: How to diagnose asthma patients?
SC5: What environmental conditions (e.g., pollen level) impact a patient and trigger asthma symptoms (e.g., cough)?
TC1: How to build a PHKG? How are Google Health KG or IBM Watson KG built? Information provided in tutorials such as [6] does not give concrete steps to construct a KG.
TC2: How to deduce meaningful information from kHealth Asthma datasets (EHRs, IoT datasets)?
TC3: How to maintain data privacy and security of patient data? We are aware of those concerns, data is anonymized, but we do not detail this challenge in this paper.
What (Old)- Understanding the asthma triggers, their cause and individual treatment effect for pediatric asthma patients using continuous monitoring of their health signals.
How (Old)- 1) Causal inference analysis from multimodal data using Probabilistic Graphical Models for pediatric asthma patient. 2) Understanding personalized symptoms, asthma triggers and treatment, requires a causal inference analysis of multimodal data using Probabilistic Graphical Models for pediatric asthma patients.
Help clinician in detailed analysis.