This document discusses text mining of patient discussion forums to extract health knowledge and experiences. It describes how forums provide information on patient journeys, side effects, coping strategies, and quality of life that can complement medical data. As an example, a gastrointestinal stromal tumour (GIST) patient forum was analyzed to extract mentions of adverse drug events and the drugs they relate to. Automated entity extraction reached an F1 of 0.72 (precision 70%, recall 74%) against human annotations, compared with a human pairwise F1 of 0.80. The extracted data provided additional insights into reported side effects compared to clinical trial and survey data. The document also discusses ongoing work to extract patients' coping strategies for side effects.
Text mining for health knowledge discovery
1. Text mining for health knowledge discovery
Suzan Verberne | Emerging technologies in medicine (ETIM) 2022 | June 2022
Credits to Anne Dirkson
Collaboration with Hans Gelderblom, Gerard van Oortmerssen and Wessel Kraaij
5. GIST forum data
GIST Support International on Facebook
The forum data was donated to us by the organization
Data from 24 Oct 2009 until 1 Nov 2020
124,103 posts in 14,631 threads
Median post length: 20 words
Suzan Verberne 2022
6. Post-market drug monitoring
Monitoring of side effects suffers from under-reporting
Real-world evidence from patients is necessary
It can be found in patient forum discussions
7. Medical entity extraction
Entity extraction is a machine learning task based on sequence labelling
Word order matters
One entity can span multiple words
There are multiple ways to refer to the same concept
The extracted entities need to be linked to a standard form (in an ontology)
Example: “I have a throbbing pain in my head” and “my head is bursting” both refer to the concept: headache
Arjun Magge, Elena Tutubalina, Zulfat Miftahutdinov, Ilseyar Alimova, Anne Dirkson, Suzan Verberne, Davy Weissenbacher, Graciela Gonzalez-Hernandez, DeepADEMiner: a deep learning pharmacovigilance pipeline for extraction and normalization of adverse drug event mentions on Twitter, Journal of the American Medical Informatics Association (JAMIA), Volume 28, Issue 10, October 2021, pp 2184–2192
8. Entity extraction
Label spans (word sequences) in the text that refer to entities
Store the data with a label for each word
Label ‘B’: this word is the beginning of an entity
Label ‘I’: this word is inside an entity
Label ‘O’: this word is not in an entity
Example: “Pickle juice reduces my muscle cramps”
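A minimal sketch of the B/I/O scheme above, assuming whitespace tokenization and assuming that only "muscle cramps" is marked as an entity in the example sentence (the slide does not show which spans were labelled). The function names are illustrative and not taken from the authors' code.

```python
def spans_to_bio(tokens, spans):
    """Turn entity spans (start, end), end exclusive, into one label per token."""
    labels = ["O"] * len(tokens)
    for start, end in spans:
        labels[start] = "B"
        for i in range(start + 1, end):
            labels[i] = "I"
    return labels


def bio_to_spans(labels):
    """Recover (start, end) spans from a B/I/O label sequence."""
    spans, start = [], None
    for i, label in enumerate(labels):
        if label != "I" and start is not None:
            spans.append((start, i))  # a B or an O closes the open entity
            start = None
        if label == "B":
            start = i
    if start is not None:
        spans.append((start, len(labels)))
    return spans


tokens = "Pickle juice reduces my muscle cramps".split()
labels = spans_to_bio(tokens, [(4, 6)])  # assumed span: "muscle cramps"
for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```

Storing one label per word keeps the data aligned with the text, and the spans can be recovered from the labels whenever they are needed.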
9. Entity extraction
Train a classifier on predicting the labels while taking the context into account (‘sequence labelling’)
Hidden Markov Models, Conditional Random Fields
Recurrent Neural Networks, Bi-LSTMs
Bidirectional Encoder Representations from Transformers (BERT)
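To illustrate why sequence labelling takes context into account, the sketch below runs Viterbi decoding (the decoding step used by HMMs and CRFs) over B/I/O labels. All scores are hand-picked, hypothetical numbers rather than output of a trained model; the only structural assumption is that an 'I' label may not follow an 'O' or start a sentence.

```python
NEG_INF = float("-inf")
LABELS = ["B", "I", "O"]


def viterbi(emissions, transitions, start):
    """Return the highest-scoring label sequence under additive scores."""
    best = [{lab: start[lab] + emissions[0][lab] for lab in LABELS}]
    back = []
    for em in emissions[1:]:
        scores, pointers = {}, {}
        for cur in LABELS:
            prev = max(LABELS, key=lambda p: best[-1][p] + transitions[(p, cur)])
            scores[cur] = best[-1][prev] + transitions[(prev, cur)] + em[cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    path = [max(LABELS, key=lambda lab: best[-1][lab])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical per-token scores for "my muscle cramps".
emissions = [
    {"B": -2.0, "I": -2.0, "O": 0.0},   # my
    {"B": -1.0, "I": -0.5, "O": -2.0},  # muscle: locally prefers I
    {"B": -1.5, "I": -0.2, "O": -2.0},  # cramps
]
transitions = {(p, c): 0.0 for p in LABELS for c in LABELS}
transitions[("O", "I")] = NEG_INF        # I cannot follow O
start = {"B": 0.0, "I": NEG_INF, "O": 0.0}

greedy = [max(LABELS, key=em.get) for em in emissions]
print(greedy)                                   # ['O', 'I', 'I'] (invalid)
print(viterbi(emissions, transitions, start))   # ['O', 'B', 'I']
```

Labelling each word in isolation produces an entity that starts with 'I'; scoring the whole sequence repairs this, which is the point of treating extraction as sequence labelling.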
10. “Since I started on Gleevec, I can’t fall asleep at all.”
Extracted entities: Imatinib (drug), Insomnia (ADE)
BERT models trained on 4,195 messages (527 discussions), manually annotated
BIOSYN for linking to SNOMED
Applied to 125,161 messages
Anne Dirkson, Suzan Verberne, Wessel Kraaij, Gerard van Oortmerssen, Hans Gelderblom (2022). Automated gathering of real-world evidence from online patient forums can complement pharmacovigilance for rare cancers. To appear in Nature Scientific Reports.
11. Evaluation
Entity extraction
Recall 74%
Precision 70%
F1 0.72
Human pairwise F1 0.80
Entity-ontology linking (SNOMED)
Accuracy@1 65%
Accuracy@5 79%
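As a quick check on how these metrics relate, the sketch below recomputes F1 from the reported precision and recall and shows how Accuracy@k is counted for ranked ontology candidates. The candidate lists and gold labels are hypothetical toy data, not taken from the study.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def accuracy_at_k(ranked_predictions, gold_labels, k):
    """Fraction of mentions whose gold concept is among the top-k candidates."""
    hits = sum(gold in ranked[:k]
               for ranked, gold in zip(ranked_predictions, gold_labels))
    return hits / len(gold_labels)


print(round(f1_score(0.70, 0.74), 2))  # 0.72, matching the slide

# Hypothetical ranked candidates for three mentions.
ranked = [
    ["abdominal pain", "painful", "pain"],
    ["insomnia", "fatigue", "drowsy"],
    ["nausea", "vomiting", "indigestion"],
]
gold = ["painful", "insomnia", "cramp"]
print(accuracy_at_k(ranked, gold, 1))  # 1 of 3 correct at rank 1
print(accuracy_at_k(ranked, gold, 3))  # 2 of 3 correct within the top 3
```

Accuracy@5 can only be equal to or higher than Accuracy@1, since every rank-1 hit is also a top-5 hit.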
12. Model applied to the whole GIST forum
Treatment type              Drug           # of ADEs found
First-line                  Imatinib       13,376
Second-line                 Sunitinib      2,335
Third-line                  Regorafenib    319
Fourth-line                 Ripretinib     319
PDGFRA exon 18 mutations    Avapritinib    297
Off-label                   Nilotinib      59
Off-label                   Pazopanib      51
Off-label                   Sorafenib      47
Off-label                   Ponatinib      17
Unknown                                    2,948
Total                                      21,051
13. Comparison with survey data for imatinib
Rank in survey                               Rank in forum
1. Fatigue                                   1. Fatigue
2. Muscle aches, pains or cramps             2. Nausea
3. Swelling of face or around the eyes       3. Cramp
4. Problems remembering things               4. Disorder of skin
   Aches or pains in joints                  5. Oedema
6. Skin problems                             6. Pain
   Diarrhoea                                 7. Alopecia
8. Feeling weak                              8. Altered bowel function
9. Indigestion or heartburn                  9. Pain in limb
10. Swelling in any part of body (Oedema)    10. Facial swelling
Dide den Hollander, Anne R. Dirkson, Suzan Verberne, Wessel Kraaij,…, Olga Husson (2022). Symptoms reported by gastrointestinal
stromal tumour (GIST) patients on imatinib treatment: combining questionnaire and forum data. In Supportive Care in Cancer.
15. Extracting coping strategies
New task. Challenges:
1. No data
2. No ontology
3. Complex and varied entities
4. Cross-document relations to ADE
Anne Dirkson, Suzan Verberne, Gerard van Oortmerssen, Hans Gelderblom, Wessel Kraaij. How do others cope? Extracting coping strategies for adverse drug events from social media. In revision for Journal of Biomedical Informatics (JBI).
16. Ontology linking
Build a basic ontology
Manual annotation (link text spans to ontology)
Update ontology based on gaps found
Discuss discrepancies and come to agreement
19. Conclusions
Patient forums are valuable resources for experiential knowledge
Extraction of side effects from patient forums leads to new information compared to clinical trials and survey data
Extraction of coping strategies is challenging but can also lead to new insights and medical follow-up research
https://dashboard-gist-adr.herokuapp.com
https://github.com/AnneDirkson
https://twitter.com/suzan
Editor's notes
Motivational example: pure chocolate
We use heuristic rules to determine which drug is linked to each ADE. Whenever possible, we select the drug mentioned prior to the ADE in the message. If there is none, we select the drug mentioned after the ADE in the message. If there are no drugs mentioned in the message, we select the first drug mentioned in the conversational thread prior to the message. These rules were determined based on further manual annotation of our annotated subset by the first author. In some cases, it was not clear which drug the patient believed was causing the ADE and these cases were excluded. In this data set, our rules were 93% accurate if we restricted the possible drugs for linking to a predetermined list.
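The linking rules described above might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: messages are assumed to be pre-tokenized, drug names are assumed to be already normalized to generic names (so "Gleevec" would first be mapped to "imatinib"), "prior" and "after" are read as the nearest mention on that side, and the `DRUGS` list and function names are hypothetical.

```python
DRUGS = {"imatinib", "sunitinib", "regorafenib"}  # hypothetical predetermined list


def drug_mentions(tokens):
    """Return (position, drug) pairs for every listed drug in a message."""
    return [(i, t.lower()) for i, t in enumerate(tokens) if t.lower() in DRUGS]


def link_drug(message, ade_index, earlier_messages):
    """Pick the drug to link to the ADE located at token position ade_index."""
    mentions = drug_mentions(message)
    before = [drug for i, drug in mentions if i < ade_index]
    if before:
        return before[-1]          # rule 1: drug mentioned prior to the ADE
    after = [drug for i, drug in mentions if i > ade_index]
    if after:
        return after[0]            # rule 2: drug mentioned after the ADE
    for earlier in earlier_messages:
        found = drug_mentions(earlier)
        if found:
            return found[0][1]     # rule 3: first drug earlier in the thread
    return None                    # no drug available for linking


thread = ["anyone else here".split(), "i take regorafenib and imatinib".split()]
print(link_drug("since i started on imatinib i cannot sleep".split(), 6, []))
print(link_drug("terrible nausea since switching to sunitinib".split(), 1, []))
print(link_drug("the fatigue is awful".split(), 1, thread))
```

The three calls exercise the three rules in order and return "imatinib", "sunitinib" and "regorafenib" respectively.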
BIOSYN: Our data does not contain annotations for normalization (i.e., the relevant concept IDs for each ADE mention). We rely on three publicly available data sets for training our normalization model: CADEC [Karimi et al., 2015], PsyTAR [Zolnoori et al., 2019] and the Clinical Findings subset of the COMETA corpus [Basaldella et al., 2020]. (acc@1: 0.645; acc@5: 0.791)
65% (Accuracy@1): accurate labelling of ADEs when tested on a different data set than those on which the model was trained.
79% (Accuracy@5): for a further 14%, the correct label is in the top 5 but not at rank 1. Manual inspection: for 53 of 100 randomly selected cases, we would consider the first label to be correct or even better than the gold label. E.g., the mention “severe abdominal pain” has the gold label “painful” (i.e., the label given by humans, which was in the top 5), while the top-1 label is “abdominal pain”.
Of 100 random cases in which the correct label was not included in the top 5, we would consider the top-1 predicted label correct in 36.
Thus, the performance in the table is an underestimate.
Our model can find novel entities: 40.2% of the true positives were not present in the training data.
Forum data contain new side effects
Can keep questionnaires up to date
Not only side effects can be gathered from forums. We want to go one step further and also look at the advice that is given for dealing with these side effects. This is unique to forums, and not routinely gathered. This would be a new task.
We chose SNOMED-CT, NCIT and RxNorm as our source ontologies, in line with the OHDSI project.
The inter-annotator agreement was substantial (mean kappa = 0.706) on the token level.
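A minimal sketch of token-level agreement, assuming Cohen's kappa between two annotators (the notes do not say which kappa variant was averaged). The two label sequences are hypothetical and only show the mechanics.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of token labels."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability that both annotators pick the same label.
    expected = sum(count_a[lab] * count_b[lab]
                   for lab in set(count_a) | set(count_b)) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical annotations of "Pickle juice reduces my muscle cramps".
annotator_1 = ["O", "O", "O", "O", "B", "I"]
annotator_2 = ["O", "O", "O", "B", "I", "I"]
print(round(cohen_kappa(annotator_1, annotator_2), 3))  # 0.429
```

Because most tokens are 'O', raw agreement is inflated by chance; kappa corrects for that, which is why a disagreement on a single span boundary lowers it noticeably here.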
Strawberry – raspberry – milk example
Manual examination of underlying messages reveals that eating and drinking different forms of ginger is recommended, as well as drinking herbal tea (both ginger and peppermint). Patients also recommend taking the anti-nausea medication ondansetron and splitting the dosage ('split dosage'). The other categories which relate to how you consume medication (e.g., 'half to one hour before food') do relate to this broader topic, but the specific labels are incorrect. Among other things, patients recommend avoiding taking medication on an empty stomach and taking it after dinner or just before bed.