Social Media Monitoring with ML Knowledge Graph

Social Media Monitoring with ML-powered Knowledge Graph
Vlasta Kůs, Golven Leroy

Overview
1. Social media & news articles ingestion
2. Machine Learning
   a. Natural Language Processing
   b. Image classification
   c. Entity Relations Extraction
   d. Graph analytics
3. Knowledge Graph

Social Media ingestion
What route should you take?
Social Media API or community-made library

Twitter ingestion
TWINT
-Does not use Twitter API (no limitations except .Profiles or .Favorites)
-No sign-in required
-Very fast and easy to use

Instagram ingestion
INSTALOADER
-Does not use Instagram API (no account limitation)
-Easy to use, but getting information takes time
-Number of queries limited to 200/hour...
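
A scraper has to stay under the hourly query cap mentioned above. The sketch below is one possible way to do that, assuming the cap behaves like a rolling one-hour window (the slide does not say how Instagram counts it). `HourlyThrottle` is a hypothetical helper, not part of Instaloader.

```python
import time
from collections import deque


class HourlyThrottle:
    """Sliding-window limiter: at most `max_calls` calls per `window` seconds."""

    def __init__(self, max_calls=200, window=3600.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.window = window
        self.clock = clock      # injectable so the limiter can be tested
        self.sleep = sleep
        self.calls = deque()    # timestamps of calls still inside the window

    def wait(self):
        """Block until one more call is allowed, then record it."""
        while True:
            now = self.clock()
            # Forget calls that have already left the window.
            while self.calls and now - self.calls[0] >= self.window:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Window is full: sleep until the oldest call expires.
            self.sleep(self.window - (now - self.calls[0]))
```

Calling `throttle.wait()` before every query keeps the scraper at or below the configured rate; the cap of 200 is taken from the slide and may need adjusting.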

News articles
-Scraped tweets from news network accounts, from the same time span
-Extracted article urls
-Scraped these articles
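
The URL extraction step can be sketched with a regular expression. This is a minimal illustration, not the pipeline used in the talk; the pattern and the `extract_urls` name are assumptions, and shortened links would still need to be resolved before scraping.

```python
import re

# Assumed pattern: anything starting with http(s):// up to whitespace or a quote.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(tweet_text):
    """Return the URLs found in a tweet, in order, without duplicates."""
    urls = []
    for match in URL_PATTERN.findall(tweet_text):
        url = match.rstrip(".,;:!?)")  # drop trailing sentence punctuation
        if url and url not in urls:
            urls.append(url)
    return urls
```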

Image analysis
“We do expect multimedia posts to become the predominant type of post on social media. Even the text that accompanies
those posts is getting shorter and shorter … It becomes increasingly important for companies to be able to understand what’s
going on in those images.”
– Jenny Sussin, VP of Research at Gartner

Image analysis: EfficientNet
-Less complex models:
  -faster training
  -faster classification
  -runnable even on CPUs

Natural Language Processing
● NLP = machine learning tools allowing computers to process - and perhaps understand - human
languages
● Basic steps: sentence segmentation, tokenisation, lemmatisation, part-of-speech tagging, universal
dependencies, ...
● More advanced: Sentiment Analysis, Named Entity Recognition, Entity Relations Extraction, Topic
Classification, Keyword extraction, Document Classification, Summarization, ...
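
To make the first two basic steps concrete, here is a deliberately naive sketch of sentence segmentation and tokenisation using regular expressions. Real NLP pipelines use trained models for this; the function names and rules below are illustrative assumptions only.

```python
import re


def split_sentences(text):
    """Naive segmentation: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Naive tokenisation: words (keeping inner ' and -) or single punctuation marks."""
    return re.findall(r"\w+(?:['-]\w+)*|[^\w\s]", sentence)
```

A rule-based splitter like this breaks on abbreviations such as "Dr." or "e.g.", which is one reason the later steps (lemmatisation, tagging, dependencies) are usually left to a full NLP library.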
GraphAware Hume

Natural Language Processing
// Create the NLP pipeline
CALL ga.nlp.processor.addPipeline({
  name: 'nodes19-en',
  language: 'en',
  textProcessor: "com.graphaware.nlp.processor.stanford.ee.processor.EnterpriseStanfordTextProcessor",
  processingSteps: {tokenize: true, ner: true, dependency: true, sentiment: true}
});

// Annotate Tweets (one tweet per batch, skipping tweets that are already annotated)
CALL apoc.periodic.iterate(
  "MATCH (n:Tweet) WHERE size(n.text) > 10 AND NOT (n)-->(:AnnotatedText) RETURN n",
  "CALL ga.nlp.annotate({text: n.text, id: id(n), pipeline: 'nodes19-en'})
   YIELD result MERGE (n)-[:HAS_ANNOTATED_TEXT]->(result)",
  {batchSize: 1, iterateList: false}
);
GraphAware NLP integration with Neo4j: https://github.com/graphaware/neo4j-nlp

Keywords extraction
TextRank:
NLP + PageRank -> keywords & key phrases
Completely unsupervised, no training or tuning required.
State-of-the-art results on a wide range of unstructured texts.
Rada Mihalcea, Paul Tarau. TextRank: Bringing Order into Texts. http://www.aclweb.org/anthology/W04-3252.
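
The idea of combining NLP with PageRank can be sketched in plain Python: build a co-occurrence graph over words and score the nodes with the PageRank recurrence. This is a simplified illustration, not the GraphAware implementation. It swaps the part-of-speech filter of the original paper for a small hard-coded stopword list, and all names and default values are assumptions.

```python
from collections import defaultdict

# Assumed stopword list standing in for part-of-speech filtering.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "for", "is", "are", "with"}


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=5):
    """Rank words by PageRank score on an undirected co-occurrence graph."""
    words = [t.lower() for t in tokens if t.isalpha() and t.lower() not in STOPWORDS]

    # Link words that appear within `window` positions of each other.
    neighbours = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + window]:
            if other != word:
                neighbours[word].add(other)
                neighbours[other].add(word)
    if not neighbours:
        return []

    # Iterate S(v) = (1 - d) + d * sum(S(u) / degree(u)) over the neighbours u of v.
    scores = {w: 1.0 for w in neighbours}
    for _ in range(iterations):
        scores = {
            w: (1 - damping)
            + damping * sum(scores[n] / len(neighbours[n]) for n in neighbours[w])
            for w in neighbours
        }

    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]
```

Key phrases would be formed afterwards by merging top-ranked words that are adjacent in the text; that step is left out here.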

Keywords extraction
MATCH (n:News)-->(a:AnnotatedText)
CALL ga.nlp.ml.textRank({annotatedText: a, useDependencies: true, topXTags: 0.15})
YIELD result RETURN result

Knowledge Enrichment
● External Knowledge Base
○ Wikidata, ConceptNet5, Microsoft Concept Graph, Thomson Reuters, ...
● Internal Knowledge Base
○ domain specific
● Automated knowledge extraction
○ build knowledge from your data

Entity Relations Extraction
"Rich eventually became a staff writer at LaFace Records where he wrote songs for recording artists
including Boyz II Men, Johnny Gill, TLC and Toni Braxton."
(Rich) -[:EMPLOYEE_OF]-> (LaFace Records) -[:LOCATED_AT]-> ()
=> building knowledge
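
Once a relation has been extracted, it can be written to the graph as a (subject, relation, object) triple. The sketch below turns one triple into a parameterised Cypher `MERGE` query. The `Entity` label, the `name` property, and the `triple_to_cypher` helper are assumptions made for illustration, not the schema used in the talk.

```python
import re


def triple_to_cypher(subject, relation, obj):
    """Return (query, params) that MERGEs one extracted triple into the graph."""
    # Relationship types cannot be passed as Cypher parameters,
    # so validate the type before placing it in the query text.
    if not re.fullmatch(r"[A-Z][A-Z0-9_]*", relation):
        raise ValueError(f"unexpected relation type: {relation!r}")
    query = (
        "MERGE (s:Entity {name: $subject}) "
        "MERGE (o:Entity {name: $object}) "
        f"MERGE (s)-[:{relation}]->(o)"
    )
    return query, {"subject": subject, "object": obj}


# Triple from the example sentence above.
query, params = triple_to_cypher("Rich", "EMPLOYEE_OF", "LaFace Records")
```

Using `MERGE` rather than `CREATE` means that the same entity mentioned in several documents ends up as a single node, which is what lets knowledge accumulate across sources.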

Entity Relations Extraction: GCNs
Graph Convolutional Networks (GCN)
● dependency trees transformed into adjacency matrices and used for learning to attend to relevant graph
sub-structures
● densely connected layers for generating new representations
● reported to outperform LSTM-based models on relation extraction
● https://arxiv.org/abs/1906.07510
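
To show how a dependency tree becomes an adjacency matrix that a graph convolution can consume, here is a minimal single GCN layer in plain Python. It is a plain layer with a row-normalised adjacency matrix, not the attention-guided, densely connected model of the linked paper. The token indices, edges, and weights are hand-written assumptions rather than real parser or training output.

```python
def adjacency_from_tree(num_tokens, edges):
    """Symmetric adjacency matrix with self-loops from (head, dependent) index pairs."""
    adj = [[0.0] * num_tokens for _ in range(num_tokens)]
    for i in range(num_tokens):
        adj[i][i] = 1.0  # self-loop keeps each token's own features
    for head, dep in edges:
        adj[head][dep] = 1.0
        adj[dep][head] = 1.0
    return adj


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, features, weights):
    """One layer: ReLU(D^-1 A H W), i.e. average neighbour features, then project."""
    normalised = [[v / sum(row) for v in row] for row in adj]
    hidden = matmul(matmul(normalised, features), weights)
    return [[max(0.0, v) for v in row] for row in hidden]


# Hypothetical parse of "Rich wrote songs": "wrote" (1) heads "Rich" (0) and "songs" (2).
adj = adjacency_from_tree(3, [(1, 0), (1, 2)])
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
output = gcn_layer(adj, identity, identity)
```

With one-hot features and identity weights, each output row is simply the normalised adjacency row, so it is easy to see that every token now mixes in information from its syntactic neighbours.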

Knowledge Graphs
● Connected knowledge of various kinds and different sources
● Can be built automatically using state-of-the-art ML
● Ability to distill knowledge from information silos
● Good basis for an intelligence platform
○ How are our brand and products perceived by the public?
○ What is the impact/outreach of a news article about our company?
○ How to extract knowledge spread around multiple sources?
○ Which companies are investing the most into space research?
○ Who are the influencers in the climate change debate?
○ What are the current citizen concerns?
○ ...

Hunger Games Questions for
“Social media monitoring with ML-powered Knowledge Graph”
1. Easy: What is harder to scrape?
   a. Twitter
   b. Instagram
2. Medium: What was the library used for Twitter scraping?
   a. Tweak
   b. TwitterLoader
   c. Twint
3. Hard: Which ML model was used for Entity Relations Extraction?
   a. LSTM
   b. GCN
   c. GAN
Answer here: r.neo4j.com/hunger-games
