Ever wondered how ML can be used to build a Knowledge Graph that lets businesses successfully differentiate and compete today? We will show how Computer Vision, NLP/U, knowledge enrichment and graph-native algorithms fit together to build powerful insights from various unstructured data sources.
About the speakers:
Vlasta Kus - Lead Data Scientist at GraphAware - Machine Learning, Deep Learning and Natural Language Processing expert.
Background in particle physics research at CERN. 10+ years of experience in software development (C/C++, Java, Python) and statistical data analysis.
Neo4j certified professional.
Specialised in using Machine Learning for building Knowledge Graphs (Hume @ GraphAware).
Golven Leroy - Student - I am an engineering student who is interested in everything graph. I love travelling and good food, especially when it is cheese-related and accompanied by good wine. Wannabe Gyro Gearloose, early-age Spider-Man fan, and beatmaker in my free time.
NODES 2019 - Neo4j Online Developer Expo & Summit - 10th October 2019
2. Overview
1. Social media & news articles ingestion
2. Machine Learning
a. Natural Language Processing
b. Image classification
c. Entity Relations Extraction
d. Graph analytics
3. Knowledge Graph
6. Instagram ingestion
INSTALOADER
-Does not use Instagram API (no account limitation)
-Easy to use, but getting information takes time
-Number of queries limited to 200/hour...
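The hourly query limit mentioned above suggests throttling requests on the client side. The following is a minimal sketch, assuming a sliding one-hour window; the class name `HourlyRateLimiter` and its injectable `clock` and `sleep` parameters are illustrative assumptions and are not part of Instaloader.

```python
import time
from collections import deque


class HourlyRateLimiter:
    """Sliding-window limiter: at most `max_calls` calls per `window` seconds.

    Hypothetical helper for illustration; not an Instaloader API.
    """

    def __init__(self, max_calls=200, window=3600.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.window = window
        self._clock = clock    # injectable so the limiter can be tested without waiting
        self._sleep = sleep
        self._calls = deque()  # timestamps of calls still inside the window

    def wait_time(self):
        """Return the seconds to wait before the next call (0.0 if allowed now)."""
        now = self._clock()
        # Forget calls that have already left the window.
        while self._calls and now - self._calls[0] >= self.window:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            return 0.0
        return self.window - (now - self._calls[0])

    def acquire(self):
        """Block until a call is allowed, then record it."""
        delay = self.wait_time()
        if delay > 0:
            self._sleep(delay)
            self.wait_time()  # drop the calls that expired while sleeping
        self._calls.append(self._clock())
```

Calling `acquire()` before each scraping request would keep the client under the limit without relying on the remote side to reject requests.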
10. Image analysis
“We do expect multimedia posts to become the predominant type of post on social media. Even the text that accompanies
those posts is getting shorter and shorter … It becomes increasingly important for companies to be able to understand what’s
going on in those images.”
– Jenny Sussin, VP of Research at Gartner
13. Natural Language Processing
● NLP = machine learning tools allowing computers to process - and perhaps understand - human
languages
● Basic steps: sentence segmentation, tokenisation, lemmatisation, part-of-speech tagging, universal
dependencies, ...
● More advanced: Sentiment Analysis, Named Entity Recognition, Entity Relations Extraction, Topic
Classification, Keyword extraction, Document Classification, Summarization, ...
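To make the first two basic steps concrete, here is a deliberately naive sketch of sentence segmentation and tokenisation using regular expressions. It is an illustration only: the function names are made up for this example, and production pipelines rely on trained models rather than rules like these.

```python
import re


def split_sentences(text):
    """Naive sentence segmentation: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Naive tokenisation: words (allowing internal apostrophes or hyphens)
    and single punctuation marks become separate tokens."""
    return re.findall(r"\w+(?:['-]\w+)*|[^\w\s]", sentence)


for sentence in split_sentences("Graphs are fun. Are they? Yes!"):
    print(tokenize(sentence))
```

Rules this simple break on abbreviations such as "e.g." or "Dr.", which is one reason the later steps (lemmatisation, part-of-speech tagging, dependency parsing) are normally delegated to an NLP library.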
GraphAware Hume
14. Natural Language Processing
// Register an English pipeline: tokenisation, NER, dependency parsing, sentiment
CALL ga.nlp.processor.addPipeline({
  name: 'nodes19-en',
  language: 'en',
  textProcessor: "com.graphaware.nlp.processor.stanford.ee.processor.EnterpriseStanfordTextProcessor",
  processingSteps: {tokenize: true, ner: true, dependency: true, sentiment: true}
});

// Annotate Tweets longer than 10 characters that have no AnnotatedText yet
CALL apoc.periodic.iterate(
  "MATCH (n:Tweet) WHERE size(n.text) > 10 AND NOT (n)-->(:AnnotatedText) RETURN n",
  "CALL ga.nlp.annotate({text: n.text, id: id(n), pipeline: 'nodes19-en'})
   YIELD result MERGE (n)-[:HAS_ANNOTATED_TEXT]->(result)",
  {batchSize: 1, iterateList: false}
);
GraphAware NLP integration with Neo4j: https://github.com/graphaware/neo4j-nlp
15. Keywords extraction
TextRank:
NLP + PageRank -> keywords & key phrases
Completely unsupervised, no training or tuning required.
State-of-the-art results on a wide range of unstructured texts.
Rada Mihalcea, Paul Tarau. TextRank: Bringing Order into Texts. http://www.aclweb.org/anthology/W04-3252.
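The "NLP + PageRank" idea can be sketched in a few lines. This is a simplified illustration, not the implementation used in the talk: it assumes a tiny stopword list in place of the part-of-speech filter described in the TextRank paper, builds an unweighted co-occurrence graph over the remaining words, and ranks single words only (no key-phrase merging). The function name and parameters are hypothetical.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (assumption; the paper filters by part of speech).
STOPWORDS = {"a", "an", "the", "of", "and", "or", "to", "in", "on",
             "for", "is", "are", "with", "by", "from"}


def textrank_keywords(text, window=2, damping=0.85, iterations=50, top_k=5):
    """Rank words by PageRank on an undirected word co-occurrence graph."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

    # Link every word to the words that follow it within `window` positions.
    neighbours = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1:i + 1 + window]:
            if other != word:
                neighbours[word].add(other)
                neighbours[other].add(word)
    if not neighbours:
        return []

    # PageRank by power iteration: each node shares its score equally among neighbours.
    score = {w: 1.0 for w in neighbours}
    for _ in range(iterations):
        score = {
            w: (1 - damping)
            + damping * sum(score[n] / len(neighbours[n]) for n in neighbours[w])
            for w in neighbours
        }

    # Highest score first; ties broken alphabetically for stable output.
    return sorted(score, key=lambda w: (-score[w], w))[:top_k]


print(textrank_keywords("graph database graph algorithms graph analytics", window=1))
```

No labelled data is involved at any point, which is what makes the approach unsupervised: the ranking comes entirely from the structure of the co-occurrence graph.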
18. Knowledge Enrichment
● External Knowledge Base
○ Wikidata, ConceptNet5, Microsoft Concept Graph, Thomson Reuters, ...
● Internal Knowledge Base
○ domain specific
● Automated knowledge extraction
○ build knowledge from your data
19. Entity Relations Extraction
"Rich eventually became a staff writer at LaFace Records where he wrote songs for recording artists
including Boyz II Men Johnny Gill TLC and Toni Braxton."
(Rich) -[:EMPLOYEE_OF]-> (LaFace Records) -[:LOCATED_AT]-> ()
=> building knowledge
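Once a relation has been extracted, it can be written to the graph as a (subject, relation, object) triple. The sketch below shows one possible way to turn such a triple into a parameterised Cypher `MERGE` statement. The `Entity` label, the `name` property and the function name are assumptions for illustration, not the schema used in the talk. Cypher does not allow relationship types to be passed as parameters, so the type is validated and then interpolated.

```python
import re


def triple_to_cypher(subject, relation, obj):
    """Build a parameterised Cypher statement for one extracted triple.

    Returns (query, parameters). The :Entity label and `name` property
    are illustrative assumptions.
    """
    # Relationship types cannot be parameters, so accept only safe identifiers.
    if not re.fullmatch(r"[A-Z][A-Z0-9_]*", relation):
        raise ValueError(f"unsafe relationship type: {relation!r}")
    query = (
        "MERGE (s:Entity {name: $subject}) "
        "MERGE (o:Entity {name: $object}) "
        f"MERGE (s)-[:{relation}]->(o)"
    )
    return query, {"subject": subject, "object": obj}


query, params = triple_to_cypher("Rich", "EMPLOYEE_OF", "LaFace Records")
print(query)
print(params)
```

Using `MERGE` rather than `CREATE` means the same entity extracted from several documents ends up as a single node, which is how separate extractions accumulate into connected knowledge.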
20. Entity Relations Extraction: GCNs
Graph Convolutional Networks (GCN)
● dependency trees transformed into adjacency matrices and used for learning to attend to relevant graph
sub-structures
● densely connected layers for generating new representations
● outperform LSTMs
● https://arxiv.org/abs/1906.07510
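The first bullet can be illustrated with a small sketch: a dependency tree, given as a list of head indices, is converted into an adjacency matrix, and one plain graph-convolution layer averages each token's features with those of its syntactic neighbours. This is a simplified, dependency-free illustration under stated assumptions (undirected edges, self-loops, row normalisation); it does not include the attention mechanism or the densely connected layers of the cited model, and the function names are hypothetical.

```python
def adjacency_from_heads(heads):
    """Build an adjacency matrix from dependency heads.

    heads[i] is the index of token i's syntactic head, or -1 for the root.
    Edges are treated as undirected and every token gets a self-loop.
    """
    n = len(heads)
    adj = [[0.0] * n for _ in range(n)]
    for i, head in enumerate(heads):
        adj[i][i] = 1.0
        if head >= 0:
            adj[i][head] = adj[head][i] = 1.0
    return adj


def gcn_layer(adj, features, weights):
    """One layer: H' = ReLU(D^-1 A H W), where D holds the row sums of A."""
    n, d_in, d_out = len(adj), len(weights), len(weights[0])
    out = []
    for i in range(n):
        degree = sum(adj[i])
        # Mean of the feature vectors of token i and its neighbours.
        agg = [sum(adj[i][j] * features[j][k] for j in range(n)) / degree
               for k in range(d_in)]
        # Linear transform followed by ReLU.
        out.append([max(0.0, sum(agg[k] * weights[k][o] for k in range(d_in)))
                    for o in range(d_out)])
    return out


# "Rich wrote songs": "wrote" is the root, the other two tokens depend on it.
adj = adjacency_from_heads([1, -1, 1])
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(gcn_layer(adj, identity, identity))
```

With one-hot input features and identity weights, each output row simply shows how much of each neighbouring token is mixed into a token's new representation. Stacking several such layers lets information travel along longer dependency paths.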
22. Knowledge Graphs
● Connected knowledge of various kinds and different sources
● Can be built automatically using state-of-the-art ML
● Ability to distill knowledge from information silos
● Good basis for an intelligence platform
○ How are our brand and products perceived by the public?
○ What is the impact/outreach of a news article about our company?
○ How to extract knowledge spread around multiple sources?
○ Which companies are investing the most into space research?
○ Who are the influencers in the climate change debate?
○ What are the current citizen concerns?
○ ...
23. Hunger Games Questions for
“Social media monitoring with ML-powered Knowledge Graph”
1. Easy: What is harder to scrape?
a. Twitter
b. Instagram
2. Medium: What was the library used for Twitter scraping?
a. Tweak
b. TwitterLoader
c. Twint
3. Hard: Which ML model was used for Entity Relations Extraction?
a. LSTM
b. GCN
c. GAN
Answer here: r.neo4j.com/hunger-games