The document discusses using graphs and Neo4j for natural language processing tasks. It describes representing text as a graph by connecting adjacent words, and using this representation to find word associations and do opinion mining. Graph-based summarization and content recommendation are also covered. The resources provided give examples of opinion summarization using shortest path algorithms on the graph representation of reviews.
3. Agenda
• Brief intro to graph databases / Neo4j
• Representing text as a graph
• NLP tasks
• Mining word associations
• Graph-based summarization and keyword extraction
• Content recommendation
4. Agenda
• NLP tasks: a survey of NLP methods with graphs
10. Relational Versus Graph Models
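[Figure: the same friendship data shown in two models. Relational model: Person and Friend tables linked through a Person-Friend join table. Graph model: ANDREAS, TOBIAS, MICA and DELIA nodes connected by KNOWS relationships.]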
11. Property Graph Model Components
Nodes
• The objects in the graph
• Can have name-value properties
• Can be labeled
Relationships
• Relate nodes by type and direction
• Can have name-value properties
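[Figure: example property graph. Two PERSON nodes (name: "Dan", born: May 29, 1970, twitter: "@dan"; and name: "Ann", born: Dec 5, 1975) and a CAR node (brand: "Volvo", model: "V70"), connected by two LOVES relationships and by LIVES WITH, DRIVES and OWNS relationships. One relationship carries the property since: Jan 10, 2011.]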
12. Cypher: Graph Query Language
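CREATE (:Person {name: "Dan"})-[:LOVES]->(:Person {name: "Ann"})
[Figure: the statement annotated. Each (:Person {name: ...}) pattern is a NODE with a LABEL (Person) and a PROPERTY (name); the Dan and Ann nodes are joined by a LOVES relationship.]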
13. “So what does this have to do with NLP?”
“Am I in the wrong talk?”
“I thought this was going to be about text processing….”
34. Word Associations
• Paradigmatic
• words that can be substituted
• “Monday” <—> “Thursday”
• “cat” <—> “dog”
• Syntagmatic
• words that can be combined with each other
• “cold”, “weather”
• collocations
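A minimal sketch of how syntagmatic candidates can fall out of a text graph, assuming the graph simply links each word to the word that follows it and counts how often each link occurs. The helper names (`tokenize`, `adjacency_counts`) and the sample sentences are made up for illustration; they are not from the talk.

```python
from collections import Counter

PUNCTUATION = ".,!?;:\"'()"

def tokenize(text):
    # Lowercase, split on whitespace, and strip surrounding punctuation.
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]

def adjacency_counts(sentences):
    """Count how often each ordered pair of words occurs side by side.

    Each (word, next_word) key plays the role of a directed edge in the
    word graph; the count is the edge weight.
    """
    edges = Counter()
    for sentence in sentences:
        words = tokenize(sentence)
        edges.update(zip(words, words[1:]))
    return edges

# Hypothetical sample text.
sentences = [
    "Cold weather is coming on Monday.",
    "The cold weather stayed until Thursday.",
    "My cat likes cold weather.",
]

edges = adjacency_counts(sentences)
top_pair, top_count = edges.most_common(1)[0]
print(top_pair, top_count)
```

Frequently repeated edges such as ("cold", "weather") are candidate collocations. Raw counts favor common words, so a real pipeline would likely normalize them in some way.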
35. Computing Paradigmatic Similarity
1. Represent each word by its context
2. Compute context similarity
3. Words with high context similarity likely have a paradigmatic relation
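The three steps above can be sketched in a few lines of Python. This is an illustration under assumptions, not the talk's implementation: a word's context is taken to be the set of its immediate left and right neighbors, and Jaccard overlap is used as the similarity measure because the slides do not name one. The sample sentences are invented.

```python
from collections import defaultdict

def contexts(sentences):
    """Step 1: represent each word by the neighbors seen next to it."""
    ctx = defaultdict(set)
    for sentence in sentences:
        words = sentence.lower().split()
        for left, right in zip(words, words[1:]):
            ctx[left].add(("right", right))   # word that follows `left`
            ctx[right].add(("left", left))    # word that precedes `right`
    return ctx

def jaccard(a, b):
    """Step 2: context similarity as set overlap (an assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical sample text.
sentences = [
    "my cat eats fish",
    "my dog eats meat",
    "my cat sleeps",
    "my dog sleeps",
]

ctx = contexts(sentences)

# Step 3: words whose contexts overlap heavily are paradigmatic candidates.
sim_cat_dog = jaccard(ctx["cat"], ctx["dog"])
sim_cat_fish = jaccard(ctx["cat"], ctx["fish"])
print(sim_cat_dog, sim_cat_fish)
```

Here "cat" and "dog" appear in identical contexts and can be substituted for each other, while "cat" and "fish" share no context at all.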
36. Computing Paradigmatic Similarity
All three steps can be done with Cypher!
43. Paradigmatic Similarity
3. Find words with high context similarity
CEEAUS corpus: http://earthlab.uoi.gr/theste/index.php/theste/article/viewFile/55/37
54. Opinion Mining - Example
1. Graph-based representation of review corpus
2. Find and score candidate summaries
3. Select top scoring candidates as summary
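A rough sketch of the three steps, deliberately much simpler than any published method. It assumes reviews are whitespace-tokenized, that candidates are loop-free paths from a start marker to an end marker in the word graph, and that a candidate's score is its average edge count, so heavily repeated phrasing wins. The scoring rule, the `max_words` cap, and the sample reviews are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

START, END = "<s>", "</s>"

def build_graph(reviews):
    """Step 1: word graph where edge weight = times two words were adjacent."""
    edges = Counter()
    for review in reviews:
        words = [START] + review.lower().split() + [END]
        edges.update(zip(words, words[1:]))
    return edges

def candidates(edges, max_words=8):
    """Step 2: enumerate START-to-END paths and score each one."""
    successors = defaultdict(list)
    for a, b in edges:
        successors[a].append(b)
    results = []

    def walk(path):
        last = path[-1]
        if last == END:
            hops = list(zip(path, path[1:]))
            score = sum(edges[h] for h in hops) / len(hops)
            results.append((score, " ".join(path[1:-1])))
            return
        if len(path) > max_words + 1:      # path also holds START
            return
        for nxt in successors[last]:
            if nxt not in path:            # keep paths loop-free
                walk(path + [nxt])

    walk([START])
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(results, key=lambda r: (-r[0], r[1]))

# Hypothetical review corpus.
reviews = [
    "the battery life is great",
    "battery life is great",
    "the battery life is short",
    "the screen is great",
]

ranked = candidates(build_graph(reviews))
summary = ranked[0][1]                      # Step 3: keep the top candidate
print(summary)
```

Note that the graph can produce sentences nobody wrote, such as "the screen is short"; the redundancy-based score is what pushes those below the well-supported candidates.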
60. Content recommendation
“Networks give structure to the conversation
while content mining gives meaning.”
http://breakthroughanalysis.com/2015/10/08/ltapreriitsouda/
- Preriit Souda
61. Using Data Relationships for Recommendations
Content-based filtering
Recommend items based on what users have liked in the past
Collaborative filtering
Predict what users like based on the similarity of their behaviors, activities and preferences to others
[Figure: two Person nodes and a Movie node, with a RATED relationship (rating: 7) and a SIMILARITY relationship (value: .92).]
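To make the collaborative filtering idea concrete, here is a small sketch that mirrors the RATED and SIMILARITY relationships from the slide. It assumes cosine similarity over co-rated movies and a similarity-weighted average for prediction; the slide does not say how its similarity value was computed, and the users, movies and ratings below are invented.

```python
from math import sqrt

# Hypothetical RATED relationships: user -> {movie: rating}.
ratings = {
    "alice": {"Alien": 7, "Heat": 9, "Up": 3},
    "bob":   {"Alien": 8, "Heat": 9, "Up": 2, "Jaws": 8},
    "carol": {"Alien": 2, "Heat": 3, "Up": 9, "Jaws": 3},
}

def cosine(a, b):
    """SIMILARITY between two users, computed over movies both have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = sqrt(sum(a[m] ** 2 for m in shared))
    norm_b = sqrt(sum(b[m] ** 2 for m in shared))
    return dot / (norm_a * norm_b)

def predict(user, movie):
    """Similarity-weighted average of other users' ratings for `movie`."""
    weighted = total = 0.0
    for other, theirs in ratings.items():
        if other == user or movie not in theirs:
            continue
        sim = cosine(ratings[user], theirs)
        weighted += sim * theirs[movie]
        total += sim
    return weighted / total if total else None

sim_bob = cosine(ratings["alice"], ratings["bob"])
sim_carol = cosine(ratings["alice"], ratings["carol"])
predicted = predict("alice", "Jaws")
print(round(sim_bob, 2), round(sim_carol, 2), round(predicted, 2))
```

Because alice's ratings track bob's far more closely than carol's, the prediction for "Jaws" lands nearer to bob's rating than to carol's.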
64. Building the article graph
• Articles users have shared
• Extract keywords using the newspaper3k Python library
• Insert in the graph
• Scrape additional articles
https://github.com/johnymontana/nlp-graph-notebooks
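One way the resulting article graph could drive content-based recommendations, sketched with plain Python dictionaries instead of a database. The article names and keyword lists are hypothetical stand-ins for what a keyword extractor would return, and ranking by number of shared keywords is an assumption, not necessarily what the linked notebooks do.

```python
from collections import Counter, defaultdict

# Hypothetical article -> keywords data, standing in for extractor output.
article_keywords = {
    "graph-databases-intro": ["graph", "neo4j", "cypher"],
    "text-summarization": ["nlp", "summarization", "graph"],
    "cypher-tips": ["cypher", "neo4j", "query"],
    "sourdough-basics": ["bread", "baking"],
}

# Articles the user has shared (also hypothetical).
shared_by_user = {"graph-databases-intro"}

def build_keyword_index(article_keywords):
    """Keyword -> articles: the Article/Keyword edges seen from the keyword side."""
    index = defaultdict(set)
    for article, keywords in article_keywords.items():
        for keyword in keywords:
            index[keyword].add(article)
    return index

def recommend(shared, article_keywords):
    """Rank unseen articles by how many keywords they share with `shared`."""
    index = build_keyword_index(article_keywords)
    overlap = Counter()
    for article in shared:
        for keyword in article_keywords[article]:
            for other in index[keyword]:
                if other not in shared:
                    overlap[other] += 1
    # Most shared keywords first; ties broken alphabetically.
    ranked = sorted(overlap.items(), key=lambda item: (-item[1], item[0]))
    return [article for article, _ in ranked]

recommendations = recommend(shared_by_user, article_keywords)
print(recommendations)
```

Articles with no keyword in common with the user's shares never enter the ranking, which is the traversal-style behavior a graph query would give: only nodes reachable through a shared keyword are considered.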
72. Opinion Mining
• “Opinosis: A Graph Based Approach to Abstractive Summarization of Highly Redundant Opinions”
- Kavita Ganesan, ChengXiang Zhai, Jiawei Han, University of Illinois at Urbana-Champaign
• “Multi-sentence compression: Finding shortest paths in word graphs”
- Katja Filippova. Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010), Beijing, China, Aug 23-27, 2010.