IBM Research
© 2014 IBM Corporation
A Scalable Graph Representation of Knowledge Bases
and its Uses for Semantic Document Relatedness
Yosi Mass, Dafna Sheinwald (HRL)
Feng Cao, Yuan Ni, Hai Pei Zhang, Qiongkai Xu (CRL)
Introduction – Knowledge Base
A knowledge base (KB) is a representation of knowledge in which:
• Nodes represent entities
• Edges represent relationships between entities
• Nodes and edges may have attributes
Linked Open Data
The DBPedia Knowledge base
Usage of Knowledge Bases
1. Semantic understanding of a text by mapping its phrases to the knowledge base.
2. Finding relatedness/similarity between two given texts.
In the United Kingdom and Ireland, high school students traditionally do not have 'free
periods' but do have 'break' which normally occurs just after their second lesson of the
day (normally referred to as second period).
Mentions:
• United Kingdom - http://en.wikipedia.org/wiki/United_Kingdom
• Ireland - http://en.wikipedia.org/wiki/Ireland
• high school students - http://en.wikipedia.org/wiki/High_school - note the derivation to "high school student" and then the redirect to "High school".
• ‘free periods’ - http://en.wikipedia.org/wiki/Period_(school) - note the disambiguation.
• ‘break’ - http://en.wikipedia.org/wiki/Break_(work) - note the disambiguation.
• lesson - http://en.wikipedia.org/wiki/Lesson
• day - http://en.wikipedia.org/wiki/Day
• period - http://en.wikipedia.org/wiki/Period_(school) - note the disambiguation.
Facet graph use case - find semantic relatedness between two text paragraphs
[Figure: Paragraph 1 and Paragraph 2, with an unknown relatedness (?) between them]
Mention Detection
Graph-based similarity scorers
• Exploit the graph structure to find relationships between pairs of mentions
• Aggregate over all pairs
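The scoring step described above (score pairs of mentions, then aggregate over all pairs) could be sketched as follows. This is only an illustration: the slides do not say which aggregate is used, so the mean is an assumption, and `paragraph_relatedness` and `exact_match` are hypothetical names. The toy `exact_match` scorer stands in for a real graph-based scorer.

```python
from itertools import product
from typing import Callable, Sequence


def paragraph_relatedness(
    mentions_a: Sequence[str],
    mentions_b: Sequence[str],
    pair_score: Callable[[str, str], float],
) -> float:
    """Average a pairwise mention score over all cross-paragraph pairs.

    Assumption: the aggregate is a plain mean; the slides only say
    "aggregate over all pairs".
    """
    pairs = list(product(mentions_a, mentions_b))
    if not pairs:
        return 0.0
    return sum(pair_score(a, b) for a, b in pairs) / len(pairs)


def exact_match(a: str, b: str) -> float:
    """Toy scorer (hypothetical): 1.0 for identical mentions, else 0.0."""
    return 1.0 if a == b else 0.0


score = paragraph_relatedness(["Ireland", "Day"], ["Ireland"], exact_match)
```

Any function that maps two mentions to a number can be passed as `pair_score`, so a shortest-path or link-based scorer would plug in without changing the aggregation.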
Outline
• Generation of the Facet Graph from DBPedia
• Mention Detection
• Similarity measures on the FacetGraph
The TinkerPop Stack Usage in a project
• Graph stack library: Titan graph
• Backend: Hbase, Cassandra (planned)
• Hadoop: MapReduce code to generate the graph
• Access the graph: shortest path, similarity scorers
Generate the Knowledge Graph from RDF data
• Input is given as RDF triples.
• Example (subject, predicate, object):
http://dbpedia.org/resource/Yehuda_Vilner,
http://dbpedia.org/ontology/birthPlace,
http://dbpedia.org/resource/Israel
• URIs are translated to vertex IDs
• Adding a triple requires:
1. Add the subject and object as nodes (or get their IDs if they are already in the graph)
2. Add the predicate as an edge between the two nodes
• This is the most expensive operation
• Does not scale to millions of triples
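A minimal sketch of this triple-at-a-time insertion, assuming an in-memory dictionary in place of the real graph store (the `NaiveGraph` class and its method names are hypothetical). In a real backend each get-or-add call would presumably be an index lookup against the store, repeated for every triple.

```python
from typing import Dict, List, Tuple


class NaiveGraph:
    """Toy stand-in for a graph store (hypothetical, in-memory only)."""

    def __init__(self) -> None:
        self.vertex_ids: Dict[str, int] = {}          # URI -> vertex ID
        self.edges: List[Tuple[int, str, int]] = []   # (subject ID, predicate, object ID)

    def get_or_add_vertex(self, uri: str) -> int:
        # Look the URI up first; create a new vertex only if it is missing.
        if uri not in self.vertex_ids:
            self.vertex_ids[uri] = len(self.vertex_ids)
        return self.vertex_ids[uri]

    def add_triple(self, subject: str, predicate: str, obj: str) -> None:
        # Step 1: resolve both endpoints (two lookups per triple).
        sid = self.get_or_add_vertex(subject)
        oid = self.get_or_add_vertex(obj)
        # Step 2: add the predicate as an edge between them.
        self.edges.append((sid, predicate, oid))


graph = NaiveGraph()
graph.add_triple(
    "http://dbpedia.org/resource/Yehuda_Vilner",
    "http://dbpedia.org/ontology/birthPlace",
    "http://dbpedia.org/resource/Israel",
)
```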
A scalable solution using MapReduce
• What is MapReduce?
• Programming model for expressing distributed computations at a massive scale
• Execution framework for organizing and performing such computations
• Open-source implementation called Hadoop
• Programmers specify two functions:
map (k, v) → <k’, v’>*
reduce (k’, v’*) → <k’’, v’’>*
All values with the same key are sent to the same reducer
The execution framework handles everything else…
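The programming model can be illustrated with a small single-process simulation. This is a sketch of the model only, not Hadoop code; `run_mapreduce`, `count_mapper` and `sum_reducer` are hypothetical names, and a word count is used as the example job.

```python
from collections import defaultdict
from typing import Callable, Iterable, List, Tuple

KeyValue = Tuple[object, object]


def run_mapreduce(
    records: Iterable[KeyValue],
    mapper: Callable[[object, object], Iterable[KeyValue]],
    reducer: Callable[[object, List[object]], Iterable[KeyValue]],
) -> List[KeyValue]:
    """Run map, shuffle-and-sort, and reduce in one process."""
    # Map phase: every input record may emit any number of (k', v') pairs.
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            # Shuffle: all values with the same key end up in the same group.
            groups[out_key].append(out_value)
    # Reduce phase: keys are visited in sorted order, one call per key.
    output: List[KeyValue] = []
    for out_key in sorted(groups):
        output.extend(reducer(out_key, groups[out_key]))
    return output


def count_mapper(_doc_id, text):
    for word in text.split():
        yield word, 1


def sum_reducer(word, counts):
    yield word, sum(counts)


result = run_mapreduce([(1, "a b a"), (2, "b c")], count_mapper, sum_reducer)
```

In a real cluster the groups are partitioned across reducer machines; the dictionary here plays the role of the shuffle.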
MapReduce
Input: (k1, v1) (k2, v2) (k3, v3) (k4, v4) (k5, v5) (k6, v6)
map (four mappers) emits: (a, 1) (b, 2) | (c, 3) (c, 6) | (a, 5) (c, 2) | (b, 7) (c, 8)
Shuffle and Sort: aggregate values by keys
a: 1 5
b: 2 7
c: 2 3 6 8
reduce (three reducers) emits: (r1, s1) (r2, s2) (r3, s3)
Graph generation using MapReduce
Input triples:
(S1, P1, O1)
(S2, P2, O2)
(S3, P3, O1)
(S1, P2, O2)
Job 1 – sort by subjects
map and reduce output, keyed by subject:
S1 (P1, O1)
S2 (P2, O2)
S3 (P3, O1)
S1 (P2, O2)
Job 2 – add subjects to graph and sort by objects
Input: the output of Job 1
map and reduce output, keyed by object, with each subject replaced by its vertex ID (SID):
O1 (P1, SID1)
O2 (P2, SID2)
O1 (P3, SID3)
O2 (P2, SID1)
Job 3 – add objects and edges to graph
Input: the output of Job 2
map adds the object vertices (OID) and the edges:
SID1 --P1--> OID1
SID1 --P2--> OID2
SID3 --P3--> OID1
SID2 --P2--> OID2
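The three jobs can be simulated in a single process to show why grouping helps: each URI is resolved once per group instead of once per triple. This is a sketch under stated assumptions, not the authors' Hadoop code; the helper names and the sequential ID scheme are hypothetical, and the slides do not say how vertex IDs are assigned.

```python
from collections import defaultdict

triples = [("S1", "P1", "O1"), ("S2", "P2", "O2"),
           ("S3", "P3", "O1"), ("S1", "P2", "O2")]

vertex_ids = {}   # URI -> vertex ID (stands in for the graph's vertex index)
edges = []        # (subject ID, predicate, object ID)


def group_by_key(pairs):
    """Simulate shuffle-and-sort: group values by key, keys in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def get_or_add_vertex(uri):
    # Assumption: IDs are assigned sequentially starting at 1.
    if uri not in vertex_ids:
        vertex_ids[uri] = len(vertex_ids) + 1
    return vertex_ids[uri]


# Job 1: sort by subjects.
by_subject = group_by_key((s, (p, o)) for s, p, o in triples)

# Job 2: add each subject once, then re-key its triples by object.
job2_pairs = []
for subject, predicate_objects in by_subject.items():
    sid = get_or_add_vertex(subject)          # one lookup per distinct subject
    for predicate, obj in predicate_objects:
        job2_pairs.append((obj, (predicate, sid)))
by_object = group_by_key(job2_pairs)

# Job 3: add each object once, then add the edges.
for obj, predicate_sids in by_object.items():
    oid = get_or_add_vertex(obj)              # one lookup per distinct object
    for predicate, sid in predicate_sids:
        edges.append((sid, predicate, oid))
```

With the four example triples this performs five vertex lookups (three subjects, two objects) rather than the eight that per-triple insertion would need.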
Facet Graph Architecture
• Implementation based on the Titan graph library with Hbase as the backend
• Runs on a cluster of 3 machines
• Each machine has 16 cores, 2 TB disk and 32 GB memory
[Diagram components: Application, REST API, Rexster Server, Titan graph 1 … Titan graph n, Hbase, Hadoop cluster]
Facet Graph performance
• Creation (offline)
• Use three MapReduce jobs to index DBPedia into Titan
1. First job sorts subjects
2. Second job adds subjects
3. Third job adds objects and edges
• Access (online)
• Implemented as a Java API that wraps the REST API exposed by the Rexster server
• Performance on a cluster of 3 machines, each with 16 cores, 2 TB disk and 32 GB memory

Graph | #Vertices | #Edges | Creation time | Access time
Semantics FG | 14M | 72M | 3h:45m | 1 msec to get a node description; 2 sec to get 223K inlinks of a heavy node (USA)
Links FG | 19M | 152M | 7h:18m | 4.4 sec to get 447K inlinks of a heavy node (USA)
Mention detection
Pipeline: Input Text → Spotting → Candidate Selection → Disambiguation → Annotated Text
Resources shown in the diagram: Lexicon, Lucene Index, Facet Graph
• Spotting stage: recognizes in a sentence the phrases (surface forms) that may indicate a mention in the KB
• Candidate selection stage: given the surface form, retrieves the set of candidate URIs for disambiguation
• Disambiguation stage: uses the context around the spotted phrase to decide on the best candidate.
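The three stages can be sketched end to end with toy data. Everything here is illustrative: the `LEXICON` and `CONTEXT_WORDS` tables, the `kb:` identifiers, and the word-overlap disambiguation rule are hypothetical assumptions, and the sketch spots single words only. The slides do not describe how the real system scores candidates.

```python
import re
from typing import Dict, List, Set

# Hypothetical lexicon: surface form -> candidate KB identifiers.
LEXICON: Dict[str, List[str]] = {
    "break": ["kb:Break_(work)", "kb:Break_(pool)"],
    "lesson": ["kb:Lesson"],
}

# Hypothetical context vocabulary for each candidate.
CONTEXT_WORDS: Dict[str, Set[str]] = {
    "kb:Break_(work)": {"school", "lesson", "students"},
    "kb:Break_(pool)": {"cue", "ball", "table"},
    "kb:Lesson": {"school", "teacher"},
}


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())


def spot(text: str) -> List[str]:
    """Spotting: keep the tokens that appear as surface forms in the lexicon."""
    return [token for token in tokenize(text) if token in LEXICON]


def select_candidates(surface_form: str) -> List[str]:
    """Candidate selection: look up the candidate identifiers for a surface form."""
    return LEXICON[surface_form]


def disambiguate(surface_form: str, context: Set[str]) -> str:
    """Disambiguation: pick the candidate sharing the most words with the context."""
    return max(
        select_candidates(surface_form),
        key=lambda candidate: len(CONTEXT_WORDS[candidate] & context),
    )


def annotate(text: str) -> Dict[str, str]:
    context = set(tokenize(text))
    return {surface: disambiguate(surface, context) for surface in spot(text)}


annotations = annotate("Students have a break after their second lesson")
```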
Pairwise Concept similarity based on wikilinks [1]
[1] Milne D., Witten I. H., An Effective, Low-Cost Measure of Semantic Relatedness Obtained from
Wikipedia Links, AAAI, 2008
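The slide's formula did not survive in the text. The link-based measure from [1] is usually stated as a distance, (log(max(|A|, |B|)) - log(|A ∩ B|)) / (log(|W|) - log(min(|A|, |B|))), where A and B are the sets of articles linking to the two concepts and W is the set of all articles. The sketch below computes it; turning the distance into a similarity as 1 minus the distance, clipped at 0, and returning 0 when there are no shared inlinks are common conventions assumed here, and the function name is hypothetical.

```python
import math
from typing import Set


def wikilink_relatedness(inlinks_a: Set[str], inlinks_b: Set[str],
                         total_articles: int) -> float:
    """Link-based relatedness of two concepts from their inlink sets.

    Assumptions: total_articles is larger than either inlink set, and
    concepts with no shared inlinks are scored 0.0.
    """
    common = inlinks_a & inlinks_b
    if not common:
        return 0.0
    larger = max(len(inlinks_a), len(inlinks_b))
    smaller = min(len(inlinks_a), len(inlinks_b))
    distance = (math.log(larger) - math.log(len(common))) / (
        math.log(total_articles) - math.log(smaller)
    )
    # Convert the distance to a similarity and clip negative values.
    return max(0.0, 1.0 - distance)


same = wikilink_relatedness({"x", "y"}, {"x", "y"}, 100)
partial = wikilink_relatedness({"x", "y", "z"}, {"y", "z", "w"}, 100)
unrelated = wikilink_relatedness({"x"}, {"y"}, 100)
```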
Our assets on IBM.next
http://ibmnext.stage1.mybluemix.net/assets
Thank You

Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 

Knowledge graphs - Yosi Mass

  • 1. A Scalable Graph Representation of Knowledge Bases and its Uses for Semantic Document Relatedness
      Yosi Mass, Dafna Sheinwald (HRL); Feng Cao, Yuan Ni, Hai Pei Zhang, Qiongkai Xu (CRL)
      IBM Research, © 2014 IBM Corporation
  • 2. Introduction: Knowledge Base
      A knowledge base (KB) is a representation of knowledge in which:
      - Nodes represent entities
      - Edges represent relationships between entities
      - Nodes and edges may have attributes
      (Figure: Linked Open Data)
  • 3. The DBPedia Knowledge Base
  • 4. Usage of Knowledge Bases
      1. Semantic understanding of a text by mapping phrases to the knowledge base.
      2. Finding relatedness/similarity between two given texts.
      Example text: "In the United Kingdom and Ireland, high school students traditionally do not have 'free periods' but do have 'break' which normally occurs just after their second lesson of the day (normally referred to as second period)."
      Mentions:
      - United Kingdom - http://en.wikipedia.org/wiki/United_Kingdom
      - Ireland - http://en.wikipedia.org/wiki/Ireland
      - high school students - http://en.wikipedia.org/wiki/High_school - note the derivation to "high school student" and then the redirect to "High school".
      - 'free periods' - http://en.wikipedia.org/wiki/Period_(school) - note the disambiguation.
      - 'break' - http://en.wikipedia.org/wiki/Break_(work) - note the disambiguation.
      - lesson - http://en.wikipedia.org/wiki/Lesson
      - day - http://en.wikipedia.org/wiki/Day
      - period - http://en.wikipedia.org/wiki/Period_(school) - note the disambiguation.
  • 5. Facet Graph use case: find the semantic relatedness between two text paragraphs
      - Mention detection is applied to Paragraph 1 and Paragraph 2.
      - Graph-based similarity scorers exploit the graph structure to find relationships between pairs of mentions.
      - The pairwise scores are aggregated over all pairs.
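The "aggregate over all pairs" step can be sketched as follows. The slides do not say which aggregation is used, so the arithmetic mean here is an assumption, and `paragraph_relatedness` and `exact_match` are hypothetical names introduced only for illustration.

```python
from itertools import product
from statistics import mean
from typing import Callable, Sequence


def paragraph_relatedness(
    mentions_a: Sequence[str],
    mentions_b: Sequence[str],
    pair_score: Callable[[str, str], float],
) -> float:
    """Average a pairwise mention score over every cross-paragraph pair.

    Assumption: the mean is used as the aggregate; the deck only says
    that scores are aggregated over all pairs.
    """
    pairs = list(product(mentions_a, mentions_b))
    if not pairs:
        return 0.0  # no mentions on one side, nothing to compare
    return mean(pair_score(a, b) for a, b in pairs)


def exact_match(a: str, b: str) -> float:
    """Toy scorer standing in for a graph-based similarity scorer."""
    return 1.0 if a == b else 0.0


score = paragraph_relatedness(["Ireland", "Lesson"], ["Ireland"], exact_match)
```

Any graph-based scorer with the same `(mention, mention) -> float` shape could be passed in place of the toy scorer.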
  • 6. Outline • Generation of the Facet Graph from DBPedia • Mention Detection • Similarity measures on the Facet Graph
  • 7. Outline • Generation of the Facet Graph from DBPedia • Mention Detection • Similarity measures on the Facet Graph
  • 8. The TinkerPop Stack: usage in a project
      (Diagram labels: graph stack library; Titan graph; HBase; Cassandra (planned); Hadoop: MapReduce code to generate the graph; access the graph: shortest path, similarity scorers)
  • 9. Generate the Knowledge Graph from RDF data
      - Input is given as RDF triples.
      - Example:
          subject:   http://dbpedia.org/resource/Yehuda_Vilner
          predicate: http://dbpedia.org/ontology/birthPlace
          object:    http://dbpedia.org/resource/Israel
      - URIs are translated to vertex IDs.
      - Adding a triple requires:
          1. Add the subject and object as nodes (or get their IDs if they are already in the graph).
          2. Add the predicate as an edge between the two nodes.
      - Callouts on the slide: "This is the most expensive operation"; "Does not scale to millions of triples".
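A minimal in-memory sketch of the triple-by-triple insertion described on this slide. `TinyGraph` and its methods are hypothetical stand-ins, not the Titan API; a real graph store would perform the get-or-create lookup against its own index.

```python
class TinyGraph:
    """Toy graph: a URI -> vertex ID map plus a list of labelled edges."""

    def __init__(self):
        self.vertex_ids = {}  # URI -> integer vertex ID
        self.edges = []       # (subject_id, predicate_uri, object_id)

    def get_or_create_vertex(self, uri):
        # Look the URI up; allocate the next ID only if it is new.
        if uri not in self.vertex_ids:
            self.vertex_ids[uri] = len(self.vertex_ids) + 1
        return self.vertex_ids[uri]

    def add_triple(self, subject, predicate, obj):
        # Step 1: add (or find) the subject and object nodes.
        sid = self.get_or_create_vertex(subject)
        oid = self.get_or_create_vertex(obj)
        # Step 2: add the predicate as an edge between them.
        self.edges.append((sid, predicate, oid))
        return sid, oid


graph = TinyGraph()
first = graph.add_triple(
    "http://dbpedia.org/resource/Yehuda_Vilner",
    "http://dbpedia.org/ontology/birthPlace",
    "http://dbpedia.org/resource/Israel",
)
```

Here every triple triggers two lookups, one for the subject and one for the object, regardless of how often the same URI has been seen before.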
  • 10. A scalable solution using MapReduce
      What is MapReduce?
      - A programming model for expressing distributed computations at a massive scale
      - An execution framework for organizing and performing such computations
      - An open-source implementation called Hadoop
      Programmers specify two functions:
          map (k, v) → <k', v'>*
          reduce (k', v'*) → <k'', v''>*
      All values with the same key are sent to the same reducer. The execution framework handles everything else.
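The two-function model above can be imitated in a single process to show the data flow. This is only a sketch: `run_mapreduce` is a hypothetical helper, not a Hadoop API, and the word-count mapper and reducer are a standard illustration rather than anything from the deck.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Single-process imitation of map -> shuffle/sort -> reduce."""
    groups = defaultdict(list)
    # Map phase: each (k, v) may emit any number of (k', v') pairs.
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    # Shuffle and sort: all values for one key reach the same reducer call.
    output = []
    for out_key in sorted(groups):
        output.extend(reducer(out_key, groups[out_key]))
    return output


def word_count_map(_line_no, line):
    for word in line.split():
        yield word, 1


def word_count_reduce(word, counts):
    yield word, sum(counts)


counts = run_mapreduce([(0, "a b a"), (1, "c a")], word_count_map, word_count_reduce)
```

In a real cluster the map and reduce calls run on different machines; only the grouping guarantee is the same.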
  • 11. MapReduce (diagram)
      - Input pairs: (k1, v1), (k2, v2), (k3, v3), (k4, v4), (k5, v5), (k6, v6)
      - Four map tasks emit: (a, 1), (b, 2) | (c, 3), (c, 6) | (a, 5), (c, 2) | (b, 7), (c, 8)
      - Shuffle and sort (aggregate values by keys): a → [1, 5]; b → [2, 7]; c → [2, 3, 6, 8]
      - Three reduce tasks output: (r1, s1), (r2, s2), (r3, s3)
  • 12. Graph generation using MapReduce
      Input triples: (S1, P1, O1), (S2, P2, O2), (S3, P3, O1), (S1, P2, O2)
      Job 1: sort by subjects
      - map emits: S1 → (P1, O1); S2 → (P2, O2); S3 → (P3, O1); S1 → (P2, O2)
      - reduce receives the pairs grouped by subject
      Job 2: add subjects to graph and sort by objects
      - map emits: O1 → (P1, SID1); O2 → (P2, SID2); O1 → (P3, SID3); O2 → (P2, SID1)
      - reduce receives the pairs grouped by object: O1 → (P1, SID1), (P3, SID3); O2 → (P2, SID1), (P2, SID2)
      Job 3: add objects and edges to graph
      - resulting edges: SID1 -P1→ OID1; SID1 -P2→ OID2; SID3 -P3→ OID1; SID2 -P2→ OID2
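The three jobs can be traced with a small single-process sketch. It only mirrors the grouping logic on the slide; `group_by_key` and `build_graph` are hypothetical helpers, the vertex-ID map is a plain dictionary rather than Titan, and IDs are simply assigned in order of first use.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Stand-in for shuffle-and-sort: group values by key, keys sorted."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def build_graph(triples):
    vertex_ids = {}

    def vertex_id(uri):
        # Get the existing ID or allocate the next one.
        return vertex_ids.setdefault(uri, len(vertex_ids) + 1)

    # Job 1: key every triple by its subject.
    by_subject = group_by_key((s, (p, o)) for s, p, o in triples)

    # Job 2: add each subject once, then re-key its triples by object.
    object_pairs = []
    for subject, pred_objs in by_subject.items():
        sid = vertex_id(subject)
        object_pairs.extend((o, (p, sid)) for p, o in pred_objs)
    by_object = group_by_key(object_pairs)

    # Job 3: add each object once, then add the edges.
    edges = []
    for obj, pred_sids in by_object.items():
        oid = vertex_id(obj)
        edges.extend((sid, p, oid) for p, sid in pred_sids)
    return vertex_ids, edges


triples = [("S1", "P1", "O1"), ("S2", "P2", "O2"),
           ("S3", "P3", "O1"), ("S1", "P2", "O2")]
vertex_ids, edges = build_graph(triples)
```

Because the triples arrive grouped, each subject and each object is looked up once per job instead of once per triple.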
  • 13. Facet Graph Architecture
      - Implementation based on the Titan graph library with HBase as the backend
      - Runs on a cluster of 3 machines
      - Each machine has 16 cores, 2 TB of disk and 32 GB of memory
      (Diagram labels: Application; REST API; Rexster Server; Titan graph 1 … Titan graph n; HBase; Hadoop cluster)
  • 14. Facet Graph performance
      Creation (offline)
      - Three MapReduce jobs index DBPedia into Titan:
          1. The first job sorts subjects
          2. The second job adds subjects
          3. The third job adds objects and edges
      Access (online)
      - Implemented as a Java API that wraps the REST API of the Rexster server
      Performance on a cluster of 3 machines, each with 16 cores, 2 TB of disk and 32 GB of memory:
      Graph         #Vertices  #Edges  Creation time  Access time
      Semantics FG  14M        72M     3h:45m         1 msec to get a node description; 2 sec to get 223K inlinks of a heavy node (USA)
      Links FG      19M        152M    7h:18m         4.4 sec to get 447K inlinks of a heavy node (USA)
  • 15. Outline • Generation of the Facet Graph from DBPedia • Mention Detection • Similarity measures on the Facet Graph
  • 16. Mention detection
      Pipeline: Input Text → Spotting → Candidate Selection → Disambiguation → Annotated Text
      Resources shown in the diagram: Lexicon, Lucene Index, Facet Graph
      - Spotting stage: recognizes in a sentence the phrases (surface forms) that may indicate a mention in the KB.
      - Candidate selection stage: given the surface form, retrieves the set of candidate URIs for disambiguation.
      - Disambiguation stage: uses the context around the spotted phrase to decide on the best candidate.
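The three stages can be illustrated with a toy pipeline. Everything in it is an assumption made for the example: the in-memory `LEXICON` and `CONTEXT_WORDS` tables stand in for the lexicon, Lucene index and Facet Graph, the candidate labels are illustrative, and word overlap is a simple substitute for whatever disambiguation model the authors used.

```python
# Hypothetical lexicon: surface form -> candidate KB entries.
LEXICON = {
    "break": ["Break_(work)", "Break_(music)"],
    "period": ["Period_(school)", "Period_(punctuation)"],
}

# Hypothetical context vocabulary for each candidate.
CONTEXT_WORDS = {
    "Break_(work)": {"school", "lesson", "work"},
    "Break_(music)": {"drum", "song"},
    "Period_(school)": {"school", "lesson", "students"},
    "Period_(punctuation)": {"sentence", "grammar"},
}


def spot(tokens):
    """Spotting: keep tokens that appear as surface forms in the lexicon."""
    return [token for token in tokens if token in LEXICON]


def select_candidates(surface_form):
    """Candidate selection: look up the candidate entries for a surface form."""
    return LEXICON.get(surface_form, [])


def disambiguate(candidates, context):
    """Disambiguation: pick the candidate sharing most words with the context."""
    return max(candidates, key=lambda c: len(CONTEXT_WORDS[c] & context))


def annotate(text):
    tokens = text.lower().split()
    context = set(tokens)
    return {s: disambiguate(select_candidates(s), context) for s in spot(tokens)}


annotations = annotate("students have a break after the second lesson of the second period")
```

On this sentence the words "students" and "lesson" steer both spotted phrases to their school-related senses.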
  • 17. Outline • Generation of the Facet Graph from DBPedia • Mention Detection • Similarity measures on the Facet Graph
  • 18. Pairwise concept similarity based on wikilinks [1]
      [1] Milne D., Witten I. H., An Effective, Low-Cost Measure of Semantic Relatedness Obtained from Wikipedia Links, AAAI, 2008
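The slide shows no formula, so the sketch below follows the link-based measure from the cited paper [1]: a normalized distance computed from the sets of articles linking to each concept, converted to a similarity as one minus the distance and clamped at zero. The function name and the conversion to a similarity are choices made for this example; whether the Facet Graph scorer uses exactly this form is not stated in the deck.

```python
import math


def wikilink_relatedness(inlinks_a, inlinks_b, total_articles):
    """Link-based relatedness in the style of Milne and Witten (2008).

    inlinks_a, inlinks_b: IDs of the articles that link to each concept.
    total_articles: number of articles in the collection, assumed to be
    larger than either inlink set.
    """
    a, b = set(inlinks_a), set(inlinks_b)
    common = a & b
    if not common:
        return 0.0  # no shared inlinks: treat the concepts as unrelated
    smaller, larger = min(len(a), len(b)), max(len(a), len(b))
    if total_articles <= smaller:
        raise ValueError("total_articles must exceed the inlink set sizes")
    distance = (math.log(larger) - math.log(len(common))) / (
        math.log(total_articles) - math.log(smaller)
    )
    return max(0.0, 1.0 - distance)


partial = wikilink_relatedness({1, 2, 3, 4}, {3, 4}, total_articles=16)
```

With inlink sets of sizes 4 and 2 sharing 2 articles in a collection of 16, the distance is log 2 / log 8 = 1/3, so the similarity is 2/3.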
  • 19. Our assets on IBM.next: http://ibmnext.stage1.mybluemix.net/assets
  • 20. Thank You