THE DEEPDIVE FRAMEWORK
LEO ZHANG
STEP-BY-STEP ILLUSTRATION
The Stanford DeepDive, developed by Professor Chris Ré
and a team of PhDs, is a powerful data management and
preparation platform that allows users to build highly
sophisticated end-to-end data pipelines.
This presentation covers the technicalities of the inference and learning
engine behind DeepDive, including how DeepDive differs from
traditional data management systems, how to build an application on
DeepDive, and how exactly DeepDive works.
"We are just an advanced breed of monkeys
on a minor planet of a very average star. But
we can understand the Universe. That makes
us special"
- Stephen Hawking
THE DEEPDIVE OVERVIEW
How Is DeepDive Different?
Source: www.deepdive.stanford.edu
DeepDive is an end-to-end framework for building KBC systems.
[Figure: the DeepDive pipeline. Input (unstructured docs, e.g. "B. Obama and his wife M. Obama") -> Candidate Generation & Feature Extraction -> Supervision -> Learning & Inference -> Output (e.g. HasSpouse). After error analysis, developers add new docs, feature extraction rules, supervision rules, and inference rules to improve quality.]
How Does DeepDive Work?
• Candidate Generation and Feature Extraction
  • Save input data in a relational database
  • Feature Extractors: a set of user-defined functions
• Supervision
  • The DeepDive language is based on Markov Logic
  • Training data serves the same function here as it does in supervised learning
• Learning and Inference
  • Factor graph
• Error Analysis
  • Determine if the user needs to inspect the errors
DeepDive Design
Features that make it convenient for non-computer scientists to
use:
i) No reference to the underlying machine learning algorithm.
Probabilistic semantics provide a way to debug the system
independently of the algorithm
ii) Allows users to write extra features in Python, SQL and
Scala
iii) Fits into the familiar SQL stack, and therefore allows standard
tools to inspect and visualize data
Source: Incremental Knowledge Base Construction Using DeepDive
Output: structured knowledge base
• Feature Engineering: allows developers to think about features rather than algorithms
• High Quality: applications have achieved higher quality than human volunteers
• Calibration: computes a calibrated probability for every assertion it makes
• Variety of Sources: can extract data from documents, PDFs, web pages, tables and figures
• Domain Knowledge: lets developers write simple rules that bring in domain knowledge to improve quality
• Distant Supervision: does not require tedious training for every prediction
DEVELOPMENT PROCESS OF
DEEPDIVE APPLICATIONS
Writing The Application
• Define the data flow in a DDlog schema that describes the input data and the data to be produced
• Write User-Defined Functions (data transformation rules)
• Specify a statistical model in DDlog
Running The Application
• The user can compile and run the application incrementally
• Actual data is loaded into the database and queried -> User-Defined Functions are executed incrementally
• The model's parameters can be learned or reused to make predictions
Evaluate / Debug
• Formal error analysis supported by interactive tools
• DeepDive contains a suite of tools and guides: label data products, browse data, monitor descriptive statistics, calibration etc.
Comments
# DDlog is a higher-level language for writing DeepDive applications in succinct, Datalog-like syntax
# Variable declarations + Scoping and supervision rules + Inference rules
# A core set of commands that supports precise control of execution
# Several commands on the statistical model, such as its creation, parameter estimation, computation of probabilities, and keeping and reusing the parameters
# User-Defined Functions can be written in any standard programming language
# Produces calibration plots to evaluate the iterative workflow
Start with a basic first version and improve iteratively
Source: DeepDive: A Data Management System for Automatic Knowledge Base Construction
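A calibration plot compares predicted probabilities with how often those predictions turn out to be correct. The sketch below is a minimal illustration of that idea in plain Python, not DeepDive's own tooling; the function name calibration_buckets, the bucket count, and the held-out predictions are all assumptions made up for the example.

```python
def calibration_buckets(predictions, num_buckets=10):
    """Group (probability, is_true) pairs into equal-width probability buckets.

    Returns one row per non-empty bucket:
    (bucket_low, bucket_high, number_of_predictions, fraction_correct).
    """
    totals = [0] * num_buckets
    correct = [0] * num_buckets
    for prob, is_true in predictions:
        # Clamp so that a probability of exactly 1.0 lands in the last bucket.
        b = min(int(prob * num_buckets), num_buckets - 1)
        totals[b] += 1
        correct[b] += int(is_true)
    rows = []
    for b in range(num_buckets):
        if totals[b]:
            rows.append((b / num_buckets, (b + 1) / num_buckets,
                         totals[b], correct[b] / totals[b]))
    return rows


# Hypothetical held-out predictions: (predicted probability, ground truth).
held_out = [
    (0.95, True), (0.92, True), (0.91, False),
    (0.15, False), (0.12, False),
    (0.55, True), (0.52, False),
]

rows = calibration_buckets(held_out)
for low, high, count, accuracy in rows:
    print(f"[{low:.1f}, {high:.1f}): n={count}, accuracy={accuracy:.2f}")
```

In a well-calibrated system the accuracy in each bucket stays close to the bucket's probability range; large gaps point to features or supervision rules worth revisiting in the next iteration.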
"It's okay to have your eggs in one basket as
long as you control what happens to that
basket"
- Elon Musk
THE DEEPDIVE FRAMEWORK
End-To-End Framework For Building KBCs
Source: Incremental Knowledge Base Construction Using DeepDive
Knowledge Base Construction (KBC) Systems
The input to a KBC system is a heterogeneous
collection of unstructured, semi-structured, and
structured data.
The output is a relational database containing
facts extracted from the input and put into the
appropriate schema.
The KBC Model
The standard KBC model seeks to extract four
types of objects from input documents:
• Entity: a real person, place, or thing
• Relation: a relation associates two (or more) entities
• Mention: a span of text in an input document that refers to an entity or relation
• Relation Mention: a phrase that connects two mentions that participate in a relation
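The four object types can be pictured as plain data records. This is a minimal sketch in Python, not DeepDive's own schema; the class names, field names (doc_id, char_start, char_end) and the example entities are assumptions chosen for illustration, reusing the "B. Obama and his wife M. Obama" sentence from the overview.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Entity:
    """A real person, place, or thing."""
    entity_id: str
    name: str


@dataclass(frozen=True)
class Relation:
    """Associates two (or more) entities, e.g. HasSpouse(e1, e2)."""
    name: str
    entities: Tuple[Entity, ...]


@dataclass(frozen=True)
class Mention:
    """A span of text in an input document that refers to an entity."""
    doc_id: str
    char_start: int
    char_end: int
    text: str


@dataclass(frozen=True)
class RelationMention:
    """A phrase connecting two mentions that participate in a relation."""
    relation_name: str
    mentions: Tuple[Mention, Mention]
    phrase: str


sentence = "B. Obama and his wife M. Obama"

# Two mentions, located by character offsets into the sentence.
m1 = Mention("doc1", 0, 8, sentence[0:8])
m2 = Mention("doc1", 22, 30, sentence[22:30])

# The phrase between them is the relation mention.
rm = RelationMention("HasSpouse", (m1, m2), sentence[9:21])

# The entities and relation the mentions would resolve to (hypothetical ids).
barack = Entity("e1", "Barack Obama")
michelle = Entity("e2", "Michelle Obama")
has_spouse = Relation("HasSpouse", (barack, michelle))
```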
THE DEEPDIVE FRAMEWORK:
STEP-BY-STEP
Source: Incremental Knowledge Base Construction Using DeepDive
Candidate Generation & Feature Extraction
All data is stored in a relational database. This
phase populates the database using a set of SQL
queries and User-Defined Functions (Feature
Extractors).
By default, DeepDive stores all documents in the
database with one sentence per row, along with markup
produced by standard NLP pre-processing tools,
including HTML stripping, part-of-speech tagging,
and linguistic parsing.
Then, DeepDive executes two types of queries:
• Candidate mappings: SQL queries that produce
possible mentions, entities, and relations
• Feature Extractors: associate features with
candidates
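The two query types can be sketched with Python's built-in sqlite3 module standing in for the relational database. This is only an illustration under stated assumptions: the table names (sentences, person_mention, spouse_feature), the toy mention heuristic, and the WORDS_BETWEEN feature are all made up for the example and are not DeepDive's actual schema or extractors.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sentences (sent_id INTEGER PRIMARY KEY, text TEXT);
    CREATE TABLE person_mention (
        mention_id TEXT PRIMARY KEY, sent_id INTEGER,
        token_start INTEGER, token_end INTEGER, text TEXT);
    CREATE TABLE spouse_feature (m1_id TEXT, m2_id TEXT, feature TEXT);
    """
)
# One sentence per row, as in the default document layout.
conn.execute("INSERT INTO sentences VALUES (?, ?)",
             (1, "B. Obama and his wife M. Obama attended the dinner"))


def extract_person_mentions(sent_id, text):
    """Toy UDF (assumption): treat 'X. Surname' token pairs as person mentions."""
    tokens = text.split()
    for i in range(len(tokens) - 1):
        if re.fullmatch(r"[A-Z]\.", tokens[i]) and tokens[i + 1][:1].isupper():
            yield (f"{sent_id}_{i}", sent_id, i, i + 1, " ".join(tokens[i:i + 2]))


for sent_id, text in conn.execute("SELECT sent_id, text FROM sentences").fetchall():
    conn.executemany("INSERT INTO person_mention VALUES (?, ?, ?, ?, ?)",
                     extract_person_mentions(sent_id, text))

# Candidate mapping: every ordered pair of person mentions in one sentence.
candidates = conn.execute(
    """
    SELECT m1.mention_id, m2.mention_id, s.text, m1.token_end, m2.token_start
    FROM person_mention m1
    JOIN person_mention m2
      ON m1.sent_id = m2.sent_id AND m1.token_start < m2.token_start
    JOIN sentences s ON s.sent_id = m1.sent_id
    """
).fetchall()

# Feature extractor: the words between the two mentions become one feature.
for m1_id, m2_id, text, m1_end, m2_start in candidates:
    between = text.split()[m1_end + 1:m2_start]
    conn.execute("INSERT INTO spouse_feature VALUES (?, ?, ?)",
                 (m1_id, m2_id, "WORDS_BETWEEN=" + "_".join(between)))

features = conn.execute("SELECT * FROM spouse_feature").fetchall()
print(features)
```

Keeping both steps as queries over tables means every intermediate result (mentions, candidates, features) can be inspected with ordinary SQL tools.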
"A breakthrough in machine learning would be
worth ten Microsofts"
- Bill Gates
Supervision
Just as in Markov Logic, DeepDive can use training
data or evidence about any relation.
Each user relation is associated with an evidence
relation that indicates whether each entry is true or false.
Two standard techniques generate training data:
hand-labeling and distant supervision.
Distant Supervision
Traditional machine learning techniques require a
set of training data. In distant supervision, DeepDive
takes an existing database (e.g. a domain-specific
database) that already contains examples of the relation it wants to
extract, then uses these examples to automatically
generate the training data.
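Distant supervision can be sketched as a lookup against an existing knowledge base. The snippet below is a minimal illustration in Python; the known_spouses set, the candidate tuples, and the distant_label function are hypothetical, and real applications typically add further rules (for example, to produce negative examples) that are not shown here.

```python
# Hypothetical existing knowledge base of known married couples.
known_spouses = {("Barack Obama", "Michelle Obama")}

# Hypothetical candidates: (candidate_id, entity for mention 1, entity for mention 2).
candidates = [
    ("c1", "Barack Obama", "Michelle Obama"),
    ("c2", "Barack Obama", "Joe Biden"),
    ("c3", "Michelle Obama", "Barack Obama"),
]


def distant_label(e1, e2, kb):
    """Return True if the pair appears in the KB in either order, else None.

    None means "no evidence": the candidate stays unlabeled and its
    truth value is left to be inferred.
    """
    if (e1, e2) in kb or (e2, e1) in kb:
        return True
    return None


# Evidence relation: candidate id -> label generated without hand-labeling.
evidence = {cid: distant_label(e1, e2, known_spouses)
            for cid, e1, e2 in candidates}
print(evidence)
```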
Learning & Inference
In this phase, DeepDive generates a factor graph: a probabilistic
graphical model that is the abstraction used for learning.
DeepDive relies heavily on factor graphs.

[Figure: an example factor graph. There is one user relation
containing all tokens, and there are two correlation
relations for adjacent-token correlation (F1) and same-word
correlation (F2) respectively.]

Raw Data
He said that he would come.

In-database Representation

User Relations
Token   Word
A       He
B       Said
C       That
D       He

Correlation Relations
Factor  Vars    Relation
i       (A,B)   F1 (adjacent-token)
ii      (B,C)   F1 (adjacent-token)
iii     (C,D)   F1 (adjacent-token)
iv      (A,D)   F2 (same-word)

Assignment Example
Token   Assignment
A       1
B       0
C       0
D       1

Weight of the example assignment (the product that the partition function Z sums over all assignments):
f1(1,0) x f1(0,0) x f1(0,1)   [factors in F1]
x f2(1,1)                     [factors in F2]
Source: DeepDive: A Data Management System for Automatic Knowledge Base Construction
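The example can be worked through by brute force in a few lines of Python. This is a sketch under stated assumptions: the factor functions f1 and f2 and their weights W1 and W2 are hypothetical (the slides do not define them), and enumerating all assignments only works for a graph this small. It computes the product of factors for the example assignment, the partition function Z as the sum of that product over all 2^4 assignments, and the resulting probability.

```python
import itertools
import math

tokens = ["A", "B", "C", "D"]
f1_factors = [("A", "B"), ("B", "C"), ("C", "D")]  # F1: adjacent-token
f2_factors = [("A", "D")]                          # F2: same-word

# Hypothetical weights; in practice these would be learned from evidence.
W1, W2 = 0.5, 1.0


def f1(x, y):
    """Assumed adjacent-token factor: rewards equal assignments."""
    return math.exp(W1 * int(x == y))


def f2(x, y):
    """Assumed same-word factor: rewards equal assignments."""
    return math.exp(W2 * int(x == y))


def weight(assignment):
    """Product of all factor values under one assignment of the variables."""
    w = 1.0
    for u, v in f1_factors:
        w *= f1(assignment[u], assignment[v])
    for u, v in f2_factors:
        w *= f2(assignment[u], assignment[v])
    return w


example = {"A": 1, "B": 0, "C": 0, "D": 1}
# f1(1,0) * f1(0,0) * f1(0,1) * f2(1,1)
example_weight = weight(example)

# Partition function: sum of weights over every possible assignment.
all_assignments = [dict(zip(tokens, values))
                   for values in itertools.product([0, 1], repeat=len(tokens))]
Z = sum(weight(a) for a in all_assignments)

probability = example_weight / Z
print(f"weight={example_weight:.4f}  Z={Z:.4f}  probability={probability:.4f}")
```

Exhaustive enumeration grows exponentially with the number of variables, so this form is only useful for checking small examples by hand.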
"Problems worthy of attack prove their worth
by fighting back"
- Paul Erdős
REFERENCES
Shin, Jaeho, Sen Wu, Feiran Wang, Christopher De Sa, Ce Zhang, and Christopher
Ré. "Incremental Knowledge Base Construction Using DeepDive."
Proceedings of the VLDB Endowment 8.11 (2015): 1310-1321. Web.
Zhang, Ce. "DeepDive: A Data Management System for Automatic Knowledge Base Construction."
PhD dissertation, University of Wisconsin-Madison, 2015.

More Related Content

What's hot

Top Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practicesTop Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practicesSpringPeople
ย 
Bigdata analytics
Bigdata analyticsBigdata analytics
Bigdata analyticsSwarnaLatha177
ย 
Tools for Unstructured Data Analytics
Tools for Unstructured Data AnalyticsTools for Unstructured Data Analytics
Tools for Unstructured Data AnalyticsRavi Teja
ย 
Python's Role in the Future of Data Analysis
Python's Role in the Future of Data AnalysisPython's Role in the Future of Data Analysis
Python's Role in the Future of Data AnalysisPeter Wang
ย 
Data Wrangling and the Art of Big Data Discovery
Data Wrangling and the Art of Big Data DiscoveryData Wrangling and the Art of Big Data Discovery
Data Wrangling and the Art of Big Data DiscoveryInside Analysis
ย 
Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Caserta
ย 
Data mining with big data
Data mining with big dataData mining with big data
Data mining with big datakk1718
ย 
Introduction to data science intro,ch(1,2,3)
Introduction to data science intro,ch(1,2,3)Introduction to data science intro,ch(1,2,3)
Introduction to data science intro,ch(1,2,3)heba_ahmad
ย 
Power of the Run Graph
Power of the Run GraphPower of the Run Graph
Power of the Run GraphVaticle
ย 
Jobs Complexity
Jobs ComplexityJobs Complexity
Jobs Complexitysuresh sood
ย 
Full-Stack Data Science: How to be a One-person Data Team
Full-Stack Data Science: How to be a One-person Data TeamFull-Stack Data Science: How to be a One-person Data Team
Full-Stack Data Science: How to be a One-person Data TeamGreg Goltsov
ย 
Protecting data privacy in analytics and machine learning ISACA London UK
Protecting data privacy in analytics and machine learning ISACA London UKProtecting data privacy in analytics and machine learning ISACA London UK
Protecting data privacy in analytics and machine learning ISACA London UKUlf Mattsson
ย 
Maximize the Value of Your Data: Neo4j Graph Data Platform
Maximize the Value of Your Data: Neo4j Graph Data PlatformMaximize the Value of Your Data: Neo4j Graph Data Platform
Maximize the Value of Your Data: Neo4j Graph Data PlatformNeo4j
ย 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data ScienceCaserta
ย 
Make AI & BI work at Scale
Make AI & BI work at ScaleMake AI & BI work at Scale
Make AI & BI work at ScaleSteve Nouri
ย 
Bitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FSBitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FSPhilip Filleul
ย 
Lessons from building a stream-first metadata platform | Shirshanka Das, Stealth
Lessons from building a stream-first metadata platform | Shirshanka Das, StealthLessons from building a stream-first metadata platform | Shirshanka Das, Stealth
Lessons from building a stream-first metadata platform | Shirshanka Das, StealthHostedbyConfluent
ย 

What's hot (20)

Top Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practicesTop Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practices
ย 
Bigdata analytics
Bigdata analyticsBigdata analytics
Bigdata analytics
ย 
Tools for Unstructured Data Analytics
Tools for Unstructured Data AnalyticsTools for Unstructured Data Analytics
Tools for Unstructured Data Analytics
ย 
Big data mining
Big data miningBig data mining
Big data mining
ย 
Python's Role in the Future of Data Analysis
Python's Role in the Future of Data AnalysisPython's Role in the Future of Data Analysis
Python's Role in the Future of Data Analysis
ย 
Data Wrangling and the Art of Big Data Discovery
Data Wrangling and the Art of Big Data DiscoveryData Wrangling and the Art of Big Data Discovery
Data Wrangling and the Art of Big Data Discovery
ย 
Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)
ย 
Data mining with big data
Data mining with big dataData mining with big data
Data mining with big data
ย 
BigData Analysis
BigData AnalysisBigData Analysis
BigData Analysis
ย 
Introduction to data science intro,ch(1,2,3)
Introduction to data science intro,ch(1,2,3)Introduction to data science intro,ch(1,2,3)
Introduction to data science intro,ch(1,2,3)
ย 
Power of the Run Graph
Power of the Run GraphPower of the Run Graph
Power of the Run Graph
ย 
Jobs Complexity
Jobs ComplexityJobs Complexity
Jobs Complexity
ย 
Full-Stack Data Science: How to be a One-person Data Team
Full-Stack Data Science: How to be a One-person Data TeamFull-Stack Data Science: How to be a One-person Data Team
Full-Stack Data Science: How to be a One-person Data Team
ย 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
ย 
Protecting data privacy in analytics and machine learning ISACA London UK
Protecting data privacy in analytics and machine learning ISACA London UKProtecting data privacy in analytics and machine learning ISACA London UK
Protecting data privacy in analytics and machine learning ISACA London UK
ย 
Maximize the Value of Your Data: Neo4j Graph Data Platform
Maximize the Value of Your Data: Neo4j Graph Data PlatformMaximize the Value of Your Data: Neo4j Graph Data Platform
Maximize the Value of Your Data: Neo4j Graph Data Platform
ย 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
ย 
Make AI & BI work at Scale
Make AI & BI work at ScaleMake AI & BI work at Scale
Make AI & BI work at Scale
ย 
Bitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FSBitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FS
ย 
Lessons from building a stream-first metadata platform | Shirshanka Das, Stealth
Lessons from building a stream-first metadata platform | Shirshanka Das, StealthLessons from building a stream-first metadata platform | Shirshanka Das, Stealth
Lessons from building a stream-first metadata platform | Shirshanka Das, Stealth
ย 

Viewers also liked

Deepdive presentation GBAF20 primary care
Deepdive presentation GBAF20 primary careDeepdive presentation GBAF20 primary care
Deepdive presentation GBAF20 primary careMatthew Cunningham
ย 
DeepDive - Azure AD Identity Protection
DeepDive - Azure AD Identity ProtectionDeepDive - Azure AD Identity Protection
DeepDive - Azure AD Identity ProtectionMaxime Rastello
ย 
Silverlight2 Deepdive Mix08 External
Silverlight2 Deepdive Mix08 ExternalSilverlight2 Deepdive Mix08 External
Silverlight2 Deepdive Mix08 ExternalMartha Rotter
ย 
Ce Zhang, Postdoctoral Researcher, Stanford University at MLconf ATL - 9/18/15
Ce Zhang, Postdoctoral Researcher, Stanford University at MLconf ATL - 9/18/15Ce Zhang, Postdoctoral Researcher, Stanford University at MLconf ATL - 9/18/15
Ce Zhang, Postdoctoral Researcher, Stanford University at MLconf ATL - 9/18/15MLconf
ย 
O365 Saturday - Deepdive SharePoint Client Side Rendering
O365 Saturday - Deepdive SharePoint Client Side RenderingO365 Saturday - Deepdive SharePoint Client Side Rendering
O365 Saturday - Deepdive SharePoint Client Side RenderingRiwut Libinuko
ย 
Presentation about the main ideas of the DeepDive (Stanford University)
Presentation about the main ideas of the DeepDive (Stanford University)Presentation about the main ideas of the DeepDive (Stanford University)
Presentation about the main ideas of the DeepDive (Stanford University)RealSpeaker 2.0
ย 
Fibromyalgia-2016_Brochure
Fibromyalgia-2016_BrochureFibromyalgia-2016_Brochure
Fibromyalgia-2016_BrochureSuresh Sriramulu
ย 

Viewers also liked (7)

Deepdive presentation GBAF20 primary care
Deepdive presentation GBAF20 primary careDeepdive presentation GBAF20 primary care
Deepdive presentation GBAF20 primary care
ย 
DeepDive - Azure AD Identity Protection
DeepDive - Azure AD Identity ProtectionDeepDive - Azure AD Identity Protection
DeepDive - Azure AD Identity Protection
ย 
Silverlight2 Deepdive Mix08 External
Silverlight2 Deepdive Mix08 ExternalSilverlight2 Deepdive Mix08 External
Silverlight2 Deepdive Mix08 External
ย 
Ce Zhang, Postdoctoral Researcher, Stanford University at MLconf ATL - 9/18/15
Ce Zhang, Postdoctoral Researcher, Stanford University at MLconf ATL - 9/18/15Ce Zhang, Postdoctoral Researcher, Stanford University at MLconf ATL - 9/18/15
Ce Zhang, Postdoctoral Researcher, Stanford University at MLconf ATL - 9/18/15
ย 
O365 Saturday - Deepdive SharePoint Client Side Rendering
O365 Saturday - Deepdive SharePoint Client Side RenderingO365 Saturday - Deepdive SharePoint Client Side Rendering
O365 Saturday - Deepdive SharePoint Client Side Rendering
ย 
Presentation about the main ideas of the DeepDive (Stanford University)
Presentation about the main ideas of the DeepDive (Stanford University)Presentation about the main ideas of the DeepDive (Stanford University)
Presentation about the main ideas of the DeepDive (Stanford University)
ย 
Fibromyalgia-2016_Brochure
Fibromyalgia-2016_BrochureFibromyalgia-2016_Brochure
Fibromyalgia-2016_Brochure
ย 

Similar to Stanford DeepDive Framework

Sem tech 2011 v8
Sem tech 2011 v8Sem tech 2011 v8
Sem tech 2011 v8dallemang
ย 
Data science | What is Data science
Data science | What is Data scienceData science | What is Data science
Data science | What is Data scienceShilpaKrishna6
ย 
Scalable constrained spectral clustering
Scalable constrained spectral clusteringScalable constrained spectral clustering
Scalable constrained spectral clusteringNishanth Harapanahalli
ย 
Case Study: Big Data Analytics
Case Study: Big Data AnalyticsCase Study: Big Data Analytics
Case Study: Big Data AnalyticsAbhinav Das
ย 
Shrey_Kumar_Resume_01072016
Shrey_Kumar_Resume_01072016Shrey_Kumar_Resume_01072016
Shrey_Kumar_Resume_01072016Shrey Kumar
ย 
Agile Testing Days 2017 Intoducing AgileBI Sustainably - Excercises
Agile Testing Days 2017 Intoducing AgileBI Sustainably - ExcercisesAgile Testing Days 2017 Intoducing AgileBI Sustainably - Excercises
Agile Testing Days 2017 Intoducing AgileBI Sustainably - ExcercisesRaphael Branger
ย 
Resume
ResumeResume
Resumejai kunwar
ย 
PrachiSharma
PrachiSharmaPrachiSharma
PrachiSharmaPrachi Sharma
ย 
RESUME_RAVI
RESUME_RAVIRESUME_RAVI
RESUME_RAVIRavi Godugu
ย 
K anonymity for crowdsourcing database
K anonymity for crowdsourcing databaseK anonymity for crowdsourcing database
K anonymity for crowdsourcing databaseLeMeniz Infotech
ย 
The Semantic Knowledge Graph
The Semantic Knowledge GraphThe Semantic Knowledge Graph
The Semantic Knowledge GraphTrey Grainger
ย 
Qiagram
QiagramQiagram
Qiagramjwppz
ย 
So Long Computer Overlords
So Long Computer OverlordsSo Long Computer Overlords
So Long Computer OverlordsIan Foster
ย 
Computer Science Related Questions
Computer Science Related QuestionsComputer Science Related Questions
Computer Science Related QuestionsBravoLulu1
ย 
SurajResume
SurajResumeSurajResume
SurajResumesuraj thakur
ย 
Bigdataanalytics
BigdataanalyticsBigdataanalytics
BigdataanalyticsHaroon Karim
ย 
Overview of entity framework by software outsourcing company india
Overview of entity framework by software outsourcing company indiaOverview of entity framework by software outsourcing company india
Overview of entity framework by software outsourcing company indiaJignesh Aakoliya
ย 

Similar to Stanford DeepDive Framework (20)

Sem tech 2011 v8
Sem tech 2011 v8Sem tech 2011 v8
Sem tech 2011 v8
ย 
Data science | What is Data science
Data science | What is Data scienceData science | What is Data science
Data science | What is Data science
ย 
Scalable constrained spectral clustering
Scalable constrained spectral clusteringScalable constrained spectral clustering
Scalable constrained spectral clustering
ย 
Mrithyunjaya_V_Sarangmath
Mrithyunjaya_V_SarangmathMrithyunjaya_V_Sarangmath
Mrithyunjaya_V_Sarangmath
ย 
Case Study: Big Data Analytics
Case Study: Big Data AnalyticsCase Study: Big Data Analytics
Case Study: Big Data Analytics
ย 
Introduction
IntroductionIntroduction
Introduction
ย 
Shrey_Kumar_Resume_01072016
Shrey_Kumar_Resume_01072016Shrey_Kumar_Resume_01072016
Shrey_Kumar_Resume_01072016
ย 
Agile Testing Days 2017 Intoducing AgileBI Sustainably - Excercises
Agile Testing Days 2017 Intoducing AgileBI Sustainably - ExcercisesAgile Testing Days 2017 Intoducing AgileBI Sustainably - Excercises
Agile Testing Days 2017 Intoducing AgileBI Sustainably - Excercises
ย 
Resume
ResumeResume
Resume
ย 
PrachiSharma
PrachiSharmaPrachiSharma
PrachiSharma
ย 
RESUME_RAVI
RESUME_RAVIRESUME_RAVI
RESUME_RAVI
ย 
K anonymity for crowdsourcing database
K anonymity for crowdsourcing databaseK anonymity for crowdsourcing database
K anonymity for crowdsourcing database
ย 
ChandraSekhar CV
ChandraSekhar CVChandraSekhar CV
ChandraSekhar CV
ย 
The Semantic Knowledge Graph
The Semantic Knowledge GraphThe Semantic Knowledge Graph
The Semantic Knowledge Graph
ย 
Qiagram
QiagramQiagram
Qiagram
ย 
So Long Computer Overlords
So Long Computer OverlordsSo Long Computer Overlords
So Long Computer Overlords
ย 
Computer Science Related Questions
Computer Science Related QuestionsComputer Science Related Questions
Computer Science Related Questions
ย 
SurajResume
SurajResumeSurajResume
SurajResume
ย 
Bigdataanalytics
BigdataanalyticsBigdataanalytics
Bigdataanalytics
ย 
Overview of entity framework by software outsourcing company india
Overview of entity framework by software outsourcing company indiaOverview of entity framework by software outsourcing company india
Overview of entity framework by software outsourcing company india
ย 

Recently uploaded

Shikrapur - Call Girls in Pune Neha 8005736733 | 100% Gennuine High Class Ind...
Shikrapur - Call Girls in Pune Neha 8005736733 | 100% Gennuine High Class Ind...Shikrapur - Call Girls in Pune Neha 8005736733 | 100% Gennuine High Class Ind...
Shikrapur - Call Girls in Pune Neha 8005736733 | 100% Gennuine High Class Ind...SUHANI PANDEY
ย 
Russian Call Girls Pune (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...
Russian Call Girls Pune  (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...Russian Call Girls Pune  (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...
Russian Call Girls Pune (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...SUHANI PANDEY
ย 
best call girls in Hyderabad Finest Escorts Service ๐Ÿ“ž 9352988975 ๐Ÿ“ž Available ...
best call girls in Hyderabad Finest Escorts Service ๐Ÿ“ž 9352988975 ๐Ÿ“ž Available ...best call girls in Hyderabad Finest Escorts Service ๐Ÿ“ž 9352988975 ๐Ÿ“ž Available ...
best call girls in Hyderabad Finest Escorts Service ๐Ÿ“ž 9352988975 ๐Ÿ“ž Available ...kajalverma014
ย 
"Boost Your Digital Presence: Partner with a Leading SEO Agency"
"Boost Your Digital Presence: Partner with a Leading SEO Agency""Boost Your Digital Presence: Partner with a Leading SEO Agency"
"Boost Your Digital Presence: Partner with a Leading SEO Agency"growthgrids
ย 
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
20240510 QFM016 Irresponsible AI Reading List April 2024.pdfMatthew Sinclair
ย 
Pirangut | Call Girls Pune Phone No 8005736733 Elite Escort Service Available...
Pirangut | Call Girls Pune Phone No 8005736733 Elite Escort Service Available...Pirangut | Call Girls Pune Phone No 8005736733 Elite Escort Service Available...
Pirangut | Call Girls Pune Phone No 8005736733 Elite Escort Service Available...SUHANI PANDEY
ย 
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdfpdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdfJOHNBEBONYAP1
ย 
Dubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls Dubai
Dubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls DubaiDubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls Dubai
Dubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls Dubaikojalkojal131
ย 
Nanded City ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready ...
Nanded City ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready ...Nanded City ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready ...
Nanded City ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready ...tanu pandey
ย 
APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53APNIC
ย 
Sarola * Female Escorts Service in Pune | 8005736733 Independent Escorts & Da...
Sarola * Female Escorts Service in Pune | 8005736733 Independent Escorts & Da...Sarola * Female Escorts Service in Pune | 8005736733 Independent Escorts & Da...
Sarola * Female Escorts Service in Pune | 8005736733 Independent Escorts & Da...SUHANI PANDEY
ย 
( Pune ) VIP Baner Call Girls ๐ŸŽ—๏ธ 9352988975 Sizzling | Escorts | Girls Are Re...
( Pune ) VIP Baner Call Girls ๐ŸŽ—๏ธ 9352988975 Sizzling | Escorts | Girls Are Re...( Pune ) VIP Baner Call Girls ๐ŸŽ—๏ธ 9352988975 Sizzling | Escorts | Girls Are Re...
( Pune ) VIP Baner Call Girls ๐ŸŽ—๏ธ 9352988975 Sizzling | Escorts | Girls Are Re...nilamkumrai
ย 
Busty DesiโšกCall Girls in Vasundhara Ghaziabad >เผ’8448380779 Escort Service
Busty DesiโšกCall Girls in Vasundhara Ghaziabad >เผ’8448380779 Escort ServiceBusty DesiโšกCall Girls in Vasundhara Ghaziabad >เผ’8448380779 Escort Service
Busty DesiโšกCall Girls in Vasundhara Ghaziabad >เผ’8448380779 Escort ServiceDelhi Call girls
ย 
๐Ÿ’š๐Ÿ˜‹ Bilaspur Escort Service Call Girls, 9352852248 โ‚น5000 To 25K With AC๐Ÿ’š๐Ÿ˜‹
๐Ÿ’š๐Ÿ˜‹ Bilaspur Escort Service Call Girls, 9352852248 โ‚น5000 To 25K With AC๐Ÿ’š๐Ÿ˜‹๐Ÿ’š๐Ÿ˜‹ Bilaspur Escort Service Call Girls, 9352852248 โ‚น5000 To 25K With AC๐Ÿ’š๐Ÿ˜‹
๐Ÿ’š๐Ÿ˜‹ Bilaspur Escort Service Call Girls, 9352852248 โ‚น5000 To 25K With AC๐Ÿ’š๐Ÿ˜‹nirzagarg
ย 
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting High Prof...
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting  High Prof...VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting  High Prof...
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting High Prof...singhpriety023
ย 
All Time Service Available Call Girls Mg Road ๐Ÿ‘Œ โญ๏ธ 6378878445
All Time Service Available Call Girls Mg Road ๐Ÿ‘Œ โญ๏ธ 6378878445All Time Service Available Call Girls Mg Road ๐Ÿ‘Œ โญ๏ธ 6378878445
All Time Service Available Call Girls Mg Road ๐Ÿ‘Œ โญ๏ธ 6378878445ruhi
ย 

Recently uploaded (20)

Shikrapur - Call Girls in Pune Neha 8005736733 | 100% Gennuine High Class Ind...
Shikrapur - Call Girls in Pune Neha 8005736733 | 100% Gennuine High Class Ind...Shikrapur - Call Girls in Pune Neha 8005736733 | 100% Gennuine High Class Ind...
Shikrapur - Call Girls in Pune Neha 8005736733 | 100% Gennuine High Class Ind...
ย 
Russian Call Girls Pune (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...
Russian Call Girls Pune  (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...Russian Call Girls Pune  (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...
Russian Call Girls Pune (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...
ย 
best call girls in Hyderabad Finest Escorts Service ๐Ÿ“ž 9352988975 ๐Ÿ“ž Available ...
best call girls in Hyderabad Finest Escorts Service ๐Ÿ“ž 9352988975 ๐Ÿ“ž Available ...best call girls in Hyderabad Finest Escorts Service ๐Ÿ“ž 9352988975 ๐Ÿ“ž Available ...
best call girls in Hyderabad Finest Escorts Service ๐Ÿ“ž 9352988975 ๐Ÿ“ž Available ...
ย 
"Boost Your Digital Presence: Partner with a Leading SEO Agency"
"Boost Your Digital Presence: Partner with a Leading SEO Agency""Boost Your Digital Presence: Partner with a Leading SEO Agency"
"Boost Your Digital Presence: Partner with a Leading SEO Agency"
ย 
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
ย 
valsad Escorts Service โ˜Ž๏ธ 6378878445 ( Sakshi Sinha ) High Profile Call Girls...
valsad Escorts Service โ˜Ž๏ธ 6378878445 ( Sakshi Sinha ) High Profile Call Girls...valsad Escorts Service โ˜Ž๏ธ 6378878445 ( Sakshi Sinha ) High Profile Call Girls...
valsad Escorts Service โ˜Ž๏ธ 6378878445 ( Sakshi Sinha ) High Profile Call Girls...
ย 
Pirangut | Call Girls Pune Phone No 8005736733 Elite Escort Service Available...
Pirangut | Call Girls Pune Phone No 8005736733 Elite Escort Service Available...Pirangut | Call Girls Pune Phone No 8005736733 Elite Escort Service Available...
Pirangut | Call Girls Pune Phone No 8005736733 Elite Escort Service Available...
ย 
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdfpdfcoffee.com_business-ethics-q3m7-pdf-free.pdf

Stanford DeepDive Framework

  • 1. THE DEEPDIVE FRAMEWORK: STEP-BY-STEP ILLUSTRATION (Leo Zhang)
  • 2. The Stanford DeepDive system, developed by Professor Chris Ré and a team of PhDs, is a data management and preparation platform that lets users build sophisticated end-to-end data pipelines. This presentation covers the technical details of the inference and learning engine behind DeepDive, including how DeepDive differs from traditional data management systems, how to build an application on DeepDive, and how DeepDive works.
      “We are just an advanced breed of monkeys on a minor planet of a very average star. But we can understand the Universe. That makes us special” - Stephen Hawking
  • 3. THE DEEPDIVE OVERVIEW
      DeepDive is an end-to-end framework for building KBC systems.
      [Figure: the DeepDive pipeline. Input: unstructured documents (e.g. “B. Obama and his wife M. Obama”) → Candidate Generation & Feature Extraction → Supervision → Learning & Inference → Output: structured knowledge base (e.g. HasSpouse). After error analysis, developers add new documents, feature extraction rules, supervision rules, and inference rules to improve quality.]
      How Is DeepDive Different? (Source: www.deepdive.stanford.edu)
      • Feature Engineering: allows developers to think about features rather than algorithms
      • High Quality: applications have achieved higher quality than human volunteers
      • Calibration: computes a calibrated probability for every assertion it makes
      • Variety of Sources: can extract data from documents, PDFs, web pages, tables, and figures
      • Domain Knowledge: integrates domain knowledge through simple user-written rules to improve quality
      • Distant Supervision: does not require tedious training for every prediction
      How Does DeepDive Work?
      • Candidate Generation and Feature Extraction
        • Save input data in a relational database
        • Feature extractors: a set of user-defined functions
      • Supervision
        • The DeepDive language is based on Markov Logic
        • Training data can be supplied as evidence, where it plays the same role as in supervised learning
      • Learning and Inference
        • Factor graph
      • Error Analysis
        • Determine whether the user needs to inspect the errors
      DeepDive Design (Source: Incremental Knowledge Base Construction Using DeepDive)
      Features that make it convenient for non-computer scientists to use:
      i) No reference to the underlying machine learning algorithm. Probabilistic semantics provide a way to debug the system independently of the algorithm
      ii) Allows users to write extra features in Python, SQL, and Scala
      iii) Fits into the familiar SQL stack, so standard tools can be used to inspect and visualize the data
  • 4. DEVELOPMENT PROCESS OF DEEPDIVE APPLICATIONS
      Start with a basic first version and improve iteratively.
      Writing the Application
      • Define the data flow with a DDlog schema that describes the input data and the data to be produced
      • Write user-defined functions (data transformation rules)
      • Specify a statistical model in DDlog
      Running the Application
      • The user can compile and run the application incrementally
      • Actual data is loaded into the database and queried, and user-defined functions are executed incrementally
      • The model’s parameters can be learned, or reused to make predictions
      Evaluate / Debug
      • Formal error analysis is supported by interactive tools
      • DeepDive contains a suite of tools and guides for labeling data products, browsing data, monitoring descriptive statistics, checking calibration, etc.
      Comments (marked # on the slide)
      # DDlog is a higher-level language for writing DeepDive applications in a succinct, Datalog-like syntax
      # Variable declarations + scoping and supervision rules + inference rules
      # A core set of commands supports precise control of execution
      # Several commands operate on the statistical model: its creation, parameter estimation, computation of probabilities, and keeping and reusing the parameters
      # User-defined functions can be written in any standard programming language
      # Produces calibration plots to evaluate the iterative workflow
      Source: DeepDive: A Data Management System for Automatic Knowledge Base Construction
      “It’s okay to have your eggs in one basket as long as you control what happens to that basket” - Elon Musk
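
The calibration check mentioned on this slide can be sketched in a few lines. This is only an illustration of the idea behind a calibration plot, not DeepDive's own tooling: the helper name `calibration_buckets`, the ten-bucket default, and the toy predictions are all assumptions made for the example. Predictions are grouped by predicted probability, and each group's average prediction is compared with the fraction of its assertions that turned out to be true.

```python
from collections import defaultdict

def calibration_buckets(predictions, num_buckets=10):
    """Group (probability, is_true) pairs into equal-width probability buckets.

    Returns (bucket_low, bucket_high, mean_predicted, fraction_true, count)
    for every non-empty bucket. Hypothetical helper, not a DeepDive API.
    """
    buckets = defaultdict(list)
    for prob, is_true in predictions:
        # Clamp so that prob == 1.0 falls into the last bucket.
        index = min(int(prob * num_buckets), num_buckets - 1)
        buckets[index].append((prob, bool(is_true)))

    rows = []
    for index in sorted(buckets):
        items = buckets[index]
        mean_predicted = sum(p for p, _ in items) / len(items)
        fraction_true = sum(1 for _, t in items if t) / len(items)
        rows.append((index / num_buckets, (index + 1) / num_buckets,
                     mean_predicted, fraction_true, len(items)))
    return rows

# Toy held-out predictions: (predicted probability, whether the fact was correct).
toy_predictions = [(0.95, True), (0.92, True), (0.91, False), (0.15, False),
                   (0.12, False), (0.55, True), (0.52, False)]

for low, high, mean_p, frac_true, count in calibration_buckets(toy_predictions):
    print(f"[{low:.1f}, {high:.1f}): predicted {mean_p:.2f}, "
          f"observed {frac_true:.2f}, n={count}")
```

In a well-calibrated system the predicted and observed columns stay close in every bucket; a large gap points at the probability range where features or supervision rules need attention.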
  • 5. THE DEEPDIVE FRAMEWORK
      End-To-End Framework For Building KBCs (Source: Incremental Knowledge Base Construction Using DeepDive)
      [Figure: Input → Candidate Generation & Feature Extraction → Supervision → Learning & Inference → Output, with new docs, feature extraction rules, supervision rules, and inference rules fed back in after error analysis.]
      Knowledge Base Construction (KBC) Systems
      The input to a KBC system is a heterogeneous collection of unstructured, semi-structured, and structured data. The output is a relational database containing facts extracted from the input and put into the appropriate schema.
      The KBC Model
      The standard KBC model seeks to extract four types of objects from input documents:
      • Entity: a real person, place, or thing
      • Relation: associates two (or more) entities
      • Mention: a span of text in an input document that refers to an entity or relation
      • Relation Mention: a phrase that connects two mentions that participate in a relation
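
The four object types of the KBC model can be pictured as plain records. The sketch below is illustrative only: the class names, field names, identifiers such as `e1`, and the character offsets are assumptions made for this example and are not DeepDive's actual schema.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Entity:
    """A real person, place, or thing."""
    entity_id: str
    name: str

@dataclass(frozen=True)
class Relation:
    """Associates two (or more) entities, e.g. HasSpouse(e1, e2)."""
    name: str
    entity_ids: Tuple[str, ...]

@dataclass(frozen=True)
class Mention:
    """A span of text in an input document that refers to an entity."""
    doc_id: str
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    text: str

@dataclass(frozen=True)
class RelationMention:
    """A phrase connecting two mentions that participate in a relation."""
    relation_name: str
    mentions: Tuple[Mention, Mention]
    phrase: str

sentence = "B. Obama and his wife M. Obama"

# Mentions are spans of the sentence; entities are what those spans refer to.
barack = Mention("doc1", 0, 8, sentence[0:8])
michelle = Mention("doc1", 22, 30, sentence[22:30])
entities = [Entity("e1", "B. Obama"), Entity("e2", "M. Obama")]

has_spouse = Relation("HasSpouse", ("e1", "e2"))
spouse_mention = RelationMention("HasSpouse", (barack, michelle), sentence[9:21])

print(spouse_mention.phrase)
```

Keeping mentions separate from entities matters because the same entity can be mentioned many times across documents, and each mention is a separate piece of evidence for the relation.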
  • 6. THE DEEPDIVE FRAMEWORK: STEP-BY-STEP
      Candidate Generation & Feature Extraction (Source: Incremental Knowledge Base Construction Using DeepDive)
      All data is stored in a relational database. This phase populates the database using a set of SQL queries and user-defined functions (feature extractors).
      By default, DeepDive stores all documents in the database with one sentence per row, together with markup produced by standard NLP pre-processing tools, including HTML stripping, part-of-speech tagging, and linguistic parsing.
      DeepDive then executes two types of queries:
      • Candidate mappings: SQL queries that produce possible mentions, entities, and relations
      • Feature extractors: functions that associate features with candidates
      “A breakthrough in machine learning would be worth ten Microsofts” - Bill Gates
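
The two query types on this slide can be sketched as follows. This is a minimal illustration under stated assumptions: Python lists stand in for database tables, a toy first-name dictionary stands in for real NLP markup, and every function and feature name (`person_mentions`, `spouse_candidates`, `extract_features`, `WORD_BETWEEN_...`) is hypothetical rather than part of DeepDive.

```python
from itertools import combinations

# One sentence per row, as (sentence_id, tokens). In DeepDive these rows live in
# a relational database; a Python list stands in for the table in this sketch.
sentences = [
    ("s1", ["Barack", "and", "his", "wife", "Michelle", "visited", "Chicago"]),
]

def person_mentions(sentence_id, tokens, known_first_names):
    """Candidate mapping: emit (sentence_id, token_index, word) per possible person."""
    return [(sentence_id, i, tok)
            for i, tok in enumerate(tokens) if tok in known_first_names]

def spouse_candidates(mentions):
    """Candidate mapping: every pair of person mentions is a possible relation."""
    return list(combinations(mentions, 2))

def extract_features(tokens, left, right):
    """Feature extractor: describe a pair by the words between the two mentions."""
    between = tokens[left[1] + 1:right[1]]
    features = [f"WORD_BETWEEN_{word.lower()}" for word in between]
    features.append(f"NUM_WORDS_BETWEEN_{len(between)}")
    return features

first_names = {"Barack", "Michelle"}  # toy dictionary, assumed for the example

for sentence_id, tokens in sentences:
    mentions = person_mentions(sentence_id, tokens, first_names)
    for left, right in spouse_candidates(mentions):
        print(left[2], right[2], extract_features(tokens, left, right))
```

Candidate generation is deliberately permissive here: it proposes every pair, and the features are what later let learning and inference separate true spouse pairs from false ones.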
  • 7. THE DEEPDIVE FRAMEWORK: STEP-BY-STEP
      Supervision (Source: Incremental Knowledge Base Construction Using DeepDive)
      Just as in Markov Logic, DeepDive can use training data or evidence about any relation. Each user relation is associated with evidence that indicates whether each entry is true or false.
      Two standard techniques generate training data: hand-labeling and distant supervision.
      Distant Supervision
      Traditional machine learning techniques require a set of training data. In distant supervision, DeepDive uses existing databases (e.g. a domain-specific database) to collect examples of the relations it wants to extract, then uses these examples to generate the training data automatically.
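
Distant supervision as described on this slide can be sketched as a lookup against existing databases. The code below is a minimal illustration, not DeepDive's supervision rule syntax: the two toy databases, the names in them, and the function `distant_label` are assumptions made for the example.

```python
# Existing knowledge base of known married couples (toy data, assumed).
known_spouses = {frozenset(("Barack Obama", "Michelle Obama"))}
# Pairs related in a way that rules out marriage, e.g. siblings (toy data, assumed).
known_siblings = {frozenset(("Alice Smith", "Bob Smith"))}

def distant_label(person_a, person_b):
    """Return True or False when an existing database decides the label, else None."""
    pair = frozenset((person_a, person_b))
    if pair in known_spouses:
        return True   # positive evidence
    if pair in known_siblings:
        return False  # negative evidence
    return None       # unlabeled: left for inference to predict

candidates = [
    ("Michelle Obama", "Barack Obama"),
    ("Alice Smith", "Bob Smith"),
    ("Barack Obama", "Bob Smith"),
]

# Evidence for each candidate pair, generated without any hand-labeling.
evidence = {pair: distant_label(*pair) for pair in candidates}
for pair, label in evidence.items():
    print(pair, label)
```

Using `frozenset` makes the lookup independent of the order in which the two people appear in the text. Labels produced this way are noisy, which is why they serve as evidence for learning rather than as final output.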
  • 8. THE DEEPDIVE FRAMEWORK: STEP-BY-STEP
      Learning & Inference (Source: Incremental Knowledge Base Construction Using DeepDive)
      In this phase, DeepDive generates a factor graph: a probabilistic graphical model that is the abstraction used for learning. DeepDive relies heavily on factor graphs.
      [Figure: an example factor graph. There is one user relation containing all tokens, and there are two correlation relations, for adjacent-token correlation (F1) and same-word correlation (F2) respectively. Source: DeepDive: A Data Management System for Automatic Knowledge Base Construction]
      Raw data: “He said that he would come.”
      In-database representation
      • User relation (Token, Word): (A, He), (B, Said), (C, That), (D, He)
      • Correlation relation F1, adjacent-token (factor, variables): i (A, B), ii (B, C), iii (C, D)
      • Correlation relation F2, same-word (factor, variables): iv (A, D)
      Assignment example (Token, Assignment): A = 1, B = 0, C = 0, D = 1
      Partition function (for the example assignment): Z = f1(1,0) × f1(0,0) × f1(0,1) × f2(1,1), where the first three terms are the factors in F1 and the last term is the factor in F2
      “Problems worthy of attack prove their worth by fighting back” - Paul Erdős
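
The toy factor graph on this slide is small enough to evaluate exhaustively. The sketch below is an illustration under stated assumptions: the factor functions `f1` and `f2`, their agreement-rewarding form, and the weights `W1` and `W2` are hypothetical (the slide does not define them). It multiplies the factor values for the example assignment A=1, B=0, C=0, D=1, then normalizes by the sum over all 16 assignments to obtain a probability.

```python
from itertools import product
import math

variables = ["A", "B", "C", "D"]                   # one Boolean variable per token
f1_factors = [("A", "B"), ("B", "C"), ("C", "D")]  # adjacent-token factors i, ii, iii
f2_factors = [("A", "D")]                          # same-word factor iv

# Hypothetical weights; in DeepDive they would be learned from evidence.
W1, W2 = 0.5, 2.0

def f1(x, y):
    """Adjacent-token factor: rewards neighbours that agree (assumed form)."""
    return math.exp(W1 if x == y else 0.0)

def f2(x, y):
    """Same-word factor: rewards identical words that agree (assumed form)."""
    return math.exp(W2 if x == y else 0.0)

def weight(assignment):
    """Unnormalized weight of one assignment: the product of all factor values."""
    result = 1.0
    for u, v in f1_factors:
        result *= f1(assignment[u], assignment[v])
    for u, v in f2_factors:
        result *= f2(assignment[u], assignment[v])
    return result

def all_assignments():
    """Enumerate every 0/1 assignment of the four variables."""
    for values in product([0, 1], repeat=len(variables)):
        yield dict(zip(variables, values))

example = {"A": 1, "B": 0, "C": 0, "D": 1}
example_weight = weight(example)  # f1(1,0) * f1(0,0) * f1(0,1) * f2(1,1)
normalizer = sum(weight(a) for a in all_assignments())
probability = example_weight / normalizer

def marginal(var):
    """Probability that a single variable is 1, summed over all assignments."""
    return sum(weight(a) for a in all_assignments() if a[var] == 1) / normalizer

print(f"weight={example_weight:.4f} probability={probability:.4f}")
```

Exhaustive enumeration works only for toy graphs, since the number of assignments doubles with every variable; DeepDive relies on Gibbs sampling to estimate such marginal probabilities at scale.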
  • 9. REFERENCES
      Shin, Jaeho, Sen Wu, Feiran Wang, Christopher De Sa, Ce Zhang, and Christopher Ré. “Incremental Knowledge Base Construction Using DeepDive.” Proceedings of the VLDB Endowment 8.11 (2015): 1310-321. Web.
      Zhang, Ce. “DeepDive: A Data Management System for Automatic Knowledge Base Construction.” Proceedings of the VLDB Endowment 8.13 (2015): 1310-321. Web.